Snowshoe spam is a type of spam which is hard to detect. This is because the spammer tries to spread out the sending load over many host in order to evade detection.
Our method combines active DNS measurements with Machine Learning to detect snowshoe spam domains with a time advantage over regular methods.
Snowshoe spam is a type of spam that is notoriously hard to detect. Anti-abuse vendors estimate that 15% of spam can be classified as snowshoe spam. Differently from regular spam, snowshoe spammers distribute sending of spam over many hosts, in order to evade detection by spam reputation systems (blacklists). To be successful spammers need to appear as legitimate as possible, for example, by adopting email best practices, such as the Sender Policy Framework (SPF). This requires spammers to register and configure legitimate DNS domains.
Many previous studies have relied on DNS data to detect spam. However, this often happens based on passive DNS data. This limits detection to domains that have actually been used and have been observed on passive DNS sensors.
To overcome this limitation, we take a different approach. We make use of active DNS measurements, covering more than 60% of the global DNS namespace, in combination with machine learning to identify malicious domains crafted for snowshoe spam. Our results show that we are able to detect snowshoe spam domains with a precision of over 93%.
More importantly, we are able to detect a significant fraction of the malicious domains up to 100 days earlier than existing blacklists, which suggests our method can give us a time advantage in the fight against spam. In addition to testing the efficacy of our approach in comparison to existing blacklists, we validated our approach over a 3-month period in an actual mail filter system at a major Dutch network operator. Not only did this demonstrate that our approach works in practice, the operator has actually decided to deploy our method in production, based on the results obtained.