Published on Security Incite: Analysis on Information Security (http://securityincite.com)

Testing Spam Products - Use Corpuses at Your Own Risk

By Mike Rothman
Created 2006-03-09 08:06
I saw a post yesterday on the Ferris Research blog (read it here [1]) about a publicly available spam corpus called TREC 2005 (for more information about TREC, read here [2]). Ferris’ position on the value of TREC 2005 is:
The corpus is primarily intended for academic research and development of anti-spam filters and has significant restrictions on its use. This collection is important as it provides a standardized collection to test and compare spam filters in both academic and commercial contexts.


They are wrong. Using any corpus older than a month, and obscuring the mail headers, is actually detrimental to testing and comparing spam filters. Why? Because spam is a real-time phenomenon and using “stale-mail” to test filters is a waste of time. Your results will smell worse than 30-day-old Wonder Bread.

To be clear, the bulk of spam is no longer sent by that shady character using a spam cannon in his garage to blast out 200 million messages a day. Spam is sent by a worldwide network of zombies, which makes it much harder to track and stop the onslaught.

A key technical innovation in defending against these zombies was the reputation system. IronPort’s Senderbase [3] and CipherTrust’s TrustedSource [4] are the two highest profile reputation systems out there. Basically, by tracking the types of messages coming from a specific IP (and using some fancy mathematics), you can get a pretty good feel for whether they are a legitimate sender or not.
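
To make the idea concrete, here is a toy sketch of per-IP reputation in Python. It is not how Senderbase or TrustedSource actually work (their math is not described here); the class name, the smoothing priors, and the scoring rule are all assumptions chosen purely for illustration.

```python
from collections import defaultdict


class ReputationTracker:
    """Toy per-IP reputation: a smoothed fraction of messages judged spam.

    Hypothetical illustration only; real reputation systems use far
    more signals than a simple spam/ham count.
    """

    def __init__(self, prior_spam=1.0, prior_ham=1.0):
        # The priors keep a never-seen IP at a neutral score (assumption).
        self.prior_spam = prior_spam
        self.prior_ham = prior_ham
        self.counts = defaultdict(lambda: {"spam": 0, "ham": 0})

    def record(self, ip, is_spam):
        """Record one message seen from this IP."""
        self.counts[ip]["spam" if is_spam else "ham"] += 1

    def spam_score(self, ip):
        """Return a value between 0 and 1; higher means more spam-like."""
        seen = self.counts.get(ip, {"spam": 0, "ham": 0})
        spam = seen["spam"] + self.prior_spam
        ham = seen["ham"] + self.prior_ham
        return spam / (spam + ham)


tracker = ReputationTracker()
for _ in range(9):
    tracker.record("192.0.2.1", is_spam=True)
tracker.record("192.0.2.1", is_spam=False)

print(round(tracker.spam_score("192.0.2.1"), 2))  # 0.83
print(tracker.spam_score("198.51.100.9"))         # 0.5 (never seen)
```

The point to notice is that the score depends entirely on who is connecting right now, which is exactly the information a canned corpus cannot supply.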

Combining reputation with heuristics and signatures creates a cocktail of techniques that can be used to detect spam more accurately. Now, anyone who says they can consistently stop 99% of spam is lying to you. Spamming techniques change fast enough that effectiveness will ebb and flow as the spammers and anti-spammers engage in constant point-counterpoint. But in general, most of the solutions out there do a good enough job.

Now back to TREC 2005. I am a big fan of bake-offs (technical evaluations) during the procurement process (see Buying Security products post [4]). Having users compare spam catch rates using stale-mail is a disservice because real-time reputation checks cannot happen on stale mail. Who the message is coming from is a critical part of today’s detection techniques. So using a pre-baked corpus eliminates that whole set of tests and makes your results suspect at best.

It is also a very bad idea to just forward the test corpus through a bit blaster. This puts your email security gateway in as the second hop and obscures the true sender’s mail header. This dramatically impacts your ability to accurately detect the spam. I can get into more technical nuances off-line, but take my word for it. Your results will be crap. In fact, a number of well-known publications used this technique in early anti-spam reviews and their results weren’t worth the paper they were printed on. But it took 18 months (and a lot of my personal blood and sweat) to get them to see the fault in this testing methodology.
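
A small Python sketch shows what the second-hop problem looks like in the headers. The sample message, hostnames, and IP addresses below are made up (documentation address ranges), and the regular expression assumes the common "[a.b.c.d]" style of Received line, which real mail does not always follow. A real gateway also keys off the IP of the live SMTP connection rather than parsing headers; in a replay, that connection comes from the blaster.

```python
import re
from email import message_from_string

# Matches an IPv4 address in square brackets, e.g. "[203.0.113.45]".
IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

# Hypothetical replayed message: the newest Received line is on top.
SAMPLE = (
    "Received: from blaster.example.net (blaster.example.net [198.51.100.7])\n"
    "\tby gateway.example.com; Thu, 9 Mar 2006 08:00:00 -0500\n"
    "Received: from unknown (unknown [203.0.113.45])\n"
    "\tby mx.corpus.example.org; Mon, 6 Feb 2006 03:12:00 -0500\n"
    "From: someone@example.org\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def received_ips(raw_message):
    """Return bracketed IPs from Received headers, newest hop first."""
    msg = message_from_string(raw_message)
    ips = []
    for header in msg.get_all("Received", []):
        match = IP_IN_BRACKETS.search(header)
        if match:
            ips.append(match.group(1))
    return ips


hops = received_ips(SAMPLE)
print("hop the gateway sees:", hops[0])   # the bit blaster
print("original sender:", hops[-1])       # buried one hop back
```

Any reputation lookup on the connecting address scores the test relay, not the zombie that sent the spam in the first place.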

So how do you test anti-spam products? Basically, you need to use them in real mail flow. I recommend that you set up a group of test users (ones that are a bit more understanding than your CEO) and run their ACTUAL mail through the box for a month. Then you can gauge real-time effectiveness and select the best fit for your organization.
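
If your test users flag what the box got wrong during that month, tallying the results is simple. The sketch below assumes you end up with one (actually_spam, flagged_as_spam) pair per message; the function name and data shape are my own invention, not something any product exports in this form.

```python
def score_filter(results):
    """Compute (catch_rate, false_positive_rate) from labelled results.

    results: iterable of (actually_spam, flagged_as_spam) boolean pairs,
    a hypothetical format built from your test users' feedback.
    Either rate is None if there were no messages of that kind.
    """
    spam_total = spam_caught = ham_total = ham_flagged = 0
    for actually_spam, flagged in results:
        if actually_spam:
            spam_total += 1
            if flagged:
                spam_caught += 1
        else:
            ham_total += 1
            if flagged:
                ham_flagged += 1
    catch_rate = spam_caught / spam_total if spam_total else None
    false_positive_rate = ham_flagged / ham_total if ham_total else None
    return catch_rate, false_positive_rate


# Made-up numbers: 10 spam (9 caught), 20 legitimate (1 wrongly flagged).
month = (
    [(True, True)] * 9 + [(True, False)]
    + [(False, False)] * 19 + [(False, True)]
)
print(score_filter(month))  # (0.9, 0.05)
```

Track both numbers for each product. A high catch rate means little if legitimate mail is landing in quarantine.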

UPDATE: Let me clarify a bit: a corpus like this will be useful to an anti-spam researcher, who presumably understands how to tune their heuristics and/or signatures. My point is that this kind of corpus will NOT be useful to end users trying to compare anti-spam products.

 


Source URL:
http://securityincite.com/blog/mike-rothman/testing-spam-products-use-corpuses-at-your-own-risk