Geeks With Blogs
pemo (Notes from a Small [Academic] Island) ++++++++++++++++++ It's all academic! ++++++++++++++++++

I use Cloudmark's SpamNet, and I'm very happy with it.  However, I suspect (and I might be able to confirm this if I could ever convince myself to read the 'small print') that it bases its spam assessment only on an email's subject and/or the sender's address (envelope or reply).  Apologies if this is not the case!


The trouble with this is that it's really easy to alter both the subject and the address(es), as it is, of course, to alter the content.  However, the content is usually the 'important bit'.  After all, it conveys the message proper, and you wouldn't want to alter that very much if you were a spammer.  Of course, the body of an email can also be pretty big, so whilst on the one hand it makes sense to parse it, it doesn't make a great deal of sense for SpamNet to upload it to their servers for further analysis, even though that would probably help them catch more spam.  It's all about bandwidth (yours and theirs) and your privacy.
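To make the point about headers concrete, here is a minimal sketch of a filter that keys only on subject and sender. This is an assumption made purely for illustration, not a description of how SpamNet actually works; `header_key`, `known_spam` and `is_known_spam` are hypothetical names.

```python
import hashlib

def header_key(subject: str, sender: str) -> str:
    """Hypothetical lookup key built only from the subject and sender."""
    normalized = f"{subject.strip().lower()}|{sender.strip().lower()}"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Pretend this message has already been reported as spam.
known_spam = {header_key("Cheap meds now", "deals@example.com")}

def is_known_spam(subject: str, sender: str) -> bool:
    """True only if this exact (normalized) subject/sender pair was reported."""
    return header_key(subject, sender) in known_spam

# Case and surrounding whitespace are normalized away, so this still matches.
caught = is_known_spam("  CHEAP MEDS NOW ", "Deals@Example.com")

# One extra character in the subject plus a new sender address produces a
# different key, even if the body of the message is completely unchanged.
missed = not is_known_spam("Cheap meds now!", "offers@example.net")
```

A spammer only has to change a character or two per batch to get a fresh key, which is why the body looks like the more stable thing to compare.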


However, let's now consider proper email systems like GMail (as opposed to add-ons), because, if anyone can stop spam, it surely must be GMail!


Think about it: Google parse the content of any email coming into a GMail in-box.  So, for example, they could easily produce a subsystem for GMail that compares the content of one incoming email with that of others arriving around the same time (most copies of a spam mail are sent at the same time, you know).  Simple, probability-based NLP calculations built on this technique would, I'd guess, catch something like 99.99% of spam.
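A rough sketch of that comparison, under stated assumptions: it uses Jaccard overlap of three-word shingles as a stand-in for the "probability-based" calculation, and the window, similarity threshold and match count are made-up values. `Email`, `shingles`, `jaccard` and `looks_like_bulk` are hypothetical names, and nothing here reflects how GMail really filters mail.

```python
import re
from dataclasses import dataclass

@dataclass
class Email:
    received_at: float  # arrival time in seconds (any consistent clock)
    body: str

def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word sequences (shingles) found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all shingles; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def looks_like_bulk(candidate: Email, recent: list,
                    window_seconds: float = 600.0,  # assumed: 10 minutes
                    similarity: float = 0.6,        # assumed threshold
                    min_matches: int = 3) -> bool:  # assumed threshold
    """Flag a mail if enough near-identical bodies arrived close in time."""
    target = shingles(candidate.body)
    matches = 0
    for other in recent:
        # Ignore mail that did not arrive around the same time.
        if abs(other.received_at - candidate.received_at) > window_seconds:
            continue
        if jaccard(target, shingles(other.body)) >= similarity:
            matches += 1
    return matches >= min_matches
```

For example, a body that shows up in several in-boxes within a few minutes, each copy differing only by a tacked-on reference code, gets flagged, while a one-off personal note does not:

```python
spam = "Buy cheap watches today from our online store and save big on every order"
recent = [Email(100.0 + i, spam + f" ref{i}") for i in range(3)]
flagged = looks_like_bulk(Email(105.0, spam), recent)
personal = looks_like_bulk(Email(105.0, "Hi Sam, are we still on for lunch?"), recent)
```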

Posted on Monday, August 30, 2004 4:26 PM

Copyright © pemo