SpamSieve

September 26, 2003

Erik posted his SpamSieve 2.0 stats, and I’m equally impressed. I’m still getting more false positives than I’d like, but I’m sure that will improve as long as I’m diligent about telling SpamSieve what it’s doing wrong through the scripts. What I *love* about SpamSieve is that it really learns from its mistakes. Once you tell it that something is spam, it is. And more importantly, when you tell it about a false positive, it doesn’t make the same mistake again.

My stats since September 10, 2003 (SpamSieve 2.0):

Good messages: 2333
Spam messages: 1458
False positives: 134
False negatives: 31
Correct: 95.6%
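
For anyone who wants to check the math, here is a quick sketch of how that percentage falls out of the counts above. I’m assuming the good and spam totals already include the misclassified messages; the variable names are my own, not anything SpamSieve exposes.

```python
# Counts copied from the stats above (since September 10, 2003).
good = 2333
spam = 1458
false_positives = 134   # good mail that was marked as spam
false_negatives = 31    # spam that slipped through as good

# Assumption: the good/spam totals already include the mistakes.
total = good + spam
errors = false_positives + false_negatives
accuracy = (total - errors) / total

print(f"Messages: {total}, mistakes: {errors}")
print(f"Correct: {accuracy:.1%}")  # Correct: 95.6%
```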

Michael Tsai, the author of SpamSieve, commented on Erik’s blog that the reason for the false positives is that Erik has more spam than good messages in his corpus. I looked at mine and saw that I had 190 good messages and 1458 spam. Ah ha! I went through every folder of saved mail I have and told SpamSieve that it’s all good (over 2000 messages’ worth), and we’ll see if the false positive rate goes down. I’m checking my spam folder often, and it would be nice to go back to doing it only every once in a while (if at all). Part of the problem is that I had to start a fresh corpus when I set up the G5.
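
To put a number on how lopsided the corpus was before I added the saved mail, here is a rough sketch. This is only my own arithmetic on the two corpus counts, not how SpamSieve weighs messages internally, and the variable names are made up for illustration.

```python
# Corpus counts before training on the saved mail folders.
corpus_good = 190
corpus_spam = 1458

corpus_total = corpus_good + corpus_spam
spam_share = corpus_spam / corpus_total       # fraction of the corpus that is spam
spam_per_good = corpus_spam / corpus_good     # spam messages per good message

print(f"Spam share of corpus: {spam_share:.1%}")
print(f"Spam per good message: {spam_per_good:.1f}")
```

With the corpus that skewed toward spam, it seems plausible that a filter trained on it would lean toward calling things spam, which fits what Michael described.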
