Pre-client spam filtering (Phil Gyford: Writing)

Tuesday 5 August 2003

The last few days I’ve been collecting my email via Knowspam.net, one of those challenge and response systems where unapproved people have to click a link and verify their existence before their email reaches you. It seemed a good solution to the spam problem — the problem being receiving any spam at all — but I’ve decided the whole idea is not for me.

Eudora 6 beta’s spam filtering already ensured I rarely saw the stuff, but I still had to download the spam for it to be filtered. Not the end of the world when I’m at my computer, but if I wanted to check in via webmail or my phone, I’d rather there was no spam to download at all.

Something that would act as a bouncer, ensuring only decent, upstanding email reached me, would be ideal. So after a friend’s recommendation I gave Knowspam.net a spin, and it was a joy to use. It has a clean and simple design, and set-up is an easy process. You can upload your contacts (in CSV or vCard format), adding them automatically to your whitelist of approved addresses who will never be challenged. It provides an SMTP server through which you can send mail, with every recipient automatically added to the whitelist. Its database of “known humans” is shared between users in the background, further minimising the number of people who will be challenged when contacting you. All in all it does the job well. Suddenly the only mail reaching me was from real people, none of whom wanted to sell me Viagra.
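For anyone who thinks better in code, here is a minimal sketch of the decision such a service makes for each incoming message. It is only a toy model of the behaviour described above, not how Knowspam.net is actually built: the class, its method names and the in-memory sets are all hypothetical.

```python
from dataclasses import dataclass, field


def _norm(address: str) -> str:
    # Compare addresses case-insensitively and ignore stray whitespace.
    return address.strip().lower()


@dataclass
class ChallengeResponseFilter:
    """Toy challenge and response mailbox (all names are hypothetical)."""

    whitelist: set = field(default_factory=set)      # this user's approved senders
    known_humans: set = field(default_factory=set)   # stands in for the shared database
    held: dict = field(default_factory=dict)         # sender -> messages awaiting a response

    def import_contacts(self, addresses):
        # Uploaded contacts go straight onto the whitelist.
        self.whitelist.update(_norm(a) for a in addresses)

    def record_outgoing(self, recipient):
        # Anyone the user writes to is approved automatically.
        self.whitelist.add(_norm(recipient))

    def receive(self, sender, message):
        sender = _norm(sender)
        if sender in self.whitelist or sender in self.known_humans:
            return "deliver"
        # Unknown sender: hold the message and send them a challenge.
        self.held.setdefault(sender, []).append(message)
        return "challenge"

    def verify(self, sender):
        # The sender clicked the link: remember them and release their held mail.
        sender = _norm(sender)
        self.known_humans.add(sender)
        return self.held.pop(sender, [])
```

A message from an unknown sender sits in `held` until `verify` is called for that address, which is exactly the step the whole scheme depends on.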

For the first day or two I was excited about this return to the good ol’ days of spamless email but I’ve now returned to downloading all my spam and letting Eudora do its work. Why? A discussion about the system on Haddock put into words the vague feelings of unease I’d had about the idea (apologies to those whose words and links I’m rehashing), and coupled with more personal considerations I decided it wasn’t for me.

Fundamentally, the challenge and response system depends on the sender responding. Even with a comprehensive whitelist you’ll be sent email from new contacts or from friends using webmail addresses, mobile phone accounts or just using new addresses. Even if they’re willing to verify their humanity and give their mail a second push towards you, there are reasons why they might not: they could go offline after sending the mail, they may simply miss the challenge email, or even mistake it for spam.

There’s also the issue of the databases such services are building. What industry would benefit from a large database of known working email addresses? Yup, spammers. While I’m sure Knowspam.net, who are lovely folk by all accounts, have nothing but the best intentions and would never do anything untoward with such data, the general idea of giving such a service the addresses of all my acquaintances feels slightly wrong, no matter how simple it makes life. With a more unscrupulous service, anyone responding to a challenge is also asking for more spam.

I also can’t help thinking that once a service has a large database they could charge a pretty penny for allowing “direct marketing” companies access to the spam-free inboxes of these people who have been returned to a more innocent, wide-eyed state. If you didn’t want to receive such premium spam, I’m sure you’d be able to pay a little bit more to, once again, live spam-free.

Spam that pretends to be from a known working address is also a problem and I’m not sure how challenge and response systems could get round this.
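To make the problem concrete, here is a small sketch using Python's standard `email` module. The whitelist contents and the `passes_whitelist` helper are made up for illustration; the point is only that a check keyed on the From header trusts whatever the sender chose to write there.

```python
from email.message import EmailMessage
from email.utils import parseaddr

# Hypothetical whitelist containing one approved address.
WHITELIST = {"friend@example.com"}


def passes_whitelist(msg: EmailMessage) -> bool:
    # The From header is written by the sender; plain SMTP does not verify it.
    _, address = parseaddr(str(msg.get("From", "")))
    return address.lower() in WHITELIST


# A spammer can put any address they like in the From header.
forged = EmailMessage()
forged["From"] = "A Friend <friend@example.com>"
forged["Subject"] = "Cheap pills"
forged.set_content("Not really from your friend.")

print(passes_whitelist(forged))
```

The forged message is treated as coming from an approved sender, so no challenge would ever be issued for it.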

Those are the “bigger” issues, but from a more personal and social angle the whole idea didn’t feel quite right. Basically, it’s a selfish system. To me the challenge emails say “I’m too busy to deal with spam. If your email is important enough, spend some of your time proving your worth to me.” And I really don’t want to give people that impression or feel like I’m imposing on people in any way; spam isn’t a big enough problem for me to do that.

For me, the benefit isn’t worth the system’s reliance on senders responding. I’m bad at depending on other people at the best of times, and don’t want to trust my email to their goodwill. It feels like this barrier changes how email works too much and reduces the openness that makes the internet what it is.

Finally, I realised I was actually spending more time dealing with spam than I was before, which really wasn’t the idea. Because I was so worried about missing a “real” email I was often scanning the site’s list of un-responded-to mails in search of anything important that might be held one step from my inbox. Eudora’s Bayesian client-side filtering is good enough that on my occasional checks I rarely find anything wrongly marked as spam, so the only time I usually “deal” with spam is when downloading mail takes longer than it should.
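As a rough idea of what Bayesian filtering means, here is a tiny naive Bayes classifier. It is a sketch of the general technique only, and certainly not Eudora's actual implementation; the class name, the whitespace tokeniser and the add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter


class TinyBayesFilter:
    """Minimal naive Bayes spam scorer (illustrative, names are hypothetical)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label must be "spam" or "ham"; words are split on whitespace.
        self.words[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_probability(self, text):
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        total = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: how common this kind of message is, with add-one smoothing.
            score = math.log((self.messages[label] + 1) / (total + 2))
            n = sum(self.words[label].values())
            for word in text.lower().split():
                # Smoothed likelihood, so unseen words never give a zero probability.
                count = self.words[label][word]
                score += math.log((count + 1) / (n + len(vocab) + 1))
            log_score[label] = score
        # Turn the two log scores into a probability, clamped to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

The filter learns from mail the user has already sorted, for example `train("buy cheap viagra now", "spam")`, and anything scoring above a chosen threshold from `spam_probability` would be filed as junk.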

If you’re slightly less neurotic about missing a single email, and you can ignore the bigger issues, Knowspam.net and its ilk might be just right for you. But I’m still looking for a more server-side (or at least, pre-client) solution that will just excise spam from my life entirely. I may give Know-spam.com, Bayesian filtering between server and client, a whirl at some point, if I can find how much they plan to charge.
