Which DNSBLs work well?

This is a question I get quite often and it’s a tough one to answer. I don’t really bother with running my own mail system any more, as I’m tired of the headache and happy to leave the server-level spam prevention to somebody else.

And I'm tired of taking other peoples' word for it that a certain blocklist works well or doesn't work well -- I've been burned a number of times by people listing stuff on a blocklist outside of a list's defined charter. It's very frustrating. And lots of people publish stats on how much mail they block with a given list, which is an incomplete measure of whether or not a list is any good. Think about it. If you block all mail, you're going to block all spam. But you're going to block all the rest of your inbound mail, too. And when you block mail with a DNSBL, you don't always have an easy way to tell if that mail was actually wanted or not.

So, I decided to tackle it a bit differently than other folks have. See, I have my own very large spamtrap, and the ability to compare lots of data on the fly.

For this project, I've created two feeds. One is a spam feed, composed of mail received by my many spamtrap addresses, with lots of questionable mail and obvious non-spam weeded out. I then created a non-spam feed. In this “hamtrap” I am directed solicited mail that I signed up for from over 400 senders, big and small. Now, I just have to sit back, watch the mail roll in, and watch the data roll up.

For the past week or so, I’ve been checking every piece of mail received at either the spamtrap or hamtrap against a bunch of different blocklists. I wrote software to ensure that the message is checked within a few minutes of receipt, a necessary step to gather accurate blocklist “hit” data.

After that first week, here’s what I’ve found. It might be obvious to you, or it might not: Spamhaus is a very accurate blocklist, and some others...aren't. Spamhaus’s “ZEN” blocklist correctly tagged about two-thirds of my spam, and tagged no desired mail incorrectly. Fairly impressive, especially when compared to some other blocklists. SORBS correctly tagged 55% of my spam mail, but got it wrong on the non-spam side of things ten percent of the time. If you think throwing away ten percent of the mail you want is troublesome, how about rejecting a third of desired mail? That’s what happens if you use the Fiveten blocklist. It correctly would block 58% of my spam during the test period, but with a false positive rate of 34%, that would make it unacceptable blocklist to use in any corporate environment where you actually want to receive mail your users asked to receive.

One fairly surprising revelation is that Spamcop’s blocklist is nowhere as bad as I had previously believed it to be. I’ve complained periodically here about how Spamcop’s math is often wrong, how it too often lists confirmed opt-in senders, how it is too aggressive against wanted mail, but...my data (so far) shows a complete lack of false positives. This is a nice change, and it makes me very happy to see. Assuming this trend keeps up, I think you'll see me rewriting and putting disclaimers in front of some of my previous rants on that topic.

NJABL Dynablock List Now Obsolete

With the advent of Spamhaus's new PBL anti-spam blocking list, it appears that the NJABL Dynablock list is now obsolete. I just saw the following post on the public SPAM-L mailing list, from the NJABL folks: The following text was sent to list AT njabl.org on Jan 19, 2007. Judging from the number of DNS queries still being handled for dynablock.njabl.org, the message doesn't seem to have made it to a wide enough audience.

If you use or know people who use dynablock.njabl.org, this is important information:

With the advent of Spamhaus's PBL (http://spamhaus.org/pbl/), dynablock.njabl.org has become obsolete. Rather than maintain separatesimilar DNSBL zones, NJABL will be working with Spamhaus on the PBL. Effective immediately, dynablock.njabl.org exists as a copy of the Spamhaus PBL. After dynablock users have had ample time to update their configurations, the dynablock.njabl.org zone will be emptied.

Other NJABL zones (i.e. dnsbl, combined, bhnc, and the qw versions) will continue, business as usual, except that combined will eventually lose its dynablock component.

If you currently use dynablock.njabl.org we recommend you switch immediately to pbl.spamhaus.org.

If you currently use combined.njabl.org, we recommend you add pbl.spamhaus.org to the list of DNSBLs you use.

You may also want to consider using zen.spamhaus.org, which is a combination zone consisting of Spamhaus's SBL, XBL, and PBL zones.

(Editor's note: I'm very happy with ZEN so far. See this post detailing my recent experiences.)

Spamcop Roundup

5/22/2007: This information is out of date. Please click here for my latest take on Spamcop's SCBL.

My most recent take on Spamcop, from February 2007, can be found here. In that commentary, I talk about the history of the Spamcop spam reporting service, its current corporate ownership, and my take on how this type of DNSBL works, especially as to how it relates to to the impact against solicited (wanted) mail.

In February 2007, I found that Microsoft is using Spamcop to filter inbound (corporate) mail. By corporate mail, I mean mail to microsoft.com users, not mail to MSN/Hotmail users. This surprised me, because of what I believe are aggressive listing practices on the part of Spamcop. Indeed, how the issue was brought to my attention was by an unhappy person mad because he couldn't send one-to-one mail to Microsoft, because Spamcop blocked it.

Also, back in 2003, I published an article about the ongoing issues I was having with Spamcop blocking opt-in confirmation requests. Back then I found (through some admittedly unscientific survey techniques) that admins using the SCBL seemed to assume that all blocked mail must be spam because Spamcop blocked it. Not a very encouraging find. It was also a bit insulting to be lectured on how confirmed opt-in worked by people who were blocking confirmed opt-in requests, especially considering I've been pushing senders to implement and utilize confirmed opt-in/double opt-in for many years.

Spamhaus ZEN: Recommended

Look for a longer article from me in the near future on Spamhaus; I'm collecting a ton of data against a large spam corpus and hope to summarize and publish my findings within the next month or so.

Until then, feel free to bop on over to Spam Resource, where I talk about my experience using the Spamhaus ZEN list to tag and filter inbound mail to our abuse desk. I've been quite pleased with the results.

Also of note is that Microsoft is using both Spamcop and Spamhaus to reject mail to their corporate users. (They're NOT using it on MSN Hotmail.)

Update: Find my full review of Spamhaus ZEN here on DNSBL Resource.

DCC: Spam filter?

The Distributed Checksum Clearinghouse (DCC), created by Vernon Schryver, is a very powerful tool to help system administrators identify and block bulk mail. The project's website suggests a strong correlation between "bulk" and "spam," but as I do a bit more research, I don't think it's always that simple.

There's a common misconception in the spam filtering world (and the sending world) -- people think DCC is a spam blocking list. It's not, though. It's a tool to help users block bulk mail, not spam mail. That's an important distinction.

Think about it. There are a lot of types of bulk mail you might have signed up for and might want, things like newsletters you actually subscribed to, messages from companies you've done business with and actually want to hear from, or news, weather and traffic alerts you might be waiting for. (I don't need an email message to warn me that it's snowing outside, but I know that lots of people sign up for these.)

DCC tells you whether or not the mail attempting to be delivered was sent to lots of people besides you. Sure, spam is sent to lots of people all at once, but so is a bunch of solicited mail. What defines spam is whether or not you signed up to receive it. If you signed up to receive it, whether or not other people are getting it too has no bearing on the fact that you asked for it.

If a filter like DCC rejects a piece of mail you actually solicited and wished to receive, I would consider that a "false positive." To help prevent false positives, proper DCC usage dictates that you whitelist, ahead of time, all the sources of legitimate list or bulk mail you wish to receive. They include this sample file to get started, and they recommend this whitelist of example small messages that are most likely to be caught up in the filtering, even if solicited.

As Vernon Schryver himself said on the DCC mailing list recently, false positives "speak to a misuse or misunderstanding of [DCC]." He says that in a sense, there's no such thing as a DCC false positive. My interpretation of his comments is that he means that it's up to users of DCC to know what they're getting in to. DCC blocks mail sent to multiple recipients, and it's up to you to whitelist any mail sources you want to receive mail from.

DCC is a very powerful tool. That's both a plus and a minus. If you know what you're doing, comfortable working without a safety net, manually compiling lists of sites you want to receive any sort of bulk or list mail from, then maybe it can work for you to help reduce spam.

But, if you're not clear on the difference between bulk and spam, are not clear on what sites are sending you bulk or list mail that you or your users will want, then it's not going to work the way you think, and it's going to reject mail that you or your users asked for.

Internet Service Providers (ISPs), when deciding whether or not to accept a sender's mail, do measure whether or not your message is being sent to multiple people. It's not the only thing they look at, though. The smarter ISPs tie in a reputation measurement to that process. Meaning, is this mail coming from a good sender, or a bad sender? Does this sender generate spam complaints? Does this sender generate an above average percentage of bounces? Wrap that all up together, and an ISP has good info available to them to decide what mail to accept. Don't measure any of those things, and you're left with an incomplete view -- no easy way to tell the good mail from the bad. It's up to you to know about and whitelist the good senders ahead of time. If you don't, you're going to reject mail from them, presumably mail that you or your users wanted to receive.