Blocklist Resource: SORBS: Accuracy Rates and False Positives

The blocking list SORBS (aka the “Spam and Open Relay Blocking System”) was created in 2002 by Australian Matthew Sullivan. SORBS publishes a main “aggregate zone” (dnsbl.sorbs.net) containing listings meeting a multitude of criteria beyond open relaying mail services. SORBS also publishes multiple other zones meeting various criteria.

As related previously, SORBS appears to be undergoing changes. Some of these changes appear to relate to the fact that the SORBS maintainer has repeatedly taken issue with the methodology used by DNSBL.com to measure accuracy rates and false positive rates.

SORBS has indicated that they have the ability to feed false or different data in response to queries from DNSBL.com. As such, it's unclear if recent query results are indicative of results seen by other users. Because of concerns that SORBS may be attempting to sway the data reported, it's important to share current data and information, so that system administrators can make an educated determination as to whether or not it would be wise to use this DNSBL.

Historical Information

I've been tracking data on the main SORBS zone, dnsbl.sorbs.net, since March, 2007. Here's what I've found.

For most of the past fifteen weeks, the DNSBL had an effectiveness rate varying between fifty percent and fifty-six percent, week over week. This means that SORBS correctly blocked a piece of spam in my spamtrap about five to six times out of ten.
For many weeks, I believe SORBS clearly suffered from significant false positive issues. As measured by my own calculations (see here and here for more info), the false positive rate is in the 7.9% to 11.1% range. This means that if your users sign up for the same kind of mail that I did, for every one hundred pieces of solicited mail your users signed up for and expected to receive, SORBS is likely to block about eight to eleven of them.
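
As a rough illustration of the arithmetic behind these two figures, here is a minimal Python sketch. The counts are hypothetical and chosen only to land inside the ranges quoted above; they are not data from this project, and the function name is made up for the example.

```python
def rate_percent(listed: int, total: int) -> float:
    """Share of messages that hit the blocklist, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * listed / total

# Hypothetical one-week counts, for illustration only.
spam_total, spam_listed = 2000, 1060   # spamtrap mail, and how much of it was listed
ham_total, ham_listed = 1000, 79       # solicited mail, and how much of it was listed

# Effectiveness: listed spam as a share of all spam received.
effectiveness = rate_percent(spam_listed, spam_total)
# False positive rate: listed solicited mail as a share of all solicited mail.
false_positive = rate_percent(ham_listed, ham_total)

print(f"effectiveness rate: {effectiveness:.1f}%")
print(f"false positive rate: {false_positive:.1f}%")
```

Note that the two rates use different denominators: spam received for effectiveness, solicited mail received for false positives.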

Recent Data Changes

On July 9, 2007, changes were made to SORBS. As you can see from the chart above, around this time (near the start of week 12), the net result is that the effectiveness rate and false positive rates have both significantly declined.
Since July 9, 2007, I have not noted any additional false positives from the main SORBS zone. Because SORBS has indicated that it is able to feed false data, it is unclear whether the results I am seeing are accurate.
Similarly, the effectiveness rate of the main SORBS zone seems to have greatly declined. Since July 9, 2007, it has hovered around 18%.

There are two possible conclusions to draw here:

SORBS is somehow able to feed different blocklist data to DNSBL.com than to others. If so, then the historical data I have summarized above is likely to be the most accurate view of SORBS. Or,
SORBS has gutted its lists and the poor effectiveness rates I'm now seeing are reflective of how it would likely work for others.

It's hard to say which scenario is the more accurate one, and what future testing will reveal. I'll certainly continue to collect data, but right now, there's an open question of SORBS' effectiveness and false positives.

As of Thursday, July 19, 2007, SORBS changed the default zone mentioned in configuration guidance pages from dnsbl.sorbs.net to a domain not owned by SORBS. As a result, if any SORBS user copies and pastes a configuration snippet from one of the SORBS configuration pages verbatim, the result is that 100% of a site's inbound email will be blocked. My recommendation is to proceed with caution – if you are not sure what you're doing with DNSBL use and mail server configuration, a misstep here will have significant consequences.
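
One way to catch this kind of misstep before it reaches production is to sanity-check a zone with the two standard test addresses. The sketch below assumes the convention later documented in RFC 5782: an IPv4 DNSBL lists 127.0.0.2 as a test entry and never lists 127.0.0.1. A domain that answers every query (one plausible way for a wrong zone name to block all mail) would fail the second half of that check. The function names are hypothetical, and the lookup is passed in as a parameter so the logic can be exercised without touching the network.

```python
import ipaddress
import socket
from typing import Callable

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.strip(".")

def dns_is_listed(ip: str, zone: str) -> bool:
    """Treat any A record as a listing.

    Simplification: every lookup failure, including a timeout,
    is counted as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False

def zone_looks_sane(
    zone: str,
    is_listed: Callable[[str, str], bool] = dns_is_listed,
) -> bool:
    """True if 127.0.0.2 is listed and 127.0.0.1 is not."""
    return is_listed("127.0.0.2", zone) and not is_listed("127.0.0.1", zone)
```

Calling `zone_looks_sane("dnsbl.sorbs.net")` would perform two live DNS lookups. A zone that lists everything, or nothing at all, returns False and should not be wired into a mail server.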

SORBS has leveled the following criticisms, presumably as justification for the results published on DNSBL.com. Below is an overview of the points raised, with my response to each:

SORBS claims that the DNSBL.com email feed data is US-centric. This is true. The domains involved in these hamtraps and spamtraps are "dot com" domains, and have always been hosted in the US. If this means that SORBS is inaccurate as a result, it suggests that SORBS is Australia-centric, and likely will not work as well for those in other countries.
SORBS claims that a false positive as defined on DNSBL.com is not what everyone calls a false positive. This is true. I consider a false positive to be a requested message that was blocked. Others have different definitions. I believe the definition used on DNSBL.com to be accurate. I further believe that the most common definition of a false positive as used by regular end users or system administrators is most likely to align with my own.
SORBS is unable to verify false positive hits, as DNSBL.com does not provide IP addresses correlating to false positive hits. This is true. If data were provided to any blocklist operator regarding false positives, this would enable the DNSBL to whitewash over the issues by removing the IP addresses reported (and no others). This is similar to why blocklist groups do not provide spamtrap information – they do not want their spamtraps “compromised,” which would allow a bad sender to simply stop sending to spamtraps, but continue spamming elsewhere. Therefore, this information is not provided to any blocklist. (Other list operators have been more understanding.)
SORBS claims that the zone “dnsbl.sorbs.net” being queried by DNSBL.com is not the zone used by most users or recommended by SORBS as the main or default zone. This is untrue. It has or had clearly been positioned as the default zone or default recommended configuration choice, and remains the zone first listed, positioned as the “aggregate zone” as of July 20, 2007.
SORBS claims that Spamhaus volunteers have (or had) access to the SORBS database and have entered listings in the past to drive significant false positive issues. I am not associated with either SORBS or Spamhaus so I can't speak to this accusation.
SORBS claims that the methodology of checking mail against DNSBLs within 15 minutes of receipt is inaccurate. This is untrue. Anyone who uses a DNSBL is enabling their mail server or spam filter to check the mail against the DNSBL within seconds to minutes of receipt. If, as SORBS states, their DNSBL distribution model is such that it suffers from this methodology, then it suggests that it may be slow to respond to real spam trends. (10/29/2007 update: At a recent conference, over a beer with a colleague who builds tools to block spam for a living, I was gently chided over this bit of methodology. I was told that I was letting mail get far too old. 15 minutes is a hundred years as far as spam vector measurement is concerned; the vendor in question uses a 60-second interval at maximum. By this logic, I was being too forgiving as far as slowly updating anti-spam blocklists were concerned. This is further at odds with the criticism from SORBS.)
SORBS has picked a specific sender as the source for the SORBS false positive rates I report, saying that this sender is a "habitual source of spam." I have no financial interest or any other connection to the sender in question, except that I ordered pillows from them in December, 2006, and was happy with the product and service they provided. As a result, I signed up to receive mail from them, and happily do so. If I used SORBS to reject mail, that mail would not reach me. Additionally, this sender is far from the only source of false positives I found when utilizing the SORBS blocklist. (11/09/2007 update: The specific sender is/was Overstock.com. SORBS categorizes Overstock as a spammer. Matthew Sullivan (now known as Michelle Sullivan), in fact, indicated that "1000's of people who receive unsolicited commercial/bulk email from them." There are two additional problems with his characterizations here. First, Overstock.com is not listed on ANY OTHER of the approximately 47 blocklists I check, except FIVETEN (which lists many hundreds of potentially legitimate senders, and therefore, is not very useful as a second opinion here.) It's not on any of the lists that commonly do list supposedly-legitimate senders who may have run afoul of spamtraps. Second, the last mail I had received from Overstock.com was on May 25, 2007. This is significantly before the July 9th cutoff of my data, and measured false positives were on the rise even with no further mail from Overstock.com in the data set. Incidentally, I have no idea why I've received no mail since. I didn't unsubscribe.)

Additionally, SORBS has made numerous statements questioning the accuracy of data published here, and characterizing this project as something other than honest and transparent. Here's how it works: I have a feed of mail, and I check all mail received for DNSBL hits. I give internet users a live, rolling snapshot of how various lists intersect with my mail streams. That's all there is to it. I leave it to you, the reader, to decide if I've been honest and clear at every step of this process, and as always, I welcome your feedback.
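
To make the "rolling snapshot" idea concrete, here is a minimal Python sketch of how hits from one mail stream could be grouped week by week. It is an assumption about how such a tally might be done, not a description of the tooling behind this site. The function name, the use of ISO weeks, and the sample feed are all hypothetical.

```python
from collections import defaultdict
from datetime import date

def weekly_hit_rates(messages):
    """messages: iterable of (received_date, listed) pairs for one mail stream.

    Returns {(iso_year, iso_week): percent of that week's messages listed}.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for received, listed in messages:
        iso = received.isocalendar()
        week = (iso[0], iso[1])   # ISO year and ISO week number
        totals[week] += 1
        if listed:
            hits[week] += 1
    return {week: 100.0 * hits[week] / totals[week] for week in sorted(totals)}

# Hypothetical spamtrap feed: (date received, was the source IP listed?)
feed = [
    (date(2007, 7, 2), True),
    (date(2007, 7, 3), False),
    (date(2007, 7, 10), False),
    (date(2007, 7, 11), False),
    (date(2007, 7, 12), True),
    (date(2007, 7, 13), False),
]
rates = weekly_hit_rates(feed)
```

Run over a spamtrap stream, the per-week percentages correspond to an effectiveness rate. Run over a stream of solicited mail, they correspond to a false positive rate.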

(11/18/2007 Update: Added the phrase "that if your users sign up for the same kind of mail that I did" above to clarify false positive comments.)