Blocklist Resource: dnsbl review

Showing posts with label dnsbl review. Show all posts

PSBL: Easy On, Easy Off

The Passive Spam Block List, or PSBL (psbl.surriel.com) is a spamtrap-driven anti-spam blocklist that has been around since at least June, 2003. Created by Rik van Riel, who explains on the PSBL website that “the idea is that 99% of the hosts that send me spam never send me legitimate email, but that people whose mail server was used by spammers should still be able to send me email."

The passive nature of the list means that there's no probing or poking of remote servers on the internet (which tends to make ISPs very angry and was a significant issue back in the days of testing for open relays). It also means that there is no debate or argument with listees. As the PSBL website states, “Want to remove your mail server from PSBL? Go ahead.” No need for lawsuit threats, arguments over why listing is denied, or anything of the sort. Anyone can remove any entry for any reason.

Sounds scary, doesn't it? In theory, bad guys could game the system, and rob PSBL of its ability to stop spam. Thankfully, the data shows that this isn't something to worry about. PSBL is a pretty neat tool that can help system administrators filter or reject spam in a way that makes it very easy to prevent false positives. And even though it doesn't take a line as hard as Spamhaus or Spamcop, it manages to block some spam that they do not.

Success Rates

PSBL's success rate seems to greatly vary from week to week. Over the past ninety days, its overall effective rate is 41.4% against the spam hitting my spamtraps. Over the past thirty days, it has been 36.5% effective against spam.

False Positives
False positives are often non-zero, but generally very low. For the past eleven weeks, consistently under 1%. I suspect that this is due to the “easy on, easy off” removal policy-- If anyone trying to send you mail receives a bounce message back from you referring to the PSBL website, it's very easy for them to have their sending IP address removed from the list.

Additive Numbers
Even though PSBL catches a lower amount of spam (on its own) than some other more well-known blocklists, it manages to catch some spam that those other lists do not. To determine this, I took the last thirty days worth of results, and looked for intersection and overlap between PSBL and other blocklists.

What I found is that about 9% of successful PSBL hits against spam stopped spam from IP addresses not found on Spamhaus ZEN. When compared against Spamcop, the numbers were even higher -- about 13% of successful PSBL hits stopped spam from IP addresses not listed on Spamcop.

This suggests to me that PSBL would be an excellent blocklist to configure second or third in your mail server configuration. That 9% of IP addresses not found on both Spamhaus and PSBL won't lead to a straight 9% boost in spam filtering effectiveness, due to lists being different sizes. But, if your data is like mine, you're likely to receive a boost of 3% or more.

Conclusion: I recommend PSBL. It helps to block spam that some other lists could miss, and it has friendly anti-false positive policies that make any revealed blocking issues easy to resolve.

The usual caveats applies here: This data illustrates how my own mail streams intersect with PSBL. Your mileage may vary, and I strongly recommend that you test and review results against your own mail streams.

Spamhaus ZEN: The DNSBL Resource Review

Spamhaus ZEN is a composite blocking list run by the Spamhaus Project. This UK-based organization was created in 1998 by Steve Linford, and is maintained by a group of employees spread across the globe.

APEWS: Doing the Math

I'm guilty. I admit it. I've called APEWS listings "random," which isn't quite right. Arbitrary would be a better word for it. Not to mention broad, and questionable.

APEWS, the "anonymous" blocking list meant to be an early warning system for spam, generates a lot of worry from administrators and end users who find themselves listed by way of plugging their IP address into an online lookup tools like DNSStuff. Though it doesn't result in much (if any) of anyone's mail being rejected, as it's not widely used, some people still think they're being labeled a spammer, and don't know what to do about it.

They've usually done nothing to warrant the listing; the simple fact of the matter is that they happen to have an IP address on the internet, and there's more than a 1/3 chance that this IP address will be on the APEWS blocklist.

As I've indicated previously, APEWS has IP address entries accounting for about 42% of the raw numerical depth of V4 IP address space, though I'm not excluding non-routable space and overlap between some listings. When one takes those factors into consideration, APEWS seems to list somewhere around 38% of currently routable IP4 network space.

Time for an experiment. What if I take a large chunk of address space, say, 42%, and list it all? I've got detailed records of spam and ham, and it's easy to bump my corpus up against an imaginary blocklist I've just made up right here on the back of this napkin.

Here's what happens when I do that: Over the past ten days or so, my 42% listing of IP space would've captured 62.8% of spam, but also incorrectly captured non-spam 31.5% of the time.

When I skinny my imaginary blocklist down to 38% of IP4 space, I get a 62.21% hit rate on spam and 31.15% false positive rate against non-spam. (In other words, just about the same numbers.)

To me, this is evidence that APEWS seems to be blocking some spam based on the "stopped clock is right twice a day" principle. List a large chunk of IP address space, and you're going to catch a significant amount spam, though inaccurately.

It further suggests to me that if I added a few rules to start my focus points with a bit of accuracy, I could probably tune this to get a hit rate close to what I see from APEWS, with its 73% hit rate against spam, and 26% false positive rate against non-spam (21 day average ending on 9/2/2007).

The conclusion I draw from this exercise is that only the barest thought has been given to the processes by which APEWS decides which IP addresses to list and for what reason. If I can get more than halfway there with a couple hours of sloppy bar napkin math, then perhaps they haven't thought it through too deeply.

What to do if you are listed on APEWS

If you are listed on the APEWS blocking list, as confirmed by checking their website, here's how I would recommend that you handle the situation. (Who the heck am I?)

Note: This isn’t guidance on how to avoid a blocklisting or sidestep anti-spam groups. If you have a spam issue, fix it. Don't spam, ever, for any reason. This is information is regarding how to address an issue with a list that is very aggressive at listing non-abusing IP addresses and networks, with no published, attainable path to resolution.

Don't despair. Be calm.
Do NOT post to a USENET newsgroup or to Google Groups, asking for assistance. Any replies you get will be from people who do NOT work for or with APEWS, and most of those replies will be unhelpful.
I can't stress this point strongly enough: Posting requests for help on the Internet will not get you any assistance. The APEWS FAQ directs people to post questions, but the only thing that happens is that discussion groups are overrun with questions, and the only people who answer those questions are (a) not involved with APEWS and (b) rarely polite or helpful.
APEWS ability to be used as a spam filter has been greatly reduced and restricted due to perceived malfeasance on the part of the APEWS maintainer(s). UCEPROTECT and SORBS, blocklist groups who used to publish the APEWS data, are no longer doing so as of August 13, 2007. This means that the two main channels available to administrators to use APEWS as a spam filter have been revoked. This means that if your mail bounced due to an APEWS listing before or on August 13, 2007, you might want to try to send your mail again – it would likely get through, as the list is even LESS widely used than it was up until August 13, 2007.
APEWS is very aggressive (meaning its use as a spam filter drives a lot of false positive blocking) and as measured by me on August 11, 2007, lists approximately 42% of the Internet. (By “the Internet” I mean IP4 address space.) In other words, they list nearly half the Earth, suggesting that anybody who actually wants to receive mail probably cannot use APEWS as a spam filter. This strongly suggests that very few people are going to block your mail because of the APEWS listing.
Anyone using APEWS as a spam filter is going against the advice of a multitude of other anti-spam advocates and email professionals. See my news and commentary roundup for more information and links to feedback from others in the anti-spam arena.
Since APEWS is not widely used, your next step should be a review of your bounce data. Have you received any bounces that reference an APEWS block?
If not, don’t worry about it. You just determined that you’re not having blocking issues that you can trace back to APEWS. It’s annoying that you’re listed on the website, but there’s little easy recourse available to you to address that.
If yes, you have received a bounce message that references APEWS, contact the site that blocked your mail. Call them on the phone or email them from a different email account (Hotmail, Gmail, etc.) Show them that APEWS is problematic and not widely used. Explain to them that you do not spam, and that APEWS has listed you even though you do not spam. Provide them links to this page on DNSBL.com with more information about APEWS.

If you want to learn more about APEWS, I've collected everything I know about this “anonymous” blocking list here on DNSBL.com.

I hope you find this information helpful. Please feel free to contact me with your comments or feedback. But, please note that I'm unable to consult with you regarding your specific situation -- I've already got a full time day job, and I'm not looking for consulting clients.

APEWS News and Commentary Roundup

APEWS, the Anonymous Postmasters Early Warning System, is an “anonymous” blocking list that claims to run in the style of SPEWS. That is to say, its goal is to be an “early warning system,” catching and stopping spam before other lists or filters have the opportunity to do so.

The APEWS blocking list was first announced by way of an anonymous posting to the newsgroup news.admin.net-abuse.blocklisting on January 12, 2007. Though this newsgroup post originated from the IP address 149.9.0.57 (registered to US provider PSI/Cogent), the list is widely believed to be run from Germany.

If you are listed on APEWS and wondering what to do, visit this page for my suggestions.

Accuracy

A quick review of the past thirteen weeks of my own stats.dnsbl.com data shows that the list has been ramping up in aggressiveness the entire time that I've been tracking it. What was barely a 20% effectiveness rate against spam eleven weeks ago is up to 80+ percent on a week-by-week basis. However, false positives have risen similarly.

The rising spam match rate is based on what I would characterize as the “stopped clock is right twice a day” principle. List enough IP addresses, and eventually you're going to stop some spam. The side effect is that you're going to block legitimate mail (and lots of it) at the same time. Against my personal hamtrap data, APEWS blocks two out of ten of every legitimate piece of newsletter or list mail that I've signed up for.

I'm not kidding about "listing enough IP addresses," either. As of today (August 11, 2007), APEWS lists just about 1.8 billion IP addresses - by the raw numbers alone, this is 42% of the entire IP4 networking space. Much of the IP space listed isn't even routable; suggesting little attention is being paid to what IP addresses are actually able to transmit traffic (email or otherwise). Also, APEWS has been growing at a very fast rate. From July 20th through today, they have added an additional 7.5 million IP addresses. These are data points that, in my opinion, suggest that the list is bloated, questionably targeted, and inaccurate.

09/30/2007 update: Click here to read about how I can similarly block around 60% of spam just by arbitrarily listing 42% of the internet.

Based on this data, and the recommendations of other trusted blocklist operators and anti-abuse folks, I personally would not use APEWS to filter incoming mail.

Controversy and Commentary

The blocklist is considered controversial by many other blocklist operators, ISP abuse staff, and anti-spam advocates.

Matthew Sullivan, SORBS maintainer, indicates that as of August 9, 2007, SORBS will no longer be publishing the APEWS blocklist zones via DNS.
Claus V. Wolfhausen, maintainer of UCEPROTECT, another German-run blocklist, indicates that UCEPROTECT will no longer publish the APEWS blocklist zones. (Previously: Claus warned that unless APEWS were to make immediate, significant changes to its policies, UCEPROTECT will no longer publish the APEWS blocklist zones.)
Suresh Ramasubramanian, respected anti-abuse manager for large mailbox provider Outblaze, categorizes APEWS as “meant to be used by fools.”
Steve Linford, Spamhaus maintainer, has suggested numerous times on newsgroups and elsewhere that APEWS is poorly run and is not widely used.
Kevin Liston and others from the Internet Storm Center have indicated that APEWS is using the ISC "top source" data to support blocklist entries, in violation of the data's license, and against the wishes of those who provide this data. ISC says that the data "is not supposed to be used as a blocklist as it is bound to include false positives" and that "APEWS may be a useful 'anti-spam" list if you do not mind losing a lot of valid e-mail as well."

Misplaced Newsgroup Discussion

If you read either of the two popular anti-spam newsgroups (news.admin.net-abuse.blocklisting and news.admin.net-abuse.email), you already know that both groups are often overrun with requests (example) from people who find that they are listed by APEWS. I find over 2,000 messages on these groups relating to APEWS remove requests, which is a high number considering that the blocklist is less than a year old. The blocklist group is run “anonymously.” Question 41 of the APEWS FAQ asks how one contacts APEWS. The answer includes the following: One does not. APEWS does not accept removal request by email, fax, voicemail or letters.” [...] “General blocklist related issues can be discussed in the public forums mentioned above. The newsgroups news.admin.net-abuse.blocklisting (NANABL) and news.admin.net-abuse.email (NANAE) are good choices.

This is likely why many administrators post to these newsgroups, asking for assistance, when finding their IP addresses are listed. The FAQ does warn that “abusing these newsgroups & lists by posting removal request you will make a fool of yourself,” but that doesn't seem to be a deterrent. I would theorize that this is because a lot of the people on the wrong side of listings do not understand why they are listed and do not now how to “fix” whatever issue led to the listing, as the listings are often broad and vague.

ISP Perspective

Vincent Schönau, an ISP abuse adminstrator, has related his APEWS experiences to me in email, and given me permission to share them here.

Other blacklists have employed the 'escalations' strategy in the past, but APEWS has taken it to a whole new level; a few spams from a providers ip ranges will cause all or most of the providers ip space to be listed in APEWS, with comments such as 'unprofessional / negligent provider'. What this means is that if your provider is a noticeable source of e-mail, sooner or later, it's going to get listed. Several providers of 'blacklist checks','blacklist comparisons', 'e-mail reputation checks' and include APEWS data. Apparently this is causing systems administrators who are desperate to reduce the amount of spam they're receiving to think that using it might work - perhaps because not all of those sources include the data on false positives for the blacklists. In practice, this means that several times a week, I'm spending time explaining to my users how they should work around the e-mail delivery-problems they're seeing which may or may not be related to APEWS. I could be spending this time taking action against compromised hosts in our network instead. This hurts providers who do take action against the abuse from their network more than providers who didn't care in the first place.

Others have related similar stories to me, of how long after spammers were booted, that a listing still persists. In one instance, a provider had a compromised machine, which was identified and disconnected within two hours of sending spam. Three days later APEWS listed it, and six weeks later, the listing persists, even though the issue is long since addressed.

If you are listed on APEWS and wondering what to do, visit this page for my suggestions.

SORBS: Accuracy Rates and False Positives

The blocking list SORBS (aka the “Spam and Open Relay Blocking System”) was created in 2002 by Australian Matthew Sullivan. SORBS publishes a main “aggregate zone” (dnsbl.sorbs.net) containing listings meeting a multitude of criteria beyond open relaying mail services. SORBS also publishes multiple other zones meeting various criteria.

As related previously, SORBS appears to be undergoing changes. Some of these changes appear to relate to the fact that the SORBS maintainer has repeatedly taken issue with the methodology used by DNSBL.com to measure accuracy rates and false positive rates.

SORBS has indicated that they have the ability to feed false or different data in response to queries from DNSBL.com. As such, it's unclear if recent query results are indicative of results seen by other users. Because of concerns that SORBS may be attempting to sway the data reported, it's important to share current data and information, so that system administrators can make an educated determination as to whether or not it would be wise to use this DNSBL.

Historical Information

I've been tracking data on the main SORBS zone, dnsbl.sorbs.net, since March, 2007. Here's what I've found.

For most of the past fifteen weeks, the DNSBL had an effectiveness rate varying between fifty percent and fifty six percent, week over week. This means that SORBS correctly blocked a piece of spam in my spamtrap about five to six times out of ten.
For many weeks, I believe SORBS clearly suffered from significant false positive issues. As measured by my own calculations (see here and here for more info), the false positive rate is in the 7.9% - 11.1% range. This means that if your users sign up for the same kind of mail that I did, that for every one hundred pieces of solicited mail your users signed up for and expected to receive, SORBS is likely to block seven to twelve of them.

Recent Data Changes

On July 9, 2007, changes were made to SORBS. As you can see from the chart above, around this time (near the start of week 12), the net result is that the effectiveness rate and false positive rates have both significantly declined.
Since July 9, 2007, I have not noted any additional false positive from the main SORBS zone. Because of indication from SORBS that they are able to feed false data, it is unclear if the results I am seeing are accurate.
Similarly, the effectiveness rate of the main SORBS zone seems to have greatly declined as well. Since July 9, 2007, it is hovering in the 18% range.

There are two possible conclusions to make here:

SORBS is somehow able to feed different blocklist data to DNSBL.com than to others. If so, then the historical data I have summarized above is likely to be the most accurate view of SORBS. Or,
SORBS has gutted its lists and the poor effectiveness rates I'm now seeing are reflective of how it would likely work for others.

It's hard to say which scenario is the more accurate one, and what future testing will reveal. I'll certainly continue to collect data, but right now, there's an open question of SORBS' effectiveness and false positives.

As of Thursday, July 19, 2007, SORBS changed the default zone mentioned in configuration guidance pages from dnsbl.sorbs.net to a domain not owned by SORBS. As a result, if any SORBS user copies and pastes a configuration snippet from one of the SORBS configuration pages verbatim, the result is that 100% of a site's inbound email will be blocked. My recommendation is to proceed with caution – if you are not sure what you're doing with DNSBL use and mail server configuration, a misstep here will have significant consequences.

SORBS has leveled the following criticism, assumably as justification for for the results published on DNSBL.com. Below is an overview and response to the points raised:

SORBS claims that the DNSBL.com email feed data is US-centric. This is true. The domains involved in these hamtraps and spamtraps are "dot com " domains, and have always been hosted in the US. If this means that SORBS is inaccurate as a result, it suggests that SORBS is Australia-centric, and likely will not work as well for those in other countries.
SORBS claims that a false positive as defined on DNSBL.com is not what everyone calls a false positive. This is true. I consider a false positive to be a requested message that was blocked. Others have different definitions. I believe the definition used on DNSBL.com to be accurate. I further believe that the most common definition of a false positive as used by regular end users or system administrators is most likely to align with my own.
SORBS is unable to verify false positive hits, as DNSBL.com does not provide IP addresses correlating to false positive hits. This is true. If data were provided to any blocklist operator regarding false positives, this would enable the DNSBL to whitewash over the issues by removing the IP addresses reported (and no others). This is similar to why blocklist groups do not provide spamtrap information – they do not want their spamtraps “compromised,” which would allow a bad sender to simply stop sending to spamtraps, but continue spamming elsewhere. Therefore, this information is not provided to any blocklist. (Other list operators have been more understanding.)
SORBS claims that the zone “dnsbl.sorbs.net” being queried by DNSBL.com is not the zone used by most users or recommended by SORBS as the main or default zone. This is untrue. It has or had clearly been positioned as the default zone or default recommended configuration choice, and remains the zone first listed, positioned as the “aggregate zone” as of July 20, 2007.
SORBS claims that Spamhaus volunteers have (or had) access to the SORBS database and have entered listings in the past to drive significant false positive issues. I am not associated with either SORBS or Spamhaus so I can't speak to this accusation.
SORBS claims that the methodology of checking mail against DNSBLs within 15 minutes of receipt is inaccurate. This is untrue. Anyone who uses a DNSBL is enabling their mail server or spam filter to check the mail against the DNSBL within seconds to minutes of receipt. If, as SORBS states, their DNSBL distribution model is such that it suffers from this methodology, then it suggests that it may be slow to respond to real spam trends. (10/29/2007 update: At a recent conference, over a beer with a colleague who builds tools to block spam for a living, I was gently chided over this bit of methodology. I was told that I was letting mail get far too old. 15 minutes is a hundred years as far as spam vector measurement is concerned; the vendor in question uses a 60 second interval at maximum. By this logic, I was being too forgiving as far as slowly updating anti-spam blocklists were concerned. This is further at odds with the criticism from SORBS.)
SORBS has picked a specific sender as the source for the SORBS false positive rates I report, saying that this sender is a "habitual source of spam." I have no financial interest or any other connection to the sender in question, except that I ordered pillows from them in December, 2006, and was happy with the product and service they provided. As a result, I signed up to receive mail from them, and happily do so. If I used SORBS to reject mail, that mail would not reach me. Additionally, this sender is far from the only source of false positives I found when utilizing the SORBS blocklist. (11/09/2007 update: The specific sender is/was Overstock.com. SORBS categorizes Overstock as a spammer. Matthew Sullivan (now known as Michelle Sullivan), in fact, indicated that "1000's of people who receive unsolicited commercial/bulk email from them." There are two additional problems with his characterizations here. First, Overstock.com is not listed on ANY OTHER of the approximately 47 blocklists I check, except FIVETEN (which lists many hundreds of potentially legitimate senders, and therefore, is not very useful as a second opinion here.) It's not on any of the lists that commonly do list supposedly-legitimate senders who may have run afoul of spamtraps. Second, the last mail I had received from Overstock.com was on May 25, 2007. This is significantly before the July 9th cutoff of my data, and measured false positives were on the rise even with no further mail from Overstock.com in the data set. Incidentally, I have no idea why I've received no mail since. I didn't unsubscribe.)

Additionally, SORBS has made numerous statements questioning the accuracy of data published here, and characterizing this project as something other than honest and transparent. Here's how it works: I have a feed of mail, and I check all mail received for DNSBL hits. I give internet users a live, rolling snapshot of how various lists intersect with my mail steams. That's all there is to it. I leave it to you, the reader, to decide if I've been honest and clear at every step of this process, and as always, I welcome your feedback.

(11/18/2007 Update: Added the phrase "that if your users sign up for the same kind of mail that I did" above to clarify false positive comments.)

Spamcop BL: Take Another Look (It’s Accurate!)

If you know me, you know that in the past, I’ve made no secret of my disdain for the Spamcop DNSBL, aka the SCBL. I’ve worked in spam prevention, deliverability, and the email realm for a long time, in various capacities. I’ve created and run at least two blocking lists that you know about. Later, I helped to design and create a system that processed thousands of confirmed opt-in/double opt-in newsletter signups a day. Combine those two details together and that’s what led me to take issue with Spamcop. My employer’s COI/DOI signup servers kept getting listed by Spamcop, based on some really bad math to measure email volume thresholds and make a determination as to what to list.

I was trying to do the right thing. I was implementing what Spamcop (and other anti-spam groups) want: confirmed opt-in/double opt-in. Yet Spamcop was listing the servers and subsequent mailings regardless. It made me really frustrated, and I was very disappointed. See, it’s not really fighting spam. It’s just blocking mail you don’t like, or don’t care about. While perfectly allowed, I am of the opinion that it’s lame to do so under the banner of “fighting the good fight” to stop spam. I’ve shared my thoughts on this topic in just about every available forum—websites, blogs, discussion lists. I know I’ve personally guided many sysadmins away from using the SCBL in the past, because it was easily, demonstrably, listing things that were obviously not spam.

In February 2007, I found that Microsoft was using the SCBL to filter/reject inbound corporate email. (Note that I said corporate email—mail sent to users at micrsoft.com, not users of MSN or Hotmail. I don’t know whether or not SCBL data is used in MSN Hotmail delivery determination, but from what I’ve observed, it doesn’t seem to be.) This started me off on another rant on how ill-advised I felt this was, based on my prior experiences with Spamcop. Some kindly folks (and some less kindly) suggested that I needed to revisit my opinion of the Spamcop blocking list, because things have changed.

After a lot of measuring and discussion, I’m here to tell you: Spamcop’s DNSBL has changed, and for the better. It works very well nowadays, as personally measured by me. The open question on Spamcop was what drove me to dive into my massive DNSBL tracking project. I started that back on March 10^th. Ever since then I’ve been compiling data on Spamcop blocking list matches against both spam and non-spam. Here’s what I see:

DNSBL	Spam hits	Acc %	Ham hits	Failure Rate
Spamcop SCBL	156194	49.37%	0	0.00%
Spamhaus ZEN	255521	80.77%	5	0.10%
Spamcop+ZEN	267795	84.65%	5	0.10%

Range:	~ 74 days
Total Spam	316348
Total Ham	4999

As you can see, Spamcop helps you attack nearly 50% of spam received, while affecting no legitimate senders. Very few lists do better. Spamhaus ZEN (which combines multiple lists) does better, but will occasionally have a false positive, based on some reputational issue perceived with a given sender.

My recommendation: Spamcop’s blocking list is safe to use, and will effectively help you reduce the amount of spam you have to deal with. Where I find it particularly useful is as an addition to Spamhaus ZEN: If you block mail from entities on either list, you get a 3.8% percent boost in effectiveness. Meaning, just under four percent of my spam hits are found on the Spamcop list, but not on Spamhaus.

For historical reasons only, here are links to my previous articles on Spamcop:

Spamcop Roundup http://www.dnsbl.com/2007/03/spamcop-roundup.html Spamcop BL: A blocklist with a hair trigger http://www.dnsbl.com/2007/02/spamcop-bl-list-with-hair-trigger.html Microsoft using Spamcop BL http://www.spamresource.com/2007/02/microsoft-using-spamcop-and-spamhaus.html My Problems with Spamcop http://www.spamresource.com/2003/03/problems-with-spamcop.html

Which DNSBLs work well?

This is a question I get quite often and it’s a tough one to answer. I don’t really bother with running my own mail system any more, as I’m tired of the headache and happy to leave the server-level spam prevention to somebody else.

And I'm tired of taking other peoples' word for it that a certain blocklist works well or doesn't work well -- I've been burned a number of times by people listing stuff on a blocklist outside of a list's defined charter. It's very frustrating. And lots of people publish stats on how much mail they block with a given list, which is an incomplete measure of whether or not a list is any good. Think about it. If you block all mail, you're going to block all spam. But you're going to block all the rest of your inbound mail, too. And when you block mail with a DNSBL, you don't always have an easy way to tell if that mail was actually wanted or not.

So, I decided to tackle it a bit differently than other folks have. See, I have my own very large spamtrap, and the ability to compare lots of data on the fly.

For this project, I've created two feeds. One is a spam feed, composed of mail received by my many spamtrap addresses, with lots of questionable mail and obvious non-spam weeded out. I then created a non-spam feed. In this “hamtrap” I am directed solicited mail that I signed up for from over 400 senders, big and small. Now, I just have to sit back, watch the mail roll in, and watch the data roll up.

For the past week or so, I’ve been checking every piece of mail received at either the spamtrap or hamtrap against a bunch of different blocklists. I wrote software to ensure that the message is checked within a few minutes of receipt, a necessary step to gather accurate blocklist “hit” data.

After that first week, here’s what I’ve found. It might be obvious to you, or it might not: Spamhaus is a very accurate blocklist, and some others...aren't. Spamhaus’s “ZEN” blocklist correctly tagged about two-thirds of my spam, and tagged no desired mail incorrectly. Fairly impressive, especially when compared to some other blocklists. SORBS correctly tagged 55% of my spam mail, but got it wrong on the non-spam side of things ten percent of the time. If you think throwing away ten percent of the mail you want is troublesome, how about rejecting a third of desired mail? That’s what happens if you use the Fiveten blocklist. It correctly would block 58% of my spam during the test period, but with a false positive rate of 34%, that would make it unacceptable blocklist to use in any corporate environment where you actually want to receive mail your users asked to receive.

One fairly surprising revelation is that Spamcop’s blocklist is nowhere as bad as I had previously believed it to be. I’ve complained periodically here about how Spamcop’s math is often wrong, how it too often lists confirmed opt-in senders, how it is too aggressive against wanted mail, but...my data (so far) shows a complete lack of false positives. This is a nice change, and it makes me very happy to see. Assuming this trend keeps up, I think you'll see me rewriting and putting disclaimers in front of some of my previous rants on that topic.

Spamhaus ZEN: Recommended

Look for a longer article from me in the near future on Spamhaus; I'm collecting a ton of data against a large spam corpus and hope to summarize and publish my findings within the next month or so.

Until then, feel free to bop on over to Spam Resource, where I talk about my experience using the Spamhaus ZEN list to tag and filter inbound mail to our abuse desk. I've been quite pleased with the results.

Also of note is that Microsoft is using both Spamcop and Spamhaus to reject mail to their corporate users. (They're NOT using it on MSN Hotmail.)

Update: Find my full review of Spamhaus ZEN here on DNSBL Resource.

DCC: Spam filter?

The Distributed Checksum Clearinghouse (DCC), created by Vernon Schryver, is a very powerful tool to help system administrators identify and block bulk mail. The project's website suggests a strong correlation between "bulk" and "spam," but as I do a bit more research, I don't think it's always that simple.

There's a common misconception in the spam filtering world (and the sending world) -- people think DCC is a spam blocking list. It's not, though. It's a tool to help users block bulk mail, not spam mail. That's an important distinction.

Think about it. There are a lot of types of bulk mail you might have signed up for and might want, things like newsletters you actually subscribed to, messages from companies you've done business with and actually want to hear from, or news, weather and traffic alerts you might be waiting for. (I don't need an email message to warn me that it's snowing outside, but I know that lots of people sign up for these.)

DCC tells you whether or not the mail attempting to be delivered was sent to lots of people besides you. Sure, spam is sent to lots of people all at once, but so is a bunch of solicited mail. What defines spam is whether or not you signed up to receive it. If you signed up to receive it, whether or not other people are getting it too has no bearing on the fact that you asked for it.

If a filter like DCC rejects a piece of mail you actually solicited and wished to receive, I would consider that a "false positive." To help prevent false positives, proper DCC usage dictates that you whitelist, ahead of time, all the sources of legitimate list or bulk mail you wish to receive. They include this sample file to get started, and they recommend this whitelist of example small messages that are most likely to be caught up in the filtering, even if solicited.

As Vernon Schryver himself said on the DCC mailing list recently, false positives "speak to a misuse or misunderstanding of [DCC]." He says that in a sense, there's no such thing as a DCC false positive. My interpretation of his comments is that he means that it's up to users of DCC to know what they're getting in to. DCC blocks mail sent to multiple recipients, and it's up to you to whitelist any mail sources you want to receive mail from.

DCC is a very powerful tool. That's both a plus and a minus. If you know what you're doing, comfortable working without a safety net, manually compiling lists of sites you want to receive any sort of bulk or list mail from, then maybe it can work for you to help reduce spam.

But, if you're not clear on the difference between bulk and spam, are not clear on what sites are sending you bulk or list mail that you or your users will want, then it's not going to work the way you think, and it's going to reject mail that you or your users asked for.

Internet Service Providers (ISPs), when deciding whether or not to accept a sender's mail, do measure whether or not your message is being sent to multiple people. It's not the only thing they look at, though. The smarter ISPs tie in a reputation measurement to that process. Meaning, is this mail coming from a good sender, or a bad sender? Does this sender generate spam complaints? Does this sender generate an above average percentage of bounces? Wrap that all up together, and an ISP has good info available to them to decide what mail to accept. Don't measure any of those things, and you're left with an incomplete view -- no easy way to tell the good mail from the bad. It's up to you to know about and whitelist the good senders ahead of time. If you don't, you're going to reject mail from them, presumably mail that you or your users wanted to receive.

Spamcop BL: A blacklist with a hair trigger

The Spamcop Blocking List (SCBL) is a DNSBL populated with data obtained from spamtrap hits and spam reports from users of the popular Spamcop spam reporting service. The Spamcop spam reporting service was originally created by Julian Haight. It was later purchased by Ironport Systems. Ironport has since been purchased by networking and communications technology company Cisco. (In spite of this transition to corporate ownership, the Spamcop site's front page contains a prominent legal defense fund link, and contains further information on the fund in the Spamcop FAQ.)

Unlike the more privately-run CBL, which is designed to minimize the impact on legitimate mail, the SCBL regularly blocks sources of mail that some feel are legitimate. It has been described as having a "hair trigger" by respected anti-spam and internet guru John Levine, and I related some of the issues I've had with Spamcop from 2003 over here on spamresource.com. In fact, back around that time, the SCBL information page said this regarding using the list: "This blocking list is somewhat experimental and should not be used in a production environment where legitimate email must be delivered." As I look at the same page today, in February, 2006, I can see that guidance has since been modified somewhat. Spamcop now recommends "use of the SCBL in concert with an actively maintained whitelist of wanted email senders. SpamCop encourages SCBL users to tag and divert email, rather than block it outright." Both then and now, they go on to add, "The SCBL is aggressive and often errs on the side of blocking mail."

Translated: "Don't block mail with this blocking list, it will block mail you want."

Like ISP feedback loops, the spam complaints lodged by Spamcop users are sometimes found to be erroneous. That's not to say that where there's smoke, there's never a fire. But just like with feedback loop reports, significant spam issues generate far more reports than than the day-to-day noise of people lodging spam reports about email from a company they previously did business with, or otherwise had a potentially legitimate reason to be contacted by a given sender. (As an example, I noted my issues with confirmed opt-in/double opt-in systems being listed by Spamcop in 2003; I don't believe I'm the only one to ever have observed that kind of issue.) My experience with Spamcop has taught me that it's not always that good at drawing the line between blocking spam and blocking wanted mail.

Spamcop's probably really good at blocking spam-in-progress from infected servers spewing illegal spam. (Though, the CBL isn't too shabby at that, either.) The problem is, Spamcop will block mail in a number of edge cases, like if an email service provider is tasked with serving mail on behalf of some e-commerce or travel site. If you want to ensure that you're always going to receive your follow up emails from the department store you ordered that purse from, or the hotel reservation from a booking site that outsources their confirmation email, choosing to outright block mail from servers listed on the SCBL may not be your best choice.

CBL: Block those exploits!

The Composite Blocking List (CBL) is a DNSBL that helps you block mail from exploited computers. That includes abused open proxy servers, as well as virus and trojan-infected spam spewers, the primary vector for most of the illegal spam people are receiving nowadays. By some counts, there are millions of these computers in the world, and besides spam, they’re also responsible for denial-of-service attacks, virus distribution, phishing, etc.

As the CBL website indicates, the data behind the listings is sourced from very large spamtrap-receiving domains and various email infrastructures. Their intent is to list only IP addresses that exhibit characteristics specific to open proxies, viruses, stealth spamware applications loaded on a computer without the user’s knowledge, etc. They don’t knowingly attempt to block any sort of legitimate mail. And I would characterize “legitimate” very broadly here – legitimate senders like most email service providers (and their clients) should rarely, if ever find their mail blocked by a CBL listing.

Though, on occasion, it does happen. CBL doesn’t ever list good senders intentionally. The problem is that some computers share IP addresses with others, behind a NAT (network address translation) device or firewall. Your legitimate mail could be going out to the internet over an IP address shared with an infected, spam-spewing Windows desktop. It’s fairly rare, but when it does happen, CBL makes it easy for you to address those kinds of issues, by allowing you to remove any entry from the list. This allows you to again send mail to the site that was rejecting it due to the listing. Keep in mind that if they again later see bad traffic coming from that IP, it could get listed again. That means it’s important to figure out what on your network is infected or spewing, and fix it.

I recommend use of the CBL (or one of the other lists that includes the CBL data) to filter or reject inbound mail. It helps to block some of the worst types of illegal spam out there, and the risk of blocking legitimate mail is very low.

The CBL listing data is integrated into the Spamhaus XBL (and is therefore also part of Spamhaus ZEN). If you use either of these Spamhaus DNSBLs to tag, filter or reject inbound mail, then there’s no need to utilize the CBL as well – you’re already doing so.