Saturday, September 29, 2007

New: Blacklist Statistics Center

Please take a moment to check out the new DNSBL Resource Blacklist Statistics Center. Replacing the old DNSBL stats page, this new section of DNSBL Resource provides week-by-week graphs for twenty different blacklists. At a glance, one can easily see the accuracy rates and false positive rates for the past thirteen weeks for any blacklist in the system. This data will be refreshed weekly, automatically, and new blacklists can be added easily.

Please don't hesitate to drop me a line with your feedback on this new functionality.

In the future, look for a more detailed index page for the statistics center, one that will show information on mail-in-progress.

Additionally, work is under way to add “second stage” filtering and URIBL-style blacklists to the statistics center. Stay tuned!

Monday, September 24, 2007

Status of completewhois.com: IN FLUX

Update 9/30/2007: The website www.completewhois.com is operational again, but some links appear to be broken. My attempts to query their DNSBLs have all timed out. While CompleteWhois may be on the mend, it seems that it may be too soon to give the all clear.

Previous updates below:

This post in the newsgroup news.admin.net-abuse.email and my own testing both confirm that all DNSBLs running under completewhois.com are dead and no longer available.

If you were using any DNSBL under completewhois.com, I recommend that you immediately remove it from your configuration. Timeouts waiting for DNS replies will slow your inbound mail delivery. I have also heard mention of a wildcard DNS entry, which would result in any user of this now-dead list rejecting all of their inbound mail. I was not able to personally confirm the wildcard entry at the time of this posting.

This includes the zones invalidipwhois.dnsiplists.completewhois.com, hijacked.dnsiplists.completewhois.com, bogons.dnsiplists.completewhois.com, and combined-HIB.dnsiplists.completewhois.com.
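To check whether your own configuration or logs touch these zones, it helps to know what a DNSBL query looks like. By the usual DNSBL convention, the octets of the IPv4 address are reversed and the zone name is appended. The following is a minimal sketch of that convention; the function name dnsbl_query_name is my own invention for illustration, and the address used is a documentation-range example, not a real listing.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the hostname a DNSBL client would look up for an IPv4 address.

    Hypothetical helper for illustration: reverse the octets, then append
    the zone, per the usual DNSBL convention.
    """
    # Validate the input; raises ValueError if it is not a valid IPv4 address.
    addr = ipaddress.IPv4Address(ip)
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


if __name__ == "__main__":
    # 192.0.2.1 is a documentation-range address used purely as an example.
    print(dnsbl_query_name("192.0.2.1", "bogons.dnsiplists.completewhois.com"))
```

If lookups for names of this shape appear in your DNS or mail logs, something in your mail path is likely still querying the dead zones.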

Some SpamAssassin rules check completewhois.com DNSBLs. These queries are timing out, and are likely to be causing slower than expected mail delivery. The rules in question are: RCVD_IN_WHOIS_INVALID, RCVD_IN_WHOIS_BOGONS, RCVD_IN_WHOIS_HIJACKED, and RCVD_IN_WHOIS. At this time I would recommend disabling these rules.

Not all error messages explicitly refer to the full zone name. It's safe to assume that anything checking for "Bogons" is actually checking a completewhois.com DNSBL, and therefore would be returning a false positive response, or causing mail delays due to lookup timeouts. As an example, bounces with the error message "5.7.1 has been blocked by Bogons" are related to this issue, and indicate that a receiving site is continuing to use the completewhois.com lists, and is therefore unintentionally rejecting desired mail.

What happened to the list? Why did it go away? I'm not sure; this list is not run by a group that I have any contact with, and additional information is currently difficult to come by.

I'll be sure to update this page with more information as it becomes available.

Monday, September 03, 2007

APEWS: Doing the Math

I'm guilty. I admit it. I've called APEWS listings "random," which isn't quite right. Arbitrary would be a better word for it. Not to mention broad, and questionable.

APEWS, the "anonymous" blacklist meant to be an early warning system for spam, generates a lot of worry from administrators and end users who find themselves listed after plugging their IP address into an online lookup tool like DNSStuff. Though it doesn't result in much (if any) of anyone's mail being rejected, as it's not widely used, some people still think they're being labeled a spammer, and don't know what to do about it.

They've usually done nothing to warrant the listing; the simple fact of the matter is that they happen to have an IP address on the internet, and there's more than a 1/3 chance that this IP address will be on the APEWS blacklist.

As I've indicated previously, APEWS has IP address entries accounting for about 42% of the raw numerical depth of IPv4 address space, though that figure doesn't exclude non-routable space or overlap between some listings. When one takes those factors into consideration, APEWS seems to list somewhere around 38% of currently routable IPv4 network space.
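The difference between the raw figure and the overlap-adjusted figure can be sketched in a few lines. This is only an illustration of the arithmetic, not how the numbers above were actually produced: the function name ipv4_coverage and the sample CIDR blocks are assumptions, and the sketch handles overlap only, leaving out the exclusion of non-routable space.

```python
import ipaddress


def ipv4_coverage(cidrs):
    """Return (raw_fraction, merged_fraction) of IPv4 space for a list of CIDRs.

    raw_fraction counts every listing at face value, so overlapping entries
    are counted more than once. merged_fraction collapses overlapping and
    adjacent listings first. Hypothetical helper for illustration only.
    """
    networks = [ipaddress.ip_network(c) for c in cidrs]
    total = 2 ** 32  # size of the whole IPv4 address space
    raw = sum(n.num_addresses for n in networks)
    merged = sum(n.num_addresses for n in ipaddress.collapse_addresses(networks))
    return raw / total, merged / total


if __name__ == "__main__":
    # Made-up listings: the /16 sits entirely inside the /8.
    sample = ["10.0.0.0/8", "10.1.0.0/16", "192.0.2.0/24"]
    raw, merged = ipv4_coverage(sample)
    print(f"raw: {raw:.4%}  merged: {merged:.4%}")
```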

Time for an experiment. What if I take a large chunk of address space, say, 42%, and blacklist it all? I've got detailed records of spam and ham, and it's easy to bump my corpus up against an imaginary blacklist I've just made up right here on the back of this napkin.

Here's what happens when I do that: Over the past ten days or so, my 42% blacklisting of IP space would've captured 62.8% of spam, but also incorrectly captured non-spam 31.5% of the time.

When I skinny my imaginary blacklist down to 38% of IPv4 space, I get a 62.21% hit rate on spam and 31.15% false positive rate against non-spam. (In other words, just about the same numbers.)
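For anyone who wants to try the same kind of napkin test against their own mail, here is a rough sketch of how the two rates are computed. It assumes a corpus of (sending IP, is-spam) pairs and an imaginary blacklist expressed as CIDR blocks; the function name blacklist_rates and the sample data are made up for illustration and are not my actual corpus or tooling.

```python
import ipaddress


def blacklist_rates(messages, blocked_cidrs):
    """Return (spam_hit_rate, false_positive_rate) for an imaginary blacklist.

    messages: iterable of (ip_string, is_spam) pairs.
    blocked_cidrs: iterable of CIDR strings making up the blacklist.
    Hit rate is the share of spam that was listed; false positive rate is
    the share of non-spam that was listed. Hypothetical helper.
    """
    networks = [ipaddress.ip_network(c) for c in blocked_cidrs]
    spam_total = spam_hits = ham_total = ham_hits = 0
    for ip, is_spam in messages:
        addr = ipaddress.ip_address(ip)
        listed = any(addr in net for net in networks)
        if is_spam:
            spam_total += 1
            spam_hits += listed
        else:
            ham_total += 1
            ham_hits += listed
    hit_rate = spam_hits / spam_total if spam_total else 0.0
    fp_rate = ham_hits / ham_total if ham_total else 0.0
    return hit_rate, fp_rate


if __name__ == "__main__":
    # Made-up corpus using private and documentation-range addresses.
    corpus = [
        ("10.1.2.3", True),
        ("192.0.2.5", True),
        ("10.9.9.9", False),
        ("198.51.100.7", False),
        ("203.0.113.1", False),
    ]
    hit, fp = blacklist_rates(corpus, ["10.0.0.0/8"])
    print(f"spam hit rate: {hit:.1%}  false positive rate: {fp:.1%}")
```

A linear scan over the networks is fine for a napkin-sized list; a blacklist with many thousands of entries would want a smarter lookup structure.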

To me, this is evidence that APEWS seems to be blocking some spam based on the "stopped clock is right twice a day" principle. List a large chunk of IP address space, and you're going to catch a significant amount of spam, though inaccurately.

It further suggests to me that if I added a few rules to start my focus points with a bit of accuracy, I could probably tune this to get a hit rate close to what I see from APEWS, with its 73% hit rate against spam, and 26% false positive rate against non-spam (21 day average ending on 9/2/2007).

The conclusion I draw from this exercise is that only the barest thought has been given to the processes by which APEWS decides which IP addresses to list and for what reason. If I can get more than halfway there with a couple hours of sloppy bar napkin math, then perhaps they haven't thought it through too deeply.