Lashback’s UBL: Tracking Unsubscribe Abuse

Looking for a blacklist that helps you block mail from people who send mail to lists of people who have unsubscribed? Well, that’s what Lashback’s Unsubscribe Blacklist (UBL) does.

To find out how it works, I went straight to the source. Here’s how Brandon Phillips, Lashback’s CEO, described it to me during a recent email exchange: “LashBack continually places seed/probe email addresses on every suppression file we discover (300k+ so far). We do this so we can monitor to see when email addresses (consumer unsubscribe requests) are harvested from those files and start receiving email. This is one of many internal checks our systems perform in an effort to establish "Trust" with an advertiser/sender.

“We know that if those uniquely identifiable probes start receiving email it's because a specific unsubscribe request has been misused. The IP address of the sender who's sending to that probe then ends up on our UBL.”

Lashback’s in a unique position to help identify and track bad guys who misuse unsubscribe requests. Here’s how they can do that.

First, they provide a service called Unsub Monitor. If you’re a list manager, this helps to notify you if your unsubscribe processing system ever breaks. All modern list management systems are a combination of code and data, and like anything else, where code and data intersect, bugs and crashes can follow. It’s a great tool to help you, as a list manager, make sure you’re not breaking the law. (As the Yesmail/FTC case showed, people with the power to fine you are indeed watching to ensure that people, once unsubscribed, stay unsubscribed.)

In addition to that, they offer the Lashback Toolbar for end users. This is an Outlook plug-in that gives email users a standardized unsubscribe link for emails. It links back to a specific unsubscribe process for a specific list, and Lashback is able to track which ones work, and which ones don’t. This lets them tell when somebody continues to send email to a user even after that user has unsubscribed. Bad senders who abuse their unsubscribe list by sending mail to it can then end up on the UBL.

They obviously haven’t listed every bad actor in the whole world, but the fact that they are tracking 130,000 bad senders in the UBL currently (as of March 29) suggests that they mean business.

If you want to use their blacklist on your own system, the DNSBL zone is ubl.unsubscore.com. You can also download a text or XML file of the listed IPs for use in other spam filtering applications. See their UBL Resources page for more information.
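For readers who haven't queried a DNSBL by hand, here is a minimal Python sketch of the usual lookup convention: reverse the octets of the sending IPv4 address, append the zone, and ask DNS for an A record. The function names are my own, and treating any answer at all as "listed" is a simplifying assumption; some blacklists use specific return codes to mean different things, and the zone itself may not stay in service forever.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query: reversed IPv4 octets, then the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "ubl.unsubscore.com") -> bool:
    """Return True if the zone answers with an A record for this IP.

    Assumption: any A record counts as a listing. Real deployments
    should check the specific return codes a blacklist documents.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # "Not listed" (NXDOMAIN) and DNS failures both end up here.
        return False
```

Calling `dnsbl_query_name("192.0.2.10", "ubl.unsubscore.com")` yields the name `10.2.0.192.ubl.unsubscore.com`, which is what a mail server would look up before deciding whether to accept the connection.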

Which blacklists work well?

This is a question I get quite often and it’s a tough one to answer. I don’t really bother with running my own mail system any more, as I’m tired of the headache and happy to leave the server-level spam prevention to somebody else.

And I'm tired of taking other people's word for it that a certain blacklist works well or doesn't work well -- I've been burned a number of times by people listing stuff on a blacklist outside of a list's defined charter. It's very frustrating. And lots of people publish stats on how much mail they block with a given list, which is an incomplete measure of whether or not a list is any good. Think about it. If you block all mail, you're going to block all spam. But you're going to block all the rest of your inbound mail, too. And when you block mail with a DNSBL, you don't always have an easy way to tell if that mail was actually wanted or not.

So, I decided to tackle it a bit differently than other folks have. See, I have my own very large spamtrap, and the ability to compare lots of data on the fly.

For this project, I've created two feeds. One is a spam feed, composed of mail received by my many spamtrap addresses, with lots of questionable mail and obvious non-spam weeded out. I then created a non-spam feed. Into this “hamtrap” I have directed solicited mail that I signed up for from over 400 senders, big and small. Now, I just have to sit back, watch the mail roll in, and watch the data roll up.

For the past week or so, I’ve been checking every piece of mail received at either the spamtrap or hamtrap against a bunch of different blacklists. I wrote software to ensure that the message is checked within a few minutes of receipt, a necessary step to gather accurate blacklist “hit” data.

After that first week, here’s what I’ve found. It might be obvious to you, or it might not: Spamhaus is a very accurate blacklist, and some others...aren't. Spamhaus’s “ZEN” blacklist correctly tagged about two-thirds of my spam, and tagged no desired mail incorrectly. Fairly impressive, especially when compared to some other blacklists. SORBS correctly tagged 55% of my spam mail, but got it wrong on the non-spam side of things ten percent of the time. If you think throwing away ten percent of the mail you want is troublesome, how about rejecting a third of desired mail? That’s what happens if you use the Fiveten blacklist. It would have correctly blocked 58% of my spam during the test period, but its false positive rate of 34% makes it an unacceptable blacklist to use in any corporate environment where you actually want to receive mail your users asked to receive.

One fairly surprising revelation is that Spamcop’s blacklist is nowhere near as bad as I had previously believed it to be. I’ve complained periodically here about how Spamcop’s math is often wrong, how it too often lists confirmed opt-in senders, how it is too aggressive against wanted mail, but...my data (so far) shows a complete lack of false positives. This is a nice change, and it makes me very happy to see. Assuming this trend keeps up, I think you'll see me rewriting and putting disclaimers in front of some of my previous rants on that topic.

Want to see for yourself? I'm posting summary data daily, automatically, over on stats.dnsbl.com.

Data Interpretation Guide

The top chart contains a rollup of data for the past X days. Counts of number of spam messages and ham (non-spam) messages are included. The “accuracy percentage” is a percentage measurement derived from the number of spam messages that a particular blacklist correctly gauged as spam. The “inaccuracy percentage” is a percentage measurement derived from the number of ham (non-spam) messages that were incorrectly gauged to be spam.

The second chart provides a day-by-day breakdown of this data.

For example, on March 10, 2007, the Spamhaus blacklist correctly tagged 71% of spam received, and incorrectly tagged no non-spam mail. The Fiveten blacklist correctly tagged 66% of the spam, but incorrectly reported one third of the non-spam mail as spam. The determination I make from this data is that Spamhaus blocks more spam than Fiveten, and does it more accurately. If I used the Fiveten list, I would block much desired mail. With Spamhaus, no desired mail would have been blocked on that day.
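To make the two percentages concrete, here is a small Python sketch of how they could be tallied per blacklist. This is an illustration under my own assumptions, not the actual software behind the charts: the class name, field names, and the sample counts in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlacklistTally:
    """Running counts for one blacklist across both feeds."""
    spam_total: int = 0
    spam_hits: int = 0   # spam messages the blacklist tagged
    ham_total: int = 0
    ham_hits: int = 0    # ham messages the blacklist tagged (false positives)

    def record(self, is_spam: bool, listed: bool) -> None:
        """Count one received message and whether the blacklist hit it."""
        if is_spam:
            self.spam_total += 1
            self.spam_hits += int(listed)
        else:
            self.ham_total += 1
            self.ham_hits += int(listed)

    def accuracy_pct(self) -> float:
        """Share of spam correctly tagged, as a percentage."""
        if self.spam_total == 0:
            return 0.0
        return 100.0 * self.spam_hits / self.spam_total

    def inaccuracy_pct(self) -> float:
        """Share of ham incorrectly tagged, as a percentage."""
        if self.ham_total == 0:
            return 0.0
        return 100.0 * self.ham_hits / self.ham_total
```

With made-up counts, a list that hits 71 of 100 spam messages and 0 of 50 ham messages reports an accuracy of 71% and an inaccuracy of 0%. The two numbers only mean something together: a high accuracy figure is not worth much if the inaccuracy figure is high too.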

Note that not everyone is going to agree with my classification of false positives, and that's fine. In my determination, a false positive is a piece of mail that I signed up for that would've been blocked by a given blacklist. I think that's accurate. You will find, though, that some blacklists list things that I would not consider spam. For example, some lists will block mail from any sender who is not 100% confirmed opt-in (aka double opt-in). Since very few senders are fully 100% confirmed opt-in, lists such as these inherently block mail from many senders. A list operated in this fashion would have a vastly different interpretation of what constitutes a false positive than I would. It would be within their charter to list and facilitate the blocking of mail from sites like this, even if they haven't sent spam. This wouldn't be considered a false positive by such a list, but would potentially be considered a false positive by me.

Click here for information on how I determine what is and isn't spam.

Spam & Ham: Overview & FAQ

Updated: November 10, 2007.


A lot of people have asked how the spam and ham (non-spam) data is compiled for the Blacklist Statistics Center here at DNSBL Resource. Where does it come from? What senders does it represent? Here's an updated overview of what goes in to the spam and ham (non-spam) feeds here at DNSBL Resource.


On the spam side of things, the input comes from a series of spamtrap domains and email addresses.

  • When I first set this project up, I took a bunch of old, dead email addresses and domains that I have had for years but haven't been using lately. I turned them back on, reviewed long snapshots of incoming data, and weeded out a lot of “edge case” stuff – things that I probably did actually sign up for (like virus notifications, updates from my domain registrar, etc.). Anything that didn't look like something I might have signed up for was assumed to be spam.

  • I also have some filtering in place to try to keep out backscatter. Backscatter (or outscatter) usually consists of misdirected bounces received in response to somebody else's spam run, bounced back by a mail server that should know better. This is clearly a problem, but there is vast disagreement on the anti-spam front as to whether or not backscatter equals spam. Since few agree, and I want to focus on spam, I ignore this as much as possible. A little leaks through here and there, but I don't think it's enough to skew any stats.

  • I recently registered some new domains that I and others knew were already on spam lists. Anybody sending to these new domains clearly is doing a bad thing – sending to very old addresses, ignoring bounces, forging header information, etc. These also feed into the spam results.

  • From all of these sources, I get an average of over twelve thousand spam messages a day.


On the ham (non-spam) side of things, here's what I've done:

  • First, I signed up for a bunch of email lists. Stuff that I think regular users sign up for. Some of it is commercial, some of it isn't.

  • By commercial, I mean newsletters from different retailers, ones where I have a pretty strong suspicion that people actually sign up for their mail. Clothing stores, electronics retailers, etc.

  • Restaurants. Some national chains, etc., but mostly info from my favorites in and around Chicago, Minneapolis, and other places I travel to.

  • Lots of media-related things. By this I mean news alerts from different newspapers and TV stations. Weekly newsletters for my favorite public radio shows. International media, national media, some local media. Movie reviews, too.

  • Some travel-related things. Notifications from different travel sites on upcoming sales, airport delays, etc.

  • A bit of geek stuff. Virus alerts, some how-to newsletters, various tech and science newsletters, etc.

  • In addition to all of this, there's a lot of one-to-one mail in the loop now, too. Mail from users at AOL, Hotmail, Yahoo, Gmail, and other big ISPs.

Frequently Asked Questions about the Spam and Ham Sources

What happens if I receive both spam and ham from the same IP address?

There's no evidence that this is happening yet, but if it happens, the spam is going to show up in the spam bucket, and the ham is going to show up in the ham bucket. I'm calculating based on specific email messages received, not just the IP address of the sender. Under no circumstances have I ever taken spam and counted it as ham, or vice versa.

But big company X is sending you ham (desired mail) and sending other people spam!

I kick senders out of the hamtrap feed if I see them doing something bad, like sending spam or re-purposing email addresses. I don't, however, take a blacklist's word alone that somebody must be a spammer simply because they're blacklisted. Clearly, not every blacklist gets it right every time. Even a good blacklist might list somebody who is sending me wanted mail, perhaps because they're sending unwanted mail to someone else. My take on this is that the more often this happens, the more likely it is that the blacklist is overly aggressive or questionably accurate. It's up to readers of my site to decide if the data I report suggests the same to them. Not everyone is likely to come to the same conclusion.

But the big ISP mail servers also send spam – aren't you going to mislead people by counting a mail from AOL as a false positive hit if that same AOL server is also sending spam?

Sure, every network emits spam sometimes, to some degree. I think the big mail servers at the big ISPs are probably no different. But, can you safely block mail from these IP addresses? After all, they send millions of legitimate messages daily. If you care about not blocking mail that your users want, you are probably going to tread lightly when it comes to deciding whether or not to block servers like that. I suspect that blacklist publishers face similar challenges. Maybe this data reveals exactly how quick on the trigger a blacklist may be in that situation.

But this is too much ham (non-spam) for one person to receive; it's not reflective of normal mail.

Sure, it's a bit concentrated, and the volume is somewhat high, but it's not supposed to be reflective of one single person's mailbox. Instead, it's actually a combination of a bunch of kinds of desired mail, from a bunch of different sources, that regular users (in my humble estimation) are likely to receive. A single user at an ISP is unlikely to receive the 12,000+ spam messages I receive every day – it's similarly a combination of spam sent to a bunch of different users.

Clearly, you must be gaming this data to make blacklist X look good at the expense of blacklist Y.

No, I am not. I'm simply reporting how these blacklists intersect with my own mail streams. Your mailstreams may be different than mine. The same goes for any blacklist – not all are created equal. Not all have access to the same amount, or same quality, of data from which to decide what to list. Some might work better in foreign countries (I am in the US), some might work better in a hobbyist or educational setting (I think my data is more reflective of what a small to midsize ISP might see.) I have had some blacklist operators tell me that my data nearly exactly matches theirs, and I have had other blacklist operators tell me that my data is nothing like theirs. As always, your results may vary.

You really need to show results based on unique IP addresses.

I don't dedupe (remove duplicates from) the results based on IP address because I'm not counting IP addresses; I'm counting email messages. This isn't about who has the biggest list with the most IP addresses; it's about how accurate it is against my own mail stream. Any regular user who finds that a blacklist blocked ten spams from the same IP address is going to call that ten hits; not one hit.
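The difference between the two ways of counting is easy to see in a few lines of Python. This is only a sketch with hypothetical names and documentation-range IP addresses, not the code that produces the stats on this site.

```python
def count_hits(message_ips: list[str], listed_ips: set[str]) -> tuple[int, int]:
    """Return (per-message hits, per-unique-IP hits).

    message_ips holds one sending IP per message received, so an IP
    that sent ten messages appears ten times.
    """
    hits = [ip for ip in message_ips if ip in listed_ips]
    per_message = len(hits)          # what the stats here count
    per_unique_ip = len(set(hits))   # what deduping by IP would count
    return per_message, per_unique_ip


# Ten spams from one listed IP, plus one message from an unlisted IP.
received = ["192.0.2.1"] * 10 + ["198.51.100.7"]
print(count_hits(received, {"192.0.2.1"}))  # (10, 1)
```

Counting per message weights a blacklist by how much of the actual mail stream it catches, which is the question a mailbox owner cares about.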

I don't like this data because of X, Y or Z.

The best recommendation I can give in this situation is that you should consider generating your own statistics and sharing them with the world. I know that my mail streams and results definitely match what some people see – because in a lot of cases those people have contacted me and told me so. It's also exactly reflective of my own mail stream. Just because it's what we see doesn't mean that this is exactly what you'll see if you use the same blacklists. There are too many open variables, from the size of my spamtraps, to which spam lists I'm on, to the composition of the mail your users sign up for, etc. As I said above, your results may vary.

Incidentally, I'm not above some friendly competition. I'd love to see more sites like this out there.

If you have any questions or comments about anything here, about the Blacklist Statistics Center, or anything on DNSBL Resource, please don't hesitate to contact me.

NJABL Dynablock List Now Obsolete

With the advent of Spamhaus's new PBL anti-spam blacklist, it appears that the NJABL Dynablock list is now obsolete. I just saw the following post on the public SPAM-L mailing list, from the NJABL folks:

The following text was sent to list AT njabl.org on Jan 19, 2007. Judging from the number of DNS queries still being handled for dynablock.njabl.org, the message doesn't seem to have made it to a wide enough audience.

If you use or know people who use dynablock.njabl.org, this is important information:

With the advent of Spamhaus's PBL (http://spamhaus.org/pbl/), dynablock.njabl.org has become obsolete. Rather than maintain separate similar DNSBL zones, NJABL will be working with Spamhaus on the PBL. Effective immediately, dynablock.njabl.org exists as a copy of the Spamhaus PBL. After dynablock users have had ample time to update their configurations, the dynablock.njabl.org zone will be emptied.

Other NJABL zones (i.e. dnsbl, combined, bhnc, and the qw versions) will continue, business as usual, except that combined will eventually lose its dynablock component.

If you currently use dynablock.njabl.org we recommend you switch immediately to pbl.spamhaus.org.

If you currently use combined.njabl.org, we recommend you add pbl.spamhaus.org to the list of DNSBLs you use.

You may also want to consider using zen.spamhaus.org, which is a combination zone consisting of Spamhaus's SBL, XBL, and PBL zones.

(Editor's note: I'm very happy with ZEN so far. See this post detailing my recent experiences.)

Spamcop Roundup

5/22/2007: This information is out of date. Please click here for my latest take on Spamcop's SCBL.

My most recent take on Spamcop, from February 2007, can be found here. In that commentary, I talk about the history of the Spamcop spam reporting service, its current corporate ownership, and my take on how this type of blacklist works, especially as it relates to the impact against solicited (wanted) mail.

In February 2007, I found that Microsoft is using Spamcop to filter inbound (corporate) mail. By corporate mail, I mean mail to microsoft.com users, not mail to MSN/Hotmail users. This surprised me, because of what I believe are aggressive listing practices on the part of Spamcop. Indeed, the issue was brought to my attention by an unhappy person who was mad that he couldn't send one-to-one mail to Microsoft because Spamcop blocked it.

Also, back in 2003, I published an article about the ongoing issues I was having with Spamcop blocking opt-in confirmation requests. Back then I found (through some admittedly unscientific survey techniques) that admins using the SCBL seemed to assume that all blocked mail must be spam because Spamcop blocked it. Not a very encouraging find. It was also a bit insulting to be lectured on how confirmed opt-in worked by people who were blocking confirmed opt-in requests, especially considering I've been pushing senders to implement and utilize confirmed opt-in/double opt-in for many years.

Spamhaus ZEN: Recommended

Look for a longer article from me in the near future on Spamhaus; I'm collecting a ton of data against a large spam corpus and hope to summarize and publish my findings within the next month or so.

Until then, feel free to bop on over to Spam Resource, where I talk about my experience using the Spamhaus ZEN blacklist to tag and filter inbound mail to our abuse desk. I've been quite pleased with the results.

Also of note is that Microsoft is using both Spamcop and Spamhaus to reject mail to their corporate users. (They're NOT using them on MSN Hotmail.)

Update: Find my full review of Spamhaus ZEN here on DNSBL Resource.