DCC: Spam filter?

The Distributed Checksum Clearinghouse (DCC), created by Vernon Schryver, is a very powerful tool to help system administrators identify and block bulk mail. The project's website suggests a strong correlation between "bulk" and "spam," but the more research I do, the less I think it's always that simple.

There's a common misconception in the spam filtering world (and the sending world): people think DCC is a spam blacklist. It isn't. It's a tool to help users block bulk mail, not spam. That's an important distinction.

Think about it. There are many types of bulk mail you might have signed up for and might want: newsletters you actually subscribed to, messages from companies you've done business with and want to hear from, or news, weather and traffic alerts you might be waiting for. (I don't need an email message to warn me that it's snowing outside, but I know that lots of people sign up for these.)

DCC tells you whether the mail being delivered to you was also sent to lots of other people. Sure, spam is sent to lots of people all at once, but so is a lot of solicited mail. What defines spam is whether or not you signed up to receive it. If you signed up for it, the fact that other people are getting it too has no bearing on the fact that you asked for it.

If a filter like DCC rejects a piece of mail you actually solicited and wanted to receive, I would consider that a "false positive." To help prevent false positives, proper DCC usage dictates that you whitelist, ahead of time, all the sources of legitimate list or bulk mail you wish to receive. The DCC project provides a sample whitelist file to get started, and it recommends a whitelist of example small messages that are the most likely to get caught up in the filtering, even if solicited.
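To make the bulk-versus-spam distinction concrete, here's a minimal Python sketch of the general idea. It is not DCC's actual algorithm or whitelist format: real DCC shares fuzzy checksums among many servers and uses its own whitelist file syntax. The `BulkCounter` class, the exact SHA-256 body hash, the `bulk_threshold` value, and the sender-address whitelist are all hypothetical, chosen only to show why solicited bulk mail gets rejected unless its source is whitelisted ahead of time.

```python
import hashlib
from collections import Counter


class BulkCounter:
    """Hypothetical, simplified stand-in for a checksum clearinghouse.

    Real DCC uses fuzzy checksums shared among many servers; this sketch
    counts an exact SHA-256 hash of the message body, locally.
    """

    def __init__(self, bulk_threshold: int, whitelist: set[str]):
        self.bulk_threshold = bulk_threshold  # hypothetical cutoff for "bulk"
        # Senders you asked to hear from, compared case-insensitively.
        self.whitelist = {s.lower() for s in whitelist}
        self.counts: Counter = Counter()

    @staticmethod
    def checksum(body: str) -> str:
        # Collapse whitespace and case so trivially different copies match.
        normalized = " ".join(body.split()).lower()
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def report(self, body: str) -> int:
        """Record one delivery of this body; return how many copies have been seen."""
        digest = self.checksum(body)
        self.counts[digest] += 1
        return self.counts[digest]

    def verdict(self, sender: str, body: str) -> str:
        copies = self.report(body)
        if sender.lower() in self.whitelist:
            return "accept"  # whitelisted: the copy count doesn't matter
        if copies >= self.bulk_threshold:
            return "reject-bulk"  # bulk, which is not the same thing as spam
        return "accept"
```

With `bulk_threshold=3`, the third and later copies of a newsletter from a sender that isn't on the whitelist are rejected, even if every recipient subscribed to it, while a whitelisted weather alert is always accepted. Nothing in the copy count itself distinguishes the two; only the whitelist does.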

As Vernon Schryver himself said recently on the DCC mailing list, false positives "speak to a misuse or misunderstanding of [DCC]." He says that, in a sense, there's no such thing as a DCC false positive. My reading of his comments is that it's up to users of DCC to know what they're getting into. DCC blocks mail sent to many recipients, and it's up to you to whitelist any sources you want to receive mail from.

DCC is a very powerful tool. That's both a plus and a minus. If you know what you're doing and are comfortable working without a safety net, manually compiling lists of sites you want to receive any sort of bulk or list mail from, then maybe it can help you reduce spam.

But if you're not clear on the difference between bulk and spam, or on which sites are sending bulk or list mail that you or your users want, then it's not going to work the way you think, and it's going to reject mail that you or your users asked for.

Internet Service Providers (ISPs), when deciding whether to accept a sender's mail, do measure whether a message is being sent to many people. It's not the only thing they look at, though. The smarter ISPs tie a reputation measurement into that process: Is this mail coming from a good sender or a bad sender? Does this sender generate spam complaints? Does this sender generate an above-average percentage of bounces? Wrap all of that together, and an ISP has good information for deciding which mail to accept. Without those measurements, you're left with an incomplete view and no easy way to tell the good mail from the bad. It's up to you to know about the good senders and whitelist them ahead of time. If you don't, you're going to reject their mail, presumably mail that you or your users wanted to receive.
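As a rough illustration of that difference, here's a hypothetical Python sketch of a decision that weighs volume together with sender reputation. The `SenderStats` fields, the 0.1% complaint and 5% bounce thresholds, and the decision rules are assumptions made up for illustration; they are not any ISP's actual policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SenderStats:
    """Hypothetical reputation data an ISP might keep about a sender."""
    messages_delivered: int
    spam_complaints: int
    bounces: int

    def complaint_rate(self) -> float:
        if not self.messages_delivered:
            return 0.0
        return self.spam_complaints / self.messages_delivered

    def bounce_rate(self) -> float:
        if not self.messages_delivered:
            return 0.0
        return self.bounces / self.messages_delivered


# Hypothetical thresholds, chosen only for illustration.
MAX_COMPLAINT_RATE = 0.001  # 0.1% of delivered messages
MAX_BOUNCE_RATE = 0.05      # 5% of delivered messages


def decide(is_bulk: bool, stats: Optional[SenderStats]) -> str:
    """Combine a bulk signal (like DCC's) with sender reputation."""
    if not is_bulk:
        return "accept"
    if stats is None:
        # No history: volume alone can't tell good bulk mail from bad.
        return "unknown"
    if (stats.complaint_rate() > MAX_COMPLAINT_RATE
            or stats.bounce_rate() > MAX_BOUNCE_RATE):
        return "reject"
    return "accept"  # bulk, but from a sender with a good track record
```

The `"unknown"` branch is the gap described above: a filter that measures only volume has nothing else to go on, so it must either accept all bulk mail or rely on a whitelist you built by hand.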