dcc(8)

NAME

DCC - Distributed Checksum Clearinghouse

DESCRIPTION

The Distributed Checksum Clearinghouse or DCC is a cooperative,
distributed system intended to detect "bulk" mail or mail
sent to many people. It allows individuals receiving a single
mail message to determine that many other people have received
essentially identical copies of the message and so reject or
discard the message.
Freely redistributable source for the server, client, and
utilities is available at Rhyolite Software,
http://www.rhyolite.com/dcc/
How the DCC Is Used
The DCC can be viewed as a tool for end users to enforce
their right to "opt-in" to streams of bulk mail by refusing bulk
mail except from sources in a "whitelist." Whitelists are the
responsibility of DCC clients, since only they know which bulk
mail they solicited.
The only false positives (mail marked as "bulk" by a DCC
server that is not) occur when one of the recipients of a message
reports it to a DCC server as having been received many times or
when the "fuzzy" checksums of differing messages are the same.
The fuzzy checksums ignore aspects of messages in order to compute
identical checksums for substantially identical messages.
The fuzzy checksums are designed to ignore only differences that
do not affect meanings.
It is not reasonable to worry about third parties reporting
your incoming or outgoing mail to a DCC server as bulk unless you
give them copies. If you trust yourself and your correspondents
to not report your mutual mail as bulk, then false positives are
not a concern.
A DCC server computes a lower bound on the total number of
addresses to which a message has been sent by counting checksums
reported by DCC clients. Each client must decide which bulk
messages are unsolicited and what degree of "bulkiness" is
objectionable. Client DCC software marks, rejects, or discards mail
that is bulk according to local thresholds on target addresses
from DCC servers and unsolicited according to local whitelists.
DCC servers are usually configured to receive reports from as
many targets as possible, including sources that cannot be
trusted to not exaggerate the number of copies of a message they see.
An end user of a DCC client angry about receiving a message could
report it with 10,000,000 separate DCC packets or with a single
report claiming as many targets. An unprincipled user could
subscribe a "spam trap" to mailing lists such as those of the IETF
or CERT. Such abuses of the system are not problems, because
much legitimate mail is "bulk." You cannot reject bulk mail
unless you have a whitelist of sources of legitimate bulk mail.
The DCC can also be used by an Internet service provider to
detect bulk mail coming from its own customers. In such
circumstances, the DCC client might be configured to only log bulk mail
from unexpected (not white-listed) sources. See the -N option
for dccm(8) or dccifd(8).
What the DCC Is
A DCC server accumulates counts of cryptographically secure
checksums of messages but not the messages themselves. It
exchanges reports of frequently seen checksums with other servers.
DCC clients send reports of checksums related to incoming mail to
a nearby DCC server running dccd(8). Each report from a client
includes the number of recipients for the message. A DCC server
accumulates the reports and responds to clients with the current
total number of recipients for each checksum. The client adds an
SMTP header to incoming mail containing the total counts. It
then discards or rejects mail that is not "white-listed" and has
counts that exceed local thresholds.
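The client-side decision described above can be sketched as follows. This is only an illustration under stated assumptions, not the logic of dccm(8) or dccproc(8): the function name, the per-checksum threshold table, and the "accept"/"reject" labels are all hypothetical.

```python
def disposition(counts, thresholds, whitelisted):
    """Return "accept" or "reject" for one message.

    counts      -- checksum name -> total recipients reported by the server
    thresholds  -- checksum name -> local limit (hypothetical values)
    whitelisted -- True if a local whitelist entry matched the message
    """
    # White-listed mail is delivered no matter how bulky it looks.
    if whitelisted:
        return "accept"
    # Otherwise reject when any count exceeds its local threshold.
    for name, total in counts.items():
        limit = thresholds.get(name)
        if limit is not None and total > limit:
            return "reject"
    return "accept"
```

For example, disposition({"Body": 16}, {"Body": 10}, False) rejects the message, while the same counts with whitelisted set to True are accepted.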
A special value of the number of addressees is "MANY" and
means it is certain that this message was bulk and might be
unsolicited, perhaps because it came from a locally blacklisted
source or was addressed to an invalid address or "spam trap."
The special value "MANY" is merely the largest value that fits in
the fixed-size field containing the count of addressees. That
"infinity" accumulated total can be reached with millions of
independent reports as well as with one or two.
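The accumulation of reports into a total that tops out at "MANY" behaves like a saturating counter. The sketch below is an assumption-laden illustration rather than the dccd(8) implementation: this page does not give the width of the count field, so the numeric value of MANY and both function names are hypothetical.

```python
# Hypothetical ceiling; the real field width is not stated in this page.
MANY = 2**24 - 1

def add_report(total, reported):
    """Add one client report to a running total, saturating at MANY."""
    if total >= MANY or reported >= MANY:
        return MANY
    return min(total + reported, MANY)

def show(total):
    """Render a total as a client would see it."""
    return "MANY" if total >= MANY else str(total)
```

A single report of MANY and a long series of small reports both end at the same saturated value, which is why "MANY" says that a message is bulk but not how the total was reached.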
DCC servers share or flood reports of checksums that are
seen frequently. Each server has its own threshold for
determining "frequently," because a message sent to 50 addressees in a
domain with 60 mailboxes is more likely to be unsolicited bulk
advertising than a message sent to 100 addressees in a domain
with 600,000 mailboxes.
To keep a server's database of checksums from growing
without bound, checksums are forgotten when they become old.
Checksums with large totals are kept longer. See dbclean(8).
DCC clients pick the nearest working DCC server using a
small shared or memory mapped file, /var/dcc/map. It contains
server names, port numbers, passwords, recent performance
measures, and so forth. This file allows clients to use quick
retransmission timeouts and to waste little time on servers that
have temporarily stopped working or become unreachable. The
utility program cdcc(8) is used to maintain this file as well as
to check the health of servers.
X-DCC Headers
The DCC includes several programs used by clients. Dccm(8)
uses the sendmail "milter" interface to query a DCC server, add
header lines to incoming mail, and reject mail whose total
checksum counts are high. Dccm is intended to be run with SMTP
servers using sendmail.
Dccproc(8) adds header lines to mail presented by file name
or stdin, but relies on other programs such as procmail to deal
with mail with large counts. Dccsight(8) is similar but deals
with previously computed checksums.
Dccifd(8) is similar to dccproc but is not run separately
for each mail message and so is far more efficient. It receives
mail messages via a socket somewhat like dccm, but with a simpler
protocol that can be used by Perl scripts or other programs.
DCC SMTP header lines are of the form:

X-DCC-brand-Metrics: chost server-ID; bulk cknm1=count cknm2=count ...
where
brand is the "brand name" of the DCC server, such as
"RHYOLITE".
chost is the name or IP address of the DCC client that
added the header line to the SMTP message.
server-ID is the numeric ID of the DCC server that the
DCC client contacted.
bulk is present if one or more checksum counts exceeded
the DCC client's thresholds to make the message "bulky."
cknm1,cknm2,... are types of checksums, and one of
     IP          address of SMTP client
     env_From    SMTP envelope value
     From        SMTP header line
     Message-ID  SMTP header line
     Received    last Received: header line in the SMTP message
     substitute  SMTP header line chosen by the DCC client,
                 prefixed with the name of the header
     Body        SMTP body ignoring white-space
     Fuz1        filtered or "fuzzy" body checksum
     Fuz2        another filtered or "fuzzy" body checksum
Counts for IP, env_From, From, Message-Id,
Received, and substitute checksums are omitted by the DCC client
if the server says it has no information. Counts for Body, Fuz1,
and Fuz2 are omitted if the message body is empty or contains too
little of the right kind of information for the checksum to be
computed.
count is the total number of recipients of messages
with that checksum reported directly or indirectly to the DCC
server. The special count "MANY" means that DCC clients have
claimed that the message is directed at millions of recipients.
"MANY" implies the message is definitely bulk, but not necessarily
unsolicited. The special counts "OK" and "OK2" mean the checksum
has been marked "good" or "half-good" by DCC servers.
An example header line is:

X-DCC-RHYOLITE-Metrics: calcite.rhyolite.com 101; Body=16
Fuz1=16 Fuz2=16
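A header line of this form can be split into its parts with a short parser. The following is a minimal sketch written only from the description above; the function name and the returned dictionary layout are assumptions, and it does not try to handle substitute checksums whose names contain spaces.

```python
import re

# brand may itself contain hyphens, so match greedily up to "-Metrics:".
_HEADER = re.compile(
    r"^X-DCC-(?P<brand>[^:\s]+)-Metrics:\s*"
    r"(?P<chost>\S+)\s+(?P<sid>\d+);\s*(?P<rest>.*)$"
)

def parse_dcc_header(line):
    """Parse one (possibly folded) X-DCC header line; None if no match."""
    match = _HEADER.match(" ".join(line.split()))  # undo header folding
    if match is None:
        return None
    tokens = match.group("rest").split()
    bulk = bool(tokens) and tokens[0] == "bulk"
    if bulk:
        tokens = tokens[1:]
    counts = {}
    for token in tokens:
        name, sep, value = token.partition("=")
        if sep:
            # Numeric counts become ints; MANY, OK, and OK2 stay strings.
            counts[name] = int(value) if value.isdigit() else value
    return {
        "brand": match.group("brand"),
        "chost": match.group("chost"),
        "server_id": int(match.group("sid")),
        "bulk": bulk,
        "counts": counts,
    }
```

Applied to the example header above, the sketch yields brand "RHYOLITE", server-ID 101, no bulk flag, and counts of 16 for Body, Fuz1, and Fuz2.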
DCC clients commonly accept any mail regardless of other
checksum counts with at least one "OK" or at least two "OK2"
counts among IP, env_From, and From checksum counts. It is
common to reject other mail with large (including "MANY") counts
among Received, Body, Fuz1, and Fuz2 counts. It is generally not
wise to reject mail based on the other counts. For example,
"MAILER-DAEMON" appears to send vast quantities of mail.
Mailing lists
Legitimate mailing list traffic differs from spam only in
being solicited by recipients. Each client should have a private
whitelist.
DCC whitelists can also mark mail as unsolicited bulk using
blacklist entries for commonly forged marks such as
"From: user@public.com".
Systems that send many essentially identical copies of
solicited mail, such as "auto-responders," should be in the DCC
servers' whitelists because their messages are often substantially
identical and so "bulk."
White and Blacklists
DCC server and client whitelist files share a common format.
Server files are always named whitelist and one is required to be
in the DCC home directory with the other server files. Client
whitelist files are commonly named whiteclnt in the DCC home
directory or a subdirectory specified with the -U option for
dccm(8). They specify mail that should not be reported to a DCC
server or that is unsolicited bulk.
A DCC whitelist file contains blank lines, comments starting
with "#", and lines of the forms:
include pathname
option setting
count ip hostname
count env_From 821-path
count env_To dest-mailbox
count From 822-mailbox
count substitute header string
count Message-ID <string>
count Received string
count hex_type hex_cksum
where
include can occur only in the main whitelist file.
pathname should be absolute or relative to the DCC
home directory.
option setting can only be in a DCC client whitelist
or whiteclnt file and affect only dccifd(8) and dccm(8).
Settings in per-user whiteclnt files override settings in the global
file. Setting can be
     log-all
          to log all mail messages.
     log-normal
          to log only messages that meet the logging thresholds.
     dcc-on
     dcc-off
          Control DCC filtering. See the discussion of -W for
          dccm(8) and dccifd(8).
     greylist-off
     greylist-on
          to control greylisting. Greylisting for other recipients
          in the same SMTP transaction can still cause greylist
          temporary rejections. greylist-off in the main whiteclnt
          file.
     greylist-log-on
     greylist-log-off
          to control logging of greylisted mail messages.
     DNSBL-on
     DNSBL-off
          honor or ignore results of DNS blacklist checks configured
          with -B for dccm(8) and dccifd(8).
The default in the main whiteclnt file is equivalent to
     option log-normal
     option dcc-on
     option greylist-on
     option greylist-log-on
     option DNSBL-off
count is null and assumed to be the same as on the
previous line or one of
     MANY indicating millions of targets have received
          messages with that checksum.
     OK   if the message is OK.
     OK2  if it is "half OK." Two OK2 checksums associated
          with a message are generally equivalent to an OK.
hostname is an
     address IPv4 or IPv6.
     block of 2 to 1024 IPv4 or IPv6 addresses in the standard
          form xxx.yyy.zzz.www/mm with mm limited for server
          whitelists to 16 for IPv4 or 112 for IPv6.
     name that will be converted to one or more IP addresses.
dest-mailbox is an RFC 821 address or a local user
name.
821-path is an RFC 821 address.
822-mailbox is an RFC 822 address with optional name.
header is the name of an SMTP header such as "Sender"
or the name of one of two SMTP envelope values, "HELO" or
"Mail_Host" for the sendmail resolved host name from the
message's 821-path.
hex_type is the string hex followed by a blank and one
of the preceding checksum types or body, Fuz1, or Fuz2.
hex_cksum is a string of four hexadecimal numbers
obtained from a DCC log file.
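The line forms above are simple enough to read with a few lines of code. The sketch below is a hedged illustration, not the parser used by the DCC programs: the function name is hypothetical, keywords are assumed to be case-insensitive, and a "null" count is assumed to mean that the line starts directly with the checksum type.

```python
COUNT_WORDS = {"MANY", "OK", "OK2"}

def parse_whitelist(text):
    """Return (entries, options, includes) from whitelist file text.

    entries is a list of (count, type, value) tuples.
    """
    entries, options, includes = [], [], []
    previous = None  # count inherited by lines with a null count
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        head, _, rest = line.partition(" ")
        head, rest = head.strip(), rest.strip()
        if "\t" in head:  # allow a tab after the first word
            head, _, tail = line.partition("\t")
            rest = tail.strip()
        if head.lower() == "include":
            includes.append(rest)
        elif head.lower() == "option":
            options.append(rest)
        else:
            if head.upper() in COUNT_WORDS:
                count, body = head.upper(), rest
            else:
                count, body = previous, line  # null count: reuse previous
            if count is None:
                raise ValueError("no count for line: " + line)
            previous = count
            parts = body.split(None, 1)
            value = parts[1] if len(parts) > 1 else ""
            entries.append((count, parts[0], value))
    return entries, options, includes
```

An entry line that omits its count inherits the count of the preceding entry, so a run of addresses can share a single OK or MANY.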
A DCC server never shares or floods reports containing
checksums marked in its whitelist with OK or OK2 to other
servers. A DCC client does not report or ask its server about
messages with a checksum marked OK or OK2 in the client
whitelist. This is intended to allow a DCC client to keep
private mail so private that even its checksums are not disclosed.
Checksums of the IP address of the SMTP client sending a
mail message are practically unforgeable, because it is
impractical for an SMTP client to "spoof" its address or pretend to use
some other IP address. That would make the IP address of the
sender useful for white-listing, except that the IP address of
the SMTP client is often not available to users of dccproc(8).
In addition, legitimate mail relays make whitelist entries for IP
addresses of little use. For example, the IP address from which
a message arrived might be that of a local relay instead of the
home address of a white-listed mailing list.
Envelope and header From values can be forged, so whitelist
entries for their checksums are not completely reliable.
Checksums of env_To values are never sent to DCC servers.
They are valid in only whiteclnt files and used only by dccm(8),
dccifd(8), and other DCC clients with access to the envelope Rcpt
To value. They are another mechanism used by DCC clients to
protect the privacy of some mail.
Greylists
The DCC server, dccd(8), can be used to maintain a greylist
database for some DCC clients including dccm(8) and dccifd(8).
Greylisting involves temporarily refusing mail from unfamiliar
SMTP clients and is unrelated to Distributed Checksum
Clearinghouses.
See http://projects.puremagic.com/greylisting/
Privacy
Because sending mail is a less private act than receiving
it, and because sending bulk mail is usually not private at all
and cannot be very private, the DCC tries first to protect the
privacy of mail recipients, and second the privacy of senders of
mail that is not bulk.
DCC clients necessarily disclose some information about mail
they have received. The DCC database contains checksums of mail
bodies, header lines, and source addresses. While it contains
significantly less information than is available by "snooping" on
Internet links, it is important that the DCC database be treated
as containing sensitive information and to not put the most
private information in the DCC database. Given the contents of a
message, one might determine whether that message has been
received by a system that subscribes to the DCC. Guesses about the
sender and addressee of a message can also be validated if the
checksums of the message have been sent to a DCC server.
Because the DCC is distributed, organizations can operate
their own DCC servers, and configure them to share or "flood"
only the checksums of bulk mail that is not in local whitelists.
DCC clients should not report the checksums of messages
known to be private to a DCC server. For example, checksums of
messages local to a system or that are otherwise known a priori
to not be unsolicited bulk should not be sent to a remote DCC
server. This can be accomplished by adding entries for the sender
to the client's local whitelist file. Client whitelist files can
also include entries for email recipients whose mail should not
be reported to a DCC server.
Additional privacy protections are provided by the
thresholds at which DCC servers exchange or flood reports. These
thresholds are primarily intended to reduce the traffic among DCC
servers using the observation that the vast majority of messages
are sent to a handful of addressees and so are useless to other
DCC servers. A DCC server's peer reporting thresholds also
ensure that checksums shared with peer DCC servers are "bulk" and
so intrinsically not private.
Security
Whenever considering security, one must first consider the
risks. The worst DCC security problems are unauthorized commands
to a DCC service, denial of the DCC service, and corruption of
DCC data. The worst that can be done with remote commands to a
DCC server is to turn it off or otherwise cause it to stop
responding. The DCC is designed to fail gracefully, so that a
denial of service attack would at worst allow delivery of mail that
would otherwise be rejected. Corruption of DCC data might at
worst cause mail that is already somewhat "bulk" by virtue of
being received by two or more people to appear to have higher
recipient numbers. Since all DCC users must "white-list" all sources
of legitimate bulk mail, this is also not a concern. Such
security risks should be addressed, but only with defenses that don't
cost more than the possible damage from an attack.
The DCC must contend with senders of unsolicited bulk mail
who resort to unlawful actions to express their displeasure at
having their advertising blocked. Because the DCC protocol is
based on UDP, an unhappy advertiser could try to flood a
clearinghouse server with packets supposedly from subscribers or
non-subscribers. DCC servers defend against that attack by
rate-limiting requests from non-subscribers.
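Rate limiting of this kind is often done with a token bucket per source address. The sketch below shows that general technique only; this page does not say how dccd(8) limits rates, and the class name, the rate, and the burst size are hypothetical.

```python
class RateLimiter:
    """Token bucket per source address (illustrative parameters)."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens added per second
        self.burst = burst    # maximum tokens a source can hold
        self.state = {}       # address -> (tokens, time of last update)

    def allow(self, address, now):
        """Return True if a request from address may be served at time now."""
        tokens, last = self.state.get(address, (self.burst, now))
        # Refill for the elapsed time, never beyond the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            self.state[address] = (tokens, now)
            return False
        self.state[address] = (tokens - 1, now)
        return True
```

With rate=1.0 and burst=2, a source may send two requests at once, is refused a third, and is served again one second later. Other sources are unaffected.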
Also because of the use of UDP, clients must be protected
against forged answers to their queries. Otherwise an
unsolicited bulk mail advertiser could send a stream of "not spam" answers
to an SMTP client while simultaneously sending mail that would
otherwise be rejected. This is not a problem for authenticated
clients of the DCC because they share a secret with the DCC.
Unauthenticated DCC clients do not share any secrets with the
DCC, except for unique and unpredictable bits in each query or
report sent to the DCC. Therefore, DCC servers cryptographically
sign answers to unauthenticated clients with bits from the
corresponding queries. This protects against attackers that do not
have access to the stream of packets from the DCC client.
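The idea of signing an answer with unpredictable bits from the query can be sketched as follows. This is an illustration under explicit assumptions: the page does not name the signature algorithm, so HMAC-SHA256 is used here purely as a stand-in, and the function names and message contents are hypothetical.

```python
import hashlib
import hmac
import secrets

def sign_answer(key, answer):
    """Server side: sign an answer with a key known to the client.

    For an authenticated client the key would be the shared password;
    for an anonymous client it is the random bits taken from the query.
    """
    return hmac.new(key, answer, hashlib.sha256).digest()

def verify_answer(key, answer, signature):
    """Client side: accept the answer only if the signature matches."""
    return hmac.compare_digest(sign_answer(key, answer), signature)

# An anonymous client puts fresh random bits in each query.
query_bits = secrets.token_bytes(16)
answer = b"Body=16 Fuz1=16 Fuz2=16"
signature = sign_answer(query_bits, answer)
```

An attacker who cannot see the query does not know query_bits and so cannot produce a signature that verify_answer() accepts for a forged answer.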
The passwords or shared secrets used in the DCC client and
server programs are "cleartext" for several reasons. In any
shared secret authentication system, at least one party must know
the secret or keep the secret in cleartext. You could encrypt
the secrets in a file, but because they are used by programs, you
would need a cleartext copy of the key to decrypt the file some
where in the system, making such a scheme more expensive but no
more secure than a file of cleartext passwords. Asymmetric
systems such as that used in UNIX allow one party to not know the
secrets, but they must be and are designed to be computationally
expensive when used in applications like the DCC that involve
thousands or more authentication checks per second. Moreover,
because of "dictionary attacks," asymmetric systems are now
little more secure than keeping passwords in cleartext. An
adversary can compare the hash values of combinations of common words
with /etc/passwd hash values to look for bad passwords. Worse,
by the nature of a client/server protocol like that used in the
DCC or a UNIX shell login, clients must have the cleartext
password. Since it is among the more numerous and much less secure
clients that adversaries would seek files of DCC passwords, it
would be a waste to complicate the DCC server with an asymmetric
system like that used by UNIX.
The DCC protocol is vulnerable to dictionary attacks to
recover passwords. An adversary could capture some DCC packets,
and then check to see if any of the 100,000 to 1,000,000
passwords in so called "cracker dictionaries" applied to a packet
generated the same signature. This is a concern only if DCC
passwords are poorly chosen, such as any combination of words in
an English dictionary. There are ways to prevent this
vulnerability regardless of how badly passwords are chosen, but they are
computationally expensive and require additional network round
trips. Since DCC passwords are created and typed into files once
and do not need to be remembered by people, it is cheaper and
quite easy to simply choose good passwords that are not in
dictionaries.
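One way to obtain a password that is in no dictionary is to generate it from a cryptographic random source. The sketch below is only a suggestion; the function name and the length are arbitrary, and any limits the DCC programs place on password length or characters should be checked before use.

```python
import secrets

def make_password(nbytes=24):
    """Return a random URL-safe string built from nbytes of randomness."""
    return secrets.token_urlsafe(nbytes)
```

The result can be pasted into the relevant DCC files once and never needs to be memorized.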
Reliability
It is better to fail to filter unsolicited bulk mail than to
fail to deliver legitimate mail, so DCC clients fail in the
direction of assuming that mail is legitimate or even white-listed.
A DCC client sends a report or other request and waits for
an answer. If no answer arrives within a reasonable time, the
client retransmits. There are many things that might result in
the client not receiving an answer, but the most important is
packet loss. If the client's request does not reach the server,
it is easy and harmless for the client to retransmit. If the
client's request reached the server but the server's response was
lost, a retransmission to the same server would be misunderstood
as a new report of another copy of the same message unless it is
detected as a retransmission by the server. The DCC protocol
includes transaction identifiers for this purpose. If the client
retransmitted to a second server, the retransmission would be
misunderstood by the second server as a new report of the same
message.
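The role of transaction identifiers can be illustrated with a small model of a server that remembers which transactions it has already counted. This is a sketch of the idea only; the class, the key layout, and the unbounded cache are assumptions, not the dccd(8) design.

```python
class ReportTable:
    """Toy server state: totals per checksum plus answered transactions."""

    def __init__(self):
        self.totals = {}    # checksum -> accumulated recipient count
        self.answered = {}  # (client_id, transaction_id) -> cached answer

    def handle(self, client_id, transaction_id, checksum, recipients):
        key = (client_id, transaction_id)
        if key in self.answered:
            # Retransmission: repeat the old answer, count nothing twice.
            return self.answered[key]
        total = self.totals.get(checksum, 0) + recipients
        self.totals[checksum] = total
        self.answered[key] = total
        return total
```

A second server has no record of the transaction, which is why a retransmission sent there is counted as a new report.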
Each request from a client includes a timestamp to aid the
client in measuring the round trip time to the server and to let
the client pick the closest server. Clients monitor the speed of
all of the servers they know including those they are not
currently using, and use the quickest.
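Choosing the quickest server from measured round trip times can be sketched as below. The smoothing formula, its weight, and the function names are assumptions made for illustration; this page does not describe how the clients average their measurements.

```python
def update_rtt(old, sample, weight=0.25):
    """Fold a new round trip sample into a smoothed estimate."""
    if old is None:
        return sample
    return (1 - weight) * old + weight * sample

def pick_server(servers):
    """Return the name of the quickest working server, or None.

    servers maps a name to a smoothed RTT in seconds, or to None
    when the server is not currently answering.
    """
    working = {name: rtt for name, rtt in servers.items() if rtt is not None}
    if not working:
        return None
    return min(working, key=working.get)
```

When pick_server() returns None, a client following the rule stated at the start of this section would treat the mail as legitimate rather than delay or reject it.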
Client and Server-IDs
Servers and clients use numbers or IDs to identify
themselves. ID 1 is reserved for anonymous, unauthenticated clients.
All other IDs are associated with a pair of passwords in the ids
file, the current and next or previous and current passwords.
Clients include their client IDs in their messages. When they
are not using the anonymous ID, they digitally sign their
messages to servers with the first password associated with their
client-ID. Servers treat messages with signatures that match
neither of the passwords for the client-ID in their own ids file
as if the client had used the anonymous ID.
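The fallback to the anonymous ID can be sketched as a check against both passwords on file. This is an illustration under assumptions: HMAC-SHA256 stands in for the unspecified signature algorithm, and the in-memory dictionary stands in for the ids file.

```python
import hashlib
import hmac

ANONYMOUS_ID = 1

def effective_client_id(ids, client_id, message, signature):
    """Return client_id if the signature matches either password on file,
    otherwise the anonymous ID.

    ids maps a client-ID to its pair of passwords (bytes).
    """
    if client_id == ANONYMOUS_ID:
        return ANONYMOUS_ID
    for password in ids.get(client_id, ()):
        expected = hmac.new(password, message, hashlib.sha256).digest()
        if hmac.compare_digest(expected, signature):
            return client_id
    return ANONYMOUS_ID
```

Accepting either of two passwords lets an operator change a password on the server and on the clients at different times without locking anyone out.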
Each server has a unique server-ID less than 32768. Servers
use their IDs to identify checksums that they flood to other
servers. Each server expects local clients sending
administrative commands to use the server's ID and sign administrative
commands with the associated password.
Server-IDs must be unique among all systems that share
reports by "flooding." All servers must be told of the IDs of all
other servers whose reports can be received in the local
/var/dcc/flod file described in dccd(8). However, server-IDs can
be mapped during flooding between independent DCC organizations.
Passwd-IDs are server-IDs that should not be assigned to
servers but used to specify passwords used in the inter-server
flooding protocol. They are used in publicly readable
configuration files to specify passwords in private files.
The client identified by a client-ID might be a single
computer with a single IP address, a single but multi-homed
computer, or many computers. Client-IDs are not used to identify
checksum reports, but the organization operating the client. A
client-ID need only be unique among clients using a single
server. A single client can use different client-IDs for different
servers, each client-ID authenticated with a separate password.
An obscure but important part of all of this is that the
inter-server flooding algorithm depends on server-IDs and
timestamps attached to reports of checksums. The inter-server
flooding mechanism requires cooperating DCC servers to maintain
reasonable clocks ticking in UTC. Clients include timestamps in
their requests, but as long as their timestamps are unlikely to
be repeated, they need not be very accurate.
Installation Considerations
DCC clients on a computer share information about which
servers are currently working and their speeds in a shared memory
segment. This segment also contains server host names, IP
addresses, and the passwords needed to authenticate known clients
to servers. That generally requires that dccm(8), dccproc(8),
dccifd(8), and cdcc(8) execute with an UID that can write to the
DCC home directory and its files. The sendmail interface, dccm,
is a daemon that can be started by an "rc" or other script
already running with the correct UID. The other two, dccproc and
cdcc need to be set-UID because they are used by end users. They
relinquish set-UID privileges when not needed.
Files that contain cleartext passwords including the shared
file used by clients must be readable only by "owner."
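Whether a password file meets this requirement can be checked from its mode bits. The sketch below is one possible check for POSIX systems, with a hypothetical function name; it is stricter than the text requires in that it rejects any group or other permission, not only read permission.

```python
import os
import stat

def owner_only(path):
    """Return True if no permission bits are set for group or others."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A file with mode 0600 passes the check, while the common default of 0644 does not.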
The data files required by a DCC can be in a single "home"
directory, often /var/dcc. Distinct DCC servers can run on a
single computer, provided they use distinct UDP port numbers and
home directories. It is possible and convenient for the DCC
clients using a server on the same computer to use the same home
directory as the server.
The DCC source distribution includes sample control files.
They should be modified appropriately and then copied to the DCC
home directory. Files that contain cleartext passwords must not
be publicly readable.
The DCC source includes "feature" m4 files to configure
sendmail to use dccm(8) to check a DCC server about incoming
mail.
See also the INSTALL.txt or INSTALL.html file.
Client Installation
Installing a DCC client starts with obtaining or compiling
program binaries for the client server data control tool,
cdcc(8). Installing the sendmail DCC interface, dccm(8), or
dccproc(8), the general or procmail(1) interface is the main part
of the client installation. Connecting the DCC to sendmail with
dccm is most powerful, but requires administrative control of the
system running sendmail.
As noted above, cdcc and dccproc should be set-UID to a
suitable UID. Root or 0 is thought to be safe for both, because
they are careful to release privileges except when they need them
to read or write files in the DCC home directory. A DCC home
directory should be created, often in /var/dcc. It must be owned
and writable by the UID to which cdcc is set.
After the DCC client programs have been obtained, contact
the operator(s) of the chosen DCC server(s) to obtain each
server's hostname, port number, and a client-ID and corresponding
password. No client-IDs or passwords are needed to use DCC
servers that allow anonymous clients. Use the load or add
commands of cdcc to create a map file in the DCC home directory. It
is usually necessary to create a client whitelist file of the
format described above. To accommodate users sharing a computer
but not ideas about what is solicited bulk mail, the client
whitelist file can be any valid path name and need not be in the
DCC home directory.
If dccm is chosen, arrange to start it with suitable
arguments before sendmail is started. See the homedir/dcc_conf file
and the misc/rcDCC script in the DCC source. The procmail DCCM
interface, dccproc(8), can be run manually or by a procmailrc(5)
rule.
Server Installation
The DCC server, dccd(8), also requires that the DCC home
directory exist. It does not use the client shared or memory
mapped file of server addresses, but it requires other files.
One is the ids file of client-IDs, server-IDs, and corresponding
passwords. Another is a flod file of peers that send and receive
floods of reports of checksums with large counts. Both files are
described in dccd(8).
The server daemon should be started when the system is
rebooted, probably before sendmail. See the misc/rcDCC and
misc/start-dccd files in the DCC source.
The database should be cleaned regularly with dbclean(8)
such as by running the crontab job that is in the misc directory.

SEE ALSO

cdcc(8), dbclean(8), dcc(8), dccd(8), dccifd(8), dccm(8),
dccproc(8), dblist(8), dccsight(8), sendmail(8).

HISTORY

The Distributed Checksum Clearinghouse is based on an idea
of Paul Vixie with code designed and written at Rhyolite Software
starting in 2000. This describes version 1.2.74.
BSD December 8, 2007