+
+@node SpamAssassin
+@subsection SpamAssassin, Vipul's Razor, DCC, etc
+@cindex SpamAssassin
+@cindex Vipul's Razor
+@cindex DCC
+
+The days when the hints in the previous section were sufficient for
+avoiding spam are coming to an end. There are many tools out there
+that claim to reduce the amount of spam you get. This section could
+easily become outdated, as new products replace old ones, but
+fortunately most of these tools seem to have similar interfaces. Even
+though this section uses SpamAssassin as an example, it should be
+easy to adapt it to most other tools.
+
+If the tool you are using is not installed on the mail server, you
+need to invoke it yourself. Ideas on how to use the
+@code{:prescript} and @code{:postscript} mail source parameters (@pxref{Mail Source
+Specifiers}) follow.
+
+@lisp
+(setq mail-sources
+      '((file :prescript "formail -bs spamassassin < /var/mail/%u")
+        (pop :user "jrl"
+             :server "pophost"
+             :postscript "mv %t /tmp/foo; formail -bs spamc < /tmp/foo > %t")))
+@end lisp
+
+Once you have managed to process your incoming spool somehow, so that
+each mail contains, e.g., a header indicating that it is spam, you are ready to
+filter it out. Using normal split methods (@pxref{Splitting Mail}):
+
+@lisp
+(setq nnmail-split-methods '(("spam" "^X-Spam-Flag: YES")
+ ...))
+@end lisp
+
+Or using fancy split methods (@pxref{Fancy Mail Splitting}):
+
+@lisp
+(setq nnmail-split-methods 'nnmail-split-fancy
+ nnmail-split-fancy '(| ("X-Spam-Flag" "YES" "spam")
+ ...))
+@end lisp
+
+Some people might not like the idea of piping the mail through various
+programs using a @code{:prescript} (if some program is buggy, you
+might lose all mail). If you are one of them, another solution is to
+call the external tools during splitting. Example fancy split method:
+
+@lisp
+(setq nnmail-split-fancy '(| (: kevin-spamassassin)
+                             ...))
+(defun kevin-spamassassin ()
+  (save-excursion
+    (let ((buf (or (get-buffer " *nnmail incoming*")
+                   (get-buffer " *nnml move*"))))
+      (if (not buf)
+          (progn (message "Oops, cannot find message buffer") nil)
+        (set-buffer buf)
+        (if (eq 1 (call-process-region (point-min) (point-max)
+                                       "spamc" nil nil nil "-c"))
+            "spam")))))
+@end lisp
+
+That is about it. As some spam is likely to get through anyway, you
+might want to have a nifty function to call when you happen to read
+spam. And here is the nifty function:
+
+@lisp
+ (defun my-gnus-raze-spam ()
+ "Submit SPAM to Vipul's Razor, then mark it as expirable."
+ (interactive)
+ (gnus-summary-show-raw-article)
+ (gnus-summary-save-in-pipe "razor-report -f -d")
+ (gnus-summary-mark-as-expirable 1))
+@end lisp
+
+@node Hashcash
+@subsection Hashcash
+@cindex hashcash
+
+A novel technique to fight spam is to require senders to do something
+costly for each message they send. This has the obvious drawback that
+you cannot rely on everyone in the world using this technique,
+since it is not part of the Internet standards, but it may be useful
+in smaller communities.
+
+While the tools in the previous section work well in practice, they
+work only because the tools are constantly maintained and updated as
+new forms of spam appear. This means that a small percentage of spam
+will always get through. It also means that somewhere, someone needs
+to read lots of spam to update these tools. Hashcash avoids that, but
+instead requires that everyone you communicate with supports the
+scheme. You can view the two approaches as pragmatic vs dogmatic.
+The approaches have their own advantages and disadvantages, but as
+often in the real world, a combination of them is stronger than either
+one of them separately.
+
+@cindex X-Hashcash
+The ``something costly'' is to burn CPU time, more specifically to
+compute a hash collision up to a certain number of bits. The
+resulting hashcash cookie is inserted in an @samp{X-Hashcash:}
+header. For more details, and for the external application
+@code{hashcash} you need to install to use this feature, see
+@uref{http://www.cypherspace.org/~adam/hashcash/}. Even more
+information can be found at @uref{http://www.camram.org/}.
+
+If you wish to call hashcash for each message you send, say something
+like:
+
+@lisp
+(require 'hashcash)
+(add-hook 'message-send-hook 'mail-add-payment)
+@end lisp
+
+The @code{hashcash.el} library can be found at
+@uref{http://users.actrix.gen.nz/mycroft/hashcash.el}, or in the Gnus
+development contrib directory.
+
+You will need to set up some additional variables as well:
+
+@table @code
+
+@item hashcash-default-payment
+@vindex hashcash-default-payment
+This variable indicates the default number of bits the hash collision
+should consist of. By default this is 0, meaning nothing will be
+done. Suggested useful values include 17 to 29.
+
+@item hashcash-payment-alist
+@vindex hashcash-payment-alist
+Some receivers may require you to burn more CPU time than the
+default. This variable contains a list of @samp{(ADDR AMOUNT)} cells,
+where ADDR is the receiver (email address or newsgroup) and AMOUNT is
+the number of bits in the collision that is needed. It can also
+contain @samp{(ADDR STRING AMOUNT)} cells, where the STRING is the
+string to use (normally the email address or newsgroup name is used).
+
+@item hashcash
+@vindex hashcash
+Where the @code{hashcash} binary is installed.
+
+@end table
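+
+As a rough illustration of the variables above, a setup might look like
+the following sketch. The address, the newsgroup name, the string
+@samp{gnus-list} and the bit counts are placeholders chosen for this
+example, not recommended values:
+
+@lisp
+;; Hypothetical values; adjust recipients and bit counts to taste.
+(setq hashcash-default-payment 20
+      hashcash-payment-alist
+      '(("friend@@example.org" 24)              ; (ADDR AMOUNT)
+        ("gnu.emacs.gnus" "gnus-list" 22)))     ; (ADDR STRING AMOUNT)
+@end lisp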
+
+Currently there is no built-in functionality in Gnus to verify
+hashcash cookies; it is expected that this is performed by your
+hand-customized mail filtering scripts. Improvements in this area would be
+a useful contribution, however.
+
+@node Filtering Spam Using spam.el
+@subsection Filtering Spam Using spam.el
+@cindex spam filtering
+@cindex spam.el
+
+The idea behind @code{spam.el} is to have a control center for spam detection
+and filtering in Gnus. To that end, @code{spam.el} does two things: it
+filters incoming mail, and it analyzes mail known to be spam.
+
+So, what happens when you load @code{spam.el}? First of all, you get
+the following keyboard commands:
+
+@table @kbd
+
+@item M-d
+@itemx M s x
+@itemx S x
+@kindex M-d
+@kindex S x
+@kindex M s x
+@findex gnus-summary-mark-as-spam
+@code{gnus-summary-mark-as-spam}.
+
+Mark current article as spam, showing it with the @samp{H} mark.
+Whenever you see a spam article, make sure to mark its summary line
+with @kbd{M-d} before leaving the group.
+
+@item M s t
+@itemx S t
+@kindex M s t
+@kindex S t
+@findex spam-bogofilter-score
+@code{spam-bogofilter-score}.
+
+You must have bogofilter processing enabled for that command to work
+properly.
+
+@xref{Bogofilter}.
+
+@end table
+
+Gnus can learn from the spam you get. All you have to do is collect
+your spam in one or more spam groups, and set the variable
+@code{spam-junk-mailgroups} as appropriate. In these groups, all messages
+are considered to be spam by default: they get the @samp{H} mark. You must
+review these messages from time to time and remove the @samp{H} mark from
+every message that is not spam after all. To remove
+the @samp{H} mark, you can use @kbd{M-u} to ``unread'' the article, or @kbd{d} to
+declare it read the non-spam way. When you leave a spam group, all
+articles that still bear the @samp{H} mark, saved or unsaved, are passed on to
+the spam-detection engine, that is, Bogofilter or ifile
+(depending on @code{spam-use-bogofilter} and @code{spam-use-ifile}), which will study
+them as spam samples.
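+
+A minimal sketch of such a setup follows. It assumes that
+@code{spam-junk-mailgroups} holds a list of group names; the group
+names themselves are only examples:
+
+@lisp
+;; Assumption: a list of group-name strings; the names are examples.
+(setq spam-junk-mailgroups '("spam" "mail.junk")
+      spam-use-bogofilter t)
+@end lisp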
+
+Messages may also be deleted in various other ways, and unless
+@code{spam-ham-marks-form} gets overridden below, marks @samp{R} and @samp{r} for
+default read or explicit delete, marks @samp{X} and @samp{K} for automatic or
+explicit kills, as well as mark @samp{Y} for low scores, are all considered
+to be associated with articles which are not spam. This assumption
+might be false, in particular if you use kill files or score files as
+a means of detecting genuine spam; in that case you should adjust
+@code{spam-ham-marks-form}. When you leave a group, all @emph{unsaved} articles
+bearing any of the above marks are sent to Bogofilter or ifile, which
+will study these as not-spam samples. If you explicitly kill a lot, you
+might sometimes end up with articles marked @samp{K} which you never saw,
+and which might accidentally contain spam. It is best to make sure that
+real spam is marked with @samp{H}, and nothing else.
+
+All other marks do not contribute to Bogofilter or ifile
+pre-conditioning. In particular, ticked, dormant or souped articles
+are likely to contribute later, when they will get deleted for real,
+so there is no need to use them prematurely. Explicitly expired
+articles do not contribute; the command @kbd{E} is a way to get rid of an
+article without Bogofilter or ifile ever seeing it.
+
+@strong{TODO: @code{spam-use-ifile} does not process spam articles on group exit.
+I'm waiting for info from the author of @code{ifile-gnus.el}, because I think
+that functionality should go in @code{ifile-gnus.el} rather than @code{spam.el}.}
+
+To use the @code{spam.el} facilities for incoming mail filtering, you
+must add the following to your fancy split list
+@code{nnmail-split-fancy} or @code{nnimap-split-fancy}:
+
+@example
+(: spam-split)
+@end example
+
+Note that the fancy split may be called @code{nnmail-split-fancy} or
+@code{nnimap-split-fancy}, depending on whether you use the nnmail or
+nnimap back ends to retrieve your mail.
+
+The @code{spam-split} function will process incoming mail and send the mail
+considered to be spam into the group name given by the variable
+@code{spam-split-group}. Usually that group name is @samp{spam}.
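+
+Putting these pieces together, a minimal configuration for the nnmail
+back ends could look like the following sketch. The catch-all group
+@samp{mail.misc} is an assumption made for this example:
+
+@lisp
+(require 'spam)
+(setq spam-split-group "spam")
+(setq nnmail-split-methods 'nnmail-split-fancy
+      nnmail-split-fancy '(| (: spam-split)
+                             ;; everything not classified as spam
+                             "mail.misc"))
+@end lisp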
+
+The following are the methods you can use to control the behavior of
+@code{spam-split}:
+
+@menu
+* Blacklists and Whitelists::
+* BBDB Whitelists::
+* Blackholes::
+* Bogofilter::
+* Ifile spam filtering::
+* Extending spam.el::
+@end menu
+
+@node Blacklists and Whitelists
+@subsubsection Blacklists and Whitelists
+@cindex spam filtering
+@cindex whitelists, spam filtering
+@cindex blacklists, spam filtering
+@cindex spam.el
+
+@defvar spam-use-blacklist
+Set this variable to @code{t} (the default) if you want to use blacklists.
+@end defvar
+
+@defvar spam-use-whitelist
+Set this variable to @code{t} if you want to use whitelists.
+@end defvar
+
+Blacklists are lists of regular expressions matching addresses you
+consider to be spam senders. For instance, to block mail from any
+sender at @samp{vmadmin.com}, you can put @samp{vmadmin.com} in your
+blacklist. Since you start out with an empty blacklist, no harm is
+done by having the @code{spam-use-blacklist} variable set, so it is
+set by default. Blacklist entries use the Emacs regular expression
+syntax.
+
+Conversely, whitelists tell Gnus what addresses are considered
+legitimate. All non-whitelisted addresses are considered spammers.
+This option is probably not useful for most Gnus users unless the
+whitelist is very comprehensive. Also see @ref{BBDB Whitelists}.
+Whitelist entries use the Emacs regular expression syntax.
+
+The blacklist and whitelist location can be customized with the
+@code{spam-directory} variable (@file{~/News/spam} by default). The whitelist
+and blacklist files will be in that directory, named @file{whitelist} and
+@file{blacklist} respectively.
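+
+For example, the following sketch enables the blacklist only. The
+whitelist is left off here because turning it on classifies every
+unlisted sender as a spammer:
+
+@lisp
+(setq spam-use-blacklist t      ; the default
+      spam-use-whitelist nil)   ; t rejects all unlisted senders
+@end lisp
+
+Assuming that the file holds one regular expression per line (an
+assumption of this example), @file{~/News/spam/blacklist} could then
+contain an entry such as the one below. The dot is escaped because an
+unescaped dot matches any character in a regular expression:
+
+@example
+vmadmin\.com
+@end example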
+
+@node BBDB Whitelists
+@subsubsection BBDB Whitelists
+@cindex spam filtering
+@cindex BBDB whitelists, spam filtering
+@cindex BBDB, spam filtering
+@cindex spam.el
+
+@defvar spam-use-bbdb
+
+Analogous to @code{spam-use-whitelist} (@pxref{Blacklists and
+Whitelists}), but uses the BBDB as the source of whitelisted addresses,
+without regular expressions. You must have the BBDB loaded for
+@code{spam-use-bbdb} to work properly. Only addresses in the BBDB
+will be allowed through; all others will be classified as spam.
+
+@end defvar
+
+@node Blackholes
+@subsubsection Blackholes
+@cindex spam filtering
+@cindex blackholes, spam filtering
+@cindex spam.el
+
+@defvar spam-use-blackholes
+
+This option is disabled by default. You can let Gnus consult the
+blackhole-type distributed spam processing systems (DCC, for instance)
+when you set this option. The variable @code{spam-blackhole-servers}
+holds the list of blackhole servers Gnus will consult. The current
+list is fairly comprehensive, but make sure to let us know if it
+contains outdated servers.
+
+The blackhole check uses the @code{dig.el} package, but you can tell
+@code{spam.el} to use @code{dns.el} instead for better performance if
+you set @code{spam-use-dig} to @code{nil}. Doing so is not recommended
+at this time, despite the possible
+performance improvements, because some users may be unable to use it,
+but you can try it and see if it works for you.
+
+@end defvar
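+
+A sketch of enabling the check with an explicit server list follows.
+The server named here is only an illustration; the default value of
+@code{spam-blackhole-servers} may already be what you want:
+
+@lisp
+(setq spam-use-blackholes t
+      ;; Illustrative entry only; replace with servers you trust.
+      spam-blackhole-servers '("bl.spamcop.net"))
+@end lisp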
+
+@node Bogofilter
+@subsubsection Bogofilter
+@cindex spam filtering
+@cindex bogofilter, spam filtering
+@cindex spam.el
+
+@defvar spam-use-bogofilter
+
+Set this variable if you want to use Eric Raymond's speedy Bogofilter.
+This has been tested with a locally patched copy of version 0.4. Make
+sure to read the installation comments in @code{spam.el}.
+
+With a minimum of care in associating the @samp{H} mark with spam
+articles only, Bogofilter training becomes fairly automatic. You
+should do this until you get a few hundred articles in each
+category, spam or not. The shell command @command{head -1
+~/.bogofilter/*} shows both article counts. The command @kbd{S t} in
+summary mode, either for debugging or for curiosity, triggers
+Bogofilter into displaying in another buffer the @emph{spamicity}
+score of the current article (between 0.0 and 1.0), together with the
+article words which most significantly contribute to the score.
+
+@end defvar
+
+@node Ifile spam filtering
+@subsubsection Ifile spam filtering
+@cindex spam filtering
+@cindex ifile, spam filtering
+@cindex spam.el
+
+@defvar spam-use-ifile
+
+Enable this variable if you want to use Ifile, a statistical analyzer
+similar to Bogofilter. Currently you must have @code{ifile-gnus.el}
+loaded. The integration of Ifile with @code{spam.el} is not finished
+yet, but you can use @code{ifile-gnus.el} on its own if you like.
+
+@end defvar
+
+@node Extending spam.el
+@subsubsection Extending spam.el
+@cindex spam filtering
+@cindex spam.el, extending
+@cindex extending spam.el
+
+Say you want to add a new back end called blackbox. Provide the following:
+
+@enumerate
+@item
+documentation
+
+@item
+code
+
+@example
+(defvar spam-use-blackbox nil
+ "True if blackbox should be used.")
+@end example
+
+Add
+@example
+ (spam-use-blackbox . spam-check-blackbox)
+@end example
+to @code{spam-list-of-checks}.
+
+@item
+functionality
+
+Write the @code{spam-check-blackbox} function. It should return
+@code{nil} or @code{spam-split-group}. See the existing
+@code{spam-check-*} functions for examples of what you can do.
+@end enumerate
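+
+The three steps can be sketched together as follows. The header
+@samp{X-Blackbox-Flag} is made up for this example, and the sketch
+assumes that the check is called with the current buffer narrowed to
+the headers of the incoming message:
+
+@lisp
+(require 'message)
+(require 'spam)
+
+(defvar spam-use-blackbox nil
+  "True if blackbox should be used.")
+
+(defun spam-check-blackbox ()
+  "Return `spam-split-group' if the hypothetical blackbox header says spam."
+  ;; Assumption: buffer is narrowed to the message headers.
+  (let ((flag (message-fetch-field "X-Blackbox-Flag")))
+    (when (and flag (string-match "YES" flag))
+      spam-split-group)))
+
+(add-to-list 'spam-list-of-checks
+             '(spam-use-blackbox . spam-check-blackbox))
+@end lisp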
+
+@node Filtering Spam Using Statistics (spam-stat.el)
+@subsection Filtering Spam Using Statistics (spam-stat.el)
+@cindex Paul Graham
+@cindex Graham, Paul
+@cindex naive Bayesian spam filtering
+@cindex Bayesian spam filtering, naive
+@cindex spam filtering, naive Bayesian
+
+Paul Graham has written an excellent essay about spam filtering using
+statistics: @uref{http://www.paulgraham.com/spam.html,A Plan for
+Spam}. In it he describes the inherent deficiency of rule-based
+filtering as used by SpamAssassin, for example: Somebody has to write
+the rules, and everybody else has to install these rules. You are
+always late. It would be much better, he argues, to filter mail based
+on whether it somehow resembles spam or non-spam. One way to measure
+this is word distribution. He then goes on to describe a solution
+that checks whether a new mail resembles any of your other spam mails
+or not.
+
+The basic idea is this: Create two collections of your mail, one
+with spam, one with non-spam. Count how often each word appears in
+either collection, weight this by the total number of mails in the
+collections, and store this information in a dictionary. For every
+word in a new mail, determine its probability of belonging to a spam or a
+non-spam mail. Using the 15 most conspicuous words, compute the total
+probability of the mail being spam. If this probability is higher
+than a certain threshold, the mail is considered to be spam.
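+
+The last step can be sketched in Emacs Lisp. The formula is the one
+given in Graham's essay: the product of the word probabilities,
+divided by that product plus the product of their complements. The
+function name is made up for this example, and this is not
+necessarily how @code{spam-stat.el} computes its score:
+
+@lisp
+(defun my-combine-spam-probabilities (probs)
+  "Combine PROBS, a list of per-word spam probabilities (floats)."
+  (let ((spam (apply #'* probs))
+        (ham (apply #'* (mapcar (lambda (p) (- 1.0 p)) probs))))
+    (/ spam (+ spam ham))))
+
+;; Two words, each 90% likely to indicate spam:
+(my-combine-spam-probabilities '(0.9 0.9))   ; roughly 0.988
+@end lisp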
+
+Gnus supports this kind of filtering. But it needs some setting up.
+First, you need two collections of your mail, one with spam, one with
+non-spam. Then you need to create a dictionary using these two
+collections, and save it. And last but not least, you need to use
+this dictionary in your fancy mail splitting rules.
+
+@menu
+* Creating a spam-stat dictionary::
+* Splitting mail using spam-stat::
+* Low-level interface to the spam-stat dictionary::
+@end menu
+
+@node Creating a spam-stat dictionary
+@subsubsection Creating a spam-stat dictionary
+
+Before you can begin to filter spam based on statistics, you must
+create these statistics based on two mail collections, one with spam,
+one with non-spam. These statistics are then stored in a dictionary
+for later use. In order for these statistics to be meaningful, you
+need several hundred emails in both collections.
+
+Gnus currently supports only the nnml back end for automated dictionary
+creation. The nnml back end stores all mails in a directory, one file
+per mail. Use the following:
+
+@defun spam-stat-process-spam-directory
+Create spam statistics for every file in this directory. Every file
+is treated as one spam mail.
+@end defun
+
+@defun spam-stat-process-non-spam-directory
+Create non-spam statistics for every file in this directory. Every
+file is treated as one non-spam mail.
+@end defun
+
+Usually you would call @code{spam-stat-process-spam-directory} on a
+directory such as @file{~/Mail/mail/spam} (this usually corresponds
+to the group @samp{nnml:mail.spam}), and you would call
+@code{spam-stat-process-non-spam-directory} on a directory such as
+@file{~/Mail/mail/misc} (this usually corresponds to the group
+@samp{nnml:mail.misc}).
+
+@defvar spam-stat
+This variable holds the hash-table with all the statistics -- the
+dictionary we have been talking about. For every word in either
+collection, this hash-table stores a vector describing how often the
+word appeared in spam and how often it appeared in non-spam mails.
+@end defvar
+
+If you want to regenerate the statistics from scratch, you need to
+reset the dictionary.
+
+@defun spam-stat-reset
+Reset the @code{spam-stat} hash-table, deleting all the statistics.
+@end defun
+
+When you are done, you must save the dictionary. The dictionary may
+be rather large. If you will not update the dictionary incrementally
+(instead, you will recreate it once a month, for example), then you
+can reduce the size of the dictionary by deleting all words that did
+not appear often enough or that do not clearly belong to only spam or
+only non-spam mails.
+
+@defun spam-stat-reduce-size
+Reduce the size of the dictionary. Use this only if you do not want
+to update the dictionary incrementally.
+@end defun
+
+@defun spam-stat-save
+Save the dictionary.
+@end defun
+
+@defvar spam-stat-file
+The filename used to store the dictionary. This defaults to
+@file{~/.spam-stat.el}.
+@end defvar
+
+@node Splitting mail using spam-stat
+@subsubsection Splitting mail using spam-stat
+
+In order to use @code{spam-stat} to split your mail, you need to add the
+following to your @file{~/.gnus} file:
+
+@example
+(require 'spam-stat)
+(spam-stat-load)
+@end example
+
+This will load the necessary Gnus code, and the dictionary you
+created.
+
+Next, you need to adapt your fancy splitting rules: You need to
+determine how to use @code{spam-stat}. In the simplest case, you only have
+two groups, @samp{mail.misc} and @samp{mail.spam}. The following expression says
+that mail is either spam or it should go into @samp{mail.misc}. If it is
+spam, then @code{spam-stat-split-fancy} will return @samp{mail.spam}.
+
+@example
+(setq nnmail-split-fancy
+ `(| (: spam-stat-split-fancy)
+ "mail.misc"))
+@end example
+
+@defvar spam-stat-split-fancy-spam-group
+The group to use for spam. Default is @samp{mail.spam}.
+@end defvar
+
+If you also filter mail with specific subjects into other groups, use
+the following expression. Only the mails not matching the regular
+expression are considered potential spam.
+
+@example
+(setq nnmail-split-fancy
+ `(| ("Subject" "\\bspam-stat\\b" "mail.emacs")
+ (: spam-stat-split-fancy)
+ "mail.misc"))
+@end example
+
+If you want to filter for spam first, then you must be careful when
+creating the dictionary. Note that @code{spam-stat-split-fancy} must
+consider both mails in @samp{mail.emacs} and in @samp{mail.misc} as
+non-spam, therefore both should be in your collection of non-spam
+mails, when creating the dictionary!
+
+@example
+(setq nnmail-split-fancy
+ `(| (: spam-stat-split-fancy)
+ ("Subject" "\\bspam-stat\\b" "mail.emacs")
+ "mail.misc"))
+@end example
+
+You can combine this with traditional filtering. Here, we move all
+HTML-only mails into the @samp{mail.spam.filtered} group. Note that since
+@code{spam-stat-split-fancy} will never see them, the mails in
+@samp{mail.spam.filtered} should be neither in your collection of spam mails,
+nor in your collection of non-spam mails, when creating the
+dictionary!
+
+@example
+(setq nnmail-split-fancy
+ `(| ("Content-Type" "text/html" "mail.spam.filtered")
+ (: spam-stat-split-fancy)
+ ("Subject" "\\bspam-stat\\b" "mail.emacs")
+ "mail.misc"))
+@end example
+
+
+@node Low-level interface to the spam-stat dictionary
+@subsubsection Low-level interface to the spam-stat dictionary
+
+The main interface to @code{spam-stat} consists of the following functions:
+
+@defun spam-stat-buffer-is-spam
+Called in a buffer, this treats that buffer as a new spam mail.
+Use this for new mail that has not been processed before.
+
+@end defun
+
+@defun spam-stat-buffer-is-no-spam
+Called in a buffer, this treats that buffer as a new non-spam
+mail. Use this for new mail that has not been processed before.
+
+@end defun
+
+@defun spam-stat-buffer-change-to-spam
+Called in a buffer, this treats that buffer as spam rather than
+normal mail. Use this to change the status of a mail that has
+already been processed as non-spam.
+
+@end defun
+
+@defun spam-stat-buffer-change-to-non-spam
+Called in a buffer, this treats that buffer as normal mail rather
+than spam. Use this to change the status of a mail that has already
+been processed as spam.
+
+@end defun
+
+@defun spam-stat-save
+Save the hash table to the file. The filename used is stored in the
+variable @code{spam-stat-file}.
+
+@end defun
+
+@defun spam-stat-load
+Load the hash table from a file. The filename used is stored in the
+variable @code{spam-stat-file}.
+
+@end defun
+
+@defun spam-stat-score-word
+Return the spam score for a word.
+
+@end defun
+
+@defun spam-stat-score-buffer
+Return the spam score for a buffer.
+
+@end defun
+
+@defun spam-stat-split-fancy
+Use this for fancy mail splitting. Add the rule @samp{(: spam-stat-split-fancy)} to
+@code{nnmail-split-fancy}.
+
+This requires the following in your @file{~/.gnus} file:
+
+@example
+(require 'spam-stat)
+(spam-stat-load)
+@end example
+
+@end defun
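+
+As a small illustration of the scoring functions, the following
+command shows the score of the current buffer in the echo area. The
+command name is hypothetical, and the sketch assumes that the
+dictionary has already been loaded with @code{spam-stat-load}:
+
+@lisp
+(require 'spam-stat)
+
+(defun my-spam-stat-show-score ()
+  "Display the spam-stat score of the current buffer."
+  (interactive)
+  (message "spam-stat score: %s" (spam-stat-score-buffer)))
+@end lisp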
+
+A typical test will involve calls to the following functions:
+
+@example
+Reset: (setq spam-stat (make-hash-table :test 'equal))
+Learn spam: (spam-stat-process-spam-directory "~/Mail/mail/spam")
+Learn non-spam: (spam-stat-process-non-spam-directory "~/Mail/mail/misc")
+Save table: (spam-stat-save)
+File size: (nth 7 (file-attributes spam-stat-file))
+Number of words: (hash-table-count spam-stat)
+Test spam: (spam-stat-test-directory "~/Mail/mail/spam")
+Test non-spam: (spam-stat-test-directory "~/Mail/mail/misc")
+Reduce table size: (spam-stat-reduce-size)
+Save table: (spam-stat-save)
+File size: (nth 7 (file-attributes spam-stat-file))
+Number of words: (hash-table-count spam-stat)
+Test spam: (spam-stat-test-directory "~/Mail/mail/spam")
+Test non-spam: (spam-stat-test-directory "~/Mail/mail/misc")
+@end example
+
+Here is how you would create your dictionary:
+
+@example
+Reset: (setq spam-stat (make-hash-table :test 'equal))
+Learn spam: (spam-stat-process-spam-directory "~/Mail/mail/spam")
+Learn non-spam: (spam-stat-process-non-spam-directory "~/Mail/mail/misc")
+Repeat for any other non-spam group you need...
+Reduce table size: (spam-stat-reduce-size)
+Save table: (spam-stat-save)
+@end example
+