The idea behind @code{spam.el} is to have a control center for spam detection
and filtering in Gnus. To that end, @code{spam.el} does two things: it
-filters incoming mail, and it analyzes mail known to be spam.
+filters incoming mail, and it analyzes mail known to be spam or ham.
+@emph{Ham} is the name used throughout @code{spam.el} to indicate
+non-spam messages.
So, what happens when you load @code{spam.el}? First of all, you get
the following keyboard commands:
Mark current article as spam, showing it with the @samp{H} mark.
Whenever you see a spam article, make sure to mark its summary line
-with @kbd{M-d} before leaving the group.
+with @kbd{M-d} before leaving the group. This is done automatically
+for unread articles in @emph{spam} groups.
@item M s t
@itemx S t
@end table
-Gnus can learn from the spam you get. All you have to do is collect
-your spam in one or more spam groups, and set the variable
-@code{spam-junk-mailgroups} as appropriate. In these groups, all messages
-are considered to be spam by default: they get the @samp{H} mark. You must
-review these messages from time to time and remove the @samp{H} mark for
-every message that is not spam after all. When you leave a spam
-group, all messages that continue with the @samp{H} mark, are passed on to
-the spam-detection engine (bogofilter, ifile, and others). To remove
-the @samp{H} mark, you can use @kbd{M-u} to "unread" the article, or @kbd{d} for
-declaring it read the non-spam way. When you leave a group, all @samp{H}
-marked articles, saved or unsaved, are sent to Bogofilter or ifile
-(depending on @code{spam-use-bogofilter} and @code{spam-use-ifile}), which will study
-them as spam samples.
+Also, when you load @code{spam.el}, you will be able to customize its
+variables. Try @code{customize-group} on the @samp{spam} variable
+group.
+
+The concepts of ham processors and spam processors are very important.
+Ham processors and spam processors for a group can be set with the
+@code{spam-process} group parameter, or the
+@code{gnus-spam-process-newsgroups} variable. Ham processors take
+mail known to be non-spam (@emph{ham}) and process it in some way so
+that later similar mail will also be considered non-spam. Spam
+processors take mail known to be spam and process it so similar spam
+will be detected later.
+
+Gnus learns from the spam you get. You have to collect your spam in
+one or more spam groups, and set or customize the variable
+@code{spam-junk-mailgroups} as appropriate. You can also declare
+groups to contain spam by setting their group parameter
+@code{spam-contents} to @code{gnus-group-spam-classification-spam}, or
+by customizing the corresponding variable
+@code{gnus-spam-newsgroup-contents}. The @code{spam-contents} group
+parameter and the @code{gnus-spam-newsgroup-contents} variable can
+also be used to declare groups as @emph{ham} groups if you set their
+classification to @code{gnus-group-spam-classification-ham}. If
+groups are not classified by means of @code{spam-junk-mailgroups},
+@code{spam-contents}, or @code{gnus-spam-newsgroup-contents}, they are
+considered @emph{unclassified}. All groups are unclassified by
+default.
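+
+As a minimal sketch, assuming your spam is collected in groups named
+@samp{spam} and @samp{mail.junk} (both names are placeholders for
+whatever groups you actually use), the following in your
+@file{~/.gnus} file would declare them as spam groups:
+
+@example
+;; Group names are illustrative; list the groups that hold your spam.
+(setq spam-junk-mailgroups '("spam" "mail.junk"))
+@end example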
+
+In spam groups, all messages are considered to be spam by default:
+they get the @samp{H} mark when you enter the group. You must review
+these messages from time to time and remove the @samp{H} mark for
+every message that is not spam after all. To remove the @samp{H}
+mark, you can use @kbd{M-u} to "unread" the article, or @kbd{d} for
+declaring it read the non-spam way. When you leave a group, all
+spam-marked (@samp{H}) articles are sent to a spam processor which
+will study them as spam samples.
Messages may also be deleted in various other ways, and unless
-@code{spam-ham-marks-form} gets overridden below, marks @samp{R} and @samp{r} for
-default read or explicit delete, marks @samp{X} and @samp{K} for automatic or
-explicit kills, as well as mark @samp{Y} for low scores, are all considered
-to be associated with articles which are not spam. This assumption
-might be false, in particular if you use kill files or score files as
-means for detecting genuine spam, you should then adjust
-@code{spam-ham-marks-form}. When you leave a group, all _unsaved_ articles
-bearing any the above marks are sent to Bogofilter or ifile, which
-will study these as not-spam samples. If you explicit kill a lot, you
-might sometimes end up with articles marked @samp{K} which you never saw,
-and which might accidentally contain spam. Best is to make sure that
-real spam is marked with @samp{H}, and nothing else.
-
-All other marks do not contribute to Bogofilter or ifile
-pre-conditioning. In particular, ticked, dormant or souped articles
-are likely to contribute later, when they will get deleted for real,
-so there is no need to use them prematurely. Explicitly expired
-articles do not contribute, command @kbd{E} is a way to get rid of an
-article without Bogofilter or ifile ever seeing it.
-
-@strong{TODO: @code{spam-use-ifile} does not process spam articles on group exit.
-I'm waiting for info from the author of @code{ifile-gnus.el}, because I think
-that functionality should go in @code{ifile-gnus.el} rather than @code{spam.el}.}
+@code{spam-ham-marks} gets overridden below, marks @samp{R} and
+@samp{r} for default read or explicit delete, marks @samp{X} and
+@samp{K} for automatic or explicit kills, as well as mark @samp{Y} for
+low scores, are all considered to be associated with articles which
+are not spam. This assumption might be false, in particular if you
+use kill files or score files as a means of detecting genuine spam;
+in that case, you should adjust the @code{spam-ham-marks} variable.
+
+@defvar spam-ham-marks
+You can customize this variable to be the list of marks you want to
+consider ham. By default, the list contains the deleted, read,
+killed, kill-filed, and low-score marks.
+@end defvar
+
+@defvar spam-spam-marks
+You can customize this variable to be the list of marks you want to
+consider spam. By default, the list contains only the spam mark.
+@end defvar
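+
+For example, if you rely on kill files and score files to catch spam,
+you may not want killed or low-scored articles treated as ham. The
+following sketch assumes that the list elements are the symbols naming
+the Gnus mark variables (check the customization buffer for
+@code{spam-ham-marks} to confirm the expected format):
+
+@example
+;; Assumption: marks are given as the names of the mark variables.
+;; Only explicitly deleted and read articles count as ham here.
+(setq spam-ham-marks '(gnus-del-mark gnus-read-mark))
+@end example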
+
+When you leave @emph{any} group, regardless of its
+@code{spam-contents} classification, all spam-marked articles are sent
+to a spam processor, which will study these as spam samples. If you
+explicitly kill a lot of articles, you might sometimes end up with
+articles marked @samp{K} which you never saw, and which might
+accidentally contain spam. It is best to make sure that real spam is
+marked with @samp{H}, and nothing else.
+
+When you leave a @emph{spam} group, all spam-marked articles are
+marked as expired after processing with the spam processor. This is
+not done for @emph{unclassified} or @emph{ham} groups.
+
+When you leave a @emph{ham} group, all ham-marked articles are sent to
+a ham processor, which will study these as non-spam samples.
+
+@strong{TODO: The @code{ifile} spam processor does not work at this
+time. I'm waiting for info from the author of @code{ifile-gnus.el},
+because I think that functionality should go in @code{ifile-gnus.el}
+rather than @code{spam.el}. You can still use @code{spam-use-ifile}
+to tell @code{spam-split} you want to use ifile for splitting incoming
+mail.}
To use the @code{spam.el} facilities for incoming mail filtering, you
must add the following to your fancy split list
@code{nnmail-split-fancy} or
@code{nnimap-split-fancy}, depending on whether you use the nnmail or
nnimap back ends to retrieve your mail.
-The @code{spam-split} function will process incoming mail and send the mail
-considered to be spam into the group name given by the variable
-@code{spam-split-group}. Usually that group name is @samp{spam}.
+The @code{spam-split} function will process incoming mail and send the
+mail considered to be spam into the group name given by the variable
+@code{spam-split-group}. By default that group name is @samp{spam},
+but you can customize it.
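+
+A minimal sketch of such a setup for the nnmail back end, assuming
+that @code{nnmail-split-methods} is already set to
+@code{nnmail-split-fancy}. The group names @samp{mail.junk} and
+@samp{mail.misc} are placeholders, not defaults:
+
+@example
+;; Send detected spam to "mail.junk" instead of the default "spam".
+(setq spam-split-group "mail.junk")
+
+;; Try spam-split first; mail it does not claim falls through.
+(setq nnmail-split-fancy
+      '(| (: spam-split)
+          ;; your other split rules go here
+          "mail.misc"))
+@end example
+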
The following are the methods you can use to control the behavior of
-@code{spam-split}:
+@code{spam-split} and their corresponding spam and ham processors:
@menu
* Blacklists and Whitelists::
@cindex spam.el
@defvar spam-use-blacklist
-Set this variables to t (the default) if you want to use blacklists.
+Set this variable to @code{t} if you want to use blacklists when splitting
+incoming mail. Messages whose senders are in the blacklist will be
+sent to the @code{spam-split-group}. This is an explicit filter,
+meaning that it acts only on mail senders @emph{declared} to be
+spammers.
@end defvar
@defvar spam-use-whitelist
-Set this variables to t if you want to use whitelists.
+Set this variable to @code{t} if you want to use whitelists when splitting
+incoming mail. Messages whose senders are not in the whitelist will
+be sent to the @code{spam-split-group}. This is an implicit filter,
+meaning it believes everyone to be a spammer unless told otherwise.
+Use with care.
+@end defvar
+
+@defvar gnus-group-spam-exit-processor-blacklist
+Add this symbol to a group's @code{spam-process} parameter by
+customizing the group parameters or the
+@code{gnus-spam-process-newsgroups} variable. When this symbol is
+added to a group's @code{spam-process} parameter, the senders of
+spam-marked articles will be added to the blacklist.
+@end defvar
+
+@defvar gnus-group-ham-exit-processor-whitelist
+Add this symbol to a group's @code{spam-process} parameter by
+customizing the group parameters or the
+@code{gnus-spam-process-newsgroups} variable. When this symbol is
+added to a group's @code{spam-process} parameter, the senders of
+ham-marked articles in @emph{ham} groups will be added to the
+whitelist. Note that this ham processor has no effect in @emph{spam}
+or @emph{unclassified} groups.
@end defvar
Blacklists are lists of regular expressions matching addresses you
consider to be spam senders. For instance, to block mail from any
sender at @samp{vmadmin.com}, you can put @samp{vmadmin.com} in your
-blacklist. Since you start out with an empty blacklist, no harm is
-done by having the @code{spam-use-blacklist} variable set, so it is
-set by default. Blacklist entries use the Emacs regular expression
-syntax.
+blacklist. You start out with an empty blacklist. Blacklist entries
+use the Emacs regular expression syntax.
Conversely, whitelists tell Gnus what addresses are considered
legitimate. All non-whitelisted addresses are considered spammers.
This option is probably not useful for most Gnus users unless the
-whitelists is very comprehensive. Also see @ref{BBDB Whitelists}.
-Whitelist entries use the Emacs regular expression syntax.
+whitelist is very comprehensive or permissive. Also see @ref{BBDB
+Whitelists}. Whitelist entries use the Emacs regular expression
+syntax.
-The Blacklist and whitelist location can be customized with the
-@code{spam-directory} variable (@file{~/News/spam} by default). The whitelist
-and blacklist files will be in that directory, named @file{whitelist} and
+The blacklist and whitelist file locations can be customized with the
+@code{spam-directory} variable (@file{~/News/spam} by default), or
+the @code{spam-whitelist} and @code{spam-blacklist} variables
+directly. The whitelist and blacklist files will by default be in the
+@code{spam-directory} directory, named @file{whitelist} and
@file{blacklist} respectively.
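+
+As a sketch, the following enables blacklist checking and points the
+two list files at explicit locations. The paths shown are
+hypothetical; substitute your own:
+
+@example
+;; File locations are illustrative only.
+(setq spam-use-blacklist t
+      spam-blacklist "~/Mail/spam/blacklist"
+      spam-whitelist "~/Mail/spam/whitelist")
+@end example
+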
@node BBDB Whitelists
@cindex BBDB, spam filtering
@cindex spam.el
-@defvar spam-use-bbdb
+@defvar spam-use-BBDB
Analogous to @code{spam-use-whitelist} (@pxref{Blacklists and
Whitelists}), but uses the BBDB as the source of whitelisted addresses,
without regular expressions. You must have the BBDB loaded for
-@code{spam-use-bbdb} to work properly. Only addresses in the BBDB
+@code{spam-use-BBDB} to work properly. Only addresses in the BBDB
will be allowed through; all others will be classified as spam.
@end defvar
+@defvar gnus-group-ham-exit-processor-BBDB
+Add this symbol to a group's @code{spam-process} parameter by
+customizing the group parameters or the
+@code{gnus-spam-process-newsgroups} variable. When this symbol is
+added to a group's @code{spam-process} parameter, the senders of
+ham-marked articles in @emph{ham} groups will be added to the
+BBDB. Note that this ham processor has no effect in @emph{spam}
+or @emph{unclassified} groups.
+@end defvar
+
@node Blackholes
@subsubsection Blackholes
@cindex spam filtering
@end defvar
+@defvar spam-blackhole-servers
+
+The list of servers to consult for blackhole checks.
+
+@end defvar
+
+@defvar spam-use-dig
+
+Use the @code{dig.el} package instead of the @code{dns.el} package.
+The default setting of @code{t} is recommended.
+
+@end defvar
+
+Blackhole checks are done only on incoming mail. There is no spam or
+ham processor for blackholes.
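+
+A minimal sketch, assuming you want @code{spam-split} to consult a
+single blackhole list through the @code{spam-use-blackholes} variable.
+The server name is only an example; choose servers whose listing
+policy you trust:
+
+@example
+;; The server list here is illustrative, not a recommendation.
+(setq spam-use-blackholes t
+      spam-blackhole-servers '("bl.spamcop.net"))
+@end example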
+
@node Bogofilter
@subsubsection Bogofilter
@cindex spam filtering
@defvar spam-use-bogofilter
-Set this variable if you want to use Eric Raymond's speedy Bogofilter.
-This has been tested with a locally patched copy of version 0.4. Make
-sure to read the installation comments in @code{spam.el}.
+Set this variable if you want @code{spam-split} to use Eric Raymond's
+speedy Bogofilter. This has been tested with a locally patched copy
+of version 0.4. Make sure to read the installation comments in
+@code{spam.el}.
With a minimum of care for associating the @samp{H} mark for spam
articles only, Bogofilter training all gets fairly automatic. You
score of the current article (between 0.0 and 1.0), together with the
article words which most significantly contribute to the score.
+If the @code{bogofilter} executable is not in your path, Bogofilter
+processing will be turned off.
+
+@end defvar
+
+
+@defvar gnus-group-spam-exit-processor-bogofilter
+Add this symbol to a group's @code{spam-process} parameter by
+customizing the group parameters or the
+@code{gnus-spam-process-newsgroups} variable. When this symbol is
+added to a group's @code{spam-process} parameter, spam-marked articles
+will be added to the Bogofilter spam database, and ham-marked articles
+will be added to the Bogofilter ham database. @strong{Note that the
+Bogofilter spam processor is the only spam processor to also do ham
+processing.}
@end defvar
@node Ifile spam filtering
@defvar spam-use-ifile
-Enable this variable if you want to use Ifile, a statistical analyzer
-similar to Bogofilter. Currently you must have @code{ifile-gnus.el}
-loaded. The integration of Ifile with @code{spam.el} is not finished
-yet, but you can use @code{ifile-gnus.el} on its own if you like.
+Enable this variable if you want @code{spam-split} to use Ifile, a
+statistical analyzer similar to Bogofilter. Currently you must have
+@code{ifile-gnus.el} loaded. The integration of Ifile with
+@code{spam.el} is not finished yet, but you can use
+@code{ifile-gnus.el} on its own if you like.
@end defvar
+Ifile can only be used to filter incoming mail into spam and ham
+through the @code{spam-split} function. It will be better integrated
+with @code{spam.el} with the next release of @code{ifile-gnus.el}.
+
@node Extending spam.el
@subsubsection Extending spam.el
@cindex spam filtering
@cindex spam.el, extending
@cindex extending spam.el
-Say you want to add a new back end called blackbox. Provide the following:
+Say you want to add a new back end called blackbox. For filtering
+incoming mail, provide the following:
@enumerate
-@item
-documentation
@item
code
@code{spam-check-*} functions for examples of what you can do.
@end enumerate
+For processing spam and ham messages, provide the following:
+
+@enumerate
+
+@item
+code
+
+Note you don't have to provide a spam or a ham processor. Only
+provide them if Blackbox supports spam or ham processing.
+
+@example
+(defvar gnus-group-spam-exit-processor-blackbox "blackbox"
+ "The Blackbox summary exit spam processor.
+Only applicable to spam groups.")
+
+(defvar gnus-group-ham-exit-processor-blackbox "blackbox"
+ "The Blackbox summary exit ham processor.
+Only applicable to non-spam (unclassified and ham) groups.")
+
+@end example
+
+@item
+functionality
+
+@example
+(defun spam-blackbox-register-spam-routine ()
+ (spam-generic-register-routine
+ ;; the spam function
+ (lambda (article)
+ (let ((from (spam-fetch-field-from-fast article)))
+ (when (stringp from)
+ (blackbox-do-something-with-this-spammer from))))
+ ;; the ham function
+ nil))
+
+(defun spam-blackbox-register-ham-routine ()
+ (spam-generic-register-routine
+ ;; the spam function
+ nil
+ ;; the ham function
+ (lambda (article)
+ (let ((from (spam-fetch-field-from-fast article)))
+ (when (stringp from)
+ (blackbox-do-something-with-this-ham-sender from))))))
+@end example
+
+Write the @code{blackbox-do-something-with-this-ham-sender} and
+@code{blackbox-do-something-with-this-spammer} functions. You can add
+more complex code than fetching the message sender, but keep in mind
+that retrieving the whole message takes significantly longer than
+retrieving only the sender through @code{spam-fetch-field-from-fast},
+because the message senders are kept in memory by Gnus.
+
+@end enumerate
+
+
@node Filtering Spam Using Statistics (spam-stat.el)
@subsection Filtering Spam Using Statistics (spam-stat.el)
@cindex Paul Graham