	Bayespam is a spam filter for qmail inspired by Paul Graham's "A Plan for
Spam" <http://www.paulgraham.com/spam.html>.  It uses Bayesian classification
to determine if a particular piece of email is spam or not.  Rather than go
into the theory here, I recommend reading Mr. Graham's paper, which explains it
very well.
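
	For a rough idea of what "Bayesian classification" means here: Mr.
Graham's paper combines the individual spam probabilities of the most
interesting tokens in a message (in the paper, the fifteen whose probabilities
are furthest from 0.5) into one overall probability.  The Perl sketch below
shows only that combining formula.  It matches the
$prod / ($prod + $minus_one_prod) line in the procmail patch further down, but
combined_probability is a hypothetical helper, not part of Bayespam.

```perl
use strict;
use warnings;

# Combine per-token spam probabilities into one message probability, using
# the formula from Paul Graham's "A Plan for Spam":
#   P = (p1 * ... * pn) / ((p1 * ... * pn) + ((1 - p1) * ... * (1 - pn)))
# Assumes each p is already clamped away from exactly 0 and 1.
# combined_probability is a hypothetical helper, not part of Bayespam.
sub combined_probability {
    my @probs = @_;
    my $prod           = 1;    # product of p_i
    my $minus_one_prod = 1;    # product of (1 - p_i)
    for my $p (@probs) {
        $prod           *= $p;
        $minus_one_prod *= 1 - $p;
    }
    return $prod / ($prod + $minus_one_prod);
}

# Two strongly spammy tokens and one mildly innocent one.
printf "%.3f\n", combined_probability(0.99, 0.95, 0.4);    # prints 0.999
```

	Note how a couple of strong spam indicators outweigh a mildly innocent
token: the (1 - p) product collapses toward zero much faster than the p
product does.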

NOTE:
	Bayespam 0.9.2 is NOT backwards compatible with Bayespam 0.9!  You
should delete the old Perl scripts, and especially be sure to delete
any ratings files you have made -- Bayespam 0.9.2 cannot understand
them!

NOTE:
	I just finished writing, testing, and deploying this filter on my own
email server.  I don't guarantee it's 100% foolproof yet, so use it with
caution.  Please send me any problems, improvements, or reports on how well it
works for you!

Building the Corpus Rating File
===============================
	The first step in setting up Bayespam is building what I call the corpus
rating file, a file that describes the "spam probability" of any particular
token.  To build this, you'll need a directory of spam emails and a directory
of non-spam emails.  (You should use your own email: each corpus is "tuned"
to an individual user's mail.)  Personally, I did this using the
contents of the appropriate qmail Maildir directories.
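
	Each token's spam probability comes from how often the token appears in
the spam directory versus the non-spam directory.  As an illustration only,
here is the per-token formula from Mr. Graham's paper in Perl.  The name
token_probability is hypothetical, and the constants (good counts doubled,
tokens seen fewer than 5 times skipped, results clamped to 0.01..0.99) are
Graham's; bayes_process_email.pl may or may not use the same ones.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Per-token spam probability as described in Paul Graham's "A Plan for Spam".
#   $good, $bad   - occurrences of the token in the non-spam / spam corpora
#   $ngood, $nbad - number of messages in each corpus
# Returns undef for tokens seen too rarely to rate reliably.
# token_probability is a hypothetical helper, not part of Bayespam.
sub token_probability {
    my ($good, $bad, $ngood, $nbad) = @_;
    die "Both corpora must contain at least one message\n"
        unless $ngood && $nbad;
    my $good_count = 2 * $good;    # doubled to bias against false positives
    my $bad_count  = $bad;
    return undef if $good_count + $bad_count < 5;
    my $good_ratio = min(1, $good_count / $ngood);
    my $bad_ratio  = min(1, $bad_count / $nbad);
    return max(0.01, min(0.99, $bad_ratio / ($good_ratio + $bad_ratio)));
}
```

	For example, a token seen 20 times in 100 spams and never in 100 good
messages rates 0.99, while one seen 10 times in each corpus rates about 0.33,
because good occurrences count double.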

	Once you have these directories of files, say named "spam_mail" and
"not_spam_mail", execute this command:
bayes_process_email.pl --good not_spam_mail --spam spam_mail -o bayes_rating.db
	This will build the file bayes_rating.db, which is used by the
bayes_spam_check.pl script.  You can build an individualized corpus file for
each user, keeping each one in that user's directory, or make a single global
corpus -- it's up to you.  Check the INSTALL file for how to use the corpus.
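
	Conceptually, building the rating file starts with counting how often each
token appears in each directory.  The sketch below shows that counting step
for one directory of one-message-per-file mail, such as a Maildir cur/
directory.  The tokenizer rule (runs of letters, digits, dashes, apostrophes
and dollar signs, lowercased) is borrowed from Mr. Graham's paper and is an
assumption; bayes_process_email.pl may tokenize differently, and count_tokens
is a hypothetical name.

```perl
use strict;
use warnings;

# Count token occurrences across every regular file in a directory, treating
# each file as one email (as in a Maildir cur/ or new/ directory).
# Returns a hash ref of token => count and the number of messages read.
# count_tokens is a hypothetical helper, not part of Bayespam.
sub count_tokens {
    my ($dir) = @_;
    my %counts;
    my $messages = 0;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = readdir $dh;
    closedir $dh;
    for my $name (sort @names) {
        my $path = "$dir/$name";
        next unless -f $path;    # skips ".", "..", and subdirectories
        open(my $fh, '<', $path) or die "Cannot read $path: $!";
        while (my $line = <$fh>) {
            # Assumed tokenizer: letters, digits, - ' and $, case-folded.
            $counts{lc $1}++ while $line =~ /([A-Za-z0-9'\$-]+)/g;
        }
        close $fh;
        $messages++;
    }
    return (\%counts, $messages);
}
```

	Running this over both directories gives the per-token counts and message
totals that a per-token probability formula needs.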

Hints, Tips, Ideas, and Suggestions from Users
==============================================
	Here are some hints and ideas on getting the most out of Bayespam, sent
in by several Bayespam users.  I haven't necessarily tested any of these, so
caveat user: please use caution when trying them out!

	Nate Underwood suggests using mbox2maildir to convert mbox files to
maildir format. [Get it at http://www.qmail.org/mbox2maildir .  I've found
another program that does that, though I've not used either of these:
http://www.firstpr.com.au/web-mail/mb2md/ -g.]
	Nate also suggests training Bayespam on a comprehensive email corpus
(i.e., more than a couple of messages).
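
	If you only need a training directory and not a full Maildir, splitting
an mbox yourself is also possible.  This is a rough sketch, not a replacement
for mbox2maildir: split_mbox is a hypothetical helper, it assumes each message
starts at a line beginning with "From " (body lines are normally escaped as
">From "), and it does not create cur/new/tmp subdirectories or undo the
">From " quoting.

```perl
use strict;
use warnings;

# Split an mbox file into numbered files, one message per file, in $outdir.
# A new message starts at each line beginning with "From "; that separator
# line itself is dropped. Returns the number of messages written.
# split_mbox is a hypothetical helper, not part of Bayespam or mbox2maildir.
sub split_mbox {
    my ($mbox, $outdir) = @_;
    open(my $in, '<', $mbox) or die "Cannot read $mbox: $!";
    my ($out, $n) = (undef, 0);
    while (my $line = <$in>) {
        if ($line =~ /^From /) {
            close $out if $out;
            $n++;
            open($out, '>', "$outdir/$n") or die "Cannot write $outdir/$n: $!";
            next;    # the separator line is not part of the message
        }
        print $out $line if $out;    # ignore anything before the first "From "
    }
    close $out if $out;
    close $in;
    return $n;
}
```

	For example, after creating an empty spam_mail directory,
split_mbox('spam.mbox', 'spam_mail') would leave one file per message there,
ready for the --spam option.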

	Brian Asker has some ideas for getting Bayespam to work with
sendmail/procmail/mbox at http://www.asker.net/software/bayespam/ .  I can't
say how applicable these changes are to Bayespam 0.9.2, but hopefully they
are. :)

	Grahame Bowland has this code to make Bayespam work with procmail:
Procmailrc file:
----------------
:0fw
| /usr/local/bin/bayes_spam_check.pl /home/<USER>/corpus.dat <your email address>
:0:
* ^X-Spam-Status: Yes
spam
-----------------------
Change the end of bayes_spam_check.pl to:
-----------------------
my $probability_of_spam = $prod / ( $prod + $minus_one_prod );

# append diagnostic information to the headers
my $in_header = 1;
foreach my $line (@email_message) {
  if ($in_header && $line =~ /^$/) {
    if ($probability_of_spam > $spam_minimum) {
      print "X-Spam-Status: Yes\n";
    } else {
      print "X-Spam-Status: No\n";
    }
    printf("X-Spam-Probability: %.3f\n", $probability_of_spam);
    $in_header = 0;
  }
  print $line;
}

# Always exit 0: with the 'w' flag, procmail treats a non-zero exit status
# as a failed filter and keeps the unfiltered message.
exit 0;
-----------------------

	Several users report that very small corpora, someone else's spam, or a
very large spam corpus can give bad results.  I think that
we'll find the best results are obtained with your own spam and email, and
when both corpora are approximately the same size.
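
	To check how balanced your corpora are before training, a quick count is
enough.  This is a hypothetical helper script, not part of Bayespam, and the
2:1 warning threshold is an arbitrary assumption rather than a tested figure.
Point it at the directories that actually hold the message files (for a
Maildir, that is cur/ rather than the top-level directory).

```perl
use strict;
use warnings;

# Count the regular files (one message each) in a training directory.
sub count_messages {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my $n = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return $n;
}

# Ratio of the larger corpus to the smaller one (1 means perfectly balanced).
sub corpus_ratio {
    my ($good_dir, $spam_dir) = @_;
    my ($good, $spam) = (count_messages($good_dir), count_messages($spam_dir));
    die "Both corpora need at least one message\n" unless $good && $spam;
    return $good > $spam ? $good / $spam : $spam / $good;
}

# Usage: perl corpus_ratio.pl not_spam_mail spam_mail
if (@ARGV == 2) {
    my $ratio = corpus_ratio(@ARGV);
    printf "corpus size ratio: %.2f\n", $ratio;
    # Hypothetical threshold: warn when one corpus is over twice the other.
    warn "Corpora differ in size by more than 2:1\n" if $ratio > 2;
}
```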

Comments, suggestions?
======================
	Please be sure to contact me if you have any problems or suggestions, or
if you just want to let me know how well the filter works for you.

Gary Arnold <garnold@garyarnold.com>

-g.