Bayesian (or Other Statistical) Filtering

234 votes


Last activity: 1 year ago
Add a new statistical filter test to ORF. The most widely known and used statistical technique is Bayesian filtering, a "learning" email classification method. When properly trained and maintained, statistical filters work very effectively. However, they need a reliable feed of actual legitimate and spam messages for initial training, so end users will likely have to participate in the training.
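Since ORF has no statistical filter, the following is only a generic sketch of the technique the request describes: a token-based (multinomial) naive Bayes classifier with add-one smoothing, written in Python. The tokenizer, the class name `NaiveBayesFilter` and the "spam"/"ham" labels are hypothetical; production filters typically use richer tokenization and different scoring schemes.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately simple tokenizer: lower-cased runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical multinomial naive Bayes filter with Laplace (add-one) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        # Count every token of the message under its label ("spam" or "ham").
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text: str, label: str) -> float:
        # log P(label) + sum of log P(token | label); needs both classes trained.
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        for token in tokenize(text):
            count = self.word_counts[label][token]  # Counter returns 0 if unseen
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def spam_probability(self, text: str) -> float:
        # Turn the two log scores into P(spam | text) using a stable logistic.
        d = self.log_score(text, "spam") - self.log_score(text, "ham")
        if d >= 0:
            return 1.0 / (1.0 + math.exp(-d))
        e = math.exp(d)
        return e / (1.0 + e)


demo = NaiveBayesFilter()
demo.train("cheap pills buy now", "spam")
demo.train("win money now", "spam")
demo.train("meeting agenda for monday", "ham")
demo.train("lunch on monday?", "ham")
print(demo.spam_probability("buy cheap pills"))  # 8/9, about 0.889
print(demo.spam_probability("monday meeting"))   # 1/7, about 0.143
```

Working in log space keeps long messages from underflowing to zero; the smoothing term keeps a single unseen token from forcing a probability of exactly 0 or 1.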

Comments

Actually, training is often easier than that and needs no end-user involvement. Just temporarily switch the other tests to On Arrival (instead of Before Arrival) and watch the outgoing email. Any email that is sent out is obviously legitimate, as is email from senders on the whitelist or auto-whitelist. Email blocked by the other filters (like DNS, HELO, URIBL, etc.) is obviously spam and can be classed as such. After collecting enough of both, the existing tests can be switched back to Before Arrival, and any message that gets past them can be compared against the pools of legitimate and spam messages to see which it most likely belongs to. Major benefits: if messages continue to be collected, the data is constantly updated as email and spam trends change. And the Bayesian database is customised for each installation, based on the type of messages it sends and the type of spam it receives.
by PSaul more than 10 years ago
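The collection phase described in this comment boils down to a labeling rule. The Python sketch below assumes a hypothetical `Message` record (sender, whether it is outgoing, which test blocked it); ORF does not expose such an API, and the test names in `SPAM_TESTS` are simply the ones the comment mentions. Messages that get past the tests later would then be scored against these pools by a statistical classifier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    # Hypothetical message record; the field names are illustrative only.
    sender: str
    outgoing: bool             # True if sent by a local user
    blocked_by: Optional[str]  # name of the test that rejected it, if any
    body: str


# Tests whose hits are treated as reliable spam labels (as suggested above).
SPAM_TESTS = {"DNS", "HELO", "URIBL"}


def auto_label(msg: Message, whitelist: set) -> Optional[str]:
    """Return "ham", "spam", or None if the message should not be used.

    `whitelist` is expected to hold lower-cased addresses.
    """
    if msg.outgoing:
        return "ham"   # anything sent by local users is legitimate
    if msg.sender.lower() in whitelist:
        return "ham"   # whitelisted or auto-whitelisted sender
    if msg.blocked_by in SPAM_TESTS:
        return "spam"  # rejected by one of the existing tests
    return None        # unknown: keep it out of the training pools


def build_training_pools(messages, whitelist):
    # Collect message bodies into the two pools; unlabeled mail is skipped.
    pools = {"ham": [], "spam": []}
    for msg in messages:
        label = auto_label(msg, whitelist)
        if label is not None:
            pools[label].append(msg.body)
    return pools


sample = [
    Message("alice@example.com", True, None, "Minutes from today's meeting"),
    Message("bob@partner.example", False, None, "Invoice attached"),
    Message("x@spam.example", False, "URIBL", "cheap pills"),
    Message("y@unknown.example", False, None, "hello"),
]
pools = build_training_pools(sample, {"bob@partner.example"})
print(len(pools["ham"]), len(pools["spam"]))  # 2 1
```

Leaving unlabeled mail out, rather than guessing, keeps the pools clean, which matters more for a statistical filter than the raw number of samples.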
YES! Bayesian would be awesome! Goodbye MailEssentials, hello ORFilter with Bayesian!!
by kelly more than 10 years ago
It would be nice!
by Nick more than 10 years ago
Bayes filters should be built on end users' Exchange blacklist folders, so that Bayes training is simple and involves the users.
by Sergey more than 10 years ago
I have now moved many clients from GFI to ORF, and can say that, for smaller servers at least, I would not use this feature. I find that mainly using DNS and URI tests catches a very high percentage of the spam, and pushing much further makes it more likely than I can accept that real emails get caught. I'm not sure whether it would be useful for larger servers with high spam loads.
by thomasrw more than 10 years ago
I'd love to see this option as well, because it's probably the only way to stop those manually generated scams from Hotmail etc.
by Nuf more than 10 years ago
I hope this option will be included in the next release.
by Igor Sinyov more than 10 years ago
@Igor Sinyov: sorry, the feature set for ORF 5 is already closed, and Bayesian will not be implemented in the next version.
by Krisztian Fekete (Vamsoft) more than 10 years ago
Bayes filtering is quite good but suffers from too much "approximation". Using a "Hidden Markov Model" instead, you get the same kind of results but with much more accuracy. On the other hand, the HMM is somewhat "heavier" on resources than Bayes, yet I think the tradeoff is worth it.
by ObiWan more than 10 years ago
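The comment doesn't say how an HMM would be applied. One common setup, assumed here, is to fit one HMM to spam and one to legitimate mail and label a message by whichever model gives its token sequence the higher likelihood. This Python sketch computes that likelihood with the forward algorithm in log space; the two-state models and their probabilities are made up for illustration, and training (e.g. Baum-Welch) is not shown. Scoring costs O(T·N²) per message for T tokens and N hidden states, versus O(T) lookups for naive Bayes, which is one source of the extra resource use.

```python
import math

NEG_INF = float("-inf")


def _log(p: float) -> float:
    # log that maps a zero probability to -inf instead of raising.
    return math.log(p) if p > 0 else NEG_INF


def logsumexp(values: list[float]) -> float:
    # Numerically stable log(sum(exp(v) for v in values)).
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit) -> float:
    """log P(obs | model), computed with the forward algorithm.

    obs:   list of observation indices (e.g. hypothetical token-class ids)
    start: start[i] = P(first hidden state is i)
    trans: trans[i][j] = P(next state j | current state i)
    emit:  emit[i][o] = P(observation o | state i)
    """
    n = len(start)
    alpha = [_log(start[i]) + _log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + _log(trans[i][j]) for i in range(n)])
            + _log(emit[j][o])
            for j in range(n)
        ]
    return logsumexp(alpha)


def classify(obs, spam_model, ham_model,
             log_prior_spam=math.log(0.5), log_prior_ham=math.log(0.5)) -> str:
    # Pick the class whose HMM (plus prior) explains the sequence better.
    spam = log_prior_spam + forward_log_likelihood(obs, *spam_model)
    ham = log_prior_ham + forward_log_likelihood(obs, *ham_model)
    return "spam" if spam > ham else "ham"


# Toy two-state models over a two-symbol alphabet (made-up numbers).
HAM_MODEL = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]])
SPAM_MODEL = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.1, 0.9], [0.8, 0.2]])
print(classify([0, 0, 0], SPAM_MODEL, HAM_MODEL))  # ham
print(classify([1, 1, 1], SPAM_MODEL, HAM_MODEL))  # spam
```

Unlike naive Bayes, the hidden-state transitions let the model take token order into account, at the price of the extra per-state work in each step.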
Adding a Bayesian filter to ORF would be a huge deal; I hope the developers will consider it. That and the "SpamRazr" engine are the only things I still use GFI's MailEssentials for.
by Bill more than 10 years ago
Please add this feature
by gerald.jackson 7 years ago
With the latest wave of new viruses, adding a Bayes filter would be very useful. We have been using ORF for several years, but this functionality is sorely lacking.
by DimAll 5 years ago