Paul Donnelly wrote:I don't have much experience filtering spam from forums, but I've found it pretty easy to filter spam in my news reader just by checking for keywords (such as handbags, wristwatches, and the like). In this case, "wow gold" seems like it would catch all the spam so far. Would keyword filtering be possible in this case?
There should be a PHP library that does this kind of spam checking, since the technique is pretty simple. The idea is Paul Graham's.
The idea is not to test for the "keywords" themselves. Someone could make a joke about "Wow! Gold! Look!" (not to mention your own post), and the rest of the post would show that it is not spam. Instead, you build a (hash) table of words with a score for each word. The score of a word is calculated more or less like this:
Code:
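;; Fraction of all posts that are spam and contain WORD.
;; Both helpers stand for lookups into the table described above.
(defun score-of-word (word)
  (/ (number-of-spams-using word) (number-of-total-posts)))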
Of course, these numbers would be stored in the table itself; this is just a sketch. Then you take some statistical average (or a plain average) of the scores of the words in a post. Based on that score you can tell whether the post is spam or not. The simplest version of this idea wouldn't take 50 lines and would already be very effective, at least according to Paul Graham.
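To make the sketch above a bit more concrete, here is a minimal version that I have not tuned at all. It uses the simplified score from this post (spam posts containing the word divided by all posts seen) and a plain average, not the exact formula from Paul Graham's essay. Every name in it (*spam-counts*, *total-posts*, tokenize, train, score-of-post, looks-like-spam-p) and the 1/4 threshold are my own assumptions for illustration.

```lisp
;; Sketch only: the table layout, all names, and the threshold are hypothetical.
(defvar *spam-counts* (make-hash-table :test #'equal)
  "Maps a word to the number of spam posts that contained it.")

(defvar *total-posts* 0
  "Number of posts seen so far, spam or not.")

(defun tokenize (text)
  "Split TEXT into lowercase words; any non-letter acts as a separator."
  (let ((words '())
        (current '()))
    (flet ((flush ()
             (when current
               (push (coerce (reverse current) 'string) words)
               (setf current '()))))
      (loop for ch across text
            do (if (alpha-char-p ch)
                   (push (char-downcase ch) current)
                   (flush)))
      (flush))
    (nreverse words)))

(defun train (text spam-p)
  "Record one post. Each distinct word is counted once if the post is spam."
  (incf *total-posts*)
  (when spam-p
    (dolist (word (remove-duplicates (tokenize text) :test #'string=))
      (incf (gethash word *spam-counts* 0)))))

(defun score-of-word (word)
  "Fraction of all seen posts that were spam and used WORD."
  (if (zerop *total-posts*)
      0
      (/ (gethash word *spam-counts* 0) *total-posts*)))

(defun score-of-post (text)
  "Plain average of the word scores in TEXT; 0 for a post with no words."
  (let ((words (tokenize text)))
    (if (null words)
        0
        (/ (reduce #'+ (mapcar #'score-of-word words))
           (length words)))))

(defun looks-like-spam-p (text &optional (threshold 1/4))
  "True when the average word score of TEXT exceeds THRESHOLD."
  (> (score-of-post text) threshold))
```

You would call train on posts that a moderator has already marked, for example (train "buy wow gold cheap" t) and (train "how do I use defmacro" nil), and then call looks-like-spam-p on each new post. A short joke containing "wow" and "gold" scores lower as soon as the rest of the post uses ordinary words, because those words pull the average down.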
I think I saw this in Paul Graham's book "ANSI Common Lisp" or "On Lisp", but I am not sure which. Anyway, for reference:
http://www.paulgraham.com/spam.html
http://www.koders.com/lisp/fid7F8E2D70FFC4A5B0D7EC32C39BAE82FA117B5A89.aspx?s=smtp+server
