Spammers

Post by **gugamilare** » Thu Mar 12, 2009 3:53 pm

Paul Donnelly wrote:I don't have much experience filtering spam from forums, but I've found it pretty easy to filter spam in my news reader just by checking for keywords (such as handbags, wristwatches, and the like). In this case, "wow gold" seems like it would catch all the spam so far. Would keyword filtering be possible in this case?

There should be a PHP library that does this kind of spam checking, since the technique is pretty simple. It is Paul Graham's idea.
The idea is not to test for "keywords" by themselves: someone could make a joke about "Wow! Gold! Look!" (not to mention your own post), and the rest of the post would show that it is not spam. Instead, you build a (hash) table of words with a score for each word. The score of a word is calculated more or less like this:

Code: Select all

(defun score-of-word (word)
  ;; Fraction of all posts that are spam posts containing WORD.
  (/ (number-of-spams-using word) (number-of-total-posts)))

Of course, these numbers would be stored in the table itself; this is just a sketch. Then you take some statistical average (or a plain average) of the scores of the words in a post. Based on that average you can tell whether the post is spam or not. The simplest version of this idea wouldn't take 50 lines and would already be very effective - at least according to Paul Graham.
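To make the table-and-average idea concrete, here is a minimal sketch in Common Lisp. It follows the simplified formula in this post (spam posts containing the word divided by all posts seen), not the full Bayesian combination from Paul Graham's essay. Every name in it (`*spam-counts*`, `*total-posts*`, `tokenize`, `train`, `score-of-post`, `looks-like-spam-p`) and the 1/4 threshold are assumptions made up for illustration.

```lisp
;; Sketch only: all names and the threshold are hypothetical.
(defvar *spam-counts* (make-hash-table :test #'equal)
  "Maps a word to the number of spam posts that contained it.")

(defvar *total-posts* 0
  "Number of posts (spam or not) recorded so far.")

(defun tokenize (text)
  "Split TEXT into lowercase words on non-alphanumeric characters."
  (let ((words '())
        (current '()))
    (flet ((flush ()
             (when current
               (push (coerce (nreverse current) 'string) words)
               (setf current '()))))
      (loop for ch across text
            do (if (alphanumericp ch)
                   (push (char-downcase ch) current)
                   (flush)))
      (flush))
    (nreverse words)))

(defun train (text spam-p)
  "Record one post. Each distinct word counts once per spam post."
  (incf *total-posts*)
  (when spam-p
    (dolist (word (remove-duplicates (tokenize text) :test #'string=))
      (incf (gethash word *spam-counts* 0)))))

(defun score-of-word (word)
  "Fraction of all recorded posts that were spam containing WORD."
  (if (zerop *total-posts*)
      0
      (/ (gethash word *spam-counts* 0) *total-posts*)))

(defun score-of-post (text)
  "Plain average of the word scores in TEXT (0 for an empty post)."
  (let ((words (tokenize text)))
    (if (null words)
        0
        (/ (reduce #'+ (mapcar #'score-of-word words))
           (length words)))))

(defun looks-like-spam-p (text &optional (threshold 1/4))
  "True when the average word score of TEXT exceeds THRESHOLD."
  (> (score-of-post text) threshold))
```

After training on a couple of "wow gold" spams and a couple of normal posts, a joke post that happens to mention "Wow! Gold!" still averages low because its other words have never appeared in spam, which is the point of scoring the whole post rather than matching keywords.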

I think I saw this in Paul Graham's book "ANSI Common Lisp" or "On Lisp", but I am not sure which. Anyway, for reference:

http://www.paulgraham.com/spam.html
http://www.koders.com/lisp/fid7F8E2D70F ... mtp+server

Post by **findinglisp** » Thu Mar 12, 2009 4:14 pm

Let me look at some of the options. Generally, I try to log in and look at the forum every 24 or so hours during the week, but often don't get to it over the weekend. I delete all spam and spammer accounts when I find them.

In looking at things behind the scenes, it's clear that there are a lot of bots registering users on the board. They don't generally get through the whole process, however. Right now, you need both a captcha and a valid email address to activate the account. Lots of accounts are getting created, which suggests that captcha isn't very effective and the bots are able to get past it. The requirement of a valid email address is what stops most of them.

I'll try to determine if there are any additional phpBB mods that will increase the hurdle further. If anybody has any suggestions, whether specifically phpBB-related or not, please feel free to put them on the table.

Post by **findinglisp** » Thu Mar 12, 2009 4:45 pm

Okay, I just enabled another standard option that should reduce the amount of spam you guys see. Specifically, there is an option where if a user has fewer than N posts, any new post is held for moderation. After N posts, things are posted immediately, without moderation. I set N to 2. That should prevent you from having to see most of the spam. I'll continue to research some of the other options and mods to see if there are better overall solutions that actually stop the bots from registering. After some quick Googling, it looks like the next version of phpBB includes a better (more difficult) captcha module that should at least slow things down until the spammers figure out how to break it. There are some other things that I'll try, too.
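For anyone curious, the post-count rule described above boils down to a single comparison. This is a hypothetical sketch in Common Lisp, not phpBB's actual code; the names `*moderation-threshold*` and `held-for-moderation-p` are made up, and exactly which posts count toward the total is an assumption.

```lisp
;; Hypothetical sketch of the rule; this is not phpBB code.
(defparameter *moderation-threshold* 2
  "N: a user with fewer posts than this has new posts held.")

(defun held-for-moderation-p (post-count
                              &optional (threshold *moderation-threshold*))
  "True when a new post from a user with POST-COUNT posts must wait for a moderator."
  (< post-count threshold))
```

With N set to 2, a user with 0 or 1 posts is held for moderation, and a user with 2 or more posts goes straight through.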

Post by **findinglisp** » Thu Mar 12, 2009 5:13 pm

Okay, I just added a custom field to the registration profile. Basically, this is a little hack that makes the required registration fields differ from the standard phpBB registration screen, which should help trip up bots. This means a spammer has to:

Answer the new non-standard antispambot question
Defeat the CAPTCHA
Register with a valid email and use the validation link
Generate at least a valid number of initial postings before automatic moderation turns off

I'm hopeful that these last couple of antispam features will keep most of the spamming out.

Post by **Paul Donnelly** » Thu Mar 12, 2009 6:43 pm

gugamilare wrote:not to mention your own post

Yes, that crossed my mind.

Post by **findinglisp** » Fri Mar 13, 2009 9:03 am

findinglisp wrote:
Answer the new non-standard antispambot question

Generate at least a valid number of initial postings before automatic moderation turns off

These two things are already paying dividends. There were no spambot registrations in the system this morning when I came in. Typically, I get 20+ per day, so having zero overnight was a good sign that the new spambot question is helping. Second, there was spam waiting for moderator approval from a spambot that registered a few days ago, but you guys didn't have to see it. Nice! This might be the trick, at least for a little while.

Post by **Wodin** » Fri Mar 13, 2009 3:40 pm

Excellent :) Good job.

Thanks.

Post by **Exolon** » Fri Mar 13, 2009 5:12 pm

My only concern with this is that the automatic moderation of newbie posts could lead to poor findinglisp being overwhelmed with a backlog of awaiting-moderation posts by both legitimate newbies and spammers, but it sounds like the traffic isn't high enough for that yet.

Post by **findinglisp** » Wed Mar 18, 2009 11:19 pm

Yup, these changes have put a good dent in the spam. The bots can't easily register anymore (like zero bots registered after I changed that), and the initial moderator speedbump has kept a few bots that registered before the changes from getting too far before I catch them. In a few weeks, I figure I will have flushed out the bots and we'll be pretty well limited to real people here. Ahhhhh... I love it when a plan comes together.

As for the 1-post hurdle, I think it's reasonable and shouldn't hold up anybody that much. Again, I'll try to look every 24 hours. Eventually, it makes sense to choose a couple of additional moderators. If somebody wants to throw their hat in the ring for that, I'm open to it. Let me know and I'll consider a promotion for you.

LispForum
