About Us

We're a London-based startup that develops Construct 2, software that lets you make your own computer games!

Reducing Website Spam

by Tom | 1st, December 2011

User WGfunStorm recently asked on our forum how we keep the forum spam-free without using a CAPTCHA. I've spent a fair amount of time trying to make this website as spam-free as possible, and at the moment it seems to be working pretty well. Here are a few of the techniques we use.

Rename default pages

One of the simplest and most effective techniques for reducing spam signups is renaming your registration and login pages.

We currently use Web Wiz Forums for our website. It's a relatively well-known piece of ASP forum software, and its default registration and login pages are register.asp and login_user.asp. If you write spam software, the easiest approach is probably to collect the default registration and login pages of all the popular forum packages and target those.

Moving these pages to different locations with different names seems to have had a drastic effect on the number of spam registrations we receive. Our new pages are now scirra.com/register and scirra.com/login.
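As a rough illustration, here is a minimal Python sketch of the idea, assuming a hypothetical `resolve()` function standing in for whatever routing your web server or framework provides. Only the renamed paths are mapped to real pages; anything else, including the old default register.asp and login_user.asp, simply falls through to a 404.

```python
# Hypothetical route table: the real registration and login pages live at
# non-default paths, mirroring scirra.com/register and scirra.com/login.
ROUTES = {
    "/register": "registration page",
    "/login": "login page",
}


def resolve(path: str) -> tuple[int, str]:
    """Return (HTTP status, page) for a request path, ignoring any query string."""
    bare_path = path.split("?", 1)[0]
    page = ROUTES.get(bare_path)
    if page is None:
        # Default forum paths such as /register.asp or /login_user.asp are
        # simply not routed, so bots probing for them get a plain 404.
        return 404, "not found"
    return 200, page


print(resolve("/register"))      # served normally
print(resolve("/register.asp"))  # old default name, not served
```

On a real site this would be your server's URL rewriting or routing configuration rather than a Python dictionary; the point is only that the old default names are not mapped to anything.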

Although it seems rather easy for spam software to get around this with a bit of thought, in general it doesn't. This is probably for a number of reasons:

  • Not many sites do this, so it's not worth the time
  • Sites that take these sorts of measures probably clean up spam quickly so they aren't very good targets
  • There is no shortage of easy targets out there

We used to be on phpBB3, and spam software seems to know this. We get hundreds of 404s to old phpBB registration pages, as well as lots of 404s to the old Web Wiz forum registration and login pages. It's important to note that this is no fault of the forum software authors: even if they engineered a way of tackling it, the spam software would adapt. Some of our top 404s over a roughly two-month period were:

404 URL                                         Count
/forum/signup                                     493
/forum/ucp.php?mode=register                      263
/register/forum                                   165
/forum/ucp.php?mode=login                         141
/phpbb3/ucp.php?mode=register&sid=c69...          116
/phpbb3/ucp.php?mode=register&sid=725...          113
/phpbb3/ucp.php?mode=register&sid=a87b8...        112
/forum/viewtopic.php/profile.php?mode=register    107
/phpbb3/ucp.php?mode=register&coppa=0              98
/phpbb3/ucp.php?mode=register&sid=93...            88
And lots more...

As you can see, the table also contains some URLs that don't exist on our site and never have. This is just spam software probing for, or guessing, common URLs.
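If you want to produce a similar table for your own site, a sketch like the following tallies 404s from a web server access log. It assumes the log is in Common Log Format; `top_404s` is a hypothetical helper, and other log formats would need a different pattern.

```python
import re
from collections import Counter

# Assumes Common Log Format lines, for example:
# 1.2.3.4 - - [01/Dec/2011:16:03:45 +0000] "GET /forum/signup HTTP/1.1" 404 512
LOG_LINE = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def top_404s(lines, n=10):
    """Count request paths that returned 404 and return the n most common."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts.most_common(n)
```

Feeding it the lines of your access log (for example, an open file object) gives a list of (path, count) pairs sorted from most to least frequent.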

When I first noticed these 404s, I made the mistake of creating 301 redirects to the new page URLs, as a good webmaster should! For these particular pages, however, that was naive and had bad consequences: it was basically like flipping a switch on for spammers. Once the old URLs redirected to the new ones, we got dozens of spammers daily, wasting a lot of our time! It's worth checking whether any legitimate sites link to your old registration and login pages, but this is rare, and if an important source does, it can be dealt with individually using the HTTP Referer header. The best and easiest approach, though, is simply to let all of them 404 and ask any individual sites that do link there to update their links.
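For the rare legitimate referrer, the exception could look something like this minimal sketch. The old paths, the trusted host list and `handle_old_entry_point` are all hypothetical. Note that the Referer header is supplied by the client and is trivially forged, so this is only a convenience for real visitors, not a security check.

```python
from urllib.parse import urlsplit

# Hypothetical values: the old default entry points, and the few trusted
# sites whose links to them we still want to honour.
OLD_ENTRY_POINTS = {"/register.asp": "/register", "/login_user.asp": "/login"}
TRUSTED_REFERRER_HOSTS = {"www.example.org"}


def handle_old_entry_point(path, referer):
    """Return (status, location) for a request to an old entry-point URL."""
    target = OLD_ENTRY_POINTS.get(path)
    if target is None:
        return 404, None
    # urlsplit("").hostname is None, so a missing Referer is never trusted.
    host = urlsplit(referer or "").hostname
    if host in TRUSTED_REFERRER_HOSTS:
        # A known legitimate site linked here: send the visitor on.
        return 301, target
    # Everyone else, bots included, just gets a 404.
    return 404, None
```

Anything that doesn't arrive from a trusted site keeps getting a 404, so the old URLs stay useless to spam software.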

Renaming your common entry-point URLs to something different seems to block a lot of spam users. I don't think giving them obscure names offers any further benefit; simply being different from the default is what matters most.

Setting up Honeypots

Putting honeypots on our entry scripts seems to be an effective measure as well. Some bots still find these pages, either because they are more intelligently written or because they work differently, and honeypots give us a second line of defence against them. A honeypot is a juicy-looking target that is actually a trap, like the common jam-in-a-bottle wasp trap.

If you look at our registration page, you'll see it's pretty trim; it's designed to make registering as accessible as possible.

If you view the source you will see that there are some hidden fields:

<input type="hidden" name="Username" value=""/>

The field called Username is actually a hidden unused field that users can't type anything into. The actual username field has an obscure name.
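A form along these lines could be rendered as in the following sketch. The obscure name `f7q2_handle` and the `registration_form` helper are hypothetical; choose your own name, and any server-side templating system would do the same job.

```python
# Hypothetical names: the decoy keeps the obvious name bots look for,
# while the real username input gets an obscure one.
DECOY_FIELD = "Username"
REAL_USERNAME_FIELD = "f7q2_handle"


def registration_form() -> str:
    """Render a minimal registration form that includes a hidden decoy field."""
    return "\n".join([
        '<form method="post" action="/register">',
        # Hidden from real users, so it should always come back empty.
        f'  <input type="hidden" name="{DECOY_FIELD}" value=""/>',
        # The field real users actually type their username into.
        f'  <label>Username <input type="text" name="{REAL_USERNAME_FIELD}"/></label>',
        '  <input type="submit" value="Register"/>',
        "</form>",
    ])


print(registration_form())
```

The visible label still says "Username", so nothing changes for real users; only the underlying field names differ.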

Because spam software is probably scouring the HTML for fields, it will sometimes come across the Username field and automatically fill it in. So when the form is submitted with this field containing any value, we can reject the registration. Real users won't be putting anything in it!
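On the server side, the check can be as simple as the following sketch, which assumes the submitted form arrives as a plain dict of field names to values. The field names and the `register` function are hypothetical stand-ins for your forum's own registration handler.

```python
# Hypothetical field names matching the form: "Username" is the hidden
# decoy, and the real username input uses an obscure name.
DECOY_FIELD = "Username"
REAL_USERNAME_FIELD = "f7q2_handle"


def is_probably_bot(form: dict[str, str]) -> bool:
    """Browsers submit the hidden decoy empty; any value means a bot filled it."""
    return form.get(DECOY_FIELD, "") != ""


def register(form: dict[str, str]) -> str:
    """Hypothetical registration handler that applies the honeypot check first."""
    if is_probably_bot(form):
        # Reject without explaining why, so the bot learns nothing useful.
        return "rejected"
    username = form.get(REAL_USERNAME_FIELD, "").strip()
    if not username:
        return "missing username"
    return f"registered {username}"
```

Returning a generic rejection, rather than saying which field gave the bot away, is arguably the safer choice, since it gives bot authors nothing to adapt to.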

This method is also very effective, blocking a few registrations a day. There are other ways of doing honeypots; they all rely on the spam bot not being smart enough to realise that the real username field is a different one, or that the field is hidden. Working those things out is actually pretty darned difficult, so I don't imagine many spam bots out there do it.

Following up Spammers

On occasion, before deleting spam posts, I have visited the URLs the spammers posted and contacted the sites' support teams or owners to ask why they are spamming our website.

Most of the time they are oblivious to it; some of the time they feign ignorance. After a bit more questioning, the oblivious ones appear to have hired 'SEO experts' to help improve their website rankings. These 'experts' then start up their various pieces of spam software and sit back, often charging the site owners a lot of money for the service.

The SEO industry is full of spammers and ignorance. There are GOOD and HONEST SEO people out there, but they are rare, and to find them you need to know what you are looking for in the first place, which is a skill in itself. When buying SEO, always understand exactly what you are buying. If you're hiring in the dark, you're probably helping to support the spam industry.

Other times the site owners just tell me to get over it and remove the links if it bothers me. This is frustrating, as it shows no empathy for how much time folk like us have to spend daily cleaning up other people's spam! The work can be laborious and frustrating, but it's also important: a clean forum and website leave a good impression on new visitors.

Awesome Moderators and Users

Some manual work is still necessary to clean up the little spam that slips through. Also, some spam is posted by real hired people rather than automated software, and it's never going to be easy to prevent that kind of "manual" spam automatically. Fortunately, because it's expensive to do, the volume of manual spam is small. We're also lucky on this site to have an excellent group of moderators and users! The moderators spend time helping us deal with any spam that does get through, and for that we are very grateful!

The same applies for our users who report spam when it gets through - a big thank you as well! All of this allows us to promptly clean up whenever something gets through.

The Problem with CAPTCHAs

CAPTCHAs are those boxes on websites that verify you're a human being by asking you to type in some words you see, or to answer a question that shows you're probably human.

[Image: a CAPTCHA gone wrong, captioned "Uhhhhhh....."]

The image above is of course an unusual case, but it illustrates the point well. Sometimes CAPTCHAs go wrong, and assuming your users can complete them reliably can be costly. They also take time to fill out and can be annoying. All of these factors will lose you signups.

On top of this, some websites I've tried to register on trap you in an endless washing machine of re-entering information. You squint and carefully enter the CAPTCHA. It's wrong! You re-enter it correctly and resubmit. You need to enter your password again! You enter your password again. Please re-enter the CAPTCHA code! No! I can't be bothered anymore! If you use a CAPTCHA on your website, it has to be implemented very carefully, as common implementations like this will lose you a lot of registrations.

Accessibility is another important, heavily debated issue with CAPTCHAs. It's really best to avoid them if possible. Also, some of them are solved so reliably by software that they provide no protection at all, which makes them a good way to frustrate all your users for no benefit.

Final Words

With the simple honeypot and the renamed entry-point pages, we now get one or two spammers a day. That's easily manageable with manual clean-up and well worth the prevention effort. These days it's also much more effective than a CAPTCHA.

A lot of spam prevention on a website is about staying ahead of the pack. Most people can't be bothered, or don't know how, to implement spam prevention techniques. This means that for a site that does, spammers will generally move on to easier and juicier targets. Promptly cleaning up any spam that gets through is another way of staying ahead of the pack.

Some spammers are paid humans, or are even backed by farms of human CAPTCHA solvers in poorer countries. There really is no easy way to block this kind of spam. The only thing we can do is discourage the behaviour by making it not cost-effective, and the way to do that is to clean up the spam as soon as it appears!

Comments

Jayjay

Great article, I was wondering how we could have less spam with more visitors... =P

Thursday, December 01, 2011 at 4:03:45 PM

Bigheti

Interesting. This demonstrates the seriousness and expertise of the Scirra team! Congratulations and thank you!

Thursday, December 01, 2011 at 4:12:44 PM

drpool

Interesting article, hope only non-spammers read it, you might have offered too much information here...

Thursday, December 01, 2011 at 4:26:12 PM

SullyTheStrange

Very clever ideas in there! I guess you have to know exactly how the spammers operate if you want to defeat them... Were you a spambot writer in a previous life, Tom? :P

Thursday, December 01, 2011 at 5:55:29 PM

Tom

@drpool I had considered it!

@Sully lol, no thankfully!

Thursday, December 01, 2011 at 5:56:45 PM

Sharpshooter

I love the "Know thine enemy" approach.

I think the best part of this post is the fact that this kind of information goes past learning a game engine. If more websites adopted ideas like this, it could do plenty to curtail spam and provide a better community.

Thursday, December 01, 2011 at 6:42:19 PM

Velojet

Thanks very much for sharing those ideas, Tom. I'll be using some on clients' sites now! Fully agree with your views on the SEO industry and on CAPTCHAs.

Thursday, December 01, 2011 at 7:41:55 PM

Velojet

Oh yes, and LOL at your CAPTCHA image (but not too far away from some I've seen).

Thursday, December 01, 2011 at 7:43:56 PM

tavitooo

Hi, I am a Spammer! I am not human, and I am concerned because your system is very powerful..

(Hilarious) hehe, great article! Scirra team are the best!!!.

Love scirra!!

Regards

Thursday, December 01, 2011 at 7:49:58 PM

Wastrel

Very interesting article [INSERT NAME HERE]! I totally agree with your viewpoint [INSERT NAME HERE]!

I think a new set of golf clubs would help out with your situation...

:D

Thursday, December 01, 2011 at 8:28:22 PM

freeg131

A great post Tom. Shows how you can help to prevent spam without inconveniencing real members and that by taking the proactive approach the time investment is considerably less than reacting to spam as it appears.

Also agree with your comments about SEO (I work in the industry). Since you can enter the marketplace with virtually no costs, and there is a big outsourcing culture, spammy links appear an awful lot and most webmasters will experience them. SEO end clients are often so disconnected from the actual link building process that they don't know what is actually being done.

CAPTCHAs are pretty useless nowadays when there are numerous services available that will solve them for you which can be easily integrated into forum spamming, or any other, software. They only serve to annoy legitimate users. I have come across CAPTCHAs containing equations, foreign characters and other general gubbins that would be impossible to type into the validation box.

As an aside, a nice thread I point people to whenever the topic of forum spam comes around - here's an example of why you shouldn't do it: http://www.phonelosers.com/index.php?topic=6764.30

Thursday, December 01, 2011 at 9:42:49 PM

Noga

Thanks for the tips Tom.
@freeg131 - I've read the whole topic, pretty hilarious:)

Thursday, December 01, 2011 at 11:27:49 PM

TiAm

CAPTCHAs suck.

Enough said.

Friday, December 02, 2011 at 6:29:02 PM

plauk

I'd already heard that SEO was mostly bunk, so it doesn't surprise me that it is a major source of spam. Ironically, it probably stems from Google's own anti-spam measures. In an effort to prevent people from trying to rig their results, they have to keep their magic formula a secret and change it on a regular basis. This inevitably creates an industry that promotes pseudoscientific 'cures' for something that probably shouldn't be cured.

Saturday, December 03, 2011 at 10:19:47 PM

cjr1974

awesome blog here!! I know as well as any webmaster that these spammers can destroy your site quickly... my site was ravaged by these spammers, so I had to rethink my steps and find a strategy for dealing with them... I found the idea of using hidden fields and it cleaned up my site immediately, that's awesome stuff!!

Wednesday, December 07, 2011 at 4:14:45 PM