How to delete spam users from your site

June 22nd, 2010 3:41pm in delete spam, delete users, drupal, spam

So your users table has too many spam users because you forgot to configure Captcha on your user registration page? Yeah, one of our new clients had the same problem. There’s a ton of info online about preventing spam users from signing up, but not a whole lot on what to do AFTER they’re already in your production database.

The short of it: there’s no magical way to delete just the spam users, you have to spend some time working on it.

SQL is your friend

Get yourself Sequel Pro connect to a local copy of the database and fire up the query window.

Your Mileage May Vary

I performed a large number of test queries to come up with these, and in retrospect, they’re fairly basic but only obvious after some trial and error. Also, they will certainly work on a Drupal 6 database but YMMV on others. In principle, a users table in any other database should have the same or similar fields that you can use to dissect the data – so please share if you come up with variations.

1. Select users based on accounts created in quick succession and never accessed

Looking through the table I noticed that most spammy user accounts were created very close to each other in time. I suppose this is because there’s an automated script doing the signing up so it makes sens that they would be sequential. The following query will select all the users in your users table that were created within 5 seconds of each other and that were never used to log in.

select u.* from users u where u.access=0 and (((select created from users where uid=u.uid+1)-u.created)<5000)  

Turns out that 95% of spam users in the table (13,171 out of 13,879) were identified using just this one query! Here’s what a typical result looked like:

 

2. Select users based on repeated fields

It seems that spammers like to repeat junk field information. So I looked at the profile fields first and last names and noticed that the same value was being used for both first and last name fields. That doesn’t represent a real person, in most cases. So I ran the following query:

select (select value from profile_values where fid=1 and uid=v.uid) as first, (select value from profile_values where fid=2 and uid=v.uid) as last, (select value from profile_values where fid=4 and uid=v.uid) as addr, v.* from profile_values v  group by uid having first=last

and ended up with a nice group of users that were clearly spammers. And quite an international group too!

This eliminated another 3% of accounts that we didn’t have to manually go through. 

3. Select users with email address variations

Once you’ve deleted users using the first two queries, you’ll start to notice additional patterns emerging. For example, simply sorting by the email address field in the users table yields variations of the same email address. Why you would do a variation is beyond me, but this is a signature nonetheless. Query is trivial here, simply:

select * from users order by mail

And doing so yields some interesting variations that we can easily select and delete:

4. Select users that have the word “buy” in their profiles

Yes, it’s obvious. Have a look at the profile fields that are setup for your users and select any profiles that have the word “buy” in it. Or, “viagra”. Or, “v14gra”. But the word “buy” usually gets the remainder. Sample query:

select  p.*, u.* from users u, profile_values p where u.uid=p.uid and p.value like '%buy%' order by u.uid

And some sample results:

Use with caution

Word of warning: these are four simple queries that will allow you to select many users at a time in some cases. Always be really careful when doing bulk deletes: there’s no guarantee of zero false positives. So check your results. And play with a local non-production database first before hitting the real thing.

Flavor

Ten Seven, Interactive is a web studio in Minneapolis dedicated to helping you realize your design intentions accurately in XHTML.

Call 612 868 7884.
Send an email.
Follow on Twitter.

We use Basecamp, Highrise, Freshbooks and pair Networks to get things done!