Block empty User Agent headers

Posted 2008-04-19 in Spam by Johann.

Blocking requests without a User-Agent header is a simple step toward reducing web abuse. As I've shown before, such requests can make up a significant share of traffic.

On this server, 985 requests without a user agent were made in the last four weeks, which constitutes 6 % of the 14,388 blocked requests. 6 % may not sound like much, but once I started whitelisting user agents, the share of blocked requests rose from under 1 % to over 4 %. Unless you also whitelist user agents and block entire netblocks as aggressively as I do, your number may well be higher.

lighttpd

Blocking empty user agents is simple in lighttpd. Edit your lighttpd.conf as follows:

Ensure that mod_access is loaded:

server.modules = (
    "mod_accesslog",
    "mod_access",
    … other modules …
)

Add the following line:

$HTTP["useragent"] == "" {
    url.access-deny = ( "" )
}

Reload the lighttpd configuration and you’re done.
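The rule itself is trivial: if the User-Agent header is empty or missing, answer 403 Forbidden. Here is a self-contained sketch of that logic in Python (a hypothetical demo server, purely to illustrate what the web server configuration does — not part of lighttpd):

```python
# A minimal, testable sketch of the rule the config expresses:
# answer 403 Forbidden whenever the User-Agent header is empty or absent.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class BlockEmptyUA(BaseHTTPRequestHandler):
    def do_GET(self):
        # Same check as $HTTP["useragent"] == "":
        # an absent header counts as empty here.
        if self.headers.get("User-Agent", "").strip() == "":
            self.send_response(403)   # what url.access-deny returns
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")

    def log_message(self, *args):     # silence request logging for the demo
        pass

server = HTTPServer(("127.0.0.1", 0), BlockEmptyUA)
threading.Thread(target=server.serve_forever, daemon=True).start()

def status(headers):
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    conn.request("GET", "/", headers=headers)
    code = conn.getresponse().status
    conn.close()
    return code

normal_status = status({"User-Agent": "Mozilla/5.0"})  # real browsers pass
blocked_status = status({})                            # no User-Agent sent
server.shutdown()
print(normal_status, blocked_status)
```

A request with a normal user agent gets a 200, while one without the header is rejected with a 403 — exactly what the lighttpd snippet above enforces.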

Apache

Enable mod_rewrite and add the following to your configuration:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule ^.* - [F]

Contributed by Andrew.

If you use a web server other than lighttpd or Apache, please add the configuration to this entry. Thanks.
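For example, in nginx (a sketch — verify against your own setup) the equivalent check would go inside a server block; `$http_user_agent` is the empty string when the header is absent:

```nginx
# Return 403 when the User-Agent header is empty or missing
if ($http_user_agent = "") {
    return 403;
}
```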

7 comments

#1 2008-04-21 by Awesome AnDrEw

Google is guilty of User-Agent masking as well when someone uses the mobile proxy service on your website. The Google service makes a request from 64.233.160.136 identifying itself as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Google Wireless Transcoder;)", and then no more than a second later uses 72.14.195.56 and a blank User-Agent to request all embedded content such as images. I've read that the mobile Google page is often used by content scrapers, and so I've gone ahead and restricted access from that as well.

#2 2008-04-21 by Awesome AnDrEw

Sorry to post again so quickly, but I just noticed upon posting that I included an unnecessary [OR] flag, because I took it from my own htaccess. That's not required unless you're blocking more than one user-agent (which you probably would be anyway).

#3 2008-04-23 by Awesome AnDrEw

I also didn't notice that last line before, but figured I could help with the Apache configuration. Assuming mod_rewrite is enabled, all that's required is the following:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteRule ^.* - [F]

#4 2009-03-09 by Wiz

RewriteCond %{HTTP_REFERER} ^$ [NC]
RewriteCond %{HTTP_USER_AGENT} ^$ [NC]
RewriteRule .* - [F]

#5 2009-03-09 by Johann

Wiz,

thanks for the code. Can you explain what the first line does?

#6 2009-03-09 by Wiz

Johann;
I should have given an explanation when I posted this, but it was late and I was tired.

RewriteCond %{HTTP_REFERER} ^$ [NC]
This line matches a blank REFERER field. It is ANDed with the following rule, which detects a blank USER AGENT:
RewriteCond %{HTTP_USER_AGENT} ^$ [NC]

When BOTH conditions are met I send them a 403 status. What isn't shown in my sample rule is the necessity to whitelist certain IP ranges used by known good bots, which are cloaked. Without going into the semantics of cloaked visits, here is how I allow cloaked MSN bots through this expanded ruleset:

# Allow MSN cloaked bots
RewriteCond %{REMOTE_ADDR} !^(207\.46\.|131\.107\.)
# Allow requests for the document HEAD only
RewriteCond %{REQUEST_METHOD} !^HEAD$
# Allow requests for your FavIcon from AOL and other cloaked UA and REFERER
RewriteCond %{REQUEST_URI} !/favicon\.ico$

The remaining rules are already explained and define the blocking conditions.

I hope this helps add clarity to my contribution.
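[Editor's note: assembled into one ruleset, the conditions Wiz describes would read as follows — a sketch, with the IP alternation grouped so the anchor and the negation cover both ranges:]

```apache
RewriteEngine on
# Skip blocking for cloaked MSN bot ranges, HEAD requests, and favicon fetches
RewriteCond %{REMOTE_ADDR} !^(207\.46\.|131\.107\.)
RewriteCond %{REQUEST_METHOD} !^HEAD$
RewriteCond %{REQUEST_URI} !/favicon\.ico$
# Block only when BOTH Referer and User-Agent are empty
RewriteCond %{HTTP_REFERER} ^$ [NC]
RewriteCond %{HTTP_USER_AGENT} ^$ [NC]
RewriteRule .* - [F]
```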

#7 2009-03-09 by York

Hello Johann,

I think the unnecessary [OR] flag in the .htaccess recipe might even prevent it from working its magic on some servers, if this is the only RewriteCond statement.

If you have a list of bots that should be blocked, each line has an [OR] flag, except the last line, like this:

RewriteCond %{HTTP_USER_AGENT} ^ScumBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^AnotherIntruder [OR]
RewriteCond %{HTTP_USER_AGENT} ^$
