Mozilla/4.0 (compatible;)

Posted 2007-05-09 in Spam by Johann.

When I first published this entry in May 2007, I thought this user agent was just another web scraper:

… "GET / HTTP/1.1" 200 7518 "-" "Mozilla/4.0 (compatible;)" "-"
… "GET /help/copyright.html HTTP/1.1" 200 4127 "-" "Mozilla/4.0 (compatible;)" "-"
… "GET /help/sitemap.html HTTP/1.1" 200 4902 "-" "Mozilla/4.0 (compatible;)" "-"
… "GET /favicon.ico HTTP/1.1" 200 11502 "-" "Mozilla/4.0 (compatible;)" "-"
… "GET /misc/common.css HTTP/1.1" 200 894 "-" "Mozilla/4.0 (compatible;)" "-"
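
To get a feel for how many requests this user agent is responsible for, the access log can be filtered on the exact string. The following is only a rough Python sketch: it assumes the layout of the excerpt above, where the request, the referrer and the user agent are the first three double-quoted fields on each line. The names `SUSPECT_UA`, `parse_line` and `suspect_paths` are made up for this illustration.

```python
import re
from collections import Counter

# Exact User-Agent string from the log excerpt above.
SUSPECT_UA = "Mozilla/4.0 (compatible;)"

# Matches every double-quoted field on a log line.
QUOTED = re.compile(r'"([^"]*)"')


def parse_line(line):
    """Return (path, user_agent), or None if the line does not fit.

    Assumption: the first three quoted fields are the request line,
    the referrer and the user agent, as in the excerpt above.
    """
    fields = QUOTED.findall(line)
    if len(fields) < 3:
        return None
    request, user_agent = fields[0], fields[2]
    parts = request.split()
    if len(parts) != 3:  # expect: method, path, protocol
        return None
    return parts[1], user_agent


def suspect_paths(lines):
    """Count requested paths for lines sent with the suspect user agent."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[1] == SUSPECT_UA:
            counts[parsed[0]] += 1
    return counts
```

Passing an open log file to `suspect_paths` and printing `most_common(10)` on the result lists the paths this user agent asks for most often. The comparison is deliberately exact, because many ordinary browsers send user agents that merely start with `Mozilla/4.0 (compatible;`.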

Blue Coat proxies

After a little header analysis, I now know that these requests come from Blue Coat’s proxy products. These proxies seem to employ a pre-fetching strategy: they analyze pages as they download them and follow the links they find, so that future requests can be served from the proxy cache.

Who uses their proxies? I think Hewlett-Packard do, and I know Citigroup and Nokia do. In fact, judging from the entries in my header log file, I think a lot of companies have their proxies installed.

Blue Coat’s stealth crawling

I could live with the fact that their software makes a ton of highly speculative requests, but Blue Coat have also been stealth scanning my web site (most likely for malware), just like Symantec.
