panscient.com = bad bot

Posted 2007-05-23 in Spam by Johann.

Another one for the garbage can. It creates bad requests and doesn't respect robots.txt, although its operators claim it does.

38.99.203.110 … "GET / HTTP/1.1" … "panscient.com"
(robots.txt not asked for)
38.99.203.110 … "GET /;5B=/ HTTP/1.1" … "panscient.com" (WTF?)
38.99.203.110 … "GET /<prohibited directory> HTTP/1.1" … "panscient.com"

7 comments

#1 2008-01-03 by Jonathan Baxter

Hi Johann,

I just took a look at your robots.txt. I don't think these kinds of entries do what you want them to:

Disallow: /blablog/*month*

since wildcards are not permitted in "Disallow" lines. E.g., from the Web Server Administrator's Guide to the Robots Exclusion Protocol:

"Note also that regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "Disallow: /tmp/*" or "Disallow: *.gif".

Cheers,

Jonathan Baxter
CEO Panscient Inc

#2 2009-01-25 by Johann

Hello Jonathan,

thanks for coming over.

As for the wildcards, they do work with the bots of Google, Yahoo! and MSN.

#3 2009-01-25 by Jonathan Baxter

Looks like MSN won't work with your robots.txt but Yahoo and Google will. Our crawlers currently obey the standard robots protocol, but we'll look into implementing these extensions.
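
If they really do honor the standard protocol, a plain per-bot section should be enough to keep them out entirely. A minimal sketch (the User-agent token is assumed from the string the bot sends in the logs above):

User-agent: panscient.com
Disallow: /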

#4 2009-01-25 by Johann

You surely mean your crawlers obey robots.txt now, right?

#5 2009-01-25 by Mike

Hi Johann,

According to http://www.botslist.com/search?name=panscient, the x-rrobotstxt header shows that they do read the robots.txt file. But the x-cracker header shows that they requested a resource that does not exist (or no longer exists) on my server.

So maybe they are crawling from a very old cache or something.

#6 2009-11-26 by Jeff

BTW, the 38.0.0.0/8 block is U.S. government stuff and U.S. government contractors... Just to give you a heads-up on this.

#7 2009-11-26 by Johann

Jeff,

I'm pretty sure not all of 38.0.0.0/8 is US-government related. I've seen some dial-up blocks in there, too, if I'm not totally wrong.
