panscient.com = bad bot
Posted 2007-05-23 in Spam by Johann.
Another one for the garbage can. It sends malformed requests and doesn't respect robots.txt, although they claim to do so.
38.99.203.110 … "GET / HTTP/1.1" … "panscient.com" (robots.txt not asked for)
38.99.203.110 … "GET /;5B=/ HTTP/1.1" … "panscient.com" (WTF?)
38.99.203.110 … "GET /<prohibited directory> HTTP/1.1" … "panscient.com"
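A quick way to deal with such a crawler is to filter on the user-agent field of the access log and collect the offending IPs for blocking. A minimal sketch — the log layout and the `panscient.com` user-agent substring are assumptions based on the lines above:

```python
import re

# Any access-log line whose user-agent field mentions panscient.com
BAD_BOT = re.compile(r'panscient\.com')

def offending_ips(log_lines):
    """Collect client IPs (first field) of requests made by the bad bot."""
    return {line.split()[0] for line in log_lines if BAD_BOT.search(line)}

log = [
    '38.99.203.110 - - [23/May/2007] "GET / HTTP/1.1" 200 512 "-" "panscient.com"',
    '10.0.0.1 - - [23/May/2007] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(offending_ips(log))  # IPs to deny, e.g. via .htaccess or iptables
```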
7 comments
#1 2008-01-03 by Jonathan Baxter
Hi Johann,
I just took a look at your robots.txt. I don't think these kinds of entries do what you want them to do:
Disallow: /blablog/*month*
since wildcards are not permitted in "Disallow" lines. E.g., from the Web Server Administrator's Guide to the Robots Exclusion Protocol:
"Note also that regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "Disallow: /tmp/*" or "Disallow: *.gif"."
Cheers,
Jonathan Baxter
CEO Panscient Inc
#2 by Johann
Hello Jonathan,
thanks for coming over.
As for the wildcards, they do work with the bots of Google, Yahoo! and MSN.
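For reference, those engines treat `*` in a Disallow path as "match any sequence of characters" and `$` as end-of-URL. A rough sketch of that matching logic — an illustration of the extension, not any engine's actual implementation:

```python
import re

def robots_pattern(disallow_path):
    """Translate a Disallow value with * and $ (Google/Yahoo!/MSN-style) into a regex."""
    pattern = re.escape(disallow_path).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"  # anchor at end of URL
    return re.compile(pattern)

def is_disallowed(url_path, disallow_path):
    """True if the URL path is blocked by the given Disallow rule."""
    return robots_pattern(disallow_path).match(url_path) is not None

print(is_disallowed("/blablog/2007/month-05/", "/blablog/*month*"))  # True
print(is_disallowed("/blablog/about/", "/blablog/*month*"))          # False
```

Crawlers that only implement the original standard match Disallow values as literal path prefixes, so they would ignore a rule like `/blablog/*month*` entirely.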
#3 2009-01-25 by Jonathan Baxter
Looks like MSN won't work with your robots.txt but Yahoo and Google will. Our crawlers currently obey the standard robots protocol, but we'll look into implementing these extensions.
Hi Johann,
According to http://www.botslist.com/search?name=panscient, the x-rrobotstxt header shows that they do read the robots.txt file. But the x-cracker header shows that they requested a resource that does not exist (or no longer exists) on my server.
So maybe they are crawling from a very old cache or something.
#6 2009-11-26 by Jeff
BTW, 38. is U.S. government stuff and U.S. government contractors... Just to give you a heads-up on this.