Google has a realtime feed of Twitter

Posted 2009-07-04 in WWW by Johann.

I’ve just discovered that Google has a realtime feed of Twitter.

I twittered this:

Testing something highly interesting http://invx.com/a

…and seconds later, Googlebot showed up to grab the URL at invx.com:

66.249.65.83 invx.com [04/Jul/2009:21:49:39 +0200] "GET /a HTTP/1.1" 404 136 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
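To spot fetches like this in your own logs, you can parse each line and look at the user agent. The sketch below assumes the layout of the single line above: client IP, virtual host, [timestamp], "request", status, bytes, "referer", "user agent", and one extra field. That layout is inferred from one sample and is an assumption, so adjust the regex to your server's actual log format. The names `parse_line` and `claims_googlebot` are hypothetical helpers. A matching user agent only shows that the client *claims* to be Googlebot.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the sample line:
# ip vhost [time] "METHOD path protocol" status bytes "referer" "agent" ...
LOG_LINE = re.compile(
    r'(?P<ip>\S+) (?P<host>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # e.g. 04/Jul/2009:21:49:39 +0200 -> timezone-aware datetime
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry

def claims_googlebot(entry):
    """True if the user agent claims to be Googlebot (easily spoofed)."""
    return "Googlebot" in entry["agent"]

sample = ('66.249.65.83 invx.com [04/Jul/2009:21:49:39 +0200] "GET /a HTTP/1.1" '
          '404 136 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
          '+http://www.google.com/bot.html)" "-"')

entry = parse_line(sample)
print(entry["ip"], entry["path"], entry["status"], claims_googlebot(entry))
# prints: 66.249.65.83 /a 404 True
```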

You don’t need to check: the IP address does belong to Google. Another URL I twittered earlier was crawled as well, so this isn’t just Google crawling the public timeline.
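If you do want to check an address yourself, Google's documented approach is forward-confirmed reverse DNS: look up the host name for the IP, make sure it is under googlebot.com or google.com, then resolve that host name and confirm it maps back to the same IP. The sketch below uses Python's standard `socket` resolvers. The domain list is an assumption, so check Google's current crawler-verification page, and `is_google_ip` is a hypothetical helper name. The resolvers are parameters so the logic can be exercised without network access.

```python
import socket

# Assumed list of Googlebot host-name suffixes; verify against Google's docs.
GOOGLE_DOMAINS = (".googlebot.com", ".google.com")

def is_google_ip(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: IP -> host name -> list of IPs."""
    try:
        host, _aliases, _addrs = reverse(ip)        # PTR lookup
    except (socket.herror, socket.gaierror):
        return False
    if not host.rstrip(".").endswith(GOOGLE_DOMAINS):
        return False                                # not a Google host name
    try:
        _name, _aliases, addrs = forward(host)      # A lookup on that name
    except (socket.herror, socket.gaierror):
        return False
    return ip in addrs                              # must point back to the same IP
```

Calling `is_google_ip("66.249.65.83")` performs live DNS lookups, so its result depends on Google's DNS records at the time you run it. The forward lookup matters: anyone can set a PTR record claiming to be googlebot.com, but only Google controls what those names resolve to.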

Some of my thoughts:

  • One more sign that “nofollow” doesn’t mean anything.
  • I wonder if Google pays Twitter for the feed.
  • Maybe twittering URLs can even get you indexed?

5 comments

#1 2009-07-08 by Tim Mahoney

Very cool. Considering that the URLs people tweet are usually bit.ly or tr.im URLs, I wonder if Google follows those links through to their final destination...

#2 2009-07-08 by Johann

Tim,

I tested it, and the only link Google crawled was the one shortened with ow.ly, so they probably have tinyurl, bit.ly, cli.gs and is.gd on a blacklist.

#3 2009-07-08 by Johann

Correction: 5 minutes later, the cli.gs link was also crawled.

#4 2009-07-09 by York

Johann,

Google crawls everything, even banned sites. tinyurl links have sometimes been spotted in the old version of Google Webmaster Tools, so Google must know the targets of those links. Currently, the tools seem to list only links that do not go through a redirect. The tinyurl service has no robots.txt file that disallows crawling. I don't know about the other URL shortening services.

The nofollow attribute was not introduced to prevent crawling; the name is misleading. I think nofollow links will be crawled when Googlebot discovers them, just like ordinary links.

#5 2009-07-10 by Johann

York,

Of course nofollow does not necessarily prevent discovery, but I found it strange that Google has apparently blacklisted some URL shortening services.
