Yesterday a friend of mine sent me a section of her traffic logs that showed some odd information. According to those records, her brand-new, as-yet-unlinked website was ranking on the first page of Google for the single keyword [free]. If she had actually managed to rank for that term, it would be an amazing feat, to say the least; the competition for that single word is enormous. Unsurprisingly, when you actually perform that search, her site is nowhere to be found. The site in question is barely one week old and hasn’t even been launched yet.
What is surprising, to me anyway, is that the traffic appears to be coming from a bot at Google… a bot that is cloaked, sends fake referrers, and behaves exactly like MSN’s referrer-spamming bot that first showed up a little over two years ago. I blogged about it back then, as did many others. Eventually, after much feedback from the community, MSN halted the referrer spam. It was a bad idea in the first place, and quite a few webmasters were perturbed about it. Two years was too long for it to go on, but at least they finally stopped.
Now it looks like Google, for some unfathomable reason, has decided to start doing the exact same thing. The entries in my friend’s traffic logs looked like this:
18.104.22.168 - - [14/Feb/2010:16:34:03 -0600] "GET / HTTP/1.1" 200 19361 "http://www.google.com/search?hl=en&q=free&btnG=Google+Search&aq=f&oq=" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
22.214.171.124 - - [14/Feb/2010:16:36:28 -0600] "GET / HTTP/1.1" 200 19361 "http://www.google.com/search?hl=en&q=free&btnG=Google+Search&aq=f&oq=" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
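Referrers like the ones above are easy to pull out of a log programmatically. Here is a minimal Python sketch, assuming the server writes Apache's standard "combined" log format (which these entries appear to use) with ordinary straight quotes. The helper names `parse_line` and `google_search_terms` are hypothetical, and the sketch only recognizes google.com search URLs, not country-specific Google domains.

```python
import re
from urllib.parse import urlparse, parse_qs

# Apache/NCSA "combined" format:
# host ident user [time] "request" status bytes "referrer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one combined-format log line as a dict, or None."""
    match = COMBINED_RE.match(line)
    return match.groupdict() if match else None


def google_search_terms(referrer):
    """Return the q= terms if referrer looks like a google.com search URL, else None."""
    url = urlparse(referrer)
    host = url.hostname or ""  # hostname is None for a bare "-" referrer
    if not (host == "google.com" or host.endswith(".google.com")):
        return None
    if url.path != "/search":
        return None
    terms = parse_qs(url.query).get("q")
    return terms[0] if terms else None


line = ('18.104.22.168 - - [14/Feb/2010:16:34:03 -0600] "GET / HTTP/1.1" 200 19361 '
        '"http://www.google.com/search?hl=en&q=free&btnG=Google+Search&aq=f&oq=" '
        '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"')
entry = parse_line(line)
print(google_search_terms(entry["referrer"]))  # prints: free
```

Counting entries per search term makes an implausible referrer, such as a week-old site "ranking" for [free], stand out quickly.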
The IPs in question definitely belong to Google (as can be seen here 126.96.36.199, and here 188.8.131.52). However, unlike normal Googlebot IPs, these are not associated with a Google domain name via reverse DNS. For instance, if you do a host lookup on a genuine Googlebot IP such as 66.249.71.233, you will see that it resolves to the hostname crawl-66-249-71-233.googlebot.com. The IPs that the referrer spam is coming from do not resolve to any hostname. Presumably, going on the logic MSN gave when they were first called out for doing this, the reason for not having reverse DNS on these IPs is to hide the fact that they actually belong to Google. Similarly, the user-agent of these bots is cloaked as well. Instead of identifying as Googlebot, “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)”, these bots pretend to be an actual user running IE6 on Windows.
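The hostname check described here can be automated. Below is a minimal Python sketch of the reverse-then-forward DNS check that Google recommends for verifying Googlebot: reverse-resolve the IP, make sure the name ends in googlebot.com or google.com, then forward-resolve that name and confirm it points back to the same IP. The function name `is_verified_googlebot` and its pluggable resolver parameters are hypothetical, added only so the logic can be exercised without live DNS; by default it uses the standard `socket.gethostbyaddr` and `socket.gethostbyname_ex`.

```python
import socket

# Google's crawler hosts reverse-resolve to names such as
# crawl-66-249-71-233.googlebot.com.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip,
                          reverse_lookup=socket.gethostbyaddr,
                          forward_lookup=socket.gethostbyname_ex):
    """Return True only if ip passes a reverse-then-forward DNS check."""
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:
        # socket.herror / socket.gaierror (both OSError subclasses):
        # no PTR record, so the IP cannot be tied to a Google hostname.
        return False

    if not hostname.endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False

    try:
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        return False

    # Forward confirmation: anyone can point their own PTR record at a
    # *.googlebot.com name, but only Google controls what that name resolves to.
    return ip in addresses
```

By this test the referrer-spamming IPs fail at the very first step, since they have no reverse DNS at all, even though the addresses belong to Google. In other words, the check can confirm that a visitor is Googlebot, but it cannot prove that an unverified bot isn't Google.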
Unlike actual web surfers, Google, you have no expectation of privacy. When you are a bot, skulking around disguised as someone else is poor netiquette, to say the least. I am not sure exactly what prompted you to start doing this, but you really should stop.
Barry Schwartz of Search Engine Land contacted Google about this, and they replied that it was indeed them performing cloaked spidering. However, according to Google, it is not being done for spam detection purposes, and the particular referrers used were an error:
Turns out, we were running an experiment to detect malware targeting Hot Trends queries related to the Haiti crisis. Because this experiment was developed in response to an urgent situation we moved quickly and as a result used an incorrect Google search referrer which we’re now working to fix. Thanks for calling this issue to our attention and we apologize for any confusion we may have caused.