Is Google Referrer Spamming Too Now?

Yesterday a friend of mine sent me a section of her traffic logs that was showing some odd information. According to what was recorded there, her brand new, as-yet unlinked-to website was ranking on the first page of Google for the single keyword [free]. If she actually had managed to rank for that word it would be an amazing feat, to say the least. The competition for that single word is enormous. Unsurprisingly, when you actually perform that search her site is nowhere to be found. The site in question is barely one week old, and hasn’t even been launched yet.

What is surprising, to me anyway, is that the traffic appears to be coming from a bot at Google… a bot that is cloaked, sending fake referrers, and behaving in exactly the same manner as MSN’s referrer spamming bot that first showed up a little over 2 years ago. I blogged about it back then, as did many others. Eventually, after much feedback from the community, they did halt the referrer spam practice. It was a bad idea for them to do it in the first place, and quite a few webmasters were perturbed about it. Two years was too long for it to go on, but at least they did finally stop doing it.

Now it looks like Google, for some unfathomable reason, has decided to start doing the exact same thing. The entries in my friend’s traffic logs looked like this:

74.125.126.81 – – [14/Feb/2010:16:34:03 -0600] “GET / HTTP/1.1” 200 19361 “http://www.google.com/search?hl=en&q=free&btnG=Google+Search&aq=f&oq=” “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)”

72.14.192.3 – – [14/Feb/2010:16:36:28 -0600] “GET / HTTP/1.1” 200 19361 “http://www.google.com/search?hl=en&q=free&btnG=Google+Search&aq=f&oq=” “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)”
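For anyone who wants to look for the same pattern in their own logs, here is a minimal Python sketch. It assumes the standard Apache "combined" log format with plain ASCII quotes and hyphens (the curly quotes and dashes above are just blog typography). The names `parse_line`, `search_query`, and `looks_like_disguised_bot` are made up for this illustration, and the two network blocks in `SUSPECT_NETWORKS` are my assumption of ranges covering the addresses above; confirm them against whois before relying on them.

```python
import ipaddress
import re
from urllib.parse import parse_qs, urlparse

# Apache "combined" log format: ip, identd, user, [time], "request",
# status, size, "referrer", "user-agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumption: illustrative blocks containing the two addresses from the
# log excerpt. Verify ownership with a whois lookup before trusting them.
SUSPECT_NETWORKS = [
    ipaddress.ip_network("74.125.0.0/16"),
    ipaddress.ip_network("72.14.192.0/18"),
]


def parse_line(line):
    """Return the named fields of one log line as a dict, or None."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def search_query(referrer):
    """Return the q= term if the referrer is a Google search URL, else None."""
    parsed = urlparse(referrer)
    host = parsed.hostname or ""
    if not (host == "google.com" or host.endswith(".google.com")):
        return None
    if parsed.path != "/search":
        return None
    values = parse_qs(parsed.query).get("q")
    return values[0] if values else None


def looks_like_disguised_bot(entry, networks=SUSPECT_NETWORKS):
    """True when a request from a listed network claims a search referrer
    but does not identify itself as Googlebot in the user-agent."""
    try:
        ip = ipaddress.ip_address(entry["ip"])
    except ValueError:
        return False  # the log recorded a hostname rather than an address
    from_network = any(ip in net for net in networks)
    claims_search = search_query(entry["referrer"]) is not None
    hides_identity = "Googlebot" not in entry["agent"]
    return from_network and claims_search and hides_identity
```

Run each log line through `parse_line` and then `looks_like_disguised_bot`. A match only means the request fits the pattern described here; it says nothing about why the request was made.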

The IPs in question definitely belong to Google (as can be seen here 74.125.126.81, and here 72.14.192.3). However, unlike normal Googlebot IPs, these are not associated with a Google domain name via DNS. For instance, if you do a host lookup on 66.249.71.233 you will see that it resolves to the hostname crawl-66-249-71-233.googlebot.com. The IPs that the referrer spam is coming from do not resolve to any hostname. Presumably, going on the logic that MSN gave when they were first called out for doing this, the reason for not having a reverse DNS entry associated with the IPs is to hide the fact that they actually are from Google. Similarly, the user-agent of these bots is being cloaked as well. Instead of actually identifying as Googlebot, “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)”, these bots are pretending to be an actual user using IE6 on Windows.
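The host lookup described above can be scripted. Below is a rough Python sketch of a forward-confirmed reverse DNS check: look up the hostname for the IP, check its suffix, then resolve that hostname back and make sure the original IP is among the results. The function name `is_verified_googlebot` and the accepted suffixes are assumptions made for this example. The two lookup functions are passed in as parameters so the logic can be exercised without touching the network.

```python
import socket


def is_verified_googlebot(ip,
                          reverse_lookup=socket.gethostbyaddr,
                          forward_lookup=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a claimed Googlebot address."""
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no reverse DNS entry, like the two IPs in the logs

    # Assumption: genuine crawler hostnames end in one of these suffixes.
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False

    try:
        # gethostbyname_ex returns (hostname, aliases, addresses).
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False

    # The reverse lookup only counts if the name maps back to the same IP.
    return ip in addresses
```

Called with just an IP, the function uses live DNS, so results depend on whatever the records say at the time. The forward lookup matters because anyone who controls the reverse zone for their own address space can make an IP claim any hostname they like.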

Unlike actual web surfers, Google, you have no expectation of privacy. When you are a bot, skulking around trying to disguise yourself as someone else is poor netiquette to say the least. I am not sure exactly what prompted you to start doing this, but you really should just stop.

Update:

Barry Schwartz of Search Engine Land contacted Google about this, and they replied that this is indeed them performing cloaked spidering. However, according to them it is not being done for spam detection purposes, and the particular referrers used were in error:

Turns out, we were running an experiment to detect malware targeting Hot Trends queries related to the Haiti crisis. Because this experiment was developed in response to an urgent situation we moved quickly and as a result used an incorrect Google search referrer which we’re now working to fix. Thanks for calling this issue to our attention and we apologize for any confusion we may have caused.

12 thoughts on “Is Google Referrer Spamming Too Now?”

  1. No, they’re just checking if your friend is serving different content based on referrer (think “doorways”).

  2. IF, and that’s a big IF, Google’s checking for cloaking pages with such blatantly faked requests, I’d call that sneaky, too. As for the IPs you’ve published, these could very well be assigned to AppEngine, which opens just another can of worms, but has nothing to do with spam filters.

  3. This may be a silly question. But did she have personalized search enabled?

    This will usually cause that kind of irregularity.

  4. Quick note to the inbred who was asking how I knew that the IP addresses belonged to Google… they are linked in the article. You can click on them and it will take you to the ARIN IP whois utility.

    If you are going to try to call people out on the interwebs about technical shit then you should probably educate yourself a little bit better about how it all works first. Just a suggestion though. If you want to know why your comment wasn’t approved, you can read about that here: On Freedom Of Speech And Social Media (A Quick Note To Anonymous Commenters).

  5. I think the information and the idea in this article are wrong.

    If Google really wanted to do anonymous spidering with cloaked user agents, they would simply buy new IPs and servers under another company name, instead of locking down reverse DNS for some already-bought IPs in some Google IP blocks. I mean, we’re talking about Google here, not some wannabe script kiddies.

  6. Hmm. Interesting observation in the log files. Still, it’s difficult to know why this is occurring. However, is it possible that Google is spidering as IE6 to understand how some web pages may break under that version, especially given Google’s move to HTML5 in its own apps?

  7. The reason they’re doing this is to look for malware. Most of the time when a script is uploaded that contains redirects to malware it will only respond to specific referrers or browsers. This is because scripts will often chain together their redirects in an attempt to hide from researchers.

    This is particularly common with pages that are landing pages for Google ads: malware authors will create the ads and have them point to a page that looks fine to the Google bot but, for someone with an exploitable browser coming from the right place, redirects to an exploit.

    So, for example:

    1) This page looks like an ad normally, but with the right info will redirect to 2
    2) If the browser is exploitable and the referrer was the original ad, redirect to 3
    3-?) If the browser is exploitable and the referrer is correct, redirect to next
    ?+1) exploit the browser.

    So it’s very important to find that initial page, since even if you know where the final exploit is you still need to have all the pieces to unlock it.

  8. @Robert – I posted an update with their reply a couple of hours ago, which does state that it is malware they are looking for, but I think I forgot to clear WP-Cache after I did so. My bad.
