A quick update to the Microsoft Rogue Bot Fiasco. It looks like they have now set up proper reverse DNS for the IP range that they are sending these bogus requests from. Previously, all of the IPs (which I first mentioned were all coming from the 65.55.165.* block) reverse DNS’d to names such as bl2sch1081901.phx.gbl. They have apparently changed this, so the IPs are more readily identifiable as coming from Microsoft, reverse DNS’ing to the Live.com domain, e.g. livebot-65-55-165-99.search.live.com.
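If you want to check this against the IPs in your own logs, a reverse lookup is easy to script. The following is only a rough sketch: the function names are mine, and treating `.search.live.com` as the one suffix that identifies the bot is an assumption based on the single hostname quoted above. The resolver is passed in as a parameter so the check can be exercised without touching the network.

```python
import socket

# Suffix taken from the hostname quoted in the post; assuming it is the
# only suffix the bot's IPs resolve to is a guess, not a confirmed fact.
LIVE_SUFFIX = ".search.live.com"


def reverse_dns(ip, resolver=socket.gethostbyaddr):
    """Return the reverse DNS (PTR) hostname for ip, or None if lookup fails."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
    return hostname.rstrip(".").lower()


def identifies_as_live(ip, resolver=socket.gethostbyaddr):
    """True if the IP's reverse DNS name ends in the Live.com search suffix."""
    hostname = reverse_dns(ip, resolver)
    return hostname is not None and hostname.endswith(LIVE_SUFFIX)
```

Keep in mind that whoever controls an IP block controls its reverse DNS, so a matching name by itself proves little. A stricter check would also resolve the returned hostname forward and confirm that it maps back to the same IP.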
I also saw someone report that the bot was coming from an additional IP block, 65.55.210.*, which also reverse DNS’s back to the Live.com domain. I do not know if requests from this range are carrying the same bogus referrer strings as those from the other block, as I do not have any requests in my own logs matching that IP block. If anyone else has been getting hits from other IP blocks, please comment on it here or yell at eKstreme about it (he’s been tracking this as well).
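For anyone who wants to look through their own logs for these two blocks, here is a minimal sketch. It assumes your server writes the common "combined" access log format (client IP first, referrer and user agent in quotes at the end); if your format differs, the regular expression will need adjusting. The function and constant names are made up for illustration.

```python
import ipaddress
import re

# The two blocks mentioned in this post, written as /24 networks.
BLOCKS = [
    ipaddress.ip_network("65.55.165.0/24"),
    ipaddress.ip_network("65.55.210.0/24"),
]

# Assumes the "combined" log format:
# ip ident user [time] "request" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"'
)


def hits_from_blocks(lines, blocks=BLOCKS):
    """Return (ip, referrer) pairs for log lines whose client IP is in blocks."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # line is not in the assumed format
        try:
            ip = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue  # first field was a hostname, not an IP address
        if any(ip in block for block in blocks):
            hits.append((str(ip), match.group(2)))
    return hits
```

Passing an open file object for your access log as `lines` should work, since the function only iterates over it line by line. The referrer in each returned pair is what you would compare against the bogus strings to see whether the second block behaves the same way as the first.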
So, why would Microsoft (MSFT) suddenly re-DNS the entire Class C block that they were hammering websites with referrer spam from, screwing up those sites’ stats and traffic logs, and potentially fucking with their AdSense earnings? My guess is that once the community bitching about all this got loud enough, someone in Microsoft’s legal department got wind of it and realized that what Microsoft was doing probably violated 99% of the ISP Terms Of Service or Acceptable Use Policies out there. For instance, if you were to read through the AUP for Level3 Communications (LVLT), you would see:
A User may not attempt to gain unauthorized access to, or attempt to interfere with or compromise the normal functioning, operation or security of, any portion of the Level 3 Network. A User may not use the Service to engage in any activities that may interfere with the ability of others to access or use the Service or the Internet. A User may not use the Service to monitor any data, information or communications on any network or system without authorization. A User may not attempt to gain unauthorized access to the user accounts or passwords of other Users.
Yeah, I would say that performing actions that could affect AdSense earnings counts as interfering with my ability to use the Internet, and ignoring robots.txt pretty much without a doubt falls under monitoring my system without authorization. I’m guessing they decided that if they were doing it under a faked identity (*.phx.gbl), they were more likely to get called on it for deceptive practices, whereas now, if it hits the news, they won’t look quite as bad. It will be interesting to see how this story unfolds as time goes on.