A couple of days ago I posted my assertion that Rand Fishkin had lied about the details of the new Linkscape tool on SEOmoz. During the discussion that followed, Rand continued to maintain that they owned the bots that collected the data that powered the tool, despite several points on that being very unclear, and that his bots had collected those 30 billion pages.
Right in the heat of the argument, someone decided to drop a comment on my blog that struck me as a little odd for some reason:
So who’s behind dotnetdotcom.org? “few Seattle based guys” “Trust us” ? WTF!? Why are there absolutely no names on that site? – some guy called smallfish
I had looked at that site before when Rand had released all of the info as to where the data from his tool actually came from. I had dismissed it, since Rand was claiming to have 30 billion pages in his index. The download on this site was only for 3.2 million pages out of the initial 11 million pages that they had collected so far, what they were calling “the first part” of their index.
Since right at that moment Rand and I were arguing about whether or not Linkscape actually had a bot of its own that had collected the pages in their index, it hit me. “Aha!”, I thought. “Rand is probably going to reveal at some point that they actually own the DotBot. I mean, being able to say that you collected 11 million of the pages is better than having not collected any of them, right?”
So, I trotted off to dotnetdotcom.org to take a second look, just in case that turned out to be what was happening. Once I got there, I noticed that something was different. When I had visited the page on Friday, these were the stats I saw:
Saturday night, however, when I went to look, this is what I saw:
That’s right… smack in the middle of an argument between Rand and myself, where he was insisting that he owned a bot capable of spidering an index of the size he was boasting, one of the sources he listed (the one whose owners no one could identify) jumped 7 billion pages in size. Talk about your random coincidences.
What the counter on that page does is start with the date that the DotBot went online, which is June 10th, 2008, calculate the number of seconds between then and now, and use that as the starting point for how many pages it has spidered so far. It then counts up the display at one page per second.
All they did was add 7 billion pages to the start number and give a proportional boost to the other figures they are displaying as well (domains, robots.txt, and “clogged tubes”). They even left the clock counting up at 1 page per second. 😀
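To make that concrete, here is a minimal sketch of the kind of counter described above. It is not the site's actual script (which I have not reproduced here); the names `LAUNCH`, `displayed_pages`, and `offset` are made up for illustration, and midnight UTC on the launch date is an assumption.

```python
from datetime import datetime, timedelta, timezone

# DotBot go-live date as stated in the post; the exact time of day (midnight UTC) is assumed.
LAUNCH = datetime(2008, 6, 10, tzinfo=timezone.utc)


def displayed_pages(now: datetime, offset: int = 0) -> int:
    """Return the page count such a counter would display at time `now`.

    The count is simply the number of whole seconds since launch
    (one "page" per second) plus a fixed, hypothetical `offset`.
    """
    elapsed_seconds = int((now - LAUNCH).total_seconds())
    return offset + elapsed_seconds


if __name__ == "__main__":
    moment = LAUNCH + timedelta(days=120)  # roughly four months after launch
    before = displayed_pages(moment)
    after = displayed_pages(moment, offset=7_000_000_000)
    print(f"without offset: {before:,}")
    print(f"with offset:    {after:,}")
    # The display still advances by exactly one page per second either way.
    print(displayed_pages(moment + timedelta(seconds=1)) - before)
```

The point of the sketch is that padding the start number changes nothing about how fast the display ticks: it still goes up by one every second.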
Now, to put that in perspective… 7 billion pages at 1 page per second would in fact take 7 billion seconds to spider. This is not counting any processing or indexing time; this is just the collection of the raw data itself. That is:
116.7 million minutes
1.94 million hours
about 222 years (not counting leap years or time travel, of course)
in order for them to actually gather all of those pages. And they’re claiming that they did it in 4 months.
Uh huh. Right. 😀
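If you want to check the math yourself, here is a quick sketch. The function names are mine, the year is taken as 365 days (no leap years), and "4 months" is approximated as 120 days, which is an assumption on my part.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400
DAYS_PER_YEAR = 365  # leap years ignored, as in the post


def crawl_time(pages: int, pages_per_second: float = 1.0) -> dict:
    """Convert a page count into crawl time at a fixed rate."""
    seconds = pages / pages_per_second
    return {
        "seconds": seconds,
        "minutes": seconds / SECONDS_PER_MINUTE,
        "hours": seconds / SECONDS_PER_HOUR,
        "years": seconds / SECONDS_PER_DAY / DAYS_PER_YEAR,
    }


def required_rate(pages: int, days: float) -> float:
    """Pages per second needed to collect `pages` within `days`."""
    return pages / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    t = crawl_time(7_000_000_000)
    print(f"{t['minutes'] / 1e6:.1f} million minutes")
    print(f"{t['hours'] / 1e6:.2f} million hours")
    print(f"{t['years']:.0f} years")
    # Rate needed to gather 7 billion pages in roughly 4 months (assumed 120 days).
    print(f"{required_rate(7_000_000_000, 120):.0f} pages per second")
```

The last line shows the other side of the same coin: to really gather 7 billion pages in about four months, the counter would have to be ticking hundreds of times faster than the one page per second it displays.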
I figured even if my hunch about this being related to the Linkscape issue was wrong, it’s still noteworthy that a company that professes to be worried about making the internet “as open as possible” would be trying to pull a fast one like this.