Smackdown!
/sigh
Ok Jason, we get it, you’re desperate. But stealing content from Wikipedia in order to replace what you deleted? Come on!
I am flipping through Mahalo.com today, just seeing if
Jason Calacanis replied to my post from yesterday. In it he discusses how he is indeed deleting many of the spammy pages that I had pointed out. Some, like the duplicate content doorway pages, he continues to defend. Either way, progress is being made.
However, he still kinda kills it by tossing in at the end that this whole scrutiny of his site is “absurd”, and that anyone who calls him on it is being “vicious”:
Last week, after Matt Cutts gave Jason Calacanis a warning about Mahalo.com’s spammier pages (and probably a few stern looks as well), Jason changed a few items. He had them rename their spambot from “searchclick” to “stub”, thinking a less obvious name would throw off anyone looking into the spam situation. Very briefly they added a noindex meta tag to the content-less pages (a change that they then undid after just one day, of course). Probably the biggest change that they made, however, is that they decided to actually turn off (for now anyways) the bot that was creating all of those pages that were nothing more than scraped content.
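As an aside, the noindex part of that list is trivial to verify from the outside at any point: fetch a page and look for a robots meta tag. Here is a rough sketch of that check in Python; the URL is just a placeholder, not one of the actual Mahalo stub pages.

# Rough sketch: check whether a page carries a robots "noindex" meta tag.
# The URL below is a placeholder, not one of the actual Mahalo pages.
import re
import urllib.request

def has_noindex(url):
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
    # Match <meta name="robots" ... content="...noindex..."> in either attribute order.
    pattern = re.compile(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex[^"\']*["\']'
        r'|<meta[^>]+content=["\'][^"\']*noindex[^"\']*["\'][^>]+name=["\']robots["\']',
        re.IGNORECASE,
    )
    return bool(pattern.search(html))

print(has_noindex("http://example.com/some-stub-page"))  # True if the tag is present

Run something like that against a handful of the stub pages before and after a change like this, and you can see exactly how long the noindex actually stayed in place.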
What then, you may ask yourself, is Jason going to replace all of these pages with, exactly? I know that’s what I was asking. As I pointed out
Last week at SMX West, during the Ask The Search Engines panel, moderator Danny Sullivan asked Matt Cutts why he didn’t ban Mahalo.com for spamming Google. Matt stated that he had talked to Jason Calacanis, Mahalo.com CEO, about the issues, and warned him that Google might “take action” if Jason didn’t make some changes to the spammy side of Mahalo. Matt also made the following statement, in reference to Aaron Wall’s post on the subject:
All the pages Aaron pointed out now have noindex on them. – Matt Cutts
Matt was referring to all of the autogenerated pages that both Aaron and I blogged about in our posts, the ones with
Last month Jason Calacanis wrote a rather sarcastic post aimed at Aaron Wall, which I am assuming was written in response to Aaron’s post, “Black Hat SEO Case Study: How Mahalo Makes Black Look White!”. In that post Aaron discusses how sites composed largely of nothing more than auto-generated pages wrapped in AdSense can get accepted, and even gain authority, in Google if they have enough financing and press. Jason’s rebuttal included a claim about rankings that Mahalo had “earned” (and I use the term loosely) for “VIDEO GAME walkthrough”. I originally misinterpreted what he was trying to say, and thought that he meant rankings for that exact phrase. I commented that this wasn’t exactly a great accomplishment before realizing that what he actually meant was rankings for [{insert video game name} walkthrough], and that Mahalo has a couple of top 10 rankings for that genre of search phrases.
Jason sent me an email to correct me on what he was talking about. We replied to each other back and forth a couple times, and a few very interesting things were revealed in that conversation:
Yesterday a friend of mine sent me a section of her traffic logs that was showing some odd information. According to what was recorded there, her brand new, as-yet-unlinked-to website was ranking on the first page of Google for the single keyword [free]. If she actually had managed to rank for that phrase it would be an amazing feat, to say the least; the competition for that single word is enormous. Unsurprisingly, when you actually perform that search her site is nowhere to be found. The site in question is barely one week old, and hasn’t even been launched yet.
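To be clear about where a “ranking” like that comes from: it is inferred entirely from the Referer header recorded with each hit, and any client can send whatever referrer it likes. A rough sketch of that inference in Python, assuming Apache-style combined log lines (the sample line is made up for illustration):

# Rough sketch: pull the Google keyword and results page out of a referrer the
# way most log analyzers do. The sample log line is fabricated for illustration.
import re
from urllib.parse import urlparse, parse_qs

SAMPLE_LOG_LINE = (
    '1.2.3.4 - - [10/Feb/2009:12:00:00 -0500] "GET / HTTP/1.1" 200 5120 '
    '"http://www.google.com/search?q=free" "Mozilla/5.0"'
)

def keyword_from_log_line(line):
    # The referrer is the second-to-last quoted field in combined log format.
    quoted = re.findall(r'"([^"]*)"', line)
    referrer = quoted[-2] if len(quoted) >= 2 else ""
    parsed = urlparse(referrer)
    if "google." not in parsed.netloc:
        return None
    query = parse_qs(parsed.query)
    keyword = query.get("q", [None])[0]
    # No "start" parameter means page one of the results, which is all a
    # "first page ranking" in a log file really amounts to.
    page = int(query.get("start", ["0"])[0]) // 10 + 1
    return keyword, page

print(keyword_from_log_line(SAMPLE_LOG_LINE))  # ('free', 1)

In other words, the log entry only proves that something sent a request claiming to have come from a Google search for [free]; it says nothing about whether that search result actually exists.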
What is surprising, to me anyways, is that it appears that the traffic is actually coming from a bot at Google… a bot that is cloaked, sending fake
Yesterday a friend of mine, Sebastian, wrote a post titled, “How do Majestic and LinkScape get their raw data?”. Basically it is a renewed rant about SEOmoz and their deceptions surrounding the Linkscape product that they launched back in October 2008, a little over 15 months ago. The controversy centers on the fact that moz essentially lied about how exactly they were obtaining their data, which was probably motivated in part by wanting to make themselves look more technically capable than they actually are.
Now, I covered this back when the launch actually happened, in this Linkscape post, which resulted in quite a few comments, and there was more than a little heated conversation in the Sphinn thread as well. This prompted some people, both on Sebastian’s post and in the Sphinn thread on it, to ask: why all of the renewed interest?
It is not extreme, it’s just that it isn’t new. The fact that they bought the index (partially)? That was known from the beginning. The fact that they don’t provide a satisfying way of blocking their bots (or the fact that they didn’t want to reveal their bots user agent)? Check. The fact that they make hyped statements to push Linkscape? Check. {…} I don’t get the renewed excitement. – Branko, aka SEO Scientist
Well, I guess you could say that it’s my fault. Or, you could blame it on SEOmoz themselves, or their employees, depending on how you look at it. You see, the story goes like this…
Back when SEOmoz first launched Linkscape, it would have been damn near impossible for a shop their size to have performed the feats they were claiming, all on their own. Rand was making the claim “Yes – We spidered all 30 billion pages”. He also claimed to have done it within “several weeks”. Now, even if we stretch “several” to mean something that it normally would not, say, 6 (since a 6 week update period is now what they are claiming for the tool), we’re still talking about a huge amount of resources to accomplish that task. A conservative estimate for the average web page, considering only the HTML, is 25KB of text:
30,000,000,000 pages x (25 x 1024) bytes per page = 768,000,000,000,000 bytes of data (768 trillion bytes, which is 698.4TB)
(698.4TB / 45 days of crawling) x 30 days in a month = 465.6TB bandwidth per month
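For anyone who wants to play with the assumptions (page count, average page size, crawl window), the back-of-the-envelope math above is easy to reproduce. A quick sketch in Python, using the same numbers:

# Back-of-the-envelope reproduction of the crawl estimate above, using the same
# assumptions: 30 billion pages, 25KB of HTML per page, a 45-day crawl window.
PAGES = 30_000_000_000
AVG_PAGE_BYTES = 25 * 1024        # 25KB of HTML per page (conservative)
CRAWL_DAYS = 45

total_bytes = PAGES * AVG_PAGE_BYTES
total_tb = total_bytes / 1024**4              # bytes -> TB
per_month_tb = total_tb / CRAWL_DAYS * 30     # normalize to a 30-day month

print(f"total crawl size: {total_tb:,.1f}TB")          # roughly 698TB
print(f"bandwidth per month: {per_month_tb:,.1f}TB")   # roughly 466TB

And that is just the bandwidth to pull the raw HTML down once, before any of the processing on top of it.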
Now, I know that one of the reasons that Rand can get away with some of his claims is that most people just don’t grasp the sheer size
Today over at ReadWriteWeb Sarah Perez wrote an article on how Google was gaining ground on their share of the search market. In the article she talked about the latest buzz from the Google Analytics blog having to do with changes to the way Google.com handles clicks in their serps, changes which were implemented as a result of what Google would have broken in analytics packages by implementing AJAX-driven search results. She notes that even though the speed benefit Google gains from going AJAX would be minimal on a per-search basis, when multiplied by the millions of searches performed every day it would eventually add up to more of a market share for them.
Although a change to AJAX technology would only make searches milliseconds faster, those milliseconds add up, allowing people to do more searches, faster. And that would let Google grow even more, eating up percentage points along the way. – Sarah Perez
However, what was missed by many
Last month I blogged about the fact that I had noticed Google playing around with delivering the serps via AJAX. I pointed out that due to the way referrers work, using AJAX to generate the pages would cause all traffic coming from Google to look like it was coming from Google’s homepage instead of from a search. This means in turn that analytics packages, including Google Analytics, would no longer be able to track which keywords searched on in Google were sending traffic to webmasters’ websites. There was a bit of a buzz about it, and Google seemed to stop the testing shortly thereafter. Google’s only reply on the subject was “sometimes we test stuff”, a pointer to a post from three years ago that also said “sometimes we test stuff”, and a statement that they didn’t intend to break referrer tracking, and that was it.
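The mechanics behind that breakage are straightforward: if the AJAX results page keeps the query in the URL fragment (everything after the #), the browser strips the fragment before sending the Referer header, so the web server only ever sees the bare google.com homepage URL. A quick sketch of what the usual keyword-extraction step sees in each case (both referrer strings are illustrative, not captured from real traffic):

# Rough sketch of why fragment-based AJAX result URLs break keyword tracking.
# Both referrer strings below are illustrative, not captured from real traffic.
from urllib.parse import urlparse, parse_qs

def keyword_from_referrer(referrer):
    return parse_qs(urlparse(referrer).query).get("q", [None])[0]

# Classic results page: the query string survives into the Referer header.
classic = "http://www.google.com/search?q=some+keyword"
print(keyword_from_referrer(classic))            # 'some keyword'

# AJAX results page with the query in the fragment: browsers strip the
# fragment before sending the Referer, so this is all the server receives.
ajax_as_received = "http://www.google.com/"
print(keyword_from_referrer(ajax_as_received))   # None -- the keyword is gone

That None is every keyword report, in every analytics package including Google’s own, going blank for Google organic traffic.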
Shortly thereafter, the tests
On Friday I wrote a piece on how it looked like Google was testing AJAX results in the main serps. Some discussion followed as to whether, if this change were to become a widespread permanent one, it would affect existing Firefox plugins (definitely some of them would stop working) or break some of the rank checking tools out there (they would have to be re-written, I’m sure), and some people even asked whether it would thwart scrapers from using the serps for auto page generation (not for long, no).
While those things would definitely be affected in at least the short term, there is a much greater impact from Google switching to AJAX. All of the issues mentioned involve a very small subset of the webmastering community. What actually breaks if Google makes this switchover, and is in fact broken during any testing they are doing, is much more widespread. Every single