Jason Calacanis replied to my post from yesterday. In it he discusses how he is indeed deleting many of the spammy pages that I had pointed out. Some, like the duplicate content doorway pages, he continues to defend. Either way, progress is being made.
However, he still kinda kills it by tossing in a bit at the end about how this whole scrutiny on his site is “absurd”, and anyone who calls him on it is being “vicious”:
At the end of the day the absurd microscope we’re being put under by the SEO community is actually making our product better, and yes even improving the SEO of our best quality pages. For that, I thank you guys.
That being said, you really don’t have be so vicious about it. – Jason Calacanis
Jason, there is nothing “absurd” about this shit at all. Ya know, that whole Thou Dost Protest Too Much line? That applies here.
You’re sitting there calling this “vicious” and yet continuing to lie about the situation. You appear to be forgetting that I am not some moron who doesn’t have the facts. You made a statement over on HN and are making a similar statement here trying to imply that other people put all of those short pages on your site. This isn’t the case, and ffs, it was explained on Mahalo itself how they got there in the first place. Those pages that I blogged about, and that you are now saying are being noindexed (and which should all be getting removed, not just noindexed), were generated by a bot, one which ran on your site, one that turned people’s searches into static pages, which did so if and only if the page generated clicks, and which then added those pages to your sitemap so Google could more easily find them. This bot was once named “searchclick”, and you actually went out of your way to rename it to “stub” in order to make it sound less conspicuous. Wtf Jason, do you really think there are many things more indicative of awareness of the wrongdoing than going out of your way to actually cover up evidence…?
YOU are the spammer, Jason, you who had people modify whatever original software Mahalo was based on to behave like this. This is not some “conspiracy of the SEOs” designed to get back at you for anything. You were spamming, acting like a dick when people called you on it, and then on top of that you are getting preferential treatment from Google during this whole thing to boot. Tons of people are getting banned from Google, or AdSense, or both, every day, and they don’t get to talk to a Google rep. They get a form letter and little to no information about what they did wrong (if anything). You get a personal talking to from the head of the Google spam team no less, and your response is to whine about how all the “trolls” are being mean to you? Give me a break, Jason.
Matt Cutts himself told you that if you didn’t clean things up, Google might take action. Matt is the head of the Web Spam team. You seriously want us to believe that you thought he was talking about something other than spam when he said that..?
To put things in perspective, not counting the Mahalo Answers side of things (which I haven’t even started to look at), you started out with 12 xml sitemaps, each containing up to 50,000 pages. The total number of pages was 598,661. After the trimming down, you now have 3 xml sitemaps, with a total of 128,324 pages left. That means that you just whacked 78.6% of the main site, and you still have empty pages in the sitemap. Just looking by hand, it looks like between the remaining empty pages you told me you were going to eventually remove from the sitemap and the fluff that is in there, only about 1 out of 3 of the remaining pages should actually still exist. I pointed one example out on Twitter, but here’s another set so you can again see what I am talking about when I say fluff:
Same topic with slight rewordings, none with any substantial amounts of content, those could easily be (and should be) a single page. Now, I understand that it takes time to get to all of those pages, but still, the numbers are kind of huge.
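For anyone who wants to check the sitemap math above, here is a quick sketch in Python. The page counts and the 50,000-per-sitemap cap are the figures quoted in this post; the variable names are just mine, and the "1 out of 3" line is my by-hand eyeball estimate, not a measured number.

```python
# Figures quoted in the post (main site only, not Mahalo Answers).
pages_before = 598_661        # total pages across the original 12 sitemaps
pages_after = 128_324         # total pages across the 3 sitemaps left
sitemaps_before = 12
max_pages_per_sitemap = 50_000

# Sanity check: 12 sitemaps at up to 50,000 pages each can hold the original total.
assert pages_before <= sitemaps_before * max_pages_per_sitemap

# How much of the main site was removed from the sitemaps.
removed = pages_before - pages_after
removed_pct = round(100 * removed / pages_before, 1)

# Assumption: the rough "about 1 out of 3" by-hand estimate of remaining pages worth keeping.
worth_keeping = pages_after // 3

print(f"removed {removed:,} pages ({removed_pct}% of the main site)")
print(f"roughly {worth_keeping:,} of the remaining pages look worth keeping")
```

If that one-in-three guess is anywhere near right, the pages that should actually exist are a small sliver of what was originally in those sitemaps.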
If this helps you get a better mental image of why this shit is not an “absurd microscope”, here are some cool infographics for you to look at (these are to scale, by the way, not some made up ratios):
We know those pages didn’t add up to a ton of traffic, Jason; you keep telling us. As I said before, it’s not about that, it’s about all the free PageRank they were pulling in. We’ll see where things sit after the cleanup actually gets finished and Google gets around to respidering what is and what isn’t left. My guess? You will be surprised how large an impact this “small portion” of your site was actually having.