Last Thursday I wrote a post explaining in some detail how the latest Mahalo spam is in fact spam. It was a response to Matt Cutts stating that he needed more than “arbitrary inurl searches” to sway him, which was in turn a response to a Hacker News submission about Mahalo and the plethora of keyword-rich domains they were apparently building out. In that post I demonstrated how Jason had developed a linkfarm that was being used as a link source back to Mahalo.com. It wasn’t just that the individual sites were all linking back to the mother site, which would in fact be normal; the pages were also linking back to specific pages within the main site, pages that in many cases had few, if any, links going to them aside from the ones from this linkfarm.
Each time it happens, Matt’s defense of Mahalo spamming Google just gets more perplexing. In this latest round he started by saying that his job was not to have knee-jerk reactions, as if Mahalo hadn’t already established a pattern of spamming over a long period of time, and as if Matt hadn’t already had a talk with Jason and told him that if he didn’t raise the bar with his site, Google would take action on Mahalo. From there it got even weirder – Matt looked at the linkfarm and basically told me that a) he didn’t care as long as it wasn’t passing link juice, and b) he’s the only one who could tell if that was the case.
I could have sworn the rule was that if you were caught trying to spam, you were penalized, and you couldn’t get the penalty removed unless you promised not to do it again. Now, where did I get such a crazy and wild idea? Oh yeah, I remember now… it was from Matt Cutts:
Now we come to the heart of things: what goes into a reinclusion request. Fundamentally, Google wants to know two things: 1) that any spam on the site is gone or fixed, and 2) that it’s not going to happen again. – Matt Cutts on the bare essentials of a reconsideration request
The reasons Matt gives out for defending Mahalo seem to be getting more and more creative (even if not more believable). Jason’s, on the other hand, are the same old song and dance he has been spouting since I first called him on his bs and demonstrated that the vast majority of his site was nothing more than empty, auto-generated pages. On Thursday’s post, before he started to lose it with his “fuck you losers, I’m rich” tirade, Jason made this statement:
We have humans write pages of at least 300 words. We don’t index 99.99% of pages with < 300 (it would have to be something unique), and we police the system to get short pages up to 300 words within 30 days. - Jason Calacanis, 4 days ago
Orly? Let’s take a look at those claims, shall we?
The Mahalo coupon pages are about the crappiest pages I have found on the site. When I was doing my initial investigation I stumbled across quite a few of them. My guess is that [{brand} coupon] generates AdSense blocks with a decent eCPM since they are, after all, “targeted” pages. None of the Mahalo “coupon” pages actually have any coupons, which of course means that the end user is much more likely to click on one of the ads when they land there, and more required clicks means a poorer user experience. What content these pages do have is fluff text that gives Mahalo ample opportunity to link back to itself. The pages also carry spammy signals that are easy to spot, like near-identical versions of the same topic page, usually one page for “coupons” and another for “printable coupons” (and no, there is nothing to print out on those pages either). So I picked the coupon pages as the first place to look in order to point out, yet again, how Jason was simply pulling these claims out of his ass with no supporting truths behind them.
Digging back into my old data from March 13th, I was able to determine that from the day the site started adding content up until that point in time, Mahalo had amassed 2,655 coupon-based pages. When I re-scanned this time I found that there were now 16,601 of these pages. That is a huge increase for 3 months, and a ton of content to create uniquely, even if you ditch quality altogether. Mahalo currently only has a grand total of 90,494 actual pages on that side of things, so that means 18% of the site is made up of “coupon” pages – and by that I mean coupon pages that don’t actually have any coupons on them.
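For anyone who wants to check the math, here is a quick sketch in Python. It uses only the three page counts quoted above; the variable names are my own, and nothing here reproduces how the pages were actually scanned.

```python
# Page counts as quoted in the post (variable names are illustrative).
coupon_pages_march = 2_655     # coupon pages found in the March 13th data
coupon_pages_now = 16_601      # coupon pages found in the re-scan
total_content_pages = 90_494   # total content pages on that side of the site

added = coupon_pages_now - coupon_pages_march
growth_factor = coupon_pages_now / coupon_pages_march
coupon_share = coupon_pages_now / total_content_pages

print(f"Coupon pages added since March 13th: {added:,}")
print(f"Growth factor: {growth_factor:.1f}x")
print(f"Share of content pages: {coupon_share:.1%}")
```

The share works out to a little over 18%, which is where the figure above comes from.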
What’s more, it actually looks like there is a chance that 9,932 of those pages were added last week, over a 3 day period. How the hell do you get writers to create 9,932 pages of even crappy content, all about coupons, in only 3 days?
As I started looking into it I suddenly understood… they didn’t just ditch the quality to create those pages, they went ahead and ditched the content, yet again. I checked over 30 pages, and time after time what I found was auto-generated pages that were nothing but ads, affiliate links, and scraper feeds.
http://www.mahalo.com/1800pools-coupons:
http://www.mahalo.com/tigerdirect-coupons:
http://www.mahalo.com/topnotchcare-com-coupons:
Most of the pages I checked had the affiliate links provided by Savings.com, and most linked to the same two questions pages: one discussing the Outback coupons page, and one discussing “grocery coupons”… and in every case neither question had anything to do with what the actual “coupon” page was supposedly about:
For the pages that did not have Savings.com affiliate feeds on them it was because they were using as keywords the names of sites that wouldn’t actually be Savings.com publishers, like GBB.org and RLS Forum. It looks like Jason somehow got his hands on a list of sites that for some reason or another looked like they might have offered some sort of coupon. These were then dumped into the database in the form of pages, and were then checked to see if they matched up with the Savings.com feed. If they did, great, if not that’s ok too, they still had AdSense on them – despite the fact that putting AdSense on pages without actual content is a direct violation of Google AdSense policies:
That’s ok though, I am sure Jason doesn’t care that he is risking the bulk of the site’s revenue stream by violating the terms of the program, since it looks like the AdSense team is giving him just as much of a pass as the spam team is.
In addition to the pages simply being devoid of content, Jason also uses the tactic of creating near-duplicate versions of some of these pages in order to get the most out of the long-tail phrase variations:
http://www.mahalo.com/1and1-coupons
http://www.mahalo.com/1and1-internet-coupons
http://www.mahalo.com/1and1-web-hosting-coupons
http://www.mahalo.com/1and1affiliate-com-coupons
Let’s look at Jason’s statements again…
We have humans write pages
Well, no. You have humans write some pages, but an assload are still auto-generated. In addition to the ones shown here, Google also says that you still have 13,200 pages that you scraped from Wikipedia in their index:
Adding the above auto-generated pages in with the Wikipedia ones, that means that at this point an estimated 33% of the Mahalo content pages are scraped or auto-generated, and that’s just the stuff that’s easy to find. Yay footprints.
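The estimate can be reproduced with a few lines of Python. This is only a sketch under two assumptions of mine: that all 16,601 coupon pages count as auto-generated, and that the 13,200 Wikipedia-scraped pages are part of the same 90,494 content pages.

```python
# Figures quoted in the post; how they are combined is an assumption.
coupon_pages = 16_601          # coupon pages, all treated as auto-generated
wikipedia_pages = 13_200       # Wikipedia-scraped pages Google reports as indexed
total_content_pages = 90_494   # total content pages

scraped_or_generated = coupon_pages + wikipedia_pages
share = scraped_or_generated / total_content_pages

print(f"Scraped or auto-generated pages: {scraped_or_generated:,}")
print(f"Estimated share of content pages: {share:.1%}")
```

Under those assumptions the share rounds to 33%.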
of at least 300 words
Again, no, even on the human generated pages that is not always true. Take a look, for instance, at the 1and1 page on Mahalo.com that all 4 of the above coupons reference:
Even including words of 3 letters or fewer, that page still only has 212 words of human generated content on it. I also pointed out last week that some of the Wikipedia scraped pages remained thin, such as the one on “The Alice B. Toklas Cookbook”, which has only 261 words on it.
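A count like this is easy to repeat on any page text. The sketch below is one hypothetical way to do it in Python, not the tool used for the numbers above; `count_words` and its `min_letters` parameter are names I made up, and what counts as a "word" is a judgment call.

```python
import re


def count_words(text, min_letters=1):
    """Count words in text, optionally ignoring words shorter than min_letters."""
    # Treat runs of letters, digits, and apostrophes (straight or curly) as words.
    words = re.findall(r"[A-Za-z0-9'’]+", text)
    return sum(1 for word in words if len(word) >= min_letters)


sample = "We have humans write pages of at least 300 words."
print(count_words(sample))                 # every word counted
print(count_words(sample, min_letters=4))  # words of 3 letters or fewer dropped
```

Counting every word, short ones included, gives a page the most generous total possible, so a page that still falls under 300 on that basis is short by any measure.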
We don’t index 99.99% of pages with < 300 [words]
Bullshit. Not one single one of the pages I examined had a “noindex” tag on it, or was blocked by robots.txt. In fact, just the opposite – every single one of them was pushed to Mahalo’s sitemap, to make it easier for Google to find (and index) them.
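Both checks are simple to script. Here is a minimal sketch using only the Python standard library; the function names, the sample HTML, and the sample robots.txt are all made up for illustration and are not Mahalo's actual markup or rules.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(","))


def has_noindex(html):
    """Return True if the page's robots meta tags include a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def blocked_by_robots(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text disallows the URL for the agent."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return not rules.can_fetch(agent, url)


# Hypothetical inputs, for illustration only.
page = '<html><head><meta name="robots" content="index, follow"></head></html>'
robots = "User-agent: *\nDisallow: /private/\n"

print(has_noindex(page))                                            # False
print(blocked_by_robots(robots, "http://example.com/some-coupons"))  # False
print(blocked_by_robots(robots, "http://example.com/private/page"))  # True
```

A page that comes back False on both checks is open for indexing as far as the site's own signals go. Note that this sketch does not look at the `X-Robots-Tag` HTTP header, which is another way a page can be kept out of the index.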
we police the system to get short pages up to 300 words within 30 days
Again, bullshit. The 1and1 page has been that way since at least March 11th:
And the Alice B. Toklas one since March 12th:
So Jason, please, enough with the bs. Quit claiming stuff that simply is not true, especially when it’s so damn easy to disprove what you say. I still have no idea why it is that Matt Cutts is choosing to ignore your spam, but to the rest of us it’s as plain as day. And no, Jason… going in now and trying to clean it up in no way changes the fact that you spammed in the first place.
If Calacanis wasn’t such a “big dog” I don’t think Google would be so lenient with him.
What “normal” people usually have to go through when something even moderately questionable is found on their sites:
http://www.seo-scoop.com/2008/01/24/matt-cutts-why-am-i-still-being-punished/
Oh please… same story, different day!
We have all kinds of pages that get made in the system that are short/don’t have content immediately–just like Wikipedia, About and Google Knol.
If a page stays short for more than a few weeks it is nofollow, noindexed and/or deleted. All pages start short… the issue is do you keep them indexed or not, and do you attempt to build them out.
We build great value with our coupon and deal pages, helping people find out how to get those deals. The 2.0 version of these pages is even better and will be out in the fall.
as I’ve said over and over: if a page isn’t quality it’s not going to be indexed in the search engines. we do sweeps of these pages monthly and remove them or build them out.
Also, the search engines are not stupid… they look at an in progress page and say “oh, this page doesn’t have original content right now, these other 30 pages on the internet do–let’s rank those.”
When we look at our Analytics there is a very clear and specific correlation between the number of original words and how much traffic we get.
Anyone can go to a large content site and pick out the bottom 10% of pages and say “these suck”; you have to look at our top pages and the 100k+ we invest in these type of pages every MONTH.
There are very few companies that invest as much as we do in original content, and none of the major content sites out there deindex and delete short pages as aggressively as we do: not even Google’s own Knol, eHow, Associated Content or Wikipedia! Those sites are filled with short content pages that are NEVER going to be updated. We sweep through our system for these pages and reconcile them monthly.
Seriously… go look at how to play guitar chords or how to bake a cake… and stop pulling the short content pages that are being worked on.
Jason, like I said before, this is the main issue with discussing this with you… you can spew completely unsubstantiated crap like the comment above all day long with no effort whatsoever, and you completely ignore all of the direct evidence presented.
You have to be a damn crackhead to think those two statements agree with each other, Jason… and you are the one making both of them.
Well, Jason, if he is only picking out the bottom ten percent of the pages on your site, I would like to see some of the top 10%. Could you provide us with a link?
I actually requested to Jason that he stop dropping live links to Mahalo.com in the comments. He is free to provide urls that don’t turn into hyperlinks if he wants.
I can tell you right now though that Jason has a small select set of pages that he claims he paid a large amount of money for that he uses as his counter example to the thousands of crappy ones that keep getting found. I am not sure why he thinks that those pages in any way offset the fact that the bulk of his site is spam beyond a doubt, but it seems to be one of his more persistent arguments.
All content pages *start* crappy and then grow into diamonds. The important thing is that you have a process for doing that, and Mahalo has one of the best ever created.
You see, any time a page gets over 1,000 page views we invest another $10 in it. If the page gets 5k or 10k views we invest another $50-$100 in the page.
This means that any page that ranks AND gets any level of traffic gets, literally, hundreds to thousands of dollars invested in it.
This is just simple logic, and I’m not sure why folks don’t get it. We actually DEINDEX pages from Google/Bing/Yahoo if they are not improved. This is an even better system than Google Knol, Wikipedia, About, etc. which don’t deindex their short pages.
The SEOs can keep attacking me, but if you take a look at the top 5,000 pages which represent 95%+ of Mahalo’s traffic and 99% of our revenue, they are all between good, great and excellent in terms of quality.
You’re picking out pages in process that don’t rank and/or don’t get traffic. They are stubs and will be built out or de-indexed. we do this process monthly… so, come back in 60-90 days and take a look.
best jason
Jason, you’re lying. I offer up proof that you are lying, and you come back just saying the same thing again. I offer more proof, you repeat yourself some more.
I mean, is the issue here that you have gotten so good at conning that you have even duped yourself now, and can no longer follow reality..?
Why is this douchebag still allowed to have his site indexed on google? i call for a ban petition
Here’s an alternate explanation for Calacanis’s answers, from The New York Times: http://opinionator.blogs.nytimes.com/2010/06/20/the-anosognosics-dilemma-1/?hp .
This is NOT a linkfarm. It is a poor attempt at a linkfarm by a wanna-be blackhat. It’s not a link farm because there aren’t enough garbage sites supporting the garbage answer sites. Those aren’t being propped up by anything. If you were creating a true link farm, you would make hundreds of little garbage scraper sites to support the thin answer sites. Do you agree with that, Michael? This is basically a half-ass attempt at a link farm by someone who wishes he was a true blackhat SEO.
Michael you asked “I mean, is the issue here that you have gotten so good at conning that you have even duped yourself now, and can no longer follow reality..?”
I encourage you to watch the clips of Calacanis on the FOX Poker Stars show he linked to from his twitter. He plays poker like a sociopathic nutcase. He has absolutely no changes in body language–no tells at all– when he bluffs crazily. He’ll make very risky raises and show no differences in body language or speaking, and he doesn’t perspire or even show a slight redness in the face during what should be high pressure moments. Conclusion, he’s a bonafide sociopath….
calacanis is on the savings.com board of execs. hence why there are savings.com links all over mahalo’s junk.
Too big to fail?
I think the main question here should be: why does ADSENSE accept this kind of violation?
There are several known cases like this one where Google knew exactly that their own adsense policy is violated but would not do anything about it…
Judging by the amount of AdSense ads on the page named above (www.mahalo.com/1800pools-coupons) it still makes economic sense to use AdSense for monetization.
So if AdSense stopped those violators (or made it more difficult for them) we would have much less junk out there…
PLUS, it would be easier to put a negative “spam” ranking-marker on those violators, because people with REAL content do NOT cover their page with 75% ads…
But – sadly – like the big banks out there, some of those violators just seem too big to fail…
While I agree 100% that JC is a narcissistic sociopath, I wouldn’t cite his lack of “tells” during poker games as evidence. That’s something that can be learned by a poker player, and is usually the mark of a reasonably experienced player.
Maybe someone big at google has invested in mahalo? Is there a way to find that out?
Getting people to write thousands of pages of content for you (maybe millions?) and then raising the minimum payout on them before they can cash out is not cool. And then after that, before most can get to that higher rate, pull the plug on writers telling them you changed the game and that they no longer will earn a % of ad income as per original agreement is not cool either. And then if that’s not enough, have members invest hours of writing thinking that they own their content (as per the original t&c) and then erasing that and saying: nope, changed our minds, we own all your content now…is more blackhat than anything I’ve seen to date.
I’ve been following this scandal the past few days and each time I turn a page I discover something new, this stunt is the mother of all stunts IMO. The good news is, you can only screw thousands of people so many times before you’re done, at some point the internet just isn’t large enough to hide. And just who are the people investing in this sort of thing?
Old terms of service
http://accentuateservices.com/images/Terms%20of%20Service_1277520688035cache
Changed sometime around the day of the announcement to:
http://accentuateservices.com/images/Terms%20of%20Service_1277520655092new
Has even changed again since then, for a less inflammatory flavor.
Jason’s webinar that was part of the announcement said that most writers who had not reached the $150 payout limit as of the moment of the announcement could no longer receive their payment through paypal as was understood when writers originally produced the work. Unless they could reach this limit in a very short time – difficult once you are no longer able to be given tasks to do under the new system.
Instead, writers were cheerily told they could buy something from the Mahalo store with their remaining balance (again difficult for many, as the store does not ship outside the US); or donate it to charity.
Howzabout just PAY for work completed under the old TOS.
Plenty of discussion and members have been removed/banned on the site also, in the last few days… but that happens quite often when J.C. takes such a whim.
A new crop of Mahalo spam pages, 300+ like this one were added to the site yesterday:
http://www.mahalo.com/how-to-advocate-for-drug-problems-5min
All these new ones have a video from the 5min site and one line of text (plus the usual autogenerated link stuff).
Crazier and crazier over there.
more mischief from the Jmonster:
http://blog.fluther.com/an-open-letter-to-jason-calacanis/
had no idea this was going on …
@balinesecat – actually, that Fluther post was from about a year and a half ago.