A little over a week ago, on the Friday before last, Matt Cutts, the head of Google’s Web Spam Team, wrote a post on the Official Google Blog titled “Google search and search engine spam”. This post, and the upcoming changes it discussed, were most likely a response to the growing dissatisfaction with Google’s results that has been cropping up around the blogosphere. In the post Matt says that Google feels things are in fact not as bad as people are saying, and that “Google’s search quality is better than it has ever been in terms of relevance, freshness and comprehensiveness.” He does concede that recently, due to an increase in both “size and freshness,” some spam did get indexed. He also states that as the old, tired, run-of-the-mill spam has decreased in Google’s index, Google will now be shifting its focus to content that just sucks:
As “pure webspam” has decreased over time, attention has shifted instead to “content farms,” which are sites with shallow or low-quality content. – Matt Cutts
Whoa. This, especially coming from Matt Cutts, is huge. For those who don’t know, “content farms” are organizations that generate websites composed of large amounts of low-cost “fluff” or filler content, with little to no regard for quality. The content is generated not based on having information and the desire to share it, but rather in response to queries that might get typed into a search engine, and it is built for search spiders rather than human consumption. They include companies like Demand Media, Mahalo, and Associated Content.
Historically speaking, Matt has pretty much refused to come right out and say that these content farms were indeed spam, despite the fact that they clearly violated Google’s quality guidelines:
Doorway pages are typically large sets of poor-quality pages where each page is optimized for a specific keyword or phrase… Google’s aim is to give our users the most valuable and relevant search results. Therefore, we frown on practices that are designed to manipulate search engines and deceive users by directing them to sites other than the ones they selected, and that provide content solely for the benefit of search engines. Google may take action on doorway sites and other sites making use of these deceptive practice, including removing these sites from the Google index. – Google Webmaster Tools Help
Regardless of the very clear wording of their policies, Google has to date not banned any of these content farms for their violations. In fact, quite the opposite – Matt has in the past even defended these sites, and in Mahalo’s case at least given warnings to them which he then allowed them to ignore. He alluded to the fact that one of the algorithm updates from last year, Mayday, was supposed to help filter out “really kind of lower quality” sites, and many people thought he must be talking about content farms back then, but alas that turned out to be a bust. So when he comes right out and says, hey, you’ve waited long enough, now we’re going to target content farms for reals, y’all, then yeah, that’s a Pretty Big Deal.
Now, Richard Rosenblatt, the CEO of Demand Media, may be in denial about his company being a content farm, but that definition has existed for quite some time, and regardless of what you call it, low quality content built specifically for search engines is in violation of Google’s guidelines. However, he still persists in his belief that as long as you can get some people to call it something else, his “partnership with Google” will keep his company protected regardless of what happens:
This is why our partnership with Google makes sense. 1) We help them fill the gaps in their index, where they don’t have quality content. 2) We’re the largest supplier of all video to YouTube, over two billion views and 3) we’re a large AdSense partner. So our relationship is synergistic, and it’s a great partnership. And it’s a partnership that we’re excited to continue to expand. – Richard Rosenblatt, attempting to give Google’s PR team a heart attack
I am guessing that Mr. Rosenblatt missed the section in Matt’s post where he very specifically discussed the fact that no special partnerships would protect the content mills from these changes:
One misconception that we’ve seen in the last few weeks is the idea that Google doesn’t take as strong action on spammy content in our index if those sites are serving Google ads. To be crystal clear:
- Google absolutely takes action on sites that violate our quality guidelines regardless of whether they have ads powered by Google;
- Displaying Google ads does not help a site’s rankings in Google; and
- Buying Google ads does not increase a site’s rankings in Google’s search results.
– Matt Cutts, being crystal clear
Then Friday rolls around, and Matt announces that these changes already happened earlier in the week. If you didn’t notice any changes, then that’s probably because, according to Matt, less than half of a percent of queries would show any perceptible ranking differences. If you didn’t notice any changes in queries involving content farms, well… as near as I can tell that is because there weren’t any. In fact, in his announcement post Matt doesn’t use the phrase “content farms” at all, and instead only says that the net effect of these changes is that in cases where content was scraped, searchers are more likely to see the original content first. He then thanks Jeff Atwood (who wrote one of the widely read posts about Google’s decline in quality) and the team at Stack Overflow (a site that Jeff co-founded) for their feedback. A few people asked about the omission in the comments, but as of yet anyway Matt has not replied to any of them.
As to the results themselves, for the most part I am seeing what I was seeing before, so that “less than half of a percent” doesn’t surprise me. If you search for [mcdonalds coupons] the #1 site is still a Mahalo page that doesn’t actually have any coupons on it, and very little original content. If you search for [mcdonalds free salad coupons] you get a different Mahalo page that does actually have a picture of a coupon on it (good only in Canada, and expired in July 2010, however), and if you search for [mcdonalds happy meal coupons] the second listing is a Mahalo page, again with no coupons on it. These pages are filled with riveting dialog, such as the section labeled “McDonalds Happy Meal Coupons Coupon Policies,” which states:
The policies for McDonalds Happy Meal coupons may have certain restrictions and these might include not being able to combine discounts or limiting the period of use. Make sure you read and understand the instructions listed on the coupon carefully to ensure that you know when the coupon will become valid and when it will expire as well as what special restrictions apply. Also included in this information will be which product or products the coupon can be used to purchase. Insuring that you understand the coupon policy can help you to avoid any mistakes during the checkout process. – Content Mahalo actually paid for
It’s not just Mahalo, of course… type in [how to reset your blackberry] and you will find ranking just fine a page from eHow that is nothing more than the phrase “hit alt+right-shift+delete” wrapped in light, fluffy filler content. I also still see queries where the duplicate content outranks the original, such as the copy of a Wikipedia page that ranks #1 for [elvett semic]. The changes, whatever they were, truly are barely (if at all) perceptible. The change was so small that one of Matt’s readers asked, “I’m wondering why announce it if you’ve gotten the feedback and the algorithm update would presumably be of such little consequence that no one would likely notice or comment on it unless you told everyone.” Indeed, why make such a big deal out of something when almost no one can tell the difference?
To answer that you need to take a look at exactly what it was that did change. When I search in Google now for questions that were asked on Stack Overflow, at least for the queries I checked, I now see SO ranking instead of sites that scrape their content. This is of course how it should be, and it was the main concern that the people from that community were giving Matt feedback about. Stack Overflow is, as I mentioned, the site that was co-founded by Jeff Atwood, who is the author of the much quoted post that generated quite a bit of buzz about Google’s decline in quality. Many of the frequenters of Stack Overflow are also regulars on Hacker News, which is (not so) coincidentally where Matt decided to hold a good portion of the discussion about these changes, both before and after they were implemented. While the HN and SO communities in and of themselves might be tiny compared to the web as a whole, the fact is that their voices do carry within the online community. Start buzz there about Google showing quite a bit of improvement and it has a very good chance of spreading, even if the data set demonstrating that is overall quite small. Add to that the fact that Richard Rosenblatt, CEO of Demand Media, knows that the changes aren’t targeted at his company (and when asked if Google had discussed the changes with him, replied “I can’t comment on that.”), and then toss in Jason Calacanis’s ingratiating comments on Matt’s blog post about the changes going live:
It was clear that Mahalo was getting grouped into the “content farm” space… – Jason Calacanis
No kidding? Really? Past tense there, eh Jason?
So Matt loosely ties the concepts of “content farms” and “scrapers” together in a post on the Official Google Blog, and claims that Google is taking action against them. He then announces a change that appears to affect only scraper sites, and furthermore only those scraping a specific dissatisfied community, publicly thanks that community for their help, and then doesn’t mention the phrase “content farm” again. Even though the changes were practically non-existent, there is a good chance that the overall impression among those who don’t look too closely is that action was indeed taken, and that if what were formerly referred to as content farms are still ranking well, then obviously they must be there for a reason.
From a strategic standpoint it’s actually rather clever. If I were Google and I needed to conceal special relationships I had with companies (especially if I thought that the FTC might want to get involved in my business) then I too would probably try very hard to sway public opinion about the labels attached to the sites those companies owned, shift the focus to something I could fix without caring about the damage, and then crowdsource a tech community to help spread the impression that things were better. Most people probably won’t even pay enough attention to notice.