<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Googlebot Creates Pages Instead of Simply Indexing Them: New FORM Crawling Algo Goes Bad</title>
	<atom:link href="http://smackdown.blogsblogsblogs.com/2008/05/23/googlebot-creates-pages-instead-of-simply-indexing-them-new-form-crawling-algo-goes-bad/feed/" rel="self" type="application/rss+xml" />
	<link>http://smackdown.blogsblogsblogs.com/2008/05/23/googlebot-creates-pages-instead-of-simply-indexing-them-new-form-crawling-algo-goes-bad/</link>
	<description>Smackdown!</description>
	<lastBuildDate>Sun, 14 Mar 2010 23:08:49 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Ryan</title>
		<link>http://smackdown.blogsblogsblogs.com/2008/05/23/googlebot-creates-pages-instead-of-simply-indexing-them-new-form-crawling-algo-goes-bad/comment-page-1/#comment-24273</link>
		<dc:creator>Ryan</dc:creator>
		<pubDate>Thu, 03 Sep 2009 23:20:07 +0000</pubDate>
		<guid isPermaLink="false">http://smackdown.blogsblogsblogs.com/?p=73#comment-24273</guid>
		<description>This is slightly off topic, but Google now discovers search functions on sites and lets you use them from the search bar.  Try browsing around in Chrome and then typing in the domain name while on the site.  If the site has a search feature that Google has identified, it will give you the option to search with that feature.  It&#039;s like OpenSearch, but without having to create the required XML file.  Kinda nifty.</description>
		<content:encoded><![CDATA[<p>This is slightly off topic, but Google now discovers search functions on sites and lets you use them from the search bar.  Try browsing around in Chrome and then typing in the domain name while on the site.  If the site has a search feature that Google has identified, it will give you the option to search with that feature.  It&#8217;s like OpenSearch, but without having to create the required XML file.  Kinda nifty.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Casdeiro</title>
		<link>http://smackdown.blogsblogsblogs.com/2008/05/23/googlebot-creates-pages-instead-of-simply-indexing-them-new-form-crawling-algo-goes-bad/comment-page-1/#comment-10323</link>
		<dc:creator>Casdeiro</dc:creator>
		<pubDate>Fri, 01 Aug 2008 09:47:39 +0000</pubDate>
		<guid isPermaLink="false">http://smackdown.blogsblogsblogs.com/?p=73#comment-10323</guid>
		<description>As a second line of defense we might try to add a captcha to that forum form ;-)</description>
		<content:encoded><![CDATA[<p>As a second line of defense we might try to add a captcha to that forum form <img src='http://smackdown.blogsblogsblogs.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Casdeiro</title>
		<link>http://smackdown.blogsblogsblogs.com/2008/05/23/googlebot-creates-pages-instead-of-simply-indexing-them-new-form-crawling-algo-goes-bad/comment-page-1/#comment-10322</link>
		<dc:creator>Casdeiro</dc:creator>
		<pubDate>Fri, 01 Aug 2008 09:46:22 +0000</pubDate>
		<guid isPermaLink="false">http://smackdown.blogsblogsblogs.com/?p=73#comment-10322</guid>
		<description>Recently we discovered we were being overloaded by queries from Googlebot. I guess it&#039;s something to do with the issue you posted about, because our phpBB2 forum&#039;s tables are overloaded with a lot of nonsense searches. Strangely, none of these searches seem to be indexed at Google, as you can see here: http://www.google.com/search?q=site:www.fillos.org+inurl:search

Anyway, we&#039;ll try to disallow index.php?name=PNphpBB2&amp;file=search for all bots in robots.txt. It should help, don&#039;t you think?

Thanks for telling people about this issue. Overloading of dynamic-content sites is a major side effect of Google&#039;s new behaviour.</description>
		<content:encoded><![CDATA[<p>Recently we discovered we were being overloaded by queries from Googlebot. I guess it&#8217;s something to do with the issue you posted about, because our phpBB2 forum&#8217;s tables are overloaded with a lot of nonsense searches. Strangely, none of these searches seem to be indexed at Google, as you can see here: <a href="http://www.google.com/search?q=site:www.fillos.org+inurl:search" rel="nofollow">http://www.google.com/search?q.....url:search</a></p>
<p>Anyway, we&#8217;ll try to disallow index.php?name=PNphpBB2&amp;file=search for all bots in robots.txt. It should help, don&#8217;t you think?</p>
<p>Thanks for telling people about this issue. Overloading of dynamic-content sites is a major side effect of Google&#8217;s new behaviour.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: If Googleplex Employees Don&#8217;t Understand The Webmaster Guidelines, How Can They Expect Webmasters To Adhere To Them? &#124; Smackdown!</title>
		<link>http://smackdown.blogsblogsblogs.com/2008/05/23/googlebot-creates-pages-instead-of-simply-indexing-them-new-form-crawling-algo-goes-bad/comment-page-1/#comment-5247</link>
		<dc:creator>If Googleplex Employees Don&#8217;t Understand The Webmaster Guidelines, How Can They Expect Webmasters To Adhere To Them? &#124; Smackdown!</dc:creator>
		<pubDate>Tue, 27 May 2008 13:52:01 +0000</pubDate>
		<guid isPermaLink="false">http://smackdown.blogsblogsblogs.com/?p=73#comment-5247</guid>
		<description>[...] to start indexing on its own, with very little control being given to webmasters (since now Googlebot crawls HTML search forms). What it is not referring to is a few odd pages that were cached in order to facilitate [...]</description>
		<content:encoded><![CDATA[<p>[...] to start indexing on its own, with very little control being given to webmasters (since now Googlebot crawls HTML search forms). What it is not referring to is a few odd pages that were cached in order to facilitate [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Does Google love your site too much?</title>
		<link>http://smackdown.blogsblogsblogs.com/2008/05/23/googlebot-creates-pages-instead-of-simply-indexing-them-new-form-crawling-algo-goes-bad/comment-page-1/#comment-5227</link>
		<dc:creator>Does Google love your site too much?</dc:creator>
		<pubDate>Tue, 27 May 2008 08:34:06 +0000</pubDate>
		<guid isPermaLink="false">http://smackdown.blogsblogsblogs.com/?p=73#comment-5227</guid>
		<description>[...] quite easy to block these pages being indexed as other people have pointed out but 99.9% of webmasters won&#8217;t know about the issue. The best method to [...]</description>
		<content:encoded><![CDATA[<p>[...] quite easy to block these pages being indexed as other people have pointed out but 99.9% of webmasters won&#8217;t know about the issue. The best method to [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael VanDeMar</title>
		<link>http://smackdown.blogsblogsblogs.com/2008/05/23/googlebot-creates-pages-instead-of-simply-indexing-them-new-form-crawling-algo-goes-bad/comment-page-1/#comment-5074</link>
		<dc:creator>Michael VanDeMar</dc:creator>
		<pubDate>Sun, 25 May 2008 04:04:57 +0000</pubDate>
		<guid isPermaLink="false">http://smackdown.blogsblogsblogs.com/?p=73#comment-5074</guid>
		<description>Gab, I was just quoting what Matt Cutts had said, but yes, I would take it to mean essentially the same thing as you did. &quot;New links&quot; would be links leading to new content, not new links pointing to content that they already knew existed.</description>
		<content:encoded><![CDATA[<p>Gab, I was just quoting what Matt Cutts had said, but yes, I would take it to mean essentially the same thing as you did. &#8220;New links&#8221; would be links leading to new content, not new links pointing to content that they already knew existed.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gab Goldenberg (goes to deep crawling page)</title>
		<link>http://smackdown.blogsblogsblogs.com/2008/05/23/googlebot-creates-pages-instead-of-simply-indexing-them-new-form-crawling-algo-goes-bad/comment-page-1/#comment-5069</link>
		<dc:creator>Gab Goldenberg (goes to deep crawling page)</dc:creator>
		<pubDate>Sun, 25 May 2008 03:22:53 +0000</pubDate>
		<guid isPermaLink="false">http://smackdown.blogsblogsblogs.com/?p=73#comment-5069</guid>
		<description>Mike, I&#039;m not sure if the terms matter and if we&#039;re thinking the same, but in case we&#039;re not, I thought the new crawling was aimed at discovering deep web content - like orphaned pages - on quality domains. Not at discovering links. 

The difference (as I see it) is that if the purpose is to discover new links, then it can matter for links pointing outwards to other sites too. My understanding of the point of the crawling was the 80-20 rule being applied to search - 80% of the high quality stuff coming from 20% of the sites... Ergo if a site is high quality, let&#039;s try and index ALL of it (but not necessarily who/what it links to). (More detail if you click my name; it goes to a page where I guessed what the rationale behind discovering these new pages might be.)

I mean, with circle-of-trust type algos, I guess discovering external links comes to the same thing as discovering more pages on the site you&#039;re crawling. Depends how cynical you are about high quality sites linking out to their sponsors etc though...

All that said, your search tech knowledge is greater than mine, so I&#039;d love to get your take!

p.s. The flip side of the canonical issues is at least those guys know they have trusted sites now!</description>
		<content:encoded><![CDATA[<p>Mike, I&#8217;m not sure if the terms matter and if we&#8217;re thinking the same, but in case we&#8217;re not, I thought the new crawling was aimed at discovering deep web content &#8211; like orphaned pages &#8211; on quality domains. Not at discovering links. </p>
<p>The difference (as I see it) is that if the purpose is to discover new links, then it can matter for links pointing outwards to other sites too. My understanding of the point of the crawling was the 80-20 rule being applied to search &#8211; 80% of the high quality stuff coming from 20% of the sites&#8230; Ergo if a site is high quality, let&#8217;s try and index ALL of it (but not necessarily who/what it links to). (More detail if you click my name; it goes to a page where I guessed what the rationale behind discovering these new pages might be.)</p>
<p>I mean, with circle-of-trust type algos, I guess discovering external links comes to the same thing as discovering more pages on the site you&#8217;re crawling. Depends how cynical you are about high quality sites linking out to their sponsors etc though&#8230;</p>
<p>All that said, your search tech knowledge is greater than mine, so I&#8217;d love to get your take!</p>
<p>p.s. The flip side of the canonical issues is at least those guys know they have trusted sites now!</p>
]]></content:encoded>
	</item>
</channel>
</rss>
