<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
	xmlns:media="http://search.yahoo.com/mrss/"
>

<channel>
	<title>Natural Search Blog &#187; Spiders</title>
	<atom:link href="http://www.naturalsearchblog.com/archives/category/spiders/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.naturalsearchblog.com</link>
	<description>Thought leaders in search engine optimization weigh in with the latest SEO news and commentary</description>
	<lastBuildDate>Thu, 03 Nov 2011 14:09:48 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
	<copyright>2006-2009 </copyright>
	<managingEditor>pliesse@netconcepts.com (Natural Search Blog)</managingEditor>
	<webMaster>pliesse@netconcepts.com (Natural Search Blog)</webMaster>
	<ttl>1440</ttl>
	<image>
		<url>http://www.naturalsearchblog.com/images/logo.png</url>
		<title>Natural Search Blog &#187; Spiders</title>
		<link>http://www.naturalsearchblog.com</link>
		<width>144</width>
		<height>144</height>
	</image>
	<itunes:subtitle></itunes:subtitle>
	<itunes:summary>Thought leaders in search engine optimization weigh in with the latest SEO news and commentary</itunes:summary>
	<itunes:keywords></itunes:keywords>
	<itunes:category text="Society &amp; Culture" />
	<itunes:author>Natural Search Blog</itunes:author>
	<itunes:owner>
		<itunes:name>Natural Search Blog</itunes:name>
		<itunes:email>pliesse@netconcepts.com</itunes:email>
	</itunes:owner>
	<itunes:block>no</itunes:block>
	<itunes:explicit>no</itunes:explicit>
	<itunes:image href="http://www.naturalsearchblog.com/wp-content/plugins/podpress/images/powered_by_podpress_large.jpg" />
		<item>
		<title>Search Engine Crawling and Indexing Factors</title>
		<link>http://www.naturalsearchblog.com/archives/2009/07/05/search-engine-crawling-and-indexing-factors/</link>
		<comments>http://www.naturalsearchblog.com/archives/2009/07/05/search-engine-crawling-and-indexing-factors/#comments</comments>
		<pubDate>Mon, 06 Jul 2009 00:20:08 +0000</pubDate>
		<dc:creator>Ravi</dc:creator>
				<category><![CDATA[Search Engine Optimization]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Spiders]]></category>
		<category><![CDATA[backlinks]]></category>
		<category><![CDATA[content freshness]]></category>
		<category><![CDATA[crawling factors]]></category>
		<category><![CDATA[domain importance]]></category>
		<category><![CDATA[duplicate-content]]></category>
		<category><![CDATA[external links]]></category>
		<category><![CDATA[Feeds]]></category>
		<category><![CDATA[google-webmaster-tools]]></category>
		<category><![CDATA[increase crawl rate]]></category>
		<category><![CDATA[Links]]></category>
		<category><![CDATA[PageRank]]></category>
		<category><![CDATA[query deserves freshness]]></category>
		<category><![CDATA[search engine crawling]]></category>
		<category><![CDATA[search engine indexing]]></category>
		<category><![CDATA[signals]]></category>
		<category><![CDATA[supplemental index]]></category>
		<category><![CDATA[technical factors]]></category>
		<category><![CDATA[unique content]]></category>

		<guid isPermaLink="false">http://www.naturalsearchblog.com/?p=579</guid>
		<description><![CDATA[The post today is about getting a site crawled and indexed effectively by the major search engines. It can be frustrating for a site owner to find that her newly built site with bells and whistles is just not appearing on the Google SERPs for a search query relevant to her business. It is a [...]]]></description>
			<content:encoded><![CDATA[<p>The post today is about getting a site crawled and indexed effectively by the major search engines. It can be frustrating for a site owner to find that her newly built site with bells and whistles is just not appearing on the Google SERPs for a search query relevant to her business.</p>
<p>It is a good idea to have some knowledge of the factors that influence the crawling of a site and its successful indexing before the site ranks on the SERPs. The site can be built in a user friendly way that allows the spiders to know what to crawl and how frequently to crawl.<br />
<span id="more-579"></span></p>
<p><strong>Crawling Factors</strong></p>
<ol>
<li><em>Links</em>:<br />
All major search engines crawl the web by following links. If a site has a good link structure that starts broadly at the top and works down through the category and subcategory levels, with all the money pages no more than three to four clicks from the home page, the bots will find the site much easier to crawl. Placing a sitemap on the home page further helps the bots find all the content on the site.
</li>
<li><em>Content Freshness and Updates:</em><br />
Fresh content is one of the best ways to keep the bots coming back to your site, so it is vital to update the site regularly. A blog goes a long way toward achieving this. Googlebot treats new content as a reason to attach more importance to the site and to visit it more often.</p>
<p>Google&#8217;s algorithm includes a Query Deserves Freshness (QDF) component that rewards sites with updated content (news sites, for example), which in turn invites the bots back for repeated crawling and indexing.
</li>
<li><em>Feeds</em>:<br />
If a site has a regularly updated blog or fresh articles posted on it at regular intervals, it would be ideal to have a feed and export it. Google Blog Search and feed tracking help in increasing the crawl activity. When a new post or article is published on the site, the search engine is pinged to let it know that the content has been updated.
</li>
<li><em>Importance of Domain</em>:<br />
A powerful domain that has good quality links coming in from diverse trustworthy domains is very important and it affects both the crawl rate and indexing of the site that resides on that domain.
</li>
<li><em>Technical Factors:</em><br />
A site can contain spider traps in the form of linking structures that loop infinitely. Crawling can be interrupted by broken links. A CMS can also create duplicate content by serving the same content on multiple URLs. All these factors inhibit the capacity of a bot to crawl the site exhaustively.
</li>
<li><em>Increase the Crawl Rate in Google Webmaster Tools:</em><br />
If you log in to Google Webmaster Tools, there is a setting to increase the spider&#8217;s crawl rate. It is small consolation if the site is affected by the problems listed above; on its own, it cannot influence the crawl rate to any great extent.
</li>
</ol>
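<p>The link-structure advice in the first crawling factor can be checked with a breadth-first walk over a site&#8217;s internal links. The sketch below is only an illustration: the <code>site</code> dictionary is a hypothetical link map (page to the pages it links to), and the four-click limit simply mirrors the rule of thumb given above.</p>

```python
from collections import deque


def click_depths(links, home):
    """Return the minimum number of clicks from `home` to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link map: page -> pages it links to.
site = {
    "/": ["/shoes/", "/hats/"],
    "/shoes/": ["/shoes/running/"],
    "/shoes/running/": ["/shoes/running/model-x"],
    "/hats/": [],
}

depths = click_depths(site, "/")
# Pages deeper than the four-click rule of thumb (none in this small example).
too_deep = sorted(page for page, depth in depths.items() if depth > 4)
print(depths)
print(too_deep)
```

<p>Any page that never appears in <code>depths</code> is not reachable by links from the home page at all, which is usually a bigger problem than being a click too deep.</p>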
<p><strong>Indexing Factors</strong><br />
You can picture Google (and most likely Yahoo and Bing) as consisting of a Main Index and a Supplemental Index. The main index consists of the top 10 or 20 results served for important search queries. </p>
<p>If Google thinks a page is not relevant or not of high quality, it places it in its Supplemental index. But this index is not visible. A good post by Aaron Wall on the <a href="http://www.seobook.com/archives/002047.shtml">Supplemental index</a> will give you a better idea.</p>
<p>The third scenario is where a site&#8217;s pages can be crawled and then dropped from the index.</p>
<ol>
<li><em>Content That is Valuable and Unique</em>:<br />
To have your pages in the main index, you must provide valuable and unique content. Google is extremely good at identifying content that is unique. Gone are the days when content could be scraped and an introduction and conclusion added to make it look unique. Content that is engaging and valuable definitely plays a big part in a page being part of the main index.
</li>
<li><em>Domain Importance:</em><br />
If a domain has a variety of good, trustworthy domains pointing to it, it helps Google retain its pages in the main index. A good example is Wikipedia. Some of its pages with just one line of content, or with duplicate content, get ranked at the top of the SERPs. This is due to the domain trust and authority that Wikipedia commands.
</li>
<li><em>PageRank:</em><br />
PageRank, or raw link juice, is determined by the number of links pointing to your site and their importance. The internal linking structure on your site is also part of the calculation. PageRank sculpting has become popular over the past few years as a way to direct link juice to the important pages on a site: links to pages perceived as less important are nofollowed. <a href="http://www.seobook.com">Aaron Wall</a> has remarked that a certain PageRank threshold is required for a page to be crawled and indexed by Google.
</li>
<li><em>External Links:</em><br />
If your site has a problem with unique quality content, is on a not so strong domain and does not have enough PR, having a few backlinks from good or lesser known domains to your money pages will be sufficient to get them indexed by Google.</p>
<p>I have often seen new sites isolated on the web because they do not link out or lack incoming links. A simple step such as submitting to a popular local directory that gets crawled by the search engines regularly is enough to get the site pages indexed by Google.</p>
<p>Search engines treat backlinks to your site as votes for it: someone on the web thinks that your site is quite important. That is a key factor that gets your site pages indexed and retained in the index.
</li>
<li><em>Other Signals:</em><br />
There is a belief among the wider search community that search and traffic volumes to a site, the number of clicks earned by pages on a site on the SERPs, average time spent on a site etc are all signals which search engines are using to retain such pages in their index. </p>
<p>If the number of visits to a site is increasing steadily and users are spending more time on the site, it is logical to assume that such pages will be found relevant to search queries and retained in the search engines&#8217; index. There is no confirmation officially from the search engines themselves to this effect. There is always the prospect of the signals getting noisy over time.
</li>
</ol>
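<p>The PageRank item above says a page&#8217;s score depends on how many links point at it and on how important those linking pages are. A rough sketch of that idea, using the textbook power-iteration formulation rather than anything Google has confirmed about its production system, might look like the following. The damping value of 0.85, the iteration count, and the four-page link graph are all assumptions made for illustration.</p>

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a dict of page -> outlinks."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped score evenly across its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical internal link graph.
graph = {
    "home": ["products", "about"],
    "products": ["home"],
    "about": ["home"],
    "blog": ["home", "products"],
}

ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

<p>In this toy graph the home page collects links from every other page, so it ends up with the largest share, while the blog page, which nothing links to, keeps only the baseline amount.</p>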
<p>Rand has done a great video on this in his Whiteboard Friday post on <a href="http://www.seomoz.org/blog/whiteboard-friday-crawling-indexing">Search Engine Crawling and Indexing</a> factors.</p>
<p>Ravi Venkatesan is a senior <a href="http://www.netconcepts.co.nz/natural-search-marketing-seo/">SEO consultant</a> at Netconcepts, a well established <a href="http://www.netconcepts.co.nz/natural-search-marketing-seo/">Auckland SEO company</a> that has a great track record of optimising client sites for organic search and delivering great results.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.naturalsearchblog.com/archives/2009/07/05/search-engine-crawling-and-indexing-factors/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Yahoo&#8217;s Recent Spider Improvement Beats Google&#8217;s</title>
		<link>http://www.naturalsearchblog.com/archives/2007/06/06/yahoos-recent-spider-improvement-beats-googles/</link>
		<comments>http://www.naturalsearchblog.com/archives/2007/06/06/yahoos-recent-spider-improvement-beats-googles/#comments</comments>
		<pubDate>Wed, 06 Jun 2007 15:08:35 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[Spiders]]></category>
		<category><![CDATA[Yahoo]]></category>
		<category><![CDATA[bot-detection]]></category>
		<category><![CDATA[bots]]></category>
		<category><![CDATA[Googlebot]]></category>
		<category><![CDATA[slurp]]></category>
		<category><![CDATA[spidering]]></category>
		<category><![CDATA[user-agents]]></category>

		<guid isPermaLink="false">http://www.naturalsearchblog.com/archives/2007/06/06/yahoos-recent-spider-improvement-beats-googles/</guid>
		<description><![CDATA[Yahoo!&#8217;s Search Blog announced yesterday that they were making some final changes to their spider, (named &#8220;Slurp&#8221;), standardizing their crawlers to provide a common DNS signature for identification/authorization purposes. Previously, Slurp&#8217;s requests may have come from IP addresses associated with inktomisearch.com, and now they should all come from IPs associated with domains in this standard [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://farm2.static.flickr.com/1156/533230958_38914f7e6b_t.jpg" alt="Googlebot Spider" align="right" border="0" height="100" width="100" /></p>
<p>Yahoo!&#8217;s Search Blog <a href="http://www.ysearchblog.com/archives/000460.html" title="Yahoo! Search Blog" target="_blank">announced yesterday</a> that they were making some final changes to their spider, (named &#8220;Slurp&#8221;), standardizing their crawlers to provide a common DNS signature for identification/authorization purposes.</p>
<p>Previously, Slurp&#8217;s requests may have come from IP addresses associated with inktomisearch.com, and now they should all come from IPs associated with domains in this standard syntax:</p>
<blockquote><p><strong>[something].crawl.yahoo.net</strong></p></blockquote>
<p><span id="more-221"></span></p>
<p>What will this mean to most of us? In most cases, likely nothing. Most sites out there are not likely to be currently performing reverse DNS lookups to check if search engine spiders are actually coming from the IPs/Domains they&#8217;re supposed to, except when those spiders get really impolite in requesting too many pages per second. Most people are only identifying bots by their User-Agent strings.</p>
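<p>For sites that do want to run the check, a reverse DNS test against the <strong>crawl.yahoo.net</strong> pattern can be sketched as below. Two things here are assumptions rather than anything stated in Yahoo&#8217;s announcement: the extra forward lookup that confirms the hostname resolves back to the same IP (a common precaution, since reverse DNS records can be set by whoever controls the IP block), and the sample hostname and addresses, which are made up. The lookup functions are passed in as parameters so the example can run without touching the network.</p>

```python
import socket


def verify_crawler(ip, allowed_suffixes,
                   reverse=socket.gethostbyaddr,
                   forward=socket.gethostbyname_ex):
    """Return True if `ip` reverse-resolves into an allowed domain and that
    hostname resolves forward to the same IP again."""
    try:
        hostname = reverse(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    host = hostname.rstrip(".").lower()
    # Require a dot before the suffix so "notcrawl.yahoo.net" style names fail.
    if not any(host.endswith("." + suffix) for suffix in allowed_suffixes):
        return False
    try:
        addresses = forward(host)[2]
    except OSError:
        return False
    return ip in addresses


# Stand-in resolvers with hypothetical data, used instead of real DNS.
def fake_reverse(ip):
    return ("b123.crawl.yahoo.net", [], [ip])


def fake_forward(host):
    return (host, [], ["192.0.2.10"])


print(verify_crawler("192.0.2.10", ["crawl.yahoo.net"], fake_reverse, fake_forward))
```

<p>Calling <code>verify_crawler(ip, ["crawl.yahoo.net"])</code> without the last two arguments uses the real resolver from Python&#8217;s <code>socket</code> module.</p>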
<p>In fact, Yahoo&#8217;s provision of this authoritative bot ID syntax is more advanced than Google&#8217;s! Google only <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=33577&amp;topic=8460" title="Google Help on Googlebot" target="_blank">recommends</a> that people identify their bot (aka &#8220;Googlebot&#8221;) solely through the User-Agent String &#8212; a bit unsatisfactory for a lot of webmasters out there. I&#8217;ve heard quite a number of webmasters ask what IP address block to expect the Googlebot requests to originate from, and Google wouldn&#8217;t provide them with an authoritative answer.</p>
<p>Of course, one could take a visiting bot&#8217;s IP address, say &#8220;66.249.65.69&#8221;, and perform a Network WHOIS lookup on it to find out if it&#8217;s in a block owned by Google. The Network WHOIS for 66.249.65.69 returns the following info (lookup info provided by <a href="http://centralops.net/co/" title="Domain Dossier - DNS lookup and Network WHOIS" target="_blank">Hexillion&#8217;s Domain Dossier</a>):</p>
<blockquote><p>OrgName:    Google Inc.<br />
OrgID:      GOGL<br />
Address:    1600 Amphitheatre Parkway<br />
City:       Mountain View<br />
StateProv:  CA<br />
PostalCode: 94043<br />
Country:    US</p>
<p>NetRange:   66.249.64.0 &#8211; 66.249.95.255<br />
CIDR:       66.249.64.0/19<br />
NetName:    GOOGLE<br />
NetHandle:  NET-66-249-64-0-1<br />
Parent:     NET-66-0-0-0-0<br />
NetType:    Direct Allocation<br />
NameServer: NS1.GOOGLE.COM<br />
NameServer: NS2.GOOGLE.COM<br />
NameServer: NS3.GOOGLE.COM<br />
NameServer: NS4.GOOGLE.COM<br />
Comment:<br />
RegDate:    2004-03-05<br />
Updated:    2007-04-10</p>
<p>OrgTechHandle: ZG39-ARIN<br />
OrgTechName:   Google Inc.<br />
OrgTechPhone:  +1-650-318-0200<br />
OrgTechEmail:  <a  rel="nofollow" id="sto_emailShroud1" href="http://www.somethinkodd.com/emailshroud/emailaddress.php?domainName=google.com&amp;userName=arin-contact&amp;ver=2.2.0" >arin-contact</a></p></blockquote>
<p>While webmasters could do this lookup for requests from bots displaying the Googlebot user-agent string, it&#8217;s still very unsatisfactory because Google does not state that all requests necessarily come from IP blocks that are identifiably owned by Google. So, webmasters would be nervous about blocking something that claimed to be Googlebot yet came from non-Google IP address ranges. After all, it&#8217;s possible that Google could have purchased IP addresses and domain names through a proxy in order to perform various types of investigative page requests on sites.</p>
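<p>Once a block is known, the membership test itself is simple. The sketch below uses the CIDR range from the 2007 WHOIS output quoted above; that range may well be out of date, and, for the reasons just given, a match or a miss is a hint rather than an authoritative answer. The second address is an arbitrary one chosen to fall just outside the block.</p>

```python
import ipaddress

# CIDR block taken from the WHOIS output quoted in this post (2007 data).
GOOGLE_BLOCK = ipaddress.ip_network("66.249.64.0/19")


def in_block(ip, network=GOOGLE_BLOCK):
    """Return True if the dotted-quad `ip` falls inside `network`."""
    return ipaddress.ip_address(ip) in network


print(in_block("66.249.65.69"))  # the address looked up above
print(in_block("66.249.96.1"))   # just past the end of the /19
```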
<p>There are cases where hostile dataminers will set their user-agent strings up to masquerade as major search engine spiders, so this newly authoritative method for IDing the bots places Yahoo one step ahead of the game for those webmasters who feel the need to ban the bad guys who are scraping their site&#8217;s content or requesting pages fast enough to be a de facto denial of service attack.</p>
<p align="center">. . . . . . . . . . . . . . . . . . . .</p>
<p><strong><font color="red">UPDATE:</font></strong> <a href="http://incredibill.blogspot.com/" title="incrediBILL's blog">incrediBILL</a>, one of the moderators at WebmasterWorld, kindly pointed out to me that Matt Cutts had provided the same sort of <a href="http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html" title="Googlebot Authentication Method" target="_blank">Googlebot authentication method</a> via the Webmaster Central Blog not long ago. I wish that Google would update their webmaster help section to reflect the same information, if this is indeed intended to be a trustworthy method for authenticating Googlebot. With the instruction only to be found in the blog and not in the actual help section, it still leaves one with the uncomfortable feeling that it&#8217;s perhaps an informal method and might still not be depended upon to be true for all cases or it could abruptly change. Hopefully, they&#8217;ll update the help pages so everything will be in sync!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.naturalsearchblog.com/archives/2007/06/06/yahoos-recent-spider-improvement-beats-googles/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Dupe Content Penalty a Myth, but Negative Effects Are Not</title>
		<link>http://www.naturalsearchblog.com/archives/2007/03/18/dupe-content-penalty-a-myth-but-negative-effects-are-not/</link>
		<comments>http://www.naturalsearchblog.com/archives/2007/03/18/dupe-content-penalty-a-myth-but-negative-effects-are-not/#comments</comments>
		<pubDate>Sun, 18 Mar 2007 14:37:46 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Content Optimization]]></category>
		<category><![CDATA[Search Engine Optimization]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Site Structure]]></category>
		<category><![CDATA[Spiders]]></category>
		<category><![CDATA[URLs]]></category>
		<category><![CDATA[duplicate-content]]></category>
		<category><![CDATA[Duplicate-Content-Penalization]]></category>
		<category><![CDATA[Jill-Whalen]]></category>
		<category><![CDATA[URL-Optimization]]></category>

		<guid isPermaLink="false">http://www.naturalsearchblog.com/archives/2007/03/18/dupe-content-penalty-a-myth-but-negative-effects-are-not/</guid>
		<description><![CDATA[I was interested to read a column by Jill Whalen this past week on &#8220;The Duplicate Content Penalty Myth&#8221; at Search Engine Land. While I agree with her assessment that there really isn&#8217;t a Duplicate Content Penalty per se, I think she perhaps failed to address one major issue affecting websites in relation to this. [...]]]></description>
			<content:encoded><![CDATA[<p>I was interested to read a column by Jill Whalen this past week on &#8220;<a href="http://searchengineland.com/070315-100022.php" title="The Duplicate Content Penalty Myth">The Duplicate Content Penalty Myth</a>&#8221; at Search Engine Land. While I agree with her assessment that there really isn&#8217;t a Duplicate Content Penalty per se, I think she perhaps failed to address one major issue affecting websites in relation to this.</p>
<p>Read on to see what I mean.</p>
<p><span id="more-174"></span></p>
<p align="center"><a href="http://www.flickr.com/photos/silvery/424506953/" title="Hercules fights the Duplicate Content beast"><img border="0" width="328" src="http://farm1.static.flickr.com/184/424506953_2adf5c7037.jpg" alt="Hercules Fights the Original Duplicate Content Beast" height="500" /></a></p>
<p>Sure she&#8217;s right in that webmasters don&#8217;t have to be afraid if their applications have created multiple page URLs which all contain identical or near-identical content. Websites do this all the time, and search engines aren&#8217;t penalizing them for it. (Except perhaps for the case of page-scrapers who steal other sites&#8217; content for redisplay &#8212; in which case a scraper&#8217;s page might get penalized or just ranked lower as being non-authoritative for its content.) But, webmasters *do* still need to be concerned with duplicate content, because it can affect their overall traffic and rankings.</p>
<p>Quite simply, PageRank continues to be one big factor in ranking one page versus another for keyword searches. Most sites only have so much PageRank to spend on all the pages in their site. If you double the number of pages on your site, you may be virtually cutting each page&#8217;s PageRank in half when you do it. If you deploy duplicate copies of all your pages willy-nilly, you&#8217;ll have watered down your pages&#8217; PageRank scores for no good reason.</p>
<p>I wish Jill had mentioned this &#8212; dupe content may not cause a website to be penalized, but it&#8217;s still an important factor for the sake of improving/optimizing a site&#8217;s pages to rank better and bring in more traffic. Her article seems to leave one with the feeling that since there&#8217;s not a penalization, webmasters just don&#8217;t need to worry about duplication at all.</p>
<p>I say, what do webmasters care if it&#8217;s called &#8220;penalization&#8221; or not, if the end result is still unnecessarily lower rankings in SERPs?</p>
<p>If you don&#8217;t know what duplicate content may be, you should know that there are a number of things which can cause it to occur in web applications. Primarily, if you have multiple different URLs which all present the same page content, and all of these URLs can be found and indexed by search engine spiders, then you have a duplicate content problem. Here are some common examples:</p>
<ul>
<li><strong>http://example.com</strong> &amp; <strong>http://www.example.com</strong> both present pages for users. As the homepage of your site, if both are indexed by search engines with no qualification, they effectively will split your homepage PR. If subpages are also indexed on both domains, it splits the PR of any of those pages on your site, too. You need one canonical (i.e. &#8220;official&#8221;) domain for your site.<br />
</li>
<li><strong>http://www.example.com/index.html</strong> is the same as <strong>http://www.example.com/</strong> &#8212; if your site developers linked to the homepage indiscriminately, using both of these types of URLs, they&#8217;ve split your homepage&#8217;s PageRank.<br />
</li>
<li><strong>http://www.example.com/?UID=A7WF5681HJF145I</strong> &#8212; if your site uses sessionizing in ubiquitous querystrings &#8212; assigning session IDs for users for personalization and such &#8212; there could be hundreds of different URLs indexed for pages, causing loads of PageRank split through duplication.<br />
</li>
<li><strong>www.example.com/app.jsp?DATE=3/17/07&amp;PageID=6</strong> is the same as:<br />
<strong>www.example.com/app.jsp?PageID=6&amp;DATE=3/17/07</strong> and the same as:<br />
<strong>www.example.com/app.jsp?PageID=6&amp;DATE=3/17/07&amp;Link=ad</strong><br />
- if a page&#8217;s URL has multiple querystring terms and different links point into it with the terms in different order, it can create duplication.</li>
</ul>
<p>These are just some examples &#8212; there are many more cases possible.</p>
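<p>One way to see how the example URLs above collapse into a single page is to run them through a normalizing function. The following is a minimal sketch, not a drop-in fix: the choice of <strong>www.example.com</strong> as the canonical host, the forced <code>http</code> scheme, the list of ignored parameters (<code>UID</code> and <code>Link</code>), and the default document name are all assumptions that would differ from site to site.</p>

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"   # assumed "official" domain
IGNORED_PARAMS = {"uid", "link"}     # assumed session and tracking parameters
DEFAULT_DOCUMENTS = ("index.html",)  # assumed directory index file names


def canonical_url(url):
    """Map the different spellings of a page URL onto one canonical form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    for name in DEFAULT_DOCUMENTS:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # "/index.html" becomes "/"
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    params.sort()  # the same terms in a different order give the same URL
    return urlunsplit(("http", CANONICAL_HOST, path, urlencode(params), ""))


homepage_variants = [
    "http://example.com",
    "http://www.example.com/index.html",
    "http://www.example.com/?UID=A7WF5681HJF145I",
]
app_variants = [
    "http://www.example.com/app.jsp?DATE=3/17/07&PageID=6",
    "http://www.example.com/app.jsp?PageID=6&DATE=3/17/07",
    "http://www.example.com/app.jsp?PageID=6&DATE=3/17/07&Link=ad",
]

print({canonical_url(u) for u in homepage_variants})
print({canonical_url(u) for u in app_variants})
```

<p>Each set printed at the end contains a single URL, which is the one the other variants could redirect to or be normalized into when the site generates its links.</p>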
<p>There are a handful of ways you can fight duplication problems or mitigate their effects on PageRank:</p>
<ul>
<li>Fix your site/application so that only one URL per page of content will occur or be found by spiders;<br />
</li>
<li>Make the application deliver up NOINDEX metatags for alternate page URLs;<br />
</li>
<li>Move user session IDs out of page querystrings and into persistent cookies;<br />
</li>
<li>Place 301 redirects on alternative page URLs, redirecting over to the permanent, primary page URL;<br />
</li>
<li>If setting up special querystringed page URLs for tracking media campaigns, place those URLs in a special subdirectory on your site which you&#8217;ve specified in your robots.txt file for search engines to not index;</li>
</ul>
<p>There are a number of other solutions out there, depending upon what your duplication problem may be. In most cases, fixing duplication is going to be a bit of a technical clean-up job, but the benefit to your overall page rankings and referral traffic may be significant.</p>
<p>You don&#8217;t need to worry about being &#8220;penalized&#8221; for duplicate content within your site &#8212; you&#8217;re not going to be delisted for it. But *do* worry about how it affects the SERP rankings for your content.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.naturalsearchblog.com/archives/2007/03/18/dupe-content-penalty-a-myth-but-negative-effects-are-not/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>In other news, a new free Clinic</title>
		<link>http://www.naturalsearchblog.com/archives/2007/02/27/in-other-news-a-new-free-clinic/</link>
		<comments>http://www.naturalsearchblog.com/archives/2007/02/27/in-other-news-a-new-free-clinic/#comments</comments>
		<pubDate>Wed, 28 Feb 2007 04:36:09 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Content Optimization]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[HTML Optimization]]></category>
		<category><![CDATA[Link Building]]></category>
		<category><![CDATA[PageRank]]></category>
		<category><![CDATA[Search Engine Optimization]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Site Structure]]></category>
		<category><![CDATA[Spiders]]></category>
		<category><![CDATA[help]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[review]]></category>
		<category><![CDATA[SEO-consulting]]></category>
		<category><![CDATA[SEO-critiques]]></category>
		<category><![CDATA[website-design]]></category>

		<guid isPermaLink="false">http://www.naturalsearchblog.com/archives/2007/02/27/in-other-news-a-new-free-clinic/</guid>
		<description><![CDATA[Search Engine Journal today opened a free SEO Clinic for sites in need of optimization or with specific challenges that have not been overcome. A group of leading SEOs including Carsten Cumbrowski, Ahmed Bilal, and Rhea Drysdale will review one submission per week, delivering a thorough review of usability and site navigation, link building, and copywriting from the perspective of placement in [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.searchenginejournal.com/" target="_blank"><font color="#5588aa">Search Engine Journal</font></a> today opened a free SEO Clinic for sites in need of optimization or with specific challenges that have not been overcome.</p>
<p>A group of leading SEOs including Carsten Cumbrowski, Ahmed Bilal, and Rhea Drysdale will review one submission per week, delivering a thorough review of usability and site navigation, link building, and copywriting from the perspective of placement in the four leading engines (Google, Yahoo!, MSN and Ask).</p>
<p>It&#8217;s clear though that &#8220;free&#8221; is as free as having your site criticized in one of the SEO clinics experts like to host at conferences. If your site is chosen for review, the findings and recommendations will be posted for others to peruse. I&#8217;d do as much myself and appreciate their efforts to help others with these case studies but as a website owner, someone responsible for SEO, or marketing manager for a major brand, I might not be so inclined to have my successes and failures outlined in detail for everyone to see. That concern aside, I do hope they get some quality sites and develop a thorough library of reviews (perhaps I&#8217;ll sign up myself!).</p>
<p>To participate, simply <a href="http://www.searchenginejournal.com/?p=4458" target="_blank"><font color="#5588aa">contact the team here</font></a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.naturalsearchblog.com/archives/2007/02/27/in-other-news-a-new-free-clinic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>AdSense Spider Cross-Pollinates for Google</title>
		<link>http://www.naturalsearchblog.com/archives/2006/04/19/adsense-spider-cross-pollinates-for-google/</link>
		<comments>http://www.naturalsearchblog.com/archives/2006/04/19/adsense-spider-cross-pollinates-for-google/#comments</comments>
		<pubDate>Thu, 20 Apr 2006 02:05:24 +0000</pubDate>
		<dc:creator>Chris</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[Spiders]]></category>
		<category><![CDATA[AdSense]]></category>
		<category><![CDATA[bots]]></category>
		<category><![CDATA[Googlebot]]></category>
		<category><![CDATA[Robots.txt]]></category>
		<category><![CDATA[URL-submission]]></category>

		<guid isPermaLink="false">http://www.naturalsearchblog.com/archives/2006/04/19/adsense-spider-cross-pollinates-for-google/</guid>
		<description><![CDATA[A few bloggers such as Jenstar have just posted that pages spidered by Google&#8217;s AdSense bot are appearing in Google&#8217;s regular search results pages. Shoemoney just blogged that Matt Cutts has officially verified that this is happening, saying that this was done so that they wouldn&#8217;t have to spider the same content twice, and that [...]]]></description>
			<content:encoded><![CDATA[<p>A few bloggers such as Jenstar have <a href="http://www.jensense.com/archives/2006/04/matt_cutts_conf.html">just posted</a> that pages spidered by Google&#8217;s AdSense bot are appearing in Google&#8217;s regular search results pages. <a href="http://www.shoemoney.com/2006/04/18/matt-cutts-confirms-media-bot-crawling-for-big-daddy">Shoemoney just blogged</a> that Matt Cutts has officially verified that this is happening, saying that this was done so that they wouldn&#8217;t have to spider the same content twice, and that Google did this as part of their recent Big Daddy infrastructure improvements.</p>
<p>This has a couple of interesting ramifications for SEO professionals and those of us who are optimizing our sites for Google, since bot detection systems may now need to be updated and since this may essentially be a new way of committing site/page submissions into Google&#8217;s indices.  And we all thought automated URL submissions were dead!  I&#8217;ll explain further&#8230;.<span id="more-87"></span></p>
<p>First of all, quite a few people like to track the movements of bots through their site pages, in order to know the frequency of spider visits, and to confirm that a page has been spidered, period. For sites/pages which have frequent updates happening upon them, it&#8217;s also useful to know the date/time the page gets re-spidered and then to see when the updated text will typically appear in the SERPs. Also, some folks have set their robots.txt to disallow spiders into sections of their sites for various reasons.</p>
<p>So, this change in Google&#8217;s spidering functionality will be important for you if you have AdSense ads running on your site. You&#8217;ll want to update your robots.txt file to reflect the AdSense bot&#8217;s user agent string, and you&#8217;ll also want to make sure this user-agent string definition is present in the logfile analysis systems you&#8217;re using to track spider activity on your site.</p>
<p>Many of us are using the Web Robots Database at <a href="http://www.robotstxt.org/">Robotstxt.org</a> in order to identify bots and spiders passing through our sites, and it&#8217;s a great resource for all information about the robots exclusion protocol and related matters &#8212; Google even <a href="http://www.google.com/webmasters/bot.html">cites them as an information resource throughout their webmaster info pages</a>. However, the Robots Database has not been updated to include the definition of the AdSense bot as of the time that I&#8217;m writing this. (I&#8217;ve just reported the bot identification information over to them to add in, so hopefully this won&#8217;t be the case for long.)</p>
<p>(Some webmasters and systems are using the IP address of the bots instead of the User-Agent Strings, but I consider the preferred method to be to use the User Agent for this purpose. Otherwise, you risk mistaking a search engine employee who is browsing your site during a coffee break for the spider visiting you!)</p>
<p>Matt Cutts is apparently informally referring to this bot as &#8220;Mediabot&#8221; or &#8220;Media Bot&#8221;, but the bot is currently declaring itself with this User-Agent String:</p>
<blockquote><p><font face="courier">Mediapartners-Google/2.1</font></p></blockquote>
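<p>For logfile analysis, the check can be sketched in a few lines of Python. This is only a minimal illustration: the function names and the <font face="courier">ADSENSE_TOKENS</font> set are made up for this example, and it assumes the bot keeps sending its name as the first product token, followed by a slash and a version number.</p>

```python
# Hypothetical helper names; only the "Mediapartners-Google" token comes from the post.
ADSENSE_TOKENS = {"mediapartners-google"}


def bot_name(user_agent):
    """Return the first product token of a User-Agent string, without its version.

    "Mediapartners-Google/2.1" becomes "Mediapartners-Google".
    """
    first_product = user_agent.strip().split(" ", 1)[0]
    return first_product.split("/", 1)[0]


def is_adsense_bot(user_agent):
    """True when the User-Agent names the AdSense bot, whatever its version."""
    return bot_name(user_agent).lower() in ADSENSE_TOKENS


print(is_adsense_bot("Mediapartners-Google/2.1"))
```

<p>Matching on the name alone, rather than on the full string, means a future version bump should not silently drop the bot out of your reports.</p>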
<p>If you want to specifically disallow this bot from some section of your site, you should wildcard the bot version number at the end in your robots.txt file like this:</p>
<blockquote><p><font face="courier">User-agent: Mediapartners-Google*<br />
Disallow: /dont-crawl-this-uri-on-my-site</font></p></blockquote>
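<p>To sanity-check a rule like this before deploying it, a rough Python sketch follows. It is a deliberately simplified, hand-rolled matcher written for this example, not a description of how Google evaluates robots.txt: it only does prefix matching, it ignores a trailing asterisk on the User-agent token, and it skips Allow lines and the catch-all <font face="courier">User-agent: *</font> group. The function names are hypothetical.</p>

```python
ROBOTS_TXT = """\
User-agent: Mediapartners-Google*
Disallow: /dont-crawl-this-uri-on-my-site
"""


def parse_groups(robots_txt):
    """Split robots.txt text into (agent_tokens, disallow_prefixes) groups."""
    groups, agents, rules, seen_rule = [], [], [], False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules, seen_rule = [], [], False
            # Ignore a trailing "*" so "Mediapartners-Google*" matches any version.
            agents.append(value.lower().rstrip("*"))
        elif field == "disallow":
            seen_rule = True
            if value:  # an empty Disallow blocks nothing
                rules.append(value)
    groups.append((agents, rules))
    return groups


def is_disallowed(robots_txt, user_agent, path):
    """True if the first group naming this bot disallows the path (prefix match)."""
    token = user_agent.split("/", 1)[0].lower()
    for agents, rules in parse_groups(robots_txt):
        if any(agent and token.startswith(agent) for agent in agents):
            return any(path.startswith(rule) for rule in rules)
    return False


print(is_disallowed(ROBOTS_TXT, "Mediapartners-Google/2.1",
                    "/dont-crawl-this-uri-on-my-site"))
```

<p>Other robots.txt parsers may treat the trailing asterisk differently, so it is worth testing the rule against whatever tool you actually rely on.</p>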
<p>Another interesting point is raised due to the indexing of the Mediabot-spidered content:  doesn&#8217;t this basically provide a new way to, errrr, <strong>automatically submit pages to Google</strong>?!?</p>
<p>If you manually submit your site to Google using <a href="http://www.google.com/addurl/?continue=/addurl">their submission form</a>, you&#8217;re only allowed to provide the top-level domain name of your site. Using that method, Googlebot will initially visit your homepage, and likely only crawl through one or two levels of links out from the homepage in that initial spidering visit. If you&#8217;ve got a really deep site with thousands of pages of content, Googlebot might later revisit the site to try to spider more deeply, in a widening circle out from the homepage.  This existing process could ultimately take quite some time before all of your content gets spidered and can begin appearing in the SERPs.</p>
<p>I&#8217;m thinking that for cases like that where you have a lot of pages on a new or non-indexed site, adding the Google ads onto all your pages might actually result in them getting initially spidered more rapidly.</p>
<p>Also, quite a lot more pages could potentially get indexed if they have the ads on them, since there are situations where Googlebot will not or cannot spider pages on sites.  For instance, if your site content is accessible primarily only through a submission form on your homepage, or through Java/Javascripted menus, or you have only some Flash-enabled navigation system (of course, no SEO professional worth his or her salt would use a Flash-only nav system!) &#8212; if your site pages aren&#8217;t navigable through regular links displayed on your pages, Googlebot would otherwise never find and index their content.</p>
<p>But now, pages that were only accessible to users through a submission form on your site could potentially get indexed and appear in SERPs if they have AdSense ads on them!</p>
<p>This is mostly a good thing for those of us working hard to expose content through the SEs, but I bet it could create some havoc for webmasters who are AdSense publishers and who are taken unaware by the potential sudden influx in traffic which pounds their databases as pages suddenly become visible in SERPs.  Fun problem to have, though!</p>
<p>Google apparently uses other bots specifically for harvesting other types of content as well.  Two that I&#8217;ve come across include a bot which grabs images from websites to use in their Google Images section, and a bot for gathering RSS feeds for use in the personalized Google homepage or in Google Reader.</p>
<p>These are also not identified in Robotstxt.org yet, but their User-Agent strings are as follows:</p>
<blockquote><p><font face="courier">User-agent: Googlebot-Image</font><br />
<font face="courier">User-agent: Feedfetcher-Google</font></p></blockquote>
<p>Note: Feedfetcher <strong>ignores</strong> robots.txt exclusion files! Google does this because:</p>
<blockquote><p><em>Feedfetcher retrieves feeds only after users have explicitly added them to their Google homepage or Google Reader. Feedfetcher behaves as a direct agent of the human user, not as a robot, so it ignores robots.txt entries. Feedfetcher does have one special advantage, though: because it&#8217;s acting as the agent of multiple users, it conserves bandwidth by making requests for common feeds only once for all users.</em></p></blockquote>
<p>Do you know if Google is using other specialized bots for other sections of their site or other types of media? If so, I&#8217;d be interested in hearing about it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.naturalsearchblog.com/archives/2006/04/19/adsense-spider-cross-pollinates-for-google/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Bloody hell, that&#8217;s a lot of information</title>
		<link>http://www.naturalsearchblog.com/archives/2004/12/13/bloody-hell-thats-a-lot-of-information/</link>
		<comments>http://www.naturalsearchblog.com/archives/2004/12/13/bloody-hell-thats-a-lot-of-information/#comments</comments>
		<pubDate>Mon, 13 Dec 2004 22:09:37 +0000</pubDate>
		<dc:creator>stephan</dc:creator>
				<category><![CDATA[Reference Material]]></category>
		<category><![CDATA[Spiders]]></category>

		<guid isPermaLink="false">http://www.naturalsearchblog.com/archives/2004/12/13/bloody-hell-thats-a-lot-of-information/</guid>
		<description><![CDATA[My feeling of technogeek euphoria that I got last month when Google doubled the size of their index has quickly evaporated as I perused Berkeley&#8217;s &#8220;How Much Information&#8221; study. Here&#8217;s some stats that will blow you away: The World Wide Web contains 167 terabytes of Web pages on its &#8220;surface&#8221; (i.e. fixed web pages); in [...]]]></description>
			<content:encoded><![CDATA[<p>My feeling of technogeek euphoria that I got last month when <a href="http://www.stephanspencer.com/archives/2004/11/14/googles-index-hits-8-billion-pages-yes-folks-size-does-matter/">Google doubled the size of their index</a> has quickly evaporated as I perused <a href="http://www.sims.berkeley.edu/research/projects/how-much-info-2003/">Berkeley&#8217;s &#8220;How Much Information&#8221; study</a>. Here are some stats that will blow you away:</p>
<ul>
<li>The World Wide Web contains 167 terabytes of Web pages on its &#8220;surface&#8221; (i.e. fixed web pages); in volume this is seventeen times the size of the Library of Congress print collections. Plus another 91,850 terabytes of data in the &#8220;deep web&#8221; (from database driven websites that create web pages on demand).</li>
<li>Email generates about 400,000 terabytes of new information each year worldwide.</li>
<li>The amount of new information stored on paper, film, magnetic, and optical media has about doubled in the last three years.</li>
<li>Print, film, magnetic, and optical storage media produced about 5 exabytes of new information in 2002. Ninety-two percent of the new information was stored on magnetic media, mostly in hard disks. Five exabytes of information is equivalent in size to the information contained in 37,000 new libraries the size of the Library of Congress book collections.</li>
</ul>
<p>What I found even more amazing (and depressing) is the degree to which we consume this data. We are a society of information junkies. Witness this from the same report:</p>
<blockquote><p>
Published studies on media use say that the average American adult uses the telephone 16.17 hours a month, listens to radio 90 hours a month, and watches TV 131 hours a month. About 53% of the U.S. population uses the Internet, averaging 25 hours and 25 minutes a month at home, and 74 hours and 26 minutes a month at work &mdash; about 13% of the time.
</p></blockquote>
<p>I can&#8217;t imagine sitting in front of the &#8216;idiot box&#8217; for 131 hours a month. What a terrible waste of one&#8217;s life. For an average person, that&#8217;s something like 7 years of your life &mdash; gone.
</p>
<p>Dave of the excellent <a href="http://www.passingnotes.com">PassingNotes.com</a> blog <a href="http://www.passingnotes.com/index.php/we-really-are-drinking-from-a-fire-hose-ouch/">looks at it</a> this way:</p>
<blockquote><p>
IF you were all of those things, then of the 720 average hours in a given month, of which you should be sleeping circa 200 (give or take a few hundred), then you&#8217;d basically be occupied by media (in some form) for over 330 hours per month &#8211; and since we spend about one-third of our lives &#8216;waiting for something to happen&#8217; (bus, phone etc) and about another 20-40 hours per month in a bathroom (much higher for ted kennedy), then discount sleep, and you&#8217;ve got about 80ish hours to be a genuine, sentient human being&#8230;sad, sad world&#8230;</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.naturalsearchblog.com/archives/2004/12/13/bloody-hell-thats-a-lot-of-information/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Is your site unfriendly to search engine spiders like MSNBot?</title>
		<link>http://www.naturalsearchblog.com/archives/2004/11/21/is-your-site-unfriendly-to-search-engine-spiders-like-msnbot/</link>
		<comments>http://www.naturalsearchblog.com/archives/2004/11/21/is-your-site-unfriendly-to-search-engine-spiders-like-msnbot/#comments</comments>
		<pubDate>Sun, 21 Nov 2004 11:19:36 +0000</pubDate>
		<dc:creator>stephan</dc:creator>
				<category><![CDATA[Dynamic Sites]]></category>
		<category><![CDATA[Spiders]]></category>
		<category><![CDATA[URLs]]></category>

		<guid isPermaLink="false">http://www.naturalsearchblog.com/archives/2004/11/21/is-your-site-unfriendly-to-search-engine-spiders-like-msnbot/</guid>
		<description><![CDATA[Microsoft blogger Eytan Seidman on their MSN Search blog offers some very useful specifics on what makes a site crawler unfriendly, particularly to MSNBot: An example of a page that might look &#8220;unfriendly&#8221; to a crawler is one that looks like this: http://www.somesite.com/info/default.aspx?view=22&#038;tab=9&#038;pcid=81-A4-76&#038;section=848&#038;origin=msnsearch&#038;cookie=false&#8230;.URL&#8217;s with many (definitely more than 5) query parameters have a very low [...]]]></description>
			<content:encoded><![CDATA[<p>Microsoft blogger Eytan Seidman on their MSN Search blog offers some very useful specifics on <a href="http://blogs.msdn.com/msnsearch/archive/2004/11/18/266087.aspx">what makes a site crawler unfriendly</a>, particularly to MSNBot:</p>
<blockquote><p>
An example of a page that might look &#8220;unfriendly&#8221; to a crawler is one that looks like this: http://www.somesite.com/info/default.aspx?view=22&#038;tab=9&#038;pcid=81-A4-76&#038;section=848&#038;origin=msnsearch&#038;cookie=false&#8230;.URL&#8217;s with many (definitely more than 5) query parameters have a very low chance of ever being crawled&#8230;.If we need to traverse through eight pages on your site before finding leaf pages that nobody but yourself points to, MSNBot might choose not to go that far. This is why many people recommend creating a site map and we would as well.</p></blockquote>
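<p>As a rough way to audit your own URLs against that rule of thumb, here is a small Python sketch. The threshold of 5 comes from the quote above; the function names are made up for this example, and the check says nothing about how MSNBot actually scores a URL.</p>

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

MAX_PARAMS = 5  # "definitely more than 5" in the quote above


def query_param_count(url):
    """Count the query parameters in a URL, including ones with empty values."""
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True))


def looks_crawler_unfriendly(url, max_params=MAX_PARAMS):
    """True when the URL carries more query parameters than the threshold."""
    return query_param_count(url) > max_params


# Rebuild the example URL from the quote out of its individual parameters.
EXAMPLE_PARAMS = {
    "view": "22",
    "tab": "9",
    "pcid": "81-A4-76",
    "section": "848",
    "origin": "msnsearch",
    "cookie": "false",
}
EXAMPLE_URL = "http://www.somesite.com/info/default.aspx?" + urlencode(EXAMPLE_PARAMS)

print(query_param_count(EXAMPLE_URL), looks_crawler_unfriendly(EXAMPLE_URL))
```

<p>Running a check like this over a list of your site&#8217;s URLs gives a quick shortlist of pages whose addresses are worth simplifying.</p>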
]]></content:encoded>
			<wfw:commentRss>http://www.naturalsearchblog.com/archives/2004/11/21/is-your-site-unfriendly-to-search-engine-spiders-like-msnbot/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Google&#8217;s index hits 8 billion pages. Yes folks, size does matter.</title>
		<link>http://www.naturalsearchblog.com/archives/2004/11/14/googles-index-hits-8-billion-pages-yes-folks-size-does-matter/</link>
		<comments>http://www.naturalsearchblog.com/archives/2004/11/14/googles-index-hits-8-billion-pages-yes-folks-size-does-matter/#comments</comments>
		<pubDate>Mon, 15 Nov 2004 04:23:14 +0000</pubDate>
		<dc:creator>stephan</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[Research and Development]]></category>
		<category><![CDATA[Spiders]]></category>

		<guid isPermaLink="false">http://www.naturalsearchblog.com/archives/2004/11/14/googles-index-hits-8-billion-pages-yes-folks-size-does-matter/</guid>
		<description><![CDATA[On Wednesday, the day before Microsoft unveiled the beta of Microsoft Search, Google announced that their index was now over eight billion pages strong. Impeccable timing from the Googleplex. Just a couple days later, and Microsoft could have proudly touted its bigger web page index over Google&#8217;s. Still, Microsoft&#8217;s 5 billion documents is an impressive [...]]]></description>
			<content:encoded><![CDATA[<p>On Wednesday, the day before Microsoft unveiled the beta of Microsoft Search, Google announced that their index was now over eight billion pages strong. Impeccable timing from the Googleplex. Just a couple days later, and Microsoft could have proudly touted its bigger web page index over Google&#8217;s. Still, Microsoft&#8217;s 5 billion documents is an impressive feat, particularly for a new search engine just out of the blocks. Google continues to show their market dominance, however, with a database of a whopping 8,058,044,651 web pages. Poor Microsoft, trumped by Google at the last minute!</p>
<p>Why the big deal about index size? From the user&#8217;s perspective, a search engine that is comprehensive of the Web in its entirety is going to be more useful than one whose indexation is patchy. Which is why I think the Overture Site Match paid inclusion program from Yahoo! is a really bad idea. Sites shouldn&#8217;t pay the search engine to be indexed. Rather, the search engine should strive to index as much of the Web as possible because that makes for a better search engine.</p>
<p>Indeed, I see Google&#8217;s announcement as a landmark in the evolution of search engines. Search engine spiders have historically had major problems with &#8220;spider traps&#8221; &mdash; dynamic database-driven websites that serve up identical or nearly identical content at varying URLs (e.g. when there is a session ID in the URL). Alas, search engines couldn&#8217;t find their way through this quagmire without severe duplication clogging up their indices. The solution for the search engines was to avoid dynamic sites, to a large degree &mdash; or at least to approach them with caution. Over time, however, the sophistication of the spidering and indexing algorithms has improved to the point that search engines (most notably, Google) have been able to successfully index a plethora of previously un-indexed content and minimize the amount of duplication. And thus, the &#8220;Invisible Web&#8221; begins to shrink. Keep it up, Google and Microsoft!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.naturalsearchblog.com/archives/2004/11/14/googles-index-hits-8-billion-pages-yes-folks-size-does-matter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Store makeover still not wooing the spiders</title>
		<link>http://www.naturalsearchblog.com/archives/2004/10/05/google-store-makeover-still-not-wooing-the-spiders/</link>
		<comments>http://www.naturalsearchblog.com/archives/2004/10/05/google-store-makeover-still-not-wooing-the-spiders/#comments</comments>
		<pubDate>Tue, 05 Oct 2004 11:57:11 +0000</pubDate>
		<dc:creator>stephan</dc:creator>
				<category><![CDATA[Dynamic Sites]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Spiders]]></category>

		<guid isPermaLink="false">http://www.naturalsearchblog.com/archives/2004/10/05/google-store-makeover-still-not-wooing-the-spiders/</guid>
		<description><![CDATA[You may recall my observation a few months ago that the Google Store is not all that friendly to search engine spiders, including Googlebot. Now that the site has had a makeover, and the session IDs have been eliminated from the URLs, the many tens of thousands of duplicate pages have dropped to a mere [...]]]></description>
			<content:encoded><![CDATA[<p>You may recall <a href="/archives/2004/06/25/spiders-like-googlebot-choke-on-session-ids/">my observation</a> a few months ago that the <a href="http://www.googlestore.com/">Google Store</a> is not all that friendly to search engine spiders, including Googlebot. Now that the site has had a makeover, and the session IDs have been eliminated from the URLs, the many tens of thousands of duplicate pages have <a href="http://www.google.com/search?sourceid=navclient&#038;ie=UTF-8&#038;q=site%3Awww%2Egooglestore%2Ecom&#038;num=100">dropped to a mere 144</a>. This is a good thing, since there&#8217;s only a small number of products for sale on the site. Unfortunately, a big chunk of those hundred-and-some search results lead to error pages. So even after a site rebuild, Google&#8217;s own store STILL isn&#8217;t spider friendly. And if you&#8217;re curious what the old site looked like, don&#8217;t bother checking the <a href="http://www.archive.org">Wayback Machine</a> for it. Unfortunately, the Wayback Machine&#8217;s bot has choked on the site since 2002, so <a href="http://web.archive.org/web/*/www.googlestore.com">all you&#8217;ll find</a> for the past several years are &#8220;redirect errors&#8221;.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.naturalsearchblog.com/archives/2004/10/05/google-store-makeover-still-not-wooing-the-spiders/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Spiders like Googlebot choke on Session IDs</title>
		<link>http://www.naturalsearchblog.com/archives/2004/06/25/spiders-like-googlebot-choke-on-session-ids/</link>
		<comments>http://www.naturalsearchblog.com/archives/2004/06/25/spiders-like-googlebot-choke-on-session-ids/#comments</comments>
		<pubDate>Sat, 26 Jun 2004 05:56:19 +0000</pubDate>
		<dc:creator>stephan</dc:creator>
				<category><![CDATA[HTML Optimization]]></category>
		<category><![CDATA[Spiders]]></category>

		<guid isPermaLink="false">http://www.naturalsearchblog.com/archives/2004/06/25/spiders-like-googlebot-choke-on-session-ids/</guid>
		<description><![CDATA[Many ecommerce sites have session IDs or user IDs in the URL of their pages. This tends to cause either the pages to not get indexed by search engines like Google, or to cause the pages to get included many times over and over, clogging up the index with duplicates (this phenomenon is called a [...]]]></description>
			<content:encoded><![CDATA[<p>Many ecommerce sites have session IDs or user IDs in the URL of their pages. This tends to cause either the pages to not get indexed by search engines like Google, or to cause the pages to get included many times over and over, clogging up the index with duplicates (this phenomenon is called a &#8220;spider trap&#8221;). Furthermore, having all these duplicates in the index causes the site&#8217;s importance score, known as PageRank, to be spread out across all these duplicates (this phenomenon is called &#8220;PageRank dilution&#8221;).</p>
<p>Ironically, Googlebot regularly gets caught in a spider trap while spidering one of its own sites &#8211; the <a target="_blank" href="http://www.googlestore.com">Google Store</a> (where they sell branded caps, shirts, umbrellas, etc.). The URLs of the store are not very search engine friendly: they are overly complex, and include session IDs. This has resulted in <a target="_blank" href="http://www.google.com/search?hl=en&#038;lr=&#038;ie=UTF-8&#038;q=inurl%3AAccessories.html+site%3Agooglestore.com&#038;btnG=Search">3,440</a> duplicate copies of the Accessories page and <a target="_blank" href="http://www.google.com/search?hl=en&#038;lr=&#038;ie=UTF-8&#038;q=inurl%3AOffice.html+site%3Agooglestore.com&#038;btnG=Search">3,420</a> copies of the Office page, for example.</p>
<p>If you have a dynamic, database-driven website and you want to avoid your own site becoming a spider trap, you&#8217;ll need to keep your URLs simple. Try to avoid having any ?, &#038;, or = characters in the URLs. And try to keep the number of &#8220;parameters&#8221; to a minimum. With URLs and search engine friendliness, less is more.</p>
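<p>One way to see how much duplication session IDs cause is to strip them from a list of your URLs and count what is left. The Python sketch below is only an illustration: the parameter names in <font face="courier">SESSION_PARAMS</font> are assumptions, so substitute whatever your own platform appends, and the example.com URLs are placeholders.</p>

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names; replace with the ones your site really uses.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_ids(url):
    """Return the URL with any session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


crawled = [
    "http://www.example.com/Accessories.html?sid=abc123",
    "http://www.example.com/Accessories.html?sid=def456",
    "http://www.example.com/Office.html?sid=abc123",
]
unique_pages = {strip_session_ids(url) for url in crawled}
print(len(crawled), "crawled URLs,", len(unique_pages), "distinct pages")
```

<p>If the two counts differ widely, a spider sees the same gap, which is the spider trap described above.</p>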
]]></content:encoded>
			<wfw:commentRss>http://www.naturalsearchblog.com/archives/2004/06/25/spiders-like-googlebot-choke-on-session-ids/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

