<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
	xmlns:media="http://search.yahoo.com/mrss/"
>

<channel>
	<title>Natural Search Blog &#187; URL-submission</title>
	<atom:link href="http://www.naturalsearchblog.com/tag/url-submission/rss2" rel="self" type="application/rss+xml" />
	<link>http://www.naturalsearchblog.com</link>
	<description>Thought leaders in search engine optimization weigh in with the latest SEO news and commentary</description>
	<pubDate>Thu, 09 Oct 2008 19:12:37 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.1</generator>
	<language>en</language>
		<!-- podcast_generator="podPress/8.8" -->
		<copyright>&#xA9; </copyright>
		<managingEditor>chris@netconcepts.com ()</managingEditor>
		<webMaster>chris@netconcepts.com()</webMaster>
		<category></category>
		<itunes:keywords></itunes:keywords>
		<itunes:subtitle></itunes:subtitle>
		<itunes:summary>Thought leaders in search engine optimization weigh in with the latest SEO news and commentary</itunes:summary>
		<itunes:author></itunes:author>
		<itunes:category text="Society &amp; Culture"/>
		<itunes:owner>
			<itunes:name></itunes:name>
			<itunes:email>chris@netconcepts.com</itunes:email>
		</itunes:owner>
		<itunes:block>No</itunes:block>
		<itunes:explicit>no</itunes:explicit>
		<itunes:image href="http://www.naturalsearchblog.com/wp-content/plugins/podpress/images/powered_by_podpress_large.jpg" />
		<image>
			<url>http://www.naturalsearchblog.com/wp-content/plugins/podpress/images/powered_by_podpress.jpg</url>
			<title>Natural Search Blog</title>
			<link>http://www.naturalsearchblog.com</link>
			<width>144</width>
			<height>144</height>
		</image>
		<item>
		<title>To Use Sitemaps, or Not To Use Sitemaps, That&#8217;s the Question</title>
		<link>http://www.naturalsearchblog.com/archives/2006/09/19/to-use-sitemaps-or-not-to-use-sitemaps-thats-the-question/</link>
		<comments>http://www.naturalsearchblog.com/archives/2006/09/19/to-use-sitemaps-or-not-to-use-sitemaps-thats-the-question/#comments</comments>
		<pubDate>Wed, 20 Sep 2006 05:06:28 +0000</pubDate>
		<dc:creator>Chris Silver Smith</dc:creator>
		
		<category><![CDATA[General]]></category>

		<category><![CDATA[Google]]></category>

		<category><![CDATA[Site Structure]]></category>

		<category><![CDATA[Yahoo]]></category>

		<category><![CDATA[Google-Sitemaps]]></category>

		<category><![CDATA[google-webmaster-tools]]></category>

		<category><![CDATA[Sitemaps]]></category>

		<category><![CDATA[URL-submission]]></category>

		<category><![CDATA[yahoo-site-explorer]]></category>

		<guid isPermaLink="false">http://www.naturalsearchblog.com/archives/2006/09/19/to-use-sitemaps-or-not-to-use-sitemaps-thats-the-question/</guid>
		<description><![CDATA[It was really great when Google launched its Sitemaps (recently renamed to Webmaster Tools, as part of their Webmaster Central utilities) - when that happened it was a really great indication of a new time where technicians who wished to help make their pages findable would not automatically be considered &#8220;evil&#8221; and the SEs might [...]]]></description>
			<content:encoded><![CDATA[<p>It was great when Google launched Sitemaps (recently renamed Webmaster Tools, as part of their Webmaster Central utilities). The launch signaled a new era in which technicians who want to make their pages findable are not automatically considered &#8220;evil&#8221;, and in which the SEs might provide tools to help them disclose their pages directly. Yahoo soon followed with its own tools, named Yahoo! Site Explorer, and surely MSN will bow to peer pressure with a submission system and tools of its own.</p>
<p>Initially, I thought there was no significant advantage for me in using these systems, because I&#8217;d already developed good methods for providing our page links to the search engines through the natural linking found in our site navigation systems.</p>
<p>Why should I expend yet more time and resources to dynamically produce the link files?</p>
<p><span id="more-156"></span></p>
<p>I have begun using these tools, though, because there are additional features now beyond just the URL disclosure pieces. Google&#8217;s Webmaster Tools include some nice reports on errors found when indexing, top keyword reports for sites, and page content analysis. My in-house analytics systems also have a lot of this same sort of reporting of course, but I&#8217;m interested in seeing Google&#8217;s perspective on my content.</p>
<p>As the SEs add even more webmaster tools, it&#8217;s eventually going to become necessary to fully integrate with them. It could easily come to the point where large sites will have to explicitly declare all the pages they wish to have indexed, just to ensure that those pages get ranked as optimally as possible.</p>
<p>Hopefully each of the major search engines will try to employ identical or compatible formats for site URLs, because it will be a hassle to have to keep up with multiple formats. This is an area where the SEs really ought to cooperate with one another &#8220;pro bono publico&#8221;, for the common good. Currently, Yahoo seems to be just defensively imitating Google in this arena, and no one&#8217;s showing signs of collaborating. (At the recent SES Conference in San Jose, an audience member had a question for Yahoo&#8217;s Dr. Rajat Mukherjee, but the audience member kept referring to Yahoo&#8217;s product as &#8220;Yahoo Sitemaps&#8221; instead of Site Explorer, much to the consternation of Mukherjee. Amanda and others from Google who were sitting in front of me were highly amused at the situation. It was very obvious that the parallel Google and Yahoo teams have a healthy competitive streak betwixt them.)</p>
<p>If you&#8217;re reading this, perhaps you&#8217;re trying to decide if using Google&#8217;s Webmaster Tools or Yahoo Site Explorer will be valuable to you or not. Here&#8217;s my advice:</p>
<p>Register with these services so that you can use the tools and reports they offer. If you have a site that&#8217;s not already optimized and well-indexed, use the tools to provide them with all your page URLs. While using their services doesn&#8217;t guarantee that your pages will rank well in the SERPs (&#8220;search engine results pages&#8221;), it&#8217;s a sure bet that if they can&#8217;t find your pages you won&#8217;t be in the SERPs at all. Submitting your URLs helps make sure your pages can get indexed, which is particularly useful if you are trying to get a brand new site indexed.</p>
<p>Also, keep up with the developments at the Google and Yahoo teams, because they&#8217;re each bound to deploy more tools and features as time progresses.</p>
<p><a href="http://googlewebmastercentral.blogspot.com/">Google Webmaster Central blog</a></p>
<p><a href="http://www.ysearchblog.com/">Yahoo! Search Blog</a></p>
<p>(MSN doesn&#8217;t have a webmaster tools portal yet, though they surely will at some point.)</p>
<p>If you&#8217;re one of the development team members from the search engines, I have to tell you what the next killer app tool could be:  provide an interface that would allow us to see how our pages rank for at least 300 keywords. I know, I know &#8212; you guys don&#8217;t like providing a lot of metrics, since people will use it to test your black-box algorithms and figure out what/how various signals are used. The thing is, some people use 3rd-party software to accomplish this already (which you don&#8217;t like, since automated queries can impact performance), or some of us can hire temps to execute the searches and document the rankings. So, we&#8217;re getting this data already &#8212; you might as well provide it as a useful service to us, and to obviate the need for people or scripts to execute pointless searches.</p>
<p>I&#8217;m still very excited about the increasing functionality provided by the search engines, and I hope the trend continues.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.naturalsearchblog.com/archives/2006/09/19/to-use-sitemaps-or-not-to-use-sitemaps-thats-the-question/feed/</wfw:commentRss>
		</item>
		<item>
		<title>AdSense Spider Cross-Pollinates for Google</title>
		<link>http://www.naturalsearchblog.com/archives/2006/04/19/adsense-spider-cross-pollinates-for-google/</link>
		<comments>http://www.naturalsearchblog.com/archives/2006/04/19/adsense-spider-cross-pollinates-for-google/#comments</comments>
		<pubDate>Thu, 20 Apr 2006 02:05:24 +0000</pubDate>
		<dc:creator>Chris Silver Smith</dc:creator>
		
		<category><![CDATA[Google]]></category>

		<category><![CDATA[Spiders]]></category>

		<category><![CDATA[AdSense]]></category>

		<category><![CDATA[bots]]></category>

		<category><![CDATA[Googlebot]]></category>

		<category><![CDATA[Robots.txt]]></category>

		<category><![CDATA[URL-submission]]></category>

		<guid isPermaLink="false">http://www.naturalsearchblog.com/archives/2006/04/19/adsense-spider-cross-pollinates-for-google/</guid>
		<description><![CDATA[A few bloggers such as Jenstar have just posted that pages spidered by Google&#8217;s AdSense bot are appearing in Google&#8217;s regular search results pages. Shoemoney just blogged that Matt Cutts has officially verified that this is happening, saying that this was done so that they wouldn&#8217;t have to spider the same content twice, and that [...]]]></description>
			<content:encoded><![CDATA[<p>A few bloggers such as Jenstar have <a href="http://www.jensense.com/archives/2006/04/matt_cutts_conf.html">just posted</a> that pages spidered by Google&#8217;s AdSense bot are appearing in Google&#8217;s regular search results pages. <a href="http://www.shoemoney.com/2006/04/18/matt-cutts-confirms-media-bot-crawling-for-big-daddy">Shoemoney just blogged</a> that Matt Cutts has officially verified that this is happening, saying that this was done so that they wouldn&#8217;t have to spider the same content twice, and that Google did this as part of their recent Big Daddy infrastructure improvements.</p>
<p>This has a couple of interesting ramifications for SEO professionals and those of us who are optimizing our sites for Google, since bot detection systems may now need to be updated and since this may essentially be a new way of committing site/page submissions into Google&#8217;s indices.  And we all thought automated URL submissions were dead!  I&#8217;ll explain further&#8230;.<span id="more-123"></span></p>
<p>First of all, quite a few people like to track the movements of bots through their site pages, in order to know the frequency of spider visits and to confirm that a page has been spidered at all. For sites and pages that are updated frequently, it&#8217;s also useful to know the date and time a page gets re-spidered, and then to see how long the updated text typically takes to appear in the SERPs. Also, some folks have set their robots.txt to disallow spiders from sections of their sites for various reasons.</p>
<p>So, this change in Google&#8217;s spidering functionality will be important for you if you have AdSense ads running on your site. You&#8217;ll want to update your robots.txt file to reflect the AdSense bot&#8217;s user agent string, and you&#8217;ll also want to make sure this user-agent string definition is present in the logfile analysis systems you&#8217;re using to track spider activity on your site.</p>
<p>Many of us are using the Web Robots Database at <a href="http://www.robotstxt.org/">Robotstxt.org</a> in order to identify bots and spiders passing through our sites, and it&#8217;s a great resource for all information about the robots exclusion protocol and related matters &#8212; Google even <a href="http://www.google.com/webmasters/bot.html">cites them as an information resource throughout their webmaster info pages</a>. However, the Robots Database has not been updated to include the definition of the AdSense bot as of the time that I&#8217;m writing this. (I&#8217;ve just reported the bot identification information over to them to add in, so hopefully this won&#8217;t be the case for long.)</p>
<p>(Some webmasters and systems are using the IP address of the bots instead of the User-Agent strings, but I consider the User-Agent to be the preferred method for this purpose. Otherwise, you risk counting a search engine employee who is browsing your site during a coffee break as a visit from their spider!)</p>
<p>Matt Cutts is apparently informally referring to this bot as &#8220;Mediabot&#8221; or &#8220;Media Bot&#8221;, but the bot is currently declaring itself with this User-Agent String:</p>
<blockquote><p><font face="courier">Mediapartners-Google/2.1</font></p></blockquote>
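<p>As a rough illustration of the log tracking described above, here is a minimal Python sketch that tallies crawler hits by User-Agent. It assumes an Apache-style &#8220;combined&#8221; access log in which the User-Agent is the last quoted field; the sample log lines, IP addresses, and function names are made up for the example and are not taken from any real log or tool.</p>

```python
import re
from collections import Counter

# Substrings that identify the Google crawlers named in this post.
# More specific names come first so "Googlebot-Image" is not counted as "Googlebot".
BOT_TOKENS = (
    "Mediapartners-Google",
    "Googlebot-Image",
    "Feedfetcher-Google",
    "Googlebot",
)

# Assumption: combined log format, where the User-Agent is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def classify(log_line):
    """Return the matching bot token for a log line, or None for other visitors."""
    match = LAST_QUOTED_FIELD.search(log_line)
    if match is None:
        return None
    user_agent = match.group(1).lower()
    for token in BOT_TOKENS:
        if token.lower() in user_agent:
            return token
    return None


def count_bot_hits(log_lines):
    """Count hits per bot token across an iterable of log lines."""
    hits = Counter()
    for line in log_lines:
        bot = classify(line)
        if bot is not None:
            hits[bot] += 1
    return hits


# Hypothetical sample lines, invented for this sketch.
SAMPLE_LOG = [
    '192.0.2.10 - - [19/Apr/2006:21:05:24 -0500] "GET /page1.html HTTP/1.1" 200 5120 "-" "Mediapartners-Google/2.1"',
    '192.0.2.11 - - [19/Apr/2006:21:06:02 -0500] "GET /page1.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.12 - - [19/Apr/2006:21:07:45 -0500] "GET /logo.gif HTTP/1.1" 200 900 "-" "Googlebot-Image/1.0"',
    '192.0.2.13 - - [19/Apr/2006:21:08:10 -0500] "GET /about.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/1.5"',
    '192.0.2.10 - - [19/Apr/2006:21:09:31 -0500] "GET /page2.html HTTP/1.1" 200 4096 "-" "Mediapartners-Google/2.1"',
]

hits = count_bot_hits(SAMPLE_LOG)
for bot, count in sorted(hits.items()):
    print(bot, count)
```

<p>Matching on the name and ignoring the version number means the tally keeps working if the bot&#8217;s version changes.</p>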
<p>If you want to specifically disallow this bot from some section of your site, you should wildcard the bot version number at the end in your robots.txt file like this:</p>
<blockquote><p><font face="courier">User-agent: Mediapartners-Google*<br />
Disallow: /dont-crawl-this-uri-on-my-site</font></p></blockquote>
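<p>One way to sanity-check a rule like this before deploying it is to run it through a parser. The sketch below uses Python&#8217;s standard-library <font face="courier">urllib.robotparser</font>; the <font face="courier">www.example.com</font> URLs are placeholders, not real pages. One assumption to flag: that particular parser compares the bot name before the slash by plain substring and does not treat an asterisk inside a bot name as a wildcard, so the rule is written here without the trailing asterisk. Other parsers, including the search engines&#8217; own, may behave differently.</p>

```python
from urllib.robotparser import RobotFileParser

# The rule from this post, written without the trailing asterisk because
# Python's parser matches the bot name by substring (an assumption of this sketch).
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow: /dont-crawl-this-uri-on-my-site
"""


def build_parser(robots_txt):
    """Parse robots.txt text held in a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Placeholder URLs for illustration only.
BLOCKED_URL = "http://www.example.com/dont-crawl-this-uri-on-my-site/page.html"
OPEN_URL = "http://www.example.com/some-other-page.html"

# The AdSense bot is refused the disallowed section but may fetch everything else.
adsense_on_blocked = parser.can_fetch("Mediapartners-Google/2.1", BLOCKED_URL)
adsense_on_open = parser.can_fetch("Mediapartners-Google/2.1", OPEN_URL)

# Other bots are unaffected, because the rule names only the AdSense bot.
googlebot_on_blocked = parser.can_fetch("Googlebot/2.1", BLOCKED_URL)

print(adsense_on_blocked, adsense_on_open, googlebot_on_blocked)
```

<p>Note that a rule scoped to one bot name leaves every other spider free to crawl the same section, so each bot you want to exclude needs its own entry or a catch-all entry.</p>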
<p>Another interesting point is raised due to the indexing of the Mediabot-spidered content:  doesn&#8217;t this basically provide a new way to, errrr, <strong>automatically submit pages to Google</strong>?!?</p>
<p>If you manually submit your site to Google using <a href="http://www.google.com/addurl/?continue=/addurl">their submission form</a>, you&#8217;re only allowed to provide the top-level domain name of your site. Using that method, Googlebot will initially visit your homepage, and likely only crawl through one or two levels of links out from the homepage in that initial spidering visit. If you&#8217;ve got a really deep site with thousands of pages of content, Googlebot might later revisit the site to try to spider more deeply, in a widening circle out from the homepage.  With this existing process, it could ultimately take quite some time before all of your content gets spidered and can begin appearing in the SERPs.</p>
<p>I&#8217;m thinking that for cases like that where you have a lot of pages on a new or non-indexed site, adding the Google ads onto all your pages might actually result in them getting initially spidered more rapidly.</p>
<p>Also, quite a lot more pages could potentially get indexed if they have the ads on them, since there are situations where Googlebot will not or cannot spider pages on sites.  For instance, if your site content is accessible primarily only through a submission form on your homepage, or through Java/Javascripted menus, or you have only some Flash-enabled navigation system (of course, no SEO professional worth his or her salt would use a Flash-only nav system!) &#8212; if your site pages aren&#8217;t navigable through regular links displayed on your pages, Googlebot would otherwise never find and index their content.</p>
<p>But now, pages that were only accessible to users through a submission form on your site could potentially get indexed and appear in SERPs if they have AdSense ads on them!</p>
<p>This is mostly a good thing for those of us working hard to expose content through the SEs, but I bet it could create some havoc for webmasters who are AdSense publishers and who are taken unaware by the potential sudden influx in traffic which pounds their databases as pages suddenly become visible in SERPs.  Fun problem to have, though!</p>
<p>Google apparently uses other bots specifically for harvesting other types of content as well.  Two that I&#8217;ve come across are a bot which grabs images from websites for use in Google Images, and a bot which gathers RSS feeds for use in the personalized Google homepage or in Google Reader.</p>
<p>These are also not identified in Robotstxt.org yet, but their User-Agent strings are as follows:</p>
<blockquote><p><font face="courier">User-agent: Googlebot-Image<br />
User-agent: Feedfetcher-Google</font></p></blockquote>
<p>Note: Feedfetcher <strong>ignores</strong> robots.txt exclusion files! Google does this because:</p>
<blockquote><p><em>Feedfetcher retrieves feeds only after users have explicitly added them to their Google homepage or Google Reader. Feedfetcher behaves as a direct agent of the human user, not as a robot, so it ignores robots.txt entries. Feedfetcher does have one special advantage, though: because it&#8217;s acting as the agent of multiple users, it conserves bandwidth by making requests for common feeds only once for all users.</em></p></blockquote>
<p>Do you know if Google is using other specialized bots for other sections of their site or other types of media? If so, I&#8217;d be interested in hearing about it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.naturalsearchblog.com/archives/2006/04/19/adsense-spider-cross-pollinates-for-google/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
