Natural Search Blog


Robots Meta Tag Not Well Documented by Search Engines

Those of us who do SEO have been increasingly pleased with the various search engines for providing or allowing tools and protocols to allow us to help direct, control, and manage how our sites are indexed. However, the search engines still have a significant need to keep much of their workings a secret out of fear of being exploited by ruthless black-hats who will seek to improve page rankings for keywords regardless of appropriateness. This often leaves the rest of us with tools that can be used in some limited cases, but there’s little or no documentation to tell us how those tools operate functionally in the complex real world. The Robots META tag is a case in point.

The idea behind the protocol was simple, and convenient. It’s sometimes hard to use a robots.txt file to manage all the types of pages delivered up by large, dynamic sites. So, what could be better than using a tag directly on a page to tell the SE whether to spider and index the page or not?  Here’s how the tag should look, if you wanted a page to NOT be indexed, and for links found on it to NOT be crawled:

<meta content=”noindex,nofollow” name=”ROBOTS”>

Alternatively, here’s the tag if you wanted to expressly tell the bot to index the page and crawl the links on it:

<meta content=”index,follow” name=”ROBOTS”>

But, what if you wanted the page to not be indexed, while you still wanted the links to be spidered? Or, what if you needed the page indexed, but the links not followed? The major search engines don’t clearly describe how they treat these combinations, and the effects may not be what you’d otherwise expect. Read on and I’ll explain how using this simple protocol with the odd combos had some undesirable effects.

(more…)

RSS Feeds
Categories
Archives
Other