Natural Search Blog


Spiders like Googlebot choke on Session IDs

Many ecommerce sites put session IDs or user IDs in the URLs of their pages. This tends to cause one of two problems: either search engines like Google don’t index the pages at all, or they index the same pages over and over, clogging up the index with duplicates (this phenomenon is called a “spider trap”). Furthermore, all these duplicates cause the site’s importance score, known as PageRank, to be spread out across them (this phenomenon is called “PageRank dilution”).
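To see why a spider ends up with duplicates, here is a minimal sketch in Python. It assumes a hypothetical store at shop.example.com that appends a fresh `sid` parameter to every link it hands out, and a simplified spider that treats each distinct URL string as a distinct page. The host name, the `sid` parameter, and both function names are made up for illustration; real spiders are more sophisticated than this.

```python
import itertools

# Hypothetical site: every link the server emits carries a brand-new session ID.
_session_counter = itertools.count(1)

def serve_link(path: str) -> str:
    """Return the URL the site hands out for `path`, with a new session ID each time."""
    return f"http://shop.example.com{path}?sid={next(_session_counter):06d}"

def crawl(paths, visits):
    """Simulate a spider that deduplicates by exact URL string."""
    seen = set()
    for _ in range(visits):
        for path in paths:
            seen.add(serve_link(path))
    return seen

# Two real pages, visited three times each.
indexed = crawl(["/accessories", "/office"], visits=3)
print(len(indexed))  # 6 distinct URLs for only 2 real pages
```

Every visit yields a URL the spider has never seen, so the count of “pages” grows with the number of visits rather than with the amount of real content.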

Ironically, Googlebot regularly gets caught in a spider trap while spidering one of Google’s own sites – the Google Store (where they sell branded caps, shirts, umbrellas, etc.). The URLs of the store are not very search engine friendly: they are overly complex and include session IDs. This has resulted in 3,440 duplicate copies of the Accessories page and 3,420 copies of the Office page, for example.

If you have a dynamic, database-driven website and you want to avoid your own site becoming a spider trap, you’ll need to keep your URLs simple. Try to avoid having any ?, &, or = characters in the URLs. And try to keep the number of “parameters” to a minimum. With URLs and search engine friendliness, less is more.
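One way to move toward simpler URLs is to strip session parameters before a link is ever written into a page. The sketch below uses Python’s standard `urllib.parse` module. The parameter names in `SESSION_PARAMS` are assumptions, not a definitive list, so substitute whatever your own platform uses; `strip_session_params` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed session parameter names (compared case-insensitively); adjust for your platform.
SESSION_PARAMS = {"sid", "sessionid", "session_id", "phpsessid", "jsessionid"}

def strip_session_params(url: str) -> str:
    """Return `url` with any session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # urlunsplit omits the "?" entirely when no parameters remain.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )

print(strip_session_params("http://shop.example.com/office?sid=000123"))
print(strip_session_params("http://shop.example.com/item?id=42&PHPSESSID=abc"))
```

The first call leaves a URL with no query string at all, and the second keeps only the `id` parameter. This only cleans up the links; the site would still need to track sessions some other way, such as with cookies.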

1 comment for Spiders like Googlebot choke on Session IDs

  1. See pix of Googlebot
    http://www.thesemzone.com/2007/06/what-googlebot-actually-looks-like.html

    Comment by Al — 6/1/2007 @ 1:31 pm

