I was interested to read a column by Jill Whalen this past week on “The Duplicate Content Penalty Myth” at Search Engine Land. While I agree with her assessment that there really isn’t a Duplicate Content Penalty per se, I think she failed to address one major issue that duplicate content causes for websites.
Read on to see what I mean.
Sure, she’s right that webmasters don’t have to be afraid if their applications have created multiple page URLs which all contain identical or near-identical content. Websites do this all the time, and search engines aren’t penalizing them for it. (The exception is perhaps page scrapers who steal other sites’ content for redisplay, in which case a scraper’s page might get penalized or just ranked lower as being non-authoritative for its content.) But webmasters *do* still need to be concerned with duplicate content, because it can affect their overall traffic and rankings.
Quite simply, PageRank continues to be one big factor in ranking one page versus another for keyword searches. Most sites only have so much PageRank to spend on all the pages in their site. If you double the number of pages on your site, you may be virtually cutting each page’s PageRank in half when you do it. If you deploy duplicate copies of all your pages willy-nilly, you’ll have watered down your pages’ PageRank scores for no good reason.
I wish Jill had mentioned this. Dupe content may not cause a website to be penalized, but it’s still an important factor when optimizing a site’s pages to rank better and bring in more traffic. Her article leaves one with the feeling that, since there’s no penalty, webmasters don’t need to worry about duplication at all.
I say, what do webmasters care if it’s called “penalization” or not, if the end result is still unnecessarily lower rankings in SERPs?
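To put rough numbers on the dilution argument, here is a deliberately simplified Python sketch. It assumes a fixed PageRank "budget" split evenly across every indexed URL, which is only the back-of-the-envelope model used in this post; real PageRank depends on link structure. The function name and the figures are made up for illustration.

```python
def per_page_share(total_pagerank: float, page_count: int) -> float:
    """Toy model: every indexed URL gets an equal slice of a fixed budget."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    return total_pagerank / page_count


# Hypothetical figures in arbitrary units, not real PageRank values.
budget = 100.0
unique_pages = 50

before = per_page_share(budget, unique_pages)
# Every page now also exists under a second, duplicate URL.
after = per_page_share(budget, unique_pages * 2)

print(before, after)
```

Under this toy model, duplicating every page once leaves each URL with half the share it had before, with no new content to show for it.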
If you don’t know what duplicate content is, you should know that a number of things can cause it to occur in web applications. Primarily, if you have multiple different URLs which all present the same page content, and all of these URLs can be found and indexed by search engine spiders, then you have a duplicate content problem. Here are some common examples:
- http://example.com & http://www.example.com both present pages for users. If both are indexed by search engines as your homepage with no qualification, they effectively split your homepage’s PageRank (PR). If subpages are also indexed on both domains, the PR of each of those pages is split, too. You need one canonical (i.e. “official”) domain for your site.
- http://www.example.com/index.html is the same as http://www.example.com/. If your site developers linked to the homepage indiscriminately, using both of these types of URLs, they’ve split your homepage’s PageRank.
- http://www.example.com/?UID=A7WF5681HJF145I is a sessionized URL. If your site puts session IDs into the querystrings of every link (assigning them to users for personalization and such), there could be hundreds of different URLs indexed for each page, causing loads of PageRank split through duplication.
- www.example.com/app.jsp?DATE=3/17/07&PageID=6 is the same as:
www.example.com/app.jsp?PageID=6&DATE=3/17/07 and the same as:
– if a page’s URL has multiple querystring terms and different links point into it with the terms in different order, it can create duplication.
These are just some examples — there are many more cases possible.
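To make these cases concrete, here is a minimal Python sketch of a URL normalizer that collapses the variants listed above into a single form. The choice of `www.example.com` as the preferred host, the `index.html` filename, and the `UID` session parameter are assumptions taken from the examples in this post; a real site would substitute its own rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for illustration only; adjust to your own site.
BARE_HOST = "example.com"
CANONICAL_HOST = "www.example.com"
INDEX_FILES = ("index.html",)
SESSION_PARAMS = {"uid"}  # compared case-insensitively


def canonicalize(url: str) -> str:
    """Return one normalized form for URLs that serve the same page."""
    parts = urlsplit(url)

    # 1. Collapse the bare domain onto the preferred host.
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST

    # 2. Treat /index.html and / as the same page.
    path = parts.path or "/"
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break

    # 3. Drop session IDs, then sort the remaining terms so that
    #    parameter order no longer produces distinct URLs.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    params.sort()

    return urlunsplit((parts.scheme.lower(), host, path, urlencode(params), ""))


variants = [
    "http://example.com",
    "http://www.example.com/index.html",
    "http://www.example.com/?UID=A7WF5681HJF145I",
]
print({canonicalize(u) for u in variants})
```

Running a crawl export or server log through a function like this is one quick way to see how many distinct URLs actually map to the same page. Note that `urlencode` percent-encodes the slashes in the date value, so the normalized form is meant for comparison, not necessarily for display.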
There are a handful of ways you can fight duplication problems or mitigate their effects on PageRank:
- Fix your site/application so that only one URL per page of content will occur or be found by spiders;
- Make the application deliver up NOINDEX metatags for alternate page URLs;
- Move user session IDs out of page querystrings and into persistent cookies;
- Place 301 redirects on alternative page URLs, redirecting over to the permanent, primary page URL;
- If setting up special querystringed page URLs for tracking media campaigns, place those URLs in a special subdirectory on your site which you’ve disallowed in your robots.txt file so that search engines won’t crawl and index it.
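As one illustration of the 301 approach, here is a small sketch written as Python WSGI middleware. It assumes `www.example.com` is the primary host and that `/index.html` should collapse to `/`; the wrapper name `canonical_redirect` is made up for this example, and most sites would do the same thing in their web server or framework configuration instead.

```python
# Assumption for illustration: the host you have chosen as canonical.
CANONICAL_HOST = "www.example.com"
INDEX_FILE = "index.html"


def canonical_redirect(app):
    """Wrap a WSGI app so alternate URLs answer with a 301 to the primary URL."""

    def wrapper(environ, start_response):
        host = environ.get("HTTP_HOST", "")  # may include a port on some setups
        path = environ.get("PATH_INFO", "") or "/"
        query = environ.get("QUERY_STRING", "")

        # Collapse /index.html (at any depth) onto its directory URL.
        new_path = path
        if path.endswith("/" + INDEX_FILE):
            new_path = path[: -len(INDEX_FILE)]

        if host != CANONICAL_HOST or new_path != path:
            scheme = environ.get("wsgi.url_scheme", "http")
            location = f"{scheme}://{CANONICAL_HOST}{new_path}"
            if query:
                location += "?" + query
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]

        # Already the primary URL: serve the page as usual.
        return app(environ, start_response)

    return wrapper


def demo_app(environ, start_response):
    """Stand-in for the real site, used only to exercise the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


application = canonical_redirect(demo_app)
```

A permanent redirect tells spiders which URL is the primary one, rather than leaving them to index every variant that happens to be linked.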
There are a number of other solutions out there, depending upon what your duplication problem may be. In most cases, fixing duplication is going to be a bit of a technical clean-up job, but the benefit to your overall page rankings and referral traffic may be significant.
You don’t need to worry about being “penalized” for duplicate content within your site — you’re not going to be delisted for it. But *do* worry about how it affects the SERP rankings for your content.