Natural Search Blog

Interview with Google about duplicate content

The following is an excerpt of a video conversation between Vanessa Fox, Product Manager of Google Webmaster Central, and Rand Fishkin, CEO and co-founder of SEOMoz, about Google and duplicate content. It further confirms Adam Lasnik’s position that duplicate content handling is a filter, not a penalty. The full video can be found here.

Rand Fishkin: Duplicate content filter, is that the same as, or different from, a duplicate content penalty?
Vanessa Fox: So I think there is a lot of confusion about this issue. I think people think that if Google sees information on a site that is duplicated within the site (the site duplicating its own material), then there will be some kind of penalty applied. There are a couple of different ways this can happen. One is if you use subpages that seem to have a lot of content that is the same, e.g. a local type site that says here is information about Boulder and here’s information about Denver, but it doesn’t actually have any information about Boulder; it just says Boulder in one place and Denver in the other, and otherwise the pages are exactly the same. Another scenario is where you have multiple URLs that point to the same exact page, e.g. on a dynamic site. So those are two cases where you have duplicate content within a site.
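
As a rough illustration of the Boulder/Denver case, a site owner could check how similar two of their own pages are before publishing. This is only a sketch using Python's standard `difflib` module; the sample texts, the `is_near_duplicate` helper, and the 0.9 threshold are assumptions made for the example, and it says nothing about how Google itself detects duplicates.

```python
from difflib import SequenceMatcher


def similarity(text_a, text_b):
    """Return a 0..1 word-level similarity ratio between two page texts."""
    return SequenceMatcher(None, text_a.split(), text_b.split()).ratio()


def is_near_duplicate(text_a, text_b, threshold=0.9):
    """Flag two pages as near duplicates (threshold is an arbitrary choice)."""
    return similarity(text_a, text_b) >= threshold


# Hypothetical page texts that differ only in the city name.
boulder = "Find local plumbers in Boulder. Call today for a free quote."
denver = "Find local plumbers in Denver. Call today for a free quote."

if is_near_duplicate(boulder, denver):
    print("These pages are nearly identical; consider adding unique content.")
```

Pages flagged this way are candidates for either real city-specific content or merging into a single page.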

Fishkin: So would you call that a filter or would you call that a penalty, do you discriminate between the two?
Fox: There is no penalty. We don’t apply any kind of penalty to a site that has that situation. I think people get more worried than they should about it because they think oh no, there’s going to be a penalty on my site because I have duplicate content. But what is going to happen is some kind of filtering, because in the search results page we want to show relevant, useful pages instead of showing ten URLs that all point to the same page – which is probably not the best experience for the user. So what is going to happen is we are going to only index one of those pages. So if you don’t care, in the instance where there are a lot of URLs that all point to the same exact page, if you don’t care which one of them is indexed then you don’t have to do anything, Google will pick one and we’ll index it and it will be fine.

Fishkin: So let’s say I was looking for the optimal Google experience and I was trying to optimize my site to the best of my ability. Would I then say, well, maybe it isn’t so good for me to have Google crawling pages on my site that I know are duplicates (or very similar), let me just give them the pages I know they will want?
Fox: Right, so you can do that, you can redirect versions…we can figure it out, it’s fine, we have a lot of systems. But if you care which version of the site is indexed, and you don’t want us to hit your site too much by crawling all these versions, then yeah, you might want to do some things: you can submit sitemaps and tell us which version of the page you want, you can do a redirect, you can block with robots, you can not serve us session IDs. I mean, there are a lot of different things you could do in that situation. In the situation where the pages are just very similar, it’s sort of a similar situation, where you want to make the pages as unique as possible. So that’s sort of a different solution to a similar sort of problem. You want to go, ok, how can I make my page about Boulder different from my page about Denver, or maybe I just need one page about Colorado if I don’t have any information about the other two pages.
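
The redirect and session ID suggestions can be sketched in a few lines. The example below is a minimal illustration, not a recommendation from the interview: it strips session-style query parameters to get one canonical URL per page and reports when a request should be redirected to it. The parameter names in `SESSION_PARAMS` and both function names are hypothetical; a real site would use whatever parameters its own software adds.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that only carry session state.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def canonical_url(url):
    """Drop session parameters, sort the rest, and lowercase the host."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    kept.sort()  # a stable order so ?a=1&b=2 and ?b=2&a=1 map to one URL
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path or "/", urlencode(kept), "")
    )


def redirect_target(url):
    """Return the URL to redirect to (e.g. with a 301), or None if already canonical."""
    canonical = canonical_url(url)
    return canonical if canonical != url else None


print(redirect_target("http://example.com/page?sid=abc123&city=boulder"))
```

The same `canonical_url` output could also be used as the single version of each page listed in a sitemap.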

1 comment for Interview with Google about duplicate content

  1.

    I have a site whose content I accidentally duplicated in Dreamweaver. I copied my index page as a backup, put it with my remote files for easier access, and named it index2.html.
    I accidentally linked index2.html, which had exactly the same content as index.html, from the live website without noticing. It happened while I was working on a web page, and I did not realize it until I saw that Google had indexed both index.html and index2.html. They had the same content and ranked one on top of the other: index2.html 2nd and index.html 3rd. So when you typed in “Austin Flooring”, both pages appeared in the search results for that keyword. When I went back and deleted index2.html from the remote site, it was too late: Google had de-indexed my site, both index.html and index2.html, for my most important keywords and search terms.
    Today index2.html is no longer the same as my index.html page, but there are still no results. It has been going on for 3 months, and I’m number 1 on Yahoo, MSN, and all the other search engines for my most important keywords, except on Google. If there is no penalty, then how do I go about fixing this serious issue? If there is a “penalty”, how long does it last?

    Comment by matt — 2/5/2009 @ 10:21 pm
