
Interview with Google about duplicate content

The following is an excerpt from a video conversation between Vanessa Fox, Product Manager of Google Webmaster Central [1], and Rand Fishkin, CEO and co-founder of SEOMoz [2], about Google and duplicate content. This further confirms Adam Lasnik’s position [3] that it’s a filter, not a penalty. The full video can be found here [4].

Rand Fishkin: A duplicate content filter, is that the same as or different from a duplicate content penalty?
Vanessa Fox: So I think there is a lot of confusion about this issue. I think people think that if Google sees information that is duplicated within a site, then there will be some kind of penalty applied for duplicating its own material. There are a couple of different ways this can happen. One is if you have subpages that seem to have a lot of content that is the same, e.g. a local-type site that says here is information about Boulder and here is information about Denver, but it doesn’t actually have any information about Boulder; it just says Boulder in one place and Denver in the other, and otherwise the pages are exactly the same. Another scenario is where you have multiple URLs that point to the same exact page, e.g. on a dynamic site. So those are two cases where you have duplicate content within a site.

Fishkin: So would you call that a filter, or would you call that a penalty? Do you distinguish between the two?
Fox: There is no penalty. We don’t apply any kind of penalty to a site in that situation. I think people get more worried than they should, because they think, oh no, there’s going to be a penalty on my site because I have duplicate content. But what will happen is some kind of filtering, because on the search results page we want to show relevant, useful pages instead of showing ten URLs that all point to the same page, which is probably not the best experience for the user. So what will happen is that we will only index one of those pages. In the instance where there are a lot of URLs that all point to the same exact page, if you don’t care which one of them is indexed, then you don’t have to do anything: Google will pick one, we’ll index it, and it will be fine.
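Neither speaker spells out the mechanics, but the filtering Fox describes can be pictured as simple deduplication at index time: group URLs that serve the same content and keep one representative. Below is a minimal Python sketch of that idea; the URLs, page bodies, and fingerprint-by-hash approach are illustrative assumptions, not a description of Google’s actual systems.

```python
import hashlib

# Hypothetical crawl results: several URLs serving identical content.
# These URLs and bodies are invented for illustration.
pages = {
    "http://example.com/product?id=42": "<html>Widget page</html>",
    "http://example.com/product?id=42&session=abc123": "<html>Widget page</html>",
    "http://example.com/product?id=42&ref=home": "<html>Widget page</html>",
    "http://example.com/about": "<html>About us</html>",
}

def filter_duplicates(pages: dict[str, str]) -> dict[str, str]:
    """Keep one URL per unique page body. This mimics a duplicate
    *filter*: nothing is penalized, the extra copies simply are
    not indexed."""
    seen: dict[str, str] = {}   # content fingerprint -> chosen URL
    index: dict[str, str] = {}  # chosen URL -> content
    for url, body in pages.items():
        fingerprint = hashlib.sha256(body.encode()).hexdigest()
        if fingerprint not in seen:
            seen[fingerprint] = url
            index[url] = body
    return index

if __name__ == "__main__":
    for url in filter_duplicates(pages):
        print(url)  # one URL per distinct page survives
```

Note how this matches Fox’s point: if you don’t care which variant survives, the filter simply picks one (here, whichever URL is seen first) and the rest are silently dropped rather than punished.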

Fishkin: So let’s say I was looking for the optimal Google experience and trying to optimize my site to the best of my ability. Would I then say, well, maybe it isn’t so good for me to have Google crawling pages on my site that I know are duplicates (or very similar); let me just give them the pages I know they will want?
Fox: Right, so you can do that, you can redirect versions…we can figure it out, it’s fine, we have a lot of systems. But if you care which version of the site is indexed, and you don’t want us to hit your site too hard by crawling all these versions, then yeah, you might want to do some things: you can submit sitemaps and tell us which version of the page you want, you can do a redirect, you can block with robots.txt, you can avoid serving us session IDs. There are a lot of different things you could do in that situation. In the situation where the pages are just very similar, you want to make the pages as unique as possible, which is a different solution to a similar sort of problem. You want to ask, ok, how can I make my page about Boulder different from my page about Denver? Or maybe I just need one page about Colorado, if I don’t have any unique information for the other two.
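On the remediation side, here is a small sketch of the “don’t serve us session IDs” idea: collapsing URL variants to one canonical form, which could then be listed in a sitemap or used as the target of a 301 redirect. The stripped parameter names are hypothetical examples, not an official list, and this is one possible approach rather than anything Fox prescribes.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameters that create duplicate URLs without changing the page.
# These names are assumed examples; adjust to whatever your site emits.
IGNORED_PARAMS = {"session", "sessionid", "sid", "ref"}

def canonical_url(url: str) -> str:
    """Strip session-style parameters so every variant of a page
    collapses to one canonical URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))

print(canonical_url("http://example.com/denver?sid=abc123&page=2"))
# -> http://example.com/denver?page=2
```

Serving this canonical form (and 301-redirecting the variants to it) addresses the crawl-budget concern Fox raises: Google fetches one URL per page instead of many.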