There is a lot of talk on SEO and Internet business forums about something called a ‘duplicate content penalty’. Some people maintain that there is no such thing, or that it should be called a duplicate content filter.

Well, there is definitely a duplicate content penalty. There is also a duplicate content filter. I have discovered that there is even a duplicate layout penalty. I googled ‘duplicate layout penalty’ and got zero results, so nobody is talking about that, but I will.

First, there are several types of duplicate content, and various penalties and filters apply. The criteria by which the penalties or filters are applied are neither clear nor consistent. The worst-case example of duplicate content is a mirror site. Make a second copy of your site, and you will find one or both removed from the search engine listings. But sites with a legitimate need for mirrors, such as sourceforge.net or php.net, do not get penalized. Different mirrors often have different PageRank, but that may be due to differences in incoming links, server response time, and other ranking factors that have nothing to do with content.

The next level of duplicate content is the reprint article site. If all of the pages on your site consist of free reprint articles, you will likely be penalized with low PageRank. But some sites with nothing but reprint articles fare well: articlecity.com has a PR6 home page.

So clearly, duplicate content is a FACTOR in PageRank, but not the only consideration. I looked at some one-year-old free reprint articles. In a typical case, Google listed one copy, though clicking the link to show the filtered-out results revealed three listings. Yahoo, on the other hand, showed 14 copies of the same article. Interestingly, all of the sites Yahoo listed were indexed by Google, usually with thousands of pages. So even though a site was indexed, its copy of the article might not appear in a Google search, even when duplicate listings were requested. Of the sites carrying that sample article, one had a PR3 home page, but most were PR0 or unranked.

Perhaps I’ll take the time to do a more detailed study of those factors in the future, but for now, let’s just say the situation is complex enough that most people are just guessing when they talk about the duplicate content penalty. So when we move on to the topic of this post, the duplicate layout penalty, we are on even shakier ground, but I’m personally convinced it exists.

Here is the scenario. I made a topic-specific website for a particular U.S. locality. It got a lot of traffic and, better yet, made a lot of money through AdSense. There is demand for that kind of information, and it pays off, so I created another website on the same topic but for a different locality. All of the data was different, but, being lazy, I used the exact same page layout with just a different database. This continued through about five locations, each on different ISP servers. Once they had aged enough that I could expect them to ‘catch up’ with the first site in traffic and earnings, I noticed they were not doing as well. More specifically, the 2nd and 3rd sites were one PageRank point lower than the first, and the 4th and 5th were yet another point lower.

Lower PageRank = lower traffic. Google has over half of the web search market, so its PageRank is important. Here I had made six sites, all equally useful and all with similar incoming links, yet one ranks PR4, two PR3, two PR2, and one PR0.

Well, maybe it was coincidence. But I remembered another disappointing set of sites that started out well but never lived up to the first one, and I found the same pattern there. It wasn’t just that the older sites ranked better: at one year old, the first site had a big advantage, and the order in which the sites were added predicted their relative rank, with later sites ranking lower.

So why, I can hear skeptics ask, don’t newer WordPress blogs rank poorly? There are tens of thousands (or hundreds of thousands?) of sites using the same basic layout. My only guess is that it is the same reason thousands of news sites do not get hit with duplicate content penalties or filtering. Either the folks at Google are so brilliant that they developed a ranking program that can detect actual value, or (more likely) they programmed certain exceptions into their ‘rules’.