Duplicate content seems to be a major issue confronting site builders these days.
Is Google penalizing duplicate content or not? Based on my own experience, and on conversations with some experienced SEO experts, the answer is a qualified yes. Admittedly, there is still tons of top-ranking duplicate content in the Google index, but there is also a growing number of examples showing that Google is cracking down on it as best it can.
One much-discussed idea is the “shingle theory”: Google looks at 12- to 15-word segments, or “shingles”, to decide whether content is duplicated. The theory goes that Google has determined it is almost mathematically impossible for two articles to share the same 12- to 15-word “shingle” without being the same article. So which article wins? According to some SEO experts I’ve talked to, the article on the page with the highest PR (PageRank) wins.
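To make the theory concrete, here is a minimal Python sketch of word shingling as the theory describes it. It is an illustration under assumptions, not Google’s actual method: the 12-word window, the lowercase-and-strip-punctuation normalization, the resemblance score (shared shingles divided by all distinct shingles, i.e. Jaccard similarity), and the pick_canonical helper with its made-up URLs and PR scores are all hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical window size; the theory only says "12 to 15 words".
SHINGLE_WORDS = 12


def shingles(text: str, w: int = SHINGLE_WORDS) -> set[tuple[str, ...]]:
    """Return every run of w consecutive words in text (its "shingles")."""
    # Assumed normalization: lowercase and drop punctuation.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a: str, b: str, w: int = SHINGLE_WORDS) -> float:
    """Shared shingles divided by all distinct shingles (Jaccard similarity)."""
    sa, sb = shingles(a, w), shingles(b, w)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def pick_canonical(pagerank_by_url: dict[str, float]) -> str:
    """Hypothetical rule: among pages judged duplicates, keep the highest PR."""
    return max(pagerank_by_url, key=pagerank_by_url.get)


original = ("The quick brown fox jumps over the lazy dog while "
            "the farmer sleeps soundly in the barn.")
copy = "Breaking news: " + original
unrelated = ("Public domain texts can be rewritten in your own words "
             "before you publish them online.")

print(len(shingles(original)))           # 6 shingles from 17 words
print(resemblance(original, copy))       # 0.75
print(resemblance(original, unrelated))  # 0.0

# Hypothetical URLs and PR scores, for illustration only.
print(pick_canonical({"http://example.com/a": 4.0,
                      "http://example.org/b": 6.0}))  # http://example.org/b
```

Adding two words in front of the copied article does not hide it: all six of the original’s 12-word shingles survive intact in the copy, which is why the theory treats a single shared shingle as a strong signal.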
How does this affect you? It may not affect you at all right now, depending on your site’s niche. However, if you republish public domain articles or content, PLR (Private Label Rights) content without alterations, unaltered snippets of content, or unaltered datafeeds, you should reassess your publishing strategy, because the search engines will catch up with you eventually.
The “anti-duplicate content” contingent argues that Google doesn’t have NEARLY the computing power needed to sift through BILLIONS of websites looking for matching 15-word “shingles”. And they would be partially right. Google doesn’t have the computing power to compare ALL sites against each other. My argument, however, is that it doesn’t have to. Google is getting better and better at theming sites (grouping them by topic), so why not compare only sites within a similar theme? That drastically reduces the number of sites to compare and makes “shingle” comparison far more manageable.
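The theming argument comes down to arithmetic: checking every site against every other grows with the square of the total number of sites, while checking only within themes grows with the square of each theme’s size. The Python sketch below counts both. The site names, theme labels, and the even split of one million sites into 1,000 themes are hypothetical numbers chosen for illustration, not figures about Google’s index, and the theme labels are simply assumed to be given.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


def all_pairs(n: int) -> int:
    """Comparisons needed to check every site against every other site."""
    return n * (n - 1) // 2


def themed_pairs(theme_by_site: dict[str, str]) -> list[tuple[str, str]]:
    """Pair up only sites that share a theme label.

    The theme labels are an assumed input; how a search engine would
    assign them is not specified here.
    """
    sites_by_theme: dict[str, list[str]] = defaultdict(list)
    for site, theme in theme_by_site.items():
        sites_by_theme[theme].append(site)
    pairs: list[tuple[str, str]] = []
    for sites in sites_by_theme.values():
        pairs.extend(combinations(sorted(sites), 2))
    return pairs


# Hypothetical sites and themes.
themes = {
    "site1": "mortgages", "site2": "mortgages",
    "site3": "dog training", "site4": "dog training",
    "site5": "golf", "site6": "golf",
}
print(all_pairs(len(themes)))     # 15
print(len(themed_pairs(themes)))  # 3

# Scaled up: a million sites split evenly into 1,000 themes.
print(all_pairs(1_000_000))       # 499999500000
print(1_000 * all_pairs(1_000))   # 499500000
```

On those made-up numbers, comparing only within themes cuts roughly half a trillion comparisons down to about half a billion, about a thousandfold fewer.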
How does Google know which themes to focus on? It stands to reason that the most competitive niches have the most duplicate content. See that Google Toolbar at the top of your web browser? The data it collects is one possible way to find those niches. Another is for Google simply to scan AdWords activity on its own site, looking for the niches with the most paid ads and the highest bids.
Some site builders aren’t even concerned about Google, but are focused on Yahoo and MSN. However, it doesn’t take a rocket scientist to determine that Yahoo and MSN are probably developing duplicate content filters of their own as well.
So, young site builders, go forth and create, but do so carefully, and with an eye firmly fixed toward the future.
Mr. P