No one saw the Panda uprising coming. One day, they were frolicking in our zoos. The next, they were frolicking in our entrails. They came for the identical twins first, then the gingers, and then the rest of us. I finally trapped one and asked him the question burning in all of our souls: "Why?" He just smiled and said, "You humans all look alike to me."
Jericho "Bamboo" Jackson, Pandas Take No Prisoners
OK, maybe we're starting to get a bit melodramatic about this whole Panda thing. While it's true that Panda didn't change everything about SEO, I think it has been a wake-up call about SEO issues we've been ignoring for too long. One of those issues is duplicate content. While duplicate content as an SEO problem has been around for years, the way Google handles it has evolved dramatically and only seems to get more complicated. Panda has upped the ante even more. So I thought it was a good time to cover the topic of duplicate content, as it stands in 2011, in depth. This is designed to be a comprehensive resource: a complete discussion of what duplicate content is, how it happens, how to diagnose it, and how to fix it. Maybe we'll even round up a few rogue pandas along the way.
Let's start with the basics. Duplicate content exists when any two (or more) pages share the same content. If you're a visual learner, here's an illustration for you: [illustration not reproduced]. Easy enough, right? So why does such a simple concept cause so much difficulty? One problem is that people often make the mistake of thinking that a page is a file or document sitting on the web server. To a crawler like Googlebot, a page is any unique URL it happens to find, usually through internal or external links. Especially on large, dynamic sites, creating two URLs that land on the same content is surprisingly easy (and often unintentional).
Duplicate content as an SEO issue was around long before the Panda update, and it has taken many forms as the algorithm has changed.
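The idea that a crawler sees URLs rather than files is easy to demonstrate. The following is a minimal Python illustration, not a description of how Googlebot works. It assumes you already have a hypothetical dict mapping each crawled URL to the HTML it returned. It then groups URLs whose content, after collapsing whitespace and lowercasing, hashes to the same SHA-256 fingerprint. The example URLs and the `sessionid` parameter are made up. Exact hashing only catches identical copies; near-duplicates would need a fuzzier technique such as shingling.

```python
import hashlib
from collections import defaultdict


def content_fingerprint(html: str) -> str:
    """Hash page content after collapsing whitespace and lowercasing it,
    so trivial formatting differences don't make two copies look distinct."""
    normalized = " ".join(html.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicate_groups(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs that serve the same content. `pages` maps URL -> HTML."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, html in pages.items():
        groups[content_fingerprint(html)].append(url)
    # Only groups with more than one URL are duplicates
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


if __name__ == "__main__":
    # Hypothetical crawl: three URLs, one page of content
    crawled = {
        "https://example.com/product/42": "<h1>Blue Widget</h1> <p>Our best widget.</p>",
        "https://example.com/product/42?sessionid=abc": "<h1>Blue Widget</h1>\n<p>Our best widget.</p>",
        "https://example.com/Product/42/": "<h1>Blue Widget</h1> <p>Our best widget.</p>",
        "https://example.com/product/43": "<h1>Red Widget</h1> <p>A different widget.</p>",
    }
    for group in find_duplicate_groups(crawled):
        print(group)
```

Here the session ID, the capitalized path and the trailing slash each create a distinct URL, and the sketch reports all three as one duplicate group while the genuinely different product 43 stands alone.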
Here's a brief look at some major issues with duplicate content over the years.
In the early days of Google, just indexing the web was a massive computational challenge. To deal with this challenge, some pages that were seen as duplicates or just very low quality were stored in a secondary index called the "supplemental" index. These pages automatically became second-class citizens from an SEO perspective and lost any competitive ranking ability.
Around late 2006, Google integrated supplemental results back into the main index, but those results were still often filtered out. You know you've hit filtered results any time you see this warning at the bottom of a Google SERP: [screenshot not reproduced]. Even though the index was unified, results were still "omitted", with obvious consequences for SEO. Of course, in many cases these pages really were duplicates or had very little search value, and the practical SEO impact was negligible, but not always.
It's always tough to talk about limits when it comes to Google, because people want to hear an absolute number. There is no absolute crawl budget or fixed number of pages that Google will crawl on a site. There is, however, a point at which Google may give up crawling your site for a while, especially if you keep sending spiders down winding paths. Although the budget isn't absolute, even for a given site, you can get a sense of Google's crawl allocation for your site in Google Webmaster Tools, under Diagnostics > Crawl Stats.
So what happens when Google hits so many duplicate paths and pages that it gives up for the day? Practically, the pages you want indexed may not get crawled. At best, they probably won't be crawled as often.
Similarly, there's no set cap on how many pages of a site Google will index. There does seem to be a dynamic limit, though, and that limit is relative to the authority of the site. If you fill up your index with useless duplicate pages, you may push out more important, deeper pages.
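To see why duplicate paths can crowd out the pages you care about, here is a deliberately crude toy model in Python. It assumes a fixed per-session fetch budget (which, as noted above, Google does not actually have), a crawl queue processed in discovery order, and a hypothetical site where each product page is also reachable through `?sort=` variants that serve the same content. The helper names and URL patterns are illustrative only. Every variant costs a fetch but adds no new content.

```python
from collections import deque


def build_site(num_products: int, variants_per_product: int) -> tuple[list[str], dict[str, str]]:
    """Build a hypothetical site: each product page plus `?sort=` variants
    that all serve the same content. Returns (discovery order, URL -> page)."""
    frontier: list[str] = []
    canonical: dict[str, str] = {}
    for i in range(num_products):
        page = f"product-{i}"
        urls = [f"/product/{i}"] + [f"/product/{i}?sort={v}" for v in range(variants_per_product)]
        for url in urls:
            frontier.append(url)   # each product's variants are discovered right after it
            canonical[url] = page  # every URL in this group serves the same content
    return frontier, canonical


def simulate_crawl(frontier: list[str], canonical: dict[str, str], budget: int) -> set[str]:
    """Return the distinct pages (content) reached within `budget` fetches.
    A fixed budget is a simplifying assumption of this toy model."""
    queue = deque(frontier)
    reached: set[str] = set()
    fetches = 0
    while queue and fetches < budget:
        url = queue.popleft()
        fetches += 1  # a duplicate URL still costs a fetch
        reached.add(canonical[url])
    return reached


if __name__ == "__main__":
    for variants in (0, 1, 3, 9):
        frontier, canonical = build_site(num_products=10, variants_per_product=variants)
        reached = simulate_crawl(frontier, canonical, budget=20)
        print(f"{variants} duplicate variant(s) per product -> {len(reached)} of 10 products crawled")
```

With a budget of 20 fetches, all 10 products are reached when each has zero or one variant, but only 5 are reached with three variants each and 2 with nine. Real crawlers prioritize and deduplicate far more cleverly; the model only shows the direction of the effect.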
For example, if you load up on thousands of internal search results, Google may not index all of your product pages. Many people make the mistake of thinking that more indexed pages is better. I've seen too many situations where the opposite was true. All else being equal, bloated indexes dilute your ranking ability.
Long before Panda, a debate would erupt every few months over whether or not there was a duplicate content penalty. While these debates raised valid points, they often focused on semantics: whether or not duplicate content caused a capital-P Penalty. While I think the conceptual difference between penalties and filters is important, the upshot for a site owner is often the same. If a page isn't ranking (or isn't even indexed) because of duplicate content, then you've got a problem, no matter what you call it.
Since Panda (starting in February 2011), the impact of duplicate content has become much more severe in some cases. It used to be that duplicate content could only harm that content itself. If you had a duplicate, it might go supplemental or get filtered out. Usually, that was OK. In extreme cases, a large number of duplicates could bloat your index or cause crawl problems and start impacting other pages.
Panda made duplicate content part of a broader quality equation: now a duplicate content problem can impact your entire site. If you're hit by Panda, non-duplicate pages may lose ranking power, stop ranking altogether, or even fall out of the index. Duplicate content is no longer an isolated problem.