Matt Cutts: Google Insider Info

Hi, my name is Matt Cutts and I joined Google as a software engineer in January 2000. I'm currently the head of Google's Webspam team. I sometimes blog about things relevant to Google's algorithm.
Matt Cutts is Google's front man. He knows how Google works and how Google ranks pages.
New Insights on (Internal) Duplicate Content from Matt Cutts
On March 14, 2010, Eric Enge of Stone Temple Consulting interviewed Matt Cutts of Google. The interview produced several revelations and confirmations about duplicate content and a number of other issues. Here are a few highlights, but please take the time to read the entire interview for yourself.
Duplicate Content Pages Might Not Get Crawled
“Imagine we crawl three pages from a site, and then we discover that the two other pages were duplicates of the third page. We’ll drop two out of the three pages and keep only one, and that’s why it looks like [your site] has less good content. So we might tend to not crawl quite as much from that site.”
…“Typically, duplicate content is not the largest factor on how many pages will be crawled, but it can be a factor.”
In other words, duplicate content within the same domain can result in Google crawling fewer pages from your site. According to Matt, you have a certain “crawl budget”: an allotment of pages Google is willing to crawl within your domain. Having the same content on multiple pages of your website means Google may crawl fewer pages of your site.
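To make the scenario in the quote concrete (three crawled pages, two of them duplicates of the third, only one kept), here is a minimal Python sketch. It is an illustration only and not a description of how Google detects duplicates: the exact-hash comparison, the `normalize` rule, the function names, and the example.com URLs are all assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def group_duplicates(pages: dict[str, str]) -> dict[str, list[str]]:
    """Group URLs whose normalized body text hashes to the same digest."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(normalize(body).encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return dict(groups)


def pages_to_keep(pages: dict[str, str]) -> list[str]:
    """Keep the first URL seen in each duplicate group and drop the rest."""
    return [urls[0] for urls in group_duplicates(pages).values()]


# Hypothetical site: three URLs that serve the same text.
site = {
    "https://example.com/widget": "Blue widget, 10 cm, in stock.",
    "https://example.com/widget?ref=home": "Blue widget, 10 cm, in stock.",
    "https://example.com/widget?sort=asc": "Blue  widget, 10 cm,  in stock.",
}
kept = pages_to_keep(site)
print(kept)  # ['https://example.com/widget']
```

Running a check like this over your own pages is one way to spot internal duplicates before a crawler does. Real pages usually differ in navigation and boilerplate, so a practical version would likely need near-duplicate matching rather than exact hashes.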
Do Affiliate Links Pointing to Your Site Create Duplicate Content?
“Duplicate content can happen. If you are operating something like a co-brand, where the only difference in the pages is a logo, then that’s the sort of thing that users look at as essentially the same page. Search engines are typically pretty good about trying to merge those sorts of things together, but other scenarios certainly can cause duplicate content issues.”
What About Ecommerce Product Pages that are Almost Identical?
Matt says the canonical tag is one answer. “There are a couple of things to remember here. If you can reduce your duplicate content using site architecture, that’s preferable. The pages you combine don’t have to be complete duplicates, but they really should be conceptual duplicates of the same product, or things that are closely related. People can now do cross-domain rel=canonical, which we announced last December.”
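As a rough illustration of the canonical tag, the Python sketch below builds a `<link rel="canonical">` element for a product page. It assumes, purely for the example, that product variants differ only in the query string, so stripping the query and fragment yields the preferred URL. The helper names and the example.com URL are hypothetical, and many sites will need a different rule for choosing the canonical URL.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop the query string and fragment (assumed here to mark variants)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


# Hypothetical product variant URL.
tag = canonical_link_tag("https://example.com/shirts/oxford?color=blue&size=m")
print(tag)  # <link rel="canonical" href="https://example.com/shirts/oxford">
```

Each variant page would carry the same tag in its `<head>`, pointing at the one URL you want treated as the preferred version.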