I am the SEO director of Vitals, a comprehensive health care and doctor information and review site. There are over one million health professionals in the United States alone. This means a lot of categorization and pagination is necessary to organize all the providers by name, city and specialty. Our pagination strategy has changed several times to keep in step with Google’s latest recommendations. Later in this article, I will present a history of those recommendations.
Pagination can occur in many formats. First is article pagination, where a single article spans two or more pages. Next is gallery pagination, where every item in a gallery has its own page. There is also forum pagination, where threads can span many pages. Category pagination is when listings span several pages. These lists can be in the form of products or anything else that can be placed in categories. A newer form is infinite scroll pagination, where data is pre-fetched from a subsequent page and added directly to the user’s current page as they scroll down the page.
You should note every pagination type on your site and discern which of the pagination options shown below works best for your situation. You should also determine which pages in the series would provide additional value if surfaced in the Google index. An article spanning several pages should allow Google to read and index the keywords from the entire article. Likewise, on lists of products, you would want the search engines to have a crawlable path to all your product listings. A component page in a paginated series can be valuable as: 1) a component page with good content that completes the series, or 2) a crawlable path to reach the content of individual items.
The best time to deal with pagination structure is during the design process. This avoids having to re-code or restructure after launch.
“Do what’s good for the user”
Product managers often resist changing existing pagination by citing Matt Cutts: “do what’s good for the user, not for search engines”. This is certainly the top priority, but I would like to add one crucial element: “do what’s valuable to searchers trying to find your business”. Otherwise, there’s no value to your content. You also need to code the pages properly, so the search engines can act as the intermediary between your users and your quality content. Over the last few years, Google has laid out instructions in the Google Webmaster forums on how paginated pages should be structured and coded.
What can be the problems with pagination?
- Crawl Hog
Google will crawl all the pagination pages if you let it. However, the crawl bandwidth Google devotes to a site is limited. You don’t want the crawler to get tied up in paginated pages, especially if the pagination pages do not add any Google indexation value over page one. Increasing the number of categories or items per page can decrease the depth of pagination.
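The effect of page size on pagination depth is simple arithmetic. A minimal sketch, assuming a hypothetical category of 1,000 listings (the numbers are illustrative only):

```python
import math


def pagination_depth(total_items: int, items_per_page: int) -> int:
    """Number of paginated pages needed to list every item."""
    if items_per_page <= 0:
        raise ValueError("items_per_page must be positive")
    return math.ceil(total_items / items_per_page)


# Hypothetical category with 1,000 listings: larger pages mean a shallower series.
for per_page in (10, 25, 50):
    print(per_page, "items per page ->", pagination_depth(1000, per_page), "pages")
```

Going from 10 to 50 items per page cuts this hypothetical series from 100 pages to 20, which leaves the crawler far fewer URLs to work through.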
- Page Juice Dilution
Incorrect code implementation can dilute page juice across the paginated pages, which also prevents link juice from transferring to the pages that they link to.
- Duplicate Content
Paginated pages are vulnerable to duplication filtering by the search engines. Coding paginated pages correctly lets the search engines know that they are pagination pages, so they will not be flagged as duplicates.
- Thin Content
A lot of paginated pages do not have a significant amount of quality content on them. The Panda algorithm can penalize an entire site if it finds too much low quality content. Thankfully, Google has given us relatively clear guidelines on best practices for pagination. Here are some pertinent excerpts from Google’s recommendations.
A brief history on Google’s recommendations:
9/15/2011 Google Webmaster Central
Here are three options for a series:
- Leave whatever you have exactly as-is. Paginated content exists throughout the web and we’ll continue to strive to give searchers the best result, regardless of the page’s rel=”next”/rel=”prev” HTML markup—or lack thereof.
- If you have a view-all page, or are considering a view-all page, see our post on View-all in search results.
- Hint to Google the relationship between the component URLs of your series with rel=”next” and rel=”prev”. This helps us more accurately index your content and serve to users the most relevant page (commonly the first page). Implementation details below.
A few points to mention:
- The first page only contains rel=”next” and no rel=”prev” markup.
- Pages two to the second-to-last page should be doubly-linked with both rel=”next” and rel=”prev” markup.
- The last page only contains markup for rel=”prev”, not rel=”next”.
- rel=”next” and rel=”prev” values can be either relative or absolute URLs (as allowed by the <link> tag). And, if you include a <base> link in your document, relative paths will resolve according to the base URL.
- rel=”next” and rel=”prev” only need to be declared within the <head> section, not within the document <body>.
- We allow rel=”previous” as a syntactic variant of rel=”prev” links.
- rel=”next” and rel=”previous” on the one hand and rel=”canonical” on the other constitute independent concepts.
- rel=”prev” and rel=”next” act as hints to Google, not absolute directives.
- When implemented incorrectly, such as omitting an expected rel=”prev” or rel=”next” designation in the series, we’ll continue to index the page(s), and rely on our own heuristics to understand your content.
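The first three points above can be sketched as a small helper that emits the <link> tags for any page in a series. This is an illustrative sketch, not Google's code; the function names and the URL scheme in `example_url` are assumptions made for the example.

```python
import html
from typing import Callable, List


def pagination_link_tags(page: int, last_page: int,
                         url_for: Callable[[int], str]) -> List[str]:
    """Return the rel="prev"/rel="next" <link> tags for one page of a series."""
    if not 1 <= page <= last_page:
        raise ValueError("page is outside the series")
    tags = []
    if page > 1:  # every page except the first points back
        href = html.escape(url_for(page - 1), quote=True)
        tags.append(f'<link rel="prev" href="{href}">')
    if page < last_page:  # every page except the last points forward
        href = html.escape(url_for(page + 1), quote=True)
        tags.append(f'<link rel="next" href="{href}">')
    return tags


def example_url(page: int) -> str:
    # Hypothetical URL scheme, used only for illustration.
    return f"http://www.example.com/doctors?page={page}"


for tag in pagination_link_tags(2, 3, example_url):
    print(tag)
```

The tags belong in the <head> section of each component page.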
10/19/2011 – Maile Ohye in Google Forums
If you’ve marked page 2 to n of your paginated series as “noindex, follow” to keep low quality content from affecting users and/or your site’s rankings, that’s fine, you can additionally include rel=”next” and rel=”prev.” Noindex and rel=”next”/”prev” are entirely independent annotations.
This means that if you add rel=”next” and rel=”prev” to noindex’d pages, it still signals to Google that the noindex’d pages are components of the series (though the noindex’d pages will not be returned in search results). This configuration is totally possible (and we’ll honor it), but the benefit is mostly theoretical.
If you believe the user experience on page 2 to n provides little value — so much so that you’ve already marked these pages as noindex — then to ensure that these low-quality pages aren’t returned to users and/or considered in ranking updates such as Panda, even if you choose to add rel=”next” and rel=”prev,” you may want to consider keeping the noindex (or “noindex, follow”).
03/01/2012 – Maile Ohye in Google Forums
“Does rel=next/prev also work as a signal for only one page of the series (page 1 in most cases?) to be included in the search index? Or would noindex tags need to be present on page 2 and on?”
When you implement rel=”next” and rel=”prev” on component pages of a series, we’ll then consolidate the indexing properties from the component pages and attempt to direct users to the most relevant page/URL. This is typically the first page. There’s no need to mark page 2 to n of the series with noindex unless you’re sure that you don’t want those pages to appear in search results.
03/12/2012 – Maile Ohye in YouTube Video
02/12/2014 – Maile Ohye in Google Webmaster Central
Best practices for new faceted navigation implementations or redesigns
New sites that are considering implementing faceted navigation have several options to optimize the “crawl space” (the totality of URLs on your site known to Googlebot) for unique content pages, reduce crawling of duplicative pages, and consolidate indexing signals.
- Option 1: internal links
Make all unnecessary URLs links rel=”nofollow”. This option minimizes the crawler’s discovery of unnecessary URLs and therefore reduces the potentially explosive crawl space (URLs known to the crawler) that can occur with faceted navigation. rel=”nofollow” doesn’t prevent the unnecessary URLs from being crawled (only a robots.txt disallow prevents crawling). By allowing them to be crawled, however, you can consolidate indexing signals from the unnecessary URLs with a searcher-valuable URL by adding rel=”canonical” from the unnecessary URL to a superset URL.
- Option 2: Robots.txt disallow
For URLs with unnecessary parameters, include a /filtering/ directory that will be robots.txt disallow’d. This lets all search engines freely crawl good content, but will prevent crawling of the unwanted URLs. For instance, if my valuable parameters were item, category, and taste, and my unnecessary parameters were session-id and price. I may have the URL:
- Option 3: Separate hosts
If you’re not using a CDN (sites using CDNs don’t have this flexibility easily available in Webmaster Tools), consider placing any URLs with unnecessary parameters on a separate host — for example, creating main host http://www.example.com and secondary host, www2.example.com. On the secondary host (www2), set the Crawl rate in Webmaster Tools to “low” while keeping the main host’s crawl rate as high as possible. This would allow for more full crawling of the main host URLs and reduces Googlebot’s focus on your unnecessary URLs.
- Be sure there remains at least one click path to all items on the main host.
- If you’d like to consolidate indexing signals, consider adding rel=”canonical” from the secondary host to a superset URL on the main host.
- Improve indexing of individual content pages with rel=”canonical” to the preferred version of a page. rel=”canonical” can be used across hostnames or domains.
- Improve indexing of paginated content (such as page=1 and page=2 of the category “gummy candies”) by either:
- Adding rel=”canonical” from individual component pages in the series to the category’s “view-all” page (e.g. page=1, page=2, and page=3 of “gummy candies” with rel=”canonical” to category=gummy-candies&page=all) while making sure that it’s still a good searcher experience (e.g., the page loads quickly).
- Using pagination markup with rel=”next” and rel=”prev” to consolidate indexing properties, such as links, from the component pages/URLs to the series as a whole.
- Include only canonical URLs in Sitemaps.
- Configure Webmaster Tools URL Parameters if you have a strong understanding of the URL parameter behavior on your site (make sure that there is still a clear click path to each individual item/article). For instance, with URL Parameters in Webmaster Tools, you can list the parameter name, the parameter’s effect on the page content, and how you’d like Googlebot to crawl URLs containing the parameter.
Note: URL “Parameter Handling” in Webmaster Tools allows the site owner to provide information about the site’s parameters and recommendations for Googlebot’s behavior.
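The robots.txt disallow approach quoted above (a /filtering/ directory) can be sanity-checked before deployment. The sketch below uses Python's standard-library robots.txt parser as a stand-in; it is not how Googlebot itself evaluates rules, and both example URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks only the /filtering/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /filtering/
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Check a URL against ROBOTS_TXT using the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs: one valuable category page, one unwanted filtered view.
print(is_crawlable("http://www.example.com/category/gummy-candies"))
print(is_crawlable("http://www.example.com/filtering/price-5-10"))
```

Running a list of real site URLs through a check like this can help confirm that the disallow rule blocks only the URLs you intended.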
Let’s analyze Google’s advice:
Option 1: The View All Page
Google clearly favors the View-All page option when the page loads quickly and users can easily find what they are looking for. This means that all items in a paginated series should be listed on the View-All page, and the canonical tags on all the paginated pages should reference the View-All page. The paginated pages in this scenario are there to garner more page views and to make the lists per page more manageable for a user to read. The View-All page is primarily for the search engines.
Coding Instruction for the View-All Option:
- Create a single View-All page with all of the content from the paginated pages within a single series of pagination.
- Once you have created the View-All page, place a rel=”canonical” tag in the head section of each paginated component page, referencing the View-All page (example: <link rel="canonical" href="http://www.example.com/view-all"/>). This tells Google to treat each page in a paginated series as a segment of the View-All page, so queries will return the View-All page rather than a relevant segment page of the pagination chain.
- In Google Webmaster Parameter Handling, set the paginated page parameter to “Paginates” and set Google to crawl “Every URL”.
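The canonical step above amounts to giving every component page the same tag. A minimal sketch, assuming hypothetical component URLs and the example View-All URL from the instructions:

```python
import html
from typing import Dict, List


def view_all_canonical_tags(component_urls: List[str],
                            view_all_url: str) -> Dict[str, str]:
    """Map each paginated component URL to the canonical tag it should carry."""
    href = html.escape(view_all_url, quote=True)
    tag = f'<link rel="canonical" href="{href}"/>'
    # Every component page points at the same View-All URL.
    return {url: tag for url in component_urls}


# Hypothetical component pages of one series.
pages = [f"http://www.example.com/list?page={n}" for n in (1, 2, 3)]
for url, tag in view_all_canonical_tags(pages, "http://www.example.com/view-all").items():
    print(url, "->", tag)
```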
View-All Option Works Well:
- If your pagination does not have so many links or images that the View-All page takes considerable time to load. Five seconds is already stretching the limit for many users, especially on mobile devices. By preferring this option, I believe Google is signaling that it is the most beneficial one. If your View-All pages are too large, then it’s time to think about how to break your pagination down to more manageable levels.
- If you don’t mind that the View-All page is the only one that is allowed to be indexed in the search engines. This can undermine the main purpose of your pagination, which was to get more page views, as you want users to scroll through the navigation in manageable chunks of data.
Option 2: Block Pagination Beyond Page One
In some instances, you may want to structure your website so that the search engines do not access the paginated series of pages after the first page. This means that every product must have internal links from a first page of listings. This can be difficult to structure, but I have seen some sites use this method successfully. This method ensures that the crawler will not needlessly crawl unimportant pages and that only your first, main representative page will be indexed by the search engines. Be cautious using this option, as it will prevent search engines from indexing content in the rest of an article or from finding any products listed after the first page. If you need to stuff in additional categorization to link to every product URL or article from a first page, this option can have the unintended consequence of a poor user experience, and Google will certainly take notice of that.
Coding Instruction for the Blocking Pagination Option:
- Add rel=”nofollow” to all links to the paginated pages.
- Since the paginated pages will not get crawled, any link equity that flows to those links will not be transferred onward. To limit the loss of page juice, you should limit the number of paginated links shown on the first page.
- In Google Webmaster Tools, under the Parameter Handling section, set the paginated page parameter to “Paginates” and set Google to crawl “No URLs”. This is another setting that requires extreme caution, as parameters can be shared across various sections of the website and changing them may have negative unintended consequences. If you are not confident and comfortable with these settings, leave the setting at “Let Googlebot Decide”.
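The first two instructions above can be sketched as a helper that renders a capped number of nofollowed pagination links for the first page. The function name, the URL scheme and the default cap of three links are assumptions chosen for illustration.

```python
import html
from typing import Callable, List


def blocked_pagination_links(last_page: int, url_for: Callable[[int], str],
                             max_links: int = 3) -> List[str]:
    """Render at most max_links nofollowed links from page one to later pages."""
    links = []
    # Pages 2 .. max_links + 1, never running past the end of the series.
    for page in range(2, min(last_page, max_links + 1) + 1):
        href = html.escape(url_for(page), quote=True)
        links.append(f'<a href="{href}" rel="nofollow">{page}</a>')
    return links


def example_url(page: int) -> str:
    # Hypothetical URL scheme, used only for illustration.
    return f"http://www.example.com/products?page={page}"


print("\n".join(blocked_pagination_links(10, example_url)))
```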
Blocking Pagination works well if:
- Other pages on the site do not pass link equity to the paginated pages.
- All pages on the site are linked internally on pages the search engines are allowed to crawl and the links are allowed to pass link equity.
Option 3: Implement Pagination Relationships
This option requires the use of rel=”next” and rel=”prev” tags. These tags establish the relationship between all pages in a paginated series. This coding relationship protects the paginated pages from being seen as duplicates. The robots “noindex,follow” tag can be implemented on the paginated pages if you believe there is absolutely no purpose for the paginated pages to surface in the Google index. This method ensures that link equity will not be wasted. The downside to this method is that if you have excessive pagination, the crawlers may get caught up in crawling the paginated pages and not crawl key areas of your site.
Coding Instruction for the Relationship Option:
- Implement the rel=”prev” and rel=”next” tags to indicate a sequence of paginated pages.
- Each page in this paginated series can have the same title tag, meta description and H1 tags. However, if you are allowing the paginated pages to get indexed, you may choose to have targeted keywords in all these tags instead.
- Each page should have its canonical tag set to its own URL and not to the first page. If the URLs have a tracking ID or extra parameters, the canonical tag may need extra consideration.
- If you don’t want the paginated pages to get indexed, set a robots meta tag to “noindex,follow” in the head section of every page in the paginated series, excluding the first page. I will refer to this as Option 3B in the table below.
- In Google Webmaster Parameter Handling, set the paginated page parameter to “Paginates” and set Google to crawl “Every URL”.
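The canonical and robots instructions above can be sketched as follows (the rel=”prev”/rel=”next” links are left out to keep the example short). The sketch assumes that `page` is the only query parameter that changes content and that anything else, such as a tracking ID, should be dropped from the canonical URL; that assumption will not hold for every site.

```python
import html
from typing import List, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def self_canonical_url(raw_url: str, keep_params: Tuple[str, ...] = ("page",)) -> str:
    """Strip query parameters that are not in keep_params (assumed to be tracking noise)."""
    parts = urlsplit(raw_url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in keep_params]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def option3_head_tags(raw_url: str, page: int, noindex_deep_pages: bool = False) -> List[str]:
    """Self-referencing canonical, plus "noindex,follow" on pages 2..n for Option 3B."""
    href = html.escape(self_canonical_url(raw_url), quote=True)
    tags = [f'<link rel="canonical" href="{href}">']
    if noindex_deep_pages and page > 1:
        tags.append('<meta name="robots" content="noindex,follow">')
    return tags


# Hypothetical request URL carrying a tracking parameter.
print(option3_head_tags("http://www.example.com/doctors?page=3&utm_source=news", 3, True))
```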
Paginated Relationships works well if:
- It can be implemented correctly. This extra coding can be challenging for some sites.
- You don’t have excessive pagination and the crawlers are not having trouble crawling your entire site.
A single site can use one or all of the options shown above, applying different options to different content sections. Each pagination template on your site should be reviewed thoroughly to see which option makes sense to use. I checked a selection of competitor sites and they all use Option 2 (block pagination) or Option 3 (paginated series). I again want to stress the major challenges with Option 2: it requires perfection in implementation to work correctly. The safer choices are either Option 1 (View-All page) or Option 3 (paginated series). I would surmise that although Google is promoting Option 1 (View-All page), most webmasters have not figured out how to fit the View-All page into their user experience, and therefore will not implement it. However, if Google is promoting the View-All option, it has presumably found that searchers prefer it, so webmasters may sometimes need to set aside their own business objectives.
NYTimes and Zocdoc use Option 2 and block all pagination pages from getting crawled and indexed. The other sites all use Option 3, with Vitals setting the robots tag on the pagination pages to “noindex,follow”. Avvo’s strategy is a combination of Option 1, with the canonical tag set to the primary page, and Option 2, with the links to pagination tagged as “nofollow”. It is advisable not to mix or combine the various strategies, or you risk sending the wrong signals to the search engines.
Major Pagination Challenges with All Options:
- Pay close attention to the crawler settings in Webmaster Tools and also to your log files. Make sure Google is properly crawling all intended areas of the site.
- Make sure the parameter handling, robots.txt file, robots tag, anchor tag settings (follow or nofollow) and canonical tags all complement each other and are implemented correctly. This is where most sites misconfigure their pagination.
- Endless pagination is a major concern. If the last page in your series has the URL http://www.example.com/page4, then a request for page5 should return a 404, and page4 should not carry a rel=”next” pointing to page5. This sounds obvious, but it is a common issue that can cause the crawlers to get bogged down and stuck in your pagination.
- Include only crawler-accessible canonical URLs in your XML and HTML sitemaps. URLs that are blocked by robots.txt, carry a “noindex” robots tag, have a non-self canonical or redirect elsewhere should not be included in the sitemaps. Only the first URL in a paginated series using “next” and “prev” should be included in the sitemaps.
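The endless-pagination item in the list above comes down to two checks that a page template or request handler has to make. A minimal sketch, with a hypothetical function name and a four-page series as in the example URL:

```python
from typing import Tuple


def page_response(page: int, last_page: int) -> Tuple[int, bool]:
    """Return (HTTP status, whether to emit rel="next") for a requested page."""
    if page < 1 or page > last_page:
        # Past the end of the series: answer 404 so crawlers stop here.
        return 404, False
    # Only pages before the last one point forward.
    return 200, page < last_page


# A four-page series: page 4 is served without rel="next", page 5 is a 404.
for requested in (1, 4, 5):
    print(requested, page_response(requested, last_page=4))
```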
Pagination is complicated. I hope that this article provides enough insight so that you can plan a proper strategy and provide the search engines with logical paths and quality content. These methods will allow the search engines to crawl efficiently, which helps your site content rank well.