SEO guidelines on pagination, e.g. of online store categories or blog articles, are a recurring subject and there are a few approaches to it. We are going to discuss how to tackle pagination to help web crawlers process content more easily.
What is pagination?
Pagination simply means dividing a specific resource on a website into pages. The simplest example is a category in most online stores: after a set number of products has been listed, the user can go to the next part of the product list. Another way to present products within a category is infinite scroll, where more products are loaded as the user scrolls down. The same applies to blogs. Generally, it does not matter which resource of a website is divided into parts; what matters is how you do it and how you tell Googlebot where the following content is.
Let web crawlers do what they do
One of the basic pagination mistakes is using canonical tags that point to the first pagination page. Although this is justified in one situation, it is usually done incorrectly.
If, for instance, a product category is divided into a few pages, it means that the base page includes a specific number of products, the second page includes another group of products, and so on and so forth, until the last pagination page.
Pagination pages are as follows:
therefore, it does not make much sense to use such canonical links:
<link rel="canonical" href="https://website.com/product-category" />
because pages 2, 3, 4, etc. contain further products from the given category and so are not duplicates of the main page (if they were duplicates, a canonical link would be reasonable).
Naturally, you can provide canonical links, but each page should point to itself. In that case, put the following on page https://website.com/product-category/:
<link rel="canonical" href="https://website.com/product-category/" />
and the following on page https://website.com/product-category/page-2:
<link rel="canonical" href="https://website.com/product-category/page-2" />
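The self-referencing canonicals above can be generated rather than written by hand. What follows is a minimal sketch, assuming the /page-N URL scheme used in this article and a base page with a trailing slash; the function names `page_url` and `canonical_tag` are hypothetical and not part of any CMS.

```python
def page_url(base_url: str, page: int) -> str:
    """Return the URL of a pagination page; page 1 is the base page itself."""
    base = base_url.rstrip("/")
    # Assumption: the base page ends with a slash, later pages end with /page-N.
    return f"{base}/" if page <= 1 else f"{base}/page-{page}"


def canonical_tag(base_url: str, page: int) -> str:
    """Build a canonical link that points at the page it is placed on."""
    return f'<link rel="canonical" href="{page_url(base_url, page)}" />'


if __name__ == "__main__":
    for number in (1, 2, 3):
        print(canonical_tag("https://website.com/product-category/", number))
```

Each page gets its own URL as the canonical, so no paginated page claims to be a duplicate of the base page.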
A page with all the content
There is an exception – a base page with all the content, additionally divided into a few sub-pages with less content. In such a situation, it makes sense to use a canonical link. It is a practice used often in quite long articles with a lot of content.
Thus, the following pages:
combined into a single logical sequence will in fact include all the content on base page https://website.com/product-category/ which you want to use for search engine indexing.
This unfortunate first pagination page…
Yet another mistake is duplicating the category base page https://website.com/product-category/ with the first pagination page https://website.com/product-category/page-1.
If the base page includes the same content as /page-1, then it is pointless to keep a duplicate. Certain content management systems generate (besides the base page) a first pagination page at /page-1 by default; in such a case, that address should be redirected to the base page, e.g. with a 301 redirect, and the links in the code should point not to https://website.com/product-category/page-1 but to https://website.com/product-category/.
Otherwise, you will run into an internal redirection problem and waste the crawl budget Googlebot has for the website. In other words, there is no point in sending an internal link through a 301 redirect when you can link directly to the landing page, which returns HTTP status code 200.
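The rule for the first pagination page can be expressed as a small helper that decides where a /page-1 address should be redirected. This is only a sketch under the assumption that the site uses the /page-N scheme from this article; `redirect_target` is a hypothetical name, and the actual 301 response would be issued by the web server or the CMS.

```python
import re
from typing import Optional

# Matches a trailing "/page-1" segment, with or without a final slash.
# "/page-10" and higher do not match, because "$" must follow "page-1".
FIRST_PAGE = re.compile(r"/page-1/?$")


def redirect_target(url: str) -> Optional[str]:
    """Return the base URL to 301-redirect to, or None if no redirect is needed."""
    if FIRST_PAGE.search(url):
        return FIRST_PAGE.sub("/", url)
    return None


if __name__ == "__main__":
    print(redirect_target("https://website.com/product-category/page-1"))
    print(redirect_target("https://website.com/product-category/page-2"))
```

The same function can be used when rendering internal links, so that templates never emit the /page-1 address in the first place.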
The rel=“next” and rel=“prev” attributes
If you decide to paginate as follows:
then you can indicate relationships between individual pages to a crawler. Providing relevant links with rel attributes in the <head> section highlights individual pages of a sequence.
On the first (base) page, put the following:
<link rel="next" href="https://website.com/product-category/page-2">
as another page containing a part of the same resource.
On the next page, i.e. https://website.com/product-category/page-2, put the following:
<link rel="prev" href="https://website.com/product-category/">
<link rel="next" href="https://website.com/product-category/page-3">
Moving on, on the third page, i.e. https://website.com/product-category/page-3:
<link rel="prev" href="https://website.com/product-category/page-2">
<link rel="next" href="https://website.com/product-category/page-4">
Assuming that /page-4 is the last one in the pagination sequence, put the following there:
<link rel="prev" href="https://website.com/product-category/page-3">
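The pattern across the four pages is regular: every page except the first gets a "prev" link, and every page except the last gets a "next" link. A minimal sketch of that logic, assuming the /page-N scheme from this article (`page_url` and `sequence_links` are hypothetical helper names):

```python
from typing import List


def page_url(base_url: str, page: int) -> str:
    """Return the URL of a pagination page; page 1 is the base page itself."""
    base = base_url.rstrip("/")
    return f"{base}/" if page <= 1 else f"{base}/page-{page}"


def sequence_links(base_url: str, page: int, last_page: int) -> List[str]:
    """Build the rel prev/next link tags for one page of the sequence."""
    links = []
    if page > 1:  # the first page has no predecessor
        links.append(f'<link rel="prev" href="{page_url(base_url, page - 1)}">')
    if page < last_page:  # the last page has no successor
        links.append(f'<link rel="next" href="{page_url(base_url, page + 1)}">')
    return links


if __name__ == "__main__":
    for number in range(1, 5):
        print(number, sequence_links("https://website.com/product-category/", number, 4))
```

Note that the "prev" link on page 2 points to the base page, not to /page-1, in line with the advice on the first pagination page.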
It is the most frequently used combination of links with “rel next/prev” attributes in the <head> section. The attributes are sometimes added to “<a href=” links in the page code (e.g. below the list), though it is better to put them in the <head> section: the sooner a crawler gets a clue where the rest of the resource is, the better.
You need to remember that links with “rel next/prev” attributes are not absolute directives, that is, they can, but do not have to, be taken into consideration while crawling a website. In addition, it is worth using absolute paths to subsequent pages, although Google guidelines say that “values can be absolute paths and relative paths.” Moreover, if a page has parameters in its URL, e.g. sorting parameters, they should also be included in the “rel next/prev” links. For instance, the following page:
will include the following links in the <head> section:
<link rel="prev" href="https://website.com/product-category/page-2?sort=desc">
<link rel="next" href="https://website.com/product-category/page-4?sort=desc">
but a canonical tag pointing only to itself (without the parameter of sorting in descending order):
<link rel="canonical" href="https://website.com/product-category/page-3"/>
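The handling of URL parameters can be sketched as follows: the parameters are carried over into the "prev" and "next" links but left out of the canonical. The code assumes the /page-N scheme from this article and a query string started with "?"; `head_tags` and `page_url` are hypothetical names.

```python
from html import escape
from typing import Dict, List
from urllib.parse import urlencode


def page_url(base_url: str, page: int) -> str:
    """Return the URL of a pagination page; page 1 is the base page itself."""
    base = base_url.rstrip("/")
    return f"{base}/" if page <= 1 else f"{base}/page-{page}"


def head_tags(base_url: str, page: int, last_page: int, params: Dict[str, str]) -> List[str]:
    """Build prev/next links that keep the parameters and a canonical without them."""
    query = f"?{urlencode(params)}" if params else ""
    tags = []
    if page > 1:
        # escape() turns "&" into "&amp;" when several parameters are present.
        href = escape(page_url(base_url, page - 1) + query)
        tags.append(f'<link rel="prev" href="{href}">')
    if page < last_page:
        href = escape(page_url(base_url, page + 1) + query)
        tags.append(f'<link rel="next" href="{href}">')
    # The canonical deliberately omits the sorting parameter.
    tags.append(f'<link rel="canonical" href="{escape(page_url(base_url, page))}" />')
    return tags


if __name__ == "__main__":
    for tag in head_tags("https://website.com/product-category/", 3, 4, {"sort": "desc"}):
        print(tag)
```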
To index or not to index?
Another problem faced while paginating is the misuse of robots meta tags. Google has stated that pages linked in this way are to be treated as a logical sequence, which means that the attributes of their links are taken into account for the sequence as a whole and that, in most cases, the user is shown the first page.
It is very often the case that individual pagination pages, despite being connected to each other with links with “rel next/prev” attributes, are in the search engine index. If you have a problem with that, it will be best to use the following:
<meta name="robots" content="noindex, follow" />
on the subsequent pages, i.e. on /page-2 and higher. A web crawler will crawl the links until the last paginated page, but will not index them.
So what about canonicals?
It clearly follows from the discussion with John Mueller (https://plus.google.com/+JohnELincoln/posts/TCJHwdZHdQc) that you should not combine the noindex meta tag with a canonical pointing at an indexable URL. Therefore, if a canonical link on a pagination page points to the page itself, the noindex meta tag should not be added, and vice versa.
I did not notice any problems with crawling paginated pages that had both canonical links to themselves and the noindex meta tag; however, Google wants to decide on its own what to index and what not to index, so perhaps we should let Google do it.
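The "one or the other" rule can be captured in a small helper that never emits both tags for the same page. This is a sketch of that rule only, not a statement of how any search engine processes the tags; `indexing_tags` and its `noindex_subsequent` switch are hypothetical.

```python
from typing import List

NOINDEX_FOLLOW = '<meta name="robots" content="noindex, follow" />'


def indexing_tags(page_url: str, page: int, noindex_subsequent: bool) -> List[str]:
    """Return either a noindex tag or a self-referencing canonical, never both."""
    if noindex_subsequent and page >= 2:
        # Subsequent pages stay crawlable but are kept out of the index.
        return [NOINDEX_FOLLOW]
    # Otherwise the page stays indexable and names itself as canonical.
    return [f'<link rel="canonical" href="{page_url}" />']


if __name__ == "__main__":
    print(indexing_tags("https://website.com/product-category/", 1, True))
    print(indexing_tags("https://website.com/product-category/page-2", 2, True))
    print(indexing_tags("https://website.com/product-category/page-2", 2, False))
```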
Pagination and a website XML sitemap
Here is an interesting case: pagination links are usually not entered in sitemap.xml, only the base address of, for example, a category or the main article with all the content. In principle, an XML sitemap should list the addresses you want indexed, so if you do not care about indexing pagination pages, do not include the URLs of /page-2 and higher. Naturally, if you block them with a noindex robots meta tag, it is even simpler: leave them out.
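A sitemap generator can apply this filter automatically. The sketch below assumes the /page-N scheme from this article and uses Python's standard `xml.etree.ElementTree` module; `build_sitemap` is a hypothetical name, and a real sitemap would usually be written to a file rather than returned as a string.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable

# Matches URLs that end with a pagination segment such as /page-2 or /page-15/.
PAGINATED = re.compile(r"/page-\d+/?$")


def build_sitemap(urls: Iterable[str]) -> str:
    """Return sitemap XML that lists only the non-paginated URLs."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in urls:
        if PAGINATED.search(url):
            continue  # skip /page-N addresses, keep base pages only
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        "https://website.com/product-category/",
        "https://website.com/product-category/page-2",
        "https://website.com/product-category/page-3",
    ]))
```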
Pagination pages as parameters
If you want the products/articles listed on the last paginated pages to be indexed, give web crawlers unrestricted access to the pages that follow the category base page. It is often the case that individual pages use “?page=” or “?p=” parameters and are mistakenly blocked in robots.txt together with the rest of the parameters.
It is similar for URL parameters in Google Search Console. It is worth showing Googlebot that such parameters are responsible for dividing resources into pages and letting the bot decide.
Pagination and descriptions of categories
A frequently followed practice is extending category content, e.g. in online stores. It often happens that an added description is displayed at all the pagination URLs in an unchanged form. If all the pages are treated as a logical sequence and combined into a whole by “rel next/prev” attributes, then it seems unnecessary to duplicate the same text on /page-2, /page-3, etc.
The same goes for the listed products: every subsequent page displays a new set of products, and the user wants neither to read the same category description on each and every page nor to see the same products again further into the category. There is a question, though: who reads category descriptions? Googlebot probably does, so do not feed it duplicates.
Do you need pagination if you use an infinite scroll?
It turns out that pagination is also a viable option with infinite scroll, where new resources are displayed as the user scrolls down the list of products/articles within a category. John Mueller (http://scrollsample.appspot.com/items) provided a sensible example: while the list is being scrolled, parameters pointing to subsequent pages are added to the URL.
and so on and so forth.
Naturally, links with “rel next/prev” attributes are used here, together with a canonical link pointing to the page itself; for example, at http://scrollsample.appspot.com/items?page=5, they are as follows:
<link rel="canonical" href="/items?page=5" />
<link rel="next" href="/items?page=6"/>
<link rel="prev" href="/items?page=4"/>