Pagination – a Crucial Element of Search Engine Indexing

Dawid Medwediuk

Published: 11.08.2017

10-minute read

SEO guidelines on pagination, e.g. of online store categories or blog articles, are a recurring subject and there are a few approaches to it. We are going to discuss how to tackle pagination to help web crawlers process content more easily.

What is pagination?

Pagination is simply dividing a specific resource on a website into pages. The simplest example is a category in most online stores: after a set number of products has been listed, the user can go to the next part of the list. Another way to present products within a category is infinite scroll, where more products are loaded as the user scrolls down. The same goes for blogs. Generally, it does not matter which resource of a website is divided into parts; what matters is how you do it and how you tell Googlebot about the content that follows.

Let web crawlers do what they do

One of the basic mistakes made while paginating is pointing the canonical tags of all pagination pages at the first page. Even though this is justified in one situation, it is usually done incorrectly.

If, for instance, a product category is divided into a few pages, it means that the base page includes a specific number of products, the second page includes another group of products, and so on and so forth, until the last pagination page.

Pagination pages are as follows:

https://website.com/product-category/page-2
https://website.com/product-category/page-3
https://website.com/product-category/page-4

etc.

Therefore, it does not make much sense to put a canonical link to the base page on each of them:

<link rel="canonical" href="https://website.com/product-category/" />

Pages 2, 3, 4, etc. continue the list of products from a given category, so they are not duplicates of the main page (if they are, using a canonical link is reasonable).

Naturally, you can provide canonical links, but each page should point to itself. In such a case, enter the following on page https://website.com/product-category/:

<link rel="canonical" href="https://website.com/product-category/" />

and the following on page https://website.com/product-category/page-2:

<link rel="canonical" href="https://website.com/product-category/page-2" />

A page with all the content

There is an exception – a base page with all the content, additionally divided into a few sub-pages with less content. In such a situation, it makes sense to use a canonical link. It is a practice used often in quite long articles with a lot of content.

Thus, the following pages:

https://website.com/product-category/page-1
https://website.com/product-category/page-2
https://website.com/product-category/page-3
https://website.com/product-category/page-4

combined into a single logical sequence, will in fact include all the content of the base page https://website.com/product-category/, which is the one you want indexed by the search engine, so each of them can carry a canonical link to that base page.

This unfortunate first pagination page…

Yet another mistake is duplicating the category base page https://website.com/product-category/ with the first pagination page https://website.com/product-category/page-1

If the base page includes the same content as /page-1, then it is pointless to keep a duplicate. Certain content management systems generate, besides the base page, a first pagination page ending in /page-1. In such a case, that address should be redirected to the base page, e.g. with a 301 redirect, and internal links in the code should point not to https://website.com/product-category/page-1 but directly to https://website.com/product-category/

Otherwise, internal links will pass through a redirect, which wastes the crawl budget Googlebot has for a given website. In other words, there is no use in linking internally to an address that answers with a 301 redirect when you can link directly to the landing page, which answers with HTTP status code 200.
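The rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the /page-N URL scheme comes from the examples in this article, while the function names redirect_target and page_href are hypothetical and do not belong to any particular CMS.

```python
import re
from typing import Optional

# Assumed URL scheme from the article's examples: /<category>/page-N
FIRST_PAGE = re.compile(r"/page-1/?$")


def redirect_target(path: str) -> Optional[str]:
    """Return the base path that /page-1 should 301 to, or None if no redirect is needed."""
    if FIRST_PAGE.search(path):
        return FIRST_PAGE.sub("/", path)
    return None


def page_href(base_path: str, page: int) -> str:
    """Build an internal link; page 1 links straight to the base path, never to /page-1."""
    return base_path if page == 1 else f"{base_path}page-{page}"
```

With this in place, templates call page_href for every pagination link, so the redirect only ever catches old or external links to /page-1.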

The rel=“next” and rel=“prev” attributes

If you decide to paginate as follows:

https://website.com/product-category/
https://website.com/product-category/page-2
https://website.com/product-category/page-3
https://website.com/product-category/page-4

then you can indicate relationships between individual pages to a crawler. Providing relevant links with rel attributes in the <head> section highlights individual pages of a sequence.

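On the first (base) page, put the following:

<link rel="next" href="https://website.com/product-category/page-2" />

which points to /page-2 as the next page containing a part of the same resource.

On the next page, i.e. https://website.com/product-category/page-2, put the following:

<link rel="prev" href="https://website.com/product-category/" />
<link rel="next" href="https://website.com/product-category/page-3" />

Moving on, on the third page, i.e. https://website.com/product-category/page-3:

<link rel="prev" href="https://website.com/product-category/page-2" />
<link rel="next" href="https://website.com/product-category/page-4" />

Assuming that /page-4 is the last one in the pagination sequence, put the following there:

<link rel="prev" href="https://website.com/product-category/page-3" />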

This is the most frequently used combination of links with “rel next/prev” attributes in the <head> section. Sometimes the attributes are added to “<a href=” links in the body of the page (e.g. below the list), though it is better to put them in the <head> section: the sooner a crawler gets a clue where the rest of the resource is, the better.
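The pattern for the whole sequence can be expressed as a small helper. The sketch below assumes the URL scheme used in this article (base page without a page number, then /page-2, /page-3, ...); the names page_url and pagination_links are made up for illustration.

```python
from typing import List


def page_url(base: str, page: int) -> str:
    """Page 1 is the base URL itself; later pages get a /page-N suffix (assumed scheme)."""
    return base if page == 1 else f"{base}page-{page}"


def pagination_links(base: str, page: int, last: int) -> List[str]:
    """Return the rel="prev"/rel="next" tags for page `page` out of `last` pages."""
    if not 1 <= page <= last:
        raise ValueError("page must be between 1 and last")
    tags = []
    if page > 1:  # every page except the first has a predecessor
        tags.append(f'<link rel="prev" href="{page_url(base, page - 1)}" />')
    if page < last:  # every page except the last has a successor
        tags.append(f'<link rel="next" href="{page_url(base, page + 1)}" />')
    return tags
```

For example, pagination_links("https://website.com/product-category/", 2, 4) returns a prev link to the base page and a next link to /page-3.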

You need to remember that links with “rel next/prev” attributes are not absolute directives, that is they can, but do not have to, be taken into consideration while crawling a website. In addition, it is worth using absolute paths to subsequent pages, although Google guidelines say that “values can be absolute paths and relative paths.” If a page has parameters in its URL, e.g. sorting parameters, they should also be included in the “rel next/prev” links. For instance, the following page:

https://website.com/product-category/page-3?sort=desc

will include the following links in the <head> section:

<link rel="prev" href="https://website.com/product-category/page-2?sort=desc" />
<link rel="next" href="https://website.com/product-category/page-4?sort=desc" />

but a canonical tag pointing only to itself (without the parameter of sorting in descending order):

<link rel="canonical" href="https://website.com/product-category/page-3" />
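A possible way to generate these tags is sketched below. It assumes the /page-N scheme from this article and a plain dictionary of query parameters; head_tags and page_url are hypothetical names, and urlencode comes from the Python standard library.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode


def page_url(base: str, page: int, params: Optional[Dict[str, str]] = None) -> str:
    """Build a page URL, optionally with query parameters such as sort=desc."""
    url = base if page == 1 else f"{base}page-{page}"
    return f"{url}?{urlencode(params)}" if params else url


def head_tags(base: str, page: int, last: int, params: Dict[str, str]) -> List[str]:
    """prev/next keep the parameters; the canonical link drops them."""
    tags = []
    if page > 1:
        tags.append(f'<link rel="prev" href="{page_url(base, page - 1, params)}" />')
    if page < last:
        tags.append(f'<link rel="next" href="{page_url(base, page + 1, params)}" />')
    tags.append(f'<link rel="canonical" href="{page_url(base, page)}" />')
    return tags
```

Keeping the parameters in prev/next lets a crawler stay within the sorted sequence, while the parameter-free canonical tells it which version of the page to index.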

To index or not to index?

Another problem faced while paginating is misusing robots meta tags. Google stated that pages connected this way are treated as a logical sequence: the attributes of their links are taken into account, and in most cases the user is sent to the first page.

It is very often the case that individual pagination pages, despite being connected to each other by links with “rel next/prev” attributes, are in the search engine index. If you have a problem with that, it will be best to use the following:

<meta name="robots" content="noindex, follow" />

on the subsequent pages, i.e. on /page-2 and higher. A web crawler will follow the links until the last paginated page, but will not index the pages themselves.
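In a template this usually comes down to a single condition on the page number. A minimal sketch, with robots_meta as a hypothetical helper name:

```python
from typing import Optional


def robots_meta(page: int) -> Optional[str]:
    """Return a noindex, follow tag for /page-2 and higher; nothing for the base page."""
    if page >= 2:
        return '<meta name="robots" content="noindex, follow" />'
    return None  # the base page stays indexable
```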

So what about canonicals?

It follows from the discussion with John Mueller (https://plus.google.com/+JohnELincoln/posts/TCJHwdZHdQc) that you should not combine ‘noindex’ meta tags with canonicals pointing at an indexable URL. Therefore, if a canonical link on a pagination page points to the page itself, a ‘noindex’ meta tag should not be added, and the other way round.

I have not noticed any problems with crawling paginated pages that carry both a canonical link to themselves and the ‘noindex’ meta tag; however, Google wants to decide on its own what to index and what not to index, so perhaps we should let Google do it.

Pagination and a website XML sitemap

Here is an interesting case: as a rule, you do not enter pagination links in sitemap.xml, only the base address of, for example, a category or the main article with all the content. In principle, an XML sitemap should list the addresses you want indexed, so if you do not care about pagination being indexed, do not include URLs to /page-2 and higher. Naturally, if you block them with a ‘noindex’ robots meta tag, it is even simpler: leave them out.
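As an illustration, a sitemap generator can simply skip paginated addresses. The sketch below uses Python's standard xml.etree.ElementTree; the regular expression is an assumption covering the /page-N, ?page= and ?p= patterns mentioned in this article, and build_sitemap is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumed pagination patterns: /page-N at the end of the path, or page=/p= parameters
PAGINATED = re.compile(r"/page-\d+/?$|[?&](?:page|p)=\d+")


def build_sitemap(urls: Iterable[str]) -> str:
    """Return sitemap XML that lists only non-paginated URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        if PAGINATED.search(url):
            continue  # leave out /page-2 and higher
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")
```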

Pagination pages as parameters

If you want to support the indexing of products/articles listed as far as the last paginated page, give web crawlers unrestricted access to the pages of a category that follow its base page. It is often the case that individual pages have “?page=” or “?p=” parameters and can be mistakenly blocked in robots.txt together with the rest of the parameters.

The same goes for the URL parameter settings in Google Search Console. It is worth showing Googlebot that such parameters are responsible for dividing resources into pages and letting the bot decide.

Pagination and descriptions of categories

A frequently followed practice is extending category content, e.g. in online stores. It often happens that an added description is displayed at all the pagination URLs in an unchanged form. If all the pages are treated as a logical sequence and combined into a whole by “rel next/prev” attributes, then it seems unnecessary to duplicate the same text on /page-2, /page-3, etc.

It is similar as far as listed products are concerned: every subsequent page displays a new set of products, and the user wants neither to read the same category description on every page nor to see the same products again further into the category. There is a question, though: who reads category descriptions? Googlebot probably does, so do not feed it with duplicates.

Do you need pagination if you use an infinite scroll?

It turns out that pagination is also a viable option with infinite scroll, where new products/articles are displayed as the user scrolls down the list within a category. John Mueller provided a sensible example (http://scrollsample.appspot.com/items): while the list is being scrolled, a parameter pointing to the subsequent page is added to the URL:

http://scrollsample.appspot.com/items?page=2
http://scrollsample.appspot.com/items?page=3

and so on and so forth.

Naturally, links with “rel next/prev” attributes are used here and each page carries a canonical link to itself, for example the page at http://scrollsample.appspot.com/items?page=5.

Pagination as an element of an indexing strategy

The decision to apply a particular pagination solution is naturally yours. The foregoing examples are not perfect options in every situation, because you may encounter technical difficulties or CMS issues, or a given solution may simply be contrary to the indexing strategy you have adopted. However, it is worth optimizing pagination at least to the extent that it does not stop Googlebot from crawling the website and that it fosters the indexing of buried product/article pages which are not easily accessible during a single visit.