SQL — Pagination, You Are Probably Doing It Wrong

Oliver de Cramer
Published in The Startup
Aug 31, 2020 · 8 min read


Am I crazy to assume you are doing pagination wrong? I mean, how hard is pagination? It’s so simple, how could you possibly get it wrong?

Well, you might not be doing it wrong, but you are probably using LIMIT with OFFSET to do pagination. That has performance limitations you might not be aware of; you can probably do better, or at the very least you need to secure some of your URLs.

You don’t believe me? Just check the graph below.

The benchmark was done on a small dataset of 100,000 rows with small 10-row pages, using LIMIT & OFFSET for the first data series and, for the second, the alternative method I have labelled “optimized”, which is really called keyset pagination or seek pagination and uses only a LIMIT. This was done on MySQL 5.7.
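
To make the comparison concrete, these are roughly the two query shapes being measured. This is only a sketch, not the exact benchmark code (that is in the gist linked further down), and the table name, column name and literal values are illustrative:

-- Offset pagination: skip the rows of all previous pages, then take 10
SELECT * FROM test ORDER BY id LIMIT 1000, 10;
-- Keyset ("optimized") pagination: start right after the last id returned by the previous page
-- (the value 1000 here is only illustrative)
SELECT * FROM test WHERE id > 1000 ORDER BY id LIMIT 10;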

What you can see is that with the offset the first pages are relatively fast to open. The first 1,200 pages, which add up to 12K rows of data, each took less than 2 ms. So if you are displaying a page to a user it’s still “okay”. But if a user opens the last page of the site, performance drops, and the more pages you have the slower that last page is to load.

So in this article we are going to answer 3 questions:

  • Do we need an ORDER BY when doing pagination? If so, why?
  • Why does using an offset slow down queries?
  • How can you get better performance across all pages without using an offset?

Databases allow us to query data with little knowledge of how they work, but some very simple queries can create performance issues. It’s important to understand the basics of how things work to be able to write better code. I make a lot of simplifications in this article to keep it clear, and I try to point them out as we go.

Before getting into the subject, if you wish to see the difference in performance between both methods for yourself, you can check this gist with the code I used. I didn’t post the code here as I don’t think it’s relevant (and it’s ugly).

Not using an order by with limit & offset

You have probably seen, or even written, queries such as:

SELECT * FROM test LIMIT 100, 10;

At first glance this query doesn’t seem to be wrong; but it is.

First of all let’s see what might go wrong, then why it can go wrong.

Let’s use an example dataset of 10 rows containing only an ID.

╔════╦════╦════╦════╦════╦════╦════╦════╦════╦════╗
║ 10 ║ 13 ║ 15 ║ 17 ║ 21 ║ 22 ║ 24 ║ 25 ║ 27 ║ 28 ║
╚════╩════╩════╩════╩════╩════╩════╩════╩════╩════╝

Our first query SELECT * FROM test LIMIT 0, 5; should return the following data:

╔════╦════╦════╦════╦════╗
║ 10 ║ 13 ║ 15 ║ 17 ║ 21 ║
╚════╩════╩════╩════╩════╝

And our second query SELECT * FROM test LIMIT 5, 5; should return the following:

╔════╦════╦════╦════╦════╗
║ 22 ║ 24 ║ 25 ║ 27 ║ 28 ║
╚════╩════╩════╩════╩════╝

The order here is not really important; what we really want is to get all 10 elements across the 2 pages so that our user can see all the numbers. And we think they can. Or can they?

In reality nothing prevents the second page from containing a row that was already loaded on the first. So we can end up with the following 2 pages:

╔════╦════╦════╦════╦════╗
║ 10 ║ 13 ║ 15 ║ 17 ║ 21 ║
╚════╩════╩════╩════╩════╝
╔════╦════╦════╦════╦════╗
║ 22 ║ 24 ║ 10 ║ 27 ║ 28 ║
╚════╩════╩════╩════╩════╝

Now “10” is displayed on both the first page and the second page, and “25” is never displayed.

Why can this happen? Well, we have not told the SQL server to order the data. This means the execution plan may differ between 2 executions of the query, in which case the way the data is loaded changes and therefore the order of the rows can change. Adding the ORDER BY ensures that your data is reliable.

This is a documented behavior and not a bug!

Do you lose something by adding the ORDER BY and securing the behavior? Well, no.
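
For the query from the start of this section, the fix is simply to make the desired order explicit. A minimal sketch, assuming we want to paginate on the id column:

SELECT * FROM test ORDER BY id LIMIT 100, 10;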

Pagination without an ORDER BY is not faster. Let’s now see why such queries are slow.

Why does using an offset slow down queries?

To understand this we first need to understand how the RDBMS orders the data. If our database schema is well designed, we probably have an index on the field we are ordering by. We will assume in this article that our RDBMS uses B-trees for those indexes. This is a simplification, as the structure used depends on the data type, and even then databases tend to use B-tree variants. But for simplicity’s sake let’s assume a plain B-tree is used.
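
As a concrete illustration (the table definition below is an assumption, mirroring the queries used later), such an indexed column could come from a table like this:

-- A hypothetical table like the ones used in this article; in InnoDB the
-- primary key is stored as a B-tree (technically a B+tree variant) kept sorted by id.
CREATE TABLE my_table (
  id INT NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;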

For our previous 10 rows the index would look as follows.

So when we execute our query to get the first 5 results, the tree allows those 5 elements to be found immediately. The following query will go through the tree as follows:

SELECT * FROM my_table ORDER BY id LIMIT 5

In this example, just by going through 5 leaves it was able to find the 5 smallest numbers. Let’s now load the second page and see what happens:

SELECT * FROM my_table ORDER BY id LIMIT 5, 5

The index didn’t help; we still had to go through all the leaves in order to find the last 5. This is because there is no way for the RDBMS to know where it needs to start. It has to find the first 5 elements (in red) again in order to load the next 5.
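
As a side note, MySQL accepts two equivalent spellings of the same offset query, and both behave exactly as described above:

SELECT * FROM my_table ORDER BY id LIMIT 5, 5;        -- LIMIT offset, row_count
SELECT * FROM my_table ORDER BY id LIMIT 5 OFFSET 5;  -- LIMIT row_count OFFSET offset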

Optimizing the pagination

How can we make our pagination both correct and fast in that case?

Simple: by taking advantage of the indexes.

The hypothesis we need to make here is that you are never going to open page 2 without opening page 1 first.

With this hypothesis we know that the last id on page 1 was 21, so we can ask for the first 5 elements whose id is greater than 21.

SELECT * FROM my_table WHERE id > 21 ORDER BY id LIMIT 5

This way the index is used and we check very few unnecessary rows. This is called the seek method, or keyset pagination.

We can also see this by using the EXPLAIN command on both queries.

EXPLAIN EXTENDED SELECT * FROM test LIMIT 5, 5\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: test
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 100104
filtered: 100.00
Extra: NULL

In this example no keys are being used, while with keyset pagination the primary key will be used.

EXPLAIN EXTENDED SELECT * FROM test WHERE col1 > 21 LIMIT 5\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: test
partitions: NULL
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 50052
filtered: 100.00
Extra: Using where
1 row in set, 2 warnings (0.00 sec)

Let’s also note that some storage solutions such as Elasticsearch will even prevent you from querying with a large offset, for this same reason.

A few additional examples

Sorting by date

If you are sorting by date there is always a risk that multiple rows have the same date. If those rows fall right at the boundary between 2 pages you may end up not loading some of them on either page. To fix this you need to order on a second, unique field such as the primary key.

SELECT *
FROM my_table
WHERE (update_date = '2017-12-21' AND id > 21)
OR update_date > '2017-12-21'
ORDER BY update_date,id LIMIT 5

Basically we are fetching all the rows that have the same update date as the last row on the previous page but a greater id, and also all the rows that were updated later. This of course makes the query more complicated.
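
Two side notes on this query (the names below are illustrative, not taken from the original gist): MySQL also understands an equivalent row-value comparison, though depending on the version it may not use the index as efficiently as the expanded OR form, and in either case you will want a composite index covering both sort columns.

-- Equivalent row-value ("tuple") comparison; the literals are the values of the last row on the previous page
SELECT *
FROM my_table
WHERE (update_date, id) > ('2017-12-21', 21)
ORDER BY update_date, id LIMIT 5;

-- A composite index on both sort columns keeps either form of the query fast
CREATE INDEX idx_my_table_update_date_id ON my_table (update_date, id);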

Conclusion

The main thing to remember: you must use an ORDER BY when doing pagination.

There is an alternative to offset pagination, but it has limitations:

  • It won’t allow you to open a page without opening the previous page.
  • If you order by multiple columns the query becomes very complicated.

Therefore, if you are displaying a pager with page numbers to your users, you are probably using the right method, but you must limit the maximum page number.

Limiting the number of usable pages is important. If you have, let’s say, 100 million entries, a few users trying to open the last page can create more load on your servers than hundreds of users browsing normally. So you should limit the number of usable pages and force the user to use filters & additional sorting to find what they need.

Or just have next & previous buttons using keyset pagination.

If you are using pagination for batch processes and you are using an offset, you are using the wrong method.

Using the seek method/keyset pagination to load pages is very efficient; it will make your processes faster at no cost if you do it that way from the beginning. It also future-proofs the development: if it works today with 10,000 rows it will still work in 10 years when you have 10 million rows.
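
For a batch process this typically means walking the table in chunks and always remembering the last id of the previous chunk. A minimal sketch, with an assumed chunk size of 1,000:

-- First chunk
SELECT * FROM my_table ORDER BY id LIMIT 1000;
-- Every following chunk restarts from the last id the previous chunk returned
-- (1000 below is only an illustrative value); stop when a chunk comes back empty.
SELECT * FROM my_table WHERE id > 1000 ORDER BY id LIMIT 1000;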

If you have an infinite scroll or a simpler page with just next & previous buttons, you should also use the seek method/keyset pagination.

Keyset pagination’s only limitation is jumping to a specific page. But that is, in my opinion, rarely needed, and with proper UX/design it can be avoided.

Keyset pagination has no size limit: you can query a 100-million-row table and, if you have the right indexes, all the pages will be fast. So in my opinion there is no reason not to use it instead of what we are more used to doing.

Sadly we are used to the classic offset pagination, and most frameworks allow us to do such pagination, and actually even make it easier to use.

We should avoid using an offset as much as possible, only use it when absolutely necessary, and put proper protections in place when we do.

Additional content to read

There is also a very interesting website that makes a similar point; I would recommend reading it: https://use-the-index-luke.com/

Thank you for reading.
