An ‘order by’ tweak for better/worse query performance

Imagine this if the query used to retrieve a listing of books for your website.

select id, date_published from books order by date_published desc

If there happen to be multiple books with precisely the same date_published their order is clearly defined. They will tend to come out of the database in the same order but code running multiple queries using offset and limit to page through these rows could potentially see the same book more than once due to rows coming out of the database in a different order between queries.

No problem, just order by id too. That way the rows are guaranteed to come out of the database in the same order. The result set is 99.9% sorted by the first order by clause so the second one shouldn’t affect performance much.

select id, date_published from books order by date_published desc, id

Suddenly the query execution time goes from 3ms to 300ms. What just happened? Why has a second order by clause had such a massive performance effect?

Do both date_published and id have indexes on them? Yes.

If you have a look at the explain output for the two queries one crucial difference is apparent. The query with a single order by clause doesn’t have a sort step. The rows come out of the database in index order which happens to match your requested order so no sorting is necessary. When you then say “sort by this, and now sort by this other thing” relying on index order is no longer enough. Explicit sorting is now required and that takes time.

How To Fix It

1) Add a compound index on your two columns. Having the two columns individually indexed isn’t enough. You need an index that includes the two columns as a pair.

2) Check that your order by is sorting in the same direction as your index ie that you are requesting the rows in index order. You can easily check this by trying both an asc and a desc order by and making a note of query execution time. Flipping between asc and desc will likely causes your query execution time to flip between super fast and disastrously slow. Chances are you want the below.

select id, date_published from books order by date_published desc, id desc

If your query execution time went from 3ms to 300ms you should now see it drop back down to ~3ms but you now have guaranteed row order even with duplicate date_published values. If you don’t get both steps right then the query execution time won’t come down at all.