3 pleasantly surprising PostgreSQL Indexing tricks

May 31 Bullet_white By Greg Bullet_white Comments Comments

Most Rails engineers know the basics of database performance. They know that if a query is slow, an index may be the solution. Some know the trade-offs between having and not having an index. Or why an index on a low-cardinality column might not help. But everyone is surprised when I show them a few more advanced indexing techniques. The only response I get is Wow, I didn't know that's possible! In this article, I'll show you 3 techniques that render this kind of response.

Subindexes

Almost every larger app I see has tables that are over-indexed. Often, the cause is indexing columns that were already covered by a larger, multi-column index. The smaller indexes (I call them subindexes) could be safely dropped. Let's illustrate this with an example.

Imagine a users table with columns first_name and last_name. The app allows to search users by matching both columns:

User.where(first_name: params[:first_name], last_name: params[:last_name])

This query will benefit from an index on last_name, first_name (note the order, it'll be important later). This is one index on two columns, not two single-column indexes. The app also allows searches by last name only:

User.where(last_name: params[:last_name])

This query will benefit from an index on last_name. But you don't have to create one! The two-column index on last_name, first is also an index on last_name. Just take a look at the example index below. The first column is already sorted (but the second one is not; this why the order was important)!

last_name first_name
Bastiat Frédéric
Menger Carl
Ricardo David
Say Jean-Baptiste
Smith Adam

The general rule is that a multi-column index on column_1, column_2, ..., column_n can serve as an index on:

  • column_1
  • column_1, column_2
  • ...
  • column_1, column_2, ..., column_(n - 1)

In other words, it's an index on all prefixes of column_1, ..., column_n.

You can create a multi-column index in vanilla Rails with:

add_index :users, [:last_name, :first_name]

It might be difficult to spot an index that is a prefix of a larger index. I recommend that you use my tool called active_record_doctor. Among other things, it identifies extraneous indexes with a single command.

Expression Indexes

One of my clients used the following code:

User.where("LOWER(email) = LOWER(?)", params[:email])

If the query didn't contain the call to LOWER(email) then it could use an index on users.email. However, the index on users.email contains unaltered emails, not their lower-case variants, so it's useless for this query. In general, an index on a column won't be used if the query matches rows based on a function of data stored in them. But PostgreSQL can help here too!

PostgreSQL supports expression indexes. They index values obtained by applying functions to the original row. To speed up the query above, we could create an index with:

CREATE INDEX users_on_lower_email ON users(LOWER(email));

Unfortunately, Rails doesn't support these indexes. There are two ways around it.

You can use the schema_plus gem that makes schema.rb more powerful. Add the gem to your Gemfile:

gem "schema_plus"

and run bundle install. An expression index can be created by calling:

add_index :users, index: { expression: "LOWER(email)" }

Alternatively, you can resort to raw SQL in your migrations. First, switch the schema format to SQL. Go to config/application.rb and set config.active_record.schema_format to :sql. Second, create a migration like this:

class IndexUsersOnLowerEmail < ActiveRecord::Migration
  def up
    execute(<<SQL)
CREATE INDEX users_on_lower_email ON users(LOWER(email));
SQL
  end

  def down
    execute(<<SQL)
DROP INDEX users_on_lower_email;
SQL
  end
end

I avoid dependencies that aren't essential, so I prefer the raw SQL. Your mileage may vary.

Partial Indexes

On one of the projects I worked on had the following code:

Message.where(sent: false).find_each(&:process)

It runs every hour or so to send enqueued messages. The Message#process method set sent to true after sending the message. Only new messages had sent set to false. The code above doesn't care about sent messages at all.

The problem is that there was an index on messages.sent. It encompassed all rows. 99% of messages were marked as sent so 99% of entries in the index were unused. The app cared only about those 1% with sent == false.

PostgreSQL to the rescue! We can create an index on a subset of rows that satisfy a WHERE condition. In this case:

CREATE INDEX messages_sent_true ON messages(sent) WHERE NOT sent;

This command will index only those rows that aren't marked as sent.

You can create partial indexes in vanilla Rails. Just run:

add_index :messages, :sent, where: "NOT sent"

When used right, partial indexes are smaller and faster than full-table indexes.

3 New Tools in Your Arsenal

That's it! Let's take one last look at the 3 tricks we discussed here:

  1. Subindexes that can save you space and make updates faster by reusing an existing multi-column index.
  2. Expression indexes that speed up queries with WHERE clauses matching a function of the columns instead of raw values.
  3. Partial indexes that are much smaller than a complete index because they don't contain data that the app doesn't care about.

These 3 tricks can save you gigabytes of storage and hundreds of milliseconds of latency.


Greg Navis provides PostgreSQL scaling consulting services tailored for apps running on Heroku. If you want to learn more about PostgreSQL performance, subscribe to Greg's newsletter.

Comments

comments powered by Disqus