Making sense of ActiveRecord joins, includes, preload, and eager_load

January 24 Bullet_white By Derek Bullet_white Posted in HowTo Bullet_white Comments Comments

Like a pair of jumper cables, ActiveRecord's joins, includes, preload, and eager_load methods are incredibly useful, but also very dangerous when used incorrectly. Knowing when and where to use each approach - and even when to combine them - can save you considerable trouble as your app grows.

I'll explore the when and where of each method below.

joins

What's the ideal use case for joins?

If you are just filtering results - not accessing records from a relationship - joins is your goto. The example below fetches all blog posts with a comment authored by Derek. I'm not accessing any of the associated comments, so joins is a great fit:

Post.joins(:comments).where(:comments => {author: 'Derek'}).map { |post| post.title }
  Post Load (1.2ms)  SELECT  "posts".* FROM "posts" INNER JOIN "comments" ON "comments"."post_id" = "posts"."id" WHERE "comments"."author" = $1
=> ["One weird trick to better Rails apps",
 "1,234 weird tricks to faster Rails apps",
 "You wouldn't believe what happened to this Rails developer after 14 days"]

Do joins prevent N+1 queries?

By itself, no. joins does not load data from the relationship into memory: accessing columns from the relationship will trigger N+1 queries.

For example, notice all of the additional queries when accessing the Comment relationship:

Post.joins(:comments).where(:comments => {author: 'Derek'}).map { |post| post.comments.size }
  Post Load (1.2ms)  SELECT  "posts".* FROM "posts" INNER JOIN "comments" ON "comments"."post_id" = "posts"."id" WHERE "comments"."author" = $1
   (1.0ms)  SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1
   (3.0ms)  SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1
   (0.3ms)  SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1
   (1.0ms)  SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1
   (2.1ms)  SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1
   (1.4ms)  SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1
=> [3,5,2,4,2,1]

Can joins be combined with includes, preload, and eager_load?

Yes. The join type specified by joins (default is INNER JOIN) will override any join applied by includes or eager_load. Note that preload doesn't apply a join.

includes

Can includes prevent N+1 queries?

Yes. includes will load (1) all records of the parent and (2) all of the records referenced as arguments in the includes method.

Notice how using includes in the example below only triggers 1 additional query. Without includes, there would be an additional query to count the number of comments for every post:

Post.includes(:comments).map { |post| post.comments.size }
  Post Load (1.2ms)  SELECT  "posts".* FROM "posts"
  Comment Load (2.0ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN (1, 3, 4, 5, 6)
=> [3,5,2,4,2,1]

Does includes always generate a separate query to fetch the records in the relationship?

No. includes will either use a separate query (like above) or a LEFT OUTER JOIN. If you have a where or order clause that references a relationship, a LEFT OUTER JOIN is used versus a separate query.

Is a single query or two queries faster?

Digging through the ActiveRecord source, I don't believe ActiveRecord makes the decision to use two queries or a single join query based on performance. If you are seeing slow performance with an includes query, I'd suggest using a tool like Scout DevTrace locally and examining which approach ActiveRecord is using when running includes.

If two queries are being used, you can experiment with a single LEFT OUTER JOIN query by adding references to the ActiveRecord relation:

Post.includes(:comments).references(:comments).map { |post| post.comments.size }

What happens when I apply conditions to a relationship referenced via includes?

ActiveRecord will return all of the parent records and just the relationship records that match the condition.

For example, the following will return all Post records with a Comment by Derek, and just those comments authored by Derek:

Post.includes(:comments).references(:comments).where(comments => {author: 'Derek'}).map { |post| post.comments.size }

Does includes prevent all N+1 queries?

No. If you are accessing data in a nested relationship, that data isn't preloaded. For example, an additional query would be required to load the Comment#likes association for each comment:

<% post.comments.each do |comment| %>
  <%= comment.likes.map { |like| like.user_avatar_url }
<% end %>

Can I prevent N+1s in nested relationships?

Yes. You can load nested relationships via includes:

Post.includes(comments => :likes).references(:comments).map { |post| post.comments.size }

Should I always always load data from nested relationships?

No. It's very easy to end up initializing a significant number of records. For example, a popular Comment may have thousands of Like records, which would result in a slow query and significant memory allocations.

A tool like Scout DevTrace running locally against production-like data can help you determine the faster approach.

preload

Should I ever use preload by itself?

Sometimes, but not by default. I use preload versus includes if I know using a LEFT OUTER JOIN to load a relationship is significantly slower. Otherwise, if I add where or order clauses at a later date, those clauses would trigger eager_load, which would trigger a join.

Is it common to combine joins with preload?

If I need all relationship records - not just just those that match a relationship condition - I'll combine preload and joins. For example:

  1. Find all Post records with a Comment authored by Derek
  2. Render those Post records and the total count of comments for each post

includes will only fetch Comment records authored by Derek, not all comments associated with each post.

Post.joins("LEFT OUTER JOIN comments ON comments.post_id = posts.id").where(:comments => {author: 'Derek'}).preload(:comments).map { |post| post.comments.size }

eager_load

includes delegates to eager_load when a where or order clause references a relationship.

Should I ever use eager_load by itself?

Yes. If I've found includes to be slow using two queries, using eager_load will force a single query via a LEFT OUTER JOIN. Its presence in my code indicates I'm future-proofing against using two queries to fetch records.

Can I combine eager_load with joins?

Yes. In the following example:

Post.joins(:comments).eager_load(:comments).map { |post| post.comments.size }

ActiveRecord will do the following:

  1. Return an Array of Post records with comments.
  2. Load the comments associated with each Post.

It's includes with an INNER JOIN vs. a LEFT OUTER JOIN.

TL;DR

I'd roughly summarize my approach to these methods like this:

  1. If I'm just filtering, use joins.
  2. If I'm accessing relationships, start with includes.
  3. If includes is slow using two separate queries, I'll use eager_load to force a single query and compare performance.

There are many edge cases when accessing relationships via ActiveRecord. Hopefully this is enough to prevent some of the more basic performance deadends when using joins, includes, preload, and eager_load.

Also See

Subscribe for more

Want more Rails insights like this delivered monthly to your inbox? Just put your email into the sidebar form.

Comments

comments powered by Disqus