Dom Lizarraga

dominiclizarraga@hotmail.com

N + 1 queries and how to fix them.

I recently had to deal with a n + 1 query….

Firstly let’s understand why does this happen?

The N+1 query problem is like inviting friends to a party and then calling each one separately to ask if they’re coming. If you have 10 friends, you make 1 call to decide to invite them and then 10 more calls to each friend. So, for 10 friends, you make 11 calls in total.

In databases, this is like fetching a list of items (like blog posts), and then for each item, making another query to fetch related data (like comments on each post). If you have 10 posts and you fetch comments for each one by one, you end up making 1 query to get all posts plus 10 more queries for comments, leading to 11 queries.

But why in db happens this??

The First Query (N): When you ask for the list of posts, the ORM makes one query to fetch all posts. This is your “N” part of the problem, where “N” is the number of posts.

The Plus One (+1) Part: For each post, when you try to access its comments, the ORM realizes it hasn’t fetched those yet. So, it makes a new query for each post to fetch its comments. If you have 10 posts, this approach results in 10 additional queries - one for each post to get its comments.

The ORM’s default behavior is to load data on demand (lazy loading). It avoids fetching related data until you explicitly access it, which can be efficient in scenarios where the related data is not needed. However, when you do need related data for each item in a list, it leads to multiple queries, creating the N+1 problem.

Example:

# rails console
@events = Event.all
  Event Load (0.5ms)  SELECT "events".* 
  FROM "events" /* loading for pp */ LIMIT ?  [["LIMIT", 11]]

# Each call to `event.attendees` for an `event` triggers a separate query to fetch the `attendees` for just that `event`.

@events.each do |event|
  puts "Event: #{event.name}"
  👉event.attendees.each do |attendee|
    puts "  Attendee: #{attendee.name}"
  end
end
# Logs:
👉Event Load (0.2ms)  SELECT "events".* FROM "events"
  Event: Tech Conference
  👉Attendee Load (0.1ms)  SELECT "attendees".* FROM "attendees" INNER JOIN "registrations" ON "attendees"."id" = "registrations"."attendee_id" WHERE "registrations"."event_id" = ?  [["event_id", 1]]
    Attendee: Alice
    Attendee: Bob
    Attendee: Charlie
  Event: Music Festival
  👉Attendee Load (0.0ms)  SELECT "attendees".* FROM "attendees" INNER JOIN "registrations" ON "attendees"."id" = "registrations"."attendee_id" WHERE "registrations"."event_id" = ?  [["event_id", 2]]
    Attendee: Bob
    Attendee: Dana

Data to recreate a small excercise:

Let’s create a small app so we can see in detail what is being produced by each active record method.

# Create app with name `preload_demo_app`
rails new preload_demo_app
# Go to `preload_demo_app` dir
cd preload_demo_app

Then let’s create models and add the associations we need needed.

# Create `Event` model
rails generate model Event name:string location:string start_time:datetime
# Create `Attendee` model
rails generate model Attendee name:string
# Create `Registration` model
rails generate model Registration event:references attendee:references

Active Record Associations 👇

# app/models/event.rb
class Event < ApplicationRecord
  has_many :registrations
  has_many :attendees, through: :registrations
end
# app/models/attendee.rb
class Attendee < ApplicationRecord
  has_many :registrations
  has_many :events, through: :registrations
end
# app/models/registration.rb
class Registration < ApplicationRecord
  belongs_to :event
  belongs_to :attendee
end

One last step, we need some data to play with:

# events
event1 = Event.create(name: "Tech Conference", location: "Conference Center", start_time: DateTime.new(2024, 5, 20, 10, 0, 0))
event2 = Event.create(name: "Music Festival", location: "Outdoor Park", start_time: DateTime.new(2024, 6, 15, 12, 0, 0))

# attendees
attendee1 = Attendee.create(name: "Alice")
attendee2 = Attendee.create(name: "Bob")
attendee3 = Attendee.create(name: "Charlie")
attendee4 = Attendee.create(name: "Dana")

# registrations 
Registration.create(event: event1, attendee: attendee1)
Registration.create(event: event1, attendee: attendee2)
Registration.create(event: event1, attendee: attendee3)
Registration.create(event: event2, attendee: attendee2)
Registration.create(event: event2, attendee: attendee4)

With this in place we can start playing with the model by opening ‘rails c

🔺 Let’s see includes behavior:

# `.includes` is designed to minimize the number of queries and the overall load on the database by preloading associated data
Event.includes(:attendees)
  # See the `IN` SQL keyword in the query for both `registrations` and `attendees`
  # A single SQL query that retrieves both in a single database roundtrip.
  Event Load (0.3ms)  SELECT "events".* FROM "events" /* loading for pp */ LIMIT ?  [["LIMIT", 11]]
    Registration Load (0.4ms)  SELECT "registrations".* FROM "registrations" 
      WHERE "registrations"."event_id" IN (?, ?)  [["event_id", 1], ["event_id", 2]]
    Attendee Load (0.1ms)  SELECT "attendees".* FROM "attendees" 
      WHERE "attendees"."id" IN (?, ?, ?, ?)  [["id", 1], ["id", 2], ["id", 3], ["id", 4]]

🔺 Now preload:

Event.preload(:attendees)
  # See the `IN` SQL keyword as well, here the key difference is how you apply the `.where`
  Event Load (0.1ms)  SELECT "events".* FROM "events" /* loading for pp */ LIMIT ?  [["LIMIT", 11]]
  Registration Load (0.3ms)  SELECT "registrations".* FROM "registrations" 
  WHERE "registrations"."event_id" IN (?, ?)  [["event_id", 1], ["event_id", 2]]
  Attendee Load (0.1ms)  SELECT "attendees".* FROM "attendees" 
  WHERE "attendees"."id" IN (?, ?, ?, ?)  [["id", 1], ["id", 2], ["id", 3], ["id", 4]]

🔺 eager_load:

Event.eager_load(:attendees)
  # See the LEFT OUTER JOIN, it ensures that even `events` without any attendees are included in the result
  SQL (0.1ms)  SELECT DISTINCT "events"."id" FROM "events" 
  LEFT OUTER JOIN "registrations" ON "registrations"."event_id" = "events"."id" 
  LEFT OUTER JOIN "attendees" ON "attendees"."id" = "registrations"."attendee_id" /* loading for pp */ LIMIT ?  [["LIMIT", 11]]
  SQL (0.1ms)  SELECT "events"."id" AS t0_r0, "events"."name" AS t0_r1, "events"."location" AS t0_r2,
   "events"."start_time" AS t0_r3, "events"."created_at" AS t0_r4, "events"."updated_at" AS t0_r5,
    "attendees"."id" AS t1_r0, "attendees"."name" AS t1_r1, "attendees"."created_at" AS t1_r2, 
    "attendees"."updated_at" AS t1_r3 FROM "events" LEFT OUTER JOIN "registrations" 
    ON "registrations"."event_id" = "events"."id" LEFT OUTER JOIN "attendees" 
    ON "attendees"."id" = "registrations"."attendee_id" WHERE "events"."id" 
    IN (?, ?) /* loading for pp */  [["id", 1], ["id", 2]]

🔺 What happens with .joins:

event = Event.find_by(name: "Tech Conference")
Event Load (0.1ms)  SELECT "events".* FROM "events" WHERE "events"."name" = ? LIMIT ?  [["name", "Tech Conference"], ["LIMIT", 1]]
# Usage of `.joins`
Attendee.joins(:registrations).where(registrations: {event_id: event.id})
  # an `INNER JOIN` is performed between `attendees` and `registrations`. 
  # It uses the association named `registrations` defined in the `Attendee` model.
  Attendee Load (0.2ms)  SELECT "attendees".* FROM "attendees" 
  INNER JOIN "registrations" ON "registrations"."attendee_id" = "attendees"."id" 
  WHERE "registrations"."event_id" = ? /* loading for pp */ LIMIT ?  [["event_id", 1], ["LIMIT", 11]]

Event.joins(:registrations).distinct
  # See the `DISTINCT` in the `SELECT` statement (ensures that events are listed uniquely).
  SELECT DISTINCT "events".* FROM "events" 
  INNER JOIN "registrations" ON "registrations"."event_id" = "events"."id" 

Summary:

n + 1 table

Hope you get more understanding on this topic!

More blog posts here: 👇

Bhumi’s N + 1 blog post
arunyadav N + 1
Benito Serna N + 1 ebook

So you need a sort at database level or just with ruby is enough?

December 6, 2023   @domlizarraga_