I recently had to deal with a n + 1 query….
Firstly let’s understand why does this happen?
The N+1 query problem is like inviting friends to a party and then calling each one separately to ask if they’re coming. If you have 10 friends, you make 1 call to decide to invite them and then 10 more calls to each friend. So, for 10 friends, you make 11 calls in total.
In databases, this is like fetching a list of items (like blog posts), and then for each item, making another query to fetch related data (like comments on each post). If you have 10 posts and you fetch comments for each one by one, you end up making 1 query to get all posts plus 10 more queries for comments, leading to 11 queries.
But why in db happens this??
The First Query (N): When you ask for the list of posts, the ORM makes one query to fetch all posts. This is your “N” part of the problem, where “N” is the number of posts.
The Plus One (+1) Part: For each post, when you try to access its comments, the ORM realizes it hasn’t fetched those yet. So, it makes a new query for each post to fetch its comments. If you have 10 posts, this approach results in 10 additional queries - one for each post to get its comments.
The ORM’s default behavior is to load data on demand (lazy loading). It avoids fetching related data until you explicitly access it, which can be efficient in scenarios where the related data is not needed. However, when you do need related data for each item in a list, it leads to multiple queries, creating the N+1 problem.
Example:
# rails console
@events = Event.all
Event Load (0.5ms) SELECT "events".*
FROM "events" /* loading for pp */ LIMIT ? [["LIMIT", 11]]
# Each call to `event.attendees` for an `event` triggers a separate query to fetch the `attendees` for just that `event`.
@events.each do |event|
puts "Event: #{event.name}"
👉event.attendees.each do |attendee|
puts " Attendee: #{attendee.name}"
end
end
# Logs:
👉Event Load (0.2ms) SELECT "events".* FROM "events"
Event: Tech Conference
👉Attendee Load (0.1ms) SELECT "attendees".* FROM "attendees" INNER JOIN "registrations" ON "attendees"."id" = "registrations"."attendee_id" WHERE "registrations"."event_id" = ? [["event_id", 1]]
Attendee: Alice
Attendee: Bob
Attendee: Charlie
Event: Music Festival
👉Attendee Load (0.0ms) SELECT "attendees".* FROM "attendees" INNER JOIN "registrations" ON "attendees"."id" = "registrations"."attendee_id" WHERE "registrations"."event_id" = ? [["event_id", 2]]
Attendee: Bob
Attendee: Dana
Data to recreate a small excercise:
Let’s create a small app so we can see in detail what is being produced by each active record method.
# Create app with name `preload_demo_app`
rails new preload_demo_app
# Go to `preload_demo_app` dir
cd preload_demo_app
Then let’s create models and add the associations we need needed.
# Create `Event` model
rails generate model Event name:string location:string start_time:datetime
# Create `Attendee` model
rails generate model Attendee name:string
# Create `Registration` model
rails generate model Registration event:references attendee:references
Active Record Associations 👇
# app/models/event.rb
class Event < ApplicationRecord
has_many :registrations
has_many :attendees, through: :registrations
end
# app/models/attendee.rb
class Attendee < ApplicationRecord
has_many :registrations
has_many :events, through: :registrations
end
# app/models/registration.rb
class Registration < ApplicationRecord
belongs_to :event
belongs_to :attendee
end
One last step, we need some data to play with:
# events
event1 = Event.create(name: "Tech Conference", location: "Conference Center", start_time: DateTime.new(2024, 5, 20, 10, 0, 0))
event2 = Event.create(name: "Music Festival", location: "Outdoor Park", start_time: DateTime.new(2024, 6, 15, 12, 0, 0))
# attendees
attendee1 = Attendee.create(name: "Alice")
attendee2 = Attendee.create(name: "Bob")
attendee3 = Attendee.create(name: "Charlie")
attendee4 = Attendee.create(name: "Dana")
# registrations
Registration.create(event: event1, attendee: attendee1)
Registration.create(event: event1, attendee: attendee2)
Registration.create(event: event1, attendee: attendee3)
Registration.create(event: event2, attendee: attendee2)
Registration.create(event: event2, attendee: attendee4)
With this in place we can start playing with the model by opening ‘rails c
’
🔺 Let’s see includes
behavior:
# `.includes` is designed to minimize the number of queries and the overall load on the database by preloading associated data
Event.includes(:attendees)
# See the `IN` SQL keyword in the query for both `registrations` and `attendees`
# A single SQL query that retrieves both in a single database roundtrip.
Event Load (0.3ms) SELECT "events".* FROM "events" /* loading for pp */ LIMIT ? [["LIMIT", 11]]
Registration Load (0.4ms) SELECT "registrations".* FROM "registrations"
WHERE "registrations"."event_id" IN (?, ?) [["event_id", 1], ["event_id", 2]]
Attendee Load (0.1ms) SELECT "attendees".* FROM "attendees"
WHERE "attendees"."id" IN (?, ?, ?, ?) [["id", 1], ["id", 2], ["id", 3], ["id", 4]]
🔺 Now preload
:
Event.preload(:attendees)
# See the `IN` SQL keyword as well, here the key difference is how you apply the `.where`
Event Load (0.1ms) SELECT "events".* FROM "events" /* loading for pp */ LIMIT ? [["LIMIT", 11]]
Registration Load (0.3ms) SELECT "registrations".* FROM "registrations"
WHERE "registrations"."event_id" IN (?, ?) [["event_id", 1], ["event_id", 2]]
Attendee Load (0.1ms) SELECT "attendees".* FROM "attendees"
WHERE "attendees"."id" IN (?, ?, ?, ?) [["id", 1], ["id", 2], ["id", 3], ["id", 4]]
🔺 eager_load
:
Event.eager_load(:attendees)
# See the LEFT OUTER JOIN, it ensures that even `events` without any attendees are included in the result
SQL (0.1ms) SELECT DISTINCT "events"."id" FROM "events"
LEFT OUTER JOIN "registrations" ON "registrations"."event_id" = "events"."id"
LEFT OUTER JOIN "attendees" ON "attendees"."id" = "registrations"."attendee_id" /* loading for pp */ LIMIT ? [["LIMIT", 11]]
SQL (0.1ms) SELECT "events"."id" AS t0_r0, "events"."name" AS t0_r1, "events"."location" AS t0_r2,
"events"."start_time" AS t0_r3, "events"."created_at" AS t0_r4, "events"."updated_at" AS t0_r5,
"attendees"."id" AS t1_r0, "attendees"."name" AS t1_r1, "attendees"."created_at" AS t1_r2,
"attendees"."updated_at" AS t1_r3 FROM "events" LEFT OUTER JOIN "registrations"
ON "registrations"."event_id" = "events"."id" LEFT OUTER JOIN "attendees"
ON "attendees"."id" = "registrations"."attendee_id" WHERE "events"."id"
IN (?, ?) /* loading for pp */ [["id", 1], ["id", 2]]
🔺 What happens with .joins
:
event = Event.find_by(name: "Tech Conference")
Event Load (0.1ms) SELECT "events".* FROM "events" WHERE "events"."name" = ? LIMIT ? [["name", "Tech Conference"], ["LIMIT", 1]]
# Usage of `.joins`
Attendee.joins(:registrations).where(registrations: {event_id: event.id})
# an `INNER JOIN` is performed between `attendees` and `registrations`.
# It uses the association named `registrations` defined in the `Attendee` model.
Attendee Load (0.2ms) SELECT "attendees".* FROM "attendees"
INNER JOIN "registrations" ON "registrations"."attendee_id" = "attendees"."id"
WHERE "registrations"."event_id" = ? /* loading for pp */ LIMIT ? [["event_id", 1], ["LIMIT", 11]]
Event.joins(:registrations).distinct
# See the `DISTINCT` in the `SELECT` statement (ensures that events are listed uniquely).
SELECT DISTINCT "events".* FROM "events"
INNER JOIN "registrations" ON "registrations"."event_id" = "events"."id"
Summary:
Hope you get more understanding on this topic!
More blog posts here: 👇
Bhumi’s N + 1 blog post
arunyadav N + 1
Benito Serna N + 1 ebook
So you need a sort at database level or just with ruby is enough?