dom lizarraga

    dominiclizarraga@hotmail.com

    N + 1 queries and how to fix them.

    11 minutes to read

    I recently had to deal with a n + 1 query….

    Firstly let’s understand why does this happen?

    The N+1 query problem is like inviting friends to a party and then calling each one separately to ask if they’re coming. If you have 10 friends, you make 1 call to decide to invite them and then 10 more calls to each friend. So, for 10 friends, you make 11 calls in total.

    In databases, this is like fetching a list of items (like blog posts), and then for each item, making another query to fetch related data (like comments on each post). If you have 10 posts and you fetch comments for each one by one, you end up making 1 query to get all posts plus 10 more queries for comments, leading to 11 queries.

    But why in db happens this??

    The First Query (N): When you ask for the list of posts, the ORM makes one query to fetch all posts. This is your “N” part of the problem, where “N” is the number of posts.

    The Plus One (+1) Part: For each post, when you try to access its comments, the ORM realizes it hasn’t fetched those yet. So, it makes a new query for each post to fetch its comments. If you have 10 posts, this approach results in 10 additional queries - one for each post to get its comments.

    The ORM’s default behavior is to load data on demand (lazy loading). It avoids fetching related data until you explicitly access it, which can be efficient in scenarios where the related data is not needed. However, when you do need related data for each item in a list, it leads to multiple queries, creating the N+1 problem.

    Example:

    # rails console
    @events = Event.all
      Event Load (0.5ms)  SELECT "events".* 
      FROM "events" /* loading for pp */ LIMIT ?  [["LIMIT", 11]]
    
    # Each call to `event.attendees` for an `event` triggers a separate query to fetch the `attendees` for just that `event`.
    
    @events.each do |event|
      puts "Event: #{event.name}"
      👉event.attendees.each do |attendee|
        puts "  Attendee: #{attendee.name}"
      end
    end
    # Logs:
    👉Event Load (0.2ms)  SELECT "events".* FROM "events"
      Event: Tech Conference
      👉Attendee Load (0.1ms)  SELECT "attendees".* FROM "attendees" INNER JOIN "registrations" ON "attendees"."id" = "registrations"."attendee_id" WHERE "registrations"."event_id" = ?  [["event_id", 1]]
        Attendee: Alice
        Attendee: Bob
        Attendee: Charlie
      Event: Music Festival
      👉Attendee Load (0.0ms)  SELECT "attendees".* FROM "attendees" INNER JOIN "registrations" ON "attendees"."id" = "registrations"."attendee_id" WHERE "registrations"."event_id" = ?  [["event_id", 2]]
        Attendee: Bob
        Attendee: Dana
    

    Data to recreate a small excercise:

    Let’s create a small app so we can see in detail what is being produced by each active record method.

    # Create app with name `preload_demo_app`
    rails new preload_demo_app
    # Go to `preload_demo_app` dir
    cd preload_demo_app
    

    Then let’s create models and add the associations we need needed.

    # Create `Event` model
    rails generate model Event name:string location:string start_time:datetime
    # Create `Attendee` model
    rails generate model Attendee name:string
    # Create `Registration` model
    rails generate model Registration event:references attendee:references
    

    Active Record Associations 👇

    # app/models/event.rb
    class Event < ApplicationRecord
      has_many :registrations
      has_many :attendees, through: :registrations
    end
    # app/models/attendee.rb
    class Attendee < ApplicationRecord
      has_many :registrations
      has_many :events, through: :registrations
    end
    # app/models/registration.rb
    class Registration < ApplicationRecord
      belongs_to :event
      belongs_to :attendee
    end
    

    One last step, we need some data to play with:

    # events
    event1 = Event.create(name: "Tech Conference", location: "Conference Center", start_time: DateTime.new(2024, 5, 20, 10, 0, 0))
    event2 = Event.create(name: "Music Festival", location: "Outdoor Park", start_time: DateTime.new(2024, 6, 15, 12, 0, 0))
    
    # attendees
    attendee1 = Attendee.create(name: "Alice")
    attendee2 = Attendee.create(name: "Bob")
    attendee3 = Attendee.create(name: "Charlie")
    attendee4 = Attendee.create(name: "Dana")
    
    # registrations 
    Registration.create(event: event1, attendee: attendee1)
    Registration.create(event: event1, attendee: attendee2)
    Registration.create(event: event1, attendee: attendee3)
    Registration.create(event: event2, attendee: attendee2)
    Registration.create(event: event2, attendee: attendee4)
    

    With this in place we can start playing with the model by opening ‘rails c

    🔺 Let’s see includes behavior:

    # `.includes` is designed to minimize the number of queries and the overall load on the database by preloading associated data
    Event.includes(:attendees)
      # See the `IN` SQL keyword in the query for both `registrations` and `attendees`
      # A single SQL query that retrieves both in a single database roundtrip.
      Event Load (0.3ms)  SELECT "events".* FROM "events" /* loading for pp */ LIMIT ?  [["LIMIT", 11]]
        Registration Load (0.4ms)  SELECT "registrations".* FROM "registrations" 
          WHERE "registrations"."event_id" IN (?, ?)  [["event_id", 1], ["event_id", 2]]
        Attendee Load (0.1ms)  SELECT "attendees".* FROM "attendees" 
          WHERE "attendees"."id" IN (?, ?, ?, ?)  [["id", 1], ["id", 2], ["id", 3], ["id", 4]]
    

    🔺 Now preload:

    Event.preload(:attendees)
      # See the `IN` SQL keyword as well, here the key difference is how you apply the `.where`
      Event Load (0.1ms)  SELECT "events".* FROM "events" /* loading for pp */ LIMIT ?  [["LIMIT", 11]]
      Registration Load (0.3ms)  SELECT "registrations".* FROM "registrations" 
      WHERE "registrations"."event_id" IN (?, ?)  [["event_id", 1], ["event_id", 2]]
      Attendee Load (0.1ms)  SELECT "attendees".* FROM "attendees" 
      WHERE "attendees"."id" IN (?, ?, ?, ?)  [["id", 1], ["id", 2], ["id", 3], ["id", 4]]
    

    🔺 eager_load:

    Event.eager_load(:attendees)
      # See the LEFT OUTER JOIN, it ensures that even `events` without any attendees are included in the result
      SQL (0.1ms)  SELECT DISTINCT "events"."id" FROM "events" 
      LEFT OUTER JOIN "registrations" ON "registrations"."event_id" = "events"."id" 
      LEFT OUTER JOIN "attendees" ON "attendees"."id" = "registrations"."attendee_id" /* loading for pp */ LIMIT ?  [["LIMIT", 11]]
      SQL (0.1ms)  SELECT "events"."id" AS t0_r0, "events"."name" AS t0_r1, "events"."location" AS t0_r2,
       "events"."start_time" AS t0_r3, "events"."created_at" AS t0_r4, "events"."updated_at" AS t0_r5,
        "attendees"."id" AS t1_r0, "attendees"."name" AS t1_r1, "attendees"."created_at" AS t1_r2, 
        "attendees"."updated_at" AS t1_r3 FROM "events" LEFT OUTER JOIN "registrations" 
        ON "registrations"."event_id" = "events"."id" LEFT OUTER JOIN "attendees" 
        ON "attendees"."id" = "registrations"."attendee_id" WHERE "events"."id" 
        IN (?, ?) /* loading for pp */  [["id", 1], ["id", 2]]
    

    🔺 What happens with .joins:

    event = Event.find_by(name: "Tech Conference")
    Event Load (0.1ms)  SELECT "events".* FROM "events" WHERE "events"."name" = ? LIMIT ?  [["name", "Tech Conference"], ["LIMIT", 1]]
    # Usage of `.joins`
    Attendee.joins(:registrations).where(registrations: {event_id: event.id})
      # an `INNER JOIN` is performed between `attendees` and `registrations`. 
      # It uses the association named `registrations` defined in the `Attendee` model.
      Attendee Load (0.2ms)  SELECT "attendees".* FROM "attendees" 
      INNER JOIN "registrations" ON "registrations"."attendee_id" = "attendees"."id" 
      WHERE "registrations"."event_id" = ? /* loading for pp */ LIMIT ?  [["event_id", 1], ["LIMIT", 11]]
    
    Event.joins(:registrations).distinct
      # See the `DISTINCT` in the `SELECT` statement (ensures that events are listed uniquely).
      SELECT DISTINCT "events".* FROM "events" 
      INNER JOIN "registrations" ON "registrations"."event_id" = "events"."id" 
    

    Summary:

    n + 1 table

    Hope you get more understanding on this topic!

    More blog posts here: 👇

    Bhumi’s N + 1 blog post
    arunyadav N + 1
    Benito Serna N + 1 ebook

    So you need a sort at database level or just with ruby is enough?