top of page

Posts

[EagerLoad] Which is more useful: “leftJoin” or “with”? [N+1 problem]


N+1 Problem


Suppose such a relationship was set up.

class Bolg

{

    public function comments(): HasMany

    {

        return $this->hasMany(Comment::class);

    }

}


Retrieve the Blog model.

$blogs= Bolg::query()->first();


A further loop can be used to retrieve all of the Comment models associated with each Blog model.

$comments= $blogs->map(function ($blog) {

    return $blogs->comments->first();

});


And since it's looping, the following query will be executed for each chapter.

"select * from `blogs` where `blogs`.`blog_id` = ? and `blogs`.`blog_id` is not null"


In this way, after the initial query to retrieve all records, N more additional queries will be issued.

This is referred to as the 1+N or N+1 problem. 


This problem arises directly from issuing queries for the number of chapters, leading to an increase in the system's workload.


“join” and “with” that avoid N+1 Problem


We've reached the heart of the matter.

As mentioned earlier, issuing queries each time is extremely inefficient, so we use (left) join or with. (By the way, in some cases, it may be better to acquire them on a case-by-case basis, so please make your selection carefully.) These methods can be used to prevent a large number of queries from being executed. However, each of them operates differently.


What is “join”?


"Join" performs an “Inner Join” operation, which retrieves common data across tables and eliminates the rest.

"leftJoin" executes a “Left Outer Join”, returning all rows in the left table and any matching rows in the right table. It retains all data from the left table while fetching data that matches the join condition.

These "join" operations are SQL functionalities that merge multiple tables within a query. Here's how it looks in a query:

0 => array:3[

    "query" => "select * from `blogs` inner join `comments` on `blogs`.`id` = `blog_id`"

    "bindings" => []

    "time" => 1.5

],


The query is executed only once and works perfectly.

What about "with"?


“with”


The query with "with" is executed twice. It may lose to "join," but considering that loops run N+1 times, it's an incredible advancement.

By the way, the argument passed to "with" is the name of the relation definition method in the model file. (It is actually the same as the model name of the relation destination, but be careful that if it's not defined as a method, you can't use "with.")

$blogAndComments= Blog::query()

    ->with('comments')

    ->first();


Looking at the query, thanks to “whereIn”, the data can be retrieved all at once.

0 => array:3[

    "query" => "select * from `blogs`"

    "bindings" => []

    "time" => 0.5

],

 

1 => array:3[

    "query" => "select * from `blogs` where `blogs`.`blogt_id` in (?)"

    "bindings" => []

    "time" => 1.5

],


The characteristics and differentiation in usage


Now that we have acquired the model in two ways, let's take a look at the characteristics of each.


  • (left)Join:

Merges models based on the join condition.

Utilizes SQL functionality to retrieve data from multiple tables simultaneously.

Data is generated only for matches based on the join condition.

For example, you can expect the final data to be created for the same number of Comment models.

While it's attractive to retrieve data in bulk, it may lead to incorrect data when combining multiple tables.

It is ideal when you don't need the models merged or when you require simple, flat data.


  • with:

Retrieves data in the form of associative arrays without merging models.

Solves the N+1 problem by preloading model relation in ORM.

For example, under the Blog model, the associated Comment model exists within the "relations" property.

Models can be nested and retrieved in the hierarchical order of parent-child relations, thus avoiding model merging.

Retrieves associated models or data at once, without creating redundant data.

Convenient when you want to handle retrieved data directly as models.


Note: While "with" can retrieve nested relations as long as the relations continue, (e.g., ModelA.ModelB.ModelC.), all passed models will be attached as outer shells. Eventually, you may need to unset them or take other measures to exclude them.


Summary and digression


Now that we've gathered all the information, let's summarize it and then conclude with a brief digression.


Summary


Since a lot of information came out, I organized it into a table.

Method of Acquisition

Number of queries

Relation

Acquired data format

Method of Acquisition

Features

Loop

1+N

necessary

flat

issue queries per model

cause N+1 problem when retrieving large amounts of data, and result in high load

join

1

unnecessary

flat

retrieve data with join conditions

merge tables to reduce load when retrieving data

with

2

necessary

nested

load relations to retrieve data

retrieve models as associative arrays


I'm not sure which one is truly useful!

In short, the result is a retrieval of data from multiple tables, but the methods used are too different.

So, it's a commonplace conclusion to say, "Please change the retrieval method according to the desired data format."

The easiest way to understand is to determine whether you want the data in the original form of the model or if it's okay to merge them.

Alternatively, considering which form, nested or flat, would be easier to use for future processing, could help in choosing.

Additionally, I think it should be changed depending on the total amount of data you want to retrieve.

Since "with" loads relations, the deeper the nesting, the heavier the operation becomes compared to "join".

In other words, it depends on the case.


Digression (EagerLoad)


Let me write about EagerLoad.

You might have understood it so far, the point is that you do not want to issue SQL each time.

Just saying that may not be enough, so let's delve into a bit more detail.

To use "EagerLoad," it's necessary for it to be used with non-dynamic properties.

First, let's talk about the difference between dynamic properties and relation methods. That is, the difference between this:


$blog= new Blog;

$blog->comments;


and this:


$blog= new Blog;

$blog->comments();


In the case of ->comments, it becomes a dynamic property and returns a Collection. It also has the feature of "lazy loading," which loads the relation data only when accessed. On the other hand, in the ->comments() case, it becomes a relation object.

With dynamic properties, you cannot chain query builders, but with relation objects, you can chain them like ->comments()->where('create_user_id', 1).

Therefore, it seems like using a relation object would be the better choice, right? Let's recall the nature of EagerLoad and its companions like ‘with’ in this case. 

Yes, it issues queries in advance. Since it is retrieving the data (in the relation properties), the query will not be executed when you write $blog->comment->create_user_id.

This is why it's said that "EagerLoad cannot be used unless it is a dynamic property." In other words, "If you want to use EagerLoad, you must make the subsequent processing a dynamic property as well, otherwise, it will be meaningless!" It becomes paradoxical in that sense, doesn't it? 

Here, for those who are wondering how to apply conditions, constraints, or searches when acquiring the data.

$blog= Bolg::with(['comments' => function ($query) {

    $query->where('create_user_id', 1);

}])->first();

$comments= $blog->comments;


By doing this, you can combine EagerLoad with conditions, constraints, and searches. By the way, if you have frequently used conditions, it's convenient to make them as methods in your model file. This not only enhances readability but also makes it more convenient.


The end


I hope that this article will lighten the burden of accessing databases in the world, even if only a little.


This blog post is translated from a blog post written by Tamaki on our Japanese website Beyond Co..


Comments


bottom of page