I’ve spent a bunch of time lately with clients helping them understand why their applications are so slow and how to improve performance. This often comes down to their use (or misuse) of ORM frameworks such as nHibernate and/or Entity Framework. I think this probably stems from the fact that ORM’s have gone mainstream somewhat recently, and most developer teams realize they should be using one, but they have never really learned the intricacies of how to use one properly.
The first thing I do is pull out SQL Profiler and run through some common scenarios in their application and just get a rough count of how many DB queries happen in each application scenario. A lot of teams are surprised when they see hundreds or thousands of queries being executed as a result of a single button click in their application.
In my experience teams seem to be suffering from one of two problems, either loading too much data at once (eager loading), or loading too little (lazy loading). The lazy loading problem is probably more common, but the eager loading scenario is easier to explain so I’ll start with that.
I’ve run into a few code-bases where they have explicitly turned off lazy-loading in nHibernate (lazy loading is the default behavior). Unless you explicitly partition your domain model (e.g. using Aggregate boundaries like DDD proposes), not using lazy loading can result in massive amounts of data being retrieved from the DB for seemingly simple scenarios. If you think of your Domain Model as a giant object graph, where you have many types of objects, most with links to other objects. When you ask nHibernate for any object, it will automatically retrieve the object you asked for from the DB, *plus* any linked objects, and any objects linked from those, and on and on, until it has populated an entire object graph into memory for you. When you have any non-trivial domain model, this can be a huge amount of data. Lets look at an example:
If we have lazy loading turned off, and do a simple operation like asking nHibernate to give us an Invoice with a specific Id. What will happen is nHibernate will go and retrieve that row from the Invoices table, but it will also get the related InvoiceBatch object, and all of the InvoiceItems, and for each InvoiceItem it will retrieve the Shipment object, and the Product object, and for each Product it will get the Product Group, and so on.
It can get really bad if you have circular references in your domain model – which is fairly common because it is so convenient for writing business logic (e.g. the Invoice object has a collection of InvoiceItems, and the InvoiceItem also contains an Invoice object). In our example, lets assume that InvoiceBatch contains a collection of child Invoice objects, and each Invoice contains an InvoiceBatch object. When we ask nHibernate for a single Invoice, it will populate the Invoice Batch object, which will in turn populate the Invoices collection and all objects related to every Invoice in that collection. Lets imagine another example, if we have an Employee object that has a property referencing the Manager (also an Employee object), and also has a collection of Employees representing the Subordinates. When you retrieve any Employee it will also retrieve the Manager Employee object, then his Manager, and his Manager, and so on until you get up to the top (CEO), then it will get all of the CEO’s sub-ordinates, and all of their Sub-ordinates, and so on. Ultimately, this means anytime you ask nHibernate to get a single Employee it is actually retrieving *all* employess, along with any other related objects.
The solution to this is to either partition your domain model in some way (e.g. Aggregates as per DDD), or use Lazy Loading (the default in nHibernate). Lazy Loading works by only retrieving the Invoice object from the DB, then loading any sub-objects only if and when you attempt to access them (aka lazily). This ensures, that only the minimal set of data that you need to do your work is retrieved from the Database. nHibernate does it’s lazy loading in a way that is mostly transparent to the developer, when you ask nHibernate for an Invoice object, it is actually generating a dynamic proxy object, that looks like an Invoice (it inherits from Invoice), but has some hooks in there to allow nHibernate to intercept any property access so it can lazy load them as needed.
However, Lazy loading has it’s own problems, and these are probably more common due to the fact that lazy loading is on by default. These problems are commonly called the Select N+1 Problem. Lets say I had a screen with a grid displaying a list of Invoices, and one of the fields in that grid is Invoice.Customer.Offices.Address.City. What will happen is nHibernate will execute a single query to retrieve all the Invoices I ask for, but then when I try to render it into the grid I’ll have to loop through each Invoice and access the Customer property (which will trigger nHibernate to fire off a SQL query), then access the Customer.Offices collection (another query), then the Office.Address (another query), and finally retrieve the City for display. These queries will happen separately for every Invoice displayed in the grid. So if I have 30 invoices displayed in the grid, I could potentially have 91 SQL queries executed. And that is a relatively simple scenario, in a more complex (realistic) application this problem can become a serious performance concern.
What we need is a middle-ground between the first scenario (load everything), and the 2nd scenario (load minimal, and lazy load everything else). Most modern ORM frameworks will have support for programmatic “eager loading”. Usually you will have some kind of Repository layer/class in your application. This is where you want to put this code. You’ll still leave lazy loading turned on in all your nHibernate mappings, but then in your repository functions you can tell it specifically how much of the object graph should be loaded up-front (and the rest will still be lazy loaded if/when accessed). With nHibernate this is done via the Fetch/FetchMany/ThenFetch/ThenFetchMany methods.
Lets take the previous example where we want to display Invoices in a grid and include a column that displays Invoice.Customer.Offices.Address.City. What we’d like to have happen is for nHibernate to load up this data for all 30 invoices in as few queries as possible (ideally one). Previously we would have retrieved the list of invoices by doing a simple session.Query (assuming our grid is displaying all Invoices). Now we might have code in our Repository that looks something like this:
I’m going to modify this method to let nHibernate know that I want all Invoices, but I also want it to go ahead and populate the Customer, Offices, and Address objects at the same time.
I’ll let you look up how the Fetch functions work yourself. This should result in a single SQL query that loads all the necessary data (instead of the previous 91 queries). You may have many areas of your app that require a list of invoices, and some require more or less of the object graph to be loaded. Often you will see several methods in the InvoiceRepository that return a list of all Invoices, but the different methods will eager load different subsets of the object graph for different uses.
When I’m trying to optimize the lazy/eager loading behavior of my code, I’ll find myself spending a bunch of time going through in the debugger with SQL Profiler open, and seeing what code is triggering lazy-loading queries, and starting to build up a list of all the pieces of the object graph I might want to eager load.
nHibernate Gotcha – Don’t Do Multiple FetchMany
There is a major gotcha to be aware of (at least with the nHibernate eager loading). If you try to eager load more than one collection in a single query the results won’t be what you expect. Put another way, never use more than one FetchMany/ThenFetchMany in a single query. Lets look at an example, lets say we wanted to load all Invoices, and also eager load all Offices and Contacts for the related Customers. We might try writing code like this:
If you look at the SQL being executed it’s actually doing something like this:
FROM Invoices LEFT OUTER JOIN Customers ON Invoices.CustomerId = Customers.CustomerId
LEFT OUTER JOIN Offices ON Offices.CustomerId = Customers.CustomerId
LEFT OUTER JOIN Contacts ON Contacts.OfficeId = Offices.OfficeId
What happens is that as you add more relationships that are collections, the result set grows large since SQL is doing a cartesian product of all the collections (so if you have 10 invoices, with 10 customers, and each customer has 4 offices, and each office has 7 contacts, you get a resultset of 280 rows). nHibernate doesn’t deal with this well (it won’t complain, it will just result in an incorrect object graph returned to you – which makes the problem even worse IMO). I believe, in this example if you examined the resulting object graph, each Customer would show itself as having 28 offices (when in fact it should have 4 offices, with 7 contacts in each).
Luckily, there is a solution. nHibernate essentially has it’s own in-memory cache scoped to the session. When it goes to lazy load something, it will first look in this cache to see if the object has already been loaded, and if so it can skip querying the database. (Note: I’m not sure if this is exactly how nHibernate works under the covers, but this is how I conceptualize it). What we can do, is give nHibernate a few queries, and tell it to use them to pre-populate it’s internal cache. nHibernate is even smart enough to execute all these queries within a single round-trip to the database. Now whenever we find ourselves wanting to multiple FetchMany, we can just break that down into multiple queries that nHibernate will use to populate it’s cache. Here’s the previous example re-written to actually work:
In this case I’m executing the first query which will retrieve all Office objects (related to Customers that are linked to our Invoices), and it will eager load each Office’s Contacts collection. Then I do a separate query to retrieve all Invoices, their related Customer object, and each Customer’s Offices collection. Both of these SQL queries will be executed as part of a single round-trip (assuming your DB supports that – SQL Server does). The Contacts will be present in nHibernate cache, so no lazy-loading is required to access them.
If you have a significant portion of the object-graph that you want to eager load the “fetch code” can get a little complex. The silver lining is, that even if you get it wrong you’re not going to break anything, it just means things will be inefficiently lazy loaded when they should have been eager loaded, but your application behavior should still be correct just slow (so long as you obey the single FetchMany per query rule).
To finish this post off, here’s what code might look like if you wanted to eager load the entire object graph from the first graphic (note: this code is not tested):