Geeks With Blogs


Dylan Smith ALM / Architecture / TFS

In the last post we looked at how aggregate boundaries affect our ability to provide consistency guarantees and enforce invariants across our domain model.  What we said is that enforcing an invariant within a single aggregate boundary is much easier than enforcing one that spans aggregates.  Based on that alone, we would want to design our software with very large aggregates.  Taken to the extreme, we could have the entire domain model within a single aggregate.  This would allow us to easily enforce any invariant without ever needing to worry about consistency across aggregate boundaries.

The downside to having excessively large aggregates is the impact it has on scalability.  I’m not talking about scalability in terms of adding more servers and hardware to increase throughput, but rather scaling the number of users using the system at the same time.  When you have large aggregates, that also means that when you “lock” your data to provide consistency guarantees you are locking large amounts of data at once.  In the extreme example of having the entire domain inside one aggregate, you will be locking the entire domain model.  If your system is only ever used by a single user at a time, then that is actually perfectly reasonable.  However, most systems we build are used by multiple users at the same time.  If we had one giant aggregate, then any time anybody changed any data it would increment the version number, and any other edits in progress would get a concurrency exception when they try to save (the concurrency check looks at the version number and sees that somebody else changed the aggregate while that user was editing it).
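To make the version check concrete, here is a minimal sketch of optimistic concurrency on one giant aggregate.  The store, the exception name and the "domain" id are all made up for illustration; they are not taken from any particular framework.

```python
class ConcurrencyError(Exception):
    """Raised when an aggregate changed since the caller loaded it."""


class InMemoryStore:
    """Hypothetical store keeping one version number per aggregate id."""

    def __init__(self):
        self._rows = {}  # aggregate_id -> (version, state)

    def load(self, aggregate_id):
        # Returns (version, state); a never-saved aggregate is version 0.
        return self._rows.get(aggregate_id, (0, {}))

    def save(self, aggregate_id, expected_version, state):
        current_version, _ = self.load(aggregate_id)
        if current_version != expected_version:
            raise ConcurrencyError(
                f"{aggregate_id}: expected v{expected_version}, "
                f"found v{current_version}"
            )
        self._rows[aggregate_id] = (current_version + 1, dict(state))
        return current_version + 1


def demo_single_giant_aggregate():
    store = InMemoryStore()
    store.save("domain", 0, {"customers": {}, "orders": {}})

    # Two users load the same (giant) aggregate, both at version 1.
    v_alice, state_alice = store.load("domain")
    v_bob, state_bob = store.load("domain")

    # Alice saves first, which bumps the version to 2.
    store.save("domain", v_alice, {**state_alice, "note": "alice"})

    # Bob's edit is rejected even though it touches unrelated data,
    # because he is still holding version 1 of the whole domain.
    try:
        store.save("domain", v_bob, {**state_bob, "note": "bob"})
    except ConcurrencyError:
        return "bob rejected"
    return "bob saved"
```

With a single aggregate id there is only one version number to collide on, so every pair of overlapping edits ends this way.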

If we start to split up our domain model into smaller aggregates, it reduces the likelihood of concurrency exceptions happening.  If we have each Customer as an Aggregate (containing the Orders), then you will only get concurrency exceptions if two users are trying to edit Orders for the same customer.  If you make Customer and Order separate aggregates you only get concurrency exceptions if two users are trying to edit the same order at the same time.
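One way to picture this is to ask which aggregate a given Order edit would touch under each choice of boundary.  The sketch below is only an illustration; the boundary labels and id formats are assumptions I made up for the example.

```python
def aggregate_id(boundary, customer_id, order_id):
    """Return the id of the aggregate that an Order edit would version.

    The three boundary names are labels invented for this sketch.
    """
    if boundary == "whole_domain":
        return "domain"
    if boundary == "customer_with_orders":
        return f"customer:{customer_id}"
    if boundary == "order":
        return f"order:{order_id}"
    raise ValueError(f"unknown boundary: {boundary}")


def edits_conflict(boundary, edit_a, edit_b):
    """Two simultaneous edits conflict when they hit the same aggregate."""
    return aggregate_id(boundary, *edit_a) == aggregate_id(boundary, *edit_b)


# (customer_id, order_id) pairs for edits happening at the same time.
same_customer_different_orders = (("c1", "o1"), ("c1", "o2"))
different_customers = (("c1", "o1"), ("c2", "o3"))
same_order = (("c1", "o1"), ("c1", "o1"))

for boundary in ("whole_domain", "customer_with_orders", "order"):
    print(
        boundary,
        edits_conflict(boundary, *different_customers),
        edits_conflict(boundary, *same_customer_different_orders),
        edits_conflict(boundary, *same_order),
    )
```

The smaller the aggregate, the fewer pairs of edits map to the same id, which is exactly why the chance of a concurrency exception drops.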

So now we have two competing desires: larger aggregates give us flexibility for enforcing invariants, while smaller aggregates give us less chance of concurrency exceptions.  We have to make a tradeoff between these two properties.  We have a little more flexibility than just the size of the aggregate though; we can strategically choose where to place those boundaries.  You can have two similarly sized aggregates that encompass different sets of entities, and one of those aggregate boundaries may be better than the other.

What I try to do is choose aggregate boundaries such that most of my system’s invariants will not have to span aggregates, but also try to choose them such that there is a small likelihood that multiple users will be simultaneously updating the same aggregate.  Ultimately this all comes down to examining your business domain, and expected usage patterns of your application in order to make the best decision here.  Let’s look at a couple of examples.


Customer / Orders Example

In the Customer/Order example, we’ve talked about 3 different possibilities for Aggregate boundaries:

  1. Single aggregate encompassing the entire domain model
  2. Customer aggregate that contains Order entities
  3. Separate aggregates for Customer and Order

Assuming we have a system that is used by many users simultaneously, we can probably rule out option #1 pretty easily.  In order to decide between #2 and #3 I’d have a discussion with the domain experts, and try to get a feel for the usage patterns for creating/maintaining the Orders data.  Do they have account managers that are responsible for specific customers?  If so, it’s unlikely that multiple users will be editing Orders belonging to the same customer, so I would likely go with option #2 because it makes enforcing invariants across orders easier.  If it was more of a call-center type business where anybody can enter orders for any customer, I might start considering option #3.  However, I might also start asking about their typical scenarios.  If we’re talking about the system that takes online delivery orders for Pizza Hut, it’s pretty unlikely that multiple orders for the same customer are going to be undergoing changes at the same time (by multiple users).  In fact, I’m having a hard time coming up with any example system that takes customer orders that would commonly have multiple users editing orders for the same customer at the same time.  That would lead me towards option #2 from above.  But the key point I’m trying to make is that the decision should be driven by business/domain knowledge, and take into account the consistency vs scalability tradeoffs.


Poker Example

Let’s look at another example: my software to manage a weekly poker league.  In this case I could see a couple of obvious choices:

  1. Single aggregate encompassing the entire domain model
  2. Each Game is an aggregate

If we remember the sample invariants from the last blog post, the examples I used were:

  1. For a completed poker game, total pay-in must equal total pay-out
  2. There can be only one poker game for each week

The first invariant can be enforced easily enough with either choice of aggregate boundaries (all the data involved is contained in a single Game), but the 2nd invariant would span aggregates if we had an aggregate for each game (we need to look at the set of all games in order to validate the invariant).  So there’s a clear consistency advantage for option #1.
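The first invariant is easy because everything it needs lives inside one Game.  A minimal sketch of what that could look like follows; the field names and the exception are assumptions made for illustration, not the actual model from my poker software.

```python
from dataclasses import dataclass, field
from decimal import Decimal


class InvariantViolation(Exception):
    """Raised when a change would break a business rule."""


@dataclass
class Game:
    week: int
    pay_ins: dict = field(default_factory=dict)   # player -> amount paid in
    pay_outs: dict = field(default_factory=dict)  # player -> amount paid out
    completed: bool = False

    def complete(self):
        # All the data the invariant needs is inside this one Game,
        # so no other aggregate has to be consulted.
        total_in = sum(self.pay_ins.values(), Decimal("0"))
        total_out = sum(self.pay_outs.values(), Decimal("0"))
        if total_in != total_out:
            raise InvariantViolation(
                f"week {self.week}: pay-in {total_in} != pay-out {total_out}"
            )
        self.completed = True
```

The second invariant cannot be written this way inside Game, because a single Game has no way to see the other Games.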

If we look at it from a scalability perspective, let’s consider whether we are likely to have multiple users editing data at the same time.  In this example scenario it’s actually pretty reasonable to have the entire domain model as a single aggregate.  The only significant data updates are somebody entering in the results of a new game (once a week), or maybe tweaking some past mistakes.  Regardless, it’s unlikely there will be multiple people editing data at the same time, so in this case I would introduce a new entity called League (we need something to act as the aggregate root), and have it contain a collection of all Games.
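Once League is the aggregate root, the "one game per week" rule becomes a simple check inside the aggregate.  This is only a sketch under my own assumptions: a week is represented as a (year, ISO week) tuple, and the method and exception names are hypothetical.

```python
class DuplicateWeekError(Exception):
    """Raised when a second game is added for a week that already has one."""


class League:
    """Aggregate root that owns every Game, keyed by week."""

    def __init__(self, name):
        self.name = name
        self._games_by_week = {}  # (year, iso_week) -> game results

    def add_game(self, week, results):
        # The root can see all games, so the invariant is a local lookup.
        if week in self._games_by_week:
            raise DuplicateWeekError(f"{self.name} already has a game for {week}")
        self._games_by_week[week] = dict(results)

    def weeks(self):
        return sorted(self._games_by_week)


league = League("Friday Night Poker")
league.add_game((2013, 15), {"ann": 40, "bo": -20, "cy": -20})
try:
    league.add_game((2013, 15), {"ann": 0})
except DuplicateWeekError as error:
    print("rejected:", error)
```

Because all changes go through League, there is no window in which two Games for the same week can both be saved.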

If we take this example a little further, let’s imagine we want to offer our poker league manager as SaaS.  Now we have many leagues stored in our domain model.  In that case it doesn’t seem reasonable to have somebody editing one league’s data lock *all* leagues’ data (as it would if the entire domain model was still a single aggregate).  It would make more sense to have each separate League be its own Aggregate.  This also appears to work well because it’s unlikely we would have any invariants that span Leagues.
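With one aggregate per League, each league carries its own version number, so edits to different leagues never collide.  Here is a rough sketch of that idea; the repository class and its methods are hypothetical and track only versions to keep the example short.

```python
class ConcurrencyError(Exception):
    """Raised when a league changed since the caller loaded it."""


class LeagueRepository:
    """Hypothetical repository tracking one version number per league."""

    def __init__(self):
        self._versions = {}  # league_id -> version

    def load_version(self, league_id):
        return self._versions.get(league_id, 0)

    def save(self, league_id, expected_version):
        current = self.load_version(league_id)
        if current != expected_version:
            raise ConcurrencyError(league_id)
        self._versions[league_id] = current + 1
        return current + 1


repo = LeagueRepository()

# Two users start editing two different leagues at the same time.
version_a = repo.load_version("league-a")
version_b = repo.load_version("league-b")

repo.save("league-a", version_a)  # succeeds
repo.save("league-b", version_b)  # also succeeds: separate aggregate

# A stale save against the same league is still rejected.
try:
    repo.save("league-a", version_a)
except ConcurrencyError:
    print("stale edit to league-a rejected")
```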



In the next post we’ll take a look at what I do when I realize I need an invariant that spans Aggregate boundaries (hint: it’s much more painful than invariants within an Aggregate boundary).

Posted on Monday, April 15, 2013 2:40 AM

Comments on this post: Choosing Aggregate Boundaries – Scalability Tradeoffs
