Geeks With Blogs


Dylan Smith ALM / Architecture / TFS

Those who know me know I’m a pretty big fan of the CQRS set of design patterns.  CQRS-style architectures typically borrow from and build upon the DDD (Domain Driven Design) set of patterns (in fact, before Greg Young coined the term CQRS he was calling it DDDD [Distributed DDD]).  One pattern that’s pretty central in DDD is the concept of Aggregates.  This is the practice of splitting your domain model up into pieces, and these pieces are what we call Aggregates.  Each Aggregate may contain several “Entities”, but must contain one specific Entity that is designated as the Aggregate Root.  Examples of Aggregates could be Customer, Product, Order, etc.

 

A lot of people (even people that claim to be doing DDD) will just naturally make almost every entity into its own Aggregate.  They are missing an important design decision around scoping Aggregate boundaries appropriately.  As per Evans’ DDD book, Aggregates are intended to define the consistency and transactional boundaries of your system.  This has some really significant implications that make it important to choose your Aggregate boundaries with care.  There’s a bunch of literature providing guidance around choosing your Aggregate boundaries, but in this blog post I want to talk a little bit about what I think about when I do this, and provide some examples.

 

Consistency

When designing software you need to understand what consistency guarantees you have (and probably more importantly the guarantees you don’t have).  I see too many intermediate/advanced software developers take on the task of designing/architecting important software, without properly understanding the consistency aspects of the system and the tradeoffs involved.

 

Consistency is being able to guarantee that a given set of data is all from a specific identical point in time (I’m sure there’s a better official definition, but that’s how I think about it).  This is important because most software has a set of invariants (fancy word for “rules”) that you want to enforce across the domain model.  A few examples of invariants might be (I’m in the middle of building some software to manage our weekly poker league, so I hope you like poker related examples):

  • Total value of unpaid orders for a customer must not exceed that customer’s credit limit
  • For a completed poker game, total pay-in must equal total pay-out
  • Username must be unique
  • There can be only one set of poker game results for each week (weekly league)

 

These are rules that our software system is expected to enforce.  If a customer tries to place an order that would exceed their credit limit the system should reject it.  Likewise, if somebody tries to enter a username that’s already taken the system should reject that to ensure the invariant is kept intact.

   

What might not be immediately obvious is that you need some consistency guarantees in order to enforce every single one of those invariants.  “Locking” goes hand in hand with consistency, as that’s typically how you achieve consistency guarantees.  So for the first example (orders + credit limit), in order to enforce that invariant you need a consistent data set representing all of that customer’s unpaid orders, *and* you need to be able to acquire some kind of lock, so you can ensure that nobody writes a new order between the time you do the invariant check (sum(orders) + new_order_cost <= customer.credit_limit) and the time you save the new order.  If you can’t lock that data, you end up with a race condition that could result in the invariant being violated.
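To make the check-then-save problem concrete, here is a minimal in-memory sketch in Python.  The names (CustomerAccount, place_order, CreditLimitExceeded) are made up for illustration, and a threading.Lock stands in for whatever locking mechanism your data store actually provides.  The point is only that the invariant check and the write happen while the same lock is held.

```python
import threading
from decimal import Decimal


class CreditLimitExceeded(Exception):
    """Raised when a new order would push unpaid orders past the limit."""


class CustomerAccount:
    """Hypothetical holder of one customer's unpaid order totals."""

    def __init__(self, credit_limit):
        self.credit_limit = Decimal(credit_limit)
        self.unpaid_orders = []
        # Assumption: an in-process lock stands in for a real DB lock.
        self._lock = threading.Lock()

    def place_order(self, new_order_cost):
        new_order_cost = Decimal(new_order_cost)
        # Check and write under the same lock, so no other writer can
        # add an order between the invariant check and the save.
        with self._lock:
            total = sum(self.unpaid_orders, Decimal(0))
            if total + new_order_cost > self.credit_limit:
                raise CreditLimitExceeded(
                    f"{total} + {new_order_cost} exceeds {self.credit_limit}"
                )
            self.unpaid_orders.append(new_order_cost)
```

If the check and the append were done without holding the lock, two threads could both pass the check against the same total and both append, which is exactly the race condition described above.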

   

Most software I encounter uses optimistic concurrency/locking to achieve this.  Usually this means adding a version # to your entities/aggregates, then checking at save time that it hasn’t changed since you retrieved it.  For example, if the user is editing the customer information, the software will keep track of the customer version # that was retrieved when the user started editing, then when they hit save the system will check that the version # in the database hasn’t changed before it writes the updates (if it has changed it will reject the update with some kind of concurrency exception).  You also need some way to “lock” the Customer aggregate/entity to prevent race conditions between the time you check the version # and the time you actually write the updates.  For a typical system that uses a relational DB (e.g. SQL Server), you might be able to rely on DB features to enforce the locking and prevent race conditions.  If you’re doing something like Event Sourcing you will need to implement your own locking or use a third-party framework that does this for you.
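Here is a rough sketch of the version # approach, again in Python and again entirely in memory.  CustomerStore, CustomerRecord and ConcurrencyError are hypothetical names, not part of any framework.  The internal lock plays the role of the “lock” mentioned above: it makes the version comparison and the write a single atomic step.

```python
import threading
from dataclasses import dataclass


class ConcurrencyError(Exception):
    """Raised when the stored version differs from the one the caller read."""


@dataclass(frozen=True)
class CustomerRecord:
    name: str
    version: int


class CustomerStore:
    """Hypothetical in-memory store keyed by customer id."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def add(self, customer_id, name):
        with self._lock:
            self._rows[customer_id] = CustomerRecord(name, 1)

    def load(self, customer_id):
        with self._lock:
            return self._rows[customer_id]

    def save(self, customer_id, name, expected_version):
        # The version check and the write happen under one lock, so a
        # competing save cannot slip in between them.
        with self._lock:
            current = self._rows[customer_id]
            if current.version != expected_version:
                raise ConcurrencyError(
                    f"expected v{expected_version}, found v{current.version}"
                )
            self._rows[customer_id] = CustomerRecord(name, current.version + 1)
```

With a relational DB, a common way to get the same effect is a single UPDATE whose WHERE clause includes the expected version, followed by a check on how many rows were affected; the database then does the locking for you.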

 

If we come back to the original topic – aggregate boundaries – these come into play because it turns out it’s pretty straightforward to enforce invariants within an aggregate, but if you have an invariant that spans multiple aggregates, it becomes significantly harder.

 

Back to the Customer/Orders example.  If we assume that both Customer and Order are separate Aggregates, then they will each have their own version #’s.  In order to enforce the credit limit invariant we need to get all unpaid orders for that customer, sum up the order totals, and compare the sum with the customer’s credit limit.  To do this properly we need to make sure that the data doesn’t change out from under us while we’re checking the invariant, meaning we would in theory need to lock the customer and every order that we’re looking at.  But we would also need to ensure that no new orders for that customer are created in the meantime.  With the simple version # per aggregate implementation, that is simply not supported (at least not without a lot of added complexity).
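The following Python sketch replays that failure as a deterministic interleaving of two requests.  The data layout (a dict of orders, each with its own version #) and the numbers are made up for illustration.  Notice that no existing aggregate is modified, so a per-aggregate version check has nothing to object to.

```python
from decimal import Decimal

CREDIT_LIMIT = Decimal("100")

# Each Order is its own aggregate with its own version number.
orders = {"o1": {"total": Decimal("60"), "version": 1}}


def unpaid_total(store):
    """Sum the totals of every order currently in the store."""
    return sum((o["total"] for o in store.values()), Decimal(0))


def would_fit(seen_total, new_total):
    """The credit limit check, run against whatever total the caller saw."""
    return seen_total + new_total <= CREDIT_LIMIT


# Two requests read the unpaid total before either one has written.
seen_by_a = unpaid_total(orders)
seen_by_b = unpaid_total(orders)

# Both pass the check, and both insert a brand-new Order aggregate.
# No existing version number changes, so nothing is rejected.
if would_fit(seen_by_a, Decimal("30")):
    orders["o2"] = {"total": Decimal("30"), "version": 1}
if would_fit(seen_by_b, Decimal("30")):
    orders["o3"] = {"total": Decimal("30"), "version": 1}

# 60 + 30 + 30 = 120, which is over the limit of 100.
invariant_holds = unpaid_total(orders) <= CREDIT_LIMIT
```

Each request made a correct decision based on what it saw, yet the combined result breaks the rule.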

 

What if we were to change our aggregate boundaries?  Let’s say that Order isn’t a separate aggregate, but instead we have a collection of Order entities contained within the Customer aggregate (the Customer entity is the aggregate root).  Now enforcing the invariant is easy, because all the data necessary is contained within a single Aggregate.  We can easily lock the Customer aggregate (using the single version # we have) and enforce our invariant.
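A sketch of what that larger aggregate might look like in Python follows.  The class and method names are hypothetical, and persistence is left out: the assumption is that some repository (not shown) saves the whole Customer at once and compares the single version # when it does.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Order:
    order_id: str
    total: Decimal
    paid: bool = False


@dataclass
class Customer:
    """Aggregate root; Orders are only reachable through it."""

    customer_id: str
    credit_limit: Decimal
    # Assumption: a repository (not shown) checks and bumps this on save.
    version: int = 0
    orders: list = field(default_factory=list)

    def unpaid_total(self):
        return sum((o.total for o in self.orders if not o.paid), Decimal(0))

    def place_order(self, order_id, total):
        # All the data the invariant needs lives inside this aggregate.
        if self.unpaid_total() + total > self.credit_limit:
            raise ValueError("order would exceed the customer's credit limit")
        order = Order(order_id, total)
        self.orders.append(order)
        return order
```

Because every new order goes through Customer.place_order and the whole aggregate is saved under one version #, two competing requests would collide on that version # instead of silently breaking the credit limit.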

   

There are certainly techniques for enforcing invariants that cross aggregate boundaries, but they definitely add complexity (more on this later).

   

If we only consider consistency guarantees when designing our aggregate boundaries, then we would want to make our aggregates as large as possible.  The bigger the aggregate, the more power and flexibility we have to easily enforce invariants.  If we take it to the extreme, we could make our entire domain model a single aggregate, with one version # for the entire domain.  However, consistency isn’t the only consideration.  We need to make a tradeoff between Consistency and Availability/Scalability.

 

In the next post I’ll take a look at how Availability / Scalability comes into play when choosing Aggregate boundaries, and take a look at options for enforcing invariants that span aggregate boundaries.

Posted on Sunday, April 7, 2013 3:00 PM


Comments on this post: Choosing Aggregate Boundaries – Consistency

# re: Choosing Aggregate Boundaries – Consistency
Excellent post dylan. I cant wait for the next one, I am facing just these issues over the past month and have been wrestling how to describe the issues.
I am using eventsourced, a scala es library.
Left by Ramon Buckland on Apr 12, 2013 7:05 PM

# re: Choosing Aggregate Boundaries – Consistency
Its refreshing when someone explains things clearly and without just quoting the text book like you do. It reassures me that they actually know what they are talking about. Looking forward to the "tradeoff between Consistency and Availability/Scalability" bit.
Left by Simon Hearn on Jul 07, 2013 10:59 AM


Copyright © Dylan Smith | Powered by: GeeksWithBlogs.net