Geeks With Blogs

News This is the *old* blog. The new one is at blog.sixeyed.com
Elton Stoneman
This is the *old* blog. The new one is at blog.sixeyed.com

One of the biggest performance issues with the solution we’re building at the moment has been contention in SQL Server.

We have NServiceBus processing workflows and we’re distributing the load across multiple queues and multiple host processes, each with multiple threads. At high load we have something like 200 threads running concurrently, all trying to read and write data in the same tables.

For each step in the workflow we record an event against the main entity, which gives us an audit trail of the entity’s progress through the workflow. The events table is getting hit with large numbers of simultaneous INSERTs.

The INSERTs have to hold PAGE and KEY locks which cause other requests to wait. Running high load and checking DM_TRAN_LOCKS we were finding hundreds of locks and performance was degraded as a result.

I wanted to see how the story would look with a document database, where the model and the lock situation would be different. I put together a heavily simplified version of the domain model in SQL Server and in MongoDB, and simulated the parallel message handlers with unit tests that ran multiple threads. In this post and two more to follow, I’ll look at the results.

Headline

The first set of results were pretty surprising – running 1,000 workflows as 10 batches of 100 concurrent threads:

MongoDB completed in 7.7 seconds, while SQL Server with Entity Framework took 42.5 seconds. MongoDB was 6 times faster.

The models

The domain for the comparison is pretty simple – we have a core entity, which has a collection of entity events, and a parent entity. The parent entity doesn’t get used much, it’s there to correlate entities in different workflows.

With EF, the relationships between entities are modelled like this:

image

And in SQL Server, that gives us a physical model with three tables related by foreign keys – parent entity has 0..* entities, and entity has 0..* entity events.

In Mongo the document model is similar, but we take advantage of our domain knowledge. Parent entities are only used to find correlations, so we don’t actually need to model the relationship between parent entity and entity. Entity events are only ever accessed via an entity, so we can model events as an array nested in the event entity:

image

Using the C# MongoDB driver, the document model is persisted as-is, so the physical model in Mongo is the same – a  ParentEntity collection and an Entity collection.

The workflow

Our real workflow is massively complex, but for the tests I use a much simpler version:

  1. Generate a GUID to simulate the correlation ID for an incoming message
  2. Search parent entities by correlation ID
    1. if none found, create parent entity & save
  3. Create entity
    1. link to parent entity
    2. set initial status
    3. save
  4. Simulate 15 processing stages
    1. fetch entity by id
    2. update status
    3. create entity event
    4. add event to entity
    5. save

The process is exactly the same for the MongoDB and SQL Server versions, just translating the steps into the relevant code (SaveChanges() vs. Save() etc.).

Fair comparisons

Look at the domain models and you’ll see we’re not comparing like for like. Which is deliberate – we have two different storage paradigms, so we can’t use the same modelling techniques to represent our domain.

To compare like for like we’d need to either try and apply document modelling to SQL Server, or relational modelling to MongoDB, and the results wouldn’t make sense. We’d end up with an Entities table in SQL with an Events column containing a serialized array of events in XML or JSON. Or we’d have an EntityEvents document collection in MongoDB.

Either option would give us a more exact comparison, but it wouldn’t be a realistic use of the technology. You don’t denormalize a relationship to an embedded array in a relational database, because it would be hard to work with and would be a distinctly unusual design. Likewise, you don’t model what should be an embedded array in a document database as a separate collection.

So we’re comparing each technology used in its correct way, rather than trying to make both behave in the same way to get a comparison which is ostensibly fairer, but pointless.

Results

I ran the tests on a Windows Server 2012 VM with 4Gb RAM allocated, and used SQL Server 2012 Developer Edition, and MongoDB version 2.4.8, 64-bit version for Windows.

The initial results for 1,000 and 10,000 workflows (both at 100 concurrency):

image

MongoDB completed  in 7.7 and 72.8 seconds respectively, compared to 42.5 and 451.2 for SQL Server. Mongo ran 6 times faster.

After the results from the first cut of the tests, I ran some variants to see the impact. In the next post I’ll look at options we have around performance tuning and reliability to see if Mongo’s advantage holds up with different configurations.

Posted on Friday, December 13, 2013 10:10 PM SQL Server , Mongo | Back to top


Comments on this post: The concurrency shoot-out: SQL Server versus MongoDB (part 1)

# re: The concurrency shoot-out: SQL Server versus MongoDB (part 1)
Requesting Gravatar...
It is also worth noting that SQL Server enlists in the distributed transactions that NServiceBus uses by default to guarantee consistency. MongoDB doesn't do that, meaning you'd need to implement the appropriate deduplication logic to provide an equivalent level of consistency before measuring performance.

If you are willing to run with looser consistency guarantees, you can configure NServiceBus to run without transactions and see what kind of performance improvements you get on SQL Server that way.
Left by Udi Dahan on Dec 29, 2013 3:26 PM

# re: The concurrency shoot-out: SQL Server versus MongoDB (part 1)
Requesting Gravatar...
Hi Udi, thanks for the comment. Yes, in our production system we have opted out of NSB transactions and we take control of that ourselves in the message handlers.

In the performance testing code I was only interested in concurrency, so there's no NSB - multiple concurrent messages are simulated with parallel tasks. Code up on github: http://is.gd/tKG3H1
Left by Elton on Jan 05, 2014 9:40 PM

# re: The concurrency shoot-out: SQL Server versus MongoDB (part 1)
Requesting Gravatar...
Out of curiosity, was read committed snapshot on in the SQL database?
Left by Gianluca Sartori on Jan 10, 2014 4:01 PM

# re: The concurrency shoot-out: SQL Server versus MongoDB (part 1)
Requesting Gravatar...
Hi Gianluca - no, but after the first run all the SQL Server tests were running with NOLOCK reads instead.
Left by Elton on Jan 10, 2014 4:24 PM

Your comment:
 (will show your gravatar)


Copyright © Elton Stoneman | Powered by: GeeksWithBlogs.net