James Michael Hare

...hare-brained ideas from the realm of software development...
posts - 136 , comments - 1076 , trackbacks - 0

My Links

News

Welcome to my blog! I'm a Sr. Software Development Engineer in Seattle, WA. I've been doing C++/C#/Java development for over 18 years, but have definitely learned that there is always more to learn!

All thoughts and opinions expressed in my blog and my comments are my own and do not represent the thoughts of my employer.

Blogs I Read

MCC Logo MVP Logo

Follow BlkRabbitCoder on Twitter

Tag Cloud

Archives

Post Categories

C#: The Curious ConcurrentDictionary

Update: I revised some of my initial thoughts after taking  bit of my own advice and thinking about the less-tangible benefits of a ConcurrentDictionary.  This leads me to believe that ay time you use a Dictionary in a read/write manner in a multi-threaded environment, you should use ConcurrentDictionary instead for the simplicity and safety.  It's easier to read and more maintainable, and even if you are write-heavy, it's not orders-of-magnitude slower so I think it's worth it for the safety and maintainability gains (not to mention performance gains on read-heavy uses.

In my previous post (here) I did a comparison of the new ConcurrentQueue versus the old standard of a System.Collections.Generic Queue with simple locking.  The results were exactly what I would have hoped, that the ConcurrentQueue was faster with multi-threading for most all situations.  In addition, concurrent collections have the added benefit that you can enumerate them even if they're being modified.

So I set out to see what the improvements would be for the ConcurrentDictionary, would it have the same performance benefits as the ConcurrentQueue did?  Well, after running some tests and multiple tweaks and tunes, I have good and bad news.

But first, let's look at the tests.  Obviously there's many things we can do with a dictionary.  One of the most notable uses, of course, in a multi-threaded environment is for a small, local in-memory cache.  So I set about to do a very simple simulation of a cache where I would create a test class that I'll just call an Accessor.  This accessor will attempt to look up a key in the dictionary, and if the key exists, it stops (i.e. a cache "hit").  However, if the lookup fails, it will then try to add the key and value to the dictionary (i.e. a cache "miss"). 

So here's the Accessor that will run the tests:

   1: internal class Accessor
   2: {
   3:     public int Hits { get; set; }
   4:     public int Misses { get; set; }
   5:     public Func<int, string> GetDelegate { get; set; }
   6:     public Action<int, string> AddDelegate { get; set; }
   7:     public int Iterations { get; set; }
   8:     public int MaxRange { get; set; }
   9:     public int Seed { get; set; } 
  10:  
  11:     public void Access()
  12:     {
  13:         var randomGenerator = new Random(Seed); 
  14:  
  15:         for (int i=0; i<Iterations; i++)
  16:         {
  17:             // give a wide spread so will have some duplicates and some unique
  18:             var target = randomGenerator.Next(1, MaxRange); 
  19:  
  20:             // attempt to grab the item from the cache
  21:             var result = GetDelegate(target); 
  22:  
  23:             // if the item doesn't exist, add it
  24:             if(result == null)
  25:             {
  26:                 AddDelegate(target, target.ToString());
  27:                 Misses++;
  28:             }
  29:             else
  30:             {
  31:                 Hits++;
  32:             }
  33:         }
  34:     }
  35: }

Note that so I could test different implementations, I defined a GetDelegate and AddDelegate that will call the appropriate dictionary methods to add or retrieve items in the cache using various techniques.

So let's examine the three techniques I decided to test:

  • Dictionary with mutex - Just your standard generic Dictionary with a simple lock construct on an internal object.
  • Dictionary with ReaderWriterLockSlim - Same Dictionary, but now using a lock designed to let multiple readers access simultaneously and then locked when a writer needs access.
  • ConcurrentDictionary - The new ConcurrentDictionary from System.Collections.Concurrent that is supposed to be optimized to allow multiple threads to access safely.

So the approach to each of these is also fairly straight-forward.  Let's look at the GetDelegate and AddDelegate implementations for the Dictionary with mutex lock:

   1: var addDelegate = (key,val) =>
   2:                     {
   3:                         lock (_mutex)
   4:                         {
   5:                             _dictionary[key] = val;
   6:                         }
   7:                     };
   8: var getDelegate = (key) =>
   9:                     {
  10:                         lock (_mutex)
  11:                         {
  12:                             string val;
  13:                             return _dictionary.TryGetValue(key, out val) ? val : null;
  14:                         }
  15:                     }; 

Nothing new or fancy here, just your basic lock on a private object and then query/insert into the Dictionary.

Now, for the Dictionary with ReadWriteLockSlim it's a little more complex:

   1: var addDelegate = (key,val) =>
   2:                     {
   3:                         _readerWriterLock.EnterWriteLock();
   4:                         _dictionary[key] = val;
   5:                         _readerWriterLock.ExitWriteLock();
   6:                     };
   7: var getDelegate = (key) =>
   8:                     {
   9:                         string val;
  10:                         _readerWriterLock.EnterReadLock();
  11:                         if(!_dictionary.TryGetValue(key, out val))
  12:                         {
  13:                             val = null;
  14:                         }
  15:                         _readerWriterLock.ExitReadLock();
  16:                         return val;
  17:                     };  

And finally, the ConcurrentDictionary, which since it does all it's own concurrency control, is remarkably elegant and simple:

   1: var addDelegate = (key,val) =>
   2:                     {
   3:                         _concurrentDictionary[key] = val;
   4:                     };
   5: var getDelegate = (key) =>
   6:                     {
   7:                         string s;
   8:                         return _concurrentDictionary.TryGetValue(key, out s) ? s : null;
   9:                     };    

                  
Then, I set up a test harness that would simply ask the user for the number of concurrent Accessors to attempt to Access the cache (as specified in Accessor.Access() above) and then let them fly and see how long it took them all to complete.  Each of these tests was run with 10,000,000 cache accesses divided among the available Accessor instances.  All times are in milliseconds.

   1: Dictionary with Mutex Locking
   2: ---------------------------------------------------
   3: Accessors   Mostly Misses    Mostly Hits
   4: 1           7916             3285
   5: 10          8293             3481
   6: 100         8799             3532
   7: 1000        8815             3584
   8:  
   9:  
  10: Dictionary with ReaderWriterLockSlim Locking
  11: ---------------------------------------------------
  12: Accessors    Mostly Misses    Mostly Hits
  13: 1            8445             3624
  14: 10           11002            4119
  15: 100          11076            3992
  16: 1000         14794            4861
  17:  
  18:  
  19: Concurrent Dictionary 
  20: ---------------------------------------------------
  21: Accessors    Mostly Misses    Mostly Hits
  22: 1            17443            3726
  23: 10           14181            1897
  24: 100          15141            1994
  25: 1000         17209            2128

The first test I did across the board is the Mostly Misses category.  The mostly misses (more adds because data requested was not in the dictionary) shows an interesting trend.  In both cases the Dictionary with the simple mutex lock is much faster, and the ConcurrentDictionary is the slowest solution.  But this got me thinking, and a little research seemed to confirm it, maybe the ConcurrentDictionary is more optimized to concurrent "gets" than "adds".  So since the ratio of misses to hits were 2 to 1, I decided to reverse that and see the results.

So I tweaked the data so that the number of keys were much smaller than the number of iterations to give me about a 2 to 1 ration of hits to misses (twice as likely to already find the item in the cache than to need to add it).  And yes, indeed here we see that the ConcurrentDictionary is indeed faster than the standard Dictionary here.  I have a strong feeling that as the ration of hits-to-misses gets higher and higher these number gets even better as well.  This makes sense since the ConcurrentDictionary is read-optimized.

Also note that I tried the tests with capacity and concurrency hints on the ConcurrentDictionary but saw very little improvement, I think this is largely because on the 10,000,000 hit test it quickly ramped up to the correct capacity and concurrency and thus the impact was limited to the first few milliseconds of the run.

So the ConcurrentDictionary is very good at read-heavy uses, and a bit slower at write-heavy uses, but with the obvious additional level of safety and maintainability that the ConcurrentDictionary brings, I would recommend the following:

Use System.Collections.Generic.Dictionary when:

  • You need a single-threaded Dictionary (no locking needed).
  • You need a multi-threaded Dictionary that is loaded only once at creation and never modified (no locking needed).

And use System.Collections.Concurrent.ConcurrentDictionary when:

  • You need a multi-threaded Dictionary where both reads and writes are being performed.
  • You need to be able to iterate over the collection without locking it even if its being modified.

Both Dictionaries have their strong suits, but the more i think about it, even with the slighty slower speed on write-heavy multi-threaded use, it's worth it for the safety and performance, and for read-heavy use, it is clearly faster.

 

 Technorati Tags: , , , ,

 

Print | posted on Wednesday, June 9, 2010 5:45 PM | Filed Under [ My Blog C# Software .NET Fundamentals ]

Feedback

Gravatar

# re: C#: The Curious ConcurrentDictionary

You said you can use a ConcurrentDictionary when "You need to be able to iterate over the collection without locking it even if its being modified. ".
I hope I'm wrong, but MSDN in http://msdn.microsoft.com/en-us/library/dd997369.aspx says:

// Enumerate collection from the app main thread.
// Note that ConcurrentDictionary is the one concurrent collection
// that does not support thread-safe enumeration.
10/8/2010 4:30 PM | Gustavo
Gravatar

# re: C#: The Curious ConcurrentDictionary

@Gustavo:

Wow! That is quite the code comment in the example! I must research that! Thanks for the heads up!
10/11/2010 1:47 PM | James Michael Hare
Gravatar

# re: C#: The Curious ConcurrentDictionary

@Gustavo:

Turns out it's not that bad. You can safely iterate, but the main thing is that the ConcurrentDictionary does not give you a snapshot on GetEnumerator() like the other collections do. Thus, you WILL see modified values while enumerating, but you shouldn't crash.

http://msdn.microsoft.com/en-us/library/dd287131.aspx
2/9/2011 1:40 PM | James Michael Hare
Gravatar

# re: C#: The Curious ConcurrentDictionary

This is such an awesome website. I have bookmarked it to share with
family and friends. Thank you for the post.
Contractor Tax
2/23/2011 2:13 PM | Brandi
Gravatar

# re: C#: The Curious ConcurrentDictionary

@Lisa: True, but once again don't just favor one over the other always. It's the correct tool for the correct job that matters. If you're not using the dictionary in a multi-threaded way or have other external synchronization, then you only need the Dictionary.
3/28/2011 9:16 AM | James Michael Hare
Gravatar

# re: C#: The Curious ConcurrentDictionary

Hey! I just would like to give a huge thumbs up for the nice data you've right here on this post. I will be coming again to your blog for extra soon.
4/5/2011 5:16 AM | apad
Gravatar

# re: C#: The Curious ConcurrentDictionary

@Dee: They are part of .NET 4.0 in the System.Collections.Concurrent namespace.
4/21/2011 9:39 AM | James Michael Hare
Gravatar

# re: C#: The Curious ConcurrentDictionary

I have viewed so many blog post but yours are different. I like to ask how you composed your articles for it really leaves an excellent impression on me. Just keep on posting interesting facts. I'm already a fan! payday loan
6/30/2011 7:52 PM | Champsey
Gravatar

# re: C#: The Curious ConcurrentDictionary

@Andrew: yep. I know, I had been trying to clean out the spam manually, but just switched to an API that should catch most of it now.
7/15/2011 8:43 AM | James Michael Hare
Gravatar

# re: C#: The Curious ConcurrentDictionary

I implemented a custom session state module & a custom session state items collection. I used a concurrent dictionary within the ssi collection. I had then set up a simple web service to write N strings to the session, where each string was based off an iterator, as was the session variable name such that:

Session[i]="Fred "+i.ToString()

I rigged the # of iterations to be malleable, i.e. a parameter, and then I set about adding 2 buttons for write and 2 buttons for read and I tracked begin & end time.

I used the ajax script manager to call an asmx. I was hoping to see a performance boost but what I found was the following:

The write times using MSFT plain vanilla session state module & inproc were roughly 3 75% faster.

The read times were roughly 15% faster using the concurrent dictionary but that varied and sometimes there were basically statistically insignificant.

I was hoping to see better peformance but I suppose I still do not know enough about multithreaded applications to be able to evaluate the benefit there.

I did originally have my sessionstate items collection using a SortedList and I was using locking, etc. but I did not performance test that as I was too excited to try the concurrent collection. I am wondering if I will see a benefit of rolling back and testing the older version.

7/23/2011 12:04 AM | Cary Abramoff
Gravatar

# re: C#: The Curious ConcurrentDictionary

I typed 3 75% faster but that is a TYPO. I meant 3 times faster. Basically if it took 10 seconds for msft to write it took my implementation with the concurrent dict. 40 seconds. It's been a long night of performance testing :)) Pardon the typo.
7/23/2011 12:07 AM | Cary Abramoff
Gravatar

# re: C#: The Curious ConcurrentDictionary

Keep in mind the main thing that concurrent dictionary buys you is better performance when the number of reads and concurrent readers is greater than the number of writers. In those cases where you expect a lot more concurrent reading with writes mixed in, concurrent dictionary tends to be faster. If you are very write-heavy, though, the writer has to lock out all readers, which makes it much more like a simple dictionary with a single lock.

I revisted the ConcurrentDicitonary and the other collections in a later series and referenced a very good microsoft article that outlined the performance and implementation differences. I really find the microsoft article insightful in both when to favor the concurrent classes and a bit more detail on what goes on under the covers.

http://blackrabbitcoder.net/archive/2011/02/17/c.net-little-wonders-the-concurrentdictionary.aspx
7/24/2011 5:09 PM | James Michael Hare
Gravatar

# re: C#: The Curious ConcurrentDictionary

I did more testing and it turned out my initial results were plain wrong. I had added my custom session state module in the web.config using the <httpmodules> section but this actually turns out to do nothing. It needed to be added into the <modules> section. This was very difficult to detect.

At any rate once I got that figured out I could do real performance testing with both a session items collection based on a concurrent dictionary and also a sortedlist (that was in one of the samples I found online so why not :))

My test was 4 buttons on a page that used the ajax scriptmanager to call 2 asmx methods. One wrote one million strings to session vars and the other read them.

Then I cloned the site so I ended up with one site using msft default session state module and the other using my custom module.

To add one million strings to session with default msft inproc module was generally 1-3 seconds on my server.

To add one million strings with the concurrent dictionary was....

300-450 seconds. No joke. I traced and it was indeed hitting my code error free. Just took about 100 times longer.

The sorted list version was a bit faster.

That was yesterday. Today I tried using a hashtable to hold the collection but even with Read/Write Slim locks my test ended up immediately triggering race conditions, i.e. the session var already exists. This did not happen with concurrentdictionary.

As for reads with concurrent dictionary it was still much much slower than whatever msft uses.

MSFT could read 10 million strings in about 20 seconds. With concur. dict. it was 10-20 times slower.

In my experience I never had any type of multi-user problems with inproc session. So whatever they do I could not reproduce. My hashtable version of a sessionstate module was actually slightly faster than msft inproc but again, not threadsafe and when testing with Ajax async calls I actually somehow managed to cause the session collection to self-obliterate. In other words if I synchronously added 4 miillion strings, then the minute I tried to add 2 million by clicking the 2 add buttons simultaneously I ended up back at 1 million. NASTY.

I only ended up on this mission because I was evaluating sessionstateprovider customization, then I had read an article that inproc session state would require one request to complete before the most recently triggered request. So the poster was basically complaining if a user clicked Link A and it was slow running, then decided to bail by clicking link B they were hosed cuz they hadda wait for A to complete.

I never actually experienced that issue, maybe cuz I use all ajax calls with the scriptmanager but at any rate I had the idea in my head I might actually be able to create a higher performing sessionstatemodule but in all honesty, the msft one is purty darn good. No collisions, race conditions, incredible speeds, and no problem with async calls regardless of #.

I am not sure which collection I will evaluate to create a custom sessionstateprovider as my brain is pretty fried from battling with the poor documentation in this relatively unexplored area. The samples are very bad too, some have bugs, and again, my custom module never even worked (no error either) when following the msft documentation to <remove><add> to httpmodules web.config declaration.

I truly wonder what they have going on under the hood that their own online samples fell apart on threading or blew away the new concurrent collections which really had me excited as I thought there would be performance gains without the need for explicit locking.

Thanks for the article. It was helpful and no I am not selling insurance or loans this week :))

7/24/2011 5:52 PM | Cary Abramoff
Gravatar

# re: C#: The Curious ConcurrentDictionary

The MSFT session state provider is extremely optimized for what it does, so I can see replacing it with a single collection would probably be a losing battle. Keep in mind with most collections when there is a write operation at some point the whole structure must be locked down in case of resizing, etc. I'm not sure how the session state works behind the scenes, but I'd be willing to bet it's more optimized for write operations (much like row locks on a database)
7/25/2011 8:46 AM | James Michael Hare
Gravatar

# re: C#: The Curious ConcurrentDictionary

Thanks for your feedback James. Row based locking comparisons sound quite appropriate as their performance truly is enviable. So enviable in fact that I will be skipping out of process session state management if at all possible in favor of Amazon AWS Elastic Load Balancing session affinity/sticky sessions to keep right on leveraging the good stuff :))

Your article was quite helpful again and I feel like I got to plug a gap in my architectural toolbelt.

Thanks!

Cary :))
7/25/2011 3:38 PM | Cary Abramoff
Gravatar

# re: C#: The Curious ConcurrentDictionary

@Cary: Glad I could help, sorry I don't know about session state itself :-)
7/26/2011 12:30 PM | James Michael Hare
Gravatar

# re: C#: The Curious ConcurrentDictionary

Hello James

Its really nice of you to share this code. I appreciate the thought behind all of these and the effort you put to share this piece of information. I try to run it and with some little configuration I was able to make it work.
9/1/2011 10:52 PM | BBQ Islands
Gravatar

# re: C#: The Curious ConcurrentDictionary

Thanks for the comparative information between the ConcurrentDictionary and the ConcurrentQueue. Appreciate the comparison!
9/20/2011 9:29 AM | 2012 Survival
Gravatar

# re: C#: The Curious ConcurrentDictionary

Hey nice work.
would be nice to get updated timing values for the new release under win8.
You think you could just run the same code on a win8 machine?
Or post the solution and I will do it ;-)
Regards.
2/17/2012 4:35 AM | Confused
Gravatar

# re: C#: The Curious ConcurrentDictionary

Excellent site, keep up the good work my colleagues would love this. I read a lot of blogs on a daily basis and for the most part, people lack substance but, I just wanted to make a quick comment to say I’m glad I found your blog. Thanks.
5/26/2012 10:07 PM | yChatter
Gravatar

# re: C#: The Curious ConcurrentDictionary

Does this mean that the ReaderWriterLock is always slower than a simple lock? that would not make sense.
3/2/2013 12:31 AM | Buylar
Gravatar

# re: C#: The Curious ConcurrentDictionary

The thing indicated at your blog is somewhat very fascinating; I unearthed load's of aspects to ponder upon on this blog post. I was looking out for debt consolidation guidebooks however.
2/5/2014 8:41 PM | Joseph Modlin
Post A Comment
Title:
Name:
Email:
Comment:
Verification:
 
 

Powered by: