James Michael Hare

...hare-brained ideas from the realm of software development...
posts - 166 , comments - 1431 , trackbacks - 0

My Links

News

Welcome to my blog! I'm a Sr. Software Development Engineer in the Seattle area, who has been performing C++/C#/Java development for over 20 years, but have definitely learned that there is always more to learn!

All thoughts and opinions expressed in my blog and my comments are my own and do not represent the thoughts of my employer.

Blogs I Read

Follow BlkRabbitCoder on Twitter

Tag Cloud

Archives

Post Categories

.NET

CSharp

Little Wonders

Little Wonders

vNext

C#: The Curious ConcurrentDictionary

Update: I revised some of my initial thoughts after taking  bit of my own advice and thinking about the less-tangible benefits of a ConcurrentDictionary.  This leads me to believe that ay time you use a Dictionary in a read/write manner in a multi-threaded environment, you should use ConcurrentDictionary instead for the simplicity and safety.  It's easier to read and more maintainable, and even if you are write-heavy, it's not orders-of-magnitude slower so I think it's worth it for the safety and maintainability gains (not to mention performance gains on read-heavy uses.

In my previous post (here) I did a comparison of the new ConcurrentQueue versus the old standard of a System.Collections.Generic Queue with simple locking.  The results were exactly what I would have hoped, that the ConcurrentQueue was faster with multi-threading for most all situations.  In addition, concurrent collections have the added benefit that you can enumerate them even if they're being modified.

So I set out to see what the improvements would be for the ConcurrentDictionary, would it have the same performance benefits as the ConcurrentQueue did?  Well, after running some tests and multiple tweaks and tunes, I have good and bad news.

But first, let's look at the tests.  Obviously there's many things we can do with a dictionary.  One of the most notable uses, of course, in a multi-threaded environment is for a small, local in-memory cache.  So I set about to do a very simple simulation of a cache where I would create a test class that I'll just call an Accessor.  This accessor will attempt to look up a key in the dictionary, and if the key exists, it stops (i.e. a cache "hit").  However, if the lookup fails, it will then try to add the key and value to the dictionary (i.e. a cache "miss"). 

So here's the Accessor that will run the tests:

   1: internal class Accessor
   2: {
   3:     public int Hits { get; set; }
   4:     public int Misses { get; set; }
   5:     public Func<int, string> GetDelegate { get; set; }
   6:     public Action<int, string> AddDelegate { get; set; }
   7:     public int Iterations { get; set; }
   8:     public int MaxRange { get; set; }
   9:     public int Seed { get; set; } 
  10:  
  11:     public void Access()
  12:     {
  13:         var randomGenerator = new Random(Seed); 
  14:  
  15:         for (int i=0; i<Iterations; i++)
  16:         {
  17:             // give a wide spread so will have some duplicates and some unique
  18:             var target = randomGenerator.Next(1, MaxRange); 
  19:  
  20:             // attempt to grab the item from the cache
  21:             var result = GetDelegate(target); 
  22:  
  23:             // if the item doesn't exist, add it
  24:             if(result == null)
  25:             {
  26:                 AddDelegate(target, target.ToString());
  27:                 Misses++;
  28:             }
  29:             else
  30:             {
  31:                 Hits++;
  32:             }
  33:         }
  34:     }
  35: }

Note that so I could test different implementations, I defined a GetDelegate and AddDelegate that will call the appropriate dictionary methods to add or retrieve items in the cache using various techniques.

So let's examine the three techniques I decided to test:

  • Dictionary with mutex - Just your standard generic Dictionary with a simple lock construct on an internal object.
  • Dictionary with ReaderWriterLockSlim - Same Dictionary, but now using a lock designed to let multiple readers access simultaneously and then locked when a writer needs access.
  • ConcurrentDictionary - The new ConcurrentDictionary from System.Collections.Concurrent that is supposed to be optimized to allow multiple threads to access safely.

So the approach to each of these is also fairly straight-forward.  Let's look at the GetDelegate and AddDelegate implementations for the Dictionary with mutex lock:

   1: var addDelegate = (key,val) =>
   2:                     {
   3:                         lock (_mutex)
   4:                         {
   5:                             _dictionary[key] = val;
   6:                         }
   7:                     };
   8: var getDelegate = (key) =>
   9:                     {
  10:                         lock (_mutex)
  11:                         {
  12:                             string val;
  13:                             return _dictionary.TryGetValue(key, out val) ? val : null;
  14:                         }
  15:                     }; 

Nothing new or fancy here, just your basic lock on a private object and then query/insert into the Dictionary.

Now, for the Dictionary with ReadWriteLockSlim it's a little more complex:

   1: var addDelegate = (key,val) =>
   2:                     {
   3:                         _readerWriterLock.EnterWriteLock();
   4:                         _dictionary[key] = val;
   5:                         _readerWriterLock.ExitWriteLock();
   6:                     };
   7: var getDelegate = (key) =>
   8:                     {
   9:                         string val;
  10:                         _readerWriterLock.EnterReadLock();
  11:                         if(!_dictionary.TryGetValue(key, out val))
  12:                         {
  13:                             val = null;
  14:                         }
  15:                         _readerWriterLock.ExitReadLock();
  16:                         return val;
  17:                     };  

And finally, the ConcurrentDictionary, which since it does all it's own concurrency control, is remarkably elegant and simple:

   1: var addDelegate = (key,val) =>
   2:                     {
   3:                         _concurrentDictionary[key] = val;
   4:                     };
   5: var getDelegate = (key) =>
   6:                     {
   7:                         string s;
   8:                         return _concurrentDictionary.TryGetValue(key, out s) ? s : null;
   9:                     };    

                  
Then, I set up a test harness that would simply ask the user for the number of concurrent Accessors to attempt to Access the cache (as specified in Accessor.Access() above) and then let them fly and see how long it took them all to complete.  Each of these tests was run with 10,000,000 cache accesses divided among the available Accessor instances.  All times are in milliseconds.

   1: Dictionary with Mutex Locking
   2: ---------------------------------------------------
   3: Accessors   Mostly Misses    Mostly Hits
   4: 1           7916             3285
   5: 10          8293             3481
   6: 100         8799             3532
   7: 1000        8815             3584
   8:  
   9:  
  10: Dictionary with ReaderWriterLockSlim Locking
  11: ---------------------------------------------------
  12: Accessors    Mostly Misses    Mostly Hits
  13: 1            8445             3624
  14: 10           11002            4119
  15: 100          11076            3992
  16: 1000         14794            4861
  17:  
  18:  
  19: Concurrent Dictionary 
  20: ---------------------------------------------------
  21: Accessors    Mostly Misses    Mostly Hits
  22: 1            17443            3726
  23: 10           14181            1897
  24: 100          15141            1994
  25: 1000         17209            2128

The first test I did across the board is the Mostly Misses category.  The mostly misses (more adds because data requested was not in the dictionary) shows an interesting trend.  In both cases the Dictionary with the simple mutex lock is much faster, and the ConcurrentDictionary is the slowest solution.  But this got me thinking, and a little research seemed to confirm it, maybe the ConcurrentDictionary is more optimized to concurrent "gets" than "adds".  So since the ratio of misses to hits were 2 to 1, I decided to reverse that and see the results.

So I tweaked the data so that the number of keys were much smaller than the number of iterations to give me about a 2 to 1 ration of hits to misses (twice as likely to already find the item in the cache than to need to add it).  And yes, indeed here we see that the ConcurrentDictionary is indeed faster than the standard Dictionary here.  I have a strong feeling that as the ration of hits-to-misses gets higher and higher these number gets even better as well.  This makes sense since the ConcurrentDictionary is read-optimized.

Also note that I tried the tests with capacity and concurrency hints on the ConcurrentDictionary but saw very little improvement, I think this is largely because on the 10,000,000 hit test it quickly ramped up to the correct capacity and concurrency and thus the impact was limited to the first few milliseconds of the run.

So the ConcurrentDictionary is very good at read-heavy uses, and a bit slower at write-heavy uses, but with the obvious additional level of safety and maintainability that the ConcurrentDictionary brings, I would recommend the following:

Use System.Collections.Generic.Dictionary when:

  • You need a single-threaded Dictionary (no locking needed).
  • You need a multi-threaded Dictionary that is loaded only once at creation and never modified (no locking needed).

And use System.Collections.Concurrent.ConcurrentDictionary when:

  • You need a multi-threaded Dictionary where both reads and writes are being performed.
  • You need to be able to iterate over the collection without locking it even if its being modified.

Both Dictionaries have their strong suits, but the more i think about it, even with the slighty slower speed on write-heavy multi-threaded use, it's worth it for the safety and performance, and for read-heavy use, it is clearly faster.

 

 Technorati Tags: , , , ,

 

Print | posted on Wednesday, June 9, 2010 5:45 PM | Filed Under [ My Blog C# Software .NET Fundamentals ]

Powered by: