Alois Kraus

blog

ConcurrentCollections Can Be Prohibitively Expensive

Since most of us use multiple threads, we run into all sorts of race conditions, which we can solve with proper locking and concurrent data structures. A simple example: an object exposes a Dictionary<string,string>, and you get exceptions back when concurrent writers add the same key to it. You then either need to coordinate the writers or use a different dictionary that allows concurrent adds. As it turns out, ConcurrentDictionary is exactly the collection you were searching for, and your exceptions are gone. You let the tests run. All green, so you check in and submit it to the main branch.
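The failure mode described above can be reproduced with a minimal sketch (class and key names here are made up for illustration): unsynchronized writers adding duplicate keys to a plain Dictionary throw, while ConcurrentDictionary.TryAdd handles the same workload without complaint.

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    class RaceDemo
    {
        static void Main()
        {
            var plain = new Dictionary<string, string>();
            try
            {
                // Many writers adding overlapping keys to a plain Dictionary.
                // This is not thread safe: expect an ArgumentException for the
                // duplicate key, an internal IndexOutOfRangeException, or silent
                // corruption, depending on timing.
                Parallel.For(0, 100 * 1000, i => plain.Add("key" + (i % 100), "value"));
            }
            catch (AggregateException ex)
            {
                Console.WriteLine("Plain Dictionary failed: {0}", ex.InnerException.GetType().Name);
            }

            // The same workload on ConcurrentDictionary is safe: TryAdd simply
            // returns false for keys that are already present.
            var concurrent = new ConcurrentDictionary<string, string>();
            Parallel.For(0, 100 * 1000, i => concurrent.TryAdd("key" + (i % 100), "value"));
            Console.WriteLine("ConcurrentDictionary entries: {0}", concurrent.Count);
        }
    }

Exactly this kind of "drop in ConcurrentDictionary and the exceptions disappear" fix is what the rest of the post is about.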

A long time later, at the end of the project, performance issues that were ignored until now tend to get identified and fixed. Someone calls you to say that your check-in from long ago is the root cause of a major performance issue and that you are showing up as Mount Everest in the VirtualAlloc reports.

 

[Figure: VirtualAlloc allocation report]

How can that be? The problem with the concurrent collections, which claim to be largely lock free, is that to achieve uncontended writes they allocate their key data structures per core. This means you are allocating memory proportional to the number of cores. That is not a problem if you have relatively few instances around, but if you swap in a concurrent dictionary in key data structures which are allocated heavily, things get much slower. You can test it for yourself with a small sample app like this:

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Diagnostics;

    class DataObject // Original version
    {
        public IDictionary<string, string> Data { get; private set; }

        public DataObject()
        {
            Data = new Dictionary<string, string>();
        }
    }

    class FixedDataObject // "Fixed" version with ConcurrentDictionary
    {
        public IDictionary<string, string> Data { get; private set; }

        public FixedDataObject()
        {
            Data = new ConcurrentDictionary<string, string>();
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            List<DataObject> dataList = new List<DataObject>();
            List<FixedDataObject> dataListFixed = new List<FixedDataObject>();

            var sw = Stopwatch.StartNew();
            const int Runs = 300 * 1000;
            for (int i = 0; i < Runs; i++)
            {
                if (args.Length == 0) // No args: use the original version
                {
                    dataList.Add(new DataObject());
                }
                else // Any argument passed: use the ConcurrentDictionary version
                {
                    dataListFixed.Add(new FixedDataObject());
                }
            }
            sw.Stop();
            Console.WriteLine("{0:N0} did take {1:F2}s", Runs, sw.Elapsed.TotalSeconds);

            GC.KeepAlive(dataList);
            GC.KeepAlive(dataListFixed);
        }
    }

Old Version

300.000 did take 0,14s

Fixed Version with Concurrent Dictionary

300.000 did take 1,08s

Update

On a 24 core server it takes 3,3s and consumes 1,4 GB of memory. Data structures that allocate proportionally to the number of cores for the sake of thread performance do not sound like a good idea to me.

This is over seven times slower and introduces tons of GCs in your application, limiting what you can do in parallel even further. The worst part is that the bigger the server you run it on, the slower it gets (more cores mean more memory to allocate!). On a slow dev machine with 2 or 4 cores you might not notice it much, but if you deploy this on real server hardware with 40 cores you have a really bad performance problem at hand. That is why you need to do performance testing on virtually every hardware configuration you support.
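The per-instance memory overhead can be measured directly on any machine. This is a hedged sketch (the measured numbers will vary with runtime version and core count, which is exactly the point): it uses GC.GetTotalMemory with a forced collection before and after allocating a batch of empty dictionaries.

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;

    class MemoryDemo
    {
        const int Runs = 300 * 1000;

        // Returns the approximate managed heap cost per instance produced
        // by the given factory, averaged over Runs allocations.
        static long BytesPerInstance(Func<object> factory)
        {
            var keepAlive = new object[Runs];
            long before = GC.GetTotalMemory(forceFullCollection: true);
            for (int i = 0; i < Runs; i++)
            {
                keepAlive[i] = factory();
            }
            long after = GC.GetTotalMemory(forceFullCollection: true);
            GC.KeepAlive(keepAlive);
            return (after - before) / Runs;
        }

        static void Main()
        {
            Console.WriteLine("Dictionary:           ~{0} bytes/instance",
                BytesPerInstance(() => new Dictionary<string, string>()));
            Console.WriteLine("ConcurrentDictionary: ~{0} bytes/instance",
                BytesPerInstance(() => new ConcurrentDictionary<string, string>()));
        }
    }

Run this on a 4 core laptop and again on a many-core server: the plain Dictionary cost stays flat, while the ConcurrentDictionary cost grows with the core count.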

The fix was to go back to a plain dictionary with a lock. ConcurrentDictionary does have a concurrencyLevel constructor parameter which controls how many internal locks (and hence arrays for isolated writes) it allocates, but even with that tuned down it was still a factor of two slower.
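The fix can be sketched as a small wrapper (the class name is made up for illustration): one lock around a plain Dictionary, so the allocation cost stays constant no matter how many cores the machine has.

    using System.Collections.Generic;

    // Thread safe for writers and readers, with a single shared lock.
    // Fine when contention is low and instance count is high.
    class LockedDictionary<TKey, TValue>
    {
        readonly Dictionary<TKey, TValue> _dict = new Dictionary<TKey, TValue>();
        readonly object _lock = new object();

        public bool TryAdd(TKey key, TValue value)
        {
            lock (_lock)
            {
                if (_dict.ContainsKey(key))
                    return false;
                _dict.Add(key, value);
                return true;
            }
        }

        public bool TryGetValue(TKey key, out TValue value)
        {
            lock (_lock)
            {
                return _dict.TryGetValue(key, out value);
            }
        }
    }

If you do stay with ConcurrentDictionary, the constructor overload `new ConcurrentDictionary<string, string>(concurrencyLevel: 1, capacity: 31)` reduces the number of internal locks it allocates, which is the tuning referred to above.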

So beware: by simply replacing non-concurrent data structures with their lock free but much more memory hungry counterparts, you might be making your system much slower. In effect we were loading data more slowly the bigger the server got. That is definitely not what the customer paid for.

posted on Monday, January 26, 2015 1:45 AM

Feedback

# re: ConcurrentCollections Can Be Prohibitively Expensive 1/26/2015 11:14 PM Ivan Patrick
I think the use case for a concurrent dictionary is a multi-threaded app, where one ConcurrentDictionary is accessed by several threads without us having to worry. In that case one concurrent dictionary uses more resources than a normal dictionary.
The test should be a multi-threaded app where one normal dictionary and one concurrent dictionary store values for multiple threads.

# re: ConcurrentCollections Can Be Prohibitively Expensive 1/27/2015 1:49 AM Alois Kraus
You are talking about multi-threaded access performance. That is different from the overhead caused by creating the dictionaries in the first place. In this specific scenario ConcurrentDictionary was used as a thread safe drop-in replacement to ensure correct behavior. Correctness first, performance second: that is the right order. Unfortunately there was no second check for performance after the application worked properly.

# re: ConcurrentCollections Can Be Prohibitively Expensive 2/4/2015 12:32 PM Thomas Levesque
OMG! I had no idea... I'll think twice before using concurrent collections now. Thanks for the tip!

# re: ConcurrentCollections Can Be Prohibitively Expensive 7/25/2015 3:11 AM Robert van Poelgeest
This bug is not fixed in the 4.6 version. On a 40 core 2008 R2 server I get:
D:\Data>ConcurrentCollection.exe
300.000 did take 0,12s
D:\Data>ConcurrentCollection.exe concurrent
300.000 did take 13,27s

And the last run takes a full GB of memory
