Inside the Concurrent Collections: ConcurrentDictionary

Using locks to implement a thread-safe collection is rather like using a sledgehammer - unsubtle, easy to understand, and tends to make any other tool redundant. Unlike the previous two collections I looked at, ConcurrentStack and ConcurrentQueue, ConcurrentDictionary uses locks quite heavily. However, it is careful to wield locks only where necessary to ensure that concurrency is maximised.

This will, by necessity, be a higher-level look than my other posts in this series, as there is quite a lot of code and logic in ConcurrentDictionary. Therefore, I do recommend that you have ConcurrentDictionary open in a decompiler to have a look at all the details that I skip over.

The problem with locks

There's several things to bear in mind when using locks, as encapsulated by the lock keyword in C# and the System.Threading.Monitor class in .NET (if you're unsure as to what lock does in C#, I briefly covered it in my first post in the series):

  1. Locks block threads
    The most obvious problem is that threads waiting on a lock can't do any work at all. No preparatory work, no 'optimistic' work like in ConcurrentQueue and ConcurrentStack, nothing. They simply sit there, waiting to be unblocked. This is bad if you're trying to maximise concurrency.
  2. Locks are slow
    Whereas most of the methods on the Interlocked class can be compiled down to a single CPU instruction, ensuring atomicity at the hardware level, taking out a lock requires some heavy lifting by the CLR and the operating system. There's quite a bit of work required to take out a lock, block other threads, and wake them up again. If locks are used heavily, this impacts performance (see the sketch after this list).
  3. Deadlocks
    When using locks there's always the possibility of a deadlock - two threads, each holding a lock, each trying to acquire the other's lock. Fortunately, this can be avoided with careful programming and structured lock-taking, as we'll see.
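
As a minimal, self-contained illustration of that cost difference (this is not code from ConcurrentDictionary; the class and member names are mine), here is a shared counter incremented both ways:

using System.Threading;

class Counter {
    private readonly object m_sync = new object();
    private int m_count;

    // thread-safe, but involves the CLR's monitor machinery
    public void IncrementWithLock() {
        lock (m_sync) {
            m_count++;
        }
    }

    // thread-safe, and compiles down to a single atomic instruction
    public void IncrementWithInterlocked() {
        Interlocked.Increment(ref m_count);
    }
}

Both methods are safe to call from multiple threads, but the Interlocked version avoids the monitor machinery entirely.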

So, it's important to minimise where locks are used to maximise the concurrency and performance of the collection.

Implementation

As you might expect, ConcurrentDictionary is similar in basic implementation to the non-concurrent Dictionary, which I studied in a previous post. I'll be using some concepts introduced in that post, so I recommend you have a quick read of it.

So, if you were implementing a thread-safe dictionary, what would you do? The naive implementation is to simply have a single lock around all methods accessing the dictionary. This would work, but doesn't allow much concurrency.

Fortunately, the bucketing used by Dictionary allows a simple but effective improvement to this - one lock per bucket. This allows different threads modifying different buckets to do so in parallel. Any thread making changes to the contents of a bucket takes the lock for that bucket, ensuring those changes are thread-safe. The method that maps each bucket to a lock is the GetBucketAndLockNo method:

private void GetBucketAndLockNo(
        int hashcode, out int bucketNo, out int lockNo, int bucketCount) {
        
    // the bucket number is the hashcode (without the initial sign bit)
    // modulo the number of buckets
    bucketNo = (hashcode & 0x7fffffff) % bucketCount;
	  
    // and the lock number is the bucket number modulo the number of locks
    lockNo = bucketNo % m_locks.Length;
}
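
Note that when there are more buckets than locks, several buckets share the same lock. For example (numbers chosen purely for illustration), with 31 buckets and 8 locks, buckets 1, 9, 17 and 25 all map to lock 1, so threads modifying any of those four buckets serialise on the same lock, even though the buckets themselves are independent.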

However, this does require some changes to how the buckets are implemented. The 'implicit' linked list within a single backing array used by the non-concurrent Dictionary adds a dependency between separate buckets, as every bucket uses the same backing array. Instead, ConcurrentDictionary uses a strict linked list on each bucket:
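
As a rough sketch, each node might look something like this (the field names are my assumptions, and the class is assumed to be nested inside ConcurrentDictionary<TKey, TValue> so that TKey and TValue are in scope):

private class Node {
    internal readonly TKey m_key;
    internal TValue m_value;
    internal readonly int m_hashcode;
    internal volatile Node m_next;   // the next node in this bucket's list

    internal Node(TKey key, TValue value, int hashcode, Node next) {
        m_key = key;
        m_value = value;
        m_hashcode = hashcode;
        m_next = next;
    }
}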

This ensures that each bucket is entirely separate from all other buckets; adding or removing an item from a bucket is independent of any changes to other buckets.

Modifying the dictionary

All the operations on the dictionary follow the same basic pattern:

  void AlterBucket(TKey key, ...) {
      int bucketNo, lockNo;
1:    GetBucketAndLockNo(
          key.GetHashCode(), out bucketNo, out lockNo, m_buckets.Length);

2:    lock (m_locks[lockNo]) {
3:        Node headNode = m_buckets[bucketNo];
        
4:        Mutate the node linked list as appropriate
      }
  }

For example, when adding another entry to the dictionary, you would iterate through the linked list to check whether the key exists already, and add the new entry as the head node. When removing items, you would find the entry to remove (if it exists), and remove the node from the linked list. Adding, updating, and removing items all follow this pattern.
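
As a rough sketch of an add following this pattern (simplified, with assumed names - the method name, m_comparer and the Node fields are mine; the real TryAddInternal does considerably more, including the re-check against resizing discussed below):

private bool TryAddSketch(TKey key, TValue value) {
    int hashcode = key.GetHashCode();

    int bucketNo, lockNo;
    GetBucketAndLockNo(hashcode, out bucketNo, out lockNo, m_buckets.Length);

    lock (m_locks[lockNo]) {
        // walk the bucket's list to check whether the key already exists
        for (Node node = m_buckets[bucketNo]; node != null; node = node.m_next) {
            if (m_comparer.Equals(node.m_key, key)) return false;
        }

        // not found: add the new entry as the head of the bucket's list
        m_buckets[bucketNo] = new Node(key, value, hashcode, m_buckets[bucketNo]);
        return true;
    }
}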

Performance issues

There is a problem we have to address at this point. If the number of buckets in the dictionary is fixed in the constructor, then the performance will degrade from O(1) to O(n) when a large number of items are added to the dictionary. As more and more items get added to the linked lists in each bucket, the lookup operations will spend most of their time traversing a linear linked list.

To fix this, the buckets array has to be resized once the number of items in each bucket has gone over a certain limit. (In ConcurrentDictionary this limit is when the size of the largest bucket is greater than the number of buckets for each lock. This check is done at the end of the TryAddInternal method.)
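
As a very rough sketch of what that trigger could look like (the counting is simplified, and the names here - including the resizeNeeded flag - are hypothetical rather than the library's actual code):

// inside the bucket's lock, just after splicing in the new node
int bucketSize = 0;
for (Node node = m_buckets[bucketNo]; node != null; node = node.m_next) {
    bucketSize++;
}

// the bucket has outgrown the buckets-per-lock ratio, so flag a resize
if (bucketSize > m_buckets.Length / m_locks.Length) {
    resizeNeeded = true;   // hypothetical flag; GrowTable(m_buckets) would be
                           // called once the bucket lock has been released
}

Deferring the actual GrowTable call until the bucket lock has been released avoids taking locks out of order, which matters for the deadlock avoidance described next.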

Resizing the bucket array and re-hashing everything affects every bucket in the collection. Therefore, this operation needs to take out every lock in the collection. Taking out multiple locks at once inevitably summons the spectre of deadlock: two threads, each holding a lock, each trying to acquire the other's lock.

How can we eliminate this? Simple - ensure that threads never try to 'swap' locks in this fashion. When taking out multiple locks, always take them out in the same order, and always take out all the locks you need before starting to release them. In ConcurrentDictionary, this is controlled by the AcquireLocks, AcquireAllLocks and ReleaseLocks methods. Locks are always taken out and released in the order they are in the m_locks array, and locks are all released right at the end of the method in a finally block.
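
As a rough sketch of what ordered acquisition and release might look like (simplified, with made-up method names):

// acquire a contiguous range of locks, always in ascending array order
private void AcquireLocksSketch(int fromInclusive, int toExclusive, ref int locksAcquired) {
    for (int i = fromInclusive; i < toExclusive; i++) {
        Monitor.Enter(m_locks[i]);
        locksAcquired++;   // lets the caller's finally block release exactly these
    }
}

// release them again; deadlock avoidance only depends on the acquisition order
private void ReleaseLocksSketch(int fromInclusive, int toExclusive) {
    for (int i = fromInclusive; i < toExclusive; i++) {
        Monitor.Exit(m_locks[i]);
    }
}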

At this point, it's worth pointing out that the locks array is never re-assigned, even when the buckets array is increased in size. The number of locks is fixed in the constructor by the concurrencyLevel parameter. This simplifies programming the locks; you don't have to check whether the locks array has changed or been re-assigned before taking out a lock object, and you can be sure that when a thread takes out a lock, another thread isn't going to re-assign the lock array. Re-assigning it would create a new set of lock objects, allowing another thread to ignore the existing locks (and the threads holding them), breaking thread-safety.

Consequences of growing the array

Just because we're using locks doesn't mean that race conditions aren't a problem. We can see this by looking at the GrowTable method. The operation of this method can be boiled down to:

  private void GrowTable(Node[] buckets) {
      try {
1:        Acquire first lock in the locks array
          // this causes any other thread trying to take out
          // all the locks to block because the first lock in the array
          // is always the one taken out first
        
          // check if another thread has already resized the buckets array
          // while we were waiting to acquire the first lock   
2:        if (buckets != m_buckets) return;
        
3:        Calculate the new size of the backing array

4:        Node[] array = new Node[size];

5:        Acquire all the remaining locks

6:        Re-hash the contents of the existing buckets into array

7:        m_buckets = array;
      }
      finally {
8:        Release all locks
      }
  }

As you can see, there's already a check for a race condition at step 2, for the case when the GrowTable method is called twice in quick succession on two separate threads. One will successfully resize the buckets array (blocking the second in the meantime); when the second thread is unblocked, it'll see that the array has already been resized and exit without doing anything.

There is another case we need to consider; looking back at the AlterBucket method above, consider the following situation:

  1. Thread 1 calls AlterBucket; step 1 is executed to get the bucket and lock numbers.
  2. Thread 2 calls GrowTable and executes steps 1-5; thread 1 is blocked when it tries to take out the lock in step 2.
  3. Thread 2 re-hashes everything, re-assigns the buckets array, and releases all the locks (steps 6-8).
  4. Thread 1 is unblocked and continues executing, but the calculated bucket and lock numbers are no longer valid.

Between calculating the correct bucket and lock number and taking out the lock, another thread has changed where everything is. Not exactly thread-safe. Well, a similar problem was solved in ConcurrentStack and ConcurrentQueue by storing a local copy of the state, doing the necessary calculations, then checking if that state is still valid. We can use a similar idea here:

void AlterBucket(TKey key, ...) {
    while (true) {
        Node[] buckets = m_buckets;
        
        int bucketNo, lockNo;
        GetBucketAndLockNo(
            key.GetHashCode(), out bucketNo, out lockNo, buckets.Length);

        lock (m_locks[lockNo]) {
            // if the state has changed, go back to the start
            if (buckets != m_buckets) continue;
            
            Node headNode = m_buckets[bucketNo];
            
            Mutate the node linked list as appropriate
        }
        break;
    }
}

TryGetValue and GetEnumerator

And so, finally, we get onto TryGetValue and GetEnumerator. I've left these to the end because, well, they don't actually use any locks.

How can this be? Whenever you change a bucket, you need to take out the corresponding lock, yes? Indeed you do. However, it is important to note that TryGetValue and GetEnumerator don't actually change anything. Just as immutable objects are, by definition, thread-safe, read-only operations don't need to take out a lock because they don't change anything. These lockless methods can happily iterate through the buckets and linked lists without worrying about locking anything.
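
As a rough sketch (using the same assumed names as before, plus an assumed m_comparer field holding the key comparer), a lockless lookup just walks the relevant bucket's list:

public bool TryGetValueSketch(TKey key, out TValue value) {
    // take a local copy of the buckets array, in case another thread resizes it
    Node[] buckets = m_buckets;

    int bucketNo, lockNo;
    GetBucketAndLockNo(key.GetHashCode(), out bucketNo, out lockNo, buckets.Length);

    // no lock is taken; just walk the bucket's linked list
    for (Node node = buckets[bucketNo]; node != null; node = node.m_next) {
        if (m_comparer.Equals(node.m_key, key)) {
            value = node.m_value;
            return true;
        }
    }

    value = default(TValue);
    return false;
}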

However, this does put restrictions on how the other methods operate. Because there could be another thread in the middle of reading the dictionary at any time (even if a lock is taken out), the dictionary has to be in a valid state at all times. Every change to state has to be made visible to other threads in a single atomic operation (all relevant variables are marked volatile to help with this).
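
In terms of declarations, that implies something like the following (a sketch; the exact field declarations are assumptions):

private volatile Node[] m_buckets;   // re-assigned as a whole when the table grows
private readonly object[] m_locks;   // fixed in the constructor, never re-assigned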

This restriction ensures that whatever the reading threads are doing, they never read the dictionary in an invalid state (e.g. items that should be in the collection temporarily removed from the linked list, or reading a node that has had its key and value removed before the node itself has been removed from the linked list).

Fortunately, all the operations needed to change the dictionary can be done in that way. Bucket resizes are made visible when the new array is assigned back to the m_buckets variable. Any additions or modifications to a node are done by creating a new node, then splicing it into the existing list using a single variable assignment. Node removals are simply done by re-assigning the node's m_next pointer.
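
For instance, a removal might be spliced out like this (a sketch with assumed names, where prev is the node before the one being removed, or null if it is the head); a reading thread walking the list either sees the removed node or it doesn't, but never a half-removed state:

// inside lock (m_locks[lockNo]), having found 'node' and its predecessor 'prev'
if (prev == null) {
    // removing the head node: publish the new head in a single assignment
    m_buckets[bucketNo] = node.m_next;
} else {
    // splice the node out by re-pointing its predecessor's m_next
    prev.m_next = node.m_next;
}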

Because the dictionary can be changed by another thread during execution of the lockless methods, the GetEnumerator method is liable to return dirty reads - changes made to the dictionary after GetEnumerator was called, but before the enumeration got to that point in the dictionary. It's worth listing at this point which methods are lockless, and which take out all the locks to ensure they get a consistent view of the dictionary:

Lockless:
  • TryGetValue
  • GetEnumerator
  • The indexer getter
  • ContainsKey
Takes out every lock (lockfull?):
  • Count
  • IsEmpty
  • Keys
  • Values
  • CopyTo
  • ToArray

Concurrent principles

That covers the overall implementation of ConcurrentDictionary. I haven't even begun to scratch the surface of this sophisticated collection. That I leave to you. However, we've looked at enough to be able to extract some useful principles for concurrent programming:

  1. Partitioning

    When using locks, the work is partitioned into independent chunks, each with its own lock. Each partition can then be modified concurrently with other partitions.

  2. Ordered lock-taking

    When a method does need to control the entire collection, locks are taken and released in a fixed order to prevent deadlocks.

  3. Lockless reads

    Read operations that don't care about dirty reads don't take out any lock; the rest of the collection is implemented so that any reading thread always has a consistent view of the collection.

That leads us to the final collection in this little series - ConcurrentBag. Lacking a non-concurrent analogue, it is quite different to any other collection in the class libraries. Prepare your thinking hats!

