Inside the Concurrent Collections

The concurrent collections, located in the System.Collections.Concurrent namespace, were introduced in .NET 4 as thread-safe collections that could be used without locking, and there are plenty of posts and articles out on the interwebs giving an overview of the collections and how to use them.

Instead, I'll be focusing these posts on how the concurrent collections are implemented; how they achieve thread-safety and the principles behind their implementation. Not only because I find that sort of thing quite interesting, but also to give you ideas about how you might use these principles in your own code.

Note, however, that writing bulletproof thread-safe collections is hard. Really hard. Think carefully about whether one of the existing collections will do the job before trying to write or adapt your own.

What is a 'thread-safe' collection?

First of all, we need to understand what we mean by 'thread-safe'. Well, let's start with the repository of all human knowledge - Wikipedia:

A piece of code is thread-safe if it only manipulates shared data structures in a manner that guarantees safe execution by multiple threads at the same time
OK, well, as an example, if m_Collection is some sort of 'thread-safe' ICollection, what will the result of these two threads running concurrently be?

Thread 1:
    m_Collection.Add(m_Item);

Thread 2:
    bool removed = m_Collection.Remove(m_Item);

The answer depends on exactly which order Thread 1 and Thread 2 run in with respect to each other. So, whether m_Item is in m_Collection after these two threads have run depends entirely on the whim of the operating system. Thread 2 could try to remove an item that's not in the collection, setting removed to false, before Thread 1 adds the item to the collection. Or Thread 1 could run first, adding the item, which is then immediately removed by Thread 2, setting removed to true. Thread 1 could then merrily carry on executing, assuming m_Item is in m_Collection, not knowing it's been immediately removed by Thread 2.

That, however, is an implementation detail of whoever wrote the code for Thread 1 and Thread 2. The important thing is that, after these two threads have run the above code in some order, m_Collection will either have the item in it, or not; it won't be in some corrupted half-state where some internal data structures think it has the item and some don't, so that (for example) m_Collection.Contains(m_Item) returns false but the enumerator still returns the item. So, I propose that this is what is meant by a thread-safe collection:

A thread-safe collection is one that can be modified by multiple threads at the same time without corrupting itself.

Achieving concurrency

There is a very simple way of writing a thread-safe collection - implement ICollection<T> as a wrapper around a List<T> that takes a lock before every operation on the backing list (if you're unclear about what locks are, I'll be covering them below). This means that only one thread can access and modify the backing list at a time; thread-safety is therefore maintained. However, this isn't a very good solution; many threads could be blocked waiting for one costly operation, such as a list resize, to finish. This solution doesn't have very good concurrency.
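As a rough sketch of that idea (SynchronizedList is a hypothetical name, not a BCL type, and only a couple of the ICollection<T> members are shown):

```csharp
using System.Collections.Generic;

// A hypothetical lock-based wrapper; every operation takes the same lock,
// so only one thread can touch the backing list at a time.
public class SynchronizedList<T> {
    private readonly List<T> m_List = new List<T>();
    private readonly object m_Lock = new object();

    public void Add(T item) {
        lock (m_Lock) { m_List.Add(item); }
    }

    public bool Remove(T item) {
        lock (m_Lock) { return m_List.Remove(item); }
    }

    public int Count {
        get { lock (m_Lock) { return m_List.Count; } }
    }
}
```

Every thread calling Add or Remove queues up behind the single lock, which is exactly why this approach is safe but not very concurrent.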

Ideally, we want collections that don't keep threads waiting. As we'll see, two of the collections (ConcurrentStack and ConcurrentQueue) achieve thread-safety using no locks at all, and the other collections minimise the chances of two threads blocking on the same lock.

Locking and synchronization primitives

You can't just create a lockless thread-safe data structure out of nowhere; you need some underlying support from the hardware, operating system and runtime to achieve this. Before we look at the collections themselves, we need to understand these primitives and what guarantees they provide.

  1. Atomic types

    Some types in .NET are specified as being 'atomic'. That is, if you change the value of a variable of an atomic type, it is guaranteed to appear to change value instantaneously; another thread won't be able to see the variable 'half-changed'. Object references and primitive types of 4 bytes or shorter are always atomic (ints, chars, booleans, floats, etc.), but 8-byte primitives are only atomic when running in a 64-bit process. The atomicity of object references is especially important to lockless collections, as we'll see in later posts.

  2. Volatile variables

    I discussed volatility in a previous post. To recap, marking a variable as volatile means that:

    • The JIT compiler won't cache the variable in a CPU register; it will always read and write to it using the original memory location. This is important when multiple threads are changing a variable at the same time, so that any changes made to the variable in one thread are immediately picked up by other threads.
    • A read or write to a volatile location introduces a memory barrier at that point, so that reads and writes won't be reordered across it. This is important later on, as we'll see.
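    These two primitives combine nicely: because object reference writes are atomic, and volatile ensures other threads see the write, a thread can safely 'publish' a fully-constructed immutable object by assigning a single reference. A minimal sketch (Config and ConfigHolder are hypothetical names invented for this example):

    ```csharp
    // An immutable object; once constructed, it never changes.
    public class Config {
        public readonly string Host;
        public readonly int Port;
        public Config(string host, int port) { Host = host; Port = port; }
    }

    public static class ConfigHolder {
        // volatile: readers always see the latest published reference,
        // and the reference write itself is atomic, so a reader gets
        // either the old Config or the new one - never a torn mixture.
        private static volatile Config s_Current = new Config("localhost", 80);

        public static Config Current { get { return s_Current; } }
        public static void Publish(Config newConfig) { s_Current = newConfig; }
    }
    ```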

  3. Locks

    Mutual-exclusion locks are one of the more basic synchronization primitives. Each object has a lock associated with it, and by locking on an object you only allow one thread to have 'ownership' of that lock at a time. This allows only one thread to execute the section of code protected by that lock at a time.

    When the currently executing thread 'releases' the lock, another thread (and only one other thread) is then allowed to take control of the lock and execute the protected code. Any other threads waiting to take the lock are blocked and cannot continue executing until it is their turn to take the lock.

    C# has special syntax for locking on an object:

    private readonly object m_LockObj = new object();
    
    public void SynchronizedMethod() {
        lock (m_LockObj) {
            // protected code
        }
    }
    C# actually compiles this method to something like this:
    public void SynchronizedMethod() {
        bool lockTaken = false;
        object lockObj = m_LockObj;
        
        try {
            System.Threading.Monitor.Enter(lockObj, ref lockTaken);
            // protected code
        }
        finally {
            if (lockTaken)
                System.Threading.Monitor.Exit(lockObj);
        }
    }

    This uses the System.Threading.Monitor class to implement the lock. The call to Monitor.Enter blocks the thread until it can take control of the lock, and the lock is released by Monitor.Exit.

  4. System.Threading.Interlocked

    Interlocked provides various methods for performing operations that wouldn't normally be atomic in an atomic fashion. For example, Interlocked.Read allows an 8-byte long to be read as an atomic operation (remember, 8-byte primitives are only atomic on 64-bit processes), Interlocked.Add allows you to perform a = a + b (aka a+=b) atomically, Interlocked.Decrement performs a = a - 1 (aka --a) atomically, etc.

    The most important of these is Interlocked.CompareExchange. This family of methods performs the following operation atomically (using the generic overload as an example):

    public static T CompareExchange<T>(ref T location, T value, T comparand)
      where T : class {
        T originalValue = location;
        if (location == comparand)
            location = value;
        return originalValue;
    }

    This might not seem like a particularly useful atomic operation, but it is crucial to the lockless collections, as we'll see.
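    To give a flavour of why, here is a classic compare-and-swap retry loop (LockFreeCounter is a hypothetical example, not from the BCL): read the current value, compute the new one, and only write it back if no other thread has changed the variable in the meantime, retrying otherwise.

    ```csharp
    using System.Threading;

    public static class LockFreeCounter {
        private static int s_Count = 0;

        public static void Increment() {
            int original, newValue;
            do {
                original = s_Count;          // snapshot the current value
                newValue = original + 1;     // compute the updated value
                // write newValue only if s_Count still equals original;
                // CompareExchange returns what s_Count actually was
            } while (Interlocked.CompareExchange(
                         ref s_Count, newValue, original) != original);
        }

        public static int Count { get { return s_Count; } }
    }
    ```

    No locks are taken, yet no increment is ever lost; if two threads race, one succeeds and the other simply retries. This retry-until-success pattern is at the heart of the lockless collections.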

  5. System.Threading.SpinWait

    Introduced in .NET 4, this structure encapsulates the idea of a 'spinwait':

    while (!variableThatShouldBeTrueToContinue) {}

    This keeps the thread continually spinning round the while loop, repeatedly checking whether another thread has set variableThatShouldBeTrueToContinue to true. However, performing such a spinwait gives no guarantees that the thread that is meant to set variableThatShouldBeTrueToContinue will be given the chance to execute; the operating system could, if it wishes, simply keep on executing the spinwait and not switch to other threads that actually change this variable.

    System.Threading.SpinWait gets round this problem by occasionally calling Thread.Sleep and Thread.Yield as well as simply spinning. These respectively encourage and force the operating system to execute other threads, giving the spinning condition a chance to be satisfied.
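    A minimal sketch of how it's used (SpinWaitExample is a hypothetical name for this illustration):

    ```csharp
    using System.Threading;

    public static class SpinWaitExample {
        // volatile so the waiting thread always sees the latest value
        private static volatile bool s_Done = false;

        public static void WaitForDone() {
            var spinner = new SpinWait();
            while (!s_Done)
                spinner.SpinOnce();  // spins at first, then yields/sleeps
        }

        public static void SetDone() { s_Done = true; }
    }
    ```

    Because SpinOnce eventually yields the thread's timeslice, the thread calling SetDone is guaranteed the chance to run, which a bare while loop cannot promise.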

Using the concurrent collections

At this point it's also worth pointing out that using a thread-safe collection will not make the rest of your code thread-safe and free of race conditions; it is not a panacea. As in the example at the top of this post, using a thread-safe collection does not stop a race condition between adding and removing the same item from the collection. Using a thread-safe collection simply means you have one less thing to worry about when writing threaded code. You still have the rest of your code to worry about!

That's it for introductions; in the next post, we'll look at the simplest of the concurrent collections, ConcurrentStack.

Cross posted from Simple Talk.

Feedback

# re: Inside the Concurrent Collections

Well this was a rather disappointing break point for the post.

It was like "and now on to the real topic at hand.... see ya next time folks!" 1/6/2012 1:25 PM | Chris Marisic

# re: Inside the Concurrent Collections

This is only the introductory first post in a whole series, it's not meant to be stand-alone. It gives necessary context for the posts to follow. If I had written all what I plan to say on this subject in a single post it would simply be too long! 1/6/2012 2:59 PM | Simon Cooper

# re: Inside the Concurrent Collections

I agree. Anti climax 1/12/2012 7:26 AM | vince

Copyright © simonc