Alois Kraus

blog

Lazy Vs Eager Init Singletons / Double-Check Lock Pattern

Singletons are the easiest programming pattern to learn, which seems to be the reason why they are so often used in places where they are inappropriate. They are very straightforward to implement and understand: just create a static field and return the static instance from a public get property. A Logger Singleton is a classic example of this pattern:

    public class Logger
    {
        static LogWriter writer = new LogWriter();

        public static LogWriter Writer
        {
            get { return writer; }
        }
    }

If you have written such code before and thought that it works perfectly in all cases, I must disappoint you. There are subtle things going on when static fields are initialized by the CLR. This implementation is thread safe because static fields are initialized exactly once by the CLR before any method can be called or a static field can be accessed from the outside. More details can be found in the ECMA-334 C# Language Specification and ECMA-335 Common Language Infrastructure (CLI). Now comes the tricky part: when do you think the LogWriter instance will be instantiated? To answer this question let's have a look at two nearly identical Singletons.


Two Singleton Implementations

    /// <summary>
    /// .class public auto ansi beforefieldinit EagerInitSingleton
    ///    extends object
    /// </summary>
    public class EagerInitSingleton
    {
        private EagerInitSingleton()
        {
            Console.WriteLine("EagerInitSingleton was instantiated");
        }

        static EagerInitSingleton myInstance = new EagerInitSingleton();

        public static EagerInitSingleton Instance
        {
            get { return myInstance; }
        }
    }

    /// <summary>
    /// .class public auto ansi LazySingleton
    ///    extends object
    /// </summary>
    public class LazySingleton
    {
        static LazySingleton() { } // Explicit type initializer removes the beforefieldinit flag

        private LazySingleton()
        {
            Console.WriteLine("LazySingleton was instantiated");
        }

        static LazySingleton myInstance = new LazySingleton();

        public static LazySingleton Instance
        {
            get { return myInstance; }
        }
    }

 


These two simple Singleton classes differ only in that the LazySingleton class contains an empty type initializer (static ctor). A type initializer is, as the name implies, called before any constructor to initialize the type, e.g. to instantiate its static member variables. If an exception is thrown inside the type initializer you get a TypeInitializationException, which is quite bad because the initializer is run only once for each type and then never again by the runtime. You can "fix" this by calling RuntimeHelpers.RunClassConstructor yourself, but this is a bad hack.
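For completeness, this is roughly what that hack looks like (a minimal sketch, not a recommendation; the helper method name is mine). RunClassConstructor only guarantees that the type initializer has been executed once; it cannot re-run an initializer that has already failed.

    // A sketch of the RunClassConstructor hack, not recommended.
    // Assumes: using System.Runtime.CompilerServices;
    static void ForceTypeInit()
    {
        // Runs the type initializer of LazySingleton if it has not run yet.
        RuntimeHelpers.RunClassConstructor(typeof(LazySingleton).TypeHandle);
    }

Now let's use our Singletons.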

 

    public static void UseSingletons(bool bShouldLog)
    {
        Console.WriteLine("Function Start");
        if (bShouldLog)
        {
            EagerInitSingleton inst1 = EagerInitSingleton.Instance;
            LazySingleton inst2 = LazySingleton.Instance;
        }
        Console.WriteLine("Function End");
    }

 

The UseSingletons method can be called with false, in which case no Singleton is touched at all. Can you guess the outcome?

UseSingletons(false)

EagerInitSingleton was instantiated
Function Start
Function End

UseSingletons(true)

EagerInitSingleton was instantiated
Function Start
LazySingleton was instantiated
Function End


The EagerInitSingleton deserves its name because it is initialized before our function is entered, regardless of whether it is used inside it or not! This behavior is controlled by the beforefieldinit flag at the class declaration in the IL code, which you can check with Reflector in IL mode or by simply looking at the comments above the Singleton class declarations, where I put the output of Reflector. C# does not allow us to control the initialization behavior directly, but if we declare a type initializer inside our Singleton class the C# compiler does not emit the beforefieldinit flag. For performance reasons the C# compiler emits the beforefieldinit flag so that the JITer has more freedom to initialize the type when it thinks it is appropriate. You will not notice the difference until you lose performance to this side effect.
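If you do not have Reflector at hand, reflection exposes the same flag. Here is a minimal sketch (the helper name is my own) that prints whether the bit is set:

    // Checks the beforefieldinit bit via System.Reflection.TypeAttributes.
    // Assumes: using System; using System.Reflection;
    static void PrintBeforeFieldInit(Type t)
    {
        bool hasFlag = (t.Attributes & TypeAttributes.BeforeFieldInit) != 0;
        Console.WriteLine("{0} beforefieldinit: {1}", t.Name, hasFlag);
    }

    // PrintBeforeFieldInit(typeof(EagerInitSingleton));  // True
    // PrintBeforeFieldInit(typeof(LazySingleton));       // False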

Static Type Initialization Order

It is not only tricky when a static field is initialized but also in which order the members of a type are initialized. The ECMA spec (ECMA-334, chapter 17.4.5) states that fields are initialized in the order they are declared, but it is strongly discouraged to rely on this. Static variables are initialized in two stages.
  1. Initialize static field to its default value (0 or null depending on its type).
  2. Initialize static field with its actual value.
The spec clearly states that it is possible to observe a variable in its default but not yet initialized state. This means that under special circumstances it is possible to see a null reference for a statically initialized object! This part of the spec becomes relevant in our life as programmers when we have circular dependencies, which can happen in your code. The following example declares two classes A and B, each of which contains a static member of the other type. To make it clear when the type initializer runs, a console message is printed.

Two classes with static members

    class A
    {
        static B b = new B();

        static A()
        {
            Console.WriteLine("A Type Initializer Thread {0}", Thread.CurrentThread.ManagedThreadId);
        }

        public A()
        {
            Console.WriteLine("A Ctor from Thread {0}, Static Field: {1}",
                    Thread.CurrentThread.ManagedThreadId, (b == null) ? "null" : "Initialized");
            Thread.Sleep(1000);
        }

        public static void AFunction()
        {
            Console.WriteLine("A Static function Thread {0}, Static Field: {1}",
                    Thread.CurrentThread.ManagedThreadId, (b == null) ? "null" : "Initialized");
        }
    }

    class B
    {
        static A a = new A();

        static B()
        {
            Console.WriteLine("B Type Initializer Thread {0}", Thread.CurrentThread.ManagedThreadId);
        }

        public B()
        {
            Console.WriteLine("B Ctor  Thread {0}, Static Field: {1}",
                    Thread.CurrentThread.ManagedThreadId, (a == null) ? "null" : "Initialized");
            Thread.Sleep(1000);
        }

        public static void BFunction()
        {
            Console.WriteLine("B Static function Thread {0}, Static Field: {1}",
                    Thread.CurrentThread.ManagedThreadId, (a == null) ? "null" : "Initialized");
        }
    }



Such a circular dependency occurs in many programs, although it will usually not be as direct as in this example. What will happen if we call one of the static functions of A or B?

        public static void StaticInit()
        {
            Console.WriteLine("Function Start");
            A.AFunction();
            Console.WriteLine("Function End");
        }

        Function Start
        A Ctor from Thread 1, Static Field: null
        B Type Initializer Thread 1
        B Ctor  Thread 1, Static Field: Initialized
        A Type Initializer Thread 1
        A Static function Thread 1, Static Field: Initialized
        Function End

Here A::b == null inside the ctor of class A, although we would expect a non-null value. I did tell you that type initializers are run before any ctor, right? It should be impossible to encounter a null reference for a statically initialized member, but this assumption no longer holds true in the case of circular dependencies. To avoid endless recursion, other rules apply here. If we set a breakpoint with our debugger inside the ctor of A we can see what is going on:

            StaticInit.A..ctor()
            StaticInit.B..cctor()
            [PrestubMethodFrame: 0012e6b8] StaticInit.B..ctor()
            StaticInit.A..cctor()
            [PrestubMethodFrame: 0012f424] StaticInit.A.AFunction()

Before the method StaticInit.A.AFunction can be called the following things happen:

  1. Invoke A's type initializer (cctor)
  2. Instantiate member variable A::b = new B(); which causes the CLR to prepare a call to B's ctor
  3. Invoke B's type initializer before the ctor of B can be run
  4. Invoke B::a = new A(); which calls the ctor of A

The last call to the ctor of A should cause a call to the type initializer of A before the ctor can run, but since we are already in the middle of initializing A we would end up in a recursion. To prevent such bad things the CLI gives up at this point and lets us get away with static fields that are set to their default values but not yet initialized. This behavior is explained in greater detail in ECMA-335 (chapter 9.5.3.3: Races and Deadlocks). Similar mechanisms exist to prevent deadlocks if we initialize different types from within different threads (a sketch of this case follows after the list below). The spec outlines the deadlock/race resolution algorithm for type initialization as follows:

    Two separate threads might start attempting to access static variables of separate
    types (A and B) and then each would have to wait for the other to complete initialization. 

  1. At class load time (hence prior to initialization time) store zero or null into all static fields of the type.
  2. If the type is initialized you are done.
  2.1. If the type is not yet initialized, try to take an initialization lock.
  2.2. If successful, record this thread as responsible for initializing the type and proceed to step 2.3.
  2.2.1. If not, see whether this thread or any thread waiting for this thread to complete already holds the lock.
  2.2.2. If so, return since blocking would create a deadlock. This thread will now see an incompletely initialized state for the type, but no deadlock will arise.
  2.2.3. If not, block until the type is initialized then return.
  2.3. Initialize the parent type and then all interfaces implemented by this type.
  2.4. Execute the type initialization code for this type.
  2.5. Mark the type as initialized, release the initialization lock, awaken any threads waiting for this type to be initialized, and return.
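
To see the cross-thread case in action, here is a minimal sketch of my own that races the type initializers of the A and B classes from above. The Sleep calls in their ctors make it likely that both initializers are in flight at the same time; the CLI resolves the circular wait as described in steps 2.2.1 and 2.2.2, so one thread may observe the other type with its static field still null, but no deadlock occurs.

    // Hedged sketch: trigger the initialization of A and B from two threads.
    // Assumes: using System.Threading;
    public static void StaticInitFromTwoThreads()
    {
        Thread t1 = new Thread(new ThreadStart(A.AFunction));
        Thread t2 = new Thread(new ThreadStart(B.BFunction));
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();
    }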

Luckily you rarely need to know this stuff in such great detail, but it does not hurt your brain much either ;-). What have we learned from this? Variable initialization is quite a complex process, but the CLI does all the heavy lifting behind the scenes. To initialize a type with statics inside it in a thread safe manner, special care has to be taken during your class design.



Deterministic And Thread Safe Initialization

The oddities of static variable initialization can be circumvented by doing the variable initialization ourselves, where we have full control. In this case the Double-Check Lock pattern is perfectly suited for the task, but unfortunately there is big confusion about what a correct multithreaded implementation looks like. There were changes in the memory model of the CLR between 1.0 and 2.0 which are not very well documented. This does not make it easier to determine whether one or the other implementation works correctly with every CLR version, because you need to know the underlying memory model of your language. Java, for example, has a memory model where this pattern does not work. Things might have changed with Java 1.5, but it was hard enough to get correct information about the behavior of the Double-Check Lock pattern for C#. To be able to show you that my implementation of this pattern is correct, we need to know how volatile affects code generation.

What the volatile keyword in C# really does

There are many myths around what the volatile keyword in C# really does. The only way to know what really happens is to look at the disassembled code. I used a small code snippet that double-checks a variable's reference, so we can study how the volatile keyword affects code generation. The following code was used:

        Char[] cArr = new Char[20];

        public void Func()
        {
            object secondRef = cArr;
            if (secondRef == cArr)
            {
                if (secondRef == cArr)
                    return;
            }
        }

It simply checks whether the reference is still the same as before. The JIT compiler optimizes this code quite well by storing the first object reference in a register, so there is no need to check the memory location of the cArr object a second or third time:

        public void Func()
        {
            object secondRef = cArr;
00000000  mov         edx,dword ptr [ecx+4]
00000003  mov         eax,edx
            if (secondRef == cArr)
00000005  cmp         eax,edx
00000007  jne         00000009
00000009  ret     

        }

When we decorate the array declaration with the volatile keyword the JITer is no longer allowed to keep a previously read address of the array in a register. Each compare will need to fetch the array's address from memory again.
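The only source change for the following listing should be the field declaration (my reconstruction, matching the snippet shown later in this article):

        // Same test code as above, but with the field declared volatile.
        volatile Char[] cArr = new Char[20];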

        public void Func()
        {
            object secondRef = cArr;
00000000  mov         eax,dword ptr [ecx+4]
            if (secondRef == cArr)
00000003  cmp         eax,dword ptr [ecx+4]
00000006  jne         0000000B
            {
                if( secondRef == cArr )
00000008  cmp         dword ptr [ecx+4],eax
0000000b  ret   

         }

Did you notice the differences in the generated code? If you omit the volatile keyword the second check is simply removed and a register compare is performed against the previously determined address of our array. cmp eax,edx becomes, with volatile, cmp eax,dword ptr [ecx+4], which in effect works like a memory barrier. In other words, the volatile keyword forces the JITer to access the memory location of the volatile variable each time it is read. No register caching will ever happen.

Does The "lock" Keyword Affect Memory Barriers?

We can check this quite easily by placing a lock(this) around the code of our test function and declaring the array as a non-volatile field. The disassembled code reflects only the behavior of the JITer for x86 32-bit code; on other CPU platforms it will look different.

        public void Func()
        {
            lock (this)
00000000  push        ebp 
00000001  mov         ebp,esp
00000003  push        edi 
00000004  push        esi 
00000005  push        ebx 
00000006  sub         esp,18h
00000009  xor         eax,eax
0000000b  mov         dword ptr [ebp-24h],eax
0000000e  xor         eax,eax
00000010  mov         dword ptr [ebp-18h],eax
00000013  mov         dword ptr [ebp-24h],ecx
00000016  call        79173A71
            {
                object secondRef = cArr;
0000001b  mov         eax,dword ptr [ebp-24h]
0000001e  mov         edx,dword ptr [eax+4]
00000021  mov         ecx,edx
                if (secondRef == cArr)
00000023  cmp         ecx,edx
00000025  jne         00000040
                {
                    if (secondRef == cArr)
00000027  cmp         ecx,edx
00000029  jne         00000040
0000002b  mov         dword ptr [ebp-1Ch],0
00000032  mov         dword ptr [ebp-18h],0FCh
00000039  push        0D00140h
0000003e  jmp         00000055
00000040  mov         dword ptr [ebp-1Ch],0
00000047  mov         dword ptr [ebp-18h],0FCh
0000004e  push        0D00149h
00000053  jmp         00000055
                        return;
00000055  mov         ecx,dword ptr [ebp-24h]
00000058  call        79173CEB
0000005d  pop         eax 
0000005e  jmp         eax 
00000060  lea         esp,[ebp-0Ch]
                }
            }
        }

The compare operations in question are the two cmp ecx,edx instructions at offsets 00000023 and 00000027. As it turns out, the lock keyword does NOT prevent any JIT optimizations of variable access inside the lock statement. But it does create a read memory barrier when the lock statement is entered, which causes all variable references to be reloaded from memory: register-cached object addresses are not reused after the lock is entered. This is true for the code generated on an Intel P4, but on other CPUs the picture can look quite different. When we deal with multiple CPUs, branch prediction, speculative execution, ... things get very complicated, and we have to consult the CPU vendor's manual to see how the JIT-generated assembler code behaves. The Itanium CPU, for example, has load and store operations with acquire/release semantics (ld.acq (read barrier), st.rel (write barrier)) to ensure that our load and store operations are not reorganized by the CPU in a way we do not want.
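For reference, the lock statement in the listing above is just compiler sugar. In the C# 2.0 time frame it expands roughly to the following Monitor calls (a sketch; the two call instructions in the disassembly are presumably the corresponding runtime helpers), and the read barrier comes from entering the monitor:

        // Rough equivalent of lock(this) { ... } as emitted by the compiler.
        // Assumes: using System.Threading;
        public void Func()
        {
            Monitor.Enter(this);
            try
            {
                object secondRef = cArr;
                if (secondRef == cArr)
                {
                    if (secondRef == cArr)
                        return;
                }
            }
            finally
            {
                Monitor.Exit(this);
            }
        }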

Is "volatile" Needed For Double Checked-Locking?

        volatile Char[] cArr = new Char[20];

        public void Func()
        {
            object secondRef = cArr;
            if (secondRef == cArr)
            {
                lock (this)
                {
                    if (secondRef == cArr)
                        cArr = new Char[0x10];
                }
            }
        }

I have compared the generated assembler code of both versions (with and without volatile) and found, to my surprise, no differences in the generated code on my Intel P4. You cannot draw from this result the conclusion that there will be no differences between these two variants on other CPU platforms. Joe Duffy has an extensive article about the Double-Check Lock pattern online where he talks about the IA64 memory model. The answer to this question is: you need the volatile keyword if you want to ensure that the Double-Check Lock pattern works on all platforms.

The Correct Double Checked-Lock Pattern Implementation

To cut a long story short: if you stick with the following Double-Check Lock pattern implementation for reference types you are on the safe side. You need at least
  • A volatile instance field
  • A lock after the first null check


Double-Checked Lock Singleton

    class DoubleLockSingleton
    {
        private DoubleLockSingleton() { }

        static volatile DoubleLockSingleton instance;
        static object myLock = new object();

        public static DoubleLockSingleton Instance
        {
            get
            {
                if (instance == null)
                {
                    lock (myLock)
                    {
                        if (instance == null)
                        {
                            instance = new DoubleLockSingleton();
                        }
                    }
                }
                return instance;
            }
        }
    }
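
A quick usage sketch of my own: two threads read the Instance property concurrently and must always observe the same object reference.

    // Hedged test sketch for the Singleton above.
    // Assumes: using System; using System.Threading;
    public static void TestDoubleLockSingleton()
    {
        DoubleLockSingleton a = null, b = null;
        Thread t1 = new Thread(delegate() { a = DoubleLockSingleton.Instance; });
        Thread t2 = new Thread(delegate() { b = DoubleLockSingleton.Instance; });
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();
        Console.WriteLine("Same instance: {0}", object.ReferenceEquals(a, b)); // prints True
    }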


Second Variant with an explicit volatile read

    public class StructDoubleLockChecking
    {
        private static object myLock = new object();
        private static DateTime instance;  // we have a struct instance which we cannot check against null
        private static int initialized;

        public DateTime Instance
        {
            get
            {
                if (Thread.VolatileRead(ref initialized) == 0)
                {
                    lock (myLock)
                    {
                        if (initialized == 0)
                        {
                            instance = DateTime.Now;
                            initialized = 1;
                        }
                    }
                }
                return instance;
            }
        }
    }

Since some CPUs can reorder your reads and writes even above your lock statement, we need to do the read of our initialized variable as a proper volatile read. This can be achieved by using the volatile modifier or the more explicit Thread.VolatileRead function. I hope you have enjoyed reading this article as much as I enjoyed debugging into the CLI ;-).
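For completeness, here is the volatile-modifier variant of the same flag-based check (a sketch of my own; only the flag declaration changes, the double-checked structure stays the same):

    // Hedged sketch: same idea as StructDoubleLockChecking, but the flag is
    // declared volatile instead of being read with Thread.VolatileRead.
    // Assumes: using System; using System.Threading;
    public class VolatileFlagDoubleLockChecking
    {
        private static object myLock = new object();
        private static DateTime instance;          // struct, cannot be checked against null
        private static volatile int initialized;   // volatile read/write of the flag

        public DateTime Instance
        {
            get
            {
                if (initialized == 0)
                {
                    lock (myLock)
                    {
                        if (initialized == 0)
                        {
                            instance = DateTime.Now;
                            initialized = 1;
                        }
                    }
                }
                return instance;
            }
        }
    }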

posted on Sunday, September 10, 2006 8:30 PM

Feedback

# re: Lazy Vs Eager Init Singletons / Double-Check Lock Pattern 9/11/2006 8:44 PM Marc Brooks
You need neither volatile nor locking to do this fully lazily in a cross-platform way. All you really need is a nested class that does the lazy instantiation like this:

public sealed class Singleton
{
    Singleton()
    {
    }

    public static Singleton Instance
    {
        get
        {
            return Nested.instance;
        }
    }

    class Nested
    {
        // Explicit static constructor to tell C# compiler
        // not to mark type as beforefieldinit
        static Nested()
        {
        }

        internal static readonly Singleton instance = new Singleton();
    }
}

Courtesy of Yoda himself :)

http://www.yoda.arachsys.com/csharp/singleton.html

# re: Lazy Vs Eager Init Singletons / Double-Check Lock Pattern 9/11/2006 8:47 PM Keith Rull
Very nice explanation! two-thumbs up!

# re: Lazy Vs Eager Init Singletons / Double-Check Lock Pattern 9/12/2006 9:32 AM Alois Kraus
Hi Marc,

I have carefully read Yoda's thoughts but I disagree with what he claims is the best solution. My main objection to this solution is that it shields any errors during initialization. If you throw an exception inside the ctor of Singleton (at least it should do something useful) you get a TypeInitializationException with your original exception as inner exception. This is rather bad because now people will be confused about what caused this odd exception, which occurs only once and never again. All following calls will cause a NullReferenceException with no sign of what went wrong before. If the first user of your Singleton did catch the TypeInitializationException, you make it much harder to debug.

Yours,
Alois Kraus


# re: Lazy Vs Eager Init Singletons / Double-Check Lock Pattern 2/9/2007 9:51 AM Cosmin
Thread.VolatileRead(ref ...) is not supported in .NET 2.0 framework

# re: Lazy Vs Eager Init Singletons / Double-Check Lock Pattern 5/28/2007 4:16 PM Vlad
Why not, Cosmin, it is supported:

http://msdn2.microsoft.com/en-us/library/bah54t54(VS.80).aspx




# re: Lazy Vs Eager Init Singletons / Double-Check Lock Pattern 1/8/2009 4:34 AM Stacy Vicknair
What I don't understand is your double-checked lock: if it just uses an arbitrary object when locking, why is it necessary for the singleton's unique instance to still be volatile?

# re: Lazy Vs Eager Init Singletons / Double-Check Lock Pattern 1/8/2009 5:11 AM Alois Kraus
If you do not use volatile it could be that the next thread still sees a null reference although it has been initialized. That could lead to double initializations.

Yours,
Alois Kraus


# re: Lazy Vs Eager Init Singletons / Double-Check Lock Pattern 5/25/2009 11:52 AM Antoniu S
Stacy

1) first thread checks if singleton var is null; it is null
2) second thread instantiates class in the meantime
3) first thread acquires lock
4) when the first thread tries to compare null to the singleton var, it does that using a cached register that still holds null
5) so it reinitializes the singleton var again


# re: Lazy Vs Eager Init Singletons / Double-Check Lock Pattern 5/25/2009 12:45 PM Antoniu S
public DateTime Instance
{
    get { ... }
}

I guess this property needs to be declared static.
