Friday, April 07, 2006 12:52 PM
There is something I've been wanting to post about for a while, but just didn’t have enough info to make it worth my, or anyone who reads this, while.
We all know that .Net 2.0 came out with something wonderful called Generics. I've been profiling a lot of my 1.1 code, comparing it to my 2.0 code that utilizes Generics, and the one change that has given me the greatest, most dramatic performance benefits is to switch from IComparer to IComparer.
I use IComparer a LOT. But there is a problem with it is it's Compare() method. The following is the pattern I use on all my compare functions:
public int Compare(object x, object y)
{
if (x == null || y == null)
throw new ApplicationException("Invalid NGram Compare");
NGram n1 = x as NGram;
NGram n2 = y as NGram;
return n1.Score.CompareTo(n2.Score);
}
It takes an object for both its comparison operators. Then I have to cast it to NGram before I can compare against my Score property. This extra step may not be much, but if you are sorting an array that holds 1.5 million NGram objects, this adds up to a LOT of operations.
Enter IComparer! This is the Generics version of IComparer, and its compare function would look like this for IComparer.
public int Compare(NGram x, NGram y)
{
if (x == null || y == null)
throw new ApplicationException("Invalid NGram Compare");
return x.Score.CompareTo(y.Score);
}
Notice that two NGram instances are passed into the Compare function, instead of two objects. This lets me save the casting operations. YEAH! It must be faster, right?
So like any good performance guy (or gall for that matter) I revved up my favorite code profiler and ran two tests: one with the generic comparer and one with the object comparer, and sorted an array of 1.5 million NGram objects a few times. And guess what I saw.
The generics comparer was slower than the object comparer. And not just a little bit slower either. But 61% slower!!! Oh my gosh…what the hell is going on here? Generics have to be faster! Its less code!
So, to get to the bottom of this mystery, I opened my assembly in Reflector and took a peek at the IL code for each function. I know IL doesn’t lie to me. What I saw was very interesting, and it had to do with checking an object to see if it is null.
This is a section of IL from the generics Compare function.
L_0000: ldarg.1
L_0001: ldnull
L_0002: call bool NGram::op_Equality(NGram, NGram)
L_0007: brtrue.s L_0012
L_0009: ldarg.2
L_000a: ldnull
L_000b: call bool NGram::op_Equality(NGram, NGram)
L_0010: brfalse.s L_001d
L_0012: //throw exception stuff
This is pretty much what I expected to see. Argument 1 is loaded onto the stack, then a null is loaded onto the stack. Then the NGram's operator equality function is called to see if the two are the same or not (is the ngram null). If it is, then it branches down to the throw new exception code. It then loads argument 2 onto the stack, and another null onto the stack. And does another NGram operator equality check. Like I said before, nothing really amazing here, and pretty much what I'd expect. The IL has to call the operator Equality function just to be sure that I didn’t override it in my NGram class.
Ok, now lets take a look at how nulls were checked I the object comparer's IL code to see what was so different that it would be 61% faster. This is what I saw:
L_000e: ldarg.1
L_000f: brfalse.s L_0014
L_0011: ldarg.2
L_0012: brtrue.s L_001f
L_0014: //throw exception stuff
Damn…that’s a lot less code! What's going on here? Well, it looks like the IL compiler does something that the C# compiler wont let you get away with. In C++ a NULL, a 0, and FALSE are all the same thing. They are all 0. But the C# compiler doesn’t allow you to make this leap. Null, 0, and false are three distinctly different things. So what the IL code is doing is loading the first NGram instance onto the stack, then just doing a false equality check. Since null the same as false (in IL) this works just fine.
The C# compiler knows, when comparing null an object that is casted all the way down to Object, it can just compare it to false and call it good. So why didn’t the C# compiler do this for the NGram null comparison? Because I could have overloaded the == operator in my NGram class, that’s why. And if I did, then it would need to call it. But doesn’t the C# compiler have enough info to check if I have overloaded it or not, and if not do a false comparison? Yes it does, but it looks like that’s one optimization it doesn’t do, unfortuanttly
So to test this theory out, I took the null check out of my IComparer.Compare function and re-ran my test code under the profiler, and this time the generic comparer without the null checks were 66% faster than the object comparer. Ahhhh, satisfaction at last.
This exercise reinforced something in my head. Even if you KNOW that thing a is faster then thing b, always profile it just to be sure. Yes, a generic typed comparer is much faster than a normal object comparer. But if I had just done the code change and called it good, I would have actually slowed my app down. Which is a bad thing.