I am sure that by now most people know how floating point approximations work on a computer. They can be quite interesting, but this has to be the weirdest experience I have ever had with them.
Open a new console application in .NET 2.0, set to build in release mode (/debug:pdbonly should be the default). It is important to note that all of this code runs fine in 1.x.
Paste the following code into your Main method.
float f = 97.09f;
int tmp = (int)(f * 100.0f);
Console.WriteLine(tmp);
Output: 9708
Interesting eh? It gets more interesting!
float f = 97.09f;
float tmp = f * 100.0f;
Console.WriteLine(tmp);
Output: 9709
This is very interesting taken in context with the operation above. Let's stop for a minute and think about what we said should happen. We told it to take f, multiply it by 100.0 (storing the intermediate result as a floating point), and then convert that floating point to an integer. The second example shows that when the operation is done purely as a floating point, it comes out correctly. So where is the disconnect?
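The disconnect is easier to see outside the CLR. Here is a sketch in Python that models the two code paths; the `f32` helper (a `struct` round-trip to simulate a 32-bit float) and the assumption that the intermediate multiply happens at double precision are mine, not part of the original examples:

```python
import struct

def f32(x):
    """Round a Python float (an IEEE 754 double) to the nearest 32-bit float."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

f = f32(97.09)               # what the literal 97.09f actually stores
print(f)                     # 97.08999633789062 -- slightly below 97.09

# First example: multiply at double (or wider) precision, then truncate.
print(int(f * 100.0))        # 9708

# Second example: round the product back to 32 bits first, then truncate.
print(int(f32(f * 100.0)))   # 9709
```

Because 97.09 has no exact binary representation, the stored float is a hair below 97.09; whether the truncation sees 9708.99963... or a value rounded back up to 9709.0 decides which answer you get.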
Let’s try to explicitly tell the compiler what we want to do.
float f = 97.09f;
f = (f * 100f);
int tmp = (int)f;
Console.WriteLine(tmp);
Output: 9709 with a debugger attached, 9708 without one in release mode!! This is with /debug:pdbonly, even with debug information disabled through the advanced build settings.
Wow this has become REALLY interesting. What on earth happened here?
Let’s look at some IL to get a better idea of what’s going on.
.locals init (
    [0] float32 single1,
    [1] float32 single2)
L_0000: ldc.r4 97.09
L_0005: stloc.0
L_0006: ldloc.0
L_0007: ldc.r4 100
L_000c: mul
L_000d: stloc.1
L_000e: ldloc.1
L_000f: call void [mscorlib]System.Console::WriteLine(float32)
L_0014: ret
This is our floating point example that prints the correct value (as a float).
.locals init (
    [0] float32 single1,
    [1] int32 num1)
L_0000: ldc.r4 97.09
L_0005: stloc.0
L_0006: ldloc.0
L_0007: ldc.r4 100
L_000c: mul
L_000d: conv.i4
L_000e: stloc.1
L_000f: ldloc.1
L_0010: call void [mscorlib]System.Console::WriteLine(int32)
L_0015: ret
This is our floating point example that came out wrong above.
.locals init (
    [0] float32 single1,
    [1] int32 num1)
L_0000: ldc.r4 97.09
L_0005: stloc.0
L_0006: ldloc.0
L_0007: ldc.r4 100
L_000c: mul
L_000d: stloc.0
L_000e: ldloc.0
L_000f: conv.i4
L_0010: stloc.1
L_0011: ldloc.1
L_0012: call void [mscorlib]System.Console::WriteLine(int32)
L_0017: ret
This is our example that gets it right when a debugger is attached, but not without one.
Interesting: the only significant difference between the version that never works and the version that works only when a debugger is attached is that the working one stores our value and then loads it back onto the stack before issuing the conv.i4.
L_000c: mul
L_000d: stloc.0
L_000e: ldloc.0
L_000f: conv.i4
Basically these instructions say to take the result of the multiplication (popping it off of the stack) and store it back into local 0, which is our floating point variable. They then push that floating point variable back onto the stack so it can be used for the cast operation. This is probably something the C# compiler should handle for us in our first example, so that it works as well as the third example does.
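What that store/load pair buys us is a rounding step. A Python sketch of the idea (the `f32` struct round-trip simulating a float32 local, and the wider intermediate precision, are my modeling assumptions):

```python
import struct

def f32(x):
    """Round a Python float (an IEEE 754 double) to the nearest 32-bit float."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

# The multiply result, held at wider-than-32-bit precision:
product = f32(97.09) * 100.0
print(product)               # 9708.99963...

# Spilling it through a float32 local (stloc.0 / ldloc.0) snaps it to:
stored = f32(product)
print(stored)                # 9709.0

print(int(product), int(stored))   # 9708 9709
```

The round trip through the 32-bit slot is not a no-op at all: it is exactly the rounding that makes the truncation come out right.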
The “debugger/no debugger” problem is still our big problem though. The fact that JIT optimizations are changing the behavior of identical IL is frankly kind of scary. My initial thought upon seeing the changes we just identified was that the operation was being optimized away by the JIT (storing and loading the same value on the stack seems like just the thing the JIT optimizer would be looking for), thus causing the problem.
The next step in tracking this down will be to look at the native code being generated.
Note: In order to do this you have to enable “Native Debugging” in Visual Studio.
00000000 push esi
00000001 sub esp,8
00000004 fld dword ptr ds:[00C400D0h]
0000000a fld dword ptr ds:[00C400D4h]
00000010 fmulp st(1),st
00000012 fstp qword ptr [esp]
00000015 fld qword ptr [esp]
00000018 fstp qword ptr [esp]
0000001b movsd xmm0,mmword ptr [esp]
00000020 cvttsd2si esi,xmm0
00000024 cmp dword ptr ds:[02271084h],0
0000002b jne 00000037
0000002d mov ecx,1
00000032 call 7870D79C
00000037 mov ecx,dword ptr ds:[02271084h]
0000003d mov edx,esi
0000003f mov eax,dword ptr [ecx]
00000041 call dword ptr [eax+000000BCh]
00000047 call 78776B48
0000004c mov ecx,eax
0000004e mov eax,dword ptr [ecx]
00000050 call dword ptr [eax+64h]
00000053 add esp,8
00000056 pop esi
00000057 ret
This is our native code when started without the debugger (attaching to the process while it is running). Output: 9708.
00000000 push esi
00000001 sub esp,10h
00000004 mov dword ptr [esp],ecx
00000007 cmp dword ptr ds:[00918868h],0
0000000e je 00000015
00000010 call 79441146
00000015 fldz
00000017 fstp dword ptr [esp+4]
0000001b xor esi,esi
0000001d mov dword ptr [esp+4],42C22E14h
00000025 fld dword ptr ds:[00C51214h]
0000002b fmul dword ptr [esp+4]
0000002f fstp dword ptr [esp+4]
00000033 fld dword ptr [esp+4]
00000037 fstp qword ptr [esp+8]
0000003b movsd xmm0,mmword ptr [esp+8]
00000041 cvttsd2si eax,xmm0
00000045 mov esi,eax
00000047 mov ecx,esi
00000049 call 78767DE4
0000004e call 78767BBC
00000053 nop
00000054 nop
00000055 add esp,10h
00000058 pop esi
00000059 ret
This is our native code when started with the debugger attached. Output: 9709.
(I am fairly certain attaching the debugger disables at least some forms of JIT optimization.)
Unfortunately, looking at the native code, it does not appear that this store/load pair is being removed. I have to admit that I am very rusty on my assembly language, but my uneducated guess is that the difference comes from the change from dword values to qword values. In the version that does not work, the operation is done on qword values; in the version that does work, it is done on dword values.
If we look, we can see that in the working example the math is done in dwords and only widened to a qword at the end:
0000002b fmul dword ptr [esp+4]
0000002f fstp dword ptr [esp+4]
00000033 fld dword ptr [esp+4]
00000037 fstp qword ptr [esp+8]
In the non-working example all operations are done with qwords
00000010 fmulp st(1),st
00000012 fstp qword ptr [esp]
00000015 fld qword ptr [esp]
00000018 fstp qword ptr [esp]
My (again uneducated) guess is that the higher precision of the qword preserves a small residual, leaving the result just slightly low (e.g. 9708.9996 instead of 9709). This could easily cause the behavior being seen.
Basically this is not so much a bug as it is an oddity. The CLR treats floats internally (when it's time to do calculations) as if they were float64s (I would imagine because context switching between floating point and MMX is kind of slow? Again, not my area of specialty). This can cause other issues as well: if you compare something in a register (fresh from a calculation) against something in memory, they are in different formats. The one in the register is still in a native 64-bit format, whereas the one in memory gets widened to 64 bits in order to be compared, and as such they will not be equal...
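That register-versus-memory hazard is easy to model. In this sketch (my own, using a `struct` round-trip to simulate a float32 variable in memory, and assuming the register holds the product at 64-bit precision), widening the memory copy back does not recover the lost bits, so the two compare unequal:

```python
import struct

def f32(x):
    """Round a Python float (an IEEE 754 double) to the nearest 32-bit float."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

in_register = f32(97.09) * 100.0   # fresh from the calculation, wide precision
in_memory = f32(in_register)       # the same value stored to a float32 variable

# The widened memory copy is 9709.0; the register copy is 9708.99963...
print(in_register == in_memory)    # False
```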
Back to our first example: remember how it was missing the
L_000d: stloc.0
L_000e: ldloc.0
before the conversion to an integer? It fails because it uses the 64-bit version of the float value (still in a register) that has not yet been rounded back to a 32-bit version.
I took my best uneducated guess; hopefully someone smarter than I am can come through here and either confirm what I have said or identify the real problem :)
Update: I finally found a resource on this, and it seems I am in the right ballpark: http://blogs.msdn.com/davidnotario/archive/2005/08/08/449092.aspx
Another good question is: why is this doing anything at runtime at all? :) Couldn't we multiply the two constants at compile time?
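For what it's worth, if the compiler did fold the multiply at compile time at float32 precision, the constant baked into the binary would already be the "correct" one. Sketched with the same float32 simulation as above (again my own model, not what the C# compiler actually does):

```python
import struct

def f32(x):
    """Round a Python float (an IEEE 754 double) to the nearest 32-bit float."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

# If 97.09f * 100.0f were folded as a pure float32 operation,
# the constant the compiler would emit is:
folded = f32(f32(97.09) * f32(100.0))
print(folded)        # 9709.0
print(int(folded))   # 9709
```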