Unmanaged or managed memory leaks?

It's been a while since I posted something on this blog.

I've been dealing with a memory leak issue for the past week, so I thought I'd share the experience I gained with you.

First of all, when it comes to memory management in .NET, some people think the GC shouldn't be trusted, so they trigger GC.Collect everywhere they think it will help lower the working set of their application. That is just plain wrong. I have had very few occasions where I needed to explicitly force a collection.

A good way of diagnosing memory problems is to use the tools available to you. DebugDiag, VMMap and WinDbg should suffice. DebugDiag and VMMap are quite easy to use, while WinDbg needs some training beforehand.

Things might not be what they seem to be :) In my case I was getting working sets of about 1.4-1.7 GB with the .NET heaps consuming only around 250-300 MB. This can lead you into thinking it is an unmanaged leak, because the .NET heap usage is much lower than the process (unmanaged) heap usage. While that could be the sign of an unmanaged memory leak, it can in fact be managed objects being kept alive in the GC while holding references to unmanaged data they allocated. Example: Bitmaps.

If you take a look at how Bitmaps are structured (you can run !do <bitmapaddress>), you will see a field called nativeImage. This field is nothing more, nothing less than an IntPtr that points to an address in the "unmanaged" heap. This is where it gets interesting: the unmanaged heap is keeping all the bitmap data allocated by GDI+ (used under the hood by System.Drawing).
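To see this outside the debugger, here is a minimal C# sketch (sizes are illustrative, and System.Drawing needs Windows or libgdiplus) of how a handful of tiny managed wrappers can hold on to hundreds of megabytes of unmanaged memory:

```csharp
using System;
using System.Drawing;

class BitmapBackingDemo
{
    public static void Main()
    {
        // Each Bitmap is a small managed wrapper (its nativeImage IntPtr plus
        // a few fields) around pixel data that GDI+ allocates on the
        // unmanaged heap: a 4000x4000 32bpp bitmap is roughly 61 MB there.
        var bitmaps = new Bitmap[10];
        for (int i = 0; i < bitmaps.Length; i++)
            bitmaps[i] = new Bitmap(4000, 4000);

        // The managed heap barely moves, while the working set grows by
        // hundreds of MB - the same kind of gap as 250 MB of .NET heaps
        // inside a 1.5 GB process.
        Console.WriteLine("Managed heap: {0:N0} bytes", GC.GetTotalMemory(false));
        Console.WriteLine("Working set:  {0:N0} bytes", Environment.WorkingSet);

        // Dispose releases the GDI+ memory deterministically instead of
        // leaving it hostage to finalization timing.
        foreach (var bmp in bitmaps)
            bmp.Dispose();
    }
}
```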

You might think that by looking at the output of !eeheap -gc you can see how much memory your .NET application is using, but this will in fact tell you only the total size your .NET objects consume in the .NET heaps, not everything those objects keep alive.

So in the Bitmap case, you will see a very small instance size, while the instances can in fact be holding a great amount of memory in the unmanaged heap.

In case you are experiencing OutOfMemoryExceptions, or see that your app's process is consuming a lot of virtual memory (VAC or <unclassified> - see !address -summary) while the # Bytes in all Heaps counter is a lot lower than that, this might help you:
  • Run DebugDiag and run a Memory Analysis test. 
    • Exclude the warnings about clr.dll. In my case I was getting warnings about WindowsCodecs.dll doing a lot of zcalloc calls (WindowsCodecs!zcalloc). I actually tried to get the stacks of the heap allocations that were leading to this.
  • What objects are being pinned? 
    • Use WinDbg to find out: !gchandles. If you see a big number of Bitmap or System.Drawing.Internal.GPStream instances, you are in big trouble :)
  • Why are those objects pinned? 
    • Use WinDbg to find out: !gcroot <objectaddress> and try to understand the dependency chain between the objects in the output. In my case I had a tooltip keeping references to all my buttons (in a data navigator used widely in the application).
  • Make sure you Dispose the objects that are keeping your objects pinned.
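Applied to my tooltip case, the fix boils down to disposing the object that holds the references when its container goes away. A hedged sketch (DataNavigator and its members are hypothetical names, not the real component):

```csharp
using System;
using System.Windows.Forms;

// Hypothetical container, loosely modeled on the data navigator above.
class DataNavigator : UserControl
{
    private readonly ToolTip _toolTip = new ToolTip();

    public void AddButton(Button button, string hint)
    {
        Controls.Add(button);
        // The ToolTip now references the button: as long as the ToolTip is
        // reachable, so is every button it annotates, together with any
        // unmanaged resources (handles, images) those buttons hold.
        _toolTip.SetToolTip(button, hint);
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Disposing the ToolTip drops its references to the buttons,
            // so the GC can finally collect them and their native state.
            _toolTip.Dispose();
        }
        base.Dispose(disposing);
    }
}
```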

Fighting Architecture and Design Erosion

It is no big news that whatever architecture you plan and implement, sooner or later you'll start to see some code "bad smells". In the early implementation phase you might do some team training and explain how your development team should write code using the planned architecture and design (patterns). Despite this effort, you cannot expect people to just follow your implementation recommendations every time.

At first, you might consider using gated check-ins or CI builds with Visual Studio code analysis. The problem with this approach is that you cannot accurately see what is changing or breaking in your architecture, so it is not enough.

The solution is: use an architecture and design visualization tool!

I am a total C# and .NET lover so the tool I would recommend and will talk about is NDepend.

This tool has direct integration with Visual Studio (2005, 2008 and 2010) and can be a great replacement for Visual Studio's integrated Code Analysis.


It not only allows you to graphically see your dependencies in code using a Dependency Matrix or Dependency Graph but also allows you to query your code using CQL (Code Query Language):

CQL - Code Query Language

Let's say you are implementing the MVP pattern (passive view) and don't want anyone referencing concrete view types in the presenters. You would just write the following CQL:

WARN IF Count > 0 IN SELECT TYPES FROM NAMESPACES "AgnosticApp.Presenters.*"
WHERE IsDirectlyUsing "AgnosticApp.Views.*"
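To illustrate what that rule catches, here is a sketch in C# (every name is invented to match the namespace patterns used in the query):

```csharp
// Illustration of what the CQL rule flags.
namespace AgnosticApp.Presenters
{
    // In passive view MVP the presenter owns the view abstraction...
    public interface ICustomerView
    {
        string CustomerName { set; }
    }

    // ...so this presenter is fine: it uses no type from AgnosticApp.Views.*.
    public class CustomerPresenter
    {
        private readonly ICustomerView _view;
        public CustomerPresenter(ICustomerView view) { _view = view; }
    }

    // This one trips "IsDirectlyUsing AgnosticApp.Views.*": it instantiates
    // a concrete view, coupling the presenter to the UI technology.
    public class BadCustomerPresenter
    {
        private readonly AgnosticApp.Views.CustomerForm _view =
            new AgnosticApp.Views.CustomerForm();
    }
}

namespace AgnosticApp.Views
{
    public class CustomerForm : Presenters.ICustomerView
    {
        public string CustomerName { set { /* update a UI control */ } }
    }
}
```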

The CQL language gives you an easy way of creating code validation rules, and the Visual NDepend editor has IntelliSense-like dropdown hints that appear while you're writing a query.

The report you get gives you very detailed information regarding Dependencies, CQL Rules violations and Code Metrics (over 82 metrics). You can even define a baseline and get Diffs between analysis runs!

Ohh almost forgot! You can even integrate it with FinalBuilder!



Microsoft .NET Framework 4 Platform Update 1 - Runtime Update

It seems that Microsoft has made available a runtime update for the .NET Framework 4.0 that allows developers to create State Machine Workflows (event-driven workflows). This is great news for everyone who had this feature available back in WF3. They are also providing support for using the SQL Workflow Instance Store together with SQL Azure.
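From what the update brings, a state machine is composed with new StateMachine, State and Transition activities. A rough sketch from memory (double-check the exact API against the Platform Update 1 samples):

```csharp
using System;
using System.Activities;
using System.Activities.Statements;

class TrafficLightWorkflow
{
    // Sketch of composing the StateMachine activity in code; a real
    // event-driven workflow would use a bookmark-based activity (or a
    // WCF Receive) as the Trigger instead of a Delay.
    public static Activity Build()
    {
        var red = new State { DisplayName = "Red", IsFinal = true };
        var green = new State { DisplayName = "Green" };

        green.Transitions.Add(new Transition
        {
            Trigger = new Delay { Duration = TimeSpan.FromSeconds(30) },
            To = red
        });

        return new StateMachine
        {
            InitialState = green,
            States = { green, red }
        };
    }
}
```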



Visual Studio 2010 - ResGen.exe using 25% CPU, very slow Build

Recently I found that one of our hotfix branches was taking a long time to compile (15 minutes or more). This was strange, because our main development branch was fine, compiling the same Visual Studio solution in under 3 minutes. The only difference was that the hotfix branch still had the projects on .NET 2.0.

I saw right away that ResGen.exe (the resource generator) was executing for a long time and consuming 25% CPU (I guess it would use more CPU if it were multi-threaded :))

The problem you face when trying to reduce compilation time is that it is very hard to see what is actually taking longest, because MSBuild.exe does not provide a complete and easy way to get timing metrics.

Although you can call MSBuild in verbose mode, it does not tell you what took longer to compile within each project.

The technique I used to figure out what was going on was Process Monitor from Microsoft (made by Mark Russinovich, whom I had the opportunity to meet at TechEd Europe last year, and Bryce Cogswell).

So I ran Process Monitor, filtered all the results by the process name "ResGen.exe" and watched the log while the solution was compiling.

What I found was quite horrific :) ResGen was repeatedly trying to find specific versions of .NET assemblies that do not exist, like System.Windows.Forms, Version= or System.Drawing, Version=. I then computed the time differences between successive RESX file read operations, and that showed me which files were taking longest to generate resources for.

I had RESX files that were taking about 200 seconds to process.

I still do not know why the RESX files in my projects get these malformed references now and then, but one thing I know for sure: if I replace the version attribute with a valid version, the problem goes away and I get normal compilation times for my solution.
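For context, the reference in question lives in the .resx markup itself, which embeds fully qualified assembly names. An illustrative fragment of what a healthy .NET 2.0 entry looks like (, culture and public key token below are the standard framework values, but verify them against your own files):

```xml
<!-- If the Version in one of these references does not match any installed
     assembly, ResGen probes the disk for it over and over, which is what
     made the RESX reads so slow. -->
<assembly alias="System.Windows.Forms"
          name="System.Windows.Forms, Version=, Culture=neutral, PublicKeyToken=b77a5c561934e089" />
<data name="$this.Icon" type="System.Drawing.Icon, System.Drawing"
      mimetype="application/x-microsoft.net.object.bytearray.base64">
  <value>...</value>
</data>
```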