Recently a performance bug came my way. A highly multithreaded application, which can run for hours depending on the amount of data it's processing, was observed with all its CPUs ramping up to 100% utilization while the amount of data processed per second dropped to nothing.

OK, no big deal here. Most likely all the threads are stuck in a tight loop (probably the same loop), each one waiting for another thread to set a flag that would let it exit. Your basic deadlock, just one that spins instead of blocking. Pretty easy to fix, provided I can reproduce the problem on my dev machine and let the debugger tell me what the offending function is.

The problem is that I wasn’t able to reproduce it.  Crap.

Ok…on to step two…

Looks like I'll have to find or build a tool that can give me the call stack of every thread in a managed app. I started out trying to use System.Diagnostics.StackTrace and StackFrame. From a System.Threading.Thread instance I can create a StackTrace object and see what function the thread is in. But I can't get a list of all the Thread objects in an app. I do have access to the Process.Threads collection, but that gives me System.Diagnostics.ProcessThread objects, not System.Threading.Thread objects. Shoot…that's not going to work.

OK, the next step is to look at writing a really lightweight debugger on top of .NET's ICorDebug API that can break into the app and dump out the call stacks of all the managed threads. I found a couple of examples and it didn't look too bad. The only issue is that ICorDebug is a COM API, so I'd have to do all that fun C++ COM stuff…Ick. And I need the tool yesterday.

After digging around a bit more I found out that the Visual Studio debugger team wrote a very nice managed wrapper around ICorDebug, called MDbg.  Sweet!

There is a bunch of info about it here.

After digging a bit further, I found that someone wrote a handy little tool called Managed Stack Explorer. Oh geez! The gods are smiling at me! That's exactly what I need.

This little tool shows all the managed apps running on your server. When you pick an app, it shows all the threads in that process. When you click a thread, it shows you that thread's call stack. Simple and nice.

With this tool, I found the offending non-thread-safe function in about five minutes. Fixed, done, yippee.

But this post isn't about someone's tool, or about my bug-fixing adventures. No, it's about coming across one of the most useful APIs I've seen in a long time: a simple, well-designed .NET wrapper around ICorDebug that gives .NET developers full access to the CLR debugger. There are so many diagnostic tools that could be built with it, and I'm looking forward to digging around in the API!