<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:copyright="http://blogs.law.harvard.edu/tech/rss" xmlns:image="http://purl.org/rss/1.0/modules/image/">
    <channel>
        <title>Alois Kraus</title>
        <link>http://geekswithblogs.net/akraus1/Default.aspx</link>
        <description>blog</description>
        <language>en-US</language>
        <copyright>Alois Kraus</copyright>
        <managingEditor>akraus1@gmx.de</managingEditor>
        <generator>Subtext Version 0.0.0.0</generator>
        <image>
            <title>Alois Kraus</title>
            <url>http://geekswithblogs.net/images/RSS2Image.gif</url>
            <link>http://geekswithblogs.net/akraus1/Default.aspx</link>
            <width>77</width>
            <height>60</height>
        </image>
        <item>
            <title>Using WPT For Batch Script Profiling</title>
            <link>http://geekswithblogs.net/akraus1/archive/2013/01/17/151871.aspx</link>
            <description>&lt;p&gt;Originally posted on: &lt;a href='http://geekswithblogs.net/akraus1/archive/2013/01/17/151871.aspx'&gt;http://geekswithblogs.net/akraus1/archive/2013/01/17/151871.aspx&lt;/a&gt;&lt;/p&gt;&lt;p&gt;In the process of digging deeper into &lt;a href="http://geekswithblogs.net/akraus1/archive/2012/12/19/151604.aspx"&gt;WPT&lt;/a&gt; I found many new ways to explore my system. The Windows Performance Toolkit for Windows 8 is NOT only for Windows 8; it also runs on Windows Vista and later. This toolkit is the Hubble Space Telescope of performance engineering. It gives you call stacks from far, far away in the kernel and back across all processes of your system. But it is not only a tool for kernel hackers. You can also use it as an easy and free batch script profiler. It does not matter what you use (cmd, PowerShell, Perl, ...) as long as the main work of your script consists of calling one process after another.&lt;/p&gt;  &lt;p&gt;If you want to install the Windows Performance Toolkit, you can find more information &lt;a href="http://geekswithblogs.net/akraus1/archive/2012/12/19/151604.aspx"&gt;here&lt;/a&gt;. 
&lt;/p&gt;  &lt;p&gt;To start profiling your system you only need to execute five steps.&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Open a command prompt and do &lt;strong&gt;&lt;em&gt;cd "C:\Program Files\Windows Kits\8.0\Windows Performance Toolkit"&lt;/em&gt;&lt;/strong&gt; &lt;/li&gt;    &lt;li&gt;Now type &lt;strong&gt;&lt;em&gt;xperf -on PROC_THREAD&lt;/em&gt;&lt;/strong&gt; &lt;/li&gt;    &lt;li&gt;Execute your script &lt;/li&gt;    &lt;li&gt;Stop profiling and save the trace file to the default location: &lt;strong&gt;&lt;em&gt;xperf -stop&lt;/em&gt;&lt;/strong&gt; &lt;/li&gt;    &lt;li&gt;View the profiling information: &lt;strong&gt;&lt;em&gt;wpa c:\kernel.etl&lt;/em&gt;&lt;/strong&gt; &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;To show what the trace file looks like I have started several cmd.exe processes and stopped them to have something to graph. When I view the trace file with wpa.exe I get this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/UsingWPTForBatchScriptProfiling_13AEE/image_2.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/UsingWPTForBatchScriptProfiling_13AEE/image_thumb.png" width="1872" height="871" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The x-axis of the graph shows the time in seconds since you started recording the trace. Every process start and stop is recorded with full command line information. The y-axis contains the started processes, so you can follow the timeline of your script and directly see which process has the longest bar and might be worth optimizing. I show you this with xperf because for this very useful graph you need to enable only the PROC_THREAD kernel provider. 
A list of the most useful providers is documented &lt;a href="http://blogs.technet.com/b/askperf/archive/2008/06/27/an-intro-to-xperf.aspx"&gt;here&lt;/a&gt;. Enabling only a single provider allows you to capture long traces with very small trace files (about 1 MB/min). WPA can handle trace files of up to 1 GB with decent performance. The other recording tools of WPT are WPR and its GUI WPRUI, which enable myriads of kernel providers that log trace data at a rate of about 1 GB/min on a busy system. That is not a suitable setup for collecting data for some hours or even minutes.&lt;/p&gt;  &lt;p&gt;It was never as easy to benchmark your scripts as it is with WPT. The “old” xperf tool has been able to do this for many years, but it was known only to driver developers. If you want to capture ETW events on a customer machine, you do not need to install WPT there for simple tasks. You can capture and save traces with logman.exe as well.&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Start Profiling: &lt;em&gt;&lt;strong&gt;logman create trace -ets "NT Kernel Logger" -o kernel_logman.etl -p "Windows Kernel Trace" (process)&lt;/strong&gt;&lt;/em&gt; &lt;/li&gt;    &lt;li&gt;Stop Profiling: &lt;strong&gt;&lt;em&gt;logman stop -ets "NT Kernel Logger"&lt;/em&gt;&lt;/strong&gt; &lt;/li&gt;    &lt;li&gt;View Captured Data: &lt;strong&gt;&lt;em&gt;wpa kernel_logman.etl&lt;/em&gt;&lt;/strong&gt; &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;This is the equivalent of our previous XPerf call. There seem to be many aliases built into the tools, which makes it a bit hard to find out which options I actually need. But hey, it works. If you search for logman in your favorite search engine there are surprisingly few hits. It is time to change that. If it is this easy to profile your batch scripts, these things should get more attention.&lt;/p&gt; &lt;img src="http://geekswithblogs.net/akraus1/aggbug/151871.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Alois Kraus</dc:creator>
            <guid>http://geekswithblogs.net/akraus1/archive/2013/01/17/151871.aspx</guid>
            <pubDate>Thu, 17 Jan 2013 23:23:53 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/akraus1/comments/151871.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/akraus1/archive/2013/01/17/151871.aspx#feedback</comments>
            <wfw:commentRss>http://geekswithblogs.net/akraus1/comments/commentRss/151871.aspx</wfw:commentRss>
            <trackback:ping>http://geekswithblogs.net/akraus1/services/trackbacks/151871.aspx</trackback:ping>
        </item>
        <item>
            <title>How To Make Your System Slower With WMI</title>
            <link>http://geekswithblogs.net/akraus1/archive/2012/12/28/151662.aspx</link>
            <description>&lt;p&gt;Originally posted on: &lt;a href='http://geekswithblogs.net/akraus1/archive/2012/12/28/151662.aspx'&gt;http://geekswithblogs.net/akraus1/archive/2012/12/28/151662.aspx&lt;/a&gt;&lt;/p&gt;&lt;p&gt;With the new &lt;a href="http://geekswithblogs.net/akraus1/archive/2012/12/19/151604.aspx"&gt;Windows Performance Toolkit&lt;/a&gt; you can now identify bottlenecks in your system which did not show up in any profiler. Let's, for example, query some system properties via WMI. &lt;/p&gt;  &lt;pre class="csharpcode"&gt;    &lt;span class="rem"&gt;// Needs: using System; using System.Diagnostics; using System.Linq; using System.Management;&lt;/span&gt;
    &lt;span class="rem"&gt;// (and a reference to System.Management.dll)&lt;/span&gt;
    &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Main(&lt;span class="kwrd"&gt;string&lt;/span&gt;[] args)
    {
        var sw = Stopwatch.StartNew();
        &lt;span class="rem"&gt;// WMI classes to query; each one becomes a "select * from class" query&lt;/span&gt;
        &lt;span class="kwrd"&gt;string&lt;/span&gt;[] queries = &lt;span class="kwrd"&gt;new&lt;/span&gt; &lt;span class="kwrd"&gt;string&lt;/span&gt;[] { &lt;span class="str"&gt;"Win32_Processor"&lt;/span&gt;, &lt;span class="str"&gt;"Win32_Printer"&lt;/span&gt;, &lt;span class="str"&gt;"Win32_ComputerSystem"&lt;/span&gt;, &lt;span class="str"&gt;"Win32_VideoController"&lt;/span&gt;, &lt;span class="str"&gt;"Win32_PhysicalMemory"&lt;/span&gt;, &lt;span class="str"&gt;"Win32_LogicalDisk"&lt;/span&gt; };

        &lt;span class="rem"&gt;// One searcher per class; Get() returns the collection of matching WMI objects&lt;/span&gt;
        &lt;span class="kwrd"&gt;foreach&lt;/span&gt;(var wmiColl &lt;span class="kwrd"&gt;in&lt;/span&gt; queries.Select(x=&amp;gt; &lt;span class="kwrd"&gt;new&lt;/span&gt; ManagementObjectSearcher(&lt;span class="kwrd"&gt;new&lt;/span&gt; SelectQuery(x)))
                                      .Select(query =&amp;gt; query.Get()))
        {
            &lt;span class="rem"&gt;// Walk over every property of every returned object; printing stays disabled so that only the WMI calls are timed&lt;/span&gt;
            &lt;span class="kwrd"&gt;foreach&lt;/span&gt;(var prop &lt;span class="kwrd"&gt;in&lt;/span&gt; wmiColl.Cast&amp;lt;ManagementBaseObject&amp;gt;()
                                       .SelectMany(x=&amp;gt;x.Properties.Cast&amp;lt;PropertyData&amp;gt;()))
            {
                &lt;span class="rem"&gt;//Console.WriteLine("{0} - {1}", prop.Name, prop.Value);&lt;/span&gt;
            }
        }
        sw.Stop();
        Console.WriteLine(&lt;span class="str"&gt;"Time: {0:F1}s"&lt;/span&gt;, sw.Elapsed.TotalSeconds);
    }&lt;/pre&gt;
&lt;style type="text/css"&gt;&lt;![CDATA[







.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }]]&gt;&lt;/style&gt;

&lt;p&gt;This takes around 1.6s on my machine. When I look at my process (ConsoleApplication1.exe) the CPU consumption is rather low. &lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/WasThatMeHowToMakeYourSystemSlowerWithWM_E6B5/image_2.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/WasThatMeHowToMakeYourSystemSlowerWithWM_E6B5/image_thumb.png" width="1379" height="343" /&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;So where is the time spent? It could be some disk IO happening during the queries. To check for disk IO we need the Disk Usage “Utilization by Process, IO Type” graph. As you can see, there was no disk IO at all during the selected time. A flat line does not mean that there were no CreateFile/ReadFile calls at all: if you call ReadFile on a file whose contents are already cached by the OS, reading from it is only a matter of a CopyMemory call. This graph shows only real disk accesses. &lt;/p&gt;

&lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/WasThatMeHowToMakeYourSystemSlowerWithWM_E6B5/image_8.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/WasThatMeHowToMakeYourSystemSlowerWithWM_E6B5/image_thumb_3.png" width="1389" height="259" /&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;In the CPU graph you can see that ConsoleApplication1.exe was not the process with the most CPU consumption. Most CPU cycles were spent in WmiPrvSE, spoolsv and svchost. So let's graph them as well. You can enable or disable items in the graph by left clicking on the color box or via the context menu. In the context menu you have Enable … or Disable … menus to enable or disable all or only the selected processes/threads, …&lt;/p&gt;

&lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/WasThatMeHowToMakeYourSystemSlowerWithWM_E6B5/image_10.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/WasThatMeHowToMakeYourSystemSlowerWithWM_E6B5/image_thumb_4.png" width="1376" height="523" /&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;We see that although our process consumed only &lt;strong&gt;2.8%&lt;/strong&gt; of CPU for all WMI queries, we maxed out one core with a WMI worker process (on a dual core machine one core is 50%; on an 8 core machine one core would be 12.5%). In total we consumed &lt;strong&gt;45% CPU&lt;/strong&gt; time, which means that nearly one core was fully utilized to process the WMI queries! At least that is the working hypothesis: our WMI queries caused the load in the other processes. How can we verify this? First we can have a look at the call stacks in our process.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/WasThatMeHowToMakeYourSystemSlowerWithWM_E6B5/image_14.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/WasThatMeHowToMakeYourSystemSlowerWithWM_E6B5/image_thumb_6.png" width="1088" height="678" /&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Here we nicely see the full managed call stacks. &lt;strike&gt;At least with .NET 4.5. If you have &lt;/strike&gt;&lt;a href="http://social.msdn.microsoft.com/Forums/en-US/wptk_v4/thread/b7098a78-6600-456a-813e-79a84b994f32"&gt;&lt;strike&gt;.NET 4.0 you will see only the first managed stack&lt;/strike&gt;&lt;/a&gt;&lt;strike&gt; frame and then nothing.&lt;/strike&gt; &lt;strike&gt;It is better than nothing but it should be improved for .NET 4.0 as well.&lt;/strike&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Full managed stacks work with .NET 4.0 as well for x86 processes. They fail only when you want to profile 64 bit processes under Windows Vista or Windows 7. There is a bug in the stack walking algorithm: it stops when it encounters the first dynamically generated stack frame whose address is not inside a loaded module. To work around this issue you can NGen your assemblies to get full call stacks as well. This is a bit cumbersome, but it works very nicely. The only issue is that truly dynamic code like WCF proxies and similar stuff will not work as expected. &lt;/strong&gt;&lt;/p&gt;

  &lt;p&gt;&lt;strong&gt;The other fix to this issue is to use Windows 8.&lt;/strong&gt; &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The ManagementObjectCollection makes some COM calls to communicate with the WMI service, and our thread waits for the response from a COM object. When our thread gives up its time slice by calling some wait method, the OS schedules the next thread in the ready queue. This topic is covered in greater detail &lt;a href="http://msdn.microsoft.com/en-us/library/windows/desktop/jj679884.aspx"&gt;here&lt;/a&gt;, which is part of the WPT documentation. Here is a copy of the relevant part:&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Each thread exists in a particular execution state at any given time. Windows uses three states that are relevant to performance; these are: &lt;em&gt;Running&lt;/em&gt;, &lt;em&gt;Ready&lt;/em&gt;, and &lt;em&gt;Waiting&lt;/em&gt;.&lt;/p&gt;

  &lt;p&gt;Threads that are currently being executed are in the &lt;em&gt;Running&lt;/em&gt; state. Threads that can execute but are currently not running are in the &lt;em&gt;Ready&lt;/em&gt; state. Threads that cannot run because they are waiting for a particular event are in the &lt;em&gt;Waiting&lt;/em&gt; state.&lt;/p&gt;

  &lt;p&gt;A state to state transition is shown in Figure 1 Thread State Transitions:&lt;/p&gt;

  &lt;p&gt;&lt;img title="Figure 1 Thread State Transition" alt="Figure 1 Thread State Transition" src="http://i.msdn.microsoft.com/dynimg/IC620615.jpg" /&gt;&lt;/p&gt;

  &lt;p&gt;&lt;strong&gt;Figure 1 Thread State Transitions&lt;/strong&gt;&lt;/p&gt;

  &lt;p&gt;Figure 1 Thread State Transitions is explained as follows:&lt;/p&gt;

  &lt;ol&gt;
    &lt;li&gt;A thread in the Running state initiates a transition to the Waiting state by calling a wait function such as &lt;strong&gt;WaitForSingleObject&lt;/strong&gt; or &lt;strong&gt;Sleep(&amp;gt; 0)&lt;/strong&gt;. &lt;/li&gt;

    &lt;li&gt;A running thread or kernel operation readies a thread in the Waiting state (for example, &lt;strong&gt;SetEvent&lt;/strong&gt; or timer expiration). If a processor is idle or if the readied thread has a higher priority than a currently running thread, the readied thread can switch directly to the Running state. Otherwise, it is put into the Ready state. &lt;/li&gt;

    &lt;li&gt;A thread in the Ready state is scheduled for processing by the dispatcher when a running thread waits, yields &lt;strong&gt;(Sleep(0))&lt;/strong&gt;, or reaches the end of its quantum. &lt;/li&gt;

    &lt;li&gt;A thread in the Running state is switched out and placed into the Ready state by the dispatcher when it is preempted by a higher priority thread, yields &lt;strong&gt;(Sleep(0))&lt;/strong&gt;, or when its quantum ends. &lt;/li&gt;
  &lt;/ol&gt;

  &lt;p&gt;A thread that exists in the Waiting state does not necessarily indicate a performance problem. Most threads spend significant time in the Waiting state, which allows processors to enter idle states and save energy. Thread state becomes an important factor in performance only when a user is waiting for a thread to complete an operation.&lt;/p&gt;
&lt;/blockquote&gt;
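
&lt;p&gt;To make the four transitions from the quote easier to follow, here is a minimal sketch of them as a lookup table in Python. The trigger names (wait, readied, dispatched, ...) are labels I made up for this sketch. They are not Windows API names, and the real scheduler is of course far more involved.&lt;/p&gt;

&lt;pre&gt;
```python
# Minimal model of the thread state transitions quoted above (Figure 1).
# State and trigger names are illustrative labels, not Windows identifiers.
RUNNING, READY, WAITING = "Running", "Ready", "Waiting"

TRANSITIONS = {
    # 1. The running thread calls a wait function such as WaitForSingleObject.
    (RUNNING, "wait"): WAITING,
    # 2. The thread is readied (SetEvent, timer expiry) but no processor is free.
    (WAITING, "readied"): READY,
    # 2. The thread is readied and a processor is idle or runs a lower priority thread.
    (WAITING, "readied_cpu_free"): RUNNING,
    # 3. The dispatcher schedules a ready thread.
    (READY, "dispatched"): RUNNING,
    # 4. The thread is preempted, yields with Sleep(0) or its quantum ends.
    (RUNNING, "preempted"): READY,
}


def next_state(state, trigger):
    """Return the state a thread enters when trigger happens in state."""
    key = (state, trigger)
    if key not in TRANSITIONS:
        raise ValueError("no transition from %s on %s" % (state, trigger))
    return TRANSITIONS[key]


def replay(start, triggers):
    """Apply the triggers in order and return every state that was visited."""
    states = [start]
    for trigger in triggers:
        states.append(next_state(states[-1], trigger))
    return states


print(replay(RUNNING, ["wait", "readied", "dispatched", "preempted"]))
```
&lt;/pre&gt;

&lt;p&gt;The replayed sequence is roughly the life of a thread that blocks on a call, like our thread waiting for the COM response: it waits, is readied when the response arrives, gets dispatched again and can later be preempted.&lt;/p&gt;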

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;This means that we can see where our thread was switched out by the OS to execute the next non-waiting thread. To see the next executing thread, right click on the column header of the list and add &lt;strong&gt;&lt;em&gt;ReadyThreadStack&lt;/em&gt;&lt;/strong&gt;. This allows you to follow the trail of your thread from one process to the next. It is not a 100% correlation, but the chances are good that the next ready thread is the one that the OS has scheduled to execute next. Since WPT records the call stacks at every context switch, we can get a deep understanding of how the threads interact with each other. When we now follow the ReadyThreadStack until the thread gives up control by itself, we see that we ended up in svchost (1028). It would be interesting to see which system services were running inside it to confirm that our process has something to do with it.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/WasThatMeHowToMakeYourSystemSlowerWithWM_E6B5/image_12.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/WasThatMeHowToMakeYourSystemSlowerWithWM_E6B5/image_thumb_5.png" width="1624" height="685" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unfortunately there is no command line or running services column available in this view. But there is a view called Processes “Lifetime By Process” which gives us the command line of all processes and the hosted services, if any. This way we see that our call ended up in the svchost process which hosts the WMI (Winmgmt) service, which sounds reasonable. &lt;/p&gt;

&lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/WasThatMeHowToMakeYourSystemSlowerWithWM_E6B5/image_16.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/WasThatMeHowToMakeYourSystemSlowerWithWM_E6B5/image_thumb_7.png" width="1606" height="419" /&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;When we look into the call stacks of our WMI service host process, we find our console application many times in the ReadyThreadStack, which is not surprising since the service needs to send the results back to our process. There are only a few calls to WmiPrvSE.exe (1516), the WMI worker process which executes the WMI queries and where the actual WMI provider objects live. When we inspect the WMI worker process more closely, we find that it does some work with svchost (700), which is responsible for DCOM calls. Now we have a pretty good understanding of what is actually going on.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;ConsoleApplication1 does issue some WMI query calls. &lt;/li&gt;

  &lt;li&gt;These calls are processed by the WMI service (svchost 1028), which hands them to a WMI worker process. &lt;/li&gt;

  &lt;li&gt;WmiPrvSE.exe (1516) does execute the actual queries. &lt;/li&gt;

  &lt;li&gt;The execution of the “Win32_Printer” query does cause some heavy load in the print spooling service (spoolsv 1648). &lt;/li&gt;

  &lt;li&gt;Some WMI call (the fetching of display device information) requires the activation of some remote COM object, which causes some load in the DCOM system service (svchost 700). &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;From this we can conclude that a 2.8% CPU load in our process caused a total load of nearly 50% (that means one core was fully utilized). The load in our process caused a 15 times higher load in other processes in the system! &lt;/p&gt;
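
&lt;p&gt;The arithmetic behind these numbers can be sketched in a few lines of Python. The function names are made up for this sketch, and the inputs are the rounded percentages from the graphs above, so treat the result as a rough estimate and not as a measurement.&lt;/p&gt;

&lt;pre&gt;
```python
def one_core_share(core_count):
    """Percentage of total CPU time that one fully busy core represents."""
    return 100.0 / core_count


def load_amplification(own_percent, total_percent):
    """How many times larger the load in all other processes is than our own."""
    return (total_percent - own_percent) / own_percent


# Rounded numbers from the trace: a dual core machine, 2.8% CPU in
# ConsoleApplication1.exe and about 45% CPU in total.
print("one core on 2 cores: %.1f%%" % one_core_share(2))
print("one core on 8 cores: %.1f%%" % one_core_share(8))
print("load in other processes: %.1f times our own" % load_amplification(2.8, 45.0))
```
&lt;/pre&gt;

&lt;p&gt;With 2.8% in our process and 45% in total, the other processes carry about 15 times the load of our own process, which is where the factor above comes from.&lt;/p&gt;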

&lt;p&gt;I really love this tool since it allows you to analyze cross process correlations, which no other profiler I have encountered so far can do. It seems that the initial target audience of this tool was kernel developers, but over the years XPerf has reached the state of a full system wide profiler for managed and unmanaged applications. The managed part works best with .NET 4.5, but you can already get useful insights with .NET 4.0. We have now found that the WMI calls in our process were rather cheap, but the whole system was under quite high load due to the WMI queries. Be careful not to overuse WMI since it might hurt overall system performance for no good reason. &lt;/p&gt;

&lt;p&gt;There is good content in the &lt;a href="http://msdn.microsoft.com/en-us/library/windows/desktop/jj679884.aspx"&gt;WPT toolkit online&lt;/a&gt; to guide you through more CPU scenarios. I found the thread starvation analysis especially interesting. If you see high Ready (us) times for your thread, you have a problem: it means that your thread was sitting too long in the ready queue waiting to be scheduled for execution. Now you can not only see that your thread was treated unfairly by the OS, but you can also find out which other thread got in your way (it could be another process, a higher thread priority …). &lt;/p&gt;

&lt;p&gt;I am still shocked to see how many professional developers do not use profiling tools at all. I am not sure how they tackle tricky performance or memory issues. But I think the answer is: they do not. If performance problems are tackled by “normal” devs only on demand from management, because customers complained about crappy performance, they will only be able to iron out the big performance bottlenecks. The medium and small ones will be left as an exercise for the next dev. There may still be big issues in the product, but they will not be able to analyze them down to the root cause. Part of the issue was that there was no good tool available to analyze managed and unmanaged code at the same time.&lt;/p&gt;

&lt;p&gt;The tool problem is gone now. &lt;/p&gt;

&lt;p&gt;There is no longer an excuse to say: I do not know how to dig deeper. &lt;/p&gt;

&lt;p&gt;But there is another excuse left: I do not know how to read kernel stacks…&lt;/p&gt;

&lt;p&gt;Now you can dig from your process and thread into the kernel and back to another thread in another process with a few clicks. Yes, this tool is complex, but so is the software we write today. And there are times when all these nice abstractions (a managed execution environment, automated garbage collection, dynamic code generation, kernel mode transitions, OS thread scheduling) need to become white boxes to see what is actually going on. Some problems are only solvable when you have such detailed information at hand. &lt;/p&gt; &lt;img src="http://geekswithblogs.net/akraus1/aggbug/151662.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Alois Kraus</dc:creator>
            <guid>http://geekswithblogs.net/akraus1/archive/2012/12/28/151662.aspx</guid>
            <pubDate>Sat, 29 Dec 2012 00:38:57 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/akraus1/comments/151662.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/akraus1/archive/2012/12/28/151662.aspx#feedback</comments>
            <wfw:commentRss>http://geekswithblogs.net/akraus1/comments/commentRss/151662.aspx</wfw:commentRss>
            <trackback:ping>http://geekswithblogs.net/akraus1/services/trackbacks/151662.aspx</trackback:ping>
        </item>
        <item>
            <title>Profiling Code – The Microsoft Way</title>
            <link>http://geekswithblogs.net/akraus1/archive/2012/12/19/151604.aspx</link>
            <description>&lt;p&gt;Originally posted on: &lt;a href='http://geekswithblogs.net/akraus1/archive/2012/12/19/151604.aspx'&gt;http://geekswithblogs.net/akraus1/archive/2012/12/19/151604.aspx&lt;/a&gt;&lt;/p&gt;&lt;p&gt;A colleague of mine asked me how he could find out why his laptop consumes 30% CPU in the SYSTEM process after it wakes up from hibernate. That is easy, I replied: open Process Explorer, select the SYSTEM process, double click it, select the thread tab, sort by CPU and double click on the first thread which consumes so much CPU to get the call stack. &lt;/p&gt;  &lt;p&gt;A few days later I got a mail back that it did not work. Ok, could be some elevation problem. I tried it on my machine and sure enough: access denied here as well. Oops, it is not so easy after all. A little googling led to a very good &lt;a href="http://blogs.technet.com/b/markrussinovich/archive/2008/04/07/3031251.aspx"&gt;article&lt;/a&gt; from Mark Russinovich dealing with the exact same issue. Mark used Kernrate, which still seems to work on Windows 7 but looks rather unsupported. A few other articles indicated that xperf should also be able to do it. So I looked where to get the latest edition of XPerf. It turns out that it is part of the &lt;a href="http://msdn.microsoft.com/en-us/windows/hardware/hh852363.aspx"&gt;Windows 8 SDK&lt;/a&gt; and the &lt;a href="http://www.microsoft.com/en-us/download/details.aspx?id=30652"&gt;Windows Assessment and Deployment Kit (ADK) for Windows® 8&lt;/a&gt;. You do not need to download GBs of stuff. Only select the Windows Performance Toolkit and uncheck all other stuff. 
That gives you a nice ~50MB download which will install it to&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;%PROGRAMFILES(X86)%\Windows Kits\8.0\Windows Performance Toolkit&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Besides the well-known &lt;em&gt;Xperf&lt;/em&gt; tool, which I wrote about &lt;a href="http://geekswithblogs.net/akraus1/archive/2008/12/08/127747.aspx"&gt;several years ago&lt;/a&gt;, some new stuff was added. Recording performance data has never been as easy as it is with the latest version of the Windows Performance Toolkit. There is a tool called &lt;strong&gt;&lt;em&gt;WPRUI.exe&lt;/em&gt;&lt;/strong&gt; which allows you to record a trace session with a nice GUI. An interesting side note is that this tool is missing in the Windows 8 SDK delivered with Visual Studio 2012. If you download the latest version again you get the GUI tool as well. It seems to have full support from MS, and they even localized the UI, which is rather unusual for such low level tools. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/4c28ad36cab6_11551/image_2.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/4c28ad36cab6_11551/image_thumb.png" width="678" height="523" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;When you select CPU, Disk and File IO you should be able to troubleshoot pretty much anything. The UI is easy to use and is even nice enough to set some arcane registry key if you are using Windows 7 x64, to enable full stack walks in kernel drivers, which seem to be broken otherwise. After you have saved the trace etl file you should cancel recording, or you will be recording GBs of additional data. 
&lt;/p&gt;  &lt;p&gt;Now you are ready to load the trace file into the viewer, which is called &lt;strong&gt;&lt;em&gt;wpa.exe&lt;/em&gt;&lt;/strong&gt;. After a little rearranging you get a pretty good picture of what Visual Studio (VS) is doing during its first startup, which is called cold startup because no files are in the OS file system cache yet. That means many more disk accesses during the first startup compared to a second, warm startup of VS. How many more? We will see shortly. Here comes the picture:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/4c28ad36cab6_11551/image_4.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/4c28ad36cab6_11551/image_thumb_1.png" width="1862" height="952" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Holy crap. What is all this? Well, I have cut out most columns. This is already a simplified view ;-). On the left side you can select many diagram types from the Graph Explorer via drag and drop or the context menu and add them to the Analysis pane. First I want to see what my CPU was doing by selecting &lt;em&gt;“CPU Usage (precise) Utilization by Process Thread”.&lt;/em&gt; Next I filter for devenv.exe only in each graph. But wait: did you say filter for process? Yes, this tool gives you a view of the complete system with all processes at once. It goes so far that you can even watch the OS scheduler doing its work. The call stacks can be resolved with the official MS symbols into the kernel and into all user mode processes. 
Let's say you call BeginInvoke in one thread and wait for some other event: in the CPU graph, the columns Old/New Thread Stacks show you to which other thread/process the OS scheduler has moved control.&lt;/p&gt;  &lt;p&gt;Besides CPU consumption, disk IO is the most important metric to look at closely. For this I selected the graph &lt;em&gt;“Utilization by Process, IO Type”&lt;/em&gt; to see at which times the disk is maxed out. I can see the names of the files which were accessed, and I can see how much time the OS spent with the hard disk reading each file. The most important column there is Disk Service Time, which gives you the timing at ns granularity. Here we see, for example, that during the cold startup we spent 0.355s loading the file System.Xml.ni.dll. When you click on the file you see in the graph at which points in time it was read. The bigger blue bars indicate the first and last read event. In the CPU Graph I also got a lot of selected rows. Why is this? First I thought it was an error, but it was of course a feature. &lt;/p&gt;  &lt;p&gt;Since we have a complete view of the system at hand, the tooling is able to follow all call stacks. This means that when I select the accessed file, the tool selects all processes and threads that were blocked during one of the accesses to this file! Now I can find out why my mouse cursor jumps when I start a very disk intensive application. Some years ago this stuff was unmanaged land only. But the latest version is also able to fully traverse managed call stacks, as you can see in the above call stack of the CPU graph. There you also see a blue selected method called GetProfile, which will obviously lead later to a blocking call in the call stack when it tries to deserialize its data with XmlSerializer, leading to reads of System.Xml.ni.dll.&lt;/p&gt;  &lt;p&gt;The managed call stacks for native images are retrieved by a matching pdb for each native image. 
When you google for “ngen create pdb” you will find more info about this feature. It does look like a Windows 8 and .NET 4.5 feature but it is not. This does work on Windows 7 with .NET 4.0 perfectly well. The first trace collection will take quite a long time because ngen createpdb is called for each native image to store a synthetic pdb at &lt;em&gt;C:\ProgramData\WindowsPerformanceRecorder&lt;/em&gt;. This little thingy allows it to fully walk managed and unmanaged call stacks. But what about assemblies that are not NGened? &lt;/p&gt;  &lt;p&gt;Good question. The short and good answer is: It does work. Below is a screenshot of a normal application I did create to read from the event log. You do see the Main method, the wrapped delegate call…. All is there. Some information stored in the NGen images seems to be needed for this because I did get no such stack when I had not loaded the unmanaged and synthetic NGen symbols before. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/4c28ad36cab6_11551/image_6.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/4c28ad36cab6_11551/image_thumb_2.png" width="890" height="189" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;My symbol path does therefore look like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/4c28ad36cab6_11551/image_8.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/4c28ad36cab6_11551/image_thumb_3.png" width="510" height="382" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;It does not include the MS symbol server because I have the 
most important symbols stored at C:\Windows\symbols already. This makes symbol loading a matter of seconds and not many minutes. PerfMonitor was the first tool that enabled mixed mode stack walking with ETW events. You can read more at &lt;a href="http://naveensrinivasan.com/2010/05/23/get-managed-call-stacks-in-net-for-registry-access-using-etw/"&gt;Naveen's&lt;/a&gt; blog. For the NGen pdb support there seems to be pretty much magic involved which includes some CLR Rundown ETW traces. You can read more about them in the official &lt;a href="http://msdn.microsoft.com/en-us/library/windows/desktop/hh448186.aspx"&gt;docs&lt;/a&gt; which make this look like a Windows 8 feature again. Anyway. It does work on Windows 7 perfectly well and I do not care about the exact details. &lt;/p&gt;  &lt;p&gt;Now we can start VS 2012 a second time and compare CPU and disc utilization.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/4c28ad36cab6_11551/image_10.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/4c28ad36cab6_11551/image_thumb_4.png" width="1455" height="770" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The second startup is much faster. It is only about 3s compared to 6s the first time. The warm vs cold startup performance is a mystery you cannot solve with most profilers which are commercially available. Now you can see it all with a free tool. The disc utilization is practically zero this time. We do spend only 13ms accessing the disc compared to 2163ms during the first startup. All other data comes from the OS cache which boils down to a memcpy when the application does call ReadFile. As you can see this view does not give you an API level view of the disc where we read from “disc” whenever we call CreateFile and ReadFile to open and read a file. 
This gives you direct insights when the hard disc is shuffling data around. &lt;/p&gt;  &lt;p&gt;This tool has the potential to supersede many other profilers because you get a holistic view of the complete system with full managed call stack support now. &lt;/p&gt;  &lt;p&gt;There is much more to say about this fantastic tool. But I need some sleep now. &lt;/p&gt; &lt;img src="http://geekswithblogs.net/akraus1/aggbug/151604.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Alois Kraus</dc:creator>
            <guid>http://geekswithblogs.net/akraus1/archive/2012/12/19/151604.aspx</guid>
            <pubDate>Wed, 19 Dec 2012 23:50:35 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/akraus1/comments/151604.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/akraus1/archive/2012/12/19/151604.aspx#feedback</comments>
            <slash:comments>1</slash:comments>
            <wfw:commentRss>http://geekswithblogs.net/akraus1/comments/commentRss/151604.aspx</wfw:commentRss>
            <trackback:ping>http://geekswithblogs.net/akraus1/services/trackbacks/151604.aspx</trackback:ping>
        </item>
        <item>
            <title>Profiling Startup Of VS2012 &amp;ndash; SpeedTrace Profiler</title>
            <link>http://geekswithblogs.net/akraus1/archive/2012/12/02/151431.aspx</link>
            <description>&lt;p&gt;Originally posted on: &lt;a href='http://geekswithblogs.net/akraus1/archive/2012/12/02/151431.aspx'&gt;http://geekswithblogs.net/akraus1/archive/2012/12/02/151431.aspx&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="http://speedtracepro.com/product/features/analytic_components/"&gt;SpeedTrace&lt;/a&gt; is a relatively unknown profiler made by a company called Ipcas. A single professional license does cost &lt;a href="http://speedtracepro.com/product/editions-prices/"&gt;449€+VAT&lt;/a&gt;. For the test I did use SpeedTrace 4.5 which is currently in beta. Although it is cheaper than dotTrace it has by far the most options to influence how profiling does work. &lt;/p&gt;  &lt;p&gt;First you need to create a tracing project which does configure tracing for one process type. You can start the application directly from the profiler or (much more interesting) let it attach to a specific process when that process is started. For this you need to check the “Trace the specified …” radio button and enter the process name in the “Process Name of the Trace” edit box. You can even selectively enable tracing for processes with a specific command line. Then you need to activate the trace project by pressing the Activate Project button and you are ready to start VS as usual. If you want to profile the next 10 VS instances that you start you can set the Number of Processes counter to e.g. 10. This is immensely helpful if you are trying to profile only the next few started processes. 
&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_2.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_thumb.png" width="1271" height="715" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;As you can see there are many more tabs which allow you to influence tracing in a much more sophisticated way. SpeedTrace is &lt;del&gt;the only&lt;/del&gt; a profiler which does not rely entirely on the profiling API of .NET. Instead it does modify the IL code (instrumentation on the fly) to write tracing information to disc which can later be analyzed. This approach is not only very fast but it does give you unprecedented analysis capabilities. Once the traces are collected they do show up in your workspace where you can open the trace viewer.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_6.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_thumb_2.png" width="1457" height="1026" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;I do skip the other windows because this view is the most useful one. You can sort the methods not only by wall clock time but also by CPU consumption and wait time, which none of the other products support in their views at the same time. If you want to optimize for CPU consumption sort by CPU time. If you want to find out where most time is spent you need Clock Total time and Clock Waiting. 
There you can directly see if the method did take long because it did wait on something or because it did really execute stuff that did take so long.&lt;/p&gt;  &lt;p&gt;Once you have found a method you want to drill deeper into you can double click on it to get to the Caller/Callee view which is similar to the JetBrains Method Grid view. But this time you do see much more. In the middle is the clicked method. Above are the methods that call you and below are the methods that you do directly call. Normally you would then start digging deeper to find the end of the chain where the slow method worth optimizing is located. But there is a shortcut. &lt;/p&gt;  &lt;p&gt;You can press the magic &lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_10.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_thumb_4.png" width="34" height="26" /&gt;&lt;/a&gt;  button to calculate the aggregation of all called methods. This is displayed in the lower left window where you can see each method call and how long it did take. There you can also sort to see if this call stack does only contain methods not worth optimizing (e.g. WCF connect calls which you cannot make faster). 
YourKit has a similar feature where it is called Callees List.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_8.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_thumb_3.png" width="1453" height="1023" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;In the Functions tab you have in the context menu also many other useful analysis options &lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_12.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_thumb_5.png" width="406" height="268" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;One really outstanding feature is the View Call History Drilldown. When you select this one you get not a sum of all method invocations but a list with the duration of each method call. This is not surprising since SpeedTrace does use tracing to get its timings. There you can get many useful graphs how this method did behave over time. Did it become slower at some point in time or was only the first call slow? The diagrams and the list will tell you that. 
&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_14.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_thumb_6.png" width="863" height="377" /&gt;&lt;/a&gt; &lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_16.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_thumb_7.png" width="866" height="376" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_18.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_thumb_8.png" width="1028" height="446" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;That is all fine but what should I do when one method call was slow? I want to see from where it was coming from. No problem select the method in the list hit F10 and you get the call stack. This is a life saver if you e.g. search for serialization problems. Today Serializers are used everywhere. You want to find out from where the 5s XmlSerializer.Deserialize call did come from? Hit F10 and you get the call stack which did invoke the 5s Deserialize call. 
The CPU timeline tab is also useful to find out where long pauses or excessive CPU consumption did happen. Click in the graph to get the Thread Stacks window where you can get a quick overview of what all threads were doing at this time. This does look like the Stack Traces feature in YourKit. Only this time you get the last called method first which helps to quickly see what all threads were executing at this moment. YourKit does generate a rather long list which can be hard to go through when you have many threads. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_22.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012SpeedtraceProfil_14A85/image_thumb_10.png" width="1455" height="1020" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;The thread list in the middle does not give you call stacks or anything like that but you see which methods were found most often executing code by the profiler which is a good indication for methods consuming most CPU time. &lt;/p&gt;  &lt;p&gt;Does this sound too good to be true? I have not told you the best part yet. The best thing about this profiler is the staff behind it. When I do see a crash or some other odd behavior I send a mail to Ipcas and usually the next day I do get a mail that the problem has been fixed, with a download link to the new version. The guys at Ipcas are even so helpful as to log in to your machine via a Citrix client to help you get started with profiling your actual application. After a 2h conference call I was converted from a hater to a believer in this tool. The fast response time might also have something to do with the fact that they are actively working to get 4.5 out of the door. 
But still the support is by far the best I have encountered so far. &lt;/p&gt;  &lt;p&gt;The only downside is that you should instrument your assemblies including the .NET Framework to get most accurate numbers. You can profile without doing it but then you will see very high JIT times in your process which can severely affect the correctness of the measured timings. If you do not care about exact numbers you can also enable in the main UI in the Data Trace tab logging of method arguments of primitive types. If you need to know what files at which times were opened by your application you can find it out without a debugger. &lt;/p&gt;  &lt;p&gt;Since SpeedTrace does read huge trace files in its reader you should perhaps use a 64 bit machine to be able to analyze bigger traces as well. The memory consumption of the trace reader is too high for my taste. But they did promise for the next version to come up with something much improved.&lt;/p&gt; &lt;img src="http://geekswithblogs.net/akraus1/aggbug/151431.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Alois Kraus</dc:creator>
            <guid>http://geekswithblogs.net/akraus1/archive/2012/12/02/151431.aspx</guid>
            <pubDate>Sun, 02 Dec 2012 16:17:02 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/akraus1/comments/151431.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/akraus1/archive/2012/12/02/151431.aspx#feedback</comments>
            <slash:comments>2</slash:comments>
            <wfw:commentRss>http://geekswithblogs.net/akraus1/comments/commentRss/151431.aspx</wfw:commentRss>
            <trackback:ping>http://geekswithblogs.net/akraus1/services/trackbacks/151431.aspx</trackback:ping>
        </item>
        <item>
            <title>Profiling Startup Of VS2012 &amp;ndash; JustTrace Profiler</title>
            <link>http://geekswithblogs.net/akraus1/archive/2012/12/02/151430.aspx</link>
            <description>&lt;p&gt;Originally posted on: &lt;a href='http://geekswithblogs.net/akraus1/archive/2012/12/02/151430.aspx'&gt;http://geekswithblogs.net/akraus1/archive/2012/12/02/151430.aspx&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="http://www.telerik.com/products/memory-performance-profiler/justtrace_features.aspx"&gt;JustTrace&lt;/a&gt; is made by Telerik which is mainly known for its collection of UI controls. The current version (2012.3.1127.0) does include a performance and memory profiler which does cost 614€ and is currently on sale with a special offer &lt;a href="http://www.telerik.com/purchase/individual-justtrace.aspx"&gt;for 306€&lt;/a&gt;. It does include one year of free upgrades. The uneven € numbers are calculated from the 799$ regular and 399$ discount price. The UI is already in Metro style and simple to use. Multi process, &lt;del&gt;attach&lt;/del&gt;, &lt;strong&gt;Update (attach is supported when you select as Profilee type “Running Application” in sampling mode) &lt;/strong&gt;method recording filter are not supported.&lt;/p&gt;   
&lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_2.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_thumb.png" width="622" height="455" /&gt;&lt;/a&gt; &lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_4.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_thumb_1.png" width="471" height="383" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;It looks like JustTrace is, like Ants, a Just My Code profiler by default. &lt;del&gt;At least I have not (yet) found an option to see all methods.&lt;/del&gt; &lt;strong&gt;Update:&lt;/strong&gt; &lt;strong&gt;There is a button there as well. I did not press it since it did look disabled to me. I guess I am just too old to spot the difference between disabled and enabled. You need to press the Minor Methods button to see all methods. Now I do see much more but unfortunately I still cannot sort by total time which is strange for a profiler. In the All Threads window there is a Total Time column but I do not get why I do not have it in the All Methods view. &lt;/strong&gt;For stuff where you do not have the pdbs or where you want to dig deeper into the BCL code you will not get far. After getting the profile data you get in the All Methods grid a plain list with hit count and own time. 
The method list for all methods is also suspiciously short which is a clear sign that you will not get far during the analysis of foreign code. It looks like this is intentional because I cannot configure the displayed columns to get at least the total time as well.&lt;/p&gt;   &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_6.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_thumb_2.png" width="1490" height="709" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;But at least there is also a memory profiler included. For this I have to choose “Memory Profiler” as Profiling Type in the first window to check the memory consumption of VS.  There are some interesting numbers to see but I do really miss the thread stack window from YourKit. How am I supposed to get a clue in which places I should look when much memory is allocated and the CPU consumption is high? 
&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/clip_image001_2.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="clip_image001" border="0" alt="clip_image001" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/clip_image001_thumb.png" width="1491" height="642" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The Snapshot summary gives a rough overview which is ok for a first impression.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_8.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_thumb_3.png" width="1491" height="630" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;Next is Assemblies? This gives you a list of all loaded assemblies. Not terribly useful.&lt;/p&gt;  &lt;p&gt; &lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_10.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_thumb_4.png" width="1187" height="497" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The By Type view gives you exactly what it is supposed to do. 
You have to keep in mind that this list is filtered by the types you did check in the Assemblies list.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_12.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_thumb_5.png" width="1489" height="687" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The By Type instance list does only show types from assemblies which do not originate from Microsoft. By default mscorlib and System are not checked. That is the reason why for the first time my By Type window looked like&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_18.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_thumb_8.png" width="1382" height="359" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The idea behind this feature is to show only your instances because you are ultimately responsible for the overall memory consumption. I am not sure if I do like this feature because by default it does hide too much. I do want to see at least how many strings and arrays are allocated. A simple namespace filter would also do it in my opinion. Now you can examine all string instances and look who in the object graph does keep a reference on them. 
&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_14.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_thumb_6.png" width="876" height="536" /&gt;&lt;/a&gt; &lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_16.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012JustTraceProfile_C974/image_thumb_7.png" width="856" height="547" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;That is nice but YourKit has the big plus that you can also look into the string contents.  I am also not sure how cycles are visualized in the graph and what will happen if you have thousands of objects referencing you. That's pretty much it about JustTrace. It can help the average developer to pinpoint performance and memory issues by just looking at their own code and instances. Showing them more will not help them because the sheer amount of information will overwhelm them. And you need to have a pretty good understanding of how the GC and the CLR work. When you have a performance issue at a customer machine it is sometimes very helpful to be able to bring a profiler onto the machine (no pdbs, …) and to get a full snapshot of all processes which are involved in the problematic use case. For these more advanced use cases JustTrace is certainly the wrong tool. 
&lt;/p&gt;  &lt;p&gt;Next: SpeedTrace&lt;/p&gt; &lt;img src="http://geekswithblogs.net/akraus1/aggbug/151430.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Alois Kraus</dc:creator>
            <guid>http://geekswithblogs.net/akraus1/archive/2012/12/02/151430.aspx</guid>
            <pubDate>Sun, 02 Dec 2012 15:25:26 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/akraus1/comments/151430.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/akraus1/archive/2012/12/02/151430.aspx#feedback</comments>
            <slash:comments>1</slash:comments>
            <wfw:commentRss>http://geekswithblogs.net/akraus1/comments/commentRss/151430.aspx</wfw:commentRss>
            <trackback:ping>http://geekswithblogs.net/akraus1/services/trackbacks/151430.aspx</trackback:ping>
        </item>
        <item>
            <title>Profiling Startup Of VS2012 &amp;ndash; YourKit Profiler</title>
            <link>http://geekswithblogs.net/akraus1/archive/2012/12/02/151429.aspx</link>
            <description>&lt;p&gt;Originally posted on: &lt;a href='http://geekswithblogs.net/akraus1/archive/2012/12/02/151429.aspx'&gt;http://geekswithblogs.net/akraus1/archive/2012/12/02/151429.aspx&lt;/a&gt;&lt;/p&gt;&lt;p&gt; &lt;/p&gt;  &lt;p&gt;The &lt;a href="http://www.yourkit.com/dotnet/features/index.jsp"&gt;YourKit&lt;/a&gt; (v7.0.5) profiler is interesting in terms of &lt;a href="http://www.yourkit.com/dotnet/purchase/index.jsp"&gt;price&lt;/a&gt; (79€ single place license, 409€ + 1 year support and upgrades) and feature set. You do get a performance and memory profiler in one package for which you normally need also to pay extra from the other vendors. As an interesting side note the profiler UI is written in Java because they do also sell Java profilers with the same feature set. To get all methods of a VS startup you need first to configure it to include System* in the profiled methods and you need to configure * to measure wall clock time. By default it does record only CPU times which allows you to optimize CPU hungry operations. But you will never see a Thread.Sleep(10000) in the profiler blocking the UI in this mode. 
&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_2.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_thumb.png" width="521" height="149" /&gt;&lt;/a&gt; &lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_4.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_thumb_1.png" width="462" height="116" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;It can profile as all others processes started from within the profiler but it can also profile the next or all started processes.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_6.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_thumb_2.png" width="264" height="178" /&gt;&lt;/a&gt; &lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_8.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_thumb_3.png" 
width="437" height="323" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;As usual it can profile in sampling and tracing mode. But since it is a memory profiler as well it does by default also record all object allocations &amp;gt; 1MB. With allocation recording enabled VS2012 did crash but without allocation recording there were no problems. The CPU tab contains the time line of the application and when you click in the graph you get the call stacks of all threads at this time. This is really a nice feature. When you select a time region you get the CPU usage estimation for this time window.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_12.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_thumb_5.png" width="1386" height="803" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;I have seen many applications consuming 100% CPU only because they did create garbage like crazy. For this the Garbage Collection tab is interesting in conjunction with a time range.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_14.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_thumb_6.png" width="1192" height="652" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;This view is like the CPU tab only that the CPU graph (green) is missing. All relevant information except for GCs/s is already visible in the CPU tab. 
Very handy to pinpoint excessive GC or CPU bound issues.&lt;/p&gt;  &lt;p&gt;The Threads tab does show the thread names and their lifetime. This is useful to see thread interactions or which thread is hottest in terms of CPU consumption.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_28.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_thumb_13.png" width="1385" height="850" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;On the CPU tab the call tree does exist in a merged and thread specific view. When you click on a method you get below a list of all called methods. There you can sort for methods with a high own time which are worth optimizing.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_16.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_thumb_7.png" width="1432" height="853" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;In the Method List you can select which scope you want to see. 
Back Traces are the methods which did call you.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_18.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_thumb_8.png" width="1213" height="273" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Callees is the list of methods called directly or indirectly by your method as a flat list. This is not a call stack but still very useful to see which methods were slow so you can see the “root” cause quite quickly without the need to click through long call stacks. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_20.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_thumb_9.png" width="1220" height="334" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The last view Merged Callees is a call stack view of the previous view. This does help a lot to understand who did call each method at run time. You would get the same view with a debugger for one call invocation but here you get the full statistics (invocation count) as well. 
&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_22.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_thumb_10.png" width="1215" height="274" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Since YourKit is also a memory profiler you can directly see which objects you have on your managed heap and which objects do hold most of your precious memory. In the Object Explorer view you can also examine the contents of your objects (strings or whatsoever) to get a better understanding which objects were potentially allocating this stuff. &lt;/p&gt;  &lt;p&gt; &lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_24.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_thumb_11.png" width="1235" height="858" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;YourKit is a very easy to use combined memory and performance profiler in one product. The unbeatable single license price makes it very attractive to buy it straight away. Although it is a Java UI it is very responsive and the memory consumption is considerably lower compared to dotTrace and ANTS profiler. What I do really like is to start the YourKit UI and then start the processes I want to profile as usual. There is no need to alter your own application code to be able to inject a profiler into your newly started processes. 
For performance and memory profiling you can simply select the process you want to investigate from the list of started processes. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_26.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012YourKitProfiler_AABA/image_thumb_12.png" width="702" height="261" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;That's the way I like to use profilers. Just get out of the way and let the application run without any special preparations. &lt;/p&gt;  &lt;p&gt; &lt;/p&gt;  &lt;p&gt;Next: Telerik JustTrace&lt;/p&gt; &lt;img src="http://geekswithblogs.net/akraus1/aggbug/151429.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Alois Kraus</dc:creator>
            <guid>http://geekswithblogs.net/akraus1/archive/2012/12/02/151429.aspx</guid>
            <pubDate>Sun, 02 Dec 2012 13:14:04 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/akraus1/comments/151429.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/akraus1/archive/2012/12/02/151429.aspx#feedback</comments>
            <slash:comments>2</slash:comments>
            <wfw:commentRss>http://geekswithblogs.net/akraus1/comments/commentRss/151429.aspx</wfw:commentRss>
            <trackback:ping>http://geekswithblogs.net/akraus1/services/trackbacks/151429.aspx</trackback:ping>
        </item>
        <item>
            <title>Profiling Startup Of VS2012 &amp;ndash; dotTrace Profiler</title>
            <link>http://geekswithblogs.net/akraus1/archive/2012/12/01/151413.aspx</link>
            <description>&lt;p&gt;Originally posted on: &lt;a href='http://geekswithblogs.net/akraus1/archive/2012/12/01/151413.aspx'&gt;http://geekswithblogs.net/akraus1/archive/2012/12/01/151413.aspx&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="http://www.jetbrains.com/profiler/"&gt;JetBrains&lt;/a&gt; which is famous for the ReSharper tool has also a profiler in its portfolio. I downloaded dotTrace 5.2 Professional (&lt;a href="http://www.jetbrains.com/profiler/buy/index.jsp"&gt;569€+VAT&lt;/a&gt;) to check how far I can profile the startup of VS2012. The most interesting startup option is &lt;strong&gt;“.NET Process”.&lt;/strong&gt; With that you can profile the next started .NET process which is very useful if you want to profile an application which is not started by you. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012dotTraceProfiler_136F1/image_2.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012dotTraceProfiler_136F1/image_thumb.png" width="329" height="409" /&gt;&lt;/a&gt; &lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012dotTraceProfiler_136F1/image_14.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012dotTraceProfiler_136F1/image_thumb_6.png" width="644" height="423" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt; &lt;/p&gt;  &lt;p&gt;I did select Tracing and Wall time to get similar options across all profilers. For some reason the attach option did not work with .NET 4.5 on my home machine. 
But I am sure that it did work with .NET 4.0 some time ago. Since we are profiling devenv.exe we can also select &lt;strong&gt;“Standalone Application”&lt;/strong&gt; and start it from the profiler. The startup time of VS does increase about a factor 3 but that is ok. You get mainly three windows to work with. The first one shows the threads where you can drill down thread wise where most time is spent. &lt;/p&gt;  &lt;p&gt;I &lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012dotTraceProfiler_136F1/image_6.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012dotTraceProfiler_136F1/image_thumb_2.png" width="1334" height="676" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The next window is the call tree which does merge all threads together in a similar view.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012dotTraceProfiler_136F1/image_16.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012dotTraceProfiler_136F1/image_thumb_7.png" width="1343" height="812" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The last and most useful view in my opinion is the Plain List window which is nearly the same as the Method Grid in Ants Profiler. But this time we do get when I enable the Show system functions checkbox not a 150 but 19407 methods to choose from! 
I really tried with Ants Profiler to find out something about how VS does work but look how much we were missing!&lt;/p&gt;  &lt;p&gt;When I double click on a method I do get in the lower pane the called methods and their respective timings. This is something really useful and I can nicely drill down to the most important stuff. The measured time seems to be Wall Clock time which is a good thing to see where my time is really spent. You can also use Sampling as profiling method but this does give you much less information. Except for getting a first idea where to look this profiling mode is not very useful to understand how your system does interact. &lt;/p&gt;  &lt;p&gt; &lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012dotTraceProfiler_136F1/image_8.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012dotTraceProfiler_136F1/image_thumb_3.png" width="1337" height="920" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The options have a good list of presets to hide by default many methods and gray them out to concentrate on your code. It does not filter anything out if you enable Show system functions. 
By default methods from these assemblies are hidden or if the checkbox is checked grayed out.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012dotTraceProfiler_136F1/image_10.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012dotTraceProfiler_136F1/image_thumb_4.png" width="644" height="480" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;All in all JetBrains has made a nice profiler which does show great detail and it has nice drill down capabilities. The only thing is that I do not trust its measured timings. I did fall several times into the trap with this one to optimize at places which were already fast but the profiler did show high times in these methods. After measuring with Tracing I was certain that the measured times were greatly exaggerated. Especially when IO is involved it seems to have a hard time to subtract its own overhead. What I did miss most was the possibility to profile not only the next started process but to be able to select a process by name and perhaps a count to profile the next n processes of this name. &lt;/p&gt;  &lt;p&gt;Next: YourKit&lt;/p&gt; &lt;img src="http://geekswithblogs.net/akraus1/aggbug/151413.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Alois Kraus</dc:creator>
            <guid>http://geekswithblogs.net/akraus1/archive/2012/12/01/151413.aspx</guid>
            <pubDate>Sat, 01 Dec 2012 21:12:13 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/akraus1/comments/151413.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/akraus1/archive/2012/12/01/151413.aspx#feedback</comments>
            <wfw:commentRss>http://geekswithblogs.net/akraus1/comments/commentRss/151413.aspx</wfw:commentRss>
            <trackback:ping>http://geekswithblogs.net/akraus1/services/trackbacks/151413.aspx</trackback:ping>
        </item>
        <item>
            <title>Profiling Startup Of VS2012 &amp;ndash; Ants Profiler</title>
            <link>http://geekswithblogs.net/akraus1/archive/2012/11/30/151409.aspx</link>
            <description>&lt;p&gt;Originally posted on: &lt;a href='http://geekswithblogs.net/akraus1/archive/2012/11/30/151409.aspx'&gt;http://geekswithblogs.net/akraus1/archive/2012/11/30/151409.aspx&lt;/a&gt;&lt;/p&gt;&lt;p&gt;I just downloaded &lt;a href="http://www.red-gate.com/products/dotnet-development/ants-performance-profiler/"&gt;ANTS Profiler&lt;/a&gt; 7.4 to check how fast it is and how deep I can analyze the startup of Visual Studio 2012. The Pro version which is useful does cost &lt;a href="http://www.red-gate.com/products/dotnet-development/ants-performance-profiler/pricing"&gt;445€&lt;/a&gt; which is ok. To measure a complex system I decided to simply profile VS2012 (Update 1) on my older Intel 6600 2.4GHz with 3 GB RAM and a 32 bit Windows 7. Ants Profiler is really easy to use. So let's try it out. &lt;/p&gt;  &lt;p&gt;The Ants Profiler does want to start the profiled application on its own which seems to be rather common. I did choose Method Level timing of all managed methods. In the configuration menu I did want to get all call stacks to get full details. Once this is configured you are ready to go. 
&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012AntsProfiler_12EAF/image_2.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012AntsProfiler_12EAF/image_thumb.png" width="689" height="534" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt; &lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012AntsProfiler_12EAF/image_4.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012AntsProfiler_12EAF/image_thumb_1.png" width="646" height="316" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;After that you can select the Method Grid to view Wall Clock Time in ms. I hate percentages which are on by default because I do want to look where absolute time is spent and not something else. 
&lt;/p&gt;  &lt;p&gt; &lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012AntsProfiler_12EAF/image_8.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012AntsProfiler_12EAF/image_thumb_3.png" width="1158" height="1030" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;From the Method Grid I can drill down in a nice way to see where time is spent and I can look at the decompiled methods where the time is spent.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012AntsProfiler_12EAF/image_12.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012AntsProfiler_12EAF/image_thumb_5.png" width="1386" height="1029" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;This does really look nice. But did you see the size of the scroll bar in the method grid? Although I wanted all call stacks I do get only about 4 pages of methods to drill down. From the scroll bar count I would guess that the profiler does show me about 150 methods for the complete VS startup. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; I did forget to uncheck the checkbox Hide Insignificant methods. Sorry about that one. After that I do get a reasonable method list. 
Yes, I am aware that I can attach to a process later but then I do get only sampled call stacks and not full details like with tracing which is much better.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012AntsProfiler_12EAF/image_9.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/ProfilingStartupOfVS2012AntsProfiler_12EAF/image_thumb_2.png" width="1375" height="517" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;/p&gt;&lt;del&gt;This is nonsense. I will never find a bottleneck in VS when I am presented only a fraction of the methods that were actually executed. I have also tried in the configuration window to also profile the extremely trivial functions but there was no noticeable difference. It seems that the Ants Profiler does filter away way too many details to be useful for bigger systems.&lt;/del&gt; If you want to optimize a CPU bound operation inside NUnit then Ants Profiler is with its line level timings a very nice tool to work with. But for bigger stuff it is certainly not usable. I also do not like that I must start the profiled application from the profiler UI. This makes it hard to profile processes which are started by some other process.  &lt;p&gt;Next: JetBrains dotTrace&lt;/p&gt; &lt;img src="http://geekswithblogs.net/akraus1/aggbug/151409.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Alois Kraus</dc:creator>
            <guid>http://geekswithblogs.net/akraus1/archive/2012/11/30/151409.aspx</guid>
            <pubDate>Fri, 30 Nov 2012 23:33:55 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/akraus1/comments/151409.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/akraus1/archive/2012/11/30/151409.aspx#feedback</comments>
            <wfw:commentRss>http://geekswithblogs.net/akraus1/comments/commentRss/151409.aspx</wfw:commentRss>
            <trackback:ping>http://geekswithblogs.net/akraus1/services/trackbacks/151409.aspx</trackback:ping>
        </item>
        <item>
            <title>Performance Optimization &amp;ndash; It Is Faster When You Can Measure It</title>
            <link>http://geekswithblogs.net/akraus1/archive/2012/11/30/151407.aspx</link>
            <description>&lt;p&gt;Originally posted on: &lt;a href='http://geekswithblogs.net/akraus1/archive/2012/11/30/151407.aspx'&gt;http://geekswithblogs.net/akraus1/archive/2012/11/30/151407.aspx&lt;/a&gt;&lt;/p&gt;&lt;p&gt;Performance optimization in bigger systems is hard because the measured numbers can vary greatly depending on the measurement method of your choice. To measure execution timing of specific methods in your application you usually use&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="956"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="204"&gt;&lt;strong&gt;Time Measurement Method&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="750"&gt;&lt;strong&gt;Potential Pitfalls&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="207"&gt;Stopwatch&lt;/td&gt;        &lt;td valign="top" width="747"&gt;Most accurate method on recent processors. Internally it uses QueryPerformanceCounter, which on recent processors is typically backed by the &lt;a href="http://en.wikipedia.org/wiki/Time_Stamp_Counter"&gt;RDTSC&lt;/a&gt; instruction. Since that counter is processor specific you can get greatly different values when your thread is scheduled to another core or the core goes into a power saving mode. But things do change luckily:           &lt;br /&gt;          &lt;p&gt;Intel's Designer's vol3b, section 16.11.1&lt;/p&gt;          &lt;blockquote&gt;           &lt;p&gt;"16.11.1 Invariant TSC &lt;/p&gt;            &lt;p&gt;The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor's support for invariant TSC is indicated by CPUID.80000007H:EDX[8]. &lt;/p&gt;            &lt;p&gt;The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). 
TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource."&lt;/p&gt;         &lt;/blockquote&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="210"&gt;DateTime.Now&lt;/td&gt;        &lt;td valign="top" width="745"&gt;Good but it has only a resolution of 16ms which can be not enough if you want more accuracy.&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt; &lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="961"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="205"&gt;&lt;strong&gt;Reporting Method&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="754"&gt;&lt;strong&gt;Potential Pitfalls&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="209"&gt;Console.WriteLine&lt;/td&gt;        &lt;td valign="top" width="751"&gt;Ok if not called too often.&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="212"&gt;Debug.Print&lt;/td&gt;        &lt;td valign="top" width="748"&gt;Are you really measuring performance with Debug Builds? Shame on you.&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="215"&gt;Trace.WriteLine&lt;/td&gt;        &lt;td valign="top" width="746"&gt;Better but you need to plug in some good output listener like a trace file. But be aware that the first time you call this method it will read your app.config and deserialize your system.diagnostics section which does also take time. &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt; &lt;/p&gt;  &lt;p&gt;In general it is a good idea to use some tracing library which does measure the timing for you and you only need to decorate some methods with tracing so you can later verify if something has changed for the better or worse. In my previous article I did compare measuring performance with quantum mechanics. 
This analogy does work surprisingly well. When you measure a quantum system there is a lower limit how accurately you can measure something. The Heisenberg uncertainty relation does tell us that you cannot measure the momentum and the position of a particle in a quantum system at the same time with infinite accuracy. For programmers the two variables are execution time and memory allocations. &lt;/p&gt;  &lt;p&gt;If you try to measure the timings of all methods in your application you will need to store them somewhere. The fastest storage space besides the CPU cache is the memory. But if your timing values do consume all available memory there is no memory left for the actual application to run. &lt;/p&gt;  &lt;p&gt;On the other hand if you try to record all memory allocations of your application you will also need to store the data somewhere. This will cost you memory and execution time. &lt;/p&gt;  &lt;p&gt;These constraints are always there regardless of how good the marketing of tool vendors for performance and memory profilers is: &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;Any measurement will disturb the system in a non predictable way. &lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Commercial tool vendors will tell you they do calculate this overhead and subtract it from the measured values to give you the most accurate values but in reality it is not entirely true. After falling into the trap of trusting the profiler timings several times I have got into the habit to &lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;     &lt;h5&gt;Measure with a profiler to get an idea where potential bottlenecks are. 
&lt;/h5&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;h5&gt;Measure again with tracing only the specific methods to check if these methods are really worth optimizing.&lt;/h5&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;h5&gt;Optimize it.&lt;/h5&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;h5&gt;Measure again.&lt;/h5&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;h5&gt;&lt;em&gt;Be surprised that your optimization has made things worse.&lt;/em&gt;&lt;/h5&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;h5&gt;Think harder.&lt;/h5&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;h5&gt;Implement something that really works.&lt;/h5&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;h5&gt;Measure again.&lt;/h5&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;h5&gt;Finished! - Or look for the next bottleneck.&lt;/h5&gt;   &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;Recently I have looked into issues with serialization performance. For serialization DataContractSerializer was used and I was not sure if XML is really the optimal wire format. After looking around I have found &lt;a href="http://code.google.com/p/protobuf-net/"&gt;protobuf-net&lt;/a&gt; which uses &lt;a href="http://en.wikipedia.org/wiki/Protocol_Buffers"&gt;Google's Protocol Buffers&lt;/a&gt; format which is a compact binary serialization format. What is good for Google should be good for us. &lt;/p&gt;  &lt;p&gt;A small sample app to check out performance was a matter of minutes:&lt;/p&gt;  &lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;using&lt;/span&gt; ProtoBuf;
&lt;span class="kwrd"&gt;using&lt;/span&gt; System;
&lt;span class="kwrd"&gt;using&lt;/span&gt; System.Diagnostics;
&lt;span class="kwrd"&gt;using&lt;/span&gt; System.IO;
&lt;span class="kwrd"&gt;using&lt;/span&gt; System.Reflection;
&lt;span class="kwrd"&gt;using&lt;/span&gt; System.Runtime.Serialization;

[DataContract, Serializable]
&lt;span class="kwrd"&gt;class&lt;/span&gt; Data
{
    [DataMember(Order=1)]
    &lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;int&lt;/span&gt; IntValue { get; set; }

    [DataMember(Order = 2)]
    &lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;string&lt;/span&gt; StringValue { get; set; }

    [DataMember(Order = 3)]
    &lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;bool&lt;/span&gt; IsActivated { get; set; }

    [DataMember(Order = 4)]
    &lt;span class="kwrd"&gt;public&lt;/span&gt; BindingFlags Flags { get; set; }
}

&lt;span class="kwrd"&gt;class&lt;/span&gt; Program
{
    &lt;span class="kwrd"&gt;static&lt;/span&gt; MemoryStream _Stream = &lt;span class="kwrd"&gt;new&lt;/span&gt; MemoryStream();

    &lt;span class="kwrd"&gt;static&lt;/span&gt; MemoryStream Stream 
    {
        get 
        {
            _Stream.Position = 0;
            _Stream.SetLength(0);
            &lt;span class="kwrd"&gt;return&lt;/span&gt; _Stream;
        }
    }

    &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Main(&lt;span class="kwrd"&gt;string&lt;/span&gt;[] args)
    {
        DataContractSerializer ser = &lt;span class="kwrd"&gt;new&lt;/span&gt; DataContractSerializer(&lt;span class="kwrd"&gt;typeof&lt;/span&gt;(Data));
        Data data = &lt;span class="kwrd"&gt;new&lt;/span&gt; Data 
        { 
            IntValue = 100, 
            IsActivated = &lt;span class="kwrd"&gt;true&lt;/span&gt;, 
            StringValue = &lt;span class="str"&gt;"Hi this is a small string value to check if serialization does work as expected"&lt;/span&gt; 
        };

        var sw = Stopwatch.StartNew();
        &lt;span class="kwrd"&gt;int&lt;/span&gt; Runs = 1000 * 1000;

        &lt;span class="kwrd"&gt;for&lt;/span&gt; (&lt;span class="kwrd"&gt;int&lt;/span&gt; i = 0; i &amp;lt; Runs; i++)
        {
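            &lt;span class="rem"&gt;// Swap the two calls below to benchmark DataContractSerializer instead of protobuf-net.&lt;/span&gt;
            &lt;span class="rem"&gt;//ser.WriteObject(Stream, data);&lt;/span&gt;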
            Serializer.Serialize&amp;lt;Data&amp;gt;(Stream, data);
        }
        sw.Stop();
        Console.WriteLine(&lt;span class="str"&gt;"Did take {0:N0}ms for {1:N0} objects"&lt;/span&gt;, sw.Elapsed.TotalMilliseconds, Runs);
        Console.ReadLine();
    }
}&lt;/pre&gt;

&lt;p align="left"&gt;&lt;/p&gt;&lt;style type="text/css"&gt;&lt;![CDATA[
.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }]]&gt;&lt;/style&gt;The results are indeed promising: 

&lt;div&gt;
  &lt;table border="1" cellspacing="0" cellpadding="2" width="400"&gt;&lt;tbody&gt;
      &lt;tr&gt;
        &lt;td valign="top" width="146"&gt;Serializer&lt;/td&gt;

        &lt;td valign="top" width="132"&gt;Time in ms&lt;/td&gt;

        &lt;td valign="top" width="120"&gt;N objects&lt;/td&gt;
      &lt;/tr&gt;

      &lt;tr&gt;
        &lt;td valign="top" width="147"&gt;protobuf-net&lt;/td&gt;

        &lt;td valign="top" width="131"&gt;  807&lt;/td&gt;

        &lt;td valign="top" width="120"&gt;1000000&lt;/td&gt;
      &lt;/tr&gt;

      &lt;tr&gt;
        &lt;td valign="top" width="148"&gt;DataContract&lt;/td&gt;

        &lt;td valign="top" width="131"&gt;4402&lt;/td&gt;

        &lt;td valign="top" width="120"&gt;1000000&lt;/td&gt;
      &lt;/tr&gt;
    &lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;

&lt;p align="left"&gt;More than a factor of 5 faster and a much more compact wire format. Let's use it! After switching over to protobuf-net the transferred wire data dropped by a factor of two (good) but the performance worsened by nearly a factor of two. How is that possible? We have measured it! Protobuf-net is much faster! As it turns out protobuf-net is faster but it has a cost: the first time a type is de/serialized it uses some very smart code-gen which does not come for free. Let's measure this by setting the Runs value of our performance test app not to one million but to 1.&lt;/p&gt;

&lt;div&gt;
  &lt;table border="1" cellspacing="0" cellpadding="2" width="400"&gt;&lt;tbody&gt;
      &lt;tr&gt;
        &lt;td valign="top" width="146"&gt;Serializer&lt;/td&gt;

        &lt;td valign="top" width="132"&gt;Time in ms&lt;/td&gt;

        &lt;td valign="top" width="120"&gt;N objects&lt;/td&gt;
      &lt;/tr&gt;

      &lt;tr&gt;
        &lt;td valign="top" width="147"&gt;protobuf-net&lt;/td&gt;

        &lt;td valign="top" width="131"&gt;85&lt;/td&gt;

        &lt;td valign="top" width="120"&gt;1&lt;/td&gt;
      &lt;/tr&gt;

      &lt;tr&gt;
        &lt;td valign="top" width="148"&gt;DataContract&lt;/td&gt;

        &lt;td valign="top" width="131"&gt;24&lt;/td&gt;

        &lt;td valign="top" width="120"&gt;1&lt;/td&gt;
      &lt;/tr&gt;
    &lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;


&lt;p&gt;The code-gen overhead is significant and can take up to 200ms for more complex types. The simple type measured above already pays about 60ms extra on first use (85ms versus 24ms) and saves only about 3.6µs per object afterwards ((4402ms - 807ms) / 1,000,000), so even it needs roughly 17,000 objects to break even. In general the break-even point where the code-gen cost is amortized by the faster serialization performance is (assuming small objects) somewhere between 20,000 and 40,000 serialized objects. As it turned out my specific scenario involved about 100 types and 1000 serializations in total. That explains why the good old DataContractSerializer is not so easy to put out of business. The approach I finally ended up with was to reduce the number of types and to serialize primitive types via BinaryWriter directly which turned out to be a pretty good alternative.&lt;/p&gt;
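
&lt;p&gt;Writing primitive values with BinaryWriter involves no per-type code generation at all. Here is a minimal sketch of the idea, not the code from my project: the Sample type and its Id and Name fields are made up for illustration.&lt;/p&gt;

&lt;pre class="csharpcode"&gt;using System;
using System.IO;

// Hypothetical example type, only here to have something to serialize
class Sample
{
    public int Id;
    public string Name;
}

static class PrimitiveSerializer
{
    public static void Write(Stream stream, Sample data)
    {
        // BinaryWriter copies the raw bytes of primitives into the stream
        BinaryWriter writer = new BinaryWriter(stream);
        writer.Write(data.Id);    // 4 bytes, little endian
        writer.Write(data.Name);  // length prefixed string, must not be null
        writer.Flush();           // the writer is not disposed because that would close the stream
    }

    public static Sample Read(Stream stream)
    {
        // The fields must be read in exactly the order they were written
        BinaryReader reader = new BinaryReader(stream);
        return new Sample { Id = reader.ReadInt32(), Name = reader.ReadString() };
    }
}

class Program
{
    static void Main()
    {
        MemoryStream stream = new MemoryStream();
        PrimitiveSerializer.Write(stream, new Sample { Id = 42, Name = "first" });
        stream.Position = 0;
        Sample copy = PrimitiveSerializer.Read(stream);
        Console.WriteLine("{0} {1}", copy.Id, copy.Name);
    }
}&lt;/pre&gt;

&lt;p&gt;The price for this is that you own the wire format yourself: reader and writer must agree on field order and there is no versioning support.&lt;/p&gt;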

&lt;p&gt;It sounded good until I measured again and found that my optimizations so far did not help much. After looking deeper at the profiling data I found that one of the 1000 calls took 50% of the time. So how do I find out which call it was? Normal profilers fall short in this discipline. An undeservedly little-known profiler is &lt;a href="http://speedtracepro.com/product/features/analytic_components/"&gt;SpeedTrace&lt;/a&gt; which, unlike normal profilers, creates traces of your application by instrumenting your IL code at runtime. This way you can look at the full call stack of the one slow serializer call to find out if this stack was something special.&lt;/p&gt;

&lt;p&gt;Unfortunately the call stack showed nothing special. But luckily I have my own tracing as well and I could see that the slow serializer call happened during the serialization of a bool value. When you encounter, after much analysis, something unreasonable that you cannot explain then chances are good that your thread was suspended by the garbage collector. Whether there is a problem with excessive GCs remains to be investigated but so far the serialization performance seems to be mostly ok.&lt;/p&gt;

&lt;p&gt;When you profile a complex system with many interconnected processes you can never be sure that the timings you just measured are accurate at all. Some process might be hitting the disk and slowing things down for all other processes for some seconds as well. There is a big difference between warm and cold startup. If you restart all processes you can basically forget the first run because the OS disk cache, JIT and GCs make the measured timings highly variable. When you are in need of a random number generator you should measure cold startup times of a sufficiently complex system. After the first run you can try again and get different and much lower numbers. Now try again at least two more times to get some feeling for how stable the numbers are. &lt;/p&gt;
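
&lt;p&gt;The same effect can be seen in miniature inside a single process. The following sketch times the same call several times and reports every run separately. DoWork is a hypothetical stand-in for the code under test, and the sketch only captures in-process warm-up such as JIT compilation, not the OS disk cache effects of a real cold start.&lt;/p&gt;

&lt;pre class="csharpcode"&gt;using System;
using System.Diagnostics;

class Program
{
    // Hypothetical stand-in for the operation you want to measure
    static void DoWork()
    {
        string formatted = String.Format("{0:N0}", 1000000);
        if (formatted.Length == 0)
        {
            throw new InvalidOperationException("unexpected empty string");
        }
    }

    static void Main()
    {
        const int Runs = 5;
        for (int i = 0; i &amp;lt; Runs; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            DoWork();
            sw.Stop();
            // Run 0 includes JIT compilation and first-use initialization,
            // so only the later runs should be compared with each other
            Console.WriteLine("Run {0}: {1:F3} ms{2}", i, sw.Elapsed.TotalMilliseconds,
                              i == 0 ? " (first run)" : "");
        }
    }
}&lt;/pre&gt;

&lt;p&gt;Reporting each run on its own line instead of one average makes it obvious when a single run is an outlier.&lt;/p&gt;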

&lt;p&gt;Oh and try to do the same thing the next day. It might be that the bottleneck you found yesterday is gone today. Thanks to GC and other random stuff it can become pretty hard to find stuff worth optimizing if no big bottlenecks except bloatloads of code are left anymore. &lt;/p&gt;

&lt;p&gt;When I have found a spot worth optimizing I make the code changes and measure again to check if something has changed. If it has got slower and I am certain that my change should have made it faster I can blame the GC again. The thing is that if you optimize stuff and allocate fewer objects the GC times will shift to some other location. If you are unlucky this will make your faster code slower because you now see GCs at times where none were before. This is where the stuff gets really tricky. A safe escape hatch is to create a repro of the slow code in an isolated application so you can change things fast in a reliable manner. Then the normal profilers start working again as well.&lt;/p&gt;

&lt;p&gt;As Vance Morrison points out it is &lt;a href="http://blogs.msdn.com/b/vancem/archive/2012/11/26/wall-clock-time-analysis-using-perfview.aspx"&gt;much more complex to profile a system&lt;/a&gt; against the wall clock than to optimize for CPU time. The reason is that for wall clock time analysis you need to understand how your system works, which threads (if you have not one but perhaps 20) are causing a visible delay to the end user and which threads can wait a long time without affecting the user experience at all. &lt;/p&gt;

&lt;p&gt;Next time: Commercial profiler shootout.&lt;/p&gt; &lt;img src="http://geekswithblogs.net/akraus1/aggbug/151407.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Alois Kraus</dc:creator>
            <guid>http://geekswithblogs.net/akraus1/archive/2012/11/30/151407.aspx</guid>
            <pubDate>Fri, 30 Nov 2012 20:27:40 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/akraus1/comments/151407.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/akraus1/archive/2012/11/30/151407.aspx#feedback</comments>
            <wfw:commentRss>http://geekswithblogs.net/akraus1/comments/commentRss/151407.aspx</wfw:commentRss>
            <trackback:ping>http://geekswithblogs.net/akraus1/services/trackbacks/151407.aspx</trackback:ping>
        </item>
        <item>
            <title>Do Not Optimize Without Measuring</title>
            <link>http://geekswithblogs.net/akraus1/archive/2012/10/27/151086.aspx</link>
            <description>&lt;p&gt;Originally posted on: &lt;a href='http://geekswithblogs.net/akraus1/archive/2012/10/27/151086.aspx'&gt;http://geekswithblogs.net/akraus1/archive/2012/10/27/151086.aspx&lt;/a&gt;&lt;/p&gt;&lt;p&gt;Recently I had to do some performance work which included reading a lot of code. It is fascinating what ideas people come up with to solve a problem. Especially when there is no problem. When you look at other people's code you will not be able to tell if it performs well or not by reading it. You need to execute it with some sort of tracing or even better under a profiler. The first rule of the performance club is not to think and then to optimize but to &lt;strong&gt;measure&lt;/strong&gt;, think and then optimize. The second rule is to do this in a loop to prevent bad things from slipping into your code base and staying there for too long. If you skip the measure step for some reason and optimize directly it is like changing the &lt;a href="http://en.wikipedia.org/wiki/Wave_function_collapse"&gt;wave function&lt;/a&gt; in quantum mechanics. This has no observable effect in our world since it represents only a probability distribution of all possible values. In quantum mechanics you need to let the wave function collapse to a single value. A collapsed wave function therefore has not many but one distinct value. This is what we physicists call a measurement. &lt;/p&gt;  &lt;p&gt;If you optimize your application without measuring it you are just changing the probability distribution of your potential performance values. Which performance your application actually has is still unknown. You only know that it will be within a specific range with a certain probability. As usual there are unlikely values within your distribution like a startup time of 20 minutes which should only happen once in 100 000 years. 
100 000 years are a very short time when the first customer tries to run your heavily distributed networking application over a slow WIFI network…&lt;/p&gt;  &lt;p&gt;What is the point of this? Every programmer/architect has a mental performance model in his head. A model always has a set of explicit preconditions and a lot more implicit assumptions baked into it. When the model is good it will help you to think of good designs but it can also be the source of problems. In real world systems not all assumptions of your performance model (implicit or explicit) hold true any longer. The only way to connect your performance model and the real world is to measure it. In the WIFI example the model assumed a low latency high bandwidth LAN connection. When this assumption became wrong the system showed a drastic change in startup time. &lt;/p&gt;  &lt;p&gt;Let's look at an example. Let's assume we want to cache some expensive UI resources like font objects. For this undertaking we create a Cache class with the UI themes we want to support. Since fonts are expensive objects we create them on demand the first time the theme is requested. &lt;/p&gt;  &lt;p&gt;A simple example of a Theme cache might look like this:&lt;/p&gt;  &lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;using&lt;/span&gt; System;
&lt;span class="kwrd"&gt;using&lt;/span&gt; System.Collections.Generic;
&lt;span class="kwrd"&gt;using&lt;/span&gt; System.Drawing;

&lt;span class="kwrd"&gt;struct&lt;/span&gt; Theme
{
    &lt;span class="kwrd"&gt;public&lt;/span&gt; Color Color;
    &lt;span class="kwrd"&gt;public&lt;/span&gt; Font Font;
}

&lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;class&lt;/span&gt; ThemeCache
{
    &lt;span class="kwrd"&gt;static&lt;/span&gt; Dictionary&amp;lt;&lt;span class="kwrd"&gt;string&lt;/span&gt;, Theme&amp;gt; _Cache = &lt;span class="kwrd"&gt;new&lt;/span&gt; Dictionary&amp;lt;&lt;span class="kwrd"&gt;string&lt;/span&gt;, Theme&amp;gt;
    {
        {&lt;span class="str"&gt;"Default"&lt;/span&gt;, &lt;span class="kwrd"&gt;new&lt;/span&gt; Theme { Color = Color.AliceBlue }},
        {&lt;span class="str"&gt;"Theme12"&lt;/span&gt;, &lt;span class="kwrd"&gt;new&lt;/span&gt; Theme { Color = Color.Aqua }},
    };

    &lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;static&lt;/span&gt; Theme Get(&lt;span class="kwrd"&gt;string&lt;/span&gt; theme)
    {
        Theme cached = _Cache[theme];
        &lt;span class="kwrd"&gt;if&lt;/span&gt; (cached.Font == &lt;span class="kwrd"&gt;null&lt;/span&gt;)
        {
            Console.WriteLine(&lt;span class="str"&gt;"Creating new font"&lt;/span&gt;);
            cached.Font = &lt;span class="kwrd"&gt;new&lt;/span&gt; Font(&lt;span class="str"&gt;"Arial"&lt;/span&gt;, 8);
        }

        &lt;span class="kwrd"&gt;return&lt;/span&gt; cached;
    }

}

&lt;span class="kwrd"&gt;class&lt;/span&gt; Program
{
    &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Main(&lt;span class="kwrd"&gt;string&lt;/span&gt;[] args)
    {
        Theme item = ThemeCache.Get(&lt;span class="str"&gt;"Theme12"&lt;/span&gt;);
        item = ThemeCache.Get(&lt;span class="str"&gt;"Theme12"&lt;/span&gt;);
    }
}&lt;/pre&gt;

&lt;p&gt;This cache creates font objects only once since on the first retrieval of the Theme object the font is added to it. When we let the application run it should print “Creating new font” only once. Right? &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The vigilant readers have spotted the issue already. The creator of this cache class wanted to get maximum performance. So he decided that the Theme object should be a value type (struct) to not put too much pressure on the garbage collector. &lt;/p&gt;

&lt;p&gt;The code&lt;/p&gt;

&lt;pre class="csharpcode"&gt;        &lt;strong&gt;Theme cached = _Cache[theme];&lt;/strong&gt;
        &lt;span class="kwrd"&gt;if&lt;/span&gt; (cached.Font == &lt;span class="kwrd"&gt;null&lt;/span&gt;)
        {
            Console.WriteLine(&lt;span class="str"&gt;"Creating new font"&lt;/span&gt;);
            cached.Font = &lt;span class="kwrd"&gt;new&lt;/span&gt; Font(&lt;span class="str"&gt;"Arial"&lt;/span&gt;, 8);
        }&lt;/pre&gt;
&lt;style type="text/css"&gt;&lt;![CDATA[


.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }]]&gt;&lt;/style&gt;

&lt;p&gt;works with a copy of the value stored in the dictionary. This means we mutate a copy of the Theme object and return it to our caller. But the original Theme object in the dictionary will always have null for the Font field! The solution is to change the declaration of &lt;em&gt;struct Theme&lt;/em&gt; to &lt;em&gt;class Theme&lt;/em&gt; or to write the updated Theme object back into the dictionary. Our cache as it is currently is actually a non caching cache. The funny thing was that I found this out with a profiler by looking at which objects were finalized. I found way too many font objects being finalized. After a bit of debugging I found that the allocation source for the Font objects was this cache. Since this cache was there for years it means that &lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;the cache was never needed since I found no perf issue due to the creation of font objects.&lt;/li&gt;

  &lt;li&gt;the cache was never profiled to check whether it brought any performance gain.&lt;/li&gt;

  &lt;li&gt;to make the cache beneficial it needs to be accessed much more often. &lt;/li&gt;
&lt;/ul&gt;
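
&lt;p&gt;For completeness, here is a sketch of the second fix mentioned above: Theme stays a struct but the modified copy is written back into the dictionary. Only the line marked in the comment is new compared to the original cache. It needs a reference to System.Drawing like the original.&lt;/p&gt;

&lt;pre class="csharpcode"&gt;using System;
using System.Collections.Generic;
using System.Drawing;

struct Theme
{
    public Color Color;
    public Font Font;
}

static class ThemeCache
{
    static Dictionary&amp;lt;string, Theme&amp;gt; _Cache = new Dictionary&amp;lt;string, Theme&amp;gt;
    {
        {"Default", new Theme { Color = Color.AliceBlue }},
        {"Theme12", new Theme { Color = Color.Aqua }},
    };

    public static Theme Get(string theme)
    {
        Theme cached = _Cache[theme];   // this is still a copy of the stored struct
        if (cached.Font == null)
        {
            Console.WriteLine("Creating new font");
            cached.Font = new Font("Arial", 8);
            _Cache[theme] = cached;     // new: store the modified copy back into the dictionary
        }

        return cached;
    }
}

class Program
{
    static void Main(string[] args)
    {
        Theme item = ThemeCache.Get("Theme12");   // creates the font
        item = ThemeCache.Get("Theme12");         // finds the stored font, nothing is printed
    }
}&lt;/pre&gt;

&lt;p&gt;With the write back in place “Creating new font” is printed only once. Whether a struct is worth the trouble here is something only a measurement can tell.&lt;/p&gt;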

&lt;p&gt;That was the story of the non caching cache. Next time I will write something about measuring. &lt;/p&gt; &lt;img src="http://geekswithblogs.net/akraus1/aggbug/151086.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Alois Kraus</dc:creator>
            <guid>http://geekswithblogs.net/akraus1/archive/2012/10/27/151086.aspx</guid>
            <pubDate>Sat, 27 Oct 2012 23:44:00 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/akraus1/comments/151086.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/akraus1/archive/2012/10/27/151086.aspx#feedback</comments>
            <slash:comments>1</slash:comments>
            <wfw:commentRss>http://geekswithblogs.net/akraus1/comments/commentRss/151086.aspx</wfw:commentRss>
            <trackback:ping>http://geekswithblogs.net/akraus1/services/trackbacks/151086.aspx</trackback:ping>
        </item>
        <item>
            <title>Thread.Interrupt Is Evil</title>
            <link>http://geekswithblogs.net/akraus1/archive/2012/10/22/151047.aspx</link>
            <description>&lt;p&gt;Originally posted on: &lt;a href='http://geekswithblogs.net/akraus1/archive/2012/10/22/151047.aspx'&gt;http://geekswithblogs.net/akraus1/archive/2012/10/22/151047.aspx&lt;/a&gt;&lt;/p&gt;&lt;p&gt;Recently I found an interesting issue with Thread.Interrupt during application shutdown. Some application was crashing once a week and we did not really have a clue what the issue was. Since it did not happen very often it was left as is until we got some memory dumps of the crash. A memory dump usually means WinDbg which I really like to use (I know I am one of the very few fans of it).  After a quick analysis I found that the main thread had already exited and the thread with the crash was stuck in a Monitor.Wait. Strange indeed. Running the application a few thousand times under the debugger would probably not have shown me what the reason was so I decided to do what I call constructive debugging. With constructive I mean that I do not try to analyze the running application and to repro the issue there. Instead I create a synthetic sample application which tries to reproduce the issue. I am done when I get the same crash and call stack as in the production application. This approach is usually quite successful.&lt;/p&gt;  &lt;p&gt;I created a simple Console application project and tried to simulate the exact circumstances of the crash from the information I had via memory dump and source code reading. The thread that was crashing was actually MS code from an old version of the Microsoft Caching Application Block. From reading the code I could conclude that the main thread called the Dispose method on the CacheManager class which called Thread.Interrupt on the cache scavenger thread which was just waiting for work to do. 
&lt;/p&gt;  &lt;p&gt;My first version of the repro looked like this&lt;/p&gt;  &lt;p&gt; &lt;/p&gt;  &lt;pre class="csharpcode"&gt;    &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Main(&lt;span class="kwrd"&gt;string&lt;/span&gt;[] args)
    {
        Thread t = &lt;span class="kwrd"&gt;new&lt;/span&gt; Thread(ThreadFunc)
        {
            IsBackground = &lt;span class="kwrd"&gt;true&lt;/span&gt;,
            Name = &lt;span class="str"&gt;"Test Thread"&lt;/span&gt;
        };
        t.Start();

        Console.WriteLine(&lt;span class="str"&gt;"Interrupt Thread"&lt;/span&gt;);
        t.Interrupt();
    }

    &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; ThreadFunc()
    {
        &lt;span class="kwrd"&gt;while&lt;/span&gt; (&lt;span class="kwrd"&gt;true&lt;/span&gt;)
        {
            &lt;span class="kwrd"&gt;object&lt;/span&gt; &lt;span class="kwrd"&gt;value&lt;/span&gt; = Dequeue(); &lt;span class="rem"&gt;// block until unblocked or awaken via ThreadInterruptedException&lt;/span&gt;
        }
    }

    &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;object&lt;/span&gt; WaitObject = &lt;span class="kwrd"&gt;new&lt;/span&gt; &lt;span class="kwrd"&gt;object&lt;/span&gt;();

    &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;object&lt;/span&gt; Dequeue()
    {
        &lt;span class="kwrd"&gt;object&lt;/span&gt; lret = &lt;span class="str"&gt;"got value"&lt;/span&gt;;
        &lt;span class="kwrd"&gt;try&lt;/span&gt;
        {
            &lt;span class="kwrd"&gt;lock&lt;/span&gt; (WaitObject)
            {
            }
        }
        &lt;span class="kwrd"&gt;catch&lt;/span&gt; (ThreadInterruptedException)
        {
            Console.WriteLine(&lt;span class="str"&gt;"Got ThreadInterruptException"&lt;/span&gt;);
            lret = &lt;span class="kwrd"&gt;null&lt;/span&gt;;
        }
        &lt;span class="kwrd"&gt;return&lt;/span&gt; lret;
    }&lt;/pre&gt;
&lt;style type="text/css"&gt;&lt;![CDATA[




.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }]]&gt;&lt;/style&gt;

&lt;p&gt;I start a background thread, call Thread.Interrupt on it and then directly let the application terminate. The thread in the meantime does plenty of Monitor.Enter/Exit calls (via the lock statement) to simulate work. This first version did not crash. So I needed to dig deeper. From the memory dump I knew that the finalizer thread was just executing some critical finalizers which were closing file handles. OK, let's add some long running finalizers to the sample.&lt;/p&gt;

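&lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;using&lt;/span&gt; System;
&lt;span class="kwrd"&gt;using&lt;/span&gt; System.Runtime.ConstrainedExecution; &lt;span class="rem"&gt;// CriticalFinalizerObject&lt;/span&gt;
&lt;span class="kwrd"&gt;using&lt;/span&gt; System.Threading;

&lt;span class="kwrd"&gt;class&lt;/span&gt; FinalizableObject : CriticalFinalizerObject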
{
    ~FinalizableObject()
    {
        Console.WriteLine(&lt;span class="str"&gt;"Hi we are waiting to finalize now and block the finalizer thread for 5s."&lt;/span&gt;);
        Thread.Sleep(5000);
    }
}

&lt;span class="kwrd"&gt;class&lt;/span&gt; Program
{
    &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Main(&lt;span class="kwrd"&gt;string&lt;/span&gt;[] args)
    {
        FinalizableObject fin = &lt;span class="kwrd"&gt;new&lt;/span&gt; FinalizableObject();
        Thread t = &lt;span class="kwrd"&gt;new&lt;/span&gt; Thread(ThreadFunc)
        {
            IsBackground = &lt;span class="kwrd"&gt;true&lt;/span&gt;,
            Name = &lt;span class="str"&gt;"Test Thread"&lt;/span&gt;
        };
        t.Start();

        Console.WriteLine(&lt;span class="str"&gt;"Interrupt Thread"&lt;/span&gt;);
        t.Interrupt();
        GC.KeepAlive(fin); &lt;span class="rem"&gt;// prevent finalizing it too early&lt;/span&gt;
        &lt;span class="rem"&gt;// After leaving main the other thread is woken up via Thread.Abort&lt;/span&gt;
        &lt;span class="rem"&gt;// while we are finalizing. This causes a stackoverflow in the CLR ThreadAbortException handling at this time.&lt;/span&gt;
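    }

    &lt;span class="rem"&gt;// ThreadFunc, WaitObject and Dequeue are unchanged from the first version&lt;/span&gt;
}&lt;/pre&gt;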

&lt;p&gt;With this changed Main method and a blocking critical finalizer I got my crash just like in the real application. The funny thing is that this is actually a CLR bug. When the Main method is left the CLR suspends all threads except the finalizer thread and declares all objects as garbage. After the normal finalizers have been called the critical finalizers are executed to (usually) free e.g. OS handles. Remember that I called Thread.Interrupt as one of the last methods in the Main method. The Interrupt method is actually asynchronous: it wakes a thread up and throws a ThreadInterruptedException only once, unlike Thread.Abort which rethrows the exception when an exception handling clause is left. &lt;/p&gt;
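
&lt;p&gt;The "only once" behavior of Thread.Interrupt can be seen in isolation with a small sketch. It is not part of the crash repro: the worker here is a foreground thread and Main waits for it with Join, so the shutdown race described in this post is not involved. The Worker method name is made up for illustration.&lt;/p&gt;

&lt;pre class="csharpcode"&gt;using System;
using System.Threading;

class Program
{
    static readonly object WaitObject = new object();

    static void Main()
    {
        Thread t = new Thread(Worker) { Name = "Test Thread" };
        t.Start();
        // Asynchronous: takes effect when the thread is blocked or blocks the next time
        t.Interrupt();
        // Wait for the worker so that the process does not exit while it is still running
        t.Join();
    }

    static void Worker()
    {
        try
        {
            lock (WaitObject)
            {
                Monitor.Wait(WaitObject);   // blocks until pulsed or interrupted
            }
        }
        catch (ThreadInterruptedException)
        {
            Console.WriteLine("Got ThreadInterruptedException");
        }

        // Unlike a ThreadAbortException the exception is not raised again after the catch block
        Console.WriteLine("Worker continues and exits normally");
    }
}&lt;/pre&gt;

&lt;p&gt;Both lines are printed, which shows that the thread simply continues after the catch block.&lt;/p&gt;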

&lt;p&gt;It seems that the CLR does not expect a frozen thread to wake up again while the critical finalizers are executed. While trying to raise a ThreadInterruptedException the CLR goes down with a stack overflow. Oops, not so nice. Why has nobody noticed this for years is my next question. As it turned out this error only happens on the CLR for .NET 4.0 (x86 and x64). It does not show up in earlier or later versions of the CLR. &lt;/p&gt;

&lt;p&gt;I have reported this issue on Connect &lt;a href="https://connect.microsoft.com/VisualStudio/feedback/details/764933/crash-stackoverflow-on-application-exit"&gt;here&lt;/a&gt; but so far it has not been confirmed as a CLR bug. But I would be surprised if my console application was to blame for a stack overflow in my test thread in a Monitor.Wait call. &lt;/p&gt;

&lt;p&gt;What is the moral of this story? Thread.Abort is evil but Thread.Interrupt is too. It is so evil that even the CLR of .NET 4.0 contains a race condition during CLR shutdown. When the CLR gurus can get it wrong chances are high that you get it wrong too when you use these constructs. If you do not believe me see what &lt;a href="http://codebetter.com/patricksmacchia/2012/10/02/beware-mixing-list-sort-and-thread-abort/"&gt;Patrick Smacchia blogs about Thread.Abort and List.Sort&lt;/a&gt;. Not only the CLR creators can get it wrong. The BCL writers sometimes have a hard time with correct exception handling as well. If you tell me that you use Thread.Abort frequently and never had problems with it I suspect that you have not looked deep enough into your application to find such sporadic errors. &lt;/p&gt; &lt;img src="http://geekswithblogs.net/akraus1/aggbug/151047.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Alois Kraus</dc:creator>
            <guid>http://geekswithblogs.net/akraus1/archive/2012/10/22/151047.aspx</guid>
            <pubDate>Mon, 22 Oct 2012 22:04:58 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/akraus1/comments/151047.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/akraus1/archive/2012/10/22/151047.aspx#feedback</comments>
            <slash:comments>9</slash:comments>
            <wfw:commentRss>http://geekswithblogs.net/akraus1/comments/commentRss/151047.aspx</wfw:commentRss>
            <trackback:ping>http://geekswithblogs.net/akraus1/services/trackbacks/151047.aspx</trackback:ping>
        </item>
        <item>
            <title>The Truth About .NET Objects And Sharing Them Between AppDomains</title>
            <link>http://geekswithblogs.net/akraus1/archive/2012/07/25/150301.aspx</link>
            <description>&lt;p&gt;Originally posted on: &lt;a href='http://geekswithblogs.net/akraus1/archive/2012/07/25/150301.aspx'&gt;http://geekswithblogs.net/akraus1/archive/2012/07/25/150301.aspx&lt;/a&gt;&lt;/p&gt;&lt;p&gt;I wrote &lt;a href="http://geekswithblogs.net/akraus1/archive/2008/12/12/127870.aspx"&gt;some time ago&lt;/a&gt; about how big a .NET object is. Jon Skeet has also made a &lt;a href="http://msmvps.com/blogs/jon_skeet/archive/2011/04/05/of-memory-and-strings.aspx"&gt;very detailed&lt;/a&gt; post about object sizes in .NET. I wanted to know if we can deduce the object size not by experiments (measuring) but by looking at the Rotor source code. There is indeed a simple definition in the object headers of how big a .NET object must minimally be. A CLR object is still a (sophisticated) structure which is at an address that is changed quite often by the garbage collector. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/TheDefinitiveTruthAbout.NETObjectSizes_9A70/image_6.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/TheDefinitiveTruthAbout.NETObjectSizes_9A70/image_thumb_2.png" width="640" height="331" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The picture above shows that every .NET object contains an object header which contains information about which thread in which AppDomain has locked the object (i.e. has called Monitor.Enter). Next comes the Method Table Pointer which defines a managed type for one AppDomain. If the assembly is loaded AppDomain neutral this pointer to the type object will have the same value in all AppDomains. 
This basic building block of the CLR type system is also visible in managed code via Type.TypeHandle.Value which has IntPtr size.&lt;/p&gt;  &lt;p&gt; &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;\sscli20\clr\src\vm\object.h&lt;/strong&gt;&lt;/p&gt;  &lt;pre class="csharpcode"&gt;&lt;span class="rem"&gt;//&lt;/span&gt;
&lt;span class="rem"&gt;// The generational GC requires that every object be at least 12 bytes&lt;/span&gt;
&lt;span class="rem"&gt;// in size.   &lt;/span&gt;
&lt;span class="preproc"&gt;#define&lt;/span&gt; MIN_OBJECT_SIZE     (2*&lt;span class="kwrd"&gt;sizeof&lt;/span&gt;(BYTE*) + &lt;span class="kwrd"&gt;sizeof&lt;/span&gt;(ObjHeader))&lt;/pre&gt;

&lt;p&gt;A .NET object has basically this layout:&lt;/p&gt;

&lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;class&lt;/span&gt; Object
{
  &lt;span class="kwrd"&gt;protected&lt;/span&gt;:
    MethodTable*    m_pMethTab;

};&lt;/pre&gt;
&lt;style type="text/css"&gt;&lt;![CDATA[
.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }]]&gt;&lt;/style&gt;

&lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;class&lt;/span&gt; ObjHeader
{
  &lt;span class="kwrd"&gt;private&lt;/span&gt;:
    &lt;span class="rem"&gt;// !!! Notice: m_SyncBlockValue *MUST* be the last field in ObjHeader.&lt;/span&gt;
    DWORD  m_SyncBlockValue;      &lt;span class="rem"&gt;// the Index and the Bits&lt;/span&gt;
};&lt;/pre&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;&lt;em&gt;x86&lt;/em&gt;&lt;/strong&gt; the minimum size is therefore &lt;strong&gt;&lt;em&gt;12 bytes&lt;/em&gt;&lt;/strong&gt; = 2*4+4. And for &lt;strong&gt;&lt;em&gt;x64 it is 24 bytes&lt;/em&gt;&lt;/strong&gt; = 2*8+8. The ObjHeader struct is padded with another 4 bytes in x64 which adds up to 24 bytes for every object instance. The &lt;em&gt;MIN_OBJECT_SIZE&lt;/em&gt; definition actually has a factor of two inside it whereas we would expect 8 bytes (on x86) as the minimum size of an empty object: the header plus the method table pointer. The reason is that it makes little sense to define empty objects. Most meaningful objects have at least one member variable of class type which is another pointer sized member, hence the minimum size of 12 (24) bytes in x86 (x64). &lt;/p&gt;
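
&lt;p&gt;The arithmetic can be replayed in managed code for the bitness of the current process. This is only a sketch that mirrors the MIN_OBJECT_SIZE formula with the padding rule described above. It does not read the value from the CLR, and the variable names are made up.&lt;/p&gt;

&lt;pre class="csharpcode"&gt;using System;

class Program
{
    static void Main()
    {
        int pointerSize = IntPtr.Size;   // 4 in an x86 process, 8 in an x64 process

        // ObjHeader holds a 4 byte DWORD. On x64 it is padded to 8 bytes,
        // so in both cases its size equals the pointer size.
        int headerSize = pointerSize;

        // Mirrors MIN_OBJECT_SIZE (2*sizeof(BYTE*) + sizeof(ObjHeader))
        int minObjectSize = 2 * pointerSize + headerSize;

        Console.WriteLine("Pointer size: {0} bytes, minimum object size: {1} bytes",
                          pointerSize, minObjectSize);
    }
}&lt;/pre&gt;

&lt;p&gt;Compiled for x86 this reports 12 bytes and compiled for x64 it reports 24 bytes.&lt;/p&gt;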

&lt;p&gt;It is interesting to know that the garbage collector does not know anything about AppDomains. For the GC the managed heap consists only of objects which are either rooted or not, and it cleans up everything which is not rooted anymore. I found this during the development of &lt;a href="http://wmemoryprofiler.codeplex.com/"&gt;WMemoryProfiler&lt;/a&gt;, which uses DumpHeap of Windbg to get all object references from the managed heap. When I accessed all objects found this way I actually got objects from other AppDomains as well. And they did work! It is therefore possible to share objects directly between AppDomains. &lt;/p&gt;

&lt;p&gt;Why would you want to do that? Well, it is fun and you can do really dirty stuff with it. Do you remember that you cannot unload assemblies from an AppDomain? Yes, that is still true, but why would you ever want to unload an assembly? Mostly because you are doing some dynamic code generation which will at some point dominate your overall memory consumption if you load the generated assemblies into your AppDomain. I have seen this many times with dynamic query generation. The problem is that if you load the dynamically created code into another AppDomain you normally need to serialize the data to the other AppDomain as well, because officially you cannot share plain objects between AppDomains. Serializing potentially large amounts of data across AppDomains is prohibitively slow, and therefore people live with the restriction that code generation will increase the working set quite a lot. With some tricks you can now share plain objects between AppDomains and get unloadable code as well.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;


&lt;h5&gt;&lt;strong&gt;Warning: The following stuff is well beyond the specs, but it does work from .NET 2.0 up to 4.5. &lt;/strong&gt;&lt;/h5&gt;

&lt;h5&gt;&lt;strong&gt;Do not try this at work!&lt;/strong&gt;&lt;/h5&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;When you load an assembly into your (default) AppDomain you load it only for your current AppDomain. The types defined there are not shared anywhere. There is one exception though: the types defined in mscorlib are always shared between all AppDomains. The mscorlib assembly is loaded into the so-called Shared Domain. This is not a real AppDomain but simply a placeholder domain into which all assemblies are loaded that can be shared between AppDomains. An assembly loaded into the Shared Domain is therefore loaded AppDomain neutral. Assemblies loaded &lt;em&gt;AppDomain neutral&lt;/em&gt; have one special behavior: &lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;AppDomain neutral assemblies are &lt;strong&gt;never&lt;/strong&gt; unloaded even when no “real” AppDomain is using them anymore. &lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;The picture below shows in which scenarios assemblies are loaded AppDomain neutral (green) from the Shared Domain.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/TheDefinitiveTruthAbout.NETObjectSizes_9A70/image_12.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/TheDefinitiveTruthAbout.NETObjectSizes_9A70/image_thumb_5.png" width="798" height="765" /&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;The first one is the most common one. You load an assembly into the default AppDomain. This defaults to &lt;a href="http://msdn.microsoft.com/en-us/library/system.loaderoptimization.aspx"&gt;LoaderOptimization.SingleDomain&lt;/a&gt;, where every assembly is loaded and compiled from scratch again when other AppDomains with no special flags are created. Only the basic CLR types located in mscorlib are loaded AppDomain neutral and are always shared between AppDomains no matter which flags are used (with .NET 1.1 this was not the case). &lt;/p&gt;

&lt;p&gt;If you create e.g. 5 AppDomains with the default settings you will load and JIT every assembly (except mscorlib) again and get different types for each AppDomain, although they were all loaded from the same assembly in the same loader context. &lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;The opposite is LoaderOptimization.MultiDomain, where every assembly is loaded as an AppDomain neutral assembly. This ensures that all assemblies loaded in any AppDomain which has this attribute set are loaded and JITed only once and share their types (= same Method Table pointer) between AppDomains.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;An interesting hybrid is LoaderOptimization.MultiDomainHost, which loads only those assemblies AppDomain neutral that are loaded from the GAC. That means if you load the same assembly once from the GAC and a second time unsigned from your probing path, you will not get identical types but different Method Table pointers for the types. &lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;Since we know that the GC does not know anything about AppDomains (at least the managed heap objects do not contain any information about which AppDomain they reside in), we should be able to pass an object via a pointer to another AppDomain. This could come in handy if we generate a lot of code dynamically for each query which is made against a data source, but we want a way to get rid of the compiled code by unloading the Query AppDomain from time to time, without the normally required cost of copying the data to be queried every time from the Default AppDomain into the Query AppDomain. &lt;/p&gt;

&lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/TheDefinitiveTruthAbout.NETObjectSizes_9A70/image_14.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/TheDefinitiveTruthAbout.NETObjectSizes_9A70/image_thumb_6.png" width="798" height="301" /&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;You can see this in action in the sample application &lt;a href="http://wmemoryprofiler.codeplex.com/SourceControl/list/changesets#"&gt;AppDomainTests&lt;/a&gt;, which is part of the test suite for WMemoryProfiler. Here is the code for the main application, which creates AppDomains in a loop, sends data via an IntPtr to another AppDomain and gets the calculated result back without marshalling the data by value to the other AppDomain.&lt;/p&gt;

&lt;pre class="csharpcode"&gt;    &lt;span class="kwrd"&gt;class&lt;/span&gt; Program
    {
        &lt;span class="rem"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
        &lt;span class="rem"&gt;/// Show how to pass an object by reference directly into another appdomain without serializing it at all.&lt;/span&gt;
        &lt;span class="rem"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
        &lt;span class="rem"&gt;/// &amp;lt;param name="args"&amp;gt;&amp;lt;/param&amp;gt;&lt;/span&gt;
        [LoaderOptimization(LoaderOptimization.MultiDomainHost)]
        &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Main(&lt;span class="kwrd"&gt;string&lt;/span&gt;[] args)
        {
            &lt;span class="kwrd"&gt;for&lt;/span&gt; (&lt;span class="kwrd"&gt;int&lt;/span&gt; i = 0; i &amp;lt; 10000; i++) &lt;span class="rem"&gt;// try it often to see how the AppDomains do behave&lt;/span&gt;
            { 
            &lt;span class="rem"&gt;// To load our assembly appdomain neutral we need to use MultiDomainHost on our hosting and child domain&lt;/span&gt;
            &lt;span class="rem"&gt;// If not we would get different Method tables for the same types which would result in InvalidCastExceptions&lt;/span&gt;
            &lt;span class="rem"&gt;// for the same type.&lt;/span&gt;
            &lt;span class="rem"&gt;// Prerequisite for MultiDomainHost is that the assembly we share the data is &lt;/span&gt;
            &lt;span class="rem"&gt;// a) Installed into the GAC (which requires as strong name as well)&lt;/span&gt;
            &lt;span class="rem"&gt;// If you would use MultiDomain then it would work but all AppDomain neutral assemblies will never be unloaded.&lt;/span&gt;
            var other = AppDomain.CreateDomain(&lt;span class="str"&gt;"Test"&lt;/span&gt;+i.ToString(), AppDomain.CurrentDomain.Evidence, &lt;span class="kwrd"&gt;new&lt;/span&gt; AppDomainSetup
                {
                    LoaderOptimization = LoaderOptimization.MultiDomainHost,
                });

            &lt;span class="rem"&gt;// Create gate object in other appdomain&lt;/span&gt;
            DomainGate gate = (DomainGate)other.CreateInstanceAndUnwrap(Assembly.GetExecutingAssembly().FullName, &lt;span class="kwrd"&gt;typeof&lt;/span&gt;(DomainGate).FullName);
            

            &lt;span class="rem"&gt;// now lets create some data&lt;/span&gt;
            CrossDomainData data = &lt;span class="kwrd"&gt;new&lt;/span&gt; CrossDomainData();
            data.Input = Enumerable.Range(0, 10).ToList();

            &lt;span class="rem"&gt;// process it in other AppDomain&lt;/span&gt;
            DomainGate.Send(gate, data);

            &lt;span class="rem"&gt;// Display result calculated in other AppDomain&lt;/span&gt;
            Console.WriteLine(&lt;span class="str"&gt;"Calculation in other AppDomain got: {0}"&lt;/span&gt;, data.Aggregate);

            AppDomain.Unload(other);
            &lt;span class="rem"&gt;// check in debugger now if UnitTests.dll has been unloaded.&lt;/span&gt;
            Console.WriteLine(&lt;span class="str"&gt;"AppDomain unloaded"&lt;/span&gt;);

            }
        }&lt;/pre&gt;

&lt;p&gt;To enable code unloading in the other AppDomain I used LoaderOptimization.MultiDomainHost, which forces all non-GAC assemblies to be unloadable. At the same time we must ensure that the assembly that defines CrossDomainData is loaded from the GAC to get an equal MethodTable pointer across all AppDomains. The actual magic happens in the DomainGate class, which has a method DoSomething that expects not a CLR object but the object address as parameter, to weasel a plain CLR reference into another AppDomain. This sounds highly dirty and it certainly is, but it is also quite cool ;-).&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;pre class="csharpcode"&gt;    &lt;span class="rem"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
    &lt;span class="rem"&gt;/// Enables sharing of data between appdomains as plain objects without any marsalling overhead.&lt;/span&gt;
    &lt;span class="rem"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
    &lt;span class="kwrd"&gt;class&lt;/span&gt; DomainGate : MarshalByRefObject
    {
        &lt;span class="rem"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
        &lt;span class="rem"&gt;/// Operate on a plain object which is shared from another AppDomain.&lt;/span&gt;
        &lt;span class="rem"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
        &lt;span class="rem"&gt;/// &amp;lt;param name="gcCount"&amp;gt;Total number of GCs&amp;lt;/param&amp;gt;&lt;/span&gt;
        &lt;span class="rem"&gt;/// &amp;lt;param name="objAddress"&amp;gt;Address to managed object.&amp;lt;/param&amp;gt;&lt;/span&gt;
        &lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; DoSomething(&lt;span class="kwrd"&gt;int&lt;/span&gt; gcCount, IntPtr objAddress)
        {
            &lt;span class="kwrd"&gt;if&lt;/span&gt; (gcCount != ObjectAddress.GCCount)
            {
                &lt;span class="kwrd"&gt;throw&lt;/span&gt; &lt;span class="kwrd"&gt;new&lt;/span&gt; NotSupportedException(&lt;span class="str"&gt;"During the call a GC did happen. Please try again."&lt;/span&gt;);
            }

            &lt;span class="rem"&gt;// If you get an exception here disable under Projces/Debugging/Enable Visual Studio Hosting Process&lt;/span&gt;
            &lt;span class="rem"&gt;// The appdomain which is used there seems to use LoaderOptimization.SingleDomain&lt;/span&gt;
            CrossDomainData data = (CrossDomainData) PtrConverter&amp;lt;Object&amp;gt;.Default.ConvertFromIntPtr(objAddress);;

            &lt;span class="rem"&gt;// process input data from other domain&lt;/span&gt;
            &lt;span class="kwrd"&gt;foreach&lt;/span&gt; (var x &lt;span class="kwrd"&gt;in&lt;/span&gt; data.Input)
            {
                Console.WriteLine(x);
            }

            OtherAssembliesUsage user = &lt;span class="kwrd"&gt;new&lt;/span&gt; OtherAssembliesUsage();

            &lt;span class="rem"&gt;// generate output data &lt;/span&gt;
            data.Aggregate = data.Input.Aggregate((x, y) =&amp;gt; x + y);
        }

        &lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Send(DomainGate gate, &lt;span class="kwrd"&gt;object&lt;/span&gt; o)
        {
            var old = GCSettings.LatencyMode;
            &lt;span class="kwrd"&gt;try&lt;/span&gt;
            {
                GCSettings.LatencyMode = GCLatencyMode.Batch; &lt;span class="rem"&gt;// try to keep the GC out of our stuff&lt;/span&gt;
                var addandGCCount = ObjectAddress.GetAddress(o);
                gate.DoSomething(addandGCCount.Value, addandGCCount.Key);
            }
            &lt;span class="kwrd"&gt;finally&lt;/span&gt;
            {
                GCSettings.LatencyMode = old;
            }

        }

 
    }&lt;/pre&gt;
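
&lt;p&gt;The ObjectAddress helper used by Send is not listed in this post. The following is only a rough sketch of how such a helper could look, not the original implementation: the class name ObjectAddressSketch is made up, and the use of GC.CollectionCount(0) as GC counter is an assumption. It relies on the fact that Marshal.UnsafeAddrOfPinnedArrayElement does not verify that the array is really pinned, so the returned address is only usable as long as no GC has happened in between.&lt;/p&gt;

&lt;pre class="csharpcode"&gt;using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical stand-in for the ObjectAddress helper; not the original implementation.
static class ObjectAddressSketch
{
    // Number of gen 0 collections so far (assumption: used as "total number of GCs",
    // since every collection of a higher generation includes gen 0 as well).
    public static int GCCount
    {
        get { return GC.CollectionCount(0); }
    }

    // Returns the current address of o (Key) together with the GC count (Value) read before the address.
    // The address is stale as soon as GCCount changes because the GC may have moved the object.
    public static KeyValuePair&amp;lt;IntPtr, int&amp;gt; GetAddress(object o)
    {
        object[] holder = new object[] { o };
        int gcCount = GCCount;
        // The array is NOT pinned here. The call only computes the address of slot 0.
        IntPtr slot = Marshal.UnsafeAddrOfPinnedArrayElement(holder, 0);
        // The slot of an object[] contains the object reference, i.e. the object address.
        IntPtr address = Marshal.ReadIntPtr(slot);
        GC.KeepAlive(holder);
        return new KeyValuePair&amp;lt;IntPtr, int&amp;gt;(address, gcCount);
    }

    static void Main()
    {
        var addressAndGCCount = GetAddress("some object");
        Console.WriteLine("Address 0x{0:X} at GC count {1}", addressAndGCCount.Key.ToInt64(), addressAndGCCount.Value);
    }
}&lt;/pre&gt;

&lt;p&gt;The receiver compares the passed GC count with its own reading, as DoSomething does above, and refuses to touch the address if they differ. This narrows the race but does not close it.&lt;/p&gt;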

&lt;p&gt;To get an object address I use Marshal.UnsafeAddrOfPinnedArrayElement and then try to work around the many race conditions this imposes. But it is not as bad as it sounds, since you only need to pass an object via a pointer once into the other AppDomain and can then use it as a data gateway to exchange input and output data. This way you can pass data via a pointer to another AppDomain which can be fully unloaded after you are done with it. To make the code unloadable I need to use LoaderOptimization.MultiDomainHost for all AppDomains. The data exchange type is located in another assembly which is strong named, and you need to put it into the GAC before you let the sample run. Otherwise it will fail with this exception:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Unhandled Exception: &lt;strong&gt;System.InvalidCastException: [A]AppDomainTests.CrossDomainData cannot be cast to [B]AppDomainTests.CrossDomainData&lt;/strong&gt;. Type A originates from 'StrongNamedDomainGateDll, Version=1.0.0.0, Culture=neutral, PublicKeyToken=98f280cda3cbf035' in the context 'Default' at location 'C:\Source\WindbgAuto\bin\AnyCPU\Release\StrongNamedDomainGateDll.dll'. Type B originates from 'StrongNamedDomainGateDll, Version=1.0.0.0, Culture=neutral, PublicKeyToken=98f280cda3cbf035' in the context 'Default' at location 'C:\Source\WindbgAuto\bin\AnyCPU\Release\StrongNamedDomainGateDll.dll'. 

    &lt;br /&gt;   at AppDomainTests.DomainGate.DoSomething(Int32 gcCount, IntPtr objAddress) in C:\Source\WindbgAuto\Tests\AppDomainTests\DomainGate.cs:line 24 

    &lt;br /&gt;   at AppDomainTests.DomainGate.DoSomething(Int32 gcCount, IntPtr objAddress) 

    &lt;br /&gt;   at AppDomainTests.DomainGate.Send(DomainGate gate, Object o) in C:\Source\WindbgAuto\Tests\AppDomainTests\DomainGate.cs:line 50 

    &lt;br /&gt;   at AppDomainTests.Program.Main(String[] args) in C:\Source\WindbgAuto\Tests\AppDomainTests\Program.cs:line 41&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;At first it looks a little pointless to deny a cast to a type which was loaded in the default loader context from the very same assembly. But we know now that the Method Table pointer for CrossDomainData is different between the two AppDomains. When you install the assembly into the GAC (be sure to use the .NET 4 gacutil!) the error goes away and we then get:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;0 
    &lt;br /&gt;1 

    &lt;br /&gt;2 

    &lt;br /&gt;3 

    &lt;br /&gt;4 

    &lt;br /&gt;5 

    &lt;br /&gt;6 

    &lt;br /&gt;7 

    &lt;br /&gt;8 

    &lt;br /&gt;9 

    &lt;br /&gt;Calculation in other AppDomain got: 45&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;which shows that we can get data and are able to modify it directly between AppDomains. If you use this code in production and it breaks: I have warned you. This is far beyond what the MS engineers want us to do, and it can break the CLR in subtle, unintended ways I have not found yet. Now you have (hopefully) a much better understanding of how the CLR type system and the managed heap work. If questions are left, start the application and look at !DumpDomain, !DumpHeap -stat and their related commands to see for yourself.&lt;/p&gt; &lt;img src="http://geekswithblogs.net/akraus1/aggbug/150301.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Alois Kraus</dc:creator>
            <guid>http://geekswithblogs.net/akraus1/archive/2012/07/25/150301.aspx</guid>
            <pubDate>Wed, 25 Jul 2012 15:11:20 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/akraus1/comments/150301.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/akraus1/archive/2012/07/25/150301.aspx#feedback</comments>
            <wfw:commentRss>http://geekswithblogs.net/akraus1/comments/commentRss/150301.aspx</wfw:commentRss>
            <trackback:ping>http://geekswithblogs.net/akraus1/services/trackbacks/150301.aspx</trackback:ping>
        </item>
        <item>
            <title>WMemoryProfiler is Released</title>
            <link>http://geekswithblogs.net/akraus1/archive/2012/06/22/150026.aspx</link>
            <description>&lt;p&gt;Originally posted on: &lt;a href='http://geekswithblogs.net/akraus1/archive/2012/06/22/150026.aspx'&gt;http://geekswithblogs.net/akraus1/archive/2012/06/22/150026.aspx&lt;/a&gt;&lt;/p&gt;&lt;p&gt;What is it? &lt;a href="https://wmemoryprofiler.codeplex.com/"&gt;WMemoryProfiler&lt;/a&gt; is a managed profiling Api to aid integration testing. This free library can get managed heap statistics and memory usage for your own process (remember testing) and for other processes as well. The best thing is that it works from .NET 2.0 up to .NET 4.5 in x86 and x64. To make it more interesting it can attach to any running .NET process. The reason why I mention this is that commercial profilers support this functionality only in their professional editions, and normally only for .NET 4.0 and later, because the profiling API supports attaching to a running process only since then. &lt;/p&gt;  &lt;p&gt;This thing differs in many aspects from “normal” profilers, because while profiling yourself you can get all objects from all managed heaps back as an object array. If you ever wanted to change the state of an object which exists only as a method local in another thread, you can get your hands on it now …&lt;/p&gt;  &lt;p&gt;Enough theory. Show me some code&lt;/p&gt;  &lt;pre class="csharpcode"&gt;        &lt;span class="rem"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
        &lt;span class="rem"&gt;/// Show feature to not only get statistics out of a process but also the newly allocated &lt;/span&gt;
        &lt;span class="rem"&gt;/// instances since the last call to MarkCurrentObjects.&lt;/span&gt;
        &lt;span class="rem"&gt;/// GetNewObjects does return the newly allocated objects as object array&lt;/span&gt;
        &lt;span class="rem"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
        &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; InstanceTracking()
        {
            &lt;span class="kwrd"&gt;using&lt;/span&gt; (var dumper = &lt;span class="kwrd"&gt;new&lt;/span&gt; MemoryDumper())  &lt;span class="rem"&gt;// if you have problems use to see the debugger windows true,true))&lt;/span&gt;
            {
                dumper.MarkCurrentObjects();
                Allocate();
                ILookup&amp;lt;Type, &lt;span class="kwrd"&gt;object&lt;/span&gt;&amp;gt; newObjects = dumper.GetNewObjects()
                                                         .ToLookup( x =&amp;gt; x.GetType() );

                Console.WriteLine(&lt;span class="str"&gt;"New Strings:"&lt;/span&gt;);
                &lt;span class="kwrd"&gt;foreach&lt;/span&gt; (var newStr &lt;span class="kwrd"&gt;in&lt;/span&gt; newObjects[&lt;span class="kwrd"&gt;typeof&lt;/span&gt;(&lt;span class="kwrd"&gt;string&lt;/span&gt;)] )
                {
                    Console.WriteLine(&lt;span class="str"&gt;"Str: {0}"&lt;/span&gt;, newStr);
                }
            }
        }&lt;/pre&gt;
&lt;style type="text/css"&gt;&lt;![CDATA[

.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }]]&gt;&lt;/style&gt;

&lt;p&gt;… &lt;/p&gt;

&lt;p&gt;New Strings: 
  &lt;br /&gt;Str: qqd 

  &lt;br /&gt;Str: String data: 

  &lt;br /&gt;Str: String data: 0 

  &lt;br /&gt;Str: String data: 1 

  &lt;br /&gt;… &lt;/p&gt;

&lt;p&gt;This is really hot stuff. Not only can you get heap statistics, but you can also directly examine the new objects and run queries on them. When I find more time I can reconstruct the object root graph of my own process from it. Is this cool or what? You can also peek into the Finalization Queue to check if you accidentally forgot to dispose a whole bunch of objects …&lt;/p&gt;

&lt;pre class="csharpcode"&gt;        &lt;span class="rem"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
        &lt;span class="rem"&gt;/// .NET 4.0 or above only. Get all finalizable objects which are ready for finalization and have no other object roots anymore.&lt;/span&gt;
        &lt;span class="rem"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
        &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; NotYetFinalizedObjects()
        {
            &lt;span class="kwrd"&gt;using&lt;/span&gt; (var dumper = &lt;span class="kwrd"&gt;new&lt;/span&gt; MemoryDumper())
            {
                &lt;span class="kwrd"&gt;object&lt;/span&gt;[] finalizable = dumper.GetObjectsReadyForFinalization();
                Console.WriteLine(&lt;span class="str"&gt;"Currently {0} objects of types {1} are ready for finalization. Consider disposing them before."&lt;/span&gt;, 
                                  finalizable.Length,
                                   String.Join(&lt;span class="str"&gt;","&lt;/span&gt;, finalizable.ToLookup( x=&amp;gt; x.GetType() )
                                                               .Select(   x=&amp;gt; x.Key.Name)) 
                                 );
            }

        }&lt;/pre&gt;

&lt;h4&gt;&lt;strong&gt;How does it work? &lt;/strong&gt;&lt;/h4&gt;

&lt;p&gt;The W of WMemoryProfiler is a good hint. It employs Windbg and the SOS dll to do the heavy lifting and concentrates on an easy to use Api which hides Windbg completely. If you do not want to see Windbg you will never see it. In my experience the most complex thing is actually to download Windbg from the &lt;a href="http://msdn.microsoft.com/en-US/windows/hardware/hh852363"&gt;Windows 8 Standalone SDK&lt;/a&gt;. This is described in much greater detail in the Readme and in the exception you are greeted with if it is missing, so I will not go into this here. &lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h4&gt;&lt;strong&gt;What Next?&lt;/strong&gt;&lt;/h4&gt;

&lt;p&gt;Depending on the feedback I do get I can imagine some features which might be useful as well &lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Calculate first order GC Roots from the actual object graph &lt;/li&gt;

  &lt;li&gt;Identify global statics in Types in object graph &lt;/li&gt;

  &lt;li&gt;Support read out of finalization queue of .NET 2.0 as well. &lt;/li&gt;

  &lt;li&gt;Support Memory Dump analysis (again a feature only supported by commercial profilers in their professional editions if it is supported at all) &lt;/li&gt;

  &lt;li&gt;Deserialize objects from a memory dump into a live process back (this would need some more investigation but it is doable) &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The last item needs some explanation. Why on earth would you want to do that? The basic idea is to store some logging/tracing data in your live process which can become quite big, but since it is never written to it is very fast to generate. When your process crashes with a memory dump you could transfer this data structure back into a live viewer, which can then nicely display your program state at the point where it crashed. This is an advanced troubleshooting technique I have not seen anywhere yet, but it could be quite useful. You can have a look here at the current &lt;a href="https://wmemoryprofiler.codeplex.com/documentation"&gt;feature list of&lt;/a&gt; WMemoryProfiler with some examples.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h4&gt;&lt;strong&gt;How To Get Started?&lt;/strong&gt;&lt;/h4&gt;

&lt;p&gt;First I would download the released &lt;a href="https://wmemoryprofiler.codeplex.com/downloads/get/393244"&gt;source package&lt;/a&gt; (it is tiny) and compile the complete project. Then you can compile the Example project (it has this name) and uncomment in the main method the scenario you want to check out. If you are greeted with an exception it is time to install the Windows 8 Standalone SDK, which is described in great detail in the exception text. &lt;/p&gt;

&lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/WMemoryProfilerisReleased_FD4/image_2.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/WMemoryProfilerisReleased_FD4/image_thumb.png" width="976" height="652" /&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;That's it for the first round. I have seen something more limited in the Java world some years ago (now I cannot find the link anymore), but anyway: now we have something much better. &lt;/p&gt; &lt;img src="http://geekswithblogs.net/akraus1/aggbug/150026.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Alois Kraus</dc:creator>
            <guid>http://geekswithblogs.net/akraus1/archive/2012/06/22/150026.aspx</guid>
            <pubDate>Sat, 23 Jun 2012 06:46:52 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/akraus1/comments/150026.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/akraus1/archive/2012/06/22/150026.aspx#feedback</comments>
            <slash:comments>6</slash:comments>
            <wfw:commentRss>http://geekswithblogs.net/akraus1/comments/commentRss/150026.aspx</wfw:commentRss>
            <trackback:ping>http://geekswithblogs.net/akraus1/services/trackbacks/150026.aspx</trackback:ping>
        </item>
        <item>
            <title>Useful .NET Delegate Internals</title>
            <link>http://geekswithblogs.net/akraus1/archive/2012/05/20/149699.aspx</link>
            <description>&lt;p&gt;Originally posted on: &lt;a href='http://geekswithblogs.net/akraus1/archive/2012/05/20/149699.aspx'&gt;http://geekswithblogs.net/akraus1/archive/2012/05/20/149699.aspx&lt;/a&gt;&lt;/p&gt;&lt;p&gt;Delegates in .NET are a very handy addition to the language. With the introduction of LINQ they became mainstream and everyone is using them or at least finds them cool. But what the CLR is really doing with them under the covers remains largely unexplored. This is ok as long as you never get any memory dumps from the field with angry customers waiting for a solution. Delegates are heavily used, e.g. to send window messages from an arbitrary thread to the main UI thread. Or you have a thread scheduler (TPL anyone?) which executes delegates from a queue on one or more threads. These things all work with a custom data structure where, besides scheduling information, the delegate to be called is stored. When you want to examine a data structure which contains delegate fields and you need to know which method a delegate points to, things become less obvious if you do not have the VS debugger attached to a live process. If you dump the contents of a Func&amp;lt;string,bool&amp;gt; delegate you will see in Windbg something like this:&lt;/p&gt;  &lt;pre class="csharpcode"&gt;0:000&amp;gt; !DumpObj /d 0272b8d8
Name:        System.Func`2[[System.String, mscorlib],[System.Boolean, mscorlib]]
MethodTable: 6e4edbac
EEClass:     6e211748
Size:        32(0x20) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
6e52f744  4000076        4        System.Object  0 instance 0272b8d8 _target  -&amp;gt; Target object to call to
6e52f744  4000077        8        System.Object  0 instance 00000000 _methodBase –&amp;gt; is null except if VS is attached
6e52ab88  4000078        c        System.IntPtr  1 instance   5b0a18 _methodPtr  -&amp;gt; Trampoline code to dispatch the method
6e52ab88  4000079       10        System.IntPtr  1 instance   40c068 _methodPtrAux –&amp;gt; If not null and invocationCount == 0 it does contain the MethodPtr of a static method.
6e52f744  400007a       14        System.Object  0 instance 00000000 _invocationList –&amp;gt; When it is an event there the list of delegates to call are stored
6e52ab88  400007b       18        System.IntPtr  1 instance        0 _invocationCount –&amp;gt; Number of delegates stored in invocationList&lt;/pre&gt;
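
&lt;p&gt;For comparison, on a live process the same question is answered by the public Delegate.Target and Delegate.Method properties. The following minimal sketch (the class name DelegateInfoSketch is made up for illustration) prints the target type and method name for a delegate to an instance method and for a delegate to a static method. In a memory dump these properties are not available, which is why the raw fields shown above matter.&lt;/p&gt;

&lt;pre class="csharpcode"&gt;using System;

class DelegateInfoSketch
{
    bool Filter(string str) { return str.Length == 0; }
    static bool FilterStatic(string str) { return str == null; }

    static void Main()
    {
        Func&amp;lt;string, bool&amp;gt; func = new DelegateInfoSketch().Filter;
        Func&amp;lt;string, bool&amp;gt; staticFunc = FilterStatic;

        // Instance method: Target is the object on which the method is called.
        Console.WriteLine("{0}.{1}", func.Target.GetType().Name, func.Method.Name);

        // Static method: the public Target property returns null.
        Console.WriteLine("Target is null: {0}, method: {1}", staticFunc.Target == null, staticFunc.Method.Name);
    }
}&lt;/pre&gt;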

&lt;p&gt;We know the type of the object whose method is called by inspecting the _target field, which gives us the instance on which non-static methods are called. If the method is static, _target contains only the delegate instance itself (Func&amp;lt;string,bool&amp;gt;), which is of no help. When a delegate to a static method is created, the value which is normally stored in _methodPtr is stored in _methodPtrAux instead. The _methodPtrXXX fields do not point to data structures but to trampoline code that retrieves the target MethodDescriptor (not via reflection of course). I found this out by creating a little sample app which defines some delegate instances and then goes to sleep, which makes it easy to break in with Windbg. &lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;using&lt;/span&gt; System;
&lt;span class="kwrd"&gt;using&lt;/span&gt; System.Linq;
&lt;span class="kwrd"&gt;using&lt;/span&gt; System.Threading;
&lt;span class="kwrd"&gt;using&lt;/span&gt; System.Runtime.InteropServices;

&lt;span class="kwrd"&gt;namespace&lt;/span&gt; ConsoleApplication10
{
    &lt;span class="kwrd"&gt;class&lt;/span&gt; Program
    {
        &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Main(&lt;span class="kwrd"&gt;string&lt;/span&gt;[] args)
        {
            &lt;span class="kwrd"&gt;new&lt;/span&gt; Program().Start();
        }

        &lt;span class="kwrd"&gt;private&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Start()
        {
            Func&amp;lt;&lt;span class="kwrd"&gt;string&lt;/span&gt;, &lt;span class="kwrd"&gt;bool&lt;/span&gt;&amp;gt; func = Filter;
            Func&amp;lt;&lt;span class="kwrd"&gt;string&lt;/span&gt;, &lt;span class="kwrd"&gt;bool&lt;/span&gt;&amp;gt; staticFunc = FilterStatic;
            Thread.Sleep(5000);
            func(&lt;span class="str"&gt;""&lt;/span&gt;);
        }

        &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;bool&lt;/span&gt; FilterStatic(&lt;span class="kwrd"&gt;string&lt;/span&gt; str)
        {
            &lt;span class="kwrd"&gt;return&lt;/span&gt; &lt;span class="kwrd"&gt;true&lt;/span&gt;;
        }

        &lt;span class="kwrd"&gt;bool&lt;/span&gt; Filter(&lt;span class="kwrd"&gt;string&lt;/span&gt; str)
        {
            &lt;span class="kwrd"&gt;throw&lt;/span&gt; &lt;span class="kwrd"&gt;new&lt;/span&gt; Exception();
        }
    }
}&lt;/pre&gt;
&lt;style type="text/css"&gt;&lt;![CDATA[











.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }]]&gt;&lt;/style&gt;
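
&lt;p&gt;For comparison, in a live process the public Delegate.Target and Delegate.Method properties answer the same question which the raw fields only encode. The following is a minimal sketch and not part of the sample app above; Filter is changed here so that it does not throw. Note that the public Target property returns null for a static method, even though the _target field holds the delegate itself.&lt;/p&gt;

&lt;pre class="csharpcode"&gt;using System;

class Program
{
    static bool FilterStatic(string str) { return true; }

    bool Filter(string str) { return str.Length != 0; }

    static void Main()
    {
        Func&amp;lt;string, bool&amp;gt; func = new Program().Filter;
        Func&amp;lt;string, bool&amp;gt; staticFunc = FilterStatic;

        // Instance method: Target is the Program instance.
        Console.WriteLine(func.Method.Name);           // Filter
        Console.WriteLine(func.Target != null);        // True

        // Static method: the public Target property is null.
        Console.WriteLine(staticFunc.Method.Name);     // FilterStatic
        Console.WriteLine(staticFunc.Target == null);  // True
    }
}&lt;/pre&gt;

&lt;p&gt;In a memory dump none of these properties can be called, which is why the rest of this post works on the raw pointer values.&lt;/p&gt;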

&lt;p&gt;The algorithm the CLR uses to retrieve the MethodDescriptor when a delegate is called has stayed the same from .NET 2.0 up to .NET 4.5. There are not even differences between x86 and x64 (except for the pointer size of a MethodDescriptor). How the method descriptor is retrieved can best be seen in the x64 assembly code. The base value in rax is _methodPtr+5.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;pre class="csharpcode"&gt;clr!PrecodeFixupThunk+0x1:
000007fe`edf113f1 4c0fb65002      movzx   r10,byte ptr [rax+2] // load byte from rax+2
000007fe`edf113f6 4c0fb65801      movzx   r11,byte ptr [rax+1] // load byte from rax+1
000007fe`edf113fb 4a8b44d003      mov     rax,qword ptr [rax+r10*8+3] // retrieve base MethodDescriptor
000007fe`edf11400 4e8d14d8        lea     r10,[rax+r11*8] // Add method slot offset to get the right one&lt;/pre&gt;

&lt;p&gt;From this disassembly we can now create a Windbg script which dumps the actual method for any _methodPtr value.&lt;/p&gt;

&lt;p&gt;&lt;font size="2" face="Lucida Console"&gt;0:000&amp;gt; !DumpObj /d 01ca2328 
    &lt;br /&gt;Name:        System.Func`2[[System.String, mscorlib],[System.Boolean, mscorlib]] 

    &lt;br /&gt;MethodTable: 5cf43c6c 

    &lt;br /&gt;EEClass:     5cb73718 

    &lt;br /&gt;Size:        32(0x20) bytes 

    &lt;br /&gt;File:        C:\Windows\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll 

    &lt;br /&gt;Fields: 

    &lt;br /&gt;      MT    Field   Offset                 Type VT     Attr    Value Name 

    &lt;br /&gt;5cef8194  400002d        4        System.Object  0 instance 01ca231c _target 

    &lt;br /&gt;5cef8194  400002e        8        System.Object  0 instance 00000000 _methodBase 

    &lt;br /&gt;5cefb8fc  400002f        c        System.IntPtr  1 instance   26c048 _methodPtr 

    &lt;br /&gt;5cefb8fc  4000030       10        System.IntPtr  1 instance        0 _methodPtrAux 

    &lt;br /&gt;5cef8194  4000031       14        System.Object  0 instance 00000000 _invocationList 

    &lt;br /&gt;5cefb8fc  4000032       18        System.IntPtr  1 instance        0 _invocationCount 

    &lt;br /&gt;0:000&amp;gt; &lt;font color="#ff0000"&gt;$$&amp;gt;a&amp;lt; "c:\source\DelegateTest\Resolve.txt" 26c048 
      &lt;br /&gt;&lt;/font&gt;Method Name:  &lt;font color="#008000"&gt;ConsoleApplication10.Program.Filter(System.String) 
      &lt;br /&gt;&lt;/font&gt;Class:        002612fc 

    &lt;br /&gt;MethodTable:  00263784 

    &lt;br /&gt;mdToken:      06000005 

    &lt;br /&gt;Module:       00262e7c 

    &lt;br /&gt;IsJitted:     yes 

    &lt;br /&gt;CodeAddr:     002c05f0 

    &lt;br /&gt;Transparency: Critical&lt;/font&gt;&lt;/p&gt;

&lt;p&gt;The code for the Resolve.txt Windbg script is&lt;/p&gt;

&lt;p&gt;&lt;font size="2" face="Lucida Console"&gt;r $t0 = ${$arg1}+5 
    &lt;br /&gt;r $t1 = $t0 + 8*by($t0+2) + 3 

    &lt;br /&gt;r $t2 = 8*by($t0+1) 

    &lt;br /&gt;r $t3 = poi($t1) + $t2 

    &lt;br /&gt;!DumpMD $t3&lt;/font&gt;&lt;/p&gt;

&lt;p&gt;This allows you to retrieve the called method from any delegate instance from .NET 2.0 to 4.5 regardless of its bitness. The offsets used in the script seem to point to IL structures which have the same layout across all platforms. For x86 code there is an even shorter shortcut. It seems that the two offsets are normally 0, which boils the whole calculation down to&lt;/p&gt;

&lt;p&gt;&lt;font size="2" face="Lucida Console"&gt;!DumpMD poi(_methodPtr+8)&lt;/font&gt;&lt;/p&gt;

&lt;p&gt;to get the desired information. For x64, though, you need the script to get the right information back. I have tried to load a dump of the short demo application into VS11 to see if VS can already resolve the target methods (which would be a fine thing), but no, you will not get much information out of the dump if you stick to VS. Windbg still remains the (only) tool of choice for post mortem debugging with memory dumps. You can either tell your customers that you did not find the bug and that you need to install VS on their machines, or you take a deep breath and remove the layers of abstraction (C#, IL, Assembler, … Quantum Physics) one by one until you can solve your problem. &lt;/p&gt;

&lt;p&gt;If you try this out and come back to me because this script does not work, it could be that you tried to dump the _methodPtr of a multicast delegate which has an invocation count &amp;gt; 0. The _methodPtr then contains the code which loops through the array and calls the delegates one by one. All you need to do is to use the script on one of the delegates contained in the _invocationList array. &lt;/p&gt;
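
&lt;p&gt;The arithmetic of Resolve.txt can also be written down in C#, e.g. as a starting point for a tool which processes dumps. This is only a sketch of the pointer calculation: readByte and readPointer are hypothetical callbacks standing in for whatever reads memory from the dump, and the memory contents in Main are made up to reproduce the x86 shortcut where both offset bytes are 0.&lt;/p&gt;

&lt;pre class="csharpcode"&gt;using System;

static class DelegateResolver
{
    // Mirrors Resolve.txt line by line. The result is the address to pass to !DumpMD.
    public static ulong ResolveMethodDesc(ulong methodPtr,
                                          Func&amp;lt;ulong, byte&amp;gt; readByte,
                                          Func&amp;lt;ulong, ulong&amp;gt; readPointer)
    {
        ulong t0 = methodPtr + 5;                     // r $t0 = ${$arg1}+5
        ulong t1 = t0 + 8UL * readByte(t0 + 2) + 3;   // r $t1 = $t0 + 8*by($t0+2) + 3
        ulong t2 = 8UL * readByte(t0 + 1);            // r $t2 = 8*by($t0+1)
        return readPointer(t1) + t2;                  // r $t3 = poi($t1) + $t2
    }

    static void Main()
    {
        // Made-up memory: every byte reads as 0 and only the pointer
        // stored at methodPtr+8 has a (made-up) value.
        ulong methodPtr = 0x26c048;
        Func&amp;lt;ulong, byte&amp;gt; readByte = address =&amp;gt; 0;
        Func&amp;lt;ulong, ulong&amp;gt; readPointer = address =&amp;gt; address == methodPtr + 8 ? 0x1000UL : 0UL;

        // With both offsets 0 the result equals poi(_methodPtr+8).
        Console.WriteLine(ResolveMethodDesc(methodPtr, readByte, readPointer).ToString("x")); // 1000
    }
}&lt;/pre&gt;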

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;To make Windbg more user friendly you should always type as the first command&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;.prefer_dml 1&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;to enable “hyperlink” support for many commands in WinDbg. With .NET 4.0, sos.dll also supports this hyperlink syntax, which makes it much easier to dump the contents of objects. Whenever you click on one of the blue underlined links, a useful command such as dumping the object contents is executed. &lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/905ec9716c9d_14058/image_2.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/905ec9716c9d_14058/image_thumb.png" width="872" height="317" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;With this enabled it also becomes easier to load the right sos.dll. Many of you know the trick of using the .loadby command to load sos.dll from the same location as clr.dll (.NET 4.0) or mscorwks (pre .NET 4.0). For some reason this loadby trick does not work in all cases. Another nearly as fast solution is to use the lm command to display the loaded modules. When you click on the module whose path you want to know, you can copy the path into your command line, append sos.dll to your load command and you are done.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/905ec9716c9d_14058/image_6.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://gwb.blob.core.windows.net/akraus1/WindowsLiveWriter/905ec9716c9d_14058/image_thumb_2.png" width="970" height="753" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the ones of you who can remember all shortcuts you can also use&lt;/p&gt;

&lt;p&gt;&lt;font size="2" face="Lucida Console"&gt;0:000&amp;gt; lm i m clr 
    &lt;br /&gt;Browse full module list 

    &lt;br /&gt;start    end        module name 

    &lt;br /&gt;5dba0000 5e21f000   clr        (export symbols)       C:\Windows\Microsoft.NET\Framework\v4.0.30319\clr.dll&lt;/font&gt;&lt;/p&gt;

&lt;p&gt;to show the full path of all modules which match the pattern clr. &lt;/p&gt;</description>
            <dc:creator>Alois Kraus</dc:creator>
            <guid>http://geekswithblogs.net/akraus1/archive/2012/05/20/149699.aspx</guid>
            <pubDate>Sun, 20 May 2012 22:39:11 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/akraus1/comments/149699.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/akraus1/archive/2012/05/20/149699.aspx#feedback</comments>
            <slash:comments>1</slash:comments>
            <wfw:commentRss>http://geekswithblogs.net/akraus1/comments/commentRss/149699.aspx</wfw:commentRss>
            <trackback:ping>http://geekswithblogs.net/akraus1/services/trackbacks/149699.aspx</trackback:ping>
        </item>
        <item>
            <title>The Return Of __FILE__ And __LINE__ In .NET 4.5</title>
            <link>http://geekswithblogs.net/akraus1/archive/2012/04/09/149264.aspx</link>
            <description>&lt;p&gt;Originally posted on: &lt;a href='http://geekswithblogs.net/akraus1/archive/2012/04/09/149264.aspx'&gt;http://geekswithblogs.net/akraus1/archive/2012/04/09/149264.aspx&lt;/a&gt;&lt;/p&gt;&lt;p&gt;Good things are hard to kill. Two of the most useful predefined compiler macros in C/C++ were __FILE__ and __LINE__, which expand to the compilation unit's file name and the line number where the macro is encountered by the compiler. After 4.5 versions of .NET we are on par with C/C++ again. It is of course not a simple compiler-expanded macro but an attribute, yet it serves exactly the same purpose. &lt;/p&gt;  &lt;p&gt;Now we get&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;a href="http://msdn.microsoft.com/en-us/library/system.runtime.compilerservices.callerlinenumberattribute%28v=vs.110%29.aspx"&gt;CallerLineNumberAttribute&lt;/a&gt;  == __LINE__ &lt;/li&gt;    &lt;li&gt;&lt;a href="http://msdn.microsoft.com/en-us/library/system.runtime.compilerservices.callerfilepathattribute%28v=vs.110%29.aspx"&gt;CallerFilePathAttribute&lt;/a&gt;        == __FILE__ &lt;/li&gt;    &lt;li&gt;&lt;a href="http://msdn.microsoft.com/en-us/library/system.runtime.compilerservices.callermembernameattribute%28v=vs.110%29.aspx"&gt;CallerMemberNameAttribute&lt;/a&gt;  == __FUNCTION__ (MSVC Extension) &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt; &lt;/p&gt;  &lt;p&gt;The most important one is CallerMemberNameAttribute, which is very useful to implement the INotifyPropertyChanged interface without the need to hard code the name of the property anymore. Now you can simply decorate the parameter of your change method with the new CallerMemberName attribute and the C# compiler inserts the property name as a string directly at compile time.&lt;/p&gt;  &lt;p&gt; &lt;/p&gt;  &lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;string&lt;/span&gt; UserName
{     
      get {  &lt;span class="kwrd"&gt;return&lt;/span&gt; _userName;     }     
      set {  _userName=&lt;span class="kwrd"&gt;value&lt;/span&gt;;        
             RaisePropertyChanged();  &lt;span class="rem"&gt;// no more RaisePropertyChanged("UserName")!   &lt;/span&gt;
          }
}

&lt;span class="kwrd"&gt;protected&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; RaisePropertyChanged([CallerMemberName] &lt;span class="kwrd"&gt;string&lt;/span&gt; member = &lt;span class="str"&gt;""&lt;/span&gt;)
{   
    var copy = PropertyChanged; 
    &lt;span class="kwrd"&gt;if&lt;/span&gt;(copy != &lt;span class="kwrd"&gt;null&lt;/span&gt;)  
    {       
       copy(&lt;span class="kwrd"&gt;this&lt;/span&gt;, &lt;span class="kwrd"&gt;new&lt;/span&gt; PropertyChangedEventArgs(member));     
    }
}&lt;/pre&gt;
&lt;style type="text/css"&gt;&lt;![CDATA[
.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }]]&gt;&lt;/style&gt;

&lt;p&gt;Nice and handy. This was obviously the prime reason to implement this feature in the C# 5.0 compiler. You can repurpose this feature for tracing to get your hands on the method name of your caller, along with other information, very fast. All the information is added at compile time, which is much faster than other approaches like walking the stack.&lt;/p&gt;

&lt;p&gt;MSDN shows the usage of these attributes with an example &lt;/p&gt;

&lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; TraceMessage(&lt;span class="kwrd"&gt;string&lt;/span&gt; message, [CallerMemberName] &lt;span class="kwrd"&gt;string&lt;/span&gt; memberName = &lt;span class="str"&gt;""&lt;/span&gt;, 
                               [CallerFilePath] &lt;span class="kwrd"&gt;string&lt;/span&gt; sourceFilePath = &lt;span class="str"&gt;""&lt;/span&gt;, [CallerLineNumber] &lt;span class="kwrd"&gt;int&lt;/span&gt; sourceLineNumber = 0) 
{ 
    Console.WriteLine(&lt;span class="str"&gt;"Hi {0} {1} {2}({3})"&lt;/span&gt;, message, memberName, sourceFilePath, sourceLineNumber); 
}&lt;/pre&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;When I think of tracing I usually want to have an API which allows me to &lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Trace method enter and leave &lt;/li&gt;

  &lt;li&gt;Trace messages with a severity like Info, Warning, Error &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When I print a trace message it is very useful to print out the method and type name as well. So your API must either accept the method and type name as strings or extract them automatically by walking back one stack frame and fetching the information from there. The first glaring deficiency is that there is no CallerTypeAttribute yet because the &lt;a href="http://blog.slaks.net/2011/10/caller-info-attributes-vs-stack-walking.html"&gt;C# compiler team was not satisfied with its performance&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt;A usable Trace Api might therefore look like &lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;pre class="csharpcode"&gt;    [Flags]
    &lt;span class="kwrd"&gt;enum&lt;/span&gt; TraceTypes
    {
        None = 0,
        EnterLeave = 1 &amp;lt;&amp;lt; 0,
        Info = 1 &amp;lt;&amp;lt; 1,
        Warn = 1 &amp;lt;&amp;lt; 2,
        Error = 1 &amp;lt;&amp;lt; 3
    }

    &lt;span class="kwrd"&gt;class&lt;/span&gt; Tracer : IDisposable
    {
        &lt;span class="kwrd"&gt;string&lt;/span&gt; Type;
        &lt;span class="kwrd"&gt;string&lt;/span&gt; Method;
            
        &lt;span class="kwrd"&gt;public&lt;/span&gt; Tracer(&lt;span class="kwrd"&gt;string&lt;/span&gt; type, &lt;span class="kwrd"&gt;string&lt;/span&gt; method)
        {
            Type = type;
            Method = method;
            &lt;span class="kwrd"&gt;if&lt;/span&gt; (IsEnabled(TraceTypes.EnterLeave,Type, Method))
            {
                &lt;span class="rem"&gt;// trace method enter&lt;/span&gt;
            }
        }

        &lt;span class="kwrd"&gt;private&lt;/span&gt; &lt;span class="kwrd"&gt;bool&lt;/span&gt; IsEnabled(TraceTypes traceTypes, &lt;span class="kwrd"&gt;string&lt;/span&gt; Type, &lt;span class="kwrd"&gt;string&lt;/span&gt; Method)
        {
           &lt;span class="rem"&gt;// Do checking here if tracing is enabled&lt;/span&gt;
            &lt;span class="kwrd"&gt;return&lt;/span&gt; &lt;span class="kwrd"&gt;false&lt;/span&gt;;
        }

        &lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Info(&lt;span class="kwrd"&gt;string&lt;/span&gt; fmt, &lt;span class="kwrd"&gt;params&lt;/span&gt; &lt;span class="kwrd"&gt;object&lt;/span&gt;[] args)
        {
        }

        &lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Warn(&lt;span class="kwrd"&gt;string&lt;/span&gt; fmt, &lt;span class="kwrd"&gt;params&lt;/span&gt; &lt;span class="kwrd"&gt;object&lt;/span&gt;[] args)
        {
        }

        &lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Error(&lt;span class="kwrd"&gt;string&lt;/span&gt; fmt, &lt;span class="kwrd"&gt;params&lt;/span&gt; &lt;span class="kwrd"&gt;object&lt;/span&gt;[] args)
        {
        }

        &lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Info(&lt;span class="kwrd"&gt;string&lt;/span&gt; type, &lt;span class="kwrd"&gt;string&lt;/span&gt; method, &lt;span class="kwrd"&gt;string&lt;/span&gt; fmt, &lt;span class="kwrd"&gt;params&lt;/span&gt; &lt;span class="kwrd"&gt;object&lt;/span&gt;[] args)
        {
        }

        &lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Warn(&lt;span class="kwrd"&gt;string&lt;/span&gt; type, &lt;span class="kwrd"&gt;string&lt;/span&gt; method, &lt;span class="kwrd"&gt;string&lt;/span&gt; fmt, &lt;span class="kwrd"&gt;params&lt;/span&gt; &lt;span class="kwrd"&gt;object&lt;/span&gt;[] args)
        {
        }

        &lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Error(&lt;span class="kwrd"&gt;string&lt;/span&gt; type, &lt;span class="kwrd"&gt;string&lt;/span&gt; method, &lt;span class="kwrd"&gt;string&lt;/span&gt; fmt, &lt;span class="kwrd"&gt;params&lt;/span&gt; &lt;span class="kwrd"&gt;object&lt;/span&gt;[] args)
        {
        }

        &lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Dispose()
        {
            &lt;span class="rem"&gt;// trace method leave&lt;/span&gt;
        }
    }&lt;/pre&gt;

&lt;p&gt;This minimal trace API is very fast but hard to maintain, since you need to pass in the type and method name as hard coded strings which can change from time to time. But now we have at least CallerMemberName to get rid of the explicit method parameter, right? Not really. Since any acceptably usable trace API should have a method signature like Tracexxx(… string fmt, params object[] args), we are not able to add additional optional parameters after the args array. If we put the caller parameter before the format string we would need to make the format string optional as well, which would mean the compiler would need to figure out what our trace message and arguments are (not likely), or we would need to specify everything explicitly just like before.&lt;/p&gt;

&lt;p&gt;There are ways around this by providing a myriad of overloads which in the end are all routed to the very same method, but that is ugly. I am not sure whether nobody inside MS agrees that the above API is reasonable to have or (more likely) whether the whole talk about using this feature for diagnostic purposes was never a core feature at all but a simple byproduct of making the life of INotifyPropertyChanged implementers easier. &lt;/p&gt;

&lt;p&gt;A way around this would be to allow, after the params array, another set of optional arguments which are always filled in by the compiler, but I do not know if this is easy to implement.&lt;/p&gt;
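
&lt;p&gt;One workaround which is possible with the existing attributes is to capture the caller information in a separate call and to keep the params array on the trace method itself. This is only a sketch of that idea and not the API from above; the names Trace, Here and TraceContext are made up for illustration.&lt;/p&gt;

&lt;pre class="csharpcode"&gt;using System;
using System.Runtime.CompilerServices;

struct TraceContext
{
    readonly string member;
    readonly int line;

    public TraceContext(string member, int line)
    {
        this.member = member;
        this.line = line;
    }

    // The params array stays the last parameter, no caller attributes needed here.
    public void Info(string fmt, params object[] args)
    {
        Console.WriteLine("{0}({1}): {2}", member, line, string.Format(fmt, args));
    }
}

static class Trace
{
    // The compiler fills in both optional parameters at the call site.
    public static TraceContext Here([CallerMemberName] string member = "",
                                    [CallerLineNumber] int line = 0)
    {
        return new TraceContext(member, line);
    }
}

class Program
{
    static void Main()
    {
        // Prints the calling member (Main) and its line number followed by "Value 1 of 2".
        Trace.Here().Info("Value {0} of {1}", 1, 2);
    }
}&lt;/pre&gt;

&lt;p&gt;The price is an extra call and a small struct per trace statement, and the type name is still missing.&lt;/p&gt;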

&lt;p&gt;The thing I am missing much more is the CallerType attribute, which is not provided. But not in the way you would think. In the API above I added some filtering based on method and type to stay as fast as possible for types where tracing is not enabled at all. It should be no more expensive than an additional method call and a check of a bool variable whether tracing is enabled for this type. The data is tightly bound to the calling type and method and should therefore become part of the static type instance. Since extending the CLR type system for tracing is not something I expect to happen, I have come up with an alternative approach which basically allows me to attach run time data to any existing type object in a super fast way. The key to success is the usage of generics.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;pre class="csharpcode"&gt;    &lt;span class="kwrd"&gt;class&lt;/span&gt; Tracer&amp;lt;T&amp;gt; : IDisposable
    {
        &lt;span class="kwrd"&gt;string&lt;/span&gt; Method;

        &lt;span class="kwrd"&gt;public&lt;/span&gt; Tracer(&lt;span class="kwrd"&gt;string&lt;/span&gt; method)
        {
            Method = method;
            &lt;span class="kwrd"&gt;if&lt;/span&gt; (TraceData&amp;lt;T&amp;gt;.Instance.Enabled.HasFlag(TraceTypes.EnterLeave))
            {
            }
        }

        &lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Dispose()
        {
            &lt;span class="kwrd"&gt;if&lt;/span&gt; (TraceData&amp;lt;T&amp;gt;.Instance.Enabled.HasFlag(TraceTypes.EnterLeave))
            {
            }
        }

        &lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Info(&lt;span class="kwrd"&gt;string&lt;/span&gt; fmt, &lt;span class="kwrd"&gt;params&lt;/span&gt; &lt;span class="kwrd"&gt;object&lt;/span&gt;[] args)
        {
        }

        &lt;span class="rem"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
        &lt;span class="rem"&gt;/// Every type gets its own instance with a fresh set of variables to describe the &lt;/span&gt;
        &lt;span class="rem"&gt;/// current filter status.&lt;/span&gt;
        &lt;span class="rem"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
        &lt;span class="rem"&gt;/// &amp;lt;typeparam name="UsingType"&amp;gt;The type which uses the tracer&amp;lt;/typeparam&amp;gt;&lt;/span&gt;
        &lt;span class="kwrd"&gt;internal&lt;/span&gt; &lt;span class="kwrd"&gt;class&lt;/span&gt; TraceData&amp;lt;UsingType&amp;gt;
        {
            &lt;span class="kwrd"&gt;internal&lt;/span&gt; &lt;span class="kwrd"&gt;static&lt;/span&gt; TraceData&amp;lt;UsingType&amp;gt; Instance = &lt;span class="kwrd"&gt;new&lt;/span&gt; TraceData&amp;lt;UsingType&amp;gt;();

            &lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;bool&lt;/span&gt; IsInitialized = &lt;span class="kwrd"&gt;false&lt;/span&gt;; &lt;span class="rem"&gt;// flag if we need to reinit the trace data in case of reconfigured trace settings at runtime&lt;/span&gt;
            &lt;span class="kwrd"&gt;public&lt;/span&gt; TraceTypes Enabled = TraceTypes.None; &lt;span class="rem"&gt;// Enabled trace levels for this type&lt;/span&gt;
        }
    }&lt;/pre&gt;

&lt;p&gt;We do not need to pass the type as a string or Type object to the trace API. Instead we define a generic API that accepts the using type as generic parameter. Then we can create a static TraceData instance which, due to the nature of generics, is a fresh instance for every new type parameter. My tests on my home machine have shown that this approach is as fast as a simple bool flag check. If you have an application with many types using tracing, you do not want to bring the app down by simply enabling tracing for one special, rarely used type. The trace filter for the types which are not enabled must therefore be the fastest code path. This approach has the nice side effect that if you store the TraceData instances in one global list you can safely reconfigure tracing at run time by simply setting the IsInitialized flag to false. A similar effect can be achieved with a global static Dictionary&amp;lt;Type,TraceData&amp;gt; object, but big hash tables have random memory access semantics, which is bad for cache locality, and you always need to pay for the lookup, which involves hash code generation, an equality check and an indexed array access.&lt;/p&gt;
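
&lt;p&gt;That every closed generic type gets its own static fields can be checked with a few lines. The sketch below is reduced to the Enabled flags only; OrderService and RarelyUsedType are made-up placeholder types.&lt;/p&gt;

&lt;pre class="csharpcode"&gt;using System;

[Flags]
enum TraceTypes
{
    None = 0,
    EnterLeave = 1 &amp;lt;&amp;lt; 0,
    Info = 1 &amp;lt;&amp;lt; 1,
    Warn = 1 &amp;lt;&amp;lt; 2,
    Error = 1 &amp;lt;&amp;lt; 3
}

// One independent set of static fields exists per type argument T.
static class TraceData&amp;lt;T&amp;gt;
{
    public static TraceTypes Enabled = TraceTypes.None;
}

class OrderService { }
class RarelyUsedType { }

class Program
{
    static void Main()
    {
        // Enable tracing for one type only.
        TraceData&amp;lt;RarelyUsedType&amp;gt;.Enabled = TraceTypes.Info | TraceTypes.Error;

        Console.WriteLine(TraceData&amp;lt;RarelyUsedType&amp;gt;.Enabled.HasFlag(TraceTypes.Info)); // True
        Console.WriteLine(TraceData&amp;lt;OrderService&amp;gt;.Enabled.HasFlag(TraceTypes.Info));   // False
    }
}&lt;/pre&gt;

&lt;p&gt;The check for a type where tracing is off touches only that type's own static field, without any lookup by Type.&lt;/p&gt;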

&lt;p&gt;The generic version is wicked fast and allows you to add more features to your tracing API with minimal perf overhead. But it is cumbersome to always write the generic type argument explicitly and, worse, if you refactor code and move parts of it to other classes you might no longer be able to configure tracing correctly. I would therefore like to decorate my type with an attribute&lt;/p&gt;

&lt;pre class="csharpcode"&gt;    [CallerType]
    &lt;span class="kwrd"&gt;class&lt;/span&gt; Tracer&amp;lt;T&amp;gt; : IDisposable
    &lt;/pre&gt;

&lt;p&gt;to tell the compiler to fill in the generic type argument automatically.&lt;/p&gt;

&lt;pre class="csharpcode"&gt;    &lt;span class="kwrd"&gt;class&lt;/span&gt; Program
    {
        &lt;span class="kwrd"&gt;static&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Main(&lt;span class="kwrd"&gt;string&lt;/span&gt;[] args)
        {
            &lt;span class="kwrd"&gt;using&lt;/span&gt; (var t = &lt;span class="kwrd"&gt;new&lt;/span&gt; Tracer()) &lt;span class="rem"&gt;// equivalent to new Tracer&amp;lt;Program&amp;gt;()&lt;/span&gt;
            {
                &lt;span class="rem"&gt;// traced work goes here&lt;/span&gt;
            }
        }
    }&lt;/pre&gt;


&lt;p&gt;That would be really useful and super fast, since you do not need to pass any type object around but still have full type information at hand. The change would be breaking if a non-generic type of the same name exists in the same namespace, because the generic counterpart would now be preferred. In my opinion this is an acceptable risk, since you can already get conflicts today if two generic types of the same name are defined in different namespaces. This would only be a variation of that issue.&lt;/p&gt;

&lt;p&gt;When you think about this further you can add more features, such as tracing the exception in your Dispose method when the method is left with an exception, using the little trick I &lt;a href="http://geekswithblogs.net/akraus1/archive/2010/07/04/140761.aspx"&gt;wrote about some time ago&lt;/a&gt;. You can think of tracing as a super fast and configurable switch that writes data to an output destination or executes alternative actions. With such an infrastructure you can, for example:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Reconfigure tracing at run time.&lt;/li&gt;

  &lt;li&gt;Take a memory dump when a specific method is left with a specific exception.&lt;/li&gt;

  &lt;li&gt;Throw an exception when a specific trace statement is hit (useful for testing error conditions).&lt;/li&gt;

  &lt;li&gt;Execute a passed delegate which e.g. dumps additional state when enabled. &lt;/li&gt;

  &lt;li&gt;Write data to an in-memory ring buffer and dump it when specific events occur (e.g. the method is left with an exception, or a dump is triggered from outside).&lt;/li&gt;

  &lt;li&gt;Write data to an output device.&lt;/li&gt;

  &lt;li&gt;….&lt;/li&gt;
&lt;/ul&gt;
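
&lt;p&gt;A minimal sketch of how such a configurable switch could look. TraceAction, TraceConfig and TracePoint are made-up names for illustration and not part of any library, and the lookup by "TypeName.MethodName" string is an assumption chosen to keep the example short:&lt;/p&gt;

&lt;pre class="csharpcode"&gt;using System;
using System.Collections.Generic;

// Hypothetical sketch: TraceAction, TraceConfig and TracePoint are made-up names.
[Flags]
enum TraceAction { None = 0, WriteToOutput = 1, RunCallback = 2, ThrowException = 4 }

static class TraceConfig
{
    // Key is "TypeName.MethodName". The whole table is swapped to reconfigure at run time.
    static volatile Dictionary&amp;lt;string, TraceAction&amp;gt; actions = new Dictionary&amp;lt;string, TraceAction&amp;gt;();

    public static void Configure(Dictionary&amp;lt;string, TraceAction&amp;gt; newActions)
    {
        actions = new Dictionary&amp;lt;string, TraceAction&amp;gt;(newActions); // keep a private copy
    }

    public static TraceAction Lookup(string typeName, string method)
    {
        TraceAction action;
        return actions.TryGetValue(typeName + "." + method, out action) ? action : TraceAction.None;
    }
}

static class TracePoint
{
    public static void Hit(string typeName, string method, string message, Action callback = null)
    {
        TraceAction action = TraceConfig.Lookup(typeName, method);
        if (action == TraceAction.None) return; // nothing configured for this method

        if (action.HasFlag(TraceAction.WriteToOutput))
        {
            Console.WriteLine("{0}.{1}: {2}", typeName, method, message);
        }
        if (action.HasFlag(TraceAction.RunCallback))
        {
            if (callback != null) callback(); // e.g. dump additional state
        }
        if (action.HasFlag(TraceAction.ThrowException))
        {
            throw new InvalidOperationException("Injected by trace configuration: " + message);
        }
    }
}&lt;/pre&gt;

&lt;p&gt;Swapping the whole dictionary in Configure keeps readers lock-free, because a published table is never modified afterwards. A ring buffer or a memory dump trigger would be just another flag in the same enum.&lt;/p&gt;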

&lt;p&gt;This stuff is really useful to have when your code is in production on a mission-critical server and you need to find the root cause of sporadic crashes of your application. It could be a buggy graphics card driver which throws access violations into your application (OK, with .NET 4 you can no longer catch these unless you enable a compatibility flag) where you would like to have a minidump. Or, after two weeks of operation, you have reached a state where you need a full memory dump at a specific point in time in the middle of a transaction.&lt;/p&gt;

&lt;p&gt;On my older machine I get 50 million traces/s with this super fast approach when tracing is disabled. When I know that tracing is enabled for this type, I can walk the stack with StackFrameHelper.GetStackFramesInternal to check whether a specific action or output device is configured for this method, which is about 2-3 times faster than the regular StackTrace class. With even a single String.Format I am down to 3 million traces/s, so performance is no longer that important at this point, since I do want to do something now.&lt;/p&gt;
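
&lt;p&gt;For reference, the slower public route via the regular StackTrace class could look like the sketch below. CallerLookup is a made-up name, and the private StackFrameHelper API is deliberately not shown here. Keep in mind that the JIT may inline the calling method, in which case you get the caller's caller instead:&lt;/p&gt;

&lt;pre class="csharpcode"&gt;using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical helper, only for illustration.
static class CallerLookup
{
    // NoInlining keeps this method as a real stack frame so that skipping one frame is correct.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static MethodBase GetCaller()
    {
        // Skip this frame (1) and do not collect file and line information (false).
        return new StackTrace(1, false).GetFrame(0).GetMethod();
    }
}

class Program
{
    static void Main()
    {
        MethodBase caller = CallerLookup.GetCaller();
        Console.WriteLine("{0}.{1}", caller.DeclaringType.Name, caller.Name);
    }
}&lt;/pre&gt;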

&lt;p&gt;The CallerMemberName feature of the C# 5 compiler is nice, but I would have preferred direct access to the MethodHandle rather than its stringified version. What I would really like to see is a CallerType attribute that fills in the generic type argument of the call site, to augment the static CLR type data with run time data.&lt;/p&gt;
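
&lt;p&gt;To show what you get from CallerMemberName as it stands, here is a small sketch. MethodTracer is a made-up name; only the attribute itself comes from the framework (System.Runtime.CompilerServices):&lt;/p&gt;

&lt;pre class="csharpcode"&gt;using System;
using System.Runtime.CompilerServices;

// Hypothetical helper, only for illustration.
static class MethodTracer
{
    // The C# 5 compiler replaces the default value with the name of the calling member.
    public static string Enter([CallerMemberName] string member = "")
    {
        return "Enter " + member;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(MethodTracer.Enter()); // the compiler passes "Main" here
    }
}&lt;/pre&gt;

&lt;p&gt;The method name arrives as a plain string constant baked into the call site, so the declaring type is not part of it.&lt;/p&gt;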

&lt;img src="http://geekswithblogs.net/akraus1/aggbug/149264.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Alois Kraus</dc:creator>
            <guid>http://geekswithblogs.net/akraus1/archive/2012/04/09/149264.aspx</guid>
            <pubDate>Mon, 09 Apr 2012 12:15:59 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/akraus1/comments/149264.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/akraus1/archive/2012/04/09/149264.aspx#feedback</comments>
            <slash:comments>2</slash:comments>
            <wfw:commentRss>http://geekswithblogs.net/akraus1/comments/commentRss/149264.aspx</wfw:commentRss>
            <trackback:ping>http://geekswithblogs.net/akraus1/services/trackbacks/149264.aspx</trackback:ping>
        </item>
    </channel>
</rss>