Alois Kraus







Have you ever seen a nice tool with a fancy UI and thought: Hey, that thing is powerful. I will try it! But later you were left scratching your head, wondering why others can use this tool and get great results while you just don't get it?

Then I have news for you: I have created a WPA profile aimed at user mode rather than kernel mode developers. Many columns were removed from the context menus to give you all the power you need to find issues in your application where no one else has found the root cause.

The ETW profile can be downloaded from here as a zip file. Unpack it to a directory and you are ready to go. So what is inside it?


File | Description
--- | ---
Simple.wpaProfile | The main WPA profile you can use now.
JIT.xml | WPA region file referenced by Simple.wpaProfile to get the JIT times of your application, like in PerfView.
GC.xml | WPA region file referenced by Simple.wpaProfile to get Garbage Collection metrics like in PerfView, only better!
HookRegions.xml | WPA region file referenced by Simple.wpaProfile to see mouse clicks in your ETL file when ETWControler is running and capturing your keyboard and mouse events.
default.stacktags | Stacktag file which serves as a base for your application to find common .NET issues like event handle leaks, too much regex usage, monitor contention and much more.


Besides this, I have also added some WPR profiles that record specific .NET events such as exceptions together with GC events into a larger 1 GB buffer.

The WPR of Windows 10 has a .NET Activity profile under Scenario Analysis, but it uses a very small 160 MB ring buffer, which is way too small for my needs. It might be good enough for you, but I have added extra WPR profiles for GC and JIT as well.

To show you the profiles in action, I have created a small PerformanceIssueGenerator.exe application. It generates various issues which you can analyze with WPA.

I have already recorded the data and put the 7z file here. To extract it you need 7-Zip, which you can download from the official site. To view the ETL file you need a Windows 8 or later machine and the Windows Performance Toolkit from the Windows 10 SDK.


Once you have downloaded the profile and the sample ETL file, you can apply the profile under Profiles - Apply…


Then you get two tabs. The first one contains .NET specific analysis views like GC and JIT. The second tab can be used for the usual bottleneck analysis of CPU, disk and memory usage as well as wait chain analysis. You do not need to load the profile every time: click Save Startup Profile to make it your default, and all of your ETL files will open in this view from now on.


Normally I use PerfView for GC issues, checking the GC Stats of an application to see if anything significant is going on. If you have ever used PerfView, you will have noticed that it is a great tool combined with a crappy UI. At least for some key scenarios we can now use WPA with hand-crafted region files instead of PerfView.


GC Views

You now get a GC view like this


You can visualize each GC generation type and how long it took to execute. The Count column also tells you how many GCs you had. This helps a lot if you want GC metrics only for a specific time region of your application. Now it is easy to see how much time of your use case was available to your application and how much time your threads had to wait for the garbage collector. My main issue with PerfView's GCStats view was that it is calculated for the complete duration of the profiling run. Most of the time I want GC metrics only for specific parts of my application, because I am usually optimizing a single place at a time.
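The arithmetic behind "how much of my use case was spent waiting for the GC" is simple once you restrict it to a time window. Below is a minimal Python sketch, assuming you have the blocking GC regions as (start, stop) pairs in milliseconds, for example from an export of the region table; the function name, data layout and sample numbers are hypothetical. Like the text, it assumes foreground GCs do not overlap each other, so the clipped durations can simply be added.

```python
from typing import List, Tuple

# (start_ms, stop_ms) of one blocking GC region; hypothetical export format.
GcRegion = Tuple[float, float]


def gc_pause_fraction(regions: List[GcRegion], window_start: float, window_stop: float) -> float:
    """Fraction of the [window_start, window_stop) window spent in GC.

    Assumes the GC regions do not overlap each other (blocking foreground GCs),
    so the clipped durations can simply be added up.
    """
    window = window_stop - window_start
    if window <= 0:
        raise ValueError("window_stop must be greater than window_start")
    paused = 0.0
    for start, stop in regions:
        # Clip each GC region to the selected window; regions outside it contribute nothing.
        clipped_start = max(start, window_start)
        clipped_stop = min(stop, window_stop)
        if clipped_stop > clipped_start:
            paused += clipped_stop - clipped_start
    return paused / window


if __name__ == "__main__":
    gcs = [(100.0, 120.0), (480.0, 530.0), (900.0, 910.0)]
    # Whole profiling run vs. only the part of the trace we are optimizing.
    print(f"whole run: {gc_pause_fraction(gcs, 0.0, 1000.0):.1%}")
    print(f"use case:  {gc_pause_fraction(gcs, 400.0, 1000.0):.1%}")
```

With these made-up numbers the whole run shows 8.0% GC time while the use case window shows 10.0%, which is exactly the kind of difference a whole-run statistic hides.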

Here is how you get the GC statistics in PerfView:




The numbers in GC Rollup By Generation match the region file pretty well. The GC pause times also correlate quite well with the distribution of generations; the timings are not 100% identical, but the ratios are a good fit.


Since PerfView and WPA use the same data, you can rightfully ask why there is a difference at all. The answer is that WPA does not simply sum up all GC regions by their duration. A WPA region is defined by a start and a stop event and is then displayed in the UI as shown above. But if regions occur in parallel on several threads, WPA uses as the sum only the time during which at least one region was active. This diagram illustrates how WPA region summation works:
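To make the summation rule concrete, here is a small Python model of it: a naive sum of all region durations versus the length of the union of the regions, i.e. the time during which at least one region was active. This is my own sketch of the behavior described above, not WPA's code, and the sample intervals are made up.

```python
from typing import List, Tuple

# (start_ms, stop_ms) of one region instance, possibly on different threads.
Interval = Tuple[float, float]


def naive_sum(regions: List[Interval]) -> float:
    """Add up every region's duration; overlapping time is counted several times."""
    return sum(stop - start for start, stop in regions)


def union_duration(regions: List[Interval]) -> float:
    """Time during which at least one region was active; overlaps are counted once."""
    total = 0.0
    cur_start = cur_stop = None
    for start, stop in sorted(regions):
        if cur_stop is None or start > cur_stop:
            # Disjoint from the current merged block: close it and start a new one.
            if cur_stop is not None:
                total += cur_stop - cur_start
            cur_start, cur_stop = start, stop
        else:
            # Overlaps the current merged block: extend it if this region ends later.
            cur_stop = max(cur_stop, stop)
    if cur_stop is not None:
        total += cur_stop - cur_start
    return total


if __name__ == "__main__":
    # Hypothetical regions: the first two overlap (e.g. JIT on two threads at once).
    regions = [(0, 10), (5, 15), (20, 25)]
    print(naive_sum(regions), union_duration(regions))
```

For these intervals the naive sum is 25 while the union is 20, so the more the regions overlap across threads, the more the WPA sum falls below a simple total.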


For GCs this detail is not so relevant, but it will be very important when we look at JIT statistics. Since a foreground GC blocks your complete application, there is little concurrency going on. We can also visualize when GCs of which type are happening. The view Generic Events GC Generations shows you how many GCs of each type occurred. Initially it is collapsed, so you have to expand it. This view takes a little while to build up, since it needs to parse all ETW events for the GC Start events, which carry the generation number that is visualized here.
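Conceptually the GC Generations view is a count of GC Start events grouped by their generation number. A minimal Python sketch of that grouping, assuming you already have the GC Start events as hypothetical (timestamp_ms, generation) pairs and optionally want to restrict them to a time window, much like a WPA selection:

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# (timestamp_ms, generation) taken from one GC Start event; hypothetical export format.
GcStart = Tuple[float, int]


def gcs_per_generation(
    events: Iterable[GcStart], window: Optional[Tuple[float, float]] = None
) -> Dict[int, int]:
    """Count GC Start events per generation, optionally only inside [start, stop)."""
    counts: Counter = Counter()
    for timestamp, generation in events:
        if window is None or window[0] <= timestamp < window[1]:
            counts[generation] += 1
    # Sort by generation so gen 0, 1 and 2 are listed in order.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    events = [(1.0, 0), (2.5, 0), (3.0, 1), (7.2, 0), (9.9, 2)]
    print(gcs_per_generation(events))
    print(gcs_per_generation(events, window=(2.0, 8.0)))
```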


This is the greatest thing about the Windows 10 WPA: you can now visualize the data of your custom traces. With a little WPA graph configuration you can create key graphs for your own application which show e.g. frames per second, # of DB transactions, … Who needs performance counters when you can now visualize your trace data much better?

Marker Events

Marks are key to knowing where to start looking for an issue. In WPRUI you can press Ctrl+Win+x to open a window which allows you to write a marker event to your trace file while you are recording data. Unfortunately this functionality is not exposed to the outside world, so you cannot set marks programmatically. If you fall back to xperf, you can set marks programmatically with xperf -m if you wish. But since marks are so useful for navigating an ETL file, I really wonder why the method which xperf calls is not documented at all. If you want to know how real power users use WPA, call wpa /?


I guess at Microsoft they run their tests with profiling enabled while saving screenshots as ETW events. If something goes wrong, the ETL file is downloaded from a central repository, opened with a test-specific profile and zoomed into the relevant test part, which is identified by markers or regions from a region file. At least some people at Microsoft use this tool so often that it makes sense to automate it even further: the controls of WPA are UI-automatable, so you can script nearly your complete analysis. Now THAT is what I call good performance regression testing.

A little reverse engineering of the APIs that xperf -m calls finally turned up the method EtwSetMark, which is exported by ntdll.dll. I am sure the readers of my blog can figure out the rest.

But I really wish it would become an officially supported API, since it is so useful. Sure, you can define your own marker events as well, but since the support in WPA is already built in, an official API would really help. It would also be nice if xperf emitted the mark event not only to the "NT Kernel Logger" ETW session but to all active kernel sessions, so that you could also mark the WPR kernel session, which is currently not possible.


JIT View

PerfView can also tell you how much time each method took to compile. This is useful if you are compiling too much code on the fly for your WCF proxies or your serializers. Sure, it is done only once, but if this time makes up a significant part of your application startup, you should rethink how your application works. Since .NET 4.5 will NGen all assemblies automatically if they are used often enough, you do not need to consider using NGen explicitly for your application. But if you have a lot of dynamic code generation going on, you can still suffer from long JIT times.



You can get a similar view in WPA by switching from Garbage Collection to JIT Time per Thread.


As I have shown you in the GC section, the summation of regions is not a simple sum of the durations of all JIT events. The JIT compiler can compile code on many different threads concurrently. The Duration sum of JIT time over all threads reported by WPA is therefore largely irrelevant if you want to compare two different runs of your application. Instead you need to look at the JIT times of each thread. You can copy the Duration column into Excel with the context menu entry "Copy Column Selection".

Summing up the per-thread values there gives us exactly the same JIT time as PerfView. Now I do not need PerfView for JIT analysis anymore. More and more, I am moving away from programming towards configuring WPA to give me just the view I need.
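If you would rather script the Excel step, the per-thread bookkeeping looks roughly like this. It is a minimal Python sketch, assuming each JIT region has been exported as a hypothetical (thread_id, duration_ms) row; it sums the durations per thread and then adds up the per-thread totals, so concurrent JIT work on different threads is counted in full instead of being collapsed the way the overall WPA Duration sum collapses it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (thread_id, duration_ms) of one JIT region; hypothetical export format.
JitRow = Tuple[int, float]


def jit_time_per_thread(rows: Iterable[JitRow]) -> Dict[int, float]:
    """Sum the JIT region durations separately for every thread."""
    totals: Dict[int, float] = defaultdict(float)
    for thread_id, duration_ms in rows:
        totals[thread_id] += duration_ms
    return dict(totals)


def total_jit_time(rows: Iterable[JitRow]) -> float:
    """Add up the per-thread totals; JIT work overlapping across threads counts fully."""
    return sum(jit_time_per_thread(rows).values())


if __name__ == "__main__":
    rows = [(1204, 3.5), (1204, 1.5), (5512, 12.0), (5512, 0.5), (8820, 2.0)]
    for thread_id, ms in sorted(jit_time_per_thread(rows).items()):
        print(f"thread {thread_id}: {ms:.1f} ms")
    print(f"total: {total_jit_time(rows):.1f} ms")
```

Comparing the per-thread dictionaries of two runs, rather than one overall number, is what makes the comparison meaningful.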

If you expand JITs, you get at least the namespace of each JIT event. This is the best I could come up with, since WPA does not support concatenating strings of different fields into a region name.


But you can still open the view _Activity by Provider Task Process, which I have customized as well, to see the "raw" data if you need more information. If you drop e.g. Field 5 to the left of the yellow bar, you get a nice summary of how many methods in each namespace were compiled.
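Grouping by the namespace column is essentially a count per distinct value, sorted by size. A minimal Python sketch, assuming you have a hypothetical list of namespace strings, one per JIT event (which event field carries the namespace is something to verify in your own trace); the namespaces below are made up:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def methods_per_namespace(namespaces: Iterable[str], top: int = 10) -> List[Tuple[str, int]]:
    """Return the namespaces with the most JIT-compiled methods, largest first."""
    return Counter(namespaces).most_common(top)


if __name__ == "__main__":
    # Hypothetical namespaces, one entry per JIT-compiled method.
    jitted = [
        "System.Xml.Serialization",
        "MyApp.Proxies",
        "System.Xml.Serialization",
        "Microsoft.Xml.Serialization.GeneratedAssembly",
        "System.Xml.Serialization",
        "MyApp.Proxies",
    ]
    for namespace, count in methods_per_namespace(jitted, top=3):
        print(f"{count:>4}  {namespace}")
```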


As you can guess, there are many more goodies inside the now published profiles that make WPA really usable for non-kernel developers. I have streamlined nearly every context menu by removing all useless or kernel-only columns. This makes it much easier to concentrate on the important things. With this profile WPA becomes nearly a new application and a central analysis station for managed and unmanaged performance analysis.

In the next posts I will walk you through the problems of PerformanceIssuesGenerator and how you can visualize them in WPA efficiently.

posted on Sunday, August 16, 2015 10:00 AM