
Windows Performance Toolkit for Windows 10

Everyone is talking about Windows 10, but what about the developer tools? Since 4/20/2015 you can also download the beta of the upcoming Visual Studio 2015 and the Windows / Phone SDKs. It also includes a new version of the (at least for me) long-awaited Windows Performance Analyzer. You can get the ISO image here:

From there you can download the ISO file for the tools. You do not need to install the Visual Studio CTP for this. Now mount the ISO file and install the following package for x64 Windows versions (x86 is also there):

\packages\OneSDK\Windows App Certification Kit x64-x86_en-us.msi

This will put the new Windows Performance Toolkit under

C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit

You do not need Windows 10 to be able to install the toolkit beta. You can install the kit at least on Win 8.1 and (not tested) also on Win7 machines.

So what’s inside? As far as I can tell the changes are not too dramatic, but there are some nice improvements.


Added the -injectonly option to -merge

 -injectonly                  Do not copy (merge) events, only inject 
                                 image identification information. Cannot be 
                                 used with -mergeonly.

What does this mean? You can extract the image identification information from a kernel trace file into a separate file and then merge that file with a user mode etl file, which then becomes readable in WPA (although the process names do not show up).

This will generate the extract of the kernel etl file:

xperf -merge  c:\temp\LowLatencyGCEvents.kernel.etl c:\temp\lowlatnetinjected.etl -injectonly

Now you can merge it with a user mode session etl file. If the user mode session contains only the few events you are interested in, you do not end up with a multi-GB etl file.

xperf -merge  c:\temp\LowLatencyGCEvents.etl c:\temp\lowlatnetinjected.etl c:\temp\mergedWithImage.etl
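The two steps above can also be scripted. The following is a minimal Python sketch that only assembles the two xperf command lines shown above and, by default, prints them instead of running them. The helper names (inject_only_cmd, merge_cmd, run_all) are made up for this illustration, and it assumes xperf is on the PATH; only the -merge and -injectonly switches from this post are used.

```python
import subprocess

# File names taken from the commands in this post.
KERNEL_ETL = r"c:\temp\LowLatencyGCEvents.kernel.etl"
USER_ETL = r"c:\temp\LowLatencyGCEvents.etl"
INJECTED_ETL = r"c:\temp\lowlatnetinjected.etl"
MERGED_ETL = r"c:\temp\mergedWithImage.etl"


def inject_only_cmd(kernel_etl, injected_etl):
    """Step 1: extract only the image identification information."""
    return ["xperf", "-merge", kernel_etl, injected_etl, "-injectonly"]


def merge_cmd(user_etl, injected_etl, merged_etl):
    """Step 2: merge the small extract with the user mode session."""
    return ["xperf", "-merge", user_etl, injected_etl, merged_etl]


def run_all(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Assumes xperf is on the PATH; raises if xperf reports an error.
            subprocess.run(cmd, check=True)


commands = [
    inject_only_cmd(KERNEL_ETL, INJECTED_ETL),
    merge_cmd(USER_ETL, INJECTED_ETL, MERGED_ETL),
]
run_all(commands, dry_run=True)
```

Calling run_all(commands, dry_run=False) on a machine with the toolkit installed would execute both steps in order.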

Besides this there are no new features to xperf that are worth noting.


Now you can open the enriched user mode etl file with WPA and finally you can read the events:

Which new features do you spot in the picture above?

  • Colors! You can select the colors you want for your graphs.
  • You can graph numerical data from custom events now!
    • This works currently only to some extent. I hope it will get better in the released version.
    • The main issue is that it cannot graph values which are defined in their event template with a hex outType:

<template tid="GCHeapStats_V1">
    <data name="GenerationSize0" inType="win:UInt64" outType="win:HexInt64"/>

Another issue is that I would like to be able to graph a running value as a line chart (e.g. memory usage, durations, …). Currently each value is displayed only as a spike in the line diagram. OK, I can use bar charts, but the line diagram type does not look too useful to me since I never know how many events are hiding behind this bar. Yes, I can hover over the bar and press Alt+Space (also a new feature) to see more information about the graphed values.

Finally, you can add a DateTime (Local) column in the View Editor, at least for generic events. This is a first step in the right direction. But why on earth is it still not possible to switch the x axis to local time? This would really help when I spot an issue: I can look at the clock, write down the time, save the trace and continue with the tests. Later I can look at the gathered data and use the times from my notes to see if something suspicious was going on. This is even more annoying for long-term recordings (hours or even days) where the customer tells you that something bad happened around 2 o’clock. Currently I have to look up Trace – System Configuration – Traces – Start/End (UTC) Time and calculate the time window I should look at from the reported start time and the seconds since trace start.
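Until WPA can do this itself, the manual calculation can be sketched in a few lines of Python. This is only an illustration: the trace start time, the UTC+2 offset and the noted wall-clock time are made-up values, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def local_to_trace_seconds(trace_start_utc, local_time):
    """Seconds since trace start for a timezone-aware wall-clock time."""
    return (local_time - trace_start_utc).total_seconds()


def trace_seconds_to_local(trace_start_utc, seconds, tz):
    """Wall-clock time in tz for a position on the WPA x axis."""
    return (trace_start_utc + timedelta(seconds=seconds)).astimezone(tz)


# Hypothetical Start (UTC) value read from the System Configuration view.
trace_start = datetime(2015, 5, 1, 10, 30, 0, tzinfo=timezone.utc)

# Assumed local time zone of the test machine (UTC+2).
local_tz = timezone(timedelta(hours=2))

# Time written down when the issue was seen: 14:00 local time.
noted = datetime(2015, 5, 1, 14, 0, 0, tzinfo=local_tz)

offset = local_to_trace_seconds(trace_start, noted)
print(f"Look at the trace around {offset:.0f} s after start")

back = trace_seconds_to_local(trace_start, offset, local_tz)
print(f"{offset:.0f} s after start corresponds to {back:%H:%M} local time")
```

Both datetime values are timezone-aware, so the subtraction takes care of the UTC offset.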

Other Features

Symbol loading was significantly improved. Now you can deselect symbol locations you do not want via a checkbox instead of manually editing a long string. That is a good thing, although I miss the automatic heuristic that adds xxx.etl.NGenPDB to the symbol path for an etl file if the folder exists.

The symbol settings are remembered, which is good. What I do not like is that, although I have disabled the MS symbol server, WPA still tries to load missing symbols from MS when I choose to load symbols automatically after trace load.

When this is enabled there is no way to cancel the symbol load after it has started and reconfigure it. This is not the best solution for me because it can take a long time until all symbols are downloaded. I have not checked what happens on a machine with no internet connection. The old WPA had a cancel button which took a long time to react, but at least it was possible to cancel the symbol download.

Custom stack tag files can now be added. Before that you needed to overwrite the default.stacktags file in the folder where WPA was installed. It was easy, but I still felt dirty when I overwrote the file.

If you have not used stack tags yet, you should definitely have a look at them. If you see the same issue over and over again, you can add the offending method to your stack tag file under a problem node. When the problem node consumes a lot of CPU, you can spot the “old” issue within seconds without drilling deep into call stacks.

What is still on my Wish List?

  • Path Tree in the File IO table, as in the Disk IO table.
  • Add a process command line as column in CPU Usage graphs so I can see which generic process caused which CPU consumption.
  • Some help with Wait Chain analysis. I have seen this option in some WPA video but it seems not to be published.
  • WPA still knows nothing about managed code. Basic GC metrics as in PerfView should be part of it!
  • Region files where the SelfNest feature works as expected. Ideally I would like to create a “stack trace” like structure from my custom traces as regions.
    • Currently WPA seems to ignore the TID option and uses for method enter/leave traces simply the nearest trace in the timeline which gives garbage call stacks.
  • Improved Stack Tagging
    • I want to see for expensive methods how expensive they really are. This includes GC overhead. Currently this is grouped extra under CLR – GC because the stack tags “steal” from other methods if they have a better matching call stack. The exact logic is not obvious but it could be something like a match counter. The rule with most matches wins. It would be nice to add some nesting to stacktags so I can visualize the GC overhead of specific known expensive methods.

My stacktag file looks like this:

<Tag Name="MyApplication">
    <Tag Name="Main">
        <Entrypoint Module="LowLatencyGC.exe" Method="*Program::Main*"/>
        <Tag Name="GC">
            <Entrypoint Module="clr.dll" Method="*GC*"/>
        </Tag>
    </Tag>
</Tag>

<Tag Name="CLR">
    <Tag Name="GC">
        <Entrypoint Module="clr.dll" Method="*GC*"/>
    </Tag>
</Tag>

I want to group all methods called from Main under MyApplication and then factor out the GC overhead. For all other methods the generic CLR GC node should match, so I can see the GC overhead of specific methods and the “other” GC overhead nicely lined up. This is currently not possible. Perhaps an IsChild="true" attribute which then alters the tagging logic would do the trick?
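To make the wish more concrete, here is a toy model in Python of the nested matching I have in mind. It is not how WPA tags stacks (that logic is not documented here); it only illustrates the proposed behavior, where a nested tag may match only frames below the frame matched by its parent. The Tag class, the function names and the stack frames are invented for this sketch.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Tag:
    name: str
    entrypoints: list = field(default_factory=list)  # (module, method) patterns
    children: list = field(default_factory=list)


def first_match(tag, stack, start):
    """Index of the first frame at or after start matching an entrypoint."""
    for i in range(start, len(stack)):
        module, method = stack[i]
        for mod_pat, meth_pat in tag.entrypoints:
            if fnmatchcase(module, mod_pat) and fnmatchcase(method, meth_pat):
                return i
    return None


def match_tag(tag, stack, start=0):
    """Longest tag path matching the stack, or None if nothing matches."""
    if tag.entrypoints:
        idx = first_match(tag, stack, start)
        if idx is None:
            return None
        start = idx + 1  # children may only match deeper frames
    best = None
    for child in tag.children:
        path = match_tag(child, stack, start)
        if path is not None and (best is None or len(path) > len(best)):
            best = path
    if best is not None:
        return [tag.name] + best
    # A pure grouping node (no entrypoints) needs a matching child.
    return [tag.name] if tag.entrypoints else None


def tag_stack(roots, stack):
    """Pick the deepest matching tag path; fall back to 'Other'."""
    best = None
    for root in roots:
        path = match_tag(root, stack)
        if path is not None and (best is None or len(path) > len(best)):
            best = path
    return "/".join(best) if best else "Other"


gc = ("clr.dll", "*GC*")
roots = [
    Tag("MyApplication", children=[
        Tag("Main", [("LowLatencyGC.exe", "*Program::Main*")],
            [Tag("GC", [gc])]),
    ]),
    Tag("CLR", children=[Tag("GC", [gc])]),
]

# Invented stacks, listed from the outermost frame to the innermost one.
in_main = [
    ("LowLatencyGC.exe", "LowLatencyGC.Program::Main"),
    ("LowLatencyGC.exe", "LowLatencyGC.Program::Allocate"),
    ("clr.dll", "WKS::GCHeap::Alloc"),
]
elsewhere = [
    ("OtherApp.exe", "Other.Worker::Run"),
    ("clr.dll", "WKS::GCHeap::GarbageCollect"),
]

print(tag_stack(roots, in_main))
print(tag_stack(roots, elsewhere))
```

In this model the GC work below Main ends up under MyApplication/Main/GC, while GC work on any other stack falls through to the generic CLR/GC node, which is the split I would like to see.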

That’s all for now. I am sure I have missed some new features like the Analysis Assistant, which currently looks like a scratch pad for my notes. Perhaps wait chain traversal will show up in a later build? Let’s keep our fingers crossed that the existing bugs of the beta get fixed and that an already awesome tool gets even better.

This article is part of the GWB Archives. Original Author: Alois Kraus blog
