Alois Kraus

blog

  Home  |   Contact  |   Syndication    |   Login
  133 Posts | 8 Stories | 366 Comments | 162 Trackbacks

News



Archives

Post Categories

Programming

I have found Bruce Dawson blog where he discusses the uses of Flame Graphs. Brendan Greg a Linux performance tool developer has created a nice perl script to render call stacks from a plain text file as a svg which can be viewed in all modern browsers. Bruce has used xperf with some undocumented switches to create a text extract of all call stacks. Then he used a python script to post process the generated file to make it compatible with the flamegraph perl script. I decided to make my own managed version of stack compressor which employs the TraceEvent library from Vance Morrison which gives you a powerful etl file parser. Below is an example of the generated graph after the perl script did post process it.

image

The x axis has no order but the width of the methods is proportional to the amount of used CPU time. The vertical axis shows the stack depth. If you hover with your mouse over the methods you get the full method name displayed. This is quite handy to get a different view of deeply nested stacks with many threads and or processes involved. The graph here makes it obvious that opening a file which is quite costly but closing the FileStream is also not cheap. Reading from the file is even cheaper than opening or closing it! This was a small sample app that did open and close one small file in an endless loop to get some interesting call stacks.

My TraceParser executable does take an etl file and prints out the call stacks from all processes from the complete etl file. You can limit the time range as well as the time range and a substring match which stacks are you interested in. To get from process 5200 all stacks you need to do:

TraceParser.exe C:\Users\Alois\AppData\Local\Temp\combined.etl * * 5200 > stacks.txt

flamegraph.pl stacks.txt > stacks.svg

It did take some time to get used to the TraceEvent library which is very powerful but also not so easy to use. I was constantly struggling with incomplete call stacks until I found that the default symbol resolver does only resolve some kernel symbols instead of everything.

 public TraceLogOptions()
        {
            // These are the default modules to look up symbolically.  
            ShouldResolveSymbols = delegate(string moduleFilePath)
            {
                string moduleName = Path.GetFileNameWithoutExtension(moduleFilePath);
                if (string.Compare(moduleName, "ntdll") == 0 ||
                    string.Compare(moduleName, "kernel32") == 0 ||
                    string.Compare(moduleName, "ntkrnlpa") == 0 ||
                    string.Compare(moduleName, "ntoskrnl") == 0)
                    return true;
                return false;
            };
        }

This was definitely not what I wanted. But ShouldResolveSymbols = (s) => true did the job. If you have problems with symbol loading you need to uncomment 

ConversionLog = Console.Out

to see file and symbol loading in action on the console. I do read from an etl file only the profile events which do contain a full call stack of a specific process and print the call stack with a weight of one to the console. When you download TraceEvent from Codeplex you need to get msdia100.dll for your target platform. You can make your life easy and simply copy the one provided by PerfView which is unpacked into your user profile at e.g. %USERPROFILE%\AppData\Roaming\PerfMonitor\VER.2013-01-09.06.49.30.000\x86 or the one provided by Visual Studio for x86. You need to deploy it besides TraceEvent.dll to enable it to pick it up.

Initially I did try to extract the stacks for a specific process with

var log = TraceLog.OpenOrConvert(file,
    new TraceLogOptions
    {
        AlwaysResolveSymbols = false,
        //   ConversionLog = Console.Out,
        ShouldResolveSymbols = (s) => true,
    });
Log = log;
var source = log.CPUStacks(null, false, (ev) => ev.ProcessId = xxxx);
source.ProduceSamples( (Diagnostics.Tracing.StackSources.StackSourceSample stack) =>
        {
            if (stack.StackIndex > 0 && (int)stack.StackIndex < Log.CallStacks.MaxCallStackIndex)
            {
                DumpCallStack(Log.CallStacks[(CallStackIndex)stack.StackIndex]);
            }
        });

But this did not filter the call stacks for a specific process out. I am not sure if I did miss anything but the many different enum types which are convertible or not into each other make it rather hard to switch from one object model (StackSample) to the CallStacks collection. In the end I did the filtering all by myself and it did work out quite nicely. Below is the final result of my experiment. The next step would be to visualize non profile events as well like context switch events. I am not sure yet how I can calculate from a context switch event (CSwitchTraceData) the CPU consumption. I only do see the wait time but what about the other time? The Flame graphs do provide a nice way to visualize CPU consumption across many call stacks at once. This could help in performance regression tests to identify changes in application logic which does consume more CPU than before. Wait times are unfortunately not captured but I am not sure if this would be very useful. With TPL we do tend to get quite wasteful with threads where many worker threads are waiting for work to arrive. When I visualize the wait times then I do see mainly the idle worker threads which is not useful.

What could be useful is to visualize the thread interaction and with ready events across the timeline. In principle it should be possible to create a logic chain why thread x did sleep for x ms and which thread did wake it up and to trace in the other thread back who did wake it up to get a sophisticated wait chain analysis how the threads do interact with each other. Not sure how reliable this would be but it certainly could be interesting. When one thread holds a lock and releases it while another thread was waiting for it it is immediately scheduled for execution by the OS scheduler. The likelihood that the ReadyThread is really the one which did wake my thread up is quite good but we would need to employ some other heuristics (like wait time and other hints) to see if we could print a thread interaction chart. The best thing would be of course that stuff like Flame Graphs and thread interaction visualization is already backed into WPT so we can all use it.

class Program
{
    TraceLog Log;
    int MinMs = 0;
    int MaxMs = int.MaxValue;
    int[] Pids = null;
    string[] Args;
    string CallStackSubstring = null;

    static void Main(string[] Args)
    {
        new Program(Args).Run();
    }

    public Program(string[] Args)
    {
        this.Args = Args;
    }

    private void Run()
    {
        if (Args.Length == 0)
        {
            Console.WriteLine("(c) by Alois 2013");
            Console.WriteLine("TraceParser does parse the full call stacks of all profile events to stdout which can be consumed");
            Console.WriteLine("by flamegraph.pl to generate a .svg file which can be rederend in the browser.");
            Console.WriteLine("No etl file specified.");
            Console.WriteLine("Usage: TraceParser <etl file> <startInMs> <stopInMs> <pid> <callStackSubstring>");
            Console.WriteLine("        startInMs, stopInMs and Pid can be replaced by * to not set any specific value for it");
            return;
        }

        string file = Args[0];
        if (Args.Length > 1)
        {
            if (Args[1] != "*")
            {
                MinMs = int.Parse(Args[1]);
            }
        }
        if (Args.Length > 2)
        {
            if (Args[2] != "*")
            {
                MaxMs = int.Parse(Args[2]);
            }
        }
        if (Args.Length > 3)
        {
            if (Args[3] != "*")
            {
                Pids = Args[3].Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                                .Select(int.Parse)
                                .ToArray();
            }
        }
        if (Args.Length > 4)
        {
            CallStackSubstring = Args[4];
        }

        Symbols.SymPath._NT_SYMBOL_PATH = @"c:\windows\symbols;C:\symbols;" + file + ".NGENPDB";
        var log = TraceLog.OpenOrConvert(file,
            new TraceLogOptions
            {
                AlwaysResolveSymbols = false,
                //   ConversionLog = Console.Out,
                ShouldResolveSymbols = (s) => true,
            });
        Log = log;


        if (Pids == null)
        {
            Pids = log.Processes.Select(p => p.ProcessID).ToArray();
        }

        foreach (var pid in Pids)
        {
                foreach(var ev in log.Processes.Single(x => x.ProcessID == pid)
                                                .EventsInProcess
                                                .ByEventType<SampledProfileTraceData>()
                                                .Where(ev => ev.TimeStampRelativeMSec > MinMs && ev.TimeStampRelativeMSec < MaxMs))
                {
                    DumpCallStack(Log.CallStacks[Log.GetCallStackIndexForEvent(ev)]);
                }
        }
    }

    private void DumpCallStack(TraceCallStack traceCallStack)
    {
        if (traceCallStack == null)
        {
            return;
        }

        TraceCallStack current = traceCallStack;

        List<string> stacks = new List<string>();
        while (current != null)
        {
            if (current.CodeAddress != null)
            {
                // do not step into the kernel where drivers with .sys are on the stack
                if (current.CodeAddress.ModuleFileName.EndsWith(".exe") ||
                    current.CodeAddress.ModuleFileName.EndsWith(".dll"))
                {
                    stacks.Add(String.Format("{0}!{1}", current.CodeAddress.ModuleName, current.CodeAddress.FullMethodName));
                }
            }

            current = current.Caller;
        }

        if (this.CallStackSubstring != null)
        {
            // clear all stacks which do not contain our code 
            if (!stacks.Any(x => x.Contains(CallStackSubstring)))
            {
                stacks.Clear();
            }
        }

        stacks.Reverse();

        if (stacks.Count > 0)
        {
            Console.WriteLine();
            Console.Write(String.Join(";", stacks));
            Console.Write(" 1");
        }
    }
}

 

posted on Monday, June 10, 2013 9:02 AM

Feedback

# re: Visualize Your Callstacks Via Flame Graphs 9/2/2014 8:51 PM ronin4net
Coooooooooooooooool! :)

Post A Comment
Title:
Name:
Email:
Comment:
Verification: