Alois Kraus

December 2008 Entries

Measure Performance With Stopwatch

I do performance measurements quite regularly which involves to call a piece of code several times to measure how long it did take. I am sure nearly everybody has done this already. But as a physicist I know that (nearly) every measurement has fundamental problems which never go away. Key to a successful measurement is that you exactly know what you are measuring and not what you think you measure. The easiest way to get out of this dilemma without too much knowledge is to simply ignore the fact that you don´t know enough and restrict yourself to pure differential measurements. With differential I do mean that you measure it once, change the code in a way that makes it better and measure it again. When you measure different times then you can assume that your code change was the cause for the different timing. That does work to some extent quite well but you should always be vigilant when the results change dramatically from one build to the next. There is no error I did not make with performance measurements so I think it is safe to give away my top 101:

  1. DateTime.Now has a resolution of ~16ms. If you measure anything faster you must use Stopwatch which has a resolution of about 100ns.
  2. With Windows Vista you need to set your Power options to maximum performance to prevent the OS from change the CPU clock frequency at random times.
  3. Warm and cold start times are way different. Usually a factor of 2-6 is quite normal. With cold startup I mean that you have a fresh booted system which did have never run your test case. This cold startup time does mainly contribute to disk IO which you can watch with XPerf very nicely.
  4. Be sure that you measure the right thing. I cannot emphasize this enough. Did you really measure the right thing?
  5. Know your influencers. You need to get a good understanding how much e.g. number of iterations, input data size, concurrency, other applications can affect the outcome. I had more than once the case that I did execute a test in the evening and then next morning. But the results did differ by a factor two for no apparent reason.
  6. Debug and Release builds still differ in the managed world. Although the difference is much smaller than it was with C++.
  7. First time initialization effects should be measured separately. When you initialize a class and call some method 10 million times it makes not much sense to mix the ctor call time with pure function calling time.
  8. Do measure long enough to get reliable results. The mean value is a powerful ally to flatten sporadic effects.
  9. If you get strange disk IO reports check if a virus scanner does interfere.
  10. Other processes might be running as well. These might influence your test if they consume significant resources.
  11. Stupid but happens: Turn off tracing before you measure.

You still want to make some quick throw away measurements? You have been warned. It is a tricky thing. To get at least the timing calculation right I did create some extension methods I want to share there. With these extensions you can create an Action delegate and call the new Profile method on it to execute it n-times. To give you full control over the string formatting you can supply a format string which expands {runs}, {time} and {frequency} in a human readable way for you.

            Action func = () =>

            {

                using (File.OpenRead(@"C:\config.sys"))

                {

                }

            };

 

            func.Profile(1000, "Did open the file {runs} times in {time}s. Can do {frequency} File.Open/s");

That little snippet produces on the console

Did open the file 1 000 times in 0.06s. Can do 16 667 File.Open/s

Since we know that the first call to File.Open will involve actual disk accesses which are much slower we want to measure the first call independently to see the "cold" startup performance as well:

 

func.ProfileWithFirst(1000, "First File Open did take {time}s", "Did open the file {runs} times in {time}s. Can do {frequency} File.Open/s");

That will give us

First File Open did take 0.00s
Did open the file 999 times in 0.05s. Can do 20 388 File.Open/s

There are also times when you want to control the number of iterations by yourself. Nothing easier than that. Simply change the delegate type from Action to Func<int> and you are ready to go.

 

            Func<int> myFunc = () =>

                {

                    const int Runs = 1000;

                    for (int i = 0; i < Runs; i++)

                    {

                        using (File.OpenRead(@"C:\config.sys")) ;

                        using (File.OpenRead(@"C:\autoexec.bat")) ;

                    }

 

                    return Runs * 2;

                };

 

            myFunc.Profile("Did open {runs} files in {time}s");

That was almost too easy. But what if I do not like the formatting? Well it turns out you can customize it still. The format string placeholders {0} {1} and {2} are the number of runs, elapsed time in seconds  as float and call frequency (runs/s)  as float. They can be used as usual to customize the output format to your specific needs.

myFunc.Profile("Did open {runs} files in {1:F5}s");

Did open 2 000 files in 0,10000s

The actual class is written in C# 3.0 with lambda expressions. The usage of this rather low level primitives gives you a rather surprising amount of flexibility and composability to build higher level functions.

Here is the source code for the Action and Func<int> delegate extension methods:

 

using System;

using System.Globalization;

using System.Diagnostics;

 

namespace PerformanceTester

{

    /// <summary>

    /// Helper class to print out performance related data like number of runs, elapsed time and frequency

    /// </summary>

    public static class Extension

    {

        static NumberFormatInfo myNumberFormat;

 

        static NumberFormatInfo NumberFormat

        {

            get

            {

                if (myNumberFormat == null)

                {

                    var local = new CultureInfo("en-us", false).NumberFormat;

                    local.NumberGroupSeparator = " "; // set space as thousand separator

                    myNumberFormat = local; // make a thread safe assignment with a fully initialized variable

                }

                return myNumberFormat;

            }

        }

 

        /// <summary>

        /// Execute the given function and print the elapsed time to the console.

        /// </summary>

        /// <param name="func">Function that returns the number of iterations.</param>

        /// <param name="format">Format string which can contain {runs} or {0},{time} or {1} and {frequency} or {2}.</param>

        public static void Profile(this Func<int> func, string format)

        {

 

            Stopwatch watch = Stopwatch.StartNew();

            int runs = func();  // Execute function and get number of iterations back

            watch.Stop();

 

            string replacedFormat = format.Replace("{runs}", "{3}")

                                          .Replace("{time}", "{4}")

                                          .Replace("{frequency}", "{5}");

 

            // get elapsed time back

            float sec = watch.ElapsedMilliseconds / 1000.0f;

            float frequency = runs / sec; // calculate frequency of the operation in question

 

            try

            {

                Console.WriteLine(replacedFormat,

                                    runs,  // {0} is the number of runs

                                    sec,   // {1} is the elapsed time as float

                                    frequency, // {2} is the call frequency as float

                                    runs.ToString("N0", NumberFormat),  // Expanded token {runs} is formatted with thousand separators

                                    sec.ToString("F2", NumberFormat),   // expanded token {time} is formatted as float in seconds with two digits precision

                                    frequency.ToString("N0", NumberFormat)); // expanded token {frequency} is formatted as float with thousands separators

            }

            catch (FormatException ex)

            {

                throw new FormatException(

                    String.Format("The input string format string did contain not an expected token like "+

                                  "{{runs}}/{{0}}, {{time}}/{{1}} or {{frequency}}/{{2}} or the format string " +

                                  "itself was invalid: \"{0}\"", format), ex);

            }

        }

 

        /// <summary>

        /// Execute the given function n-times and print the timing values (number of runs, elapsed time, call frequency)

        /// to the console window.

        /// </summary>

        /// <param name="func">Function to call in a for loop.</param>

        /// <param name="runs">Number of iterations.</param>

        /// <param name="format">Format string which can contain {runs} or {0},{time} or {1} and {frequency} or {2}.</param>

        public static void Profile(this Action func, int runs, string format)

        {

            Func<int> f = () =>

            {

                for (int i = 0; i < runs; i++)

                {

                    func();

                }

                return runs;

            };

            f.Profile(format);

        }

 

        /// <summary>

        /// Call a function in a for loop n-times. The first function call will be measured independently to measure

        /// first call effects.

        /// </summary>

        /// <param name="func">Function to call in a loop.</param>

        /// <param name="runs">Number of iterations.</param>

        /// <param name="formatFirst">Format string for first function call performance.</param>

        /// <param name="formatOther">Format string for subsequent function call performance.</param>

        /// <remarks>

        /// The format string can contain {runs} or {0},{time} or {1} and {frequency} or {2}.

        /// </remarks>

        public static void ProfileWithFirst(this Action func, int runs, string formatFirst, string formatOther)

        {

            func.Profile(1, formatFirst);

            func.Profile(runs - 1, formatOther);

        }

    }

 

}

Efficient Memory Usage With .NET

How can you use the word efficient memory usage and mention in the same headline .NET? We all know that C++ is much more efficient with regards to memory consumption. Yes I agree that if you really love your memory you should think twice if .NET is the right choice for you. There have been reasons why  Windows Vista has not a single managed executable executed while starting up. Ok the Event Viewer is managed which explains why it is starting so slow. First of all you need to know what things cost. The following table shows you how much memory is allocated for some common object types:

 

Type Size in Bytes (32-bit)
new object() 12
new string('\0'); 20
new DummyStruct(); 4

What perhaps is surprising that each managed class object  consumes at least 12 bytes of memory. If you want to allocate a huge number of objects you are perhaps better off with a struct value type.

The program used to get the numbers was this one:

using System;

using System.Collections.Generic;

using System.Linq;

using System.Text;

using System.Threading;

using System.Runtime.InteropServices;

 

namespace MemoryAllocation

{

 

    [StructLayout(LayoutKind.Sequential,Pack=1)]

    struct DummyStruct

    {

        public int a;

        public int b;

    }

 

    class Program

    {

        static List<T> Allocate<T>(int allocations, Func<T> allocator)

        {

            List<T> memory = new List<T>();

            for (int i = 0; i < allocations; i++)

            {

                memory.Add( allocator() ); // call function that creates a new object

            }

 

            return memory;

        }

 

        static char [] empty = new char[] { '\0' };  // input for string ctor to create an empty string

 

        static void Main(string[] args)

        {

            var before = GC.GetTotalMemory(true);

 

            const int Allocations = 1000 * 1000;

 

            // allocate memory and do not release it

            var mem = Allocate(Allocations,

                //() => new object()

                //() => new DummyStruct()

                () => new string(empty)

                );

 

            GC.Collect();

            GC.Collect();

            var after = GC.GetTotalMemory(true);

 

            GC.KeepAlive(mem);

 

            // get memory allocated by one object excluding the 4 bytes which are used for the object

            // reference in the array

            Console.WriteLine("One object consumes about {0} bytes", (after - before) / Allocations);

        }

    }

}

One thing to note is that you need to subtract from the output 4 bytes for reference types (object and string) because they are stored in an array and we do not want to count the array reference as memory consumption also. As I said in my previous post "Where Did My Memory Go" every small (< 85000 bytes) managed object allocation will eat up your physical memory because the garbage collector will traverse the managed heaps from time to time to remove dead objects and to compact the heaps. That has the effect that your objects although you will never use them will stay always hot in the memory which prevents them to go into the page file. You can of course force the OS to swap all your memory out to the page file by calling SetProcessWorkingSetSize(GetCurrentProcess(),-1,-1) but that has severe performance drawbacks when you access the swapped out memory which causes hard page faults. Windows Forms application actually do this to save memory. That is the reason why the Working Set drop to some MB when you minimize a managed application. The golden rule is to use efficient data structure to consume as little memory as possible. If you want to optimize your memory consumption you need a managed memory profiler. The ones I found most useful are

 

.NET Memory Profiler from SCITech

  • It is cheap 179€/license (taken at 13.12.2008)
  • The fastest profiler I have used so far.
  • Full 64-bit support.
  • It does support allocation stacks (which functions did lead to the object allocation).
  • View content of allocated objects.
  • It can take Snapshots of your process and compare it to another snapshot. This way you can find memory leaks quite easily.
  • Object tagging to find out which objects are new since the last snapshot.
  • Nice and fast filtering capabilities.

Of course there are also some gotchas

  • The extended profiling mode is not as stable as I would have wished. It works for most applications but can crash on bigger applications.
  • The object list is not very easy to navigate to find your biggest memory consumers.

 

The .NET Memory Profiler is easy to use and definitely worth its money.

 

YourKit Profiler for .NET

  • It can be enabled during application runtime which is a quite unique feature I have found nowhere else.
  • It is both a performance and memory profiler.
  • Full 64-bit support.
  • Class List view is easy to navigate.
  • Class Tree view is very cool to find out more about which objects contain all the others on a namespace level.
  • Memory Analysis can find duplicate strings and other waste memory anti patterns which can otherwise only be found by watching each string ...

That sounds very impressive and it is. But it has also some quite severe limitations

  • It has no allocation stack support. That makes it very hard to find out who has allocated your object.
  • Static class members cannot be tracked down to the class that holds the reference to them. They show up as object roots with no connection to anywhere.
  • Stack local instances are flagged but if you e.g. allocate 200 MB inside a function you will not be able to navigate to the class that allocated the object.
  • The fast profiling mode is much less stable and crashes quite often
  • Opening a snapshot is very slow.
  • The performance profiler (sampling, tracing) does not show me the bottlenecks where thread sleeps are involved in the way I would expect them. That can lead to the wrong direction. The new Ants Profiler 4.0  or Intels VTune Performance profiler are much better suited for that job.
  • Not so cheap 389 €/license (taken at 13.12.2008).

YourKit is the leading Java profiler company which also has a .NET profiler in their portfolio. No profiler is perfect and in fact both complement each other and I would recommend to use both (SciTech and YourKit)  to get the best possible overview how your memory in your application is distributed. All profilers can be downloaded from the software vendors for free with a 14-day trial license to try them out. I recommend to do so to find out which profiler suits your needs best. There are some other profilers also out there like AQTime and the Ants Memory Profiler. AQTime seems to be able to profile .NET and C++ for performance and memory which makes it very interesting. But so far I found not enough time check because it is not easy to use. The current Ants Memory Profiler is not usable and I cannot recommend it at all. But they have a very good performance profiler which is really worth its money. The only thing I really do not understand about the Ants profilers is that it is NOT possible to launch an application with command line arguments from a batch file. That is ok for GUI applications but if your application under test spawns child processes you need to be able to call the profiler from the command line.

Equipped with a profiler we can chase our memory now.

I recommend to look at first for

  • Duplicate strings - The objects that allocated them will most likely have several instances around. Consider to make them static to save memory.
  • *Cache* in the type name. It is surprising how much memory with caches is lost. Bigger applications seem to have their own cache in each architectural layer which should be questioned. If the cache itself is ok look how many instances of your cache exist. A sane rule is that a cache should exist only once. If you find more than one cache instance it is very likely that the cached data may be the same but it is not shared.
  • XmlDocuments have their own XML DOM tree representation which consumes quite a lot of memory (x3-x5 times more than the plain xml file). The profilers have a hard time to resolve from an XmlNode to the actual object that holds a reference to them so it can be a bit tricky to find out.
  • XmlReaderSettings are a fine source of memory leaks if you store them as member inside your class. When you choose to validate your XML document you will attach the reader settings class with the validation error callbacks to the just read XmlDocument. In effect you reference from the XmlReaderSettings instance your XML DOM tree even when you do not need it ever again!
  • Huge number of objects of the same type and check who has allocated them.

 

When you have found an inefficiency it is time to fix it. Then you need to measure memory consumption again. Here comes the hard part. It is very difficult to check if a memory optimization did actually save memory. A simple look at Working Set, Private Bytes, GC Heap does not work since the GC heaps are allocated in chunks (16 MB if I remember correctly). These numbers tell you only the peak memory consumption during startup of your application. But since then half of your heaps might be empty and you can get the impression that nothing has changed after your patch. The easiest way to check if an optimization did actually work is to look at the GC.GetTotalMemory value from time to time or you can use the memory profilers overview pages which are also helpful.

Who Scraped Over My Disk?

I have installed Windows Vista Ultimate since some month now and I still wonder what system service I have missed to get a well behaved system. To troubleshoot heavy disk access I use Process Monitor from SysInternals. It is an invaluable tool to find out who did access what on the system. It can monitor every module load, process start, thread creation, registry or disk access and network activity you can imagine. If you are developing unmanaged code you can even view the call stacks who did access this file or registry key not only from which application but it captures also (when the symbols are loaded) the full call stack. To make the call stack work you need to download the symbols which is quite easy and I recommend to do it. Below is a typical screen shot from Process Monitor when the system is doing seemingly nothing:

As you can see there is still quite a lot activity going on behind the scenes. Process Monitor makes the Windows black box a white box and you can look directly at the system while it is working. If you ask with what black art this information is obtained: Process Monitor installs a filter driver (at run time only) which allows it to redirect nearly all IO related activities to the filter driver. To check if the driver is runnung you can use the fltmc command:

c:\>fltmc.exe

Filtername                     Number of Instances    Height    Frame
-----------------------------  --------------------    ----    -----
PROCMON11                           0       385001         0
luafv                                   1         135000          0
FileInfo                                8        45000            0

If you wan to find out who did who did open the files simply go to Tools - Stack Trace

and you will see  even the call stack who did try to access a specific file or registry key. I did try this with a small C++ application to get a full call stack but the latest version (2.02) seems have problems to pick up the correct events. I did not see my file IO at all (Windows Vista 32-bit). Ok that means that I can finish my post. But wait. My initial goal was to tell you a little known secret: XPerf is your secret weapon when you want to see what the Kernel really does.

Some introduction links are here:

Starting with Windows 2000 the ETW (Event Tracing for Windows) was introduced but it was a little known feature which did not get much attention. The main reason is that it was quite complicated to implement and use. Xperf has been used within MS since years but it was so useful they have published it. The tool can be used to capture traces on Server 2003 and XP systems but you must analyze them on a Vista system to decode them.

Its main uses are (where I have used it so far):

  • Find out the difference why the first start of your application takes so much longer than the second start
  • Who is accessing the file x zillion times?
  • Are my disk access patterns inefficient?

To get started simply install the tool and start a trace session with some default event providers (xperf -providers lists all of them).

  • xperf -on base

Now you can execute your use case, start your application, ... Then you can stop the trace session and save the traces to disk

  • xperf -d c:\savedTraces.etl

To view the saved traces simply type

  • xperf c:\savedTraces.etl or double click on the file in explorer if you have a file association with .etl files.

You will see some output like this:

As you can see there is CPU usage, Disk IO, Disk IO per process, .... Today application was scraping over my hard disk again and I never figured out what it was. Process Monitor shows you very much details but sometimes the amount of data can be overwhelming and I was not able to spot the issue. But here we see a nice 100% spike in the Disk Utilization by Process. Lets select the region of interest and click on Detail Graph (this is a context menu):

This graph shows you the time on the horizontal axis and the logical cluster number of you hard disk on the vertical axis. To make it even more interesting it shows you only true disk accesses. If all data is in the file system cache this graph will contain only a few dots which tells you that the hard disk had nothing to do with your slow system. From this graph you can conclude

  • Big vertical changes means long disk seeks thus slow performance. An optimal disk access pattern (linear read) would be a straight line.
  • Big horizontal gaps means that the disk or the application was busy. You just did wait for something to complete (random access) quite long.

Additionally the Summary table shows you also a wealth of information

In my case I did find that FireFox has its own database (urlclassifer3.sqlite) which does update its index from time to time. The database itself contains a list of URLs which are considered harmful. Firefox will warn you when you try to navigate to potential fraud urls. You can disable this thing under

Options - Security and uncheck

Tell me if the site I´m visiting is a suspected attack site.

Tell me if the site I´m visiting is a suspected forgery.

Shut down Firefox and delete the database. Voila no more hard disk scraping in the background and one reason less to blame Windows Vista. Finally this case can be closed. With this tool I managed also to find out what my little cpptest.exe application was doing:

 

When the _NT_SYMBOL_PATH is  set to c:\windows\symbols;SRV*C:\Windows\Symbols*http://msdl.microsoft.com/download/symbols and you add the cpptest pdb directory you will get this output. From this stack trace we can conclude that cpptest.exe does call from its main method some long lasting functions such as WriteFile which are responsible for the disk IO we where seeing in the System process. One important thing about file IO with Windows Vista is that the OS seems to be veeeery lazy with file writes. If I write e.g. a 100 MB file it is has 0 bytes until I close the file! To force the Vista to write the contents of a file to disk you need to call FlushFileBuffers or use unbuffered IO.