Alois Kraus

blog

  Home  |   Contact  |   Syndication    |   Login
  37 Posts | 8 Stories | 132 Comments | 165 Trackbacks

News



Article Categories

Archives

Post Categories

Image Galleries

Programming

Sunday, June 21, 2009 #

The Enterprise Library team is trying to improve the performance of the Logging Application Block (LAB) even further. Since version one we did come a long way. It is interesting that even the guidance projects of Microsoft do have change their mind. Lets recap how the LAB has evolved since it initial release in January 2005.

Released Version Logging Application Block (LAB) Characteristics ca. Logs/s
January 2005    v1.0 Log File is opened/closed after every write. Performance was awful but accepted. 400 Logs/s
January 2006  v2.0 .NET 2.0 Support, Easier configuration, Log File is flushed after every write. Much better perf. 3000 Logs/s
April 2007 3.0 New Application Blocks but no significant changes in the LAB. 3000 Logs/s
May 2008  4.0 Log files which are rolled over are finally supported out of the box. The TextFormatter was reworked to make it over 13 times faster as I suggested. 10 000 Logs/s

 

Since customers are still complaining about the performance of the Logging Application Block a performance feature was added to the product backlog:

...

LAB02: Async logging (text formatting done asynchronously) to cut down on load on primary thread (M)

...

Looks ok. We have many CPUs by now why not put it on another thread? As Grigory has pointed out there are several problems with that approach. If you switch to another thread you have to fetch the call stack of the calling thread first before doing the thread transition. But since the Stacktrace generation is a fairly expensive operation there is little if nothing at all gained to do it on an another thread. Thread affinity strikes back and destroys your nice lazy init design pattern to initialize only the properties of a log messages which are really configured to be formatted by the message template. So what is the problem here? First I think it is the customers expecting too much from a general purpose application block. The title of the post says it all: Logging is not Tracing!

Logging

When you design a big application you need to have good and flexible error reporting perhaps across machines to collect log data in a centralized way. That is a perfect use case for the Logging Application Block where you configure some remote trace listener and send the log data to a central log server which stores its log messages in a database, log file or whatever. If you use out of process communication you are limited by the network performance already which is in the best case several thousand logs/s.

 

Tracing

Besides Error Reporting you also need to trace your program flow to find out where the performance bottlenecks are and even more important when an error occurs you have a chance to find out how you did get there. In an ideal world every function would have some tracing enabled with the function duration, passed parameters and how far you did get in your function. This is what performance profilers actually do by using the Profiling API of .NET. If you have ever used such a tool you will notice that the performance of you application will be degraded by a factor 5-100. Doing automatic instrumentation is therefore not an option and you still need to write some code or use an IL weaver to add some tracing code to your app.

I think there has never been made a real distinction between logging and tracing.

  • Logging is error reporting.
  • Tracing is following your program flow and data.

Logging is always enabled. Tracing on the other hand is normally disabled since it would generate huge amounts of data in very short time. By differentiating these two use cases you can optimize each of them in a different manner. Logging needs to be flexible. Tracing has three main goals: Correctness, performance and performance. This can heavily influence your API design and other design choices to support each scenario as well as possible.

Conclusions

And now back to the original question: Why do customers still complain about the performance of the Logging Application Block (LAB)? I am very certain that these customers use the LAB for tracing and complain about its bad performance. When you try to log enter/leave of some very frequently called functions (millions calls/s) you end up with huge log files and a crawling application. What do you do if your application is slow? Use a profiler and check where most of the time is spent and find that most of the time is spent in the LAB string formatting and disk IO functions. There is little you can do about the disk IO so you leave it and try to optimize the LAB further. That will only push the limit a little and be all in vain until the next customer complains that if he Logs enter/leave of his highly optimized GetHashCode() function his application becomes very slow. How do we get out of this? There is no easy way out of this. It could help to make such performance boundaries more explicit like the WCF team has done it with DataContracts. With explicit I mean one API for logging another one for Tracing. That could help to minimize the pain of users with wrong expectations of the performance of the LAB.

That said I see still quite some potential for the LAB to support tracing better than today:

  • Create some TraceFormatter with some quite fixed message format which includes time, pid, tid, category and message for example. No Call Stack since it is too expensive.
  • In the formatters cache the the last LogEntry object and the formatted message. If you trace to two different destinations with the same formatter one could reuse the already formatted message if the object reference is the same as last time.
  • Format the time in an efficient way like it was shown here. A factor 15 compared to DateTime.ToString should help to make TraceFormatter a really fast one.
  • Do not use multithreading. It will make it slower, harder to understand and less reliable. I expect to have 100% reliable logging and tracing. Always. The following simple code should work with no exceptions:

void Main(string [] args)

{

      Log.Info("Hello world"); // could be lost if done asynchronously

}

If you introduce a time latency between the log call and its processing you can loose the most important log message just before your application has terminated.

Perhaps we will see some nice Tracing Application Block in the future if enough customers complain. Just tell Grigory to consider this as well for future version of the Enterprise Library.


Wednesday, May 27, 2009 #

Managed Performance counters are tricky (or broken, it depends how you look at them) to read when you have more than one process with the same name running managed code. Each performance counter gets as instance name a unique identifier

  1. ManagedApp
  2. ManagedApp#1
  3. ManagedApp#2
  4. ...

If you want to know for a specific process identified by its process id thing become tricky. There is a counter in the .NET Memory category called Process ID which enables us to find out the correct counter instance name without guessing.

To find the correct instance name here is a little helper class which does it in a semi performant way:

 

using System;

using System.Diagnostics;

using System.IO;

using System.Threading;

using System.Globalization;

using System.Runtime.Remoting.Messaging;

 

namespace PerformanceCounterRead

{

    public class PerfCounterReader : IDisposable

    {

        PerformanceCounter myMemoryCounter;

        const string CategoryNetClrMemory = ".NET CLR Memory";

        const string ProcessId = "Process ID";

        const int ProcessesToTry = 40;

 

        public PerfCounterReader(int processId) : this(Process.GetProcessById(processId))

        {

        }

 

        string GetInstanceNameForProcess(int instanceCount, Process p)

        {

            string instanceName = Path.GetFileNameWithoutExtension(p.MainModule.FileName);

 

            if (instanceCount > 0) // Append instance counter

            {

                instanceName += "#" + instanceCount.ToString();

            }

 

            // Reader .NET CLR Memory Process ID for the given instance to check if

            // it does match our target process

            using (PerformanceCounter counter = new PerformanceCounter(CategoryNetClrMemory, ProcessId,

                   instanceName, true))

            {

 

                long id = 0;

 

                try

                {

                    while (true)

                    {

                        var sample = counter.NextSample();

                        id = sample.RawValue;

 

                        // for some reason it takes quite a while until the counter is

                        // updated with the correct data

                        if (id > 0)

                            break;

 

                        Thread.Sleep(15);

                    }

                }

                catch (InvalidOperationException)

                {

                    // swallow exceptions from non existing instances we tried to read

                }

 

                return (id == p.Id) ? instanceName : null;

            }

 

        }

 

        string GetManagedPerformanceCounterInstanceName(Process p)

        {

            Func<int, Process, string> PidReader = GetInstanceNameForProcess;

            string instanceName = null;

            AutoResetEvent ev = new AutoResetEvent(false);

 

            for (int i = 0; i < ProcessesToTry; i++)

            {

                int tmp = i;

                // Since reading the performance counter for every process is

                // very slow we try to speed up our search by reading up to ProcessesToTry

                // in parallel

                PidReader.BeginInvoke(tmp, p, (IAsyncResult res) =>

                    {

                        if (instanceName == null)

                        {

                           string correctInstanceName = PidReader.EndInvoke(res);

 

                           if (correctInstanceName != null)

                            {

                                instanceName = correctInstanceName;

                                ev.Set();

                            }

                        }

 

                    }, null);

            }

 

 

            // wait until we got the correct instance name or give up

            if (!ev.WaitOne(20 * 1000))

            {

                throw new InvalidOperationException("Could not get managed performance counter instance name for process " + p.Id);

            }

 

            return instanceName;

        }

 

        public PerfCounterReader(Process p)

        {

            string processInstanceName = GetManagedPerformanceCounterInstanceName(p);

            myMemoryCounter = new PerformanceCounter(CategoryNetClrMemory, "# Bytes in all Heaps", processInstanceName);

        }

 

        public long BytesInAllHeaps

        {

            get

            {

                return myMemoryCounter.NextSample().RawValue;

            }

        }

 

        #region IDisposable Members

 

        public void Dispose()

        {

            myMemoryCounter.Dispose();

        }

 

        #endregion

    }

}

To use this class you can give it your current process to check how exact the counter behaves:

 

            var p = Process.GetCurrentProcess();

            using(PerfCounterReader reader = new PerfCounterReader(p))

            {

                while (true)

                {

                    Console.WriteLine("Managed Heap Memory[{0}]: {1:N0} {2:N0}", p.Id, reader.BytesInAllHeaps, GC.GetTotalMemory(false));

                    memory.Add(new List<byte>(10000 * 1000));

                    Thread.Sleep(1000);

                }

            }

 

This will produce output similar to this:

Managed Heap Memory[1616]:     868.844   1.129.604
Managed Heap Memory[1616]:     868.844  11.202.852
Managed Heap Memory[1616]:  10.840.884  20.376.384
Managed Heap Memory[1616]:  20.912.640  30.376.376
Managed Heap Memory[1616]:  20.912.640  40.449.624
Managed Heap Memory[1616]:  20.912.640  50.522.872
Managed Heap Memory[1616]:  20.912.640  60.596.120
Managed Heap Memory[1616]:  20.912.640  70.669.368
Managed Heap Memory[1616]:  20.912.640  80.734.424
Managed Heap Memory[1616]:  20.912.640  90.807.672
Managed Heap Memory[1616]:  20.912.640 100.880.920
Managed Heap Memory[1616]: 100.841.252 110.376.732
Managed Heap Memory[1616]: 100.841.252 120.449.980

What is interesting that the GC.GetTotalMemory function gives much more precise results than the performance counter. It seems that the performance counter is updated only once every 5-10 seconds which is quite slow but better than nothing. The .NET Memory Performance Counters are updated after every GC.Collect. If you want to track during unit tests your resource consumption in a timely manner you will need to add quite big sleeps or trigger a GC in the remote process to get decent reliable numbers.

As a rule of the thumb I can only emphasize measure and check your numbers for errors. Coming from nuclear physics I was educated to question the numbers and check for consistency. This art seems to have gotten lost in our fast paced IT industry where the display (excel sheet with fancy macros) seems to be more important than what you actually did measure. If these numbers do help you to track and steer resource consumption, performance, ... then you have produced real business value. Once you have got reliable measurements you can reason about the numbers what they can tell you. With an increasing amount of work you can

  1. Measure something wrong
  2. Measure something right
  3. Measure the relevant things right
  4. Measure the relevant things right and take further actions to improve your software.

If you are stuck in 1-3 then you have gained nothing for your current project because the knowledge gained from your measurements does not flow back into your software.

Measuring for example the available physical memory before and after a test will show you that you have "lost" or "gained" 100-300 MB of memory. But what does it tell you about the resource consumption of your tests? Not much since the OS does manage your physical memory of all processes. Even if you have a big memory leak it does not necessarily show up a lost physical memory since the OS is quite good at paging unused memory out into the pagefile. The machine wide memory consumption is easy to measure but of little value (2). More about the Zen of measuring performance/consumption right is the topic of a future post.


Thursday, May 14, 2009 #

Uhh What? DevPath is an environment variable that allows you specify global directories which are searched just like GAC. If you ever had the urge to load dlls from your application from subdirectories you need a probing element in your app.config which allows exactly that.

The only problem with that is that you cannot escape from your application root directory. When you try to load something from ..\Centralbin it is ignored. In that cases you need to use the GAC if you like it or not. Since DevPath was broken for some time with .NET 2.0 I thought it was no longer supported. But thanks to John Robbins article "PDB Files: What every Developer must know." I did learn a different story. That makes it possibly now to use some Microsoft tools in a standard fashion. The Xml Serialization assembly generator Sgen for example can create the serialization assembly only if all public serializable types do not have dependencies to assemblies in other directories. This is a major PITA since fresh compiled assemblies are located in other directories than the rest (except it you have set Copy To Local to true in Visual Studio but that is a bad idea either).

But now we can alter the sgen.exe.config and add one line

<?xml version ="1.0"?>
<configuration>
    <runtime>       
        <generatePublisherEvidence enabled="false"/>
        <developmentMode developerInstallation="true"/>
    </runtime>
</configuration>

And now behold lets call sgen.exe

C:\Source>sgen.exe

System.Threading.SynchronizationLockException: Object synchronization method was called from an unsynchronized block of code.
at System.Resources.ResourceManager.TryLookingForSatellite(CultureInfo lookForCulture)
at System.Resources.ResourceManager.InternalGetResourceSet(CultureInfo culture, Boolean createIfNotExists, Boolean tryParents)
at System.Resources.ResourceManager.GetString(String name, CultureInfo culture)
at System.Environment.ResourceHelper.GetResourceStringCode(Object userDataIn)
at System.Environment.GetResourceFromDefault(String key)
at System.Environment.GetResourceString(String key)
at System.IO.Path.CheckInvalidPathChars(String path)
at System.IO.Path.NormalizePathFast(String path, Boolean fullCheck)
at System.IO.Path.NormalizePath(String path, Boolean fullCheck)
at System.IO.Path.GetFullPathInternal(String path)
at System.AppDomainSetup.set_DeveloperPath(String value)
at System.AppDomain.SetupFusionStore(AppDomainSetup info)
at System.AppDomain.SetupDomain(Boolean allowRedirects, String path, String configFile)

Ups. .NET 3.5 SP1 did not do the trick? Some investigation shows that the .NET Framework is still subject to bad error handling. If DEVPATH is empty or DEVPATH contains invalid path characters such as > < | or " then it will try to report the issue so far so good. But it seems that Microsoft seems to be a lover of fast in process tests where each methods works perfectly but the whole thing blows apart when used in a true product scenario. This is not the first time that did happen with DEVPATH but I thought that since the release of .NET 2.0 in 2005 these things would have been fixed and some regression tests had been added. Apparently I was wrong.

In my specific case I did try set devpath="c:\Source\EntLib3Src\App Blocks\bin" which did fail because of the parenthesis. Once I removed the "invalid" characters all did work out fine.

During my investigation with Reflector I stumbled upon another undocumented environment variable RELPATH which does set the private probing path.

info.PrivateBinPath = Environment.nativeGetEnvironmentVariable(AppDomainSetup.PrivateBinPathEnvironmentVariable);

When I set it to e.g. subDir then I do no longer need to set the private probing path in my App.config. Nice that could come in handy in some scenarios.


Tuesday, December 16, 2008 #

I do performance measurements quite regularly which involves to call a piece of code several times to measure how long it did take. I am sure nearly everybody has done this already. But as a physicist I know that (nearly) every measurement has fundamental problems which never go away. Key to a successful measurement is that you exactly know what you are measuring and not what you think you measure. The easiest way to get out of this dilemma without too much knowledge is to simply ignore the fact that you don´t know enough and restrict yourself to pure differential measurements. With differential I do mean that you measure it once, change the code in a way that makes it better and measure it again. When you measure different times then you can assume that your code change was the cause for the different timing. That does work to some extent quite well but you should always be vigilant when the results change dramatically from one build to the next. There is no error I did not make with performance measurements so I think it is safe to give away my top 101:

  1. DateTime.Now has a resolution of ~16ms. If you measure anything faster you must use Stopwatch which has a resolution of about 100ns.
  2. With Windows Vista you need to set your Power options to maximum performance to prevent the OS from change the CPU clock frequency at random times.
  3. Warm and cold start times are way different. Usually a factor of 2-6 is quite normal. With cold startup I mean that you have a fresh booted system which did have never run your test case. This cold startup time does mainly contribute to disk IO which you can watch with XPerf very nicely.
  4. Be sure that you measure the right thing. I cannot emphasize this enough. Did you really measure the right thing?
  5. Know your influencers. You need to get a good understanding how much e.g. number of iterations, input data size, concurrency, other applications can affect the outcome. I had more than once the case that I did execute a test in the evening and then next morning. But the results did differ by a factor two for no apparent reason.
  6. Debug and Release builds still differ in the managed world. Although the difference is much smaller than it was with C++.
  7. First time initialization effects should be measured separately. When you initialize a class and call some method 10 million times it makes not much sense to mix the ctor call time with pure function calling time.
  8. Do measure long enough to get reliable results. The mean value is a powerful ally to flatten sporadic effects.
  9. If you get strange disk IO reports check if a virus scanner does interfere.
  10. Other processes might be running as well. These might influence your test if they consume significant resources.
  11. Stupid but happens: Turn off tracing before you measure.

You still want to make some quick throw away measurements? You have been warned. It is a tricky thing. To get at least the timing calculation right I did create some extension methods I want to share there. With these extensions you can create an Action delegate and call the new Profile method on it to execute it n-times. To give you full control over the string formatting you can supply a format string which expands {runs}, {time} and {frequency} in a human readable way for you.

            Action func = () =>

            {

                using (File.OpenRead(@"C:\config.sys"))

                {

                }

            };

 

            func.Profile(1000, "Did open the file {runs} times in {time}s. Can do {frequency} File.Open/s");

That little snippet produces on the console

Did open the file 1 000 times in 0.06s. Can do 16 667 File.Open/s

Since we know that the first call to File.Open will involve actual disk accesses which are much slower we want to measure the first call independently to see the "cold" startup performance as well:

 

func.ProfileWithFirst(1000, "First File Open did take {time}s", "Did open the file {runs} times in {time}s. Can do {frequency} File.Open/s");

That will give us

First File Open did take 0.00s
Did open the file 999 times in 0.05s. Can do 20 388 File.Open/s

There are also times when you want to control the number of iterations by yourself. Nothing easier than that. Simply change the delegate type from Action to Func<int> and you are ready to go.

 

            Func<int> myFunc = () =>

                {

                    const int Runs = 1000;

                    for (int i = 0; i < Runs; i++)

                    {

                        using (File.OpenRead(@"C:\config.sys")) ;

                        using (File.OpenRead(@"C:\autoexec.bat")) ;

                    }

 

                    return Runs * 2;

                };

 

            myFunc.Profile("Did open {runs} files in {time}s");

That was almost too easy. But what if I do not like the formatting? Well it turns out you can customize it still. The format string placeholders {0} {1} and {2} are the number of runs, elapsed time in seconds  as float and call frequency (runs/s)  as float. They can be used as usual to customize the output format to your specific needs.

myFunc.Profile("Did open {runs} files in {1:F5}s");

Did open 2 000 files in 0,10000s

The actual class is written in C# 3.0 with lambda expressions. The usage of this rather low level primitives gives you a rather surprising amount of flexibility and composability to build higher level functions.

Here is the source code for the Action and Func<int> delegate extension methods:

 

using System;

using System.Globalization;

using System.Diagnostics;

 

namespace PerformanceTester

{

    /// <summary>

    /// Helper class to print out performance related data like number of runs, elapsed time and frequency

    /// </summary>

    public static class Extension

    {

        static NumberFormatInfo myNumberFormat;

 

        static NumberFormatInfo NumberFormat

        {

            get

            {

                if (myNumberFormat == null)

                {

                    var local = new CultureInfo("en-us", false).NumberFormat;

                    local.NumberGroupSeparator = " "; // set space as thousand separator

                    myNumberFormat = local; // make a thread safe assignment with a fully initialized variable

                }

                return myNumberFormat;

            }

        }

 

        /// <summary>

        /// Execute the given function and print the elapsed time to the console.

        /// </summary>

        /// <param name="func">Function that returns the number of iterations.</param>

        /// <param name="format">Format string which can contain {runs} or {0},{time} or {1} and {frequency} or {2}.</param>

        public static void Profile(this Func<int> func, string format)

        {

 

            Stopwatch watch = Stopwatch.StartNew();

            int runs = func();  // Execute function and get number of iterations back

            watch.Stop();

 

            string replacedFormat = format.Replace("{runs}", "{3}")

                                          .Replace("{time}", "{4}")

                                          .Replace("{frequency}", "{5}");

 

            // get elapsed time back

            float sec = watch.ElapsedMilliseconds / 1000.0f;

            float frequency = runs / sec; // calculate frequency of the operation in question

 

            try

            {

                Console.WriteLine(replacedFormat,

                                    runs,  // {0} is the number of runs

                                    sec,   // {1} is the elapsed time as float

                                    frequency, // {2} is the call frequency as float

                                    runs.ToString("N0", NumberFormat),  // Expanded token {runs} is formatted with thousand separators

                                    sec.ToString("F2", NumberFormat),   // expanded token {time} is formatted as float in seconds with two digits precision

                                    frequency.ToString("N0", NumberFormat)); // expanded token {frequency} is formatted as float with thousands separators

            }

            catch (FormatException ex)

            {

                throw new FormatException(

                    String.Format("The input string format string did contain not an expected token like "+

                                  "{{runs}}/{{0}}, {{time}}/{{1}} or {{frequency}}/{{2}} or the format string " +

                                  "itself was invalid: \"{0}\"", format), ex);

            }

        }

 

        /// <summary>

        /// Execute the given function n-times and print the timing values (number of runs, elapsed time, call frequency)

        /// to the console window.

        /// </summary>

        /// <param name="func">Function to call in a for loop.</param>

        /// <param name="runs">Number of iterations.</param>

        /// <param name="format">Format string which can contain {runs} or {0},{time} or {1} and {frequency} or {2}.</param>

        public static void Profile(this Action func, int runs, string format)

        {

            Func<int> f = () =>

            {

                for (int i = 0; i < runs; i++)

                {

                    func();

                }

                return runs;

            };

            f.Profile(format);

        }

 

        /// <summary>

        /// Call a function in a for loop n-times. The first function call will be measured independently to measure

        /// first call effects.

        /// </summary>

        /// <param name="func">Function to call in a loop.</param>

        /// <param name="runs">Number of iterations.</param>

        /// <param name="formatFirst">Format string for first function call performance.</param>

        /// <param name="formatOther">Format string for subsequent function call performance.</param>

        /// <remarks>

        /// The format string can contain {runs} or {0},{time} or {1} and {frequency} or {2}.

        /// </remarks>

        public static void ProfileWithFirst(this Action func, int runs, string formatFirst, string formatOther)

        {

            func.Profile(1, formatFirst);

            func.Profile(runs - 1, formatOther);

        }

    }

 

}


Friday, December 12, 2008 #

How can you use the word efficient memory usage and mention in the same headline .NET? We all know that C++ is much more efficient with regards to memory consumption. Yes I agree that if you really love your memory you should think twice if .NET is the right choice for you. There have been reasons why  Windows Vista has not a single managed executable executed while starting up. Ok the Event Viewer is managed which explains why it is starting so slow. First of all you need to know what things cost. The following table shows you how much memory is allocated for some common object types:

 

Type Size in Bytes (32-bit)
new object() 12
new string('\0'); 20
new DummyStruct(); 4

What perhaps is surprising that each managed class object  consumes at least 12 bytes of memory. If you want to allocate a huge number of objects you are perhaps better off with a struct value type.

The program used to get the numbers was this one:

using System;

using System.Collections.Generic;

using System.Linq;

using System.Text;

using System.Threading;

using System.Runtime.InteropServices;

 

namespace MemoryAllocation

{

 

    [StructLayout(LayoutKind.Sequential,Pack=1)]

    struct DummyStruct

    {

        public int a;

        public int b;

    }

 

    class Program

    {

        static List<T> Allocate<T>(int allocations, Func<T> allocator)

        {

            List<T> memory = new List<T>();

            for (int i = 0; i < allocations; i++)

            {

                memory.Add( allocator() ); // call function that creates a new object

            }

 

            return memory;

        }

 

        static char [] empty = new char[] { '\0' };  // input for string ctor to create an empty string

 

        static void Main(string[] args)

        {

            var before = GC.GetTotalMemory(true);

 

            const int Allocations = 1000 * 1000;

 

            // allocate memory and do not release it

            var mem = Allocate(Allocations,

                //() => new object()

                //() => new DummyStruct()

                () => new string(empty)

                );

 

            GC.Collect();

            GC.Collect();

            var after = GC.GetTotalMemory(true);

 

            GC.KeepAlive(mem);

 

            // get memory allocated by one object excluding the 4 bytes which are used for the object

            // reference in the array

            Console.WriteLine("One object consumes about {0} bytes", (after - before) / Allocations);

        }

    }

}

One thing to note is that you need to subtract from the output 4 bytes for reference types (object and string) because they are stored in an array and we do not want to count the array reference as memory consumption also. As I said in my previous post "Where Did My Memory Go" every small (< 85000 bytes) managed object allocation will eat up your physical memory because the garbage collector will traverse the managed heaps from time to time to remove dead objects and to compact the heaps. That has the effect that your objects although you will never use them will stay always hot in the memory which prevents them to go into the page file. You can of course force the OS to swap all your memory out to the page file by calling SetProcessWorkingSetSize(GetCurrentProcess(),-1,-1) but that has severe performance drawbacks when you access the swapped out memory which causes hard page faults. Windows Forms application actually do this to save memory. That is the reason why the Working Set drop to some MB when you minimize a managed application. The golden rule is to use efficient data structure to consume as little memory as possible. If you want to optimize your memory consumption you need a managed memory profiler. The ones I found most useful are

 

.NET Memory Profiler from SCITech

  • It is cheap 179€/license (taken at 13.12.2008)
  • The fastest profiler I have used so far.
  • Full 64-bit support.
  • It does support allocation stacks (which functions did lead to the object allocation).
  • View content of allocated objects.
  • It can take Snapshots of your process and compare it to another snapshot. This way you can find memory leaks quite easily.
  • Object tagging to find out which objects are new since the last snapshot.
  • Nice and fast filtering capabilities.

Of course there are also some gotchas

  • The extended profiling mode is not as stable as I would have wished. It works for most applications but can crash on bigger applications.
  • The object list is not very easy to navigate to find your biggest memory consumers.

 

The .NET Memory Profiler is easy to use and definitely worth its money.

 

YourKit Profiler for .NET

  • It can be enabled during application runtime which is a quite unique feature I have found nowhere else.
  • It is both a performance and memory profiler.
  • Full 64-bit support.
  • Class List view is easy to navigate.
  • Class Tree view is very cool to find out more about which objects contain all the others on a namespace level.
  • Memory Analysis can find duplicate strings and other waste memory anti patterns which can otherwise only be found by watching each string ...

That sounds very impressive and it is. But it has also some quite severe limitations

  • It has no allocation stack support. That makes it very hard to find out who has allocated your object.
  • Static class members cannot be tracked down to the class that holds the reference to them. They show up as object roots with no connection to anywhere.
  • Stack local instances are flagged but if you e.g. allocate 200 MB inside a function you will not be able to navigate to the class that allocated the object.
  • The fast profiling mode is much less stable and crashes quite often
  • Opening a snapshot is very slow.
  • The performance profiler (sampling, tracing) does not show me the bottlenecks where thread sleeps are involved in the way I would expect them. That can lead to the wrong direction. The new Ants Profiler 4.0  or Intels VTune Performance profiler are much better suited for that job.
  • Not so cheap 389 €/license (taken at 13.12.2008).

YourKit is the leading Java profiler company which also has a .NET profiler in their portfolio. No profiler is perfect and in fact both complement each other and I would recommend to use both (SciTech and YourKit)  to get the best possible overview how your memory in your application is distributed. All profilers can be downloaded from the software vendors for free with a 14-day trial license to try them out. I recommend to do so to find out which profiler suits your needs best. There are some other profilers also out there like AQTime and the Ants Memory Profiler. AQTime seems to be able to profile .NET and C++ for performance and memory which makes it very interesting. But so far I found not enough time check because it is not easy to use. The current Ants Memory Profiler is not usable and I cannot recommend it at all. But they have a very good performance profiler which is really worth its money. The only thing I really do not understand about the Ants profilers is that it is NOT possible to launch an application with command line arguments from a batch file. That is ok for GUI applications but if your application under test spawns child processes you need to be able to call the profiler from the command line.

Equipped with a profiler we can chase our memory now.

I recommend to look at first for

  • Duplicate strings - The objects that allocated them will most likely have several instances around. Consider to make them static to save memory.
  • *Cache* in the type name. It is surprising how much memory with caches is lost. Bigger applications seem to have their own cache in each architectural layer which should be questioned. If the cache itself is ok look how many instances of your cache exist. A sane rule is that a cache should exist only once. If you find more than one cache instance it is very likely that the cached data may be the same but it is not shared.
  • XmlDocuments have their own XML DOM tree representation which consumes quite a lot of memory (x3-x5 times more than the plain xml file). The profilers have a hard time to resolve from an XmlNode to the actual object that holds a reference to them so it can be a bit tricky to find out.
  • XmlReaderSettings are a fine source of memory leaks if you store them as member inside your class. When you choose to validate your XML document you will attach the reader settings class with the validation error callbacks to the just read XmlDocument. In effect you reference from the XmlReaderSettings instance your XML DOM tree even when you do not need it ever again!
  • Huge number of objects of the same type and check who has allocated them.

 

When you have found an inefficiency it is time to fix it. Then you need to measure memory consumption again. Here comes the hard part. It is very difficult to check if a memory optimization did actually save memory. A simple look at Working Set, Private Bytes, GC Heap does not work since the GC heaps are allocated in chunks (16 MB if I remember correctly). These numbers tell you only the peak memory consumption during startup of your application. But since then half of your heaps might be empty and you can get the impression that nothing has changed after your patch. The easiest way to check if an optimization did actually work is to look at the GC.GetTotalMemory value from time to time or you can use the memory profilers overview pages which are also helpful.


Monday, December 08, 2008 #

I have installed Windows Vista Ultimate since some month now and I still wonder what system service I have missed to get a well behaved system. To troubleshoot heavy disk access I use Process Monitor from SysInternals. It is an invaluable tool to find out who did access what on the system. It can monitor every module load, process start, thread creation, registry or disk access and network activity you can imagine. If you are developing unmanaged code you can even view the call stacks who did access this file or registry key not only from which application but it captures also (when the symbols are loaded) the full call stack. To make the call stack work you need to download the symbols which is quite easy and I recommend to do it. Below is a typical screen shot from Process Monitor when the system is doing seemingly nothing:

As you can see there is still quite a lot activity going on behind the scenes. Process Monitor makes the Windows black box a white box and you can look directly at the system while it is working. If you ask with what black art this information is obtained: Process Monitor installs a filter driver (at run time only) which allows it to redirect nearly all IO related activities to the filter driver. To check if the driver is runnung you can use the fltmc command:

c:\>fltmc.exe

Filtername                     Number of Instances    Height    Frame
-----------------------------  --------------------    ----    -----
PROCMON11                           0       385001         0
luafv                                   1         135000          0
FileInfo                                8        45000            0

If you wan to find out who did who did open the files simply go to Tools - Stack Trace

and you will see  even the call stack who did try to access a specific file or registry key. I did try this with a small C++ application to get a full call stack but the latest version (2.02) seems have problems to pick up the correct events. I did not see my file IO at all (Windows Vista 32-bit). Ok that means that I can finish my post. But wait. My initial goal was to tell you a little known secret: XPerf is your secret weapon when you want to see what the Kernel really does.

Some introduction links are here:

Starting with Windows 2000 the ETW (Event Tracing for Windows) was introduced but it was a little known feature which did not get much attention. The main reason is that it was quite complicated to implement and use. Xperf has been used within MS since years but it was so useful they have published it. The tool can be used to capture traces on Server 2003 and XP systems but you must analyze them on a Vista system to decode them.

Its main uses are (where I have used it so far):

  • Find out the difference why the first start of your application takes so much longer than the second start
  • Who is accessing the file x zillion times?
  • Are my disk access patterns inefficient?

To get started simply install the tool and start a trace session with some default event providers (xperf -providers lists all of them).

  • xperf -on base

Now you can execute your use case, start your application, ... Then you can stop the trace session and save the traces to disk

  • xperf -d c:\savedTraces.etl

To view the saved traces simply type

  • xperf c:\savedTraces.etl or double click on the file in explorer if you have a file association with .etl files.

You will see some output like this:

As you can see there is CPU usage, Disk IO, Disk IO per process, .... Today application was scraping over my hard disk again and I never figured out what it was. Process Monitor shows you very much details but sometimes the amount of data can be overwhelming and I was not able to spot the issue. But here we see a nice 100% spike in the Disk Utilization by Process. Lets select the region of interest and click on Detail Graph (this is a context menu):

This graph shows you the time on the horizontal axis and the logical cluster number of you hard disk on the vertical axis. To make it even more interesting it shows you only true disk accesses. If all data is in the file system cache this graph will contain only a few dots which tells you that the hard disk had nothing to do with your slow system. From this graph you can conclude

  • Big vertical changes means long disk seeks thus slow performance. An optimal disk access pattern (linear read) would be a straight line.
  • Big horizontal gaps means that the disk or the application was busy. You just did wait for something to complete (random access) quite long.

Additionally the Summary table shows you also a wealth of information

In my case I did find that FireFox has its own database (urlclassifer3.sqlite) which does update its index from time to time. The database itself contains a list of URLs which are considered harmful. Firefox will warn you when you try to navigate to potential fraud urls. You can disable this thing under

Options - Security and uncheck

Tell me if the site I´m visiting is a suspected attack site.

Tell me if the site I´m visiting is a suspected forgery.

Shut down Firefox and delete the database. Voila no more hard disk scraping in the background and one reason less to blame Windows Vista. Finally this case can be closed. With this tool I managed also to find out what my little cpptest.exe application was doing:

 

When the _NT_SYMBOL_PATH is  set to c:\windows\symbols;SRV*C:\Windows\Symbols*http://msdl.microsoft.com/download/symbols and you add the cpptest pdb directory you will get this output. From this stack trace we can conclude that cpptest.exe does call from its main method some long lasting functions such as WriteFile which are responsible for the disk IO we where seeing in the System process. One important thing about file IO with Windows Vista is that the OS seems to be veeeery lazy with file writes. If I write e.g. a 100 MB file it is has 0 bytes until I close the file! To force the Vista to write the contents of a file to disk you need to call FlushFileBuffers or use unbuffered IO.


Sunday, November 30, 2008 #

When you program in a high level language like .NET where the Garbage collector takes care of your memory you do not have to think about memory as often as it is the case in C++. Memory leaks tend to show up much more often in C++ and other non garbage collected languages because nobody is cleaning after you. Garbage collection is a good thing but somehow your application consumes much more memory than you thought it should. What should you do now? First of all you need to understand how your memory is organized by Windows itself. Mark Russinovich has a very eloquent article about that. If you do not understand what the whole article is all about I give you a quick start.

  1. Download Process Explorer from TechNet.

Now you see some output like this.

 

What do the numbers of Working Set, Working Set Private, Private Bytes, ... mean to a "normal" programmer? I do allocate memory, use it and let the garbage collector free it when I no longer use it. The usage of memory is easy but Windows does a lot behind the scenes to make it work in an efficient way. The most crucial part is that Windows does is to share memory between processes when it is read only. Your code and read only data in a dll is a good example of a read only data structure that can be used in many processes. If you use the same dll in more than one process it will be shared between all processes that use it.

Shared Memory

If you look at the columns in the screen shot of Process Explorer you will notice that different counters for Working Set and Private Bytes have been selected. The deeper reason for this is that these numbers are incredibly useful to tune your application. To get more data you can select the Properties (right click on a process) of a process.

 

  •  Working set is the actually used physical memory which cannot be more than the amount of your RAM chips installed on your computer.
  • Private Bytes is the memory that cannot be shared between processes.
  • Working Set Private tells you how much private bytes attribute to your working set (allocated physical memory) .

When you add the total working set of all processes you can get a much bigger number than the installed memory on your machine. The reason behind this is that much of your process data (e.g. code) can be shared. You can calculate your working set out of

Working Set = Working Set Private + Working Set Shareable

If you want to create a well behaved .NET application you would aim for a low working set and low private bytes. Private bytes are for example all your allocated objects which live on the CLR heap either in Generation 0,1,2 or the Large Object Heap. More about that comes later.

Code Sharing - NGen And Precompiled Assemblies

In the .NET environment things are complicated a bit by the JIT compiler which does compile your IL code into each process separately. To achieve full code sharing in .NET processes you need to precompile your assembly with the NGEN tool to enable cross process code sharing. If you look with process explorer at your loaded dlls (press Ctrl-D in Process Explorer) you will find that all .NET assemblies from Microsoft are precompiled to minimize the memory footprint if more than one .NET process (which is very likely) is runnig. To validate that you are you using the precompiled images look into the Fusion Log (Fuslogvw and check the Native Image checkbox). An even easier way is to look at the path of you loaded dlls. If it does contain C:\Windows\NativeImages_v2.xxxx then you have loaded the precompiled assembly successfully. If not your NGen image did not match the loaded assembly and must be updated or you are using multiple AppDomains. In that case you need to decorate your Main method with LoaderOptimization.MultiDomain value to tell the JIT compiler to share the code between AppDomains.

        [LoaderOptimization(LoaderOptimization.MultiDomain)]

         static void Main(string[] args)

 

Code size in enterprise applications can easily reach several hundred MB which would become a major headache if no code sharing between processes is possible.

 

Data Sharing

Another way to share data between processes are Memory Mapped Files which will be supported by the .NET Framework 4.0 without any PInvokes finally.

 

Page File Allocated Memory

An even trickier thing is to allocate memory in the page file directly by calling VirtualAllocEx. Since the page file is shared between all processes it is not really possible to attribute this allocation to a specific process (yet). This is the reason why Page File backed memory does not show up as private byte memory at all although your application might consume GBs of it.

 

Working Set and Allocation Size

There is a very direct relation between Working Set size and allocation size in .NET applications. Try to run the following code snippet

    class Program

    {

        static void Main(string[] args)

        {

            List<byte[]> memory = new List<byte[]>();

            const int Factor = 85; // Allocate 85000 bytes with each loop run

            while (true)

            {

                var bytes = new byte[Factor * 1000];

                memory.Add(bytes); // prevent the GC from reclaiming the memory

                Thread.Sleep(Factor / 2);  // throttle the  allocation to make it visible

                Console.WriteLine("Next run");

            }

        }

    }

That code snippet will allocate memory in blocks each the size of 85000 bytes and sleep a little to watch the memory allocations more easily. If you wonder why on Earth I did use 85000 bytes as block size: That is the size when the .NET Framework (2.0, 3.0, 3.5, 3.5 SP1) will allocate your object on the Large Object Heap. All objects on this heap are never moved by the Garbage Collector. You can observe this directly when you watch the Working Set Size. It remains constant while you allocate hundreds of MBs of private bytes memory! Windows allocates the memory and finds that since you did not touch the memory it can be moved to the page file where your application will happily allocate more and more page file but not physical memory until you reach the 2 GB limit for a 32 bit process or the page file becomes full.

The effect changes drastically when you change the Factor from 85 to 84. This will change the allocation size below the threshold and you will allocate the memory on the normal CLR heap. That heaps are compacted from time to time by the GC which means that the GC will force Windows to move our memory from the page file into the physical memory. Although our application does not access the allocated bytes the GC will which binds our memory allocation directly to physical memory!

 

When you allocate memory in smaller chunks than 85000 bytes it will be allocated in your physical RAM due to the GCs nature to traverse the whole heap from time to time.

 

That is important since it severely limits our ability to run an application with many small objects on machines with not so much RAM. It is therefor vitally important for all .NET developers to track their memory consumption and have a sharp eye on many small (<85000 bytes) object allocations which directly add to the process working set. How and which memory profilers can be used to track typical .NET applications is a topic for a future post.


Tuesday, May 20, 2008 #

In case you have not noticed the Patterns & Practices Team has released the greatest and latest version of the Enterprise Library. More info's can be found at Grigory Melniks blog.

What’s New in v4.0?

This release of Enterprise Library includes the following:

– Integration with the Unity Application Block

– Windows Management Instrumentation (WMI) 2.0 support and improved instrumentation

– Performance improvements (particularly, in the Logging Application Block)

– Pluggable Cache Managers

– Visual Studio 2008 support

– Bug fixes

Unity looks cool but I was not able to see any noticeable startup performance gain. The usage of dynamic code generation (Build Plans) to invoke ctors and properties seems not to pay off when we create from each type only one or two instances.  WMI looks nice but you need this stuff only if your architecture does not have central configuration mechanism. Alternatively you could use a custom configuration source which gets its configuration from somewhere else (WCF?) quite easily. The Visual Studio 2008 integration is (of course) seamless as expected. One really cool thing is buried in the performance improvements section where the Logging Application Block has got some attention. As noted in my previous article Really Fast Formatting With Enterprise Library there are massive performance gains possible by doing some intelligent preprocessing of the TextFormatter templates and getting into a more functional style of programming. The formatting performance of the new Enterprise Library is >10 times better. It seems that these improvements were not on the original road map (the March CTP did not contain any improvements in this area) but my proposal was taken and implemented.

I did try to extend the improved TextFormatter but did give up soon because this portion of the Enterprise Library plays a little closed source here and does not let me override the formatting of any existing token. Fortunately a little copy & paste did solve this issue quite nicely (not the best pattern of code reuse but it works). By applying the new formatting method via delegates I was able to get performance gains over a factor 2 in logging throughput which helps scalability a lot. Come and get it and see it for yourself.


Wednesday, April 09, 2008 #

The Enterprise Library 4 CTP and the feature list can be found at the Codeplex Home page.

General Features of Entlib 4 (Excerpt from Release Notes)

· Enterprise Library 4.0 takes advantage of the improved features in the .NET WMI 2.0 API to provide update capability for configuration exposed through the Manageable Configuration Source. Objects defined in the Enterprise Library configuration, such as databases, trace listeners, and cache backing stores, are exposed through WMI as a set of classes that you can query and update. Providing that you have the relevant Windows permission to update WMI values, any changes you make to the properties of these classes are reflected back through the Manageable Configuration Source and update the configuration information exposed to your application. Changes do not affect the original configuration file contents and may be overridden by Group Policy.

· In version 4.0, Enterprise Library has the Allow Partially-Trusted Caller attribute (APTCA) on all assemblies. This means that you can call the methods of Enterprise Library and the application blocks from an application running in a partial trust environment. You can do this with the signed assemblies provided with Enterprise Library. There is no longer any requirement, as there was in version 3.x, to recompile the source code and the source code for Object Builder then either use the unsigned binaries or strong-name them yourself.

That is good news for all medium trust ASP.NET web developers and Active Directory administrators that want to have an easy way to configure the application blocks in a central way.

New Logging Application Features (Excerpt from Release Notes)

· The application block now allows you to specify a value for the Filter property of each of the Trace Listeners. This property applies a filter that selects the level of message that it will detect. The valid values are All, Off, Critical, Error, Warning, Information, Verbose, and Activity Tracing. The setting effectively means "the specified level and everything more important." For example, the Warning setting will detect warnings, errors, and critical events. The default is All, which means that the behavior of the listener is the same as in previous versions of Enterprise Library if a value is not specified in the configuration.

· The application block now allows you to specify that the RollingFlatFileTraceListener will start a new file at midnight every day. To use this behavior, set the value of the RollInterval property of the RollingFlatFileTraceListener to Midnight.

· The application block contains performance improvements that include the following:

◦ Delayed gathering of context information until actually requested (except in distributed scenarios where context information is gathered even if not used).

◦ Automatic flushing is now configurable. However, the default is to flush always, which means that the behavior of the listener is the same as in previous versions of Enterprise Library if a value is not specified in the configuration. To disable automatic flushing, set the AutoFlush property of the Log Source to True. It is then your responsibility to ensure that all entries are flushed to the target, especially if an exception or failure occurs in the application. Otherwise, you will lose any cached logging information not yet written to the target.

· Log file names now support the use of environment variables. You can include environment variables such as %WINDIR%, %TEMP%, and %USERPROFILE% in the Filename property of the Flat File Trace Listener, Rolling Flat File Trace Listener, and XML Trace Listener.

The application block implements several new performance counters that you can use to monitor performance and operations. The new counters are Total Logging Events Raised, Total Trace Listener Entries Written, and Total Trace Operations Started.

First Impressions

The startup performance improvements with Unity (which is already released) are not done yet but I hope that this will help to make the usage of the application blocks nearly invisible. The new dependency injection mechanism which uses reflection like ObjectBuilder (slow) but uses dynamically generated IL code to call the actual constructors/getter/setter methods. That increases the object creation performance dramatically. I hope that this will make a noticeable difference in a real world application.

After doing a quick diff I found some improvements in the Logging Application Block. They do mainly concern the configuration via WMI and group policies with ADM templates. That enables to query and deploy logging configuration via an Active Directory without changing a physical file on the target machines. That is a nice feature but I have to yet to see if that does work in all scenarios. Low privileged users do not always have full WMI access which could hinder the usage of the new feature in low trust environments.

Writing to a log file seems to be very easy but it did take over three years (Jan. 2005 was the first release) to support environment variables in file names. The delivered code quality was always very good but why on earth did it take so long? Working agile with a very strict timeline does enable the P&P team to deliver top features on time but low priority features such as this one have been dropped. That is no problem since the community did deliver very quickly the missing features such as the Rolling File Trace Listener from Erwyn van der Meer which is in my opinion still the one of the cleanest implementations of a Rolling File Trace Listener. What I really miss is a backflow of great community extensions to the Enterprise Library main code base.

A really nifty feature of the new Rolling File Trace Listener is the roll interval Midnight to enable to start a new log file at every day. Although one should not expect a roll over exactly at midnight since the roll over is only performed when the next log call arrives after the time span has elapsed.

 

Logging Application Block Performance

The creation of LogEntry objects has become dramatically faster since the current values are only read when needed. That is good but not perfect (yet). The main observations I did make

 

  • LogEntry: Process Id, Process Name and AppDomain Name will never change during the lifetime of an application. They could be read only once and copied from static variables to the actual instance variables on demand.
  • LogEntry: The native and managed thread id is fetched on demand which could be too late if the output destination does asynchronous operations (queuing?).
  • Formatters: There is no progress in the text formatter performance. That is surprising to say the least since delayed LogEntry variable expansion does not help much if you use a TextFormatter with the default template to format your LogEntry. I whish so much I could check out the sources for TextFormatter ....

 

The Good/The Whishes/The Summary for Entlib 4

Good:

  • Many new features (central configuration, unity integration, ...)
  • Performance Improvements

Wishes

  • High quality community extensions should be included in the main code base.
  • Do NOT Authenticode sign the strong named binaries again. I really hate the 15s startup delays of non Internet connected machines! Yes I know with .NET 3.5 there is a way to turn Authenticode verification off in the App.config file but .NET 2.0 users will suffer from this.
  • Do include the pdb's for Release and Debug for all shipped binaries (ObjectBuilder too).

The next release of the Enterprise Library does deliver cool features such as Unity and support for group policy based configuration. Finally the Patterns & Practices team has found the time to fix some minor quirks which have been annoying but not mission critical. Since the new release seems to be API compatible with the previous versions you should definitely consider to upgrade as soon as possible. Have a look at the blog of Grigori Melnik for latest announcements about the upcoming release of Entlib 4.


Tuesday, April 08, 2008 #

Error handling is a difficult beast. And there is always one more way to do it. Thottam Sriram did write some nice examples what main cases you usually have to deal with. The most ugly case is the one where you cannot handle an exception but you need to do different cleanup logic in a finally block.

 

               public void Func()

        {

            bool cleanup = true;

 

            try

            {

                OtherFunc();

                cleanup = false;

            }

            catch (Exception)

            {

                // perform Cleanup on error

            }

            finally

            {

                if (cleanup)

                {

                    // perform cleanup on error

                }

            }

        }

 

The necessity to keep a state variable for a corner case does complicate the code quite a lot. You can get around that by knowing that .NET exceptions just wrap SEH (Structured Exception Handling) of Windows. The CLR does make extensive usage of SEH which you can read in the wonderful article from Chris Brumme. To sum it up in a few words if an exception is thrown an Exception Pointers structure is allocated. The actual exception processing is done two in two steps.

  1. Ask all exception handlers what about the current error can be done. If one says it can fix it the exception processing is stopped. No exception will ever happen.
  2. If no handler could fix it in the first pass they are actually invoked one by one and the stack unwinding does start.

C++ has special keywords for SEH named __try/__except/__leave which cannot be mixed with the regular try/finally C++ keywords within one function because only one exception handling chain within a function does make sense. Now you could ask how that complicated stuff can make by error handling easier? Well since we know that during an exception unwind scenario for the executing thread an Exception Pointers structure must exist we can check for its existence and we can distinguish the non exception case from the other. The BCL gives us direct access to this precious information via Marshal.GetExceptionPointers. Armed with that knowledge we can create a simple helper class that returns true if we are in an exception unwind case.

 

    public static class ExceptionHelper

    {

        /// <summary>

        /// Check if we are in a exception unwind scenario or not.

        /// </summary>

        public static bool InException

        {

            get

            {   // Errata: The red marked code seems to be necessary. Since unit tests with .NET 2.0

                // have shown that only checking for the Exception Pointers structure does not always work.

                return Marshal.GetExceptionPointers() == IntPtr.Zero &&

                       Marshal.GetExceptionCode() == 0 ? false : true;

            }

        }

    }

 

The code from above becomes now

 

        public void ImprovedFunc()

        {

            try

            {

                OtherFunc();

            }

            finally

            {

                if (ExceptionHelper.InException)

                {

                    // perform cleanup on error

                }

                else

                {

                    // Do normal cleanup in non exception case

                }

            }

        }

 

We did get rid of the catch clause and the state variable. That little trick can make your life easier in some situations. You can call the check property not only in a finally clause but also from a helper function because the Exception Pointers structure is thread local and visible as long exception processing is going on.

 

Remarks

 

At CLR IL (Intermediate Language) level there is the fault keyword which is similar. It acts like a finally block that is only executed when an exception has occurred. Besides the fact that C# does not support it is not really something that I would want. When an error happens I must execute nearly the same cleanup actions as in the non exceptional case. For that reason code sharing within one code block is essential to prevent code duplication both handlers. If you try to be smart and use the goto keyword (yes I won't dare to use it if it would have worked) to jump from a from a catch handler into the finally handler to skip a portion of the cleanup code you will get a compile error.

Since we have got our hands on the Exception Pointers structure we should be able to find there some reference to the managed exception and perhaps even return it. I tried that and failed. By looking at the Rotor source code I could not find evidence that this structure does contain any hints to the managed exception object. It looks like that the managed Thread implementation has it stored somewhere but since CLR objects are a mixture of IL code and C++ classes it is not feasible to use a simple fixed offset and cast it to the managed exception object.

If somebody has more info about that I am always interested to learn more ...


Monday, January 14, 2008 #

One of the action points for Enterprise Library 4 is to improve the performance of the TextFormatter. I like performance problems and found some interesting results I would like to share here. TextFormatter does take a template where the placeholders are expanded for every log message. The template does look like this

Timestamp: {timestamp}
Message: {message}
Category: {category}
Priority: {priority}
EventId: {eventid}
Severity: {severity}
Title:{title}
Machine: {machine}
Application Domain: {appDomain}
Process Id: {processId}
Process Name: {processName}
Win32 Thread Id: {win32ThreadId}
Thread Name: {threadName}
Extended Properties: {dictionary({key} - {value}
)}

TextFormatter does use StringBuilder to create from the template a mutable string and does then call StringBuilder.Replace for each token it could contain. It does basically a String.IndexOf to find if something has to be replaced and then allocates a new string which contains enough space for the token to be inserted. This process is repeated until all gaps are filled. This results in searches for tokens which are not present in the current template and some memory allocation/copy overhead because the string to replace will never have the same size.

But the most expensive things are the TokenFunctions which do expand e.g. the Extended Properties collection at the end of the example template. A quick look into the source shows where the problem is:

public virtual void Format(StringBuilder messageBuilder, LogEntry log)

    {

       ...

            string messageString = messageBuilder.ToString();

       ...

 

The performance optimization to use StringBuilder in the Formatter has cornered the TokenFunction formatter because it cannot to a simple replace of a fixed string and needs to parse the template. But StringBuilder gives you no access to the not yet constructed string and you need to use therefore this solution.

Armed with this knowledge I came up with an alternate approach to format a text template which does not suffer from the deficiencies of StringBuilder.Replace. The trick is to simply use StringBuilder.Append. To be able to do this I split the template apart in its constant (marked in orange) parts and use for each dynamic content a retriever function. The tuple (string, callback) is put into an object (MicroFormatter) from which we can create a list that allows us to build up the expanded template step by step.  Registering template fill callbacks does eliminate all string searches and (nearly) all buffer reallocations. The cool thing is that this is also possible for the TokenFunctions which can now directly format the parsed template.

public class MicroFormatter

    {

        /// <summary>

        /// Constant prefix which is part of the message template

        /// </summary>

        public string Prefix { get; set; }

 

        /// <summary>

        /// Gets the dynamic content of the LogEntry

        /// </summary>

        public Func<LogEntry, string> Retriever { get; set; }

 

        /// <summary>

        /// TokenFunction formatter wrapper function which does need a string template as parameter which is then formatted.

        /// The formatted string is then returned

        /// </summary>

        public Func<LogEntry, string, string> TokenFuncRetriever { get; set; }

 

        /// <summary>

        /// Template which is expanded by the token e.g. "dictionary({key} - {value}" for the DictionaryFormatter

        /// </summary>

        public string TokenFuncTemplate { get; set; }

 

        public MicroFormatter()

        {

        }

 

        public MicroFormatter(string prefix, string tokenFuncTemplate, Func<LogEntry, string, string> tokenFuncRetriver)

        {

            Prefix = prefix;

            TokenFuncTemplate = tokenFuncTemplate;

            TokenFuncRetriever = tokenFuncRetriver;

        }

 

        public MicroFormatter(string prefix, Func<LogEntry, string> retriever)

        {

            Prefix = prefix;

            Retriever = retriever;

        }

 

        /// <summary>

        /// Appends to a StringBuilder instance the static and dynamic contents of the parsed template

        /// </summary>

 

           public void Format(LogEntry entry, StringBuilder template)

 

        {

            if (Prefix != null) // Append constant template part

            {

                template.Append(Prefix);

            }

 

            if (Retriever != null) // Add dynamic content of LogEntry

            {

                template.Append(Retriever(entry));

            }

            else if (TokenFuncRetriever != null) // Or add dynamic content of TokenFunction

            {

                template.Append(TokenFuncRetriever(entry, TokenFuncTemplate));

            }

        }

    }

 

This allows us to generate a easy Format Method:

    // Combined list of microformatters which allow an easy StringBuilder.Append instead of

    // a much slower StringBuilder.Replace which does first search (many times pointless) and

    // then replaces the string which needs to reallocate and copy the buffer

    List<MicroFormatter> myFormatters = new List<MicroFormatter>();

 

    public string Format(LogEntry entry)

    {

        // allocate string builder can be dynamically sized

        StringBuilder sb = new StringBuilder(500);

 

        // Call formatters in sequence they were entered to build the spitted

        // template piece by piece together again.

        foreach (MicroFormatter fmt in myFormatters)

        {

           fmt.Format(entry, sb);

        }

 

        return sb.ToString();

    }

 

The red lines are the important changes which give us much better performance. This simple solution does improve the formatting speed over 14 times! That should be enough for everyone ;-) Another costly part is the creation of the LogEntry itself which does call a private method in its default ctor (CollectIntrinsicProperties) which can cost quite some time. It would be nice if the P&P team would make this function virtual so one has a chance to optimize things a bit. This would be beneficial if somebody does want to filter LogEntry derived objects which can be filtered out without needing any of the other properties. That would allow to defer the costly data collection when it is really ensured that the entry will be logged.

 

Performance

All Performance Tests were performed with an Intel Core Duo 6600 @ 2.40 GHz with 3 GB RAM, EntLib 3.1, VS2008 RTM and the message template which is shown at the start of the article.

 

Test Case (executed 100K times) Execution Frequency in 1/s Speed Gain (x factor)
Create LogEntry    22 000 -
Format LogEntry TextFormatter     5 300 1
Format LogEntry SmartFormatter   75 000 14

 

More Code

The complete code of my SmartFormatter is shown below:

namespace Logging

{

    /// <summary>

    /// Really fast formatter with equivalent features as TextFormatter of the Enterprise Library

    /// </summary>

    public class SmartFormatter : ILogFormatter

    {

        protected const char TokenStartChar = '{';

 

        protected const string timeStampToken = "{timestamp}";

        protected const string messageToken = "{message}";

        protected const string categoryToken = "{category}";

        protected const string priorityToken = "{priority}";

        protected const string eventIdToken = "{eventid}";

        protected const string severityToken = "{severity}";

        protected const string titleToken = "{title}";

        protected const string errorMessagesToke = "{errorMessages}";

 

        protected const string machineToken = "{machine}";

        protected const string appDomainNameToken = "{appDomain}";

        protected const string processIdToken = "{processId}";

        protected const string processNameToken = "{processName}";

        protected const string threadNameToken = "{threadName}";

        protected const string win32ThreadIdToken = "{win32ThreadId}";

        protected const string activityidToken = "{activity}";

 

        protected const string NewLineToken = "{newline}";

        protected const string TabToken = "{tab}";

 

        protected const string DictionaryStartToken = "{dictionary(";

        protected const string KeyValueStartToken = "{keyvalue(";

        protected const string TimeStampStartToken = "{timestamp(";

        protected const string ReflectedPropertyStartToken = "{property(";

        protected const string EndToken = ")}";

 

        /// <summary>

        /// Key is the Token template e.g. "{timestamp}" and value is a callback function that retrieves the requested string from

        /// a LogEntry object.

        /// </summary>

        protected internal Dictionary<string, Func<LogEntry, string>> myTokenToFunc = new Dictionary<string, Func<LogEntry, string>>();

 

        /// <summary>

        /// Key is the Token Function start e.g. "{dictionary(" and value is a callback function that formats from a

        /// configured template the contents of a LogEntry object.

        /// </summary>

        protected internal Dictionary<string, Func<LogEntry, string, string>> myTokenFormatterToFunc = new Dictionary<string, Func<LogEntry, string, string>>();

 

        // Combined list of MicroFormatters which allow an easy StringBuilder.Append instead of

        // a much slower StringBuilder.Replace which does first search (many times pointless) and

        // then replaces the string which needs to reallocate and copy the buffer

        List<MicroFormatter> myFormatters = new List<MicroFormatter>();

 

        /// <summary>

        /// For brevity we use the C# 3.0 lambda syntax

        /// </summary>

        protected internal virtual void AddWellknownTokens()

        {

            myTokenToFunc.Add(timeStampToken, (LogEntry ent) => { return ent.TimeStampString; });

            myTokenToFunc.Add(messageToken, (LogEntry ent) => { return ent.Message; });

            myTokenToFunc.Add(categoryToken, FormatCategoriesCollection);

            myTokenToFunc.Add(priorityToken, (LogEntry ent) => { return ent.Priority.ToString(); });

            myTokenToFunc.Add(eventIdToken, (LogEntry ent) => { return ent.EventId.ToString(); });

            myTokenToFunc.Add(severityToken, (LogEntry ent) => { return ent.Severity.ToString(); });

            myTokenToFunc.Add(titleToken, (LogEntry ent) => { return ent.Title; });

            myTokenToFunc.Add(errorMessagesToke, (LogEntry ent) => { return ent.ErrorMessages; });

            myTokenToFunc.Add(machineToken, (LogEntry ent) => { return ent.MachineName; });

            myTokenToFunc.Add(appDomainNameToken, (LogEntry ent) => { return ent.AppDomainName; });

            myTokenToFunc.Add(processIdToken, (LogEntry ent) => { return ent.ProcessId; });

            myTokenToFunc.Add(processNameToken, (LogEntry ent) => { return ent.ProcessName; });

            myTokenToFunc.Add(threadNameToken, (LogEntry ent) => { return ent.ManagedThreadName; });

            myTokenToFunc.Add(win32ThreadIdToken, (LogEntry ent) => { return ent.Win32ThreadId; });

            myTokenToFunc.Add(activityidToken, (LogEntry ent) => { return ent.ActivityId.ToString(); });

            myTokenToFunc.Add(NewLineToken, (LogEntry ent) => { return Environment.NewLine; });

            myTokenToFunc.Add(TabToken, (LogEntry ent) => {return "\t"; });

 

            // register token functions

            myTokenFormatterToFunc.Add(DictionaryStartToken, (LogEntry ent, string template) =>

            {

                DictionaryToken token = new DictionaryToken();

                return token.FormatToken(template, ent);

            });

 

            myTokenFormatterToFunc.Add(KeyValueStartToken, (LogEntry ent, string template) =>

                {

                    KeyValueToken token = new KeyValueToken();

                    return token.FormatToken(template, ent);

                });

 

            myTokenFormatterToFunc.Add(TimeStampStartToken, (LogEntry ent, string template) =>

                {

                    TimeStampToken token = new TimeStampToken();

                    return token.FormatToken(template, ent);

                });

 

            myTokenFormatterToFunc.Add(ReflectedPropertyStartToken, (LogEntry ent, string template) =>

                {

                    ReflectedPropertyToken token = new ReflectedPropertyToken();

                    return token.FormatToken(template, ent);

                });

        }

 

        public SmartFormatter()

        {}

 

        public SmartFormatter(string template)

        {

            AddWellknownTokens();

            BuildFormatterList(template);

        }

 

        public string Format(LogEntry entry)

        {

            // allocate string builder can be dynamically sized

            StringBuilder sb = new StringBuilder(500);

 

            // Call formatters in sequence they were entered to build the splitted

            // template piece by piece together again.

            foreach (MicroFormatter fmt in myFormatters)

            {

                fmt.Format(entry, sb);

            }

 

            return sb.ToString();

        }

 

        /// <summary>

        /// Returns the template in between the parentheses for a token function.

        /// Expecting tokens in this format: {keyvalue(myKey1)}.

        /// </summary>

        /// <param name="startPos">Start index to search for the next token function.</param>

        /// <param name="message">Message template containing tokens.</param>

        /// <returns>Inner template of the function.</returns>

        protected virtual string GetInnerTemplate(string tokenFuncStart, int startPos, string message)

        {

            int tokenStartPos = message.IndexOf(tokenFuncStart, startPos) + tokenFuncStart.Length;

            int endPos = message.IndexOf(EndToken, tokenStartPos);

            return message.Substring(tokenStartPos, endPos - tokenStartPos);

        }

 

        /// <summary>

        /// Parse the template and create the list of microformatters. This way we get an ordered list

        /// of things we really have to do instead of trying to blindly format tokens in a template which

        /// may not be present

        /// </summary>

        /// <param name="template"></param>

        internal virtual protected void BuildFormatterList(string template)

        {

            MicroFormatter fmt = new MicroFormatter();

            List<char> curNonTokenString = new List<char>();

            bool isTokenChar = false;

 

            // Go through template and build list of Microformatters

            for(int i = 0; i<template.Length; i++)

            {

                isTokenChar = false;

                if (template[i] == TokenStartChar)

                {

                    // Transform all template variables

                    foreach (string token in myTokenToFunc.Keys)

                    {

                        if (template.IndexOf(token, i) == i)

                        {

                            i += token.Length-1;

                            MicroFormatter formatter = new MicroFormatter();

                            formatter.Prefix = new string(curNonTokenString.ToArray());

                            // Get for this token the callback function which retrieves

                            // from a LogEntry object the resulting dynamic string

                            formatter.Retriever = myTokenToFunc[token];

                            myFormatters.Add(formatter);

                            isTokenChar = true;

                            curNonTokenString.Clear();

                            break;

                        }

                    }

 

                    // Transform all Token functions in the message template

                    foreach (string tokenStart in myTokenFormatterToFunc.Keys)

                    {

                        if (template.IndexOf(tokenStart, i) == i)

                        {

                            string tokenTemplate = GetInnerTemplate(tokenStart,i, template);

                            i += tokenStart.Length + tokenTemplate.Length + EndToken.Length - 1;

 

                            // Create formatter for this token function and store the token function

                            // template inside the formatter

                            MicroFormatter formatter = new MicroFormatter(new string(curNonTokenString.ToArray()),

                                                                          tokenTemplate,

                                                                          myTokenFormatterToFunc[tokenStart]);

                            myFormatters.Add(formatter);

                            isTokenChar = true;

                            curNonTokenString.Clear();

                            break;

                        }

                    }

                }

 

                // if the current template character does not belong to a token variable or function

                // store it later as constant template part in the next MicroFormatter instance

                if (!isTokenChar)

                {

                    curNonTokenString.Add(template[i]);

                }

            }

 

            // the rest of the template is constant

            if (curNonTokenString.Count != 0)

            {

                myFormatters.Add( new MicroFormatter(new string(curNonTokenString.ToArray()), null) );

            }

        }

 

        public static string FormatCategoriesCollection(LogEntry ent)

        {

            StringBuilder categoriesListBuilder = new StringBuilder();

            int i = 0;

            foreach (String category in ent.Categories)

            {

                categoriesListBuilder.Append(category);

                if (++i < ent.Categories.Count)

                {

                    categoriesListBuilder.Append(", ");

                }

            }

            return categoriesListBuilder.ToString();

        }

   }

}


Tuesday, November 20, 2007 #

This is my first test post with Windows Live Writer 2008. I hope it will not mess up when I paste source code ....

But back to business. VS2008 formerly known as Orcas is out! It is just great and has many cool features inside it.

You can download various editions of Visual Studio 2008 at the following locations.

One of the best features are Language Integrated Queries (LINQ). The first question that does arise is if LINQ to Objects is able to replace common for loops in terms of performance in every case. To check it out I did write a very simple test where we filter for specific items in an array and return the filtered list.

List<string> Test = new List<string> { "This", "is", "a", "list", "that", "will", "be", "filtered", "with",
"a", "for", "loop", "and", "LINQ"};

 

List<string> FilterFor(string substr)
{
     List<string> ret = new List<string>();

     foreach (string str in Test)
     {
         if (str.Contains(substr))
             ret.Add(str);
     }

     return ret;
}

 

List<string> FilterForLinq(string substr)
{
     var ret = from str in Test where str.Contains(substr) select str;
     return ret.ToList<string>();
}

 

On my Intel Core Duo 6600 @ 2.40 GHz with 3 GB RAM I get

  First Call in ms Tight Loop in Calls/ms
FilterFor 1 584
FilterForLinq 2 454
FilterForLinq with IEnumerable<string> as return value 1 17543

 

 

The values were measured with the .NET Framework 3.5 RTM in release configuration. It seems that for simple filters we see a small (20%) speed loss if we use LINQ in very tight loops. But when I used a profiler (ANTS 3.0) I did get no speed difference. Since the profiler uses callbacks to get the exact timing we loose data locality and CPU cache benefits which slows the program nearly 100 times down such small effects are not visible anymore.

When we do only create the query but do not execute it we are over 30 times faster than the traditional array based approach. Lazy evaluation is the key to defer the actual work if we do not need all values or do not use them always. Huge speed gains are possible by doing only as much work as need but no more. I am looking forward to PLINQ which could take advantage of my two CPUs.

If you want to get started with more complex queries I recommend the 101 LINQ Samples site to learn what you can do with such queries or MSDN.


Tuesday, February 20, 2007 #

Today I got my brand new Zune MP3 player which I did win at a Microsoft Event. Lucky me ;-). Ok install the software and you are ready to rockn roll, I thought, and I was wrong. Very wrong. At 94% of the installation it did stop with an error which was there even after a reboot. A quick Google search did confirm that other users had this problem also. One of the possible installation problems was the the Users group has the wrong name. Aha! I have a German Windows XP where the Users group is named Benutzer. The solution was to simply fire up the user management snapin and rename the Benutzer group to User. But wait I do not have installed a Home Edition of Windows XP. Darn there is no usable User Management Snapin installed. This was especially annoying since I did program a user and group management MMC snapin for my employer several years ago. I do even know the API calls. After trying around with WMI and DirectoryEntry, WindowsIdentity under Powershell, C# and VB Script I gave up and decided that there must be a simpler solution. And there was. The net command of Windows does allow you to programmatically manage groups and users. No rename there but it should be enough to create a Users group and assign it to my user account.

I add the description in German so others have the chance to find this information with Google/Another MS approved search engine.

Kann Zune Player nicht mit deutschem Windows XP installieren. Die Installation stoppt bei 94%. Um den Fehler zu beheben bitte folgende Befehle auf einem DOS command Prompt eingeben. Der String YourUserName mit dem aktuellen Benutzer Namen ersetzen. Danach sollte die Zune Player installation fehlerfrei funktionieren.

net localgroup "Users" /ADD
net localgroup "Users" "YourUserName" /ADD

net localgroup "INTERACTIVE" /ADD
net localgroup "INTERACTIVE" "YourUserName" /ADD

This will fix the error which do show up in the Windows Application Event Log.

An error occurred while applying security settings. Users is not a valid user or group. This could be a problem with the package, or a problem connecting to a domain controller on the network. Check your network connection and click Retry, or Cancel to end the install. Unable to locate the user's SID, system error 1332; (NULL); (NULL); (NULL).

Zune -- Error 1609. An error occurred while applying security settings. INTERACTIVE is not a valid user or group. This could be a problem with the package, or a problem connecting to a domain controller on the network. Check your network connection and click Retry, or Cancel to end the install. Unable to locate the user's SID, system error 1332; (NULL); (NULL); (NULL).

What do we learn from this. Even Microsoft does not always follow its own security programming best practices. The Users and Interactive groups can easily be retrieved via it's well known SIDs (Security Identifiers) and not by their localized names. For my vigilant Microsoft Employee readers I paste them here so they can fix this issue faster ;-).

SID: S-1-5-32-545
Name: Users
Description: A built-in group. After the initial installation of the operating system, the only member is the Authenticated Users group. When a computer joins a domain, the Domain Users group is added to the Users group on the computer.

SID: S-1-5-4
Name: Interactive
Description: A group that includes all users that have logged on interactively. Membership is controlled by the operating system.

I cannot tell you how good this player is compared to my IPod Nano which I incidentally did win also (no it was not an Apple event). It does still sync my MP3 folders which it managed to find somehow. What I do not like is that it does not show up as new drive where I can simply copy the files onto. This might be a tribute to DRM which will hopefully become history in the not so distant future.

Funny side note: At the back of my Zune are greetings from Seattle.

Hello From Erlangen ;-)


Tuesday, January 23, 2007 #

Strong naming has always been a complicated matter.  Sometimes it is even impossible when you get unsigned assemblies from a third party software vendor and you have to build strong named assemblies to install them into the GAC. When you decide to strong name your project it is a all or nothing decision. If your target is strong named then all references must also be strong named. If you do not have the source code and a project for your third party dll you will not be able to sign your project. I have created Signer which is hosted at CodePlex that allows you to add strong names to binaries within minutes. You can even create a strong named build of your whole project without any changes to your build settings. Below is a picture which shows Signer in action:

Signer

How does it work?

Signer does basically a full round trip by decompiling the assembly into IL code make the necessary modifications and compile it back to a valid assembly. The required modifications include
  • Update of all references
  • Change/Removal of InternalsVisibleToAttribute
  • Update of custom attributes with a type parameter
  • A little fix to work around an ILDASM problem
Reference Update in IL

This is the easy part. A reference does consist of the assembly name, public key token and it's version. Signer has only to insert the public key token where none is present to patch the IL file.

.assembly extern Microsoft.Practices.EnterpriseLibrary.Common
{
  .publickeytoken = (BE CA 05 5E 5B 7D 2D C8 ) // Inserted by Signer if not present
  .ver 2:0:0:0
}

InternalsVisibleTo Attribute

With .NET 2.0 the InternalsVisibleTo attribute was introduced that allows to grant other assemblies access to classes and methods marked with the internal keyword. To prevent in a strong name scenario that an unsigned assembly does get access to internals directly (you can use reflection instead but this comes with a perf cost) the CLR does check at runtime if all InternalsVisibleTo attribute declarations grant access to strong named assemblies only. If not a runtime exception is thrown and the process terminates.

.custom  instance void [mscorlib]System.Runtime.CompilerServices.InternalsVisibleToAttribute::.ctor(string)  =
                              ( 01 00 48 4D 69 63 72 6F 73 6F 66 74 2E 50 72 61   // ..HMicrosoft.Pra
                                63 74 69 63 65 73 2E 45 6E 74 65 72 70 72 69 73   // ctices.Enterpris
                                65 4C 69 62 72 61 72 79 2E 4C 6F 67 67 69 6E 67   // eLibrary.Logging
                                2E 43 6F 6E 66 69 67 75 72 61 74 69 6F 6E 2E 44   // .Configuration.D
                                65 73 69 67 6E 2E 54 65 73 74 73 00 00 )          // esign.Tests.."


You have to insert the public key which is over 300 bytes long into your project which is quite tedious if you want to do it manually. At the moment Signer makes life easy and simply removes the attribute to prevent run time errors but if needed the change is trivial.

Custom Attributes
This was by far the most tricky part. When you declare custom attributes with a type parameter the full qualified type name including assembly and public key token is inserted into the IL code. Normally you get this as a binary blob but with the /CAVERBAL option of ILDASM which was introduced with .NET 2.0 things get much easier.

.custom instance void [System]System.ComponentModel.EditorAttribute::.ctor(class [mscorlib]System.Type,
                                                                           class [mscorlib]System.Type)
      = {
         type(class
              'Microsoft.Practices.EnterpriseLibrary.Configuration.Design.ReferenceEditor,
               Microsoft.Practices.EnterpriseLibrary.Configuration.Design, Version=2.0.0.0, Culture=neutral,
               PublicKeyToken=beca055e5b7d2dc8')
         type(class
              'System.Drawing.Design.UITypeEditor, System.Drawing, Version=2.0.0.0, Culture=neutral,
               PublicKeyToken=b03f5f7f11d50a3a')
        }


Signer does parse the attributes and does set the PublicKeyToken where needed.

ILDASM Problems
All would work wonderful if, well if the nice /CAVERBAL option would produce running code. But I got many ConfiguratonErrors exceptions which told me that the default value was of the wrong type. Arghhh. After digging a little deeper I found out what was going on. Suppose you have a simple configuration class for settings inside the App.Config file:

using System;

using System.Collections.Generic;

using System.Text;

using System.Configuration;

 

namespace NoRoundTrip

{

    enum MyValues

    {

        V1

    }

 

    class Check : ConfigurationSection

    {

        const string PropName = "Name";

 

        // DefaultValue is a property of type object

        [ConfigurationProperty(PropName,DefaultValue=MyValues.V1)]

        public MyValues Value

        {

            get { return (MyValues) this[PropName]; }

        }

    }

 

    class Program

    {

        static void Main(string[] args)

        {

            Check c = new Check();

            MyValues defValue = c.Value;  // get default value

        }

    }

}


The program will work without problem when used as is. But after decompiling with the /CAVERBAL option and recompiling you will get a nasty runtime exception:

Unhandled Exception: System.Configuration.ConfigurationErrorsException:
                                     The default value for the property 'Name' has different type than the one of the property itself.


Here is the innocent looking IL code
    .custom instance void
          [System.Configuration]System.Configuration.ConfigurationPropertyAttribute::.ctor(string)

                = {
                      string('Name')

                      property object 'DefaultValue' = object( int32(0) )
                  }


The ConfigurationProperty DefaultValue does not preserve type identity for enums but does simply declare them as its native integer value which will lead to this exception during run time because there is an extra check code in the BCL code. The generated IL code is valid for the CLR but not for the BCL. Signer does get around this issue by decompiling every assembly once with the parser friendly /CAVERBAL option and the second time without. It will then merge the binary blobs for the ConfigurationProperty Attributes into the other IL file to get a fully functional assembly. Once this obstacle was solved all worked perfectly.

Signer Usage

If you start Signer without any options you will get a pretty self explanatory help to do signing. The only thing to remeber is that the .NET Framework tool sn  is needed to generate a strong name key pair by simply executing sn -k c:\key.snk and you are ready to use Signer out of the box with your project.

Signer (C) by Alois Kraus 2007 (akraus1@gmx.de) version 1.0.0.0

Signer -k <KeyFile> -outdir <Output Dir> -a <Assembly1> <Assembly2> ... [-decomp "Options" -comp "Options" -debug]
   Signer does sign all assemblies and updates their references if the assembly is not already signed
   This way you can convert a whole bunch of assemblies which have never seen a key file into strong named assemblies without recompiling!
   -k        Key File name of key container file which is created usually by the sign tool (sn -k)
   -outdir   Output directory where the strong signed assemblies are copied to
   -decomp   Additional ildasm options e.g. "/STATS"
   -comp     Additional ilasm options e.g. "/X64"
   -a        Can be a list of assembly names or a wildcard search pattern e.g. *.dll
   -debug    Do not delete temporary IL files in %TEMP% flder after completion

 Example: Signer -k c:\key.snk -outdir .\build -a *.dll *.exe  Copy the signed assemblies into the newly created directory build.
 Example: Signer -k c:\key.snk -outdir . -a *.dll *.exe        Sign and overwrite the orignal binaries in current directory.

Error: No Key file specifed (use -k KeyFile ).
Error: No Output directory specifed (use -outdir Dir ).
Error: No input assemblies specified (use -a Assembly ).


Conclusions

It was fun to create the tool while learning quite a lot about the assembly structure, strong names and IL. As you can see strong names are not really a protection for your software as they can be easily removed and your assemblies resigned without big problems. The only real advantage strong names offer is that you can prove that an assembly which has your public key on it was created by you as long as your secret key is kept safe.
From this point other extensions to Signer are quite easy to add. E.g. removal of strong names, make internals public or adding your own InternalsVisibleTo attribute to other assemblies. The world is full of possibilities ;-).

Monday, October 30, 2006 #

All good things have to end even your perfectly working managed executable. But do you know in what circumstances the CLR will terminate your program and much more importantly when do you have a chance to run some finalizers and shutdown code? As always in live it depends. Lets have a look at the reasons why program execution can be terminated.

Ways to terminate an application:
  1. The last  foreground thread of an application ends. The thread which entered the Main function is usually the only  foreground thread in your application.
  2. If you run an Windows Forms application you can call System.Windows.Forms.Application.Exit() to cause Application.Run to return.
  3. When you call System.Environment.Exit(nn)
  4. Pressing Ctrl-C or Ctrl-Break inside a Console application.
  5. Call System.Environment.FailFast (new in .NET 2.0) to bail out in case of fatal execution errors.
  6. An unhanded exception in any thread regardless if it is managed or unmanaged.
  7. Unmanaged exit calls within unmanaged code.
What exactly happens during shutdown is explained by an excellent article of Chris Brumme.

Shutdown Process
Generally speaking we can distinguish between two shutdown types: Cooperative and Abnormal. Cooperative means that you get some help from the CLR (execution guarantees, callback handlers, ...) while the other form of exit is very rude and you will get no help from the runtime.
During a cooperative shutdown the CLR will unload the Application Domain (e.g. after leaving the Main function). This involves killing all threads by doing a hard kill as opposed to the "normal" ThreadAbortException way where the finally blocks of each thread are executed. In reality the threads to be killed are suspended and never resumed. After all threads sleep the pending finalizers are executed inside the Finalizer thread. Now comes a new type of finalizers into the game which was introduced with .NET 2.0. The Critical Finalizers are guaranteed to be called even when normal finalization did timeout (for .NET 2.0 the time is 40s ) and further finalizer processing is aborted. If you trigger an exception inside a finalizer and it is not catched you will stop any further finalization processing, including the new Critical Finalizers. This behavior should be changed in a future version of the CLR. What are these "safe" finalizers good for when an "unsafe" finalizer can prevent them from running? There is already a separate critical finalizer queue inside the CLR which is processed after the normal finalization process did take place.

Normal Shutdown
A fully cooperative shutdown is performed in case of 1, 2 and 3. All managed threads are terminated without further notice but all finalizers are executed. No further notice means that no finally blocks are executed while terminating the thread.

Abnormal Shutdown
The bets are off and no CLR event is fired which could trigger a cleanup action such as calling finalizer, or some events. Case 4 can be converted into a normal shutdown by the code shown below. Case 6 can be partially handled inside the AppDomain.UnhandledException. The other ones (Environment.FailFast and unmanaged exits) have no knowledge of the CLR and perform therefore a rude application exit. The only exception is the Visual C++ (version 7+) Runtime which  is aware of the managed world and calls back into the CLR (CorExitProcess) when you call the unmanaged exit() or abort() functions.

Interesting Events
All these events are only processed when we are in a cooperative shutdown scenario. Killing your process via the Task Manager will cause a rude abort where none of our managed shutdown code is executed.
  • The AppDomain.DomainUnload  event is not called in your default AppDomain. When you exit e.g. the Main function the default AppDomain is unloaded but your event handler will never be called. The DomainUnload handler is called from an arbitrary thread but always in the context of the domain that is about to be unloaded.
  • AppDomain.ProcessExit is called when the CLR does detect that a shutdown has to be performed. You can register in each AppDomain such a handler but be reminded that after the DomainUnload event was fired you will never see a ProcessExit event because your AppDomain is already gone. These two events are mutually exclusive. When you kill the process via 
  • AppDomain.UnhandledException is called only in the default AppDomain. You can register this handler only in the "root" AppDomain which is the first one created for you by the CLR. Hosting of other AppDomains in a plugin architecture can cause unexpected side effects if the loaded modules register this handler and expect it to be called. When the handler is left the process finally terminates which cannot be prevented in .NET 2.0 except if you set a compatibility flag inside the runtime settings your App.config file.
     <legacyUnhandledExceptionPolicy enabled="1"/>
Ctrl-C/Ctrl-Break
The default behavior of the CLR is to do nothing. This does mean that the CLR is notified very late by a DLL_PROCESS_DETACH notification in which context no managed code can run anymore because the OS loader lock is already taken. We are left in the unfortunate situation that we get no notification events nor are any finalizers run. All threads are silently killed without a chance to execute their catch/finally blocks to do an orderly shutdown. I said in the first sentence default because there is a way to handle this situation gracefully. The Console class has got a new event member with .NET 2.0: Console.CancelKeyPress. it allows you to get notified of Ctrl-C and Ctrl-Break keys where you can stop the shutdown (only for Ctrl-C but not Ctrl-Break). If you want to do a coordinated shutdown inside the handler you will run into the gotcha that calling Environment.Exit does not trigger any finalizers. When we do have other plans to exit the process we need to spin up a little helper thread inside the event handler which will then call Environment.Exit. Voila our finalizers are called.


Graceful Shutdown when Ctrl-C or Ctrl-Break was pressed.

       static void GraceFullCtrlC()

        {

            Console.CancelKeyPress += delegate(object sender, ConsoleCancelEventArgs e)

            {

                if( e.SpecialKey == ConsoleSpecialKey.ControlBreak )

                {

                    Console.WriteLine("Ctrl-Break catched and translated into an cooperative shutdown");

                    // Envirionment.Exit(1) would NOT do a cooperative shutdown. No finalizers are called!

                    Thread t = new Thread(delegate()

                                    {

                                        Console.WriteLine("Asynchronous shutdown started");

                                        Environment.Exit(1);

                                    });

                    t.Start();

                    t.Join();

                }

                if( e.SpecialKey == ConsoleSpecialKey.ControlC )

                {

                    e.Cancel = true; // tell the CLR to keep running

                    Console.WriteLine("Ctrl-C catched and translated into cooperative shutdown");

                    // If we want to call exit triggered from out event handler we have to spin

                    // up another thread. If somebody of the CLR team reads this. Please fix!

                    new Thread(delegate()

                    {

                        Console.WriteLine("Asynchronous shutdown started");

                        Environment.Exit(2);

                    }).Start();

                }

            };

            Console.WriteLine("Press now Ctrl-C or Ctrl-Break");

            Thread.Sleep(10000);

        }




Unhandled Exceptions

It is very easy to screw up your application with an abnormal application exit. Lets suppose the following code:

        static void RudeThreadExitExample()

        {

            // Create a new thread

            new Thread(delegate()

            {

                Console.WriteLine("New Thread started");

                // throw an unhandled exception in this thread which will cause the application to terminate

                // without calling finalizers and any other code.

                throw new Exception("Uups this thread is going to die now");

            }).Start();

        }

We spin up an additional thread and create an unhandled exception. It is possible to register the AppDomain.Unhandled exception handler where you can e.g. log the exception but nonetheless your application will be rudely terminated. No finalizers are executed when an unhandled exception occurs. If you try to fix this by calling Environment.Exit inside your handler nothing will happen except that the ProcessExit event is triggered. We can employ our little Ctrl-C trick here and force an ordered cooperative shutdown from another thread.

Cooperative Unhandled Exception Handler (Finalizers are called)

        static void CurrentDomain_UnhandledException(object sender, UnhandledExceptionEventArgs e)

        {

            Console.WriteLine("Unhandled exception handler fired. Object: " + e.ExceptionObject);

            // Prepare cooperative async shutdown from another thread

            Thread t = new Thread(delegate()

                                {

                                    Console.WriteLine("Asynchronous shutdown started");

                                    Environment.Exit(1);

                                });

            t.Start();

            t.Join(); // wait until we have exited

        }

This way you can ensure that your application exits in a much cleaner way where all finalizers are called. What I found interesting during my investigation that there seems no time limit to exist how long I want to run the unhandled exception handler. It is therefore possible that you have several unhandled exception handlers running at the same time. An ordered application shutdown can be quite twisted if you want to do it right. Armed with the knowledge from this post you should have a much better feeling what will happen when things go wrong now.