Why Is Faulty Behaviour In The .NET Framework Not Fixed?


Here is the scenario: you have a Windows Forms application that calls a method via Invoke or BeginInvoke, and that method throws an exception. Now you want to find out where the error occurred and how the method was called. Here is the output we get when we call BeginInvoke/EndInvoke or simply Invoke:

[Image: exception output after calling Invoke or BeginInvoke/EndInvoke]

The actual code that was executed was like this:

        private void cInvoke_Click(object sender, EventArgs e)
        {
            InvokingFunction(CallMode.Invoke);
        }
 
         [MethodImpl(MethodImplOptions.NoInlining)]
        void InvokingFunction(CallMode mode)
        {
            switch (mode)
            {
                case CallMode.Invoke:
                    this.Invoke(new MethodInvoker(GenerateError));
 

The faulting method is GenerateError. It calls F1, which catches the NotImplementedException thrown by F2 and wraps it in a NotSupportedException.

 
        [MethodImpl(MethodImplOptions.NoInlining)]
        void GenerateError()
        {
            F1();
        }
 
        private void F1()
        {
            try
            {
                F2();
            }
            catch (Exception ex)
            {
                throw new NotSupportedException("Outer Exception", ex);
            }
        }
 
        private void F2()
        {
           throw new NotImplementedException("Inner Exception");
        }

It is clear that the methods F2 and F1 threw these exceptions, but we do not see them in the call stack. If we call InvokingFunction directly and catch and print the exception, we can find out very easily how we got into this situation. We see the methods F1, F2, GenerateError and InvokingFunction directly in the stack trace, and we see that two exceptions actually occurred.

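[Image: exception output for the direct call, showing both exceptions and the frames F2, F1, GenerateError and InvokingFunction]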

For comparison, here is what we get from Invoke or BeginInvoke/EndInvoke:

System.NotImplementedException: Inner Exception
    StackTrace:    at System.Windows.Forms.Control.MarshaledInvoke(Control caller, Delegate method, Object[] args, Boolean synchronous)
    at System.Windows.Forms.Control.Invoke(Delegate method, Object[] args)
    at WindowsFormsApplication1.AppForm.InvokingFunction(CallMode mode)
    at WindowsFormsApplication1.AppForm.cInvoke_Click(Object sender, EventArgs e)
    at System.Windows.Forms.Control.OnClick(EventArgs e)
    at System.Windows.Forms.Button.OnClick(EventArgs e)

The exception message is kept, but the stack starts at our Invoke call and not at the faulting method F2. We therefore have no clue where this exception occurred! The stack starts at the method MarshaledInvoke because the exception is rethrown there with throw caughtException, which resets the stack trace.
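The effect of rethrowing a caught exception object can be reproduced outside of Windows Forms. The following is a minimal sketch, not the framework's actual code; the names RethrowDemo, Faulty, TraceAfterThrowEx and TraceAfterThrow are made up for illustration. It compares the stack trace after throw ex; with the one after a bare throw;.

```csharp
using System;
using System.Runtime.CompilerServices;

static class RethrowDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Faulty()
    {
        throw new NotImplementedException("Inner Exception");
    }

    // Rethrows the caught exception object: the stack trace restarts at the rethrow.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static string TraceAfterThrowEx()
    {
        try
        {
            try { Faulty(); }
            catch (Exception ex) { throw ex; }
        }
        catch (Exception outer) { return outer.StackTrace; }
        return null;
    }

    // Uses a bare "throw;": the frame of Faulty stays in the stack trace.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static string TraceAfterThrow()
    {
        try
        {
            try { Faulty(); }
            catch (Exception) { throw; }
        }
        catch (Exception outer) { return outer.StackTrace; }
        return null;
    }

    static void Main()
    {
        Console.WriteLine("After 'throw ex;':");
        Console.WriteLine(TraceAfterThrowEx());
        Console.WriteLine("After 'throw;':");
        Console.WriteLine(TraceAfterThrow());
    }
}
```

Only the second trace should still mention Faulty, which is the kind of information that is missing from the Invoke output above.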

That is bad, but things are even worse: if, let's say, 5 nested exceptions occurred, .NET will return only the first (innermost) one. That means we lose not only the original call stack but also all other exceptions and all data contained in them.
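How the outer exceptions get lost can be shown with Exception.GetBaseException, which returns the innermost exception of a chain. This is a small sketch with made-up names (BaseExceptionDemo, BuildNested) and an extra middle exception added for illustration:

```csharp
using System;

static class BaseExceptionDemo
{
    // Builds a chain of three nested exceptions without throwing them.
    public static Exception BuildNested()
    {
        var inner = new NotImplementedException("Inner Exception");
        var middle = new InvalidOperationException("Middle Exception", inner);
        return new NotSupportedException("Outer Exception", middle);
    }

    static void Main()
    {
        Exception outer = BuildNested();
        Exception reported = outer.GetBaseException();

        // Only the innermost exception is left; the two wrappers are gone.
        Console.WriteLine(reported.GetType().Name);         // NotImplementedException
        Console.WriteLine(reported.InnerException == null); // True
    }
}
```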

It is a pity that MS knows about this and simply closed the issue as not important. Programmers will play around with threads a lot more than before thanks to the TPL and PLINQ, which come with .NET 4. Multithreading is hyped quite a lot in the press and everybody wants to use threads. But if the .NET Framework makes it nearly impossible to track down the simplest UI multithreading issue, I have a problem with that. The problem has been reported but obviously not solved. .NET 4 Beta 2 still has that dreaded GetBaseException call in MarshaledInvoke, which returns only the innermost exception of the complete exception chain. It is really time to fix this.

WPF, on the other hand, does the right thing and wraps the exceptions inside a TargetInvocationException, which makes much more sense. But not everybody uses WPF for their daily work, and Windows Forms applications will still be used for a long time.

Below is the code to reproduce the issues shown, along with a way to render the exceptions in a meaningful way. The default Exception.ToString implementation generates a hard-to-interpret stack when several nested exceptions occurred.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Threading;
using System.Globalization;
using System.Runtime.CompilerServices;
 
namespace WindowsFormsApplication1
{
    public partial class AppForm : Form
    {
        enum CallMode
        {
            Direct = 0,
            BeginInvoke = 1,
            Invoke = 2
        };
 
        public AppForm()
        {
            InitializeComponent();
            Thread.CurrentThread.CurrentUICulture = CultureInfo.InvariantCulture;
            Application.ThreadException += new System.Threading.ThreadExceptionEventHandler(Application_ThreadException);
        }
 
        void Application_ThreadException(object sender, System.Threading.ThreadExceptionEventArgs e)
        {
            cOutput.Text = PrintException(e.Exception, 0, null).ToString();
        }
 
        private void cDirectUnhandled_Click(object sender, EventArgs e)
        {
            InvokingFunction(CallMode.Direct);
        }
 
        private void cDirectCall_Click(object sender, EventArgs e)
        {
            try
            {
                InvokingFunction(CallMode.Direct);
            }
            catch (Exception ex)
            {
                cOutput.Text = PrintException(ex, 0, null).ToString();
            }
        }
 
        private void cInvoke_Click(object sender, EventArgs e)
        {
            InvokingFunction(CallMode.Invoke);
        }
 
        private void cBeginInvokeCall_Click(object sender, EventArgs e)
        {
            InvokingFunction(CallMode.BeginInvoke);
        }
 
        [MethodImpl(MethodImplOptions.NoInlining)]
        void InvokingFunction(CallMode mode)
        {
            switch (mode)
            {
                case CallMode.Direct:
                    GenerateError();
                    break;
                case CallMode.Invoke:
                    this.Invoke(new MethodInvoker(GenerateError));
                    break;
                case CallMode.BeginInvoke:
                    IAsyncResult res = this.BeginInvoke(new MethodInvoker(GenerateError));
                    this.EndInvoke(res);
                    break;
            }
        }
 
        [MethodImpl(MethodImplOptions.NoInlining)]
        void GenerateError()
        {
            F1();
        }
 
        private void F1()
        {
            try
            {
                F2();
            }
            catch (Exception ex)
            {
                throw new NotSupportedException("Outer Exception", ex);
            }
        }
 
        private void F2()
        {
           throw new NotImplementedException("Inner Exception");
        }
 
        StringBuilder PrintException(Exception ex, int identLevel, StringBuilder sb)
        {
            StringBuilder builtStr = sb;
            if( builtStr == null )
                builtStr = new StringBuilder();
 
            if( ex == null )
                return builtStr;
 
 
            WriteLine(builtStr, String.Format("{0}: {1}", ex.GetType().FullName, ex.Message), identLevel);
            WriteLine(builtStr, String.Format("StackTrace: {0}", ShortenStack(ex.StackTrace)), identLevel + 1);
            builtStr.AppendLine();
 
            return PrintException(ex.InnerException, ++identLevel, builtStr);
        }
 
 
 
        void WriteLine(StringBuilder sb, string msg, int identLevel)
        {
            foreach (string trimmedLine in SplitToLines(msg)
                                           .Select( (line) => line.Trim()) )
            {
                for (int i = 0; i < identLevel; i++)
                    sb.Append('\t');
                sb.Append(trimmedLine);
                sb.AppendLine();
            }
        }
 
        string ShortenStack(string stack)
        {
            // StackTrace is null for exceptions that were created but never thrown
            if (String.IsNullOrEmpty(stack))
                return "";

            int nonAppFrames = 0;
            // Keep the frames of our app plus at most two consecutive foreign frames, skip the rest.
            // The counter is reset to 0 whenever one of our own frames is encountered.
            return SplitToLines(stack)
                              .Where((line) =>
                              {
                                  nonAppFrames = line.Contains("WindowsFormsApplication1") ? 0 : nonAppFrames + 1;
                                  return nonAppFrames < 3;
                              })
                             .Aggregate("", (current, line) => current + line + Environment.NewLine);
        }
 
        static char[] NewLines = Environment.NewLine.ToCharArray();
        string[] SplitToLines(string str)
        {
            return str.Split(NewLines, StringSplitOptions.RemoveEmptyEntries);
        }
    }
}

author: Alois Kraus | posted @ Wednesday, March 17, 2010 10:55 AM | Feedback (0)

Run Your Tests With Any NUnit Version


I always thought that the NUnit test runners and the test assemblies need to reference the same NUnit.Framework version. I wanted to be able to run my test assemblies with the newest GUI runner (currently 2.5.3). OK, so all I need to do is reference both NUnit versions: the newest one and the official one for the current project. There is a nice article from Kent Bogart online on how to reference the same assembly multiple times with different versions. The magic works by referencing one NUnit assembly with an alias, which prefixes all types inside it. Then I could decorate my tests with the TestFixture and Test attributes from both NUnit versions, and everything worked fine, except that this was ugly.

After playing around a little to make it simpler, I found that I did not need to reference both NUnit.Framework assemblies. The test runners do not require the TestFixture and Test attributes in their specific version. That is really neat: since the test runners are instructed declaratively by attributes what to do, there is really no need to tie the runners to a specific version. At its core, NUnit has this little method hidden away to find matching TestFixtures and Tests:

 

public bool CanBuildFrom(Type type)
{
    // Abstract classes are rejected; static classes (abstract and sealed) still pass
    if (type.IsAbstract && !type.IsSealed)
    {
        return false;
    }

    return Reflect.HasAttribute(type,           "NUnit.Framework.TestFixtureAttribute", true) ||
           Reflect.HasMethodWithAttribute(type, "NUnit.Framework.TestAttribute"       , true) ||
           Reflect.HasMethodWithAttribute(type, "NUnit.Framework.TestCaseAttribute"   , true) ||
           Reflect.HasMethodWithAttribute(type, "NUnit.Framework.TheoryAttribute"     , true);
}

That is versioning and backwards compatibility at its best. I tell NUnit what to do by decorating my test classes with NUnit attributes, and the runner executes my intent without binding me to a specific version. The contract between NUnit versions is actually a bit more complex (think of AssertExceptions), but this is also handled nicely: the runner does not use the concrete type but simply checks the type of the caught exception by its name as a string.
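The idea of matching attributes by name instead of by type can be sketched in a few lines. This is not NUnit's Reflect class; the namespaces Demo.Framework and Demo.Runner and the helper NameBasedReflect are hypothetical stand-ins that only show the technique.

```csharp
using System;
using System.Linq;

namespace Demo.Framework
{
    // Stand-in for a framework attribute; any version with this full name would match.
    [AttributeUsage(AttributeTargets.Class)]
    public sealed class TestFixtureAttribute : Attribute { }
}

namespace Demo.Runner
{
    public static class NameBasedReflect
    {
        // Compares the full type name of each attribute as a string, so the runner
        // needs no compile-time reference to the assembly that defines the attribute.
        public static bool HasAttribute(Type type, string attributeFullName, bool inherit)
        {
            return type.GetCustomAttributes(inherit)
                       .Any(a => a.GetType().FullName == attributeFullName);
        }
    }

    [Demo.Framework.TestFixture]
    public class SampleTests { }

    public class NotATest { }

    static class Program
    {
        static void Main()
        {
            const string name = "Demo.Framework.TestFixtureAttribute";
            Console.WriteLine(NameBasedReflect.HasAttribute(typeof(SampleTests), name, true)); // True
            Console.WriteLine(NameBasedReflect.HasAttribute(typeof(NotATest), name, true));    // False
        }
    }
}
```

Because only the name is compared, the attribute could come from any assembly version, which is what decouples the runner from the framework version.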

What can we learn from this? Versioning can be easy if the contract is small and the users of your library use it in a declarative way (attributes). Everything beyond that will force you to reference several versions of the same assembly, with all the consequences. Type equality is lost between versions, so none of your casts will work. That means that you cannot simply use IBigInterface in two versions. You will need a wrapper to call the correctly versioned one. To get out of this mess you can use one (and only one) version-agnostic driver to isolate your business logic from the concrete versions. This is of course more work, but as NUnit shows, it can be easy. Simplicity is therefore not just nice to have but requirement number one if you intend to make things more complex in version two and want to support any version (older and newer). Any interaction model above easy will not be maintainable. There are different approaches to versioning. Below are my own personal observations of how versioning works within the .NET Framework and NUnit.

 

Versioning Models

1. Bug Fixing and New Isolated Features

When you only need to fix bugs, there is no need to break anything. This is especially true when you have a big API surface. Microsoft did this with the .NET Framework 3.0, which left the CLR as is but delivered new assemblies for the features WPF, WCF and Windows Workflow Foundation. Their basic model was that the .NET 2.0 assemblies were declared red assemblies, which must not change (well, mostly: each change was carefully reviewed to minimize the risk of breaking changes as much as possible), whereas the new green assemblies of .NET 3.0 and 3.5 had no such obligations, since they implemented new, unrelated features which did not have any impact on the red assemblies.

This is a versioning strategy aimed at maximum compatibility and the delivery of new, unrelated features. If you have a big API surface, you should strive hard to do the same or you will break your customers' code with every release.

2. New Breaking Features

There are times when really new things need to be added to an existing product. The .NET Framework 4.0 changed the CLR in many ways, which caused subtly different behavior although the APIs remained largely unchanged. Sometimes it is possible to simply recompile an application to make it work (e.g. after a changed method signature void Func() –> bool Func()), but behavioral changes need much more thought and cannot be automated. To minimize the impact, .NET 2.0, 3.0 and 3.5 applications will not automatically use the .NET 4.0 runtime when it is installed but will keep using the “old” one.

What is interesting is that side-by-side execution of both CLR versions (2 and 4) within one process is possible. The key to success was total isolation. You will have 2 GCs, 2 JIT compilers and 2 finalizer threads within one process. The two .NET runtimes cannot talk to each other (except via the usual IPC mechanisms). Both runtimes share nothing and run independently within the same process. This enables Explorer plugins written for the CLR 2.0 to work even when a CLR 4 plugin is already running inside the Explorer process. The price for isolation is an increased memory footprint because everything is loaded and running twice.

 

3. New Non Breaking Features

It really depends where you break things. NUnit has evolved and many different Assert, Expect… methods have been added. These changes are all localized in the NUnit.Framework assembly which can be easily extended. As long as the test execution contract (TestFixture, Test, AssertException) remains stable it is possible to write test executors which can run tests written for NUnit 10 because the execution contract has not changed.

It is possible to write software which executes other components in a version independent way but this is only feasible if the interaction model is relatively simple.

 

Versioning software is hard and it looks like it will remain hard since you suddenly work in a severely constrained environment when you try to innovate and to keep everything backwards compatible at the same time. These are contradicting goals and do not play well together. The easiest way out of this is to carefully watch what your customers are doing with your software. Minimizing the impact is much easier when you do not need to guess how many people will be broken when this or that is removed.

author: Alois Kraus | posted @ Sunday, March 07, 2010 10:33 AM | Feedback (0)

Pitfalls Of Equals/GetHashCode – How Does A Hash Table Work?


Come on that is easy:

  • bool Equals(object other) compares all member variables of another object instance to those of the current instance.
  • bool Equals(T other) does the same thing but expects an object of the same type as ourselves. This method is defined by the IEquatable<T> interface.
  • int GetHashCode() calculates a mostly unique identifier from the internal state of the current object. Mostly, because if you have more than 2^32 object states, some values must collide.

Most of the time you do not need to bother about these things, except if you want to use your object as a key in a hash table, or if you want to use e.g. the List<T>.Contains(T value) method and check for more than reference equality.

Although implementing object equality and hash methods looks like a very easy task, there are some pitfalls. The most important rule for both methods is that they must not throw any exceptions. Otherwise you make your type unusable in object collections (e.g. Hashtable, ArrayList, List<object>, Dictionary<object, …>, …) where objects of different types are compared against each other.

Nice rule, let's violate it:

        public override bool Equals(object obj)
        {
            SimpleType other = (SimpleType)obj;
            ....

Oops, this will cause an InvalidCastException if we ever search a List<object> containing SimpleTypes for e.g. some string. OK, many people can live with this limitation, but it is so easy to make it work with the as operator:

    class SimpleType : IEquatable<SimpleType>

    {

 

        public override bool Equals(object obj)

        {

            return Equals(obj as SimpleType);

        }

 

        public bool Equals(SimpleType other)

        {

            // Reference Equality

            if (object.ReferenceEquals(this, other))

                return true;

 

            // this cannot be null if other is null we must return false

            if (object.ReferenceEquals(other,null))

                return false;

 

            return myb == other.myb && mya == other.mya;

        }

 

        int mya;

        string myb;

       …

The comment “// this cannot be null if other is null we must return false” is not really true. If you look at the string.Equals method

public bool Equals(string value)
{
    if ((value == null) && (this != null))

you find a null check for the this pointer. The reason is that, at the IL level, you can call member methods on a null reference without getting a NullReferenceException until the this pointer is dereferenced for the first time. A C#-specific thing is that member methods are not called with the call IL instruction but with the callvirt instruction, which throws a NullReferenceException when you try to call a method on a null reference. That is interesting for languages like F#, which prefers direct calls whenever possible. If speed is a concern: the check costs you only one instruction

cmp         dword ptr [ecx],ecx

This little guy is responsible for your nice NullReferenceException. It does nothing else than compare the this pointer (= address) of your object, which is always in the ecx register (at least for .NET), against the memory location it is supposed to point to. For a null pointer we end up trying to read from memory location 0, which is certainly not a valid object address. The memory page at address 0 is not readable, which causes a Windows SEH exception (access violation 0xC0000005, try it with WinDbg) that the CLR translates into our well-known NullReferenceException.

I had to tell you this because I tend to forget these little details, and this way I can look them up here later. But back to our main topic: common sources of errors in Equals and GetHashCode. I found a very interesting pattern for implementing GetHashCode.

        public override int GetHashCode()

        {

            StringBuilder sb = new StringBuilder();

            sb.Append(mya);

            sb.Append(myb);

            return sb.ToString().GetHashCode();

Ok this does work and it produces a meaningful hash value for your object. It has only the little problem that this method does allocate strings like crazy which will be garbage collected very soon. If you have a Memory Profiler attached to your application and you see % Time in GC nearly 100% then it could be that somebody tried to use our SimpleType within a Dictionary<SimpleType,xxx> which can cause billions of GetHashCode calls within seconds. Your application will then be a nice unit test how fast the .NET garbage collector can be but there will be nearly no CPU cycles left for your application logic.

Another pitfall is to mix up operators for custom types. The == operator is not automatically routed to your Equals method once you override it. Instead the default behavior of reference equality kicks in.

    class OtherType

    { }

 

    class SimpleType : IEquatable<SimpleType>

    {

        int mya;

        string myb;

        OtherType myOther;

 

        public bool Equals(SimpleType other)

        {

            // Reference Equality

            if (object.ReferenceEquals(this, other))

                return true;

 

            // this cannot be null if other is null we must return false

            if (object.ReferenceEquals(other,null))

                return false;

 

            return myb == other.myb &&

                   mya == other.mya &&

                   myOther == other.myOther; // Wrong this tests for reference equality only!!

        }

The code looks like it does what you meant, but it does not. Before using an operator you always need to check whether any custom operators are defined. For strings, for example, you can compare two null references with == without getting any exception. But things are different when you call the member method Equals on a null reference. It is also easy to add recursion to your operators, e.g. by using the == operator for the null check. If your == operator calls Equals, which in turn uses == again, you need an infinite stack. But these things are not subtle; you will find them with unit testing.
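The difference between Equals and == for a type without a custom operator fits into a tiny sketch. The class Point and the helper OperatorDemo are hypothetical and exist only for this illustration:

```csharp
using System;

class Point
{
    readonly int x;
    public Point(int x) { this.x = x; }

    public override bool Equals(object obj)
    {
        Point other = obj as Point;
        return !ReferenceEquals(other, null) && x == other.x;
    }

    public override int GetHashCode() { return x; }

    // No operator == is defined here, so == remains a reference comparison.
}

static class OperatorDemo
{
    public static bool ViaEquals() { return new Point(1).Equals(new Point(1)); }
    public static bool ViaOperator() { return new Point(1) == new Point(1); }

    static void Main()
    {
        Console.WriteLine(ViaEquals());   // True: value comparison
        Console.WriteLine(ViaOperator()); // False: two different instances
    }
}
```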

So what does a working sample look like? Below you find the SimpleType reference implementation, which shows how you can add a meaningful hash function for more than one member. (Hash codes of the members are combined with the xor operator ^, which is OK for most cases, but other approaches are possible that produce a different hash code depending on which fields are null.)

    class SimpleType : IEquatable<SimpleType>

    {

        int mya;

        string myb;

 

        public bool Equals(SimpleType other)

        {

            // Reference Equality

            if (object.ReferenceEquals(this, other))

                return true;

 

            // this cannot be null if other is null we must return false

            if (object.ReferenceEquals(other,null))

                return false;

 

            return myb == other.myb &&

                   mya == other.mya;

        }

 

        public override bool Equals(object obj)

        {

            return Equals(obj as SimpleType);

        }

 

        public override int GetHashCode()

        {

            int ret = mya;

            if (myb != null)

                ret ^= myb.GetHashCode();

 

            return ret;

        }

 

        public static bool operator ==(SimpleType a, SimpleType b)

        {

            // Enable a == b for null references to return the right value

            if (Object.ReferenceEquals(a, b))

                return true;

            // If one is null and the other not. Remember a==null will lead to Stackoverflow!

            if (Object.ReferenceEquals(a,null))

                return false;

 

            return a.Equals((object) b);

        }

 

        public static bool operator !=(SimpleType a, SimpleType b)

        {

            return !(a==b);

        }

 

        public SimpleType(int a, string b)

        {

            mya = a;

            myb = b;

        }

    }
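One of the other approaches to combining hash codes is a multiply-and-add scheme. The sketch below is only an illustration of that idea and not part of the reference implementation above; HashCombine, NamePair and the constants 17 and 31 are assumptions chosen for the example. Unlike xor, the result depends on the position of each field, so it also changes depending on which of two fields is null.

```csharp
using System;

static class HashCombine
{
    // Multiply-and-add mixing; 17 and 31 are conventional small primes, nothing more.
    public static int Combine(int h1, int h2)
    {
        unchecked // overflow simply wraps around instead of throwing
        {
            int hash = 17;
            hash = hash * 31 + h1;
            hash = hash * 31 + h2;
            return hash;
        }
    }

    // A null field contributes 0, as in the xor version.
    public static int Of(object o) { return o == null ? 0 : o.GetHashCode(); }
}

sealed class NamePair
{
    readonly string first;
    readonly string second;
    public NamePair(string first, string second) { this.first = first; this.second = second; }

    public override bool Equals(object obj)
    {
        NamePair other = obj as NamePair;
        return !ReferenceEquals(other, null) && first == other.first && second == other.second;
    }

    public override int GetHashCode()
    {
        return HashCombine.Combine(HashCombine.Of(first), HashCombine.Of(second));
    }
}

static class HashCombineDemo
{
    static void Main()
    {
        // xor is symmetric, so swapping the two inputs cannot change the result
        Console.WriteLine((5 ^ 0) == (0 ^ 5));                                     // True
        Console.WriteLine(HashCombine.Combine(5, 0) == HashCombine.Combine(0, 5)); // False
    }
}
```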

Most programmers understand that GetHashCode and hash tables are somehow related. But only a few really understand how hash tables achieve their ultra-fast lookup times of O(1). That means I can check for the existence of a key in constant time, regardless of how many objects I have stored in my Hashtable. The precondition for this magic is that GetHashCode is fast and produces a uniform distribution of hash values across the full range of 2^32 possible values. A typical hash table implementation uses an array to store its objects. When an object needs to be inserted, it calculates the hash code and transforms this value into an index into the array. A widely used formula is

idx = hash % length

where idx is the array index at which our object is stored, hash is the hash value of the object to be stored and length is the length of the internal storage array of our hash table. For a non-negative hash, the modulo operation gives you an array index between 0 and length-1. Then you can store the object at the given index. The picture below illustrates this.

[Image: Hashtable]

Now it is easy to answer the question why the size of the array of a hash table should be a prime number. Let's suppose it is not a prime number but e.g. 16. Then the modulo operation returns 0 for all hash values that are divisible by 16, and more generally only the lowest four bits of the hash decide the index. Hash values that share a common factor with the length therefore pile up in a few slots and collide. Collisions force the hash table to switch on the collision resolution logic, which is more complex and involves calling Equals on all previously stored objects with the same hash value to find out whether the same (= Equals) object was already stored.

The lookup of a hash table takes basically 5 steps:

  1. Calculate the hash of the input key, e.g. hash “cd” = 5
  2. Calculate the corresponding array index from the hash: 5 % 10 = 5
  3. Calculate the hash value of the object at array index 5: hash “cd” = 5
  4. If the hash values match, call Equals to check whether it is really the same object
  5. If not, we have a hash collision. That can mean we check all further objects in the array with the same hash value until we find a match, or stop when we find an object with a different hash value.

Steps 1-3 are the application of our magic formula to calculate the storage index inside our array. The other steps are there because a matching hash code alone (which you cannot trust to be unique) is not enough: we need to compare the keys that have the same hash code one by one until we find the right one.
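The lookup steps can be tried out with a deliberately tiny hash table. This is a sketch under simplifying assumptions and not how Hashtable or Dictionary are implemented: TinyHashTable is a made-up name, the capacity is fixed, duplicate and null keys are not handled, and collisions are resolved with a small list per array slot (separate chaining) instead of searching further array slots.

```csharp
using System;
using System.Collections.Generic;

// Minimal illustration only: fixed capacity, no resizing, no removal.
sealed class TinyHashTable<TKey, TValue>
{
    readonly List<KeyValuePair<TKey, TValue>>[] buckets;

    public TinyHashTable(int length)
    {
        buckets = new List<KeyValuePair<TKey, TValue>>[length];
    }

    // idx = hash % length; the mask clears the sign bit so idx is never negative.
    int IndexOf(TKey key)
    {
        return (key.GetHashCode() & 0x7FFFFFFF) % buckets.Length;
    }

    public void Add(TKey key, TValue value)
    {
        int idx = IndexOf(key);
        if (buckets[idx] == null)
            buckets[idx] = new List<KeyValuePair<TKey, TValue>>();
        buckets[idx].Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        List<KeyValuePair<TKey, TValue>> bucket = buckets[IndexOf(key)];
        if (bucket != null)
        {
            int hash = key.GetHashCode();
            foreach (KeyValuePair<TKey, TValue> entry in bucket)
            {
                // Cheap hash comparison first, Equals only when the hashes match.
                if (entry.Key.GetHashCode() == hash && entry.Key.Equals(key))
                {
                    value = entry.Value;
                    return true;
                }
            }
        }
        value = default(TValue);
        return false;
    }
}

static class TinyHashTableDemo
{
    static void Main()
    {
        var table = new TinyHashTable<string, int>(11); // prime length
        table.Add("ab", 1);
        table.Add("cd", 2);

        int found;
        Console.WriteLine(table.TryGetValue("cd", out found) + " " + found); // True 2
        Console.WriteLine(table.TryGetValue("zz", out found));               // False
    }
}
```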

There is still much more to say about hash tables, such as custom IEqualityComparers to get more speed or a different behavior of your dictionary. It is e.g. quite easy to create a file name dictionary which throws when you try to add the same file to it more than once. This results in a somewhat slow hashing function, but for small numbers of entries it can be acceptable.
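A file name dictionary of this kind could look like the following sketch. It assumes that two file names are the same file when they differ only in case and uses the built-in StringComparer.OrdinalIgnoreCase for that; the class name and the paths are made up. A real implementation would probably also normalize the paths first.

```csharp
using System;
using System.Collections.Generic;

static class FileNameDictionaryDemo
{
    // Returns true when the second Add of the "same" file is rejected.
    public static bool AddTwice()
    {
        // Assumption: file names that differ only in case denote the same file.
        var files = new Dictionary<string, long>(StringComparer.OrdinalIgnoreCase);
        files.Add(@"C:\Temp\Report.txt", 1024);
        try
        {
            files.Add(@"c:\temp\REPORT.TXT", 2048);
            return false;
        }
        catch (ArgumentException)
        {
            return true; // the comparer treats both keys as equal
        }
    }

    static void Main()
    {
        Console.WriteLine(AddTwice()); // True
    }
}
```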

With a good hashing function you can create many interesting data structures. With .NET 3.5 we got e.g. the HashSet<T> class, which enables a fast check whether something was already added to the set or not. It is possible to save even more memory by using a probabilistic data structure called a Bloom filter, which has the unique property of constant check time regardless of how many items are stored inside it. That coolness comes at the cost that false positives are possible (the Bloom filter tells you it has the value stored although it does not). The false positive rate can be configured when the filter is created. An implementation in C# can be found at CodePlex.

Bloom filters are useful for cache managers which need a fast first check whether an element is in the cache. If yes, we can look up the value in the cache; if no, we go directly to disk. When a false positive is reported, we look in the cache in vain, but that does not matter since we can still get our data from disk. Google's Bigtable, for example, uses Bloom filters very successfully for this very purpose.
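To make the idea more concrete, here is a very small Bloom filter for strings. It is a sketch and not the CodePlex implementation mentioned above: TinyBloomFilter is a made-up name, the bit count and number of hash functions are passed in by the caller, and the bit positions are derived from string.GetHashCode plus an FNV-1a style hash over the characters.

```csharp
using System;
using System.Collections;

sealed class TinyBloomFilter
{
    readonly BitArray bits;
    readonly int hashCount;

    public TinyBloomFilter(int bitCount, int hashCount)
    {
        bits = new BitArray(bitCount);
        this.hashCount = hashCount;
    }

    // FNV-1a style hash over the characters, used as a second hash function.
    static int Fnv1a(string s)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (char c in s)
            {
                hash ^= c;
                hash *= 16777619;
            }
            return (int)hash;
        }
    }

    // i-th bit position: h1 + i * h2, sign bit cleared, modulo the bit count.
    int Position(string item, int i)
    {
        unchecked
        {
            int combined = item.GetHashCode() + i * Fnv1a(item);
            return (combined & 0x7FFFFFFF) % bits.Length;
        }
    }

    public void Add(string item)
    {
        for (int i = 0; i < hashCount; i++)
            bits[Position(item, i)] = true;
    }

    // false: the item was definitely never added. true: it was probably added.
    public bool MightContain(string item)
    {
        for (int i = 0; i < hashCount; i++)
            if (!bits[Position(item, i)])
                return false;
        return true;
    }
}

static class TinyBloomFilterDemo
{
    static void Main()
    {
        var filter = new TinyBloomFilter(1024, 3);
        filter.Add("cached-item");
        Console.WriteLine(filter.MightContain("cached-item")); // True
        Console.WriteLine(filter.MightContain("other-item"));  // usually False, a false positive is possible
    }
}
```

The filter stores only bits, never the items themselves, which is where the memory saving comes from.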

author: Alois Kraus | posted @ Sunday, February 28, 2010 3:24 PM | Feedback (4)

The Next Level Of Healthcare


Want to know what the future of healthcare looks like? You can see it here on YouTube. The video shows a heart in 3D (at about 60 s into the video) and how it can be manipulated with the new software, e.g. by showing only the vessels of the heart to look for signs of a heart attack. Way cool.

author: Alois Kraus | posted @ Sunday, December 06, 2009 3:46 PM | Feedback (0)

Automatic Null Checks


I have written far too many null checks in my life. Why do we even have null values? They only seem to provoke NullReferenceExceptions in our code, after all. F#, for example, has the option type with the value None, which is semantically the same as null without being null, which makes it impossible to access invalid values by accident. It is of course possible to create null values in F#, but it is not the most natural thing in a functional programming language. How can we make C# safer without writing explicit if( xxx != null ) … statements over and over again? One thing that comes to my mind would be a generalized null-coalescing operator (null-coalescing is the ?? operator). Ian Griffiths also has a very interesting article about lifting the . operator to call a method only when the instance is not null, using sophisticated but not practical methods. What was this feature anyway? Let's have a look at the following code:

        string ReadConfigValues(NameValueConfigurationCollection appSettings, string key)

        {

            string currentValue = "";

            if (appSettings != null && appSettings[key] != null)

            {

                currentValue = appSettings[key].Value ?? "";

            }

 

            return currentValue;

        }

My proposal at Connect is to enhance the syntax of method calls with the new .? (generalized null-coalescing) operator which calls a method only if the object reference is not null. That allows us to rewrite the previous code to

        string ReadConfigValuesBetter(NameValueConfigurationCollection  appSettings, string key)

        {

          return appSettings.?GetElementKey(key).?Value ?? "";

        }

That is much nicer, and if you want to have it in a future C# version you can vote at Connect here. In the meantime I explored a different strategy, using some extension methods to “automate” the null check:

        string ReadConfigValuesUsingExtensionMethods(NameValueConfigurationCollection appSettings, string key)

        {

            string lret = "";

            appSettings.WhenNotNull(() =>

                appSettings[key].WhenNotNull( (cfgElement) =>

                    cfgElement.Value.WhenNotNull( () =>

                         lret = cfgElement.Value)));

            return lret;

 

        }

In this specific case I still prefer the explicit null checks because they are easier to read than the lambda expression magic, which is cool to show off but, I admit, not really readable. But there are cases where the extension methods are useful: if you have true object-oriented code with one base class and many child classes. There you quite frequently need to cast the base class to the desired child class and check whether the cast succeeded.

        class A { }

        class B : A { }

        class C : A { }

        class D : A { }

 

 

        void CastingMadness(A o)

        {

            B b = o as B;

            C c = o as C;

            D d = o as D;

            if (b != null)

            {

                Console.WriteLine("Got B");

            }

            else if (c != null)

            {

                Console.WriteLine("Got C");

            }

            else if (d != null)

            {

                Console.WriteLine("Got D");

            }

        }

 

        void CastingBetter(A o)

        {

            (o as B).WhenNotNull((b) => Console.WriteLine("Got B"));

            (o as C).WhenNotNull((c) => Console.WriteLine("Got C"));

            (o as D).WhenNotNull((d) => Console.WriteLine("Got D"));

        }

In this case the extension methods are far better than the null check approach. For nullable (struct) types I also provide some extensions to get the same semantics.

    public static class Extension
    {
        // Runs func when a reference type value is not null.
        public static bool WhenNotNull<T>(this T value, Action func) where T : class
        {
            if (value != null)
            {
                func();
                return true;
            }

            return false;
        }

        // Same as above for nullable value types.
        public static bool WhenNotNull<T>(this Nullable<T> value, Action func) where T : struct
        {
            if (value != null)
            {
                func();
                return true;
            }

            return false;
        }

        // Passes the non null value on to func.
        public static bool WhenNotNull<T>(this T value, Action<T> func) where T : class
        {
            if (value != null)
            {
                func(value);
                return true;
            }

            return false;
        }

        // Returns the result of func, or null when value is null.
        public static V WhenNotNull<T, V>(this T value, Func<V> func) where T : class where V : class
        {
            if (value != null)
            {
                return func();
            }

            return null;
        }

        public static V WhenNotNull<T, V>(this Nullable<T> value, Func<V> func)
            where T : struct
            where V : class
        {
            if (value != null)
            {
                return func();
            }

            return null;
        }

        // Variant for struct results: returns default(V) when value is null.
        public static V WhenNotNullS<T, V>(this Nullable<T> value, Func<V> func)
            where T : struct
            where V : struct
        {
            if (value != null)
            {
                return func();
            }

            return default(V);
        }

        // Calls func for every non null element; a null list is treated as empty.
        public static void ForNotNull<T>(this IEnumerable<T> list, Action<T> func) where T : class
        {
            if (list == null)
                return;

            foreach (var v in list)
            {
                if (v != null)
                    func(v);
            }
        }

        // Non generic variant: null elements are skipped as well, all others are cast to T.
        public static void ForNotNull<T>(this IEnumerable list, Action<T> func) where T : class
        {
            if (list == null)
                return;

            foreach (var v in list)
            {
                if (v == null)
                    continue;

                T castedValue = (T)v;
                func(castedValue);
            }
        }
    }

Please do not use this code to skip the initial null check and swallow all possible errors of your API. Normally throwing an exception is the best approach. But if you read data from external sources like disk or network, you have to be able to process the incoming data reliably even if some inputs are not valid. There is a tradeoff between being silent about errors and throwing an exception for every detected nonconformity. An application has to be robust (swallow minor errors) but also correct (stop processing by throwing an exception). The only way to find the right balance is to test and to use some null checks before bad things happen. But be warned that adding additional checks might lead to new bugs that were not possible before. Do you know how many races you can get if you add a null check to your event raising code before calling the handlers? If you make your event calling code thread safe, you enable a race condition that was not possible before: you can call an object which has already unsubscribed from your event.
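To see that last race without any threads, here is a minimal sketch. It is not code from a real project: Publisher, Changed and RaiseWithHook are made-up names, and the betweenCopyAndCall callback only simulates another thread unsubscribing right after the delegate was copied.

```csharp
using System;

// Hypothetical publisher, used only to illustrate the race.
public class Publisher
{
    public event EventHandler Changed;

    // The usual "thread safe" pattern: copy the delegate, null check the copy, call the copy.
    // betweenCopyAndCall stands in for whatever another thread does in that window.
    public void RaiseWithHook(Action betweenCopyAndCall)
    {
        EventHandler handler = Changed;     // snapshot of the current subscribers
        betweenCopyAndCall();               // e.g. a subscriber unsubscribes right now
        if (handler != null)
        {
            handler(this, EventArgs.Empty); // the snapshot still contains the old subscriber
        }
    }
}

public static class EventRaceDemo
{
    // Returns how often the handler ran although it had unsubscribed before the call.
    public static int CallsAfterUnsubscribe()
    {
        var publisher = new Publisher();
        int calls = 0;
        EventHandler subscriber = (sender, args) => { calls++; };

        publisher.Changed += subscriber;
        publisher.RaiseWithHook(() => { publisher.Changed -= subscriber; });
        return calls;
    }
}
```

EventRaceDemo.CallsAfterUnsubscribe() returns 1: the handler is still invoked after it removed itself, because delegates are immutable and the local copy keeps the old invocation list alive.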

author: Alois Kraus | posted @ Sunday, November 22, 2009 5:20 PM | Feedback (5)

Why Does My System Hang? Windows Kernel Debugging For Dummies


Did you ever wonder why your system hangs at random times? Sometimes it comes back after a few seconds (it could simply be paging), but at least once a day I wish I knew why the system is responding so slowly. Before going into kernel land I must confess that I have never written a device driver, so my knowledge of kernel mode debugging is quite limited. On the other hand, if you have not done this either, you will have a much easier time following me.

Some hangs seem to be Heisenbugs which disappear when you start looking at them. I have found that when I leave Process Explorer running on my machine it seems to resolve some issues by its mere presence. It could also be that some malware and Trojan software does not even install when Sysinternals tools are running.

Did you know that you can watch the call stack of every application on your system with Process Explorer? Simply right click on a process, select Properties and then select the Threads tab, where you can view the stack of each thread with full function names.

ProcessExplorerStacks

image

 

Wrong Symbols

If the function names do not appear on your machine, or they are of the form xxxx.dll +0xdddd where dddd is a rather big number (see mvfs51.sys below, for which we do not have symbols), then you are missing the symbols. First of all you need to download Windbg. Why Windbg? To resolve the symbol names you need a good version of dbghelp.dll, which is part of Windbg. Most Sysinternals tools let you configure the symbols (Options – Configure Symbols …) and the path to the Windbg version of dbghelp.dll. To make it easier to copy and paste, here is the one and only

Symbol Server Path: SRV*C:\Windows\Symbols*http://msdl.microsoft.com/download/symbols

The first part is the cache directory in which the symbols are stored for later retrieval; the second part is the symbol server from which they are downloaded. Armed with this knowledge you should be able to find the root cause of quite a lot of hangs by simply examining the call stacks.
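The structure of that path is easy to take apart in code. The following sketch assumes only the simple SRV*cache*server form shown above; SymbolServerEntry is a hypothetical helper, and real symbol paths can chain several entries, which this does not handle.

```csharp
using System;

// Hypothetical helper for illustration; it only understands the simple
// SRV*<cache directory>*<server url> form used in this article.
public sealed class SymbolServerEntry
{
    public string CacheDirectory;
    public string ServerUrl;

    public static SymbolServerEntry Parse(string symbolPath)
    {
        string[] parts = symbolPath.Split('*');
        if (parts.Length != 3 || !parts[0].Equals("SRV", StringComparison.OrdinalIgnoreCase))
        {
            throw new FormatException("Expected SRV*<cache directory>*<server url>");
        }

        return new SymbolServerEntry
        {
            CacheDirectory = parts[1],  // downloaded symbols are stored here
            ServerUrl = parts[2]        // asked when the cache has no matching symbols
        };
    }
}
```

Parsing the path above yields C:\Windows\Symbols as the cache directory, which is the folder to check (or delete) when you suspect stale or missing symbols.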

Kernel Debugging?

On Windows XP you get the full stack including the kernel by simply looking at the process call stack in Process Explorer. With Windows Vista and above you need to run Process Explorer with elevated privileges (File – Run As Administrator in Process Explorer) to get the kernel stack as well. The call stack above is typical for user mode only. If your thread stack contains frames of the form

ntkrnlpa.exe!KiFastCallEntry+0x12a

then you are seeing the full stack including the kernel. When you find .sys files in the stack you are interested in, you have just found a device driver. That is very useful for finding out why something locks up.

Managed applications will find the native stack view less useful since dbghelp.dll is not able to show the managed call stack. There was one version of Windbg which was able to show the mixed mode stack with the kv command, but it was withdrawn by MS a few days later. Of course I tried it with Process Explorer, but Process Explorer retrieves the call stack in a different way, so I was not able to see the mixed mode stack there (yet).

The deeper reason why this feature has been held back (at least this is what the rumors say) is a legal one. The debugger team used deep CLR know-how to walk the managed stack. Because units within MS are not allowed to use internals of other products, they would have to make them public. I am not interested in how illegal this might be, but the MS lawyers are very well paid and should be able to sort this out. Seamless call stack tracking with Process Explorer and related tools would be one of my number one feature requests.

A surprisingly simple way to resolve hangs is to search Google for the names of the device drivers in your hang call stack and to check for updated drivers. In my department, for example, I saw quite a lot of hangs with the following call stack:

 

 

ntkrnlpa.exe!KiSwapContext+0x2f
ntkrnlpa.exe!KiSwapThread+0x8a
ntkrnlpa.exe!KeWaitForSingleObject+0x1c2
TmXPFlt.sys+0xc90d  // Trend Micro Virus Scanner
TmXPFlt.sys+0x306e
ntkrnlpa.exe!ObpCaptureObjectCreateInformation+0x19c
ntkrnlpa.exe!IopfCallDriver+0x31
ntkrnlpa.exe!IopParseDevice+0xa12
ntkrnlpa.exe!ObpLookupObjectName+0x53c
ntkrnlpa.exe!ObOpenObjectByName+0xea
ntkrnlpa.exe!IopCreateFile+0x407
ntkrnlpa.exe!IoCreateFile+0x8e
ntkrnlpa.exe!NtOpenFile+0x27
ntkrnlpa.exe!KiFastCallEntry+0xfc
ntkrnlpa.exe!ZwOpenFile+0x11

mvfs51.sys+0x2df70  // Rational ClearCase Source Control Driver –> Google mvfs51.sys
mvfs51.sys+0x2f850

TmXPFlt.sys+0x1039 // Trend Micro Virus Scanner –> Google TmXPFLT.sys
mvfs51.sys+0x309e6
mvfs51.sys+0x12197


ntdll.dll!NtQueryAttributesFile+0xc
kernel32.dll!GetFileAttributesW+0x79  // User mode call to get the file attributes
csproj.dll!LUtilFileExists+0xe
csproj.dll!CVsProjHostProcInstance::PrepareHostProcExecutable+0x39
csproj.dll!CVsProjHostProcInstance::StartHostingProcessHelper+0xa0
csproj.dll!CVsProjHostProcInstance::StartHostingProcess+0x72

What can we learn from this one? The faulting process was Visual Studio, which was about to start its hosting process. It first checks if the executable exists by reading its file attributes. Since the file is located on a drive managed by the source control system, the mvfs51.sys driver from ClearCase does some work. Then the Trend Micro virus scanner hooks in and causes other ClearCase driver calls which go back into the kernel and end up in the virus scanner again, which seems to cause the deadlock. In the end the virus scanner won the hook fight and locked up the process, which will never recover from it.
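If you collect many such stacks it can help to pull out the driver names automatically before you start searching for them. A minimal sketch, assuming the frames were copied as text from Process Explorer in the module!function+0xNN or module+0xNN form shown above; StackDrivers is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: lists the distinct driver (.sys) modules found in a stack
// whose frames look like module!function+0xNN or module+0xNN.
public static class StackDrivers
{
    public static List<string> FindDrivers(IEnumerable<string> frames)
    {
        var drivers = new List<string>();
        foreach (string frame in frames)
        {
            string text = frame.Trim();
            int end = text.IndexOfAny(new[] { '!', '+' });
            if (end <= 0)
            {
                continue;   // blank line or a frame without a module part
            }

            string module = text.Substring(0, end);
            bool isDriver = module.EndsWith(".sys", StringComparison.OrdinalIgnoreCase);
            if (isDriver && !drivers.Contains(module))
            {
                drivers.Add(module);   // keep the order in which drivers first appear
            }
        }
        return drivers;
    }
}
```

For the stack above FindDrivers returns TmXPFlt.sys and mvfs51.sys, which are exactly the two names to search for.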

Now you have got a hanging process that cannot be killed by any means. If you try to kill it you will end up with a process with one thread left that is still stuck in the device driver call. If you ever encounter an unkillable process which is still alive after you tried to terminate it via the task manager, it is most likely stuck in a device driver call.

High CPU Spikes / Hanging Process

Ok, that was the easy part. Now we are getting nearer to Windbg. If you have an application which behaves in strange ways (e.g. has high CPU spikes at some times) I have another Sysinternals gem: ProcDump can take memory snapshots of an arbitrary application. It is especially useful if you want to know which state an application was in when it hung or ate up all CPU time.

ProcDump v1.1 - Writes process dump files
Copyright (C) 2009 Mark Russinovich
Sysinternals - www.sysinternals.com

Monitors a process and writes a dump file when the process exceeds the
specified CPU usage.

usage: procdump [-64] [-c CPU usage [-u] [-s seconds] [-n exceeds]] [-h] [-e] [-ma] [-r] [-o] [[<process name or PID> [dump file]] | [-x <image file> <dump file> [arguments]]


   -c      CPU threshold at which to create a dump of the process.
   -e      Write a dump when the process encounters an unhandled exception.

   -h      Write dump if process has a hung window.
….

Example: Write up to 3 dumps of a process named 'consume' when it exceeds
         20% CPU usage for three seconds to the directory
         c:\dump\consume with the name consume.dmp:
            C:\>procdump -c 20 -n 3 -o consume c:\dump\consume
Example: Write a dump for a process named 'hang.exe' when one of it's
         windows is unresponsive for more than 5 seconds:
            C:\>prodcump -h hang.exe hungwindow.dmp

The generated .dmp files can be analyzed with Windbg quite easily if you have matching symbols. This is pure user mode debugging, but it is easier to start in user mode first and to dig deeper only if you need to.

 

Kernel Debugging / Hanging System

 

When your system has frozen you cannot start any new processes, so starting a debugger is of little use. Luckily there is a nice trick to force the generation of a kernel dump by pressing a magic key combination: Right Ctrl + Scroll Lock + Scroll Lock will generate a nice looking real blue screen. See the instructions below for how to enable it. Technically speaking it is a user initiated kernel dump. Please read the key combination again and note that only the RIGHT Ctrl key, held down while you press Scroll Lock twice, will do the trick.

Before you can generate the blue screen (= kernel dump) you need to set the kernel dump mode to Complete Memory Dump. You can find this menu if you press the Windows Key + Pause and then look in the Advanced System Settings – Advanced – Startup and Recovery

image

 

 

To enable the magic key combination you need to edit some registry settings. They are explained in more depth on MSDN and on a much more elaborate page dedicated to dump file generation and its common pitfalls on Windows Server 2008 (especially on computers with a lot of installed memory).

PS/2 Keyboard

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters

DWORD CrashOnCtrlScroll 1

USB Keyboard

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters

DWORD CrashOnCtrlScroll 1

 

Ok, now we can successfully generate a memory dump of the kernel and examine it. It is actually quite simple to pinpoint common problems like crashing or hanging drivers with a few commands, without the need to fully understand how the kernel works. After the reboot you can open the generated dump file (normally located at C:\Windows\Memory.dmp) with Windbg. Then you need to set up the symbol path (see Wrong Symbols at the beginning of the article), and now you can execute the !analyze -v command to find the root cause of the blue screen.

kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
MANUALLY_INITIATED_CRASH (e2)
The user manually initiated this crash dump.
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000
Debugging Details:
------------------
BUGCHECK_STR:  MANUALLY_INITIATED_CRASH
DEFAULT_BUCKET_ID:  DRIVER_FAULT
PROCESS_NAME:  Idle
LAST_CONTROL_TRANSFER:  from f754e7fa to 804f8925
STACK_TEXT:  
80548d38 f754e7fa 000000e2 00000000 00000000 nt!KeBugCheckEx+0x1b
80548d54 f754e032 00c0f0d8 0190e0c6 00000000 i8042prt!I8xProcessCrashDump+0x237
80548d9c 8054071d 85904b20 85c0f020 00010009 i8042prt!I8042KeyboardInterruptService+0x21c
80548d9c f758dc46 85904b20 85c0f020 00010009 nt!KiInterruptDispatch+0x3d
80548e50 80540cc0 00000000 0000000e 00000000 processr!AcpiC1Idle+0x12
80548e54 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x10

 

In our case the keyboard driver crashed. A closer look reveals that the crash was provoked by the user. Let's put this dump aside and first have a look at a “real” blue screen. A blue screen is actually a well defined exit point which drivers can trigger intentionally when it is no longer safe to continue. The function which causes the blue screen and, when configured, the dump generation is KeBugCheck. This function can only be called by kernel drivers. No, you can't blue screen Windows from a user mode application. I have not tried to send the magic Right Ctrl + Scroll Lock + Scroll Lock combination to Windows from a user mode application, but I do not think that this will work since the keyboard driver won't get these events.

Let's have a look at a real crash caused by a driver on a 64 bit machine and analyze it.

7: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Unknown bugcheck code (0)
Unknown bugcheck description
Arguments:
Arg1: 0000000000000000
Arg2: 0000000000000000
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:
------------------

PROCESS_NAME:  xxxxx

FAULTING_IP:
nt!KeBugCheck+0
fffff800`02261620 4883ec28        sub     rsp,28h

EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: fffff80002261620 (nt!KeBugCheck)
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000001
NumberParameters: 0

ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION}  Breakpoint  A breakpoint has been reached.

EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

BUGCHECK_STR:  0x0

CURRENT_IRQL:  0

MANAGED_STACK: !dumpstack -EE
OS Thread Id: 0x0 (7)
Child-SP         RetAddr          Call Site

LAST_CONTROL_TRANSFER:  from fffffa6009a2536b to fffff80002261620

STACK_TEXT: 
fffffa60`0b5e0c68 fffffa60`09a2536b : fffffa60`09a59824 00000000`00000008 00000000`00000001 fffff800`022a27d0 : nt!KeBugCheck
fffffa60`0b5e0c70 fffffa60`09a28da2 : fffffa80`231ffaf0 fffff880`10a9f634 fffffa60`00000007 fffffa80`21ad5b40 : mvfs60x64+0x1936b
fffffa60`0b5e0dc0 fffffa60`09a2e5af : fffffa60`09a65ba8 fffff880`00000017 fffffa60`0b5e0eec fffffa80`4ab8733a : mvfs60x64+0x1cda2
fffffa60`0b5e0e70 fffffa60`09a3edc0 : fffffa80`0f1fb260 fffffa80`2038ef40 fffffa80`00000017 fffffa60`09a296e0 : mvfs60x64+0x225af
fffffa60`0b5e0f40 fffffa60`09a42e34 : fffffa80`00000001 fffffa80`0f1fb260 fffffa80`2038ef40 fffffa60`09a55fc4 : mvfs60x64+0x32dc0
fffffa60`0b5e1000 fffffa60`09a48ba0 : fffffa80`215ff7d0 fffffa80`21ad5b40 fffffa60`0b5e10d0 00000000`5346564d : mvfs60x64+0x36e34
fffffa60`0b5e1090 fffffa60`09a4b52a : fffffa80`215ff7d0 fffffa80`2280cad0 fffffa60`0b5e12e0 fffffa60`0b5e14a0 : mvfs60x64+0x3cba0
fffffa60`0b5e1150 fffffa60`09a4e890 : fffffa80`215ff7d0 fffffa80`2280cad0 fffffa60`0b5e12e0 fffffa60`0b5e14a0 : mvfs60x64+0x3f52a
fffffa60`0b5e11a0 fffffa60`09a2feb3 : fffffa80`215ff7d0 fffffa80`2280cad0 fffffa60`0b5e12e0 fffffa60`0b5e14a0 : mvfs60x64+0x42890
fffffa60`0b5e12a0 fffffa60`09a4cc00 : 00000000`00000000 fffffa60`0b5e14a0 fffffa60`0b5e1400 fffffa60`0b5e13e8 : mvfs60x64+0x23eb3
fffffa60`0b5e13a0 fffffa60`09a4ee4f : fffffa80`1367e7c0 fffffa80`24d75710 fffffa80`24d75710 fffff800`024e58f4 : mvfs60x64+0x40c00
fffffa60`0b5e1550 fffffa60`09a25fc0 : fffffa80`1367e7c0 fffffa80`24d75710 fffffa80`24d759d8 fffffa80`13683010 : mvfs60x64+0x42e4f
fffffa60`0b5e1590 fffffa60`00c08e17 : fffffa80`1367e7c0 fffffa80`24d75710 fffffa80`24d75a20 fffffa80`13895af0 : mvfs60x64+0x19fc0

fffffa60`0b5e15e0 fffffa60`00c2526c : fffffa80`13895af0 fffffa80`13683010 fffffa80`24d75700 fffffa60`0b5e16a0 : fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x227
fffffa60`0b5e1650 fffff800`024e81f3 : 00000000`00000005 fffffa80`1209f010 00000000`00000040 00000000`00000000 : fltmgr!FltpCreate+0x25d
fffffa60`0b5e1700 fffff800`024e1ec9 : fffffa80`1367e7c0 00000000`00000000 fffffa80`2529a010 00000000`00000001 : nt!IopParseDevice+0x5e3
fffffa60`0b5e18a0 fffff800`024e5db4 : 00000000`00000000 fffffa80`252f7701 fffffa80`00000040 00000000`00000000 : nt!ObpLookupObjectName+0x5eb
fffffa60`0b5e19b0 fffff800`024f2360 : 00000000`80100080 00000000`0047ac08 fffffa60`09a42f01 00000000`00000000 : nt!ObOpenObjectByName+0x2f4
fffffa60`0b5e1a80 fffff800`024f2e98 : 00000000`0047ab98 00000000`80100080 fffffa80`00000000 00000000`0047abb8 : nt!IopCreateFile+0x290
fffffa60`0b5e1b20 fffff800`022610f3 : 00000000`000001e4 00000000`00000000 00000000`00000000 00000000`00000000 : nt!NtCreateFile+0x78
fffffa60`0b5e1bb0 00000000`77515fca : 00000000`773ccb6c 00000000`02818b90 00000000`00000002 00000000`009e0000 : nt!KiSystemServiceCopyEnd+0x13
00000000`0047ab28 00000000`773ccb6c : 00000000`02818b90 00000000`00000002 00000000`009e0000 00000000`00000002 : ntdll!ZwCreateFile+0xa
00000000`0047ab30 00000005`16f47c4e : 00000000`02818b90 00000000`80000000 00000000`00000005 00000005`16f46f93 : KERNEL32!CreateFileW+0x26c
00000000`0047ac80 00000005`16f49f76 : 00000000`02819a80 00000000`00000000 00000000`00000000 00000000`0047ada0 : diasymreader!IStreamCRTFile::Create+0xba
00000000`0047acf0 00000005`16f4a10c : 00000000`00000000 00000000`02819a80 00000000`00000000 00000005`16f8158b : diasymreader!MSF_HB::internalOpen+0x36
00000000`0047ad30 00000005`16f36782 : 00000000`02818b00 00000000`0047be00 00000000`00000000 00000000`00000400 : diasymreader!MSF::Open+0x5c
00000000`0047ad70 00000005`16f36ef0 : 00000000`02818b90 00000005`16f033e0 00000000`00000000 00000000`00000ef0 : diasymreader!PDB1::OpenEx2W+0xda
00000000`0047adf0 00000005`16f3753b : 00000000`0000001f 00000000`00000000 00000000`02818b90 00000000`0000000c : diasymreader!PDB1::OpenValidate4+0x7c
00000000`0047ae90 00000005`16f4b43f : 00000000`02818db0 00000000`00000ee4 00000000`02817cd0 00000000`00000ee4 : diasymreader!PDB::OpenValidate4+0x47
00000000`0047aef0 00000005`16f4c2db : 00000000`02818b90 00000000`0281854c 00000000`02817c00 00000000`02818bac : diasymreader!LOCATOR::FOpenValidate4+0x73
00000000`0047af50 00000005`16f4c62e : 00000000`0047be00 00000000`000002fb 00000000`02817cd0 00000000`00000094 : diasymreader!LOCATOR::FLocatePdbPathHelper+0x133
00000000`0047afb0 00000005`16f4c961 : 00000000`0047be00 00000000`00000003 00000000`0047be00 00000000`028139cc : diasymreader!LOCATOR::FLocatePdbPath+0x11a
00000000`0047ba40 00000005`16f37ae3 : 00000000`0047cc40 00000005`16f80e44 00000000`0047cc40 00000005`16f80e44 : diasymreader!LOCATOR::FLocatePdb+0x1b5
00000000`0047bde0 00000005`16f3417a : 00000000`028139a0 00000000`009e92d0 00000000`0047cc40 00000000`00000000 : diasymreader!PDBCommon::OpenValidate5+0x9f
00000000`0047cbb0 00000005`16f5fef0 : 00000000`028139a0 00000000`03084b14 00000000`03084b14 00000000`00000000 : diasymreader!PDB::OpenValidate5+0x36
00000000`0047cc00 00000005`16f2b747 : 00000000`00000000 00000000`00000000 00000000`02808010 00000000`009e92d0 : diasymreader!CDiaDataSource::loadDataForExe+0x90
00000000`0047cd30 00000005`16f1ce3b : 00000000`00000000 00000000`11401f6e 00000000`03074348 00000000`028139a0 : diasymreader!CDiaWrapper::Create+0xbb
00000000`0047cd90 00000005`16f1a704 : 00000000`00000079 00000000`0000004a 00000000`00800000 00000000`106ee840 : diasymreader!SymReader::Initialize+0xa7
00000000`0047cdf0 00000000`106adbfc : 00000000`00000014 00000000`0049c0e0 00000000`00000000 00000000`00000000 : diasymreader!SymBinder::GetReaderForFile+0x170
00000000`0047ce70 00000000`00000014 : 00000000`0049c0e0 00000000`00000000 00000000`00000000 00000000`036c1fc0 : CPI!CPI_GetCallbacks+0x1a52c
00000000`0047ce78 00000000`0049c0e0 : 00000000`00000000 00000000`00000000 00000000`036c1fc0 00000000`00000800 : 0x14
00000000`0047ce80 00000000`00000000 : 00000000`00000000 00000000`036c1fc0 00000000`00000800 00000000`03074348 : 0x49c0e0

STACK_COMMAND:  kb

FOLLOWUP_IP:
mvfs60x64+1936b
fffffa60`09a2536b cc              int     3

SYMBOL_STACK_INDEX:  1

SYMBOL_NAME:  mvfs60x64+1936b

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: mvfs60x64

IMAGE_NAME:  mvfs60x64.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4a37fa38

FAILURE_BUCKET_ID:  X64_0x0_mvfs60x64+1936b

BUCKET_ID:  X64_0x0_mvfs60x64+1936b

Followup: MachineOwner
---------

From this real life crash we get the faulting 64 bit ClearCase driver on a silver platter. We have the call stack, the managed call stack if available, the failing module name and much more, all presented with one command. If you want to find out why your machine blue screens from time to time, the information presented here is sufficient to find the buggy driver and either uninstall the damn thing (let's hope it was not important anyway) or look at the driver vendor's homepage to get an updated version. If you really care you can file a bug report and send them your dump for further analysis.
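If you save the !analyze -v output of several dumps to text files, the interesting fields can be collected with a few lines of code. This is only a sketch under the assumption that each field sits on one line as NAME: value like in the output above; AnalyzeOutput is a hypothetical helper, not a Windbg API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: collects the upper case "NAME: value" fields from saved
// !analyze -v output, e.g. IMAGE_NAME, MODULE_NAME or FAILURE_BUCKET_ID.
public static class AnalyzeOutput
{
    // An upper case field name at the start of the line, a colon, then a non empty value.
    static readonly Regex Field = new Regex(@"^([A-Z_]+):\s*(\S.*)$");

    public static Dictionary<string, string> ReadFields(IEnumerable<string> lines)
    {
        var fields = new Dictionary<string, string>();
        foreach (string line in lines)
        {
            Match match = Field.Match(line.TrimEnd());
            if (match.Success)
            {
                // A later occurrence of the same field overwrites the earlier one.
                fields[match.Groups[1].Value] = match.Groups[2].Value;
            }
        }
        return fields;
    }
}
```

Headers without a value on the same line, such as STACK_TEXT, and the stack frames themselves are skipped, so looking up IMAGE_NAME in the result gives you the driver file to search for.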

You can convert a full kernel dump into a mini dump by using the .dump outputFileName.dmp command inside Windbg.

 

Now let's see how we can debug a real hang scenario. And I won't get sidetracked by other interesting details.

When you get a user initiated memory dump you need to find out which processes were running and examine the call stacks of the interesting ones. The Windbg command !process 0 0 will give you a complete list of all running processes.

7: kd>!process 0 0

PROCESS fffffa801347a9c0
    SessionId: 1  Cid: 3ac8    Peb: 7fffffde000  ParentCid: 03b4
    DirBase: 19486c000  ObjectTable: fffff8800b78b430  HandleCount: 158.
    Image: mobsync.exe

You can select a specific process by passing its process address (the value printed after PROCESS) to the !process command. This will give you a wealth of information about its current state and all call stacks inside it. That should help to find out where the system was hanging.

7: kd> !process fffffa801347a9c0
PROCESS fffffa801347a9c0
    SessionId: 1  Cid: 3ac8    Peb: 7fffffde000  ParentCid: 03b4
    DirBase: 19486c000  ObjectTable: fffff8800b78b430  HandleCount: 158.
    Image: mobsync.exe
    VadRoot fffffa800f0b1650 Vads 80 Clone 0 Private 1129. Modified 2. Locked 0.
    DeviceMap fffff8800cb88ca0
    Token                             fffff88014d16060
    ElapsedTime                       00:00:57.613
    UserTime                          00:00:00.000
    KernelTime                        00:00:00.000
    QuotaPoolUsage[PagedPool]         155336
    QuotaPoolUsage[NonPagedPool]      7584
    Working Set Sizes (now,min,max)  (2435, 50, 345) (9740KB, 200KB, 1380KB)
    PeakWorkingSetSize                2449
    VirtualSize                       79 Mb
    PeakVirtualSize                   80 Mb
    PageFaultCount                    2545
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      1343

        THREAD fffffa8016695bb0  Cid 3ac8.3c04  Teb: 000007fffffdc000 Win32Thread: fffff900c2c08450 WAIT: (WrUserRequest) UserMode Non-Alertable
            fffffa801253e4b0  SynchronizationEvent
        Not impersonating
        DeviceMap                 fffff8800cb88ca0
        Owning Process            fffffa801347a9c0       Image:         mobsync.exe
        Attached Process          N/A            Image:         N/A
        Wait Start TickCount      3233090        Ticks: 3690 (0:00:00:57.564)
        Context Switch Count      81                 LargeStack
        UserTime                  00:00:00.000
        KernelTime                00:00:00.015
        Win32 Start Address 0x00000000ff685d38
        Stack Init fffffa601008bdb0 Current fffffa601008b720
        Base fffffa601008c000 Limit fffffa6010083000 Call 0
        Priority 10 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
        Child-SP          RetAddr           Call Site
        fffffa60`1008b760 fffff800`0226728a nt!KiSwapContext+0x7f
        fffffa60`1008b8a0 fffff800`0226868a nt!KiSwapThread+0x2fa
        fffffa60`1008b910 fffff960`001bb817 nt!KeWaitForSingleObject+0x2da
        fffffa60`1008b9a0 fffff960`001bb8ae win32k!xxxRealSleepThread+0x25f
        fffffa60`1008ba40 fffff960`001bb1fa win32k!xxxSleepThread+0x56
        fffffa60`1008ba70 fffff960`001bb4a9 win32k!xxxRealInternalGetMessage+0x72e
        fffffa60`1008bb50 fffff960`001bca15 win32k!xxxInternalGetMessage+0x35
        fffffa60`1008bb90 fffff800`022610f3 win32k!NtUserGetMessage+0x79
        fffffa60`1008bc20 00000000`772ed09a nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffffa60`1008bc20)
        00000000`0029f7a8 00000000`00000000 USER32!ZwUserGetMessage+0xa

A hang can be caused by a shared lock which different processes try to acquire. This common deadlock scenario can be checked with the !locks command, which lets you examine the locks to which more than one process wants to get access:

7: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks..

Resource @ 0xfffffa800f952218    Exclusively owned
     Threads: fffffa800e585040-01<*>
KD: Scanning for held locks...

Another very useful command is !stacks, which shows for the threads of all processes the last stack frame they are standing in:

7: kd> !stacks


    [fffffa8012d8b7d0 csrss.exe]
2e0.0003e0  fffffa801307b7a0 ffce9c7b Blocked    cdd!PresentWorkerThread+0x479
2e0.0003ec  fffffa80130903d0 ffce9ccd Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.0003fc  fffffa8012ce8ad0 ffce9c8c Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.00046c  fffffa80131619c0 ffce9c8c Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.000f78  fffffa800f052060 ffce9c8c Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.00133c  fffffa800f1d92e0 ffce9c8c Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.0011a0  fffffa800f1d6700 ffce9c8c Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.00101c  fffffa800f141800 ffce9c7c Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.000dbc  fffffa800f1ffbb0 ffce9c8e Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.00038c  fffffa801fd14060 ffce9c7c Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.001408  fffffa80207b0360 ffce9ccd Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.00140c  fffffa8012efa060 ffce9c8c Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.001678  fffffa800f1017c0 ffce9c60 Blocked    win32k!xxxRealSleepThread+0x25f
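With hundreds of threads the !stacks output gets long. A rough sketch that counts how many threads stand in the same top frame, assuming the five column layout of the saved output above with the frame in the last column; StacksSummary is a hypothetical helper, and lines in any other layout are simply skipped.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: counts threads per top frame in saved !stacks output lines such as
// "2e0.0003ec  fffffa80130903d0 ffce9ccd Blocked    nt!AlpcpReceiveMessagePort+0x287".
public static class StacksSummary
{
    public static Dictionary<string, int> CountByTopFrame(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        foreach (string line in lines)
        {
            string[] columns = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            if (columns.Length != 5 || !columns[0].Contains("."))
            {
                continue;   // process header, blank line or an unknown layout
            }

            string frame = columns[4];
            int current;
            counts.TryGetValue(frame, out current);   // current stays 0 for a new frame
            counts[frame] = current + 1;
        }
        return counts;
    }
}
```

For the csrss.exe excerpt above this reports 11 threads in nt!AlpcpReceiveMessagePort+0x287 and one each in the other two frames, which makes it easy to spot where most threads are parked.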

A more thorough list has been created by Dmitry Vostokov at his famous Crash Dump Analysis web site, which gives a good overview of the most used Windbg commands. To dig deeper you will need to buy the Windows Internals book by Mark Russinovich to understand how the Windows kernel and drivers work, and visit the NT Debugging blog where Microsoft escalation engineers show some advanced kernel debugging techniques.

If you have read until here you (should) have lost your fear of the dreaded blue screen. It's not the end but the beginning of an interesting debugging session. It is a pity that so few people are able to analyze kernel dumps even at the most basic level. In many cases it is possible to find out which device driver is the guilty one. You then have the option to remove the faulting driver entirely or to try to get an updated one. At least you know who is to blame, and most of the time it is not the OS.

One last note: If you transfer the dump to another machine you need not only the dump file but also the exact same executable binaries on the analyzing machine to load the correct pdbs. You need to set them up under File – Image File Path in Windbg to analyze dumps successfully.

author: Alois Kraus | posted @ Sunday, October 04, 2009 4:37 PM | Feedback (3)

Auto login Fails Sporadically To Load The Profile Of A Domain User


Just a quick note to myself to remember the registry key

HKLM\Software\Policies\Microsoft\Windows NT\CurrentVersion\Winlogon

        DWORD SyncForegroundPolicy=0x1

Setting this value ensures that Windows does not try to log on until the network subsystem has been fully initialized. Without it, domain users with a roaming profile can fall back to a local profile because the auto logon is simply too fast: when no user has to sit in front of the PC and type in a user name and password, Windows does not always have enough time to initialize the networking subsystem. During logon this can cause failures to map network drives or to load the roaming profile. The latter can damage your roaming profile if you e.g. log the same user on at different PCs. When the user logs off, the profile is stored back to the domain, potentially overwriting the newer settings which have just recently been configured for this user.

author: Alois Kraus | posted @ Monday, October 05, 2009 4:11 PM | Feedback (0)

XmlSerializer in Action or why does my application start csc.exe processes?


If you have ever wondered why your .NET application starts slowly, you normally begin by watching your processes with Process Explorer. But this will not always give you the full picture: on a fast machine it can miss some short-lived csc.exe calls. A more reliable tool is Procmon, where you can look for process starts:

Procmon_SlowApplication

Where do these compiler invocations come from? It turns out that XmlSerializer is to blame: it generates a C# file in your TEMP folder, compiles it with the C# compiler csc.exe, loads the resulting assembly into the application and removes all traces from your disk after it has done its work. This operation can cost you 0.3 - 2s of your application startup time, so it is worth investigating most of the time. The assembly is needed to get really fast de/serialization code for the types it must de/serialize. There are ways around this: switch to another serializer like DataContractSerializer from .NET Framework 3.0, or use sgen to generate this serialization assembly once and for all so it can be loaded later without the code generation overhead. DataContractSerializer does not suffer from the assembly generation overhead because it does not invoke the C# compiler at all, at the cost of slightly slower de/serialization.
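In code the expensive step is the construction of the serializer, not the later Serialize and Deserialize calls. A minimal sketch with a hypothetical AppSettings type (the type and the SettingsXml helper are made up for this illustration) that creates the serializer once and reuses it for a round trip:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical settings type, used only for this illustration.
public class AppSettings
{
    public string Name;
    public int Retries;
}

public static class SettingsXml
{
    // Constructing the serializer is the costly part described above,
    // so one instance is created and then reused for every call.
    static readonly XmlSerializer Serializer = new XmlSerializer(typeof(AppSettings));

    public static string Save(AppSettings settings)
    {
        using (var writer = new StringWriter())
        {
            Serializer.Serialize(writer, settings);
            return writer.ToString();
        }
    }

    public static AppSettings Load(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return (AppSettings)Serializer.Deserialize(reader);
        }
    }
}
```

Reusing one instance does not remove the cost of the first construction at startup; for that you still need sgen or a different serializer as described above.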

Sometimes it can be quite hard to find out why some applications cause csc.exe calls. The first thing to use is the fusion log viewer (fuslogvw.exe) which is part of the .NET Framework SDK and is in your path if you start a Visual Studio 20xx command prompt.

image

Here we see assembly load failures for assemblies whose names end with .XmlSerializers, which is a good way to identify how many compiler calls were made and from which assemblies the de/serialized types originate. Most of the time this information is enough, but there are times when you want to identify the exact types that caused these calls. For example, you may not want to generate serialization code for all public types of a rather huge assembly, which would bloat the generated serialization assembly, or you may want to embed the serialization code into your own assembly.

Luckily XmlSerializer has some debugging capabilities which can be turned on in your application's App.config, or machine wide if you really want to be sure that you do not miss anything.

The machine config file is located in your windows directory at

%WINDIR%\Microsoft.NET\Framework\v2.0.50727\CONFIG\machine.config

Here you only need to add

<system.diagnostics>
  <switches>
    <add name="XmlSerialization.Compilation" value="true"/>
  </switches>
</system.diagnostics>

This prevents XmlSerializer from cleaning up the %TEMP% directory, so you have a chance to look at the generated code. Now go to your %TEMP% directory and search for *.cs files.

The command

findstr /C:"t == typeof" *.cs

will give you the requested types

kzbrxewx.0.cs:                if (t == typeof(global::SlowApplication.Program)) {

in a very easy way. If you are worried about loading an additional assembly, you can also use sgen to generate a code file for just the needed type and add that file to your project to stay lightweight. You only need to decorate your serialized type with

[XmlSerializerAssembly("YourMainModule, Version=..., Culture=..., PublicKeyToken=....")]

That allows your code to stay slim, at the price of having to regenerate the serialization code from time to time to stay in sync with the implementation of your serialized types.

 

If you ever wondered whether it is possible to use generics within a class serialized by XmlSerializer: no, you can't, because the XML array type does not know which collection class it originated from. If you try to create your custom collection like this

public class FilterItemList : List<FilterItem>
{
}

it will not work. But there is a way around it: name your collection class ArrayOf<ClassName> and derive it from List<ClassName>.

public class ArrayOfFilterItem : List<FilterItem>
{
}

Now we are back in the game and can use fields and properties of our generic collection type ArrayOfFilterItem, and XmlSerializer will not complain.

author: Alois Kraus | posted @ Monday, September 14, 2009 11:13 PM | Feedback (0)

Logging is not Tracing


The Enterprise Library team is trying to improve the performance of the Logging Application Block (LAB) even further. Since version one we have come a long way. It is interesting that even Microsoft's guidance projects do change their mind. Let's recap how the LAB has evolved since its initial release in January 2005.

Released      | Version | Logging Application Block (LAB) Characteristics | ca. Logs/s
January 2005  | v1.0    | Log file is opened/closed after every write. Performance was awful but accepted. | 400
January 2006  | v2.0    | .NET 2.0 support, easier configuration, log file is flushed after every write. Much better perf. | 3000
April 2007    | v3.0    | New application blocks but no significant changes in the LAB. | 3000
May 2008      | v4.0    | Log files which are rolled over are finally supported out of the box. The TextFormatter was reworked to make it over 13 times faster as I suggested. | 10 000

 

Since customers are still complaining about the performance of the Logging Application Block a performance feature was added to the product backlog:

...

LAB02: Async logging (text formatting done asynchronously) to cut down on load on primary thread (M)

...

Looks OK. We have many CPUs by now, so why not put it on another thread? As Grigory has pointed out, there are several problems with that approach. If you switch to another thread you have to fetch the call stack of the calling thread first, before doing the thread transition. But since stack trace generation is a fairly expensive operation, little if anything is gained by doing the rest on another thread. Thread affinity strikes back and destroys your nice lazy init design pattern, which initializes only those properties of a log message that the message template is actually configured to format. So what is the problem here? First, I think customers are expecting too much from a general purpose application block. The title of the post says it all: Logging is not Tracing!

Logging

When you design a big application you need to have good and flexible error reporting perhaps across machines to collect log data in a centralized way. That is a perfect use case for the Logging Application Block where you configure some remote trace listener and send the log data to a central log server which stores its log messages in a database, log file or whatever. If you use out of process communication you are limited by the network performance already which is in the best case several thousand logs/s.

 

Tracing

Besides error reporting you also need to trace your program flow to find out where the performance bottlenecks are and, even more important, to have a chance to find out how you got there when an error occurs. In an ideal world every function would have some tracing enabled with the function duration, the passed parameters and how far you got in your function. This is what performance profilers actually do by using the Profiling API of .NET. If you have ever used such a tool you will have noticed that the performance of your application is degraded by a factor of 5-100. Automatic instrumentation is therefore not an option, and you still need to write some code or use an IL weaver to add tracing code to your app.

I think a real distinction between logging and tracing has never been made.

  • Logging is error reporting.
  • Tracing is following your program flow and data.

Logging is always enabled. Tracing on the other hand is normally disabled since it would generate huge amounts of data in very short time. By differentiating these two use cases you can optimize each of them in a different manner. Logging needs to be flexible. Tracing has three main goals: Correctness, performance and performance. This can heavily influence your API design and other design choices to support each scenario as well as possible.
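
As a rough sketch of how this split can show up in an API, the following hypothetical AppDiagnostics class (not part of the Enterprise Library) offers an always-on LogError method and a Trace method that skips building the message entirely while tracing is switched off. The names and the in-memory Output list are assumptions made purely for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal API, only meant to illustrate the split between
// always-on logging and switchable tracing.
public static class AppDiagnostics
{
    // Collected output; a real implementation would write to a file or listener.
    public static readonly List<string> Output = new List<string>();

    // Tracing is off by default because of the data volume it produces.
    public static bool TracingEnabled;

    // Logging: always enabled, flexible, called rarely (errors, warnings).
    public static void LogError(string message, Exception ex)
    {
        Output.Add("ERROR " + message + ": " + ex.Message);
    }

    // Tracing: the message is built only when tracing is switched on,
    // so a disabled trace call skips all formatting work.
    public static void Trace(Func<string> messageFactory)
    {
        if (!TracingEnabled)
        {
            return;
        }
        Output.Add("TRACE " + messageFactory());
    }
}
```

A call such as AppDiagnostics.Trace(() => "Enter Save " + fileName) then stays cheap in production, while LogError keeps reporting regardless of the switch.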

Conclusions

And now back to the original question: why do customers still complain about the performance of the Logging Application Block (LAB)? I am very certain that these customers use the LAB for tracing and complain about its bad performance. When you try to log enter/leave of some very frequently called functions (millions of calls/s) you end up with huge log files and a crawling application. What do you do if your application is slow? You use a profiler, check where most of the time is spent, and find that it is spent in the LAB string formatting and disk IO functions. There is little you can do about the disk IO, so you leave it and try to optimize the LAB further. That will only push the limit a little, and it will all be in vain when the next customer complains that his application becomes very slow once he logs enter/leave of his highly optimized GetHashCode() function. How do we get out of this? There is no easy way. It could help to make such performance boundaries more explicit, as the WCF team has done with DataContracts. By explicit I mean one API for logging and another one for tracing. That could help to minimize the pain of users with wrong expectations about the performance of the LAB.

That said, I still see quite some potential for the LAB to support tracing better than today:

  • Create a TraceFormatter with a fairly fixed message format which includes, for example, time, pid, tid, category and message. No call stack, since it is too expensive.
  • In the formatters, cache the last LogEntry object and the formatted message. If you trace to two different destinations with the same formatter, the already formatted message can be reused if the object reference is the same as last time.
  • Format the time in an efficient way like it was shown here. A factor of 15 compared to DateTime.ToString should help to make TraceFormatter a really fast one.
  • Do not use multithreading. It will make it slower, harder to understand and less reliable. I expect to have 100% reliable logging and tracing. Always. The following simple code should work with no exceptions:

static void Main(string[] args)
{
      Log.Info("Hello world"); // could be lost if done asynchronously
}

If you introduce a time latency between the log call and its processing, you can lose the most important log message just before your application terminates.

Perhaps we will see some nice Tracing Application Block in the future if enough customers complain. Just tell Grigory to consider this as well for a future version of the Enterprise Library.
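
The formatter ideas from the list above could look roughly like the following sketch. TraceEntry and FixedTraceFormatter are hypothetical names, not Enterprise Library types, and the hand-written time formatting is only one possible way to avoid DateTime.ToString; it is not the code from the linked post. The cache is deliberately single-threaded, in line with the last bullet.

```csharp
using System;
using System.Text;

// Hypothetical entry type; the real LogEntry of the LAB has many more properties.
public sealed class TraceEntry
{
    public DateTime Time;
    public int ProcessId;
    public int ThreadId;
    public string Category;
    public string Message;
}

public sealed class FixedTraceFormatter
{
    TraceEntry myLastEntry;
    string myLastFormatted;

    // Number of times an entry was actually formatted (for demonstration only).
    public int FormatCount { get; private set; }

    public string Format(TraceEntry entry)
    {
        // Reuse the cached text when the same object is formatted again,
        // e.g. because it is written to a second destination.
        if (ReferenceEquals(entry, myLastEntry))
        {
            return myLastFormatted;
        }

        var sb = new StringBuilder(64 + entry.Message.Length);
        AppendTime(sb, entry.Time);
        sb.Append(' ').Append(entry.ProcessId)
          .Append(' ').Append(entry.ThreadId)
          .Append(' ').Append(entry.Category)
          .Append(' ').Append(entry.Message);

        myLastEntry = entry;
        myLastFormatted = sb.ToString();
        FormatCount++;
        return myLastFormatted;
    }

    // Writes HH:mm:ss.fff by hand instead of going through DateTime.ToString.
    static void AppendTime(StringBuilder sb, DateTime t)
    {
        AppendDigits(sb, t.Hour, 2); sb.Append(':');
        AppendDigits(sb, t.Minute, 2); sb.Append(':');
        AppendDigits(sb, t.Second, 2); sb.Append('.');
        AppendDigits(sb, t.Millisecond, 3);
    }

    // Appends 'value' as exactly 'width' decimal digits (width is 2 or 3 here).
    static void AppendDigits(StringBuilder sb, int value, int width)
    {
        for (int divisor = width == 3 ? 100 : 10; divisor > 0; divisor /= 10)
        {
            sb.Append((char)('0' + (value / divisor) % 10));
        }
    }
}
```

The reference comparison only pays off if an entry is not modified between the two Format calls, which is an assumption of this sketch.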

author: Alois Kraus | posted @ Sunday, June 21, 2009 4:47 PM | Feedback (3)

How To Read .NET Performance Counters Correctly


Managed performance counters are tricky (or broken, depending on how you look at them) to read when more than one process with the same name is running managed code. Each process gets a unique identifier as its counter instance name:

  1. ManagedApp
  2. ManagedApp#1
  3. ManagedApp#2
  4. ...

If you want to know the instance name for a specific process identified by its process ID, things become tricky. There is a counter in the .NET CLR Memory category called Process ID which enables us to find the correct counter instance name without guessing.

To find the correct instance name, here is a little helper class which does it in a reasonably fast way:

 

using System;
using System.Diagnostics;
using System.IO;
using System.Threading;

namespace PerformanceCounterRead
{
    public class PerfCounterReader : IDisposable
    {
        PerformanceCounter myMemoryCounter;
        const string CategoryNetClrMemory = ".NET CLR Memory";
        const string ProcessId = "Process ID";
        const int ProcessesToTry = 40;

        public PerfCounterReader(int processId) : this(Process.GetProcessById(processId))
        {
        }

        string GetInstanceNameForProcess(int instanceCount, Process p)
        {
            string instanceName = Path.GetFileNameWithoutExtension(p.MainModule.FileName);

            if (instanceCount > 0) // Append instance counter
            {
                instanceName += "#" + instanceCount.ToString();
            }

            // Read .NET CLR Memory Process ID for the given instance to check if
            // it does match our target process
            using (PerformanceCounter counter = new PerformanceCounter(CategoryNetClrMemory, ProcessId,
                   instanceName, true))
            {
                long id = 0;

                try
                {
                    while (true)
                    {
                        var sample = counter.NextSample();
                        id = sample.RawValue;

                        // for some reason it takes quite a while until the counter is
                        // updated with the correct data
                        if (id > 0)
                            break;

                        Thread.Sleep(15);
                    }
                }
                catch (InvalidOperationException)
                {
                    // swallow exceptions from non existing instances we tried to read
                }

                return (id == p.Id) ? instanceName : null;
            }
        }

        string GetManagedPerformanceCounterInstanceName(Process p)
        {
            Func<int, Process, string> PidReader = GetInstanceNameForProcess;
            string instanceName = null;
            AutoResetEvent ev = new AutoResetEvent(false);

            for (int i = 0; i < ProcessesToTry; i++)
            {
                int tmp = i;
                // Since reading the performance counter for every process is
                // very slow we try to speed up our search by reading up to ProcessesToTry
                // in parallel
                PidReader.BeginInvoke(tmp, p, (IAsyncResult res) =>
                    {
                        if (instanceName == null)
                        {
                            string correctInstanceName = PidReader.EndInvoke(res);

                            if (correctInstanceName != null)
                            {
                                instanceName = correctInstanceName;
                                ev.Set();
                            }
                        }
                    }, null);
            }

            // wait until we got the correct instance name or give up
            if (!ev.WaitOne(20 * 1000))
            {
                throw new InvalidOperationException("Could not get managed performance counter instance name for process " + p.Id);
            }

            return instanceName;
        }

        public PerfCounterReader(Process p)
        {
            string processInstanceName = GetManagedPerformanceCounterInstanceName(p);
            myMemoryCounter = new PerformanceCounter(CategoryNetClrMemory, "# Bytes in all Heaps", processInstanceName);
        }

        public long BytesInAllHeaps
        {
            get
            {
                return myMemoryCounter.NextSample().RawValue;
            }
        }

        #region IDisposable Members

        public void Dispose()
        {
            myMemoryCounter.Dispose();
        }

        #endregion
    }
}

To use this class you can give it your current process to check how accurately the counter behaves:

 

            var p = Process.GetCurrentProcess();
            var memory = new List<List<byte>>(); // keeps the allocations alive so the heap grows
            using (PerfCounterReader reader = new PerfCounterReader(p))
            {
                while (true)
                {
                    Console.WriteLine("Managed Heap Memory[{0}]: {1:N0} {2:N0}", p.Id, reader.BytesInAllHeaps, GC.GetTotalMemory(false));
                    memory.Add(new List<byte>(10000 * 1000));
                    Thread.Sleep(1000);
                }
            }

 

This will produce output similar to this:

Managed Heap Memory[1616]:     868.844   1.129.604
Managed Heap Memory[1616]:     868.844  11.202.852
Managed Heap Memory[1616]:  10.840.884  20.376.384
Managed Heap Memory[1616]:  20.912.640  30.376.376
Managed Heap Memory[1616]:  20.912.640  40.449.624
Managed Heap Memory[1616]:  20.912.640  50.522.872
Managed Heap Memory[1616]:  20.912.640  60.596.120
Managed Heap Memory[1616]:  20.912.640  70.669.368
Managed Heap Memory[1616]:  20.912.640  80.734.424
Managed Heap Memory[1616]:  20.912.640  90.807.672
Managed Heap Memory[1616]:  20.912.640 100.880.920
Managed Heap Memory[1616]: 100.841.252 110.376.732
Managed Heap Memory[1616]: 100.841.252 120.449.980

What is interesting is that the GC.GetTotalMemory function gives much more precise results than the performance counter. The performance counter seems to be updated only once every 5-10 seconds here, which is quite slow but better than nothing. The reason is that the .NET CLR Memory performance counters are only updated at the end of a garbage collection. If you want to track your resource consumption in a timely manner during unit tests, you will need to add quite big sleeps or trigger a GC in the remote process to get reasonably reliable numbers.

As a rule of thumb I can only emphasize: measure, and check your numbers for errors. Coming from nuclear physics I was educated to question the numbers and check them for consistency. This art seems to have gotten lost in our fast paced IT industry, where the display (an Excel sheet with fancy macros) seems to be more important than what you actually measured. If these numbers help you to track and steer resource consumption, performance, ... then you have produced real business value. Once you have reliable measurements you can reason about what the numbers tell you. With an increasing amount of work you can
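
For checks inside the measured process, a sketch like the following reads the managed heap growth of an operation directly with GC.GetTotalMemory. Passing true forces a collection first, so both readings are taken from a settled heap. HeapMeasurement and MeasureAllocatedBytes are names made up for this illustration.

```csharp
using System;

public static class HeapMeasurement
{
    // Returns the growth of the managed heap caused by 'allocate'.
    // The returned object is kept alive until after the second reading,
    // otherwise the forced collection could already reclaim it.
    public static long MeasureAllocatedBytes(Func<object> allocate)
    {
        long before = GC.GetTotalMemory(true);   // true: collect before reading
        object keep = allocate();
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(keep);
        return after - before;
    }
}
```

This only works from within the process itself; for a remote process you still depend on the performance counter.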

  1. Measure something wrong
  2. Measure something right
  3. Measure the relevant things right
  4. Measure the relevant things right and take further actions to improve your software.

If you are stuck in 1-3 then you have gained nothing for your current project because the knowledge gained from your measurements does not flow back into your software.

Measuring, for example, the available physical memory before and after a test will show you that you have "lost" or "gained" 100-300 MB of memory. But what does it tell you about the resource consumption of your tests? Not much, since the OS manages the physical memory of all processes. Even if you have a big memory leak it does not necessarily show up as lost physical memory, since the OS is quite good at paging unused memory out into the pagefile. The machine wide memory consumption is easy to measure but of little value (2). More about the Zen of measuring performance/consumption right is the topic of a future post.

author: Alois Kraus | posted @ Wednesday, May 27, 2009 11:02 AM | Feedback (2)

DEVPATH Is Back!


Uhh What? DEVPATH is an environment variable that allows you to specify global directories which are searched just like the GAC. If you have ever wanted to load the DLLs of your application from subdirectories, you need a probing element in your app.config, which allows exactly that.

The only problem is that you cannot escape from your application root directory. When you try to load something from ..\Centralbin it is ignored. In that case you need to use the GAC whether you like it or not. Since DEVPATH was broken for some time with .NET 2.0 I thought it was no longer supported. But thanks to John Robbins' article "PDB Files: What every Developer must know." I learned a different story. That now makes it possible to use some Microsoft tools in a standard fashion. The XML serialization assembly generator sgen, for example, can create the serialization assembly only if none of the public serializable types depend on assemblies in other directories. This is a major PITA since freshly compiled assemblies are located in other directories than the rest (unless you have set Copy To Local to true in Visual Studio, but that is a bad idea as well).

But now we can alter the sgen.exe.config and add one line

<?xml version ="1.0"?>
<configuration>
    <runtime>       
        <generatePublisherEvidence enabled="false"/>
        <developmentMode developerInstallation="true"/>
    </runtime>
</configuration>

And now behold, let's call sgen.exe

C:\Source>sgen.exe

System.Threading.SynchronizationLockException: Object synchronization method was called from an unsynchronized block of code.
at System.Resources.ResourceManager.TryLookingForSatellite(CultureInfo lookForCulture)
at System.Resources.ResourceManager.InternalGetResourceSet(CultureInfo culture, Boolean createIfNotExists, Boolean tryParents)
at System.Resources.ResourceManager.GetString(String name, CultureInfo culture)
at System.Environment.ResourceHelper.GetResourceStringCode(Object userDataIn)
at System.Environment.GetResourceFromDefault(String key)
at System.Environment.GetResourceString(String key)
at System.IO.Path.CheckInvalidPathChars(String path)
at System.IO.Path.NormalizePathFast(String path, Boolean fullCheck)
at System.IO.Path.NormalizePath(String path, Boolean fullCheck)
at System.IO.Path.GetFullPathInternal(String path)
at System.AppDomainSetup.set_DeveloperPath(String value)
at System.AppDomain.SetupFusionStore(AppDomainSetup info)
at System.AppDomain.SetupDomain(Boolean allowRedirects, String path, String configFile)

Oops. .NET 3.5 SP1 did not do the trick? Some investigation shows that the .NET Framework still suffers from bad error handling. If DEVPATH is empty or contains invalid path characters such as > < | or ", it tries to report the issue. So far so good. But the attempt to report the error fails itself, as the stack trace above shows. It seems that Microsoft is a lover of fast in-process tests where each method works perfectly but the whole thing blows apart when used in a true product scenario. This is not the first time that this has happened with DEVPATH, but I thought that since the release of .NET 2.0 in 2005 these things would have been fixed and some regression tests added. Apparently I was wrong.

In my specific case I tried set devpath="c:\Source\EntLib3Src\App Blocks\bin", which failed because of the quotation marks: the set command stores them as part of the value. Once I removed the "invalid" characters everything worked fine.

During my investigation with Reflector I stumbled upon another undocumented environment variable, RELPATH, which sets the private probing path.

info.PrivateBinPath = Environment.nativeGetEnvironmentVariable(AppDomainSetup.PrivateBinPathEnvironmentVariable);

When I set it to e.g. subDir, I no longer need to set the private probing path in my App.config. Nice. That could come in handy in some scenarios.
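
To catch the DEVPATH problem described above before starting a tool like sgen, a small check along the following lines may help. It looks only for the characters named in this post (> < | and "); DevPathCheck and its methods are hypothetical helpers, and the empty-value case is simply reported as "nothing found" in this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class DevPathCheck
{
    // Characters named in the text above as breaking DEVPATH handling.
    static readonly char[] SuspectChars = { '>', '<', '|', '"' };

    // Returns the suspect characters found in the given DEVPATH value.
    public static List<char> FindSuspectChars(string devPath)
    {
        var found = new List<char>();
        if (string.IsNullOrEmpty(devPath))
        {
            return found;
        }
        foreach (char c in SuspectChars)
        {
            if (devPath.IndexOf(c) >= 0)
            {
                found.Add(c);
            }
        }
        return found;
    }

    // Convenience wrapper that reads the variable from the environment.
    public static List<char> CheckEnvironment()
    {
        return FindSuspectChars(Environment.GetEnvironmentVariable("DEVPATH"));
    }
}
```

For the value from my case, including the quotes stored by the set command, FindSuspectChars reports the quotation mark.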

author: Alois Kraus | posted @ Thursday, May 14, 2009 7:56 PM | Feedback (2)

Measure Performance With Stopwatch


I do performance measurements quite regularly, which involves calling a piece of code several times to measure how long it takes. I am sure nearly everybody has done this already. But as a physicist I know that (nearly) every measurement has fundamental problems which never go away. Key to a successful measurement is that you know exactly what you are measuring, and not what you think you measure. The easiest way out of this dilemma without too much knowledge is to simply ignore the fact that you don't know enough and restrict yourself to pure differential measurements. By differential I mean that you measure once, change the code in a way that should make it better, and measure again. When you measure different times you can assume that your code change was the cause of the different timing. That works quite well to some extent, but you should always be vigilant when the results change dramatically from one build to the next. There is no error I have not made with performance measurements, so I think it is safe to give away my top 11:

  1. DateTime.Now has a resolution of ~16ms. If you measure anything faster you must use Stopwatch which has a resolution of about 100ns.
  2. With Windows Vista you need to set your power options to maximum performance to prevent the OS from changing the CPU clock frequency at random times.
  3. Warm and cold start times are way different. A factor of 2-6 is quite normal. By cold startup I mean a freshly booted system which has never run your test case. The cold startup time is mainly caused by disk IO, which you can watch very nicely with XPerf.
  4. Be sure that you measure the right thing. I cannot emphasize this enough. Did you really measure the right thing?
  5. Know your influencers. You need a good understanding of how much e.g. the number of iterations, input data size, concurrency and other applications can affect the outcome. More than once I executed a test in the evening and again the next morning, and the results differed by a factor of two for no apparent reason.
  6. Debug and Release builds still differ in the managed world, although the difference is much smaller than it was with C++.
  7. First time initialization effects should be measured separately. When you initialize a class and call some method 10 million times it does not make much sense to mix the ctor call time with the pure function calling time.
  8. Do measure long enough to get reliable results. The mean value is a powerful ally to flatten sporadic effects.
  9. If you get strange disk IO reports check if a virus scanner does interfere.
  10. Other processes might be running as well. These might influence your test if they consume significant resources.
  11. Stupid but happens: Turn off tracing before you measure.
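
Points 1, 7 and 8 of this list can be combined into a very small helper. The following is a sketch under my own naming (QuickMeasure and MeanMilliseconds are not framework types): it runs the action once untimed to absorb first-call effects, then times many runs with a single Stopwatch and returns the mean.

```csharp
using System;
using System.Diagnostics;

public static class QuickMeasure
{
    // Runs 'action' once untimed (first-call effects), then 'runs' times
    // under a single Stopwatch and returns the mean time per call in ms.
    public static double MeanMilliseconds(Action action, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException("runs");

        action(); // warm-up: JIT compilation, caches, first disk access

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            action();
        }
        watch.Stop();

        return watch.Elapsed.TotalMilliseconds / runs;
    }
}
```

Timing the whole loop rather than each call keeps the Stopwatch overhead out of the result, at the price of losing the distribution of the individual timings.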

You still want to make some quick throw away measurements? You have been warned. It is a tricky thing. To get at least the timing calculation right I created some extension methods which I want to share here. With these extensions you can create an Action delegate and call the new Profile method on it to execute it n times. To give you full control over the string formatting you can supply a format string in which {runs}, {time} and {frequency} are expanded in a human readable way for you.

            Action func = () =>
            {
                using (File.OpenRead(@"C:\config.sys"))
                {
                }
            };

            func.Profile(1000, "Did open the file {runs} times in {time}s. Can do {frequency} File.Open/s");

That little snippet produces on the console

Did open the file 1 000 times in 0.06s. Can do 16 667 File.Open/s

Since we know that the first call to File.Open will involve actual disk accesses which are much slower we want to measure the first call independently to see the "cold" startup performance as well:

 

func.ProfileWithFirst(1000, "First File Open did take {time}s", "Did open the file {runs} times in {time}s. Can do {frequency} File.Open/s");

That will give us

First File Open did take 0.00s
Did open the file 999 times in 0.05s. Can do 20 388 File.Open/s

There are also times when you want to control the number of iterations by yourself. Nothing easier than that. Simply change the delegate type from Action to Func<int> and you are ready to go.

 

            Func<int> myFunc = () =>
                {
                    const int Runs = 1000;
                    for (int i = 0; i < Runs; i++)
                    {
                        // open and immediately close each file
                        using (File.OpenRead(@"C:\config.sys")) { }
                        using (File.OpenRead(@"C:\autoexec.bat")) { }
                    }

                    return Runs * 2;
                };

            myFunc.Profile("Did open {runs} files in {time}s");

That was almost too easy. But what if I do not like the formatting? Well, it turns out you can still customize it. The format string placeholders {0}, {1} and {2} are the number of runs, the elapsed time in seconds as float and the call frequency (runs/s) as float. They can be used as usual to customize the output format to your specific needs.

myFunc.Profile("Did open {runs} files in {1:F5}s");

Did open 2 000 files in 0,10000s

The actual class is written in C# 3.0 with lambda expressions. Using these rather low level primitives gives you a surprising amount of flexibility and composability to build higher level functions.

Here is the source code for the Action and Func<int> delegate extension methods:

 

using System;

using System.Globalization;

using System.Diagnostics;

 

namespace PerformanceTester

{

    /// <summary>

    /// Helper class to print out performance related data like number of runs, elapsed time and frequency

    /// </summary>

    public static class Extension

    {

        static NumberFormatInfo myNumberFormat;

 

        static NumberFormatInfo NumberFormat

        {

            get

            {

                if (myNumberFormat == null)

                {

                    var local = new CultureInfo("en-us", false).NumberFormat;

                    local.NumberGroupSeparator = " "; // set space as thousand separator

                    myNumberFormat = local; // make a thread safe assignment with a fully initialized variable

                }

                return myNumberFormat;

            }

        }

 

        /// <summary>

        /// Execute the given function and print the elapsed time to the console.

        /// </summary>

        /// <param name="func">Function that returns the number of iterations.</param>

        /// <param name="format">Format string which can contain {runs} or {0},{time} or {1} and {frequency} or {2}.</param>

        public static void Profile(this Func<int> func, string format)

        {

 

            Stopwatch watch = Stopwatch.StartNew();

            int runs = func();  // Execute function and get number of iterations back

            watch.Stop();

 

            string replacedFormat = format.Replace("{runs}", "{3}")

                                          .Replace("{time}", "{4}")

                                          .Replace("{frequency}", "{5}");

 

            // get elapsed time back

            float sec = watch.ElapsedMilliseconds / 1000.0f;

            float frequency = runs / sec; // calculate frequency of the operation in question

 

            try

            {

                Console.WriteLine(replacedFormat,

                                    runs,  // {0} is the number of runs

                                    sec,   // {1} is the elapsed time as float

                                    frequency, // {2} is the call frequency as float

                                    runs.ToString("N0", NumberFormat),  // Expanded token {runs} is formatted with thousand separators

                                    sec.ToString("F2", NumberFormat),   // expanded token {time} is formatted as float in seconds with two digits precision

                                    frequency.ToString("N0", NumberFormat)); // expanded token {frequency} is formatted as float with thousands separators

            }

            catch (FormatException ex)

            {

                throw new FormatException(

                    String.Format("The input string format string did contain not an expected token like "+

                                  "{{runs}}/{{0}}, {{time}}/{{1}} or {{frequency}}/{{2}} or the format string " +

                                  "itself was invalid: \"{0}\"", format), ex);

            }

        }

 

        /// <summary>

        /// Execute the given function n-times and print the timing values (number of runs, elapsed time, call frequency)

        /// to the console window.

        /// </summary>

        /// <param name="func">Function to call in a for loop.</param>

        /// <param name="runs">Number of iterations.</param>

        /// <param name="format">Format string which can contain {runs} or {0},{time} or {1} and {frequency} or {2}.</param>

        public static void Profile(this Action func, int runs, string format)

        {

            Func<int> f = () =>

            {

                for (int i = 0; i < runs; i++)

                {

                    func();

                }

                return runs;

            };

            f.Profile(format);

        }

 

        /// <summary>

        /// Call a function in a for loop n-times. The first function call will be measured independently to measure

        /// first call effects.

        /// </summary>

        /// <param name="func">Function to call in a loop.</param>

        /// <param name="runs">Number of iterations.</param>

        /// <param name="formatFirst">Format string for first function call performance.</param>

        /// <param name="formatOther">Format string for subsequent function call performance.</param>

        /// <remarks>

        /// The format string can contain {runs} or {0},{time} or {1} and {frequency} or {2}.

        /// </remarks>

        public static void ProfileWithFirst(this Action func, int runs, string formatFirst, string formatOther)

        {

            func.Profile(1, formatFirst);

            func.Profile(runs - 1, formatOther);

        }

    }

 

}
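ProfileWithFirst measures the first call on its own because the first call pays for JIT compilation and cold caches. Here is a minimal standalone sketch of the same idea using Stopwatch directly. The class FirstCallTiming and the helper MeasureFirstAndRest are hypothetical names for illustration and not part of the extension class above.

```csharp
using System;
using System.Diagnostics;

static class FirstCallTiming
{
    // Hypothetical helper: times the first call separately from the remaining runs - 1 calls.
    public static void MeasureFirstAndRest(Action func, int runs,
                                           out TimeSpan first, out TimeSpan rest)
    {
        if (func == null) throw new ArgumentNullException("func");
        if (runs < 1) throw new ArgumentOutOfRangeException("runs");

        Stopwatch watch = Stopwatch.StartNew();
        func();                          // first call: includes JIT and other first call effects
        watch.Stop();
        first = watch.Elapsed;

        watch = Stopwatch.StartNew();
        for (int i = 1; i < runs; i++)   // the remaining runs - 1 calls
        {
            func();
        }
        watch.Stop();
        rest = watch.Elapsed;
    }

    static void Main()
    {
        int calls = 0;
        TimeSpan first, rest;
        MeasureFirstAndRest(() => calls++, 1000, out first, out rest);
        Console.WriteLine("First call: {0:F6} s", first.TotalSeconds);
        Console.WriteLine("Other {0} calls: {1:F6} s", calls - 1, rest.TotalSeconds);
    }
}
```

The absolute numbers depend on the machine. What matters is how the first call compares with the average of the later calls.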

author: Alois Kraus | posted @ Tuesday, December 16, 2008 11:21 AM | Feedback (2)

Efficient Memory Usage With .NET


How can you use the words efficient memory usage and .NET in the same headline? We all know that C++ is much more efficient with regard to memory consumption. Yes, I agree that if you really love your memory you should think twice about whether .NET is the right choice for you. There are reasons why Windows Vista does not run a single managed executable while starting up. OK, the Event Viewer is managed, which explains why it starts so slowly. First of all you need to know what things cost. The following table shows how much memory is allocated for some common object types:

 

Type                   Size in Bytes (32-bit)
new object()           12
new string('\0');      20
new DummyStruct();     4

What is perhaps surprising is that each managed class object consumes at least 12 bytes of memory. If you want to allocate a huge number of objects you are perhaps better off with a struct value type.

The program used to get the numbers was this one:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Runtime.InteropServices;

namespace MemoryAllocation
{
    [StructLayout(LayoutKind.Sequential,Pack=1)]
    struct DummyStruct
    {
        public int a;
        public int b;
    }

    class Program
    {
        // Calls the allocator the requested number of times and keeps every result alive in a list.
        static List<T> Allocate<T>(int allocations, Func<T> allocator)
        {
            List<T> memory = new List<T>();
            for (int i = 0; i < allocations; i++)
            {
                memory.Add( allocator() ); // call function that creates a new object
            }

            return memory;
        }

        static char [] empty = new char[] { '\0' };  // input for string ctor to create an empty string

        static void Main(string[] args)
        {
            var before = GC.GetTotalMemory(true);

            const int Allocations = 1000 * 1000;

            // allocate memory and do not release it
            var mem = Allocate(Allocations,
                //() => new object()
                //() => new DummyStruct()
                () => new string(empty)
                );

            GC.Collect();
            GC.Collect();
            var after = GC.GetTotalMemory(true);

            GC.KeepAlive(mem);

            // memory allocated per object, including the 4 bytes which are used for the object
            // reference in the array (subtract them for reference types)
            Console.WriteLine("One object consumes about {0} bytes", (after - before) / Allocations);
        }
    }
}

One thing to note is that you need to subtract 4 bytes from the output for reference types (object and string) because they are stored in an array and we do not want to count the array reference as memory consumption too.

As I said in my previous post "Where Did My Memory Go" every small (< 85000 bytes) managed object allocation will eat up your physical memory because the garbage collector traverses the managed heaps from time to time to remove dead objects and to compact the heaps. The effect is that your objects always stay hot in memory even if you never use them, which prevents them from going into the page file. You can of course force the OS to swap all your memory out to the page file by calling SetProcessWorkingSetSize(GetCurrentProcess(),-1,-1), but that has severe performance drawbacks when you access the swapped out memory, because every such access causes hard page faults. Windows Forms applications actually do this to save memory. That is the reason why the Working Set drops to a few MB when you minimize a managed application.

The golden rule is to use efficient data structures to consume as little memory as possible. If you want to optimize your memory consumption you need a managed memory profiler. The ones I found most useful are:

 

.NET Memory Profiler from SCITech

  • It is cheap: 179 €/license (as of 13.12.2008).
  • The fastest profiler I have used so far.
  • Full 64-bit support.
  • It supports allocation stacks (which functions led to the object allocation).
  • View the content of allocated objects.
  • It can take snapshots of your process and compare them with each other. This way you can find memory leaks quite easily.
  • Object tagging to find out which objects are new since the last snapshot.
  • Nice and fast filtering capabilities.

Of course there are also some gotchas:

  • The extended profiling mode is not as stable as I would have wished. It works for most applications but can crash on bigger applications.
  • The object list is not very easy to navigate to find your biggest memory consumers.

 

The .NET Memory Profiler is easy to use and definitely worth its money.

 

YourKit Profiler for .NET

  • It can be attached while the application is running, which is a quite unique feature I have found nowhere else.
  • It is both a performance and memory profiler.
  • Full 64-bit support.
  • Class List view is easy to navigate.
  • Class Tree view is very cool for finding out which objects contain all the others on a namespace level.
  • Memory Analysis can find duplicate strings and other memory-wasting anti-patterns which can otherwise only be found by watching each string ...

That sounds very impressive and it is. But it also has some quite severe limitations:

  • It has no allocation stack support. That makes it very hard to find out who has allocated your object.
  • Static class members cannot be tracked down to the class that holds the reference to them. They show up as object roots with no connection to anywhere.
  • Stack local instances are flagged, but if you e.g. allocate 200 MB inside a function you will not be able to navigate to the class that allocated the object.
  • The fast profiling mode is much less stable and crashes quite often.
  • Opening a snapshot is very slow.
  • The performance profiler (sampling, tracing) does not show me the bottlenecks where thread sleeps are involved in the way I would expect. That can lead you in the wrong direction. The new Ants Profiler 4.0 or Intel's VTune Performance profiler are much better suited for that job.
  • Not so cheap: 389 €/license (as of 13.12.2008).

YourKit is the leading Java profiler company and also has a .NET profiler in its portfolio. No profiler is perfect. In fact the two complement each other, and I would recommend using both (SciTech and YourKit) to get the best possible overview of how the memory in your application is distributed. All profilers can be downloaded from the software vendors for free with a 14-day trial license. I recommend doing so to find out which profiler suits your needs best. There are some other profilers out there, like AQTime and the Ants Memory Profiler. AQTime seems to be able to profile both .NET and C++ for performance and memory, which makes it very interesting. But so far I have not found enough time to check it out because it is not easy to use. The current Ants Memory Profiler is not usable and I cannot recommend it at all. But the same vendor has a very good performance profiler which is really worth the money. The only thing I really do not understand about the Ants profilers is that it is NOT possible to launch an application with command line arguments from a batch file. That is OK for GUI applications, but if your application under test spawns child processes you need to be able to call the profiler from the command line.

Equipped with a profiler we can chase our memory now.

I recommend looking first for:

  • Duplicate strings - The objects that allocated them will most likely have several instances around. Consider making them static to save memory.
  • *Cache* in the type name. It is surprising how much memory is lost to caches. Bigger applications seem to have their own cache in each architectural layer, which should be questioned. If the cache itself is OK, look at how many instances of your cache exist. A sane rule is that a cache should exist only once. If you find more than one cache instance it is very likely that the cached data is the same but not shared.
  • XmlDocuments have their own XML DOM tree representation which consumes quite a lot of memory (3 to 5 times more than the plain XML file). The profilers have a hard time resolving from an XmlNode to the actual object that holds a reference to it, so it can be a bit tricky to find out.
  • XmlReaderSettings are a fine source of memory leaks if you store them as a member inside your class. When you choose to validate your XML document, the reader settings instance with the validation error callbacks gets attached to the XmlDocument just read. In effect your XmlReaderSettings instance references the XML DOM tree even when you never need it again!
  • A huge number of objects of the same type. Check who has allocated them.
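For the duplicate strings case, making the field static is not the only option. A different technique is to route strings through one shared pool, so that equal values collapse into a single instance. The following is only a sketch under my own assumptions: StringPool is a hypothetical helper, it is not thread safe, and it keeps every pooled string alive for as long as the pool lives.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: hands out one shared instance per distinct string value.
sealed class StringPool
{
    private readonly Dictionary<string, string> pool = new Dictionary<string, string>();

    // Returns the pooled instance that has the same content as 'value'.
    public string Get(string value)
    {
        if (value == null) return null;

        string shared;
        if (!pool.TryGetValue(value, out shared))
        {
            pool.Add(value, value);   // first time we see this content: remember this instance
            shared = value;
        }
        return shared;
    }

    public int Count { get { return pool.Count; } }
}

static class StringPoolDemo
{
    static void Main()
    {
        var pool = new StringPool();

        // new string(...) forces two distinct instances with equal content
        string a = pool.Get(new string('x', 3));
        string b = pool.Get(new string('x', 3));

        Console.WriteLine(object.ReferenceEquals(a, b)); // True: both point to one instance
        Console.WriteLine(pool.Count);
    }
}
```

Such a pool is itself a cache, so the rule from the list applies to it as well: it should exist only once.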

 

When you have found an inefficiency it is time to fix it. Then you need to measure memory consumption again. Here comes the hard part: it is very difficult to check whether a memory optimization actually saved memory. A simple look at Working Set, Private Bytes or GC heap size does not work, since the GC heaps are allocated in chunks (16 MB if I remember correctly). These numbers tell you only the peak memory consumption during the startup of your application. Since then half of your heaps might have become empty, and you can get the impression that nothing has changed after your patch. The easiest way to check whether an optimization actually worked is to look at the GC.GetTotalMemory value from time to time, or you can use the overview pages of the memory profilers, which are also helpful.
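A before/after comparison with GC.GetTotalMemory could look like the sketch below. HeapDelta and Measure are hypothetical names. Passing true asks the GC to collect first, so dead objects do not distort the number, but the result is still only an approximation.

```csharp
using System;
using System.Collections.Generic;

static class HeapDelta
{
    // Hypothetical helper: returns the change in live managed heap bytes caused by 'allocate'.
    public static long Measure<T>(Func<T> allocate, out T result)
    {
        long before = GC.GetTotalMemory(true);   // true = collect first, count only live objects
        result = allocate();                     // the caller keeps the result alive via 'result'
        long after = GC.GetTotalMemory(true);
        return after - before;
    }

    static void Main()
    {
        List<byte[]> data;
        long delta = Measure(() =>
        {
            var list = new List<byte[]>();
            for (int i = 0; i < 1000; i++)
            {
                list.Add(new byte[1000]);
            }
            return list;
        }, out data);

        Console.WriteLine("Live managed heap grew by about {0} bytes", delta);
        GC.KeepAlive(data);   // make sure the data is not collected before the measurement ends
    }
}
```

Run the same measurement before and after your patch and compare the two deltas instead of the Working Set.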

author: Alois Kraus | posted @ Friday, December 12, 2008 1:52 PM | Feedback (6)

Who Scraped Over My Disk?


I have had Windows Vista Ultimate installed for some months now and I still wonder which system service I have missed to get a well behaved system. To troubleshoot heavy disk access I use Process Monitor from SysInternals. It is an invaluable tool to find out who accessed what on the system. It can monitor every module load, process start, thread creation, registry or disk access and network activity you can imagine. If you are developing unmanaged code it captures not only which application accessed a file or registry key but (when the symbols are loaded) also the full call stack. To make the call stacks work you need to download the symbols, which is quite easy and I recommend doing it. Below is a typical screen shot from Process Monitor when the system is seemingly doing nothing:

As you can see there is still quite a lot of activity going on behind the scenes. Process Monitor makes the Windows black box a white box and you can look directly at the system while it is working. If you ask with what black art this information is obtained: Process Monitor installs a filter driver (at run time only) to which nearly all IO related activity is redirected. To check if the driver is running you can use the fltmc command:

c:\>fltmc.exe

Filtername                     Number of Instances    Height    Frame
-----------------------------  -------------------    ------    -----
PROCMON11                      0                      385001    0
luafv                          1                      135000    0
FileInfo                       8                      45000     0

If you want to find out who opened the files simply go to Tools - Stack Trace

and you will even see the call stack of whoever tried to access a specific file or registry key. I tried this with a small C++ application to get a full call stack, but the latest version (2.02) seems to have problems picking up the correct events. I did not see my file IO at all (Windows Vista 32-bit). OK, that means that I can finish my post. But wait. My initial goal was to tell you a little known secret: XPerf is your secret weapon when you want to see what the kernel really does.

Some introduction links are here:

ETW (Event Tracing for Windows) was introduced with Windows 2000, but it was a little known feature which did not get much attention. The main reason is that it was quite complicated to implement and use. Xperf has been used within Microsoft for years, and it proved so useful that they published it. The tool can capture traces on Server 2003 and XP systems, but you must analyze them on a Vista system to decode them.

Its main uses are (where I have used it so far):

  • Find out why the first start of your application takes so much longer than the second start
  • Who is accessing the file x zillion times?
  • Are my disk access patterns inefficient?

To get started simply install the tool and start a trace session with some default event providers (xperf -providers lists all of them).

  • xperf -on base

Now you can execute your use case, start your application, ... Then you can stop the trace session and save the traces to disk

  • xperf -d c:\savedTraces.etl

To view the saved traces simply type

  • xperf c:\savedTraces.etl or double click on the file in explorer if you have a file association with .etl files.

You will see some output like this:

As you can see there is CPU usage, Disk IO, Disk IO per process, .... Today an application was scraping over my hard disk again, and I had never figured out what it was. Process Monitor shows you a great deal of detail, but sometimes the amount of data can be overwhelming and I was not able to spot the issue. But here we see a nice 100% spike in the Disk Utilization by Process. Let's select the region of interest and click on Detail Graph (this is a context menu):

This graph shows the time on the horizontal axis and the logical cluster number of your hard disk on the vertical axis. To make it even more interesting it shows only true disk accesses. If all data is in the file system cache this graph will contain only a few dots, which tells you that the hard disk had nothing to do with your slow system. From this graph you can conclude:

  • Big vertical changes mean long disk seeks and thus slow performance. An optimal disk access pattern (linear read) would be a straight line.
  • Big horizontal gaps mean that the disk or the application was busy. You just waited quite long for something (random access) to complete.

Additionally the Summary table shows you a wealth of information

In my case I found that Firefox has its own database (urlclassifier3.sqlite) which updates its index from time to time. The database itself contains a list of URLs which are considered harmful. Firefox will warn you when you try to navigate to potentially fraudulent URLs. You can disable this under

Options - Security and uncheck

Tell me if the site I'm visiting is a suspected attack site.

Tell me if the site I'm visiting is a suspected forgery.

Shut down Firefox and delete the database. Voilà, no more hard disk scraping in the background and one reason less to blame Windows Vista. Finally this case can be closed. With this tool I also managed to find out what my little cpptest.exe application was doing:

 

When _NT_SYMBOL_PATH is set to c:\windows\symbols;SRV*C:\Windows\Symbols*http://msdl.microsoft.com/download/symbols and you add the cpptest pdb directory you will get this output. From this stack trace we can conclude that cpptest.exe calls some long lasting functions such as WriteFile from its main method, which are responsible for the disk IO we were seeing in the System process. One important thing about file IO with Windows Vista is that the OS seems to be veeeery lazy with file writes. If I write e.g. a 100 MB file it has 0 bytes until I close the file! To force Vista to write the contents of a file to disk you need to call FlushFileBuffers or use unbuffered IO.
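From managed code the same flush can be requested without a PInvoke. This is a minimal sketch which assumes .NET Framework 4.0 or later, where the FileStream.Flush(bool) overload exists. FlushToDisk, WriteAndFlush and the file name are hypothetical.

```csharp
using System;
using System.IO;

static class FlushToDisk
{
    // Writes 'size' zero bytes and asks the OS to push them to the device before returning.
    public static void WriteAndFlush(string path, int size)
    {
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            byte[] buffer = new byte[size];
            stream.Write(buffer, 0, buffer.Length);

            // Flush(true) clears the stream buffer and the intermediate OS file buffers.
            // Assumption: .NET Framework 4.0 or later.
            stream.Flush(true);
        }
    }

    static void Main()
    {
        // hypothetical file name in the temp directory
        string path = Path.Combine(Path.GetTempPath(), "flushdemo.bin");
        WriteAndFlush(path, 1024 * 1024);
        Console.WriteLine("File size after flush: {0} bytes", new FileInfo(path).Length);
        File.Delete(path);
    }
}
```

Opening the stream with FileOptions.WriteThrough is another way to limit the lazy write behaviour. Both cost write performance, so use them only where you really need the data on disk.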

author: Alois Kraus | posted @ Monday, December 08, 2008 12:53 PM | Feedback (0)

Where Did My Memory Go?


When you program in a high level environment like .NET where the garbage collector takes care of your memory you do not have to think about memory as often as in C++. Memory leaks tend to show up much more often in C++ and other non garbage collected languages because nobody is cleaning up after you. Garbage collection is a good thing, but sometimes your application still consumes much more memory than you thought it should. What should you do now? First of all you need to understand how your memory is organized by Windows itself. Mark Russinovich has a very eloquent article about that. If you do not understand what the whole article is about, here is a quick start.

  1. Download Process Explorer from TechNet.

Now you see some output like this.

 

What do the numbers of Working Set, Working Set Private, Private Bytes, ... mean to a "normal" programmer? I allocate memory, use it and let the garbage collector free it when I no longer use it. Using memory is easy, but Windows does a lot behind the scenes to make it work efficiently. The most crucial thing Windows does is to share memory between processes when it is read only. The code and read only data in a dll are a good example of read only data that can be used by many processes. If you use the same dll in more than one process it will be shared between all processes that use it.

Shared Memory

If you look at the columns in the screen shot of Process Explorer you will notice that different counters for Working Set and Private Bytes have been selected. The deeper reason for this is that these numbers are incredibly useful to tune your application. To get more data you can select the Properties (right click on a process) of a process.

 

  • Working Set is the physical memory actually in use, which cannot be more than the RAM installed in your computer.
  • Private Bytes is the memory that cannot be shared between processes.
  • Working Set Private tells you how much of your private bytes contributes to your working set (allocated physical memory).

When you add up the total working sets of all processes you can get a much bigger number than the installed memory of your machine. The reason is that much of your process data (e.g. code) can be shared. You can calculate your working set from

Working Set = Working Set Private + Working Set Shareable

If you want to create a well behaved .NET application you would aim for a low working set and low private bytes. Private bytes are for example all your allocated objects which live on the CLR heap either in Generation 0,1,2 or the Large Object Heap. More about that comes later.
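If you want to watch these numbers from inside your own application, the Process class exposes some of them. This is a small sketch with hypothetical helper names. Note that the Process class has no property for Working Set Private, so only the total working set and the private bytes are read here.

```csharp
using System;
using System.Diagnostics;

static class MemoryCounters
{
    // Physical memory currently used by this process (Working Set).
    public static long WorkingSetBytes()
    {
        using (Process self = Process.GetCurrentProcess())
        {
            return self.WorkingSet64;
        }
    }

    // Memory that cannot be shared with other processes (Private Bytes).
    public static long PrivateBytes()
    {
        using (Process self = Process.GetCurrentProcess())
        {
            return self.PrivateMemorySize64;
        }
    }

    static void Main()
    {
        Console.WriteLine("Working Set   : {0:N0} bytes", WorkingSetBytes());
        Console.WriteLine("Private Bytes : {0:N0} bytes", PrivateBytes());
        Console.WriteLine("Managed heap  : {0:N0} bytes", GC.GetTotalMemory(false));
    }
}
```

The managed heap value is only one part of the private bytes. The rest comes from native allocations, thread stacks and other data that is private to the process.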

Code Sharing - NGen And Precompiled Assemblies

In the .NET environment things are complicated a bit by the JIT compiler, which compiles your IL code in each process separately. To achieve full code sharing between .NET processes you need to precompile your assembly with the NGEN tool. If you look with Process Explorer at your loaded dlls (press Ctrl-D in Process Explorer) you will find that all .NET assemblies from Microsoft are precompiled to minimize the memory footprint if more than one .NET process (which is very likely) is running. To validate that you are using the precompiled images look into the Fusion Log (start Fuslogvw and check the Native Image checkbox). An even easier way is to look at the path of your loaded dlls. If it contains C:\Windows\NativeImages_v2.xxxx then you have loaded the precompiled assembly successfully. If not, either your NGen image did not match the loaded assembly and must be updated, or you are using multiple AppDomains. In that case you need to decorate your Main method with the LoaderOptimization attribute set to LoaderOptimization.MultiDomain to tell the JIT compiler to share the code between AppDomains.

        [LoaderOptimization(LoaderOptimization.MultiDomain)]
        static void Main(string[] args)

 

Code size in enterprise applications can easily reach several hundred MB which would become a major headache if no code sharing between processes is possible.
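The path check for precompiled images can also be done from code by listing the modules of the current process. This is a rough sketch. It assumes that the native image cache folder name contains "NativeImages", as in the path mentioned above, and it only gives meaningful results on the classic .NET Framework where NGen is used. NativeImageCheck and IsNativeImagePath are hypothetical names.

```csharp
using System;
using System.Diagnostics;

static class NativeImageCheck
{
    // True when the module path points into the native image cache.
    // Assumption: the cache folder name starts with "NativeImages".
    public static bool IsNativeImagePath(string path)
    {
        return path != null &&
               path.IndexOf("\\NativeImages", StringComparison.OrdinalIgnoreCase) >= 0;
    }

    static void Main()
    {
        using (Process self = Process.GetCurrentProcess())
        {
            foreach (ProcessModule module in self.Modules)
            {
                // mark every loaded module that comes from the native image cache
                Console.WriteLine("{0} {1}",
                    IsNativeImagePath(module.FileName) ? "[ngen]" : "[    ]",
                    module.FileName);
            }
        }
    }
}
```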

 

Data Sharing

Another way to share data between processes is Memory Mapped Files, which will finally be supported by the .NET Framework 4.0 without any PInvokes.
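With the System.IO.MemoryMappedFiles namespace of .NET 4.0 this could look roughly like the sketch below. The map name "DemoSharedMemory" and the helper names are hypothetical. Named maps are a Windows feature, and in a real scenario the writer and the reader would live in two different processes.

```csharp
using System;
using System.IO.MemoryMappedFiles;

static class SharedMemoryDemo
{
    // Writes an int at offset 0 of the shared region.
    public static void WriteInt(MemoryMappedFile map, int value)
    {
        using (MemoryMappedViewAccessor view = map.CreateViewAccessor())
        {
            view.Write(0, value);
        }
    }

    // Reads the int at offset 0 of the shared region.
    public static int ReadInt(MemoryMappedFile map)
    {
        using (MemoryMappedViewAccessor view = map.CreateViewAccessor())
        {
            return view.ReadInt32(0);
        }
    }

    static void Main()
    {
        // Hypothetical map name. A second process would call OpenExisting with the same name.
        using (MemoryMappedFile writer = MemoryMappedFile.CreateNew("DemoSharedMemory", 4096))
        using (MemoryMappedFile reader = MemoryMappedFile.OpenExisting("DemoSharedMemory"))
        {
            WriteInt(writer, 42);
            Console.WriteLine(ReadInt(reader));   // reads the value through the second handle
        }
    }
}
```

The sketch does no synchronization. As soon as two processes write to the same region you need something like a named Mutex around the accesses.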

 

Page File Allocated Memory

An even trickier thing is to allocate memory in the page file directly by calling VirtualAllocEx. Since the page file is shared between all processes it is not really possible to attribute this allocation to a specific process (yet). This is the reason why Page File backed memory does not show up as private byte memory at all although your application might consume GBs of it.

 

Working Set and Allocation Size

There is a very direct relation between Working Set size and allocation size in .NET applications. Try to run the following code snippet

    using System;
    using System.Collections.Generic;
    using System.Threading;

    class Program
    {
        static void Main(string[] args)
        {
            List<byte[]> memory = new List<byte[]>();
            const int Factor = 85; // Allocate 85000 bytes with each loop run
            while (true)
            {
                var bytes = new byte[Factor * 1000];
                memory.Add(bytes); // prevent the GC from reclaiming the memory
                Thread.Sleep(Factor / 2);  // throttle the allocation to make it visible
                Console.WriteLine("Next run");
            }
        }
    }

That code snippet allocates memory in blocks of 85000 bytes each and sleeps a little so that the memory allocations are easier to watch. If you wonder why on Earth I used 85000 bytes as the block size: that is the size at which the .NET Framework (2.0, 3.0, 3.5, 3.5 SP1) allocates your object on the Large Object Heap. Objects on this heap are never moved by the garbage collector. You can observe this directly when you watch the Working Set size. It remains constant while you allocate hundreds of MBs of private bytes! Windows allocates the memory and finds that, since you did not touch it, it can be moved to the page file. Your application will happily allocate more and more page file but no physical memory until you reach the 2 GB limit for a 32-bit process or the page file becomes full.

The effect changes drastically when you change the Factor from 85 to 84. This brings the allocation size below the threshold, and the memory is allocated on the normal CLR heap. These heaps are compacted from time to time by the GC, which means that the GC forces Windows to move our memory from the page file into physical memory. Although our application does not access the allocated bytes the GC does, which binds our memory allocation directly to physical memory!

 

When you allocate memory in chunks smaller than 85000 bytes it will be held in your physical RAM because the GC by its nature traverses the whole heap from time to time.
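To see on which side of the threshold an allocation ends up, you can ask the GC for the generation of a freshly allocated array. Objects on the Large Object Heap are reported as generation 2 right away, while a new small object normally starts in generation 0. This is a sketch with hypothetical names. The exact threshold is an implementation detail of the runtime.

```csharp
using System;

static class LohCheck
{
    // Returns the generation the GC reports for a byte array directly after allocation.
    public static int GenerationOfNewArray(int bytes)
    {
        byte[] block = new byte[bytes];
        return GC.GetGeneration(block);
    }

    static void Main()
    {
        Console.WriteLine("84000 bytes -> generation {0}", GenerationOfNewArray(84 * 1000));
        Console.WriteLine("85000 bytes -> generation {0}", GenerationOfNewArray(85 * 1000));
    }
}
```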

 

That is important since it severely limits our ability to run an application with many small objects on machines with little RAM. It is therefore vitally important for all .NET developers to track their memory consumption and keep a sharp eye on many small (<85000 bytes) object allocations, which add directly to the process working set. How and which memory profilers can be used to track typical .NET applications is a topic for a future post.

author: Alois Kraus | posted @ Sunday, November 30, 2008 11:48 AM | Feedback (3)