Alois Kraus

blog

  Home  |   Contact  |   Syndication    |   Login
  133 Posts | 8 Stories | 368 Comments | 162 Trackbacks

News



Archives

Post Categories

Image Galleries

Programming

We all make dumb mistakes which are not interesting to share. But sometimes people make interesting mistakes. Below is a simple helper class that makes combining several delegates in a logical and manner very easy.

 

    static class Constraints<T>
    {
        /// <summary>
        /// Logical and of a list of predicates.
        /// </summary>
        public static Predicate<T> And(params Predicate<T>[] ands)
        {
            // return a new delegate which checks if all delegates in the array did return 
            // true for a given item
            return item => ands.All(x => x(item));
        }
    }

You can use it then like this:

   class Program
    {
        static void Main(string[] args)
        {
            int[] data = new int[20 * 1000 * 1000]; // 20 million ints

            Predicate<int> CheckForZero = Constraints<int>.And(i => i > -1, 
                                                               i => i < 1);

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < data.Length; i++)
            {
                if (CheckForZero(data[i]) == false)
                {
                    Console.WriteLine("Violation found at {0}", i);
                }
            }
            sw.Stop();
            Console.WriteLine("Did take: {0:F2}s", sw.Elapsed.TotalSeconds);
        }
   }
Now you have a nice way to combine separate checks in a reusable way. The only issue is that for many calls to CheckForZero the running time is quite high.
Did take: 0,90s

If we compare this to the "normal" way

            for (int i = 0; i < data.Length; i++)
            {
                if ( !( data[i] > -1 && data[i] < 1) )
                {
                    Console.WriteLine("Violation found at {0}", i);
                }
            }

Did take: 0,01s

Whoa. The delegate version is 90 times slower! I found this issue by looking at the mount Everest's in terms of CPU consumption. What is causing so much overhead?

Well you just did wake up a sleeping dragon

image

with your LINQ delegate version. It is the Garbage Collector! Here is the allocation report of our slow sample:

image

We have a 80MB int array with 20 million elements. But we allocate 2,5GB of closure classes! Our Constraints.And method needs one closure to capture the list of delegates.

This is only needed once since we cache the returned delegate and will not show up in the profiler at all.

image

 

Things get interesting when we invoke the And method of the captured class. In this case we are allocating one additional closure to give access to the current item to check and the enclosing capturing class which gives us access to the ands array needs also be allocated. On top of that we need to create a new Func<Predicate<T>, bool> delegate instance as argument to the All LINQ statement.

image

To sum it up we allocate on each iteration

  • 1 c_DisplayClass4   32 bytes
  • 1 Func<Predicate<T,bool>> delegate instance 64 bytes
  • 1 SZArrayHelper+SZGenericArrayEnumerator 32 bytes
    • Where does this come from? It is caused by the All LINQ statements which iterates over an IEnumerable. If you foreach over an array it needs to get an enumerator from the object to enumerate. Since an array has none out of the box the CLR allocates a helper enumerator which is this strange SZArrayHelper class. If you directly foreach over an array this is NOT used.  Instead more efficient code is generated.

 

In total we allocate 128 bytes on each integer we want to check (4 bytes) which is quite some overhead. 128*20.000.000=2.560.000.000 which is exactly what the profiler reports. Since I did use from PerfView the ETW .NET Alloc method

image 

I get a correct total sum of the allocated bytes back. But the types which are responsible for the allocation and especially the counts can be misleading. If you use .NET Alloc you get all allocations but the use case will not run 1s but >2 minutes which is not something I really like to do. If you want to get exact allocation number you need to check the .NET Alloc checkbox which injects a profiling dll into all new started processes so you can get exact allocation reports. Unfortunately this does not seem to work with the old and current PerfView (1.7) version where I see anything but my new allocations.

image

When in doubt I would always use ETW .NET Alloc to get reliable although sampled numbers. This works also for JITed code in x64 on Windows 7 machines because for each allocation stack a managed stackwalk is also performed which gives you the full allocation stack. ETW does not work for JITed x64 code on Win7 except if you precompile it with NGen as workaround. On Win 8 and beyond this is no longer an issue except if you try to use API Hooking. There the x64 stack walker still stops walking the stack when the method interception is done in a form it cannot recognize.

This sounds like some complex stuff. Can we fix it? How about providing an overload to our class?

        public static Predicate<T> And(Predicate<T> p1, Predicate<T> p2)
        {
            return item => p1(item) && p2(item);
        }
 

This little overload spares us pretty much all of the overhead. We do not allocate a single byte while iterating over our array. This time we get

Did take: 0,14s

Which is nearly 9 times faster. This is quite good for such a simple overload which is preferred by the compiler when it finds this one as best match. Now we only pay for 3 delegate invocations which are not avoidable if we want to keep the flexibility here.

Now go and check your LINQ statements for hidden dragons. 2,5GB allocations with one line are quite easy to achieve with delegates and LINQ. This is the main reason why LINQ is banned from the Roslyn project because of these subtle delegate issues. It is kind of ironic that the inventors of LINQ do not want to use their own invention in their C# compiler backend for performance reasons.

posted on Sunday, December 21, 2014 2:44 AM

Feedback

# re: A Single Line Performance Error 1/14/2015 1:23 PM Frank Hileman
I do not like LINQ for the same reason. It seems to be built around the concept of "Pit of Failure." Very easy to make a giant mistake.

# re: A Single Line Performance Error 1/18/2015 4:18 AM Alois Kraus
I like LINQ but you have to be aware of its performance implications in performance sensitive code. As with every abstraction that makes you code nicer you loose also a bit control what is actually executed.

Post A Comment
Title:
Name:
Email:
Comment:
Verification: