Search
Close this search box.

Of C# Iterators and Performanc

Some of you reading this will be wondering, “what is an iterator” and think I’m locked in the world of C++.  Nope, I’m talking C# iterators.  No, not enumerators, iterators.

So, for those of you who do not know what iterators are in C#, I will explain it in summary, and for those of you who know what iterators are but are curious of the performance impacts, I will explore that as well.

Iterators have been around for a bit now, and there are still a bunch of people who don’t know what they are or what they do.  I don’t know how many times at work I’ve had a code review on my code and have someone ask me, “what’s that yield word do?”

Basically, this post came to me as I was writing some extension methods to extend IEnumerable<T> — I’ll post some of the fun ones in a later post.  Since I was filtering the resulting list down, I was using the standard C# iterator concept; but that got me wondering: what are the performance implications of using an iterator versus returning a new enumeration?

So, to begin, let’s look at a couple of methods.  This is a new (albeit contrived) method called Every(…).  The goal of this method is to access and enumeration and return every nth item in the enumeration (including the first).  So Every(2) would return items 0, 2, 4, 6, etc.

Now, if you wanted to write this in the traditional way, you may come up with something like this:

public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval)
    {
        List<T> newList = new List<T>();
        int count = 0;
 
        foreach (var i in list)
        {
            if ((count++ % interval) == 0)
            {
                newList.Add(i);
            }
        }
 
        return newList;
    }
   

So basically this method takes any IEnumerable<T> and returns a new IEnumerable<T> that contains every nth item.  Pretty straight forward.

The problem?  Well, Every<T>(…) will construct a list containing every nth item whether or not you care.  What happens if you were searching this result for a certain item and find that item after five tries?  You would have generated the rest of the list for nothing.

Enter iterators.  This C# construct uses the yield keyword to effectively defer evaluation of the next item until it is asked for.  This can be very handy if the evaluation itself is expensive or if there’s a fair chance you’ll never want to fully evaluate a list.

We see this all the time in Linq, where many expressions are chained together to do complex processing on a list.  This would be very expensive if each of these expressions evaluated their entire possible result set on call. 

Let’s look at the same example function, this time using an iterator:

  public static IEnumerable<T> Every<T>(this IEnumerable<T> list, int interval)
    {
        int count = 0;
        foreach (var i in list)
        {
            if ((count++ % interval) == 0)
            {
                yield return i;
            }
        }
    }

Notice it does not create a new return value explicitly, the only evidence of a return is the “yield return” statement.  What this means is that when an item is requested from the enumeration, it will enter this method and evaluate until it either hits a yield return (in which case that item is returned) or until it exits the method or hits a yield break (in which case the iteration ends.

Behind the scenes, this is all done with a class that the CLR creates behind the scenes that keeps track of the state of the iteration, so that every time the next item is asked for, it finds that item and then updates the current position so it knows where to start at next time.

It doesn’t seem like a big deal, does it?  But keep in mind the key point here: it only returns items as they are requested. Thus if there’s a good chance you will only process a portion of the return list and/or if the evaluation of each item is expensive, an iterator may be of benefit.

This is especially true if you intend your methods to be chainable similar to the way Linq methods can be chained. 

For example, perhaps you have a List<int> and you want to take every tenth one until you find one greater than 10.  We could write that as:

 List<int> someList = new List<int>();
   
    // fill list here
   
    someList.Every(10).TakeWhile(i => i <= 10);

Now is the difference more apparent?  If we use the first form of Every that makes a copy of the list.  It’s going to copy the entire list whether we will need those items or not, that can be costly! 

With the iterator version, however, it will only take items from the list until it finds one that is > 10, at which point no further items in the list are evaluated.

So, sounds neat eh?  But what’s the cost is what you’re probably wondering.  So I ran some tests using the two forms of Every above on lists varying from 5 to 500,000 integers and tried various things. 

Now, iteration isn’t free.  If you are more likely than not to iterate the entire collection every time, iterator has some very slight overhead:

Copy vs Iterator on 100% of Collection (10,000 iterations)

Collection SizeNum IteratedTypeTotal ms
55Copy5
55Iterator5
5050Copy28
5050Iterator27
500500Copy227
500500Iterator247
50005000Copy2266
50005000Iterator2444
50,00050,000Copy24,443
50,00050,000Iterator24,719
500,000500,000Copy250,024
500,000500,000Iterator251,521

Notice that when iterating over the entire produced list, the times for the iterator are a little better for smaller lists, then getting just a slight bit worse for larger lists.  In reality, given the number of items and iterations, the result is near negligible, but just to show that iterators come at a price.  However, it should also be noted that the form of Every that returns a copy will have a left-over collection to garbage collect.

However, if we only partially evaluate less and less through the list, the savings start to show and make it well worth the overhead.  Let’s look at what happens if you stop looking after 80% of the list:

Copy vs Iterator on 80% of Collection (10,000 iterations)

Collection SizeNum IteratedTypeTotal ms
54Copy5
54Iterator5
5040Copy27
5040Iterator23
500400Copy215
500400Iterator200
50004000Copy2099
50004000Iterator1962
50,00040,000Copy22,385
50,00040,000Iterator19,599
500,000400,000Copy236,427
500,000400,000Iterator196,010

Notice that the iterator form is now operating quite a bit faster.  But the savings really add up if you stop on average at 50% (which most searches would typically do):

Collection SizeNum IteratedTypeTotal ms
52Copy5
52Iterator4
5025Copy25
5025Iterator16
500250Copy188
500250Iterator126
50002500Copy1854
50002500Iterator1226
50,00025,000Copy19,839
50,00025,000Iterator12,233
500,000250,000Copy208,667
500,000250,000Iterator122,336

Now we see that if we only expect to go on average 50% into the results, we tend to shave off around 40% of the time.  And this is only for one level deep.  If we are using this in a chain of query expressions it only adds to the savings.

So my recommendation?  If you have a resonable expectation that someone may only want to partially consume your enumerable result, I would always tend to favor an iterator.  The cost if they iterate the whole thing does not add much at all — and if they consume only partially, you reap some really good performance gains.

Next time I’ll discuss some of my favorite extensions I’ve created to make development life a little easier and maintainability a little better.

This article is part of the GWB Archives. Original Author: James Michael Hare

Related Posts