More Fun with C# Iterators and Generators

In my last post, I talked quite a bit about iterators and how they can be really powerful tools for filtering a collection of items down to a subset of items. This had both pros and cons over returning a full collection, which, in summary, were:

Pros:

If traversal is only partial, does not have to visit rest of collection.
If evaluation method is costly, only incurs that cost on elements visited.
Adds little to no garbage collection pressure.

Cons:

Very slight performance impact if you know caller will always consume all items in collection.

And as we saw in the last post, that con for the cost was very, very small and only really became evident on very tight loops consuming very large collections completely.

One of the key items to note, though, is the garbage! In the traditional (return a new collection) method, if you have a 1,000,000 element collection, and wish to transform or filter it in some way, you have to allocate space for that copy of the collection. That is, say you have a collection of 1,000,000 items and you want to double every item in the collection. Well, that means you have to allocate a collection to hold those 1,000,000 items to return, which is a lot especially if you are just going to use it once and toss it.

Iterators, though, don’t have this problem. Each time you visit the node, it would return the doubled value of the node (in this example) and not allocate a second collection of 1,000,000 doubled items. Do you see the distinction? In both cases, we’re consuming 1,000,000 items. But in one case we pass back each doubled item which is just an int (for example’s sake) on the stack and in the other case, we allocate a list containing 1,000,000 items which then must be garbage collected.

So iterators in C# are pretty cool, eh? Well, here’s one more thing a C# iterator can do that a traditional “return a new collection” transformation can’t! It can return unbounded enumerations!

I know, I know, that smells a lot like an infinite loop, eh? Yes and no. Basically, you’re relying on the caller to put the bounds on the iteration, and as long as the caller doesn’t you keep going. Consider this example:

public static class Fibonacci {
  public static IEnumerable<int> Sequence() {
    int first = 1;
    int second = 1;

    yield return first;
    yield return second;

    // this enumerable sequence is bounded by the caller.
    while (true) {
      int current = first + second;
      yield return current;

      // wind up for next number if we're requesting one
      first = second;
      second = current;
    }
  }
}

Whoa, you say! Yes, that’s an infinite loop! What the heck is going on there? Yes, that was intentional. Would it be better to have a fibonacci sequence that returns only a specific number of items? Perhaps, it would be safer, but then your Fibonacci sequence must be passed a stop condition, perhaps as a lambda, which could make it feel less natural.

The beauty of this function is it is as infinite as the sequence itself! The fibonacci sequence is unbounded, and so is this method. It will continue to return fibonacci numbers for as long as you ask for them. Now that’s not something you can’t do as naturally with a traditional method that would return a collection of ints representing each number. In that case you would eventually run out of memory as you got to higher and higher numbers. This method, though, never runs out of memory.

Now, that said, you do have to know when you use it that it is an infinite collection and bound it appropriately. Fortunately, Linq provides a lot of these extension methods for you!

Let’s say you only want the first 10 fibonacci numbers:

 foreach(var fib in Fibonacci.Sequence().Take(10))
    {
        Console.WriteLine(fib);
    }

Or let’s say you only want the fibonacci numbers that are less than 100:

foreach(var fib in Fibonacci.Sequence()
        .TakeWhile(f => f < 100))
    {
        Console.WriteLine(fib);
    }

So, you see, one of the nice things about iterators is their power to work with virtually any size (even infinite) collections without adding the garbage collection overhead of making new collections.

You can also do fun things like this to make a more “fluent” interface for for loops:


// A set of integer generator extension methods
public static class IntExtensions {
  // Begins counting to inifity, use To() to range this.
  public static IEnumerable<int> Every(this int start) {
    // deliberately avoiding condition because keeps going
    // to infinity for as long as values are pulled.
    for (var i = start;; ++i) {
      yield return i;
    }
  }

  // Begins counting to infinity by the given step value, use To() to
  public static IEnumerable<int> Every(this int start, int byEvery) {
    // deliberately avoiding condition because keeps going
    // to infinity for as long as values are pulled.
    for (var i = start;; i += byEvery) {
      yield return i;
    }
  }

  // Begins counting to inifity, use To() to range this.
  public static IEnumerable<int> To(this int start, int end) {
    for (var i = start; i <= end; ++i) {
      yield return i;
    }
  }

  // Ranges the count by specifying the upper range of the count.
  public static IEnumerable<int> To(this IEnumerable<int> collection, int end) {
    return collection.TakeWhile(item => item <= end);
  }
}

Note that there are two versions of each method. One that starts with an int and one that starts with an IEnumerable<int>. This is to allow more power in chaining from either an existing collection or from an int. This lets you do things like:

// count from 1 to 30
foreach(var i in 1.To(30))
{
    Console.WriteLine(i);
}
   
// count from 1 to 10 by 2s
foreach(var i in 0.Every(2).To(10))
{
    Console.WriteLine(i);
}
   
// or, if you want an infinite sequence counting by 5s until something inside breaks you out...
foreach(var i in 0.Every(5))
{
    if (someCondition)
    {
        break;
    }
    ...
}

Yes, those are kind of syntactical-candy functions and not particularly useful, but they show some of the power of generators and iterators as extension methods to form a fluid interface.

So what do you think? What are some of your favorite generators and iterators?

posted on Wednesday, April 21, 2010 10:31 PM
Filed Under [ My Blog C# ]

This article is part of the GWB Archives. Original Author: James Michael Hare

Replatforming Guide: Pros, Cons, and Impact

Deciding to replatform is no small feat; it’s like setting sails for new horizons with your digital presence. Weighing the

Cypress vs Selenium: Why Cypress is Better!

Navigating the competitive landscape of web testing tools, Cypress emerges as a noteworthy contender, outshining Selenium with its cutting-edge advantages.

More Fun with C# Iterators and Generators

Related Posts

Replatforming Guide: Pros, Cons, and Impact

Cypress vs Selenium: Why Cypress is Better!