James Michael Hare

...hare-brained ideas from the realm of software development...

Welcome to my blog! I'm a Sr. Software Development Engineer in the Seattle area, who has been performing C++/C#/Java development for over 20 years, but have definitely learned that there is always more to learn!

All thoughts and opinions expressed in my blog and my comments are my own and do not represent the thoughts of my employer.

C#/.NET Little Wonders: The ToLookup() LINQ Extension Method

Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can really help improve your code by making it easier to write and maintain.

LINQ has added so many wonderful extension methods to our .NET toolbox!  In particular, there are a few methods that are useful for creating a collection from an IEnumerable<T> instance. 

The ToDictionary() and ToList() extension methods, for example, are handy for taking an IEnumerable<T> and creating a Dictionary or List respectively (you can learn more about those two methods here).

This week, we'll look at the ToLookup() extension method, which gives us yet another way to convert an IEnumerable<T> enumeration of data into a useful data structure.

The Problem

Let's start with a look at a hypothetical POCO of data we'll want to query. In this case we'll create a simple class to represent a Product which will have an Id, a Category, and a Value.

It’s a very simple class with a few members we can query. Your own classes can be as simple or as complex as you like; this one is kept small for illustrative purposes.

    public sealed class Product
    {
        public int Id { get; set; }
        public string Category { get; set; }
        public double Value { get; set; }

        public override string ToString()
        {
            return string.Format("[{0}: {1} - {2}]", Id, Category, Value);
        }
    }

Now, let's take a look at some sample data to fill a collection:

    var products = new List<Product>
        {
            new Product { Id = 1, Category = "Electronics", Value = 15.0 },
            new Product { Id = 2, Category = "Groceries", Value = 40.0 },
            new Product { Id = 3, Category = "Garden", Value = 210.3 },
            new Product { Id = 4, Category = "Pets", Value = 2.1 },
            new Product { Id = 5, Category = "Electronics", Value = 19.95 },
            new Product { Id = 6, Category = "Pets", Value = 21.25 },
            new Product { Id = 7, Category = "Pets", Value = 5.50 },
            new Product { Id = 8, Category = "Garden", Value = 13.0 },
            new Product { Id = 9, Category = "Automotive", Value = 10.0 },
            new Product { Id = 10, Category = "Electronics", Value = 250.0 },
        };

Once again, nothing too fancy: I'm just loading a list full of sample values.  In a real application these could be sourced from a database, web service, or any other source.  The beauty of LINQ is that it works on any IEnumerable<T>, which means basically any enumeration of data (whether collection or iterator) will do.

So here's the problem: what if we want to get all the products, but grouped together by category?  One of the obvious choices would be to use the GroupBy() extension method:

    foreach (var group in products.GroupBy(p => p.Category))
    {
        Console.WriteLine(group.Key);

        foreach (var item in group)
        {
            Console.WriteLine("\t" + item);
        }
    }

This seems to fit the bill and gives us the desired results:

    Electronics
            [1: Electronics - 15]
            [5: Electronics - 19.95]
            [10: Electronics - 250]
    Groceries
            [2: Groceries - 40]
    Garden
            [3: Garden - 210.3]
            [8: Garden - 13]
    Pets
            [4: Pets - 2.1]
            [6: Pets - 21.25]
            [7: Pets - 5.5]
    Automotive
            [9: Automotive - 10]

Looks good, right?  So what's the problem?  Well, there is no problem, per se, but there is a quirk.  The GroupBy() extension method uses deferred execution.  In a nutshell, this means the results are not computed when the query is created, but only when you iterate over it.  This can be a great performance improvement if you are searching for a particular item, but it can cause interesting side effects if not considered carefully.  In the case of GroupBy(), the groupings are actually created when the first item is called for and not when GroupBy() is first called, which causes an interesting twist.
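To make the deferred execution visible, here is a minimal sketch that counts how often the key selector runs.  The class name, the sample words, and the counter are all made up for illustration; the only thing being demonstrated is that GroupBy() does no grouping work until the result is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredGroupByDemo
{
    // Returns the number of key selector calls seen
    // [0] right after GroupBy() and [1] after enumerating the result.
    public static int[] CountSelectorCalls()
    {
        var words = new List<string> { "apple", "avocado", "banana" };
        int calls = 0;

        // Building the query does not run the key selector yet.
        var groups = words.GroupBy(w => { calls++; return w[0]; });
        int before = calls;

        // Enumerating the query forces the groupings to be built.
        foreach (var g in groups)
        {
        }
        int after = calls;

        return new[] { before, after };
    }

    public static void Main()
    {
        int[] counts = CountSelectorCalls();
        Console.WriteLine("Selector calls before enumeration: " + counts[0]);
        Console.WriteLine("Selector calls after enumeration: " + counts[1]);
    }
}
```

The first count should be zero and the second should match the number of items in the list, since the selector only runs once enumeration starts.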

Consider this: what if you loaded data from the database and then wanted to group together the results and store them for quick lookup?  We'd want to store this in a collection for further use, but what if the underlying data changes before we start to enumerate?

    // group the products -- uses deferred execution!
    var groups = products.GroupBy(p => p.Category);

    // at some time before we iterate, all garden items get removed
    products.RemoveAll(p => p.Category == "Garden");

    // now when we iterate, all garden items are gone even though
    // we created the grouping before they were removed...
    foreach (var group in groups)
    {
        Console.WriteLine(group.Key);

        foreach (var item in group)
        {
            Console.WriteLine("\t" + item);
        }
    }

It turns out that if we run the code sample above, we will see the same result as before except that all products in the Garden category are missing.  This may be fine if the underlying data cannot change before we enumerate (for example, because it is not accessible to other threads, or access is synchronized in some way), but an easier approach would be to immediately put the groupings into a collection.

Now, if the key we want to organize by is unique (such as Id) we could just use ToDictionary(), which will load a Dictionary with the results.  The problem is that in this case Category is not unique, so ToDictionary() will throw an ArgumentException as soon as it hits a second entry with the same key.  So we must do something else.
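As a rough illustration of that difference, the sketch below keys the same data two ways.  The Item class is a hypothetical, trimmed-down stand-in for Product, and the class and method names are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToDictionaryDemo
{
    // Hypothetical stand-in for Product with just the fields we need.
    public sealed class Item
    {
        public int Id { get; set; }
        public string Category { get; set; }
    }

    public static List<Item> SampleItems()
    {
        return new List<Item>
        {
            new Item { Id = 1, Category = "Electronics" },
            new Item { Id = 4, Category = "Pets" },
            new Item { Id = 5, Category = "Electronics" },
        };
    }

    // Returns true if keying by Category throws because of duplicate keys.
    public static bool DuplicateKeyThrows()
    {
        try
        {
            SampleItems().ToDictionary(i => i.Category);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // Id is unique, so ToDictionary() works fine here.
        Dictionary<int, Item> byId = SampleItems().ToDictionary(i => i.Id);
        Console.WriteLine(byId[5].Category);

        // Category repeats, so ToDictionary() throws.
        Console.WriteLine(DuplicateKeyThrows());
    }
}
```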

Of course, we could take the results of our GroupBy() operation and then convert it into a Dictionary<TKey, List<TValue>> so that each key is associated with a list of values:

    // can group, then use ToDictionary() on each
    // group using key selector and converting group
    // values to list
    var groups = products.GroupBy(p => p.Category)
        .ToDictionary(g => g.Key, g => g.ToList());

This is starting to get convoluted, however.  Basically we're creating groups, then using the key of each group as the key of the dictionary, and converting the values in each group to a list that becomes the dictionary's value.

Worse yet, you could attempt to roll your own:

    var groups = new Dictionary<string, List<Product>>();

    foreach (var p in products)
    {
        List<Product> productsByGroup;

        if (!groups.TryGetValue(p.Category, out productsByGroup))
        {
            productsByGroup = new List<Product>();
            groups.Add(p.Category, productsByGroup);
        }

        productsByGroup.Add(p);
    }

But this code is more to write and more to maintain, and its intent is less obvious to the next reader.  So what other tools does LINQ offer us?

The ToLookup() method

So the hand rolled code is obviously less than desirable due to its complexity, and the LINQ query using GroupBy() with ToDictionary() is better, but could still be simpler.  Thank goodness we do have an extension method just for this purpose: ToLookup().

The ToLookup() method creates something similar to a Dictionary of Lists, but it's actually its own .NET collection type called a Lookup (surprise, surprise), which ToLookup() hands back as an ILookup<TKey, TElement>.  The odd thing about the Lookup class is that it has no public constructor, so you cannot create one directly; you only get one as the result of a call such as ToLookup().  (GroupBy() also builds a Lookup internally the first time its result is enumerated, though that is an implementation detail.)  Also, lookups, unlike dictionaries, are immutable.  That means that once you create a lookup you cannot add or remove elements from it.
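One practical consequence is that a lookup behaves like a snapshot: ToLookup() executes immediately, so later changes to the source do not show up in it.  The following is a small sketch of that behavior using a plain list of category names; the class and method names are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupSnapshotDemo
{
    // Builds a lookup, mutates the source list afterwards, and reports
    // how many "Garden" entries the lookup still holds.
    public static int GardenCountAfterRemoval()
    {
        var categories = new List<string> { "Garden", "Pets", "Garden" };

        // ToLookup() runs right away, unlike GroupBy().
        ILookup<string, string> lookup = categories.ToLookup(c => c);

        // Changing the source afterwards does not affect the lookup.
        categories.RemoveAll(c => c == "Garden");

        return lookup["Garden"].Count();
    }

    public static void Main()
    {
        Console.WriteLine(GardenCountAfterRemoval()); // prints 2
    }
}
```

Note that ILookup<TKey, TElement> only offers Count, Contains(), the indexer, and enumeration, so there is simply no Add() or Remove() to call.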

The usage of ToLookup() is very similar to that of ToDictionary(): both allow you to specify key selectors, value selectors, and comparers.  The main difference is that ToLookup() allows (and expects) duplicate keys whereas ToDictionary() does not.

    // create the lookup and presumably store it somewhere
    var productsByCategory = products.ToLookup(p => p.Category);

    // later, if we want to use it
    foreach (var group in productsByCategory)
    {
        // the key of the lookup is in the Key property
        Console.WriteLine(group.Key);

        // the list of values is the group itself.
        foreach (var item in group)
        {
            Console.WriteLine("\t" + item);
        }
    }

Notice that the key used in the grouping comes back as the Key property, and interestingly enough the collection of items is the grouping itself (the grouping implements IEnumerable<T> so iterating over the grouping iterates over all items in it).  You can also get directly to a grouping you want by using the indexer on the lookup, like:

    // note, if not found this does not throw, but returns empty enumerable
    foreach (var item in productsByCategory["Garden"])
    {
        ...
    }
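The missing-key behavior is worth seeing in isolation, since a Dictionary indexer would throw a KeyNotFoundException in the same situation.  This is only a sketch with made-up data (category names paired with ids) and invented class and method names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MissingKeyDemo
{
    // Builds a small category -> ids lookup from hypothetical sample pairs.
    public static ILookup<string, int> BuildLookup()
    {
        var pairs = new[]
        {
            new KeyValuePair<string, int>("Pets", 4),
            new KeyValuePair<string, int>("Pets", 6),
            new KeyValuePair<string, int>("Electronics", 1),
        };

        return pairs.ToLookup(p => p.Key, p => p.Value);
    }

    public static void Main()
    {
        ILookup<string, int> lookup = BuildLookup();

        // Contains() tells us whether a key is present at all.
        Console.WriteLine(lookup.Contains("Garden"));  // False

        // The indexer returns an empty sequence for a missing key.
        Console.WriteLine(lookup["Garden"].Count());   // 0
        Console.WriteLine(lookup["Pets"].Count());     // 2
    }
}
```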

You don’t have to choose the whole item as the values in each grouping, either.  As an example of using a value selector, let's say you want your lookup to be a mapping of categories to product Id (instead of the whole product).  We can specify a value selector:

    var productIdsByCategory = products.ToLookup(p => p.Category, p => p.Id);
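The value selector can also be combined with a comparer.  As a sketch, assuming we want category names matched without regard to case, we can pass StringComparer.OrdinalIgnoreCase; the Item class, the mixed-case sample data, and the class and method names here are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ValueSelectorDemo
{
    // Hypothetical stand-in for Product with just the fields we need.
    public sealed class Item
    {
        public int Id { get; set; }
        public string Category { get; set; }
    }

    // Maps category -> product ids, treating "Pets" and "pets" as the same key.
    public static ILookup<string, int> BuildIdsByCategory()
    {
        var items = new List<Item>
        {
            new Item { Id = 4, Category = "Pets" },
            new Item { Id = 6, Category = "pets" },
            new Item { Id = 3, Category = "Garden" },
        };

        return items.ToLookup(
            i => i.Category,                    // key selector
            i => i.Id,                          // value selector
            StringComparer.OrdinalIgnoreCase);  // key comparer
    }

    public static void Main()
    {
        ILookup<string, int> idsByCategory = BuildIdsByCategory();

        // Both pet items land in one grouping regardless of case.
        foreach (int id in idsByCategory["PETS"])
        {
            Console.WriteLine(id);
        }

        Console.WriteLine(idsByCategory.Count); // number of distinct keys
    }
}
```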

Now, we can easily create lookups and store them for future processing.  The syntax is much simpler than combining GroupBy() with ToDictionary(), and it's much, much easier than rolling your own loop!


Summary

The ToLookup() method is a handy method for creating a collection from an IEnumerable<T> that creates a 1:n mapping of keys to a list of values for each key.  It can be handy for organizing data into groups and is much cleaner than rolling your own or using a dictionary of lists.

Posted on Thursday, March 24, 2011 5:39 PM | Filed Under [ My Blog C# Software .NET Little Wonders ]
