James Michael Hare

Welcome to my blog! I'm a Sr. Software Development Engineer in the Seattle area who has been doing C++/C#/Java development for over 20 years, and I have definitely learned that there is always more to learn!

All thoughts and opinions expressed in my blog and my comments are my own and do not represent the thoughts of my employer.

C#/.NET Little Wonders: ToDictionary() and ToList()

The Little Wonders series received so much positive response that I decided to make it a recurring theme in my blog as new ones pop into my head.

There are two simple, yet great, LINQ extension methods that you may or may not know about, and that can really simplify the task of converting collection queries into collections: ToDictionary() and ToList().

Introduction: LINQ and Deferred Execution

Depending on your knowledge of LINQ, you may not be aware of what many of these query expressions do behind the scenes.  For example, let’s say for the purposes of our examples today that we have some contrived POCO class.  (POCO stands for Plain Old CLR Object and refers to a class that is a collection of properties with gets and sets and generally very little functionality; the concept was derived from POJO in Java.)

    // just a simple product POCO class.
    public class Product
    {
        public string Name { get; set; }
        public int Id { get; set; }
        public string Category { get; set; }
    }

Very simple class, right?  I’m not saying a real application need be so simple; this just lets us focus on LINQ itself and not on what we’re trying to query.  So, in our program we can construct a simple collection of these objects for the examples below:

    var products = new List<Product>
        {
            new Product { Name = "CD Player", Id = 1, Category = "Electronics" },
            new Product { Name = "DVD Player", Id = 2, Category = "Electronics" },
            new Product { Name = "Blu-Ray Player", Id = 3, Category = "Electronics" },
            new Product { Name = "LCD TV", Id = 4, Category = "Electronics" },
            new Product { Name = "Wiper Fluid", Id = 5, Category = "Automotive" },
            new Product { Name = "LED TV", Id = 6, Category = "Electronics" },
            new Product { Name = "VHS Player", Id = 7, Category = "Electronics" },
            new Product { Name = "Mud Flaps", Id = 8, Category = "Automotive" },
            new Product { Name = "Plasma TV", Id = 9, Category = "Electronics" },
            new Product { Name = "Washer", Id = 10, Category = "Appliances" },
            new Product { Name = "Stove", Id = 11, Category = "Appliances" },
            new Product { Name = "Dryer", Id = 12, Category = "Appliances" },
            new Product { Name = "Cup Holder", Id = 13, Category = "Automotive" },
        };

So let’s say you had a collection of these Product objects and you needed to query them.  For example, we could get an enumeration of all Product instances where Category is equal to the string “Electronics”:

    var electronicProducts = products.Where(p => p.Category == "Electronics");

The result of many of the extension methods like Where() is an iterator that executes the query as you move through the results.  Thus, at this point electronicProducts is not a List<Product>, but simply an IEnumerable<Product> that will be evaluated on the fly as you enumerate it.  This is called deferred execution, which is one of the great things about LINQ because the expression is not evaluated until you need the result.  So at this point electronicProducts is waiting for us to do something with it so that it can give us the results of the query!

Let me illustrate.  What if we had this:

   1: // select all electronics, there are 7 of them
   2: IEnumerable<Product> electronicProducts = products.Where(p => p.Category == "Electronics");
   4: // now clear the original list we queried
   5: products.Clear();
   7: // now iterate over those electronics we selected first
   8: Console.WriteLine(electronicProducts.Count());

So, do you think we got 7 or 0?  The answer is zero because even though we set up a query on line 2 for all electronics, we cleared the list at line 5.  Thus when we actually process the query on line 8 (when we execute Count()) the list is now empty and it finds no results. 

If you are confused, think of it this way: creating a query using LINQ extension methods (and LINQ expression syntax too) is a lot like defining a stored procedure that is not “run” until you actually call into it.  I know that’s not a 100% accurate analogy, but hopefully it illustrates that the LINQ expression we built on line 2 is not executed until we process the IEnumerable.
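
One way to see this for yourself is to count how often the predicate actually runs.  The following is only a rough sketch: the class and method names (DeferredExecutionSketch, CountPredicateCalls) are made up for illustration, and it uses a plain list of integers rather than our Product list to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch
{
    // Hypothetical helper: reports how many times the predicate has run
    // right after Where(), after one Count(), and after a second Count().
    public static int[] CountPredicateCalls()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5 };
        int calls = 0;

        // Building the query does not invoke the predicate yet.
        IEnumerable<int> evens = numbers.Where(n => { calls++; return n % 2 == 0; });
        int afterWhere = calls;

        // Enumerating the query runs the predicate once per source element.
        evens.Count();
        int afterFirstCount = calls;

        // Enumerating again re-runs the whole query from scratch.
        evens.Count();
        int afterSecondCount = calls;

        return new[] { afterWhere, afterFirstCount, afterSecondCount };
    }

    public static void Main()
    {
        int[] r = CountPredicateCalls();
        Console.WriteLine("{0}, {1}, {2}", r[0], r[1], r[2]); // 0, 5, 10
    }
}
```

Notice that the second Count() does the filtering work all over again, which is worth keeping in mind when a query is enumerated more than once.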

The ToList() LINQ Extension Method

This is why, if you want to immediately get (and store) the results of your LINQ expression, you should put them into another collection before the original collection can be modified.  You could, of course, construct a list by hand and fill it in one of many ways:

    IEnumerable<Product> electronicProducts = products.Where(p => p.Category == "Electronics");

    // You could create a list and then hand-iterate - BULKY!
    var results = new List<Product>();

    foreach (var product in electronicProducts)
    {
        results.Add(product);
    }

    // OR, you could take advantage of AddRange() - GOOD!
    var results2 = new List<Product>();
    results2.AddRange(electronicProducts);

    // OR, you could take advantage of List's constructor that takes an IEnumerable<T> - BETTER!
    var results3 = new List<Product>(electronicProducts);

Some may actually hand-roll a loop, which is very verbose; others take advantage of AddRange() or the List<T> constructor that takes an IEnumerable<T> and does the work for you.

But there’s another way as well.  LINQ contains a ToList() extension method that takes any IEnumerable<T> and uses it to fill a List<T>.  This comes in handy if you want to execute a query and use it to fill a list all in one step:

    var electronicProducts = products.Where(p => p.Category == "Electronics").ToList();

Now, instead of electronicProducts being an IEnumerable<T> that is executed dynamically over the original collection, it will be a separate collection and modifications to the original collection will not affect it.
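
To make the difference concrete, here is a small sketch that builds the same query twice, once deferred and once with ToList(), and then clears the source.  The names ToListSnapshotSketch and CountsAfterClear are hypothetical, and the product list is trimmed to three items for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public string Name { get; set; }
    public int Id { get; set; }
    public string Category { get; set; }
}

public static class ToListSnapshotSketch
{
    // Hypothetical helper: returns { deferred count, snapshot count }
    // as observed after the source list has been cleared.
    public static int[] CountsAfterClear()
    {
        var products = new List<Product>
        {
            new Product { Name = "CD Player", Id = 1, Category = "Electronics" },
            new Product { Name = "Wiper Fluid", Id = 5, Category = "Automotive" },
            new Product { Name = "LED TV", Id = 6, Category = "Electronics" },
        };

        // Deferred: nothing is evaluated until we enumerate.
        IEnumerable<Product> deferred = products.Where(p => p.Category == "Electronics");

        // Immediate: ToList() runs the query now and copies the results.
        List<Product> snapshot = products.Where(p => p.Category == "Electronics").ToList();

        products.Clear();

        return new[] { deferred.Count(), snapshot.Count };
    }

    public static void Main()
    {
        int[] counts = CountsAfterClear();
        Console.WriteLine("deferred: {0}, snapshot: {1}", counts[0], counts[1]); // deferred: 0, snapshot: 2
    }
}
```

One caveat: the new list is a shallow copy.  It holds references to the same Product objects, so changing a property on one of those objects is visible through both collections; only adding to or removing from the original collection leaves the new list untouched.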

This has pros and cons, of course.  As a general rule, if you’re just going to iterate over the results and process them, you don’t need (nor want) to store it in a separate list as this just wastes memory that will later need to be garbage collected.  However, if you want to save off that subset and assign it to another class, ToList() comes in very handy so that you do not need to worry about changes in the original collection. 

The ToDictionary() LINQ Extension Method

If ToList() takes an IEnumerable<T> and converts it to a List<T>, you can probably guess what ToDictionary() does.  Basically, ToDictionary() is a very handy method for taking the results of a query (or any IEnumerable<T>) and organizing them into a Dictionary<TKey,TValue> instead.  The trick is that you need to define how each T converts to a TKey and a TValue.

Let’s say we have our super-mega-uber big Products list and we want to put it in a Dictionary<int, Product> so that we can get the fastest possible lookup times based on the ID.  Well, you could have done something like this:

    var results = new Dictionary<int, Product>();
    foreach (var product in products)
    {
        results.Add(product.Id, product);
    }

It looks like a pretty benign piece of code all in all, but with LINQ there’s no need to hand-write such logic anymore.  We could just as easily have said:

    var results = products.ToDictionary(product => product.Id);

This constructs a Dictionary<int, Product> where the key is the Id property of the Product and the value is the Product itself.  This is the simplest form of ToDictionary() where you just specify a key selector.  What if you want something different as your value?  For instance what if you don’t care about the whole Product, you just want to be able to convert the ID to the Name?  We could do this:

    var results = products.ToDictionary(product => product.Id, product => product.Name);

This creates a Dictionary<int, string> where the key is the Id property and the value is the Name property of each Product.  So as you see this method has a lot of power for processing IEnumerable<T> from either a collection or a query result into a dictionary.
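
One thing to watch for is that the key selector must produce a unique key for every item, because ToDictionary() throws an ArgumentException when it meets a duplicate.  The sketch below shows both cases on a trimmed-down product list; ToDictionaryKeySketch, SampleProducts, NameForId and CategoryKeyThrows are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public string Name { get; set; }
    public int Id { get; set; }
    public string Category { get; set; }
}

public static class ToDictionaryKeySketch
{
    private static List<Product> SampleProducts()
    {
        return new List<Product>
        {
            new Product { Name = "CD Player", Id = 1, Category = "Electronics" },
            new Product { Name = "DVD Player", Id = 2, Category = "Electronics" },
            new Product { Name = "Wiper Fluid", Id = 5, Category = "Automotive" },
        };
    }

    // Id is unique per product, so it is a safe dictionary key.
    public static string NameForId(int id)
    {
        Dictionary<int, string> namesById =
            SampleProducts().ToDictionary(p => p.Id, p => p.Name);
        return namesById[id];
    }

    // Category repeats across products, so using it as the key fails.
    public static bool CategoryKeyThrows()
    {
        try
        {
            SampleProducts().ToDictionary(p => p.Category);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(NameForId(2));        // DVD Player
        Console.WriteLine(CategoryKeyThrows()); // True
    }
}
```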

So let’s get even crazier!  What if (and I’ve done this plenty of times in different scenarios at work) I need a Dictionary that contains lists of logical groups? 

Update: There is also a Lookup<TKey, TElement> class and a ToLookup() extension method which can accomplish this in a similar fashion.  They're not perfectly identical solutions (Dictionary and Lookup have interface differences, and the item indexer behaves differently when the key is not found).  Thanks for the point, Jon!
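
As a quick sketch of that indexer difference: asking a lookup for a key it does not contain gives back an empty sequence, while a dictionary's indexer throws a KeyNotFoundException.  The class and method names here (LookupVersusDictionarySketch, CountInLookup, DictionaryIndexerThrows) are made up for the example, and the product list is shortened.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public string Name { get; set; }
    public int Id { get; set; }
    public string Category { get; set; }
}

public static class LookupVersusDictionarySketch
{
    private static List<Product> SampleProducts()
    {
        return new List<Product>
        {
            new Product { Name = "CD Player", Id = 1, Category = "Electronics" },
            new Product { Name = "DVD Player", Id = 2, Category = "Electronics" },
            new Product { Name = "Wiper Fluid", Id = 5, Category = "Automotive" },
        };
    }

    // A lookup returns an empty sequence for a missing key, so this yields 0.
    public static int CountInLookup(string category)
    {
        ILookup<string, Product> byCategory = SampleProducts().ToLookup(p => p.Category);
        return byCategory[category].Count();
    }

    // A dictionary indexer throws KeyNotFoundException for a missing key.
    public static bool DictionaryIndexerThrows(int id)
    {
        Dictionary<int, string> namesById =
            SampleProducts().ToDictionary(p => p.Id, p => p.Name);
        try
        {
            return namesById[id] == null;
        }
        catch (KeyNotFoundException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountInLookup("Electronics")); // 2
        Console.WriteLine(CountInLookup("Toys"));        // 0
        Console.WriteLine(DictionaryIndexerThrows(99));  // True
    }
}
```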

So, in our Product example, let’s say we want to create a Dictionary<string, List<Product>> such that the key is a category and the value is the list of all products in that category.  Well, in the olden days you may have rolled your own loop like:

    // create your dictionary to hold results
    var results = new Dictionary<string, List<Product>>();

    // iterate through products
    foreach (var product in products)
    {
        List<Product> subList;

        // if the category is not in there, create new list and add to dictionary
        if (!results.TryGetValue(product.Category, out subList))
        {
            subList = new List<Product>();
            results.Add(product.Category, subList);
        }

        // add the product to the new (or existing) sub-list
        subList.Add(product);
    }

But that’s a lot of code to do something that should be so simple!  It’s difficult to maintain and anyone new looking at this piece of code would probably have to analyze it before fully understanding it.

Fortunately for us, we can take advantage of both ToDictionary() and ToList() to make this a snap with the help of our friend the GroupBy() LINQ extension method:

    // one line of code!
    var results = products.GroupBy(product => product.Category)
        .ToDictionary(group => group.Key, group => group.ToList());

GroupBy() is the LINQ query operator that produces a sequence of IGrouping<TKey, TElement> objects; each one has a Key property and is itself an IEnumerable of the items in that group.  So once we called GroupBy(), all we had to do was convert those groups to a dictionary: our key selector (group => group.Key) takes the grouping field (Category) and makes it the key in the dictionary, and the value selector (group => group.ToList()) takes the items in the group and converts them to a List<Product> as the value of our dictionary!
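
For completeness, here is a minimal sketch of the grouping wrapped in a method and then consumed.  GroupedDictionarySketch and GroupByCategory are hypothetical names, and the data is a shortened version of our product list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public string Name { get; set; }
    public int Id { get; set; }
    public string Category { get; set; }
}

public static class GroupedDictionarySketch
{
    // Groups products by category; each dictionary value is the list of
    // products in that category, in their original relative order.
    public static Dictionary<string, List<Product>> GroupByCategory(IEnumerable<Product> products)
    {
        return products.GroupBy(p => p.Category)
                       .ToDictionary(g => g.Key, g => g.ToList());
    }

    public static void Main()
    {
        var products = new List<Product>
        {
            new Product { Name = "CD Player", Id = 1, Category = "Electronics" },
            new Product { Name = "Wiper Fluid", Id = 5, Category = "Automotive" },
            new Product { Name = "LED TV", Id = 6, Category = "Electronics" },
            new Product { Name = "Mud Flaps", Id = 8, Category = "Automotive" },
        };

        Dictionary<string, List<Product>> byCategory = GroupByCategory(products);

        Console.WriteLine(byCategory.Count);                 // 2
        Console.WriteLine(byCategory["Electronics"].Count);  // 2
        Console.WriteLine(byCategory["Automotive"][0].Name); // Wiper Fluid
    }
}
```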

So much easier to read and write and so much less code to unit test!  I know many out there will argue that lambda expressions are more difficult to read, but they are such an integral part of the C# language now that understanding them should become a must for the serious developer.  And truly, I think you’ll find as you use them more and more that their conciseness actually leads to a better understanding of code and more readability than before.


Hope you enjoyed my latest two little wonders.  They may not look like much, but these two LINQ extension methods taken together can add a lot of punch to collection processing code with very little technical debt!

Posted on Thursday, October 21, 2010 6:42 PM | Filed Under [ My Blog C# Software .NET Fundamentals Little Wonders ]