
LINQ: GroupBy method tutorial

INTRODUCTION

It has been almost two years since Microsoft officially released LINQ. It was another brilliant idea from Anders Hejlsberg, the lead architect of the C# programming language at Microsoft. It is an amazingly easy-to-use technology that lets programmers greatly simplify typical queries against many popular data sources, so it quickly became a very popular solution.

Most of the methods offered by LINQ are intuitive and simple. Some of them, however, are fairly complex. In this article I will describe one such method in detail: GroupBy. Soon I will also publish an article on the GroupJoin method, which is even more intricate. All examples are based on LINQ to Objects.

IDEA OF ‘GROUPING BY’

Grouping can basically be described as an operation that takes a collection of items and ‘joins’ the items that share the same key into one ‘piece’. The following example shows the idea:

Before grouping:

A   123
A   123
A   123
B   123
B   123
C   123

After grouping (second column summed within each group):

A   369
B   246
C   123

In this example, grouping was done by the first column (the first column served as the grouping key). The result set contains three items because the first column of the original set held three distinct values.
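
As a minimal sketch of the same idea in C#, the snippet below groups the rows of the example by their first column and sums the second one. The Row class and its Key and Value property names are made up for this illustration, and summing is assumed from the numbers in the result set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type for this illustration: a key column and a value column.
class Row
{
    public string Key { get; set; }
    public int Value { get; set; }
}

static class GroupingIdea
{
    // Groups rows by Key and sums Value within each group.
    public static List<KeyValuePair<string, int>> SumByKey(IEnumerable<Row> rows)
    {
        return rows
            .GroupBy(row => row.Key)
            .Select(group => new KeyValuePair<string, int>(group.Key, group.Sum(row => row.Value)))
            .ToList();
    }

    static void Main()
    {
        var rows = new List<Row>
        {
            new Row { Key = "A", Value = 123 },
            new Row { Key = "A", Value = 123 },
            new Row { Key = "A", Value = 123 },
            new Row { Key = "B", Value = 123 },
            new Row { Key = "B", Value = 123 },
            new Row { Key = "C", Value = 123 }
        };

        foreach (var pair in SumByKey(rows))
        {
            Console.WriteLine("{0} {1}", pair.Key, pair.Value);
        }
        // Prints:
        // A 369
        // B 246
        // C 123
    }
}
```

Groups come out in the order in which their keys first appear in the source, which is why A precedes B and C here.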

‘GROUPING BY’ IN LINQ TO OBJECTS

LINQ's GroupBy is basically the C# equivalent of SQL's ‘group by’ clause. It works in much the same way, although it has been adjusted to C#'s needs. Here is the full signature of the overload discussed in this article:

public static IEnumerable<TResult> GroupBy<TSource, TKey, TElement, TResult>(
    this IEnumerable<TSource> source, Func<TSource, TKey> keySelector,
    Func<TSource, TElement> elementSelector,
    Func<TKey, IEnumerable<TElement>, TResult> resultSelector)

Looks a bit ugly and intimidating, huh? Still, it's much better than the GroupJoin method, which will be fully covered in my next article :). It looks so complex partly because this article considers nearly the most complicated overload of the method. In simpler scenarios the last two parameters may be omitted. There are exactly 8 overloads of the GroupBy method shipped with the .NET Framework. They are all listed at [1].
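
For comparison, here is a sketch of the simplest overload, which takes only a key selector and returns a sequence of IGrouping<TKey, TSource> objects. The word list and the class name are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SimplestOverload
{
    // Groups words by their first letter using the key-selector-only overload.
    public static IEnumerable<IGrouping<char, string>> ByFirstLetter(IEnumerable<string> words)
    {
        return words.GroupBy(word => word[0]);
    }

    static void Main()
    {
        var words = new[] { "apple", "avocado", "banana", "cherry", "blueberry" };

        foreach (IGrouping<char, string> group in ByFirstLetter(words))
        {
            // Each group exposes its Key and can be enumerated for its elements.
            Console.WriteLine("{0}: {1}", group.Key, string.Join(", ", group.ToArray()));
        }
        // Prints:
        // a: apple, avocado
        // b: banana, blueberry
        // c: cherry
    }
}
```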

To understand how the GroupBy method works, let's take a closer look at the following example. First, I defined these two classes:

class Carriage
{
  public int Length { get; set; }
}

class Train
{
  public string Country { get; set; }

  public List<Carriage> Carriages { get; set; }
}

Next, I created a collection of Train objects and filled it with some sample data:

List<Train> trains = new List<Train>();

trains.Add(new Train()
{
  Country = "usa",
  Carriages = new List<Carriage>()
  {
    new Carriage() { Length = 25 },
    new Carriage() { Length = 25 }
  }
});

trains.Add(new Train()
{
  Country = "USA",
  Carriages = new List<Carriage>()
  {
    new Carriage() { Length = 20 },
    new Carriage() { Length = 20 },
    new Carriage() { Length = 18 }
  }
});

trains.Add(new Train()
{
  Country = "Japan",
  Carriages = new List<Carriage>()
  {
    new Carriage() { Length = 22 },
    new Carriage() { Length = 22 },
    new Carriage() { Length = 19 },
    new Carriage() { Length = 19 }
  }
});

trains.Add(new Train()
{
  Country = "japan",
  Carriages = new List<Carriage>()
  {
    new Carriage() { Length = 22 },
    new Carriage() { Length = 22 },
    new Carriage() { Length = 19 },
    new Carriage() { Length = 19 },
    new Carriage() { Length = 15 }
  }
});

trains.Add(new Train()
{
  Country = "India",
  Carriages = new List<Carriage>()
  {
    new Carriage() { Length = 28 },
    new Carriage() { Length = 28 }
  }
});

Our goal is to group this collection by each train's country. To make things more challenging, for each group I also want the average number of carriages per train and the average length of all carriages in that group.

The GroupBy overload under consideration takes 3 arguments when called. (Strictly speaking it declares 4 parameters, but the first one, marked with the this keyword, is the source collection on which the extension method is invoked.) Let's analyze the first of those three:

Func<TSource, TKey> keySelector

It defines the key by which the collection will be grouped. In other words, all collection items that have the same key are put into the same group, so the result contains as many groups as there are distinct keys in the base collection. Since we want to group the trains collection by the country each train belongs to, and "usa" and "USA" should count as the same country, the first argument looks like this:

var result = trains.GroupBy(

                train => train.Country.ToUpper(),
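
Calling ToUpper() is one way to make "usa" and "USA" land in the same group. As an alternative sketch, not part of the original example, GroupBy also has overloads that accept an IEqualityComparer<TKey>, so a case-insensitive comparer can do the same job without building new strings. The country list and the class name below are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CaseInsensitiveKeys
{
    // Counts the groups produced when keys are compared case-insensitively.
    public static int CountGroups(IEnumerable<string> countries)
    {
        return countries
            .GroupBy(country => country, StringComparer.OrdinalIgnoreCase)
            .Count();
    }

    static void Main()
    {
        var countries = new[] { "usa", "USA", "Japan", "japan", "India" };
        Console.WriteLine(CountGroups(countries)); // Prints: 3
    }
}
```

One difference worth keeping in mind: with a comparer, each group's Key keeps the casing of the first element that created the group, whereas ToUpper() normalizes the key itself.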

Easy as pie so far. The second parameter is quite similar:

Func<TSource, TElement> elementSelector

Its purpose is to define how each element of the base collection is projected into something that will later be used to shape the result set. This should become clearer after discussing the last parameter:

Func<TKey, IEnumerable<TElement>, TResult> resultSelector

Although it looks a bit complex, it provides a handy way to define the structure of the result set. In other words, this is where you define what exactly the GroupBy method will produce. You can define pretty much whatever you want here, with one limitation: you can only use data from 2 sources. The first is the group's key (the country name in our example) and the second is the sequence of elements produced by the 2nd GroupBy parameter discussed earlier. Therefore, by writing this:

var result = trains.GroupBy(

                train => train.Country.ToUpper(),

                train => train.Carriages,

our result set may contain anything taken from:

  • each train's Country.ToUpper(), or
  • each train's Carriages collection.

This means that if the Train class had some extra property, say the color of the train, it would not be accessible and could not appear in the result set.
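
To make the limitation concrete, here is a small sketch using a cut-down train type with a hypothetical Color property (it is not the Train class defined earlier). Because the element selector projects each train to its carriage count only, the result selector sees the key and the counts, and nothing else. ToUpperInvariant() is used here so that the keys do not depend on the machine's culture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified train type used only in this illustration.
class ColoredTrain
{
    public string Country { get; set; }
    public string Color { get; set; }
    public int CarriageCount { get; set; }
}

static class ElementSelectorDemo
{
    // Returns one line per country with the total number of carriages.
    public static List<string> TotalsByCountry(IEnumerable<ColoredTrain> trains)
    {
        return trains.GroupBy(
            train => train.Country.ToUpperInvariant(), // key selector
            train => train.CarriageCount,              // element selector: keeps only the count
            (country, counts) =>                       // counts is IEnumerable<int>; Color is out of reach here
                country + ": " + counts.Sum())
            .ToList();
    }

    static void Main()
    {
        var trains = new List<ColoredTrain>
        {
            new ColoredTrain { Country = "usa", Color = "red", CarriageCount = 2 },
            new ColoredTrain { Country = "USA", Color = "blue", CarriageCount = 3 },
            new ColoredTrain { Country = "India", Color = "green", CarriageCount = 2 }
        };

        foreach (string line in TotalsByCountry(trains))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // USA: 5
        // INDIA: 2
    }
}
```

If the color were needed in the result, the element selector would have to carry it along, for example by returning the whole train instead of just the count.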

Now let's move on to the 3rd and last parameter, which defines our result set. As stated before, I want it to contain 3 kinds of information:

  1. Train's country
  2. Average number of carriages in all trains within this group
  3. Average length of all carriages in all trains within this group

First I will give you the code, and then we will dig through the details:

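var result = trains.GroupBy(
    // key selector: the country, normalized to upper case
    train => train.Country.ToUpper(),
    // element selector: each train contributes its list of carriages
    train => train.Carriages,
    // result selector: first argument is the group's key, second is the group's carriage lists
    (country, carriages) =>
        new {
          Country = country,
          AvgCarriagesCount = carriages.Average(list => list.Count),
          AvgCarriagesLength = carriages.SelectMany(list => list)
                                   .Average(carriage => carriage.Length)
        });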

The last argument is a lambda expression that takes 2 arguments (the objects that act as data sources for our result set) and produces one output of an anonymous type with 3 properties, one for each requested kind of information.

The first property should contain the train's country, so the first argument, which is the group's key, works just fine.

To initialize the two remaining properties we have to use data from the second lambda argument, carriages (which is an IEnumerable of lists of carriages, one list per train in the given group). AvgCarriagesCount iterates through this collection of lists, takes each list's Count, and calculates the arithmetic mean of those counts.

The third property is a bit trickier, since it should contain the average length of the carriages in a particular group. Since it should be a plain arithmetic mean, every carriage should carry the same weight. Therefore the collection of lists of carriages has to be flattened first. In other words, I want to turn IEnumerable<List<Carriage>> into IEnumerable<Carriage>, which holds all Carriage objects of the group. The SelectMany method does just that. The rest is fairly simple.
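
A short sketch of why the flattening matters: averaging the per-train averages gives a different number than averaging over all carriages, because carriages in short trains would weigh more. The method names are made up for this illustration, and plain integers stand in for Carriage objects; the numbers are the carriage lengths of the two USA trains from the sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

static class FlattenedAverage
{
    // Average over all carriages, each carriage counted once.
    public static double Flattened(IEnumerable<List<int>> lengthsPerTrain)
    {
        return lengthsPerTrain.SelectMany(lengths => lengths).Average();
    }

    // Average of per-train averages; carriages of short trains weigh more.
    public static double AverageOfAverages(IEnumerable<List<int>> lengthsPerTrain)
    {
        return lengthsPerTrain.Average(lengths => lengths.Average());
    }

    static void Main()
    {
        // Carriage lengths of the two USA trains from the sample data.
        var usa = new List<List<int>>
        {
            new List<int> { 25, 25 },
            new List<int> { 20, 20, 18 }
        };

        // (25 + 25 + 20 + 20 + 18) / 5
        Console.WriteLine(Flattened(usa).ToString("F2", CultureInfo.InvariantCulture));         // Prints: 21.60
        // (25 + 58 / 3.0) / 2
        Console.WriteLine(AverageOfAverages(usa).ToString("F2", CultureInfo.InvariantCulture)); // Prints: 22.17
    }
}
```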

The following picture shows our results:
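
For readers who want to reproduce the results themselves, here is a complete console program assembled from the pieces of this article. It is a sketch rather than the original project: the MakeTrain helper and the output format are additions made for brevity, and ToUpperInvariant() replaces ToUpper() so that the keys do not depend on the machine's culture.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

class Carriage
{
    public int Length { get; set; }
}

class Train
{
    public string Country { get; set; }
    public List<Carriage> Carriages { get; set; }
}

static class GroupByDemo
{
    // Hypothetical helper (not in the article) that shortens the sample-data setup.
    static Train MakeTrain(string country, params int[] lengths)
    {
        return new Train
        {
            Country = country,
            Carriages = lengths.Select(length => new Carriage { Length = length }).ToList()
        };
    }

    // Same sample data as in the article.
    public static List<Train> SampleTrains()
    {
        return new List<Train>
        {
            MakeTrain("usa", 25, 25),
            MakeTrain("USA", 20, 20, 18),
            MakeTrain("Japan", 22, 22, 19, 19),
            MakeTrain("japan", 22, 22, 19, 19, 15),
            MakeTrain("India", 28, 28)
        };
    }

    // One line per group: country, average carriage count, average carriage length.
    public static List<string> Summarize(IEnumerable<Train> trains)
    {
        return trains.GroupBy(
            train => train.Country.ToUpperInvariant(),
            train => train.Carriages,
            (country, carriages) => string.Format(
                CultureInfo.InvariantCulture,
                "{0}: {1:F2} carriages, {2:F2} average length",
                country,
                carriages.Average(list => list.Count),
                carriages.SelectMany(list => list).Average(carriage => carriage.Length)))
            .ToList();
    }

    static void Main()
    {
        foreach (string line in Summarize(SampleTrains()))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // USA: 2.50 carriages, 21.60 average length
        // JAPAN: 4.50 carriages, 19.89 average length
        // INDIA: 2.00 carriages, 28.00 average length
    }
}
```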

CONCLUSION

I hope I have encouraged you to use the GroupBy method wherever it fits. After the initial shock, it turns out to be straightforward to use and very flexible as well, thanks to the many overloads available. Happy coding!

REFERENCES

[1] http://msdn.microsoft.com/en-us/library/system.linq.enumerable.groupby.aspx

[2] http://www.hookedonlinq.com/GroupByOperator.ashx

[3] http://msdn.microsoft.com/en-us/vcsharp/aa336746.aspx

Posted on Thursday, October 29, 2009 7:52 AM

This article is part of the GWB Archives. Original Author: MARCIN KASPRZYKOWSKI
