LINQ: GroupBy method tutorial

INTRODUCTION

It has been almost two years since Microsoft officially released LINQ. It was another brilliant idea of Anders Hejlsberg, the lead architect of the C# programming language at Microsoft. It is an amazingly easy-to-use technology that lets programmers greatly simplify typical queries against many popular data sources, so it quickly became very popular.

Most of the methods offered by LINQ are very intuitive and simple. Some of them, however, are fairly complex. In this article I will describe one such method in detail: GroupBy. Soon I will also publish an article about the GroupJoin method, which is even more intricate. All examples are based on LINQ to Objects.

 

IDEA OF ‘GROUPING BY’

Grouping can basically be described as an operation that takes a collection of items and ‘joins’ all items that have the same key into one ‘piece’. The following example shows the idea:

    Original set        Grouped by first column
    A  1  2  3          A  3  6  9
    A  1  2  3          B  2  4  6
    A  1  2  3          C  1  2  3
    B  1  2  3
    B  1  2  3
    C  1  2  3

In this example, grouping was done by the first column (the first column served as the grouping key), and the values in each of the remaining columns were summed within each group. The result set contains three items because the first column of the original set holds three distinct values.
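The same idea can be expressed directly in LINQ. The sketch below is a minimal illustration that assumes the rows are modelled as C# value tuples (the names Key, V1, V2 and V3 are made up for this example) and that the grouped values are column sums, as the numbers in the table suggest. It uses the simplest GroupBy overload followed by Select; the overload discussed in the rest of this article can do both steps in a single call.

```csharp
using System;
using System.Linq;

// Each row: a key in the first column followed by three numeric values,
// mirroring the original set in the table above.
var rows = new[]
{
    (Key: "A", V1: 1, V2: 2, V3: 3),
    (Key: "A", V1: 1, V2: 2, V3: 3),
    (Key: "A", V1: 1, V2: 2, V3: 3),
    (Key: "B", V1: 1, V2: 2, V3: 3),
    (Key: "B", V1: 1, V2: 2, V3: 3),
    (Key: "C", V1: 1, V2: 2, V3: 3),
};

// Group by the first column, then collapse each group into a single row
// by summing every remaining column.
var grouped = rows
    .GroupBy(r => r.Key)
    .Select(g => (Key: g.Key, V1: g.Sum(r => r.V1), V2: g.Sum(r => r.V2), V3: g.Sum(r => r.V3)))
    .ToList();

// Prints "A 3 6 9", "B 2 4 6" and "C 1 2 3" on separate lines.
foreach (var row in grouped)
    Console.WriteLine($"{row.Key} {row.V1} {row.V2} {row.V3}");
```

GroupBy yields the groups in the order in which their keys first appear in the source, which is why A, B and C come out in that order.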

 

‘GROUPING BY’ IN LINQ TO OBJECTS

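LINQ's GroupBy is basically a pure C# counterpart of SQL's ‘group by’ clause. The idea is the same, although it has been adapted to C#'s needs: instead of requiring every selected value to be either a grouping key or an aggregate, it hands you each group as a key plus a sequence of elements that you can process however you like. Here is the signature of the overload discussed in this article: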

public static IEnumerable<TResult> GroupBy<TSource, TKey, TElement, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    Func<TSource, TElement> elementSelector,
    Func<TKey, IEnumerable<TElement>, TResult> resultSelector
)

 

Looks a bit odious and disgusting, huh? Still, it's much better than the GroupJoin method, which I will cover fully in my next article. It looks so complex partly because this article considers almost the most complicated overload of the method: only its counterpart that additionally takes an IEqualityComparer<TKey> is longer. In simpler scenarios, one or both of the last two parameters can be omitted. There are exactly 8 overloads of the GroupBy method shipped with the .NET Framework; they are all listed at [1].
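For comparison, here is a minimal sketch of the simplest overload, which takes only a key selector. The word list is hypothetical sample data chosen for illustration. Each resulting group is an IGrouping<TKey, TSource>: it carries the shared Key and can be enumerated to get the original elements that have that key.

```csharp
using System;
using System.Linq;

// Hypothetical sample data, used only to illustrate the simplest overload.
var words = new[] { "apple", "avocado", "banana", "blueberry", "cherry" };

// Only a key selector: the result is IEnumerable<IGrouping<char, string>>.
// Each IGrouping exposes the shared Key and enumerates the original elements.
var byFirstLetter = words.GroupBy(w => w[0]).ToList();

foreach (var group in byFirstLetter)
    Console.WriteLine($"{group.Key}: {string.Join(", ", group)}");
```

This prints "a: apple, avocado", "b: banana, blueberry" and "c: cherry". No projection happens here; the elements inside each group are the untouched source items.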

To understand how the GroupBy method works, let's take a closer look at the following example. First, I defined these two classes:

    using System.Collections.Generic;

    class Carriage
    {
        public int Length { get; set; }
    }

    class Train
    {
        public string Country { get; set; }
        public List<Carriage> Carriages { get; set; }
    }

Then I created a collection of Train objects and filled it with some sample data:

    List<Train> trains = new List<Train>();

    // Some country names differ only in letter case ("usa"/"USA", "Japan"/"japan");
    // the key selector used later normalizes them.
    trains.Add(new Train()
    {
        Country = "usa",
        Carriages = new List<Carriage>() {
            new Carriage() { Length = 25 },
            new Carriage() { Length = 25 } }
    });
    trains.Add(new Train()
    {
        Country = "USA",
        Carriages = new List<Carriage>() {
            new Carriage() { Length = 20 },
            new Carriage() { Length = 20 },
            new Carriage() { Length = 18 } }
    });
    trains.Add(new Train()
    {
        Country = "Japan",
        Carriages = new List<Carriage>() {
            new Carriage() { Length = 22 },
            new Carriage() { Length = 22 },
            new Carriage() { Length = 19 },
            new Carriage() { Length = 19 } }
    });
    trains.Add(new Train()
    {
        Country = "japan",
        Carriages = new List<Carriage>() {
            new Carriage() { Length = 22 },
            new Carriage() { Length = 22 },
            new Carriage() { Length = 19 },
            new Carriage() { Length = 19 },
            new Carriage() { Length = 15 } }
    });
    trains.Add(new Train()
    {
        Country = "India",
        Carriages = new List<Carriage>() {
            new Carriage() { Length = 28 },
            new Carriage() { Length = 28 } }
    });

 

 

Our goal is to group this collection by each train's country. To make things more challenging, for each group I also want the average number of carriages per train and the average length of all carriages in that group.

The GroupBy method (or, more precisely, the overload considered in this article) has three parameters. Strictly speaking it has four, but the first one is omitted: it is the source collection, marked with this, which makes GroupBy an ‘Extension Method’, so it is passed implicitly as the object on which the method is called. Let's analyze the first of the remaining parameters:

Func<TSource, TKey> keySelector

It defines the key by which the collection will be grouped. In other words, all collection items that have the same key are put in the same group, so the result contains as many groups as there are distinct keys in the source collection. Since we want to group the trains by the country they belong to, and the country names in our data differ in letter case, the first argument looks like this:

var result = trains.GroupBy(
    train => train.Country.ToUpper(),
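The call to ToUpper() matters because the sample data spells the same country with different letter case. The sketch below uses just the five country names from the sample collection to compare grouping by the raw string, by the upper-cased string, and by the raw string with StringComparer.OrdinalIgnoreCase passed to the GroupBy overload that accepts an IEqualityComparer<TKey>. It also shows that the extension-method call is shorthand for calling Enumerable.GroupBy as a static method with the source as its first argument.

```csharp
using System;
using System.Linq;

// The five country names from the sample trains collection, spelled as there.
var countries = new[] { "usa", "USA", "Japan", "japan", "India" };

// Raw key: "usa"/"USA" and "Japan"/"japan" end up in different groups.
var raw = countries.GroupBy(c => c).ToList();

// Upper-cased key, as in the article's key selector: spellings are merged.
var upper = countries.GroupBy(c => c.ToUpper()).ToList();

// Alternative: keep the original strings as keys but compare them case-insensitively.
var ignoreCase = countries.GroupBy(c => c, StringComparer.OrdinalIgnoreCase).ToList();

// Extension-method syntax is shorthand for this static call.
var upperStatic = Enumerable.GroupBy(countries, c => c.ToUpper()).ToList();

// Prints "5 3 3 3".
Console.WriteLine($"{raw.Count} {upper.Count} {ignoreCase.Count} {upperStatic.Count}");
```

With the comparer, each group keeps the spelling of the first key it encountered (for example "usa"), whereas ToUpper() gives every group an upper-case key such as "USA".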

Easy as pie so far. The second parameter is quite similar:

Func<TSource, TElement> elementSelector

Its purpose is to define how each element of the source collection is projected into the value that will later be used to build the result set. This should become obvious once we discuss the last parameter:

Func<TKey, IEnumerable<TElement>, TResult> resultSelector

Although it looks a bit complex, it provides a handy way to define the structure of the result set. In other words, this is where you define exactly what the GroupBy method will produce. You can produce pretty much anything you want here, with one limitation: you can only use data from two sources. The first is the group's key (the country name in our example), and the second is the sequence of values produced by the element selector, the second GroupBy parameter discussed earlier, for the items in that group. Therefore, by writing this:

var result = trains.GroupBy(
    train => train.Country.ToUpper(),
    train => train.Carriages,

our result set may contain anything taken from:

  • each train's Country.ToUpper() (the group key), or
  • each train's Carriages collection.

This means that if the Train class contained some extra property, say the color of the train, it would not be accessible in the result selector with this element selector and therefore could not appear in the result set.
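To see what the element selector alone produces, the following sketch calls the overload that takes only a key selector and an element selector. To keep it self-contained, it uses anonymous types in place of the Train class and plain int lengths in place of Carriage objects; that simplification is an assumption of this sketch, not part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the sample data: each train keeps only its country
// and a list of carriage lengths (ints instead of Carriage objects).
var trains = new[]
{
    new { Country = "usa",   Carriages = new List<int> { 25, 25 } },
    new { Country = "USA",   Carriages = new List<int> { 20, 20, 18 } },
    new { Country = "Japan", Carriages = new List<int> { 22, 22, 19, 19 } },
    new { Country = "japan", Carriages = new List<int> { 22, 22, 19, 19, 15 } },
    new { Country = "India", Carriages = new List<int> { 28, 28 } },
};

// Key selector plus element selector, no result selector: each group is an
// IGrouping<string, List<int>>. Its elements are the projected Carriages lists,
// so the original Country spelling and any other train property are gone.
var groups = trains.GroupBy(t => t.Country.ToUpper(), t => t.Carriages).ToList();

foreach (var g in groups)
    Console.WriteLine($"{g.Key}: carriage counts {string.Join(", ", g.Select(list => list.Count))}");
```

This prints "USA: carriage counts 2, 3", "JAPAN: carriage counts 4, 5" and "INDIA: carriage counts 2": exactly the two sources of data that the result selector receives.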

Now let's move on to the third and last parameter, which defines our result set. As stated before, I want it to contain three kinds of information:

  1. The train's country
  2. The average number of carriages per train within this group
  3. The average length of all carriages in all trains within this group

First I will give you the code, and then we will dig through the details:

var result = trains.GroupBy(
    train => train.Country.ToUpper(),    // key selector
    train => train.Carriages,            // element selector
    (country, carriages) =>              // result selector: group key + carriage lists
    new {
        Country = country,
        AvgCarriagesCount = carriages.Average(list => list.Count),
        AvgCarriagesLength = carriages.SelectMany(list => list).Average(carr => carr.Length) });

The result selector is a lambda expression that takes two arguments (the objects that act as the data source for our result set) and produces one output: an object of an anonymous type with three properties, one for each requested kind of information.

The first property should contain the country, which is exactly what the first argument (the group key) holds.

To initialize the two remaining properties we use the second lambda argument, carriages, which is an IEnumerable<List<Carriage>> holding one list of carriages for every train in the group. AvgCarriagesCount iterates through this collection, takes each list's Count, and then calculates the arithmetic mean of those counts.

The third property is a bit trickier, since it should contain the average length of all carriages belonging to a particular group. Because it should be a plain arithmetic average, every carriage must carry the same weight, regardless of which train it belongs to. Therefore the collection of lists of carriages has to be flattened: I want to turn IEnumerable<List<Carriage>> into an IEnumerable<Carriage> that holds all Carriage objects of the group. The SelectMany method does exactly that. The rest is fairly simple.
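Putting the pieces together, here is a self-contained sketch that runs the whole query on the same five trains. Anonymous types stand in for the Train and Carriage classes so that no type declarations are needed (an assumption of this sketch); the query mirrors the one above. The last statement, added only for contrast, averages each Japanese train's own average length instead of flattening the carriages first.

```csharp
using System;
using System.Linq;

// Anonymous types stand in for the Train and Carriage classes so the sketch
// needs no type declarations; the data matches the five sample trains.
var trains = new[]
{
    new { Country = "usa",   Carriages = new[] { new { Length = 25 }, new { Length = 25 } }.ToList() },
    new { Country = "USA",   Carriages = new[] { new { Length = 20 }, new { Length = 20 }, new { Length = 18 } }.ToList() },
    new { Country = "Japan", Carriages = new[] { new { Length = 22 }, new { Length = 22 }, new { Length = 19 }, new { Length = 19 } }.ToList() },
    new { Country = "japan", Carriages = new[] { new { Length = 22 }, new { Length = 22 }, new { Length = 19 }, new { Length = 19 }, new { Length = 15 } }.ToList() },
    new { Country = "India", Carriages = new[] { new { Length = 28 }, new { Length = 28 } }.ToList() },
};

var result = trains.GroupBy(
    train => train.Country.ToUpper(),          // key: country, case-normalized
    train => train.Carriages,                  // element: the train's carriage list
    (country, carriages) => new                // result: one summary object per group
    {
        Country = country,
        AvgCarriagesCount = carriages.Average(list => list.Count),
        AvgCarriagesLength = carriages.SelectMany(list => list).Average(carr => carr.Length)
    }).ToList();

foreach (var r in result)
    Console.WriteLine($"{r.Country}: {r.AvgCarriagesCount} carriages, average length {r.AvgCarriagesLength:0.##}");

// For contrast: averaging each train's own average gives every train, rather
// than every carriage, the same weight, and therefore a different number.
double japanMeanOfMeans = trains
    .Where(t => t.Country.ToUpper() == "JAPAN")
    .Average(t => t.Carriages.Average(c => c.Length));
Console.WriteLine($"JAPAN, mean of per-train averages: {japanMeanOfMeans:0.##}");
```

Worked out by hand from the sample data, USA gives 2.5 carriages per train and an average length of 21.6 (108 / 5), JAPAN gives 4.5 and about 19.89 (179 / 9), and INDIA gives 2 and 28. The mean of the two Japanese per-train averages is (20.5 + 19.4) / 2 = 19.95; it differs from 19.89 because the carriages of the shorter train would count more, which is exactly what flattening avoids.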

The following picture shows our results:



CONCLUSION

I hope I have encouraged you to use the GroupBy method whenever it fits your needs. After the initial shock, it turns out to be straightforward to use and very flexible, thanks to its many overloads. Happy coding!

 

REFERENCES

[1] http://msdn.microsoft.com/en-us/library/system.linq.enumerable.groupby.aspx

[2] http://www.hookedonlinq.com/GroupByOperator.ashx

[3] http://msdn.microsoft.com/en-us/vcsharp/aa336746.aspx

 

posted @ Thursday, October 29, 2009 7:52 AM

Comments on this entry:

# re: LINQ: GroupBy method tutorial

Left by Alex at 10/29/2009 12:25 PM
This is very helpful. I like how you explain each parameter, what it means and then provide the usage example. Looking forward to the next article. Thanks for posting this.

# re: LINQ: GroupBy method tutorial

Left by Johny at 11/6/2009 8:12 AM
Very nicely explained.

# re: LINQ: GroupBy method tutorial

Left by Sabatar Uuli at 1/27/2010 9:20 AM
Smart article, thanks!

# re: LINQ: GroupBy method tutorial

Left by Chuck at 7/9/2011 3:48 PM
Thank you - It was quite a shock and I had to re-read your post several times to wrap my mind around it and run (and debug) to get a grip. Nice post - very helpful!

# re: LINQ: GroupBy method tutorial

Left by Jet at 8/9/2011 7:38 AM
thank you for your very nice explanation

# re: LINQ: GroupBy method tutorial

Left by Deepak at 7/15/2012 2:41 AM
very nice

# re: LINQ: GroupBy method tutorial

Left by ripe at 2/15/2013 1:28 AM
One of the best explanations I've ever read. Thank you!

# re: LINQ: GroupBy method tutorial

Left by The Muppets Bear at 12/18/2013 12:21 AM
Nicely done, sir
