I had the opportunity today to write some more complex LINQ queries.
First, I started with a simple group by expression allowing me to subtotal some data for a particular key. Certainly I could have done this in the database. Many would argue that that the database is the expert at these sorts of things, so we should let the expert take care of it. I have a couple of reasons for placing the group by in the LINQ query. First, I don't have a lot of control over the data layer. Second, I can unit test the group by code with simple NUnit tests, without ever hitting the database. To me, the second reason is much more compelling. I don't have enough data yet to draw any conclusions, but the concept is promising. Instead of inserting test data into a few data tables, I can simply mock the data that is used as input to my LINQ query.
As, my requirements evolved, I discovered a simple group by would not suffice. I needed to join two lists of data together. I was able to replace the group by with a "group join" using the "into" keyword. The join includes groups from the outer list for all of the entries of the inner list. Sometimes this would result in an "empty" group. I was able to eliminate the empty group by simply testing the count.
Here is what my query basically looks like in the end (some names have been changed to protect the innocent):
1: var clientSummaryQuery = from client in _dataContext.AllClients
2: join clientDiscount in discounts on client equals clientDiscount.Client into d
3: join clientOrder in orders on client equals clientOrder.Client into o
4: where o.Count() > 0
5: orderby client.FullName
6: select new
7: {
8: Client = client,
9: TotalOrder = o.Sum( orderSummary => orderSummary.OrderAmount ),
10: TotalDiscount = o.Sum( orderSummary => orderSummary.OrderAmount ) * d.Sum( discountSummary => discountSummary.DiscountPercentage ),
11: NetOrder = o.Sum( orderSummary => orderSummary.OrderAmount ) * (1 - d.Sum( discountSummary => discountSummary.DiscountPercentage )),
12: };
The query above is only slightly simpler than my production query, but you should be able to get the gist of what I am doing. I tweaked the query above, so it could have a syntax error here or there.
I start with all clients in the system, then I join to two other lists. In both cases the joins are "group joins" (acheived with the "into" keyword). I include a where clause to ensure I only list those clients who have outstanding orders. Finally, I use the Sum aggregate method on my groups and I'm done.
Again, the beauty of this is that I can mock what my data context gives me for AllClients and discounts and orders. I can exercise different scenarios with some simple unit tests. For example, I can include clients that have no discounts, a single discount or multiple discounts. Similarly, I can start with a surplus of clients and verify that those with no orders are excluded. Finally, I can verify that all of the orders for a client are summarized as expected. All of this can be done without ever hitting the database. I use Rhino Mock to mock my data.
By the way, if anyone knows of a better way to achieve what I have done here, please share.