James Michael Hare

...hare-brained ideas from the realm of software development...
posts - 136 , comments - 1091 , trackbacks - 0

My Links

News

Welcome to my blog! I'm a Sr. Software Development Engineer in Seattle, WA. I've been doing C++/C#/Java development for over 18 years, but have definitely learned that there is always more to learn!

All thoughts and opinions expressed in my blog and my comments are my own and do not represent the thoughts of my employer.

Blogs I Read

MCC Logo MVP Logo

Follow BlkRabbitCoder on Twitter

Tag Cloud

Archives

Post Categories

C#: LINQ vs foreach - Round 1.

So I was reading Peter Kellner's blog entry on Resharper 5.0 and its LINQ refactoring and thought that was very cool.  But that raised a point I had always been curious about in my head -- which is a better choice: manual foreach loops or LINQ? 
 
The answer is not really clear-cut.  There are two sides to any code cost arguments: performance and maintainability.  The first of these is obvious and quantifiable.  Given any two pieces of code that perform the same function, you can run them side-by-side and see which piece of code performs better.
 
Unfortunately, this is not always a good measure.  Well written assembly language outperforms well written C++ code, but you lose a lot in maintainability which creates a big techncial debt load that is hard to offset as the application ages.  In contrast, higher level constructs make the code more brief and easier to understand, hence reducing technical cost.
 
Now, obviously in this case we're not talking two separate languages, we're comparing doing something manually in the language versus using a higher-order set of IEnumerable extensions that are in the System.Linq library.
 
Well, before we discuss any further, let's look at some sample code and the numbers.  First, let's take a look at the for loop and the LINQ expression.  This is just a simple find comparison:

   1: // find implemented via LINQ
   2: public static bool FindViaLinq(IEnumerable<int> list, int target)
   3: {
   4:     return list.Any(item => item == target);
   5: }
   6:  
   7:  
   8: // find implemented via standard iteration
   9: public static bool FindViaIteration(IEnumerable<int> list, int target)
  10: {
  11:     foreach (var i in list)
  12:     {
  13:         if (i == target)
  14:         {
  15:             return true;
  16:         }
  17:     }
  18:  
  19:     return false;
  20: }

Okay, looking at this from a maintainability point of view, the Linq expression is definitely more concise (8 lines down to 1) and is very readable in intention.  You don't have to actually analyze the behavior of the loop to determine what it's doing.
 
So let's take a look at performance metrics from 100,000 iterations of these methods on a List<int> of varying sizes filled with random data.  For this test, we fill a target array with 100,000 random integers and then run the exact same pseudo-random targets through both searches.

   1: List<T> On 100,000 Iterations
   2:  
   3: Method      Size     Total (ms)  Per Iteration (ms)  % Slower
   4:  
   5: Any         10       26          0.00046             30.00%
   6: Iteration   10       20          0.00023             -
   7: Any         100      116         0.00201             18.37%
   8: Iteration   100      98          0.00118             -
   9: Any         1000     1058        0.01853             16.78%
  10: Iteration   1000     906         0.01155             -
  11: Any         10,000   10,383      0.18189             17.41%
  12: Iteration   10,000   8843        0.11362             -
  13: Any         100,000  104,004     1.8297              18.27%
  14: Iteration   100,000  87,941      1.13163             -

The LINQ expression is running about 17% slower for average size collections and worse for smaller collections.  Presumably, this is due to the overhead of the state machine used to track the iterators for the yield returns in the LINQ expressions, which seems about right in a tight loop such as this.
 
So what about other LINQ expressions?  After all, Any() is one of the more trivial ones.  I decided to try the TakeWhile() algorithm using a Count() to get the position stopped like the sample Pete was using in his blog that Resharper refactored for him into LINQ:

   1: // Linq form
   2: public static int GetTargetPosition1(IEnumerable<int> list, int target)
   3: {
   4:     return list.TakeWhile(item => item != target).Count();
   5: }
   6:  
   7: // traditionally iterative form
   8: public static int GetTargetPosition2(IEnumerable<int> list, int target)
   9: {
  10:     int count = 0;
  11:  
  12:     foreach (var i in list)
  13:     {
  14:         if(i == target)
  15:         {
  16:             break;
  17:         }
  18:  
  19:         ++count;
  20:     }
  21:  
  22:     return count;
  23: }
  24:  

Once again, the LINQ expression is much shorter, easier to read, and should be easier to maintain over time, reducing the cost of technical debt.  So I ran these through the same test data:

   1: List<T> On 100,000 Iterations
   2:  
   3: Method      Size     Total (ms)  Per Iteration (ms)  % Slower
   4:     
   5: TakeWhile   10       41          0.00041             128%
   6: Iteration   10       18          0.00018             -
   7: TakeWhile   100      171         0.00171             88%
   8: Iteration   100      91          0.00091             -
   9: TakeWhile   1000     1604        0.01604             94%
  10: Iteration   1000     825         0.00825             -
  11: TakeWhile   10,000   15765       0.15765             92%
  12: Iteration   10,000   8204        0.08204             -
  13: TakeWhile   100,000  156950      1.5695              92%
  14: Iteration   100,000  81635       0.81635             -
Wow!  I expected some overhead due to the state machines iterators produce, but 90% slower?  That seems a little heavy to me.  So then I thought, well, what if TakeWhile() is not the right tool for the job?  The problem is TakeWhile returns each item for processing using yield return, whereas our for-loop really doesn't care about the item beyond using it as a stop condition to evaluate.

So what if that back and forth with the iterator state machine is the problem?  Well, we can quickly create an (albeit ugly) lambda that uses the Any() along with a count in a closure (if a LINQ guru knows a better way PLEASE let me know!), after all , this is more consistent with what we're trying to do, we're trying to find the first occurence of an item and halt once we find it, we just happen to be counting on the way.  This mostly matches Any().

   1: // a new method that uses linq but evaluates the count in a closure.
   2: public static int TakeWhileViaLinq2(IEnumerable<int> list, int target)
   3: {
   4:     int count = 0;
   5:     list.Any(item =>
   6:         {
   7:             if(item == target)
   8:             {
   9:                 return true;
  10:             }
  11:  
  12:             ++count;
  13:             return false;
  14:         });
  15:     return count;
  16: }

 
Now how does this one compare? 

   1: List<T> On 100,000 Iterations
   2:  
   3: Method         Size     Total (ms)  Per Iteration (ms)  % Slower
   4:  
   5: TakeWhile      10       41          0.00041             128%
   6: Any w/Closure  10       23          0.00023             28%
   7: Iteration      10       18          0.00018             -
   8: TakeWhile      100      171         0.00171             88%
   9: Any w/Closure  100      116         0.00116             27%
  10: Iteration      100      91          0.00091             -
  11: TakeWhile      1000     1604        0.01604             94%
  12: Any w/Closure  1000     1101        0.01101             33%
  13: Iteration      1000     825         0.00825             -
  14: TakeWhile      10,000   15765       0.15765             92%
  15: Any w/Closure  10,000   10802       0.10802             32%
  16: Iteration      10,000   8204        0.08204             -
  17: TakeWhile      100,000  156950      1.5695              92%
  18: Any w/Closure  100,000  108378      1.08378             33%
  19: Iteration      100,000  81635       0.81635             -

Much better!  It seems that the overhead of TakeAny() returning each item and updating the state in the state machine is drastically reduced by using Any() since Any() iterates forward until it finds the value we're looking for -- for the task we're attempting to do.
 
So the lesson there is, make sure when you use a LINQ expression you're choosing the best expression for the job, because if you're doing more work than you really need, you'll have a slower algorithm.  But this is true of any choice of algorithm or collection in general.
   
Even with the Any() with the count in the closure it is still about 30% slower, but let's consider that angle carefully.  For a list of 100,000 items, it was the difference between 1.01 ms and 0.82 ms roughly in a List<T>.  That's really not that bad at all in the grand scheme of things.  Even running at 90% slower with TakeWhile(), for the vast majority of my projects, an extra millisecond to save potential errors in the long term and improve maintainability is a small price to pay.  And if your typical list is 1000 items or less we're talking only microseconds worth of difference.
 
It's like they say: 90% of your performance bottlenecks are in 2% of your code, so over-optimizing almost never pays off.  So personally, I'll take the LINQ expression wherever I can because they will be easier to read and maintain (thus reducing technical debt) and I can rely on Microsoft's development to have coded and unit tested those algorithm fully for me instead of relying on a developer to code the loop logic correctly.
 
If something's 90% slower, yes, it's worth keeping in mind, but it's really not until you start get magnitudes-of-order slower (10x, 100x, 1000x) that alarm bells should really go off.  And if I ever do need that last millisecond of performance?  Well then I'll optimize JUST THAT problem spot.  To me it's worth it for the readability, speed-to-market, and maintainability.

 

Technorati Tags: , , ,

Print | posted on Friday, April 23, 2010 6:54 AM | Filed Under [ My Blog C# Software .NET Fundamentals ]

Feedback

Gravatar

# re: C#: LINQ vs foreach - Round 1.

Interesting comparison. Code with linq is more readable than code with interations, when we will have perfomance problem now we are know how to solve it, thanks
4/23/2010 10:54 AM | Denis Gladkikh
Gravatar

# re: C#: LINQ vs foreach - Round 1.

Totally agree on readability. Given that I'm an app architect in the business/finance world, correctness is far more important overall than performance. Not that performance isn't important, but if it's fast & wrong it costs a lot more!
4/23/2010 11:54 AM | James Michael Hare
Gravatar

# re: C#: LINQ vs foreach - Round 1.

Actually, you are missing much of the value of LINQ when explicitly calling the extension methods. The primary purpose is NATURAL expression of a query.

So:

return list.TakeWhile(item => item != target).Count();

Becomes
(from x in list where x == target select x).Count();

At this point reading (the literal words on the screen) out loud is MUCH closer to a non-computer specific means of expressing the task.
4/23/2010 2:20 PM | TheCPUWizard
Gravatar

# re: C#: LINQ vs foreach - Round 1.

To some extent, though I think it's arguable which is more readable, that part comes down a bit more to preferece.

Though I do agree that either calling the extension methods directly or using the LINQ to objects syntax is more readable to a traditional for loop.

Either way, the LINQ-to-objects syntax is really just syntactical sugar on top of the System.Linq extension methods and in either case, the performance numbers would be identical.
4/23/2010 2:31 PM | James Michael Hare
Gravatar

# re: C#: LINQ vs foreach - Round 1.

Now that .Net 4.0 has launched, remember another benefit of Linq queries: they can be easily parallelized (see AsParallel() in the PLINQ APIs). This will be increasingly important in the coming years with the shift to multi-core.
4/27/2010 3:00 AM | Samuel Jack
Gravatar

# re: C#: LINQ vs foreach - Round 1.

Wait a sec. I fail to see how this (your original loop code, as I would format it):

public static int GetTargetPosition2 ( IEnumerable<int> list, int target ) {
int count = 0;
foreach (var i in list) {
if (i == target) break;
++count;
}
return count;
}

is "less readable" or "harder to maintain" or a "heavier technical debt load" than (your supposedly "better" Linq code as I would format it):

public static int TakeWhileViaLinq2 ( IEnumerable<int> list, int target ) {
int count = 0;
list.Any(item => {
if (item == target) return true;
++count;
return false;
});
return count;
}

Come on people! That linq code is NOT more readable or easier to maintain than the simple loop, which is more efficient!
7/30/2010 6:52 PM | Tim Blaisdell
Gravatar

# re: C#: LINQ vs foreach - Round 1.

Damn thing lost all my formatting. Nevermind.
7/30/2010 6:53 PM | Tim Blaisdell
Gravatar

# re: C#: LINQ vs foreach - Round 1.

@Tim:

Efficiency can be misleading. I grant the final Linq query I showed is more convoluted than some, but the point remains that Linq gives us a large functional base and library that can be used to make code easier to read and more powerful.

The Any, GroupBy, and query syntax are shorter, use deferred execution, are easy to parallelize with PLinq and save a lot of mundane and error prone algorithm writing.
7/31/2010 1:43 PM | James Michael Hare
Gravatar

# re: C#: LINQ vs foreach - Round 1.

Bah Linq, ever thought of what your going to do if you wish to port your application to a different language that doesn't use Lamda expressions or Linq! youre stuffed! there really is no difference between linq and a for loop. If it was faster than a for loop I would think its cool, but its not and it's also harder to look at, understand and debug. It's just a bunch a snobs trying to be smarter and cleverer than everyone else, Wow! look at the weird and cryptic linq expression I can write! it's slower and harder to understand and if I ever want to poert my code to objective C++ or PHP or something. I can just sit on it, and go wow, I thought it was cool but its slower and really just microsofts way of keeping you locked into their product! So THERE!
3/18/2011 12:43 AM | Linqisfor snobs
Gravatar

# re: C#: LINQ vs foreach - Round 1.

No offense but that's a bit of a stretch of an argument. Saying LINQ locks you into C# is like saying boost locks you into C++ or swing locks you into Java. It's a library like any ore library and no one is forcing you to use it.

Also it is only marginally slower than hand rolling your own code but it's already created and fully unit tested, so it definitely has benefit.

Keep in mind the reason it's marginally slower is because it supports deferred execution and not because it's LINQ or uses lambdas. You'd encounter the same slightly slower behavior if you used yield return directly. This doesn't mean don't use yield return, of course, it means you should use it judiciously when you can benefit from it.

It's like asking which is faster: string concat, string builder, or string format? The answer as all things in computer science is it depends on the job.

And don't forget that lambda expressions exist in other languages as well. S it's nothing new to C# and you could argue learning them makes your knowledge more portable.

Even C++ now has lambdas, and even had gremlin before in boost.

Thus I think your argument against LINQ is a bit oversimplified...
3/18/2011 7:24 AM | James Michael Hare
Gravatar

# re: C#: LINQ vs foreach - Round 1.

I like to keep it simple, thus the oversimplified, I'm just saying I looked at it and didn't like it, it does exactly the same as a for loop but looks different. Sure use it if it provides some benefit, but don't use it just because you can.

Take this code for example:

public static void SetStatus2(string status)
{
//notify all the clients.
var query = (from c in clients
select c.Value).ToList();

//---create the callback action---
Action<IWCFServiceCallBack> action =
delegate(IWCFServiceCallBack callback)
{
//---callback to pass the seats booked
// by a client to all other clients---
callback.NotifyClient2(status);
};

//---for each connected client, invoke the callback---
query.ForEach(action);
}


I wanted to add some error checking that would remove the client if they had disconnected without informing the server.


This is what I ended up with:



public static void SetStatus2(string status)
{
Dictionary<IWCFServiceClient, IWCFServiceCallBack> copy = new Dictionary<IWCFServiceClient, IWCFServiceCallBack>(clients);
foreach (WCFServiceClient c in copy.Keys)
{
try
{
IWCFServiceCallBack cb = copy[c];
cb.NotifyClient2(status);
}
catch // TODO: TEST AND LOG THE EXCEPTION.
{
clients.Remove(c);
}
}
}



Simple! does exactly the same thing but easier to understand and add code too if you ever need to, Now if I'd continued down the linq path, sure it's possible, just pointless, looks complicated, and less effecient. I couldn't access the key from the original statement the linq way without things getting overcomplicated, or maybe I didn't know how too, Who cares! just use a for loop!

I'm just looking for someone to give me a reason to use linq, like when is it really useful? what jobs would you use linq for? because at the moment I don't see it?

I admit that using lamda expressions is a good typing shortcut, which looks nice when used for simple queries. which I may use, but if I ever want to port my code to php or something, I would regret it then. I have ported code to PHP before as a PHP server is much cheaper to run and maintain than a .NET server.

Just some things to think about before rushing out and using linq whenever you can just because it looks clever. keep it simple.
3/20/2011 7:46 PM | Linqisfor snobs
Gravatar

# re: C#: LINQ vs foreach - Round 1.

But that's a key difference. You say it does exactly the same thing as a for loop and it doesn't. They are very different mechanisms. One always executes and one does deferred execution. They are very different animals each with their own purpose. You are looking at LINQ as a "for replacement" of sorts and it's not.

As for lambdas, lambdas pre-date LINQ and have a lot of purpose even without LINQ, lambdas are just shorthand syntax for anonymous delegates and as I said exist in nearly all functional languages and many standard languages as well.

It just seems to me you're wanting to code to the least common demoninator for portability, which isn't invalid, but it does limit some of the higher-power stuff you can do with .NET.

So when would you really get a lot of traction with deferred execution? Basically anytime you want to process the contents of an enumerated collection without having to store them or process all of them if you want to short-cut.

So, if you are processing hundreds of thousands of records and want to treat them as an enumeration, but don't want to store them all in memory, you can use deferred execution to retrieve them as they are requested. Or, if you want to process items that have a heavy calculation cost, but only want to go till you find a certain one and not incur the cost for all, it can be of benefit.
3/21/2011 9:37 AM | James Michael Hare
Gravatar

# re: C#: LINQ vs foreach - Round 1.

Another thing to keep in mind with LINQ is you need to use it judiciously.

Incidentally your code example above boils down to one line:

clients.Values.ToList().ForEach(a => a.NotifyClient2(status));

Unfortunately a ForEach doesn't exist on IEnumerable (only on List), but you can roll your own really quickly.

The key is to remember that LINQ is NOT just loop replacement, the power of linq is in selecting and manipulating data from a larger set.

If all you are doing is iterating over all items, by all means use a foreach or for. But it's when you want to search, select, or transform data that LINQ is stellar!
3/21/2011 9:47 AM | James Michael Hare
Gravatar

# re: C#: LINQ vs foreach - Round 1.

I think your missing the point, Linq takes away the granularity of control that is in a for loop.

In my first post I mentioned how it is harder to debug linq, your one line of code is a pure example of what I am talking about.

How would I add error checking and then remove the client from my original list if there was an error as in the refactored code that uses a for loop?
Also with that one liner I don't even know which client is throwing the error, if I have a thousand clients and one of them throws a generic error in NotifyClient2? how do I know which one it was? how can I get any information about the client from it, How do I remove it from the original list, point is I don't know how with linq, pass some kind of delegate or action I guess, It just makes it more complicated than a simple for loop.

Deferred execution can be executed more simply by placing your for loop in a method that can be called whenever you need it to loop over a collection, on saying that, I can't really delve into that kind of argument without a better understanding of it and when it can be used, so I defer.

So what you are saying is that Linq is a syntactical short cut for searching, selecting or transforming data, that is less efficient than a for loop, but quicker to cut code if you have learnt it. which I agree with, although I think it may be harder to maintain in the future, because of debugging, and the fact that not all future developers that take over your code may have a deep understanding of Linq. but a for loop is simple programming 101.

I'll say one thing Linq is good for: increasing job security.

And yes I am bringing the code down to the least common denominator, not just for portability but simplicity as well.
3/21/2011 7:01 PM | Linqisfor snobs
Gravatar

# re: C#: LINQ vs foreach - Round 1.

Unfortunately, I truly think you are missing the point, you keep trying to treat them as an either/or. They are designed to be used for their specific purposes. First of all the one-liner was just to demonstrate that your first example was artificially overblown. The only method in there that even remotely is from LINQ is the ToList() method, the ForEach() is a member of the List class and not related to LINQ at all.

Once again, I reiterate: deferred execution is better when the cost of finding the next item is high, in which case it can be more efficient than holding all items together in memory or processing them all.

You would never just convert a for loop to LINQ for the sake of using LINQ, and even then it is not LINQ you are using but iterators which are a part of the C# language and not necessarily LINQ itself.

LINQ is not a syntactical short cut, it is a full fledged library that lets you query collections of objects like SQL. Saying for is more efficient would be like saying it's more efficient to return all the data from a database and hand roll your own loops client side. LINQ specializes in querying data (hence the name Language INtegrated Query).

If you are trying to just use LINQ to replace a for loop (as your example shows) you're using it for the wrong purpose and should use a for loop instead. If, however, you are trying to organize a large amount of data (say collecting a set of registrations together based on a category field into a Dictionary of Lists or something similar) then LINQ has a lot more power that is already coded and unit tested and will probably outperform the average developer's hand-rolled loop, will be much faster to market, much easier to maintain, and much less likely to have errors.

I'm just afraid you're adherence to performacne and simplicity is like the developers I used to know in a telecommunications company I worked for who would never consider Java or C++ at the time because they thought it was all too complex and they loved the simplicity of COBOL.

Lambda expressions are the future of nearly all major programming languages. There's a reason C++ added them in 0x and boost added them to C++ even before that.
3/21/2011 8:02 PM | James Michael Hare
Gravatar

# re: C#: LINQ vs foreach - Round 1.

I didn't write that code, I found it and had to alter it. I think its a good example of what I'm trying to say. I'm just giving people a few things to think about before they start charging off using linq, anonymous delegates, actions, lamda expressions etc everywhere, all the new sugar-coated things, which I'm sure have their place. tbh I quite like the syntax of lambda expressions.
3/22/2011 1:13 AM | Linqisfor snobs
Gravatar

# re: C#: LINQ vs foreach - Round 1.

You might as well tell them not to use C#, then. You're talking about a lot of key features of the C# language there, LINQ aside. Anonymous delegates are really not that much different that anonymous subclasses in Java. Actions are just a type of generic delegate, which have been around for a long time in C#, and are very akin to function pointers in C++. Lambdas, once again, are not new to comp sci.

If you choose to code to the lowest common denominator for portability, that's fine. I think you'll find that your simplicity, though, is somewhat artificial. Quicker time to market and better maintainability are two of the big driving forces of software innovation today.

Even though some of these concepts are relatively new to C#, they have been in the comp sci community a long time in one form or another.
3/22/2011 9:33 AM | James Michael Hare
Gravatar

# re: C#: LINQ vs foreach - Round 1.

Gotta love the "porting" argument. if LINQ maps so directly to for loops, how does it present a barrier to porting to a new language? It doesn't; not always.

Personally, I use it simply because it's shorter. Instead of writing foreach loops I would ask myself if this might work as a LINQ expression. for example, when acquiring those keys in a list that start with a given string, I use this:

String[] gotkeys = (from w in mSoundSources.Keys
where w.ToUpper().StartsWith(keyprefix.ToUpper())
select w.ToUpper()).ToArray();

rather than this:

List<String> getkeys=new List<String>();
foreach(var w in mSoundSources.Keys)
{
if (w.ToUpper().StartsWith(keyprefix.ToUpper()))
getkeys.Add(w.ToUpper());



}
return getkeys.ToArray();


The clear disadvantage in the second case is the use of a temporary list. Eliminating that would entail quite a bit of otherwise boilerplate array handling code; and perf issues might make one prefer to allocate that array in chunks, meaning it has to be resized at the end of it. Arguably- LINQ probably has similar discardables being generated (like the List) but personally I trust the people working on the language to get it write once more than I trust myself to do it properly countless times.

I also fail to see how knowing LINQ would increase Job security. It is hardly rocket science. It's not C# specific, similar constructs exist in other languages; you can even get LINQ libraries for PHP, so if for some reason you didn't understand LINQ expressions enough to convert them to the allegedly simple for loops they translate to, you can use one of those when porting.

What this sounds more like is "Linqisfor snobs" glanced at LINQ, jumped to conclusions, and is now creating arguments based entirely on those false conclusions. Of course there is no reason to use LINQ everywhere, or to avoid using for loops entirely. Personally, I just use it when I don't have to think about it too much. As I use it I'm finding that to be more and more common.

I'll admit my initial thoughts on LINQ were pretty much of the "it's for snobs" camp, but It's definitely made things easier for me. Most of the times I use it, converting it to a foreach or equivalent structure would entail the creation of any number of temporary lists and objects, countless lines of code but for what purpose? merely to not use LINQ? That's a pretty silly excuse.
7/1/2011 1:15 PM | BC_Programmer
Gravatar

# re: C#: LINQ vs foreach - Round 1.

@Linqisfor snobs

In your example, what happens if you want to order/sort your clients by one of its properties?
What sorting algorithm are you going to use? How long are you going to take to test your sorting algorithm? Can your implemention be used on any collections? Why not just use LINQ OrderBy function?

Just one example of MANY other such as GroupBy, Join, ....
8/25/2011 1:07 PM | Mark
Gravatar

# re: C#: LINQ vs foreach - Round 1.

I see a bunch of "straw man" arguments here from both sides of this debate... YES, "BC_Programmer", LINQ does create a temp list just like your for loop code.

Personally, I avoid LINQ for trivial looping and selection.

(1) LINQ imposes a minimum platform of .NET 3.5, which is good enough reason to avoid it if your project audience is large and your primary use of LINQ is merely syntactic sugar;
(2) LINQ is slightly slower;
(3) LINQ is much more resource intensive (I marvel that resource usage is almost always left out of performance debates).

Frankly, I don't find either side's arguments are overwhelming. Sometimes LINQ can produce some real nifty, readable, short code. Other times, LINQ produces a convoluted mess of nested generic and anonymous Frankenstein types that could be trounced by a plain-old for loop and a couple if-then statements.

Bottom line: Use it like any tool--when it's needed; not gratuitously. I will not introduce LINQ code into any project (thereby increasing minimum requirements) until I have concluded LINQ is a significant benefit for the project, and not just a for-loop replacement. That's my own opinion.
1/29/2012 5:14 PM | Kevin
Gravatar

# re: C#: LINQ vs foreach - Round 1.

@Kevin: opinions are good things, but I think you make too much of the minimum requirements for LINQ, everyone should be on .NET 3.5 for a number of reasons, not just for LINQ. In fact, I'd say everyone should move to 4.0 if no other reason than to take advantage of the TPL.

Also, you're painting all of LINQ when really what you are talking about is iterators. Iterators are a bit slower, but they also have the power of deferred execution, and many LINQ statements (Any(), All(), etc) do NOT use deferred execution (iterators) because they don't need to.

Personally, I appreciate that LINQ is very well tuned and will likely become more so as it matures, and it's fully unit tested which in of itself is a major plus, less code I have to write for both the algorithm and for unit testing.
1/30/2012 9:23 AM | James Michael Hare
Gravatar

# re: C#: LINQ vs foreach - Round 1.

Thanks for your insightful post. Based on your code, I have some interesting findings simply changing IEnumerable<int> to int[]. (Only for the first Find example)

1) int[] is much faster than IEnumerable<int>. I mean, even if the data we plug in the functions are all arrays, simply changing the function prototype from FindViaIteration(IEnumerable<int>, int) to FindViaIteration(int[], int) would speed up 2x.

2) int[] can be much more highly optimized. In Release mode, its speed can be further improved 5x compared to Debug mode. While there is no such observation for IEnumerable<int> (there is speed up, but not as high as 5x. actually it's about 2x).

3) unsafe doesn't improve the performance here.

Testing on my laptop for finding 9,999,999 on a 10M length array, my timing results are as follows.
168ms for Debug + IEnumerable<int> + foreach
87ms for Debug + int[] + foreach
10ms for Release + int[] + foreach
138ms for Release + int[] + Linq

The basic conclusion is, if you need dev speed, use LINQ. When performance becomes serious, use for/foreach + array in the important areas (which comes from the profiler) to maximize the compiler optimization.
1/26/2013 12:41 AM | Yan
Gravatar

# re: C#: LINQ vs foreach - Round 1.

@Yan:

Yes, in general array iteration is faster since it uses indexing to get to the next item, wheras LINQ uses iterators and enumerators which have object allocations tied with them.

Now, that said, there's nothing that says that MS can't decide in the future to make optimizations to make arrays iterate faster in LINQ (foreach has many such optimizations depending on what you are iterating over).

But I think you'll find that arrays often beat the other containers in terms of iteration speed. The trade-off, of course, is that they are a fairly rigid structure and difficult to grow/shrink as needed.

So, as in all things, make it correct first, and then (and only if you determine that the array is *the* performance bottleneck of your code) optimize as needed.
1/29/2013 11:32 AM | James Michael Hare
Gravatar

# re: C#: LINQ vs foreach - Round 1.

James:

Your statement "only if you determine that the array is *the* performance bottleneck" - which will almost never be true and becomes the defense for "it's OK to use Linq even though it's slower". I generally find that the bottleneck in any application process is not a single method or a single Linq statement, but many tiny things that add up to create the performance issue. For example, if I create 1 billion little tiny files, my hard drive would fill up - and I would hunt down the largest files I can find and delete them, but my drive would still be nearly full. Seeing a little tiny file, I pass on deleting it because it's so tiny it couldn't possibly make a difference. But in reality, deleting a bunch of little tiny files would in fact free up a lot of space. Same thing can be said for performance - which is why I optimize nearly all my methods, i.e. declaring variables properly, performing work outside a loop, storing local references to object that are more expensive to retrieve like unboxing an object, etc... I don't spend hours looking for the fastest approach - but I definately start with for/foreach before even considering Linq.

Recently a question was asked - "how can I send parts (subsets) of an array to a webservice" and one of the top, most posted, answers was Linq's Skip().Take() extensions. It was written in a single line of code. I thought WOW! Micosoft wrote it, so it must be good and fast! Well... I was wrong. I wrote some performance tests and found that it was 282,993% slower than a simple method written using Array.Copy from .NET 1.1. I modified my new method to be an extension, and now I can perform a Linq like Skip().Take() without the performance hit. In fact, Linq reminds me of the basic rules around Reflection - don't use it unless you absolutely have to - However, I do cheat from time to time and use the Find() extension.

As you've stated, Linq can be very useful and the deferred execution when reading large files is awesome. In small processes with a known, limited execution plan, Linq can definately speed up development and the performance hit is so minor that it doesn't matter. But in large applications, with hundreds of execution points calling hundreds of unknown methods (written by many different developers), Linq could be the bottleneck of the application and nobody would know it because "it's OK to use Linq even though it's a little slower".
2/17/2013 6:18 AM | Chris G.
Gravatar

# re: C#: LINQ vs foreach - Round 1.

@Chris G.:

I'd have to say I disagree. Often times it's not the tiny things that add up to become a performance bottleneck, it's the large items. LINQ is a tool provided by Microsoft to easy development pain, much like the collection classes.

And yes, because MS provides them we can make certain assumptions: they are safe, they are (hopefully) well tested, they are error-free, and they perform (within expectations).

The later is extremely important. If one is using LINQ without know what the algorithmic complexity is, it is just as bad as choosing a List<T> when a HashSet<T> would be more appropriate.

But to your point about large applications, again I totally disagree, in my previous job we had multiple, high-performance, large-scale services using C# and making heavy use of LINQ. This allowed us to develop the services very quickly and (more importantly) relatively error free. Developing complex algorithmic logic in many of these cases could have costed more time, required more testing, and possibly been more error prone and even less performant.

When we *did* have performance issues, we profiled the code and found more often than not that the problem was caused by choosing the wrong collection, or database access, or network times. Very, very rarely did we have a performance issue that ended up being a LINQ issue.

I'd really like to see this code that is supposedly 282,993% slower in LINQ. I have a feeling that there is a different underlying issue there than just LINQ.

Now, that said, if I optimized my code and determined that a particular line of LINQ was running unacceptably slow, would I "de-normalize" it? Sure! But only after I had optimized it.

Remember the optimization creedo: "It's easier to make a correct program fast than a fast program correct."

After all, if speed was the end-all-be-all we'd all still be using C++ (Assembly, etc....)
2/18/2013 10:46 AM | James Michael Hare
Gravatar

# re: C#: LINQ vs foreach - Round 1.

@James

Here's the link to my test results - Linq was 5,773% slower and Skip().Take() was 282,993% slower than my Array.Copy solution.

http://stackoverflow.com/questions/11207526/best-way-to-split-an-array

While it might have been faster to type Skip().Take(), I wouldn't want my users to experience this level of delay retrieving a single web page. Also note, that Skip().Take() (or the Linq solution) would have most likely been used in a single method but may not be the sole bottleneck despite the huge percentage increase - in fact, the use of Skip().Take() may go unnoticed all together if the collection is small enough and the Linq version definately would have been overlooked as a bottleneck. However, a 5000% decrease in performance is still 5000% - and if you add up enough of those in a single web request, pretty soon the user waits 2-3 seconds vs 0.2 seconds for the same page. Acceptable? Probably from a business perspective - but unacceptable from a memory/performance perspective.

You did touch on a good point - Linq makes it easier and faster to code and then you go on to state "If one is using LINQ without knowing what the algorithmic complexity is". Isn't that the point of Linq in the first place? To make it easier and faster to code with? So that you don't need to know or understand what the algorithmic complexity is? Why bother wasting time trying to optimize something that should be easier and faster to code with? If I have to go to those lengths to use Linq, why wouldn't I simply write the complex algorithmic method myself and move on?

In my opinion, Linq has many good uses, some that can solve complex business rules quite easily, deferred execution for reading files or streaming results from SQL, etc. But I would never use it to simply replace a for/foreach loop just because I can write it in one line of code. This method of coding adds unnecessary complexity making the code more difficult to read and debug - not to mention the performance hit. On the other hand, I wouldn't spend 1/2 a day avoiding a simple Linq statement to solve a complex business rule that performed quite well either. The bottom line is that a developer still needs to be a developer and know when to use Linq and when not to - in other words, developers still need to do their homework and understand the tools they're using rather than simply learning syntax and claim "I'm a developer because I know Linq". Microsoft has made it far too easy for any bum off the street to become a "developer", which is why I have a job in the performance department - and quite happy to be there too.


3/6/2013 7:40 AM | Chris G.
Gravatar

# re: C#: LINQ vs foreach - Round 1.

@Chris:

First of all, there's an error in your Skip().Take() solution where you are allocating an extra array that is unneeded.

Second, this is a somewhat contrived example. If your goal is to take 1,000,000 sliding slices of an array, then using direct indexing is obviously an advantage. It would be like saying a Hashtable is horrible because to sort the data in it would be expensive, and that's not what Hashtables do well, so why did you even choose it?

So what you're doing is timing a choice of algorithm for a *particular* problem that favors a specific solution. This does not mean that Skip().Take() is bad nor that Linq is bad, it just means that for a particular case where you want to slice an array of 1,000,000 element thousands of times, an array.Copy() is more efficient, which few would disagree with.

However, when you look at the general case, LINQ performs decently well. Is it as fast as slicing a sub-array from an array? Of course not, that's optimized for Arrays, but LINQ works for all manner of sequences, not just arrays.

You can't use the array.Copy() as easily, for instance, if you have a sequence that doesn't allow random access. So in those cases, would you suggest someone first copy it to an array and then take a slice?

Again, the whole point is to pick the best tool for the job. And a great majority of the time LINQ fits that build because it is easier to use and fairly optimal when used correctly.

And no, I do have to say I reject your assertion that LINQ is supposed to remove the need to know algorithmic complexity. Absolutely not! Every programmer should always understand algorithmic complexity in the code they write. Otherwise, they'd be using Arrays where a Dictionary is more appropriate and so on.

So yes, your example does slog down with Skip().Take() which makes sense because you are re-iterating over the array several times. So you're not just timing Skip() and Take(), but your application of Skip() and Take().

I do agree with the majority of your last paragraph though. I don't think that a developer just needs to know LINQ and that's it. I think any good developer should know the trade-offs of all the tools and use them as appropriate. I also think every good developer should have a thorough understanding of algorithmic analysis (I'm old school that way) and know how to use a profiler to find their hot-spots in the code.

I don't though agree that MS has "made it far too easy for any bum off the street to become a 'developer'", I think MS's tools have made it far easier for a good developer to become a better developer because they have a wider array of tools to choose from, and many of those help us crank out correct code faster.

The danger of "bad" developers becoming developers has always there, I don't think MS has really done anything differently here than any other sufficiently high-level language has.

So yes, I agree on choose the right tool for the job, and slicing an array with Array.Copy() is a great tool especially if you need to make multiple slices over and over. But that does not in any way mean Array.Copy() is the only tool for the job. If you have a non-array sequence it would be much more prohibitive to use, for example.
3/6/2013 10:55 AM | James Michael Hare
Gravatar

# re: C#: LINQ vs foreach - Round 1.

@Chris: BTW, I did like your research though so I +1'd you on your SO answer.
3/6/2013 10:57 AM | James Michael Hare
Gravatar

# re: C#: LINQ vs foreach - Round 1.

@Chris: It's interesting, the more I look at that SO question, the less I like Skip() and Take() for the answer. Keep in mind this is nothing against Skip() and Take() itself, which I think are useful for other reasons, but because in this case the person wants to divide an array into chunks of 100, Skip() and used in that manner will reiterate the front of the array n times making it an additional O(n^2) operation.

So, my gripe against the Skip().Take() in this case is that the Skip() is a poor choice for the job, it would be much better to use the LINQ solution, the Array.Copy() solution, or some other method of remembering where you are and then advancing.

It would be nice if the authors of .NET would give Skip() an optimization for Collection<T> containers and thus let them jump straight to the item. Perhaps we'll see that optimization some day, but until then, repeatedly skipping over the first few elements in an array is thrashing IMO and thus Skip() was a poor choice for this particular problem.
3/6/2013 11:15 AM | James Michael Hare
Gravatar

# re: C#: LINQ vs foreach - Round 1.

True dat!! Maybe I'll dig into it myself and write my own Skip().Take() extensions.

And thanks for the upvote!
3/21/2013 6:04 AM | Chris G.
Post A Comment
Title:
Name:
Email:
Comment:
Verification:
 
 

Powered by: