I was looking through my group’s C# coding standards the other day, and a couple of legacy items caught my eye. They had been passed down from committee to committee so many times that no one had thought to question or test them in a long time. It’s yet another example of how micro-optimizations can get the best of us, leading us to write less maintainable code for the sake of squeezing an extra ounce of performance out of our software.
So the two standards in question were these, in paraphrase:
- Prefer StringBuilder or string.Format() to string concatenation.
- Prefer string.Equals() with case-insensitive option to string.ToUpper().Equals().
Now some of you may already know what my results are going to show, as these items have been compared before on many blogs, but I think it’s always worth repeating and trying these yourself. So let’s dig in.
The first test was a pretty standard one. When concatenating strings, what is the best choice: StringBuilder, string.Concat(), or string.Format()?
So before we begin, I read in a number of iterations and a length for each generated string from the console. Then I generate that many random strings of the given length, plus an array to hold the results. Why am I so keen to keep the results? Because I want to be able to snapshot the memory and don’t want garbage collection to collect the strings, hence the array to keep hold of them. I also didn’t want the random strings to be part of the measured allocation, so I pre-allocate them and the array up front before the snapshot. So in the code snippets below:
- num – Number of iterations.
- strings – Array of randomly generated strings.
- results – Array to hold the results of the concatenation tests.
- timer – A System.Diagnostics.Stopwatch instance to time code execution.
- start – Beginning memory size.
- stop – Ending memory size.
- after – Memory size after final GC.
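As a rough sketch of that setup: the class name, the helper names, and the fixed seed below are all made up for illustration and are not taken from the original test program, which isn't shown.

```csharp
using System;
using System.Text;

public static class TestSetup
{
    // Hypothetical helper: builds one random string of upper-case letters.
    public static string RandomString(Random random, int length)
    {
        var builder = new StringBuilder(length);
        for (int i = 0; i < length; i++)
        {
            builder.Append((char)('A' + random.Next(26)));
        }
        return builder.ToString();
    }

    // Pre-allocates the input strings so they are not counted in later measurements.
    public static string[] CreateInputs(int num, int length, int seed)
    {
        var random = new Random(seed);
        var strings = new string[num];
        for (int i = 0; i < num; i++)
        {
            strings[i] = RandomString(random, length);
        }
        return strings;
    }

    public static void Main()
    {
        Console.Write("Iterations: ");
        int num = int.Parse(Console.ReadLine());
        Console.Write("String length: ");
        int length = int.Parse(Console.ReadLine());

        string[] strings = CreateInputs(num, length, 42);
        // Holding the results in an array keeps them reachable, so the GC cannot reclaim them.
        string[] results = new string[num];
        Console.WriteLine("Prepared {0} inputs and {1} result slots.", strings.Length, results.Length);
    }
}
```

Everything here happens before the first memory snapshot, so none of it should show up in the measured numbers.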
So first, let’s look at the concatenation loop:
// build num strings using concatenation.
for (int i = 0; i < num; i++) {
    results[i] = "This is test #" + i + " with a result of " + strings[i];
}
Pretty standard, right? Next for string.Format():
// build strings using string.Format()
for (int i = 0; i < num; i++) {
    results[i] =
        string.Format("This is test #{0} with a result of {1}", i, strings[i]);
}
Finally, StringBuilder:
// build strings using StringBuilder
for (int i = 0; i < num; i++) {
    var builder = new StringBuilder();
    builder.Append("This is test #");
    builder.Append(i);
    builder.Append(" with a result of ");
    builder.Append(strings[i]);
    results[i] = builder.ToString();
}
So I take each of these loops, and time them by using a block like this:
// get the total amount of memory used, true tells it to run GC first.
start = System.GC.GetTotalMemory(true);
// restart the timer
timer.Reset();
timer.Start();
// *** code to time and measure goes here. ***
// get the current amount of memory, stop the timer, then get memory after GC.
stop = System.GC.GetTotalMemory(false);
timer.Stop();
after = System.GC.GetTotalMemory(true);
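To avoid pasting that block around every loop, it could be wrapped in a small helper. The following is only a sketch: the Harness class, the Measure method, the output layout, and the stand-in input strings are all assumptions for illustration, not the original test program.

```csharp
using System;
using System.Diagnostics;

public static class Harness
{
    // Hypothetical helper: runs body once and reports elapsed time and memory.
    // Returns the elapsed milliseconds so callers can compare runs.
    public static long Measure(string label, Action body)
    {
        var timer = new Stopwatch();

        // true forces a garbage collection first, giving a stable baseline.
        long start = GC.GetTotalMemory(true);

        timer.Reset();
        timer.Start();
        body();
        long stop = GC.GetTotalMemory(false);   // memory before any final collection
        timer.Stop();
        long after = GC.GetTotalMemory(true);   // memory after a final collection

        // Assumed output layout; the figures quoted in this post use their own format.
        Console.WriteLine("{0} - Time: {1} ms, Start: {2}, Stop: {3}, After: {4}",
            label, timer.ElapsedMilliseconds, start, stop, after);
        return timer.ElapsedMilliseconds;
    }

    public static void Main()
    {
        int num = 100000;
        var strings = new string[num];
        var results = new string[num];
        for (int i = 0; i < num; i++)
        {
            strings[i] = i.ToString("X8");   // stand-in for the random strings
        }

        Measure("Operator +", () =>
        {
            for (int i = 0; i < num; i++)
            {
                results[i] = "This is test #" + i + " with a result of " + strings[i];
            }
        });
    }
}
```

Passing the loop as a delegate adds one indirect call per measurement, which should be lost in the noise next to hundreds of thousands of iterations.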
So let’s look at what happens when I run each of these blocks through the timer and memory check at 500,000 iterations:
Operator +      - Time : 547, Memory : 56104540 / 55595960 - 500000
string.Format() - Time : 749, Memory : 57295812 / 55595960 - 500000
StringBuilder   - Time : 608, Memory : 55312888 / 55595960 - 500000
Egad! string.Format() brings up the rear and + triumphs, at least in terms of speed. Concatenation burns more memory than StringBuilder but less than string.Format().
This shows two main things:
- StringBuilder is not always the panacea many think it is.
- The difference between any of the three (in the context of creating a string in a single statement) is minuscule!
The second point is extremely important! You will often hear people who will grasp at results and say, “look, operator + is 10% faster than StringBuilder, so always use operator +.” Statements like this are a disservice and often misleading. For example, if I had a good guess at what the size of the string would be, I could have pre-allocated my StringBuilder like so:
for (int i = 0; i < num; i++) {
    // pre-declare StringBuilder to have 100 char buffer.
    var builder = new StringBuilder(100);
    builder.Append("This is test #");
    builder.Append(i);
    builder.Append(" with a result of ");
    builder.Append(strings[i]);
    results[i] = builder.ToString();
}
Now let’s look at the times:
Operator +      - Time : 551, Memory : 56104412 / 55595960 - 500000
string.Format() - Time : 753, Memory : 57296484 / 55595960 - 500000
StringBuilder   - Time : 525, Memory : 59779156 / 55595960 - 500000
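Whoa! All of a sudden StringBuilder is back on top again for this example code! But notice, it takes more memory now. This makes sense if you examine the IL behind the scenes. The compiler turns a chain of operator + into a single string.Concat() call, which adds up the lengths of its arguments and allocates a result string of exactly the right size, so there is no buffer to grow and no spare capacity left over. The pre-sized StringBuilder no longer has to regrow its buffer, which is why it got faster, but any capacity beyond what the final string needs is wasted, which would explain the extra memory.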
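A small sketch of both halves of that explanation. The class and method names are invented for illustration, and nothing here measures anything: it only shows that the + chain and an explicit string.Concat() call produce the same string, and that a StringBuilder created with a capacity of 100 still reports that capacity after a shorter string has been appended.

```csharp
using System;
using System.Text;

public static class ConcatDemo
{
    // The form used in the test loop.
    public static string ViaOperator(int i, string s)
    {
        return "This is test #" + i + " with a result of " + s;
    }

    // Written out by hand as a single Concat call over four strings.
    public static string ViaConcat(int i, string s)
    {
        return string.Concat("This is test #", i.ToString(), " with a result of ", s);
    }

    // Returns how many characters of a 100-char buffer go unused for this input.
    public static int UnusedCapacity(int i, string s)
    {
        var builder = new StringBuilder(100);
        builder.Append("This is test #");
        builder.Append(i);
        builder.Append(" with a result of ");
        builder.Append(s);
        return builder.Capacity - builder.Length;
    }

    public static void Main()
    {
        Console.WriteLine(ViaOperator(7, "ABCDEFGH") == ViaConcat(7, "ABCDEFGH"));
        Console.WriteLine("Unused buffer characters: {0}", UnusedCapacity(7, "ABCDEFGH"));
    }
}
```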
But even IF we know the approximate size of our StringBuilder, look how much less readable it is! That’s why I feel you should always take into account both readability and performance. After all, consider that all these timings are over 500,000 iterations. Even the widest gap above (753 ms versus 525 ms) works out to less than 0.0005 ms per call, which is negligible.
The key is to pick the best tool for the job you are trying to do. What do I mean? Consider these words of wisdom:
- Concatenate (+) is great at concatenating several strings in one single statement.
- StringBuilder is great when you need to build a string across multiple statements or a loop.
- Format is great at performing formatting of strings in ways that concatenation cannot.
Just remember, there is no magic bullet. If one of these always beat the others, we’d have only one choice available to us and not three. Each has its purpose, and each has times when it outshines the others. So do not take this as “string concat is always faster,” because that is not true, nor as “a sized StringBuilder is always faster,” because that is not always true either! The salient point, which I can’t stress enough, is that each performs a certain job well, and the key is to know which tool does which job best.
So, in general, string.Concat() (or operator +) is clean and often optimal for joining together a known set of strings in a single statement.
StringBuilder, on the other hand, excels when you need to build a string across multiple statements or in a loop. Use it in those times when you are looping till you hit a stop condition and building a result and it won’t steer you wrong.
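For instance, here is a minimal sketch of the loop-until-a-stop-condition case. The JoinUntilLimit helper and its length limit are hypothetical, chosen only to illustrate the pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BuilderLoopDemo
{
    // Joins words with spaces, stopping before the result would exceed maxLength.
    public static string JoinUntilLimit(IEnumerable<string> words, int maxLength)
    {
        var builder = new StringBuilder();
        foreach (var word in words)
        {
            // A separator is only needed once something has been appended.
            int extra = builder.Length == 0 ? word.Length : word.Length + 1;

            // Stop condition: the next word would not fit.
            if (builder.Length + extra > maxLength) break;

            if (builder.Length > 0) builder.Append(' ');
            builder.Append(word);
        }
        return builder.ToString();
    }

    public static void Main()
    {
        var words = new[] { "alpha", "beta", "gamma" };
        Console.WriteLine(JoinUntilLimit(words, 10));   // prints "alpha beta"
    }
}
```

Doing the same with operator + inside the loop would allocate a new, longer string on every pass, which is exactly the situation where StringBuilder earns its keep.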
Finally, string.Format() seems to be the loser from the stats, but consider which of these is more readable:
// build a date via concatenation
var date1 = (month < 10 ? "0" : string.Empty) + month + '/' +
    (day < 10 ? "0" : string.Empty) + day + '/' + year;
// build a date via string builder
var builder = new StringBuilder(10);
if (month < 10) builder.Append('0');
builder.Append(month);
builder.Append('/');
if (day < 10) builder.Append('0');
builder.Append(day);
builder.Append('/');
builder.Append(year);
var date2 = builder.ToString();
// build a date via string.Format
var date3 = string.Format("{0:00}/{1:00}/{2:0000}", month, day, year);
So the strength of string.Format() is that it makes constructing a formatted string easy to read. Yes, it’s slower, but look at how much more elegant it is to do zero-padding and anything else string.Format() does.
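To give a taste of that "anything else", here is a short sketch of zero-padding alongside column alignment and fixed decimals. The class and method names are made up for illustration, and CultureInfo.InvariantCulture is passed in so the decimal separator does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;

public static class FormatDemo
{
    // Zero-padded date, as in the example above.
    public static string FormatDate(int month, int day, int year)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "{0:00}/{1:00}/{2:0000}", month, day, year);
    }

    // Name left-aligned in 10 columns, price right-aligned in 8 with two decimals.
    public static string FormatRow(string name, decimal price)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "{0,-10}|{1,8:F2}", name, price);
    }

    public static void Main()
    {
        Console.WriteLine(FormatDate(3, 7, 2024));    // prints "03/07/2024"
        Console.WriteLine(FormatRow("Widget", 3.5m)); // prints "Widget    |    3.50"
    }
}
```

Getting the same alignment out of concatenation or StringBuilder would mean hand-written padding logic for every column.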
So my lesson is, don’t look for the silver bullet! Choose the best tool. Micro-optimization can often bite you in the end if you sacrifice readable code for the sake of a performance gain that may or may not exist. This is not to say you should feel free to write ill-performing code. You should still understand the complexity of the code you are writing, and of course prefer linear algorithms to quadratic ones and so on, but before you optimize, make sure you understand what gains (if any) you are going to get.
I love the rules of optimization. They’ve been stated before in many forms, but here’s how I always remember them:
- For Beginners: Do not optimize.
- For Experts: Do not optimize yet.
Much of the time on today’s hardware, a microsecond-level optimization made at the expense of readability will net you little, because it won’t be your biggest bottleneck. Code for readability, and choose the best tool for the job, which will usually be the most readable and maintainable as well. Then, if you still need that extra performance boost after profiling your code and finding the true bottleneck, you can optimize away.