Geeks With Blogs
John Conwell: aka Turbo Research in the visual exploration of data

One thing you read in most articles or books about code optimization in .NET is advice on string concatenation.  Many books say to always use a StringBuilder to build your strings.  Some even go so far as to say use the String.Concat() method for 2-4 strings.  But, they all say, you should never use + style concatenation.  Well I say BULL!

The C# compiler folks at Microsoft did a great thing with 1.1 (I don't think it's there in 1.0).  If you write some code like:


private string StringConcatTest(string stuff)
{
        string thing = stuff + stuff + stuff + stuff;
        return thing;
}

What you get on the IL side is this:

.method private hidebysig instance string
        StringConcatTest(string stuff) cil managed
{
  // Code size       16 (0x10)
  .maxstack  4
  .locals init ([0] string thing,
           [1] string CS$00000003$00000000)
  IL_0000:  ldarg.1
  IL_0001:  ldarg.1
  IL_0002:  ldarg.1
  IL_0003:  ldarg.1
  IL_0004:  call       string [mscorlib]System.String::Concat(string,
                                                              string,
                                                              string,
                                                              string)
  IL_0009:  stloc.0
  IL_000a:  ldloc.0
  IL_000b:  stloc.1
  IL_000c:  br.s       IL_000e
  IL_000e:  ldloc.1
  IL_000f:  ret
}

Notice that it loads argument 1 (the input string) onto the stack 4 times, then calls String.Concat().  If you look at String.Concat using Reflector you’ll see that when you pass 2 or more strings into the function, it first adds up the total number of characters across all the strings passed in. It then calls an extern FastAllocateString function, passing in the size of the new string it is going to build, which allocates a block of memory big enough to hold the entire return string. Then, for each string passed in, another extern function, FillStringChecked, is called. This function takes a pointer to the block of memory that was allocated in FastAllocateString, the starting spot in memory for the string being added, the ending point in memory, and the string to add. It does this for each passed-in string to build the newly concatenated string, which it then returns. This is a very fast way to concatenate 2-4 strings together (in my tests with the .NET Performance Test Harness, the String.Concat function outperformed the StringBuilder by 2.3 times).
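The two-pass approach described above can be sketched in ordinary C#. This is only an illustration of the idea, not the framework's code: the class name TwoPassConcatSketch is made up for the example, and a char[] buffer stands in for the memory block that the article says FastAllocateString and FillStringChecked work on.

```csharp
using System;

public static class TwoPassConcatSketch
{
    // Illustrative stand-in for the runtime's internal routine, not the real implementation.
    public static string Concat(params string[] parts)
    {
        if (parts == null) throw new ArgumentNullException("parts");

        // Pass 1: add up the total number of characters (null is treated as empty).
        int total = 0;
        foreach (string part in parts)
        {
            if (part != null) total += part.Length;
        }

        // One allocation sized for the whole result
        // (the role the article assigns to FastAllocateString).
        char[] buffer = new char[total];

        // Pass 2: copy each input into its slot in the buffer
        // (the role the article assigns to FillStringChecked).
        int offset = 0;
        foreach (string part in parts)
        {
            if (part == null) continue;
            part.CopyTo(0, buffer, offset, part.Length);
            offset += part.Length;
        }

        // Unlike the runtime, this sketch pays for one extra copy here.
        return new string(buffer);
    }
}
```

The point of the sketch is that no intermediate strings are created: the inputs are measured once and copied once, however many there are.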

Now if you write code to concat 5 or more strings using + style concatenation, the C# compiler builds a string array from all the strings you're trying to concatenate.  Then it loads that string array onto the stack and calls String.Concat(string[]).  This overload of Concat() behaves similarly to the one I just described.  First it spins through the array, counting up how many characters it needs.  Then it calls FastAllocateString to create a block of memory of that size.  It then spins through the array one more time, calling FillStringChecked on each string to build the returned string.
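In source terms, the five-operand case amounts to the pair of methods below. This is a hand-written equivalent based on the description above for the 1.1 compiler; the class and method names are made up for the example, and other compiler versions may emit something different.

```csharp
using System;

public static class FiveWayConcat
{
    // Written with +, like the earlier example but with five operands.
    public static string WithPlus(string s)
    {
        return s + s + s + s + s;
    }

    // Hand-written version of what the article says the compiler generates:
    // build a string array, then make a single String.Concat(string[]) call.
    public static string WithArray(string s)
    {
        string[] parts = new string[] { s, s, s, s, s };
        return String.Concat(parts);
    }
}
```

Both methods return the same string; running ildasm over the compiled assembly is the way to check what a given compiler actually does with the first one.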

I was curious whether there was a point at which the C# compiler would stop building the string array, but with 32 strings concatenated with +, the IL produced still builds the string array.

This method of building strings is also faster than using the StringBuilder.  In my tests concatenating 16 strings, I found it 2.8 times faster than a StringBuilder. (I did not pre-initialize the StringBuilder with the known number of chars, so doing that might improve the StringBuilder performance.)
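A comparison along these lines can be set up with a small timing harness. The sketch below is not the test code behind the numbers above: it uses System.Diagnostics.Stopwatch rather than the .NET Performance Test Harness, the class and method names are made up, and it fills the StringBuilder in a loop as an assumption about how such a test might look. Results will vary with runtime version and hardware.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatTiming
{
    // Sixteen operands joined with + in a single expression.
    public static string PlusSixteen(string s)
    {
        return s + s + s + s + s + s + s + s + s + s + s + s + s + s + s + s;
    }

    // Same result through a StringBuilder with default capacity (not pre-sized).
    public static string BuilderSixteen(string s)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 16; i++)
        {
            sb.Append(s);
        }
        return sb.ToString();
    }

    // Reports elapsed Stopwatch ticks for each approach over the given number of iterations.
    public static void Compare(string s, int iterations, out long plusTicks, out long builderTicks)
    {
        long sink = 0; // accumulate lengths so the results are actually used

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) sink += PlusSixteen(s).Length;
        watch.Stop();
        plusTicks = watch.ElapsedTicks;

        watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) sink += BuilderSixteen(s).Length;
        watch.Stop();
        builderTicks = watch.ElapsedTicks;

        GC.KeepAlive(sink);
    }
}
```

Calling Compare once to warm up and then again with a large iteration count gives a rough ratio. Passing the final length to the StringBuilder constructor is the obvious variation to try next.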

String.Concat also allows you to pass in object types as well as strings.  Now this is where it gets a bit inconsistent, and I'm not sure what the folks at MS were thinking when they wrote this.  If you pass in 2-4 objects (anything but strings), String.Concat will call ToString() on each object and use the standard + style concatenation to build the returned string.  Not so good.  BUT if you pass in an array of objects, it handles it exactly the same as if you passed in an array of strings.  It builds a string array based on the ToString() of each object, then uses the two extern functions to create a fast string.  I'm not sure why they did it that way because it seems fairly inconsistent.
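The two object-based call shapes look like this in source. The sketch only shows which overload each call binds to and that the results agree; it says nothing about how either overload is implemented internally. The class and method names are made up for the example.

```csharp
using System;

public static class ObjectConcat
{
    // Binds to the fixed-arity overload String.Concat(object, object, object).
    public static string FixedArity(object a, object b, object c)
    {
        return String.Concat(a, b, c);
    }

    // Binds to String.Concat(params object[]) because an object[] is passed directly.
    public static string ArrayForm(object a, object b, object c)
    {
        return String.Concat(new object[] { a, b, c });
    }
}
```

For example, ObjectConcat.FixedArity(1, "-", true) and ObjectConcat.ArrayForm(1, "-", true) both produce "1-True"; any difference between the two is in how they get there.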

So when should you use StringBuilder?  Basically the only time I ever use a StringBuilder is when I have to build a string over several lines of code, or within a loop.  In these two scenarios, you would end up creating temporary strings in memory if you used either + style concatenation or String.Concat.
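The loop case is easy to see side by side. In this minimal sketch (the class and method names, and the ";" separator, are made up for the example), the first method creates a new temporary string on every pass, while the second appends into one growing buffer.

```csharp
using System;
using System.Text;

public static class LoopBuild
{
    // Each += allocates a new string holding everything built so far.
    public static string WithPlus(string[] items)
    {
        string result = "";
        foreach (string item in items)
        {
            result += item + ";";
        }
        return result;
    }

    // The StringBuilder accumulates in place and produces one final string.
    public static string WithBuilder(string[] items)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string item in items)
        {
            sb.Append(item).Append(';');
        }
        return sb.ToString();
    }
}
```

Both return the same text; the difference is the number of throwaway strings created along the way, which grows with the number of items in the first version.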

 

Posted on Friday, May 27, 2005 8:54 AM Code Performance


Comments on this post: String concatenation the fast way...maybe not what you think

# re: String concatenation the fast way...maybe not what you think
John,
I'm surprised that nobody even commented on this since you posted it. Good detective work. And it proves, once again, that one should not accept as "Gospel" what the SPGT's (Self Proclaimed Guru Types) say, usually since they are only parroting each other anyway.
Left by Peter Bromberg on Jan 13, 2006 4:02 PM

# re: String concatenation the fast way...maybe not what you think
John,

I found the article quite informative and interesting - the fact that it's backed up with bytecode makes it very difficult to dispute! Unfortunately, Microsoft don't seem to mention anything about string initialisation - their demo only seems to cover building strings inside more complicated algorithms. As you rightly pointed out, it's probably best to use a StringBuilder class in these cases. But it would seem that in the case of initialisation, e.g.
string sqlQuery = "SELECT * FROM stuff WHERE " +
"stuff_id=" + id;

it will get optimized as you mentioned, which is well worth remembering.

Microsoft's article:

http://support.microsoft.com/kb/306822/
Left by Nick Lee on Mar 14, 2006 10:21 PM

# re: String concatenation the fast way...maybe not what you think
Great detective work. However, my experience with optimization is that the true performance improvements are achieved when the application is using optimal in-memory data structures and does not make unnecessary database calls.

Saving a few milliseconds here and there (per request) with a better way to concatenate strings is helpful but does not change the overall picture for the client if hundreds of milliseconds (or even seconds) are being used for (unnecessary) data marshalling, inefficient memory management (that's why we have the garbage collector!), database calls, trips through the tiers (for the sake of n-tier aesthetics), and jumps through metadata hoops (for the sake of metadata aesthetics).

On the other hand, each saved millisecond helps. Great research work.
Left by Sam on Apr 12, 2006 11:04 AM

# re: String concatenation the fast way...maybe not what you think
This is an excellent article. The only thing missing is your test code. If you could provide us with the source for the tests that you ran to come up with your performance results, then we would be armed with cold hard proof to back up your arguments.
Left by Rob on Mar 23, 2007 9:19 AM

# re: String concatenation the fast way...maybe not what you think
What happens when you have string concatenation of multiple string literals? For example:

string CRLF = "\r\n";
SqlCommand myCommand = new SqlCommand();

myCommand.CommandText =
"SELECT N.Namespace, N.Description, N.id, N.ChangeUser_ID, N.ChangeTimestamp, ISNULL(NP.Value, 'Namespace') AS Type" + CRLF +
"FROM Namespaces N" + CRLF +
"INNER JOIN [References] R" + CRLF +
"ON R.SourceBaseType = 'Namespace'" + CRLF +
"AND R.SourceID = N.id" + CRLF +
"AND R.DestinationID = @DestinationObjectID" + CRLF +
"AND R.DestinationBaseType = @DestinationObjectType" + CRLF +
"LEFT OUTER JOIN [Namespace Properties] NP" + CRLF +
"ON N.id = NP.Namespace_ID AND NP.Property = 'type'";

The code above concats a number of string literals to produce a SQL query. I find this more readable at design time, and runtime.

Will the C# compiler optimize this to produce a single string? What if I change the code above as follows:

string CRLF = Environment.NewLine;

Can anyone help?
Left by Saajid on Jan 18, 2008 2:04 AM

# re: String concatenation the fast way...maybe not what you think
When we did some tests, the C# compiler did turn multiple string literals (and consts) into a single string literal, avoiding any concatenation at runtime.
Left by Rob on Oct 07, 2008 2:20 AM
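
The distinction between constants and runtime values can be sketched as follows. The names here are made up for illustration. The key assumption, taken from the C# language rules, is that only literals and const values form a constant expression; a plain local variable, a parameter, or a property such as Environment.NewLine does not, so an expression containing one would be expected to concatenate at runtime.

```csharp
using System;

public static class LiteralFolding
{
    private const string Crlf = "\r\n"; // compile-time constant

    // Every operand is a literal or a const, so the whole expression
    // is a constant expression the compiler can emit as one literal.
    public static string AllConstants()
    {
        return "SELECT 1" + Crlf + "FROM T";
    }

    // A non-const operand (here a parameter) is not a constant expression,
    // so this concatenation is expected to happen at runtime.
    public static string WithRuntimeValue(string newline)
    {
        return "SELECT 1" + newline + "FROM T";
    }
}
```

Calling LiteralFolding.WithRuntimeValue(Environment.NewLine) gives the same text on a platform whose newline is "\r\n", but it is built at runtime. Inspecting the compiled methods with ildasm shows which form a particular compiler produced.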

# re: String concatenation the fast way...maybe not what you think
Now that IS worth knowing. If only we could be sure MS will always retain this optimisation in future versions...
Left by Dugeen on Mar 10, 2010 5:34 AM

# re: String concatenation the fast way...maybe not what you think
Wow, boiling it down to IL really lets you see how the framework does its magic. Great work here!
Left by Doyle on Mar 18, 2011 5:58 PM



Copyright © John Conwell | Powered by: GeeksWithBlogs.net