Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past Little Wonders posts can be found here.
This post continues a series of Little Wonders in the BCL String class. Yes, we all work with strings in .NET daily, so perhaps you already know most of these. However, there are a lot of little fun things that the String class can do that often get overlooked.
Today we are going to look at a pair of String method families for splitting and joining string data: Split() and Join().
Background
Many times when dealing with string data – especially data coming to or from files – we may need to divide a string based on a set of delimiters (like commas in CSV files) or join a sequence of strings together using a delimiter.
Many people know about the Split() method for busting apart strings, but fewer tend to know of its counterpart, the Join() method for putting a sequence of strings together. So let’s look at both.
Split() – split string into parts based on delimiters
Many people know about the Split() method, but it actually has some interesting options we’ll discuss for a bit before heading to Join().
The Split() method is useful for taking a string that is logically divided by delimiters and busting it into an array of the strings between the delimiters. The resulting array will have one entry for every string separated by the delimiters, and all the delimiters will be removed.
The delimiters to split can logically be one of several forms:
- An array of one character:
  - For example, ',' or new[] { ',' }, as in: "A,B,C,D,E,F"
- An array of many characters:
  - For example, ',', '-', '*' or new[] { ',', '-', '*' }, as in: "A,B-C,D*E,F"
- An array of one string:
  - For example, new[] { "=>" }, as in: "A=>B=>C=>D=>E=>F"
- An array of many strings:
  - For example, new[] { "=>", "<=" }, as in: "A=>B<=C=>D<=E=>F"
There are a couple of quick things to notice here. First, you typically pass an array to Split(), though with the overload that takes a params char[], you can pass the char(s) directly. Second, splitting on { '=', '>' } is not the same as splitting on { "=>" }. The former treats either '=' or '>' alone as a delimiter, whereas the latter only treats the two characters together, "=>", as a delimiter.
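To make the difference between char and string delimiters concrete, here is a small sketch that splits similar input both ways, and also splits on the two-string delimiter list from the last bullet above. The DelimiterDemo class and its method names are hypothetical, chosen just for illustration:

```csharp
using System;

public static class DelimiterDemo
{
    // { '=', '>' } treats EACH character as its own delimiter, so the adjacent
    // '=' and '>' in "=>" produce an empty entry between them.
    public static string[] SplitOnChars(string text) =>
        text.Split(new[] { '=', '>' });

    // { "=>" } only matches the two characters together, in that order.
    public static string[] SplitOnString(string text) =>
        text.Split(new[] { "=>" }, StringSplitOptions.None);

    // Several string delimiters may be given; any one of them ends an entry.
    public static string[] SplitOnStrings(string text) =>
        text.Split(new[] { "=>", "<=" }, StringSplitOptions.None);
}
```

For example, SplitOnChars("A=>B") yields three entries ("A", "", "B"), while SplitOnString("A=>B") yields just "A" and "B", and SplitOnStrings("A=>B<=C=>D<=E=>F") yields "A" through "F".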
So let’s look at these in action. When splitting on a single char:
```csharp
string testString = "James Hare,1001 Broadway Ave,St. Louis,MO,63101";

// can pass the array explicitly...
string[] results = testString.Split(new[] { ',' });

// or pass it implicitly, since it's a params array, as long as you are not
// using the overloads that take string[], int, or StringSplitOptions
results = testString.Split(',');
```
We’d get results split only on the ‘,’ character:
```text
Element 0: "James Hare"
Element 1: "1001 Broadway Ave"
Element 2: "St. Louis"
Element 3: "MO"
Element 4: "63101"
```
Whereas, if we split on both ',' and ' ' (space):
```csharp
string testString = "James Hare,1001 Broadway Ave,St. Louis,MO,63101";

// once again, you can pass the array of char explicitly
string[] results = testString.Split(new[] { ',', ' ' });

// or implicitly, since it's a params array, as long as you are not using
// the overloads that take string[], int, or StringSplitOptions
results = testString.Split(',', ' ');
```
We'd get the results split on both ',' and ' ' (space):
```text
Element 0: "James"
Element 1: "Hare"
Element 2: "1001"
Element 3: "Broadway"
Element 4: "Ave"
Element 5: "St."
Element 6: "Louis"
Element 7: "MO"
Element 8: "63101"
```
As you can imagine, Split() with string delimiters works much the same way, except that the entire delimiter string must appear intact:
```csharp
string testString = "James Hare,,1001 Broadway Ave,St. Louis,MO,63101";

// none of the string[] overloads have a params parameter, so the array
// of string delimiters must be explicit, even if there is only one.
string[] results = testString.Split(new[] { ",," }, StringSplitOptions.None);
```
Which yields:
```text
Element 0: "James Hare"
Element 1: "1001 Broadway Ave,St. Louis,MO,63101"
```
Notice that the string delimiter versions of Split() require a StringSplitOptions value. The char delimiter versions have overloads that accept one as well, and it gives you control over how back-to-back delimiters (that is, two delimiters with an empty string between them) are handled.
This enum has two defined values:
- StringSplitOptions.None: default, if an empty string would result between two delimiters, it is returned as an empty string in the result.
- StringSplitOptions.RemoveEmptyEntries: if an empty string would result between two delimiters, it is omitted from the result.
In other words, it lets you decide whether back-to-back delimiters produce an empty entry (string.Empty) or are simply skipped:
```csharp
string testString = "James Hare,,1001 Broadway Ave,,,St. Louis,MO,63101";

// Perform the split and keep empty entries.
// These overloads do not support a params array (a params parameter must be last).
string[] resultsWithEmpties = testString.Split(new[] { ',' },
    StringSplitOptions.None);

// perform the split and discard empty entries
string[] resultsWithoutEmpties = testString.Split(new[] { ',' },
    StringSplitOptions.RemoveEmptyEntries);
```
So notice, the first call with StringSplitOptions.None treats back-to-back delimiters as an empty entry, whereas StringSplitOptions.RemoveEmptyEntries does not have those empty entries in the results:
```text
With StringSplitOptions.None:
    Element 0: "James Hare"
    Element 1: ""
    Element 2: "1001 Broadway Ave"
    Element 3: ""
    Element 4: ""
    Element 5: "St. Louis"
    Element 6: "MO"
    Element 7: "63101"

With StringSplitOptions.RemoveEmptyEntries:
    Element 0: "James Hare"
    Element 1: "1001 Broadway Ave"
    Element 2: "St. Louis"
    Element 3: "MO"
    Element 4: "63101"
```
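RemoveEmptyEntries also pairs well with a documented behavior of the char[] overloads that the examples above don't show: if the separator array is null or contains no characters, Split() treats white-space characters as the delimiters. A minimal sketch for breaking text into words (the WhitespaceSplitDemo name is hypothetical):

```csharp
using System;

public static class WhitespaceSplitDemo
{
    // An empty separator array means "split on any white space"
    // (spaces, tabs, newlines, ...). RemoveEmptyEntries then drops the
    // empty entries that runs of consecutive white space would produce.
    public static string[] Words(string text) =>
        text.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries);
}
```

For example, Words("  James   Hare\tSt. Louis\n") yields "James", "Hare", "St.", and "Louis", with no empty entries from the leading, repeated, or trailing white space.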
Finally, there are forms of Split() for both char and string that let you limit the number of results returned to a maximum count of entries. In these forms, the first n-1 entries in the result are the first n-1 delimited strings, and the nth entry is the remainder of the string:
```csharp
string testString = "James Hare,,1001 Broadway Ave,,,St. Louis,MO,63101";

// returns at most 2 items. The first is the first delimited string, and the
// second is the rest. Again, no params array support since it's not the last arg.
string[] results = testString.Split(new[] { ',' }, 2, StringSplitOptions.None);

for (int i = 0; i < results.Length; i++)
{
    Console.WriteLine("\tElement {0}: \"{1}\"", i, results[i]);
}
```
Using the max count of 2 above, the first delimited entry is James Hare, and the second is the remainder of the string, as below:
```text
Element 0: "James Hare"
Element 1: ",1001 Broadway Ave,,,St. Louis,MO,63101"
```
Notice that only one delimiter was removed in this case. The comma immediately after "James Hare" was consumed as the delimiter ending the first entry, but everything after it is returned unchanged as the remainder, which here happens to begin with the second of the two back-to-back commas.
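A practical use of the max count is splitting a key=value line on the first delimiter only, so that the value may itself contain the delimiter. This is a sketch that assumes input shaped like "key=value"; the KeyValueDemo class and ParsePair method are hypothetical names, and the tuple return type needs C# 7 or later:

```csharp
using System;

public static class KeyValueDemo
{
    // Splits on the FIRST '=' only: with a max count of 2, any further
    // '=' characters stay inside the second (remainder) element.
    public static (string Key, string Value) ParsePair(string line)
    {
        string[] parts = line.Split(new[] { '=' }, 2);

        // With no '=' at all, Split() returns the whole string as one element.
        return parts.Length == 2 ? (parts[0], parts[1]) : (parts[0], string.Empty);
    }
}
```

For example, ParsePair("Url=http://example.com/?a=1&b=2") gives the key "Url" and the value "http://example.com/?a=1&b=2", with the '=' characters in the query string left intact.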
So as you can see, while you may have known about Split() in the basic sense, it has a lot of other options to explore for altering the way your split is performed.
Join() – joining multiple items together into one string
The counterpart of Split() is Join(), a static method of the String class that joins a sequence of items into one string, inserting a delimiter between each pair of items.
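Because the two are counterparts, joining and then splitting on the same delimiter usually gets you back where you started, with one caveat: it only round-trips if none of the parts contain the delimiter. A small sketch that checks this (RoundTripDemo and RoundTrips are hypothetical names):

```csharp
using System;
using System.Linq;

public static class RoundTripDemo
{
    // Joins the parts with the delimiter, splits the result on that same
    // delimiter, and reports whether the original parts came back unchanged.
    public static bool RoundTrips(string[] parts, char delimiter)
    {
        string joined = string.Join(delimiter.ToString(), parts);
        string[] back = joined.Split(delimiter);
        return back.SequenceEqual(parts);
    }
}
```

For example, { "Apple", "Orange" } round-trips on ',', but { "St. Louis, MO", "63101" } does not, because the comma inside the first part produces an extra entry when splitting.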
Now, I’ve actually met many people who knew about Split() but didn’t know there was a string.Join() static method that did the opposite, and so they end up writing code that looks like this:
```csharp
using System.Text;

string[] parts = { "Apple", "Orange", "Banana", "Pear", "Peach" };

var builder = new StringBuilder();
for (int i = 0; i < parts.Length; i++)
{
    builder.Append(parts[i]);

    // don't want a comma after the last item...
    if (i != parts.Length - 1)
    {
        builder.Append(", ");
    }
}

// result is "Apple, Orange, Banana, Pear, Peach"
var result = builder.ToString();
```
But there’s no need to do this, string.Join() already does this for you:
```csharp
string[] parts = { "Apple", "Orange", "Banana", "Pear", "Peach" };

// result is "Apple, Orange, Banana, Pear, Peach"
var result = string.Join(", ", parts);
```
The logic is all written for you! Your delimiter can be any string you want. If you want them all concatenated with no delimiter, you can pass String.Empty or null as the delimiter.
```csharp
// result is "AppleOrangeBananaPearPeach"
var result = string.Join("", parts);
```
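To check the claim that a null delimiter behaves like String.Empty, here is a quick sketch (NoDelimiterDemo and its method names are hypothetical). All three methods should return the same concatenation, since string.Concat() also joins the parts with nothing in between:

```csharp
using System;

public static class NoDelimiterDemo
{
    // A null separator is treated as String.Empty by string.Join().
    public static string JoinWithNull(string[] parts) => string.Join(null, parts);

    // Explicit empty separator: same result.
    public static string JoinWithEmpty(string[] parts) => string.Join(string.Empty, parts);

    // string.Concat() concatenates without any separator at all.
    public static string ConcatAll(string[] parts) => string.Concat(parts);
}
```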
Also, many people don’t realize that Join() can create a joined string of sequences of any type. That is, you can String.Join() sequences of int, DateTime, double, or any other custom type!
So what happens if you do that? If you call string.Join() on any non-string sequence, it will invoke the ToString() method on each item in the sequence. This, of course, only works well if the type of item in the sequence has a meaningful ToString() defined – though most of the BCL types do, and you can provide a meaningful one for any custom types you create.
For example:
```csharp
using System.Linq;

// On a sequence of ints ==> "1,2,3,4,5,6,7,8,9,10"
var numsFromOneToTen = string.Join(",", Enumerable.Range(1, 10));

// On a sequence of objects ==> "1-3.1415927-9/8/2011 4:20:32 PM" -- (at the time I am writing this)
var variousObjects = string.Join("-", new object[] { 1, 3.1415927, DateTime.Now });
```
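As mentioned, you can give your own types a meaningful ToString() so that they join nicely. A sketch with a hypothetical GridPoint type, used only for illustration:

```csharp
using System;

// Hypothetical custom type; string.Join() calls its ToString() override.
public sealed class GridPoint
{
    public int X { get; }
    public int Y { get; }

    public GridPoint(int x, int y)
    {
        X = x;
        Y = y;
    }

    // The representation string.Join() will use for each element.
    public override string ToString() => $"({X}, {Y})";
}

public static class CustomJoinDemo
{
    public static string JoinPoints(GridPoint[] points) => string.Join(" -> ", points);
}
```

For example, joining new GridPoint(1, 2) and new GridPoint(3, 4) this way produces "(1, 2) -> (3, 4)". Without the override, each element would fall back to object.ToString(), which just returns the type name.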
Finally, Join() has overloads that take IEnumerable<T>, object[], and string[]. The object[] and string[] forms are variable argument lists (params), which means you don't have to pass an explicit array; you can simply list the items to join in the call. Just keep in mind that the first argument is the delimiter, and all the other arguments form the implicit array of items to join:
```csharp
// You can pass in the string[] as a variable argument list. Just remember
// that the first argument is the delimiter, and the others are the list.
// This yields ==> "A,B,C,D,E"
var letters = string.Join(",", "A", "B", "C", "D", "E");

// This is also true of object[] as a variable argument list. So if
// your types are mixed, it will use the params object[] form, which
// calls ToString() on each one, same as if the array were passed explicitly.
// This yields ==> "1-3.1415927-9/8/2011 4:20:32 PM" -- (at the time I am writing this)
var variousObjects = string.Join("-", 1, 3.1415927, DateTime.Now);
```
So as you can see, string.Join() can be very useful in summarizing a sequence of any type, provided the ToString() representation is what you want.
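When ToString() isn't the representation you want, or when it varies by culture (as with the DateTime output above), one option is to project each item to a string yourself, for example with LINQ's Select(), and join the results. A sketch, where FormattedJoinDemo and its method names are hypothetical:

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class FormattedJoinDemo
{
    // Format each double with two decimals, independent of the current culture.
    public static string JoinPrices(double[] prices) =>
        string.Join(", ", prices.Select(p => p.ToString("F2", CultureInfo.InvariantCulture)));

    // Format each date as yyyy-MM-dd instead of the culture's default format.
    public static string JoinDates(DateTime[] dates) =>
        string.Join(", ", dates.Select(d => d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)));
}
```

For example, JoinPrices(new[] { 1.5, 2.25, 10.0 }) returns "1.50, 2.25, 10.00" regardless of the machine's regional settings.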
Summary
So, in conclusion, if you need to break apart a string into delimited entries, or join together a sequence of items into a delimited string, check out Split() and Join(). Beyond their basic uses, they both have features that make them quite useful.
Posted on Thursday, September 8, 2011 7:21 PM | Filed Under [ My Blog, C#, Software, .NET, Little Wonders ]