Geeks With Blogs
Rahul Anand's Blog
If my mind can conceive it, and my heart can believe it, I know I can achieve it.

Coming from a C# background, I assumed that split() on a string would always return an array with (delimiter count + 1) elements, no matter what. That assumption cost me a few hours of debugging over a simple issue!

Java's split() behaves differently, and its behavior can be tuned through an optional second parameter, limit.

The single-argument split(regex) behaves exactly like split(regex, 0). A limit of 0 means trailing empty strings are discarded from the result.

[CODE]

String line = "A|B|C||||";
String [] fields = line.split("\\|");
System.out.println(fields.length);

[/CODE]

The output is 3.

This is the same as the following code:

[CODE]

String line = "A|B|C||||";
String [] fields = line.split("\\|", 0);
System.out.println(fields.length);

[/CODE]
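Note that only the trailing run of empty strings is discarded. Empty fields at the start or in the middle of the line are kept, and a line made only of delimiters yields an empty array. A small sketch to check this (the class name TrailingEmptyDemo is just for illustration):

```java
import java.util.Arrays;

public class TrailingEmptyDemo {
    public static void main(String[] args) {
        // Leading and interior empty strings survive; only the trailing one is dropped.
        String[] fields = "|A||B|".split("\\|");
        System.out.println(fields.length);            // 4
        System.out.println(Arrays.toString(fields));  // [, A, , B]

        // Every field is empty, so nothing is left once trailing empties are removed.
        System.out.println("||||".split("\\|").length); // 0
    }
}
```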

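If you are wondering why the delimiter is written as “\\|” instead of “|”: the first parameter of split() is a regular expression, and ‘|’ is a regex metacharacter (alternation), so it must be escaped to match a literal pipe. The backslash itself must also be escaped inside a Java string literal, hence the double backslash.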
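Left unescaped, “|” is an empty alternation that matches the empty string at every position, so split("|") cuts the line between characters rather than at the pipes. Besides “\\|”, a character class or Pattern.quote() also gives a literal delimiter. The sketch below (class name chosen for illustration) compares the three:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class EscapeDelimiterDemo {
    public static void main(String[] args) {
        String line = "A|B|C";

        // Three equivalent ways to split on a literal '|':
        String[] escaped   = line.split("\\|");              // backslash-escape the metacharacter
        String[] charClass = line.split("[|]");              // '|' is literal inside a character class
        String[] quoted    = line.split(Pattern.quote("|")); // quote an arbitrary literal delimiter

        System.out.println(Arrays.toString(escaped)); // [A, B, C]
        System.out.println(Arrays.equals(escaped, charClass) && Arrays.equals(escaped, quoted)); // true
    }
}
```

Pattern.quote() is probably the safest choice when the delimiter comes from a variable or a configuration file, since you don't need to know which characters are special.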

The documentation of split() says it all:

public String[] split(String regex, int limit)

Splits this string around matches of the given regular expression.

The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string. The substrings in the array are in the order in which they occur in this string. If the expression does not match any part of the input then the resulting array has just one element, namely this string.

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
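To see the limit cases from the documentation side by side, here is a small sketch using the same input string (the class name SplitLimitDemo is just for illustration):

```java
import java.util.Arrays;

public class SplitLimitDemo {
    public static void main(String[] args) {
        String line = "A|B|C||||";

        // limit > 0: pattern applied at most limit - 1 times; the last element keeps the rest.
        System.out.println(Arrays.toString(line.split("\\|", 2))); // [A, B|C||||]

        // limit == 0 (same as split(regex)): trailing empty strings are discarded.
        System.out.println(line.split("\\|", 0).length);           // 3

        // limit < 0: every field is kept, including trailing empty strings.
        System.out.println(line.split("\\|", -1).length);          // 7

        // No match at all: the array holds just the original string.
        System.out.println(Arrays.toString("ABC".split("\\|")));   // [ABC]
    }
}
```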

In my case I passed “-1” (a negative limit) so that all elements are returned, including trailing empty strings. Note that 0 is also non-positive but, as shown above, it discards them.

[CODE]

String line = "A|B|C||||";
String [] fields = line.split("\\|", -1);
System.out.println(fields.length);

[/CODE]

The output is 7. But I soon realized that the default behavior exists for a reason: dropping trailing empty strings keeps the returned arrays smaller, which reduces memory usage when processing large volumes of data. With that in mind, I changed my code to check the length of the returned array before accessing an element, instead of assuming the array always has a fixed size. That change gave me a performance improvement of up to 1.5x.
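A minimal sketch of the length-check approach, assuming that a missing trailing field should be treated as an empty string. The helper name fieldAt is hypothetical and this is an illustration, not the original project code:

```java
public class SafeFieldAccess {
    // Hypothetical helper: returns the field at `index`, or "" when the index is
    // past the end of the array (e.g. split() dropped a trailing empty string).
    static String fieldAt(String[] fields, int index) {
        return index < fields.length ? fields[index] : "";
    }

    public static void main(String[] args) {
        String[] fields = "A|B|C||||".split("\\|"); // length 3, not 7
        for (int i = 0; i < 7; i++) {
            // Plain fields[i] would throw ArrayIndexOutOfBoundsException for i >= 3.
            System.out.println(i + ": '" + fieldAt(fields, i) + "'");
        }
    }
}
```

This keeps the compact array from the default split() while still letting callers ask for any column position.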

Reference:

http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#split(java.lang.String, int)

Posted on Tuesday, April 2, 2013 10:39 AM | Java


Comments on this post: Java behaves differently with string split compared to C#



Copyright © Rahul Anand | Powered by: GeeksWithBlogs.net