RegEx for CSV

After much searching, this is the best RegEx I can find for splitting a line of text from a CSV file:
(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)

I found it here: http://thedotnet.com/howto/work213583.aspx

Here is the magical working code:

      protected virtual string[] SplitCSV(string line)
      {
         RegexOptions options = RegexOptions.IgnorePatternWhitespace
            | RegexOptions.Multiline
            | RegexOptions.IgnoreCase;
         Regex reg = new Regex("(?:^|,)(\\\"(?:[^\\\"]+|\\\"\\\")*\\\"|[^,]*)", options);
         MatchCollection coll = reg.Matches(line);
         string[] items = new string[coll.Count];
         int i = 0;
         foreach(Match m in coll)
         {
            items[i++] = m.Groups[0].Value.Trim('"').Trim(',').Trim('"').Trim();
         }
         return items;
      }

Posted on Saturday, September 04, 2004 8:15 AM

Feedback

# re: RegEx for CSV

thou are god
thank you
6/24/2005 4:18 PM | bob the coder

# re: RegEx for CSV

I can't get this regex to work if the CSV looks like this...

Title,Price,Description
HelloWorld,"10,00",Desc

I get an array length of 4 for the second line.
11/4/2005 1:47 AM | Steven

# re: RegEx for CSV

I changed it a little to be more 'up-to-date' for Java and to work in any case.

static String[] splitCSV( String line ) {
    // java.util.ArrayList<String> elements = new java.util.ArrayList<String>(); // JAVA >=1.5
    java.util.ArrayList elements = new java.util.ArrayList(); // JAVA <=1.4
    java.util.regex.Matcher m = java.util.regex.Pattern.compile( "(?:^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)" ).matcher( line );
    while( m.find() ) {
        elements.add( m.group()
            .replaceAll( "^,", "" ) // remove first comma if any
            .replaceAll( "^?\"(.*)\"$", "$1" ) // remove outer quotations if any
            .replaceAll( "\"\"", "\"" ) ); // replace double inner quotations if any
    }
    return (String[])elements.toArray( new String[0] );
}
3/12/2007 3:35 AM | Mihi
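
On Java 5 or later, the version above can be tightened by reading capture group 1 directly instead of stripping the comma and quotes with replaceAll. The following is only a minimal sketch, assuming one record per line with no embedded newlines; the class and method names (CsvSplitSketch, splitCsv) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CsvSplitSketch {
    // Same pattern as in the comment above: a field is either a quoted string
    // (with "" standing for an escaped quote) or a run of non-comma characters.
    // Group 1 is the field without its leading comma.
    private static final Pattern FIELD =
            Pattern.compile("(?:^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");

    static List<String> splitCsv(String line) {
        List<String> fields = new ArrayList<String>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            String field = m.group(1);
            // Unwrap a quoted field and collapse doubled quotes inside it.
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1).replace("\"\"", "\"");
            }
            fields.add(field);
        }
        return fields;
    }

    public static void main(String[] args) {
        // The line reported earlier in the comments as splitting into 4 items.
        System.out.println(splitCsv("HelloWorld,\"10,00\",Desc"));
    }
}
```

With this sketch the quoted price stays a single field. One limitation appears to carry over from the pattern itself: a line that begins with an empty field (for example ",a") yields an empty match at position 0, after which the matcher moves past the comma and the next field is lost.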

# re: RegEx for CSV

Doesn't work if you have a comma in your value.
7/19/2008 3:48 PM | cheezus

# re: RegEx for CSV

Hi
I've found a problem with the Regex used in the following code:


System.Text.RegularExpressions.RegexOptions options =
((System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace |
System.Text.RegularExpressions.RegexOptions.Multiline) |
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
Regex reg = new Regex("(?:^|,)(\\\"(?:[^\\\"]+|\\\"\\\")*\\\"|[^,]*)", options);
MatchCollection coll = reg.Matches(line);
string[] items = new string[coll.Count];
int i = 0;
foreach(Match m in coll)
{
items[i++] = m.Groups[0].Value.Trim('"').Trim(',').Trim('"').Trim();
}

This code splits a line of CSV. Specifically, it gets stuck in the Regex method Matches, which takes around 10 minutes when I pass in the following CSV line:

string line = "C,\"123 PSST, MASSACHUSETTS,5245352,3343432";



Any help would be appreciated.

Thank you.
3/4/2009 1:29 AM | SandeepT
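
The slowdown described here is most likely catastrophic backtracking. The sample line contains an opening quote that is never closed, and the nested quantifiers in (?:[^"]+|"")* let the engine try every way of splitting the remaining text before it gives up on the quoted alternative. One way to avoid that, sketched below in Java, is to make the quantifiers possessive so the engine never revisits what they consumed (.NET offers atomic groups, (?>...), for the same purpose). The class and method names are hypothetical, and the sketch makes no decision about how a malformed field should be reported: it simply falls through to the unquoted alternative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PossessiveCsvSketch {
    // The ++ and *+ quantifiers are possessive: once they have consumed text
    // they never give it back, so a missing closing quote fails fast instead
    // of triggering exponential backtracking.
    private static final Pattern FIELD =
            Pattern.compile("(?:^|,)(\"(?:[^\"]++|\"\")*+\"|[^,]*)");

    // Returns the raw fields (group 1): leading comma removed, quotes kept.
    static List<String> rawFields(String line) {
        List<String> fields = new ArrayList<String>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.add(m.group(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        // The line from the comment above, with its unterminated quote.
        String line = "C,\"123 PSST, MASSACHUSETTS,5245352,3343432";
        for (String field : rawFields(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

On that line the quoted alternative fails, so the second field comes back with its stray opening quote and the text after the next comma becomes a separate field. Dropping the inner + altogether, as the Java version earlier in the comments does with [^\"], removes the nested quantifier in a different way.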

# re: RegEx for CSV

For a personal project, I'm using this one which I thought up:

("(?:[^\\"]|\\.)*")\s*($|,)

It's not exactly a pure CSV file regexp, but it works for the purpose I designed it for. It matches everything between a pair of double quotes in a CSV file, including escaped double quotes. You can use commas in the values, and empty values are ignored. If it's fed incorrect data such as [ "value 1","value 2" error here "value 3", "value 4" ], it will ignore the second value and return the last usable quoted string before the comma.

The quoted content is returned in the first (and only) matching group, stripped of remaining leading and trailing whitespace (and newlines).

There might still be a few errors lurking in this regexp, but it works 100% for me.
4/8/2009 10:13 AM | R. Hanouwer

# re: RegEx for CSV

With regard to my previous reply:

Add a "?:" in the last matching group to correct the second matching group appearing in the resultset. I copied the wrong regexp from my text editor. Sorry for that--my bad.

This is the (full) correct regexp:

("(?:[^\\"]|\\.)*")\s*(?:$|,)
4/8/2009 10:17 AM | R. Hanouwer
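
To make the behaviour of the corrected regexp concrete, here is a small Java sketch that applies it to the malformed sample from the earlier reply. The class and method names (QuotedFieldSketch, quotedFields) are invented for illustration. Note that group 1, as the pattern is written, still includes the surrounding double quotes, and that escaped quotes here are backslash-escaped rather than doubled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedFieldSketch {
    // ("(?:[^\\"]|\\.)*")\s*(?:$|,) written as a Java string literal:
    // a quoted string (backslash escapes allowed), optional whitespace,
    // then either the end of the input or a comma.
    private static final Pattern QUOTED =
            Pattern.compile("(\"(?:[^\\\\\"]|\\\\.)*\")\\s*(?:$|,)");

    // Collects group 1 of every match, quotes included.
    static List<String> quotedFields(String line) {
        List<String> fields = new ArrayList<String>();
        Matcher m = QUOTED.matcher(line);
        while (m.find()) {
            fields.add(m.group(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"value 1\",\"value 2\" error here \"value 3\", \"value 4\"";
        for (String field : quotedFields(line)) {
            System.out.println(field);
        }
    }
}
```

For that sample the sketch collects "value 1", "value 3" and "value 4": the second value is skipped because it is not followed by a comma or the end of the line.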