RegEx for CSV

After much searching, this is the best RegEx I can find for splitting a line of text from a CSV file:
(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)

I found it here: http://thedotnet.com/howto/work213583.aspx

Here is the magical working code:

      // Requires: using System.Text.RegularExpressions;
      protected virtual string[] SplitCSV(string line)
      {
         RegexOptions options = RegexOptions.IgnorePatternWhitespace
            | RegexOptions.Multiline
            | RegexOptions.IgnoreCase;
         Regex reg = new Regex("(?:^|,)(\\\"(?:[^\\\"]+|\\\"\\\")*\\\"|[^,]*)", options);
         MatchCollection coll = reg.Matches(line);
         string[] items = new string[coll.Count];
         int i = 0;
         foreach (Match m in coll)
         {
            // Groups[0] is the whole match, including any leading comma and the
            // surrounding quotes, so strip those before storing the field.
            items[i++] = m.Groups[0].Value.Trim('"').Trim(',').Trim('"').Trim();
         }
         return items;
      }
Posted on Saturday, September 04, 2004 8:15 AM

Feedback

# re: RegEx for CSV

left by bob the coder at 6/24/2005 4:18 PM
thou are god
thank you

# re: RegEx for CSV

left by Steven at 11/4/2005 1:47 AM
I can't get this regex to work if the CSV looks like this...

Title,Price,Description
HelloWorld,"10,00",Desc

I get an array length of 4 for the second line.

# re: RegEx for CSV

left by Mihi at 3/12/2007 3:35 AM
I changed it a little to be more 'up-to-date' for Java and to work in any case.

static String[] splitCSV( String line ) {
    // java.util.ArrayList<String> elements = new java.util.ArrayList<String>(); // JAVA >=1.5
    java.util.ArrayList elements = new java.util.ArrayList(); // JAVA <=1.4
    java.util.regex.Matcher m = java.util.regex.Pattern.compile( "(?:^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)" ).matcher( line );
    while( m.find() ) {
        elements.add( m.group()
            .replaceAll( "^,", "" ) // remove first comma if any
            .replaceAll( "^\"(.*)\"$", "$1" ) // remove outer quotations if any
            .replaceAll( "\"\"", "\"" ) ); // replace double inner quotations if any
    }
    return (String[])elements.toArray( new String[0] );
}

# re: RegEx for CSV

left by cheezus at 7/19/2008 3:48 PM
Doesn't work if you have a comma in your value.

# re: RegEx for CSV

left by SandeepT at 3/4/2009 1:29 AM
Hi
I've found a problem with the Regex used in the following code:


System.Text.RegularExpressions.RegexOptions options =
((System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace |
System.Text.RegularExpressions.RegexOptions.Multiline) |
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
Regex reg = new Regex("(?:^|,)(\\\"(?:[^\\\"]+|\\\"\\\")*\\\"|[^,]*)", options);
MatchCollection coll = reg.Matches(line);
string[] items = new string[coll.Count];
int i = 0;
foreach(Match m in coll)
{
items[i++] = m.Groups[0].Value.Trim('"').Trim(',').Trim('"').Trim();
}

This code splits a CSV line. Specifically, it gets stuck in the Regex method Matches, which takes around 10 minutes when I pass the following CSV line:

string line = "C,\"123 PSST, MASSACHUSETTS,5245352,3343432";



Any help will be appreciated.

Thank you.
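
A likely cause of the slowdown (an assumption, not something confirmed in this thread): the quote before 123 is never closed, and because `[^\"]+` sits inside a group that is itself repeated with `*`, the engine can split the run of non-quote characters in exponentially many ways before it gives up on the quoted alternative. Mihi's Java variant earlier in the thread uses `[^\"]` without the `+`, which leaves only one way to consume each character. Below is a minimal Java sketch of that variant on the same line; the class and method names (`UnterminatedQuoteDemo`, `rawFields`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern itself comes from the thread.
class UnterminatedQuoteDemo {
    // Mihi's variant: [^"] instead of [^"]+ inside the repeated group.
    static final Pattern FIELD =
        Pattern.compile("(?:^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");

    // Returns capture group 1 of every match, i.e. each field without its leading comma.
    static List<String> rawFields(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.add(m.group(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        // The line from the comment above: the quote before 123 is never closed.
        String line = "C,\"123 PSST, MASSACHUSETTS,5245352,3343432";
        for (String field : rawFields(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

With no closing quote the quoted alternative fails and the pattern falls back to `[^,]*`, so the line is split at every comma and the stray quote stays in the second field. That is a recovery from bad input, not a correct parse.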

# re: RegEx for CSV

left by R. Hanouwer at 4/8/2009 10:13 AM
For a personal project, I'm using this one which I thought up:

("(?:[^\\"]|\\.)*")\s*($|,)

It's not exactly a pure CSV file regexp, but it works for the purpose I designed it for. It matches everything between a pair of double quotes in a CSV file, including escaped double quotes. You can use commas in the values, and empty values are ignored. If it's fed incorrect data such as [ "value 1","value 2" error here "value 3", "value 4" ], it will ignore the second value and return the last usable quoted string before the comma.

The quoted content is returned in the first (and only) matching group, stripped of remaining leading and trailing whitespace (and newlines).

There might be a few errors still lurking in this regexp, but it works 100% for me.

# re: RegEx for CSV

left by R. Hanouwer at 4/8/2009 10:17 AM
With regard to my previous reply:

Add a "?:" in the last matching group to correct the second matching group appearing in the resultset. I copied the wrong regexp from my text editor. Sorry for that--my bad.

This is the (full) correct regexp:

("(?:[^\\"]|\\.)*")\s*(?:$|,)
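
To make the behaviour described in these two comments concrete, here is a small Java sketch that applies the corrected pattern to the malformed sample from the earlier comment. The class and method names (`QuotedValueDemo`, `quotedValues`) are illustrative assumptions; only the pattern comes from the comment. Note that group 1 wraps the quote characters, so the returned strings still include them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern itself comes from the comment.
class QuotedValueDemo {
    // ("(?:[^\\"]|\\.)*")\s*(?:$|,) written as a Java string literal.
    static final Pattern QUOTED =
        Pattern.compile("(\"(?:[^\\\\\"]|\\\\.)*\")\\s*(?:$|,)");

    // Collects group 1 of every match: a quoted string, quotes included,
    // that is followed by optional whitespace and then a comma or end of input.
    static List<String> quotedValues(String line) {
        List<String> values = new ArrayList<>();
        Matcher m = QUOTED.matcher(line);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "\"value 1\",\"value 2\" error here \"value 3\", \"value 4\"";
        for (String value : quotedValues(line)) {
            System.out.println(value);
        }
    }
}
```

On that sample the sketch yields "value 1", "value 3" and "value 4": the second value is dropped because text other than whitespace follows its closing quote before the next comma.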

# re: RegEx for CSV

left by ferdi tayfur dinle at 11/12/2009 7:33 AM
thou are go0d
thank you

# re: RegEx for CSV

left by selda bağcan dinle at 2/17/2010 5:32 AM
Thank you

# re: RegEx for CSV

left by yozgat chat at 2/18/2010 6:00 AM
Thank you

# re: RegEx for CSV

left by fugu at 8/26/2010 2:04 AM
Thanks for this and thanks @Mihi for the Java version!

# re: RegEx for CSV

left by Sebastián Rojas at 10/6/2011 6:00 PM
Thank u!!!!

# re: RegEx for CSV

left by Chris Marisic at 11/28/2011 9:51 AM
There's something broken in that regex. If you go to http://regexpal.com/ and test it with the string

,test,test2,test3,test4,test5

You'll see it skips matching the first test.
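
One plausible explanation: the first match is the empty string at position 0 (through the `^` branch, standing for the empty leading field), and an engine that resumes one character after an empty match, as Java's `Matcher.find()` does, steps past the leading comma and never sees `,test`. A way around this is to have each match consume the delimiter that follows a field rather than the one before it. The Java sketch below is one such variant, not the pattern from the post; `CsvLineSplitter`, `split` and the `\G`-anchored pattern are assumptions introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the code from the post.
class CsvLineSplitter {
    // \G anchors each match to the end of the previous one, so no text is skipped.
    // Group 1: body of a quoted field. Group 2: unquoted field.
    // Group 3: the delimiter after the field, "," or "" at the end of the line.
    private static final Pattern FIELD =
        Pattern.compile("\\G(?:\"((?:[^\"]|\"\")*)\"|([^,\"]*))(,|$)");

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        boolean reachedEnd = false;
        while (!reachedEnd && m.find()) {
            String quoted = m.group(1);
            // Doubled quotes inside a quoted field stand for one literal quote.
            fields.add(quoted != null ? quoted.replace("\"\"", "\"") : m.group(2));
            // An empty delimiter means this field ran up to the end of the line.
            reachedEnd = m.group(3).isEmpty();
        }
        if (!reachedEnd) {
            // Matching stopped early, e.g. at a quote that is never closed.
            throw new IllegalArgumentException("Malformed CSV line: " + line);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split(",test,test2,test3,test4,test5"));
        System.out.println(split("HelloWorld,\"10,00\",Desc"));
    }
}
```

On `,test,test2,test3,test4,test5` this returns six fields, the first of them empty, and `HelloWorld,"10,00",Desc` comes back as three fields with the comma kept inside `10,00`. A line with an unterminated quote raises an exception instead of being split silently.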