I see a lot of discussions on Stack Overflow about how to process CSV (comma-separated values) or other delimited files.

The answers usually suggest regular expressions, but there's a simpler solution built into the .NET Framework: Microsoft.VisualBasic.FileIO.TextFieldParser.

I don't know why it's hidden in the Microsoft.VisualBasic namespace hierarchy - there's nothing VB-specific about it.

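It's easy to use. In a C# project, reference the Microsoft.VisualBasic assembly and add a using directive for Microsoft.VisualBasic.FileIO: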

   1:  using (var myCsvFile = new TextFieldParser(myFilePath))
   2:  {
   3:      myCsvFile.TextFieldType = FieldType.Delimited;
   4:      myCsvFile.SetDelimiters(",");
   5:      myCsvFile.CommentTokens = new[] { "HEADER", "COMMENT:", "TRAILER" };
   6:   
   7:      while (!myCsvFile.EndOfData)
   8:      {
   9:          string[] fieldArray;
  10:          try
  11:          {
  12:              fieldArray = myCsvFile.ReadFields();
  13:          }
  14:          catch (Microsoft.VisualBasic.FileIO.MalformedLineException ex)
  15:          {
  16:              // not a valid delimited line - log, terminate, or ignore
  17:              continue;
  18:          }
  19:          // process values in fieldArray
  20:      }
  21:  }

Line 1:

  • There are several constructor overloads, so you can point it to a file path or a stream or TextReader.
  • TextFieldParser implements IDisposable, so you can use it in a "using" block to automatically close the file or stream when finished.
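
As a minimal sketch of the TextReader overload, the helper below parses comma-delimited text from any reader, which makes it easy to try the parser against an in-memory string. The class name CsvFromReaderSketch and the method ReadRows are made up for illustration; only TextFieldParser and its members come from the framework.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvFromReaderSketch
{
    // Hypothetical helper: reads every data line from the reader
    // and returns the fields of each line as a string array.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();

        // Disposing the parser at the end of the using block also closes the reader.
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }

        return rows;
    }
}
```

Calling ReadRows(new StringReader("1001,Alice\n1002,Bob\n")) should give two rows of two fields each.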

Line 4: The SetDelimiters method is used to specify that our file is comma-delimited.

Line 5: The CommentTokens property is used to tell TextFieldParser to ignore header, trailer and comment lines (lines where the text begins with one of the specified strings).
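
To see the effect of CommentTokens in isolation, here is a small sketch that parses an in-memory string containing header, comment and trailer lines. The class and method names (CommentTokenSketch, ReadDataRows) and the sample data are assumptions for illustration. The tokens in this sketch contain no whitespace, since the CommentTokens documentation lists an ArgumentException for tokens that include white space.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CommentTokenSketch
{
    // Hypothetical helper: returns only the data rows, skipping any line
    // that begins with one of the comment tokens.
    public static List<string[]> ReadDataRows(string text)
    {
        var rows = new List<string[]>();

        using (var parser = new TextFieldParser(new StringReader(text)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.CommentTokens = new[] { "HEADER", "COMMENT:", "TRAILER" };

            // EndOfData and ReadFields both skip blank and comment lines.
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }

        return rows;
    }
}
```

Given the text "HEADER,2024-01-01\n1001,Alice\nCOMMENT: ignore me\n1002,Bob\nTRAILER,2\n", only the 1001 and 1002 lines should come back as rows.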

Line 12: The ReadFields method loads a string array with the "columns" from the file and advances to the next line. Other useful line-handling methods are:

  • ReadLine, which gets the entire line into a string.
  • PeekChars, which reads a specified number of characters from the next line without advancing the parser, so it doesn't affect what the next ReadLine or ReadFields call returns.
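
One way to combine these methods, sketched under the assumption that a file may start with a line beginning with "HEADER": peek at the first characters, and only if they match, consume the raw line with ReadLine before parsing the remaining lines with ReadFields. PeekHeaderSketch, ReadWithHeader and the sample layout are hypothetical names chosen for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class PeekHeaderSketch
{
    // Hypothetical helper: returns the data rows and, through headerLine,
    // the raw header line if the text starts with one (otherwise null).
    public static List<string[]> ReadWithHeader(string text, out string headerLine)
    {
        headerLine = null;
        var rows = new List<string[]>();

        using (var parser = new TextFieldParser(new StringReader(text)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");

            // PeekChars looks at the next line without consuming it.
            if (parser.PeekChars(6) == "HEADER")
            {
                // ReadLine consumes the whole line as a single string.
                headerLine = parser.ReadLine();
            }

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }

        return rows;
    }
}
```

This is an alternative to listing "HEADER" in CommentTokens, useful when you want to keep the header text rather than discard it.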

Line 14: If a line isn't in the expected delimited format, a MalformedLineException is thrown.
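
If you choose to log bad lines rather than ignore them, the exception's LineNumber property and the parser's ErrorLine property give you something to report. The sketch below is one possible shape for that; MalformedLineSketch, ReadValidRows and the sample data are assumptions for illustration. The malformed line here has text directly after a closing quote, which the parser rejects when fields may be enclosed in quotes (the default).

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class MalformedLineSketch
{
    // Hypothetical helper: returns the rows that parsed cleanly and appends
    // a short description of each rejected line to badLines.
    public static List<string[]> ReadValidRows(string text, List<string> badLines)
    {
        var rows = new List<string[]>();

        using (var parser = new TextFieldParser(new StringReader(text)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");

            while (!parser.EndOfData)
            {
                try
                {
                    rows.Add(parser.ReadFields());
                }
                catch (MalformedLineException ex)
                {
                    // ErrorLine holds the text of the line that caused the exception.
                    badLines.Add(ex.LineNumber + ": " + parser.ErrorLine);
                }
            }
        }

        return rows;
    }
}
```

With the input "1001,Alice\n1002,\"Bob\"x\n1003,Carol\n", the first and third lines should parse and the second should end up in badLines.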

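TextFieldParser also handles "fixed-column" files (characters 1-10 = customer number, 11-30 = name, etc.): set TextFieldType to FieldType.FixedWidth and specify the column widths with the FieldWidths property or the SetFieldWidths method.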
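
A minimal fixed-width sketch, assuming the layout described above (a 10-character customer number followed by a 20-character name). FixedWidthSketch and ReadCustomers are hypothetical names. Note that a line shorter than the total of the widths is treated as malformed, so the sample lines need to be padded to the full 30 characters.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class FixedWidthSketch
{
    // Hypothetical helper: splits each line into a 10-character customer
    // number and a 20-character name.
    public static List<string[]> ReadCustomers(string text)
    {
        var rows = new List<string[]>();

        using (var parser = new TextFieldParser(new StringReader(text)))
        {
            parser.TextFieldType = FieldType.FixedWidth;
            parser.SetFieldWidths(10, 20);

            // TrimWhiteSpace defaults to true, so the padding is removed
            // from each field value.
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }

        return rows;
    }
}
```

For example, a line built as "1001".PadRight(10) + "Alice".PadRight(20) should come back as the two fields "1001" and "Alice".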

If you need to process "flat files", TextFieldParser can make the job a lot easier. C# developers, don't dismiss it just because it's hanging out in the VB namespace.