I see a lot of discussions on Stack Overflow about how to process CSV (comma-separated values) or other delimited files.
The answerers usually suggest solutions involving regular expressions, but there’s a simple solution built into the .NET framework: Microsoft.VisualBasic.FileIO.TextFieldParser.
I don’t know why it’s hidden in the Microsoft.VisualBasic namespace hierarchy – there’s nothing VB-specific about it.
It’s easy to use:
    // assumes: using Microsoft.VisualBasic.FileIO;
 1: using (var myCsvFile = new TextFieldParser(myFilePath))
 2: {
 3:     myCsvFile.TextFieldType = FieldType.Delimited;
 4:     myCsvFile.SetDelimiters(",");
 5:     myCsvFile.CommentTokens = new[] { "HEADER", "COMMENT: ", "TRAILER" };
 6:
 7:     while (!myCsvFile.EndOfData)
 8:     {
 9:         string[] fieldArray;
10:         try
11:         {
12:             fieldArray = myCsvFile.ReadFields();
13:         }
14:         catch (Microsoft.VisualBasic.FileIO.MalformedLineException ex)
15:         {
16:             // not a valid delimited line - log, terminate, or ignore
17:             continue;
18:         }
19:         // process values in fieldArray
20:     }
21: }
Line 1:
- There are several constructor overloads, so you can point it at a file path, a Stream, or a TextReader.
- TextFieldParser implements IDisposable, so you can use it in a “using” block to automatically close the file or stream when finished.
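To illustrate the TextReader overload and the “using” block, here is a minimal sketch that parses a single line held in memory. The class and method names (QuotedCsvSketch, ParseLine) are made up for this example. It also relies on HasFieldsEnclosedInQuotes, which as far as I know defaults to true, so a comma inside a quoted field is not treated as a delimiter.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedCsvSketch
{
    // Hypothetical helper: parses one comma-delimited line from a string.
    // Returns null if the input contains no data line.
    public static string[] ParseLine(string csvLine)
    {
        // StringReader is a TextReader, so no file on disk is needed.
        using (var parser = new TextFieldParser(new StringReader(csvLine)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Set explicitly for clarity; commas inside "..." stay in the field.
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }
}
```

Calling QuotedCsvSketch.ParseLine("42,\"Smith, John\",Boston") should give three fields, with “Smith, John” kept together as the second one. That is exactly the case where a naive split on commas goes wrong.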
Line 4: The SetDelimiters method is used to specify that our file is comma-delimited.
Line 5: The CommentTokens property is used to tell TextFieldParser to ignore header, trailer and comment lines (lines where the text begins with one of the specified strings).
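Here is a small sketch of what CommentTokens does in practice, using an in-memory sample instead of a real file. The sample data and the names CommentTokensSketch and ReadDataRows are invented for illustration; only the data lines should come back from ReadFields.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CommentTokensSketch
{
    // Hypothetical helper: returns only the data rows, skipping any line
    // that begins with one of the comment tokens.
    public static List<string[]> ReadDataRows(TextReader input)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.CommentTokens = new[] { "HEADER", "COMMENT: ", "TRAILER" };

            // EndOfData becomes true once only comment or blank lines remain.
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Feeding it a HEADER line, a “COMMENT: ” line, two data lines and a TRAILER line should yield just the two data rows.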
Line 12: The ReadFields method loads a string array with the “columns” from the file and advances to the next line. Other useful line-handling methods are:
- ReadLine, which gets the entire line into a string.
- PeekChars, which does a “non-destructive” read of a specified number of characters from the next line in the file: it doesn’t advance the file position, so the next ReadLine or ReadFields call is unaffected.
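One place PeekChars and ReadLine work well together is a file that mixes record types. The sketch below assumes a made-up layout in which detail records start with “D,” and everything else can be skipped; the names PeekCharsSketch and ReadDetailRecords are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class PeekCharsSketch
{
    // Hypothetical helper: keeps only lines that start with "D," and
    // discards every other line without parsing it.
    public static List<string[]> ReadDetailRecords(TextReader input)
    {
        var details = new List<string[]>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");

            while (!parser.EndOfData)
            {
                // Look at the first two characters without consuming the line.
                if (parser.PeekChars(2) == "D,")
                {
                    details.Add(parser.ReadFields()); // parse the detail record
                }
                else
                {
                    parser.ReadLine(); // consume and ignore the line
                }
            }
        }
        return details;
    }
}
```

With an input of one “H,” line, two “D,” lines and one “T,” line, this should return the two detail records, each including the leading “D” as its first field.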
Line 14: If a line isn’t in the expected delimited format, a MalformedLineException is thrown.
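If you would rather log the bad line than silently skip it, the parser’s ErrorLine property and the exception’s LineNumber property give you what you need. This is only a sketch: the names MalformedLineSketch and Parse are mine, and the “bad” input I have in mind is a quoted field followed by stray text, which in my experience triggers the exception.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class MalformedLineSketch
{
    // Hypothetical helper: returns the rows that parsed cleanly and appends
    // a message to 'errors' for each line that did not.
    public static List<string[]> Parse(TextReader input, List<string> errors)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");

            while (!parser.EndOfData)
            {
                try
                {
                    rows.Add(parser.ReadFields());
                }
                catch (MalformedLineException ex)
                {
                    // The bad line has already been consumed, so the loop
                    // simply carries on with the next line.
                    errors.Add("line " + ex.LineNumber + ": " + parser.ErrorLine);
                }
            }
        }
        return rows;
    }
}
```

For example, a three-line input whose second line is 2,"bad"x should produce two good rows and one error entry containing the offending text.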
TextFieldParser also handles “fixed-column” (bytes 1-10 = customer number, 11-30 = name, etc.) files (see the FieldWidths property).
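A fixed-column version of the layout just described might look like the sketch below. The third “rest of the line” field, the sample record and the names FixedWidthSketch and ParseRecord are my own assumptions; as I understand the documentation, a final width of zero or less means the last field is variable width.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class FixedWidthSketch
{
    // Hypothetical helper: splits one fixed-width record into
    // customer number (10 chars), name (20 chars) and the remainder.
    public static string[] ParseRecord(string record)
    {
        using (var parser = new TextFieldParser(new StringReader(record)))
        {
            parser.TextFieldType = FieldType.FixedWidth;
            // Columns 1-10, 11-30, then whatever is left on the line.
            parser.SetFieldWidths(10, 20, -1);
            // TrimWhiteSpace defaults to true; set here to make the
            // removal of padding explicit.
            parser.TrimWhiteSpace = true;
            return parser.ReadFields();
        }
    }
}
```

Passing "0000012345" + "Jane Doe".PadRight(20) + "Boston" should return the customer number, the name without its padding, and “Boston”. A record shorter than the fixed columns raises MalformedLineException, just as in the delimited case.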
If you need to process “flat files”, TextFieldParser can make the job a lot easier. C# developers, don’t dismiss it just because it’s hanging out in the VB namespace.