I noticed that the question of how to skip or bypass a trailer record or a badly formatted/empty row in an SSIS package keeps coming back on the MSDN SSIS Forum.
I tried to figure out why. After an extensive search of the forum and of the wider Web (using several search engines), I found that even though there are a number of posts and articles on the topic, none of them employs the simplest and most efficient technique. By efficient I mean the shortest time to a solution for fellow developers.
OK, enough talk. Let’s face the problem:
Typically a flat file (e.g. a comma-delimited CSV) needs to be processed, which in most cases means loaded into a database. Oftentimes such an input file is produced by some out-of-control third-party solution and arrives with garbage characters and/or malformed rows.
One such example could be this imaginary file:
As you can see, several rows have no data and there is an occasional garbage character (one in this example, on row #7).
Our task is to produce a clean file that will only capture the meaningful data rows. As an aside, our output/target may be a database table, but for the purpose of this exercise we will simply re-format the source.
Let’s outline our course of action to start off:
- We will use SSIS 2005 to create a Data Flow Task (DFT);
- The DFT will use a Flat File Source pointing to our input [bad] flat file;
- We will use a Conditional Split to process the bad input file; and finally
- We will dump the resulting data to a new [clean] file.
Well, only four steps. Let’s see whether that is too much work.
1: Start BIDS and add a DFT to the Control Flow designer (I named it Process Dirty File DFT):
2 and 3: Add a Flat File Source for the input file and connect it to a Conditional Split:
I added a data viewer just to see what I was getting; surprisingly, the data issues were not visible in it:
The key to this approach is configuring the Conditional Split Transformation properly. Visually it is:
and specifically its SSIS Expression
LEN([After CS Column 0]) > 1
The point is to employ the right Boolean expression (yes, the Conditional Split accepts only Boolean conditions).
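For readers who think more easily in code than in SSIS expressions, here is a minimal Python sketch of what that condition does. It is only an analogy, not part of the package: the function name `keep_row` and the sample rows are made up for illustration, and a missing first column is assumed to behave like an empty string.

```python
def keep_row(first_column: str) -> bool:
    """Analogue of LEN([After CS Column 0]) > 1.

    A row is kept only when its first column is longer than one
    character, so empty rows and single garbage characters fail.
    """
    return len(first_column) > 1


# Hypothetical sample lines: two data rows, one empty row, one garbage row.
rows = ["1001,Widget,4", "", "x", "1002,Gadget,7"]

# Test the first comma-separated field of every line, as the split does.
kept = [row for row in rows if keep_row(row.split(",")[0])]

print(kept)
```

Note that the threshold is 1 rather than 0, which is what lets the same condition reject a lone garbage character as well as a truly empty row.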
For the sake of this post I re-named the Output Name “No Empty Rows”, but by default it will be named Case 1 (remember to drag your first column into the expression area)!
You can close your Conditional Split now. The next part will be crucial – consuming the output of our Conditional Split.
Last step, #4: Add a Flat File Destination, or any other destination you need. Click on the Conditional Split and drag its green arrow onto the destination. When you do so, make sure you choose the No Empty Rows output and NOT the Conditional Split Default Output. Then make the necessary column mappings.
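Why the choice of output matters can be sketched outside SSIS as well. The Python below is a rough analogy under stated assumptions: `conditional_split` is a hypothetical helper, the in-memory sample data stands in for the real input file, and the two dictionary keys simply reuse the output names from the designer. Rows that satisfy the condition go to the named output, and everything else falls through to the default output.

```python
import csv
import io


def conditional_split(rows):
    """Route each parsed row to one of two named outputs."""
    outputs = {"No Empty Rows": [], "Conditional Split Default Output": []}
    for row in rows:
        # csv.reader yields an empty list for a blank line.
        first = row[0] if row else ""
        if len(first) > 1:
            outputs["No Empty Rows"].append(row)
        else:
            outputs["Conditional Split Default Output"].append(row)
    return outputs


# Hypothetical dirty input: a blank line and a one-character garbage line.
source = io.StringIO("1001,Widget,4\n\n~\n1002,Gadget,7\n")
outputs = conditional_split(list(csv.reader(source)))

# Only the named output is wired to the destination.
clean = io.StringIO()
csv.writer(clean, lineterminator="\n").writerows(outputs["No Empty Rows"])

print(clean.getvalue())
```

Connecting the destination to the default output instead would write exactly the rows we are trying to get rid of.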
At this point your package should look like this:
As the last step, we will run our package and examine the output file it produces. Press F5:
and… it looks great!
Tuesday, February 15, 2011 4:57 PM