Blog Stats
  • Posts - 100
  • Articles - 1
  • Comments - 166
  • Trackbacks - 0

 

SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File

I noticed that the question on how to skip or bypass a trailer record or a badly formatted/empty row in a SSIS package keeps coming back on the MSDN SSIS Forum.

I tried to figure out the reason why and after an extensive search inside the forum and outside it on the entire Web (using several search engines) I indeed found that it seems even thought there is a number of posts and articles on the topic none of them are employing the simplest and the most efficient technique. When I say efficient I mean the shortest time to solution for the fellow developers.

OK, enough talk. Let’s face the problem:

Typically a flat file (e.g. a comma delimited/CSV) needs to be processed (loaded into a database in most cases really). Oftentimes, such an input file is produced by some sort of an out of control, 3-rd party solution and would come in with some garbage characters and/or even malformed/miss-formatted rows.

One such example could be this imaginary file:

DirtyInputFile

As you can see several rows have no data and there is an occasional garbage character (1, in this example on row #7).

Our task is to produce a clean file that will only capture the meaningful data rows. As an aside, our output/target may be a database table, but for the purpose of this exercise we will simply re-format the source.

Let’s outline our course of action to start off:

  1. Will use SSIS 2005 to create a DFT;
  2. The DFT will use a Flat File Source to our input [bad] flat file;
  3. We will use a Conditional Split to process the bad input file; and finally
  4. Dump the resulting data to a new [clean] file.

Well, only four steps, let’s see if it is too much of work.

1: Start the BIDS and add a DFT to the Control Flow designer (I named it Process Dirty File DFT):

DFT

2, and 3:

2 and 3

I had added the data viewer to just see what I am getting, alas, surprisingly the data issues were not seen it:

DirtyFileView 

What really is the key in the approach it is to properly set the Conditional Split Transformation. Visually it is:

CondSplitSetting

and specifically its SSIS Expression

LEN([After CS Column 0]) > 1

The point is to employ the right Boolean expression (yes, the Conditional Split accepts only Boolean conditions).

For the sake of this post I re-named the Output Name “No Empty Rows”, but by default it will be named Case 1 (remember to drag your first column into the expression area)!

You can close your Conditional Split now. The next part will be crucial – consuming the output of our Conditional Split.

Last step - #4: Add a Flat File Destination or any other one you need. Click on the Conditional Split and choose the green arrow to drop onto the target. When you do so make sure you choose the No Empty Rows output and NOT the Conditional Split Default Output. Make the necessary mappings.

At this point your package must look like:

CompleteDFT

As the last step will run our package to examine the produced output file. F5:

Result

and… it looks great!


Feedback

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File

Gravatar I do the same but I get the same dirty file, so i wanna know how do you configure your flat file connection? format? (Ragged,width, delimied) row delimiter?

Thanks, 3/14/2011 7:40 PM | Mayra Zelada

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File

Gravatar Your flat file connection has to mirror the file format. It was comma delimited in my case. 3/15/2011 10:32 PM | Arthur

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File

Gravatar Thanks for this article - it helped me to find what I think is a more universal solution:

If one of your columns is of type int you cannot use LEN on this data type.

Another approach:

For "Flat File Source" component one needs to set RetainNulls to True (default value is False)

This causes values for columns with no data to be assigned NULL values (as opposed to empty string or 0 when RetainNulls is set to False)

Then one can check for NULL value in "Conditional Split" component. 1/24/2012 5:23 PM | Joe Schmoe

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File

Gravatar What does this mean? After CS?
If I try to use this the package errors out. 8/30/2012 4:22 PM | David S

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File

Gravatar 'After CS Column 0' is a column name. Can be any in your package.
And I am curious what error do you get (so I could troubleshoot it)? 9/24/2012 2:39 PM | Arthur

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File

Gravatar How to the rows to be ignored have more columns? 7/7/2014 4:52 PM | Jon Doe

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File

Gravatar Jon,

I can think of a solution to set the SSIS expression in the Split Component to count the number of commas

E.g. FINDSTRING([Column1],",",1) > 25 7/7/2014 8:56 PM | Arthur

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File

Gravatar Just wanted to take a moment to say thank you for putting this together. I am importing a flat file from business objects and it had the blank line at the bottom with just a carriage return.

Your post helped me solve this in just under a half hour or 1 cup of coffee.

Thank you again!! 9/26/2014 11:30 AM | Dennis Board

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File

Gravatar I am glad I was of help to you Dennis! 9/27/2014 5:06 PM | Arthur

Post A Comment
Title:
Name:
Email:
Comment:
Verification:
 

 

 

Copyright © Compudicted