Geeks With Blogs

Arthur Zubarev Compudicted

I noticed that the question on how to skip or bypass a trailer record or a badly formatted/empty row in a SSIS package keeps coming back on the MSDN SSIS Forum.

I tried to figure out the reason why and after an extensive search inside the forum and outside it on the entire Web (using several search engines) I indeed found that it seems even thought there is a number of posts and articles on the topic none of them are employing the simplest and the most efficient technique. When I say efficient I mean the shortest time to solution for the fellow developers.

OK, enough talk. Let’s face the problem:

Typically a flat file (e.g. a comma delimited/CSV) needs to be processed (loaded into a database in most cases really). Oftentimes, such an input file is produced by some sort of an out of control, 3-rd party solution and would come in with some garbage characters and/or even malformed/miss-formatted rows.

One such example could be this imaginary file:

DirtyInputFile

As you can see several rows have no data and there is an occasional garbage character (1, in this example on row #7).

Our task is to produce a clean file that will only capture the meaningful data rows. As an aside, our output/target may be a database table, but for the purpose of this exercise we will simply re-format the source.

Let’s outline our course of action to start off:

  1. Will use SSIS 2005 to create a DFT;
  2. The DFT will use a Flat File Source to our input [bad] flat file;
  3. We will use a Conditional Split to process the bad input file; and finally
  4. Dump the resulting data to a new [clean] file.

Well, only four steps, let’s see if it is too much of work.

1: Start the BIDS and add a DFT to the Control Flow designer (I named it Process Dirty File DFT):

DFT

2, and 3:

2 and 3

I had added the data viewer to just see what I am getting, alas, surprisingly the data issues were not seen it:

DirtyFileView 

What really is the key in the approach it is to properly set the Conditional Split Transformation. Visually it is:

CondSplitSetting

and specifically its SSIS Expression

LEN([After CS Column 0]) > 1

The point is to employ the right Boolean expression (yes, the Conditional Split accepts only Boolean conditions).

For the sake of this post I re-named the Output Name “No Empty Rows”, but by default it will be named Case 1 (remember to drag your first column into the expression area)!

You can close your Conditional Split now. The next part will be crucial – consuming the output of our Conditional Split.

Last step - #4: Add a Flat File Destination or any other one you need. Click on the Conditional Split and choose the green arrow to drop onto the target. When you do so make sure you choose the No Empty Rows output and NOT the Conditional Split Default Output. Make the necessary mappings.

At this point your package must look like:

CompleteDFT

As the last step will run our package to examine the produced output file. F5:

Result

and… it looks great!

Posted on Tuesday, February 15, 2011 4:57 PM | Back to top


Comments on this post: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File
Requesting Gravatar...
I do the same but I get the same dirty file, so i wanna know how do you configure your flat file connection? format? (Ragged,width, delimied) row delimiter?

Thanks,
Left by Mayra Zelada on Mar 14, 2011 7:40 PM

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File
Requesting Gravatar...
Your flat file connection has to mirror the file format. It was comma delimited in my case.
Left by Arthur on Mar 15, 2011 10:32 PM

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File
Requesting Gravatar...
Thanks for this article - it helped me to find what I think is a more universal solution:

If one of your columns is of type int you cannot use LEN on this data type.

Another approach:

For "Flat File Source" component one needs to set RetainNulls to True (default value is False)

This causes values for columns with no data to be assigned NULL values (as opposed to empty string or 0 when RetainNulls is set to False)

Then one can check for NULL value in "Conditional Split" component.
Left by Joe Schmoe on Jan 24, 2012 5:23 PM

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File
Requesting Gravatar...
What does this mean? After CS?
If I try to use this the package errors out.
Left by David S on Aug 30, 2012 4:22 PM

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File
Requesting Gravatar...
'After CS Column 0' is a column name. Can be any in your package.
And I am curious what error do you get (so I could troubleshoot it)?
Left by Arthur on Sep 24, 2012 2:39 PM

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File
Requesting Gravatar...
How to the rows to be ignored have more columns?
Left by Jon Doe on Jul 07, 2014 4:52 PM

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File
Requesting Gravatar...
Jon,

I can think of a solution to set the SSIS expression in the Split Component to count the number of commas

E.g. FINDSTRING([Column1],",",1) > 25
Left by Arthur on Jul 07, 2014 8:56 PM

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File
Requesting Gravatar...
Just wanted to take a moment to say thank you for putting this together. I am importing a flat file from business objects and it had the blank line at the bottom with just a carriage return.

Your post helped me solve this in just under a half hour or 1 cup of coffee.

Thank you again!!
Left by Dennis Board on Sep 26, 2014 11:30 AM

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File
Requesting Gravatar...
I am glad I was of help to you Dennis!
Left by Arthur on Sep 27, 2014 5:06 PM

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File
Requesting Gravatar...
If in first Column data is null and in next column data is present then this criteria is it works?

For Ex:
col1,Col2,Col3
test1,test2,test3,
NULL,test2,test3
test1,test2,test3

is it insets 2nd row to table?
Left by Prashant Shelavadi on Jan 13, 2015 9:57 AM

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File
Requesting Gravatar...
Cool. Many thanks for this.
Left by Ravi on Apr 17, 2015 5:11 AM

# re: SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File
Requesting Gravatar...
A flat file (fixed width) has thousands of rows. Different rows have different number of columns. (Columns also have varying width). The first 4 digits of each of the row determine into which SQL server table that particular row should be loaded.

Example of FlatFile data:

1000AA USA<CR,LF>
1000AB CAN<CR,LF>
1000AC MEX<CR,LF>
2000PQR 1234 DIFFERENT RECORD LOAD<CR,LF>
2000QRS 1233 DIFF REC LOAD<CR,LF>

If Conditional Split is used, the first row starting with '2000' is being parsed in the last row starting with '1000'.

In the 'DataViewer', it appears as below:

1000AC MEX2000PQR 1234 DIFFERENT RECORD LOAD

From the second row starting with '2000', loading correctly into the appropriate table. But, the first row starting with '2000' is not getting loaded into the table.

If I manually remove the rows starting with '1000', then all rows starting with '2000' are being loaded correctly.

What is the best to remove the rows starting with '1000'?
Left by Chandler on May 04, 2015 9:06 PM

Your comment:
 (will show your gravatar)


Copyright © Compudicted | Powered by: GeeksWithBlogs.net | Join free