Evan Koch

Musings on BizTalk Server 2006 and SQL Server 2005
posts - 20, comments - 31, trackbacks - 0

My Links

News

Archives

SSIS and Unstructured Flat Files

Recently I worked with another consultant on trying to import data from a flat file into a SQL Server 2005 database.  There were issues loading the file using the SSIS Import and Export Wizard, however, because not all of the rows in the file had the same number of columns.  This wouldn't have been an issue using the SQL Server 2000 DTS Import but this functionality has been modified in the new version of SQL Server.  Below is the best answer I could find as to why:

 The simple answer to the question "Why has this been changed in SSIS?" can be summed up in one word. Performance. In order that DTS could cope with files like this it would parse each row twice; once to find the row delimiter and a second time to find all the columns in the row. In SSIS that's no longer necessary; because the file is expected to be in a certain format SSIS doesn't worry that the row delimiter might be in the wrong place hence each row is only parsed once. Obviously then doing it the SSIS way results in a lot less work and thus the file can be loaded a lot quicker.

So let's assume we have a flat file that contains the following rows:

0,1,2
1,2,3,4,5,6,7
1,2,3,4,5
a,b,c,d,e,f,g

Attempting to load the file through SSIS Import Wizard and previewing the columns:

Not quite what we were expecting.  If we were to try the same in SQL 2000, we'd see:

So to get this file loaded properly in SSIS, we need to develop our own package with a Script Component.  Below is the layout for the Data Flow:

Here's the flat file connection manager.  Note that I'm not using a columnn delimiter - the approach here is to load all the data for each row into its own column and break it apart in the Script Component.

Within the properties for the Script Component, we have one input column (Column0) and create one output column for each possible column in the file.

The code within the Scripting Component (which I shamelessly ripped from one of the blogs mentioned at the end of this post) is where the input column is broken up into the separate values and assign the output column values.

Imports System

Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain

    Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        Row.Col1 = Tokenise(Row.Column0, ",", 1)

        Row.Col2 = Tokenise(Row.Column0, ",", 2)

        Row.Col3 = Tokenise(Row.Column0, ",", 3)

        Row.Col4 = Tokenise(Row.Column0, ",", 4)

        Row.Col5 = Tokenise(Row.Column0, ",", 5)

        Row.Col6 = Tokenise(Row.Column0, ",", 6)

        Row.Col7 = Tokenise(Row.Column0, ",", 7)

    End Sub

    'Private function that parses out the columns of the whole row

    Private Function Tokenise(ByVal input As String, ByVal delimiter As String, ByVal token As Integer) As String

        Dim tokenArray As String()

        tokenArray = input.Split(delimiter.ToCharArray) 'Split the string by the delimiter

        If tokenArray.Length < token Then 'Protect against a request for a token that doesn't exist

            Return ""

        Else

            Return tokenArray(token - 1)

        End If

    End Function

End Class

 

 I've created the table beforehand that will contain the data within SQL and set that as my destination.

So all that remains is to execute the data flow task.  After it has successfully run, we can check the destination table:

Source code for this example can be found here.

 

Print | posted on Wednesday, September 19, 2007 6:30 AM |

Feedback

Gravatar

# re: SSIS and Unstructured Flat Files

It's not publicly available, but last year a team I was on put together a pretty impressive "jagged" file reader using a script task just like this. Each record a single header row and (N > 0) secondary data rows. Each row could have any number of columns, as long as it was greater than a previously agreed on X. Each row could be thrown out based on data within. Also, in each file there were multiple record sets and each set had a header and footer row with verification data about the records between. We initially avoided this approach but the script task ran surprisingly fast and could parse hundreds of thousands of rows in minutes.
9/19/2007 10:54 AM | Brian
Gravatar

# re: SSIS and Unstructured Flat Files

Praveen,
Try updating the code in the scripting component to use a delimiter of a space rather than a comma. If you have any more problems, let me know and I'll post some code to demonstrate this.

Evan
3/30/2008 11:40 AM | Evan Koch
Gravatar

# re: SSIS and Unstructured Flat Files

Hi i ahve the sam ekind of file. in my case i have vertical bar "|" as column delimeter. but i am having problem at my script task. I have validation error. Error : 30456 col1 is not a member of script_compnent_............................................

if u have put all the underlying details in it it would have been much help

Thanks
4/1/2008 4:19 PM | Gyanendra
Gravatar

# re: SSIS and Unstructured Flat Files

I'm missing something - row.col1, etc is not available since it is an output column.
When I follow the steps above, the script does not have access to
row.col1 = tokenise(row.column0,",",1)

Am I missing some information?
5/15/2008 10:19 AM | rob
Gravatar

# re: SSIS and Unstructured Flat Files

Hi,
I need to import a flat file into Tables in SQL server and export them back to a single flat file from tables.All these are done based on user requests.Can we use import/export wizard for this process?Is that possible to trigger import and export in SSIS from java code?
6/17/2008 1:39 AM | Jack
Gravatar

# re: SSIS and Unstructured Flat Files

When you add the script component select the Transform option.
8/24/2008 12:18 AM | dfwdba
Gravatar

# re: SSIS and Unstructured Flat Files

I recently needed to do this and used a slightly differrent pattern. Essentially, a different flat file connection manager for each format. Useful when you are creating an output for mainframe.
9/16/2008 1:37 PM | Vijay
Gravatar

# re: SSIS and Unstructured Flat Files

Sorry the link to my entry would be helpful...
http://www.duckfeet.net/geonewscommunity/Home/tabid/36/EntryID/6/Default.aspx
9/16/2008 1:46 PM | Vijay
Gravatar

# re: SSIS and Unstructured Flat Files

I tried it and it worked fine but if I have a longer input column0, script component still shows size 50 which is gray I can't change it. and I get truncate error.

So if I have 0,1,2,3,4,5,6666,666,77,888,88,9,9,900 instead of 0,1,2 I get tdata truncate error.

How can I increse its size.

Thanks,


3/12/2009 7:29 PM | amy
Gravatar

# re: SSIS and Unstructured Flat Files

nevermind. I think I need to select columno(input) 's datatype as DT_Text

Thanks,
3/13/2009 2:32 PM | amy
Gravatar

# re: SSIS and Unstructured Flat Files

nope that didn't work. What do you suggest , if input column0 is too long(I have 100 fields to store in one input column and then split it)
3/13/2009 2:58 PM | amy
Gravatar

# re: SSIS and Unstructured Flat Files

That code was pretty helpful. I had somthing simular, I had some data extract from a legacy OLTP system which came out like this 123#1234567.AL. I needed to break it apart and place in columns of their own. I knew how to do this in java code with the string tokeniser, but not in VB. Thanks for post
3/19/2009 9:12 AM | raymond
Gravatar

# re: SSIS and Unstructured Flat Files

How can we do the reverse procidure. I need to export data from 6 different tables to the flat dilimited file with record identifier. Each table has 100 fields

Thanks for your help.
3/19/2009 6:19 PM | amy
Gravatar

# re: SSIS and Unstructured Flat Files

Hi
I need to extract a data from the below format ...

INPUT IS IN .TXT FILE IN BELOW FORMAT

********************
ABC
----------------------------
CDE
********************
GHJ
----------------------------

OUTPUT IN BELOW FORMAT

COLUMN A COLUMN B
ABC CDE
GHJ NULL


THANKS IN ADVANCE
4/24/2009 8:02 AM | mohamed idhris
Gravatar

# re: SSIS and Unstructured Flat Files

I have the same problem only that I do not have delimiters but a fixed width file.

How do I go about this with a fixed width unstructured file?
5/11/2009 9:47 AM | mukami
Gravatar

# re: SSIS and Unstructured Flat Files

How do I parse a flat file to get:
- the filename and date on the 1st line
- BOF on 2nd line
- column headers on the 3rd line
- .... record details
- EOF
- Number of records: 999 on the last line?
9/22/2009 6:29 AM | George
Post A Comment
Title:
Name:
Email:
Website:
Comment:
Verification:
 

Powered by: