Geeks With Blogs
Evan Koch Musings on BizTalk Server and SQL Server

Recently I worked with another consultant on trying to import data from a flat file into a SQL Server 2005 database.  There were issues loading the file using the SSIS Import and Export Wizard, however, because not all of the rows in the file had the same number of columns.  This wouldn't have been an issue using the SQL Server 2000 DTS Import but this functionality has been modified in the new version of SQL Server.  Below is the best answer I could find as to why:

 The simple answer to the question "Why has this been changed in SSIS?" can be summed up in one word. Performance. In order that DTS could cope with files like this it would parse each row twice; once to find the row delimiter and a second time to find all the columns in the row. In SSIS that's no longer necessary; because the file is expected to be in a certain format SSIS doesn't worry that the row delimiter might be in the wrong place hence each row is only parsed once. Obviously then doing it the SSIS way results in a lot less work and thus the file can be loaded a lot quicker.

So let's assume we have a flat file that contains the following rows:

0,1,2
1,2,3,4,5,6,7
1,2,3,4,5
a,b,c,d,e,f,g

Attempting to load the file through SSIS Import Wizard and previewing the columns:

Not quite what we were expecting.  If we were to try the same in SQL 2000, we'd see:

So to get this file loaded properly in SSIS, we need to develop our own package with a Script Component.  Below is the layout for the Data Flow:

Here's the flat file connection manager.  Note that I'm not using a columnn delimiter - the approach here is to load all the data for each row into its own column and break it apart in the Script Component.

Within the properties for the Script Component, we have one input column (Column0) and create one output column for each possible column in the file.

The code within the Scripting Component (which I shamelessly ripped from one of the blogs mentioned at the end of this post) is where the input column is broken up into the separate values and assign the output column values.

Imports System

Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain

    Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        Row.Col1 = Tokenise(Row.Column0, ",", 1)

        Row.Col2 = Tokenise(Row.Column0, ",", 2)

        Row.Col3 = Tokenise(Row.Column0, ",", 3)

        Row.Col4 = Tokenise(Row.Column0, ",", 4)

        Row.Col5 = Tokenise(Row.Column0, ",", 5)

        Row.Col6 = Tokenise(Row.Column0, ",", 6)

        Row.Col7 = Tokenise(Row.Column0, ",", 7)

    End Sub

    'Private function that parses out the columns of the whole row

    Private Function Tokenise(ByVal input As String, ByVal delimiter As String, ByVal token As Integer) As String

        Dim tokenArray As String()

        tokenArray = input.Split(delimiter.ToCharArray) 'Split the string by the delimiter

        If tokenArray.Length < token Then 'Protect against a request for a token that doesn't exist

            Return ""

        Else

            Return tokenArray(token - 1)

        End If

    End Function

End Class

 

 I've created the table beforehand that will contain the data within SQL and set that as my destination.

So all that remains is to execute the data flow task.  After it has successfully run, we can check the destination table:

Source code for this example can be found here.

 

Posted on Wednesday, September 19, 2007 6:30 AM | Back to top


Comments on this post: SSIS and Unstructured Flat Files

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
It's not publicly available, but last year a team I was on put together a pretty impressive "jagged" file reader using a script task just like this. Each record a single header row and (N > 0) secondary data rows. Each row could have any number of columns, as long as it was greater than a previously agreed on X. Each row could be thrown out based on data within. Also, in each file there were multiple record sets and each set had a header and footer row with verification data about the records between. We initially avoided this approach but the script task ran surprisingly fast and could parse hundreds of thousands of rows in minutes.
Left by Brian on Sep 19, 2007 10:54 AM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
Praveen,
Try updating the code in the scripting component to use a delimiter of a space rather than a comma. If you have any more problems, let me know and I'll post some code to demonstrate this.

Evan
Left by Evan Koch on Mar 30, 2008 11:40 AM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
Hi i ahve the sam ekind of file. in my case i have vertical bar "|" as column delimeter. but i am having problem at my script task. I have validation error. Error : 30456 col1 is not a member of script_compnent_............................................

if u have put all the underlying details in it it would have been much help

Thanks
Left by Gyanendra on Apr 01, 2008 4:19 PM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
I'm missing something - row.col1, etc is not available since it is an output column.
When I follow the steps above, the script does not have access to
row.col1 = tokenise(row.column0,",",1)

Am I missing some information?
Left by rob on May 15, 2008 10:19 AM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
Hi,
I need to import a flat file into Tables in SQL server and export them back to a single flat file from tables.All these are done based on user requests.Can we use import/export wizard for this process?Is that possible to trigger import and export in SSIS from java code?
Left by Jack on Jun 17, 2008 1:39 AM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
When you add the script component select the Transform option.
Left by dfwdba on Aug 24, 2008 12:18 AM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
I recently needed to do this and used a slightly differrent pattern. Essentially, a different flat file connection manager for each format. Useful when you are creating an output for mainframe.
Left by Vijay on Sep 16, 2008 1:37 PM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
Sorry the link to my entry would be helpful...
http://www.duckfeet.net/geonewscommunity/Home/tabid/36/EntryID/6/Default.aspx
Left by Vijay on Sep 16, 2008 1:46 PM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
I tried it and it worked fine but if I have a longer input column0, script component still shows size 50 which is gray I can't change it. and I get truncate error.

So if I have 0,1,2,3,4,5,6666,666,77,888,88,9,9,900 instead of 0,1,2 I get tdata truncate error.

How can I increse its size.

Thanks,


Left by amy on Mar 12, 2009 7:29 PM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
nevermind. I think I need to select columno(input) 's datatype as DT_Text

Thanks,
Left by amy on Mar 13, 2009 2:32 PM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
nope that didn't work. What do you suggest , if input column0 is too long(I have 100 fields to store in one input column and then split it)
Left by amy on Mar 13, 2009 2:58 PM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
That code was pretty helpful. I had somthing simular, I had some data extract from a legacy OLTP system which came out like this 123#1234567.AL. I needed to break it apart and place in columns of their own. I knew how to do this in java code with the string tokeniser, but not in VB. Thanks for post
Left by raymond on Mar 19, 2009 9:12 AM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
How can we do the reverse procidure. I need to export data from 6 different tables to the flat dilimited file with record identifier. Each table has 100 fields

Thanks for your help.
Left by amy on Mar 19, 2009 6:19 PM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
Hi
I need to extract a data from the below format ...

INPUT IS IN .TXT FILE IN BELOW FORMAT

********************
ABC
----------------------------
CDE
********************
GHJ
----------------------------

OUTPUT IN BELOW FORMAT

COLUMN A COLUMN B
ABC CDE
GHJ NULL


THANKS IN ADVANCE
Left by mohamed idhris on Apr 24, 2009 8:02 AM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
I have the same problem only that I do not have delimiters but a fixed width file.

How do I go about this with a fixed width unstructured file?
Left by mukami on May 11, 2009 9:47 AM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
How do I parse a flat file to get:
- the filename and date on the 1st line
- BOF on 2nd line
- column headers on the 3rd line
- .... record details
- EOF
- Number of records: 999 on the last line?
Left by George on Sep 22, 2009 6:29 AM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
i can't get the last row in the flat file when i processed i have 62000 records but i get the 61999 records..!
Left by vaibhav on Jan 08, 2010 7:31 PM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
Hi I do not know if this is the right place to write this but here it goes.

I have been wondering from blog to forum in order to find a solution for my problem:

I have a flat file connection to an Unix file, that has no row delimiter and the data has a header row. My idea has to get ride of the header and just load the data.
It seems simple but the header does not have the same lenght as the rows below. An example:

htwewe2 22323 232323 ssddfdf 33e
23232323 wewew22 22323 wqeweew223322323
23232323 wewew22 22323 wqeweewty9098768
45454544 wewew22 34343 wqeweew223322323


Any ideas on how to cut the header of?

Thanks in advance,

Bruno
Left by Bruno Pimenta on Feb 01, 2010 4:13 AM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
Does anyone know how this script can be modified to recognize:

1, 2, 3, "4,000", 5, 6, "7,000"

Where "4,000" is an entry for one field

Thanks in advance!
Left by Aeric Gonzales on Apr 14, 2010 6:01 AM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
Basically, I wanted to have a modified version of the solution above to recognize columns that have comma's encased inside double quotes (as text qualifier) and I couldn't find any site that is specific to handling those.

I wanted to post the modified script that handles this that a friend wrote for me:


Public Function QuoteSplit(ByVal token As Integer, ByVal parseString As String, ByVal ParamArray Delimiters() As String) As String
Dim Results() As String
Dim StringEncoding As New Text.ASCIIEncoding()
Using MemStream As New MemoryStream(StringEncoding.GetBytes(parseString))
Using Parser As New FileIO.TextFieldParser(MemStream)
Parser.Delimiters = Delimiters
Parser.HasFieldsEnclosedInQuotes = True
Results = Parser.ReadFields()
End Using
End Using
If Results.Length < token Then 'Protect against a request for a token that doesn't exist
Return ""
Else
Return Results(token - 1)
End If
'Return Results(token - 1)
' Return New List(Of String)(Results)
End Function

Just need to change the Sub to

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

Row.Col1 = QuoteSplit(1, Row.Column0, ",")
Row.Col2 = QuoteSplit(2, Row.Column0, ",")
Row.Col3 = QuoteSplit(3, Row.Column0, ",")
Row.Col4 = QuoteSplit(4, Row.Column0, ",")
Row.Col5 = QuoteSplit(5, Row.Column0, ",")
Row.Col6 = QuoteSplit(6, Row.Column0, ",")
Row.Col7 = QuoteSplit(7, Row.Column0, ",")
End Sub

I hope this helps anyone out there.

Enjoy!
Left by Aeric Gonzales on Apr 14, 2010 9:16 AM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
I like this post
Left by khaled on Jun 06, 2010 7:11 PM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
Thank you for post
Left by Kamala on Jul 21, 2010 1:44 AM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
This example is perfect for what I’m trying to do. I copied the code exactly and I’m getting a error: namespace does not directly contain members such as fields or methods…..I’m using SSIS 2008. I have no coding experience. Please Help.
Left by Faisal Lodhi on Nov 03, 2010 6:54 AM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
The solution is good but I have a problem in respect the row I have to split is over 12000 characters long therefore iit is ablob. how can I split that.
Left by Kev Wisbey on Dec 15, 2010 2:17 AM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
Awesome. Thanks mate, you've saved my day
Left by Andrey on Feb 20, 2011 4:24 PM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
How to split this file:

aaaa a a 12345 0 0
aaaa a a 12345 0 0SS

I want it to become:
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8
aaaa a a 12345 0 0 null null
aaaa a a 12345 0 0 S S
Left by Len on Mar 27, 2011 2:26 AM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
Thanks friend, that was a big help!
Left by Dave D on Apr 11, 2011 8:51 AM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
Great article! Thanks for sharing thanks for sharing this scenario.

Cheers :)
Left by Art on Aug 07, 2011 7:58 PM

# re: SSIS and Unstructured Flat Files
Requesting Gravatar...
I am having a file with format like below.
Here the index 1-2 to col1, 3-4 to col2 , 5-10 to col3, 11-29 to col3.How can I import the above formatted flat file to sql server data base using ssis.I am struggling a lot.Please provide me a way.
Data:
AS231345 5678900 23 555555 66 WER 12AS
Left by subhransu on Dec 11, 2013 8:57 PM

Your comment:
 (will show your gravatar)


Copyright © Evan Koch | Powered by: GeeksWithBlogs.net