Recently I was assigned a task of helping a developer to cope with an unusual input file processing.
The file we needed to load into SQL Server had double quotes surrounding those fields that contained comma(s).
So a short excerpt from it would resemble something like:123,ABC456,”D,E,F””7,89″,GHS
Since SSIS’s Flat File Connection does not support alternating text qualifiers, the chief difficulty stems from the fact if we even only remove the quotes, the file structure becomes broken because the extra commas would break such a field into additional columns. I proposed to replace the comma as our delimiter to the vertical pipe (|) character thus leaving commas intact. The quotes become then unnecessary.
Initially we wanted to pre-process the file by using a PowerShell script batch, but while testing it turned out that unless you use .Net objects in PowerShell it cannot operate on relatively large files, we were getting an out of memory errors from the PowerShell when using Get-Content or similar methods. The next attempt was to use the Data Flow Task’s Script Component set as a Transformation. It worked, here is how:
We created a new DFT to just reformat the input file
The payload is basically implemented inside the Script Transformation, but I am showing how the input is set first that came from the flat file set to be consumed as a single row. I want to stress this out – it is important to configure your Flat File Source to “see” the input file as a single column file:
The next step was to add the output that will be our modified row:
So nothing fancy is going on here and we are now ready to code:
1 : ' Microsoft SQL Server Integration Services user script component 2
: ' This is your new script component in Microsoft Visual Basic .NET 3
: ' ScriptMain is the entrypoint class for script components 4 : 5
: Imports System 6 : Imports System.Data 7
: Imports System.Math 8
: Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper 9
: Imports Microsoft.SqlServer.Dts.Runtime.Wrapper 10 : 11
: Public Class ScriptMain 12 : Inherits UserComponent 13 : 14
: Public Overrides Sub UnTransformedInput_ProcessInputRow(
ByVal Row As UnTransformedInputBuffer) 15 : ' 16
: ' Unquote and replace the comma to pipe (|) to become the delimiter 17
: ' 18 : 19 : Dim raw_string As String = String.Empty 20
: Dim seacrhedPos As Int32 21 : Dim firstCommaPos As Int32 22
: Dim nextQuotePos As Int32 23 : Dim section_to_modify As String 24 : 25
: raw_string = Row.UnTransformedOrgRow 26 : 27
: 'MesssageBox to see the initial value 28
: System.Windows.Forms.MessageBox.Show(raw_string) 29 : 30
: If raw_string.Length
> 0 Then 31 : 32 : seacrhedPos = raw_string.IndexOf(
"," "", 0) 33 : 34
: While seacrhedPos<> - 1 35
: firstCommaPos = raw_string.IndexOf(",", seacrhedPos + 1) 36 : 37
: nextQuotePos = raw_string.IndexOf(
"" ",", firstCommaPos + 1) 38 : 39 : If nextQuotePos
> 0 Then 40
: section_to_modify = raw_string.Substring(seacrhedPos + 1,
raw_string.Length - seacrhedPos -
(raw_string.Length -
nextQuotePos)) 41 : 42
: section_to_modify = section_to_modify.Replace(",", "~").Replace(
"" "", "") 43 : 44 : 'Replace 45
: raw_string = raw_string.Substring(0, seacrhedPos + 1) & section_to_modify
&
_ 46
: raw_string.Substring(nextQuotePos + 1, raw_string.Length - nextQuotePos -
1) 47 : 48 : Else 49
: section_to_modify = raw_string.Substring(seacrhedPos + 1,
raw_string.Length - seacrhedPos -
1) 50 : 51
: section_to_modify = section_to_modify.Replace(",", "~").Replace(
"" "", "") 52 : 53 : 'Replace 54
: raw_string = raw_string.Substring(0, seacrhedPos + 1) & section_to_modify
55 : 56 : End If 57 : 58
: 'Get the next set 59 : seacrhedPos = raw_string.IndexOf(
"," "", 0) 60 : End While 61 : 62
: raw_string = raw_string.Replace(",", "|")
.Replace("~", ",") 63 : 64 : End If 65 : 66
: 'MesssageBox to see the final value 67
: System.Windows.Forms.MessageBox.Show(raw_string) 68 : 69
: Row.TransformedRow = raw_string 70 : End Sub 71 : 72 : End Class 73:
This code will pop up a message box showing the pre-processed record, and then the end result, you can safely remove these two MessageBox.Show calls before productionizing your package.
The end result of this DFT will be a new flat file in this format:
123|ABC
456|D,E,F
7,89|GHS