The SSIS Expression Way–Skipping an Unwanted File


Yesterday at the MSDN SSIS Forum somebody asked a question if it is possible to skip a particular file conditionally while keep processing the others.

Such a situation arises in cases when a file mask/wildcard set for the file names to pick for processing in a ForEach Loop container collects extra files.

For example, if we have a folder with files as in the picture below, and we do not want to process (for a specific reason) say File2.txt, our only option to specify the files eligible for processing would be File*.txt only (alas, the Collection’s property page “Files:” text box does not support Regular Expressions).

image

So what is the solution? I proposed to implement the proper conditional processing. How this can be done? The answer is by using the Precedence Constraints.

As an aside, the Precedence Constraint is an often overlooked entity, but a very powerful decision control mechanism in SSIS, yet it seems that many SSIS developers tend to avoid or dislike it.

OK, so let’s tackle the package up!

To start off, let‘s create a SSIS project, add or modify the existing/created package by dragging and dropping the ForEach Loop container in which will set the file mask for the files to be collected. It may look like below in BIDS/Visual Studio 2008:

SNAGHTML3be511

At this point let’s map the file name picked to a new package scope variable

image

We will also need one more variable to keep the name of the file to exclude:

SNAGHTML3f428c

( only the bottom variable is relevant)

By the way, this approach also permits to save this variable to a configuration file to get the ability of changing its value after the package is deployed, but involving a config file is beyond the scope of this post.

Next step is to continue adding components to the package and at this point it will up to what needs to be achieved by the package, so in my case for the sake of simplicity I will use a Script Task (to just serve the purpose of the processing inception) which will display and/or output the currently processed file to the Output window of BIDS and two Data Flow Tasks for each category of files (those that need to be processed and the one that not).

In short I was able to arrive to a design like this:

image

I will share for the sake of completeness what is in the Script Task to just let you know where the file names get generated:

   1:  public void Main()
   2:          {
   3:              // Uncomment for a visual representation
   4:              // MessageBox.Show (Dts.Variables["pkVarCurrentFile"].Value.ToString());
   5:   
   6:              // Log the file name being processed to the Output Window
   7:              bool fireAgain = true;
   8:              this.Dts.Events.FireInformation(1, "", "Current file: " + Dts.Variables["pkVarCurrentFile"].Value.ToString(), String.Empty, 0, ref fireAgain);
   9:              
  10:              Dts.TaskResult = (int)ScriptResults.Success;
  11:          }

To display a graphical message box one should un-comment the line #4, but I find more SSISish to use the FireInformation function.

Now is the main part: the Precedence Constraint logic. If you remember we used two variables. One is to store the currently processed file and another is the one with the file name to exclude: pkVarCurrentFile and pkVarFilesToSkip.

What goes into the Precedence Constraints is this expression:

FINDSTRING( @[User::pkVarCurrentFile],@[User::pkVarFilesToSkip],1 ) < 1

Into the one leading the DFT that will process the file, and this one into the one that will not:

FINDSTRING( @[User::pkVarCurrentFile],@[User::pkVarFilesToSkip],1 ) >= 1

As you can see only the boolean condition is reversed. That’s all! See a picturised representation:

SNAGHTML50f789

At this point we are ready to test!

I added a breakpoint to the DFT that supposed to not to process the unwanted file (the File2.txt if you remember). Such a DFT can merely be empty, but I chose to use a single Trash Component.

So to make sure we do process the needed files we do a test run:

image

And our Script Task is reporting:

SSIS package "DontProcessUnwantedFiles.dtsx" starting.
Information: 0x1 at Show the current file name in Output Window: Current file: C:\Temp\FilesToProcess\File1.txt
Information: 0x4004300A at Process this file, SSIS.Pipeline: Validation phase is beginning.
Information: 0x4004300C at Process this file, SSIS.Pipeline: Execute phase is beginning.
Information: 0x4004300B at Process this file, SSIS.Pipeline: "component "Trash Destination" (1)" wrote 0 rows.
Information: 0x1 at Show the current file name in Output Window: Current file: C:\Temp\FilesToProcess\File1_2.txt
Information: 0x4004300A at Process this file, SSIS.Pipeline: Validation phase is beginning.
Information: 0x4004300C at Process this file, SSIS.Pipeline: Execute phase is beginning.
Information: 0x4004300B at Process this file, SSIS.Pipeline: "component "Trash Destination" (1)" wrote 0 rows.
Information: 0x1 at Show the current file name in Output Window: Current file: C:\Temp\FilesToProcess\File1_3.txt
Information: 0x4004300A at Process this file, SSIS.Pipeline: Validation phase is beginning.
Information: 0x4004300C at Process this file, SSIS.Pipeline: Execute phase is beginning.
Information: 0x4004300B at Process this file, SSIS.Pipeline: "component "Trash Destination" (1)" wrote 0 rows.
Information: 0x1 at Show the current file name in Output Window: Current file: C:\Temp\FilesToProcess\File2.txt
SSIS breakpoint 'Break when the container receives the OnPreExecute event' at executable 'Skip this file' has been hit

Excellent. We have just hit the breakpoint, and at that stage the DFT was processing the needed files.

Now we let the package continue and …

Information: 0x1 at Show the current file name in Output Window: Current file: C:\Temp\FilesToProcess\File2.txt
SSIS breakpoint 'Break when the container receives the OnPreExecute event' at executable 'Skip this file' has been hit
Information: 0x4004300A at Skip this file, SSIS.Pipeline: Validation phase is beginning.
Information: 0x4004300C at Skip this file, SSIS.Pipeline: Execute phase is beginning.
Information: 0x4004300B at Skip this file, SSIS.Pipeline: "component "Trash Destination" (1)" wrote 0 rows.
Information: 0x1 at Show the current file name in Output Window: Current file: C:\Temp\FilesToProcess\File2_1.txt
Information: 0x4004300A at Process this file, SSIS.Pipeline: Validation phase is beginning.

voilà, the unwanted file was skipped!

  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

author: Compudicted | Posted On Wednesday, January 11, 2012 11:51 PM | Feedback (2)

Get Better Equipped and Educated in Tech Year 2012, Yet - For Free!


Tech year 2012 is upon us!

Who else more than the IT professionals need to constantly catch up after the latest features, new tools, technologies, software releases or just keep educating themselves?

This is all not new to any seasoned developer, administrator or any other IT person.

Luckily, Microsoft realizes this does keep many people up at night and as a result we are fortunate to have several resources for at our disposal this year to alleviate the stress levels.

Namely, these are as follows:

Microsoft Virtual Academy

Improve your IT skill set and help advance your career with a free, easy to access training portal. Learn at your own pace, focusing on Microsoft technologies, gain points and get recognition.

Evaluation Download Center

Get fully functional software for evaluation purposes. A ton of titles available. Do not spend a dime on enterprise, world class products.

TechNet Cloud Hub

Want a glimpse into the future of IT? Acquire fresh and very much in demand skills? Sure you do!


What can be better than knowing all of the above is completely free?

Have fun and prosperous Tech year 2012!

  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

author: Compudicted | Posted On Friday, January 06, 2012 3:31 PM | Feedback (1)

How to Remove Quotes and Replace the Delimiter in a File using PowerShell Preserving the Commas


Recently I was tasked with a file cleansing exercise. The endeavor looked simple, but the solution was not, I needed to prepare a file for processing, but wanted to first proof concept my approach so I opted to utilizing this opportunity to learn PoweShell.

More about the task: I was supplied with a series of CSV files in which the data looked like

123,ABC,XYZ
"1,2,3", ABC, "XY,Z"
123,"A,BC", XYZ

As you can see this is something I think we may call alternating double-quotes comma delimited file with embedded commas, I simply needed to remove the occasional quotes and replace the delimiter to pipe to be able to process the file preserving the original quotes. You may notice, I blogged in the previous post I was able to achieve the same using MS SSIS Script Transformation which basically uses some .Net code. To be fair, I did not come up with the algorithm, turned out my colleague Danny T. had a very extensive experience in textual data processing, so I was just the implementer.

The script turned out to work as expected, but alas there is a limitation to PowerShell’s get-content Cmdlet on how long data file it can digest. So the script may not be bale to process the entire input file. I guess the workaround is in using a revised version that utilizes the .Net stream class, hope somebody would share it with me someday.

So here is the script:

   1:  # Author: Arthur Zubarev
   2:  # Notes: Removes " and replaces the delimiter to |
   3:  # In DOS batch usage:
   4:  # Powershell.exe "& 'D:\PSScripts\RemoveQuotes-ReplaceDelim.ps1'" -InputCSV ""D:\Input.csv"" -NonInteractive -WindowStyle hidden -ExecutionPolicy RemoteSigned
   5:   
   6:  param([string]$InputCSV = "")
   7:   
   8:  # Temp assignment!
   9:  $InputCSV = "D:\Input.csv"
  10:   
  11:  $process_yes_no = 0
  12:   
  13:  if ( $InputCSV -eq "" ) {
  14:      Write-Host "Input parameter is missing. Usage: .\RemoveQuotes-ReplaceDelim.ps1 -InputCSV <file path>" -ForegroundColor red -BackgroundColor yellow
  15:      Exit
  16:  }
  17:   
  18:  function GetScriptDirectory {
  19:      $Invocation = (Get-Variable MyInvocation -Scope 1).Value
  20:      Split-Path $Invocation.MyCommand.Path
  21:  }
  22:   
  23:  Write-Host "Processing" $InputCSV -foregroundcolor green -backgroundcolor black
  24:   
  25:  # make the "fixed" file up
  26:  $fixed_file = Join-Path(GetScriptDirectory) "fixed.csv"; 
  27:   
  28:  Write-Host "Saving fixed file to" $fixed_file -foregroundcolor yellow -backgroundcolor black
  29:   
  30:  # read the input csv as binary
  31:  $reader = [System.IO.File]::OpenText($InputCSV)
  32:  try {
  33:      for(;;) {
  34:          $to_repair_t = $reader.ReadLine()
  35:          if ($to_repair_t -eq $null) { 
  36:          break 
  37:          }
  38:  $to_repair = get-content -encoding byte $to_repair_t
  39:  $fixed = New-Object System.Collections.ArrayList
  40:  $byteEncoder = New-Object System.Text.ASCIIEncoding
  41:  $in_quotes = 0
  42:   
  43:  foreach ($c in $to_repair) {
  44:      # toggle in-quotes flag
  45:      if ($c -eq 34) {
  46:          $in_quotes = !$in_quotes;
  47:          $process_yes_no = 1
  48:      }
  49:   
  50:      # replace comma in quotes with ~
  51:      if ($c -eq 44 -and $in_quotes) {
  52:          foreach ($b in $byteEncoder.GetBytes("~")) {
  53:              [void]$fixed.Add($b);
  54:          }
  55:          continue;
  56:      }
  57:   
  58:      # pass through the bytes
  59:      [void]$fixed.Add($c);
  60:  }
  61:   
  62:  if ($process_yes_no -eq 1 ) {
  63:      # write-out to fixed csv
  64:      set-content -encoding byte -path $fixed_file -value $fixed
  65:   
  66:      # carry on with the commas replacements
  67:      (Get-Content $fixed_file) | 
  68:      Foreach-Object {$_ -replace ",", "|"} | 
  69:      Set-Content $fixed_file
  70:   
  71:      # next do ~ replacements
  72:      (Get-Content $fixed_file) | 
  73:      Foreach-Object {$_ -replace "~", ","} | 
  74:      Set-Content $fixed_file
  75:   
  76:      # last fix step " removals
  77:      (Get-Content $fixed_file) | 
  78:      Foreach-Object {$_ -replace """", ""} | 
  79:      Set-Content $fixed_file
  80:   
  81:      # Swap the files
  82:      Move-Item $fixed_file $InputCSV -Force -ErrorAction SilentlyContinue
  83:   
  84:      #$File_Name = split-path $InputCSV -Leaf # in case the file name needs to be captured
  85:  }
  86:  else {
  87:      Write-Host "No processing was necessary."
  88:  }
  89:  }}
  90:  finally {
  91:      $reader.Close()
  92:  }
  93:   
  94:  Write-Host "Processing complete." -BackgroundColor DarkGreen -ForegroundColor yellow
  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

author: Compudicted | Posted On Thursday, September 22, 2011 4:51 PM | Feedback (0)

SSIS: How to Remove Occasional Quotes and Replace The Column Delimiter


Recently I was assigned a task of helping a developer to cope with an unusual input file processing.

The file we needed to load into SQL Server had double quotes surrounding those fields that contained comma(s).

So a short excerpt from it would resemble something like:

123,ABC
456,"D,E,F"
"7,89",GHS

Since SSIS’s Flat File Connection does not support alternating text qualifiers, the chief difficulty stems from the fact if we even only remove the quotes, the file structure becomes broken because the extra commas would break such a field into additional columns. I proposed to replace the comma as our delimiter to the vertical pipe (|) character thus leaving commas intact. The quotes become then unnecessary.

Initially we wanted to pre-process the file by using a PowerShell script batch, but while testing it turned out that unless you use .Net objects in PowerShell it cannot operate on relatively large files, we were getting an out of memory errors from the PowerShell when using Get-Content or similar methods. The next attempt was to use the Data Flow Task’s Script Component set as a Transformation. It worked, here is how:

We created a new DFT to just reformat the input file

Transforming_DFT

The payload is basically implemented inside the Script Transformation, but I am showing how the input is set first that came from the flat file set to be consumed as a single row. I want to stress this out – it is important to configure your Flat File Source to “see” the input file as a single column file:

InputCols

The next step was to add the output that will be our modified row:

Input_And_Output

So nothing fancy is going on here and we are now ready to code:

   1:  ' Microsoft SQL Server Integration Services user script component
   2:  ' This is your new script component in Microsoft Visual Basic .NET
   3:  ' ScriptMain is the entrypoint class for script components
   4:   
   5:  Imports System
   6:  Imports System.Data
   7:  Imports System.Math
   8:  Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
   9:  Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
  10:   
  11:  Public Class ScriptMain
  12:      Inherits UserComponent
  13:   
  14:      Public Overrides Sub UnTransformedInput_ProcessInputRow(ByVal Row As UnTransformedInputBuffer)
  15:          '
  16:          ' Unquote and replace the comma to pipe (|) to become the delimiter
  17:          '
  18:   
  19:          Dim raw_string As String = String.Empty
  20:          Dim seacrhedPos As Int32
  21:          Dim firstCommaPos As Int32
  22:          Dim nextQuotePos As Int32
  23:          Dim section_to_modify As String
  24:   
  25:          raw_string = Row.UnTransformedOrgRow
  26:   
  27:          'MesssageBox to see the initial value
  28:          System.Windows.Forms.MessageBox.Show(raw_string)
  29:   
  30:          If raw_string.Length > 0 Then
  31:   
  32:              seacrhedPos = raw_string.IndexOf(",""", 0)
  33:   
  34:              While seacrhedPos <> -1
  35:                  firstCommaPos = raw_string.IndexOf(",", seacrhedPos + 1)
  36:   
  37:                  nextQuotePos = raw_string.IndexOf(""",", firstCommaPos + 1)
  38:   
  39:                  If nextQuotePos > 0 Then
  40:                      section_to_modify = raw_string.Substring(seacrhedPos + 1, raw_string.Length - seacrhedPos - (raw_string.Length - nextQuotePos))
  41:   
  42:                      section_to_modify = section_to_modify.Replace(",", "~").Replace("""", "")
  43:   
  44:                      'Replace 
  45:                      raw_string = raw_string.Substring(0, seacrhedPos + 1) & section_to_modify & _
  46:                      raw_string.Substring(nextQuotePos + 1, raw_string.Length - nextQuotePos - 1)
  47:   
  48:                  Else
  49:                      section_to_modify = raw_string.Substring(seacrhedPos + 1, raw_string.Length - seacrhedPos - 1)
  50:   
  51:                      section_to_modify = section_to_modify.Replace(",", "~").Replace("""", "")
  52:   
  53:                      'Replace 
  54:                      raw_string = raw_string.Substring(0, seacrhedPos + 1) & section_to_modify
  55:   
  56:                  End If
  57:   
  58:                  'Get the next set
  59:                  seacrhedPos = raw_string.IndexOf(",""", 0)
  60:              End While
  61:   
  62:              raw_string = raw_string.Replace(",", "|").Replace("~", ",")
  63:   
  64:          End If
  65:   
  66:          'MesssageBox to see the final value
  67:          System.Windows.Forms.MessageBox.Show(raw_string)
  68:   
  69:          Row.TransformedRow = raw_string
  70:      End Sub
  71:   
  72:  End Class
  73:   

This code will pop up a message box showing the pre-processed record, and then the end result, you can safely remove these two MessageBox.Show calls before productionizing your package.

The end result of this DFT will be a new flat file in this format:

123|ABC
456|D,E,F
7,89|GHS
  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

author: Compudicted | Posted On Monday, September 19, 2011 9:41 PM | Feedback (0)

How to Run a PowerShell Script From Command Line Both With Spaces in Path And Input Parameter


This post is going to be the shortest of my all.

Recently, to my surprise, I found myself developing quite a lot with PowerShell, it is not that I have never written scripts, but those mostly were VBScripts.

To start off I can tell that there is a ton of examples on the World Wide Web, and I know of nine free PowerShell books, alas I had no time to read any due to very short project deadlines and when I needed a specific example the search results returned were exhaustive too often, yet incomplete.

At one of the final stages I needed to schedule to execute a PowerShell script that needed to take in a parameter in form of a file path containing spaces and yet the path to the very PS script contained spaces, too.

After fiddling a bit with the bath file I ended up with the following syntax:

Powershell "& 'D:\Documents and Settings\User.Name\My Documents\Development\PSScripts\MyPSScript.ps1'" -InputFile ""D:\Documents and Settings\User.Name\My Documents\InputFile.csv"" -NonInteractive

Note the use of the single quote denoting the path to the PS file and the double quotation marks for its parameter.

  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

author: Compudicted | Posted On Sunday, September 11, 2011 12:07 AM | Feedback (0)

Cascading IF-ELSE-ELSEIF A.K.A. Nested IF-ELSE Statements in a SSIS Expression


Today it will be a concise post, but I hope quite helpful and informative.

As most SSIS developers know SSIS Expressions are used to drive several key tidbits of a package components as the Conditional Split, Package Variables value assignments, Derived Column Transformation, form a SQL statement for Execute SQL task, drive precedence constraints, set package properties as connection string for example and more. So it is hard to underestimate the importance of the SSIS Expressions for a SSIS developer.

However, the SSIS expressions look hard to use for some beginners yet, feel un-natural to most people. This fact does not undermine the power and flexibility of the SSIS Expression. The only hard to swallow portion in my opinion is the code bloat versa editing space ratio that limits the effectiveness of writing long expressions. Luckily, there is a free tool to help: SSIS Expression Editor & Tester! I do not see any dramatic improvements in regard to the SSIS Expression in the forthcoming SQL Server “Denali”, but it will add a number of features that made me excited, perhaps I shall dedicate a separate post on this subject, but I will mention one: Expression Task. This task will help to assign a result of an expression to a variable. So the SSIS Expressions are here to stay and important to know.

And now I will share the most commonly asked question on MSDN SSIS forum: “how do I handle or process multiple IF-ELSE conditions”?

The answer is it is simple!

As we know, the conditional statement in SSIS Expression has the following notation:

«boolean_test» ? «true_result» : «false_result»

The trick is the right part can accept yet another [nested] IF-ELSE, and then more:

@[User::MyStringVar] == "TEST1" ? "IT WAS TEST1": 
@[User::MyStringVar] == "TEST2" ? "IT WAS TEST2": 
@[User::MyStringVar] == "TEST3" ? "IT WAS TEST3":
"MATCH NOT FOUND"

The end result of running this expression is it will check the MyStringVar contents and try to pick the first match, if the match not found it will produce “Match not found”.

This expression is good for a variable assignment, for a Derived Column transformation one should use:

[MyStringVar] == "TEST1" ? "IT WAS TEST1": 
[MyStringVar] == "TEST2" ? "IT WAS TEST2": 
[MyStringVar] == "TEST3" ? "IT WAS TEST3":
"MATCH NOT FOUND"

It is easy to expand this expression further to more choices by just adding lines before the last assignment.

  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

author: Compudicted | Posted On Wednesday, July 13, 2011 11:33 AM | Feedback (1)

How to capture a Header or Trailer Count Value in a Flat File and Assign to a Variable


Recently I had several questions concerning how to process files that carry a header and trailer in them. Typically those files are a product of data extract from non Microsoft products e.g. Oracle database encompassing various tables data where every row starts with an identifier. For example such a file data record could look like:

HDR,INTF_01,OUT,TEST,3/9/2011 11:23

B1,121156789,DATA TEST DATA,2011-03-09 10:00:00,Y,TEST 18 10:00:44,2011-07-18 10:00:44,Y
B2,TEST DATA,2011-03-18 10:00:44,Y
B3,LEG 1 TEST DATA,TRAN TEST,N

B4,LEG 2 TEST DATA,TRAN TEST,Y

FTR,4,TEST END,3/9/2011 11:27

A developer is normally able to break the records using a Conditional Split Transformation component by employing an expression similar to

Output1 --  SUBSTRING(Output1,1,2) == "B1"

and so on, but often a verification is required after this step to check if the number of data records read corresponds to the number specified in the trailer record of the file.

This portion sometimes stumbles some people so I decided to share what I came up with.

As an aside, I want to mention that the approach I use is slightly more portable than some others I saw because I use a separate DFT that can be copied and pasted into a new SSIS package designer surface or re-used within the same package again and it can survive several trailer/footer records (!).

See how a ready DFT can look:

DFT_Composition

The first step is to create a Flat File Connection Manager and make sure you get the row split into columns like this:

FFS_Columns_Value FFSE_Look

After you are done with the Flat File connection, move onto adding an aggregate which is in use to simply assign a value to a variable (here the aggregate is used to handle the possibility of multiple footers/headers):

Aggr_Editor

The next step is adding a Script Transformation as destination that requires very little coding.

First, some variable setup:

Script_Setup

and finally the code:

Script_Code

As you can see it is important to place your code into the appropriate routine in the script, otherwise the end result may not be as expected.

As the last step you would use the regular Script Component to compare the variable value obtained from the DFT above to a package variable value obtained say via a Row Count component to determine if the file being processed has the right number of rows.

  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

author: Compudicted | Posted On Tuesday, June 28, 2011 10:37 AM | Feedback (0)

Playing Around With the New SSIS Balanced Data Distributor Transform Component


Microsoft has just released a new SSIS 2008 transform component called SQL Server Integration Services Balanced Data Distributor.

As the download site states “this transform takes a single input and distributes the incoming rows to one or more outputs uniformly via multithreading”.

Sounds like we got a component to evenly distribute some data across multiple destinations. Frankly, I though we could do this even before using either a Script Component or the Conditional Split, alas this component has a different logic.

Unlike anything before, after a quick download and an interaction with a license agreement Word document I get a link to the actual component download, install and I am finally ready to take a closer look at it after I added it to my Data Flow Destination items.

In my experiments I decided to re-use one of my relatively simple sandbox SSIS packages. As a side note, this component works with SSIS 2008 and SSIS 2008 R2 versions only.

The properties of this new component look rather spartan and do not offer customizations of any kind other than its name, internal metadata validation and description. I was a bit surprised to see such a limited set of properties exposed.

I decided to capture some data from the Adventure Works database and distribute it to several flat files. Balanced Data Distributor to the rescue!

Configuring this transform is a simple drag and drop one source output to a destination then another becomes available, simple!

Next I loaded a subset of data (rows-wise) from the Adventure Works database Sales.OrderDetail table into my sandbox table and fired the package. I got a result that I almost expected:

Buffer10K

Well… I did not expect to see so few rows in the left file. The a question popped up in my head: what will happen if I reduce the number of rows even further?

Here is the result:

Buffer9K

No rows were loaded into the 2nd destination!

This made me realize that the component is using an internal buffer of 10,000 rows and it is pre-set. No property page to override it. Changing the row width does not seem to affect this behavior either. At least in my experiments.

So the conclusion is component has some pre-defined logic how to distribute rows.

As a closure, a word of caution, making this component input data onto the same flat file destination does not make sense because the file will be locked and the package will fail after a warning message: “[… File Destination [nnn]] Warning: The process cannot access the file because it is being used by another process.“

Related: if you want to see how this new component performs in terms of speed you can visit Boyan Penev’s post here.

  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

author: Compudicted | Posted On Saturday, May 21, 2011 12:25 AM | Feedback (0)

How to FTP and Process Anyhow Named File


A colleague of mine approached me asking what would be the easiest approach to download a single file from a FTP location that when the file name is changing frequently without any logical pattern.

My immediate thought was to use an empty ForEach Loop Container because it can just iterate over files, so even though we are dealing with one file it would be able to capture its name in a single hop. Now to the specifics.

As an aside, it was an existing package that used to consume a fixed name file, but the requirements have changed. To glance ahead, the package after it was completed looks like:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Now the specifics: so the FTP component is simply set to download the entire contents of the FTP directory (the file there just keeps updating). Thus nothing special is in there.

The trick is in linking this FTP Task to the ForEach Loop Container! Inside this ForEach there is nothing and its setup is literally trivial:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

It uses a file enumerator and it is set to go to a specific directory (blurred) to get all files with the .csv extension. So far a no brainer. Next setting to apply is to map the file name retrieved to a variable:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 This way we capture the file name.

At the next or any subsequent stage we are able to consume this file through its name and location.

Since the package stored this file aside as a copy by design we have a File System Task connected next to the ForEach Loop to copy it:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

This is it! 

I guess if we need to handle multiple files this package is as easily modifiable to just move those outside components into the ForEach Loop.

  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

author: Compudicted | Posted On Tuesday, May 17, 2011 12:15 PM | Feedback (0)

How to Import Negative Decimal or Floating Values to Flat File Without Losing Leading Zero


A question popped up not so long ago on the MSDN SSIS Forum from somebody struggling with a leading zero being dropped from the data being exported to a flat file to be returned back. The person asking was also interested in trimming (not rounding) the values to a certain scale.

As an aside, this issue occurs when you have decimal or floating point numbers in a database and its value is between 0 and 1 e.g. 0.445 or a negative one e.g. -0.123 which you want to export to a flat file as CSV being used as an example here.

Frankly, I have seen a lot of questions about this and the remedy seemed to be always in converting the incoming data to NUMERIC, DT_NUMERIC if to be SSIS-precise. So the expression would be like follows:

[Number] < 1 ? "0" + (DT_WSTR,20)([Number]) : (DT_WSTR,20)([Number])

Alas, this, and none of the workarounds I could find so far covered the situation when a number in the source is negative.

In other words trying to run a package on a value of -0.114 results in 0-.114 passed into the flat file, this is not good!

Then how to properly handle all the situations?

Well, let's start with defining our test data:

Incoming Data

 

 

 

 

 

 

 

 As you can see we have covered all the possible ranges that may occur. Please note that in my test table the flNumber column is defined as Float, but it may very well be a decimal.

Now, as we all know, the workaround is in converting this column value to DT_NUMERIC, let's do that by adding a Data Conversion Transformation:

Data Conversion Transformation

 

 

 

Please note two things here:

  1. We used a Copy Column to make it numeric, and
  2. Specified scale of 2 to trim (not round) any extra figures thereafter.

If we leave the package as is at this stage and you would run it, the leading zeros would not appear for those values on rows 1, 4 and 5 inside the flat file (a complete package layout to follow).

All is standard at this point, thus let's move to using the expression that adds the leading zero. We will do that by dragging and dropping a Derived Column Transformation to our DFT.

An expression in it I used covers the range between 0 and +1:

[Copy of flNumber] < 1 && [Copy of flNumber] >= 0 ? "0" + (DT_STR,18,1252)([Copy of flNumber]) : (DT_STR,18,1252)([Copy of flNumber])

 

As you can see we added a new column this time. But you may ask then what about the negative values?

And I can tell you this is a legitimate question. The answer is because you can only have one expression per column in the Derived Column Transformation we will need to add another one!

 

Its expression is

(DT_NUMERIC,18,2)[ModifiedFloat] < 0 && [ModifiedFloat] > -1 ? "-0" + REPLACE((ModifiedFloat),"-","") : (ModifiedFloat)

The above expression is more complex but covers the negative values.

Let me tell you at this point how the logic works if you have not figured it out yet:

If we get a positive number with a leading zero (0 inclided to +1) then we artificially insert in front of our number, and convert to string.

Then we proceed with another evaluation. At this time we check if the number is negative (-1 to 0), if yes, we append -0 and remove the original minus (-) sign.

The last step is adding the Flat File Destination which I guess is trivial and should not be discussed here, but I will mention that you can define your FlatFile Connection as you want, in my case, since a flat file is really data type agnostic, I used Unicode String as the data type for all my output columns. Below is a snapshot of the entire package:

Running the package produces this output:

Output

The end!

  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

author: Compudicted | Posted On Monday, March 28, 2011 3:21 PM | Feedback (1)