SSIS Business Rules Engine
I am by no means a data warehouse or SSIS expert but I have been forced to learn with one of my latest projects. One of the requirements that came out of the analysis and design phase was a client request for a “rules engine” within the data warehouse loader that could be “easily configured” with no code changes.
The solution chosen is fairly straight-forward. It required one custom-table (called BusinessRules) and relies on Reflection and the Script Component in the SSIS data flow.
Before I proceed I want to share so of the assumptions that were involved with the design of this functionality:
- All the data from disparate data sources is staged into a single table (called SourceStaging) and tracked using the DataSources table.
- The quantity of data we deal with is ‘relatively’ small so we will take an optimistic approach to memory/caching management.
- The rules engine has to support hierarchies.
- Example (Company Name or First and Last Name)
- Different data sources can have different rules applied to the same column
- Example Data Source 1: Requires either a Company name or a First and Last Name
- Example Data Source 2: Requires only a Last Name
For the purposes of this sample I will use a simplified DB Structure:
(As a side note the SourceStaging table and DataSources table are implementation specific and aside from the tie into datasource id they do not affect the ultimate solution.)
The Rules Table
I started by creating a rules table that could be managed via a custom interface that would be built during Phase II of the project. The table consists of:
- RuleID: The primary key of the table.
- TargetColumnName: The name of the column on the SourceStaging table which the rule applies to.
- AllowNullValue: Indicate whether null/empty strings are allowable values
- ValidationExpression: An optional regular expression that will be used to validate the column value. (You may note that the AllowNullValue column could be replaced by using strictly regular expressions. We decided not to pursue this route to make the regular expressions easier to maintain.)
- ErrorMessage: An optional user defined error message that would be meaningful when the rule is violated (i.e. Home or Business Phone is required). When no value is specified a generic message is generated (i.e. <ColumnName> invalid).
- MinimumScore: This column is the minimum score that the rule must achieve in order for the rule to be valid. This column is used primarily for hierarchies.
- RuleWeight: This is the value of the rule in terms of points towards the overall rule score. Again this is used primarily for hierarchy support.
- IsParent: A flag to indicate if a rule if a top-level or parent rule.
- NextRuleID: This is a foreign key to the RuleID that allows the package to walk the rule hierarchy.
- DataSourceID: A foreign key that denotes the original source of the row we are working with
Step-by-Step Attacking the Solution
(I have omitted the steps necessary to create a new SSIS package, configuring the data source(s), and the adding of a data flow task to the package control flow. It should be noted that for the sample code to rule you will need to use the ADO.Net SqlClient Data Provider)
- Create and configure an Data Reader Source – To optimize performance if you are dealing with multiple data sources add an order by clauses to sort the result set by data source id.
- Add a ‘Derived Column’ Component and create the following working variables:
3. Add a Script Component (Select Transformation)
4. Select all the input columns and set the usage type for IsRowValid, RowScore, ValidationErrors, MinRowScore to ‘ReadWrite’.
5. Configure the Input and Outputs:
- The default input row was renamed to ‘RowInput’
- The default output was renamed to ‘RowValidOutput’ and I set the ExclusionGroup to ‘1’ so that I can filter the rows to the appropriate output.
- I added a second output called ‘RowInvalidOutput’. I set the SynchronousInputID so that it aligned with the ‘RowValidOutput’ and set the ExclusionGroup to ‘2’.
6. Define a connection manager
7. Design the script…
Make sure you add a reference to the ‘System.Xml’ namespace
Imports System
Imports System.Data
Imports System.Math
Imports System.Text
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.Data.SqlClient
Imports System.Reflection
Imports System.Text.RegularExpressions
Public Class ScriptMain
Inherits UserComponent
Dim sqlConn As SqlConnection
Dim ds As New DataSet
Dim dv As New DataView
Dim lastDataSourceID As Int32 = 0
Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
Try
Dim null As DBNull
sqlConn = DirectCast(Connections.SandboxConnection.AcquireConnection(null), SqlConnection)
Catch ex As Exception
'Do you error logging...for the purpose of this
'example we will just use the message box
Windows.Forms.MessageBox.Show("Setup: " + ex.Message)
End Try
End Sub
Public Overrides Sub PreExecute()
'Load up the entire Busines Rules tables
'In the case of the project where this was used the rules table was
'expected to be relatively small
Dim sqlCmd As SqlCommand = New SqlCommand("SELECT * FROM BusinessRules", sqlConn)
sqlCmd.CommandType = CommandType.Text
Dim adapter As New SqlDataAdapter(sqlCmd)
adapter.Fill(ds, "BusinessRules")
End Sub
Public Overrides Sub RowInput_ProcessInputRow(ByVal Row As RowInputBuffer)
Dim rowScore As Decimal = 0
Dim minRowScore As Decimal = 0
Dim errorMessage As New StringBuilder()
'Determine if the row we are processing is the same data source as the prior row
If Not Row.DataSourceID = lastDataSourceID Then
'If the data source is not the same we need to reload the dataview so that
'it contains the relevant parent-level rules for datasource we are processing
dv = New DataView
With dv
.Table = ds.Tables("BusinessRules")
.RowFilter = "IsParent=true AND DataSourceID=" + Row.DataSourceID.ToString()
End With
lastDataSourceID = Row.DataSourceID ' Set the lastDataSourceID for the next iteration
End If
'Determine if there are any rules to apply
If dv.Count > 0 Then
'Start looping through the business rules
For Each dataRow As DataRowView In dv
'Use reflection to get the property info
Dim columnPropertyInfo As PropertyInfo = Row.GetType().GetProperty(dataRow("TargetColumnName").ToString())
'Quick safety check
If Not columnPropertyInfo Is Nothing Then
'Set-up our rule scores
Dim minRuleScore As Decimal = Convert.ToDecimal(dataRow("MinimumScore"))
Dim ruleScore As Decimal = 0
Dim rule As DataRowView = dataRow 'Used to walk the parent/child hierachy
Dim ruleErrorMessage As String
If Not String.IsNullOrEmpty(rule("ErrorMessage").ToString()) Then
ruleErrorMessage = rule("ErrorMessage").ToString()
Else
ruleErrorMessage = rule("TargetColumnName").ToString() + " Invalid!"
End If
minRowScore = minRowScore + minRuleScore 'Keep track of the row score
'Begin the loop
While Not rule Is Nothing And Not columnPropertyInfo Is Nothing
'Use Reflection to get the row value
Dim rowValue As Object = columnPropertyInfo.GetValue(Row, Nothing)
'Get the string value (in our case we need to handle multiple data types
Dim rowStringVal As String = GetStringFromRowValue(rowValue)
'Get the rule weight
Dim ruleWeight As Decimal = Convert.ToDecimal(rule("RuleWeight"))
'Validate the rule against the row/column value
If ApplyValidationRule(rule, rowStringVal) Then
'The column passed validation increment the rule and row score
ruleScore = ruleScore + ruleWeight
rowScore = rowScore + ruleWeight
End If
'Check to see if we have met our minimum score
If ruleScore >= minRuleScore Then
Exit While 'Leave the loop
Else
'Determine if there are other rules to process
If Not rule("NextRuleID") Is DBNull.Value Then
'Get the next rule
rule = GetNextBusinessRule(Convert.ToInt32(rule("NextRuleID")))
If Not rule Is Nothing Then
'Use reflection to get the associated property info
columnPropertyInfo = Row.GetType().GetProperty(rule("TargetColumnName").ToString())
End If
Else
Exit While 'End of the rule chain...exit the loop
End If
End If
End While 'End Loop
If ruleScore < minRuleScore Then 'Determine if the rule is valid
Row.IsRowValid = False
errorMessage.Append(ruleErrorMessage + ";")
End If
'Increment the rowScore
rowScore = rowScore + ruleScore
End If
Next
End If
'Move the working vars to the row
Row.RowScore = rowScore
Row.MinRowScore = minRowScore
Row.ValidationErrors = errorMessage.ToString()
If Row.IsRowValid Then 'Redirect the row to the proper output
Row.DirectRowToRowValidOutput()
Else
Row.DirectRowToRowInvalidOutput()
End If
End Sub
#Region "Helper Methods"
Private Function GetNextBusinessRule(ByVal ruleID As Int32) As DataRowView
Dim v As New DataView
'We use data views and row filters to prevent roundtrips to the DB.
With v
.Table = ds.Tables("BusinessRules")
.RowFilter = "IsParent=false AND RuleID=" + ruleID.ToString()
End With
If v.Count = 1 Then
Return v(0)
End If
Return Nothing
End Function
Private Function ApplyValidationRule(ByVal rule As DataRowView, ByVal val As String) As Boolean
'Determine if the value passed in is blank
If Not String.IsNullOrEmpty(val) Then
'Make sure we have a validation expression
If Not String.IsNullOrEmpty(rule("ValidationExpression").ToString()) Then
'Check for a match
If Regex.IsMatch(val, rule("ValidationExpression").ToString()) = False Then
Return False
End If
End If
Else
'If the rule does not allow for nulls, we need to fail the rule
If Convert.ToBoolean(rule("AllowNullValue")) = False Then
Return False
End If
End If
Return True
End Function
'This method is basically used to handle columns that are not of type string
Private Function GetStringFromRowValue(ByVal rowValue As Object) As String
Dim rowStringVal As String = String.Empty
If Not rowValue Is Nothing Then
Select Case rowValue.GetType.Name
Case "String", "Int32", "Boolean", "Byte"
rowStringVal = DirectCast(rowValue, String)
Case "DateTime"
rowStringVal = CDate(rowValue).ToString("MM/dd/yyyy")
Case Else
'Do nothing
End Select
End If
Return rowStringVal
End Function
#End Region
End Class
The SSIS Package Data Flow

As always comments, suggestions and constructive criticism is always welcomed and appreciated!
Tight Lines!! -- Chris
Credits: There are too many sources to thank and credit for this article. I'm am basically learning as I go with SSIS and VB.Net (my primarily language is C#) and I make extensive use of online resources. If you see something egregious that should be cited please let me know.