SSIS Business Rules Engine

SSIS Business Rules Engine

 

I am by no means a data warehouse or SSIS expert but I have been forced to learn with one of my latest projects. One of the requirements that came out of the analysis and design phase was a client request for a “rules engine” within the data warehouse loader that could be “easily configured” with no code changes.

 

The solution chosen is fairly straight-forward. It required one custom-table (called BusinessRules) and relies on Reflection and the Script Component in the SSIS data flow.

 

Before I proceed I want to share so of the assumptions that were involved with the design of this functionality:

    • All the data from disparate data sources is staged into a single table (called SourceStaging) and tracked using the DataSources table.
    • The quantity of data we deal with is ‘relatively’ small so we will take an optimistic approach to memory/caching management.
    • The rules engine has to support hierarchies.
    • Example  (Company Name or First and Last Name)
    • Different data sources can have different rules applied to the same column
      • Example  Data Source 1: Requires either a Company name or a First and Last Name
      • Example Data Source 2: Requires only a Last Name

For the purposes of this sample I will use a simplified DB Structure:

 

(As a side note the SourceStaging table and DataSources table are implementation specific and aside from the tie into datasource id they do not affect the ultimate solution.)

 

The Rules Table

 

I started by creating a rules table that could be managed via a custom interface that would be built during Phase II of the project. The table consists of:

-          RuleID: The primary key of the table.

-          TargetColumnName: The name of the column on the SourceStaging table which the rule applies to.

-          AllowNullValue: Indicate whether null/empty strings are allowable values

-          ValidationExpression: An optional regular expression that will be used to validate the column value. (You may note that the AllowNullValue column could be replaced by using strictly regular expressions. We decided not to pursue this route to make the regular expressions easier to maintain.)

-          ErrorMessage: An optional user defined error message that would be meaningful when the rule is violated (i.e. Home or Business Phone is required). When no value is specified a generic message is generated (i.e. <ColumnName> invalid).

-          MinimumScore: This column is the minimum score that the rule must achieve in order for the rule to be valid. This column is used primarily for hierarchies.

-          RuleWeight: This is the value of the rule in terms of points towards the overall rule score. Again this is used primarily for hierarchy support.

-          IsParent: A flag to indicate if a rule if a top-level or parent rule.

-          NextRuleID: This is a foreign key to the RuleID that allows the package to walk the rule hierarchy.

-          DataSourceID: A foreign key that denotes the original source of the row we are working with

 

 

 

Step-by-Step Attacking the Solution

 

(I have omitted the steps necessary to create a new SSIS package, configuring the data source(s), and the adding of a data flow task to the package control flow. It should be noted that for the sample code to rule you will need to use the ADO.Net SqlClient Data Provider)

 

  1. Create and configure an Data Reader Source – To optimize performance if you are dealing with multiple data sources add an order by clauses to sort the result set by data source id.
  2. Add a ‘Derived Column’ Component and create the following working variables:

 

3. Add a Script Component (Select Transformation)

 

4. Select all the input columns and set the usage type for IsRowValid, RowScore, ValidationErrors, MinRowScore to ‘ReadWrite’.

 

 

5. Configure the Input and Outputs:

- The default input row was renamed to ‘RowInput’

- The default output was renamed to ‘RowValidOutput’ and I set the ExclusionGroup to ‘1’ so that I can filter the rows to the appropriate output.

 

 

- I added a second output called ‘RowInvalidOutput’. I set the SynchronousInputID  so that it aligned with the ‘RowValidOutput’ and set the ExclusionGroup to ‘2’.

 

6. Define a connection manager

 

7.  Design the script…

Make sure you add a reference to the ‘System.Xml’ namespace

 

Imports System

Imports System.Data

Imports System.Math

Imports System.Text

Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

 

Imports System.Data.SqlClient

Imports System.Reflection

Imports System.Text.RegularExpressions

 

Public Class ScriptMain

    Inherits UserComponent

 

    Dim sqlConn As SqlConnection

    Dim ds As New DataSet

    Dim dv As New DataView

    Dim lastDataSourceID As Int32 = 0

 

    Public Overrides Sub AcquireConnections(ByVal Transaction As Object)

        Try

            Dim null As DBNull

            sqlConn = DirectCast(Connections.SandboxConnection.AcquireConnection(null), SqlConnection)

        Catch ex As Exception

            'Do you error logging...for the purpose of this

            'example we will just use the message box

            Windows.Forms.MessageBox.Show("Setup: " + ex.Message)

        End Try

    End Sub

 

    Public Overrides Sub PreExecute()

        'Load up the entire Busines Rules tables

        'In the case of the project where this was used the rules table was

        'expected to be relatively small

        Dim sqlCmd As SqlCommand = New SqlCommand("SELECT * FROM BusinessRules", sqlConn)

 

        sqlCmd.CommandType = CommandType.Text

 

        Dim adapter As New SqlDataAdapter(sqlCmd)

 

        adapter.Fill(ds, "BusinessRules")

    End Sub

 

    Public Overrides Sub RowInput_ProcessInputRow(ByVal Row As RowInputBuffer)

        Dim rowScore As Decimal = 0

        Dim minRowScore As Decimal = 0

        Dim errorMessage As New StringBuilder()

 

        'Determine if the row we are processing is the same data source as the prior row

        If Not Row.DataSourceID = lastDataSourceID Then

            'If the data source is not the same we need to reload the dataview so that

            'it contains the relevant parent-level rules for datasource we are processing

            dv = New DataView

 

            With dv

                .Table = ds.Tables("BusinessRules")

                .RowFilter = "IsParent=true AND DataSourceID=" + Row.DataSourceID.ToString()

            End With

 

            lastDataSourceID = Row.DataSourceID ' Set the lastDataSourceID for the next iteration

        End If

 

        'Determine if there are any rules to apply

        If dv.Count > 0 Then

            'Start looping through the business rules

            For Each dataRow As DataRowView In dv

                'Use reflection to get the property info

                Dim columnPropertyInfo As PropertyInfo = Row.GetType().GetProperty(dataRow("TargetColumnName").ToString())

 

                'Quick safety check

                If Not columnPropertyInfo Is Nothing Then

                    'Set-up our rule scores

                    Dim minRuleScore As Decimal = Convert.ToDecimal(dataRow("MinimumScore"))

                    Dim ruleScore As Decimal = 0

                    Dim rule As DataRowView = dataRow 'Used to walk the parent/child hierachy

                    Dim ruleErrorMessage As String

 

                    If Not String.IsNullOrEmpty(rule("ErrorMessage").ToString()) Then

                        ruleErrorMessage = rule("ErrorMessage").ToString()

                    Else

                        ruleErrorMessage = rule("TargetColumnName").ToString() + " Invalid!"

                    End If

 

                    minRowScore = minRowScore + minRuleScore 'Keep track of the row score

 

                    'Begin the loop

                    While Not rule Is Nothing And Not columnPropertyInfo Is Nothing

                        'Use Reflection to get the row value

                        Dim rowValue As Object = columnPropertyInfo.GetValue(Row, Nothing)

                        'Get the string value (in our case we need to handle multiple data types

                        Dim rowStringVal As String = GetStringFromRowValue(rowValue)

                        'Get the rule weight

                        Dim ruleWeight As Decimal = Convert.ToDecimal(rule("RuleWeight"))

 

                        'Validate the rule against the row/column value

                        If ApplyValidationRule(rule, rowStringVal) Then

                            'The column passed validation increment the rule and row score

                            ruleScore = ruleScore + ruleWeight

                            rowScore = rowScore + ruleWeight

                        End If

 

                        'Check to see if we have met our minimum score

                        If ruleScore >= minRuleScore Then

                            Exit While 'Leave the loop

                        Else

                            'Determine if there are other rules to process

                            If Not rule("NextRuleID") Is DBNull.Value Then

                                'Get the next rule

                                rule = GetNextBusinessRule(Convert.ToInt32(rule("NextRuleID")))

 

                                If Not rule Is Nothing Then

                                    'Use reflection to get the associated property info

                                    columnPropertyInfo = Row.GetType().GetProperty(rule("TargetColumnName").ToString())

                                End If

                            Else

                                Exit While 'End of the rule chain...exit the loop

                            End If

                        End If

                    End While 'End Loop

 

                    If ruleScore < minRuleScore Then 'Determine if the rule is valid

                        Row.IsRowValid = False

                        errorMessage.Append(ruleErrorMessage + ";")

                    End If

 

                    'Increment the rowScore

                    rowScore = rowScore + ruleScore

                End If

            Next

        End If

 

        'Move the working vars to the row

        Row.RowScore = rowScore

        Row.MinRowScore = minRowScore

        Row.ValidationErrors = errorMessage.ToString()

 

        If Row.IsRowValid Then 'Redirect the row to the proper output

            Row.DirectRowToRowValidOutput()

        Else

            Row.DirectRowToRowInvalidOutput()

        End If

    End Sub

 

#Region "Helper Methods"

    Private Function GetNextBusinessRule(ByVal ruleID As Int32) As DataRowView

        Dim v As New DataView

 

        'We use data views and row filters to prevent roundtrips to the DB.       

        With v

            .Table = ds.Tables("BusinessRules")

            .RowFilter = "IsParent=false AND RuleID=" + ruleID.ToString()

        End With

 

        If v.Count = 1 Then

            Return v(0)

        End If

 

        Return Nothing

    End Function

 

    Private Function ApplyValidationRule(ByVal rule As DataRowView, ByVal val As String) As Boolean

        'Determine if the value passed in is blank

        If Not String.IsNullOrEmpty(val) Then

            'Make sure we have a validation expression

            If Not String.IsNullOrEmpty(rule("ValidationExpression").ToString()) Then

                'Check for a match

                If Regex.IsMatch(val, rule("ValidationExpression").ToString()) = False Then

                    Return False

                End If

            End If

        Else

            'If the rule does not allow for nulls, we need to fail the rule

            If Convert.ToBoolean(rule("AllowNullValue")) = False Then

                Return False

            End If

        End If

 

        Return True

    End Function

 

    'This method is basically used to handle columns that are not of type string

    Private Function GetStringFromRowValue(ByVal rowValue As Object) As String

        Dim rowStringVal As String = String.Empty

 

        If Not rowValue Is Nothing Then

            Select Case rowValue.GetType.Name

                Case "String", "Int32", "Boolean", "Byte"

                    rowStringVal = DirectCast(rowValue, String)

                Case "DateTime"

                    rowStringVal = CDate(rowValue).ToString("MM/dd/yyyy")

                Case Else

                    'Do nothing

            End Select

        End If

 

        Return rowStringVal

    End Function

#End Region

 

End Class

The SSIS Package Data Flow

 

 

 As always comments, suggestions and constructive criticism is always welcomed and appreciated!

 

Tight Lines!! -- Chris

Credits: There are too many sources to thank and credit for this article. I'm am basically learning as I go with SSIS and VB.Net (my primarily language is C#) and I make extensive use of online resources. If you see something egregious that should be cited please let me know.

 

Print | posted @ Friday, March 07, 2008 11:01 AM

Comments on this entry:

Gravatar # re: SSIS Business Rules Engine
by John Parker at 11/20/2008 4:53 AM

Thanks for the example - saved me quite a bit of time recently.

For my purposes adding an extra couple of outputs was really useful - RowExceptionOutput (allowing the user to flag data problems) and RowExclusionOutput (allowing the user to exclude certain data).
Gravatar # re: SSIS Business Rules Engine
by Paul at 7/21/2010 5:47 AM

Chris,
Has this method ever been expanded to use things other than regular expressions - like checking whether a new item already exists in another dataset? (sql expressions)
Paul
Post A Comment
Title:
Name:
Email:
Website:
Comment:
Verification: