Chris Falter

.NET Design and Best Practices
posts - 41, comments - 74, trackbacks - 24

My Links

News

All source code published on this blog is placed in the public domain.

Archives

Post Categories

Image Galleries

About Me

Wednesday, December 30, 2009

How to Use Extension Methods and Lambda Expressions to Write Elegant Unit Tests

Suppose you have to write a test to verify that an edit form will throw a certain exception under certain conditions.  If you don't use VB 9 features, you might end up with this effusion of verbosity:
 
        Try
            editForm.Save(MyDataContext)
            Throw New AssertFailedException("Save should have thrown an exception")
        Catch ex As Exception
            If Not (ex.GetType().Equals(GetType(BusinessOperationException))) Then
                Dim errorMessage = String.Format("Save should should have thrown BusinessOperationException; threw {0} instead", ex.GetType())
                Throw New AssertFailedException(errorMessage)
            End If
            Dim bopException = DirectCast(ex, BusinessOperationException)
            Assert.IsTrue(bopException.DataFormErrorInfo.SQLErrorInfo.ErrorMessage.Contains("ERR_MEMBERSPECIFIEDMEMBERS_INVALID"))
        End Try
 
Of course, you could make the code a little simpler by not handling any exceptions of the wrong type, which would implicitly cause the test to fail if an exception of the wrong type is thrown:
 
         Try
             editForm.Save(MyDataContext) 
             Throw New AssertFailedException("Save should have thrown an exception")
         Catch ex As BusinessOperationException
             ' exceptions of other types are not handled 
             Assert.IsTrue(ex.DataFormErrorInfo.SQLErrorInfo.ErrorMessage.Contains("ERR_MEMBERSPECIFIEDMEMBERS_INVALID"))
         End Try
 
While this is better, the code is not without problems:
  • The reader of this code has to infer that exceptions of a type other than BusinessOperationException will fail cause a test failure.  Yes, you can mitigate this by including a helpful comment--if you remember.  I'd rather not have to remember.
  • The intention of the test is buried inside the mechanics of the try/catch block.
  • The assertion about the state of the BusinessOperationException instance is not easy to follow, since the reader of the code has to walk a lengthy property hierarchy (DataFormErrorInfo => SQLErrorInfo => ErrorMessage).  This is a glaring violation of the Law of Demeter.  In addition, it is not self-evident from reading the code that DataFormErrorInfo.SQLErrorInfo.ErrorMessage would naturally be the location where one would find the argument to a SQL raiserror call.
Prior to VB 9 it would have been very difficult to write this code with a smoother syntax due to the limitations of the language.  But now we can solve these problems by using lambda expressions and extension methods, as follows:
 
        Dim saveForm As Action = Function() Me.Save(editForm)
        saveForm.ShouldThrow(Of BusinessOperationException).Where(Function(e) e.DBExceptionCode.Contains("ERR_MEMBERSPECIFIEDMEMBERS_INVALID"))
 
Here the fact that the form's .Save() method should throw a BusinessOperationException with a "ERR_MEMBERSPECIFIEDMEMBERS_INVALID" exception code walks up to the reader and punches him in the face.  And that's what we want all of our code to do; pugnacious code is good code!  Moreover, the ShouldThrow() and Where() methods will emit useful, detailed messages for any error conditions.  Let's see how VB 9 helps us to work this magic.

Extension Methods

An extension method acts as if it belongs to a type; thus you may create an extension method for a type even if you do not possess the type's source code.  Without extension methods, your only choice is to create a shared helper method that takes a type instance as a parameter.  For example, in earlier VB versions you could have written a shared Throws() method that takes an Action delegate as a parameter.  However, the extension method has greater clarity and readability:
 
Extension Methods vs. Shared/Static Methods
Coding Technique Extension Method Shared Method
Syntax Subject + Verb + arguments/modifiers Verb + list of arguments
Example saveForm.ShouldThrow(Of BusinessOperationException) Throws(Of BusinessOperationException)(saveForm)
Readability Close to standard English - very readable

Completely at odds with English grammar.  And when there are many arguments, which is the subject and which are modifiers?

 
 
Ultimately an extension method is syntactical sugar; then again, a little sugar can help you bake some wonderful recipes.  So let's check it out.
 
In the solution code above, the generic ShouldThrow(Of TException) method is as an extension method to the .NET Action delegate.  Let me pause for a moment to let that concept sink in; we are writing code that extends a built-in .NET type.  And that type isn't even a class; it's a delegate!  Clearly, extension methods open up a new realm of possibilities to us.  Decorating a standard shared method with the System.Runtime.CompilerServices.ExtensionAttribute turns it into a method that behaves as if it were a member of the type of its first argument.  Thus the first argument to the ShouldThrow method in Blackbaud.AppFx.UnitTesting/AssertExtensions.vb is of type Action delegate:
 
    <Extension()> _
    Public Function ShouldThrow(Of TException As Exception)(ByVal workToDo As Action) As TException
        Dim errorMessage As New StringBuilder("An exception of type ")
        errorMessage.Append(GetType(TException))
        errorMessage.Append(" should have been thrown, but wasn't.  Test method info follows...")
        errorMessage.AppendLine()
        errorMessage.AppendCallingMethodInfo()
        Return ShouldThrow(Of TException)(workToDo, errorMessage.ToString())
    End Function
 
    <Extension()> _
    Public Function ShouldThrow(Of TException As Exception)(ByVal workToDo As Action, ByVal errorMessage As String) As TException
        Try
            workToDo.Invoke()
            Throw New AssertFailedException(errorMessage)
        Catch ex As Exception
            If (GetType(TException).Equals(ex.GetType)) Then
                Return ex
            End If
            Throw
        End Try
    End Function
 
The .ShouldThrow() method encapsulates all of the try/catch logic that had obfuscated the earlier versions of our test code.  Just specify the type of the exception that should be thrown when you call it, and it will do one of three things:
  • return the thrown exception of the desired type,
  • rethrow a thrown exception of the wrong type, or
  • throw its own exception if an exception is not thrown.
In VB an extension method must belong to a module.  (C# uses a static class.)  After you compile the module containing the ShouldThrow() method, the .NET languages will allow you to use it as if it were actually a member of the Action delegate type.  An extension method does have a limitation that true members of a type do not have: it cannot access the non-public members of the type.  Alas, even magic has limits.
 
Another nice aspect of writing an extension method is that it will even show up in Intellisense!   Already the advantage of the the extension method is clear; if you were using an older-style shared method, Intellisense would not list it as a candidate for use with the Action delegate saveAction.  
 
While an overload of the ShouldThrow() method is available for those who wish to provide a custom error message, the default (parameterless) overload uses the AppendCallingMethodInfo() method to incorporate the name, source file, and line number of the calling test method into a useful error message.  I recommend using the default overload of ShouldThrow() for most situations, since it produces a very useful error message while keeping your code's syntax as simple as possible.  The alert reader will note that AppendCallingMethodInfo() is implemented as an extension method to the StringBuilder class; the curious may read through AssertExtensions.vb source file to see how the info is extracted from a stack trace.
 
The AssertExtensions module also contains the .Where() extension method overloads which, as we see from their first parameter, can be called on a generic exception.
 
 
    <Extension()> _
    Public Function Where(Of TException As Exception) _
    (ByVal ex As TException, ByVal predicate As Func(Of TException, Boolean)) _
    As Boolean
 
        Dim errorMessage As New StringBuilder("The " + GetType(TException).ToString() + " did not satisfy the Where predicate.  Test method info follows...")
        errorMessage.AppendLine()
        errorMessage.AppendCallingMethodInfo()
        Return Where(ex, predicate, errorMessage.ToString())
 
    End Function
 
    <Extension()> _
    Public Function Where(Of TException As Exception) _
    (ByVal ex As TException, ByVal predicate As Func(Of TException, Boolean), ByVal errorMessage As String) _
    As Boolean
 
        If Predicate(ex) Then
            Return True
        Else
            Throw New AssertFailedException(errorMessage)
        End If
 
    End Function
 
The Func keyword, new in VB 9, denotes a generic function delegate whose signature is determined by the types of the parameters.  The last parameter is the return type of the delegate, and the leading parameters before it (if any) specify the types of the input parameters. Applying this syntax to our code, we see that the parameter ByVal predicate As Func(Of TException, Boolean) denotes a delegate for a function that takes a generic TException argument and returns Boolean.   The Where() method invokes the delegate and takes care of throwing an AssertFailedException with an appropriate message if the delegate returns False.  You will recall that the sample code used a lambda expression to create the delegate invoked by Where().
 
Finally, the ability to write an extension method allows us to obey the Law of Demeter, even though we cannot modify the BBSolder-generated BusinessOperationException class.  In the unit test file, just define an extension method that traverses the properties:
 
Friend Module BusinessOperationExceptionExtensions
 
    <Extension()> _
    Friend Function DBExceptionCode(ByVal ex As BusinessOperationException) As String
        Return ex.DataFormErrorInfo.SQLErrorInfo.ErrorMessage
    End Function
 
End Module

Now for any BusinessOperationException ex, you can just call ex.DbExceptionCode instead of traversing the various properties.  Sweet!

Lambda Expressions

According to MSDN, a lambda expression is "a function without a name that calculates and returns a single value."  Think of a lambda expression as an executable block of code that is defined in-line, rather than in a function that must be defined somewhere else.  A lambda expression can also make use of variables that are in scope at the point where it is defined.  Because a lambda expression helps you keep the flow of logic in one place and avoid the ceremony of passing parameters, it can serve as a powerful tool for simplifying your code.
 
You will recall that in our sample code, the .Where extension method processes a lambda expression:
 
.Where(Function(e) e.DBExceptionCode.Contains("ERR_MEMBERSPECIFIEDMEMBERS_INVALID"))
 
The Function keyword indicates that the expression which follows is a lambda.  Behind the scenes, the VB compiler will do the following:
  • build an anonymous function that takes a parameter of type BusinessOperationException and returns Boolean; the lambda expression then becomes the body of the function. 
  • infer that the generic type TException declared in the .Where() method definition will be of type BusinessOperationException in this block of code. 
  • build a delegate to that anonymous function, and then pass that delegate as an argument to the .Where extension method, which has a ByVal predicate As Func(Of TException, Boolean) parameter. 
And you thought you worked hard!  Well, why not let the compiler do the hard work for you?  You could think of a lambda expression as a way to get the compiler to handle all the plumbing chores of function bodies and delegates, while leaving just the essence of the logic visible in your code. 

Conclusion

Extension methods and lambda expressions can help you write code that is more succinct, more readable, and more powerful.  According to Microsoft's Eric White, "Functional Programming (FP) has the potential to reduce program line count by 20% to 50%, reduce bugs and increase robustness, and move us in the direction of taking advantage of multiple core CPUs."  In this article we saw how extension methods and lambda expressions, which are important parts of the VB functional programming toolkit, made the sample test code more readable and succinct.  Of course these new techniques are not a golden hammer that can be used everywhere; nevertheless, be alert for opportunities to use them in the test code you write.

Further Reading

Source Listing for AssertExtensions.vb

Imports System.Diagnostics
Imports System.Runtime.CompilerServices
Imports System.Text
 
Public Module AssertExtensions
 
    ''' <summary>
    ''' Verifies that a specific method call throws an exception of a specific type
    ''' </summary>
    ''' <typeparam name="TException">The type of exception the method call is expected to throw</typeparam>
    ''' <param name="workToDo">delegate to method call which is expected to throw an exception</param>
    ''' <returns>
    ''' instance of exception thrown by the Action
    ''' </returns>
    ''' <remarks>
    ''' This method will re-throw any exceptions of a type other than that designated by the generic TException.
    ''' If the Action does not throw any exceptions, this method will throw an AssertFailedException.
    ''' </remarks>
    <Extension()> _
    Public Function ShouldThrow(Of TException As Exception)(ByVal workToDo As Action) As TException
        Dim errorMessage As New StringBuilder("An exception of type ")
        errorMessage.Append(GetType(TException))
        errorMessage.Append(" should have been thrown, but wasn't.  Test method info follows...")
        errorMessage.AppendLine()
        errorMessage.AppendCallingMethodInfo()
        Return ShouldThrow(Of TException)(workToDo, errorMessage.ToString())
    End Function
 
    ''' <summary>
    ''' Verifies that a specific method call throws an exception of a specific type
    ''' </summary>
    ''' <typeparam name="TException">The type of exception the method call is expected to throw</typeparam>
    ''' <param name="workToDo">delegate to method call which is expected to throw an exception</param>
    ''' <param name="errorMessage">user-defined error message if expected exception not thrown</param>
    ''' <returns>
    ''' instance of exception thrown by the Action
    ''' </returns>
    ''' <remarks>
    ''' This method will re-throw any exceptions of a type other than that designated by the generic TException.
    ''' If the Action does not throw any exceptions, this method will throw an AssertFailedException.
    ''' </remarks>
    <Extension()> _
    Public Function ShouldThrow(Of TException As Exception)(ByVal workToDo As Action, ByVal errorMessage As String) As TException
        Try
            workToDo.Invoke()
            Throw New AssertFailedException(errorMessage)
        Catch ex As Exception
            If (GetType(TException).Equals(ex.GetType)) Then
                Return ex
            End If
            Throw
        End Try
    End Function
 
    <Extension()> _
    Public Sub AppendCallingMethodInfo(ByVal sb As StringBuilder, Optional ByVal howManyFramesBack As Integer = 1)
        ' how far back is from the perspective of the caller of this method. +1 gets us back to the frame of 
        ' this method's caller, then we continue from there
        howManyFramesBack += 1
        Dim st As New StackTrace(fNeedFileInfo:=True)
        Dim sf = st.GetFrame(howManyFramesBack)
        sb.Append("Test method: ")
        sb.Append(sf.GetMethod())
        sb.AppendLine()
        sb.Append("FileName: ")
        sb.Append(sf.GetFileName())
        sb.AppendLine()
        sb.Append("Line Number: ")
        sb.Append(sf.GetFileLineNumber())
    End Sub
 
    ''' <summary>
    ''' Verifies that an exception satisfies a predicate
    ''' </summary>
    ''' <typeparam name="TException">The type of exception against which the predicate is applied</typeparam>
    ''' <param name="ex">The exception against which the predicate is applied (passed by compiler)</param>
    ''' <param name="predicate">The condition which the exception must satisfy</param>
    ''' <returns>True if exception satisfies predicate, otherwise an AssertFailedException is thrown</returns>
    ''' <remarks>
    ''' Unlike the IEnumerable(Of T).Where extension method, this method does not return a subset from a set.  
    ''' Instead it verifies that a particular instance of an exception satisfies a predicate.  This method is 
    ''' designed to be used in conjunction with ShouldThrow to provide a fluent expression for a test.
    ''' <example>
    ''' Dim myAction As Action = (Function() MyMethod(myData))
    ''' myAction.ShouldThrow(Of SpecializedException)().Where(Function(ex) ex.SpecialProperty = "Hello")
    ''' </example>
    ''' </remarks>
    <Extension()> _
    Public Function Where(Of TException As Exception) _
    (ByVal ex As TException, ByVal predicate As Func(Of TException, Boolean)) _
    As Boolean
 
        Dim errorMessage As New StringBuilder("The " + GetType(TException).ToString() + " did not satisfy the Where predicate.  Test method info follows...")
        errorMessage.AppendLine()
        errorMessage.AppendCallingMethodInfo()
        Return Where(ex, predicate, errorMessage.ToString())
 
    End Function
 
    ''' <summary>
    ''' Verifies that an exception satisfies a predicate, and throws exception with user-supplied message if not
    ''' </summary>
    ''' <typeparam name="TException">The type of exception against which the predicate is applied</typeparam>
    ''' <param name="ex">The exception against which the predicate is applied (passed by compiler)</param>
    ''' <param name="predicate">The condition which the exception must satisfy</param>
    ''' <param name="errorMessage">Message of AssertFailedException thrown if predicate fails</param>
    ''' <returns>True if exception satisfies predicate, otherwise an AssertFailedException is thrown</returns>
    ''' <remarks>
    ''' Unlike the IEnumerable(Of T).Where extension method, this method does not return a subset from a set.  
    ''' Instead it verifies that a particular instance of an exception satisfies a predicate.  This method is 
    ''' designed to be used in conjunction with ShouldThrow to provide a fluent expression for a test.
    ''' <example>
    ''' Dim myAction As Action = (Function() MyMethod(myData))
    ''' myAction.ShouldThrow(Of MyException)().Where(Function(ex) ex.MyProp = "Hello", "MyProp wasn't 'Hello')
    ''' </example>
    ''' </remarks>
    <Extension()> _
    Public Function Where(Of TException As Exception) _
    (ByVal ex As TException, ByVal predicate As Func(Of TException, Boolean), ByVal errorMessage As String) _
    As Boolean
 
        If Predicate(ex) Then
            Return True
        Else
            Throw New AssertFailedException(errorMessage)
        End If
 
    End Function
 
End Module

 

posted @ Wednesday, December 30, 2009 3:23 PM | Feedback (0) | Filed Under [ Coding Practices and Design Patterns Testing & Debugging ]

Sunday, March 29, 2009

How to Reuse Code Without Creating an Implicit API

I work for an ISV, and we have to be very cautious about the code that we declare with public scope.  If it's public, a customer can treat it as an API, which might not always be a good idea if the code is really just doing some internal task in a way that could easily change as we improve our software.  In other words, not all that is public should be an API.

So what do you do if you need to share code across assembly boundaries?  Recently I was writing some unit tests for a class, and the tests needed access to a class method in order to imitate (and test) its behavior.  Of course this method was private--of course!--so it appeared that I had to choose between 2 very unpalatable options:

  1. Give the method public scope, which would make it visible to all our customers' implementations.  Essentially, this option would make the method part of our API.  Not happening.
  2. Copy the 20 lines of code from the method, paste it into my new class, and supplicate the dark spirits of bad software design to supernaturally prevent the method from ever being modified.  But how could I perform the ritual without an eye of newt and toe of frog?*

But appearances can be deceiving.  It turns out that the InternalsVisibleToAttribute, introduced by Microsoft in .NET 2.0, solves the problem quite neatly.  Using this attribute, you can designate one or more friend assemblies that will be allowed access to classes, methods and properties in the target assembly that have friend scope, while all other code--including customer code--is denied access.

Using the attribute is a piece of cake:

  1. Mark any classes/methods/properties that need to be visible to other assemblies with "Friend" scope.
  2. Import the System.Runtime.CompilerServices namespace into your assembly's AssemblyInfo.vb file.

    Imports System.Runtime.InteropServices

  3. In AssemblyInfo.vb, mark the assembly with the InternalsVisibleToAttribute for each friend assembly that should have access to the friend scope.

    <Assembly: InternalsVisibleTo("MySystem.Assembly1")>
    <Assembly: InternalsVisibleTo("MySystem.Assembly2")>

If you are working in C#, you will make your changes in the AssemblyInfo.cs file for the assembly that contains the code to be re-used, and you will use "using" instead of "Imports."

So I could have my cake (re-use of existing code) and shield it from inappropriate eyes at the same time.  While this technique is obviously useful for writing tests, it can be applied anywhere that code needs to be shared between your system's assemblies without exposing it customers as part of an API. 

* Not to mention the wool of bat and tongue of dog.

 

posted @ Sunday, March 29, 2009 2:44 PM | Feedback (0) | Filed Under [ Coding Practices and Design Patterns Testing & Debugging ]

Thursday, February 05, 2009

Simplicity vs. Performance

A colleague and I had a recent discussion about two versions of a trigger.  One was simple, containing a single query.  The other performed 14% better, but was a lot more complex--double the lines of code, and a branch that had to be evaluated.  My colleague advocated for checking in the more complex version right from the start, while I advocated for starting with simplicity.  What do you think?  Post a comment!

Here's how our discussion went:

COLLEAGUE:

Is your first example more than 14% more maintainable than the second example?  If not (which is my opinion, since there's really only one line, the "select inserted.NAME +..." line, that would change over time; the rest is boilerplate code), I think I'd go with the second for the performance increase.

ME:

My experience has been that it's a lot more expensive for an organization to maintain complex code than it is to tweak performance.  Additional complexity tends to work like Adam Smith's invisible hand--except that this invisible hand works for evil, rather than for good.  It makes staff spend more time trying to figure out how to fix bugs and add new functionality than they would with simpler code.  It makes the software organization slower to respond to customer requests and changing market conditions. 

The problem is that this complexity tax that the organization has to keep paying cycle after cycle is truly invisible because it is hard to quantify (unlike a small bump in performance).  But ask any developer whether s/he has had to spend significant extra time in development or troubleshooting because the code base was complex (i.e., whether s/he has had to pay the code complexity tax), and invariably the answer is yes, a thousand times yes.

Granted, a single trigger that improves performance by 14% itself will not cause all that hardship.  It's the cumulative effect of millions of lines of code that are a little too complex that make the development process harder and more expensive than it has to be/should be.

I guess the other way of pitching this philosophy is that if you tend to choose slightly better performance over simpler code,  when it comes time to optimize the system you're asking the question: which of these millions of lines of code could be simplified at only a small performance cost?  And we don't have the tools to answer that question very well.  OTOH, if we tend to choose simpler code over slightly better performance, then optimization means we're asking: where are the bottlenecks where we can get a big performance boost at a small cost in code complexity?  And we have a variety of profiling tools that help us answer that question very well.

COLLEAGUE:

I think you're letting a general philosophy get in the way of optimizing the one example you've shown.  A 14% improvement in performance is more than enough reason to have branching code that's only slightly less maintainable than the non-branching code.  I just can't imagine a scenario where a maintenance programmer sits down to fix a bug in this trigger and is so completely confounded by its complexity that fixing it would bring down the entire organization as you have suggested.
ME:

Actually, the maintainability difference is quite dramatic.  You're buying the slightly better performance at a cost of doubling the lines of code and doubling the cyclomatic complexity. 

Granted, this loss of maintainability occurs over a tiny portion of the code base.  On the other hand, the performance gain, measured as a percentage of total transactions against the database, is essentially imperceptible as well.  I.e., it's not as if a 14% perf gain in one of the most lightly used triggers among tens of thousands will amount to more than a sneeze.

What I'm saying is that the best strategy to optimize performance is to start with a good, simple code base, find the biggest bottlenecks in a realistic load test, and optimize just those code sections.  Usually, the vast majority of perf gains emerge from a small number of trade-offs against simplicity.

The opposite approach (always choosing performance over simplicity during initial development) results in getting to market more slowly (which has a cost), and--through the accretion of thousands of small complexities--a gnarly code base that is expensive and time-consuming to maintain.  The cumulative effect of small decisions makes a big difference.

If you look at this scenario in the small--one decision about one trigger--I can understand why you might choose performance over simplicity.  Now multiply that bias for perf over simplicity across dozens of developers writing code every day, and you end up with an entire system that is twice as expensive and time-consuming to maintain.  And the system would have lower quality to boot, since when you multiply lines of code and cyclomatic complexity, quality is pretty much lost at an exponential rate.  Such a loss of quality is surely expensive to an organization over the long-term.

That's why I advocate choosing simplicity as a matter of initial development.  Then as a QA process, look for the performance optimizations that will yield the biggest gains, and implement just those.  The 80/20 rule will apply: you'll get 80% of the performance gains, while giving up only 20% of the simplicity.

The problem of trying to go in the reverse direction is that we don't have very good tools for identifying which code simplifications can be purchased at the lowest performance expense.  So you just give up, and end up bearing all the expenses of a code base which is twice as big and twice as complex.  And those expenses are far, far greater than the expense of system performance that is a few percent slower.

posted @ Thursday, February 05, 2009 5:33 AM | Feedback (1) |

Monday, October 06, 2008

How To Search Encrypted Text in SQL Server 2005/2008


Recently I was discussing SQL Server encryption with some friends who have been using it to encrypt short strings such as Social Security numbers at their shop.  I commented, "Just try searching those Social Security Numbers," they shared my lamentation, and we moved on to other subjects.  Later that evening, though, I thought there must be a way to search those wretched encrypted blocks--somehow--and worked out the solution you are about to read. 

The Problem

The difficulty lies in the fact that you cannot compare varbinaries in a predicate.  Combine this with the fact that encrypted text is stored as a varbinary, and the result is that you cannot search the varbinary encrypted text, even if you run the search term through the encryption algorithm before performing the comparison. 

To illustrate, suppose you have a Person table that stores a name and a Social Security Number (SSN) encrypted with a certificate, defined like this:

CREATE TABLE [dbo].[Person](
  [Id] [int] IDENTITY(1,1) NOT NULL,
  [SSN] [varbinary](128) NULL,
  [Name] [varchar](50) NULL,
CONSTRAINT [PK_Person] PRIMARY KEY CLUSTERED
(
[Id] ASC
)ON [PRIMARY]
)ON [PRIMARY]

It would be nice if you could just encrypt the search text and build a predicate that compares the term and the encrypted column, like this:

declare @encryptedSSN varbinary(128);
set @encryptedSSN = ENCRYPTBYCERT(CERT_ID('certKey'), @ssn);
select name, CONVERT(char(9),DECRYPTBYCERT(CERT_ID('certKey'), SSN) as SSN
from dbo.Person
where ssn = @encryptedSSN;

As I mentioned previously, though, this approach simply does not work. 

Of course, you could make the predicate succeed by setting the datatype of the encrypted text column to char(128), and then converting the encryption products (stored SSNs plus the search term) from varbinary to char(128) before using them.  However, indexing a column that large, and performing comparisons with it, would consume a lot of disk and processing resources.

The Solution

Instead of converting the large encrypted block into char(128) data, you can work with the hash of the SSN instead.  If you calculate the hash using the SHA1 algorithm, the hash will require only 20 bytes of storage, so the burden on disk and processing resources will drop significantly.  And since the SHA1 algorithm is considered to be very secure, you will not add any significant security risk to your solution.  From start to finish, here's how the approach will work:

1. Include a char(20) column in your table to hold the hash of the SSN.  The new table definition will look like this:

CREATE TABLE [dbo].[Person](
  [Id] [int] IDENTITY(1,1) NOT NULL,
  [SSN] [varbinary](128) NULL,
  [Name] [varchar](50) NULL,
  [Hash] [char](20) NULL,
CONSTRAINT [PK_Person] PRIMARY KEY CLUSTERED
(
[Id] ASC
)ON [PRIMARY]
)ON [PRIMARY]

You should also index the Hash column to improve the performance of searches by SSN, of course. 

2. Whenever you insert a record, store the hash of the SSN along with the encrypted bytes.  Your insert stored procedure should use the built-in HASHBYTES function, like this:

CREATE PROCEDURE [dbo].[usp_Insert_Person]
  @ssn
char(9),
  @name varchar(50)
AS
BEGIN

  insert into dbo.Person(SSN, Name,Hash)
  values(ENCRYPTBYCERT(CERT_ID('certKey'), @ssn), @name, convert(char(20), HASHBYTES('SHA1', @ssn)))
END

3. When you search by SSN, compare the search term hash to the stored hash in your predicate:

CREATE PROCEDURE usp_Search_Person_BySsn
  @ssn char(9)
AS
BEGIN

  --SET NOCOUNT ON added to prevent extra result sets from
  --interfering with SELECT statements.

  SET NOCOUNT ON;

  declare @hash char(20);
  set @hash = convert(char(20), HASHBYTES('SHA1', @ssn));
  select convert(char(9), DECRYPTBYCERT(cert_id('certkey'), ssn)) as SSN, name
  from dbo.Person
  where Hash= @hash;
END

Conclusion

This approach has worked well for me in my experimentation, and I recommend it to you if you need to search on an encrypted field.  You should note, however, that the approach does have a limitation: you cannot search on an arbitrary substring of the encrypted text.  In other words, a "begins with" or "contains" search is not possible.  As long as you can live with this limitation, you should be able to employ this approach for your own encrypted data.

As always, I invite you, my reader, to leave a comment if you have found this useful or if you can think of an improvement.

posted @ Monday, October 06, 2008 12:14 AM | Feedback (5) |

Tuesday, August 05, 2008

Five Steps to Evaluate Technology Options

Recently a project leadership team I was assisting had the difficult task of selecting between implementation technologies for a distributed architecture.  We had brainstormed a list of candidates, and after some investigation were able to enumerate their strengths and weaknesses well enough.  But then the decision-making process started to come unhinged as we thrashed about, weighing the various options.  How do you narrow the field to a winner, anyway?

When I was able to frame the process in terms of the five steps you are about to read, though, the path to a confident decision became clear.  While I am not suggesting that this is the evaluation methodology to end all evaluation methodologies, it certainly helped us.

Step 1: Reduce the Candidate List via Head-to-Head Comparisons.  You will make your job easier if you can quickly knock out some of the contenders.  Among the options we were considering for synchronizing data were

  • using SQL Server replication, and
  • writing our own application to read database records marked with an update flag, update records in a partner node with the data, and then remove the flags.

Ultimately, we realized that the application we were thinking of writing would be remarkably similar in structure and purpose to SQL Server's merge replication.  Since we had no intention of re-inventing the wheel, we removed the idea of writing such an application from our candidate list.

Step 2: Build an Analysis Table.  Since the goal is to learn how the technology options will influence the development of your system, the next step is to build a table that will compare the options according to how well they implement potential system features.  Start by lining up the remaining technology options on the x-axis, and system features on the y-axis.  The level of difficulty in implementing a feature, using a technology option, will be the data at the intersection of an option and a feature.  Here is a sample of how an analysis grid might look for a distributed architecture:

Feature Option 1 Option 2 Option 3
Workflow: Overnight Batch Simple Simple Relatively Easy
Workflow: Straight-Through (Central Only) Difficult Very Difficult Moderately Difficult
Workflow: Straight-Through (All Nodes) Very Difficult Impossible Difficult
Data Replication: Near Real-time Very Difficult Moderately Difficult Moderately Difficult
Data Replication: Every Hour Moderately Difficult Impossible Relatively Easy
Post-failure Data Re-sync: overnight Simple Relatively Easy Relatively Easy
Post-failure Data Re-sync: immediate Moderately Difficult Relatively Easy Impossible

In this example we see that straight-through workflow processing at the central node will be difficult using option 1, very difficult using option 2, and moderately difficult using option 3.  The ease of implementation (for all the feature/option pairs) ranges from simple to impossible, with four intermediate values (relatively easy, moderately difficult, difficult, and very difficult).  Given the murkiness of IT crystal balls, there is probably no point in attempting a finer-grained analysis of the difficulty level.

Step 3. Assign Scores.  In order to compare the various combinations of features and options, you need to assign a score to every combination.  If you assign a score for a level of difficulty, you can quantify the comparison.  The Cohn scale, a pseudo-Fibonacci series in which each score is about 50% greater than its predecessor, is a good choice.  Many shops are already using the Cohn scale to estimate user stories in agile methodologies, so the familiarity can help.  If you convert a level of difficulty into a score on the Cohn scale, you are assuming that it is about 50% more difficult than the next lower level, as you can see below:

Description Points
Simple 1
Relatively Easy 2
Moderately Difficult 3
Difficult 5
Very Difficult 8
Impossible 10,000
Converting Descriptions Into Points (Cohn Scale)

"Impossible" is converted to an extremely high number in order to allow a numeric comparison. 

After assigning the scores, the example table looks like this:

Feature Option 1 Option 2 Option 3
Workflow: Overnight Batch 1 1 2
Workflow: Straight-Through (Central Only) 5 8 3
Workflow: Straight-Through (All Nodes) 8 10000 5
Data Replication: Near Real-time 8 3 3
Data Replication: Every Hour 3 10000 2
Post-failure Data Re-sync: overnight 1 2 2
Post-failure Data Re-sync: immediate 3 2 10000

Step 4. Discuss Features and Effort with Your Customer.  Of course your customer will want to understand the choices they have.  Using the analysis table from step 3, you could tell your customer that if they desperately desire straight-through workflow processing on all nodes in the distributed system, near real-time data replication, and immediate re-synchronization of data after a failure in the link between distributed nodes, the only technology available is option 1, which will "cost" 19 points.  You could then point out that if it is acceptable to wait until overnight to re-synchronize data after a communications link failure, you could implement option 3 at a cost of 10 points--about half the effort.

What if my customer wants an estimate in terms of actual resources (person-days)?  Your customer certainly has the right to get a ball-park estimate of the costs, of course.  Since it is impractical to work up an estimate for each combination of features and options, you could perform an estimate for the lowest scoring combination instead, which will then become the basis for a conversion factor between points and effort.  In the example, the lowest scoring combination is overnight batch workflow, hourly data replication, and overnight re-synchronization of nodes after a communication failure, using option 1 (5 points).  If you estimate this combination as requiring 10 person-weeks of effort, then the conversion factor is 2 person-weeks per point.  As a result, you can estimate that the desperately desired combination (costing 19 points) will require 38 person-weeks.  Obviously, your project plan should not rely on this ball-park estimate; it is only precise enough to help the project team make an informed technology choice.

Step 5. Choose the Lowest Scoring Option for the Desired Feature Set.  In most projects, the biggest cost factor (and biggest risk) is the use of human resources, so the option which requires the least effort should usually win.  However, if there is a near tie between first and second place, you might want to weigh other factors such as licensing costs and vendor support in order to make your choice.

What methodology have you used for choosing between technology options?  Have you used an analysis table like this?  Leave a comment!

posted @ Tuesday, August 05, 2008 12:33 PM | Feedback (0) |

Friday, June 06, 2008

Health Monitoring in ASP.NET

While I was busy customizing Microsoft's Exception Management Application Block to classify and log all exceptions thrown in our web and Windows apps, and writing instrumentation code that published timing events via System.Diagnostics.Trace, Microsoft was busy writing ASP.NET Health Monitoring.  Microsoft has more resources, so their product is a little more advanced and customizable.  Here's what it provides:

  1. An event model: There are event classes for web request failures, for authentication failures, for errors, for SQL errors, for heartbeats, and so forth.  You can subclass the WebBaseEvent in order to define your own custom events, as well.
  2. A provider model: Health monitoring can publish events with 4 built-in providers (email notification, SQL Server, WMI, and event log).  The WMI provider is especially interesting, but you will have to write your own application to manage and report WMI events, according to Microsoft.
  3. A customization model: You can edit web.config in order to map events to providers and to configure providers (for example, to specify a connection string for the SQL Server provider).

Since we have already customized so much infrastructure, especially by writing a bug portal that makes use of the customized Exception Management block, we will probably not be migrating to the ASP.NET Health Monitor any time soon.  However, for those of you who have been whistling in the dark with respect to the health of your web applications and services, get busy!  I highly recommend Scott Allen's post as a starting place for your efforts.

posted @ Friday, June 06, 2008 1:03 PM | Feedback (0) |

How To Create a Windows Form Singleton

The .NET Framework provides built-in capabilities for creating components and services as singletons. If you want to create a singleton process, though, you're on your own. But not really; if you read on, you will find out how to use a Mutex to do the job.

posted @ Friday, June 06, 2008 9:31 AM | Feedback (1) |

Tuesday, May 27, 2008

.NET Entity Framework: Is It Ready for Web Prime-Time?

The .NET Entity Framework is Microsoft's new tool for object/relational mapping (ORM). While MS does have big plans for it, it does have some important limitations....

posted @ Tuesday, May 27, 2008 9:50 AM | Feedback (4) |

Tuesday, May 13, 2008

BOOK REVIEW: Don't Make Me Think, 2d Ed. by Steve Krug

Steve Krug's "Common Sense Approach to Web Usability" provides surprising and sometimes counterintuitive principles that every good website must follow. Krug preaches the importance of removing clutter in order to make the purpose and functionality of a site (or page) clear--and happily, he practices what he preaches in this remarkably lucid book. Here are some of Krug's key insights...

posted @ Tuesday, May 13, 2008 3:31 PM | Feedback (0) |

Thursday, May 08, 2008

How To: Encrypt and Manage Documents with SQL Server 2005

This post describes when you should use SQL Server encryption, which type of encryption key to use, and how to create the infrastructure for managing documents whose size exceeds the maximum encryption block size of 8000 bytes. It also summarizes network and operations steps you need to take to make sure your entire infrastructure is reasonably secure.

posted @ Thursday, May 08, 2008 4:13 PM | Feedback (4) |

Tuesday, March 25, 2008

Want to Write Great Business Software? Create a Great Domain Model!

When you are coding in a hurry, it is very tempting to write business logic in the first place that comes to mind, such as a button click handler. However, for all but the simplest systems, such a practice leads very quickly to a chaotic system whose business logic is scattered like the ash from an erupting volcano. Create a great domain model, though, and you will be able to nimbly align your software with your emerging business needs.

posted @ Tuesday, March 25, 2008 9:01 AM | Feedback (6) |

Friday, March 07, 2008

Refactoring to Comply with the DRY principle (Don't Repeat Yourself)

Every time I see an example of copy-and-paste programming, I'm like a bull who just spotted a waving red flag. "Don't Repeat Yourself" (DRY) is a principle every programmer should live by, for Turing's sake! Read on for a discussion of why copy-and-paste is egregious, and how to refactor an expansive set of conditional logic branches into a concise and elegant piece of code.

posted @ Friday, March 07, 2008 7:45 AM | Feedback (0) |

Thursday, March 06, 2008

A Good Solution for "Magic String" Data

Dealing with vendor data (or your own) in the form of "codes" can pose significant challenges. You must ensure that your source code remains readable, that data are properly validated, and that data can be displayed as user-friendly descriptions. The built-in solutions (named constants and enums) help, but they have some significant shortcomings. If you derive a class of named constants from the MagicStringTranslator class, though, you can vanquish all 3 challenges in one fell swoop!

Egghead Cafe has kindly published the improved version of my article here.  Enjoy!

posted @ Thursday, March 06, 2008 10:08 AM | Feedback (0) |

Wednesday, February 27, 2008

Make Magic Strings Easy to Understand and Type Safe

As organizations pass data back and forth, they often use codes to represent the data.  For example, a marital status of divorced might be represented as "D", married as "M", and so forth.  You have to solve three problems when you are dealing with magic strings:

  1. When you write logic to handle the data, things can get chaotic on a hurry if you are not careful; the use of literal magic strings in your source code can make it incomprehensible. 
  2. You can get into trouble by passing a string that is not in the set of valid codes as a parameter to a method that is expecting one of the codes.  Since the parameter type is typically string, the compiler will not help you detect the error.
  3. You often have to translate codes into it description so a user of your system will know which data have been gathered.

Using named constants can help make your source code more readable, but you still have more work to do: you must write extra logic to check whether a string belongs to the set of valid codes, and to translate a code into a description that a user can understand.  If you declare your named constants in a class that inherits from the MagicStringTranslator class, though, you will get the data validation and the translation for free.  Check out the article I wrote for the details!

posted @ Wednesday, February 27, 2008 1:40 AM | Feedback (0) |

Friday, February 15, 2008

"New" Statement Considered Harmful

Just by calling a class' constructor, you are tying your code to some implementation details of the class. Since good software uses loose coupling, though, you should develop the habit of providing a static construction method and hiding a class constructor. There are exceptions that prove the rule, however, as noted both in this article and in the comments. Read on to learn how to develop the good habit, and when you should consider breaking it....

posted @ Friday, February 15, 2008 2:59 PM | Feedback (13) |

Powered by: