Chris Falter

.NET Design and Best Practices
posts - 42, comments - 82, trackbacks - 24

My Links

News

All source code published on this blog is placed in the public domain.

Archives

Post Categories

Image Galleries

About Me

Sunday, March 29, 2009

How to Reuse Code Without Creating an Implicit API

I work for an ISV, and we have to be very cautious about the code that we declare with public scope.  If it's public, a customer can treat it as an API, which might not always be a good idea if the code is really just doing some internal task in a way that could easily change as we improve our software.  In other words, not all that is public should be an API.

So what do you do if you need to share code across assembly boundaries?  Recently I was writing some unit tests for a class, and the tests needed access to a class method in order to imitate (and test) its behavior.  Of course this method was private--of course!--so it appeared that I had to choose between 2 very unpalatable options:

  1. Give the method public scope, which would make it visible to all our customers' implementations.  Essentially, this option would make the method part of our API.  Not happening.
  2. Copy the 20 lines of code from the method, paste it into my new class, and supplicate the dark spirits of bad software design to supernaturally prevent the method from ever being modified.  But how could I perform the ritual without an eye of newt and toe of frog?*

But appearances can be deceiving.  It turns out that the InternalsVisibleToAttribute, introduced by Microsoft in .NET 2.0, solves the problem quite neatly.  Using this attribute, you can designate one or more friend assemblies that will be allowed access to classes, methods and properties in the target assembly that have friend scope, while all other code--including customer code--is denied access.

Using the attribute is a piece of cake:

  1. Mark any classes/methods/properties that need to be visible to other assemblies with "Friend" scope.
  2. Import the System.Runtime.CompilerServices namespace into your assembly's AssemblyInfo.vb file.

    Imports System.Runtime.InteropServices

  3. In AssemblyInfo.vb, mark the assembly with the InternalsVisibleToAttribute for each friend assembly that should have access to the friend scope.

    <Assembly: InternalsVisibleTo("MySystem.Assembly1")>
    <Assembly: InternalsVisibleTo("MySystem.Assembly2")>

If you are working in C#, you will make your changes in the AssemblyInfo.cs file for the assembly that contains the code to be re-used, and you will use "using" instead of "Imports."

So I could have my cake (re-use of existing code) and shield it from inappropriate eyes at the same time.  While this technique is obviously useful for writing tests, it can be applied anywhere that code needs to be shared between your system's assemblies without exposing it customers as part of an API. 

* Not to mention the wool of bat and tongue of dog.

 

posted @ Sunday, March 29, 2009 2:44 PM | Feedback (0) |

Turbotax Customer Service to the Rescue

I've got to give the Turbotax customer service organization credit: they responded quickly to my inquiry, and figured out a way to solve the problem.  Yeah, the user interface design was stinky,  but plan B worked out, so my faith in Turbotax is restored.

Lesson: a good customer service organization is essential if you want to keep your customers in spite of the occasional and inevitable screw-up.

posted @ Sunday, March 29, 2009 2:21 PM | Feedback (3) |

Sunday, March 22, 2009

Bad Software Design: Turbotax Web Edition

After I completed my federal taxes, Turbotax insisted on checking for errors before allowing me to file for the refund Uncle Sam owes me.  No problem there, I want my software to catch any possible errors.  The supposed errors that Turbotax wanted me to correct, however, were so completely misguided that I have to nominate the 2009 Turbotax Web Edition for the Software Hall of Shame.

First, Turbotax insisted that I had to file a 1099-INT for my bank savings account because I had filed one last year.  And it would not allow me to state that I did not receive a form; I had to provide some dollars and cents before Turbotax would let me file my 1040.  Wait a second, though; my bank didn't send me a 1099-INT this year because my account earned so little interest!  (It's a pathetic little savings account, really.)  So Turbotax was refusing to let me file until I provided data that do not even exist.

Let's talk about software design for a minute.  How on God's green earth does Turbotax know that I must have overlooked a form, just because I filed it in a previous year?  Is it not possible that I might have even closed the pathetic little savings account before 2008 even began?  It just so happened that I could visit my bank's web site and sum up the few pennies in interest paid on the account in 2008, so I was able to feed Turbotax's demand for data and pass on to the next hurdle. Not that I was happy about what had happened.

The second design flaw was so bad that I may not even be able to file my taxes (with Turbotax, anyway).  Several years ago I worked briefly as a consultant, so I filed a Schedule C that included the analysis of vehicle mileage and expenses for business vs. personal use.  In the subsequent years, I have not worked as a consultant, and I have never again filed a Schedule C.  So guess what supposed error Turbotax insisted that I had to correct?  That's right, Turbotax is refusing to allow me to file my taxes until I file a Schedule C that includes business and personal use of my vehicle.

I had not claimed any income from consulting for 2008.

I no longer own the vehicle that I supposedly have to file the mileage reports for.

But Turbotax will not allow me to indicate that I do not need to file the form.  It's their way on Schedule C, or the highway.

So I'm about to contact Turbotax support.  If it's anywhere close to the quality of the actual software design, I will no doubt have to write off the hours I spent working with their web software, hit the highway, and find a better tax software provider. But maybe not; maybe their support is world-class and will save the day.  I'll report on the results as they become available.

posted @ Sunday, March 22, 2009 3:44 PM | Feedback (0) |

Thursday, February 05, 2009

Simplicity vs. Performance

A colleague and I had a recent discussion about two versions of a trigger.  One was simple, containing a single query.  The other performed 14% better, but was a lot more complex--double the lines of code, and a branch that had to be evaluated.  My colleague advocated for checking in the more complex version right from the start, while I advocated for starting with simplicity.  What do you think?  Post a comment!

Here's how our discussion went:

COLLEAGUE:

Is your first example more than 14% more maintainable than the second example?  If not (which is my opinion, since there's really only one line, the "select inserted.NAME +..." line, that would change over time; the rest is boilerplate code), I think I'd go with the second for the performance increase.

ME:

My experience has been that it's a lot more expensive for an organization to maintain complex code than it is to tweak performance.  Additional complexity tends to work like Adam Smith's invisible hand--except that this invisible hand works for evil, rather than for good.  It makes staff spend more time trying to figure out how to fix bugs and add new functionality than they would with simpler code.  It makes the software organization slower to respond to customer requests and changing market conditions. 

The problem is that this complexity tax that the organization has to keep paying cycle after cycle is truly invisible because it is hard to quantify (unlike a small bump in performance).  But ask any developer whether s/he has had to spend significant extra time in development or troubleshooting because the code base was complex (i.e., whether s/he has had to pay the code complexity tax), and invariably the answer is yes, a thousand times yes.

Granted, a single trigger that improves performance by 14% itself will not cause all that hardship.  It's the cumulative effect of millions of lines of code that are a little too complex that make the development process harder and more expensive than it has to be/should be.

I guess the other way of pitching this philosophy is that if you tend to choose slightly better performance over simpler code,  when it comes time to optimize the system you're asking the question: which of these millions of lines of code could be simplified at only a small performance cost?  And we don't have the tools to answer that question very well.  OTOH, if we tend to choose simpler code over slightly better performance, then optimization means we're asking: where are the bottlenecks where we can get a big performance boost at a small cost in code complexity?  And we have a variety of profiling tools that help us answer that question very well.

COLLEAGUE:

I think you're letting a general philosophy get in the way of optimizing the one example you've shown.  A 14% improvement in performance is more than enough reason to have branching code that's only slightly less maintainable than the non-branching code.  I just can't imagine a scenario where a maintenance programmer sits down to fix a bug in this trigger and is so completely confounded by its complexity that fixing it would bring down the entire organization as you have suggested.
ME:

Actually, the maintainability difference is quite dramatic.  You're buying the slightly better performance at a cost of doubling the lines of code and doubling the cyclomatic complexity. 

Granted, this loss of maintainability occurs over a tiny portion of the code base.  On the other hand, the performance gain, measured as a percentage of total transactions against the database, is essentially imperceptible as well.  I.e., it's not as if a 14% perf gain in one of the most lightly used triggers among tens of thousands will amount to more than a sneeze.

What I'm saying is that the best strategy to optimize performance is to start with a good, simple code base, find the biggest bottlenecks in a realistic load test, and optimize just those code sections.  Usually, the vast majority of perf gains emerge from a small number of trade-offs against simplicity.

The opposite approach (always choosing performance over simplicity during initial development) results in getting to market more slowly (which has a cost), and--through the accretion of thousands of small complexities--a gnarly code base that is expensive and time-consuming to maintain.  The cumulative effect of small decisions makes a big difference.

If you look at this scenario in the small--one decision about one trigger--I can understand why you might choose performance over simplicity.  Now multiply that bias for perf over simplicity across dozens of developers writing code every day, and you end up with an entire system that is twice as expensive and time-consuming to maintain.  And the system would have lower quality to boot, since when you multiply lines of code and cyclomatic complexity, quality is pretty much lost at an exponential rate.  Such a loss of quality is surely expensive to an organization over the long-term.

That's why I advocate choosing simplicity as a matter of initial development.  Then as a QA process, look for the performance optimizations that will yield the biggest gains, and implement just those.  The 80/20 rule will apply: you'll get 80% of the performance gains, while giving up only 20% of the simplicity.

The problem of trying to go in the reverse direction is that we don't have very good tools for identifying which code simplifications can be purchased at the lowest performance expense.  So you just give up, and end up bearing all the expenses of a code base which is twice as big and twice as complex.  And those expenses are far, far greater than the expense of system performance that is a few percent slower.

posted @ Thursday, February 05, 2009 5:33 AM | Feedback (1) |

Monday, October 06, 2008

How To Search Encrypted Text in SQL Server 2005/2008


Recently I was discussing SQL Server encryption with some friends who have been using it to encrypt short strings such as Social Security numbers at their shop.  I commented, "Just try searching those Social Security Numbers," they shared my lamentation, and we moved on to other subjects.  Later that evening, though, I thought there must be a way to search those wretched encrypted blocks--somehow--and worked out the solution you are about to read. 

The Problem

The difficulty lies in the fact that you cannot compare varbinaries in a predicate.  Combine this with the fact that encrypted text is stored as a varbinary, and the result is that you cannot search the varbinary encrypted text, even if you run the search term through the encryption algorithm before performing the comparison. 

To illustrate, suppose you have a Person table that stores a name and a Social Security Number (SSN) encrypted with a certificate, defined like this:

CREATE TABLE [dbo].[Person](
  [Id] [int] IDENTITY(1,1) NOT NULL,
  [SSN] [varbinary](128) NULL,
  [Name] [varchar](50) NULL,
CONSTRAINT [PK_Person] PRIMARY KEY CLUSTERED
(
[Id] ASC
)ON [PRIMARY]
)ON [PRIMARY]

It would be nice if you could just encrypt the search text and build a predicate that compares the term and the encrypted column, like this:

declare @encryptedSSN varbinary(128);
set @encryptedSSN = ENCRYPTBYCERT(CERT_ID('certKey'), @ssn);
select name, CONVERT(char(9),DECRYPTBYCERT(CERT_ID('certKey'), SSN) as SSN
from dbo.Person
where ssn = @encryptedSSN;

As I mentioned previously, though, this approach simply does not work. 

Of course, you could make the predicate succeed by setting the datatype of the encrypted text column to char(128), and then converting the encryption products (stored SSNs plus the search term) from varbinary to char(128) before using them.  However, indexing a column that large, and performing comparisons with it, would consume a lot of disk and processing resources.

The Solution

Instead of converting the large encrypted block into char(128) data, you can work with the hash of the SSN instead.  If you calculate the hash using the SHA1 algorithm, the hash will require only 20 bytes of storage, so the burden on disk and processing resources will drop significantly.  And since the SHA1 algorithm is considered to be very secure, you will not add any significant security risk to your solution.  From start to finish, here's how the approach will work:

1. Include a char(20) column in your table to hold the hash of the SSN.  The new table definition will look like this:

CREATE TABLE [dbo].[Person](
  [Id] [int] IDENTITY(1,1) NOT NULL,
  [SSN] [varbinary](128) NULL,
  [Name] [varchar](50) NULL,
  [Hash] [char](20) NULL,
CONSTRAINT [PK_Person] PRIMARY KEY CLUSTERED
(
[Id] ASC
)ON [PRIMARY]
)ON [PRIMARY]

You should also index the Hash column to improve the performance of searches by SSN, of course. 

2. Whenever you insert a record, store the hash of the SSN along with the encrypted bytes.  Your insert stored procedure should use the built-in HASHBYTES function, like this:

CREATE PROCEDURE [dbo].[usp_Insert_Person]
  @ssn
char(9),
  @name varchar(50)
AS
BEGIN

  insert into dbo.Person(SSN, Name,Hash)
  values(ENCRYPTBYCERT(CERT_ID('certKey'), @ssn), @name, convert(char(20), HASHBYTES('SHA1', @ssn)))
END

3. When you search by SSN, compare the search term hash to the stored hash in your predicate:

CREATE PROCEDURE usp_Search_Person_BySsn
  @ssn char(9)
AS
BEGIN

  --SET NOCOUNT ON added to prevent extra result sets from
  --interfering with SELECT statements.

  SET NOCOUNT ON;

  declare @hash char(20);
  set @hash = convert(char(20), HASHBYTES('SHA1', @ssn));
  select convert(char(9), DECRYPTBYCERT(cert_id('certkey'), ssn)) as SSN, name
  from dbo.Person
  where Hash= @hash;
END

Conclusion

This approach has worked well for me in my experimentation, and I recommend it to you if you need to search on an encrypted field.  You should note, however, that the approach does have a limitation: you cannot search on an arbitrary substring of the encrypted text.  In other words, a "begins with" or "contains" search is not possible.  As long as you can live with this limitation, you should be able to employ this approach for your own encrypted data.

As always, I invite you, my reader, to leave a comment if you have found this useful or if you can think of an improvement.

posted @ Monday, October 06, 2008 12:14 AM | Feedback (5) |

Tuesday, August 05, 2008

Five Steps to Evaluate Technology Options

Recently a project leadership team I was assisting had the difficult task of selecting between implementation technologies for a distributed architecture.  We had brainstormed a list of candidates, and after some investigation were able to enumerate their strengths and weaknesses well enough.  But then the decision-making process started to come unhinged as we thrashed about, weighing the various options.  How do you narrow the field to a winner, anyway?

When I was able to frame the process in terms of the five steps you are about to read, though, the path to a confident decision became clear.  While I am not suggesting that this is the evaluation methodology to end all evaluation methodologies, it certainly helped us.

Step 1: Reduce the Candidate List via Head-to-Head Comparisons.  You will make your job easier if you can quickly knock out some of the contenders.  Among the options we were considering for synchronizing data were

  • using SQL Server replication, and
  • writing our own application to read database records marked with an update flag, update records in a partner node with the data, and then remove the flags.

Ultimately, we realized that the application we were thinking of writing would be remarkably similar in structure and purpose to SQL Server's merge replication.  Since we had no intention of re-inventing the wheel, we removed the idea of writing such an application from our candidate list.

Step 2: Build an Analysis Table.  Since the goal is to learn how the technology options will influence the development of your system, the next step is to build a table that will compare the options according to how well they implement potential system features.  Start by lining up the remaining technology options on the x-axis, and system features on the y-axis.  The level of difficulty in implementing a feature, using a technology option, will be the data at the intersection of an option and a feature.  Here is a sample of how an analysis grid might look for a distributed architecture:

Feature Option 1 Option 2 Option 3
Workflow: Overnight Batch Simple Simple Relatively Easy
Workflow: Straight-Through (Central Only) Difficult Very Difficult Moderately Difficult
Workflow: Straight-Through (All Nodes) Very Difficult Impossible Difficult
Data Replication: Near Real-time Very Difficult Moderately Difficult Moderately Difficult
Data Replication: Every Hour Moderately Difficult Impossible Relatively Easy
Post-failure Data Re-sync: overnight Simple Relatively Easy Relatively Easy
Post-failure Data Re-sync: immediate Moderately Difficult Relatively Easy Impossible

In this example we see that straight-through workflow processing at the central node will be difficult using option 1, very difficult using option 2, and moderately difficult using option 3.  The ease of implementation (for all the feature/option pairs) ranges from simple to impossible, with four intermediate values (relatively easy, moderately difficult, difficult, and very difficult).  Given the murkiness of IT crystal balls, there is probably no point in attempting a finer-grained analysis of the difficulty level.

Step 3. Assign Scores.  In order to compare the various combinations of features and options, you need to assign a score to every combination.  If you assign a score for a level of difficulty, you can quantify the comparison.  The Cohn scale, a pseudo-Fibonacci series in which each score is about 50% greater than its predecessor, is a good choice.  Many shops are already using the Cohn scale to estimate user stories in agile methodologies, so the familiarity can help.  If you convert a level of difficulty into a score on the Cohn scale, you are assuming that it is about 50% more difficult than the next lower level, as you can see below:

Description Points
Simple 1
Relatively Easy 2
Moderately Difficult 3
Difficult 5
Very Difficult 8
Impossible 10,000
Converting Descriptions Into Points (Cohn Scale)

"Impossible" is converted to an extremely high number in order to allow a numeric comparison. 

After assigning the scores, the example table looks like this:

Feature Option 1 Option 2 Option 3
Workflow: Overnight Batch 1 1 2
Workflow: Straight-Through (Central Only) 5 8 3
Workflow: Straight-Through (All Nodes) 8 10000 5
Data Replication: Near Real-time 8 3 3
Data Replication: Every Hour 3 10000 2
Post-failure Data Re-sync: overnight 1 2 2
Post-failure Data Re-sync: immediate 3 2 10000

Step 4. Discuss Features and Effort with Your Customer.  Of course your customer will want to understand the choices they have.  Using the analysis table from step 3, you could tell your customer that if they desperately desire straight-through workflow processing on all nodes in the distributed system, near real-time data replication, and immediate re-synchronization of data after a failure in the link between distributed nodes, the only technology available is option 1, which will "cost" 19 points.  You could then point out that if it is acceptable to wait until overnight to re-synchronize data after a communications link failure, you could implement option 3 at a cost of 10 points--about half the effort.

What if my customer wants an estimate in terms of actual resources (person-days)?  Your customer certainly has the right to get a ball-park estimate of the costs, of course.  Since it is impractical to work up an estimate for each combination of features and options, you could perform an estimate for the lowest scoring combination instead, which will then become the basis for a conversion factor between points and effort.  In the example, the lowest scoring combination is overnight batch workflow, hourly data replication, and overnight re-synchronization of nodes after a communication failure, using option 1 (5 points).  If you estimate this combination as requiring 10 person-weeks of effort, then the conversion factor is 2 person-weeks per point.  As a result, you can estimate that the desperately desired combination (costing 19 points) will require 38 person-weeks.  Obviously, your project plan should not rely on this ball-park estimate; it is only precise enough to help the project team make an informed technology choice.

Step 5. Choose the Lowest Scoring Option for the Desired Feature Set.  In most projects, the biggest cost factor (and biggest risk) is the use of human resources, so the option which requires the least effort should usually win.  However, if there is a near tie between first and second place, you might want to weigh other factors such as licensing costs and vendor support in order to make your choice.

What methodology have you used for choosing between technology options?  Have you used an analysis table like this?  Leave a comment!

posted @ Tuesday, August 05, 2008 12:33 PM | Feedback (0) |

Friday, June 06, 2008

Health Monitoring in ASP.NET

While I was busy customizing Microsoft's Exception Management Application Block to classify and log all exceptions thrown in our web and Windows apps, and writing instrumentation code that published timing events via System.Diagnostics.Trace, Microsoft was busy writing ASP.NET Health Monitoring.  Microsoft has more resources, so their product is a little more advanced and customizable.  Here's what it provides:

  1. An event model: There are event classes for web request failures, for authentication failures, for errors, for SQL errors, for heartbeats, and so forth.  You can subclass the WebBaseEvent in order to define your own custom events, as well.
  2. A provider model: Health monitoring can publish events with 4 built-in providers (email notification, SQL Server, WMI, and event log).  The WMI provider is especially interesting, but you will have to write your own application to manage and report WMI events, according to Microsoft.
  3. A customization model: You can edit web.config in order to map events to providers and to configure providers (for example, to specify a connection string for the SQL Server provider).

Since we have already customized so much infrastructure, especially by writing a bug portal that makes use of the customized Exception Management block, we will probably not be migrating to the ASP.NET Health Monitor any time soon.  However, for those of you who have been whistling in the dark with respect to the health of your web applications and services, get busy!  I highly recommend Scott Allen's post as a starting place for your efforts.

posted @ Friday, June 06, 2008 1:03 PM | Feedback (0) |

How To Create a Windows Form Singleton

The .NET Framework provides built-in capabilities for creating components and services as singletons. If you want to create a singleton process, though, you're on your own. But not really; if you read on, you will find out how to use a Mutex to do the job.

posted @ Friday, June 06, 2008 9:31 AM | Feedback (1) |

Tuesday, May 27, 2008

.NET Entity Framework: Is It Ready for Web Prime-Time?

The .NET Entity Framework is Microsoft's new tool for object/relational mapping (ORM). While MS does have big plans for it, it does have some important limitations....

posted @ Tuesday, May 27, 2008 9:50 AM | Feedback (4) |

Tuesday, May 13, 2008

BOOK REVIEW: Don't Make Me Think, 2d Ed. by Steve Krug

Steve Krug's "Common Sense Approach to Web Usability" provides surprising and sometimes counterintuitive principles that every good website must follow. Krug preaches the importance of removing clutter in order to make the purpose and functionality of a site (or page) clear--and happily, he practices what he preaches in this remarkably lucid book. Here are some of Krug's key insights...

posted @ Tuesday, May 13, 2008 3:31 PM | Feedback (0) |

Thursday, May 08, 2008

How To: Encrypt and Manage Documents with SQL Server 2005

This post describes when you should use SQL Server encryption, which type of encryption key to use, and how to create the infrastructure for managing documents whose size exceeds the maximum encryption block size of 8000 bytes. It also summarizes network and operations steps you need to take to make sure your entire infrastructure is reasonably secure.

posted @ Thursday, May 08, 2008 4:13 PM | Feedback (4) |

Tuesday, March 25, 2008

Want to Write Great Business Software? Create a Great Domain Model!

When you are coding in a hurry, it is very tempting to write business logic in the first place that comes to mind, such as a button click handler. However, for all but the simplest systems, such a practice leads very quickly to a chaotic system whose business logic is scattered like the ash from an erupting volcano. Create a great domain model, though, and you will be able to nimbly align your software with your emerging business needs.

posted @ Tuesday, March 25, 2008 9:01 AM | Feedback (6) |

Friday, March 07, 2008

Refactoring to Comply with the DRY principle (Don't Repeat Yourself)

Every time I see an example of copy-and-paste programming, I'm like a bull who just spotted a waving red flag. "Don't Repeat Yourself" (DRY) is a principle every programmer should live by, for Turing's sake! Read on for a discussion of why copy-and-paste is egregious, and how to refactor an expansive set of conditional logic branches into a concise and elegant piece of code.

posted @ Friday, March 07, 2008 7:45 AM | Feedback (0) |

Thursday, March 06, 2008

A Good Solution for "Magic String" Data

Dealing with vendor data (or your own) in the form of "codes" can pose significant challenges. You must ensure that your source code remains readable, that data are properly validated, and that data can be displayed as user-friendly descriptions. The built-in solutions (named constants and enums) help, but they have some significant shortcomings. If you derive a class of named constants from the MagicStringTranslator class, though, you can vanquish all 3 challenges in one fell swoop!

Egghead Cafe has kindly published the improved version of my article here.  Enjoy!

posted @ Thursday, March 06, 2008 10:08 AM | Feedback (0) |

Wednesday, February 27, 2008

Make Magic Strings Easy to Understand and Type Safe

As organizations pass data back and forth, they often use codes to represent the data.  For example, a marital status of divorced might be represented as "D", married as "M", and so forth.  You have to solve three problems when you are dealing with magic strings:

  1. When you write logic to handle the data, things can get chaotic on a hurry if you are not careful; the use of literal magic strings in your source code can make it incomprehensible. 
  2. You can get into trouble by passing a string that is not in the set of valid codes as a parameter to a method that is expecting one of the codes.  Since the parameter type is typically string, the compiler will not help you detect the error.
  3. You often have to translate codes into it description so a user of your system will know which data have been gathered.

Using named constants can help make your source code more readable, but you still have more work to do: you must write extra logic to check whether a string belongs to the set of valid codes, and to translate a code into a description that a user can understand.  If you declare your named constants in a class that inherits from the MagicStringTranslator class, though, you will get the data validation and the translation for free.  Check out the article I wrote for the details!

posted @ Wednesday, February 27, 2008 1:40 AM | Feedback (0) |

Powered by: