Chris Falter

.NET Design and Best Practices
posts - 42, comments - 84, trackbacks - 24

My Links

News

All source code published on this blog is placed in the public domain.

Archives

Post Categories

Image Galleries

About Me

Monday, October 06, 2008

How To Search Encrypted Text in SQL Server 2005/2008


Recently I was discussing SQL Server encryption with some friends who have been using it to encrypt short strings such as Social Security numbers at their shop.  I commented, "Just try searching those Social Security Numbers," they shared my lamentation, and we moved on to other subjects.  Later that evening, though, I thought there must be a way to search those wretched encrypted blocks--somehow--and worked out the solution you are about to read. 

The Problem

The difficulty lies in the fact that you cannot compare varbinaries in a predicate.  Combine this with the fact that encrypted text is stored as a varbinary, and the result is that you cannot search the varbinary encrypted text, even if you run the search term through the encryption algorithm before performing the comparison. 

To illustrate, suppose you have a Person table that stores a name and a Social Security Number (SSN) encrypted with a certificate, defined like this:

CREATE TABLE [dbo].[Person](
  [Id] [int] IDENTITY(1,1) NOT NULL,
  [SSN] [varbinary](128) NULL,
  [Name] [varchar](50) NULL,
CONSTRAINT [PK_Person] PRIMARY KEY CLUSTERED
(
[Id] ASC
)ON [PRIMARY]
)ON [PRIMARY]

It would be nice if you could just encrypt the search text and build a predicate that compares the term and the encrypted column, like this:

declare @encryptedSSN varbinary(128);
set @encryptedSSN = ENCRYPTBYCERT(CERT_ID('certKey'), @ssn);
select name, CONVERT(char(9),DECRYPTBYCERT(CERT_ID('certKey'), SSN) as SSN
from dbo.Person
where ssn = @encryptedSSN;

As I mentioned previously, though, this approach simply does not work. 

Of course, you could make the predicate succeed by setting the datatype of the encrypted text column to char(128), and then converting the encryption products (stored SSNs plus the search term) from varbinary to char(128) before using them.  However, indexing a column that large, and performing comparisons with it, would consume a lot of disk and processing resources.

The Solution

Instead of converting the large encrypted block into char(128) data, you can work with the hash of the SSN instead.  If you calculate the hash using the SHA1 algorithm, the hash will require only 20 bytes of storage, so the burden on disk and processing resources will drop significantly.  And since the SHA1 algorithm is considered to be very secure, you will not add any significant security risk to your solution.  From start to finish, here's how the approach will work:

1. Include a char(20) column in your table to hold the hash of the SSN.  The new table definition will look like this:

CREATE TABLE [dbo].[Person](
  [Id] [int] IDENTITY(1,1) NOT NULL,
  [SSN] [varbinary](128) NULL,
  [Name] [varchar](50) NULL,
  [Hash] [char](20) NULL,
CONSTRAINT [PK_Person] PRIMARY KEY CLUSTERED
(
[Id] ASC
)ON [PRIMARY]
)ON [PRIMARY]

You should also index the Hash column to improve the performance of searches by SSN, of course. 

2. Whenever you insert a record, store the hash of the SSN along with the encrypted bytes.  Your insert stored procedure should use the built-in HASHBYTES function, like this:

CREATE PROCEDURE [dbo].[usp_Insert_Person]
  @ssn
char(9),
  @name varchar(50)
AS
BEGIN

  insert into dbo.Person(SSN, Name,Hash)
  values(ENCRYPTBYCERT(CERT_ID('certKey'), @ssn), @name, convert(char(20), HASHBYTES('SHA1', @ssn)))
END

3. When you search by SSN, compare the search term hash to the stored hash in your predicate:

CREATE PROCEDURE usp_Search_Person_BySsn
  @ssn char(9)
AS
BEGIN

  --SET NOCOUNT ON added to prevent extra result sets from
  --interfering with SELECT statements.

  SET NOCOUNT ON;

  declare @hash char(20);
  set @hash = convert(char(20), HASHBYTES('SHA1', @ssn));
  select convert(char(9), DECRYPTBYCERT(cert_id('certkey'), ssn)) as SSN, name
  from dbo.Person
  where Hash= @hash;
END

Conclusion

This approach has worked well for me in my experimentation, and I recommend it to you if you need to search on an encrypted field.  You should note, however, that the approach does have a limitation: you cannot search on an arbitrary substring of the encrypted text.  In other words, a "begins with" or "contains" search is not possible.  As long as you can live with this limitation, you should be able to employ this approach for your own encrypted data.

As always, I invite you, my reader, to leave a comment if you have found this useful or if you can think of an improvement.

posted @ Monday, October 06, 2008 12:14 AM | Feedback (5) | Filed Under [ Database Considerations ]

Powered by: