Geeks With Blogs
Chris Falter .NET Design and Best Practices

 

Recently I was discussing SQL Server encryption with some friends who have been using it to encrypt short strings such as Social Security numbers at their shop.  I commented, "Just try searching those Social Security Numbers," they shared my lamentation, and we moved on to other subjects.  Later that evening, though, I thought there must be a way to search those wretched encrypted blocks--somehow--and worked out the solution you are about to read. 

The Problem

The difficulty lies in the fact that you cannot compare varbinaries in a predicate.  Combine this with the fact that encrypted text is stored as a varbinary, and the result is that you cannot search the varbinary encrypted text, even if you run the search term through the encryption algorithm before performing the comparison. 

To illustrate, suppose you have a Person table that stores a name and a Social Security Number (SSN) encrypted with a certificate, defined like this:

CREATE TABLE [dbo].[Person](
  [Id] [int] IDENTITY(1,1) NOT NULL,
  [SSN] [varbinary](128) NULL,
  [Name] [varchar](50) NULL,
CONSTRAINT [PK_Person] PRIMARY KEY CLUSTERED
(
[Id] ASC
)ON [PRIMARY]
)ON [PRIMARY]

It would be nice if you could just encrypt the search text and build a predicate that compares the term and the encrypted column, like this:

declare @encryptedSSN varbinary(128);
set @encryptedSSN = ENCRYPTBYCERT(CERT_ID('certKey'), @ssn);
select name, CONVERT(char(9),DECRYPTBYCERT(CERT_ID('certKey'), SSN) as SSN
from dbo.Person
where ssn = @encryptedSSN;

As I mentioned previously, though, this approach simply does not work. 

Of course, you could make the predicate succeed by setting the datatype of the encrypted text column to char(128), and then converting the encryption products (stored SSNs plus the search term) from varbinary to char(128) before using them.  However, indexing a column that large, and performing comparisons with it, would consume a lot of disk and processing resources.

The Solution

Instead of converting the large encrypted block into char(128) data, you can work with the hash of the SSN instead.  If you calculate the hash using the SHA1 algorithm, the hash will require only 20 bytes of storage, so the burden on disk and processing resources will drop significantly.  And since the SHA1 algorithm is considered to be very secure, you will not add any significant security risk to your solution.  From start to finish, here's how the approach will work:

1. Include a char(20) column in your table to hold the hash of the SSN.  The new table definition will look like this:

CREATE TABLE [dbo].[Person](
  [Id] [int] IDENTITY(1,1) NOT NULL,
  [SSN] [varbinary](128) NULL,
  [Name] [varchar](50) NULL,
  [Hash] [char](20) NULL,
CONSTRAINT [PK_Person] PRIMARY KEY CLUSTERED
(
[Id] ASC
)ON [PRIMARY]
)ON [PRIMARY]

You should also index the Hash column to improve the performance of searches by SSN, of course. 

2. Whenever you insert a record, store the hash of the SSN along with the encrypted bytes.  Your insert stored procedure should use the built-in HASHBYTES function, like this:

CREATE PROCEDURE [dbo].[usp_Insert_Person]
  @ssn
char(9),
  @name varchar(50)
AS
BEGIN

  insert into dbo.Person(SSN, Name,Hash)
  values(ENCRYPTBYCERT(CERT_ID('certKey'), @ssn), @name, convert(char(20), HASHBYTES('SHA1', @ssn)))
END

3. When you search by SSN, compare the search term hash to the stored hash in your predicate:

CREATE PROCEDURE usp_Search_Person_BySsn
  @ssn char(9)
AS
BEGIN

  --SET NOCOUNT ON added to prevent extra result sets from
  --interfering with SELECT statements.

  SET NOCOUNT ON;

  declare @hash char(20);
  set @hash = convert(char(20), HASHBYTES('SHA1', @ssn));
  select convert(char(9), DECRYPTBYCERT(cert_id('certkey'), ssn)) as SSN, name
  from dbo.Person
  where Hash= @hash;
END

Conclusion

This approach has worked well for me in my experimentation, and I recommend it to you if you need to search on an encrypted field.  You should note, however, that the approach does have a limitation: you cannot search on an arbitrary substring of the encrypted text.  In other words, a "begins with" or "contains" search is not possible.  As long as you can live with this limitation, you should be able to employ this approach for your own encrypted data.

As always, I invite you, my reader, to leave a comment if you have found this useful or if you can think of an improvement.

EDIT: Per the comment from Tom, please hash the encrypted SSN, rather than the SSN itself (my original idea) to obtain a searchable char field.  Hashing the SSN does not by itself provide enough security.

Posted on Monday, October 6, 2008 12:14 AM Database Considerations | Back to top


Comments on this post: How To Search Encrypted Text in SQL Server 2005/2008

# re: How To Search Encrypted Text in SQL Server 2005/2008
Requesting Gravatar...
Wow, I hate to actually say this, but do you understand how and what a SHA1 is? It's a hash, and has no inherit security in and of itself... Now let's use your example, you store the persons social security number in encrypted form in the database - great, that's what I would hope you do. You want to use a SHA1 to speed up your search - also great, it will do just that. Here's where the problem comes in - YOU'RE PERFOMING THE SHA1 ON THE PERSONS SSN!!!!! As I'm sure you're aware, SHA1's are considered one-way hashes, and this is generally true - and brute force reversing of it would be near to impossible in most cases. Most cases, the example you just gave is most certainly not. Let's see, an SSN is made of up 3 parts, 3 digits representing the issuing state, two representing the year cycle of the issuing state, and 4 digits that are effectively random, for a grand total of 9 digits (which obvsiouly corresponds to your @ssn variable). That means you've got roughtly 1 billion possible SSN's (the actual number is substancially less than that). Are you seeing the problem?

Got the database somehow and want the person's SSN? No problem, worst case just generate the SHA1 for all number between 000000000 and 999999999, then just compare them against the SHA1. Now how valuable is that public key encryption you're using? Worthless.

You are on the right track on speeding up things, but you failed to actually consider what to perform the SHA1 on. The correct answer is to perform the SHA1 on the encrypted buffer, and search against that (encrypting and SHA1'ing any SSN you're searching for). Then you're not exposing any additional data.
Left by Tim on Oct 13, 2008 12:18 AM

# re: How To Search Encrypted Text in SQL Server 2005/2008
Requesting Gravatar...
Tim,

Thanks for stopping by and leaving a perspicacious comment. I agree that the hack is not so difficult for the determined attacker, and I like your idea of hashing the encrypted buffer instead in order to thwart it.

Comments like yours are why I love to interact with my readers.

- Chris
Left by Chris Falter on Oct 25, 2008 9:24 AM

# re: How To Search Encrypted Text in SQL Server 2005/2008
Requesting Gravatar...
We need to do something similar to this but we need to search the data - like get the next 25 SSNs in order. Can you recommend an approach that does results in decrypting the entire table?
Left by Brian on Dec 03, 2008 6:42 AM

# re: How To Search Encrypted Text in SQL Server 2005/2008
Requesting Gravatar...
This code was really helpful for me.
Left by chandrashekhar on Mar 23, 2009 7:33 PM

# re: How To Search Encrypted Text in SQL Server 2005/2008
Requesting Gravatar...
i have 22 related tables , and have a master table also,
if i could create xml from all 22 tables and insert it into only in master table column.(ie avoiding all other 22 tables)..Whats ur opinion, and i wil b saving it as encrypted,frm this can i expect any performance increment?is it posible 2 giv any other relation to other tables?
Left by shiyaz on Sep 28, 2009 5:09 AM

# re: How To Search Encrypted Text in SQL Server 2005/2008
Requesting Gravatar...
Good article Chris but I've a concern here. Hashing an encrypted field will give different results each time. This will happen because encryption will generate different cypher text for the same normal text. So how can we use this hash for search?
Left by Yugi on May 11, 2011 11:31 PM

# re: How To Search Encrypted Text in SQL Server 2005/2008
Requesting Gravatar...
Hi, im trying to follow Tom´s advice by hashing the encrypted text, however, every time i encrypt it, a different varbinary is returned, making a hashed search impossible :S any other way of doing this?

Thank you.
Left by juan on Jun 16, 2011 4:13 PM

Your comment:
 (will show your gravatar)


Copyright © Chris Falter | Powered by: GeeksWithBlogs.net