Robin Hames

Hints, tricks and tips relating to MS SQL Server and .NET
posts - 14 , comments - 43 , trackbacks - 0

SQL Server CHECKSUM() with Single Quotes (‘) gives duplicate values.

SQL Server CHECKSUM() with Single Quotes (‘) gives duplicate values.
 
A database I have been working on uses a checksum across address data to try to establish whether an entered address already exists in the database.
A problem occurred because two addresses had the same checksum value, but the actual address was subtlety different. The difference was that one had the street address as “St James’ Street”, whereas the other had “St James Street”.
The following test script shows that the single quote does not appear to alter the checksum calculation:
declare @t1 table (s varchar(50))
insert into @t1 values ('10 St James Street')
insert into @t1 values ('10 St James'' Street')
insert into @t1 values ('1''0'' S''t'' J''a''m''e''s'' S''t''r''e''e''t')
 
select checksum(s), * from @t1
 
These three rows all give the same checksum value (178448437).

Print | posted on Thursday, October 30, 2008 10:58 AM | Filed Under [ SQL Server ]

Feedback

Gravatar

# re: SQL Server CHECKSUM() with Single Quotes (‘) gives duplicate values.

Interesting. The cause may have something to do with the way sql server processes strings. For example, on most servers, 'Yo' and 'yo' would return the same checksum. In general, it's best to use binary_checksum instead.

For example:

declare @t1 table (s varchar(50))
insert into @t1 values ('10 St James Street')
insert into @t1 values ('10 St James'' Street')
insert into @t1 values ('1''0'' S''t'' J''a''m''e''s'' S''t''r''e''e''t')

select checksum(s), binary_checksum(s), s from @t1

returns:
178448437 1093109684 10 St James Street
178448437 319752736 10 St James' Street
178448437 279324406 1'0' S't' J'a'm'e's' S't'r'e'e't



10/30/2008 2:49 PM | Randolpho
Gravatar

# re: SQL Server CHECKSUM() with Single Quotes (‘) gives duplicate values.

I managed to get some interesting duplicates with binary_checksum() too. Such as:

declare @t1 table (s varchar(50))
insert into @t1 values ('10 James Street 10')
insert into @t1 values ('20 James Street 20')

select binary_checksum(s), * from @t1

2018066774 10 James Street 10
2018066774 20 James Street 20

Seems less likely to get duplicates though. I think using HASHBYTES with MD5 might be less likely to produced duplicates but still no guaranteed!

The big mistake my client had made was to assume that if CHECKSUM(x) = CHECKSUM(y) then x = y. I'm telling them that all they can assume is that if CHECKSUM(x) <> CHECKSUM(y) then x <> y.
10/30/2008 3:32 PM | Rhames
Gravatar

# re: SQL Server CHECKSUM() with Single Quotes (‘) gives duplicate values.

Hi,

when I execute your script on our machine (SQL Server Ent.Ed. 64bit, 10.0.1600.22 ENGLISH, SQL_Latin1_General_CP1_CI_AS) then I receive the following distinct values:

-2137942929 10 St James Street
-113797783 10 St James' Street
-956354985 1'0' S't' J'a'm'e's' S't'r'e'e't

May be one of your SQL Server or database options is different from ours.

Greetings
Cordt
8/10/2011 3:42 PM | Cordt
Gravatar

# re: SQL Server CHECKSUM() with Single Quotes (‘) gives duplicate values.


SELECT CHECKSUM('KD')
, CHECKSUM('NT')

results both in 2321
9/9/2011 9:41 AM | Kees Frentz
Post A Comment
Title:
Name:
Email:
Comment:
Verification:
 
 

Powered by: