We had a process of using BizTalk to take in a reference file, then send the data to a stored procedure via a table-valued parameter. The stored procedure uses a MERGE statement to insert and update; the update branch first compares CHECKSUM values to determine whether an update is even necessary.
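The pattern looked roughly like this. This is a minimal sketch, not our actual procedure; the table, column, and parameter names are all hypothetical:

```sql
-- Illustrative sketch: a MERGE whose UPDATE branch only fires when
-- the row checksums differ. All names here are made up for the example.
MERGE dbo.ReferenceData AS target
USING @IncomingRows AS source        -- the table-valued parameter
    ON target.RefKey = source.RefKey
WHEN MATCHED
    AND CHECKSUM(target.Col1, target.Col2, target.Col3)
     <> CHECKSUM(source.Col1, source.Col2, source.Col3)
    THEN UPDATE SET
        target.Col1 = source.Col1,
        target.Col2 = source.Col2,
        target.Col3 = source.Col3
WHEN NOT MATCHED BY TARGET
    THEN INSERT (RefKey, Col1, Col2, Col3)
         VALUES (source.RefKey, source.Col1, source.Col2, source.Col3);
```

The intent of the CHECKSUM gate is to skip no-op updates; the catch, as described below, is that two different sets of values can produce the same checksum.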
Then it happened: we had a reference row that should have been updated by the latest file, but it wasn't. We checked the BizTalk message going into and coming out of the pipeline; everything looked normal. Then we removed the CHECKSUM comparison, and the update succeeded.
In reading over the Microsoft documentation, I was absolutely shocked when I read this in the remarks section:
"However, there is a small chance that the checksum will not change. For this reason, we do not recommend using CHECKSUM to detect whether values have changed, unless your application can tolerate occasionally missing a change."
Then I checked the page for BINARY_CHECKSUM; a similar remark exists there: http://msdn.microsoft.com/en-us/library/ms173784%28v=SQL.100%29.aspx
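To put a rough number on that "small chance": CHECKSUM returns a 32-bit integer, so each old-versus-new comparison has on the order of a 1-in-2^32 chance of reporting "unchanged" for a row that actually changed, and that risk accumulates over many comparisons. A back-of-the-envelope sketch (my own illustration, using an idealized uniform-hash model rather than SQL Server's exact algorithm):

```python
# Back-of-the-envelope odds that a 32-bit checksum misses at least one
# real change, assuming each changed row collides with its old value
# independently with probability 1/2**32 (an idealized model, not
# SQL Server's actual CHECKSUM implementation).

def miss_probability(comparisons: int, bits: int = 32) -> float:
    """P(at least one changed row hashes to the same value as before)."""
    p_single = 1.0 / 2**bits
    return 1.0 - (1.0 - p_single) ** comparisons

for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} comparisons -> miss probability ~ {miss_probability(n):.3g}")
```

Under this model, a billion comparisons over the lifetime of a busy ETL process gives a roughly 20% chance of silently skipping at least one real change, which matches our experience of it happening "eventually."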
According to these two articles, we should be using HASHBYTES() instead.
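A hedged sketch of what that change might look like in the update gate (hypothetical column names again, and assuming SQL Server 2012+ for CONCAT): concatenate the compared columns with a delimiter and hash them with HASHBYTES, which returns a much wider digest (20 bytes for SHA1), so the collision odds drop from 1 in 2^32 to effectively negligible:

```sql
-- Illustrative rewrite of the MERGE update gate using HASHBYTES
-- instead of CHECKSUM. The '|' delimiter guards against values like
-- ('ab','c') and ('a','bc') concatenating to the same string.
WHEN MATCHED
    AND HASHBYTES('SHA1',
            CONCAT(target.Col1, '|', target.Col2, '|', target.Col3))
     <> HASHBYTES('SHA1',
            CONCAT(source.Col1, '|', source.Col2, '|', source.Col3))
    THEN UPDATE SET
        target.Col1 = source.Col1,
        target.Col2 = source.Col2,
        target.Col3 = source.Col3
```

One caveat worth noting: before SQL Server 2016, HASHBYTES only accepts inputs up to 8000 bytes, so very wide rows need extra handling.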
Personally, I was really shocked to find a function in any database that does its job only "most of the time." I don't know of any system that tolerates "occasional" mistakes, or maybe it's that I haven't worked in enough industries. I equate this to getting a notice on your bank statement saying "we might not have all your transactions listed, and we don't know for sure whether we have them all."
I love quantum physics, but not in my computer systems.
I updated the title of this post to give more clarity to what I was trying to say. You can read the comments below to follow the discussion. I’m really fortunate that so many smart people decided to participate in this discussion to make this post better.
Here is an article that articulates the original idea of this post: http://www.bidn.com/blogs/TomLannen/bidn-blog/2265/using-hashbytes-to-compare-columns