Geeks With Blogs
The Quandary Phase This code was generated by a tool.

Here’s a quick tip for string cleaning in SQL. Let’s say you have a string, perhaps one being imported from another data source, and you want to trim all the leading and/or trailing non-alphanumeric characters. For example, you may have the input string:

',,, foo bar $$$'

Which needs to be cleaned before inserting into the database. Perhaps you want to import it as this:

'foo bar $$$'

Having removed all the leading junk. Or perhaps you want to import it as this:

'foo bar'

Having removed the garbage from both sides. One way you could do this is using a UDF to loop through the characters in the input string, for example:

   1: create FUNCTION fn_ltrim_char
   2: (
   3:     @input nvarchar(255),
   4:     @charToTrim char
   5: )
   6: RETURNS nvarchar(4000)
   7: AS BEGIN
   8:  
   9:     DECLARE @counter int
  10:     SET @counter = 0
  11:  
  12:     DECLARE @output nvarchar(4000)
  13:     SET @output = ''
  14:  
  15:     WHILE @counter <= LEN(@input) BEGIN
  16:  
  17:         IF @output <> '' OR SUBSTRING(@input,@counter,1) <> @charToTrim
  18:             SET @output = @output + SUBSTRING(@input,@counter,1)
  19:  
  20:         SET @counter = @counter + 1
  21:  
  22:     END
  23:  
  24:     RETURN @output
  25: END

Whilst there’s nothing really wrong with this approach, it does feel a little clunky. There’s a lot of duplication of data during the copy and the whole idea of having to write custom a UDF to do this feels a bit overkill.

So, is it possible to do this without resorting to a custom function? Well, yes. This can be achieved in a single line of SQL using the handy PATINDEX built-in function. Here’s an example of using PATINDEX to trim the leading non-alphanumeric characters from a string:

   1: --trim leading non-alphanumeric chars
   2: DECLARE @input nvarchar(4000)
   3: SET @input = ',,, foo bar $$$'
   4:  
   5: SELECT SUBSTRING(@input, PATINDEX('%[a-zA-Z0-9]%', @input), LEN(@input))

In the above code snippet, PATINDEX is used to find the first occurrence of an alphabetic or numeric character in the string (i.e. the first character we want to keep). Once this is found, it’s a straightforward task to extract a substring starting at this character, and discard the rest. 

When it comes to trimming the trailing junk characters, however, things are not quite so simple. PATINDEX is used to search for the first given occurrence of a substring which matches the search pattern. Note: there is no out-of-the-box solution for finding the index of the last matching substring- not by using PATINDEX alone, at any rate. I did a (very) quick survey of other developers’ solutions to this problem, and came up with this one. However, this solution involves the use of another custom UDF, which is exactly what I wanted to avoid.

However, all was not lost just yet. Consider the following: we already have a solution that works for removing the leading non-alphanumeric characters, removing the trailing alphanumeric characters is basically the same problem in reverse, and T-SQL has a wonderful little built-in function which can REVERSE a string. So, actually, all we need to do to discover the last given occurrence of a matching substring in a string, is to:

  1. Reverse the string.
  2. Find the first occurrence of the search pattern in the string and store the index.
  3. Subtract that index from the whole string length to give you the same index relative to the end of the string.

Here’s an example of doing just that:

   1: DECLARE @input nvarchar(4000)
   2: SET @input = 'foo bar%&^$$$'
   3:  
   4: --get the index of the last alphanumeric character
   5: DECLARE @lastOccurrenceIndex int
   6: SELECT @lastOccurrenceIndex = LEN(@input) - PATINDEX('%[a-zA-Z0-9]%', REVERSE(@input))
   7:  
   8: --select everything up to & including the last alphanumeric character
   9: SELECT SUBSTRING(@input, 1, @lastOccurrenceIndex + 1)
So, now we need to put the above technique into action to solve the problem at hand. The next example shows how to trim trailing non-alphanumeric characters using PATINDEX and REVERSE:
   1: --trim trailing non-alphanumeric chars
   2: DECLARE @input nvarchar(4000)
   3: SET @input = ',,, foo bar $$$'
   4:  
   5: SELECT REVERSE(SUBSTRING(REVERSE(@input), PATINDEX('%[a-zA-Z0-9]%', REVERSE(@input)), LEN(@input)))

And, finally, we can now combine both techniques to trim both leading and trailing non-alphanumeric characters, like so:

   1: --trim both leading and trailing non-alphanumeric chars
   2: DECLARE @input nvarchar(4000)
   3: SET @input = ',,, foo bar $$$'
   4:  
   5: SELECT REVERSE(SUBSTRING(REVERSE(SUBSTRING(@input, PATINDEX('%[a-zA-Z0-9]%', @input), 
   6:     LEN(@input))), PATINDEX('%[a-zA-Z0-9]%', REVERSE(SUBSTRING(@input, 
   7:     PATINDEX('%[a-zA-Z0-9]%', @input), LEN(@input)))), LEN(@input)))

And there we have it- a neatly cleaned string, with no need for any additional UDFs.

Posted on Tuesday, August 11, 2009 7:56 AM | Back to top


Comments on this post: SQL: Trimming Leading and Trailing Non-Alphanumeric Characters

# classic ugg boots
Requesting Gravatar...
tring, with no need for any additional UDFs.Ugg Bailey Fancy Boots
Classic Tall Ugg Boots
Left by ugg on Oct 31, 2010 3:29 PM

# re: SQL: Trimming Leading and Trailing Non-Alphanumeric Characters
Requesting Gravatar...
Thanks for sharing this informative post. It will help us a great deal. Keep up the good work.
Left by ugg adirondack tall on Nov 17, 2010 6:21 PM

# re: SQL: Trimming Leading and Trailing Non-Alphanumeric Characters
Requesting Gravatar...
Your blog article is very intersting and fantastic,at the same time the blog theme is unique and perfect,great job.To your success.One of the more impressive blogs Ive seen. Thanks so much for keeping the internet classy for a change.I ran across search engines and find this blog that fulfill my needs. There is a thing I do not agree but It doesn't matter since I think it does not hurt the whole content.
The added inshight increases my insights about:Indonesia Furniture | Indonesian Teak Furniture.
Left by John Agung Vanderbilt on Dec 07, 2010 5:02 AM

# re: SQL: Trimming Leading and Trailing Non-Alphanumeric Characters
Requesting Gravatar...
These looks like a very hard to understand tutorial.. Anyway, I'm gonna figure it out soon. Thanks
Left by Free Online Dating Service on Feb 04, 2011 5:09 AM

# re: SQL: Trimming Leading and Trailing Non-Alphanumeric Characters
Requesting Gravatar...
Your blog article is very intersting and fantastic,at the same time the blog theme is unique and perfect,great job.hermes birkin bags
Left by hegs plan on Feb 14, 2011 8:00 PM

# re: SQL: Trimming Leading and Trailing Non-Alphanumeric Characters
Requesting Gravatar...
The chinese word for crisis is divided into two characters, one meaning danger and the other meaning opportunity ejoicing in hope, patient in tribulation.
Left by wow power leveling on Mar 09, 2011 3:37 PM

# re: SQL: Trimming Leading and Trailing Non-Alphanumeric Characters
Requesting Gravatar...
Very helpful. I already implement it for mySQL.
Left by rfcommunity on Mar 20, 2011 4:50 AM

Comments have been closed on this topic.
Copyright © Adam Pooler | Powered by: GeeksWithBlogs.net