Robin Hames

Hints, tricks and tips relating to MS SQL Server and .NET
posts - 14 , comments - 45 , trackbacks - 0

Tuesday, October 28, 2008

Creating SQL Wildcards with Recursion and a Common Table Expression in SQL Server 2005

Creating Test Wildcards with a Common Table Expression in SQL Server 2005
 
I needed to create a SQL Server table of test wildcard strings ranging from ‘AA%’ through to ‘ZZ%’. This seemed to be a prime candidate for using recursion with a Common Table Expression (CTE). After a little playing around I came up with the following:
with wildcards as
(select char(65) + char(65) + '%' as wildcard, 65 as num1, 65 as num2
union all
select char(
            case when num2 = 90 then num1 + 1
            else num1 end)
+ char(
            case when num2 = 90 then 65
            else num2 + 1 end)
      + '%',
      case when num2 = 90 then num1 + 1
            else num1 end,
      case when num2 = 90 then 65
            else num2 + 1 end
from wildcards where num1 < 91
and (num1 <> 90 or num2 <> 90)
)
 
Obviously, to actually put the results somewhere useful, I had an INSERT statement which selected from the Wildcards CTE:
insert into TestFilter (Filter)
select wildcard from wildcards option(MaxRecursion 676)
 
Selecting the num1 and num2 values from the Wildcards CTE shows you what is going on:
select * from wildcards option(MaxRecursion 676)
 

Posted On Tuesday, October 28, 2008 3:21 PM | Comments (1) | Filed Under [ SQL Server ]

Calculating Running Totals in SQL Server 2005, The optimal solution?

Using “Update to a local variable” to calculate running totals in SQL.
Recently I was looking at an existing view on a client's SQL server 2005 database. This view calculated the running total for a transaction amount from a table, but was performing very poorly.
I had always believed there were three different methods for calculating a running total using TSQL:
1.     Use a nested sub-query
2.     Use a self join
3.     Use Cursors
My own personal preference was to use the cursors option. If the cursor guidelines are followed, I've always found this to be the quickest, because the other two methods involve multiple scans of the table. The key for the cursor method is to ensure the data you are "cursoring" through is in the correct order, as the query optimzier does not understand cursors. This usually means cursoring through the data by clustered index, or copying the data into a temp table / table var first, in the relevant order.
A blog posted by Garth Wells back in 2001 gives these three techniques (http://www.sqlteam.com/article/calculating-running-totals)
I came across a fourth technique for the running total calculation, which is related to the cursor method. Like the cursor method, it involves a single scan of the source table, then inserting the calculated running total for each row into a temp table or table variable. However, instead of using a cursor, it makes use of the following UPDATE command syntax:
UPDATE table
SET variable = column = expression
The TSQL to calculate the running total is:
 
DECLARE @SalesTbl TABLE (DayCount smallint, Sales money, RunningTotal money)
 
DECLARE @RunningTotal money
 
SET @RunningTotal = 0
 
INSERT INTO @SalesTbl 
SELECT DayCount, Sales, null
FROM Sales
ORDER BY DayCount
 
UPDATE @SalesTbl
SET @RunningTotal = RunningTotal = @RunningTotal + Sales
FROM @SalesTbl
 
SELECT * FROM @SalesTbl
 
I tested this query along with the other three methods on a simple set of test data (actually the same test data from Garth Wells’ blog mentioned above).
The results of my test runs are:

 

Method
Time Taken
Nested sub-query
9300 ms
Self join
6100 ms
Cursor
400 ms
Update to local variable
140 ms
I was surprised just how much faster using the “Update to a local variable” method was. I expected it to be similar to the cursor method, as both involve a single scan of the source table, and both calculate the running total once only for each row in the table. The Nested Sub-query and Self join methods are so much slower because they involve the repeated recalculation of all of the previous running totals.

Note: There is a pretty big assumption in using the “Update to local variable” method. This is that the Update statement will update the rows in the temp table in the correct order. There is no simple way to specify the order for an Update statement, so potentially this method could fail, although I have not seen this actually happen yet!

I think that if I use a table variable, then the update will probably be in the correct order, because there are no indexes for the query optimizer to use, and parallellism will not occur. However, I can't be sure about this!

The following script was used to create the test data:
CREATE TABLE Sales (DayCount smallint, Sales money)
CREATE CLUSTERED INDEX ndx_DayCount ON Sales(DayCount)
go
INSERT Sales VALUES (1,120)
INSERT Sales VALUES (2,60)
INSERT Sales VALUES (3,125)
INSERT Sales VALUES (4,40)
 
DECLARE @DayCount smallint, @Sales money
SET @DayCount = 5
SET @Sales = 10
 
WHILE @DayCount < 5000
BEGIN
INSERT Sales VALUES (@DayCount,@Sales)
SET @DayCount = @DayCount + 1
SET @Sales = @Sales + 15
END
The queries used in my tests for the other three methods are posted below:
1.     Nested Sub-query
SELECT DayCount,
       Sales,
       Sales+COALESCE((SELECT SUM(Sales)
                      FROM Sales b
                      WHERE b.DayCount < a.DayCount),0)
                         AS RunningTotal
FROM Sales a
ORDER BY DayCount
2.     Self join
SELECT a.DayCount,
       a.Sales,
       SUM(b.Sales)
FROM Sales a
INNER JOIN Sales b
ON (b.DayCount <= a.DayCount)
GROUP BY a.DayCount,a.Sales
ORDER BY a.DayCount,a.Sales
3.     Cursor
 
DECLARE @SalesTbl TABLE (DayCount smallint, Sales money, RunningTotal money)
 
DECLARE @DayCount smallint,
        @Sales money,
        @RunningTotal money
 
SET @RunningTotal = 0
 
DECLARE rt_cursor CURSOR
FOR
SELECT DayCount, Sales
FROM Sales
ORDER BY DayCount
 
OPEN rt_cursor
 
FETCH NEXT FROM rt_cursor INTO @DayCount,@Sales
 
WHILE @@FETCH_STATUS = 0
 BEGIN
 SET @RunningTotal = @RunningTotal + @Sales
 INSERT @SalesTbl VALUES (@DayCount,@Sales,@RunningTotal)
 FETCH NEXT FROM rt_cursor INTO @DayCount,@Sales
 END
 
CLOSE rt_cursor
DEALLOCATE rt_cursor
 
SELECT * FROM @SalesTbl
 

Posted On Tuesday, October 28, 2008 10:22 AM | Comments (18) | Filed Under [ SQL Server ]

Powered by: