Using “Update to a local variable” to calculate running totals in SQL.
Recently I was looking at an existing view on a client’s SQL server 2005 database. This view calculated the running total for a transaction amount from a table, but was performing very poorly.
I had always believed there were three different methods for calculating a running total using TSQL:
1. Use a nested sub-query
2. Use a self join
3. Use Cursors
My own personal preference was to use the cursors option. If the cursor guidelines are followed, I’ve always found this to be the quickest, because the other two methods involve multiple scans of the table. The key for the cursor method is to ensure the data you are “cursoring” through is in the correct order, as the query optimzier does not understand cursors. This usually means cursoring through the data by clustered index, or copying the data into a temp table / table var first, in the relevant order.
A blog posted by Garth Wells back in 2001 gives these three techniques (http://www.sqlteam.com/article/calculating-running-totals)
I came across a fourth technique for the running total calculation, which is related to the cursor method. Like the cursor method, it involves a single scan of the source table, then inserting the calculated running total for each row into a temp table or table variable. However, instead of using a cursor, it makes use of the following UPDATE command syntax:
UPDATE table
SET variable = column = expression
The TSQL to calculate the running total is:
DECLARE @SalesTbl TABLE (DayCount smallint, Sales money, RunningTotal money)
DECLARE @RunningTotal money
SET @RunningTotal = 0
INSERT INTO @SalesTbl
SELECT DayCount, Sales, null
FROM Sales
ORDER BY DayCount
UPDATE @SalesTbl
SET @RunningTotal = RunningTotal = @RunningTotal + Sales
FROM @SalesTbl
SELECT * FROM @SalesTbl
I tested this query along with the other three methods on a simple set of test data (actually the same test data from Garth Wells’ blog mentioned above).
The results of my test runs are:
Method | Time Taken |
Nested sub-query | 9300 ms |
Self join | 6100 ms |
Cursor | 400 ms |
Update to local variable | 140 ms |
I was surprised just how much faster using the “Update to a local variable” method was. I expected it to be similar to the cursor method, as both involve a single scan of the source table, and both calculate the running total once only for each row in the table. The Nested Sub-query and Self join methods are so much slower because they involve the repeated recalculation of all of the previous running totals.
Note: There is a pretty big assumption in using the “Update to local variable” method. This is that the Update statement will update the rows in the temp table in the correct order. There is no simple way to specify the order for an Update statement, so potentially this method could fail, although I have not seen this actually happen yet!
I think that if I use a table variable, then the update will probably be in the correct order, because there are no indexes for the query optimizer to use, and parallellism will not occur. However, I can’t be sure about this!
The following script was used to create the test data:
CREATE TABLE Sales (DayCount smallint, Sales money)
CREATE CLUSTERED INDEX ndx_DayCount ON Sales(DayCount)
go
INSERT Sales VALUES (1,120)
INSERT Sales VALUES (2,60)
INSERT Sales VALUES (3,125)
INSERT Sales VALUES (4,40)
DECLARE @DayCount smallint, @Sales money
SET @DayCount = 5
SET @Sales = 10
WHILE @DayCount < 5000
BEGIN
INSERT Sales VALUES (@DayCount,@Sales)
SET @DayCount = @DayCount + 1
SET @Sales = @Sales + 15
END
The queries used in my tests for the other three methods are posted below:
1. Nested Sub-query
SELECT DayCount,
Sales,
Sales+COALESCE((SELECT SUM(Sales)
FROM Sales b
WHERE b.DayCount < a.DayCount),0)
AS RunningTotal
FROM Sales a
ORDER BY DayCount
2. Self join
SELECT a.DayCount,
a.Sales,
SUM(b.Sales)
FROM Sales a
INNER JOIN Sales b
ON (b.DayCount <= a.DayCount)
GROUP BY a.DayCount,a.Sales
ORDER BY a.DayCount,a.Sales
3. Cursor
DECLARE @SalesTbl TABLE (DayCount smallint, Sales money, RunningTotal money)
DECLARE @DayCount smallint,
@Sales money,
@RunningTotal money
SET @RunningTotal = 0
DECLARE rt_cursor CURSOR
FOR
SELECT DayCount, Sales
FROM Sales
ORDER BY DayCount
OPEN rt_cursor
FETCH NEXT FROM rt_cursor INTO @DayCount,@Sales
WHILE @@FETCH_STATUS = 0
BEGIN
SET @RunningTotal = @RunningTotal + @Sales
INSERT @SalesTbl VALUES (@DayCount,@Sales,@RunningTotal)
FETCH NEXT FROM rt_cursor INTO @DayCount,@Sales
END
CLOSE rt_cursor
DEALLOCATE rt_cursor
SELECT * FROM @SalesTbl