Robin Hames

Hints, tricks and tips relating to MS SQL Server and .NET
posts - 14 , comments - 43 , trackbacks - 0

Calculating Running Totals in SQL Server 2005, The optimal solution?

Using “Update to a local variable” to calculate running totals in SQL.
Recently I was looking at an existing view on a client's SQL server 2005 database. This view calculated the running total for a transaction amount from a table, but was performing very poorly.
I had always believed there were three different methods for calculating a running total using TSQL:
1.     Use a nested sub-query
2.     Use a self join
3.     Use Cursors
My own personal preference was to use the cursors option. If the cursor guidelines are followed, I've always found this to be the quickest, because the other two methods involve multiple scans of the table. The key for the cursor method is to ensure the data you are "cursoring" through is in the correct order, as the query optimzier does not understand cursors. This usually means cursoring through the data by clustered index, or copying the data into a temp table / table var first, in the relevant order.
A blog posted by Garth Wells back in 2001 gives these three techniques (http://www.sqlteam.com/article/calculating-running-totals)
I came across a fourth technique for the running total calculation, which is related to the cursor method. Like the cursor method, it involves a single scan of the source table, then inserting the calculated running total for each row into a temp table or table variable. However, instead of using a cursor, it makes use of the following UPDATE command syntax:
UPDATE table
SET variable = column = expression
The TSQL to calculate the running total is:
 
DECLARE @SalesTbl TABLE (DayCount smallint, Sales money, RunningTotal money)
 
DECLARE @RunningTotal money
 
SET @RunningTotal = 0
 
INSERT INTO @SalesTbl 
SELECT DayCount, Sales, null
FROM Sales
ORDER BY DayCount
 
UPDATE @SalesTbl
SET @RunningTotal = RunningTotal = @RunningTotal + Sales
FROM @SalesTbl
 
SELECT * FROM @SalesTbl
 
I tested this query along with the other three methods on a simple set of test data (actually the same test data from Garth Wells’ blog mentioned above).
The results of my test runs are:

 

Method
Time Taken
Nested sub-query
9300 ms
Self join
6100 ms
Cursor
400 ms
Update to local variable
140 ms
I was surprised just how much faster using the “Update to a local variable” method was. I expected it to be similar to the cursor method, as both involve a single scan of the source table, and both calculate the running total once only for each row in the table. The Nested Sub-query and Self join methods are so much slower because they involve the repeated recalculation of all of the previous running totals.

Note: There is a pretty big assumption in using the “Update to local variable” method. This is that the Update statement will update the rows in the temp table in the correct order. There is no simple way to specify the order for an Update statement, so potentially this method could fail, although I have not seen this actually happen yet!

I think that if I use a table variable, then the update will probably be in the correct order, because there are no indexes for the query optimizer to use, and parallellism will not occur. However, I can't be sure about this!

The following script was used to create the test data:
CREATE TABLE Sales (DayCount smallint, Sales money)
CREATE CLUSTERED INDEX ndx_DayCount ON Sales(DayCount)
go
INSERT Sales VALUES (1,120)
INSERT Sales VALUES (2,60)
INSERT Sales VALUES (3,125)
INSERT Sales VALUES (4,40)
 
DECLARE @DayCount smallint, @Sales money
SET @DayCount = 5
SET @Sales = 10
 
WHILE @DayCount < 5000
BEGIN
INSERT Sales VALUES (@DayCount,@Sales)
SET @DayCount = @DayCount + 1
SET @Sales = @Sales + 15
END
The queries used in my tests for the other three methods are posted below:
1.     Nested Sub-query
SELECT DayCount,
       Sales,
       Sales+COALESCE((SELECT SUM(Sales)
                      FROM Sales b
                      WHERE b.DayCount < a.DayCount),0)
                         AS RunningTotal
FROM Sales a
ORDER BY DayCount
2.     Self join
SELECT a.DayCount,
       a.Sales,
       SUM(b.Sales)
FROM Sales a
INNER JOIN Sales b
ON (b.DayCount <= a.DayCount)
GROUP BY a.DayCount,a.Sales
ORDER BY a.DayCount,a.Sales
3.     Cursor
 
DECLARE @SalesTbl TABLE (DayCount smallint, Sales money, RunningTotal money)
 
DECLARE @DayCount smallint,
        @Sales money,
        @RunningTotal money
 
SET @RunningTotal = 0
 
DECLARE rt_cursor CURSOR
FOR
SELECT DayCount, Sales
FROM Sales
ORDER BY DayCount
 
OPEN rt_cursor
 
FETCH NEXT FROM rt_cursor INTO @DayCount,@Sales
 
WHILE @@FETCH_STATUS = 0
 BEGIN
 SET @RunningTotal = @RunningTotal + @Sales
 INSERT @SalesTbl VALUES (@DayCount,@Sales,@RunningTotal)
 FETCH NEXT FROM rt_cursor INTO @DayCount,@Sales
 END
 
CLOSE rt_cursor
DEALLOCATE rt_cursor
 
SELECT * FROM @SalesTbl
 

Print | posted on Tuesday, October 28, 2008 10:22 AM | Filed Under [ SQL Server ]

Feedback

Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

Note that this method does not guarantee the order for the running total.
You will need (at least) a clustered index in same order as the running total business rules, and use a table lock for the table.

If those two conditions are met, you should be safe.
2/11/2009 10:04 PM | Peso
Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

Yeah. Thank you very much for this info. and I learn a lot from this.
Before, I use recursive cte for running count. I saw a big difference when I dealt with 1000+ rows compared in using this you so called “Update to a local variable”.
9/10/2009 8:59 AM | Nhell
Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

this is very nice peice of code really like it but still i m not fully confident due to update query
so i modified it a little bit
pls let me know wht u think abt this query


create TABLE tb_test(Id numeric(18 , 0) identity (1, 1) , AutoId numeric(18 , 0), Total numeric( 25 , 7) , RunningTotal numeric( 25 , 7))

DECLARE @RunningTotal numeric( 25 , 7)

SET @RunningTotal = 0

INSERT INTO tb_test
SELECT Autoid, Total , null
FROM Tb_Accountledger where accountid = 4522
ORDER BY vdate , autoid

UPDATE tb_test
SET @RunningTotal = tb_test.RunningTotal = @RunningTotal + tb_test.total
FROM tb_test inner join tb_test a
ON tb_test.id = a.id

SELECT * FROM tb_test
drop table tb_test
9/25/2009 4:17 PM | khurram sultan
Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

Hi Robin,
Good article but its worth saying that there is no guarantee that this technique will work. It is not 100% reliable.

Read more: http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/fa337bbf-1721-44e3-8b79-e02431154df1

Thank you for writing this blog post though because it prompted me to investigate and hence I wrote the forum thread linked to above.

Regards
-Jamie
9/29/2009 2:08 PM | Jamie Thomson
Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

Hi Robin,

Thanks alot for this solution. Nice and simple solution.
10/28/2009 9:47 AM | srini
Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

Does anyone know if such a method can work in ms access, and how?


Thanks
11/5/2009 1:04 PM | wytse
Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

This methods some times is 100% reliable, for example if you have a recordset with smalldatetime / datetime col - then you can easily sotr before an update -

update #t
set .....
select * from #t
order by date

So for me this method is enough good
1/18/2010 7:56 PM | Jakub
Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

Hi guys,

I implemented a running total solution using multiple cursors a while ago (sounds bad i know).
We were using the first to get a company then update the companies running balance thus making sure that only the companies running total was updated and not the entire table full of other companies.
We had 24000+ rows and using the subquery method took 7.5 seconds, the cursor approach took < 1 second.
Was pleasantly surprised!

11/12/2010 3:03 PM | Ric
Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

Hi guys,

I think, the fastest possible way for calculating the running totals is to use CLR on SQL Server 2005+.

The CLR method is unbeatable comparing other methods.

I have a complete description on my blog here: http://www.pawlowski.cz/2010/09/sql-server-and-fastest-running-totals-using-clr/
1/11/2011 10:41 PM | Pavel Pawlowski
Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

simple and noce
6/17/2011 10:23 AM | Paramesh Nomula
Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

Thank you so much I don't know how long ago you posted this but it's still a great help. Thank you.
8/7/2011 9:16 PM | A. Sorek
Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

This is very nicely documented. Thank you for the excellent tip.
9/16/2011 8:33 PM | RobF
Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

Nice article, but can you tell the number of rows on which the results were executed, because sometimes, the number of rows effect the performance of the query as well....
9/26/2011 1:13 PM | Naveed Butt
Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

Thank you! This is exactly what I was looking for, but I don't understand how it works.

I'm confused on this part:

UPDATE table
SET variable = column = expression

Is it a double assignment of the RunningTotal variable (reason for two equals signs)?

How does SQL know to take the record from the previous row to make it cumulative?
10/18/2011 6:47 PM | Mr T
Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

excellent! Easy and so fast!
7/23/2012 2:44 AM | Chris
Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

excellent! Easy and so fast and Thanks for Sharing this

Vikas Mahajan
10/9/2013 6:46 AM | Vikas Mahajan
Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

This article is great, thanks - but what if I wanted to introduce another column in there like employee_name, where it keeps a running total for each employee (in other words the running total starts at 0 again when the employee_name changes)? Thanks in advance!
1/8/2014 10:23 PM | jbomb
Gravatar

# re: Calculating Running Totals in SQL Server 2005, The optimal solution?

The article is good but can be done using CTE.SQL Server 2012 has indroduced the Order by clause with sum and partition function.
2/3/2014 7:59 AM | Hrushi
Post A Comment
Title:
Name:
Email:
Comment:
Verification:
 
 

Powered by: