Geeks With Blogs
Jim Giercyk

SQL offers many different methods to produce the same results.  There is a never-ending debate among SQL developers as to the “best way” or the “most efficient way” to render a result set.  Sometimes these disputes even come to blows… well, I am a lover, not a fighter, so I decided to collect some data that will show which way is the best and most efficient.

For the queries below, I downloaded the test database from SQLSkills:  http://www.sqlskills.com/sql-server-resources/sql-server-demos/.  There isn’t a lot of data, but enough to prove my point: dbo.member has 10,000 records, dbo.payment has 15,554 records, and our result set contains 6,706 records.

The following queries produce an identical result set: aggregate payment information from the dbo.payment table for each member who has made more than one payment, along with the member’s first and last name from the dbo.member table.

 

/*************/
/* Sub Query */
/*************/
SELECT  a.[Member Number] ,
        m.lastname ,
        m.firstname ,
        a.[Number Of Payments] ,
        a.[Average Payment] ,
        a.[Total Paid]
FROM    ( SELECT    member_no 'Member Number' ,
                    AVG(payment_amt) 'Average Payment' ,
                    SUM(payment_amt) 'Total Paid' ,
                    COUNT(Payment_No) 'Number Of Payments'
          FROM      dbo.payment
          GROUP BY  member_no
          HAVING    COUNT(Payment_No) > 1
        ) a
        JOIN dbo.member m ON a.[Member Number] = m.member_no
       

/***************/
/* Cross Apply */
/***************/
SELECT  ca.[Member Number] ,
        m.lastname ,
        m.firstname ,
        ca.[Number Of Payments] ,
        ca.[Average Payment] ,
        ca.[Total Paid]
FROM    dbo.member m
        CROSS APPLY ( SELECT    member_no 'Member Number' ,
                                AVG(payment_amt) 'Average Payment' ,
                                SUM(payment_amt) 'Total Paid' ,
                                COUNT(Payment_No) 'Number Of Payments'
                      FROM      dbo.payment
                      WHERE     member_no = m.member_no
                      GROUP BY  member_no
                      HAVING    COUNT(Payment_No) > 1
                    ) ca


/********/
/* CTEs */
/********/
;
WITH    Payments
          AS ( SELECT   member_no 'Member Number' ,
                        AVG(payment_amt) 'Average Payment' ,
                        SUM(payment_amt) 'Total Paid' ,
                        COUNT(Payment_No) 'Number Of Payments'
               FROM     dbo.payment
               GROUP BY member_no
               HAVING   COUNT(Payment_No) > 1
             ),
        MemberInfo
          AS ( SELECT   p.[Member Number] ,
                        m.lastname ,
                        m.firstname ,
                        p.[Number Of Payments] ,
                        p.[Average Payment] ,
                        p.[Total Paid]
               FROM     dbo.member m
                        JOIN Payments p ON m.member_no = p.[Member Number]
             )
    SELECT  *
    FROM    MemberInfo


/************************/
/* SELECT with Grouping */
/************************/

SELECT  p.member_no 'Member Number' ,
        m.lastname ,
        m.firstname ,
        COUNT(Payment_No) 'Number Of Payments' ,
        AVG(payment_amt) 'Average Payment' ,
        SUM(payment_amt) 'Total Paid'
FROM    dbo.payment p
        JOIN dbo.member m ON m.member_no = p.member_no
GROUP BY p.member_no ,
        m.lastname ,
        m.firstname
HAVING  COUNT(Payment_No) > 1

 

We can see what is going on in SQL’s brain by looking at the execution plan.  The Execution Plan shows which steps SQL executes, in what order it executes them, and what percentage of the batch cost each query accounts for.  SO… if I execute all 4 of these queries in a single batch, I will get an idea of the relative cost SQL assigns to each, and how it renders the Execution Plan.  We can settle this once and for all.  Here is what SQL did with these queries:

 

[Figure: actual execution plans for the four queries — each query produces an identical plan and an equal share of the batch cost]

Not only did the queries take the same amount of time to execute, SQL generated the same Execution Plan for each of them.  Everybody is right… I guess we can all finally go to lunch together!  But wait a second, I may not be a fighter, but I AM an instigator.  Let’s see how a table variable stacks up.  Here is the code I executed:

/********************/
/*  Table Variable  */
/********************/
DECLARE @AggregateTable TABLE
    (
      member_no INT ,
      AveragePayment MONEY ,
      TotalPaid MONEY ,
      NumberOfPayments INT
    )

INSERT  @AggregateTable
        SELECT  member_no 'Member Number' ,
                AVG(payment_amt) 'Average Payment' ,
                SUM(payment_amt) 'Total Paid' ,
                COUNT(Payment_No) 'Number Of Payments'
        FROM    dbo.payment
        GROUP BY member_no
        HAVING  COUNT(Payment_No) > 1
 

SELECT  at.member_no 'Member Number' ,
        m.lastname ,
        m.firstname ,
        at.NumberOfPayments 'Number Of Payments' ,
        at.AveragePayment 'Average Payment' ,
        at.TotalPaid 'Total Paid'
FROM    @AggregateTable at
        JOIN dbo.member m ON m.member_no = at.member_no

In the interest of keeping things in groupings of 4, I removed the last query from the previous batch and added the table variable query.  Here’s what I got:

 

[Figure: execution plans for the batch including the table variable — the insert and the select together account for 22% of the batch cost]

 

Since we first insert into the table variable and then read from it, the Execution Plan renders 2 steps.  BUT, the combination of the 2 steps is only 22% of the estimated batch cost.  By that measure it actually beat the other methods, even though it is treated as 2 separate queries in the Execution Plan.  The argument I often hear against table variables is that SQL estimates only 1 row for the table in the Execution Plan.  While this is true, the estimate does not come into play until you read from the table variable.  In this case, the table variable held 6,706 rows, but it still outperformed the other queries.  People argue that table variables should only be used for hash or lookup tables.  The fact is, you have control of what you put IN to the variable, so as long as you keep it within reason, these results suggest that a table variable is a viable alternative to sub-queries.
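If the 1-row estimate ever does hurt a larger read, one mitigation (an aside, not part of the test above) is to add OPTION (RECOMPILE) to the statement that reads the variable; SQL Server then compiles that statement after the variable has been populated, so the plan sees the actual row count.  A minimal sketch against the same tables:

```sql
SELECT  at.member_no 'Member Number' ,
        m.lastname ,
        m.firstname ,
        at.NumberOfPayments 'Number Of Payments' ,
        at.AveragePayment 'Average Payment' ,
        at.TotalPaid 'Total Paid'
FROM    @AggregateTable at
        JOIN dbo.member m ON m.member_no = at.member_no
OPTION  ( RECOMPILE )   -- plan is compiled with the variable's actual cardinality
```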

If anyone does volume testing on this theory, I would be interested in the results.  My suspicion is that there is a breaking point where efficiency goes down the tubes immediately, and it would be interesting to see where the threshold is.
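As a rough starting point for a volume test (a sketch only — dbo.payment_big is a hypothetical table name and the multiplier of 100 is arbitrary), you could inflate the payment data and re-run the batch against the larger copy:

```sql
-- Build a 100x copy of dbo.payment; payment_no values will repeat,
-- which is fine for this kind of aggregate testing
SELECT  p.member_no ,
        p.payment_no ,
        p.payment_amt
INTO    dbo.payment_big
FROM    dbo.payment p
        CROSS JOIN ( SELECT TOP ( 100 ) 1 AS n
                     FROM   sys.all_objects
                   ) x
```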

Coding SQL is a matter of style.  If you’ve been around since they introduced DB2, you were probably taught a little differently than a recent computer science graduate.  If you have a company standard, I strongly recommend you follow it.  If you do not have a standard, generally speaking, there is no right or wrong answer when talking about the efficiency of these types of queries, and certainly no hard-and-fast rule.  Volume and infrastructure will dictate a lot when it comes to performance, so your results may vary in your environment.  Download the database and try it!
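If you do, you don’t need the graphical plan to compare the queries; these standard session options print per-query CPU time, elapsed time, and I/O to the Messages tab (shown as a sketch — wrap them around whichever batch you are testing):

```sql
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- ...run the queries being compared here...

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```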

Posted on Wednesday, October 16, 2013 12:06 PM.  Tags: table variable, efficiency, cross apply, cte, subquery


Comments on this post: Performance considerations for common SQL queries



Copyright © Jim Giercyk | Powered by: GeeksWithBlogs.net