Why SQL TOP may slow down your query and how to fix it

ABSTRACT

This article shows that adding a TOP clause to the SQL SELECT query may greatly increase its execution time. It also shows very effective solution to fix this problem.

INTRODUCTION

Lately I was conducting performance tests for a new project. I built sample database, filled it with data and began testing. The database schema is not important here. All you should know is that I had about 3M of records in main table [Articles] and about 11M of records in tables which I joined. I used this naive query at first:

SELECT

[Articles]
.Id

    ,
    CountryCategories.Name

    ,
    CityCategories
        .Name

            FROM[Articles]

    INNER JOIN CategoryCountry2Articles

        ON[Articles]
        .Id = CategoryCountry2Articles
                  .IdArticle

                      INNER JOIN CountryCategories

                          ON CountryCategories.Id =
        CategoryCountry2Articles
            .IdCountry

                INNER JOIN CategoryCity2Articles

                    ON[Articles]
            .Id = CategoryCity2Articles
                      .IdArticle

                          INNER JOIN CityCategories

                              ON CityCategories.Id =
            CategoryCity2Articles
                .IdCity

                    WHERE CountryCategories.Name = 'country1'

                                                   AND CityCategories.Name =
                'city4'

This query took 00:04 (seconds) to execute. It returned far too much records, so I thought it would be a good idea to use TOP N clause to limit them. After all, limiting the number of records to be returned should decrease execution time. And certainly it should not increase it. Query optimizer however is odd enough to refute my assumptions. By adding a TOP clause to aforementioned query:

SELECT TOP(5)

[Articles]
    .Id,

    (…)

query execution time jumped to 01:14! How is that possible?

EXPLANATION

Execution plans for both queries may help to explain it. For base query:

http://imageshack.us/a/img195/5361/sqltop1.png

and for query with TOP clause:

http://imageshack.us/a/img521/9523/sqltop2.png

The main difference between those two execution plans is that in first plan optimizer uses Hash Matches, while in second it uses Nested Loops. Both symbols generally describe the same operation (join). However Hash Matches are efficient with large sets, while Nested Loops only with small sets. This is why the latter query performs so badly. It is the query optimizer that makes wrong decision.

But let’s assume that for some reason you do not want to abandon the idea of using TOP clause here. There are two ways to induce query optimizer to take right decision.

SOLUTIONS

First solution is to use explicit hash joins in your query:

SELECT TOP 5

[Articles]
    .Id

    ,
    CountryCategories.Name

    ,
    CityCategories
        .Name

            FROM[Articles]

    INNER HASH JOIN CategoryCountry2Articles

        ON[Articles]
        .Id = CategoryCountry2Articles
                  .IdArticle

                      INNER HASH JOIN CountryCategories

                          ON CountryCategories.Id =
        CategoryCountry2Articles
            .IdCountry

                INNER HASH JOIN CategoryCity2Articles

                    ON[Articles]
            .Id = CategoryCity2Articles
                      .IdArticle

                          INNER HASH JOIN CityCategories

                              ON CityCategories.Id =
            CategoryCity2Articles
                .IdCity

                    WHERE CountryCategories.Name = 'country1'

                                                   AND CityCategories.Name =
                'city4'

Execution time: 00:07. Execution plan:

http://imageshack.us/a/img812/2150/sqltop3.png

As you can see, this time Hash Matches are used.

Second and most effective solution is to use SQL variable:

DECLARE @topCount INT

    SET @topCount = 5

    SELECT TOP(@topCount)

        (...)

Execution time: 00:00.2 (0,2 sec).

Quick look at the execution plan:

http://imageshack.us/a/img802/6742/sqltop4.png

It shows that query optimizer did not only choose to use Hash Matches, but it also shows high degree of parallelism in the query. This is why it is so incredibly efficient.

I am not able to explain why using variable pushes query optimizer towards those decisions. But in case you are struggling with the same problem as I did, try this solution. I hope it helps.

Share This Post:
Short Url: http://wblo.gs/dXS

posted @ Wednesday, January 30, 2013 10:27 PM

This article is part of the GWB Archives. Original Author: Marcin Kasprzykowski`s blog

Replatforming Guide: Pros, Cons, and Impact

Deciding to replatform is no small feat; it’s like setting sails for new horizons with your digital presence. Weighing the

Cypress vs Selenium: Why Cypress is Better!

Navigating the competitive landscape of web testing tools, Cypress emerges as a noteworthy contender, outshining Selenium with its cutting-edge advantages.

Why SQL TOP may slow down your query and how to fix it

Related Posts

Replatforming Guide: Pros, Cons, and Impact

Cypress vs Selenium: Why Cypress is Better!