Why SQL TOP may slow down your query and how to fix it

ABSTRACT

This article shows that adding a TOP clause to the SQL SELECT query may greatly increase its execution time. It also shows very effective solution to fix this problem.


INTRODUCTION

Lately I was conducting performance tests for a new project. I built sample database, filled it with data and began testing. The database schema is not important here. All you should know is that I had about 3M of records in main table [Articles] and about 11M of records in tables which I joined. I used this naive query at first:

SELECT

[Articles].Id

,CountryCategories.Name

,CityCategories.Name

FROM [Articles]

INNER JOIN CategoryCountry2Articles

ON [Articles].Id = CategoryCountry2Articles.IdArticle

INNER JOIN  CountryCategories

ON CountryCategories.Id = CategoryCountry2Articles.IdCountry

INNER JOIN CategoryCity2Articles

ON [Articles].Id = CategoryCity2Articles.IdArticle

INNER JOIN  CityCategories

ON CityCategories.Id = CategoryCity2Articles.IdCity

WHERE CountryCategories.Name = 'country1'

AND CityCategories.Name = 'city4'

 

This query took 00:04 (seconds) to execute. It returned far too much records, so I thought it would be a good idea to use TOP N clause to limit them. After all, limiting the number of records to be returned should decrease execution time. And certainly it should not increase it. Query optimizer however is odd enough to refute my assumptions. By adding a TOP clause to aforementioned query:

SELECT TOP(5)

[Articles].Id,

(…)

query execution time jumped to 01:14!  How is that possible?


EXPLANATION

Execution plans for both queries may help to explain it. For base query:

http://imageshack.us/a/img195/5361/sqltop1.png

 and for query with TOP clause:

http://imageshack.us/a/img521/9523/sqltop2.png

The main difference between those two execution plans is that in first plan optimizer uses Hash Matches, while in second it uses Nested Loops. Both symbols generally describe the same operation (join). However Hash Matches are efficient with large sets, while Nested Loops only with small sets. This is why the latter query performs so badly. It is the query optimizer that makes wrong decision.

But let’s assume that for some reason you do not want to abandon the idea  of using TOP clause here. There are two ways to induce query optimizer to take right decision.


SOLUTIONS

First solution is to use explicit hash joins in your query:

SELECT TOP 5

[Articles].Id

,CountryCategories.Name

,CityCategories.Name

FROM [Articles]

 INNER HASH JOIN CategoryCountry2Articles

ON [Articles].Id = CategoryCountry2Articles.IdArticle

 INNER HASH JOIN  CountryCategories

ON CountryCategories.Id = CategoryCountry2Articles.IdCountry

 INNER HASH JOIN CategoryCity2Articles

ON [Articles].Id = CategoryCity2Articles.IdArticle

 INNER HASH JOIN  CityCategories

ON CityCategories.Id = CategoryCity2Articles.IdCity

WHERE CountryCategories.Name = 'country1'

AND CityCategories.Name = 'city4'

 

Execution time: 00:07. Execution plan:

 http://imageshack.us/a/img812/2150/sqltop3.png

 

As you can see, this time Hash Matches are used.

 

Second and most effective solution is to use SQL variable:

DECLARE @topCount INT

SET @topCount = 5

SELECT TOP (@topCount)

(...)

 

Execution time: 00:00.2 (0,2 sec).

Quick look at the execution plan:

http://imageshack.us/a/img802/6742/sqltop4.png

 

It shows that query optimizer did not only choose to use Hash Matches, but it also shows high degree of parallelism in the query. This is why it is so incredibly efficient.

 

I am not able to explain why using variable pushes query optimizer towards those decisions. But in case you are struggling with the same problem as I did, try this solution. I hope it helps.

 

 

F10 not working in WPF

At least seemingly.

Lately I stumbled across strange WPF behavior. I wanted to catch F10 keydown event. As it turned out it is not so obvious task.

Read more in article

How to add attributes to classes generated by LINQ2SQL or Entity Framework?

I have just published article about interesting way of decoratning generated classes with attributes. Here is a direct link: DIRECT LINK

Using custom font in WPF application

Recently I had some troubles with relatively simple scenario: applying custom font to TextBlock in my WPF application. As it turned out the case is not so trivial. I published an article that I hope will help you in such situations. Direct link is here: [DIRECT LINK]

GroupJoin method tutorial

I finally managed to publish a detailed tutorial about LINQ`s GroupJoin method which I promised long ago. Sorry for being so late   It is available here [DIRECT LINK]

LINQ: GroupBy and GroupJoin tutorials

It has been a while since I published my last article on my blog. Now I`m back hoping to deliver you interesting stuff about advanced methods in LINQ. First article describes the most complex overload of GroupBy method in little details and is available here [DIRECT LINK]. The second one - which will be published soon - shall describe GroupJoin method.

Expression trees manual

Expression trees was the last feature of C# 3.0 that I didn`t feel comfortable about. So I got to the bottom of them and prepared concise article for everyone interested in this topic. Comments are appreciated! Here`s direct link: DIRECT LINK

Another Microsoft exam passed!

I have just passed another Microsoft exam - this time it was related to the technology I truly love - WPF
It was 070-502 (Microsoft Certified Technology Specialist: .NET Framework 3.5 WPF Applications).
51 questions and 2,5h time for all of them. Quite demanding compared to other exams I have taken.
I was preparing for this exam mainly with the book WPF Unleashed by Adam Nathan and Daniel Lehenbauer (who was one of the key WPF developers). I encourage you to try it - astonishing book!

Nice trick for collections of anynomous objects

Lately during development I have encountered a problem while operating on collections using anonymous types, LINQ and lambda expressions. The solution turned out to be quite simple and short, yet very usefull. Short article presenting this problem as well as its solution is available in articles section. Here`s direct link:
DIRECT LINK

Covariance and contravariance demystified

New article regarding covariance, contravariance and invariance in C# language has just been posted in C# category [click here for direct link]. Comments are welcome!
«April»
SunMonTueWedThuFriSat
303112345
6789101112
13141516171819
20212223242526
27282930123
45678910