December 2010 Entries

Let's continue exploring some of the SQL Smells from Phil's list. He has been putting together.

Datatype mis-matches in predicates that rely on implicit conversion.(Plamen Ratchev)

This is a great example poking holes in the whole theory of "If it works it's not broken" Queries will this probably will generally work and give the correct response. In fact, without careful analysis, you probably may be completely oblivious that there is even a problem. This subtle little problem will needlessly complicate queries and slow them down regardless of the indexes applied.

Consider this example:

CREATE TABLE [dbo].[Page](

    [PageId] [int] IDENTITY(1,1) NOT NULL,

    [Title] [varchar](75) NOT NULL,

    [Sequence] [int] NOT NULL,

    [ThemeId] [int] NOT NULL,

    [CustomCss] [text] NOT NULL,

    [CustomScript] [text] NOT NULL,

    [PageGroupId] [int] NOT NULL;

 

CREATE PROCEDURE PageSelectBySequence

(

@sequenceMin smallint ,

@sequenceMax smallint

)

AS

BEGIN

SELECT [PageId] ,

[Title] ,

[Sequence] ,

[ThemeId] ,

[CustomCss] ,

[CustomScript] ,

[PageGroupId]

FROM [CMS].[dbo].[Page]

WHERE Sequence BETWEEN @sequenceMin AND @SequenceMax

END

 

Note that the Sequence column is defined as int while the sequence parameter is defined as a small int. The problem is that the database may have to do a lot of type conversions to evaluate the query. In some cases, this may even negate the indexes that you have in place.

Using Correlated subqueries instead of a join   (Dave_Levy/ Plamen Ratchev)

There are two main problems here. The first is a little subjective, since this is a non-standard way of expressing the query, it is harder to understand.

The other problem is much more objective and potentially problematic. You are taking much of the control away from the optimizer. Written properly, such a query may well out perform a corresponding query written with traditional joins. More likely than not, performance will degrade.

Whenever you assume that you know better than the optimizer, you will most likely be wrong. This is the fundmental problem with any hint.

Consider a query like this:

 

SELECT Page.Title ,

Page.Sequence ,

Page.ThemeId ,

Page.CustomCss ,

Page.CustomScript ,

PageEffectParams.Name ,

PageEffectParams.Value ,

( SELECT EffectName

FROM dbo.Effect

WHERE EffectId = dbo.PageEffects.EffectId

) AS EffectName

FROM Page

INNER JOIN PageEffect ON Page.PageId = PageEffects.PageId

INNER JOIN PageEffectParam ON PageEffects.PageEffectId = PageEffectParams.PageEffectId

 

This can and should be written as:

 

SELECT Page.Title ,

Page.Sequence ,

Page.ThemeId ,

Page.CustomCss ,

Page.CustomScript ,

PageEffectParams.Name ,

PageEffectParams.Value ,

EffectName

FROM Page

INNER JOIN PageEffect ON Page.PageId = PageEffects.PageId

INNER JOIN PageEffectParam ON PageEffects.PageEffectId = PageEffectParams.PageEffectId

INNER JOIN dbo.Effect ON dbo.Effects.EffectId = dbo.PageEffects.EffectId

 

The correlated query may just as easily show up in the where clause. It's not a good idea in the select clause or the where clause.

Few or No comments.

This one is a bit more complicated and controversial. All comments are not created equal. Some comments are helpful and need to be included. Other comments are not necessary and may indicate a problem.

I tend to follow the rule of thumb that comments that explain why are good. Comments that explain how are bad. Many people may be shocked to hear the idea of a bad comment, but hear me out.

If a comment is needed to explain what is going on or how it works, the logic is too complex and needs to be simplified.

Comments that explain why are good. Comments may explain why the sql is needed are good. Comments that explain where the sql is used are good. Comments that explain how tables are related should not be needed if the sql is well written. If they are needed, you need to consider reworking the sql or simplify your data model.

Use of functions in a WHERE clause. (Anil Das)

Calling a function in the where clause will often negate the indexing strategy. The function will be called for every record considered. This will often a force a full table scan on the tables affected.

Calling a function will not guarantee that there is a full table scan, but there is a good chance that it will. If you find that you often need to write queries using a particular function, you may need to add a column to the table that has the function already applied.

In honor of Phil Factor's recent post on SQL Smells, I thought I would post some blogs SQL smells and why some of them are bad as well as when they may be OK.

Let's start with the top of the list that Phil has compiled.

Use of deprecated syntax such as *= (Dave Howard)

There are two problems here. First this is not in compliance with the SQL 92 standard. We should strive our code to be standards compliant whenever there is a standard available. Now this sounds like a cop out along the lines of "Because I said so" but there is a better reason.

This join syntax was standardized as it was to reduce ambiguity. One source of ambiguity stems from not being able to express a full outer join with the old syntax. It is confusing to have the join conditions blended in with the filtering conditions in the where clause. The SQL 92 syntax provides strong separation.

Denormalisation that requires the shredding of the contents of columns. (Merrill Aldrich)

This description could apply to any of several potential problems. I am not sure which problem Merrill was specifically referring to or what examples he finds most egregious.

I will speak here about the perils of so called "smart columns" These are columns that need to be parsed to get their true meaning. There are many examples. I worked on a system once that was riddled with so called CSZ fields. These were fields that stored the City State and ZipCode in a single field. You had to parse these fields to get the details because they were not stored in separate fields. Clearly a very bad design.

I also feel the same way about storing XML data in a varchar(max) field. If you must store XML data in the database, use the XML data type. XML data type will provide support for verifying that the content is XML and in the format expected.

Contrived Interfaces

I am not sure exactly what whoever submitted this was getting at, but I am going to take this opportunity to talk about naming conventions. There are other smells that talk about prefixing, so I will talk about how you name the database objects apart from the skipping the prefixes.

Bad naming conventions lead to confusion, making the database more difficult to work with and harder to understand. There is no need to use contrived abbreviations.

FirstName is dramatically easier to read and follow then F_NME.

Especially in SQL server the maximum size for object names is larger than you will ever need and the database preserves case, there is no need sprinkle names with underscores to separate words or shorten the names. Separate words with case changes and spell the names out. It will cut down on confusion.

Use of deprecated datatypes such as TEXT/NTEXT (Dave Howard)

You probably cannot go through your database and change every data type that needs to be changed. When you get advanced warning that a data type is going away, stop using it.

As you can, change your data types. It is better to go through and update individual fields as you can than to have to scramble when you try to upgrade the database version. Upgrading the database version can be daunting enough without such extra complications.

 

This covers my thoughts on the first four smells in Phil's list. Future posts will flush out my thoughts on others.

 

I would love to hear your thoughts on these smells.