DevJef's Mumbo-Jumbo

«There's a bit of SQL in all of us»


News


Because of several reasons, I decided to move my blogs to another host. So, from now on my blogs can be found on my new blog: http://DevJef.wordpress.com/.


Hope to see you there, or you're always free to contact me via Twitter, mail or my blog!




If you work with SQL Server you will eventually encounter a JOIN statement. This JOIN statement can be stated as an ordinary JOIN, or as a sub select. In most cases I try to use JOINS (even though it’s just for readability), but in some cases it’s quicker to do it in a sub select (for example with production code, in which you don’t want to change the output but filter output).

 

Curious about the performance differences between the two, I tried to create a test. I created 2 tables with country information:

 

 

 

 

 

 

 

 

 

In this case I will JOIN both tables, based on DefaultAbbreviation. The queries I used look like this:

 

SELECT CA.UNAbbreviation
FROM TST_CountryAbbreviation CA
WHERE 1 = 1
AND CA.DefaultAbbreviation IN
    (SELECT DefaultAbbreviation
     FROM TST_CountryInformation)

 

SELECT CA.DefaultAbbreviation
FROM TST_CountryAbbreviation CA
INNER JOIN TST_CountryInformation CI
    ON CI.DefaultAbbreviation = CA.DefaultAbbreviation
WHERE 1 = 1


 

Both queries are executed, and the Execution Plans look like this:

 

Query 1:

  

And for query 2:

  

As you can see the Execution Plans are practically the same. But if you look at the Execution Plan if you execute both statements simultaneously:

 

 

 

 

 

 

So in this case it proves that a JOIN is slower than a sub select. But what if you want a specific value…? I used the same statements, except in both versions I ask for the specific “CountryProper” value “Netherlands”:

 

 

  

 

So the Execution Plans show that if you fetch a specific value, the costs of both queries are the same.

 

The conclusion is the same as always: check the Execution Plan if you’re not sure about the performance of your statements.

 

Sources:

CreateSources.sql

SelectList.sql

SelectSpecificValue.sql



In order to refresh your OLAP Cubes (SQL Server Analysis Services), you need to create XMLA files. These XMLA files can be used in a SQL Server Agent Job, so that the Cubes are refreshed and re-processed. To accomplish this, you need to follow the next steps:

 

1) Deploy your Cube to your SQL Server

2) Right click on the database, and choose “Script database as” –> “Alter To”

3) Add the following node to the top of the XML:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">

4) Change the “Alter” node into the following node:

<Alter AllowCreate="true" ObjectExpansion="ExpandFull">

5) Add the following nodes to the bottom of the XML:

 <Parallel>
  <Process>
   <Type>ProcessFull</Type>
   <Object>
   <DatabaseID>SandBox</DatabaseID>
   </Object>
  </Process>
</Parallel>
</Batch>

6) Change the "DatabaseID" in the nodes of Step 5 to the Cube name you deployed

7) In your SQL Job, choose the "Type" of task  "SQL Server Integration Services Package"

 

So from now on, your SQL Job will re-create the SSAS Cube, and process it. But be aware, every time you change your Cube, change your XMLA file as well!!!



Every ones in a while you want to know the record count of all objects in your database. If you have 10 objects or less in your database, you can choose to type out the statements yourself, but once it’s more then that it gets quite annoying.

 

Because I needed this for a production issue (make sure that all objects contain records after import) I wrote a cursor to query all the database objects, and count the content of it:

  
USE Sandbox --====================================================================================================
DECLARE @TableName VARCHAR(50) = '',
        @SQLStatement VARCHAR(MAX) = ''
--====================================================================================================
DECLARE TableCursor CURSOR FOR
SELECT name
FROM sys.tables
WHERE name NOT IN ('sysdiagrams')
AND name NOT LIKE 'Temp%'
AND name NOT LIKE '%ReturnGrid%'
ORDER BY name ASC
--====================================================================================================
OPEN TableCursor
FETCH NEXT FROM TableCursor INTO @TableName
--====================================================================================================
WHILE @@FETCH_STATUS = 0
BEGIN
       SET @SQLStatement = @SQLStatement + 'SELECT ''' + @TableName + ''' AS TableName, COUNT(*) AS RecordAmount FROM ' + @TableName
       SET @SQLStatement = @SQLStatement + ' UNION ALL '       
       /* Print Result */
       PRINT @TableName
       /* Select SQL Statement */
       --SELECT @TableName, @SQLStatement
       FETCH NEXT FROM TableCursor INTO @TableName
END
--====================================================================================================
CLOSE TableCursor
DEALLOCATE TableCursor
--====================================================================================================
SET @SQLStatement = LEFT(@SQLStatement, (LEN(@SQLStatement) - 10))
--====================================================================================================
/* Print  */
PRINT '--===================================================================================================='
PRINT 'Executing Statements' EXEC(@SQLStatement)
--====================================================================================================  

 

The script uses a cursor to query the object names from the sys.tables, and with that data it creates a dynamic SQL query. That query counts the content of all objects. The sysdiagrams (MS Shipped table), Temp and ReturnGrid tables are excluded from the cursor.

 

For the big databases I build in a PRINT statement, so you can keep track of the progress while running the query. If the dynamic SQL statement is executed, it will show the text “Executing Statements” via a PRINT statement in the bottom of the query. After that you will see the results.



For (almost) all developers, there’s no greater feeling, then knowing your source code is (relatively) safe. The same counts for T-SQL script. Especially if you know that you need the same code next year (argh). So first thing I did after the initial release of scripts, was looking for a way to check-in T-SQL from SQL Server Management Studio (SSMS).

 

So a colleague of mine, Dries (Blog | Twitter), dropped by with a solution: TFS MSSCCI Provider 2010 from the Visual Studio Gallery. It’s a free plugin from Microsoft. This will allow you to check-in your scripts from SSMS, without the hassle of manually checking in files via Visual Studio Team Explorer.

 

You can download the plugin and install it, even with SSMS running. All it takes is a restart of SSMS to pick up the plugin, en you’re ready to roll!



When you work with SQL Server, you regularly want to know when an object is last executed, last modified, etc.. Well, at least I do! In most cases you need to use the sys.objects to obtain this information, join it with other sys tables, search for your information, and so on. But there is an easier way.

 

The script below will get you the latest information of any object in your database.

 

USE SandBox


SELECT
    DB_NAME(Txt.dbid)                        AS DatabaseName,
    OBJECT_NAME(Txt.objectid, Txt.dbid)        AS ProcedureName,
    Objects.create_date                        AS CreationDate,
    Objects.modify_date                        AS ModifiedDate,
    MAX(Stats.last_execution_time)            AS Last_Execution,
    CASE Objects.type
        WHEN 'AF' THEN 'Aggregate function (CLR)'         WHEN 'C' THEN 'CHECK constraint'
        WHEN 'D' THEN 'Default or DEFAULT constraint'
        WHEN 'F' THEN 'FOREIGN KEY constraint'
        WHEN 'L' THEN 'Log'
        WHEN 'FN' THEN 'Scalar function'
        WHEN 'FS' THEN 'Assembly (CLR) scalar-function'
        WHEN 'FT' THEN 'Assembly (CLR) table-valued function'
        WHEN 'IF' THEN 'In-lined table-function'
        WHEN 'IT' THEN 'Internal table'
        WHEN 'P' THEN 'Stored procedure'         WHEN 'PC' THEN 'Assembly (CLR) stored-procedure'
        WHEN 'PK' THEN 'PRIMARY KEY constraint (type is K)'
        WHEN 'RF' THEN 'Replication filter stored procedure'
        WHEN 'S' THEN 'System table'         WHEN 'SN' THEN 'Synonym'
        WHEN 'SQ' THEN 'Service queue'
        WHEN 'TA' THEN 'Assembly (CLR) DML trigger'
        WHEN 'TF' THEN 'Table function'
        WHEN 'TR' THEN 'SQL DML Trigger'
        WHEN 'TT' THEN 'Table type'
        WHEN 'U' THEN 'User table'
        WHEN 'UQ' THEN 'UNIQUE constraint (type is K)'
        WHEN 'V' THEN 'View'
        WHEN 'X' THEN 'Extended stored procedure'
        ELSE '-'
    END AS ObjectType
FROM sys.dm_exec_query_stats Stats
CROSS APPLY sys.dm_exec_sql_text(Stats.sql_handle) Txt
LEFT JOIN sys.objects Objects
    ON Objects.object_id = Txt.objectid
WHERE 1 = 1
    AND Txt.dbid = db_id()               --Execute for selected DB
    AND Txt.dbid IS NOT NULL            --Exclude Ad-Hoc queries
    AND DB_NAME(Txt.dbid) IS NOT NULL    --Exclude Ad-Hoc queries
GROUP BY
    DB_NAME(Txt.dbid),
    OBJECT_NAME(Txt.objectid, Txt.dbid),
    Objects.create_date,     Objects.modify_date,
    Objects.type
ORDER BY DatabaseName, ProcedureName ASC

 

This script will show you the information you need to decide if you want to keep the object or, for example, want to delete it. With this script you can also check if Functions are still in use, or when a Trigger is last fired.

 

If you need more information about the object you can also use the query below. This will show you the time needed to execute the object, last execute time needed, etc:

 

SELECT *
FROM sys.dm_exec_query_stats Stats CROSS APPLY sys.dm_exec_sql_text(Stats.sql_handle) Txt

 

With these two queries you will be able to determine the status of the objects in your database.



If you write a SQL query, it’s important to know when certain pieces of you query are executed. For example, it’s possible that some statement in your query interferes with another part of your query, because of the execution time of a specific piece.

 

The order of execution is:

 

1. FROM Clause

2. JOIN / APPY / PIVOT / UNPIVOT Clause

3. WHERE Clause

4. GROUP BY Clause

5. CUBE / ROLLUP Clause

6. HAVING Clause

5. SELECT Clause

6. DISTINCT Clause

7. TOP Clause

8. ORDER BY Clause

 

So in many cases this won’t be a problem. But in some cases this can pose as a problem. In those cases it’s important to know the order of execution if you want to debug your query.

 

Sources: Logical Query Processing



In SQL Server we know 3 kinds of temporary tables: Local Temp Tables, Global Temp Tables and Table Variables. In this blog I will try to explain the differences, and tell you how to use it.

 

Local Temp Tables

The local temp table is the most commonly used temp table. This temp tables is created with the following statement:

 

CREATE TABLE #TempTable
      (ID INT IDENTITY(1,1) NOT NULL,
       Description VARCHAR(10) NULL)

 

The table that the script above created is a temporary table that is stored on disk. To be exact, it’s stored in the TempDB. The table can only be reached from within the same scope. It will be cleaned automatically, but it’s more wise to clean it up yourself by using a DROP TABLE statement.

 

A session specific identifier is added to the name, to make sure that temp tables from other sessions don’t use the same name. If you query the sys.tables, you will see a table name similar to this:

 

#TempTable___________________________________________________________________

_______________________________________000000000005

 

The benefits of using a temp table are (amongst others): reduction of required locks (because the user is the only one who has access), less logging necessary, easy debugging (in some cases), etc. One downside is that the objectname is limited to 116 characters.

 

Global Temp Tables

Global temporary tables work just like local temporary tables (stored in TempDB, less locking necessary). However, they are visible to all sessions, until the session that created the object goes out of scope and all referring sessions to the object go out of scope. Also, the name of the object will remain the same as you declared it. So if you query sys.tables in the database, you will notice that (instead of the script above) the name is still TempTable.

 

A global temp table can be created with the following statement:

 

CREATE TABLE ##TempTable
      (ID INT IDENTITY(1,1) NOT NULL,
       Description VARCHAR(10) NULL)

 

Most of the time you will not encounter these Global temp tables “in the wild”, because permanent tables are mostly preferred.

 

Table Variables

A temporary table can also be created in memory. This has several advantages: less locking then permanent objects, performs slightly better then temporary- or global tables, not as much logging necessary, etc. Table variables are cleared automatically when the procedure, function or query goes out of scope.

 

There are a few disadvantages and/or demands for using table variables:

  • They are allowed in SQL Server 2000 and above, and databases with Compatibility level 80 or higher
  • They cannot be used for “SELECT * INTO”-queries
  • They cannot be changed after declaration. They need to be recreated
  • They cannot be truncated
  • They cannot contain: User Defined Function (UDF), User Defined Type (UDT), Computed Column or Default Constraint
  • They cannot be dropped. You need to let them go out of scope
  • They cannot be created dynamically (“SELECT * INTO @TempTable”-statement), or used in a dynamic SQL statement
  • They are not included in statistics by SQL Sever, and you cannot create it manually
  • They don’t use parallelism when you use an “INSERT INTO”-statement
  • They will always have a cardinality of 1, because the table doesn’t exist at compile time. Cardinality refers to the uniqueness of a column. The lower the cardinality, the more duplicated items in a column
  • They must be referenced by an alias if you join the object in a query. If you don’t the compiler will tell you the object doesn’t exist

 

Conclusion

As always, there is no right or wrong answer. In most cases you will use the Table Variable or Local Temporary Table. The right answer in your case will be a judgment call between performance and usability. There are a few rules of thumb you can use to determine which type of object suites your needs:

  • If you have less then 100 rows, generally you want to use a Table Variable
  • If you need an index on your object, use a Temporary Table (Local or Global)

 



In order to test Performance in SQL Server, normally you would take a look at how may I/O or how many cycles a statement takes to complete. To make it less complex, you can take a look at how long a statements takes, just by looking at the execution-time. In most cases, network latency, SQL Server hardware and workstation performance are also included in this result (time to create result set, and time needed to send across the internet or internal network). And if a statement takes les then a second to execute, you need to perform some magic with dates, times, etc. It is what it sounds like: overkill. There is a simple way to test performance in SQL Server.

 

In order to understand the statement, first a small explanation of the keywords:

 

@@CPU_BUSY: Returns the ticks “spend” since the last execution of the statement

ISNULL: Replaces NULL with the specified replacement value

COALESCE: Returns the first non-NULL value from it’s replacement values

 

So basically ISNULL and COALESCE provide you with the same functionality( at least, the way I used it in the script below), except COALESCE can be used to prevent the inevitable bunch of nested ISNULL statements.

 

To test the statements, I used the following script:

 

  
DECLARE @Loops INT SET @Loops = 1
DECLARE @CPU INT SET @CPU = @@CPU_BUSY DECLARE @StartDate DATETIME SET @StartDate = GETDATE() WHILE @Loops <= 1000000
BEGIN
    IF COALESCE('123', '456') = '456'
            PRINT 1
    SET @Loops = @Loops + 1
END PRINT 'COALESCE, both non-NULL'
PRINT 'Total CPU time: ' + CONVERT(varchar, @@CPU_BUSY - @CPU)
PRINT 'Total milliseconds: ' + CONVERT(varchar, DATEDIFF(ms, @StartDate, GETDATE()))
PRINT ''
GO --================================================== DECLARE @Loops INT SET @Loops = 1
DECLARE @CPU INT SET @CPU = @@CPU_BUSY
DECLARE @StartDate DATETIME SET @StartDate = GETDATE() WHILE @Loops <= 1000000 BEGIN
    IF ISNULL('123', '456') = '456'
            PRINT 1
    SET @Loops = @Loops + 1
END PRINT 'ISNULL, both non-NULL'
PRINT 'Total CPU time: ' + CONVERT(varchar, @@CPU_BUSY - @CPU)
PRINT 'Total milliseconds: ' + CONVERT(varchar, DATEDIFF(ms, @StartDate, GETDATE()))
PRINT ''
GO

If you execute this statement, you will see the following results:

 

COALESCE, both non-NULL
Total CPU time: 16
Total milliseconds: 566
 
ISNULL, both non-NULL
Total CPU time: 19
Total milliseconds: 640

 

As you can see, the COALESCE takes less time to complete then the ISNULL function. The total CPU time needed is shorter, and looking at the used time, it saves you 74 milliseconds. In this case it isn’t that much, but with larger queries this can save you a lot of time.

 

The actual CPU time and milliseconds vary because of different hardware, caching, etc. But the outcome is the most important.

 

For those people who want to comment about COALESCE; I know! COALESCE is not exactly the same as ISNULL, but like I said, in this script it is!



How fun is it to use new software? As a developer you always want to use the latest software, and in many cases it’s backwards compatible. This (unfortunately) isn’t the case with Report Builder 3.0 and SQL Server 2005 Reporting Services. This is a “by design” issue, but it’s such an annoying “feature”. If you try to deploy a Report built in RB 3.0, you will get the error:

 

"The report definition is not valid. Details: The report definition has an invalid target namespace 'http://schemas.microsoft.com/sqlserver/reporting/2010/01/reportdefinition' which cannot be upgraded. (rsInvalidReportDefinition)"

 

In most cases you will copy-paste the tables or objects from RB 3.0 to the Business Intelligence Development Studio (BIDS), and save it again. But sometimes you just want to take the shortcut, and that is possible as well. As always, I have to mention that I don’t guarantee anything, but it worked for me several times.

 

The first step is to open the .RDL file with a text editor, and replace the following line of XML:

 

<Report xmlns:rd=”http://schemas.microsoft.com/SQLServer/reporting/reportdesigner” 
xmlns=”http://schemas.microsoft.com/sqlserver/reporting/2010/01/reportdefinition“>

 

with:

 

<Report xmlns:rd=”http://schemas.microsoft.com/SQLServer/reporting/reportdesigner”
xmlns=”http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition“>

 

Step two is removing a set of nodes in the RDL, that is not supported by versions older then Report Server 2008:

 

<ReportSection>     <ReportSections>

 

 

And it’s closing tags:

 

    </ReportSection>
</ReportSections>

 

Don’t remove the text inside these tags, only remove the tags themselves!!!

 

If you follow these steps, you will be able to deploy a “new” Report to an “old” Report Server without weird copy-paste actions.