Thursday, April 19, 2012 #

T-SQL usage: prefer Table Type over CTE (Common Table Expression)

I came to the CTE world late (the same time I learned MERGE), but I've been using it here and there.  I love Table Type; the introduction of that changed the way we write our .NET app and BizTalk app, and in some way level the playing field of all these technologies against SSIS.

But through my many recent round of building database, data warehouse, data marts, I find my self migrating away from using table type and views, partly because it just seemed like such a hassle to maintain a separate object for each table in my visual studio project.  Also, I was able to write a recursive CTE recently that otherwise would have taken me 20x the lines of code to do.  So I was fairly happy.

Fast forward to now, I got a big surprise to learn that in MS SQL server, the common table expression is not deterministic, which means the MS SQL server is free to re-evaluate the expression each time the CTE is referenced.  Not only does this mean that it's not true when I thought my CTE would improve performance by running the query once, it also means that my SQL may be working with inconsistent data in the same transaction.

So, back to my original way to writing and maintaining table types.  No more CTE unless I absolutely need the write recursive SQL.

Posted On Thursday, April 19, 2012 11:34 AM | Feedback (0)

Wednesday, April 04, 2012 #

Using Recursive SQL and XML trick to PIVOT(OK, concat) a "Document Folder Structure Relationship" table, works like MySQL GROUP_CONCAT

I'm in the process of building out a Data Warehouse and encountered this issue along the way.

In the environment, there is a table that stores all the folders with the individual level.  For example, if a document is created here:{App Path}\Level 1\Level 2\Level 3\{document}, then the DocumentFolder table would look like this:
ID
ID_Parent
FolderName
1
NULL
Level 1
2
1
Level 2
3
2
Level 3

To my understanding, the table was built so that:
  • Each proposal can have multiple documents stored at various locations
  • Different users working on the proposal will have different access level to the folder; if one user is assigned access to a folder level, she/he can see all the sub folders and their content.

Now we understand from an application point of view why this table was built this way.  But you can quickly see the pain this causes the report writer to show a document link on the report.  I wasn't surprised to find the report query had 5 self outer joins, which is at the mercy of nobody creating a document that is buried 6 levels deep, and not to mention the degradation in performance.

With the help of 2 posts (at the end of this post), I was able to come up with this solution:

  1. Use recursive SQL to build out the folder path
  2. Use SQL XML trick to concat the strings.

Code (a reminder, I built this code in a stored procedure.  If you copy the syntax into a simple query window and execute, you'll get an incorrect syntax error)

-- Get all folders and group them by the original DocumentFolderID in PTSDocument table
;
WITH DocFoldersByDocFolderID(PTSDocumentFolderID_Original, PTSDocumentFolderID_Parent, sDocumentFolder, nLevel)
AS
(-- first member
     
SELECT 'PTSDocumentFolderID_Original' = d1.PTSDocumentFolderID
           
, PTSDocumentFolderID_Parent
           
, 'sDocumentFolder' = sName
           
, 'nLevel' = CONVERT(INT, 1000000)
     
FROM (SELECT DISTINCT PTSDocumentFolderID
                 
FROM dbo.PTSDocument_DY WITH(READPAST)
           
) AS d1
           
INNER JOIN dbo.PTSDocumentFolder_DY AS df1 WITH(READPAST)
                 
ON d1.PTSDocumentFolderID = df1.PTSDocumentFolderID
     
UNION ALL
     
-- recursive
     
SELECT ddf1.PTSDocumentFolderID_Original
           
, df1.PTSDocumentFolderID_Parent
           
, 'sDocumentFolder' = df1.sName
           
, 'nLevel' = ddf1.nLevel - 1
     
FROM dbo.PTSDocumentFolder_DY AS df1 WITH(READPAST)
           
INNER JOIN DocFoldersByDocFolderID AS ddf1
                 
ON df1.PTSDocumentFolderID = ddf1.PTSDocumentFolderID_Parent

)

-- Flatten out folder path

,
DocFolderSingleByDocFolderID(PTSDocumentFolderID_Original, sDocumentFolder)
AS
(SELECT dfbdf.PTSDocumentFolderID_Original
           
, 'sDocumentFolder' = STUFF((SELECT '\' + sDocumentFolder
                                        
FROM DocFoldersByDocFolderID
                                        
WHERE (PTSDocumentFolderID_Original = dfbdf.PTSDocumentFolderID_Original)
                                        
ORDER BY PTSDocumentFolderID_Original, nLevel
                                        
FOR XML PATH ('')),1,1,'')
     
FROM DocFoldersByDocFolderID AS dfbdf
     
GROUP BY dfbdf.PTSDocumentFolderID_Original

)

And voila, I use the second CTE to join back to my original query (which is now a CTE for Source as we can now use MERGE to do INSERT and UPDATE at the same time).

Each part of this solution would not solve the problem by itself because:

  • If I don't use recursion, I cannot build out the path properly.  If I use the XML trick only, then I don't have the originating folder ID info that I need to link to the document.
  • If I don't use the XML trick, then I don't have one row per document to show in the report.
    I could conceivably do this in the report function, but I'd rather not deal with the beginning or ending backslash and how to attach the document name.
  • PIVOT doesn't do strings and UNPIVOT runs into the same problem as the above.
I'm excited that each version of SQL server provides us new tools to solve old problems and/or enables us to solve problems in a more elegant way

The 2 posts that helped me along:

Posted On Wednesday, April 04, 2012 5:29 PM | Feedback (0)

Wednesday, March 14, 2012 #

Date Dimension: use date in YYYYMMDD format if your granular level is a single date

It still surprises me how many people set up the DateDimID as Identity(1, 1).

If you use YYYYMMDD format, then on some reports, you can even display the date without navigating to the Date dimension to show label.  In the same deal, you can use YYYYMM for a monthly date dimension, and YYYY for a yearly date dimension.

Posted On Wednesday, March 14, 2012 11:39 AM | Feedback (0)

Data warehouse design principle, a very good post to always steer me back in the right direction

Whenever I start doubting my design of Data Mart or Cube, I always come back to this article and read it again.  It helps me keep my design simple.

http://dwbi1.wordpress.com/2010/06/17/one-or-two-dimensions/

Great post!

Related post:
http://dwbi1.wordpress.com/2011/02/26/denormalising-a-fact-table/

Posted On Wednesday, March 14, 2012 11:34 AM | Feedback (0)

Thursday, March 08, 2012 #

Database Hardware CPU Licensing vs Performance and Fast Track Data Warehouse Architecture

This is just one of those cool post which are not written as often as it should be.  Hats off to Glenn Berry for stating it so clearly for all BI / database developers.

http://sqlserverperformance.wordpress.com/2012/03/07/intel-xeon-e5-2600-series-processors-released/

I read some Microsoft paper on "Fast Track Data Warehouse Architecture", and did not appreciate the benefit at the time.  What a great way for developers to start with someone already figured out the licensing + cost + performance.  There are also many great tips on performance tuning in this article as well.

http://msdn.microsoft.com/en-us/library/dd459178.aspx

Posted On Thursday, March 08, 2012 4:05 PM | Feedback (0)

Copyright © Kevin Shyr

Design by Bartosz Brzezinski

Design by Phil Haack Based On A Design By Bartosz Brzezinski