Many SQL developers shy away from CROSS APPLY because it is not very well understood.  There is not a lot of documentation or many examples available online; CROSS APPLY is the red-headed stepchild of SQL queries, but also a well-kept secret. 

Essentially, all a CROSS APPLY does is connect an outer query to an inner query that cannot be resolved as a stand-alone query, because it references columns from the outer query.  Using the AdventureWorks2008R2 database, I have put together a simple example:

 

USE AdventureWorks2008R2
 
  SELECT    p.FirstName + ' ' + p.LastName 'Sales Person',
            CAST(sp.SalesQuota AS MONEY) 'Sales Quota' ,
            CAST(ca.TotalSales AS MONEY) 'Total Sales' ,
            sp.CommissionPct 'Commission Percentage',
            ca.TotalCommission 'Total Commission',
            ca.TotalCommission / 12 'Gross Monthly Income',
            sp.Bonus 'Yearly Bonus',
            t.NAME 'Territory'
  FROM      Person.Person p
            JOIN Sales.SalesPerson sp ON p.BusinessEntityID = sp.businessentityid
            JOIN sales.SalesTerritory t ON t.TerritoryID = sp.TerritoryId
            CROSS APPLY ( SELECT    SUM(TotalDue) 'TotalSales' ,
                                    SUM(TotalDue) * sp.CommissionPct 'TotalCommission'
                          FROM      Sales.SalesOrderHeader s
                          WHERE     s.SalespersonID = sp.BusinessEntityId
                        ) ca
  WHERE     p.PersonType = 'SP'
            AND sp.SalesQuota IS NOT NULL
            ORDER BY sp.TerritoryId
 

So, what’s happening here?  One of the best uses for a CROSS APPLY is aggregate functions.  The top-level query requires the TotalSales and TotalCommission for each salesperson.  The CROSS APPLY supplies these values by using the SUM() aggregate function.  Since the CROSS APPLY query is resolved first, we can think of it as “returning” the aggregated values to the top-level query.  Why is this important?  I’m glad you asked.  In this case, there are 3 major advantages to a CROSS APPLY over a simple JOIN to the SalesOrderHeader table.

1. Aggregate functions are executed only once, no matter how many calculations in the top-level query use the totals (notice that the ca.TotalCommission field is used twice)

2.  There is no need to use a nasty GROUP BY clause containing all of the top-level fields that are not part of an aggregate

3.  We can use ANY fields in the ORDER BY clause without including them in the top-level query.

Here is what the query would look like using a JOIN instead of a CROSS APPLY:

USE AdventureWorks2008R2

  SELECT    p.FirstName + ' ' + p.LastName 'Sales Person' ,
            CAST(sp.SalesQuota AS MONEY) 'Sales Quota' ,
            CAST(SUM(s.TotalDue) AS MONEY) 'Total Sales' ,
            sp.CommissionPct 'Commission Percentage' ,
            SUM(s.TotalDue) * sp.CommissionPct 'TotalCommission' ,
            ( SUM(s.TotalDue) * sp.CommissionPct ) / 12 'Gross Monthly Income' ,
            sp.Bonus 'Yearly Bonus' ,
            t.NAME 'Territory'
  FROM      Person.Person p
            JOIN Sales.SalesPerson sp ON p.BusinessEntityID = sp.businessentityid
            JOIN sales.SalesTerritory t ON t.TerritoryID = sp.TerritoryId
            JOIN Sales.SalesOrderHeader s ON s.SalespersonID = sp.BusinessEntityId
  WHERE     p.PersonType = 'SP'
            AND sp.SalesQuota IS NOT NULL
  GROUP BY  p.FirstName ,
            p.LastName ,
            sp.SalesQuota ,
            sp.CommissionPct ,
            sp.Bonus ,
            t.NAME
            --ORDER BY sp.TerritoryId

Doing the aggregates once may not make much difference in this simple query, but in an environment that does heavy calculations, such as engineering applications, a common MAX() or MIN() aggregate is often required to calculate many output columns in the top-level query.  There are also occasions where you need multiple aggregated fields in a single calculation; in that case, readability of the top-level query also improves.

The GROUP BY clause contains only 6 columns in this case, but what if you needed to return 50 columns or more?  With the columns listed twice, the number of lines in the query almost doubles.  Also, from a maintenance/enhancement standpoint, you double the number of touch points when you are adding or changing a column.

Finally, the ORDER BY clause.  In the second example above, it is impossible to order by TerritoryID because it is neither in the GROUP BY clause nor part of an aggregate.  You would need another step if you were required to produce your results ordered by TerritoryID.  In the first query, you can order by any field or fields, since there are no aggregate functions in the top-level query. 

I hope this makes the CROSS APPLY a little more understandable.  You can certainly go your entire career without coding one, but there are times when it can save you a lot of work, and in a high-volume environment, it may even help the efficiency of your queries.


SQL offers many different methods to produce the same results.  There is a never-ending debate between SQL developers as to the “best way” or the “most efficient way” to render a result set.  Sometimes these disputes even come to blows….well, I am a lover, not a fighter, so I decided to collect some data that will prove which way is the best and most efficient. 

For the queries below, I downloaded the test database from SQLSkills (http://www.sqlskills.com/sql-server-resources/sql-server-demos/).  There isn’t a lot of data, but enough to prove my point: dbo.member has 10,000 records, and dbo.payment has 15,554.  Our result set contains 6,706 records.

The following queries produce an identical result set: aggregate payment information from the dbo.payment table for each member who has made more than 1 payment, along with the member’s first and last name from the dbo.member table.

 

/*************/
/* Sub Query  */
/*************/
SELECT  a.[Member Number] ,
        m.lastname ,
        m.firstname ,
        a.[Number Of Payments] ,
        a.[Average Payment] ,
        a.[Total Paid]
FROM    ( SELECT    member_no 'Member Number' ,
                    AVG(payment_amt) 'Average Payment' ,
                    SUM(payment_amt) 'Total Paid' ,
                    COUNT(Payment_No) 'Number Of Payments'
          FROM      dbo.payment
          GROUP BY  member_no
          HAVING    COUNT(Payment_No) > 1
        ) a
        JOIN dbo.member m ON a.[Member Number] = m.member_no
       

/***************/
/* Cross Apply  */
/***************/
SELECT  ca.[Member Number] ,
        m.lastname ,
        m.firstname ,
        ca.[Number Of Payments] ,
        ca.[Average Payment] ,
        ca.[Total Paid]
FROM    dbo.member m
        CROSS APPLY ( SELECT    member_no 'Member Number' ,
                                AVG(payment_amt) 'Average Payment' ,
                                SUM(payment_amt) 'Total Paid' ,
                                COUNT(Payment_No) 'Number Of Payments'
                      FROM      dbo.payment
                      WHERE     member_no = m.member_no
                      GROUP BY  member_no
                      HAVING    COUNT(Payment_No) > 1
                    ) ca


/********/                   
/* CTEs  */
/********/
;
WITH    Payments
          AS ( SELECT   member_no 'Member Number' ,
                        AVG(payment_amt) 'Average Payment' ,
                        SUM(payment_amt) 'Total Paid' ,
                        COUNT(Payment_No) 'Number Of Payments'
               FROM     dbo.payment
               GROUP BY member_no
               HAVING   COUNT(Payment_No) > 1
             ),
        MemberInfo
          AS ( SELECT   p.[Member Number] ,
                        m.lastname ,
                        m.firstname ,
                        p.[Number Of Payments] ,
                        p.[Average Payment] ,
                        p.[Total Paid]
               FROM     dbo.member m
                        JOIN Payments p ON m.member_no = p.[Member Number]
             )
    SELECT  *
    FROM    MemberInfo


/************************/
/* SELECT with Grouping   */
/************************/

SELECT  p.member_no 'Member Number' ,
        m.lastname ,
        m.firstname ,
        COUNT(Payment_No) 'Number Of Payments' ,
        AVG(payment_amt) 'Average Payment' ,
        SUM(payment_amt) 'Total Paid'
FROM    dbo.payment p
        JOIN dbo.member m ON m.member_no = p.member_no
GROUP BY p.member_no ,
        m.lastname ,
        m.firstname
HAVING  COUNT(Payment_No) > 1

 

We can see what is going on in SQL’s brain by looking at the execution plan.  The execution plan shows which steps SQL executes, in what order it executes them, and what percentage of the batch each query accounts for.  So… if I execute all 4 of these queries in a single batch, I will get an idea of the relative time SQL takes to execute them and how it renders the execution plan.  We can settle this once and for all.  Here is what SQL did with these queries:

 

[Image: execution plans for the four queries]

Not only did the queries take the same amount of time to execute, SQL generated the same execution plan for each of them.  Everybody is right… I guess we can all finally go to lunch together!  But wait a second: I may not be a fighter, but I AM an instigator.  Let’s see how a table variable stacks up.  Here is the code I executed:

/********************/
/*  Table Variable  */
/********************/
DECLARE @AggregateTable TABLE
    (
      member_no INT ,
      AveragePayment MONEY ,
      TotalPaid MONEY ,
      NumberOfPayments INT
    )

INSERT  @AggregateTable
        SELECT  member_no 'Member Number' ,
                AVG(payment_amt) 'Average Payment' ,
                SUM(payment_amt) 'Total Paid' ,
                COUNT(Payment_No) 'Number Of Payments'
        FROM    dbo.payment
        GROUP BY member_no
        HAVING  COUNT(Payment_No) > 1
 

SELECT  at.member_no 'Member Number' ,
        m.lastname ,
        m.firstname ,
        at.NumberOfPayments 'Number Of Payments' ,
        at.AveragePayment 'Average Payment' ,
        at.TotalPaid 'Total Paid'
FROM    @AggregateTable at
        JOIN dbo.member m ON m.member_no = at.member_no

In the interest of keeping things in groupings of 4, I removed the last query from the previous batch and added the table variable query.  Here’s what I got:

 

[Image: execution plans for the batch including the table variable query]

 

Since we first insert into the table variable and then read from it, the execution plan renders 2 steps.  BUT, the combination of the 2 steps is only 22% of the batch.  It is actually faster than the other methods, even though it is treated as 2 separate queries in the execution plan.  The argument I often hear against table variables is that SQL only estimates 1 row for the table size in the execution plan.  While this is true, the estimate does not come into play until you read from the table variable.  In this case, the table variable had 6,706 rows, but it still outperformed the other queries.  People argue that table variables should only be used for hash or lookup tables.  The fact is, you have control of what you put INTO the variable, so as long as you keep it within reason, these results suggest that a table variable is a viable alternative to sub-queries.

If anyone does volume testing on this theory, I would be interested in the results.  My suspicion is that there is a breaking point where efficiency goes down the tubes immediately, and it would be interesting to see where the threshold is.
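
If you want to try it yourself, here is a rough volume-test harness to start from.  This is only a sketch: it assumes the same SQLSkills demo tables (dbo.payment and dbo.member), that you have already inflated dbo.payment to whatever volume you want to test, and it simply runs the table variable version against an equivalent temp table with statistics turned on.

/***********************/
/* Volume test sketch  */
/***********************/
SET STATISTICS TIME ON
SET STATISTICS IO ON

-- Table variable version
DECLARE @AggregateTable TABLE
    (
      member_no INT ,
      AveragePayment MONEY ,
      TotalPaid MONEY ,
      NumberOfPayments INT
    )

INSERT  @AggregateTable
        SELECT  member_no ,
                AVG(payment_amt) ,
                SUM(payment_amt) ,
                COUNT(Payment_No)
        FROM    dbo.payment
        GROUP BY member_no
        HAVING  COUNT(Payment_No) > 1

SELECT  m.lastname ,
        m.firstname ,
        at.*
FROM    @AggregateTable at
        JOIN dbo.member m ON m.member_no = at.member_no

-- Temp table version for comparison (temp tables carry statistics, table variables do not)
CREATE TABLE #AggregateTable
    (
      member_no INT ,
      AveragePayment MONEY ,
      TotalPaid MONEY ,
      NumberOfPayments INT
    )

INSERT  #AggregateTable
        SELECT  member_no ,
                AVG(payment_amt) ,
                SUM(payment_amt) ,
                COUNT(Payment_No)
        FROM    dbo.payment
        GROUP BY member_no
        HAVING  COUNT(Payment_No) > 1

SELECT  m.lastname ,
        m.firstname ,
        at.*
FROM    #AggregateTable at
        JOIN dbo.member m ON m.member_no = at.member_no

DROP TABLE #AggregateTable
SET STATISTICS TIME OFF
SET STATISTICS IO OFF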

Coding SQL is a matter of style.  If you’ve been around since they introduced DB2, you were probably taught a little differently than a recent computer science graduate.  If you have a company standard, I strongly recommend you follow it.  If you do not have a standard, generally speaking, there is no right or wrong answer when talking about the efficiency of these types of queries, and certainly no hard-and-fast rule.  Volume and infrastructure will dictate a lot when it comes to performance, so your results may vary in your environment.  Download the database and try it!


Occasionally I am asked to create a report in SSRS, and there are a few common user requests that are not exactly intuitive to a developer.  Since I constantly find myself looking at other reports and “borrowing” code, this blog entry is mostly for me so that I can change my thieving ways, but I hope that it helps you as well.  

 

 

Alternate background colors on each line in a table

 

How many times have you heard “In Excel I can…”?  In my case, too many to count, but as far as styling a table to have alternating colors, our users have a point.  It is extremely easy for a user to choose a style for a table, click a button and voila!  Doesn’t that make you angry?  Well fear not, we have a way; it is simple, if a little cumbersome.  We must change the Fill color for each text box in the Tablix:

[Image: setting the Fill property for each text box in the tablix]

 

As you can see, we replace the Fill color with an expression that indicates the 2 colors to alternate:

[Image: the Fill color replaced with the alternating-color expression]

I’ll paste the formula here so that you can cut and paste it directly into your report.  Of course, you can change the colors to anything your user wants.  In this case the first line of the tablix will be White, the second will be LightGrey, and they will alternate from there.  If you want to be really fancy, you can make the colors variables and let the user style the table (just like Excel!!!). 

=iif(RowNumber(Nothing) Mod 2, "White", "LightGrey")

 

 

Changing Visibility based on the existence of a value in a dataset

 

This is a handy trick when you have a file containing, for example, Inserts, Updates and Deletes, with the action type based on a value in your dataset.  Based on the value, you display the record in one of 3 tables, but what happens when you have no records of a given type?  How do we make the table headers and borders disappear? 

To do this, we need to count the number of records with each change type (D, I, U) to decide whether to show the table or not.  If the record count is 0 for a given type, we will hide that tablix… makes sense, right?

Open the properties of each tablix and choose the Visibility tab:

[Image: the tablix properties window, Visibility tab]

Under “Show or hide based on an expression”, we do our magic:

[Image: the show/hide expression]

In the case of this expression, we are counting the number of “I” records in the recordset, and when the count is <=0 we will hide the table. 

=sum(iif(Fields!ChangeType.Value="I",1,0)) <= 0

TIP: Note the expression indicates when the tablix will be HIDDEN, not when it will be SHOWN.

This is a simple example, but this method can be used to count any value in any field or part of a field in a dataset.


In a complex SSIS Control Flow, there is often a need to have a common last step, regardless of the number of Control Flow end points.  Consider the following example:

[Image: the example Control Flow with two end points]

In this case, we are using the Completion constraint (blue line) and an expression to determine which path to follow.  If there are records, we upload them; if not, we end without doing anything.  In either case, we write a message to a log table, recording the event.  As simple as this example is, we have 2 end points.  Now, what if we wanted to insert a record in the log that records the time the package ended?  In this scenario, we would need to code 2 additional INSERTs to the event log, one for each end point.  I am a firm believer in the theory that “the fewer moving parts, the better”.  Let’s create an INSERT statement that will be executed regardless of the end point in the Control Flow.  ENTER THE EVENT HANDLER.

Let’s add an OnPostExecute event handler at the package level.  This will fire when the package finishes, and we can record the Package End event.

[Image: adding an OnPostExecute event handler at the package level]

Sounds like a great idea, and it’s true that the event handler will trigger on the completion of the package.  The problem we face is that events “bubble up” the event tree:  http://msdn.microsoft.com/en-us/library/ms140223.aspx.  That simply means that the OnPostExecute event will trigger for EVERY task in the package.  This is very useful, but it does not help us in the case of recording a Package End event.  Here is what our log looks like if we use an OnPostExecute handler without any constraints or expressions:

[Image: the log showing an OnPostExecute entry for every task in the package]

This is not exactly what we were going for.  So, how do we make SSIS understand that we only want an event written to the log if it is the last event in the package?  We could set the DisableEventHandlers property to TRUE on every task in the package, but odds are pretty good you will want to track other events besides a Package End event.  Here is a modest solution:

[Image: the event handler Control Flow with a Dummy Task added]

By adding a Dummy Task, we can use expressions and constraints to create a Control Flow within the Event Handler. The Dummy Task in this case is simply an Execute SQL Statement:

[Image: the Dummy Task, an Execute SQL Statement]

The Expression used to pick the last event is this:

[Image: the precedence constraint expression comparing @SourceName to @PackageName]

@SourceName is a system variable that contains the name of the task that triggered the Event Handler.  @PackageName is also a system variable; it contains the name of the package.  The package is a task itself… it is a container holding all other tasks, and it will always fire the last OnPostExecute event.  When we run the package now, the results look a little more like what we would expect.

[Image: the resulting log entries]
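
For completeness, here is the sort of statement the real logging step (the task downstream of the Dummy Task) might run.  This is only a sketch: the log table and its columns are made up for illustration, and the ? marker assumes an OLE DB connection in the Execute SQL Task with the System::PackageName variable mapped as parameter 0.

-- Hypothetical log table (create once):
-- CREATE TABLE dbo.PackageEventLog
--     (
--       PackageName VARCHAR(200) ,
--       EventName   VARCHAR(50) ,
--       EventTime   DATETIME DEFAULT GETDATE()
--     )

-- Statement run by the Execute SQL Task that records the Package End event:
INSERT INTO dbo.PackageEventLog ( PackageName, EventName, EventTime )
VALUES ( ?, 'Package End', GETDATE() )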

This gives us the power to do ANY processing after ANY task in ANY Event Handler by adding a constraint and expression to the Dummy Task.  Event Handlers are extremely useful in creating log files, but they are also great for consolidating duplicate “at end” processing when there are multiple end points, and reducing duplicate processing within parallel data flows. Have fun with it, but beware of the Event Tree!! 


When you have an SSIS package error, it is often very helpful to see the data records that are causing the problem.  After all, if your input has 50,000 records and 1 of them has corrupt data, finding it can be a chore.  Your execution results will tell you which column contains the bad data, but not which record… enter the Data Viewer.

In this scenario I have created a truncation error.  The input length of [lastname] is 50, but the output table has a length of 15.  When it runs, at least one of the records causes the package to fail.

 

[Image: the failing Data Flow]

 

Now what?  We can tell from our execution results that there is a problem with [lastname], but we have no idea WHICH record.

 

[Image: the execution results showing the truncation error on the lastname column]

 

Let’s identify the row that is actually causing the problem.  First, we grab the oft-forgotten Row Count shape from our toolbox and connect it to the error output from our input query.  Remember that in order to intercept errors with the error output, you must redirect them.

 

[Image: the Row Count shape connected to the error output]

 

The Row Count shape requires 1 integer variable.  For our purposes, we will not reference the variable, but it is still required in order for the package to run.  Typically we would use the variable to hold the number of rows in the table and refer back to it later in our process.  Here, we are simply using the Row Count as a “Dead End” for errors.  I called my variable RowCounter.  To create a variable, with no shapes selected, right-click on the background and choose Variables.

 

[Image: creating the RowCounter variable]

 

Once we have set up the Row Count shape, we can right-click on the red line (error output) from the query and select Data Viewers.  In the popup, we click the Add button and we will see this:

 

[Image: the Data Viewer configuration window]

 

There are other, fancier options we can play with, but for now we just want to view the output in a grid.  We select Grid, then click OK on all of the popup windows to close them.  We should now see a grid icon with a pair of glasses on the error output line.

 

[Image: the error output line with the Data Viewer (glasses) icon]

 

So, we are ready to catch the error output in a grid and see what is causing the problem!  This time when we run the package, it does not fail, because we redirected the errors to the Row Count.  We also get a popup window showing the error record in a grid.  If there were multiple errors, we would see them all.

 

[Image: the Data Viewer grid showing the error record]

 

Indeed, the [lastname] column is longer than 15 characters.  Notice the last column in the grid, [Error Code – Description].  We knew this was a truncation error before we added the grid, but if you have worked with SSIS for any length of time, you know that some errors are much more obscure.  The description column can be very useful under those circumstances!

Data viewers can be used any time we want to see the data that is actually in the pipeline; they pause the package until we close them.  Also remember that the Row Count shape can be used as a “Dead End”.  It is useful during development when we want to see the output from a data flow, but don’t want to update a table or file with the data. 

Data viewers are an invaluable tool for both development and debugging.  Just remember to REMOVE THEM before putting your package into production!


If you are like me, you dread the 3AM wake-up call.  I would say that the majority of the pages I get are false alarms.  The alerts that require action often require me to open an SSIS package, see where the trouble is and try to identify the offending data.  That can be very time-consuming and can take quite a chunk out of my beauty sleep.  For those reasons, I have developed a simple error handling scenario for SSIS which allows me to rest a little easier.  Let me first say, this is a high level discussion; getting into the nuts and bolts of creating each shape is outside the scope of this document, but if you have an average understanding of SSIS, you should have no problem following along.

[Image: the Data Flow with a caution triangle]

In the Data Flow above you can see that there is a caution triangle.  For the purpose of this discussion I am creating a truncation error to demonstrate the process, so this is to be expected. 

The first thing we need to do is to redirect the error output.  Double-clicking on the Query shape presents us with the properties window for the input.  Simply set the columns that you want to redirect to Redirect Row in the dropdown box and hit Apply.

[Image: the Error Output properties with columns set to Redirect Row]

Without going into a dissertation on error handling, I will just note that you can decide which errors you want to redirect on Error and on Truncation.  Therefore, to override this process for a column or condition, simply do not redirect that column or condition.

The next thing we want to do is to add some information about the error; specifically, the name of the package which encountered the error and which step in the package wrote the record to the error table.  REMEMBER: If you redirect the error output, your package will not fail, so you will not know where the error record was created without some additional information. 

 

[Image: the Derived Column transformation adding Severity, Package Name and Step Name]

I added 3 columns to my error record: Severity, Package Name and Step Name.  Severity is just a free-form column that you can use to note whether an error is fatal, whether the package is part of a test job and should be ignored, etc.  Package Name and Step Name come from system variables.
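
For reference, here is a rough sketch of what the error table itself might look like.  The column names are illustrative only (they are not from the original package); the point is simply that the table holds the derived columns plus the offending source data and the standard SSIS error output columns.

-- Sketch of an error landing table (column names are illustrative)
CREATE TABLE dbo.PackageErrorLog
    (
      ErrorLogID INT IDENTITY(1, 1) PRIMARY KEY ,
      Severity VARCHAR(20) NULL ,      -- free-form: 'FATAL', 'TEST - IGNORE', etc.
      PackageName VARCHAR(200) NULL ,  -- from the PackageName system variable
      StepName VARCHAR(200) NULL ,     -- from the TaskName system variable
      ErrorCode INT NULL ,             -- ErrorCode column from the SSIS error output
      ErrorColumn INT NULL ,           -- ErrorColumn column from the SSIS error output
      firstname VARCHAR(50) NULL ,     -- the offending source data
      lastname VARCHAR(50) NULL ,
      LoggedAt DATETIME NOT NULL DEFAULT GETDATE()
    )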

In my package I have created a truncation situation, where the firstname column is 50 characters in the input, but only 4 characters in the output.  Some records will pass without truncation, others will be sent to the error output.  However, the package will not fail.

[Image: the Data Flow showing the row counts]

We can see that of the 14 input rows, 8 were redirected to the error table.

[Image: the error table contents]

This information can be used by another step or a scheduled process to determine whether an alert should be sent.  It can also be used as a historical record of the errors that are encountered over time.  There are other system variables that might make more sense in your infrastructure, so try different things; date and time, for example, seem like something you would want in your output. 

In summary, we have redirected the error output from an input, added derived columns with information about the errors, and inserted the information and the offending data into an error table.  The error table information can be used by another step or process to determine, based on the error information, what level alert must be sent.  This will eliminate false alarms, and give you a head start when a genuine error occurs.


I recently created a DLL and I wanted to reference it from a project I was developing in Visual Studio.  In previous versions of Windows, doing so was simply a matter of dropping the DLL file in the C:\Windows\assembly folder.  That would add the DLL to the Global Assembly Cache (GAC) and make it accessible in Visual Studio.  However, as is often the case, Windows 7 is different.  Even if you have Administrator privileges on your machine, you still do not have permission to drop a file in the assembly folder. 

Undaunted, I thought about using the old DOS command line utility gacutil.exe.  Microsoft developed the tool as part of the .Net framework, and it is available in the Windows SDK Framework Tools.  If you have never used gacutil.exe before, you can find out everything you ever wanted to know but were afraid to ask here: http://msdn.microsoft.com/en-us/library/ex0ss12c(v=vs.80).aspx .  Unfortunately, if you do not have the Windows SDK loaded on your development machine, you will need to install it to use gacutil, but it is relatively quick and painless, and the framework tools are very useful.  Look here for your latest SDK: http://www.microsoft.com/download/en/search.aspx?q=Windows%20SDK .  

After installing the SDK, I tried installing my DLL to the GAC by running gacutil from a DOS command line:

[Image: gacutil failing from the command line]

That’s odd.  Microsoft is shipping a tool that cannot be executed even with Administrator rights?  Let me stop here and say that I am by no means a Windows security expert, so I actually did contact my system administrators, and they were not sure how to fix the problem… there must be a super-administrator access level, but it isn’t available to your average developer in my company.  The solution outlined here works within the boundaries of a normal Windows Administrator.

So, now the hacker in me bubbles to the surface.  What if I were to create a simple BAT file containing the gacutil command?  It’s so crazy it just might work! 

[Image: the BAT file containing the gacutil command, with the DLL name passed as a parameter]

[Image: the BAT file failing with the same access error]

Ugh!  I was starting to think this would never work, but then I realized that simply executing a batch program did not change my level of access.  Typically in Windows 7, you would select the “Run As Administrator” option to temporarily act as an administrator for the purpose of executing a process.  However, that option is not available for BAT files run from the command line.  SOLUTION: Create a desktop shortcut to execute the BAT file, which in turn will execute the command line utility… are you still with me? 

I created a shortcut and pointed it to my batch file.  Theoretically, all I need to do now is right-click on the shortcut, select “Run As Administrator”, and we’re good, right?  Well, kinda.  If you notice the syntax of my BAT file, the name of the DLL is passed in as a parameter.  Therefore, I either have to hard-code the file name in the BAT program (YUCK!!), or I can leave the parameter and drag the DLL file onto the shortcut and drop it.  Sweet, drag-and-drop works for me… but if I use the drag-and-drop method, there is no way for me to right-click and select “Run As Administrator”.  That is not a problem… I simply have to adjust the properties of the shortcut I created and I am in business.  I right-clicked on the shortcut and selected “Properties”.  Under the “Shortcut” tab there is an “Advanced” button… I clicked it.

[Image: the shortcut properties, Shortcut tab, Advanced button]

All I needed to do was check the “Run As Administrator” box:

[Image: the Advanced Properties dialog with “Run as administrator” checked]

In summary, what I have done is create a BAT file to execute a command line utility, gacutil.exe.  Then, rather than executing the BAT file from the command line, I created a desktop shortcut to run it and set the shortcut properties to “Run As Administrator”.  This will effectively mean I am executing the command line utility with Administrator privileges.  Pretty sneaky.

Now, when I drag the DLL file  over to the shortcut, it starts the BAT file and adds the DLL to the assembly cache.  I created another BAT file to remove a DLL from the GAC in case the need should arise.  The code for that is:

[Image: the BAT file used to remove a DLL from the GAC]

Give it a try.  I can’t imagine why updating the GAC has been made into such a chore in Windows 7.  Hopefully there is a service pack in the works that will give developers the functionality they had in Windows XP, but in the meantime, this workaround is extremely useful.


One of the challenges recently thrown my way was to read an EBCDIC flat file, decode packed dates, and insert the dates into a SQL table.  For those unfamiliar with packed data, it is a way to store data at the nibble level (half a byte), and was often used by mainframe programmers to conserve storage space.  In the case of my input file, the dates were 2 bytes long and represented the number of days that had passed since 01/01/1950.  My first thought was, in the words of Scooby, Hmmmmph?  But I love a good challenge, so I dove in.

Reading in the flat file was rather simple.  The only difference between reading an EBCDIC and an ASCII file is the Code Page option in the connection manager.  In my case, I needed to use Code Page 1140 for EBCDIC (I could have also used Code Page 37).    

[Image: the flat file connection manager with the Code Page set to 1140]

 

Once the code page is set correctly, SSIS can understand what it is reading and it will convert the output to the default code page, 1252.  However, packed data is either unreadable or produces non-alphabetic characters, as we can see in the preview window.

[Image: the preview window showing the unreadable packed column]

 

Column 1 is actually the packed date; columns 0 and 2 are the values in the rest of the file.  We are only interested in Column 1, which is a 2-byte field representing a packed date.  We know that 2 packed digits can be stored in 1 byte, so we are working with 4 packed digits in those 2 bytes.  If you are confused, stay tuned… this will make sense in a minute.

[Image: the output columns in the Advanced Editor]

 

Right-click on your Flat File Source shape and select “Show Advanced Editor”. Here is where the magic begins. By changing the properties of the output columns, we can access the packed digits from each byte. By default, the Output Column data type is DT_STR. Since we want to look at the bytes individually and not the entire string, change the data type to DT_BYTES. Next, and most important, set UseBinaryFormat to TRUE. This will write the HEX VALUES of the output string instead of writing the character values.  Now we are getting somewhere!

Next, you will need to use a Data Conversion shape in your Data Flow to transform the 2-byte stream into a 4-character Unicode string containing the packed data.  You need the string to be 4 characters long because it will hold the 4 packed digits, written as hex characters.  Here is what that should look like in the Data Conversion shape:

[Image: the Data Conversion shape converting the 2-byte column to a 4-character Unicode string]

Direct the output of your data flow to a test table or file to see the results.  In my case, I created a test table.  The results looked like this:

[Image: the test table results showing the hex values]

 

Hold on a second!  That doesn't look like a date at all.  No, of course not.  It is a hex number which represents the days which have passed between 01/01/1950 and the date.  We have to convert the Hex value to a decimal value, and use the DATEADD function to get a date value.  Luckily, I have created a function to convert Hex to Decimal:

 


-- =============================================
-- Author:        Jim Giercyk
-- Create date: March, 2012
-- Description:    Converts a Hex string to a decimal value
-- =============================================
CREATE FUNCTION [dbo].[ftn_HexToDec]
(
    @hexValue NVARCHAR(6)
)
RETURNS DECIMAL
AS
BEGIN
    -- Declare the return variable here
DECLARE @decValue DECIMAL
IF @hexValue LIKE '0x%' SET @hexValue = SUBSTRING(@hexValue,3,4)

DECLARE @decTab TABLE
(
decPos1 VARCHAR(2),
decPos2 VARCHAR(2),
decPos3 VARCHAR(2),
decPos4 VARCHAR(2)
)

DECLARE @pos1 VARCHAR(1) = SUBSTRING(@hexValue,1,1)
DECLARE @pos2 VARCHAR(1) = SUBSTRING(@hexValue,2,1)
DECLARE @pos3 VARCHAR(1) = SUBSTRING(@hexValue,3,1)
DECLARE @pos4 VARCHAR(1) = SUBSTRING(@hexValue,4,1)

INSERT @decTab
VALUES (CASE
          WHEN @pos1 = 'A' THEN '10'
          WHEN @pos1 = 'B' THEN '11'
          WHEN @pos1 = 'C' THEN '12'
          WHEN @pos1 = 'D' THEN '13'
          WHEN @pos1 = 'E' THEN '14'
          WHEN @pos1 = 'F' THEN '15'
          ELSE @pos1
        END,
        CASE
          WHEN @pos2 = 'A' THEN '10'
          WHEN @pos2 = 'B' THEN '11'
          WHEN @pos2 = 'C' THEN '12'
          WHEN @pos2 = 'D' THEN '13'
          WHEN @pos2 = 'E' THEN '14'
          WHEN @pos2 = 'F' THEN '15'
          ELSE @pos2
        END,
        CASE
          WHEN @pos3 = 'A' THEN '10'
          WHEN @pos3 = 'B' THEN '11'
          WHEN @pos3 = 'C' THEN '12'
          WHEN @pos3 = 'D' THEN '13'
          WHEN @pos3 = 'E' THEN '14'
          WHEN @pos3 = 'F' THEN '15'
          ELSE @pos3
        END,
        CASE
          WHEN @pos4 = 'A' THEN '10'
          WHEN @pos4 = 'B' THEN '11'
          WHEN @pos4 = 'C' THEN '12'
          WHEN @pos4 = 'D' THEN '13'
          WHEN @pos4 = 'E' THEN '14'
          WHEN @pos4 = 'F' THEN '15'
          ELSE @pos4
        END)

SET @decValue = (CONVERT(INT,(SELECT decPos4 FROM @decTab)))         +
                (CONVERT(INT,(SELECT decPos3 FROM @decTab))*16)      +
                (CONVERT(INT,(SELECT decPos2 FROM @decTab))*(16*16)) +
                (CONVERT(INT,(SELECT decPos1 FROM @decTab))*(16*16*16))

    RETURN @decValue

END


GO

 

 

Making use of the function, I found the decimal conversion, added that number of days to 01/01/1950 and FINALLY arrived at my “unpacked relative date”.  Here is the query I used to retrieve the formatted date, and the result set which was returned:

SELECT [packedDate] AS 'Hex Value',
       dbo.ftn_HexToDec([packedDate]) AS 'Decimal Value',
       CONVERT(DATE,DATEADD(day,dbo.ftn_HexToDec([packedDate]),'01/01/1950'),101) AS 'Relative String Date'
  FROM [dbo].[Output Table]

 

 

[Image: the result set showing the hex value, decimal value and relative string date]
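
As an aside, if you are on SQL Server 2008 or later you may not need a scalar function at all: CONVERT styles 1 and 2 can turn a hex string directly into VARBINARY, which then converts to INT.  This is only a sketch, and it assumes [packedDate] is stored as a character string with the leading '0x' (use style 2 instead of 1 if the prefix is absent):

SELECT [packedDate] AS 'Hex Value',
       CONVERT(INT, CONVERT(VARBINARY(4), [packedDate], 1)) AS 'Decimal Value',
       CONVERT(DATE,
               DATEADD(day,
                       CONVERT(INT, CONVERT(VARBINARY(4), [packedDate], 1)),
                       '01/01/1950')) AS 'Relative String Date'
  FROM [dbo].[Output Table]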

This technique can be used any time you need to retrieve the hex value of a character string in SSIS.  The date example may be a bit difficult to understand at first, but with SSIS becoming the preferred tool for enterprise level integration for many companies, there is no doubt that developers will encounter these types of requirements with regularity in the future.

Please feel free to contact me if you have any questions. 


SSRS is a powerful tool, but there is very little available to measure its performance or view the SSRS execution log or catalog in detail.  Here are a few simple queries that will give you insight into the system that you never had before.

 

ACTIVE REPORTS:  Have you ever seen your SQL Server performance take a nose dive due to a long-running report?  If the SPID is executing under a generic Report ID, or it is a scheduled job, you may have no way to tell which report is killing your server.  Running this query will show you which reports are executing at a given time, and WHO is executing them.

 

USE ReportServerNative

SELECT runningjobs.computername,
            runningjobs.requestname, 
            runningjobs.startdate,
            users.username, 
           Datediff(s,runningjobs.startdate, Getdate()) / 60 AS    'Active Minutes'
FROM runningjobs
INNER JOIN users
ON runningjobs.userid = users.userid
ORDER BY runningjobs.startdate

 
     
     
SSRS CATALOG:  We have all asked “What was the last thing that changed?”, or better yet, “Who in the world did that!?”.  Here is a query that will show the reports in your SSRS catalog, when they were created and changed, and by whom.
 
     
 

USE ReportServerNative

SELECT DISTINCT catalog.PATH,
                           catalog.name,
                           users.username AS [Created By], 
                           catalog.creationdate,
                           users_1.username AS [Modified By],
                           catalog.modifieddate
FROM catalog
        INNER JOIN users ON catalog.createdbyid = users.userid 
INNER JOIN users AS users_1 ON catalog.modifiedbyid = users_1.userid
INNER JOIN executionlogstorage
ON catalog.itemid = executionlogstorage.reportid
WHERE ( catalog.name <> '' )

 
     
     
SSRS EXECUTION LOG:  Sometimes we need to know what was happening on the SSRS report server at a given time in the past.  This query will help you do just that.  You will need to set the timestart and timeend in the WHERE clause to suit your needs.
     
 

USE ReportServerNative

SELECT catalog.name AS report,
       e.username AS [User],
       e.timestart,
       e.timeend, 
       Datediff(mi, e.timestart, e.timeend) AS 'Time In Minutes',
       catalog.modifieddate AS [Report Last Modified],
       users.username
FROM   catalog (nolock)
       INNER JOIN executionlogstorage e (nolock)
         ON catalog.itemid = e.reportid
       INNER JOIN users (nolock)
         ON catalog.modifiedbyid = users.userid
WHERE  e.timestart >= Dateadd(s, -1, '03/31/2012')
       AND e.timeend <= Dateadd(DAY, 1, '04/02/2012')
 

 

 

LONG RUNNING REPORTS:  This query will show the longest running reports over a given time period.  Note that the “>5” in the WHERE clause sets the report threshold at 5 minutes, so anything that ran less than 5 minutes will not appear in the result set.  Adjust the threshold and start/end times to your liking.  With this information in hand, you can better optimize your system by tweaking the longest running reports first.

     
 

USE ReportServerNative

SELECT e.instancename,
       catalog.PATH,
       catalog.name,
       e.username,
       e.timestart,
       e.timeend,
       Datediff(mi, e.timestart, e.timeend) AS 'Minutes',
       e.timedataretrieval,
       e.timeprocessing,
       e.timerendering,
       e.[RowCount],
       users_1.username AS createdby,
       CONVERT(VARCHAR(10), catalog.creationdate, 101) AS 'Creation Date',
       users.username AS modifiedby,
       CONVERT(VARCHAR(10), catalog.modifieddate, 101) AS 'Modified Date'
FROM   executionlogstorage e 
       INNER JOIN catalog
         ON e.reportid = catalog.itemid
       INNER JOIN users
         ON catalog.modifiedbyid = users.userid
       INNER JOIN users AS users_1
         ON catalog.createdbyid = users_1.userid
WHERE  ( e.timestart > '03/31/2012' )
       AND ( e.timestart <= '04/02/2012' )
       AND  Datediff(mi, e.timestart, e.timeend) > 5
       AND catalog.name <> ''
ORDER  BY Datediff(mi, e.timestart, e.timeend) DESC 

 

 

 

I have used these queries to build SSRS reports that I can refer to quickly, and export to Excel if I need to report or quantify my findings.  I encourage you to look at the data in the ReportServerNative database on your report server to understand the queries and create some of your own.  For instance, you may want a query to determine which reports are using which shared data sources. 
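
As a starting point for that last one, here is a rough sketch of a shared data source query.  It assumes the standard ReportServer schema, where the DataSource table’s Link column points to the shared data source’s own Catalog entry; treat the [Type] = 2 filter (reports) as an assumption and adjust it for your version.

USE ReportServerNative

SELECT catalog.PATH 'Report Path' ,
       catalog.name 'Report Name' ,
       datasource.name 'Data Source Name In Report' ,
       shared.PATH 'Shared Data Source Path' ,
       shared.name 'Shared Data Source Name'
FROM   catalog
       INNER JOIN datasource ON datasource.itemid = catalog.itemid
       LEFT JOIN catalog shared ON shared.itemid = datasource.link
WHERE  catalog.[Type] = 2          -- 2 = report in the Catalog table
ORDER BY catalog.PATH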

Work smarter, not harder!


 

While working on a mainframe integration project, it occurred to me that some basic computer concepts are slipping into obscurity. For example, just about anyone can tell you that a 64-bit processor is faster than a 32-bit processor. A grade school child could tell you that a computer “speaks” in ‘1’s and ‘0’s. Some people can even tell you that there are 8 bits in a byte. However, I have found that even the most seasoned developers often can’t explain the theory behind those statements.

That is not a knock on programmers; in the age of IntelliSense, what reason do we have to work with data at the bit level? Many computer theory classes treat bit-level programming as a thing of the past, no longer necessary now that storage space is plentiful. The trouble with that mindset is that the world is full of legacy systems that run programs written in the 1970’s.  Today our jobs require us to extract data from those systems, regardless of the format, and that often involves low-level programming. Because it seems knowledge of the low-level concepts is waning in recent times, I thought a review would be in order.

 

 

 

CHARACTER:

See Spot Run

HEX:

53 65 65 20 53 70 6F 74 20 52 75 6E

DECIMAL:

83 101 101 32 83 112 111 116 32 82 117 110

BINARY:

01010011 01100101 01100101 00100000

01010011 01110000 01101111 01110100

00100000 01010010 01110101 01101110

In this example, I have broken down the words “See Spot Run” to a level computers can understand – machine language.

 

 

CHARACTER:  The character level is what is rendered by the computer.  A single-byte “Character Set” or “Code Page” contains 256 characters, both printable and unprintable.  Each character represents 1 BYTE of data.  For example, the character string “See Spot Run” is 12 bytes long, exclusive of the quotation marks.  Remember, a SPACE is an unprintable character, but it still requires a byte.  In the example I have used the ASCII character set, which you can see here:  http://www.asciitable.com/

HEX:  Hex is short for hexadecimal, or Base 16.  Humans are comfortable thinking in base ten, perhaps because they have 10 fingers and 10 toes; fingers and toes are called digits, so it’s not much of a stretch.  Hex is a compact way to write the binary values computers actually use, with numeric values ranging from zero to fifteen, or 0 – F.  Each place has 16 possible values as opposed to 10 possible values in base 10.  Therefore, the number 10 in hex is equal to the number 16 in decimal.  (A quick way to see the character-to-hex mapping in SQL Server appears after these definitions.)

DECIMAL:  The decimal conversion is strictly for us humans to use for calculations and conversions.  It is much easier for us to calculate [30 – 10 = 20] in decimal than it is to calculate [1E – A = 14] in hex.  In the old days, an error in a program could be found by determining the displacement from the entry point of a module.  Since those values were dumped from the computer’s head, they were in hex; a programmer needed to convert them to decimal, do the math and convert back to hex.  This gets into relative and absolute addressing, a topic for another day. 

BINARY:  Binary, or machine code, is how any value can be expressed in 1s and 0s.  It is really Base 2, because each place can hold only one of 2 characters, a 1 or a 0.  In binary, the number 10 is equal to the number 2 in decimal.

Why only 1s and 0s?  Very simply, computers are made up of lots and lots of transistors, which at any given moment can be ON ( 1 ) or OFF ( 0 ).  Each transistor represents a bit, and the pattern of transistors that fire (or don’t fire) is what distinguishes one value from another in the computer’s head (or CPU).  Consider 32-bit vs 64-bit processing… a 64-bit processor can read 64 bits at a time, while a 32-bit processor can only read half as many, so in theory the 64-bit processor should be much faster.  There are many more factors involved in CPU performance, but that is the fundamental difference.
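
Before we get to the conversion table, here is a quick way to see the character-to-hex mapping for yourself.  This is just a throwaway T-SQL query (it assumes nothing beyond a SQL Server connection); the spaces in the comment are added for readability.

SELECT CONVERT(VARBINARY(12), 'See Spot Run') AS HexBytes
-- Returns 0x53656520 53706F74 2052756E (shown here with spaces between the words)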

  

DECIMAL   HEX   BINARY
0         0     0000
1         1     0001
2         2     0010
3         3     0011
4         4     0100
5         5     0101
6         6     0110
7         7     0111
8         8     1000
9         9     1001
10        A     1010
11        B     1011
12        C     1100
13        D     1101
14        E     1110
15        F     1111

 

Remember that each character is a BYTE, there are 2 HEX characters in a byte (each half-byte is called a nibble), and there are 8 BITS in a byte.  I hope you enjoyed reading about the theory of data processing.  This is just a high-level explanation, and there is much more to be learned.  It is safe to say that, no matter how advanced our programming languages and visual studios become, they are nothing more than a way to interpret bits and bytes.  There is nothing like the joy of hex to get the mind racing.