Posts
47
Comments
136
Trackbacks
0
June 2012 Entries
Null Values And The T-SQL IN Operator

I came across some unexpected behavior while troubleshooting a failing test the other day that took me long enough to figure out that I thought it was worth sharing here. I finally traced the failing test back to a SELECT statement in a stored procedure that was using the IN t-sql operator to exclude a certain set of values. Here’s a very simple example table to illustrate the issue:

Customers

CustomerId INT, NOT NULL, Primary Key
CustomerName nvarchar(100) NOT NULL
SalesRegionId INT NULL

 

The ‘SalesRegionId’ column contains a number representing the sales region that the customer belongs to. This column is nullable because new customers get created all the time but assigning them to sales regions is a process that is handled by a regional manager on a periodic basis. For the purposes of this example, the Customers table currently has the following rows:

CustomerId CustomerName SalesRegionId
1 Customer A 1
2 Customer B NULL
3 Customer C 4
4 Customer D 2
5 Customer E 3

 

How could we write a query against this table for all customers that are NOT in sales regions 2 or 4? You might try something like this:

   1: SELECT
   2:   CustomerId,
   3:   CustomerName,
   4:   SalesRegionId
   5: FROM Customers
   6: WHERE SalesRegionId NOT IN (2,4)

 

Will this work? In short, no; at least not in the way that you might expect. Here’s what this query will return given the example data we’re working with:

CustomerId CustomerName SalesRegionId
1 Customer A 1
5 Customer E 5

 

I was expecting that this query would also return ‘Customer B’, since that customer has a NULL SalesRegionId. In my mind, having a customer with no sales region should be included in a set of customers that are not in sales regions 2 or 4.When I first started troubleshooting my issue I made note of the fact that this query should probably be re-written without the NOT IN clause, but I didn’t suspect that the NOT IN clause was actually the source of the issue. This particular query was only one minor piece in a much larger process that was being exercised via an automated integration test and I simply made a poor assumption that the NOT IN would work the way that I thought it should.

So why doesn’t this work the way that I thought it should? From the MSDN documentation on the t-sql IN operator:

If the value of test_expression is equal to any value returned by subquery or is equal to any expression from the comma-separated list, the result value is TRUE; otherwise, the result value is FALSE.

Using NOT IN negates the subquery value or expression.

The key phrase out of that quote is, “… is equal to any expression from the comma-separated list…”. The NULL SalesRegionId isn’t included in the NOT IN because of how NULL values are handled in equality comparisons. From the MSDN documentation on ANSI_NULLS:

The SQL-92 standard requires that an equals (=) or not equal to (<>) comparison against a null value evaluates to FALSE. When SET ANSI_NULLS is ON, a SELECT statement using WHERE column_name = NULL returns zero rows even if there are null values in column_name. A SELECT statement using WHERE column_name <> NULL returns zero rows even if there are nonnull values in column_name.

In fact, the MSDN documentation on the IN operator includes the following blurb about using NULL values in IN sub-queries or expressions that are used with the IN operator:

Any null values returned by subquery or expression that are compared to test_expression using IN or NOT IN return UNKNOWN. Using null values in together with IN or NOT IN can produce unexpected results.

If I were to include a ‘SET ANSI_NULLS OFF’ command right above my SELECT statement I would get ‘Customer B’ returned in the results, but that’s definitely not the right way to deal with this. We could re-write the query to explicitly include the NULL value in the WHERE clause:

   1: SELECT
   2:   CustomerId,
   3:   CustomerName,
   4:   SalesRegionId
   5: FROM Customers
   6: WHERE (SalesRegionId NOT IN (2,4) OR SalesRegionId IS NULL)

 

This query works and properly includes ‘Customer B’ in the results, but I ultimately opted to re-write the query using a LEFT OUTER JOIN against a table variable containing all of the values that I wanted to exclude because, in my case, there could potentially be several hundred values to be excluded. If we were to apply the same refactoring to our simple sales region example we’d end up with:

   1: DECLARE @regionsToIgnore TABLE (IgnoredRegionId INT)
   2: INSERT @regionsToIgnore values (2),(4)
   3:  
   4: SELECT
   5:   c.CustomerId,
   6:   c.CustomerName,
   7:   c.SalesRegionId
   8: FROM Customers c
   9: LEFT OUTER JOIN @regionsToIgnore r ON r.IgnoredRegionId = c.SalesRegionId
  10: WHERE r.IgnoredRegionId IS NULL

By performing a LEFT OUTER JOIN from Customers to the @regionsToIgnore table variable we can simply exclude any rows where the IgnoredRegionId is null, as those represent customers that DO NOT appear in the ignored regions list. This approach will likely perform better if the number of sales regions to ignore gets very large and it also will correctly include any customers that do not yet have a sales region.

Posted On Sunday, June 24, 2012 4:36 PM | Comments (0)
Using Table-Valued Parameters With SQL Server Reporting Services

In my last post I talked about using table-valued parameters to pass a list of integer values to a stored procedure without resorting to using comma-delimited strings and parsing out each value into a TABLE variable. In this post I’ll extend the “Customer Transaction Summary” report example to see how we might leverage this same stored procedure from within an SQL Server Reporting Services (SSRS) report. I’ve worked with SSRS off and on for the past several years and have generally found it to be a very useful tool for building nice-looking reports for end users quickly and easily. That said, I’ve been frustrated by SSRS from time to time when seemingly simple things are difficult to accomplish or simply not supported at all. I thought that using table-valued parameters from within a SSRS report would be simple, but unfortunately I was wrong.

Customer Transaction Summary Example

Let’s take the “Customer Transaction Summary” report example from the last post and try to plug that same stored procedure into an SSRS report. Our report will have three parameters:

  1. Start Date – beginning of the date range for which the report will summarize customer transactions
  2. End Date – end of the date range for which the report will summarize customer transactions
  3. Customer Ids – One or more customer Ids representing the customers that will be included in the report

The simplest way to get started with this report will be to create a new dataset and point it at our Customer Transaction Summary report stored procedure (note that I’m using SSRS 2012 in the screenshots below, but there should be little to no difference with SSRS 2008):

image

When you initially create this dataset the SSRS designer will try to invoke the stored procedure to determine what the parameters and output fields are for you automatically. As part of this process the following dialog pops-up:

image

Obviously I can’t use this dialog to specify a value for the ‘@customerIds’ parameter since it is of the IntegerListTableType user-defined type that we created in the last post. Unfortunately this really throws the SSRS designer for a loop, and regardless of what combination of Data Type, Pass Null Value, or Parameter Value I used here, I kept getting this error dialog with the message, "Operand type clash: nvarchar is incompatible with IntegerListTableType".

image

This error message makes some sense considering that the nvarchar type is indeed incompatible with the IntegerListTableType, but there’s little clue given as to how to remedy the situation. I don’t know for sure, but I think that behind-the-scenes the SSRS designer is trying to give the @customerIds parameter an nvarchar-typed SqlParameter which is causing the issue. When I first saw this error I figured that this might just be a limitation of the dataset designer and that I’d be able to work around the issue by manually defining the parameters. I know that there are some special steps that need to be taken when invoking a stored procedure with a table-valued parameter from ADO .NET, so I figured that I might be able to use some custom code embedded in the report  to create a SqlParameter instance with the needed properties and value to make this work, but the “Operand type clash" error message persisted.

The Text Query Approach

Just because we’re using a stored procedure to create the dataset for this report doesn’t mean that we can’t use the ‘Text’ Query Type option and construct an EXEC statement that will invoke the stored procedure. In order for this to work properly the EXEC statement will also need to declare and populate an IntegerListTableType variable to pass into the stored procedure. Before I go any further I want to make one point clear: this is a really ugly hack and it makes me cringe to do it. Simply put, I strongly feel that it should not be this difficult to use a table-valued parameter with SSRS. With that said, let’s take a look at what we’ll have to do to make this work.

Manually Define Parameters

First, we’ll need to manually define the parameters for report by right-clicking on the ‘Parameters’ folder in the ‘Report Data’ window. We’ll need to define the ‘@startDate’ and ‘@endDate’ as simple date parameters. We’ll also create a parameter called ‘@customerIds’ that will be a mutli-valued Integer parameter:

image

In the ‘Available Values’ tab we’ll point this parameter at a simple dataset that just returns the CustomerId and CustomerName of each row in the Customers table of the database or manually define a handful of Customer Id values to make available when the report runs.

Once we have these parameters properly defined we can take another crack at creating the dataset that will invoke the ‘rpt_CustomerTransactionSummary’ stored procedure. This time we’ll choose the ‘Text’ query type option and put the following into the ‘Query’ text area:

   1: exec('declare @customerIdList IntegerListTableType ' + @customerIdInserts + 
   2: ' EXEC rpt_CustomerTransactionSummary 
   3: @startDate=''' + @startDate + ''', 
   4: @endDate='''+ @endDate + ''', 
   5: @customerIds=@customerIdList')

 

By using the ‘Text’ query type we can enter any arbitrary SQL that we we want to and then use parameters and string concatenation to inject pieces of that query at run time. It can be a bit tricky to parse this out at first glance, but from the SSRS designer’s point of view this query defines three parameters:

  1. @customerIdInserts – This will be a Text parameter that we use to define INSERT statements that will populate the @customerIdList variable that is being declared in the SQL. This parameter won’t actually ever get passed into the stored procedure. I’ll go into how this will work in a bit.
  2. @startDate – This is a simple date parameter that will get passed through directly into the @startDate parameter of the stored procedure on line 3.
  3. @endDate – This is another simple data parameter that will get passed through into the @endDate parameter of the stored procedure on line 4.

At this point the dataset designer will be able to correctly parse the query and should even be able to detect the fields that the stored procedure will return without needing to specify any values for query when prompted to. Once the dataset has been correctly defined we’ll have a @customerIdInserts parameter listed in the ‘Parameters’ tab of the dataset designer. We need to define an expression for this parameter that will take the values selected by the user for the ‘@customerIds’ parameter that we defined earlier and convert them into INSERT statements that will populate the @customerIdList variable that we defined in our Text query. In order to do this we’ll need to add some custom code to our report using the ‘Report Properties’ dialog:

image

Any custom code defined in the Report Properties dialog gets embedded into the .rdl of the report itself and (unfortunately) must be written in VB .NET. Note that you can also add references to custom .NET assemblies (which could be written in any language), but that’s outside the scope of this post so we’ll stick with the “quick and dirty” VB .NET approach for now. Here’s the VB .NET code (note that any embedded code that you add here must be defined in a static/shared function, though you can define as many functions as you want):

   1: Public Shared Function BuildIntegerListInserts(ByVal variableName As String, ByVal paramValues As Object()) As String
   2:     Dim insertStatements As New System.Text.StringBuilder()
   3:     For Each paramValue As Object In paramValues
   4:         insertStatements.AppendLine(String.Format("INSERT {0} VALUES ({1})", variableName, paramValue))
   5:     Next
   6:     Return insertStatements.ToString()
   7: End Function

 

This method takes a variable name and an array of objects. We use an array of objects here because that is how SSRS will pass us the values that were selected by the user at run-time. The method uses a StringBuilder to construct INSERT statements that will insert each value from the object array into the provided variable name.

Once this method has been defined in the custom code for the report we can go back into the dataset designer’s Parameters tab and update the expression for the ‘@customerIdInserts’ parameter by clicking on the button with the “function” symbol that appears to the right of the parameter value. We’ll set the expression to:

   1: =Code.BuildIntegerListInserts("@customerIdList ", Parameters!customerIds.Value)

 

In order to invoke our custom code method we simply need to invoke “Code.<method name>” and pass in any needed parameters. The first parameter needs to match the name of the IntegerListTableType variable that we used in the EXEC statement of our query. The second parameter will come from the Value property of the ‘@customerIds’ parameter (this evaluates to an object array at run time).

Finally, we’ll need to edit the properties of the ‘@customerIdInserts’ parameter on the report to mark it as a nullable internal parameter so that users aren’t prompted to provide a value for it when running the report.

image

Limitations And Final Thoughts

When I first started looking into the text query approach described above I wondered if there might be an upper limit to the size of the string that can be used to run a report. Obviously, the size of the actual query could increase pretty dramatically if you have a parameter that has a lot of potential values or you need to support several different table-valued parameters in the same query. I tested the example Customer Transaction Summary report with 1000 selected customers without any issue, but your mileage may vary depending on how much data you might need to pass into your query.

If you think that the text query hack is a lot of work just to use a table-valued parameter, I agree! I think that it should be a lot easier than this to use a table-valued parameter from within SSRS, but so far I haven’t found a better way. It might be possible to create some custom .NET code that could build the EXEC statement for a given set of parameters automatically, but exploring that will have to wait for another post. For now, unless there’s a really compelling reason or requirement to use table-valued parameters from SSRS reports I would probably stick with the tried and true “join-multi-valued-parameter-to-CSV-and-split-in-the-query” approach for using mutli-valued parameters in a stored procedure.

Posted On Thursday, June 21, 2012 10:08 PM | Comments (10)
Using Table-Valued Parameters in SQL Server

I work with stored procedures in SQL Server pretty frequently and have often found myself with a need to pass in a list of values at run-time. Quite often this list contains a set of ids on which the stored procedure needs to operate the size and contents of which are not known at design time. In the past I’ve taken the collection of ids (which are usually integers), converted them to a string representation where each value is separated by a comma and passed that string into a VARCHAR parameter of a stored procedure. The body of the stored procedure would then need to parse that string into a table variable which could be easily consumed with set-based logic within the rest of the stored procedure. This approach works pretty well but the VARCHAR variable has always felt like an un-wanted “middle man” in this scenario. Of course, I could use a BULK INSERT operation to load the list of ids into a temporary table that the stored procedure could use, but that approach seems heavy-handed in situations where the list of values is usually going to contain only a few dozen values. Fortunately SQL Server 2008 introduced the concept of table-valued parameters which effectively eliminates the need for the clumsy middle man VARCHAR parameter.

Example: Customer Transaction Summary Report

Let’s say we have a report that can summarize the the transactions that we’ve conducted with customers over a period of time. The report returns a pretty simple dataset containing one row per customer with some key metrics about how much business that customer has conducted over the date range for which the report is being run. Sometimes the report is run for a single customer, sometimes it’s run for all customers, and sometimes it’s run for a handful of customers (i.e. a salesman runs it for the customers that fall into his sales territory).

This report can be invoked from a website on-demand, or it can be scheduled for periodic delivery to certain users via SQL Server Reporting Services. Because the report can be created from different places and the query to generate the report is complex it’s been packed into a stored procedure that accepts three parameters:

  • @startDate – The beginning of the date range for which the report should be run.
  • @endDate – The end of the date range for which the report should be run.
  • @customerIds – The customer Ids for which the report should be run.

Obviously, the @startDate and @endDate parameters are DATETIME variables. The @customerIds parameter, however, needs to contain a list of the identity values (primary key) from the Customers table representing the customers that were selected for this particular run of the report. In prior versions of SQL Server we might have made this parameter a VARCHAR variable, but with SQL Server 2008 we can make it into a table-valued parameter.

Defining And Using The Table Type

In order to use a table-valued parameter, we first need to tell SQL Server about what the table will look like. We do this by creating a user defined type. For the purposes of this stored procedure we need a very simple type to model a table variable with a single integer column. We can create a generic type called ‘IntegerListTableType’ like this:

CREATE TYPE IntegerListTableType AS TABLE (Value INT NOT NULL)

Once defined, we can use this new type to define the @customerIds parameter in the signature of our stored procedure. The parameter list for the stored procedure definition might look like:

   1: CREATE PROCEDURE dbo.rpt_CustomerTransactionSummary
   2:   @starDate datetime,
   3:   @endDate datetime,
   4:   @customerIds IntegerListTableTableType READONLY

 

Note the ‘READONLY’ statement following the declaration of the @customerIds parameter. SQL Server requires any table-valued parameter be marked as ‘READONLY’ and no DML (INSERT/UPDATE/DELETE) statements can be performed on a table-valued parameter within the routine in which it’s used. Aside from the DML restriction, however, you can do pretty much anything with a table-valued parameter as you could with a normal TABLE variable.

With the user defined type and stored procedure defined as above, we could invoke like this:

   1: DECLARE @cusomterIdList IntegerListTableType
   2: INSERT @customerIdList VALUES (1)
   3: INSERT @customerIdList VALUES (2)
   4: INSERT @customerIdList VALUES (3)
   5:  
   6: EXEC dbo.rpt_CustomerTransationSummary 
   7:   @startDate = '2012-05-01',
   8:   @endDate = '2012-06-01'
   9:   @customerIds = @customerIdList

 

Note that we can simply declare a variable of type ‘IntegerListTableType’ just like any other normal variable and insert values into it just like a TABLE variable. We could also populate the variable with a SELECT … INTO or INSERT … SELECT statement if desired.

Using The Table-Valued Parameter With ADO .NET

Invoking a stored procedure with a table-valued parameter from ADO .NET is as simple as building a DataTable and passing it in as the Value of a SqlParameter. Here’s some example code for how we would construct the SqlParameter for the @customerIds parameter in our stored procedure:

   1: var customerIdsParameter = new SqlParameter();
   2: customerIdParameter.Direction = ParameterDirection.Input;
   3: customerIdParameter.TypeName = "IntegerListTableType";
   4: customerIdParameter.Value = selectedCustomerIds.ToIntegerListDataTable("Value");

 

All we’re doing here is new’ing up an instance of SqlParameter, setting the pamameters direction, specifying the name of the User Defined Type that this parameter uses, and setting its value. We’re assuming here that we have an IEnumerable<int> variable called ‘selectedCustomerIds’ containing all of the customer Ids for which the report should be run. The ‘ToIntegerListDataTable’ method is an extension method of the IEnumerable<int> type that looks like this:

   1: public static DataTable ToIntegerListDataTable(this IEnumerable<int> intValues, string columnName)
   2: {
   3:     var intergerListDataTable = new DataTable();
   4:     intergerListDataTable.Columns.Add(columnName);
   5:     foreach(var intValue in intValues)
   6:     {
   7:         var nextRow = intergerListDataTable.NewRow();
   8:         nextRow[columnName] = intValue;
   9:         intergerListDataTable.Rows.Add(nextRow);
  10:     }
  11:  
  12:     return intergerListDataTable;
  13: }

 

Since the ‘IntegerListTableType’ has a single int column called ‘Value’, we pass that in for the ‘columnName’ parameter to the extension method. The method creates a new single-columned DataTable using the provided column name then iterates over the items in the IEnumerable<int> instance adding one row for each value. We can then use this SqlParameter instance when invoking the stored procedure just like we would use any other parameter.

Advanced Functionality

Using passing a list of integers into a stored procedure is a very simple usage scenario for the table-valued parameters feature, but I’ve found that it covers the majority of situations where I’ve needed to pass a collection of data for use in a query at run-time. I should note that BULK INSERT feature still makes sense for passing large amounts of data to SQL Server for processing. MSDN seems to suggest that 1000 rows of data is the tipping point where the overhead of a BULK INSERT operation can pay dividends.

I should also note here that table-valued parameters can be used to deal with more complex data structures than single-columned tables of integers. A User Defined Type that backs a table-valued parameter can use things like identities and computed columns. That said, using some of these more advanced features might require the use the SqlDataRecord and SqlMetaData classes instead of a simple DataTable. Erland Sommarskog has a great article on his website that describes when and how to use these classes for table-valued parameters.

What About Reporting Services?

Earlier in the post I referenced the fact that our example stored procedure would be called from both a web application and a SQL Server Reporting Services report. Unfortunately, using table-valued parameters from SSRS reports can be a bit tricky and warrants its own blog post which I’ll be putting together and posting sometime in the near future.

Posted On Wednesday, June 13, 2012 9:44 PM | Comments (1)
Meta
Tag Cloud