TSQL ENHANCEMENTS

PIVOT and UNPIVOT

PIVOT and UNPIVOT are new relational operators that you specify in the FROM clause of a query. They perform some manipulation on an input table-valued expression and produce an output table as a result. The PIVOT operator rotates rows into columns, possibly performing aggregations along the way. It widens the input table expression based on a given pivot column, generating an output table with a column for each unique value in the pivot column. The UNPIVOT operator performs the opposite operation of that performed by the PIVOT operator; it rotates columns into rows. It narrows the input table expression based on a pivot column.

PIVOT

The PIVOT operator is useful for handling open-schema scenarios and for generating crosstab reports.

In an open-schema scenario, you maintain entities with sets of attributes that are either not known ahead or different for each entity type. The users of your application define the attributes dynamically. Instead of predefining many columns and storing many null values in your tables, you split the attributes into different rows and store only the relevant attributes for each entity instance.

PIVOT allows you to generate crosstab reports for open-schema and other scenarios in which you rotate rows into columns, possibly calculating aggregations along the way and presenting the data in a useful form.

An example of an open-schema scenario is a database that keeps track of items put up for auction. Some attributes are relevant for all auction items, such as the item type, when it was made, and its initial price. Only the attributes that are relevant for all items are stored in the AuctionItems table:

CREATE TABLE AuctionItems
(
  itemid       INT          NOT NULL PRIMARY KEY NONCLUSTERED,
  itemtype     NVARCHAR(30) NOT NULL,
  whenmade     INT          NOT NULL,
  initialprice MONEY        NOT NULL,
  /* other columns */
)
CREATE UNIQUE CLUSTERED INDEX idx_uc_itemtype_itemid
  ON AuctionItems(itemtype, itemid)
INSERT INTO AuctionItems VALUES(1, N'Wine',     1822,      3000)
INSERT INTO AuctionItems VALUES(2, N'Wine',     1807,       500)
INSERT INTO AuctionItems VALUES(3, N'Chair',    1753,    800000)
INSERT INTO AuctionItems VALUES(4, N'Ring',     -501,   1000000)
INSERT INTO AuctionItems VALUES(5, N'Painting', 1873,   8000000)
INSERT INTO AuctionItems VALUES(6, N'Painting', 1889,   8000000)

Other attributes are specific to the item type, and new items of different types are continuously being added. Such attributes can be stored in a different ItemAttributes table in which each item attribute is stored in a different row. Each row contains the item ID, attribute name, and attribute value:

CREATE TABLE ItemAttributes
(
  itemid    INT          NOT NULL REFERENCES AuctionItems,

  attribute NVARCHAR(30) NOT NULL,
  value     SQL_VARIANT  NOT NULL, 
  PRIMARY KEY (itemid, attribute)
)
INSERT INTO ItemAttributes
  VALUES(1, N'manufacturer', CAST(N'ABC'              AS NVARCHAR(30)))
INSERT INTO ItemAttributes
  VALUES(1, N'type',         CAST(N'Pinot Noir'       AS NVARCHAR(15)))
INSERT INTO ItemAttributes
  VALUES(1, N'color',        CAST(N'Red'              AS NVARCHAR(15)))
INSERT INTO ItemAttributes
  VALUES(2, N'manufacturer', CAST(N'XYZ'              AS NVARCHAR(30)))
INSERT INTO ItemAttributes
  VALUES(2, N'type',         CAST(N'Porto'            AS NVARCHAR(15)))
INSERT INTO ItemAttributes
  VALUES(2, N'color',        CAST(N'Red'              AS NVARCHAR(15)))
INSERT INTO ItemAttributes
  VALUES(3, N'material',     CAST(N'Wood'             AS NVARCHAR(15)))
INSERT INTO ItemAttributes
  VALUES(3, N'padding',      CAST(N'Silk'             AS NVARCHAR(15)))
INSERT INTO ItemAttributes
  VALUES(4, N'material',     CAST(N'Gold'             AS NVARCHAR(15)))
INSERT INTO ItemAttributes
  VALUES(4, N'inscription',  CAST(N'One ring ...'     AS NVARCHAR(50)))
INSERT INTO ItemAttributes
  VALUES(4, N'size',         CAST(10                  AS INT))
INSERT INTO ItemAttributes
  VALUES(5, N'artist',       CAST(N'Claude Monet'     AS NVARCHAR(30)))
INSERT INTO ItemAttributes
  VALUES(5, N'name',         CAST(N'Field of Poppies' AS NVARCHAR(30)))
INSERT INTO ItemAttributes
  VALUES(5, N'type',         CAST(N'Oil'              AS NVARCHAR(30)))
INSERT INTO ItemAttributes
  VALUES(5, N'height',       CAST(19.625              AS NUMERIC(9,3)))
INSERT INTO ItemAttributes
  VALUES(5, N'width',        CAST(25.625              AS NUMERIC(9,3)))
INSERT INTO ItemAttributes
  VALUES(6, N'artist',       CAST(N'Vincent Van Gogh' AS NVARCHAR(30)))
INSERT INTO ItemAttributes
  VALUES(6, N'name',         CAST(N'The Starry Night' AS NVARCHAR(30)))
INSERT INTO ItemAttributes
  VALUES(6, N'type',         CAST(N'Oil'              AS NVARCHAR(30)))
INSERT INTO ItemAttributes
  VALUES(6, N'height',       CAST(28.75               AS NUMERIC(9,3)))
INSERT INTO ItemAttributes
  VALUES(6, N'width',        CAST(36.25               AS NUMERIC(9,3)))

Note that the sql_variant data type is used for the value column because different attribute values may be of different data types. For example, the size attribute stores an integer attribute value, and a name attribute stores a character string attribute value.

Suppose you want to present the data from the ItemAttributes table with a row for each item that is a painting (items 5,6) and a column for each attribute. Without the PIVOT operator, you must write a query such as:

SELECT
  itemid,
  MAX(CASE WHEN attribute = 'artist'  THEN value END) AS [artist],
  MAX(CASE WHEN attribute = 'name'    THEN value END) AS [name],
  MAX(CASE WHEN attribute = 'type'    THEN value END) AS [type],
  MAX(CASE WHEN attribute = 'height'  THEN value END) AS [height],
  MAX(CASE WHEN attribute = 'width'   THEN value END) AS [width]
FROM ItemAttributes AS ATR
WHERE itemid IN(5,6)
GROUP BY itemid

Here is the result set:

itemid artist           name             type       height width
------ ---------------- ---------------- ---------- ------ ------
5      Claude Monet     Field of Poppies Oil        19.625 25.625
6      Vincent Van Gogh The Starry Night Oil        28.750 36.250

The PIVOT operator allows you to maintain shorter and more readable code to achieve the same results:

SELECT *
FROM ItemAttributes AS ATR
  PIVOT

  (
    MAX(value)
    FOR attribute IN([artist], [name], [type], [height], [width])
  ) AS PVT
WHERE itemid IN(5,6)

As with most new features, understanding the PIVOT operator comes with experimentation and use. Some of the elements in the PIVOT syntax are apparent and only require that you figure out the relationship of these elements to the query that does not use the new operator. Others are hidden.

You may find the following terms helpful in understanding the semantics of the PIVOT operator:

table_expression

The virtual table on which the PIVOT operator works (the part in the query between the FROM clause and the PIVOT operator): ItemAttributes AS ATR in this case.

pivot_column

The column from table_expression whose values you want to rotate into result columns: attribute in this case.

column_list

The list of values from pivot_column that you want to present as result columns (in the parentheses followed by the IN clause). These must be expressed as legal identifiers: [artist], [name], [type], [height], [width] in this case.

aggregate_function

The aggregate function that you use to generate the data or column values in the result: MAX() in this case.

value_column

The column from table_expression that you use as the argument for aggregate_function: value in this case.

group_by_list

The hidden part—ALL columns from table_expression excluding pivot_column and value_column which are used to group the result: itemid in this case.

select_list

The list of columns following the SELECT clause that might include any column(s) from group_by_list and column_list. Aliases can be used to change the name of the result columns: * in this case returns all columns from group_by_list and column_list.

The PIVOT operator returns one row for each unique value in group_by_list as if you had a query with a GROUP BY clause and specified those columns. Notice that group_by_list is implied; it is not specified explicitly anywhere in the query. It contains all columns from table_expression excluding pivot_column and value_column. Understanding this is probably the key to understanding why queries you write with the PIVOT operator work as they do, and why you might get errors in some cases.

Possible result columns include values from group_by_list and . If you specify an asterisk (*), the query returns both lists. The data part of the result columns or the result column values are calculated by aggregate_function with value_column as the argument.

The following color-highlighted code illustrates the different elements in the query using the PIVOT operator:

SELECT * -- itemid, [artist], [name], [type], [height], [width]
FROM ItemAttributes AS ATR
  PIVOT
  (
    MAX(value)
    FOR attribute IN([artist], [name], [type], [height], [width])
  ) AS PVT
WHERE itemid IN(5,6)

And the following relates the different elements to the query that does not use the PIVOT operator:

SELECT
  itemid,
  MAX(CASE WHEN attribute = 'artist'  THEN value END) AS [artist],
  MAX(CASE WHEN attribute = 'name'    THEN value END) AS [name],
  MAX(CASE WHEN attribute = 'type'    THEN value END) AS [type],
  MAX(CASE WHEN attribute = 'height'  THEN value END) AS [height],
  MAX(CASE WHEN attribute = 'width'   THEN value END) AS [width]
FROM ItemAttributes AS ATR
WHERE itemid IN(5,6)
GROUP BY itemid

Note that you must explicitly specify the values in . The PIVOT operator does not provide an option to derive those dynamically from pivot_column in a static query. You can use dynamic SQL to construct the query string yourself to achieve this.

Taking the previous PIVOT query a step further, suppose you want to return, for each auction item, all attributes relevant to paintings. You want to include those attributes that appear in AuctionItems and those that appear in ItemAttributes. You might try the following query, which returns an error:

SELECT *
FROM AuctionItems AS ITM
  JOIN ItemAttributes AS ATR
    ON ITM.itemid = ATR.itemid
  PIVOT

  (
    MAX(value)
    FOR attribute IN([artist], [name], [type], [height], [width])
  ) AS PVT
WHERE itemtype = 'Painting'

Here is the error message:

.Net SqlClient Data Provider: Msg 8156, Level 16, State 1, Line 1

The column 'itemid' was specified multiple times for 'PVT'.

Remember that PIVOT works on table_expression, which is the virtual table returned by the section in the query between the FROM clause and the PIVOT clause. In this query, the virtual table contains two instances of the itemid column—one originating from AuctionItems and the other from ItemAttributes. You might be tempted to revise the query as follows, but you will also get an error:

SELECT ITM.itemid, itemtype, whenmade, initialprice, 
  [artist], [name], [type], [height], [width]
FROM AuctionItems AS ITM
  JOIN ItemAttributes AS ATR
    ON ITM.itemid = ATR.itemid
  PIVOT
  (
    MAX(value)
    FOR attribute IN([artist], [name], [type], [height], [width])
  ) AS PVT
WHERE itemtype = 'Painting'

Here is the error message:

.Net SqlClient Data Provider: Msg 8156, Level 16, State 1, Line 1
The column 'itemid' was specified multiple times for 'PVT'.
.Net SqlClient Data Provider: Msg 107, Level 15, State 1, Line 1
The column prefix 'ITM' does not match with a table name or alias name used in the query.

As mentioned earlier, the PIVOT operator works on the virtual table returned by table_expression and not on the columns in the select_list. The select_list is evaluated after the PIVOT operator performs its manipulations and can refer only to group_by_list and column_list. That is why the ITM alias is no longer recognized in the select_list. If you understand this, you realize that you should provide PIVOT with a table_expression that contains only the columns you want to work on. This includes the grouping columns (only one occurrence of itemid plus itemtype, whenmade and initialprice), the pivot column (attribute), and the value column (value). You can achieve this by using CTEs or derived tables. Here is an example using a CTE:

WITH PNT
AS
(
  SELECT ITM.*, ATR.attribute, ATR.value
  FROM AuctionItems AS ITM
    JOIN ItemAttributes AS ATR
      ON ITM.itemid = ATR.itemid
  WHERE ITM.itemtype = 'Painting'
)
SELECT * FROM PNT
  PIVOT
  (
    MAX(value)
    FOR attribute IN([artist], [name], [type], [height], [width])
  ) AS PVT

Here is the result set:

itemid itemtype whenmade initialprice artist           name             type height width
------ -------- -------- ------------ ---------------- ---------------- ---- ------ -----
5      Painting 1873     8000000.0000 Claude Monet     Field of Poppies Oil  19.62  25.62
6      Painting 1889     8000000.0000 Vincent Van Gogh The Starry Night Oil  28.75  36.25

Here is an example using a derived table:

SELECT *
FROM (SELECT ITM.*, ATR.attribute, ATR.value
      FROM AuctionItems AS ITM
         JOIN ItemAttributes AS ATR
         ON ITM.itemid = ATR.itemid
      WHERE ITM.itemtype = 'Painting') AS PNT
  PIVOT

  (
    MAX(value)
    FOR attribute IN([artist], [name], [type], [height], [width])
  ) AS PVT

You can also use PIVOT when you want to generate a crosstab report to summarize data. For example, using the Purchasing.PurchaseOrderHeader table in the AdventureWorks database, suppose you want to return the number of orders each employee made using each purchasing method, pivoting the purchase method IDs to columns. Keeping in mind that you should provide the PIVOT operator with only the relevant data, you use a derived table and write the following query:

SELECT EmployeeID, [1] AS SM1, [2] AS SM2,
  [3] AS SM3, [4] AS SM4, [5] AS SM5
FROM (SELECT PurchaseOrderID, EmployeeID, ShipMethodID
      FROM Purchasing.PurchaseOrderHeader) ORD
  PIVOT
  (
   COUNT(PurchaseOrderID)
   FOR ShipMethodID IN([1], [2], [3], [4], [5])
  ) AS PVT

Here is the result set:

EmployeeID  SM1         SM2         SM3         SM4         SM5
----------- ----------- ----------- ----------- ----------- -----------
164         56          62          12          89          141
198         24          27          6           45          58
223         56          67          17          98          162
231         50          67          12          81          150
233         55          62          12          106         125
238         53          58          13          102         134
241         50          59          13          108         130
244         55          47          17          93          148
261         58          54          11          120         117
264         50          58          15          86          151
266         58          68          14          116         144
274         24          26          6           41          63

The COUNT(PurchaseOrderID) function counts the number of rows for each ship method in the list. Note that PIVOT disallows the use of COUNT(*). Column aliases are used to provide more descriptive names to the result columns. Using PIVOT to show an order count for each ship method in a different column is reasonable when you have a small number of ship methods whose IDs are known ahead of time.

You can also pivot values that are derived from expressions. For example, suppose you want to return the total freight value for each employee in each order year, pivoting the years to columns. The order year is derived from the OrderDate column:

SELECT EmployeeID, [2001] AS Y2001, [2002] AS Y2002,
  [2003] AS Y2003, [2004] AS Y2004
FROM (SELECT EmployeeID, YEAR(OrderDate) AS OrderYear, Freight
      FROM Purchasing.PurchaseOrderHeader) AS ORD
  PIVOT
  (
   SUM(Freight)
   FOR OrderYear IN([2001], [2002], [2003], [2004])
  ) AS PVT

Here is the result set:

EmployeeID  Y2001       Y2002       Y2003       Y2004
----------- ----------- ----------- ----------- ------------
164         509.9325    14032.0215  34605.3459  105087.7428
198         NULL        5344.4771   14963.0595  45020.9178
223         365.7019    12496.0776  37489.2896  117599.4156
231         6.8025      9603.0502   37604.3258  75435.8619
233         1467.1388   9590.7355   32988.0643  98603.745
238         17.3345     9745.1001   37836.583   100106.3678
241         221.1825    6865.7299   35559.3883  114430.983
244         5.026       5689.4571   35449.316   74690.3755
261         NULL        10483.27    32854.9343  73992.8431
264         NULL        10337.3207  37170.1957  82406.4474
266         4.2769      9588.8228   38533.9582  115291.2472
274         NULL        1877.2665   13708.9336  41011.3821

Crosstab reports are common in data warehouse scenarios. Consider the following OrdersFact table, which you populate with sales orders and sales order details data from AdventureWorks:

CREATE TABLE OrdersFact
(
  OrderID    INT      NOT NULL,
  ProductID  INT      NOT NULL,
  CustomerID NCHAR(5) NOT NULL,
  OrderYear  INT      NOT NULL,
  OrderMonth INT      NOT NULL,

  OrderDay   INT      NOT NULL,
  Quantity   INT      NOT NULL,
  PRIMARY KEY(OrderID, ProductID)
)
INSERT INTO OrdersFact
  SELECT O.SalesOrderID, OD.ProductID, O.CustomerID,
    YEAR(O.OrderDate) AS OrderYear, MONTH(O.OrderDate) AS OrderMonth,
    DAY(O.OrderDate) AS OrderDay, OD.OrderQty
  FROM Sales.SalesOrderHeader AS O
    JOIN Sales.SalesOrderDetail AS OD
      ON O.SalesOrderID = OD.SalesOrderID

To get the total quantities for each year and month, returning years in rows and months in columns, you use the following query:

SELECT *
FROM (SELECT OrderYear, OrderMonth, Quantity
      FROM OrdersFact) AS ORD
 PIVOT
 (
  SUM(Quantity)
  FOR OrderMonth IN([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])
 ) AS PVT

Here is the result set:

OrderYear  1     2     3     4     5     6     7     8     9     10    11    12
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
2001       NULL  NULL  NULL  NULL  NULL  NULL  966   2209  1658  1403  3132  2480
2002       1040  2303  1841  1467  3179  2418  7755  11325 9066  5584  8268  6672
2003       3532  5431  4132  5694  8278  6444  11288 18986 18681 11607 14771 15855
2004       9227  10999 11314 12239 15656 15805 2209  NULL  NULL  NULL  NULL  NULL

PIVOT returns null values for nonexistent intersections between year and month. A year appears in the result if it appears in the input table expression (the derived table ORD), regardless of whether or not it has an intersection with any of the specified months. This means that you might get a row with NULL in all columns if you do not specify all existing months. However, null values in the result do not necessarily represent nonexistent intersections. They might result from base null values in the quantity column, unless the column disallows null values. If you want to override NULL and see another value instead, for example 0, you can do so by using the ISNULL() function in the select list:

SELECT OrderYear,
  ISNULL([1],  0) AS M01,
  ISNULL([2],  0) AS M02,
  ISNULL([3],  0) AS M03,
  ISNULL([4],  0) AS M04,
  ISNULL([5],  0) AS M05,
  ISNULL([6],  0) AS M06,
  ISNULL([7],  0) AS M07,
  ISNULL([8],  0) AS M08,
  ISNULL([9],  0) AS M09,
  ISNULL([10], 0) AS M10,
  ISNULL([11], 0) AS M11,
  ISNULL([12], 0) AS M12
FROM (SELECT OrderYear, OrderMonth, Quantity
      FROM OrdersFact) AS ORD
 PIVOT
 (
  SUM(Quantity)
  FOR OrderMonth IN([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])
 ) AS PVT

Here is the result set:

OrderYear  1     2     3     4     5     6     7     8     9     10    11    12
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
2001       0     0     0     0     0     0     966   2209  1658  1403  3132  2480
2002       1040  2303  1841  1467  3179  2418  7755  11325 9066  5584  8268  6672
2003       3532  5431  4132  5694  8278  6444  11288 18986 18681 11607 14771 15855
2004       9227  10999 11314 12239 15656 15805 2209  0     0     0     0     0

Using ISNULL(Quantity, 0) within the derived table would only take care of base null values in the Quantity column (if such existed) not of null values that PIVOT generated for nonexistent intersections.

Suppose you want to return the total quantities for each customer ID in the range 1–9 against a combination of year and month values in the first quarter of each of the years 2003 and 2004. To get the year and month values in rows and customer IDs in columns, use the following query:

SELECT *
FROM (SELECT CustomerID, OrderYear, OrderMonth, Quantity
      FROM OrdersFact
      WHERE CustomerID BETWEEN 1 AND 9
        AND OrderYear IN(2003, 2004)

        AND OrderMonth IN(1, 2, 3)) AS ORD
  PIVOT
  (
    SUM(Quantity)
    FOR CustomerID IN([1],[2],[3],[4],[5],[6],[7],[8],[9])
  ) AS PVT

Here is the result set:

OrderYear   OrderMonth  1    2    3    4    5    6    7    8    9
----------- ----------- ---- ---- ---- ---- ---- ---- ---- ---- ----
2003        1           NULL NULL NULL 105  NULL NULL 8    NULL NULL
2004        1           NULL NULL NULL 80   NULL NULL NULL NULL NULL
2003        2           NULL 5    NULL NULL NULL NULL NULL NULL 15
2004        2           NULL 10   NULL NULL NULL NULL NULL 6    3
2003        3           NULL NULL 105  NULL 15   NULL NULL NULL NULL
2004        3           NULL NULL 103  NULL 25   4    NULL NULL NULL 

The implied group-by list in this case is OrderYear and OrderMonth, because CustomerID and Quantity are used as the pivot and value columns, respectively.

However, if you want the combination of year and month values to appear as columns, you must concatenate them yourself before passing them to the PIVOT operator, because there can be only one pivot column:

SELECT *
FROM (SELECT CustomerID, OrderYear*100+OrderMonth AS YM, Quantity
      FROM OrdersFact
      WHERE CustomerID BETWEEN 1 AND 9
        AND OrderYear IN(2003, 2004)
        AND OrderMonth IN(1, 2, 3)) AS ORD
  PIVOT
  (
    SUM(Quantity)
    FOR YM IN([200301],[200302],[200303],[200401],[200402],[200403])
  ) AS PVT

Here is the result set:

CustomerID 200301 200302 200303 200401 200402 200403
---------- ------ ------ ------ ------ ------ ------
2          NULL   5      NULL   NULL   10     NULL
3          NULL   NULL   105    NULL   NULL   103
6          NULL   NULL   NULL   NULL   NULL   4
4          105    NULL   NULL   80     NULL   NULL
8          NULL   NULL   NULL   NULL   6      NULL
5          NULL   NULL   15     NULL   NULL   25

7          8      NULL   NULL   NULL   NULL   NULL
9          NULL   15     NULL   NULL   3      NULL

UNPIVOT

The UNPIVOT operator allows you to normalize prepivoted data. The syntax and elements of the UNPIVOT operator are similar to those of the PIVOT operator.

For example, consider the AuctionItems table from the previous section:

itemid      itemtype                 whenmade    initialprice
----------- ------------------------ ----------- --------------
1           Wine                      1822          3000.0000
2           Wine                      1807           500.0000
3           Chair                    1753         800000.0000
4           Ring                      -501       1000000.0000
5           Painting                  1873       8000000.0000
6           Painting                  1889       8000000.0000

Suppose you want each attribute to appear in a different row similar to the way attributes are kept in the ItemAttributes table:

itemid      attribute       value
----------- --------------- -------
1           itemtype        Wine
1           whenmade        1822
1           initialprice    3000.00
2           itemtype        Wine
2           whenmade        1807
2           initialprice    500.00
3           itemtype        Chair
3           whenmade        1753
3           initialprice    800000.00
4           itemtype        Ring
4           whenmade        -501
4           initialprice    1000000.00
5           itemtype        Painting
5           whenmade        1873
5           initialprice    8000000.00
6           itemtype        Painting
6           whenmade        1889
6           initialprice    8000000.00

In the UNPIVOT query, you want to rotate the columns itemtype, whenmade, and initialprice to rows. Each row should have the item ID, attribute, and value. The new column names that you must provide are attribute and value. They correspond to the pivot_column and value_column in the PIVOT operator. The attribute column should get the actual column names that you want to rotate (itemtype, whenmade, and initialprice) as values. The value column should get the values from the three different source columns into one destination column. To help clarify, first a version of an UNPIVOT query that is not valid is presented, followed by a valid one in which you apply some restrictions:

SELECT itemid, attribute, value
FROM AuctionItems
  UNPIVOT
  (
    value FOR attribute IN([itemtype], [whenmade], [initialprice])
  ) AS UPV

As arguments to the PIVOT operator, you provide a name for value_column (value in this case) followed by the FOR clause. Following the FOR clause, provide a name for pivot_column (attribute in this case) and then an IN clause with the list of source column names you want to get as values in pivot_column. This list of columns is referred to as in the PIVOT operator. This query generates the following error:

.Net SqlClient Data Provider: Msg 8167, Level 16, State 1, Line 1

Type of column 'whenmade' conflicts with the type of other columns specified in the UNPIVOT list.

The destination value column contains values originating from several different source columns (those that appear in ). Because the target of all column values is a single column, UNPIVOT requires that all columns in have the same data type, length, and precision. To meet this restriction, you can provide the UNPIVOT operator with a table expression that converts the three columns to the same data type. The sql_variant data type is a good candidate because you can convert the different source columns to the same data type and still retain their original data types. Applying this restriction, you revise the previous query as follows and get the result you want:

SELECT itemid, attribute, value
FROM (SELECT itemid,
        CAST(itemtype     AS SQL_VARIANT) AS itemtype,
        CAST(whenmade     AS SQL_VARIANT) AS whenmade,
        CAST(initialprice AS SQL_VARIANT) AS initialprice
      FROM AuctionItems) AS ITM
  UNPIVOT
  (
    value FOR attribute IN([itemtype], [whenmade], [initialprice])
  ) AS UPV

The data type of the result attribute column is sysname. This is the data type SQL Server uses for storing object names.

Note that the UNPIVOT operator eliminates null values in the value column from the result; therefore, it cannot be considered to be the exact reverse operation as the PIVOT operator.

Having rotated the columns in AuctionItems into rows, you can now union the result of the UNPIVOT operation with the rows from ItemAttributes to provide a unified result:

SELECT itemid, attribute, value
FROM (SELECT itemid,
        CAST(itemtype     AS SQL_VARIANT) AS itemtype,
        CAST(whenmade     AS SQL_VARIANT) AS whenmade,
        CAST(initialprice AS SQL_VARIANT) AS initialprice
      FROM AuctionItems) AS ITM
  UNPIVOT
  (
    value FOR attribute IN([itemtype], [whenmade], [initialprice])
  ) AS UPV
UNION ALL
SELECT *
FROM ItemAttributes
ORDER BY itemid, attribute

Here is the result set:

itemid      attribute       value
----------- --------------- -------------
1           color           Red
1           initialprice    3000.00
1           itemtype        Wine
1           manufacturer    ABC
1           type            Pinot Noir
1           whenmade        1822
2           color           Red
2           initialprice    500.00
2           itemtype        Wine
2           manufacturer    XYZ
2           type            Porto
2           whenmade        1807
3           initialprice    800000.00
3           itemtype        Chair

3           material        Wood
3           padding         Silk
3           whenmade        1753
4           initialprice    1000000.00
4           inscription     One ring
4           itemtype        Ring
4           material        Gold
4           size            10
4           whenmade        -501
5           height          19.625
5           initialprice    8000000.00
5           itemtype        Painting
5           name            Field of Poppies
5           artist          Claude Monet
5           type            Oil
5           whenmade        1873
5           width           25.625
6           height          28.750
6           initialprice    8000000.00
6           itemtype        Painting
6           name            The Starry Night
6           artist          Vincent Van Gogh
6           type            Oil
6           whenmade        1889
6           width           36.250

APPLY

The APPLY relational operator allows you to invoke a specified table-valued function once per each row of an outer table expression. You specify APPLY in the FROM clause of a query, similar to the way you use the JOIN relational operator. APPLY comes in two forms: CROSS APPLY and OUTER APPLY. With the APPLY operator, SQL Server 2005 Beta 2 allows you to refer to a table-valued function in a correlated subquery.

CROSS APPLY

CROSS APPLY invokes a table-valued function for each row in an outer table expression. You can refer to columns in the outer table as arguments to the table-valued function. CROSS APPLY returns a unified result set out of all of the results returned by the individual invocations of the table-valued function. If the table-valued function returns an empty set for a given outer row, that outer row is not returned in the result. For example, the following table-valued function accepts two integers as arguments and returns a table with one row, with the minimum and maximum values as columns:

CREATE FUNCTION dbo.fn_scalar_min_max(@p1 AS INT, @p2 AS INT) RETURNS TABLE
AS
RETURN
  SELECT
    CASE
      WHEN @p1 < @p2 THEN @p1
      WHEN @p2 < @p1 THEN @p2
      ELSE COALESCE(@p1, @p2)
    END AS mn,
    CASE
      WHEN @p1 > @p2 THEN @p1
      WHEN @p2 > @p1 THEN @p2
      ELSE COALESCE(@p1, @p2)
    END AS mx
GO
SELECT * FROM fn_scalar_min_max(10, 20)

Here is the result set:

mn          mx         
----------- -----------
10          20         

Given the following T1 table:

CREATE TABLE T1
(
  col1 INT NULL,
  col2 INT NULL
)

INSERT INTO T1 VALUES(10, 20)
INSERT INTO T1 VALUES(20, 10)
INSERT INTO T1 VALUES(NULL, 30)
INSERT INTO T1 VALUES(40, NULL)
INSERT INTO T1 VALUES(50, 50)

You want to invoke fn_scalar_min_max for each row in T1. You write a CROSS APPLY query as follows:

SELECT *
FROM T1 CROSS APPLY fn_scalar_min_max(col1, col2) AS M

Here is the result set:

col1        col2        mn          mx         
----------- ----------- ----------- -----------
10          20          10          20         
20          10          10          20         
NULL        30          30          30         
40          NULL        40          40         
50          50          50          50         

If the table-valued function returns multiple rows for a certain outer row, the outer row is returned multiple times. Consider the Employees table used earlier in this paper in the Recursive Queries and Common Table Expressions section (Employees Organizational Chart scenario). In the same database, you also create the following Departments table:

CREATE TABLE Departments
(
  deptid    INT NOT NULL PRIMARY KEY,
  deptname  VARCHAR(25) NOT NULL,
  deptmgrid INT NULL REFERENCES Employees
)
SET NOCOUNT ON
INSERT INTO Departments VALUES(1, 'HR',           2)
INSERT INTO Departments VALUES(2, 'Marketing',    7)
INSERT INTO Departments VALUES(3, 'Finance',      8)
INSERT INTO Departments VALUES(4, 'R&D',          9)
INSERT INTO Departments VALUES(5, 'Training',     4)
INSERT INTO Departments VALUES(6, 'Gardening', NULL)

Most departments have a manager ID that corresponds to an employee in the Employees table, but as in the case of the Gardening department, it is possible for a department to have no manager. Note that a manager in the Employees table does necessarily also manage a department. The following table-valued function accepts an employee ID as an argument and returns that employee and all of his or her subordinates in all levels:

CREATE FUNCTION dbo.fn_getsubtree(@empid AS INT) RETURNS @TREE TABLE
(
  empid   INT NOT NULL,
  empname VARCHAR(25) NOT NULL,
  mgrid   INT NULL,
  lvl     INT NOT NULL
)
AS
BEGIN
  WITH Employees_Subtree(empid, empname, mgrid, lvl)
  AS
  ( 
    -- Anchor Member (AM)
    SELECT empid, empname, mgrid, 0
    FROM employees
    WHERE empid = @empid
    UNION all
    -- Recursive Member (RM)
    SELECT e.empid, e.empname, e.mgrid, es.lvl+1
    FROM employees AS e
      JOIN employees_subtree AS es
        ON e.mgrid = es.empid
  )
  INSERT INTO @TREE
    SELECT * FROM Employees_Subtree
  RETURN
END
GO

To return all of the subordinates in all levels for the manager of each department, use the following query:

SELECT *
FROM Departments AS D
  CROSS APPLY fn_getsubtree(D.deptmgrid) AS ST

Here is the result set:

deptid      deptname   deptmgrid   empid       empname    mgrid       lvl
----------- ---------- ----------- ----------- ---------- ----------- ---
1           HR         2           2           Andrew     1           0
1           HR         2           5           Steven     2           1
1           HR         2           6           Michael    2           1
2           Marketing  7           7           Robert     3           0
2           Marketing  7           11          David      7           1
2           Marketing  7           12          Ron        7           1
2           Marketing  7           13          Dan        7           1
2           Marketing  7           14          James      11          2
3           Finance    8           8           Laura      3           0
4           R&D        9           9           Ann        3           0
5           Training   4           4           Margaret   1           0
5           Training   4           10          Ina        4           1

There are two things to notice here. First, each row from Departments is duplicated as many times as there are rows returned from fn_getsubtree for the department's manager. Second, the Gardening department does not appear in the result because fn_getsubtree returned an empty set for it.

Another practical use of the CROSS APPLY operator answers a common request: returning n rows for each group. For example, the following function returns the requested number of most recent orders for a given customer:

USE AdventureWorks
GO
CREATE FUNCTION fn_getnorders(@custid AS INT, @n AS INT)
  RETURNS TABLE
AS
RETURN
  SELECT TOP(@n) *
  FROM Sales.SalesOrderHeader
  WHERE CustomerID = @custid
  ORDER BY OrderDate DESC
GO

Using the CROSS APPLY operator, you can get the two most recent orders for each customer by using the following simple query:

SELECT O.*
FROM Sales.Customer AS C
  CROSS APPLY fn_getnorders(C.CustomerID, 2) AS O

For more information about the enhancement to TOP, see "TOP Enhancements" later in this paper.

OUTER APPLY

OUTER APPLY is very similar to CROSS APPLY, but it also returns rows from the outer table for which the table-valued function returned an empty set. Null values are returned as the column values that correspond to the columns of the table-valued function. For example, revise the query against the Departments table from the previous section to use OUTER APPLY instead of CROSS APPLY and notice the last row in the output:

SELECT *
FROM Departments AS D
  OUTER APPLY fn_getsubtree(D.deptmgrid) AS ST

Here is the result set:

deptid      deptname   deptmgrid   empid       empname    mgrid       lvl
----------- ---------- ----------- ----------- ---------- ----------- ---
1           HR         2           2           Andrew     1           0
1           HR         2           5           Steven     2           1
1           HR         2           6           Michael    2           1
2           Marketing  7           7           Robert     3           0
2           Marketing  7           11          David      7           1
2           Marketing  7           12          Ron        7           1
2           Marketing  7           13          Dan        7           1
2           Marketing  7           14          James      11          2
3           Finance    8           8           Laura      3           0
4           R&D        9           9           Ann        3           0
5           Training   4           4           Margaret   1           0
5           Training   4           10          Ina        4           1
6           Gardening  NULL        NULL        NULL       NULL        NULL

Table-Valued Functions in Correlated Subqueries

In SQL Server 2000, you cannot refer to table-valued functions within a correlated subquery. In conjunction with providing the APPLY relational operator, this restriction is removed in SQL Server 2005 Beta 2. Now, within a subquery, you can provide a table-valued function with columns from the outer query as arguments. For example, if you want to return only those departments whose manager has at least three employees, you can write the following query:

SELECT *
FROM Departments AS D
WHERE (SELECT COUNT(*)
       FROM fn_getsubtree(D.deptmgrid)) >= 3
deptid      deptname                  deptmgrid  
----------- ------------------------- -----------
1           HR                        2          
2           Marketing                 7          


 

Print | posted @ Tuesday, July 05, 2005 10:12 AM

Comments on this entry:

Gravatar # re: TSQL ENHANCEMENTS
by Kay at 8/18/2005 9:58 PM

Excellent Paper. Thoroughly enjoyed reading it.
Gravatar # re: TSQL ENHANCEMENTS
by Pete at 9/1/2009 11:37 AM

Very, very good article. Searching for thorough explanation of more-than-basic pivot operations, common errors etc and found this. Excellent.
Post A Comment
Title:
Name:
Email:
Website:
Comment:
Verification: