PIVOT and UNPIVOT
PIVOT and UNPIVOT are new relational operators that you specify in the FROM clause of a query. They perform some manipulation on an input table-valued expression and produce an output table as a result. The PIVOT operator rotates rows into columns, possibly performing aggregations along the way. It widens the input table expression based on a given pivot column, generating an output table with a column for each unique value in the pivot column. The UNPIVOT operator performs the opposite operation of that performed by the PIVOT operator; it rotates columns into rows. It narrows the input table expression based on a pivot column.
PIVOT
The PIVOT operator is useful for handling open-schema scenarios and for generating crosstab reports.
In an open-schema scenario, you maintain entities with sets of attributes that are either not known ahead or different for each entity type. The users of your application define the attributes dynamically. Instead of predefining many columns and storing many null values in your tables, you split the attributes into different rows and store only the relevant attributes for each entity instance.
PIVOT allows you to generate crosstab reports for open-schema and other scenarios in which you rotate rows into columns, possibly calculating aggregations along the way and presenting the data in a useful form.
An example of an open-schema scenario is a database that keeps track of items put up for auction. Some attributes are relevant for all auction items, such as the item type, when it was made, and its initial price. Only the attributes that are relevant for all items are stored in the AuctionItems table:
CREATE TABLE AuctionItems
(
itemid INT NOT NULL PRIMARY KEY NONCLUSTERED,
itemtype NVARCHAR(30) NOT NULL,
whenmade INT NOT NULL,
initialprice MONEY NOT NULL,
/* other columns */
)
CREATE UNIQUE CLUSTERED INDEX idx_uc_itemtype_itemid
ON AuctionItems(itemtype, itemid)
INSERT INTO AuctionItems VALUES(1, N'Wine', 1822, 3000)
INSERT INTO AuctionItems VALUES(2, N'Wine', 1807, 500)
INSERT INTO AuctionItems VALUES(3, N'Chair', 1753, 800000)
INSERT INTO AuctionItems VALUES(4, N'Ring', -501, 1000000)
INSERT INTO AuctionItems VALUES(5, N'Painting', 1873, 8000000)
INSERT INTO AuctionItems VALUES(6, N'Painting', 1889, 8000000)
Other attributes are specific to the item type, and new items of different types are continuously being added. Such attributes can be stored in a different ItemAttributes table in which each item attribute is stored in a different row. Each row contains the item ID, attribute name, and attribute value:
CREATE TABLE ItemAttributes
(
itemid INT NOT NULL REFERENCES AuctionItems,
attribute NVARCHAR(30) NOT NULL,
value SQL_VARIANT NOT NULL,
PRIMARY KEY (itemid, attribute)
)
INSERT INTO ItemAttributes
VALUES(1, N'manufacturer', CAST(N'ABC' AS NVARCHAR(30)))
INSERT INTO ItemAttributes
VALUES(1, N'type', CAST(N'Pinot Noir' AS NVARCHAR(15)))
INSERT INTO ItemAttributes
VALUES(1, N'color', CAST(N'Red' AS NVARCHAR(15)))
INSERT INTO ItemAttributes
VALUES(2, N'manufacturer', CAST(N'XYZ' AS NVARCHAR(30)))
INSERT INTO ItemAttributes
VALUES(2, N'type', CAST(N'Porto' AS NVARCHAR(15)))
INSERT INTO ItemAttributes
VALUES(2, N'color', CAST(N'Red' AS NVARCHAR(15)))
INSERT INTO ItemAttributes
VALUES(3, N'material', CAST(N'Wood' AS NVARCHAR(15)))
INSERT INTO ItemAttributes
VALUES(3, N'padding', CAST(N'Silk' AS NVARCHAR(15)))
INSERT INTO ItemAttributes
VALUES(4, N'material', CAST(N'Gold' AS NVARCHAR(15)))
INSERT INTO ItemAttributes
VALUES(4, N'inscription', CAST(N'One ring ...' AS NVARCHAR(50)))
INSERT INTO ItemAttributes
VALUES(4, N'size', CAST(10 AS INT))
INSERT INTO ItemAttributes
VALUES(5, N'artist', CAST(N'Claude Monet' AS NVARCHAR(30)))
INSERT INTO ItemAttributes
VALUES(5, N'name', CAST(N'Field of Poppies' AS NVARCHAR(30)))
INSERT INTO ItemAttributes
VALUES(5, N'type', CAST(N'Oil' AS NVARCHAR(30)))
INSERT INTO ItemAttributes
VALUES(5, N'height', CAST(19.625 AS NUMERIC(9,3)))
INSERT INTO ItemAttributes
VALUES(5, N'width', CAST(25.625 AS NUMERIC(9,3)))
INSERT INTO ItemAttributes
VALUES(6, N'artist', CAST(N'Vincent Van Gogh' AS NVARCHAR(30)))
INSERT INTO ItemAttributes
VALUES(6, N'name', CAST(N'The Starry Night' AS NVARCHAR(30)))
INSERT INTO ItemAttributes
VALUES(6, N'type', CAST(N'Oil' AS NVARCHAR(30)))
INSERT INTO ItemAttributes
VALUES(6, N'height', CAST(28.75 AS NUMERIC(9,3)))
INSERT INTO ItemAttributes
VALUES(6, N'width', CAST(36.25 AS NUMERIC(9,3)))
Note that the sql_variant data type is used for the value column because different attribute values may be of different data types. For example, the size attribute stores an integer attribute value, and a name attribute stores a character string attribute value.
Suppose you want to present the data from the ItemAttributes table with a row for each item that is a painting (items 5,6) and a column for each attribute. Without the PIVOT operator, you must write a query such as:
SELECT
itemid,
MAX(CASE WHEN attribute = 'artist' THEN value END) AS [artist],
MAX(CASE WHEN attribute = 'name' THEN value END) AS [name],
MAX(CASE WHEN attribute = 'type' THEN value END) AS [type],
MAX(CASE WHEN attribute = 'height' THEN value END) AS [height],
MAX(CASE WHEN attribute = 'width' THEN value END) AS [width]
FROM ItemAttributes AS ATR
WHERE itemid IN(5,6)
GROUP BY itemid
Here is the result set:
itemid artist name type height width
------ ---------------- ---------------- ---------- ------ ------
5 Claude Monet Field of Poppies Oil 19.625 25.625
6 Vincent Van Gogh The Starry Night Oil 28.750 36.250
The PIVOT operator allows you to maintain shorter and more readable code to achieve the same results:
SELECT *
FROM ItemAttributes AS ATR
PIVOT
(
MAX(value)
FOR attribute IN([artist], [name], [type], [height], [width])
) AS PVT
WHERE itemid IN(5,6)
As with most new features, understanding the PIVOT operator comes with experimentation and use. Some of the elements in the PIVOT syntax are apparent and only require that you figure out the relationship of these elements to the query that does not use the new operator. Others are hidden.
You may find the following terms helpful in understanding the semantics of the PIVOT operator:
table_expression
The virtual table on which the PIVOT operator works (the part in the query between the FROM clause and the PIVOT operator): ItemAttributes AS ATR in this case.
pivot_column
The column from table_expression whose values you want to rotate into result columns: attribute in this case.
column_list
The list of values from pivot_column that you want to present as result columns (in the parentheses followed by the IN clause). These must be expressed as legal identifiers: [artist], [name], [type], [height], [width] in this case.
aggregate_function
The aggregate function that you use to generate the data or column values in the result: MAX() in this case.
value_column
The column from table_expression that you use as the argument for aggregate_function: value in this case.
group_by_list
The hidden part—ALL columns from table_expression excluding pivot_column and value_column which are used to group the result: itemid in this case.
select_list
The list of columns following the SELECT clause that might include any column(s) from group_by_list and column_list. Aliases can be used to change the name of the result columns: * in this case returns all columns from group_by_list and column_list.
The PIVOT operator returns one row for each unique value in group_by_list as if you had a query with a GROUP BY clause and specified those columns. Notice that group_by_list is implied; it is not specified explicitly anywhere in the query. It contains all columns from table_expression excluding pivot_column and value_column. Understanding this is probably the key to understanding why queries you write with the PIVOT operator work as they do, and why you might get errors in some cases.
Possible result columns include values from group_by_list and . If you specify an asterisk (*), the query returns both lists. The data part of the result columns or the result column values are calculated by aggregate_function with value_column as the argument.
The following color-highlighted code illustrates the different elements in the query using the PIVOT operator:
SELECT * -- itemid, [artist], [name], [type], [height], [width]
FROM ItemAttributes AS ATR
PIVOT
(
MAX(value)
FOR attribute IN([artist], [name], [type], [height], [width])
) AS PVT
WHERE itemid IN(5,6)
And the following relates the different elements to the query that does not use the PIVOT operator:
SELECT
itemid,
MAX(CASE WHEN attribute = 'artist' THEN value END) AS [artist],
MAX(CASE WHEN attribute = 'name' THEN value END) AS [name],
MAX(CASE WHEN attribute = 'type' THEN value END) AS [type],
MAX(CASE WHEN attribute = 'height' THEN value END) AS [height],
MAX(CASE WHEN attribute = 'width' THEN value END) AS [width]
FROM ItemAttributes AS ATR
WHERE itemid IN(5,6)
GROUP BY itemid
Note that you must explicitly specify the values in . The PIVOT operator does not provide an option to derive those dynamically from pivot_column in a static query. You can use dynamic SQL to construct the query string yourself to achieve this.
Taking the previous PIVOT query a step further, suppose you want to return, for each auction item, all attributes relevant to paintings. You want to include those attributes that appear in AuctionItems and those that appear in ItemAttributes. You might try the following query, which returns an error:
SELECT *
FROM AuctionItems AS ITM
JOIN ItemAttributes AS ATR
ON ITM.itemid = ATR.itemid
PIVOT
(
MAX(value)
FOR attribute IN([artist], [name], [type], [height], [width])
) AS PVT
WHERE itemtype = 'Painting'
Here is the error message:
.Net SqlClient Data Provider: Msg 8156, Level 16, State 1, Line 1
The column 'itemid' was specified multiple times for 'PVT'.
Remember that PIVOT works on table_expression, which is the virtual table returned by the section in the query between the FROM clause and the PIVOT clause. In this query, the virtual table contains two instances of the itemid column—one originating from AuctionItems and the other from ItemAttributes. You might be tempted to revise the query as follows, but you will also get an error:
SELECT ITM.itemid, itemtype, whenmade, initialprice,
[artist], [name], [type], [height], [width]
FROM AuctionItems AS ITM
JOIN ItemAttributes AS ATR
ON ITM.itemid = ATR.itemid
PIVOT
(
MAX(value)
FOR attribute IN([artist], [name], [type], [height], [width])
) AS PVT
WHERE itemtype = 'Painting'
Here is the error message:
.Net SqlClient Data Provider: Msg 8156, Level 16, State 1, Line 1
The column 'itemid' was specified multiple times for 'PVT'.
.Net SqlClient Data Provider: Msg 107, Level 15, State 1, Line 1
The column prefix 'ITM' does not match with a table name or alias name used in the query.
As mentioned earlier, the PIVOT operator works on the virtual table returned by table_expression and not on the columns in the select_list. The select_list is evaluated after the PIVOT operator performs its manipulations and can refer only to group_by_list and column_list. That is why the ITM alias is no longer recognized in the select_list. If you understand this, you realize that you should provide PIVOT with a table_expression that contains only the columns you want to work on. This includes the grouping columns (only one occurrence of itemid plus itemtype, whenmade and initialprice), the pivot column (attribute), and the value column (value). You can achieve this by using CTEs or derived tables. Here is an example using a CTE:
WITH PNT
AS
(
SELECT ITM.*, ATR.attribute, ATR.value
FROM AuctionItems AS ITM
JOIN ItemAttributes AS ATR
ON ITM.itemid = ATR.itemid
WHERE ITM.itemtype = 'Painting'
)
SELECT * FROM PNT
PIVOT
(
MAX(value)
FOR attribute IN([artist], [name], [type], [height], [width])
) AS PVT
Here is the result set:
itemid itemtype whenmade initialprice artist name type height width
------ -------- -------- ------------ ---------------- ---------------- ---- ------ -----
5 Painting 1873 8000000.0000 Claude Monet Field of Poppies Oil 19.62 25.62
6 Painting 1889 8000000.0000 Vincent Van Gogh The Starry Night Oil 28.75 36.25
Here is an example using a derived table:
SELECT *
FROM (SELECT ITM.*, ATR.attribute, ATR.value
FROM AuctionItems AS ITM
JOIN ItemAttributes AS ATR
ON ITM.itemid = ATR.itemid
WHERE ITM.itemtype = 'Painting') AS PNT
PIVOT
(
MAX(value)
FOR attribute IN([artist], [name], [type], [height], [width])
) AS PVT
You can also use PIVOT when you want to generate a crosstab report to summarize data. For example, using the Purchasing.PurchaseOrderHeader table in the AdventureWorks database, suppose you want to return the number of orders each employee made using each purchasing method, pivoting the purchase method IDs to columns. Keeping in mind that you should provide the PIVOT operator with only the relevant data, you use a derived table and write the following query:
SELECT EmployeeID, [1] AS SM1, [2] AS SM2,
[3] AS SM3, [4] AS SM4, [5] AS SM5
FROM (SELECT PurchaseOrderID, EmployeeID, ShipMethodID
FROM Purchasing.PurchaseOrderHeader) ORD
PIVOT
(
COUNT(PurchaseOrderID)
FOR ShipMethodID IN([1], [2], [3], [4], [5])
) AS PVT
Here is the result set:
EmployeeID SM1 SM2 SM3 SM4 SM5
----------- ----------- ----------- ----------- ----------- -----------
164 56 62 12 89 141
198 24 27 6 45 58
223 56 67 17 98 162
231 50 67 12 81 150
233 55 62 12 106 125
238 53 58 13 102 134
241 50 59 13 108 130
244 55 47 17 93 148
261 58 54 11 120 117
264 50 58 15 86 151
266 58 68 14 116 144
274 24 26 6 41 63
The COUNT(PurchaseOrderID) function counts the number of rows for each ship method in the list. Note that PIVOT disallows the use of COUNT(*). Column aliases are used to provide more descriptive names to the result columns. Using PIVOT to show an order count for each ship method in a different column is reasonable when you have a small number of ship methods whose IDs are known ahead of time.
You can also pivot values that are derived from expressions. For example, suppose you want to return the total freight value for each employee in each order year, pivoting the years to columns. The order year is derived from the OrderDate column:
SELECT EmployeeID, [2001] AS Y2001, [2002] AS Y2002,
[2003] AS Y2003, [2004] AS Y2004
FROM (SELECT EmployeeID, YEAR(OrderDate) AS OrderYear, Freight
FROM Purchasing.PurchaseOrderHeader) AS ORD
PIVOT
(
SUM(Freight)
FOR OrderYear IN([2001], [2002], [2003], [2004])
) AS PVT
Here is the result set:
EmployeeID Y2001 Y2002 Y2003 Y2004
----------- ----------- ----------- ----------- ------------
164 509.9325 14032.0215 34605.3459 105087.7428
198 NULL 5344.4771 14963.0595 45020.9178
223 365.7019 12496.0776 37489.2896 117599.4156
231 6.8025 9603.0502 37604.3258 75435.8619
233 1467.1388 9590.7355 32988.0643 98603.745
238 17.3345 9745.1001 37836.583 100106.3678
241 221.1825 6865.7299 35559.3883 114430.983
244 5.026 5689.4571 35449.316 74690.3755
261 NULL 10483.27 32854.9343 73992.8431
264 NULL 10337.3207 37170.1957 82406.4474
266 4.2769 9588.8228 38533.9582 115291.2472
274 NULL 1877.2665 13708.9336 41011.3821
Crosstab reports are common in data warehouse scenarios. Consider the following OrdersFact table, which you populate with sales orders and sales order details data from AdventureWorks:
CREATE TABLE OrdersFact
(
OrderID INT NOT NULL,
ProductID INT NOT NULL,
CustomerID NCHAR(5) NOT NULL,
OrderYear INT NOT NULL,
OrderMonth INT NOT NULL,
OrderDay INT NOT NULL,
Quantity INT NOT NULL,
PRIMARY KEY(OrderID, ProductID)
)
INSERT INTO OrdersFact
SELECT O.SalesOrderID, OD.ProductID, O.CustomerID,
YEAR(O.OrderDate) AS OrderYear, MONTH(O.OrderDate) AS OrderMonth,
DAY(O.OrderDate) AS OrderDay, OD.OrderQty
FROM Sales.SalesOrderHeader AS O
JOIN Sales.SalesOrderDetail AS OD
ON O.SalesOrderID = OD.SalesOrderID
To get the total quantities for each year and month, returning years in rows and months in columns, you use the following query:
SELECT *
FROM (SELECT OrderYear, OrderMonth, Quantity
FROM OrdersFact) AS ORD
PIVOT
(
SUM(Quantity)
FOR OrderMonth IN([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])
) AS PVT
Here is the result set:
OrderYear 1 2 3 4 5 6 7 8 9 10 11 12
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
2001 NULL NULL NULL NULL NULL NULL 966 2209 1658 1403 3132 2480
2002 1040 2303 1841 1467 3179 2418 7755 11325 9066 5584 8268 6672
2003 3532 5431 4132 5694 8278 6444 11288 18986 18681 11607 14771 15855
2004 9227 10999 11314 12239 15656 15805 2209 NULL NULL NULL NULL NULL
PIVOT returns null values for nonexistent intersections between year and month. A year appears in the result if it appears in the input table expression (the derived table ORD), regardless of whether or not it has an intersection with any of the specified months. This means that you might get a row with NULL in all columns if you do not specify all existing months. However, null values in the result do not necessarily represent nonexistent intersections. They might result from base null values in the quantity column, unless the column disallows null values. If you want to override NULL and see another value instead, for example 0, you can do so by using the ISNULL() function in the select list:
SELECT OrderYear,
ISNULL([1], 0) AS M01,
ISNULL([2], 0) AS M02,
ISNULL([3], 0) AS M03,
ISNULL([4], 0) AS M04,
ISNULL([5], 0) AS M05,
ISNULL([6], 0) AS M06,
ISNULL([7], 0) AS M07,
ISNULL([8], 0) AS M08,
ISNULL([9], 0) AS M09,
ISNULL([10], 0) AS M10,
ISNULL([11], 0) AS M11,
ISNULL([12], 0) AS M12
FROM (SELECT OrderYear, OrderMonth, Quantity
FROM OrdersFact) AS ORD
PIVOT
(
SUM(Quantity)
FOR OrderMonth IN([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])
) AS PVT
Here is the result set:
OrderYear 1 2 3 4 5 6 7 8 9 10 11 12
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
2001 0 0 0 0 0 0 966 2209 1658 1403 3132 2480
2002 1040 2303 1841 1467 3179 2418 7755 11325 9066 5584 8268 6672
2003 3532 5431 4132 5694 8278 6444 11288 18986 18681 11607 14771 15855
2004 9227 10999 11314 12239 15656 15805 2209 0 0 0 0 0
Using ISNULL(Quantity, 0) within the derived table would only take care of base null values in the Quantity column (if such existed) not of null values that PIVOT generated for nonexistent intersections.
Suppose you want to return the total quantities for each customer ID in the range 1–9 against a combination of year and month values in the first quarter of each of the years 2003 and 2004. To get the year and month values in rows and customer IDs in columns, use the following query:
SELECT *
FROM (SELECT CustomerID, OrderYear, OrderMonth, Quantity
FROM OrdersFact
WHERE CustomerID BETWEEN 1 AND 9
AND OrderYear IN(2003, 2004)
AND OrderMonth IN(1, 2, 3)) AS ORD
PIVOT
(
SUM(Quantity)
FOR CustomerID IN([1],[2],[3],[4],[5],[6],[7],[8],[9])
) AS PVT
Here is the result set:
OrderYear OrderMonth 1 2 3 4 5 6 7 8 9
----------- ----------- ---- ---- ---- ---- ---- ---- ---- ---- ----
2003 1 NULL NULL NULL 105 NULL NULL 8 NULL NULL
2004 1 NULL NULL NULL 80 NULL NULL NULL NULL NULL
2003 2 NULL 5 NULL NULL NULL NULL NULL NULL 15
2004 2 NULL 10 NULL NULL NULL NULL NULL 6 3
2003 3 NULL NULL 105 NULL 15 NULL NULL NULL NULL
2004 3 NULL NULL 103 NULL 25 4 NULL NULL NULL
The implied group-by list in this case is OrderYear and OrderMonth, because CustomerID and Quantity are used as the pivot and value columns, respectively.
However, if you want the combination of year and month values to appear as columns, you must concatenate them yourself before passing them to the PIVOT operator, because there can be only one pivot column:
SELECT *
FROM (SELECT CustomerID, OrderYear*100+OrderMonth AS YM, Quantity
FROM OrdersFact
WHERE CustomerID BETWEEN 1 AND 9
AND OrderYear IN(2003, 2004)
AND OrderMonth IN(1, 2, 3)) AS ORD
PIVOT
(
SUM(Quantity)
FOR YM IN([200301],[200302],[200303],[200401],[200402],[200403])
) AS PVT
Here is the result set:
CustomerID 200301 200302 200303 200401 200402 200403
---------- ------ ------ ------ ------ ------ ------
2 NULL 5 NULL NULL 10 NULL
3 NULL NULL 105 NULL NULL 103
6 NULL NULL NULL NULL NULL 4
4 105 NULL NULL 80 NULL NULL
8 NULL NULL NULL NULL 6 NULL
5 NULL NULL 15 NULL NULL 25
7 8 NULL NULL NULL NULL NULL
9 NULL 15 NULL NULL 3 NULL
UNPIVOT
The UNPIVOT operator allows you to normalize prepivoted data. The syntax and elements of the UNPIVOT operator are similar to those of the PIVOT operator.
For example, consider the AuctionItems table from the previous section:
itemid itemtype whenmade initialprice
----------- ------------------------ ----------- --------------
1 Wine 1822 3000.0000
2 Wine 1807 500.0000
3 Chair 1753 800000.0000
4 Ring -501 1000000.0000
5 Painting 1873 8000000.0000
6 Painting 1889 8000000.0000
Suppose you want each attribute to appear in a different row similar to the way attributes are kept in the ItemAttributes table:
itemid attribute value
----------- --------------- -------
1 itemtype Wine
1 whenmade 1822
1 initialprice 3000.00
2 itemtype Wine
2 whenmade 1807
2 initialprice 500.00
3 itemtype Chair
3 whenmade 1753
3 initialprice 800000.00
4 itemtype Ring
4 whenmade -501
4 initialprice 1000000.00
5 itemtype Painting
5 whenmade 1873
5 initialprice 8000000.00
6 itemtype Painting
6 whenmade 1889
6 initialprice 8000000.00
In the UNPIVOT query, you want to rotate the columns itemtype, whenmade, and initialprice to rows. Each row should have the item ID, attribute, and value. The new column names that you must provide are attribute and value. They correspond to the pivot_column and value_column in the PIVOT operator. The attribute column should get the actual column names that you want to rotate (itemtype, whenmade, and initialprice) as values. The value column should get the values from the three different source columns into one destination column. To help clarify, first a version of an UNPIVOT query that is not valid is presented, followed by a valid one in which you apply some restrictions:
SELECT itemid, attribute, value
FROM AuctionItems
UNPIVOT
(
value FOR attribute IN([itemtype], [whenmade], [initialprice])
) AS UPV
As arguments to the PIVOT operator, you provide a name for value_column (value in this case) followed by the FOR clause. Following the FOR clause, provide a name for pivot_column (attribute in this case) and then an IN clause with the list of source column names you want to get as values in pivot_column. This list of columns is referred to as in the PIVOT operator. This query generates the following error:
.Net SqlClient Data Provider: Msg 8167, Level 16, State 1, Line 1
Type of column 'whenmade' conflicts with the type of other columns specified in the UNPIVOT list.
The destination value column contains values originating from several different source columns (those that appear in ). Because the target of all column values is a single column, UNPIVOT requires that all columns in have the same data type, length, and precision. To meet this restriction, you can provide the UNPIVOT operator with a table expression that converts the three columns to the same data type. The sql_variant data type is a good candidate because you can convert the different source columns to the same data type and still retain their original data types. Applying this restriction, you revise the previous query as follows and get the result you want:
SELECT itemid, attribute, value
FROM (SELECT itemid,
CAST(itemtype AS SQL_VARIANT) AS itemtype,
CAST(whenmade AS SQL_VARIANT) AS whenmade,
CAST(initialprice AS SQL_VARIANT) AS initialprice
FROM AuctionItems) AS ITM
UNPIVOT
(
value FOR attribute IN([itemtype], [whenmade], [initialprice])
) AS UPV
The data type of the result attribute column is sysname. This is the data type SQL Server uses for storing object names.
Note that the UNPIVOT operator eliminates null values in the value column from the result; therefore, it cannot be considered to be the exact reverse operation as the PIVOT operator.
Having rotated the columns in AuctionItems into rows, you can now union the result of the UNPIVOT operation with the rows from ItemAttributes to provide a unified result:
SELECT itemid, attribute, value
FROM (SELECT itemid,
CAST(itemtype AS SQL_VARIANT) AS itemtype,
CAST(whenmade AS SQL_VARIANT) AS whenmade,
CAST(initialprice AS SQL_VARIANT) AS initialprice
FROM AuctionItems) AS ITM
UNPIVOT
(
value FOR attribute IN([itemtype], [whenmade], [initialprice])
) AS UPV
UNION ALL
SELECT *
FROM ItemAttributes
ORDER BY itemid, attribute
Here is the result set:
itemid attribute value
----------- --------------- -------------
1 color Red
1 initialprice 3000.00
1 itemtype Wine
1 manufacturer ABC
1 type Pinot Noir
1 whenmade 1822
2 color Red
2 initialprice 500.00
2 itemtype Wine
2 manufacturer XYZ
2 type Porto
2 whenmade 1807
3 initialprice 800000.00
3 itemtype Chair
3 material Wood
3 padding Silk
3 whenmade 1753
4 initialprice 1000000.00
4 inscription One ring
4 itemtype Ring
4 material Gold
4 size 10
4 whenmade -501
5 height 19.625
5 initialprice 8000000.00
5 itemtype Painting
5 name Field of Poppies
5 artist Claude Monet
5 type Oil
5 whenmade 1873
5 width 25.625
6 height 28.750
6 initialprice 8000000.00
6 itemtype Painting
6 name The Starry Night
6 artist Vincent Van Gogh
6 type Oil
6 whenmade 1889
6 width 36.250
APPLY
The APPLY relational operator allows you to invoke a specified table-valued function once per each row of an outer table expression. You specify APPLY in the FROM clause of a query, similar to the way you use the JOIN relational operator. APPLY comes in two forms: CROSS APPLY and OUTER APPLY. With the APPLY operator, SQL Server 2005 Beta 2 allows you to refer to a table-valued function in a correlated subquery.
CROSS APPLY
CROSS APPLY invokes a table-valued function for each row in an outer table expression. You can refer to columns in the outer table as arguments to the table-valued function. CROSS APPLY returns a unified result set out of all of the results returned by the individual invocations of the table-valued function. If the table-valued function returns an empty set for a given outer row, that outer row is not returned in the result. For example, the following table-valued function accepts two integers as arguments and returns a table with one row, with the minimum and maximum values as columns:
CREATE FUNCTION dbo.fn_scalar_min_max(@p1 AS INT, @p2 AS INT) RETURNS TABLE
AS
RETURN
SELECT
CASE
WHEN @p1 < @p2 THEN @p1
WHEN @p2 < @p1 THEN @p2
ELSE COALESCE(@p1, @p2)
END AS mn,
CASE
WHEN @p1 > @p2 THEN @p1
WHEN @p2 > @p1 THEN @p2
ELSE COALESCE(@p1, @p2)
END AS mx
GO
SELECT * FROM fn_scalar_min_max(10, 20)
Here is the result set:
mn mx
----------- -----------
10 20
Given the following T1 table:
CREATE TABLE T1
(
col1 INT NULL,
col2 INT NULL
)
INSERT INTO T1 VALUES(10, 20)
INSERT INTO T1 VALUES(20, 10)
INSERT INTO T1 VALUES(NULL, 30)
INSERT INTO T1 VALUES(40, NULL)
INSERT INTO T1 VALUES(50, 50)
You want to invoke fn_scalar_min_max for each row in T1. You write a CROSS APPLY query as follows:
SELECT *
FROM T1 CROSS APPLY fn_scalar_min_max(col1, col2) AS M
Here is the result set:
col1 col2 mn mx
----------- ----------- ----------- -----------
10 20 10 20
20 10 10 20
NULL 30 30 30
40 NULL 40 40
50 50 50 50
If the table-valued function returns multiple rows for a certain outer row, the outer row is returned multiple times. Consider the Employees table used earlier in this paper in the Recursive Queries and Common Table Expressions section (Employees Organizational Chart scenario). In the same database, you also create the following Departments table:
CREATE TABLE Departments
(
deptid INT NOT NULL PRIMARY KEY,
deptname VARCHAR(25) NOT NULL,
deptmgrid INT NULL REFERENCES Employees
)
SET NOCOUNT ON
INSERT INTO Departments VALUES(1, 'HR', 2)
INSERT INTO Departments VALUES(2, 'Marketing', 7)
INSERT INTO Departments VALUES(3, 'Finance', 8)
INSERT INTO Departments VALUES(4, 'R&D', 9)
INSERT INTO Departments VALUES(5, 'Training', 4)
INSERT INTO Departments VALUES(6, 'Gardening', NULL)
Most departments have a manager ID that corresponds to an employee in the Employees table, but as in the case of the Gardening department, it is possible for a department to have no manager. Note that a manager in the Employees table does necessarily also manage a department. The following table-valued function accepts an employee ID as an argument and returns that employee and all of his or her subordinates in all levels:
CREATE FUNCTION dbo.fn_getsubtree(@empid AS INT) RETURNS @TREE TABLE
(
empid INT NOT NULL,
empname VARCHAR(25) NOT NULL,
mgrid INT NULL,
lvl INT NOT NULL
)
AS
BEGIN
WITH Employees_Subtree(empid, empname, mgrid, lvl)
AS
(
-- Anchor Member (AM)
SELECT empid, empname, mgrid, 0
FROM employees
WHERE empid = @empid
UNION all
-- Recursive Member (RM)
SELECT e.empid, e.empname, e.mgrid, es.lvl+1
FROM employees AS e
JOIN employees_subtree AS es
ON e.mgrid = es.empid
)
INSERT INTO @TREE
SELECT * FROM Employees_Subtree
RETURN
END
GO
To return all of the subordinates in all levels for the manager of each department, use the following query:
SELECT *
FROM Departments AS D
CROSS APPLY fn_getsubtree(D.deptmgrid) AS ST
Here is the result set:
deptid deptname deptmgrid empid empname mgrid lvl
----------- ---------- ----------- ----------- ---------- ----------- ---
1 HR 2 2 Andrew 1 0
1 HR 2 5 Steven 2 1
1 HR 2 6 Michael 2 1
2 Marketing 7 7 Robert 3 0
2 Marketing 7 11 David 7 1
2 Marketing 7 12 Ron 7 1
2 Marketing 7 13 Dan 7 1
2 Marketing 7 14 James 11 2
3 Finance 8 8 Laura 3 0
4 R&D 9 9 Ann 3 0
5 Training 4 4 Margaret 1 0
5 Training 4 10 Ina 4 1
There are two things to notice here. First, each row from Departments is duplicated as many times as there are rows returned from fn_getsubtree for the department's manager. Second, the Gardening department does not appear in the result because fn_getsubtree returned an empty set for it.
Another practical use of the CROSS APPLY operator answers a common request: returning n rows for each group. For example, the following function returns the requested number of most recent orders for a given customer:
USE AdventureWorks
GO
CREATE FUNCTION fn_getnorders(@custid AS INT, @n AS INT)
RETURNS TABLE
AS
RETURN
SELECT TOP(@n) *
FROM Sales.SalesOrderHeader
WHERE CustomerID = @custid
ORDER BY OrderDate DESC
GO
Using the CROSS APPLY operator, you can get the two most recent orders for each customer by using the following simple query:
SELECT O.*
FROM Sales.Customer AS C
CROSS APPLY fn_getnorders(C.CustomerID, 2) AS O
For more information about the enhancement to TOP, see "TOP Enhancements" later in this paper.
OUTER APPLY
OUTER APPLY is very similar to CROSS APPLY, but it also returns rows from the outer table for which the table-valued function returned an empty set. Null values are returned as the column values that correspond to the columns of the table-valued function. For example, revise the query against the Departments table from the previous section to use OUTER APPLY instead of CROSS APPLY and notice the last row in the output:
SELECT *
FROM Departments AS D
OUTER APPLY fn_getsubtree(D.deptmgrid) AS ST
Here is the result set:
deptid deptname deptmgrid empid empname mgrid lvl
----------- ---------- ----------- ----------- ---------- ----------- ---
1 HR 2 2 Andrew 1 0
1 HR 2 5 Steven 2 1
1 HR 2 6 Michael 2 1
2 Marketing 7 7 Robert 3 0
2 Marketing 7 11 David 7 1
2 Marketing 7 12 Ron 7 1
2 Marketing 7 13 Dan 7 1
2 Marketing 7 14 James 11 2
3 Finance 8 8 Laura 3 0
4 R&D 9 9 Ann 3 0
5 Training 4 4 Margaret 1 0
5 Training 4 10 Ina 4 1
6 Gardening NULL NULL NULL NULL NULL
Table-Valued Functions in Correlated Subqueries
In SQL Server 2000, you cannot refer to table-valued functions within a correlated subquery. In conjunction with providing the APPLY relational operator, this restriction is removed in SQL Server 2005 Beta 2. Now, within a subquery, you can provide a table-valued function with columns from the outer query as arguments. For example, if you want to return only those departments whose manager has at least three employees, you can write the following query:
SELECT *
FROM Departments AS D
WHERE (SELECT COUNT(*)
FROM fn_getsubtree(D.deptmgrid)) >= 3
deptid deptname deptmgrid
----------- ------------------------- -----------
1 HR 2
2 Marketing 7