SQL Server 2005 has lots of new tricks up its sleeve. During our user group meeting this past Wednesday night there was a question about how to use one such enhancement: the PIVOT statement. You can think of this like a two dimensional GROUP BY. It’s the same sort of idea as Excel’s pivot tables, or the Matrix control from Reporting Services. In this post we’ll walk through the process of converting a simple query with JOIN and GROUP BY to use PIVOT.
The sample code acts on the Customers, Orders, and Employees tables in the Northwind database. If you don’t have a copy of Northwind on your instance of SQL 2005, just run the script found in this .ZIP file.
So to start with, here’s a quick query that uses a JOIN to find which salespeople sold product to which customer. One row for each sale will be shown:
SELECT e.firstName,c.CompanyName
FROM Employees e
JOIN Orders o ON e.employeeID=o.employeeID
JOIN Customers c ON c.customerID=o.customerID
ORDER BY e.firstName,c.CompanyName
Compelling, but we see that there's many repeated entries in the list. We'd like to count up how many sales are at the intersection of each distinct customer and salesperson. To do this we'll add in a COUNT function and GROUP BY clause.
SELECT e.firstName,c.CompanyName, COUNT(o.orderID)
FROM Employees e
JOIN Orders o ON e.employeeID=o.employeeID
JOIN Customers c ON c.customerID=o.customerID
GROUP BY e.firstName,c.CompanyName
ORDER BY e.firstName,c.CompanyName
Now the real fun begins as we transition this into a query that uses PIVOT. Because we’re grouping by not one but two columns, this query is a good candidate for PIVOT. We’ll start simple with a query that doesn’t JOIN anything, but just references the Orders table. We’ll choose to have Employee data put into columns since there are only 9 employees. Much more manageable. In this case the numbers 1 through 9 refer to the ID of employees:
SELECT * FROM Orders
PIVOT (COUNT(employeeID) FOR employeeID IN ([1],[2],[3],[4],[5],[6],[7],[8],[9])) p
In the result set notice that the specific columns returned come from the result of the PIVOT clause and not directly from the Orders table! And that PIVOT clause brings up every column from the Orders table except for employeeID, which gets weeded out since it is used in the aggregate function COUNT. If it helps to visualize what’s going on here, it’s as if we had said “GROUP BY *” if such a thing existed. Another interesting thing about PIVOT is that there can be only one aggregate function and column referenced. In this case it’s COUNT(employeeID).
To limit the columns returned to just what we need to show and aggregate, we’ll have to use a subquery. A subquery like this is necessary in almost every PIVOT query you would ever want to run. In this case we want to see a count of orders based on customerID and employeeID, so the subquery will include those columns.
SELECT * FROM (SELECT customerID,employeeID FROM Orders) o
PIVOT (COUNT(employeeID) FOR employeeID IN ([1],[2],[3],[4],[5],[6],[7],[8],[9])) p
The aliases “o” and “p” are essential!
Although we don’t use it anywhere else in the query, the alias “o” on the subquery (or really any alias at all) is required. The reason is that an unnamed table-valued parameter cannot be used as part of the PIVOT clause. So “o” gets referenced internally by PIVOT, and is never exposed anywhere else. If we want to specify a column in any way in the rest of the query such as in the column list or when JOINing to another table, we would instead use what PIVOT spits out, which is table-valued and given the alias “p”. Even if we don’t need to reference a column emitted from PIVOT (as seen in the above query), some kind of alias is still required.
So now let’s update the column list to show friendlier names for the employeeID columns. Unfortunately each column you want to show this way needs to be hard-coded:
SELECT
customerID,
[1] as [Nancy],
[2] as [Andrew],
[3] as [Janet],
[4] as [Margaret],
[5] as [Steven],
[6] as [Michael],
[7] as [Robert],
[8] as [Laura],
[9] as [Anne]
FROM (SELECT customerID,employeeID FROM Orders) o
PIVOT (COUNT(employeeID) FOR employeeID IN ([1],[2],[3],[4],[5],[6],[7],[8],[9])) p
Our final trick will be to show friendly names for the customers by JOINing in that table, and referencing the companyName column. We’ll also do an ORDER BY with this column:
SELECT c.companyName,
[1] as [Nancy],
[2] as [Andrew],
[3] as [Janet],
[4] as [Margaret],
[5] as [Steven],
[6] as [Michael],
[7] as [Robert],
[8] as [Laura],
[9] as [Anne]
FROM (SELECT customerID,employeeID FROM Orders) o
PIVOT (COUNT(employeeID) FOR employeeID IN ([1],[2],[3],[4],[5],[6],[7],[8],[9])) p
JOIN Customers c ON p.customerID=c.customerID
ORDER BY c.CompanyName
Again note that in the JOIN we have to use “p.cusomterID” rathar than “o.customerID” since the alias “o” is only made available internally in PIVOT.
Hopefully you enjoyed this tour through the new PIVOT statement in SQL 2005. But if not, hey, at least you got this info for free!