There are a couple of ways to Pivot your data in SQL Server. The first is by using
PIVOT, and the other is using what is known as a cross-tab. Personally I prefer the latter as it’s more flexible and feel it’s easier to understand. I’ll explain in a later article why a cross-tab is more flexible, so I’ll be covering both methods in this article.
In a addition to pivoting, you also have unpivoting. Again, you have two options here, using
UNPIVOT or using a
VALUES constructor. Again, I prefer the latter for the same reasons but I’ll be covering both.
Although not always ideal, there are times when you want to pivot (or in Excel terms “Transpose”) your data, so that the rows become columns. Pivoting is a form of aggregation; countless times I have seen people ask how they pivot without aggregating. Regardless of what people think (in that you are “flipping” the data), what doesn’t change is that in SQL Server you need to use some kind of aggregate function in order to pivot your data. For the examples you’ll see below I’ll be using
COUNT, however, if you’re pivoting strings the function you’ll need will likely be
MAX (which is still an aggregate function).
Take some data that looks like this:
CustomerName |DateActive -------------|---------- Steve |2019-12-01 Jane |2019-12-01 Trevor |2019-12-02 Jane |2019-12-03 Jeff |2019-12-03 Jane |2019-12-04 Trevor |2019-12-04 Steve |2019-12-05
You want to Pivot the data, so that the date is a row, and each person has a column, with a
1 for days they were active, and
0 if they were inactive. So it ends up looking something like this:
Date |Steve |Jane |Trevor |Jeff -----------|------|-----|-------|---- 2019-12-01 |1 |1 |0 |0 2019-12-02 |0 |0 |1 |0 2019-12-03 |0 |1 |0 |1 2019-12-04 |0 |0 |1 |0 2019-12-05 |1 |0 |0 |0
For those wanting to replicate the scenario, here’s some DDL and DML for you:
CREATE TABLE dbo.ToPivot (CustomerName varchar(6) NOT NULL, DateActive date NOT NULL) INSERT INTO dbo.ToPivot (CustomerName,DateActive) VALUES('Steve','2019-12-01'), ('Jane','2019-12-01'), ('Trevor','2019-12-02'), ('Jane','2019-12-03'), ('Jeff','2019-12-03'), ('Jane','2019-12-04'), ('Trevor','2019-12-04'), ('Steve','2019-12-05');
Like I said at the start, I prefer a cross-tab over the code
PIVOT operator, however, you can pivot the above data with some T-SQL that looks like this:
SELECT DateActive AS [Date], Steve, Jane, Trevor, Jeff FROM (SELECT CustomerName, DateActive FROM dbo.ToPivot) TP PIVOT(COUNT(CustomerName) FOR CustomerName IN (Steve,Jane,Trevor,Jeff)) P;
This works, but the syntax does feel a little odd and clunky to me, therefore I understand why many (including myself) find it difficult to read. There are really 3 parts to the query. The first is in the middle, the subquery, which is the query against your table. In this case that’s the query below:
SELECT CustomerName, DateActive FROM dbo.ToPivot
This simply gets the data from the table in its raw format, and that is the data we’re going to be pivoting. The second “part” is the
PIVOT, which is at the bottom of the query. This starts with the column you want to aggregate, which will be the values in the new columns, and the values you’re going to pivot, which will become your new columns. So for the above we have this:
PIVOT(COUNT(CustomerName) FOR CustomerName IN (Steve,Jane,Trevor,Jeff)) P;
This means we want to aggregate
CustomerName, by counting the times it appears, and pivot those into the columns
Jeff. The exclusion of
DateActive means that this is effectively the column you are grouping by. If you’re editting upper
SELECT, you’ll notice that
DateActive can still be referenced, however,
Using a cross-tab
The idea of a cross-tab is to use conditional aggregation with a
CASE expression. This means that you only include the rows you’re pivoting in the conditional aggregate. For what we have here, then a cross-tab query would look like this:
SELECT DateActive AS [Date], COUNT(CASE CustomerName WHEN 'Steve' THEN 1 END) AS Steve, COUNT(CASE CustomerName WHEN 'Jane' THEN 1 END) AS Jane, COUNT(CASE CustomerName WHEN 'Trevor' THEN 1 END) AS Trevor, COUNT(CASE CustomerName WHEN 'Jeff' THEN 1 END) AS Jeff FROM dbo.ToPivot GROUP BY DateActive;
Personally, like I said, I find this a lot easier to read. There’s no subquery, and the pivoting is done within the
SELECT at the start, not at the bottom of the query (as is the case with
CASE expression only returns a value when the value of
CustomerName is correct, which here is the same name as the alias we give the column. As
NULL values are eliminated when using aggregate functions, only rows where the Customer’s name is the one we want will be counted.
Unpivoting is, unsurprisingly, more of less opposite of what we have before (there can be slight differences). Instead of having many rows, you have many columns that you want to convert to rows. I often find that unpivoting is really useful when you have denormalised data and need to fix it. For example you have some data like the below:
ID |Name |Email |Email2 |Email3 ---|------|----------------------|-------------------------|-------------------------- 1 |Steve |firstname.lastname@example.org |email@example.com |firstname.lastname@example.org 2 |Jane |email@example.com |firstname.lastname@example.org |NULL 3 |Chris |email@example.com |NULL |NULL
Really, a customer should have one row per email address, not multiple columns. This is actually a type of design many of us see often, and the question I often raise is “What happens when a customer has 4 Email addresses?” Of course, the answer is almost always “That’ll never happen”, and unsurprisingly people are very often proved wrong shortly afterwards. What the data should look like is something like this:
EmailID |UserID |EmailAddress --------|-------|--------------------------- 1 |1 |firstname.lastname@example.org 2 |1 |email@example.com 3 |1 |firstname.lastname@example.org 4 |2 |email@example.com 5 |2 |firstname.lastname@example.org 6 |3 |Chris email@example.com
I’ve intentionally omitted
Name, as they should be in a different table, and the 2 tables are related by the value of
UserID, therefore we have a many to one relationship. Including
Name in the above dataset would be a different type of denormalised data, so why fix one problem while creating another?
Again, here is some DDL and DML for the data for others:
CREATE TABLE dbo.ToUnpivot (ID int NOT NULL, [Name] nvarchar(20) NOT NULL, Email nvarchar(128) NULL, Email2 nvarchar(128) NULL, Email3 nvarchar(128) NULL); INSERT INTO dbo.ToUnpivot (ID, [Name], Email, Email2, Email3) VALUES(1,'Steve','firstname.lastname@example.org','email@example.com','firstname.lastname@example.org'), (2,'Jane','email@example.com','firstname.lastname@example.org',NULL), (3,'Chris','email@example.com',NULL,NULL);
The syntax for
UNPIVOT is very similar to
PIVOT, as be be seen with the solution below:
SELECT ROW_NUMBER() OVER (ORDER BY U.ID,U.EmailNo) AS EmailID, U.ID AS UserID, U.EmailAddress FROM (SELECT ID, [Name], [Email], [Email2], [Email3] FROM dbo.ToUnpivot) TU UNPIVOT(EmailAddress FOR EmailNo IN(Email,Email2,Email3)) U;
Again this starts with with the sub query in the middle (which I won’t explain). Then we have the
UNPIVOT on the bottom. Here
EmailAddress is the name of the column we want to have the values in, and
EmailNo is where the name of column will be returned, which is an
nvarchar(128) (not a
sysname interestingly). As I want a new ID per Email Address, I use
ROW_NUMBER to number each one, ordered by the value of the User’s ID and
EmailNo, and return the value of the the User ID and Email Address is separate column.
Again, I don’t find this particularly intuitive.
EmailAddress FOR EmailNo just doesn’t read well for me either, which (I feel) makes the code difficult to understand unless you’re familiar with the syntax.
The syntax for using
VALUES to unpivot your data isn’t really that similar to the cross-tab, however, I do feel it has a similar idea “under the hood”. Instead you use
VALUES in the
FROM to create new rows per column:
SELECT ROW_NUMBER() OVER (ORDER BY UP.ID, V.EmailNo) AS EmailID, UP.ID AS UserID, V.EmailAddress FROM dbo.ToUnpivot UP CROSS APPLY (VALUES(1,UP.Email), (2,UP.Email2), (3,UP.Email3))V(EmailNo,EmailAddress) WHERE V.EmailAddress IS NOT NULL;
Here the dataset constructed using
VALUES has 2 columns. The first is a effective the Email Address Number, with the first column getting the number 1, the 2nd 2, and the 3rd 3. This could easily be something else, such as the column’s name (like it was when using
UNPIVOT), however I felt it important to use an
int here. With
UNPIVOT the data type returned for
EmailNo was an
nvarchar, meaning that the ordering isn’t the same as a numerical data type. Although not relevant to the sample data we had here, if you have a column
Email10, then ordering the data by
EmailNo would place
'2' has a greater value than
'10' when working with string data types. This is just another reason why using the correct data type is so important. The second column declared in the the
VALUES construct is the value of the relevant columns.
Notice that here I include a
WHERE to eliminate any
NULL values. When using
UNIVOT it will automatically eliminate these, however,
VALUES will not, so you need to do so using the
WHERE. In the
SELECT you can also see that I have used the same technique with
ROW_NUMBER as I did before, to give each Email Address a unique ID.
The above is just an introduction to two different ways the (un)pivot your data using T-SQL. Although I prefer the cross-tab/
VALUES methods, that doesn’t mean you should use them, however, in the next article I’ll be performing more complex transformations, which are a lot harder to achieve when using the
Part of the Pivoting and Unpivoting Post Series.