There are a couple of ways to Pivot your data in SQL Server. The first is by using PIVOT
, and the other is using what is known as a cross-tab. Personally I prefer the latter as it's more flexible and feel it's easier to understand. I'll explain in a later article why a cross-tab is more flexible, so I'll be covering both methods in this article.
In a addition to pivoting, you also have unpivoting. Again, you have two options here, using UNPIVOT
or using a VALUES
constructor. Again, I prefer the latter for the same reasons but I'll be covering both.
Pivoting
Although not always ideal, there are times when you want to pivot (or in Excel terms "Transpose") your data, so that the rows become columns. Pivoting is a form of aggregation; countless times I have seen people ask how they pivot without aggregating. Regardless of what people think (in that you are "flipping" the data), what doesn't change is that in SQL Server you need to use some kind of aggregate function in order to pivot your data. For the examples you'll see below I'll be using COUNT
, however, if you're pivoting strings the function you'll need will likely be MAX
(which is still an aggregate function).
Take some data that looks like this:
CustomerName |DateActive
-------------|----------
Steve |2019-12-01
Jane |2019-12-01
Trevor |2019-12-02
Jane |2019-12-03
Jeff |2019-12-03
Jane |2019-12-04
Trevor |2019-12-04
Steve |2019-12-05
You want to Pivot the data, so that the date is a row, and each person has a column, with a 1
for days they were active, and 0
if they were inactive. So it ends up looking something like this:
Date |Steve |Jane |Trevor |Jeff
-----------|------|-----|-------|----
2019-12-01 |1 |1 |0 |0
2019-12-02 |0 |0 |1 |0
2019-12-03 |0 |1 |0 |1
2019-12-04 |0 |0 |1 |0
2019-12-05 |1 |0 |0 |0
Sample data
For those wanting to replicate the scenario, here's some DDL and DML for you:
CREATE TABLE dbo.ToPivot (CustomerName varchar(6) NOT NULL,
DateActive date NOT NULL)
INSERT INTO dbo.ToPivot (CustomerName,DateActive)
VALUES('Steve','2019-12-01'),
('Jane','2019-12-01'),
('Trevor','2019-12-02'),
('Jane','2019-12-03'),
('Jeff','2019-12-03'),
('Jane','2019-12-04'),
('Trevor','2019-12-04'),
('Steve','2019-12-05');
Using PIVOT
Like I said at the start, I prefer a cross-tab over the code PIVOT
operator, however, you can pivot the above data with some T-SQL that looks like this:
SELECT DateActive AS [Date],
Steve,
Jane,
Trevor,
Jeff
FROM (SELECT CustomerName,
DateActive
FROM dbo.ToPivot) TP
PIVOT(COUNT(CustomerName)
FOR CustomerName IN (Steve,Jane,Trevor,Jeff)) P;
This works, but the syntax does feel a little odd and clunky to me, therefore I understand why many (including myself) find it difficult to read. There are really 3 parts to the query. The first is in the middle, the subquery, which is the query against your table. In this case that's the query below:
SELECT CustomerName,
DateActive
FROM dbo.ToPivot
This simply gets the data from the table in its raw format, and that is the data we're going to be pivoting. The second "part" is the PIVOT
, which is at the bottom of the query. This starts with the column you want to aggregate, which will be the values in the new columns, and the values you're going to pivot, which will become your new columns. So for the above we have this:
PIVOT(COUNT(CustomerName)
FOR CustomerName IN (Steve,Jane,Trevor,Jeff)) P;
This means we want to aggregate CustomerName
, by counting the times it appears, and pivot those into the columns Steve
, Jane
, Trevor
, and Jeff
. The exclusion of DateActive
means that this is effectively the column you are grouping by. If you're editting upper SELECT
, you'll notice that DateActive
can still be referenced, however, PIVOT
.
Using a cross-tab
The idea of a cross-tab is to use conditional aggregation with a CASE
expression. This means that you only include the rows you're pivoting in the conditional aggregate. For what we have here, then a cross-tab query would look like this:
SELECT DateActive AS [Date],
COUNT(CASE CustomerName WHEN 'Steve' THEN 1 END) AS Steve,
COUNT(CASE CustomerName WHEN 'Jane' THEN 1 END) AS Jane,
COUNT(CASE CustomerName WHEN 'Trevor' THEN 1 END) AS Trevor,
COUNT(CASE CustomerName WHEN 'Jeff' THEN 1 END) AS Jeff
FROM dbo.ToPivot
GROUP BY DateActive;
Personally, like I said, I find this a lot easier to read. There's no subquery, and the pivoting is done within the SELECT
at the start, not at the bottom of the query (as is the case with PIVOT
). Each CASE
expression only returns a value when the value of CustomerName
is correct, which here is the same name as the alias we give the column. As NULL
values are eliminated when using aggregate functions, only rows where the Customer's name is the one we want will be counted.
Unpivoting
Unpivoting is, unsurprisingly, more of less opposite of what we have before (there can be slight differences). Instead of having many rows, you have many columns that you want to convert to rows. I often find that unpivoting is really useful when you have denormalised data and need to fix it. For example you have some data like the below:
ID |Name |Email |Email2 |Email3
---|------|----------------------|-------------------------|--------------------------
1 |Steve |steve@ymail.com |stevesmith@work.org |stevetheboywonder@live.net
2 |Jane |jane.b@gmail.com |jane.bloggs@theoffice.uk |NULL
3 |Chris |the_big_boss@live.com |NULL |NULL
Really, a customer should have one row per email address, not multiple columns. This is actually a type of design many of us see often, and the question I often raise is "What happens when a customer has 4 Email addresses?" Of course, the answer is almost always "That'll never happen", and unsurprisingly people are very often proved wrong shortly afterwards. What the data should look like is something like this:
EmailID |UserID |EmailAddress
--------|-------|---------------------------
1 |1 |steve@ymail.com
2 |1 |stevesmith@work.org
3 |1 |stevetheboywonder@live.net
4 |2 |jane.b@gmail.com
5 |2 |jane.bloggs@theoffice.uk
6 |3 |Chris the_big_boss@live.com
I've intentionally omitted Name
, as they should be in a different table, and the 2 tables are related by the value of UserID
, therefore we have a many to one relationship. Including Name
in the above dataset would be a different type of denormalised data, so why fix one problem while creating another?
Sample Data
Again, here is some DDL and DML for the data for others:
CREATE TABLE dbo.ToUnpivot (ID int NOT NULL,
[Name] nvarchar(20) NOT NULL,
Email nvarchar(128) NULL,
Email2 nvarchar(128) NULL,
Email3 nvarchar(128) NULL);
INSERT INTO dbo.ToUnpivot (ID,
[Name],
Email,
Email2,
Email3)
VALUES(1,'Steve','steve@ymail.com','stevesmith@work.org','stevetheboywonder@live.net'),
(2,'Jane','jane.b@gmail.com','jane.bloggs@theoffice.uk',NULL),
(3,'Chris','the_big_boss@live.com',NULL,NULL);
Using UNPIVOT
The syntax for UNPIVOT
is very similar to PIVOT
, as be be seen with the solution below:
SELECT ROW_NUMBER() OVER (ORDER BY U.ID,U.EmailNo) AS EmailID,
U.ID AS UserID,
U.EmailAddress
FROM (SELECT ID,
[Name],
[Email],
[Email2],
[Email3]
FROM dbo.ToUnpivot) TU
UNPIVOT(EmailAddress FOR EmailNo
IN(Email,Email2,Email3)) U;
Again this starts with with the sub query in the middle (which I won't explain). Then we have the UNPIVOT
on the bottom. Here EmailAddress
is the name of the column we want to have the values in, and EmailNo
is where the name of column will be returned, which is an nvarchar(128)
(not a sysname
interestingly). As I want a new ID per Email Address, I use ROW_NUMBER
to number each one, ordered by the value of the User's ID and EmailNo
, and return the value of the the User ID and Email Address is separate column.
Again, I don't find this particularly intuitive. EmailAddress FOR EmailNo
just doesn't read well for me either, which (I feel) makes the code difficult to understand unless you're familiar with the syntax.
Using VALUES
The syntax for using VALUES
to unpivot your data isn't really that similar to the cross-tab, however, I do feel it has a similar idea "under the hood". Instead you use VALUES
in the FROM
to create new rows per column:
SELECT ROW_NUMBER() OVER (ORDER BY UP.ID, V.EmailNo) AS EmailID,
UP.ID AS UserID,
V.EmailAddress
FROM dbo.ToUnpivot UP
CROSS APPLY (VALUES(1,UP.Email),
(2,UP.Email2),
(3,UP.Email3))V(EmailNo,EmailAddress)
WHERE V.EmailAddress IS NOT NULL;
Here the dataset constructed using VALUES
has 2 columns. The first is a effective the Email Address Number, with the first column getting the number 1, the 2nd 2, and the 3rd 3. This could easily be something else, such as the column's name (like it was when using UNPIVOT
), however I felt it important to use an int
here. With UNPIVOT
the data type returned for EmailNo
was an nvarchar
, meaning that the ordering isn't the same as a numerical data type. Although not relevant to the sample data we had here, if you have a columnEmail10
, then ordering the data by EmailNo
would place Email10
before Email2
; as '2'
has a greater value than '10'
when working with string data types. This is just another reason why using the correct data type is so important. The second column declared in the the VALUES
construct is the value of the relevant columns.
Notice that here I include a WHERE
to eliminate any NULL
values. When using UNIVOT
it will automatically eliminate these, however, VALUES
will not, so you need to do so using the WHERE
. In the SELECT
you can also see that I have used the same technique with ROW_NUMBER
as I did before, to give each Email Address a unique ID.
Next Steps
The above is just an introduction to two different ways the (un)pivot your data using T-SQL. Although I prefer the cross-tab/VALUES
methods, that doesn't mean you should use them, however, in the next article I'll be performing more complex transformations, which are a lot harder to achieve when using the PIVOT
/UNPIVOT
operators.
Part of the Pivoting and Unpivoting Post Series.