Pivoting and Unpivoting: Part 1 – An introduction

There are a couple of ways to Pivot your data in SQL Server. The first is by using PIVOT, and the other is using what is known as a cross-tab. Personally I prefer the latter as it's more flexible and feel it's easier to understand. I'll explain in a later article why a cross-tab is more flexible, so I'll be covering both methods in this article.

In a addition to pivoting, you also have unpivoting. Again, you have two options here, using UNPIVOT or using a VALUES constructor. Again, I prefer the latter for the same reasons but I'll be covering both.

Pivoting

Although not always ideal, there are times when you want to pivot (or in Excel terms "Transpose") your data, so that the rows become columns. Pivoting is a form of aggregation; countless times I have seen people ask how they pivot without aggregating. Regardless of what people think (in that you are "flipping" the data), what doesn't change is that in SQL Server you need to use some kind of aggregate function in order to pivot your data. For the examples you'll see below I'll be using COUNT, however, if you're pivoting strings the function you'll need will likely be MAX (which is still an aggregate function).

Take some data that looks like this:

CustomerName |DateActive
-------------|----------
Steve        |2019-12-01
Jane         |2019-12-01
Trevor       |2019-12-02
Jane         |2019-12-03
Jeff         |2019-12-03
Jane         |2019-12-04
Trevor       |2019-12-04
Steve        |2019-12-05

You want to Pivot the data, so that the date is a row, and each person has a column, with a 1 for days they were active, and 0 if they were inactive. So it ends up looking something like this:

Date       |Steve |Jane |Trevor |Jeff
-----------|------|-----|-------|----
2019-12-01 |1     |1    |0      |0
2019-12-02 |0     |0    |1      |0
2019-12-03 |0     |1    |0      |1
2019-12-04 |0     |0    |1      |0
2019-12-05 |1     |0    |0      |0

Sample data

For those wanting to replicate the scenario, here's some DDL and DML for you:

CREATE TABLE dbo.ToPivot (CustomerName varchar(6) NOT NULL,
                          DateActive date NOT NULL)

INSERT INTO dbo.ToPivot (CustomerName,DateActive)
VALUES('Steve','2019-12-01'),
      ('Jane','2019-12-01'),
      ('Trevor','2019-12-02'),
      ('Jane','2019-12-03'),
      ('Jeff','2019-12-03'),
      ('Jane','2019-12-04'),
      ('Trevor','2019-12-04'),
      ('Steve','2019-12-05');

Using PIVOT

Like I said at the start, I prefer a cross-tab over the code PIVOT operator, however, you can pivot the above data with some T-SQL that looks like this:

SELECT DateActive AS [Date],
       Steve,
       Jane,
       Trevor,
       Jeff
FROM (SELECT CustomerName,
             DateActive
      FROM dbo.ToPivot) TP
PIVOT(COUNT(CustomerName)
      FOR CustomerName IN (Steve,Jane,Trevor,Jeff)) P;

This works, but the syntax does feel a little odd and clunky to me, therefore I understand why many (including myself) find it difficult to read. There are really 3 parts to the query. The first is in the middle, the subquery, which is the query against your table. In this case that's the query below:

SELECT CustomerName,
       DateActive
FROM dbo.ToPivot

This simply gets the data from the table in its raw format, and that is the data we're going to be pivoting. The second "part" is the PIVOT, which is at the bottom of the query. This starts with the column you want to aggregate, which will be the values in the new columns, and the values you're going to pivot, which will become your new columns. So for the above we have this:

PIVOT(COUNT(CustomerName)
      FOR CustomerName IN (Steve,Jane,Trevor,Jeff)) P;

This means we want to aggregate CustomerName, by counting the times it appears, and pivot those into the columns Steve, Jane, Trevor, and Jeff. The exclusion of DateActive means that this is effectively the column you are grouping by. If you're editting upper SELECT, you'll notice that DateActive can still be referenced, however, CustomerName cannot, as it doesn't exist in the scope of after the PIVOT.

Using a cross-tab

The idea of a cross-tab is to use conditional aggregation with a CASE expression. This means that you only include the rows you're pivoting in the conditional aggregate. For what we have here, then a cross-tab query would look like this:

SELECT DateActive AS [Date],
       COUNT(CASE CustomerName WHEN 'Steve' THEN 1 END) AS Steve,
       COUNT(CASE CustomerName WHEN 'Jane' THEN 1 END) AS Jane,
       COUNT(CASE CustomerName WHEN 'Trevor' THEN 1 END) AS Trevor,
       COUNT(CASE CustomerName WHEN 'Jeff' THEN 1 END) AS Jeff
FROM dbo.ToPivot
GROUP BY DateActive;

Personally, like I said, I find this a lot easier to read. There's no subquery, and the pivoting is done within the SELECT at the start, not at the bottom of the query (as is the case with PIVOT). Each CASE expression only returns a value when the value of CustomerName is correct, which here is the same name as the alias we give the column. As NULL values are eliminated when using aggregate functions, only rows where the Customer's name is the one we want will be counted.

Unpivoting

Unpivoting is, unsurprisingly, more of less opposite of what we have before (there can be slight differences). Instead of having many rows, you have many columns that you want to convert to rows. I often find that unpivoting is really useful when you have denormalised data and need to fix it. For example you have some data like the below:

ID |Name  |Email                 |Email2                   |Email3
---|------|----------------------|-------------------------|--------------------------
1  |Steve |steve@ymail.com       |stevesmith@work.org      |stevetheboywonder@live.net
2  |Jane  |jane.b@gmail.com      |jane.bloggs@theoffice.uk |NULL
3  |Chris |the_big_boss@live.com |NULL                     |NULL

Really, a customer should have one row per email address, not multiple columns. This is actually a type of design many of us see often, and the question I often raise is "What happens when a customer has 4 Email addresses?" Of course, the answer is almost always "That'll never happen", and unsurprisingly people are very often proved wrong shortly afterwards. What the data should look like is something like this:

EmailID |UserID |EmailAddress
--------|-------|---------------------------
1       |1      |steve@ymail.com
2       |1      |stevesmith@work.org
3       |1      |stevetheboywonder@live.net
4       |2      |jane.b@gmail.com
5       |2      |jane.bloggs@theoffice.uk
6       |3      |Chris the_big_boss@live.com

I've intentionally omitted Name, as they should be in a different table, and the 2 tables are related by the value of UserID, therefore we have a many to one relationship. Including Name in the above dataset would be a different type of denormalised data, so why fix one problem while creating another?

Sample Data

Again, here is some DDL and DML for the data for others:

CREATE TABLE dbo.ToUnpivot (ID int NOT NULL,
                            [Name] nvarchar(20) NOT NULL,
                            Email nvarchar(128) NULL,
                            Email2 nvarchar(128) NULL,
                            Email3 nvarchar(128) NULL);


INSERT INTO dbo.ToUnpivot (ID,
                          [Name],
                          Email,
                          Email2,
                          Email3)
VALUES(1,'Steve','steve@ymail.com','stevesmith@work.org','stevetheboywonder@live.net'),
      (2,'Jane','jane.b@gmail.com','jane.bloggs@theoffice.uk',NULL),
      (3,'Chris','the_big_boss@live.com',NULL,NULL);

Using UNPIVOT

The syntax for UNPIVOT is very similar to PIVOT, as be be seen with the solution below:

SELECT ROW_NUMBER() OVER (ORDER BY U.ID,U.EmailNo) AS EmailID,
       U.ID AS UserID,
       U.EmailAddress
FROM (SELECT ID,
             [Name],
             [Email],
             [Email2],
             [Email3]
       FROM dbo.ToUnpivot) TU
UNPIVOT(EmailAddress FOR EmailNo
         IN(Email,Email2,Email3)) U;

Again this starts with with the sub query in the middle (which I won't explain). Then we have the UNPIVOT on the bottom. Here EmailAddress is the name of the column we want to have the values in, and EmailNo is where the name of column will be returned, which is an nvarchar(128) (not a sysname interestingly). As I want a new ID per Email Address, I use ROW_NUMBER to number each one, ordered by the value of the User's ID and EmailNo, and return the value of the the User ID and Email Address is separate column.

Again, I don't find this particularly intuitive. EmailAddress FOR EmailNo just doesn't read well for me either, which (I feel) makes the code difficult to understand unless you're familiar with the syntax.

Using VALUES

The syntax for using VALUES to unpivot your data isn't really that similar to the cross-tab, however, I do feel it has a similar idea "under the hood". Instead you use VALUES in the FROM to create new rows per column:

SELECT ROW_NUMBER() OVER (ORDER BY UP.ID, V.EmailNo) AS EmailID,
       UP.ID AS UserID,
       V.EmailAddress
FROM dbo.ToUnpivot UP
     CROSS APPLY (VALUES(1,UP.Email),
                        (2,UP.Email2),
                        (3,UP.Email3))V(EmailNo,EmailAddress)
WHERE V.EmailAddress IS NOT NULL;

Here the dataset constructed using VALUES has 2 columns. The first is a effective the Email Address Number, with the first column getting the number 1, the 2nd 2, and the 3rd 3. This could easily be something else, such as the column's name (like it was when using UNPIVOT), however I felt it important to use an int here. With UNPIVOT the data type returned for EmailNo was an nvarchar, meaning that the ordering isn't the same as a numerical data type. Although not relevant to the sample data we had here, if you have a columnEmail10, then ordering the data by EmailNo would place Email10 before Email2; as '2' has a greater value than '10' when working with string data types. This is just another reason why using the correct data type is so important. The second column declared in the the VALUES construct is the value of the relevant columns.

Notice that here I include a WHERE to eliminate any NULL values. When using UNIVOT it will automatically eliminate these, however, VALUES will not, so you need to do so using the WHERE. In the SELECT you can also see that I have used the same technique with ROW_NUMBER as I did before, to give each Email Address a unique ID.

Next Steps

The above is just an introduction to two different ways the (un)pivot your data using T-SQL. Although I prefer the cross-tab/VALUES methods, that doesn't mean you should use them, however, in the next article I'll be performing more complex transformations, which are a lot harder to achieve when using the PIVOT/UNPIVOT operators.

Part of the Pivoting and Unpivoting Post Series.

Leave a Reply

Your email address will not be published. Required fields are marked *