Querying SQL directly to pull out data can be incredibly powerful, particularly when dealing with aggregated data. If the data has more than one dimension, the standard with SQL is to stack the dimensions into rows, which may be handy from a way to handle raw data but becomes harder when we want to display the data in a more intuitive tabular manner. Enter the pivot!
Round 1 - Static Pivots
Let’s say that we have some gaming stats that are stored in our database like so:
gameId | Date | Player | Game | WinFlag |
---|---|---|---|---|
1 | 2020-01-20 | Jack | Carcassonne | Y |
2 | 2020-01-20 | Kate | Carcassonne | N |
3 | 2020-01-21 | Alan | Citadels | Y |
If we return by day how many games each player won then the query would look something like this with the results being returned in the following format:
SELECT Player, Date, COUNT(gameId) AS WinCount FROM GameStats WHERE WinFlag = 'Y' GROUP BY Player, Date
Player | Date | WinCount |
---|---|---|
Jack | 2020-01-20 | 1 |
Kate | 2020-01-20 | 2 |
Kate | 2020-01-21 | 3 |
Alan | 2020-01-21 | 2 |
This shows each combination of our dimensions (Player and then Date) in rows by default. That’s fine, but not really the most intuitive grid of data to display. We’d be after something more like below where each row represents a date, each player has their own column and how many game’s they’ve won per day is the column value:
Date | Jack | Kate | Alan |
---|---|---|---|
2020-01-20 | 1 | 2 | 0 |
2020-01-21 | 0 | 3 | 2 |
We can modify our query as follows:
SELECT * FROM ( SELECT Player, Date, COUNT(gameId) AS WinCount FROM GameStats WHERE WinFlag = 'Y' GROUP BY Player, Date ) AS p PIVOT ( SUM(WinCount) FOR Player IN ([Jack], [Alan], [Kate]) ) AS pvt
So what have we done? Well firstly we’ve taken our previous query and wrapped it into an alias (p here) before passing it into our pivot operation which takes in 2 parameters. The first parameter is how you want to aggregate the data - as we’ve already got our data counted up, summing it here will produce the same result. The second is defining what our columns need to be - the individual names of our players.
So we’ve managed to leverage the power of the database to pivot our multi-dimensional date into a more user-friendly table. However there’s a slight hitch, we’ve had to hard-code the column values (the player names) what if we want new players to appear when we start recording their data? It’s not very practical to be constantly updating this query with new data. For that we’re going to have to get a bit more dynamic…
Round 2 - Dynamic Pivots
Building on the query above, we’ve now going to generate our column values from the data itself before passing it into the pivot operation.
DECLARE @ColumnName AS NVARCHAR(MAX) DECLARE @query AS NVARCHAR(MAX) --We'll need this later SELECT @ColumnName = ISNULL(@ColumnName + ',', '') +QUOTENAME(Player) FROM (SELECT DISTINCT Player FROM GameStats ) AS Player => [Alan],[Jack],[Kate]
I mean… that’s a lot to unpack there. We’re able to stuff a single variable
(@ColumnName
) with an array of strings by effectively looping through our distinct
player list and concatenating the results together. Not the most intuitive
syntax but let’s forgive SQL here and move on to our main query. Here we wrapped
our pivot query from above in one giant string so we can concatenate it with our
column results stored in @ColumnName
SET @query = 'SELECT * FROM ( SELECT Player, Date, COUNT(gameId) AS WinCount FROM GameStats WHERE WinFlag = ''Y'' GROUP BY Player, Date ) AS p PIVOT ( SUM(WinCount) FOR Player IN (' + @ColumnName + ') ) AS pvt'
Now let’s execute that query:
EXEC sp_executesql @query
You should have the same results but your query is far more adaptive to new data! Full query to get you up and running below:
DECLARE @ColumnName AS NVARCHAR(MAX) DECLARE @query AS NVARCHAR(MAX) DROP TABLE IF EXISTS ##GameStats CREATE TABLE ##GameStats( gameId INT, Date SMALLDATETIME, Player VARCHAR(200), Game VARCHAR(200), WinFlag CHAR(1) ) INSERT INTO ##GameStats VALUES (1, '20200120', 'Jack', 'Carcassonne', 'Y'), (2, '20200120', 'Kate', 'Carcassonne', 'N'), (2, '20200120', 'Kate', 'Citadels', 'Y'), (2, '20200120', 'Jack', 'Citadels', 'N'), (2, '20200120', 'Alan', 'Tsuro', 'N'), (2, '20200120', 'Kate', 'Tsuro', 'Y'), (3, '20200121', 'Alan', 'Citadels', 'Y'), (3, '20200121', 'Jack', 'Citadels', 'N'), (3, '20200121', 'Alan', 'Love Letter', 'Y'), (3, '20200121', 'Kate', 'Love Letter', 'N'), (3, '20200121', 'Kate', 'Las Vegas', 'Y'), (3, '20200121', 'Alan', 'Las Vegas', 'N'), (3, '20200121', 'Kate', 'Zombie Dice', 'Y'), (3, '20200121', 'Jack', 'Zombie Dice', 'N'), (3, '20200121', 'Kate', 'Splendor', 'Y'), (3, '20200121', 'Jack', 'Splendor', 'N'); SELECT @ColumnName = ISNULL(@ColumnName + ',', '') +QUOTENAME(Player) FROM (SELECT DISTINCT Player FROM ##GameStats ) AS Player SET @query = 'SELECT * FROM ( SELECT Player, Date, COUNT(gameId) AS WinCount FROM ##GameStats WHERE WinFlag = ''Y'' GROUP BY Player, Date ) AS p PIVOT ( SUM(WinCount) FOR Player IN (' + @ColumnName + ') ) AS pvt' EXEC sp_executesql @query