Pivot query results in PostgreSQL using crosstab

Kiptoo Korir
Dev Genius
Published in
5 min readNov 5, 2021

--

A laptop with a code editor open
Photo by Caspar Camille Rubin on Unsplash

If you have used Microsoft Excel or SQL Server you might have come across the pivot function, which allows you to easily transform your data such that you can conduct comparisons and see patterns. PostgreSQL possesses the same functionality in the crosstab function.

The two functions essentially change the orientation of data that would normally be listed vertically to appear horizontally. This tutorial will look into how to make use of this utility.

Table of contents

  1. Prerequisites
  2. Test scenario
  3. Setup the database
  4. Understanding the crosstab function
  5. Applying the crosstab function
  6. Conclusion

Prerequisites

To follow through the article, you will need a basic understanding of SQL and more so PostgreSQL.

Test scenario

The database will focus on three entities: stores, products and the sales made on the products. There are three stores and three products, with each store able to sell any of the products. Even though the sales table does not contain the created_at attribute, the assumption is that the records we have detail the sales that occurred over a single week.

The aim of the tutorial is to be able to turn this output:

Into the following output:

Setup the database

To get started, open up your PostgreSQL through whichever preferred client and create a new database.

CREATE DATABASE crosstab_tutorial;

With the database created, setup your tables. As aforementioned, there are three relations: a stores table, products table and a sales table.

With the tables created, run the query to insert the dummy data.

Understanding the crosstab function

To be able to utilize the crosstab function, the tablefunc module needs to be enabled once per database, which can be done using the command

CREATE EXTENSION IF NOT EXISTS tablefunc;

The crosstab function exists in two variations, in a simple form that takes in one input parameter and the second more expressive one takes in two input parameters.

In the simple form, crosstab is limited to exactly three columns in its output. As such it treats all values in a group alike, inserting each value into the first available column. However, if you have some null entries in your values and you want the value columns to correspond to specific categories of data, it will not work as desired.

crosstab(text sql)

This tutorial will focus on the more expressive form which takes in two input parameters

crosstab(text source_sql, text category_sql)

The two parameter variation offers a solution to the null entries issue suffered by the single parameter variation. It handles the issue by taking in an explicit list of the categories corresponding to the output columns. This explicit list is the second parameter.

The source_sql produces the source set of data. This data must include a column that will function as the row_name . This column must be first. The data must have two additional columns: the category column and the value column. The category column and the value column are always the last two.

In our test case, the row name column will be the store names, the category column will be the product names and the value column will be the sum of the units sold.

The source_sql may have extra columns, but the above three basic columns are required and in the order shown above. Extra columns will come in between the row_name column and the category column, like so:

The category_sql produces a list of our categories in the same order applied to the source_sql. Therefore if the category_sql is ordered in a specific way, the same order condition should be applied to the category column in the source_sql.

The category_sql query must produce only one column and at least one row or else an error will be generated. The result must include only distinct values, duplicate values will result in an error.

The crosstab function returns result of the type setof as such so the actual names and types of the output columns must be defined explicitly in the FROM clause and have to correspond to the types of the columns in the source_sql. The FROM clause must define the proper number of output columns and their proper data types.

If the source_sql query's result has n columns, the first n-2 of them must match up with the first n-2 columns of the output. The remaining output columns must have the type of the last column of the source_sql query's result, and there must be exactly as many of them as there are rows in the category_sql query's result.

Therefore, the output of the crosstab function should have the row name columns and any extra columns. On top of these, there should be a column for each distinct category produced by the category_sql.

Each group of rows in the source_sql query result with the same row_name value will be condensed into one row in the output result.

It is therefore important that the source_sql always include order by 1 clause so that same row_names are grouped together.

Applying the crosstab function

First we will need to enable the tablefunc module as described in the section above, using the command:

CREATE EXTENSION IF NOT EXISTS tablefunc;

Let’s look at the source_sql which will provide the result set that we need to transform. The statement that we will use will be:

When the query is run, it will produce:

As seen in the table above, the store_name column is our row_name, product_name corresponds to the category column and total_units corresponds to the value column.

As for the category_sql we will use the product names and we will obtain them using the following query:

Which will produce the categories as shown below:

With both our source sql and category sql, we can now apply the crosstab function to our source result, through the following query

This will transform our data into the format shown below:

The output value columns are filled with the value fields from rows having matching category values in the source data set. For instance, in the source dataset, Beatty Group sold 470 units of wine as per row 4 in the table, this will appear under the Beatty Group row and under the wine column.

Output columns whose matching category is not present in any input row of the group are filled with nulls.

As mentioned above, the source_sql should always specify ORDER BY 1 to ensure that values with the same row_name are grouped together. However, ordering of the categories within a group is not important.

As shown in the crosstab query, the order specified in the FROM clause should match the order produced by the category_sql.

Conclusion

Crosstab serves an important function in day to day operations and once understood, it is quite easy to use.

Find out more about crosstab function from PostgresSQl docs https://www.postgresql.org/docs/12/tablefunc.html

PS: The dummy data in this tutorial was generated using https://www.mockaroo.com/

--

--

Web Developer | Information Geek | Occasional Poet - I talk about software development and life in general.