Pandas Pivot

In this lesson, we are going over 6 different examples of how you can utilize pivot within python pandas.

Pivot allows you to reshape a dataframe and grab aggregate values quite fast in just one line of code. We will go through some east examples and then add on complexity as the lesson progresses.

If instead of an article you want to watch a YouTube video, we have one linked below on our channel that is based on the article.

Tutorial Prep

Before we jump into pivot, let’s set up everything we need for this tutorial.

Begin by brining in Pandas and Numpy. We will use Pandas for the dataframes. Numpy will be used for null values which we will explore much later in the lesson.Â

  import pandas as pd import numpy as np

To start we’re going to create a simple dataframe in python:

We will be looking at running data for famous races. The dataframe will have Year, Country, City and Runners.Â

  data = { 'Year': [2022, 2022, 2022, 2023, 2023, 2023, 2024, 2024, 2024, 2022, 2023, 2024, 2022, 2023, 2024, 2022, 2023, 2024, 2022, 2023, 2024], 'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'Germany', 'Germany', 'Germany', 'UK', 'UK', 'UK', 'Germany', 'Germany', 'Germany'], 'State': ['MA', 'NY', 'IL', 'MA', 'NY', 'IL', 'MA', 'NY', 'IL', 'England', 'England', 'England', 'Berlin', 'Berlin', 'Berlin', 'Scotland', 'Scotland', 'Scotland', 'Hamburg', 'Hamburg', 'Hamburg'], 'City': ['Boston', 'New York', 'Chicago', 'Boston', 'New York', 'Chicago', 'Boston', 'New York', 'Chicago', 'London', 'London', 'London', 'Berlin', 'Berlin', 'Berlin', 'Edinburgh', 'Edinburgh', 'Edinburgh', 'Hamburg', 'Hamburg', 'Hamburg'], 'Runners': [30000, 50000, 45000, 31000, 51000, 46000, 32000, 52000, 47000, 42000, 43000, 44000, 39000, 40000, 41000, 15000, 16000, 17000, 18000, 19000, 20000] }
  df = pd.DataFrame(data) df.head(20)

Example 1 - Beginner Pivot

Our first example is quite basic. We move the year to the index (was a column before). Then move the unique city values into columns. The numbers populated in the pivot table are the runners in each race.

  pivot_df = df.pivot(index='Year', columns='City', values='Runners') pivot_df.head(10)

Example 2 - Pivot with Magin Values

While the first pivot table was helpful, we want to now add in the totals for each row and column. This is quite easy. Change pivot to pivot_table and add in the parameters: margins and margins_name.

  pivot_with_totals = df.pivot_table(index='Year', columns='City', values='Runners', aggfunc='sum', margins=True, margins_name='Total') pivot_with_totals.head(10)

Example 3 - Pivot with Agg Functions

We can also use a pivot with aggregate functions. In this example we use aggfunc and pass in a list of what we want to see with the pivot. In this example we use sum, mean, max and min.

  multi_agg_pivot = df.pivot_table(index='Year', columns='Country', values='Runners', aggfunc=['sum', 'mean', 'max', 'min']) multi_agg_pivot.head(10)

Example 4 -

Next, lets look at how we can use a pivot with a multi index (hierarchical index). For this we are going to use a new dataframe based on age groups and different race types.

  data2 = { 'Year': [2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2024] * 4, 'Race Category': ['5K', '10K', 'Half Marathon', 'Marathon'] * 24, 'Age Group': ['18-29', '18-29', '18-29', '18-29', '30-39', '30-39', '30-39', '30-39', '40-49', '40-49', '40-49', '40-49', '50+', '50+', '50+', '50+'] * 6, 'Gender': ['Male'] * 48 + ['Female'] * 48, 'Participants': [1200, 1300, 1100, 900, 1400, 1500, 1300, 1100, 1000, 1100, 950, 800, 800, 900, 850, 700, 1300, 1400, 1200, 1000, 1500, 1600, 1400, 1200, 1100, 1200, 1050, 900, 900, 1000, 950, 800, 1400, 1500, 1300, 1100, 1600, 1700, 1500, 1300, 1200, 1300, 1150, 1000, 1000, 1100, 1050, 900] * 2 }
  df_race = pd.DataFrame(data2) df_race.head(10)

This time we use a list to pass in the index values. We want to use Year and Gender.

  multi_pivot_race = df_race.pivot_table(index=['Year', 'Gender'], columns=['Race Category', 'Age Group'], values='Participants', aggfunc='sum')

While this worked, the race distances are out of order. A 10k is 6.2 miles whereas a 5k is 3.1 We need to rearrange these.Â

  desired_order = ['5K', '10K', 'Half Marathon', 'Marathon'] multi_pivot_race = multi_pivot_race.copy() multi_pivot_race = multi_pivot_race.sort_index( axis=1, level=0, key=lambda x: [desired_order.index(i) if i in desired_order else len(desired_order) for i in x.get_level_values(0)] ) multi_pivot_race.head()

Example 5 -

In this example, we are going to take a look at a use case where we can fill in null values while doing a pivot.

While it is quite easy to fill in null values before or after the pivot, you can save yourself a line of code by using the optional parameter fill_value.

  data3 = { 'Runner': ['Ryan', 'Bob', 'Ryan', 'Bob', 'Charlie', 'Charlie', 'Ryan'], 'Race': ['Boston', 'Boston', 'NYC', 'NYC', 'Chicago', 'Boston', 'Chicago'], 'Time': [3.5, np.nan, 3.8, 4.2, 4.0, np.nan, 3.7] } df3 = pd.DataFrame(data3) df3.head()
  pivot_table_null_values = df3.pivot_table(index='Runner', columns='Race', values='Time', aggfunc='mean', fill_value="Did Not Run") pivot_table_null_values.head()

Example 6 - Multiple Values

If you have multiple values in a pivot table as the index, you cannot perform a basic pivot table. Instead you need to use some sort of aggregate.

  data_error = { 'Runner': ['Ryan', 'Ryan', 'Bob', 'Bob'], 'Race': ['Boston', 'Boston', 'NYC', 'NYC'], 'Time': [3.5, 3.6, 4.2, 4.3] # Multiple times for same race } df_error = pd.DataFrame(data_error) df_error.head()
  pivot_df_multi = df_error.pivot(index='Runner', columns='Race', values='Time')
  pivot_table_df = df_error.pivot_table(index='Runner', columns='Race', values='Time', aggfunc='mean') pivot_table_df.head(10)

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.

Leave a Reply

Your email address will not be published. Required fields are marked *