
Pandas diff

The diff method in Python Pandas calculates the difference between rows or between columns. In this article we will go over 9 examples of using it in different ways.

If you would like to watch a YouTube video based around the written tutorial, it is embedded below. We also have other Pandas videos on the channel!

Before we jump into the tutorial, make sure to import both Pandas and NumPy.

				
import pandas as pd
import numpy as np

For the first half of the tutorial, we will be utilizing this dataframe based around YouTube views on different videos.

				
					
df_YouTube = pd.DataFrame({
    'YouTube Views': [125, 800, 335, 400, 1500]
})

Example 1 - Default periods=1

Let’s look at our first example. Here we use .diff() to find the difference between consecutive rows. We can pass in the optional parameter periods=1, but that is already the default: each row has the row directly above it subtracted from it.

The first row (index 0) will always be NaN (a null value), because there is no prior row to subtract from it. Later in the tutorial we will go over a few different approaches to fill in that value.

To find the views change for the other rows, we take the current row and subtract the prior row. For example, at index 1: 800 - 125 = 675. At index 2: 335 - 800 = -465. Sometimes we only want the positive difference; this will be shown in a later example.

				
df_YouTube['Views_Change'] = df_YouTube['YouTube Views'].diff()
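As a quick check of the arithmetic above, the sketch below rebuilds the same dataframe and compares .diff() with subtracting a shifted copy of the column by hand. The names views and manual are only used for illustration here.

```python
import pandas as pd

df_YouTube = pd.DataFrame({
    'YouTube Views': [125, 800, 335, 400, 1500]
})

views = df_YouTube['YouTube Views']

# diff() subtracts the previous row from the current row
df_YouTube['Views_Change'] = views.diff()

# The same calculation written out by hand with shift()
manual = views - views.shift(1)

print(df_YouTube['Views_Change'].tolist())  # [nan, 675.0, -465.0, 65.0, 1100.0]
print(manual.tolist())                      # [nan, 675.0, -465.0, 65.0, 1100.0]
```

Both lines print the same values, which is a handy way to remember what diff() is doing under the default periods=1.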

Example 2 - Periods 2

Now it’s time to pass in the periods parameter. In this example we use periods=2, which subtracts the value from two rows earlier. Because of this, the first two rows (index 0 and 1) have null values.

For the index of 2: 335 – 125 = 210

For the index of 3: 400 – 800 = -400

				
df_YouTube['Views_Change_2_Periods'] = df_YouTube['YouTube Views'].diff(periods=2)
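Here is a small sketch of the full periods=2 result on the same view counts. It also includes a negative periods value, which is not used elsewhere in this tutorial but is accepted by diff() and looks forward instead of backward. The names two_back and one_ahead are made up for this example.

```python
import pandas as pd

views = pd.Series([125, 800, 335, 400, 1500], name='YouTube Views')

# periods=2 compares each row with the row two positions above it
two_back = views.diff(periods=2)
print(two_back.tolist())   # [nan, nan, 210.0, -400.0, 1165.0]

# A negative value looks forward instead: current row minus the next row
one_ahead = views.diff(periods=-1)
print(one_ahead.tolist())  # [-675.0, 465.0, -65.0, -1100.0, nan]
```

Notice that with a negative value the null moves to the end of the column, since the last row has no following row.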

Example 3 - Abs difference

As mentioned in Example 1, sometimes we only care about the size of the change and not its direction. This is accomplished by chaining .abs(), which returns the absolute value. The absolute value is never negative.

				
df_YouTube['Views_Absolute_Difference'] = df_YouTube['YouTube Views'].diff().abs()
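To see the effect of .abs() more clearly, the sketch below places the signed and absolute differences side by side. The comparison dataframe and its column names are only for illustration.

```python
import pandas as pd

views = pd.Series([125, 800, 335, 400, 1500], name='YouTube Views')

change = views.diff()
abs_change = change.abs()

# Side-by-side comparison of signed and absolute differences
comparison = pd.DataFrame({'signed': change, 'absolute': abs_change})
print(comparison)
# signed:   NaN, 675.0, -465.0, 65.0, 1100.0
# absolute: NaN, 675.0,  465.0, 65.0, 1100.0
```

Only the -465 flips sign. The null in the first row stays null, since .abs() does not fill missing values.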

Example 4 - Diff multiple

In this example we are going to look at using diff on multiple columns to find the difference in rows. We will create a new basic dataframe based around sports card sales of Donald Bradman and Nolan Ryan.

				
monthly_card_sales = pd.DataFrame({
    'Donald Bradman': [28, 46, 33],
    'Nolan Ryan': [511, 702, 611]
})

Since both columns are numeric, we can simply call .diff() on the whole dataframe.

				
monthly_card_sales.diff()

Personally, I prefer to add new columns when finding the diff(), so let’s do that.

In the earlier examples we created one column, so a single column name was enough. To create two columns at once, we pass a list of column names, which is why there are double brackets.

				
monthly_card_sales[['Don_Bradman_diff', 'Nolan_Ryan_diff']] = monthly_card_sales.diff()
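As an alternative sketch (not the approach used above), the new column names can be generated instead of typed out, by combining add_suffix() and join(). The with_diffs name and the '_diff' suffix are choices made for this example, and the resulting column names keep the space from the original names.

```python
import pandas as pd

monthly_card_sales = pd.DataFrame({
    'Donald Bradman': [28, 46, 33],
    'Nolan Ryan': [511, 702, 611]
})

# add_suffix() renames every diff column, join() attaches them by index
with_diffs = monthly_card_sales.join(monthly_card_sales.diff().add_suffix('_diff'))

print(with_diffs.columns.tolist())
# ['Donald Bradman', 'Nolan Ryan', 'Donald Bradman_diff', 'Nolan Ryan_diff']
```

This can be convenient when there are many columns, because the list of new names never has to be kept in sync by hand.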

Example 5 - Difference across columns

Now let’s switch to finding the difference between columns. We will be creating a new dataframe that looks at the number of merchants a company signs on between Q1 and Q3 of the years 2023-2025.

				
df_merchants = pd.DataFrame({
    'Q1': [182, 270, 330],
    'Q2': [211, 220, 380],
    'Q3': [250, 230, 390]
}, index=[2023, 2024, 2025])
df_merchants.index.name = 'Year'

To switch from row differences to column differences, we have to change the axis. By default the axis is 0, which means we are finding the row difference.

Since we want the column difference, pass in axis=1. Each column then has the column to its left subtracted from it, so the first column (Q1) is all null.

				
df_merchants.diff(axis=1)
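The sketch below works through the axis=1 result on the same merchant data, so the quarter-over-quarter arithmetic can be checked by hand. The quarter_change name is only for illustration.

```python
import pandas as pd

df_merchants = pd.DataFrame({
    'Q1': [182, 270, 330],
    'Q2': [211, 220, 380],
    'Q3': [250, 230, 390]
}, index=[2023, 2024, 2025])
df_merchants.index.name = 'Year'

# Each column minus the column to its left, calculated within each year
quarter_change = df_merchants.diff(axis=1)
print(quarter_change)
# Q1 is null for every year (there is no column to its left)
# Q2 (Q2 - Q1): 29, -50, 50
# Q3 (Q3 - Q2): 39, 10, 10
```

For 2024, for example, Q2 is 220 - 270 = -50 and Q3 is 230 - 220 = 10.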

Example 6 - Time series analysis with date index

One of the most popular use cases for .diff() is working with time series data.

In fact, I believe we have used diff in every one of the time series videos and articles we have posted, so it’s an important skill to pick up early on.

In this example we will look at finding the difference in temperatures daily in Zermatt.

				
dates = pd.date_range(start='2025-04-19', periods=7, freq='D')
temps = [30, 32, 31, 35, 36, 34, 33]
df_temps = pd.DataFrame({
    'Date': dates,
    'Temperature': temps
})
df_temps.set_index('Date', inplace=True)
df_temps['Temp_Change'] = df_temps['Temperature'].diff()
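One benefit of the date index is that the daily changes can then be looked up by date. The sketch below rebuilds the same temperatures and shows two such lookups. The names change_on_22nd and biggest_rise are made up for this example.

```python
import pandas as pd

dates = pd.date_range(start='2025-04-19', periods=7, freq='D')
df_temps = pd.DataFrame({'Temperature': [30, 32, 31, 35, 36, 34, 33]}, index=dates)
df_temps.index.name = 'Date'
df_temps['Temp_Change'] = df_temps['Temperature'].diff()

print(df_temps['Temp_Change'].tolist())
# [nan, 2.0, -1.0, 4.0, 1.0, -2.0, -1.0]

# With a date index, a change can be looked up by day
change_on_22nd = df_temps.loc[pd.Timestamp('2025-04-22'), 'Temp_Change']
print(change_on_22nd)  # 4.0

# idxmax() returns the date with the largest day-over-day rise
biggest_rise = df_temps['Temp_Change'].idxmax()
print(biggest_rise.date())  # 2025-04-22
```

Keep in mind that diff() works on row positions, not on the dates themselves, so this reading as a daily change relies on the index having one row per day.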

Example 7 - Dealing with null values prior to diff

While I personally prefer filling in null values before we use .diff(), we technically do not have to.

We will create a new dataframe with two null values for this next example. It details trains and their capacity from Paris to Geneva.

				
data = {
    'Time': ['08:00', '08:15', '08:30', '08:45', '09:00', '09:15'],
    'Passengers': [120, 125, np.nan, 130, 128, np.nan]
}
nan_df = pd.DataFrame(data)

For this example, let’s fill in every missing value with 100 before we use .diff(). The diff is then calculated as if the nulls in Passengers were 100, but the Passengers column itself is left unchanged, because fillna() returns a new Series.

				
nan_df['diff'] = nan_df['Passengers'].fillna(100).diff()

Null values will not stop a diff() calculation, though. The catch is that a single null value in the source column produces one or two null values in the diff() result, depending on its location. It affects its own row and the row below it, so a null in the last row only affects one row.

				
nan_df['diff_not_filled_na'] = nan_df['Passengers'].diff()
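To make the two behaviors easy to compare, here is a sketch that runs both versions on the same passenger data and lists the resulting values.

```python
import numpy as np
import pandas as pd

nan_df = pd.DataFrame({
    'Time': ['08:00', '08:15', '08:30', '08:45', '09:00', '09:15'],
    'Passengers': [120, 125, np.nan, 130, 128, np.nan]
})

# Nulls replaced with 100 only for the calculation
nan_df['diff'] = nan_df['Passengers'].fillna(100).diff()

# Nulls left in place
nan_df['diff_not_filled_na'] = nan_df['Passengers'].diff()

print(nan_df)
# diff:               NaN, 5.0, -25.0, 30.0, -2.0, -28.0
# diff_not_filled_na: NaN, 5.0,   NaN,  NaN, -2.0,   NaN
```

The null at 08:30 wipes out two rows of the unfilled diff (08:30 and 08:45), while the null at 09:15 only affects its own row because it is the last one.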

Example 8 - Different ways to fill in the first value

Throughout the tutorial, we have brought up the first value being null many times. Let’s now look at 3 common approaches to fill it.

The first approach is bfill(). This takes the value from the row below and populates it in the row above.

				
df_temps['Filled_bfill'] = df_temps['Temp_Change'].bfill()

We can use fillna() and pass in a float or integer value. In this example I use 0.

				
df_temps['Filled_zero'] = df_temps['Temp_Change'].fillna(0)

We can also make fillna() dynamic. Let’s first find the mean of the Temp_Change column; mean() skips the null value by default.

After finding the mean, we pass that into fillna().

				
mean_change = df_temps['Temp_Change'].mean()
df_temps['Filled_mean'] = df_temps['Temp_Change'].fillna(mean_change)

All 3 approaches used are in the resulting dataframe down below.
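As a compact recap, the sketch below applies the three fills to the same temperature changes and prints the first few rows. It uses a plain Series and a helper dataframe named filled, both of which are just for illustration.

```python
import pandas as pd

temp_change = pd.Series([30, 32, 31, 35, 36, 34, 33]).diff()

filled = pd.DataFrame({
    'Temp_Change': temp_change,
    'Filled_bfill': temp_change.bfill(),                    # next valid value: 2.0
    'Filled_zero': temp_change.fillna(0),                   # constant: 0.0
    'Filled_mean': temp_change.fillna(temp_change.mean()),  # mean of the 6 changes: 0.5
})
print(filled.head(3))
```

Only the first row differs between the columns. The mean is (2 - 1 + 4 + 1 - 2 - 1) / 6 = 0.5.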

Example 9 - Groupby

Another common way of using .diff() is through a groupby. A perfect example of this is comparing running times. You wouldn’t want to compare a 5K time to a marathon time, since the difference will be a few hours! Let’s group by the event and find the difference within each one.

				
data = {
    # 'ME' (month end) needs pandas 2.2 or newer; older versions use 'M'
    'Date': pd.date_range(start='2025-01-01', periods=12, freq='ME'),
    'Event': ['5K', '10K', 'Half', 'Marathon'] * 3,
    'Time': [25, 55, 110, 240, 24, 54, 108, 238, 23, 52, 107, 237]
}
df_running = pd.DataFrame(data)
df_running.sort_values(by=['Event', 'Date'], inplace=True)
df_running.reset_index(drop=True, inplace=True)

The code below groups by the event and finds the difference in time.

				
df_running['Time_Change'] = df_running.groupby('Event')['Time'].diff()
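To show why the groupby matters, the sketch below compares the grouped diff with a plain diff on the same sorted data. Two things here are assumptions for the example: the Ungrouped_Change column is added only for the comparison, and freq='MS' (month start) is swapped in so the snippet also runs on older pandas versions.

```python
import pandas as pd

df_running = pd.DataFrame({
    # 'MS' (month start) is used here only for compatibility with older pandas
    'Date': pd.date_range(start='2025-01-01', periods=12, freq='MS'),
    'Event': ['5K', '10K', 'Half', 'Marathon'] * 3,
    'Time': [25, 55, 110, 240, 24, 54, 108, 238, 23, 52, 107, 237]
})
df_running = df_running.sort_values(by=['Event', 'Date']).reset_index(drop=True)

# Grouped: the first row of every event is null
df_running['Time_Change'] = df_running.groupby('Event')['Time'].diff()

# Ungrouped: the first row of an event is compared with the previous event
df_running['Ungrouped_Change'] = df_running['Time'].diff()

print(df_running[['Event', 'Time', 'Time_Change', 'Ungrouped_Change']])
```

At index 3, the first 5K row, the ungrouped diff is 25 - 52 = -27 because it subtracts the last 10K time, while the grouped diff is null since there is no earlier 5K to compare against.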

Final Thoughts

Thank you for checking out this article on Pandas diff. If you want to learn more about Pandas, check out our other articles on the website.

Ryan Nolan

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.
