Pandas Rolling

April 7, 2025 Ryan Nolan No comments yet

In Python Pandas a rolling function allows you to find statistics over a period. This is done through setting a window size. Let’s look at 12 different examples.

If you want to watch our YouTube video based around the tutorial, it is embedded down below.

Tutorial Prep

Before we jump into this tutorial, let’s import in a few libraries and create a dataframe.

Start by importing in Pandas and NumPy

  import pandas as pd import numpy as np

Next we are going to build a Pandas dataframe. The data we are looking at is different concerts by Billy Strings.

  data = [ ["2025-02-21", "State Farm Arena", "Atlanta", "GA", 21000], ["2025-02-22", "State Farm Arena", "Atlanta", "GA", 21000], ["2025-02-28", "Bridgestone Arena", "Nashville", "TN", 19891], ["2025-03-01", "Bridgestone Arena", "Nashville", "TN", 19891], ["2025-04-03", "St. Augustine Amphitheatre", "St. Augustine", "FL", 4700], ["2025-04-04", "St. Augustine Amphitheatre", "St. Augustine", "FL", 4700], ["2025-04-05", "St. Augustine Amphitheatre", "St. Augustine", "FL", 4700], ["2025-04-09", "Yuengling Center", "Tampa", "FL", 10411], ["2025-04-11", "Enmarket Arena", "Savannah", "GA", 9500], ["2025-04-12", "Enmarket Arena", "Savannah", "GA", 9500], ["2025-04-15", "John Paul Jones Arena", "Charlottesville", "VA", 14593], ["2025-04-17", "Koka Booth Amphitheatre", "Cary", "NC", 7000], ["2025-04-18", "Koka Booth Amphitheatre", "Cary", "NC", 7000], ["2025-04-19", "Koka Booth Amphitheatre", "Cary", "NC", 7000], ["2025-05-13", "Talking Stick Resort Amphitheatre", "Phoenix", "AZ", 20000], ["2025-05-15", "North Island Credit Union Amphitheatre", "Chula Vista", "CA", 20500], ["2025-05-16", "Hollywood Bowl", "Los Angeles", "CA", 17500], ["2025-05-18", "Toyota Amphitheatre", "Wheatland", "CA", 18500], ["2025-05-20", "Ford Idaho Center Amphitheater", "Nampa", "ID", 10500], ["2025-05-22", "ONE Spokane Stadium", "Spokane", "WA", 12000], ["2025-05-24", "RV Inn Style Resorts Amphitheater", "Ridgefield", "WA", 18000], ["2025-05-25", "Gorge Amphitheatre", "George", "WA", 27500], ["2025-05-30", "Van Andel Arena", "Grand Rapids", "MI", 10834], ["2025-05-31", "Van Andel Arena", "Grand Rapids", "MI", 10834], ["2025-06-06", "Allstate Arena", "Rosemont", "IL", 18500], ["2025-06-07", "Allstate Arena", "Rosemont", "IL", 18500], ["2025-06-11", "T-Mobile Center", "Kansas City", "MO", 18972], ["2025-06-13", "Chaifetz Arena", "St. Louis", "MO", 10600], ["2025-06-14", "Chaifetz Arena", "St. Louis", "MO", 10600], ["2025-06-20", "Rupp Arena", "Lexington", "KY", 23500], ["2025-06-21", "Rupp Arena", "Lexington", "KY", 23500], ["2025-07-04", "Germania Insurance Amphitheater", "Austin", "TX", 14000], ["2025-07-06", "Dos Equis Pavilion", "Dallas", "TX", 20000], ["2025-07-09", "Walmart AMP", "Rogers", "AR", 9500], ["2025-07-11", "Ascend Amphitheater", "Nashville", "TN", 6800], ["2025-07-12", "Ascend Amphitheater", "Nashville", "TN", 6800], ["2025-07-14", "PNC Music Pavilion", "Charlotte", "NC", 19500], ["2025-07-16", "Coastal Credit Union Music Park", "Raleigh", "NC", 20000], ["2025-07-18", "Veterans United Home Loans Amphitheater", "Virginia Beach", "VA", 20000], ["2025-07-20", "Merriweather Post Pavilion", "Columbia", "MD", 19319], ["2025-07-23", "Saratoga Performing Arts Center", "Saratoga Springs", "NY", 25100], ["2025-07-25", "Bethel Woods Center for the Arts", "Bethel", "NY", 15000], ["2025-07-27", "Xfinity Theatre", "Hartford", "CT", 30000], ["2025-07-30", "Bank of New Hampshire Pavilion", "Gilford", "NH", 9000], ["2025-08-01", "Maine Savings Amphitheater", "Bangor", "ME", 15000], ["2025-08-03", "Leader Bank Pavilion", "Boston", "MA", 5000], ["2025-08-05", "Northwell Health at Jones Beach Theater", "Wantagh", "NY", 15000], ["2025-08-07", "PNC Bank Arts Center", "Holmdel", "NJ", 17500] ]

  df = pd.DataFrame(data, columns=["Date", "Venue", "City", "State", "Capacity"])

  df.head(10)

Example 1 - Rolling Average

To take a look at the average over the last 3, we set window = 3.

  df['rolling_mean_3'] = df['Capacity'].rolling(window=3).mean()

  df.head(10)

Example 2 - Rolling Sum

Another common metric to look at is the rolling sum

  df['rolling_sum'] = df['Capacity'].rolling(window=3).sum()

  df.head()

Example 3 - Rolling Min/Max

These will be the last statistics we look at, but you can also do min/max. There are lots of other to take a look at such a STD but we will not be covering them in this tutorial.

  df['rolling_min'] = df['Capacity'].rolling(window=3).sum() df['rolling_max'] = df['Capacity'].rolling(window=3).sum()

  df.head()

Example 4 - Rolling Window Different Columns

Let’s look at using a rolling window function with multiple columns at the same time.

Before we do that, let’s generate another column. This time it’s for tickets sold for each of the venues.

  np.random.seed(42)

  df['tickets_sold'] = (df['Capacity'] * np.random.uniform(0.9, 1.0, len(df))).astype(int)

  df.head()

  df[['Rolling_mean_Capacity_5', 'Rolling_mean_Tickets_5']] = df[['Capacity', 'tickets_sold']].rolling(window=5).mean()

  df[['Rolling_mean_Capacity_5', 'Rolling_mean_Tickets_5']].head()

Example 5 - Rolling Agg

We can also perform multiple calculations on the same column at one time. Let’s set a window to 5 and use mean, sum, and std.

  rolling_stats = df['Capacity'].rolling(window=5).agg(['mean', 'sum', 'std'])

  rolling_stats.head(10)

Example 6 - Apply

Apply is quite common to use within Pandas. We can also use it with rolling. Let’s look at using a lambda function inside and take the max – min.

  df['rolling_range'] = df['tickets_sold'].rolling(window=3).apply(lambda x: x.max() - x.min())

  df[['tickets_sold', 'rolling_range']].head(10)

Example 7

Utilizing a Groupby is another common usecase when working with rolling. Let’s groupby the venue and find the mean of tickets sold.

  df['Rolling_mean_venue'] = df.groupby('Venue')['tickets_sold'].rolling(window=2).mean().reset_index(level=0, drop=True)

  df[['Venue', 'Rolling_mean_venue']]

Example 8 - Min Periods

One issue with rolling and setting the window size is null values. Min Periods allows you to fill in some of the null values by starting the calculation early instead of waiting for the full window.

  df['rolling_avg_min_periods'] = df['Capacity'].rolling(window=5, min_periods=3).mean()

  df['rolling_avg_min_periods']

Example 9

You can also start building out more advanced rolling capabilities when utilizing shift. Shift allows you to look at data from rows above or below.

  df['rolling_shift'] = df['Capacity'].rolling(window=5).mean().shift(1) - df['Capacity']

  df['rolling_shift'].head(10)

Example 10

Besides min periods we can utilize a forward fill to help populate the missing null values.

  df['rolling_filled_values'] = df['Capacity'].ffill().rolling(window=5).mean()

  df['rolling_filled_values'].head()

Example 11

Another parameter that is used within rolling is centering our data.

  df['centered_rolling_avg'] = df['Capacity'].rolling(window=5, center=True).mean()

  df['centered_rolling_avg'].head(10)

Example 12

Our last example is based around using days as our measure in rolling. The earlier examples looked at row numbers/index.

Start off by converting the date into the index of the dataframe.

  df['Date'] = pd.to_datetime(df['Date'])

  df.set_index('Date', inplace=True)

Once the date index is set, pass in a day amount. For example 7D for one week.

  df['rolling_avg_7_days'] = df['Capacity'].rolling('7D').mean()

  df['rolling_avg_7_days']

Ryan Nolan

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.