Python Pandas

Pandas MultiIndex

Working with structured data in Python often calls for more than just a flat table. When your dataset has multiple levels of information—like years and quarters, countries and cities, or products and categories—Pandas MultiIndex can be a powerful tool. It allows you to represent hierarchical relationships within your data, enabling advanced analysis, cleaner code, and more flexible reshaping.

But MultiIndex can be intimidating at first. That’s why this article walks you through 24 hands-on examples, starting from beginner-friendly basics like setting and accessing a MultiIndex, all the way to more complex topics like reshaping, slicing, merging, and visualizing multi-level data structures.

If you want to watch a video on our YouTube channel based around the tutorial it is linked below.

To start the multiindex lesson, we need to import pandas as pd

				
					import pandas as pd

Example 1

In this example, we create a Python dictionary “data_races, that stores race information, including race names, the year they occured, and a difficulty rating for each event

				
					data_races = {
    'Race': ['Badwater', 'Barkley Marathons', 'Vero Beach Ultra', 'Forgotten Florida', 'Badwater'],
    'Year': [2020, 2021, 2020, 2021, 2025],
    'Difficulty': [9.7, 9.8, 8.1, 6.1, 9.9]
}

Next, we convert the data_races dictionary into a pandas DataFrame named “df”.

				
					df = pd.DataFrame(data_races)

Then we set a multi-level index on the DataFrame using both “Race” and “Year”. This allows for more advanced selection and organization based on these two columns

				
					df = df.set_index(['Race', 'Year'])

df

Here, we create a seperate copy of the df DataFrame and store it in the df2, allowing changes to be made to df2 without affecting the original df.

				
					df2 = df.copy()

Here we perform similar operation but we store the df DataFrame in “df3”

				
					df3 = df.copy()

Example 2

This returns the index of the df DataFrame, which is now a multiindex consisting of Race and Year pairs for each row

				
					df.index

Example 3 Index Levels values

Next we retrieve all values from the first level of the multiindex. i.e the Race names in the df DataFrame

				
					df.index.get_level_values(0)

This retrieves all the values from the second level of the Multiindex i.e the Year values in the df DataFrame

				
					df.index.get_level_values(1)

Example 4 loc

In this example , we use the “loc” to select all rows in the df DataFrame where the Race level of the index is “Badwater”, returning entries for all corresponding years.

				
					df.loc['Badwater']         # returns all years for Badwater

Here we access the specific row in df where the Race is “Badwater” and the Year is 2020, returning the corresponding data for that entry

				
					df.loc[('Badwater', 2020)] # returns a specific entry

Example 5 iloc

Note: .iloc[] doesn’t care about the index labels—just the row numbers.

Here we return the first row of the DataFrame based on it’s position, which is “Badwater” in the year 2020

				
					df.iloc[0]  # First row (Badwater, 2020)

Example 6 rename index version 1

				
					df.index.names = ['Event', 'YearHeld']

This renames the MultiIndex levels in the DataFrame to 'Event' and 'YearHeld', making the index labels more descriptive.

Example 7 rename index version 2

df.rename_axis(index={‘Event’: ‘Race’, ‘YearHeld’: ‘Year’}, inplace=True)

				
					df.rename_axis(index={'Event': 'Race', 'YearHeld': 'Year'}, inplace=True)

Example 8 Swap Index Levels

Here, we swap the order of the MultiIndex levels in df, making Year the first level and Eventthe second, and stores the result in df_swapped

				
					df_swapped = df.swaplevel()

Example 9 Aggregation with Multilevel index

Here we group the DataFrame by the Race level of the Multiindex and calculate the mean of the numeric columns i.e Difficulty for each race

				
					df.groupby(level='Race').mean()

This groups the DataFrame by the Year level of the Multiindex and computes the average of numeric columns like “Difficulty” for each year.

				
					df.groupby(level='Year').mean()

Example 10 sort by multilevel index - asc

Here we sort the DataFrame based on the Race level of the Mu;ltiindex in alphabetical order, helping to organize the data by event name

				
					df.sort_index(level='Race')

Example 11 sort by multilevel index - desc

Here, we sort the DataFrame by the Year in descending order, showing the most recent years first

				
					df.sort_index(level='Year', ascending = False)

Example 12 sort by column

Here we sort the DataFrame by the Difficulty column in ascending order, from the easiest to the most difficult race.

				
					df.sort_values(by='Difficulty')

Example 13 Slicing Multiindex

This assigns a special object IndexSlice from pandas to the variable idx.

It’s used for slicing multi-level indexes (also known as a MultiIndex) in a clean and readable way

selects all values in the first level of the MultiIndex, and only 2020 in the second level.

				
					idx = pd.IndexSlice

				
					df.loc[idx[:, 2020], :]  # All races from 2020

Example 14 Filtering

Same as normal

Here we filter the DataFrame to show only the rows where the Difficulty rating is greater than 8

				
					df[df['Difficulty'] > 8]

Example 15 Update Values at

Here we update the Difficulty value to 10.0 for the entry where the Race is “Badwater” and the year is “2020”

				
					df.at[('Badwater', 2020), 'Difficulty'] = 10.0

Example 16 Add New Row

This adds a new row to the DataFrame for the race “Everglades Ultra” in the year
2020 with a Difficulty rating of 7.1

				
					df.loc[('Everglades Ultra', 2020), :] = [7.1]

Example 17 Cross-section: Query along a specific level

Here we retrieve all rows from the DataFrame where the Year level of the Multiindex is “2020”, filtering races held in that year

				
					# Get all races in 2020
df.xs(2020, level='Year')

This retrieves all rows where the Race is “Badwater” , returning data for all years that race was held

				
					#Get all years for "Badwater"
df.xs('Badwater', level='Race')

Example 18 reorder index levels

This reorders the MultiIndex so that Year comes first and Race second, then sorts the DataFrame by this new index order.

				
					df = df.reorder_levels(['Year', 'Race']).sort_index()

df

Example 19 - unstack- Moves index level(s) to columns

That moves the inner index level (YearHeld) into columns:

				
					df2.unstack(level='Race')

This changes the layout of the DataFrame by turning the Year from part of the index into column headers, showing the Difficulty for each race across different years.To start we’re going to create a simple dataframe in python:

				
					#Remember, starts at 0!
df2.unstack(level=1)

Example 20 stack

Here we create a dictionary called “data_runners” that stores runners names along with their finish times for two race types

				
					data_runners = {
    'Runner': ['Alice', 'Bob', 'Clara', 'Dave', 'Eva'],
    '50 miler': [7.5, 10.2, 8.9, 15.8, 6.3],
    '100 miler': [22.3, 35.6, 27.8, 39.1, 18.5]
}

Next we create a DataFrame from “data_runners” and set the “Runner” column as the index

				
					df4 = pd.DataFrame(data_runners).set_index('Runner')

Here we reshape the DataFrame by stacking the columns into a single column, creating a multiindex with Runner and race type

				
					df_stacked = df4.stack()

This gives a Series with a MultiIndex:

Example 21 - Reset Index / convert MultiIndex into columns

This basically unstacks the df4 DataFrame and also resets the index

				
					df_stacked_reset = df4.stack().reset_index()

This resets the index of df3, turning the index levels back into regular columns, and creates a new default integer index.

				
					df_reset_dataframe = df3.reset_index()

Example 22 MultiIndex columns

This creates a list called data_music containing two nested lists:
The first list holds album sales and their corresponding years.

The second list contains number of album released for the same periods.

				
					data_music = [
    [200000000, 1960, 120000000, 1962, 250000000, 1965],
    [8, 10, 17, 7, 26, 12]
]

This creates a MultiIndex for columns using all combinations of band names and their attributes, like “Album Sales” and “Starting Year”, to organize data in a more structured format.

				
					# create multi-level column index
bands = ['The Beatles', 'The Rolling Stones', 'Pink Floyd']
attributes = ['Album Sales', 'Starting Year']
multi_columns = pd.MultiIndex.from_product([bands, attributes], names=['Band', 'Attribute'])

This creates a DataFrame (df5) using the data_music values, sets custom row labels for the index, and applies the multi_columns MultiIndex to organize the columns by band and attribute.

				
					df5 = pd.DataFrame(data_music, index=['Worldwide Sales (est.)', 'Studio Albums'], columns=multi_columns)

Example 23 build multiindex from array

Here we create three arrays.

country: This holds country names

city: This lists corresponding cities

population: contains the population for each city

				
					# create arrays
country = ['Netherlands', 'Belgium', 'Switzerland', 'Switzerland', 'France', 'United Kingdom']
city = ['Amsterdam', 'Brussels', 'Geneva', 'Zermatt', 'Paris', 'London']
population = [821752, 185103, 203856, 5771, 2148271, 8982000]  # example populations

This combines the country and city lists into a single list of lists called index_array, which can be used to create a Multiindex for rows.

				
					# create array of arrays
index_array = [country, city]

This creates a Multiindex for rows using the country and city arrays, assigning “Country” and “City” as the index level names.

				
					# create multiindex from array
multi_index = pd.MultiIndex.from_arrays(index_array, names=['Country', 'City'])

Next we create a data frame “df6 using multi_index

				
					# create dataframe using multiindex
df6 = pd.DataFrame({'Population': population}, index=multi_index)

Example 24 build 3 level multi index

This defines lists representing baseball data

				
					# create data
years = [2022, 2022, 2022, 2023, 2023, 2023]
leagues = ['AL', 'AL', 'NL', 'AL', 'AL', 'NL']
teams = ['Yankees', 'Astros', 'Dodgers', 'Yankees', 'Astros', 'Dodgers']
wins = [99, 106, 111, 82, 90, 100]
losses = [63, 56, 51, 80, 72, 62]

This comines the years, leagues and teams lists into one list of lists called index_array, which will be used to build a multiindex for the DataFrame rows

				
					# create array of arrays
index_array = [years, leagues, teams]

This creates a Multiindex for rows using the yeahrs, leagues, and teams arrays, and labels the index levels as “Year”, “League”, and “Team”.

				
					# create multiindex
multi_index = pd.MultiIndex.from_arrays(index_array, names=['Year', 'League', 'Team'])

This creates a DataFrame “df7” with the wins and losses data , using the previously created multi_index as the row index to organize the data by year, league, and team

				
					# create dataframe
df7 = pd.DataFrame({'Wins': wins, 'Losses': losses}, index=multi_index)

Final Thoughts

That’s it for the Multiindex tutorial. Thanks for checking out the article! Make sure to read our other Pandas guides.

Free Community

Join 1,000+ AI Automation Builders

Weekly tutorials, live calls & direct access to Ryan & Matt.

Join Free →

Ryan Nolan

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.