Pandas MultiIndex
We are going to be lloking at Pandas Multiindex
to start with, we would import pandas as pd
import pandas as pd
Example 1
In this example, we create a Python dictionary “data_races, that stores race information, including race names, the year they occured, and a difficulty rating for each event
data_races = { 'Race': ['Badwater', 'Barkley Marathons', 'Vero Beach Ultra', 'Forgotten Florida', 'Badwater'], 'Year': [2020, 2021, 2020, 2021, 2025], 'Difficulty': [9.7, 9.8, 8.1, 6.1, 9.9] }
Next, we convert the data_races dictionary into a pandas DataFrame named “df”.
df = pd.DataFrame(data_races)
Then we set a multi-level index on the DataFrame using both “Race” and “Year”. This allows for more advanced selection and organization based on these two columns
df = df.set_index(['Race', 'Year'])
df

Here, we create a seperate copy of the df DataFrame and store it in the df2, allowing changes to be made to df2 without affecting the original df.
df2 = df.copy()
Here we perform similar operation but we store the df DataFrame in “df3”
df3 = df.copy()
Example 2
This returns the index of the df DataFrame, which is now a multiindex consisting of Race and Year pairs for each row
df.index

Example 3 Index Levels values
Next we retrieve all values from the first level of the multiindex. i.e the Race names in the df DataFrame
df.index.get_level_values(0)

This retrieves all the values from the second level of the Multiindex i.e the Year values in the df DataFrame
df.index.get_level_values(1)

Example 4 loc
In this example , we use the “loc” to select all rows in the df DataFrame where the Race level of the index is “Badwater”, returning entries for all corresponding years.
df.loc['Badwater'] # returns all years for Badwater

Here we access the specific row in df where the Race is “Badwater” and the Year is 2020, returning the corresponding data for that entry
df.loc[('Badwater', 2020)] # returns a specific entry

Example 5 iloc
Here we return the first row of the DataFrame based on it’s position, which is “Badwater” in the year 2020
df.iloc[0] # First row (Badwater, 2020)

Example 6 rename index version 1
df.index.names = ['Event', 'YearHeld']
This renames the MultiIndex levels in the DataFrame to 'Event'
and 'YearHeld'
, making the index labels more descriptive.

Example 7 rename index version 2
df.rename_axis(index={‘Event’: ‘Race’, ‘YearHeld’: ‘Year’}, inplace=True)
df.rename_axis(index={'Event': 'Race', 'YearHeld': 'Year'}, inplace=True)

Example 8 Swap Index Levels
Here, we swap the order of the MultiIndex levels in df, making Year the first level and Eventthe second, and stores the result in df_swapped
df_swapped = df.swaplevel()

Example 9 Aggregation with Multilevel index
Here we group the DataFrame by the Race level of the Multiindex and calculate the mean of the numeric columns i.e Difficulty for each race
df.groupby(level='Race').mean()

This groups the DataFrame by the Year level of the Multiindex and computes the average of numeric columns like “Difficulty” for each year.
df.groupby(level='Year').mean()

Example 10 sort by multilevel index - asc
Here we sort the DataFrame based on the Race level of the Mu;ltiindex in alphabetical order, helping to organize the data by event name
df.sort_index(level='Race')

Example 11 sort by multilevel index - desc
Here, we sort the DataFrame by the Year in descending order, showing the most recent years first
df.sort_index(level='Year', ascending = False)

Example 12 sort by column
Here we sort the DataFrame by the Difficulty column in ascending order, from the easiest to the most difficult race.
df.sort_values(by='Difficulty')

Example 13 Slicing Multiindex
Â
idx = pd.IndexSlice
df.loc[idx[:, 2020], :] # All races from 2020

Example 14 Filtering
Here we filter the DataFrame to show only the rows where the Difficulty rating is greater than 8
df[df['Difficulty'] > 8]

Example 15 Update Values at
Here we update the Difficulty value to 10.0 for the entry where the Race is “Badwater” and the year is “2020”
df.at[('Badwater', 2020), 'Difficulty'] = 10.0

Example 16 Add New Row
This adds a new row to the DataFrame for the race “Everglades Ultra” in the yearÂ
2020 with a Difficulty rating of 7.1
df.loc[('Everglades Ultra', 2020), :] = [7.1]

Example 17 Cross-section: Query along a specific level
Here we retrieve all rows from the DataFrame where the Year level of the Multiindex is “2020”, filtering races held in that year
# Get all races in 2020 df.xs(2020, level='Year')

This retrieves all rows where the Race is “Badwater” , returning data for all years that race was held
#Get all years for "Badwater" df.xs('Badwater', level='Race')

Example 18 reorder index levels
This reorders the MultiIndex so that Year comes first and Race second, then sorts the DataFrame by this new index order.
df = df.reorder_levels(['Year', 'Race']).sort_index()
df

Example 19 - unstack- Moves index level(s) to columns
df2.unstack(level='Race')

This changes the layout of the DataFrame by turning the Year from part of the index into column headers, showing the Difficulty for each race across different years.To start we’re going to create a simple dataframe in python:
#Remember, starts at 0! df2.unstack(level=1)

Example 20 stack
Here we create a dictionary called “data_runners” that stores runners names along with their finish times for two race types
data_runners = { 'Runner': ['Alice', 'Bob', 'Clara', 'Dave', 'Eva'], '50 miler': [7.5, 10.2, 8.9, 15.8, 6.3], '100 miler': [22.3, 35.6, 27.8, 39.1, 18.5] }
Next we create a DataFrame from “data_runners” and set the “Runner” column as the indexÂ
df4 = pd.DataFrame(data_runners).set_index('Runner')

Here we reshape the DataFrame by stacking the columns into a single column, creating a multiindex with Runner and race type
df_stacked = df4.stack()

Example 21 - Reset Index / convert MultiIndex into columns
This basically unstacks the df4 DataFrame and also resets the index
df_stacked_reset = df4.stack().reset_index()

This resets the index of df3, turning the index levels  back into regular columns, and creates a new default integer index.
df_reset_dataframe = df3.reset_index()

Example 22 MultiIndex columns
This creates a list called data_music containing two nested lists:
The first list holds album sales and their corresponding years.
The second list contains number of album released for the same periods.
data_music = [ [200000000, 1960, 120000000, 1962, 250000000, 1965], [8, 10, 17, 7, 26, 12] ]
This creates a MultiIndex for columns using all combinations of band names and their attributes, like “Album Sales” and “Starting Year”, to organize data in a more structured format.
# create multi-level column index bands = ['The Beatles', 'The Rolling Stones', 'Pink Floyd'] attributes = ['Album Sales', 'Starting Year'] multi_columns = pd.MultiIndex.from_product([bands, attributes], names=['Band', 'Attribute'])
This creates a DataFrame (df5
) using the data_music
values, sets custom row labels for the index, and applies the multi_columns
MultiIndex to organize the columns by band and attribute.
df5 = pd.DataFrame(data_music, index=['Worldwide Sales (est.)', 'Studio Albums'], columns=multi_columns)

Example 23 build multiindex from array
Here we create three arrays.
country: This holds country names
city: This lists corresponding cities
population: contains the population for each city
# create arrays country = ['Netherlands', 'Belgium', 'Switzerland', 'Switzerland', 'France', 'United Kingdom'] city = ['Amsterdam', 'Brussels', 'Geneva', 'Zermatt', 'Paris', 'London'] population = [821752, 185103, 203856, 5771, 2148271, 8982000] # example populations
This combines the country
and city lists into a single list of lists called index_array, which can be used to create a Multiindex for rows.
# create array of arrays index_array = [country, city]
This creates a Multiindex for rows using the country and city arrays, assigning “Country” and “City” as the index level names.
# create multiindex from array multi_index = pd.MultiIndex.from_arrays(index_array, names=['Country', 'City'])
Next we create a data frame “df6 using multi_index
# create dataframe using multiindex df6 = pd.DataFrame({'Population': population}, index=multi_index)

Example 24 build 3 level multi index
This defines lists representing baseball data
# create data years = [2022, 2022, 2022, 2023, 2023, 2023] leagues = ['AL', 'AL', 'NL', 'AL', 'AL', 'NL'] teams = ['Yankees', 'Astros', 'Dodgers', 'Yankees', 'Astros', 'Dodgers'] wins = [99, 106, 111, 82, 90, 100] losses = [63, 56, 51, 80, 72, 62]
This comines the years, leagues and teams lists into one list of lists called index_array, which will be used to build a multiindex for the DataFrame rows
# create array of arrays index_array = [years, leagues, teams]
This creates a Multiindex for rows using the yeahrs, leagues, and teams arrays, and labels the index levels as “Year”, “League”, and “Team”.Â
# create multiindex multi_index = pd.MultiIndex.from_arrays(index_array, names=['Year', 'League', 'Team'])
This creates a DataFrame “df7” with the wins and losses data , using the previously created multi_index as the row index to organize the data by year, league, and team
# create dataframe df7 = pd.DataFrame({'Wins': wins, 'Losses': losses}, index=multi_index)

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.