Pandas Index

An index in Python pandas is a way to identify a specific row within a dataframe. In this lesson we will go over string and integer indexes as well as multi-level indexes.

If you want to watch a video based on the tutorial, it is linked down below.

Indexes vs Indices

Often you’ll hear both of these terms thrown around. Both are plural forms of index, and either is acceptable to use.

Setting up the Pandas DataFrame Index Tutorial

To start, we’re going to create a simple dataframe in Python:

The dataframe we are utilizing is based around runners, their age, and then the city and miles of races.

  import pandas as pd
  data = { 'Name': ['Ryan', 'Kilian', 'Steven', 'Jim'], 'Age': [25, 30, 22, 40], 'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'], 'Miles': [100, 200, 26, 300] }

By default, if we don’t specify an index, pandas uses integers. So let’s take a look at a few use cases.

  df = pd.DataFrame(data)
  df.head(10)
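To see the default index for yourself, here is a minimal sketch that rebuilds the same runner data and inspects df.index. The variable name default_index is only illustrative.

```python
import pandas as pd

data = {
    'Name': ['Ryan', 'Kilian', 'Steven', 'Jim'],
    'Age': [25, 30, 22, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Miles': [100, 200, 26, 300],
}
df = pd.DataFrame(data)

# With no index specified, pandas builds a RangeIndex running from 0 to n - 1.
default_index = df.index
print(default_index)        # RangeIndex(start=0, stop=4, step=1)
print(list(default_index))  # [0, 1, 2, 3]
```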

Example 1 - iloc

By using iloc, we can use integers to grab specific rows and columns.

In Python, indexes start at the value of 0 and go through n – 1 where n is the number of rows.

  df.iloc[1]

After the comma we can specify a certain column that we want to grab.

  df.iloc[1, 0]

If we want to grab multiple rows, we pass a list of positions inside another set of brackets.

  df.iloc[[1, 3]]

Another important concept when it comes to iloc is slicing, which lets us grab a range of rows at once. As a heads up, slicing works differently with iloc and loc.

With iloc, the stop value of the slice is not included; with loc, it is. The pattern is start:stop, so this example returns the rows at positions 0 and 1.

  df.iloc[0:2]
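The difference is easiest to see side by side on the default integer index, where the positions and the labels happen to be the same numbers. This is a small sketch on the same runner data; by_position and by_label are illustrative names.

```python
import pandas as pd

data = {
    'Name': ['Ryan', 'Kilian', 'Steven', 'Jim'],
    'Age': [25, 30, 22, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Miles': [100, 200, 26, 300],
}
df = pd.DataFrame(data)

# iloc slices by position and excludes the stop value.
by_position = df.iloc[0:2]
# loc slices by label and includes the stop value.
by_label = df.loc[0:2]

print(len(by_position))  # 2
print(len(by_label))     # 3
```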

If we want to grab a column and all its values, we need to use a colon for the first value of iloc. A colon represents all values since we don’t specify a start and stop.

  df.iloc[:,2]
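As a quick recap, the iloc calls above return different kinds of objects depending on what we pass in. The sketch below runs on the same runner data; the variable names are illustrative.

```python
import pandas as pd

data = {
    'Name': ['Ryan', 'Kilian', 'Steven', 'Jim'],
    'Age': [25, 30, 22, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Miles': [100, 200, 26, 300],
}
df = pd.DataFrame(data)

row = df.iloc[1]         # one row, returned as a Series
cell = df.iloc[1, 0]     # one value: row 1, column 0
rows = df.iloc[[1, 3]]   # several rows, returned as a DataFrame
column = df.iloc[:, 2]   # every row of column 2 (City), as a Series

print(cell)              # Kilian
print(list(column))      # ['New York', 'Los Angeles', 'Chicago', 'Houston']
```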

Example 2 - Set Index

When you set an index, it replaces the integer values. In this example let’s look at setting Name as the index.

  df.set_index('Name', inplace=True)
  df.head(10)
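If you would rather not modify the dataframe in place, set_index also returns a new dataframe when inplace is left at its default. A minimal sketch, assuming the same runner data; indexed is an illustrative name.

```python
import pandas as pd

data = {
    'Name': ['Ryan', 'Kilian', 'Steven', 'Jim'],
    'Age': [25, 30, 22, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Miles': [100, 200, 26, 300],
}
df = pd.DataFrame(data)

# Without inplace=True, set_index returns a new dataframe and leaves df alone.
indexed = df.set_index('Name')

print(indexed.index.name)     # Name
print(list(indexed.columns))  # ['Age', 'City', 'Miles']
print('Name' in df.columns)   # True, the original still has the column
```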

Example 3 - Change Index

We can also rename an axis. Let’s change Name to Runner. We pass in a dictionary that maps the old name to the new one.

  df = df.rename_axis(index={'Name': 'Runner'})
  df.head(10)
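To confirm the rename took effect, we can check the index name directly. The sketch below also shows the shorter form that passes the new name as a plain string; renamed and renamed_short are illustrative names.

```python
import pandas as pd

data = {
    'Name': ['Ryan', 'Kilian', 'Steven', 'Jim'],
    'Age': [25, 30, 22, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Miles': [100, 200, 26, 300],
}
df = pd.DataFrame(data).set_index('Name')

# Dictionary form: map the old index name to the new one.
renamed = df.rename_axis(index={'Name': 'Runner'})
# String form: set the index name directly.
renamed_short = df.rename_axis('Runner')

print(renamed.index.name)        # Runner
print(renamed_short.index.name)  # Runner
```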

Example 4 - Index Values

Let’s take a look at how to grab every index value. This gives us an array of the labels (not a plain Python list) that we can search.

  index_values = df.index.values
  print(index_values)

Let’s check whether David is in the index. This will output False.

  print('David' in index_values)

Now if we use the value Jim, we should get True.

  print('Jim' in index_values)
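The membership check also works on the index object itself, without pulling the values out first. A short sketch on the same runner data, using tolist() when an actual Python list is needed; labels is an illustrative name.

```python
import pandas as pd

data = {
    'Name': ['Ryan', 'Kilian', 'Steven', 'Jim'],
    'Age': [25, 30, 22, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Miles': [100, 200, 26, 300],
}
df = pd.DataFrame(data).set_index('Name')

# The in operator checks the index labels directly.
print('Jim' in df.index)    # True
print('David' in df.index)  # False

# tolist() converts the index into a regular Python list.
labels = df.index.tolist()
print(labels)               # ['Ryan', 'Kilian', 'Steven', 'Jim']
```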

Example 5 - loc

We went over iloc earlier, where we used integer positions. loc is the counterpart: it selects by label, which in our case means the runner names in the index.

In our first example let’s look at grabbing rows that have index of Kilian.

  df.loc['Kilian']

When you use a comma, you can specify a row and a column.

  df.loc['Kilian', 'City']

If we use two brackets we can specify multiple rows to grab.

  df.loc[['Kilian', 'Ryan']]

Let’s look at slicing again. This time the pattern is start:stop, and stop is included in the results.

  df.loc['Kilian', 'Age':'City']

We can again grab a full column. Pass in a colon, comma, and then the column name.

  df.loc[:,'City']
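Label slicing works on rows as well as columns, and in both cases the stop label is part of the result. A minimal sketch on the same runner data; runner_rows and runner_cols are illustrative names.

```python
import pandas as pd

data = {
    'Name': ['Ryan', 'Kilian', 'Steven', 'Jim'],
    'Age': [25, 30, 22, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Miles': [100, 200, 26, 300],
}
df = pd.DataFrame(data).set_index('Name')

# Row slice by label: both Kilian and Steven are included.
runner_rows = df.loc['Kilian':'Steven']
# Column slice by label for one runner: both Age and City are included.
runner_cols = df.loc['Kilian', 'Age':'City']

print(list(runner_rows.index))  # ['Kilian', 'Steven']
print(list(runner_cols.index))  # ['Age', 'City']
```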

Example 6 - Sort Index

We can also sort the index.

  df.sort_index(inplace=True)
  df.head(10)

We can also sort in the other direction. If we set ascending=False we sort in descending order.

  df.sort_index(ascending=False, inplace=True)
  df.head(10)
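Since the index holds strings, the sort is alphabetical. The sketch below shows both directions on the same runner data, using the non-inplace form so that each result gets its own illustrative name.

```python
import pandas as pd

data = {
    'Name': ['Ryan', 'Kilian', 'Steven', 'Jim'],
    'Age': [25, 30, 22, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Miles': [100, 200, 26, 300],
}
df = pd.DataFrame(data).set_index('Name')

# sort_index returns a new, sorted dataframe when inplace is not set.
ascending = df.sort_index()
descending = df.sort_index(ascending=False)

print(list(ascending.index))   # ['Jim', 'Kilian', 'Ryan', 'Steven']
print(list(descending.index))  # ['Steven', 'Ryan', 'Kilian', 'Jim']
```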

Example 7 - Reset Index

If we reset the index, we move Runner back into a column and go back to integers as the index.

  df.reset_index(inplace=True)
  df.head(10)
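reset_index also accepts drop=True, which discards the old index instead of turning it into a column. A short sketch on the same runner data (with the index still called Name here); restored and dropped are illustrative names.

```python
import pandas as pd

data = {
    'Name': ['Ryan', 'Kilian', 'Steven', 'Jim'],
    'Age': [25, 30, 22, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Miles': [100, 200, 26, 300],
}
indexed = pd.DataFrame(data).set_index('Name')

# Default: the old index comes back as the first column.
restored = indexed.reset_index()
# drop=True: the old index is thrown away.
dropped = indexed.reset_index(drop=True)

print(list(restored.columns))  # ['Name', 'Age', 'City', 'Miles']
print(list(dropped.columns))   # ['Age', 'City', 'Miles']
print(list(restored.index))    # [0, 1, 2, 3]
```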

Example 8 - Multi Level Index

While every example in this article was based around a single level index, let’s look at expanding this out to two.

This topic will also get its own article, which goes deeper into multi level indexes in Python pandas.

  data_races = { 'Race': ['Badwater', 'Barkley Marathons', 'Vero Beach Ultra', 'Forgotten Florida'], 'Year': [2020, 2021, 2020, 2021], 'Difficulty': [9.7, 9.8, 8.1, 6.1] }
  df2 = pd.DataFrame(data_races)
  df2 = df2.set_index(['Race', 'Year'])
  df2.head(10)
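Once there are two levels, a row is identified by a tuple of labels, one per level. Here is a minimal sketch of two common lookups on the same race data: loc with a full tuple, and xs to select on a single level. The names difficulty and races_2020 are illustrative.

```python
import pandas as pd

data_races = {
    'Race': ['Badwater', 'Barkley Marathons', 'Vero Beach Ultra', 'Forgotten Florida'],
    'Year': [2020, 2021, 2020, 2021],
    'Difficulty': [9.7, 9.8, 8.1, 6.1],
}
df2 = pd.DataFrame(data_races).set_index(['Race', 'Year'])

print(list(df2.index.names))  # ['Race', 'Year']

# loc with a (Race, Year) tuple picks out a single row; the column comes after the comma.
difficulty = df2.loc[('Badwater', 2020), 'Difficulty']
print(difficulty)             # 9.7

# xs selects every row whose Year level equals 2020.
races_2020 = df2.xs(2020, level='Year')
print(list(races_2020.index))  # ['Badwater', 'Vero Beach Ultra']
```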

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.
