Filter Pandas Dataframe with multiple conditions

Let’s take a look at a few different ways to filter and select rows in a pandas dataframe based on multiple conditions.

If you want to watch a YouTube video based on this tutorial, it is embedded below.

To start we’re going to create a simple dataframe in python:

  import pandas as pd df = pd.DataFrame({'User_id': ['234', '738', '632', '777', '834', '123'], 'Age': [11, 45, 56, 32, 26, 31], 'Hourly': [100, 93, 78, 120, 64, 115], 'FT_Team': ['Steelers', 'Seahawks', 'Falcons', 'Falcons', 'Patriots', 'Steelers']}) df

pd.dataframe created

6 functions to filter dataframe rows on multiple conditions

Let’s look at six different ways to filter rows in a dataframe based on multiple conditions:

What conditions do we want to filter on?

Get all rows having hourly wage greater than or equal to 100 and age < 60 and favorite football team name starts with ‘S’.

Using Loc to Filter With Multiple Conditions

The loc function in pandas can be used to access groups of rows or columns by label.

Add each condition you want to be included in the filtered result and concatenate them with the & operator. You’ll see our code sample will return a pd.dataframe of our filtered rows. Don’t forget to include “import pandas as pd” at the top!

  df.loc[(df['Hourly']>=100) & (df['Age'] < 60) & (df['FT_Team'].str.startswith('S')), ['User_id', 'FT_Team']]

.loc on dataframe

Using The Numpy Function np.where To Filter With Multiple Conditions

The Numpy python library interacts great with dataframes, especially when dealing with indexing.

Let’s use the numpy.where function with a few conditions to produce the same result we wanted above.

  import numpy as np idx = np.where((df['Hourly']>=100) & (df['Age'] < 60) & (df['FT_Team'].str.startswith('S'))) idx

applying np.where to our df

Let’s take the returned row index list that matches the combined conditions into loc

  df.loc[idx]

output from .loc

Using The Pandas Function Query With Multiple Conditions

We can use the pandas dataframe function query() and boolean expressions to get our filtered rows back. The main benefit of the query function is it uses numexpr which improves efficiency, especially in larger dataframes.

  df.query('Hourly >=100 & Age < 60 & FT_Team.str.startswith("S").values')

print query output

Filtering On Multiple Conditions Using Pandas Boolean Indexing

This is a good method to go with if you want to remove columns as well, as you can exclude any dataframe columns you don’t want in the last statement. Boolean indexing is also very efficient as it does not make a copy of the data.

  df[(df['Hourly']>=100) & (df['Age'] < 60) & (df['FT_Team'].str.startswith('S'))][['User_id', 'Age', 'Hourly']]

Output has all three columns

Pandas Eval Function With Multiple Conditions

We can use the pandas dataframe function eval inside a df[] tag to filter on these conditions. Note the .values() at the end.

  df[df.eval('Hourly >=100 & Age < 60 & FT_Team.str.startswith("S").values')]

Looking To Filter Where Only 1 Condition Has To Be True?

This numpy array and functools implementation allows you to use disjunction to filter your dataframe. Here’s an example:

  import numpy as np import functools def disjunction(*conditions): return functools.reduce(np.logical_or, conditions) c_1 = df.Hourly >= 100 c_2 = df.Age < 60 c_3 = df.FT_Team.str.startswith('s') data_filtered = df[disjunction(c_1, c_2, c_3)]

each column is compared to the conditions Note: you still need “import pandas as pd”

Dataframe Comparison Tools For Multiple Condition Filtering

Post pandas .22 update, there’s multiple functions you can use as well to compare column values to conditions. Make sure your dtype is the same as what you want to compare to.

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.

Leave a Reply

Your email address will not be published. Required fields are marked *