Filter Pandas Dataframe with multiple conditions
Let’s take a look at a few different ways to filter and select rows in a pandas dataframe based on multiple conditions.
To start, we’re going to create a simple dataframe in Python:
import pandas as pd

df = pd.DataFrame({'User_id': ['234', '738', '632', '777', '834', '123'],
                   'Age': [11, 45, 56, 32, 26, 31],
                   'Hourly': [100, 93, 78, 120, 64, 115],
                   'FT_Team': ['Steelers', 'Seahawks', 'Falcons', 'Falcons', 'Patriots', 'Steelers']})
df

The created dataframe
6 ways to filter dataframe rows on multiple conditions
Let’s look at six different ways to filter rows in a dataframe based on multiple conditions:
What conditions do we want to filter on?
Get all rows where the hourly wage is greater than or equal to 100, age is less than 60, and the favorite football team’s name starts with ‘S’.
Using Loc to Filter With Multiple Conditions
The loc indexer in pandas can be used to access groups of rows or columns by label, or by a boolean array like the one we build below.
Wrap each condition you want included in parentheses and combine them with the & operator. You’ll see our code sample returns a pandas DataFrame containing only the filtered rows (and only the User_id and FT_Team columns). Don’t forget to include “import pandas as pd” at the top!
df.loc[(df['Hourly']>=100) & (df['Age'] < 60) & (df['FT_Team'].str.startswith('S')), ['User_id', 'FT_Team']]

Output of .loc on the dataframe
Using The Numpy Function np.where To Filter With Multiple Conditions
The NumPy library works well with pandas dataframes, especially when dealing with indexing.
Let’s use the numpy.where function with a few conditions to produce the same result we wanted above.
import numpy as np

idx = np.where((df['Hourly'] >= 100) & (df['Age'] < 60) & (df['FT_Team'].str.startswith('S')))
idx

Result of applying np.where to our df
Let’s pass the returned row indices that match the combined conditions into loc:
df.loc[idx]

Output from .loc
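One thing to keep in mind: np.where returns positional indices, so loc only lines up here because the default integer index matches those positions. With a different index, iloc is the safer choice. A minimal sketch, assuming we re-index on User_id (the df2 and idx2 names are just for illustration):

df2 = df.set_index('User_id')   # index labels no longer match row positions
idx2 = np.where((df2['Hourly'] >= 100) & (df2['Age'] < 60) & (df2['FT_Team'].str.startswith('S')))
df2.iloc[idx2[0]]               # iloc selects by position, so the np.where output still applies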
Using The Pandas Function Query With Multiple Conditions
We can use the pandas dataframe method query() with a boolean expression to get our filtered rows back. The main benefit of query is that it can use numexpr under the hood, which improves efficiency, especially on larger dataframes.
df.query('Hourly >=100 & Age < 60 & FT_Team.str.startswith("S").values')

Query output
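One caveat: the .str accessor may not be supported by the numexpr engine in every pandas version, so if the query above raises an error, passing engine='python' is a common fallback. A minimal sketch:

df.query('Hourly >= 100 & Age < 60 & FT_Team.str.startswith("S")', engine='python')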
Filtering On Multiple Conditions Using Pandas Boolean Indexing
This is a good method to go with if you want to select specific columns as well, since you can list only the dataframe columns you want in the final brackets. Boolean indexing is also concise and is probably the most common way to filter on multiple conditions.
df[(df['Hourly']>=100) & (df['Age'] < 60) & (df['FT_Team'].str.startswith('S'))][['User_id', 'Age', 'Hourly']]


Output has all three columns
Pandas Eval Function With Multiple Conditions
We can use the pandas dataframe method eval inside df[] brackets to filter on these conditions. Note the .values at the end (it’s an attribute, not a method call).
df[df.eval('Hourly >=100 & Age < 60 & FT_Team.str.startswith("S").values')]
Looking To Filter Where Only 1 Condition Has To Be True?
This NumPy and functools implementation lets you use disjunction (logical OR) to filter your dataframe, keeping rows where at least one condition is true. Here’s an example:
import numpy as np
import functools

def disjunction(*conditions):
    # combine any number of boolean Series with logical OR
    return functools.reduce(np.logical_or, conditions)

c_1 = df.Hourly >= 100
c_2 = df.Age < 60
c_3 = df.FT_Team.str.startswith('S')

data_filtered = df[disjunction(c_1, c_2, c_3)]
Each condition produces a boolean Series, and rows matching any of them are kept. Note: you still need “import pandas as pd”.
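The same functools.reduce pattern can also express conjunction, where every condition has to be true, which reproduces the earlier AND filter. A minimal sketch (data_filtered_all is just an illustrative name):

# keep only rows where all three conditions hold
data_filtered_all = df[functools.reduce(np.logical_and, (c_1, c_2, c_3))]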
Dataframe Comparison Tools For Multiple Condition Filtering
Post the pandas 0.22 update, there are multiple comparison methods you can use as well to compare column values against conditions, such as ge (greater than or equal), lt (less than), and eq (equal). Make sure the column’s dtype matches the value you want to compare against.
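A minimal sketch of reproducing the same filter with these comparison methods (ge and lt are built-in pandas Series methods; combining them this way is just one option):

# ge/lt return boolean Series, which we combine with & and use as a mask
df[df['Hourly'].ge(100) & df['Age'].lt(60) & df['FT_Team'].str.startswith('S')]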
Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.