Pandas Query
By using query, you can simply filter down a dataframe in a more readable format.
In this lesson we will go over how to use this with numbers, strings, variables, and more. If you want to watch a YouTube video, the one this lesson is based on is down below.
Let’s start by importing in three libraries which we will be using in this video: Pandas, Numpy and Seaborn.
import pandas as pd
import numpy as np
import seaborn as sns
To start we’re going to create a dataframe from the Taxis seaborn dataset.
df = sns.load_dataset('taxis')
df.head()

The taxis dataset includes columns for: pickup, dropoff, passengers, distance, fare, tip, tolls, total, color, payment, pickup_zone, dropoff_zone, pickup_burough, and dropoff_burough.
Pandas Query Number Filters
Let’s first dive into a few examples of how we can filter based off of numbers. To do this, we will look at the passengers column and use a few different operations.
Equal
df.query('passengers == 3')

Greater than
df.query('passengers > 3')

Less than
df.query('passengers < 3')

Greater than or equal to
df.query('passengers >= 3')

Less than or equal to
df.query('passengers <= 3')

Not equal to
df.query('passengers != 3')

Pandas Query Strings
Since query already uses single quotes, strings are needed to be placed in double quotes
df.query('pickup_borough == "Manhattan"')

Pandas Query And
To filter based on two or more conditions that must be met, use and. We can simply place and between the two conditions. In this example it looks at the pickup_burough needing to be Manhattan and passengers less than or equal to 3.
df.query('pickup_borough == "Manhattan" and passengers <= 3')

Pandas Query Or
To filter based on two or more conditions that one must be met, use or. We can simply place or between the two conditions. In this example it looks at the pickup_burough needing to be Manhattan or passengers less than or equal to 3.
df.query('pickup_borough == "Manhattan" or passengers <= 3')

Pandas Query In
Instead of using or with multiple buroughs, we can use in. The code below looks at a pickup_burough in Manhattan or Brooklyn
df.query('pickup_borough in ("Manhattan", "Brooklyn")')

Pandas Query Not In
By using not in, we can do the opposite. This code looks at all boroughs not in Manhattan or Brooklyn
df.query('pickup_borough not in ("Manhattan", "Brooklyn")')

Pandas Query Variables
Let’s continue the Not in example from above, but this time by using a list. Create the list variable boroughs as shown below.
boroughs = ['Manhattan', 'Brooklyn']

To filter based on a variable, you will have to use @ and then the variable name.
df.query('pickup_borough not in @boroughs')

Let’s take a look at one more example, this time a simpler datatype of an integer.
passengers = 5
The same @ applies whether it’s a list or an integer.
df.query('passengers == @passengers')

Math Operations
We can multiply, add, subtract, or divide a column by a number and compare to another column or number.
df.query('distance * 3 < 10')

We can also multiply two different columns. Lets look at distance and passengers.
df.query('distance * passengers < 10')

String Functions
Within query you can also utilize string function. In this example we look at the color column and filter for anything that contains yel
df.query('color.str.contains("yel")')

Dates
We can also use date filtering within the query
df.query('dropoff.dt.month ==2')

Null Values
As well as looking at null values
df.query('dropoff_zone.isna() == 1')

Grab a Series or Column after query
And you can also specify a specific column or series after the query.
df.query('dropoff.dt.month ==2')['payment']

df.query('dropoff.dt.month ==2')[['total','payment']]
Inplace

Replaces current dataframe with new one filtered
df2 = sns.load_dataset('taxis')
df2.query('color == "yellow"', inplace=True)

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.