Pandas Where
The where() function in Pandas is used to replace values in a DataFrame or Series where a condition is not met.Â
It is used to check a data frame for one or more conditions and return the result.
To start with we import pandas and numpy
To start we’re going to import two essential Python libraries.
Pandas
numpy
import pandas as pd import numpy as np
Next we create a panadas DataFrame named df with a single column, “Hourly_SALARY”, containing salary values as strings and one missing value as nan
df = pd.DataFrame({ 'Hourly_Salary': ['500.00', '10000.00', '200.00', '20.00', np.nan] })
Then we convert the “Hourly_Salary” column from strings to numeric values, allowing for mathematical operations and handling of missing data(NaN)
df['Hourly_Salary'] = pd.to_numeric(df['Hourly_Salary'])
df

Example 1
In this example we filter the DataFrame by keeping only the values less than 1000; values not meeting the condition are replaced with NaN
filtered_df = df.where(df < 1000)
filtered_df

Example 2
This example filters the DataFrame by replacing values greater than or equal to 1000 witjh 999, while keeping values less than 1000 unchanged
filtered_df_2 = df.where(df < 1000, other=999)
filtered_df_2

Example 3
In this example we are replacing all NaN(missing) values in the DataFrame with 0, leaving all non-null values unchanged
df_where_3 = df.where(df.notnull(), 0)
df_where_3

Here, we are creating another DataFrame with columns “Running Back”, “Career Rusing Yards” and “Touchdowns”.
df2 = pd.DataFrame({ 'Running Back': ['Barry Sanders', 'Walter Payton', 'Emmitt Smith', 'Jim Brown'], 'Career Rushing Yards': [152690, 16726, 18355, 12312], 'Touchdowns': [99, 110, 164, 106] })
df2

Example 4
Next, we filter the DataFrame (df2) to keep only the rows where the Career Rusing Yards are less than 100,000, rows that dont meet this condition are filled with NaN
df2.where(df2["Career Rushing Yards"] < 100000)

Example 5
In this example, we keep rows where Touchdowns are at least 99 and Career Rushing Yards are not equal to 18,355. All other rows are replaced with NaN
df2.where((df2["Touchdowns"] >= 99) & (df2["Career Rushing Yards"] != 18355))

Example 6
Here we create a boolean filter that marks True for players with fewer than 100 touchdowns and False otherwise
filter1 = df2["Touchdowns"] < 100
Here we create a boolean filter that marks True for players with Touchdowns greater than 150 and False otherwise.
filter2 = df2["Touchdowns"] > 150
Then we filter then DataFrame(df) to keep rows where either filter1 or filter2 is True. Rows not matching these conditions are replaced with NaN
df2.where(filter1 | filter2)

Example 7
Here we create a new column “touchdown_totals” in df2 that shows the original touchdown count if it’s 100 or more, otherwise it labels the values as “Less Than 100”
df2["touchdown_totals"] = df2["Touchdowns"].where(df2["Touchdowns"] >= 100, other="Less Than 100")
df2

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.