Pandas Where

The where() function in Pandas is used to replace values in a DataFrame or Series where a condition is not met.Â

It is used to check a data frame for one or more conditions and return the result.

To start with we import pandas and numpy

To start we’re going to import two essential Python libraries.

Pandas

numpy

  import pandas as pd import numpy as np

Next we create a panadas DataFrame named df with a single column, “Hourly_SALARY”, containing salary values as strings and one missing value as nan

  df = pd.DataFrame({ 'Hourly_Salary': ['500.00', '10000.00', '200.00', '20.00', np.nan] })

Then we convert the “Hourly_Salary” column from strings to numeric values, allowing for mathematical operations and handling of missing data(NaN)

  df['Hourly_Salary'] = pd.to_numeric(df['Hourly_Salary'])
  df

Example 1

In this example we filter the DataFrame by keeping only the values less than 1000; values not meeting the condition are replaced with NaN

  filtered_df = df.where(df < 1000)
  filtered_df

Example 2

This example filters the DataFrame by replacing values greater than or equal to 1000 witjh 999, while keeping values less than 1000 unchanged

  filtered_df_2 = df.where(df < 1000, other=999)
  filtered_df_2

Example 3

In this example we are replacing all NaN(missing) values in the DataFrame with 0, leaving all non-null values unchanged

  df_where_3 = df.where(df.notnull(), 0)
  df_where_3

Here, we are creating another DataFrame with columns “Running Back”, “Career Rusing Yards” and “Touchdowns”.

  df2 = pd.DataFrame({ 'Running Back': ['Barry Sanders', 'Walter Payton', 'Emmitt Smith', 'Jim Brown'], 'Career Rushing Yards': [152690, 16726, 18355, 12312], 'Touchdowns': [99, 110, 164, 106] })
  df2

Example 4

Next, we filter the DataFrame (df2) to keep only the rows where the Career Rusing Yards are less than 100,000, rows that dont meet this condition are filled with NaN

  df2.where(df2["Career Rushing Yards"] < 100000)

Example 5

In this example, we keep rows where Touchdowns are at least 99 and Career Rushing Yards are not equal to 18,355. All other rows are replaced with NaN

  df2.where((df2["Touchdowns"] >= 99) & (df2["Career Rushing Yards"] != 18355))

Example 6

Here we create a boolean filter that marks True for players with fewer than 100 touchdowns and False otherwise

  filter1 = df2["Touchdowns"] < 100

Here we create a boolean filter that marks True for players with Touchdowns greater than 150 and False otherwise.

  filter2 = df2["Touchdowns"] > 150

Then we filter then DataFrame(df) to keep rows where either filter1 or filter2 is True. Rows not matching these conditions are replaced with NaN

  df2.where(filter1 | filter2)

Example 7

Here we create a new column “touchdown_totals” in df2 that shows the original touchdown count if it’s 100 or more, otherwise it labels the values as “Less Than 100”

  df2["touchdown_totals"] = df2["Touchdowns"].where(df2["Touchdowns"] >= 100, other="Less Than 100")
  df2

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.

Leave a Reply

Your email address will not be published. Required fields are marked *