Pandas Apply

Pandas Apply allows you to apply a function to rows or columns within a dataframe.

In this lesson we will be taking a look at 8 different examples of using Apply, so that you can understand the different approaches of using it.

If you would rather follow along to a video, we have one on our YouTube channel linked down below, otherwise all the code will be in this article.

Start by importing both Pandas and NumPy.

  import pandas as pd
  import numpy as np

We are going to create a dataframe based around the temperatures of different states. Each state has an average, summer, and winter temperature. Note that this is all artificial data.

  data = {
      "State": ["California", "Texas", "New York", "Florida", "Illinois", "Georgia", "Washington"],
      "AvgTempF": [59, 65, 52, 70, 51, 67, 50],
      "SummerTemp": [79, 85, 62, 90, 60, 77, 60],
      "WinterTemp": [49, 45, 42, 50, 41, 57, 40],
  }

  df = pd.DataFrame(data)
  df.head(10)

Example 1 - if else state location

One of the most common use cases for apply is writing an if/else function and creating a new column based on its logic. In this example we want to classify each state by region.

  def get_region(state):
      south = ["Texas", "Florida", "Georgia"]
      west = ["California", "Washington"]
      midwest = ["Illinois"]
      northeast = ["New York"]

      if state in south:
          return "South"
      elif state in west:
          return "West"
      elif state in midwest:
          return "Midwest"
      elif state in northeast:
          return "Northeast"
      else:
          return "Unknown"

To apply the function, we first have to select the column we want to use, in this case State. Then we call .apply and pass in the get_region function.

  df["Region"] = df["State"].apply(get_region)
  df.head(10)
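As a variation on the if/else function above, the same classification can be driven by a dictionary lookup. This is only a sketch: the names REGION_BY_STATE and get_region_from_dict are made up for illustration, and "Ohio" is included purely to show the fallback value.

```python
import pandas as pd

# Hypothetical lookup table; the regions mirror the lists used in get_region.
REGION_BY_STATE = {
    "Texas": "South",
    "Florida": "South",
    "Georgia": "South",
    "California": "West",
    "Washington": "West",
    "Illinois": "Midwest",
    "New York": "Northeast",
}

def get_region_from_dict(state):
    # dict.get returns the fallback when the state is not in the table
    return REGION_BY_STATE.get(state, "Unknown")

states = pd.Series(["California", "Texas", "New York", "Ohio"])
regions = states.apply(get_region_from_dict)
print(regions.tolist())  # ['West', 'South', 'Northeast', 'Unknown']
```

Adding a new state then only means adding one entry to the dictionary rather than editing the branches.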

Example 2 Convert Fahrenheit to Celsius Using a Lambda Function

Probably the second most popular way to use apply is with a lambda function. Instead of writing a separate named function like in our first example, we can write the lambda directly inside the call to apply.

In this case we want to convert the average temp to Celsius. As above, we first select the column we want to run the function on. Inside apply we write a simple lambda that uses the Fahrenheit-to-Celsius conversion formula.

  df["AvgTempC"] = df["AvgTempF"].apply(lambda x: (x - 32) * 5/9)
  df.head(10)
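To make the relationship between a lambda and a named function concrete, here is a small self-contained sketch. The function name fahrenheit_to_celsius and the sample values are illustrative assumptions. It also includes plain column arithmetic for comparison, which gives the same numbers for a simple formula like this one.

```python
import pandas as pd

def fahrenheit_to_celsius(temp_f):
    # Same formula as the lambda, just with a name
    return (temp_f - 32) * 5 / 9

temps_f = pd.Series([32, 59, 212])

via_lambda = temps_f.apply(lambda x: (x - 32) * 5 / 9)
via_function = temps_f.apply(fahrenheit_to_celsius)
vectorized = (temps_f - 32) * 5 / 9  # arithmetic on the whole column at once

print(via_lambda.tolist())  # [0.0, 15.0, 100.0]
```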

Example 3 Multiple Columns

This example is a small addition to the one above. We can apply this lambda (or a function like the first example) to multiple columns. This time we convert summer and winter temps.

  df[["SummerTempCelc", "WinterTempCelc"]] = df[["SummerTemp", "WinterTemp"]].apply(lambda x: (x - 32) * 5/9)
  df.head(10)
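One detail worth seeing in action: when apply is called on a dataframe with several columns, the function receives each column as a whole Series rather than one value at a time. The sketch below records what the function is handed. The helper to_celsius, the seen dictionary, and the sample numbers are assumptions made for the demonstration.

```python
import pandas as pd

temps = pd.DataFrame({"SummerTemp": [77, 86], "WinterTemp": [50, 41]})

seen = {}  # column name -> type of the object passed to the function

def to_celsius(column):
    # Record what apply hands us, then convert the whole column
    seen[column.name] = type(column).__name__
    return (column - 32) * 5 / 9

converted = temps.apply(to_celsius)

print(seen)  # {'SummerTemp': 'Series', 'WinterTemp': 'Series'}
print(converted["SummerTemp"].tolist())  # [25.0, 30.0]
```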

Example 4 Built in functions

Another common use case is passing a built-in function. Taking a log transformation is a typical example. All we have to do is pass np.log.

  df["AvglogTemp"] = df["AvgTempF"].apply(np.log)
  df.head(10)
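As a quick sanity check, the sketch below compares passing np.log to apply with calling np.log on the column directly, and undoes the transformation with np.exp. The sample values are made up.

```python
import numpy as np
import pandas as pd

temps_f = pd.Series([50.0, 59.0, 70.0])

via_apply = temps_f.apply(np.log)  # NumPy function passed to apply
direct = np.log(temps_f)           # NumPy functions also accept a Series directly
round_trip = np.exp(via_apply)     # exp undoes the log transformation

print(np.allclose(via_apply, direct))       # True
print(np.allclose(round_trip, temps_f))     # True
```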

Example 5 Apply a function row-wise to create a custom description

Say you want to grab multiple pieces of information from a row. This time, we have to pass in a row to our function and then specify columns within it.

  def create_description(row):
      return f"{row['State']} has an average temperature of {row['AvgTempF']}°F ({row['AvgTempC']:.1f}°C)."

We also see a new parameter here. To apply the function to each row, use axis=1. The default is axis=0, which applies the function to each column.

  df["Description"] = df.apply(create_description, axis=1)
  df["Description"].head(10)
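The difference between the two axis values is easiest to see by running the same function both ways. This is a minimal sketch on a three-state subset of the data. The helper temp_range is a made-up name, and the states are used as the index so the row results are labeled.

```python
import pandas as pd

temps = pd.DataFrame(
    {"SummerTemp": [79, 85, 62], "WinterTemp": [49, 45, 42]},
    index=["California", "Texas", "New York"],
)

def temp_range(values):
    # values is a whole column (axis=0) or a whole row (axis=1)
    return values.max() - values.min()

per_column = temps.apply(temp_range, axis=0)  # one result per column
per_row = temps.apply(temp_range, axis=1)     # one result per row

print(per_column.to_dict())  # {'SummerTemp': 23, 'WinterTemp': 7}
print(per_row.to_dict())     # {'California': 30, 'Texas': 40, 'New York': 20}
```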

Example 6/7 with args and kwargs

Often functions will require multiple parameters. What happens if we want to reuse our function and then utilize different params each time?

This is where args and kwargs come in handy. Args are passed positionally as a tuple, without naming the parameters, whereas kwargs are passed by parameter name.

  def temp_level(temp, high_temp, low_temp):
      if temp > high_temp:
          return "hot"
      elif temp < low_temp:
          return "cold"
      else:
          return "medium"

This is the args example. We pass in 80 as the high temp and 60 as the low temp for summer.

  df['temp_rating_summer'] = df['SummerTemp'].apply(temp_level, args = (80, 60))

This is the kwargs example. We pass in 50 as the high temp and 30 as the low temp for winter.

  df['temp_rating_winter'] = df['WinterTemp'].apply(temp_level, high_temp = 50, low_temp = 30)

We were able to reuse the same function for multiple use cases.

  df[['temp_rating_summer', 'temp_rating_winter']]
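To see that the two styles are interchangeable, here is a self-contained sketch that calls the same function three ways: with args only, with kwargs only, and with a mix of both. The thresholds and sample temperatures are made up for the demonstration.

```python
import pandas as pd

def temp_level(temp, high_temp, low_temp):
    if temp > high_temp:
        return "hot"
    elif temp < low_temp:
        return "cold"
    else:
        return "medium"

temps = pd.Series([90, 70, 40])

# Positional: the tuple is unpacked after the value from the Series
with_args = temps.apply(temp_level, args=(80, 50))

# Keyword: the extra parameters are matched by name
with_kwargs = temps.apply(temp_level, high_temp=80, low_temp=50)

# Mixed: high_temp positionally, low_temp by name
mixed = temps.apply(temp_level, args=(80,), low_temp=50)

print(with_args.tolist())  # ['hot', 'medium', 'cold']
```

Note that a single positional argument still needs a trailing comma, as in args=(80,), so that Python treats it as a tuple.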

Example 8 Apply a function to each column (axis=0) to find a summary stat fast

The last use case looks at finding summary statistics. It also shows that we do not always have to create a new column with apply (although that is the most common use case).

  def median(x):
      return x.median()

  # Leave the apply call as the last expression so the medians are displayed
  df[['AvgTempF', 'AvgTempC', 'SummerTemp', 'WinterTemp']].apply(median, axis = 0)
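The same pattern extends to several statistics at once: if the function returns a Series, apply stacks the results into a small summary table with one column per original column. The sketch below assumes a made-up helper named summarize and a three-row subset of the data.

```python
import pandas as pd

temps = pd.DataFrame({"SummerTemp": [79, 85, 62], "WinterTemp": [49, 45, 42]})

def summarize(column):
    # Returning a Series yields one labeled row per statistic
    return pd.Series({
        "median": column.median(),
        "range": column.max() - column.min(),
    })

summary = temps.apply(summarize, axis=0)
print(summary)
```

Here summary has the rows median and range, and the columns SummerTemp and WinterTemp.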

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.
