Pandas Merge

Merges in Python Pandas are like joins in SQL. In this lesson we are going to go through 7 different examples of using Merge. It will cover frequently used merges like left and inner while still going over infrequently used ones like full outer and cross.

This tutorial is based on a YouTube video we uploaded to the channel. You can watch it below.

  import pandas as pd

Example 1 - Basic Merge Same Columns

To start we’re going to create two simple dataframes in python. This tutorial will build on these dataframes in each example as we add complexity.

DF1 has the player id and name of cricketers while DF2 has the player id and salary. We want to have one dataframe that includes all of this information.

  df1 = pd.DataFrame({'Player_ID': [1, 2], 'Name': ['Shane Warne', 'Victor Trumper']}) df2 = pd.DataFrame({'Player_ID': [1, 2], 'Salary': [50000, 100000]})
  df1.head()
  df2.head()

To merge these dataframes, pass in each one, and then define the column that we want to merge on. In this case both dataframes have a Player_ID.

  merged_df = pd.merge(df1, df2, on='Player_ID') merged_df.head()

Example 2 - Merge with Different Column Names

We are going to do the same merge as above, but this time utilize different column names for each dataframe. We know that the Player_ID and Cricketer_id can be joined as it is the same information.

Often you’ll find in the real world that column names aren’t always going to be 1:1, so it’s important to learn how to join them.

  df3 = pd.DataFrame({'Player_ID': [1, 2], 'Name': ['Shane Warne', 'Victor Trumper']}) df4 = pd.DataFrame({'Cricketer_id': [1, 2], 'Salary': [50000, 100000]})

The only difference from the first example is we define the exact columns we want to join on. left_on for the first dataframe and right_on for the second dataframe.

  merged_df_2 = pd.merge(df3, df4, left_on='Player_ID', right_on='Cricketer_id')

Since we join on different column names, the final dataframe will have both. Personally, I do not want this so I typically drop one of the two columns. In this case, let’s drop Cricketer_id.

  merged_df_2 = merged_df_2.drop('Cricketer_id', axis=1) merged_df_2.head(10)

Examples 3 through 5 Different Types of Merges

The next few examples are going to use df5 and df6. Each will showcase a different type of pandas merge: Left, Inner, and Outer. Cross merge will be a bit later in this lesson.

These dataframes each have 1 new cricketer which we define as 3 and 4. The twist is that 3 is only present in DF5 and 4 is only present in DF6. Let’s look at how these merges impact the data.

  df5 = pd.DataFrame({'Player_ID': [1, 2, 3], 'Name': ['Shane Warne', 'Victor Trumper', 'Don Bradman']}) df6 = pd.DataFrame({'Cricketer_id': [1, 2, 4], 'Salary': [50000, 100000, 200000]})

Example 3 Left Merge

A Left Merge allows you to keep the entire left dataframe (the first one we enter into merge()) and add in additional information/columns from the second dataframe.

Since we have crickets 1-3 in the left dataframe, they will all show up in the final dataframe. The one nuance though is that cricket 3 – Donald Bradman – is not in the right dataframe. Due to that, he has null information (NaN).

  merged_df_left = pd.merge(df5, df6, left_on='Player_ID', right_on='Cricketer_id', how='left') merged_df_left.head(5)

Example 4 Inner Merge

The inner merge will only showcase results that are present in both dataframes.

In this example, we see that only cricketer 1 and 2 are present in both dataframes. Due to this, they are the only ones in the final one.

  merged_df_inner = pd.merge(df5, df6, left_on='Player_ID', right_on='Cricketer_id', how='inner') merged_df_inner.head(10)

Example 5 Full Outer Merge

The full outer merge will showcase everything from both dataframes with no filter.

As we see down below every cricketer (one through four) is shown in the final dataframe.Â

  merged_df_outer = pd.merge(df5, df6, left_on='Player_ID', right_on='Cricketer_id', how='outer') merged_df_outer.head(10)

Example 6 Suffixes

What happens if dataframes share common names?Â

This example takes a look at suffixes which allow you to rename the columns to make it clear where each column in the final dataframe came from.

In the two dataframes below, we are going to share the Player_ID, Name, and  Team columns. While we will join on the Player_ID the team will be different for each one. The first dataframe uses a national team whereas the second uses a professional team.

  df_national = pd.DataFrame({ 'Player_ID': [1, 2, 3], 'Name': ['Shane Warne', 'Victor Trumper', 'Don Bradman'], 'Team': ['Australia', 'Australia', 'Australia'] })
  df_professional = pd.DataFrame({ 'Player_ID': [1, 2, 4], 'Name': ['Shane Warne', 'Victor Trumper', 'Virat Kohli'], 'Team': ['Rajasthan Royals', 'Sydney CC', 'RCB'] # Professional teams })

To use suffixes, we will utilize a tuple. What we recommendis add an underscore before the suffix so that the final dataframe is formatted.

  cricket_suffixes_df = pd.merge(df_national, df_professional, on='Player_ID', suffixes=('_National', '_Professional'), how='left') cricket_suffixes_df.head(10)

Example 7 Cross Merge

The last merge we will take a look at is the Cross merge. This is a special type of merge as you technically don’t merge on a specific column.

Instead, we create every possible outcome. Probably the easiest way to explain this is to look at match outcomes. Every game must have a winner and loser. So let’s build out a dataframe with this in mind.

  df_teams = pd.DataFrame({ 'Team_ID': [1, 2, 3], 'Team_Name': ['India', 'Australia', 'Pakistan'] })
  # Outcomes DataFrame df_outcomes = pd.DataFrame({ 'Outcome': ['Win', 'Loss'] })

As you see in the code below, we don’t need a column to merge on. In the final dataframe we now have each outcome as a separate line.Â

  df_match_outcomes = pd.merge(df_teams, df_outcomes, how='cross') df_match_outcomes.head(10)

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.

Leave a Reply

Your email address will not be published. Required fields are marked *