Pandas Melt

In this lesson, we are going to take a look at the Pandas Melt function. This is a way to transform a dataframe to convert columns to rows.

Later in this lesson, we will take a look at some of the benefits of using melt with a groupby and plotting.

This lesson is based on a YouTube video which is linked down below. (Video Coming Soon)

Import Pandas, Seaborn, and Matplotlib.

				
					import pandas as pd 
import seaborn as sns 
import matplotlib.pyplot as plt
				
			

Pandas Melt Example 1

To start we’re going to create a simple dataframe in python. We will create a dictionary and then pass it into pd.DataFrame. 

				
					data = {
    "City": ["New York", "Chicago", "Los Angeles"],
    "Phish": [15, 2, 4],
    "Billy Strings": [6, 3, 2],
    "Tool": [1, 1, 1]
}

df = pd.DataFrame(data)
df.head()
				
			

The parameters used: 

  1. df – our dataframe from above
  2. id_vars – the column(s) that you want to save in the melted dataframe
  3. var_name – the new column name with the columns headers from the other original dataframe
  4. value_name – the new column name with the values from the other original dataframe
				
					melted_df = pd.melt(df,
                    id_vars = ['City'],
                    var_name = 'Band',
                    value_name='Unique Concerts'                
)
melted_df.head(10)
				
			

The new Pandas Melt DataFrame

Pandas Melt Example 2

For our second example, we are going to look at using a bit more of a complex dataset from Seaborn. Additionally, this will allow us to look at using another parameter which was left blank in example 1. 

				
					penguins = sns.load_dataset('penguins')
penguins.head()
				
			

Seaborn Penguin DataFrame

				
					melted_penguins = pd.melt(
    penguins,
    id_vars=['species', 'island'], 
    value_vars = ['bill_length_mm',	'bill_depth_mm',	'flipper_length_mm',	'body_mass_g'],
    var_name = "measurment_type",
    value_name = 'measurment_value'
)
melted_penguins
				
			

This time we want to keep two columns so species and island are in a list equal to id_vars.

value_vars is used when we don’t want to utilize all the columns in the original dataframe. In this example we aren’t using sex. Instead we have 2 columns in id_vars and 4 columns within value_vars.

Group by

With the new dataframe, using a groupby has become simpler. Check out the code below compared to the original dataframe groupby.

				
					melted_penguins.groupby(['species', 'measurment_type'])['measurment_value'].mean().reset_index()
				
			

The original dataframe groupby makes us put in each measurement column, where as above we only had to use measurement_values.

				
					penguins.groupby('species')[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']].mean().reset_index()
				
			

Plots

Now let’s look at a few different plots with the melted dataframe.

Box Plot With Hue

				
					plt.figure(figsize=(10,6))
sns.boxplot(data=melted_penguins, x="measurment_type", y='measurment_value', hue='species')
plt.yscale("log")
plt.xticks(rotation=45)
plt.title("Penguin Measurments")
plt.show()
				
			

Facet Grid with Box Plots

				
					g = sns.FacetGrid(melted_penguins, col='measurment_type', col_wrap=2, height=4, sharey=False)
g.map_dataframe(sns.boxplot, x='species', y='measurment_value')
plt.show()
				
			

Another Box Plot (Original Dataframe)

				
					plt.figure(figsize=(10,6))
penguins.boxplot(column=['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g'], by='species')
plt.yscale("log")
plt.xticks(rotation=45)
plt.title("Penguin Measurments")
plt.show()
				
			

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.

Leave a Reply

Your email address will not be published. Required fields are marked *