Pandas Rank
Knowing how to use Pandas rank is important for both real-world data work and interview questions. In this lesson we will go over more than 10 examples of it, covering the different ranking methods, percentage ranks, null values, and groupby.
Tutorial Prep
Before we start this lesson on Pandas Rank, let’s import Pandas and NumPy.
import pandas as pd
import numpy as np
For this tutorial, we will use baseball card data. Player is the player on the card, card is the brand/set of the card, and price is the value of the card. We will create a dictionary and then pass it into a dataframe.
data = {
    'Player': [
        'Honus Wagner', 'Mickey Mantle', 'Babe Ruth', 'Mike Trout',
        'Derek Jeter', 'Ty Cobb', 'Nolan Ryan', 'Shohei Ohtani'
    ],
    'Card': [
        'T206', '1952 Topps', '1916 Sporting News', '2009 Bowman Chrome',
        '1993 SP Foil', 'Ty Cobb', 'Venezuela', '2018 Topps Chrome'
    ],
    'Price': [1000000, 50000, 350000, 2500, 300, 1000000, 10000, 2500]
}
The dataframe name will be df.
df = pd.DataFrame(data)
We are going to create two copies of the original dataframe to use a bit later in the tutorial.
df2 = df.copy()
df3 = df.copy()
Example 1 - Rank all columns in a dataframe
To start off the lesson, let’s look at what happens if we use .rank() on the dataframe. This will rank all the columns in the dataframe. Typically this isn’t the best approach to use and we will explore better practices later on.
df.rank()
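As a rough sketch of what ranking every column means, the snippet below uses a trimmed three-row version of the card data. The names `small` and `all_ranks` are only for this illustration and are not part of the tutorial's dataframe. Text columns are ranked alphabetically, which is rarely what you want.

```python
import pandas as pd

# Hypothetical three-row subset of the card data, for illustration only
small = pd.DataFrame({
    "Player": ["Honus Wagner", "Mickey Mantle", "Babe Ruth"],
    "Price": [1000000, 50000, 350000],
})

# rank() with no arguments ranks every column, strings included
all_ranks = small.rank()
print(all_ranks)
# Player: Babe Ruth sorts first alphabetically, so it gets rank 1
# Price: the cheapest card (50000) gets rank 1 because ascending is the default
```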

Example 2 - Rank Only Numeric Columns
Typically with rank, we will only want to rank numeric columns. We can again use .rank(), but this time pass in numeric_only=True. Note that rank works in ascending order by default, so the lowest price gets the lowest rank.
df.rank(numeric_only=True)

Example 3 - Create a new column with rank - desc - avg method
Typically the best approach with rank is to create a new column. We can also switch from ascending to descending order by passing ascending=False.
By default, rank uses the average method. When values tie, each one gets the average of the ranks the tied values would otherwise occupy. In the result you’ll see Honus Wagner and Ty Cobb both have a rank of 1.5, because together they occupy ranks 1 and 2.
So we take (1 + 2) / 2 = 1.5. Since these two cards take up ranks 1 and 2, the next rank is 3, which goes to the Babe Ruth card.
df['card_value_rank_average'] = df['Price'].rank(ascending=False)
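Here is a minimal, self-contained sketch of the same calculation on a plain Series, so the tie arithmetic is easy to check. The names `prices` and `avg_rank` are assumptions made for this example; the values are the prices from the tutorial data.

```python
import pandas as pd

# Prices from the tutorial data, indexed by player for readability
prices = pd.Series(
    [1000000, 50000, 350000, 2500, 300, 1000000, 10000, 2500],
    index=["Honus Wagner", "Mickey Mantle", "Babe Ruth", "Mike Trout",
           "Derek Jeter", "Ty Cobb", "Nolan Ryan", "Shohei Ohtani"],
)

# Default method='average': tied values share the mean of the ranks they occupy
avg_rank = prices.rank(ascending=False)
print(avg_rank.sort_values())
# Wagner and Cobb occupy ranks 1 and 2, so both get (1 + 2) / 2 = 1.5
# Trout and Ohtani occupy ranks 6 and 7, so both get (6 + 7) / 2 = 6.5
```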

Example 4 - min: Ties get the minimum rank in the group.
The next method to look at is min which gives both values the min ranking. In this case both Wagner and Cobb get a value of 1. 2 is skipped and we jump to 3.
df['card_value_rank_min'] = df['Price'].rank(ascending=False, method='min')

Example 5 - Dense
Dense is nearly identical to min. The difference is that dense never leaves gaps: instead of moving to 3 for the next value, we move to 2. With the two previous methods (average and min) Ruth had a rank of 3. This time it is 2.
df['card_value_rank_dense'] = df['Price'].rank(ascending=False, method='dense')

Example 6 - max: Ties get the maximum rank
Max is the opposite of min. Since Cobb and Wagner occupy the first two ranks in the dataframe, both are assigned the maximum of those, which is 2. The next value is then 3. This isn’t used as often as average, min, or dense, but it is available if needed.
df['card_value_rank_max'] = df['Price'].rank(ascending=False, method='max')

Example 7 - First
For some special use cases, you may want to use first. Essentially, first breaks ties by the order in which the rows appear. So when Cobb and Wagner tie, whoever appears first in the dataframe gets the first rank and the other gets the second rank.
df['card_value_rank_first'] = df['Price'].rank(ascending=False, method='first')
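To see how the five methods differ on the same ties, it can help to put them side by side. The sketch below is one way to do that; the names `prices`, `methods`, and `comparison` are assumptions for this example only.

```python
import pandas as pd

prices = pd.Series(
    [1000000, 50000, 350000, 2500, 300, 1000000, 10000, 2500],
    index=["Honus Wagner", "Mickey Mantle", "Babe Ruth", "Mike Trout",
           "Derek Jeter", "Ty Cobb", "Nolan Ryan", "Shohei Ohtani"],
)

# One column per tie-breaking method, all ranked from highest price to lowest
methods = ["average", "min", "max", "dense", "first"]
comparison = pd.DataFrame(
    {m: prices.rank(ascending=False, method=m) for m in methods}
)
print(comparison.sort_values("first"))
# Wagner and Cobb: average 1.5, min 1, max 2, dense 1, first 1 and 2
# Ruth: rank 3 everywhere except dense, where it is 2
```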

Example 8 - Pct
Moving away from rank methods, let’s explore pct. With pct=True, each rank is divided by the number of non-null values, so the rankings are normalized to values greater than 0 and up to 1.
df2['price_percentage'] = df2['Price'].rank(ascending=False, pct=True)
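As a quick check of what pct does, the sketch below compares pct=True against dividing the ordinary rank by the number of non-null values. The names `prices`, `avg_rank`, `pct_rank`, and `manual_pct` are assumptions for this example.

```python
import pandas as pd

prices = pd.Series([1000000, 50000, 350000, 2500, 300, 1000000, 10000, 2500])

avg_rank = prices.rank(ascending=False)
pct_rank = prices.rank(ascending=False, pct=True)

# Dividing each rank by the count of non-null values should give the same numbers
manual_pct = avg_rank / prices.count()

print(pd.DataFrame({"rank": avg_rank, "pct": pct_rank, "manual": manual_pct}))
# The two tied top cards have rank 1.5, so their pct is 1.5 / 8 = 0.1875
# The cheapest card has rank 8, so its pct is 8 / 8 = 1.0
```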

Example 9 - Null - Keep
There are 3 different approaches we can take when dealing with null values. Before we jump into them, we need to add a null value to our dataframe. While not a baseball card, let’s add a Thomas Edison card to our dataframe with no dollar value. This is represented by NaN in our dataframe.
To create this new row, we use .loc[] and pass in the index label we want to assign it to. In this case it’ll be index 8, which is the ninth row.
df3.loc[8] = ['Thomas Edison', '1888 Lone Jack', np.nan]
Let’s look at the default handling of nulls (na_option='keep'), which leaves their rank as NaN. We do not need to pass in a parameter to achieve this result. The next 2 ways of dealing with null values will give them a rank.
df3['card_value_null_keep'] = df3['Price'].rank(ascending=False)

Example 10 - Null - top
The first of two ways to assign a rank to null values is top. This gives the null values the first rank. In this case, the Thomas Edison card gets a rank of 1.
df3['card_value_null_top'] = df3['Price'].rank(ascending=False, na_option='top')

Example 11 - Null - bottom
Bottom gives the opposite result of top. The null value in the dataframe receives the last rank, which here is 9.
df3['card_value_null_bottom'] = df3['Price'].rank(ascending=False, na_option='bottom')
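The three null options are easier to compare on a small example. The sketch below uses a hypothetical four-card Series (the names `prices` and `na_ranks` are assumptions for this illustration) with one missing price.

```python
import numpy as np
import pandas as pd

# Hypothetical four-card subset with one missing price
prices = pd.Series(
    [1000000, 350000, 300, np.nan],
    index=["Honus Wagner", "Babe Ruth", "Derek Jeter", "Thomas Edison"],
)

# One column per na_option, all ranked from highest price to lowest
na_ranks = pd.DataFrame(
    {opt: prices.rank(ascending=False, na_option=opt)
     for opt in ["keep", "top", "bottom"]}
)
print(na_ranks)
# keep:   Edison stays NaN, the other cards are ranked 1 to 3
# top:    Edison gets rank 1, the other cards shift down to 2 to 4
# bottom: Edison gets rank 4, the other cards are ranked 1 to 3
```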

Example 12 - Groupby
Groupby is helpful when you want to assign ranks within groups. Everything that we covered above is applicable to groupby. To use it with our current dataframe, we need to assign a card_type to each card. We will keep this simple and use two types, Vintage and Modern.
df3['card_type'] = ['Vintage', 'Vintage', 'Vintage', 'Modern', 'Modern', 'Vintage', 'Vintage', 'Modern', 'Vintage' ]
Now with the card_type in the dataframe, let’s groupby the card_type and assign the rank to price. We will use the default null ranking (NaN), default ranking method, and ascending = False so that the highest prices are ranked first. You’ll see we achieve four 1.5 values in the final results.
df3['card_type_groupby'] = df3.groupby('card_type')['Price'].rank(ascending=False)
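For reference, here is a self-contained sketch of the grouped ranking that rebuilds only the columns it needs. The dataframe name `cards` and the column name `rank_in_type` are assumptions for this example; the values match the tutorial data.

```python
import numpy as np
import pandas as pd

cards = pd.DataFrame({
    "Player": ["Honus Wagner", "Mickey Mantle", "Babe Ruth", "Mike Trout",
               "Derek Jeter", "Ty Cobb", "Nolan Ryan", "Shohei Ohtani",
               "Thomas Edison"],
    "Price": [1000000, 50000, 350000, 2500, 300, 1000000, 10000, 2500, np.nan],
    "card_type": ["Vintage", "Vintage", "Vintage", "Modern", "Modern",
                  "Vintage", "Vintage", "Modern", "Vintage"],
})

# Rank prices separately inside each card_type, highest price first
cards["rank_in_type"] = cards.groupby("card_type")["Price"].rank(ascending=False)
print(cards.sort_values(["card_type", "rank_in_type"]))
# Vintage: Wagner and Cobb tie at 1.5, then Ruth 3, Mantle 4, Ryan 5, Edison NaN
# Modern:  Trout and Ohtani tie at 1.5, then Jeter 3
```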

Example 13 - nlargest
While this isn’t exactly a rank example, one of the ways data analysts use rank is to grab the top X rows. nlargest lets us skip the ranking step entirely and grab the top results directly.
df3['Price'].nlargest(3)

Example 14 - nsmallest
Instead of the largest values, we can use nsmallest to grab the lowest values in the dataframe.
df3['Price'].nsmallest(3)
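To connect these two methods back to rank, the sketch below picks the top three prices both ways, and also shows the keep='all' option, which returns every value tied for the last slot. The variable names are assumptions for this example.

```python
import pandas as pd

prices = pd.Series(
    [1000000, 50000, 350000, 2500, 300, 1000000, 10000, 2500],
    index=["Honus Wagner", "Mickey Mantle", "Babe Ruth", "Mike Trout",
           "Derek Jeter", "Ty Cobb", "Nolan Ryan", "Shohei Ohtani"],
)

# Top three prices directly
top3 = prices.nlargest(3)

# The same cards selected via rank: keep rows whose 'min' rank is 3 or better
top3_by_rank = prices[prices.rank(ascending=False, method="min") <= 3]

# keep='all' includes every value tied for the last slot, so the result
# can contain more than n rows (here Trout and Ohtani both cost 2500)
cheapest = prices.nsmallest(2, keep="all")

print(top3)
print(top3_by_rank)
print(cheapest)
```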

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.