Spearman Rank Correlation


				
					import numpy as np
from scipy import stats
import pandas as pd
				
			
				
Example 1 Manual
				
			
				
					# Step 1: Grab the data
hits = np.array([150, 180, 120, 210, 160])
rbis = np.array([75, 90, 50, 110, 85])
				
			
				
					# Step 2: Rank the data
hits_rank = stats.rankdata(hits)
rbis_rank = stats.rankdata(rbis)
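The ranking step behaves differently when values repeat. The sketch below uses a small made-up array (`scores` is a hypothetical name, not part of the original example) to show how `stats.rankdata` assigns ranks to ties under three of its `method` options. The default, `'average'`, is the convention Spearman correlation normally relies on.

```python
import numpy as np
from scipy import stats

# Hypothetical data with one tie (the two 20s); not from the baseball example.
scores = np.array([10, 20, 20, 30])

# Default: tied values share the average of the ranks they would occupy.
avg_ranks = stats.rankdata(scores)                    # method='average'

# Alternatives: give ties the lowest rank, or use gap-free "dense" ranks.
min_ranks = stats.rankdata(scores, method='min')
dense_ranks = stats.rankdata(scores, method='dense')

print("average:", avg_ranks)    # [1.  2.5 2.5 4. ]
print("min:    ", min_ranks)    # [1 2 2 4]
print("dense:  ", dense_ranks)  # [1 2 2 3]
```

The `hits` and `rbis` arrays in this example contain no repeated values, so every method would give the same ranks there.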
				
			
				
					# Step 3: Calculate the rank differences and their squares
d = hits_rank - rbis_rank
d_squared = d ** 2
				
			
				
					# Step 4: Apply the Spearman formula (this shortcut assumes no tied ranks)
n = len(hits)
spearman_manual = 1 - (6 * np.sum(d_squared)) / (n * (n**2 - 1))
				
			
				
					# Print the manually calculated Spearman correlation
print(f"Manual Spearman Correlation: {spearman_manual}")
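The shortcut formula used in Step 4 is only exact when no values are tied. As a rough check, the sketch below compares it with the more general definition (the Pearson correlation of the ranks) on the original data and on a second, made-up pair of arrays that contain ties. The helper names `spearman_shortcut` and `spearman_general` and the arrays `tied_x` and `tied_y` are assumptions introduced for illustration.

```python
import numpy as np
from scipy import stats

def spearman_shortcut(x, y):
    """1 - 6*sum(d^2) / (n*(n^2 - 1)); exact only when there are no ties."""
    d = stats.rankdata(x) - stats.rankdata(y)
    n = len(d)
    return 1 - (6 * np.sum(d ** 2)) / (n * (n ** 2 - 1))

def spearman_general(x, y):
    """Pearson correlation of the (average) ranks; also valid with ties."""
    return np.corrcoef(stats.rankdata(x), stats.rankdata(y))[0, 1]

# Data from the example above: no ties, so both approaches agree.
hits = np.array([150, 180, 120, 210, 160])
rbis = np.array([75, 90, 50, 110, 85])
print(spearman_shortcut(hits, rbis), spearman_general(hits, rbis))

# Hypothetical data with ties: the shortcut drifts away from the general value.
tied_x = np.array([1, 2, 2, 3, 4])
tied_y = np.array([2, 1, 3, 3, 5])
shortcut_tied = spearman_shortcut(tied_x, tied_y)
general_tied = spearman_general(tied_x, tied_y)
scipy_tied, _ = stats.spearmanr(tied_x, tied_y)
print(f"shortcut: {shortcut_tied:.4f}, general: {general_tied:.4f}, scipy: {scipy_tied:.4f}")
```

On the tied data the shortcut gives 0.775, while the rank-based Pearson value and `stats.spearmanr` both give roughly 0.763, so the library function is the safer choice whenever ties may be present.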
				
			

Example 2 Fast with scipy

				
					# Step 1: Calculate Spearman's rank correlation using scipy
spearman_corr, p_value = stats.spearmanr(hits, rbis)
				
			
				
					# Print the Spearman correlation coefficient and p-value
print(f"Spearman Correlation: {spearman_corr}")
print(f"P-value: {p_value}")
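Because Spearman works on ranks, it measures how monotonic a relationship is rather than how linear it is. The sketch below illustrates this with made-up data (`x` and `y` are hypothetical and unrelated to the baseball example): `y` grows exponentially with `x`, so the ordering matches perfectly even though the points are far from a straight line.

```python
import numpy as np
from scipy import stats

# Hypothetical, strictly increasing but strongly non-linear relationship.
x = np.arange(1, 11)
y = np.exp(x)

rho_spearman, _ = stats.spearmanr(x, y)   # rank-based
rho_pearson, _ = stats.pearsonr(x, y)     # linear

print(f"Spearman: {rho_spearman:.3f}")
print(f"Pearson:  {rho_pearson:.3f}")
```

Spearman comes out at 1 (up to floating-point error) because the ranks of `x` and `y` are identical, while Pearson is noticeably lower because the relationship is not linear.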
				
			
				
Example 3 Ordinal Data with pandas
				
			
				
					data = {
    'Hours_Studied': [10, 15, 5, 18],
    'Grades': ['C', 'A', 'D', 'B']
}
				
			
				
					# Create DataFrame
df = pd.DataFrame(data)
				
			
				
					# Convert Grades to numerical ordinal data
grade_mapping = {'A': 4, 'B': 3, 'C': 2, 'D': 1}
df['Grades_Ordinal'] = df['Grades'].map(grade_mapping)
				
			
				
					# Rank the data using pandas
df['Hours_Rank'] = df['Hours_Studied'].rank()
df['Grades_Rank'] = df['Grades_Ordinal'].rank()
				
			
				
					df.head(5)
				
			
				
					spearman_corr = df[['Hours_Rank', 'Grades_Rank']].corr(method='spearman').iloc[0, 1]
				
			
				
					print(f"Spearman correlation: {spearman_corr:.3f}")
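The explicit ranking columns are useful for seeing what happens, but `corr(method='spearman')` ranks the values internally, so they are not strictly required. The sketch below is one possible shorter variant: it encodes the grades with an ordered `pd.Categorical` instead of a hand-written mapping and skips the rank columns. The names `grade_order` and `Grades_Code` are assumptions introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    'Hours_Studied': [10, 15, 5, 18],
    'Grades': ['C', 'A', 'D', 'B'],
})

# Declare the grade order explicitly, from worst to best.
grade_order = ['D', 'C', 'B', 'A']
grades_cat = pd.Categorical(df['Grades'], categories=grade_order, ordered=True)

# Integer codes follow the declared order: D=0, C=1, B=2, A=3.
df['Grades_Code'] = grades_cat.codes

# Spearman ranks the raw values itself, so no separate rank columns are needed.
rho = df[['Hours_Studied', 'Grades_Code']].corr(method='spearman').iloc[0, 1]
print(f"Spearman correlation: {rho:.3f}")  # 0.800
```

This gives the same 0.800 as the ranked version, since Spearman depends only on the ordering of the values and not on the particular numbers used to encode the grades.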
				
			
