Python Percent Point Function (PPF)

				
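import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# Fix the seed so the random sample is reproducible
np.random.seed(17)

# Parameters of the normal distribution to sample from
mean = 0
std_dev = 1
size = 1000

# Draw 1,000 samples from a standard normal distribution
data = np.random.normal(loc=mean, scale=std_dev, size=size)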
				
			

Example 1

Value below which the lowest 20% of the distribution falls:

ppf_20 = norm.ppf(0.2, loc=data.mean(), scale=data.std())
print(ppf_20)

Value below which the lowest 70% of the distribution falls:

ppf_70 = norm.ppf(0.7, loc=data.mean(), scale=data.std())
print(ppf_70)
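
As a quick sanity check, the PPF is the inverse of the CDF, so feeding a PPF result back into `norm.cdf` should return the original probability. The sketch below rebuilds the same sample so it runs on its own; `round_trip` and `empirical_share` are illustrative names, not part of the original walkthrough.

```python
import numpy as np
from scipy.stats import norm

# Same setup as above: 1,000 draws from a standard normal
np.random.seed(17)
data = np.random.normal(loc=0, scale=1, size=1000)

mu, sigma = data.mean(), data.std()

# ppf maps a probability to a value; cdf maps that value back to the probability
ppf_20 = norm.ppf(0.2, loc=mu, scale=sigma)
round_trip = norm.cdf(ppf_20, loc=mu, scale=sigma)
print(round_trip)  # approximately 0.2

# Share of the actual sample that falls below the fitted 20% cutoff
empirical_share = np.mean(data < ppf_20)
print(empirical_share)
```

The empirical share will usually be close to, but not exactly, 0.2, because the cutoff comes from a normal curve fitted to the sample rather than from the sample itself.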
				
			

Example 2 Find the data points that are between the 25th and 50th percentiles

The two PPF values are the lower and upper cutoffs of that range.

ppf_25 = norm.ppf(0.25, loc=data.mean(), scale=data.std())
ppf_50 = norm.ppf(0.50, loc=data.mean(), scale=data.std())

print(ppf_25)
print(ppf_50)
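
The two cutoffs only mark the edges of the range. To pull out the data points themselves, one option is a boolean mask on the array. This is a minimal sketch that rebuilds the same sample; the name `between` and the choice of a half-open interval are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm

# Same setup as above: 1,000 draws from a standard normal
np.random.seed(17)
data = np.random.normal(loc=0, scale=1, size=1000)

ppf_25 = norm.ppf(0.25, loc=data.mean(), scale=data.std())
ppf_50 = norm.ppf(0.50, loc=data.mean(), scale=data.std())

# Boolean mask: keep values in the half-open interval [ppf_25, ppf_50)
between = data[(data >= ppf_25) & (data < ppf_50)]

print(len(between))  # roughly a quarter of the 1,000 points
print(between.min(), between.max())
```

For a normal distribution the 50% point equals the mean, so `ppf_50` here is the same value as `data.mean()`.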
				
			

Example 3 Find the top 20% of data points

The top 20% starts at the 80th percentile, so the probability passed to ppf is 1 - 0.2.

ppf_top_20 = norm.ppf((1-0.2), loc=data.mean(), scale=data.std())

print(ppf_top_20)
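
SciPy also provides `norm.isf`, the inverse survival function, which takes the upper-tail probability directly and should give the same cutoff as `norm.ppf(1 - 0.2, ...)`. The sketch below compares the two and then selects the points above the cutoff; `isf_top_20` and `top_20` are illustrative names.

```python
import numpy as np
from scipy.stats import norm

# Same setup as above: 1,000 draws from a standard normal
np.random.seed(17)
data = np.random.normal(loc=0, scale=1, size=1000)

mu, sigma = data.mean(), data.std()

# Two ways to get the cutoff for the top 20%
ppf_top_20 = norm.ppf(1 - 0.2, loc=mu, scale=sigma)
isf_top_20 = norm.isf(0.2, loc=mu, scale=sigma)
print(ppf_top_20, isf_top_20)

# Data points that sit above the cutoff
top_20 = data[data > ppf_top_20]
print(len(top_20))  # roughly a fifth of the 1,000 points
```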
				
			

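Example 4 NumPy Percentiles

Quartiles, deciles, and percentiles are covered in the next video.

# Empirical 75th percentile, computed from the sample itself
# rather than from a fitted normal distribution
ppf_75 = np.percentile(data, 75)
print(ppf_75)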
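
`np.percentile` and `norm.ppf` answer the same question in different ways: the first reads the value off the sorted sample, the second reads it off a normal curve with the sample's mean and standard deviation. A minimal side-by-side sketch, with `empirical_75`, `fitted_75`, and `quartiles` as illustrative names:

```python
import numpy as np
from scipy.stats import norm

# Same setup as above: 1,000 draws from a standard normal
np.random.seed(17)
data = np.random.normal(loc=0, scale=1, size=1000)

# Empirical: interpolates within the sorted sample (percent scale, 0 to 100)
empirical_75 = np.percentile(data, 75)

# Fitted: assumes the data is normal (probability scale, 0 to 1)
fitted_75 = norm.ppf(0.75, loc=data.mean(), scale=data.std())

print(empirical_75, fitted_75)

# np.percentile also accepts a list, which returns all three quartiles at once
quartiles = np.percentile(data, [25, 50, 75])
print(quartiles)
```

The two values should be close here because the sample really is normal. On skewed data they can differ noticeably, and the empirical percentile is usually the safer choice.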
				
			

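Example 5 Graph with Matplotlib

Seaborn doesn't have a PPF option, so the empirical PPF is plotted with Matplotlib.

# Sort the sample so each value can be paired with a cumulative probability
sorted_data = np.sort(data)

# Evenly spaced probabilities 0, 1/n, ..., (n-1)/n, one per sorted value
cumulative_probabilities = np.linspace(0, 1, len(sorted_data), endpoint=False)

# Probability on the x-axis, data value on the y-axis
plt.plot(cumulative_probabilities, sorted_data, marker='o', linestyle='none', label='PPF')
plt.title('PPF of Normally Distributed Data')
plt.xlabel('Cumulative Probability')
plt.ylabel('Data Values')
plt.legend()
plt.grid(True)
plt.show()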
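
The plot shows the empirical PPF only. To compare it against the fitted normal curve, the same `norm.ppf` call can be evaluated over a whole array of probabilities. One caveat: `norm.ppf(0)` is negative infinity, so this sketch uses midpoint probabilities `(i + 0.5) / n` instead of starting at 0. That choice, and the names `probs`, `theoretical`, and `corr`, are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm

# Same setup as above: 1,000 draws from a standard normal
np.random.seed(17)
data = np.random.normal(loc=0, scale=1, size=1000)

sorted_data = np.sort(data)
n = len(sorted_data)

# A probability of exactly 0 maps to -inf on a normal distribution
at_zero = norm.ppf(0.0)
print(at_zero)  # -inf

# Midpoint probabilities stay strictly inside (0, 1), so every value is finite
probs = (np.arange(n) + 0.5) / n
theoretical = norm.ppf(probs, loc=data.mean(), scale=data.std())

# How closely the sorted sample tracks the fitted normal PPF
corr = np.corrcoef(sorted_data, theoretical)[0, 1]
max_gap = np.max(np.abs(sorted_data - theoretical))
print(corr, max_gap)
```

To draw the comparison, add `plt.plot(probs, theoretical, label='Fitted normal PPF')` before `plt.legend()` in the plotting code.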
				
			

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.
