Python Cumulative Distribution Function

    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    from scipy.stats import norm
				
			

Example 1 Manual Calculation

    data = [2, 3, 3, 5, 7]

    # Sort so the CDF values come out in ascending order
    sorted_data = np.sort(data)
    data_len = len(sorted_data)

    cdf_values = []
    for i in range(data_len):
        # sorted_data <= sorted_data[i] is a boolean array: each element is True
        # if the corresponding value is less than or equal to sorted_data[i],
        # and False otherwise. Summing it counts those values, and dividing by
        # data_len gives the CDF as a proportion of the data.
        cdf_value = np.sum(sorted_data <= sorted_data[i]) / data_len
        cdf_values.append(cdf_value)

    print(cdf_values)
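
The loop above compares every value against the whole array. As a minimal sketch of an alternative (not part of the original example), `np.searchsorted` with `side="right"` can produce the same counts in one call. The name `ecdf_values` is introduced here only for illustration.

```python
import numpy as np

data = [2, 3, 3, 5, 7]
sorted_data = np.sort(data)

# With side="right", searchsorted returns, for each query value, the number of
# elements in sorted_data that are less than or equal to it.
counts = np.searchsorted(sorted_data, sorted_data, side="right")

# Dividing the counts by the sample size gives the empirical CDF values.
ecdf_values = counts / len(sorted_data)

print(ecdf_values)
```

For this data the values are 0.2, 0.6, 0.6, 0.8 and 1.0, the same proportions the loop computes. The repeated 3 gets 0.6 both times because three of the five values are less than or equal to 3.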
				
			

Much Easier Examples

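A Cumulative Distribution Function (CDF) can be evaluated at either a raw value from the distribution or a Z-score, depending on the context. With norm.cdf, a raw value needs the distribution's mean and standard deviation passed as loc and scale, while a Z-score uses the defaults loc=0 and scale=1. The examples that follow use a sample drawn from a standard normal distribution: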
    np.random.seed(12)  # fixed seed so the sample is reproducible

    mean = 0
    std_dev = 1
    size = 1000

    data = np.random.normal(loc=mean, scale=std_dev, size=size)
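
To make the raw value versus Z-score point concrete, here is a small sketch. The numbers `mu`, `sigma` and `x` are hypothetical and chosen only for illustration; they are not taken from the sample above.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical distribution parameters and query value (illustration only)
mu, sigma = 50.0, 10.0
x = 65.0

# Option 1: pass the raw value together with the distribution's mean and std
p_from_value = norm.cdf(x, loc=mu, scale=sigma)

# Option 2: convert to a Z-score first, then use the standard normal defaults
z = (x - mu) / sigma
p_from_z = norm.cdf(z)

print(z, p_from_value, p_from_z)
```

Both calls should return the same probability, roughly 0.933 for a Z-score of 1.5, so the choice between them is a matter of which form the input already has.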
				
			

Example 2 CDF at a single point

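    # P(X <= -1) under a normal distribution fitted to the sample
    cdf_neg_one = norm.cdf(-1, loc=data.mean(), scale=data.std())
    print(cdf_neg_one)

    # P(X <= 1) under the same fitted distribution
    cdf_one = norm.cdf(1, loc=data.mean(), scale=data.std())
    print(cdf_one)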
				
			

Example 3 CDF Range

    # P(-2 < X <= 2) is the CDF at the upper bound minus the CDF at the lower bound
    upper_cdf = norm.cdf(2, loc=data.mean(), scale=data.std())
    lower_cdf = norm.cdf(-2, loc=data.mean(), scale=data.std())

    cdf_range = upper_cdf - lower_cdf
    print(cdf_range)
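
The same subtraction works for any interval. The sketch below wraps it in a helper and checks it against the exact standard normal, where the probability within one, two and three standard deviations of the mean is about 0.6827, 0.9545 and 0.9973. The function name `prob_between` is hypothetical, introduced here only for illustration.

```python
from scipy.stats import norm


def prob_between(lower, upper, loc=0.0, scale=1.0):
    """Return P(lower < X <= upper) for a normal distribution."""
    return norm.cdf(upper, loc=loc, scale=scale) - norm.cdf(lower, loc=loc, scale=scale)


# Probability within k standard deviations of the mean (standard normal)
for k in (1, 2, 3):
    print(k, round(prob_between(-k, k), 4))
```

The result from the sample-based example will be close to, but not exactly, the 0.9545 figure, because the mean and standard deviation are estimated from 1000 random draws.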
				
			

Example 4 CDF Right Side (Values Greater Than 2)

    # P(X > 2) is the complement of the CDF at 2
    value_greater_2 = 1 - norm.cdf(2, loc=data.mean(), scale=data.std())
    print(value_greater_2)
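
SciPy also exposes the right-tail probability directly as the survival function, `norm.sf`, which is defined as `1 - cdf`. A brief sketch using the standard normal (rather than the sample above) to show the two forms agree:

```python
from scipy.stats import norm

# Right-tail probability P(X > 2) computed two ways on the standard normal
upper_tail_complement = 1 - norm.cdf(2)
upper_tail_sf = norm.sf(2)

print(upper_tail_complement, upper_tail_sf)
```

Both values are about 0.0228. The survival function can be the more accurate choice far out in the tail, where `1 - cdf` loses precision to floating-point rounding.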
				
			

Example 5 Graph Seaborn

    # ecdfplot draws the empirical CDF: the proportion of the sample at or below each value
    sns.ecdfplot(data, label='CDF')
    plt.title('CDF of Normally Distributed Data')
    plt.xlabel('Data Values')
    plt.ylabel('Cumulative Probability')
    plt.legend()
    plt.grid(True)
    plt.show()
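
The plotted curve is the empirical CDF of the sample, a step function, rather than the smooth `norm.cdf` curve used in Examples 2 to 4. As a rough sketch of how the two relate, the code below rebuilds the empirical CDF by hand and measures its largest gap from the fitted normal CDF at the sample points. It regenerates the same seeded sample so that it runs on its own, and it assumes the sample has no tied values.

```python
import numpy as np
from scipy.stats import norm

# Recreate the seeded sample so this sketch is self-contained
np.random.seed(12)
data = np.random.normal(loc=0, scale=1, size=1000)

# Empirical CDF: after sorting, the i-th smallest value has height i / n
x = np.sort(data)
ecdf = np.arange(1, len(x) + 1) / len(x)

# Normal CDF fitted with the sample mean and standard deviation
fitted = norm.cdf(x, loc=data.mean(), scale=data.std())

# Largest vertical distance between the two curves at the sample points
max_gap = np.max(np.abs(ecdf - fitted))
print(max_gap)
```

A small gap suggests the fitted normal describes the sample well, which is expected here because the data was generated from a normal distribution.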
				
			

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.
