Python Cumulative Distribution Function

June 12, 2025 Ryan Nolan

				
					import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import seaborn as sns

Example 1 Manual Calculation

				
					data = [2, 3, 3, 5, 7]

				
					sorted_data = np.sort(data)

				
					data_len = len(sorted_data)

				
					cdf_values = []

				
					for i in range(data_len):
    # The comparison builds a boolean mask: True where an element is <= sorted_data[i]
    # Summing the mask counts those elements; dividing by data_len gives the CDF at that point
    cdf_value = np.sum(sorted_data <= sorted_data[i]) / data_len
    cdf_values.append(float(cdf_value))  # store as a plain Python float so the list prints cleanly

				
					print(cdf_values)
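
The loop above can also be written without an explicit Python loop. The following is a minimal sketch, assuming the same five-point data set; it swaps the boolean-mask count for np.searchsorted, which is one possible vectorized alternative and not necessarily how the original example was meant to be extended.

```python
import numpy as np

data = [2, 3, 3, 5, 7]
sorted_data = np.sort(data)

# For each value, count how many sorted points are <= it.
# side="right" returns the insertion index after any ties, which equals that count.
counts = np.searchsorted(sorted_data, sorted_data, side="right")
cdf_values = counts / len(sorted_data)

print(cdf_values.tolist())  # [0.2, 0.6, 0.6, 0.8, 1.0]
```

The repeated value 3 gets the same cumulative probability (0.6) both times, matching the manual loop.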

Much Easier Examples

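A cumulative distribution function (CDF) gives the probability that a random variable is less than or equal to a given value. SciPy's norm.cdf accepts either a raw value, together with the distribution's mean (loc) and standard deviation (scale), or a Z-score, in which case the defaults loc=0 and scale=1 apply. The examples that follow use 1,000 samples drawn from a standard normal distribution: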

				
					np.random.seed(12)

				
					mean = 0
std_dev = 1
size = 1000

				
					data = np.random.normal(loc=mean, scale=std_dev, size=size)
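
To make the "value or Z-score" distinction concrete, here is a small sketch showing that both routes give the same probability. The parameters mu, sigma, and x are hypothetical values chosen only for illustration; they do not come from the data set above.

```python
from scipy.stats import norm

# Hypothetical distribution parameters, chosen only for illustration
mu, sigma = 10.0, 2.0
x = 13.0

# Option 1: pass the raw value along with the mean and standard deviation
p_from_value = norm.cdf(x, loc=mu, scale=sigma)

# Option 2: convert to a Z-score and use the standard normal (loc=0, scale=1 by default)
z = (x - mu) / sigma
p_from_z = norm.cdf(z)

print(z)  # 1.5
print(round(p_from_value, 4), round(p_from_z, 4))  # 0.9332 0.9332
```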

Example 2 CDF at a Single Point

				
					cdf_neg_one = norm.cdf(-1, loc=data.mean(), scale=data.std())

				
					print(cdf_neg_one)

				
					cdf_one = norm.cdf(1, loc=data.mean(), scale=data.std())

				
					print(cdf_one)
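
One way to sanity-check a model-based CDF value is to compare it with the fraction of samples that actually fall at or below the same point. This is a rough sketch that regenerates the same seeded data so it runs on its own; the variable names theoretical and empirical are illustrative.

```python
import numpy as np
from scipy.stats import norm

np.random.seed(12)
data = np.random.normal(loc=0, scale=1, size=1000)

# Model-based estimate: CDF of a normal fitted to the sample mean and standard deviation
theoretical = norm.cdf(-1, loc=data.mean(), scale=data.std())

# Empirical estimate: fraction of samples at or below -1
empirical = np.mean(data <= -1)

print(theoretical, empirical)
```

With 1,000 samples the two numbers should be close but not identical, since the empirical fraction carries sampling noise.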

Example 3 CDF Range

				
					upper_cdf = norm.cdf(2, loc=data.mean(), scale=data.std())
lower_cdf = norm.cdf(-2, loc=data.mean(), scale=data.std())

				
					cdf_range = upper_cdf - lower_cdf

				
					print(cdf_range)
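
For reference, the same subtraction on an exact standard normal (rather than the sampled data) reproduces the familiar 68-95-99.7 rule. This sketch uses the default loc=0 and scale=1, so its numbers will differ slightly from the sample-based result above; the dictionary name within_k is illustrative.

```python
from scipy.stats import norm

# Probability within k standard deviations of the mean for a standard normal
within_k = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}

for k, prob in within_k.items():
    print(k, round(prob, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```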

Example 4 CDF Right Side (Probability of a Value Greater Than 2)

				
					value_greater_2 = 1 - norm.cdf(2, loc=data.mean(), scale=data.std())

				
					print(value_greater_2)
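
SciPy also provides a survival function, norm.sf, defined as 1 - CDF, which gives the right-tail probability directly. The sketch below uses a standard normal for simplicity, so the value differs slightly from the sample-based result above.

```python
from scipy.stats import norm

# Right-tail probability two ways, for a standard normal
right_tail_via_cdf = 1 - norm.cdf(2)
right_tail_via_sf = norm.sf(2)  # survival function: 1 - CDF

print(round(right_tail_via_cdf, 4), round(right_tail_via_sf, 4))  # 0.0228 0.0228
```

For points far out in the tail, sf tends to be the more accurate choice, because subtracting a CDF value very close to 1 loses floating-point precision.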

Example 5 Graph with Seaborn

				
					sns.ecdfplot(data, label='CDF')
plt.title('CDF of Normally Distributed Data')
plt.xlabel('Data Values')
plt.ylabel('Cumulative Probability')
plt.legend()
plt.grid(True)
plt.show()
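
The Seaborn plot shows the empirical CDF of the samples. A possible extension is to overlay the CDF of a normal distribution fitted to the same data, to see how closely the two agree. The sketch below uses plain Matplotlib; the helper names ecdf and plot_ecdf_vs_normal are hypothetical, and the simple rank-based ECDF assumes the data has no ties, which is reasonable for continuous samples.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm


def ecdf(values):
    """Return sorted values and their empirical cumulative probabilities."""
    x = np.sort(values)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


def plot_ecdf_vs_normal(values):
    """Draw the empirical CDF as steps and overlay the fitted normal CDF."""
    x, y = ecdf(values)
    fig, ax = plt.subplots()
    ax.step(x, y, where="post", label="Empirical CDF")
    fitted = norm.cdf(x, loc=values.mean(), scale=values.std())
    ax.plot(x, fitted, label="Fitted normal CDF")
    ax.set_xlabel("Data Values")
    ax.set_ylabel("Cumulative Probability")
    ax.legend()
    ax.grid(True)
    return ax


np.random.seed(12)
data = np.random.normal(loc=0, scale=1, size=1000)
x, y = ecdf(data)
```

Calling plot_ecdf_vs_normal(data) followed by plt.show() displays the figure; the two curves should nearly coincide for normally distributed data.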

Ryan Nolan

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.
