Python Cumulative Distribution Function
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import seaborn as sns
Example 1 Manual Calculation
data = [2, 3, 3, 5, 7]
sorted_data = np.sort(data)
data_len = len(sorted_data)
cdf_values = []
for i in range(data_len):
    # CDF at sorted_data[i]: the proportion of data points less than or equal to it.
    # The comparison yields a boolean array (True where an element is <= sorted_data[i]),
    # and np.sum counts the True entries.
    cdf_value = np.sum(sorted_data <= sorted_data[i]) / data_len
    cdf_values.append(cdf_value)
print(cdf_values)
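The loop above can also be written without an explicit Python loop. The following is a minimal sketch, assuming `np.searchsorted` with `side="right"` is an acceptable way to count the elements less than or equal to each value; it is an alternative to the loop, not a replacement for it.

```python
import numpy as np

data = [2, 3, 3, 5, 7]
sorted_data = np.sort(data)

# With side="right", searchsorted returns for each value the index just past
# its last occurrence, which equals the number of elements <= that value.
counts = np.searchsorted(sorted_data, sorted_data, side="right")
cdf_values = counts / len(sorted_data)

print(cdf_values.tolist())  # [0.2, 0.6, 0.6, 0.8, 1.0]
```

Tied values (the two 3s here) share the same cumulative proportion, which matches what the loop produces.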

Much Easier Examples
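The Cumulative Distribution Function (CDF) of a normal distribution can be evaluated either at a raw value, by passing the distribution's mean and standard deviation as loc and scale, or at a Z-score, using the standard normal defaults (loc=0, scale=1). The examples below use the first form on simulated data: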
np.random.seed(12)
mean = 0
std_dev = 1
size = 1000
data = np.random.normal(loc=mean, scale=std_dev, size=size)
Example 2 CDF at a single point
cdf_neg_one = norm.cdf(-1, loc=data.mean(), scale=data.std())
print(cdf_neg_one)

cdf_one = norm.cdf(1, loc=data.mean(), scale=data.std())
print(cdf_one)
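The raw-value form used in Example 2 and the Z-score form should give the same probability. Here is a small sketch of that equivalence, assuming the same seed and sample as above; the names `mu`, `sigma`, `x`, and `z` are introduced only for this illustration.

```python
import numpy as np
from scipy.stats import norm

np.random.seed(12)
data = np.random.normal(loc=0, scale=1, size=1000)

mu, sigma = data.mean(), data.std()
x = 1

# Standardize the raw value: how many standard deviations x lies from the mean.
z = (x - mu) / sigma

# Form 1: raw value with the distribution's parameters.
cdf_from_value = norm.cdf(x, loc=mu, scale=sigma)
# Form 2: Z-score against the standard normal (loc=0, scale=1 by default).
cdf_from_z = norm.cdf(z)

print(cdf_from_value, cdf_from_z)
```

Both calls describe the same point on the same curve, so the two printed numbers should agree up to floating-point rounding.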

Example 3 CDF Range
# P(-2 <= X <= 2) is the CDF at the upper bound minus the CDF at the lower bound
upper_cdf = norm.cdf(2, loc=data.mean(), scale=data.std())
lower_cdf = norm.cdf(-2, loc=data.mean(), scale=data.std())
cdf_range = upper_cdf - lower_cdf
print(cdf_range)
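One way to sanity-check the range probability is to compare it with the share of simulated points that actually fall between -2 and 2. This is only a rough sketch under the same seed and sample as above; the two numbers should be close but not identical, because one comes from the fitted normal model and the other from a finite sample.

```python
import numpy as np
from scipy.stats import norm

np.random.seed(12)
data = np.random.normal(loc=0, scale=1, size=1000)

# Model-based probability from the fitted normal distribution.
model_range = (norm.cdf(2, loc=data.mean(), scale=data.std())
               - norm.cdf(-2, loc=data.mean(), scale=data.std()))

# Empirical proportion: fraction of samples inside the interval [-2, 2].
empirical_range = np.mean((data >= -2) & (data <= 2))

print(model_range, empirical_range)
```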

Example 4 CDF Right Tail: Probability of a Value Greater Than 2
value_greater_2 = 1 - norm.cdf(2, loc=data.mean(), scale=data.std())
print(value_greater_2)
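SciPy also exposes the right-tail probability directly as the survival function, `norm.sf`, which is defined as `1 - cdf`. The sketch below assumes the same seed and sample as above and simply shows that the two routes agree.

```python
import numpy as np
from scipy.stats import norm

np.random.seed(12)
data = np.random.normal(loc=0, scale=1, size=1000)

mu, sigma = data.mean(), data.std()

# Right-tail probability P(X > 2), computed two ways.
tail_from_cdf = 1 - norm.cdf(2, loc=mu, scale=sigma)
tail_from_sf = norm.sf(2, loc=mu, scale=sigma)

print(tail_from_cdf, tail_from_sf)
```

For values far out in the tail, `norm.sf` can be more accurate than subtracting from 1, since it avoids the rounding loss of `1 - cdf` when the CDF is very close to 1.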

Example 5 Graph Seaborn
sns.ecdfplot(data, label='CDF')
plt.title('CDF of Normally Distributed Data')
plt.xlabel('Data Values')
plt.ylabel('Cumulative Probability')
plt.legend()
plt.grid(True)
plt.show()
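The curve that `sns.ecdfplot` draws is the empirical CDF of the sample. To see how it relates to the `norm.cdf` calls in the earlier examples, the two can be overlaid on one chart. This is a sketch using plain Matplotlib, assuming the same seed and sample as above; the output file name `cdf_comparison.png` is a hypothetical choice, and `plt.show()` can be used instead in an interactive session.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

np.random.seed(12)
data = np.random.normal(loc=0, scale=1, size=1000)

# Empirical CDF: sorted values on x, cumulative proportions 1/n, 2/n, ..., 1 on y.
ecdf_x = np.sort(data)
ecdf_y = np.arange(1, len(ecdf_x) + 1) / len(ecdf_x)

# Fitted normal CDF evaluated on an evenly spaced grid over the data range.
grid = np.linspace(ecdf_x[0], ecdf_x[-1], 200)
fitted_cdf = norm.cdf(grid, loc=data.mean(), scale=data.std())

fig, ax = plt.subplots()
ax.step(ecdf_x, ecdf_y, where="post", label="Empirical CDF")
ax.plot(grid, fitted_cdf, label="Fitted normal CDF")
ax.set_title("Empirical vs. Fitted Normal CDF")
ax.set_xlabel("Data Values")
ax.set_ylabel("Cumulative Probability")
ax.legend()
ax.grid(True)
fig.savefig("cdf_comparison.png")  # hypothetical file name
```

With 1000 draws from a normal distribution, the step curve should track the smooth curve closely.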

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.