Python Cumulative distribution function

  import numpy as np
  import matplotlib.pyplot as plt
  from scipy.stats import norm
  import seaborn as sns

Example 1 Manual Calculation

  data = [2, 3, 3, 5, 7]
  sorted_data = np.sort(data)
  data_len = len(sorted_data)
  cdf_values = []
  for i in range(data_len):
      # CDF = proportion of data points less than or equal to sorted_data[i].
      # (sorted_data <= sorted_data[i]) is a boolean array: True where an element
      # is <= sorted_data[i], False otherwise; np.sum counts the True values.
      cdf_value = np.sum(sorted_data <= sorted_data[i]) / data_len
      cdf_values.append(float(cdf_value))
  print(cdf_values)
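The loop compares every element against every other element, which is fine for five numbers but scales quadratically with the data size. A vectorized sketch of the same calculation, assuming NumPy's np.searchsorted: with side="right", it returns for each query value how many sorted elements are less than or equal to it, which is exactly the count the loop computes.

```python
import numpy as np

data = [2, 3, 3, 5, 7]
sorted_data = np.sort(data)

# For each sorted value, searchsorted(..., side="right") returns the number of
# elements <= that value (ties are counted together, as in the loop above).
counts = np.searchsorted(sorted_data, sorted_data, side="right")
cdf_values = counts / len(sorted_data)

print(cdf_values)  # [0.2 0.6 0.6 0.8 1. ]
```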

Much Easier Examples with SciPy

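A Cumulative Distribution Function (CDF) gives the probability that a random variable is less than or equal to a given value. It can be evaluated either at a raw value from the distribution or at a Z-score, depending on the context. Examples 2 to 4 use SciPy's norm.cdf on 1,000 samples drawn from a normal distribution with mean 0 and standard deviation 1: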
  np.random.seed(12)
  mean = 0
  std_dev = 1
  size = 1000
  data = np.random.normal(loc=mean, scale=std_dev, size=size)

Example 2 CDF at a single point

  cdf_neg_one = norm.cdf(-1, loc=data.mean(), scale=data.std())
  print(cdf_neg_one)
  cdf_one = norm.cdf(1, loc=data.mean(), scale=data.std())
  print(cdf_one)
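Because loc and scale only shift and rescale the standard normal, evaluating the CDF at a raw value with loc and scale should match standardizing that value to a Z-score first and calling norm.cdf with its defaults (loc=0, scale=1). A quick check, reusing the same seed and sample as above; the names mu, sigma, and z are illustrative:

```python
import numpy as np
from scipy.stats import norm

# Same sample as above: 1,000 draws from a normal with mean 0, std 1.
np.random.seed(12)
data = np.random.normal(loc=0, scale=1, size=1000)

mu, sigma = data.mean(), data.std()
x = 1

# Option 1: pass the raw value plus the fitted loc and scale.
cdf_from_value = norm.cdf(x, loc=mu, scale=sigma)

# Option 2: convert to a Z-score, then use the standard normal.
z = (x - mu) / sigma
cdf_from_z = norm.cdf(z)

print(cdf_from_value, cdf_from_z)
print(np.isclose(cdf_from_value, cdf_from_z))  # True
```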

Example 3 CDF Range

  Upper_CDF = norm.cdf(2, loc=data.mean(), scale=data.std())
  Lower_CDF = norm.cdf(-2, loc=data.mean(), scale=data.std())
  cdf_range = Upper_CDF - Lower_CDF
  print(cdf_range)
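The difference of two CDF values is a model-based probability under the fitted normal. A sketch comparing it with the share of the sample that actually lies in [-2, 2], and with the exact standard-normal value (the familiar figure of roughly 95% within two standard deviations); model_prob, empirical_prob, and exact_prob are illustrative names:

```python
import numpy as np
from scipy.stats import norm

np.random.seed(12)
data = np.random.normal(loc=0, scale=1, size=1000)
mu, sigma = data.mean(), data.std()

# Probability of a value between -2 and 2 under the fitted normal.
model_prob = norm.cdf(2, loc=mu, scale=sigma) - norm.cdf(-2, loc=mu, scale=sigma)

# Fraction of the sample that actually falls in [-2, 2].
empirical_prob = np.mean((data >= -2) & (data <= 2))

# Exact value for a standard normal (mean 0, std 1): about 0.9545.
exact_prob = norm.cdf(2) - norm.cdf(-2)

print(model_prob, empirical_prob, exact_prob)
```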

Example 4 CDF Right Side (Probability of a Value Greater than 2)

  value_greater_2 = 1 - norm.cdf(2, loc=data.mean(), scale=data.std())
  print(value_greater_2)
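SciPy distributions also provide a survival function, norm.sf, defined as 1 - cdf. A minimal sketch showing that both give the same right-tail probability here. sf is generally the safer choice far in the tail, where cdf rounds to exactly 1.0 and the subtraction returns 0:

```python
import numpy as np
from scipy.stats import norm

np.random.seed(12)
data = np.random.normal(loc=0, scale=1, size=1000)
mu, sigma = data.mean(), data.std()

# Right-tail probability P(X > 2), computed two equivalent ways.
via_complement = 1 - norm.cdf(2, loc=mu, scale=sigma)
via_sf = norm.sf(2, loc=mu, scale=sigma)  # survival function = 1 - CDF

print(via_complement, via_sf)

# Far in the tail, norm.cdf(10) rounds to 1.0, so 1 - cdf loses everything,
# while sf still returns a tiny positive probability.
print(1 - norm.cdf(10), norm.sf(10))
```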

Example 5 Graph Seaborn

  sns.ecdfplot(data, label='CDF')
  plt.title('CDF of Normally Distributed Data')
  plt.xlabel('Data Values')
  plt.ylabel('Cumulative Probability')
  plt.legend()
  plt.grid(True)
  plt.show()
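sns.ecdfplot draws the empirical CDF, a step function that rises by 1/n at each sorted data point. As a rough sketch (not seaborn's internal implementation), the same step values can be computed with NumPy and compared with the fitted normal CDF from Examples 2 to 4. The largest vertical gap between the two curves is the Kolmogorov-Smirnov statistic, which scipy.stats.kstest also reports. The helper ecdf is a hypothetical name, and it assumes continuous data without ties:

```python
import numpy as np
from scipy.stats import norm, kstest

np.random.seed(12)
data = np.random.normal(loc=0, scale=1, size=1000)

def ecdf(values):
    """Sorted values and the empirical CDF at each one (assumes no ties)."""
    x = np.sort(values)
    y = np.arange(1, len(x) + 1) / len(x)  # i/n for the i-th smallest value
    return x, y

x, y = ecdf(data)
n = len(x)

# Fitted normal CDF evaluated at the same points.
fitted = norm.cdf(x, loc=data.mean(), scale=data.std())

# The ECDF jumps from (i-1)/n to i/n at x_i, so check both sides of each step.
gap = max(np.max(y - fitted), np.max(fitted - (y - 1 / n)))
print(gap)

# Cross-check against SciPy's Kolmogorov-Smirnov statistic.
ks = kstest(data, "norm", args=(data.mean(), data.std()))
print(ks.statistic)
```

Because the normal's parameters were estimated from the same data, treat the statistic as a descriptive measure of fit; the p-value kstest returns is not valid in that case.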

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.
