scipy chi square goodness of fit

#Chi-Squared tests compare the observed frequencies (counts) of categorical data with the frequencies expected under a null hypothesis.
#The test of independence asks whether two categorical variables are associated; the goodness-of-fit test, covered here, asks whether the observed distribution of a single variable fits a given distribution.

#Chi-Squared Goodness-of-Fit Test
#This test assesses if the observed frequencies for a single categorical variable match expected frequencies for a known distribution.

#Suppose a die was rolled 60 times, and the observed frequencies of the outcomes are: observed = [8, 9, 10, 11, 12, 10]

#Define Observed and Expected Frequencies
#observed = np.array([8, 9, 10, 11, 12, 10])
#expected = np.array([10, 10, 10, 10, 10, 10])

#Run the Chi-Squared Goodness-of-Fit Test
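#A minimal sketch of this step, assuming SciPy's scipy.stats.chisquare and the 60-roll counts above; the variable names stat and p_value are illustrative:

```python
import numpy as np
from scipy.stats import chisquare

# Observed counts for faces 1 to 6 from the 60-roll example
observed = np.array([8, 9, 10, 11, 12, 10])
# A fair die predicts 60 / 6 = 10 rolls per face
expected = np.array([10, 10, 10, 10, 10, 10])

# chisquare returns the test statistic and the p-value
# (degrees of freedom default to number of categories - 1)
stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print("Chi-Squared Statistic:", stat)
print("p-value:", round(p_value, 3))
```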

#With a high p-value (about 0.963, from a chi-squared statistic of 1.0 with 5 degrees of freedom), we fail to reject the null hypothesis, indicating the observed distribution is not significantly different from that of a fair die.
  import numpy as np
  from scipy.stats import chi2, chisquare # chi2: the distribution, chisquare: the goodness-of-fit test
  alpha = 0.05

Example 1 Manual

  observed_rolls = [22, 20, 18, 21, 19, 20] # Observed frequencies for faces 1 to 6
  total_rolls = 120
  num_faces = 6
  expected_frequency = total_rolls / num_faces # Each face should appear 20 times for a fair die
  expected_rolls = [expected_frequency] * num_faces # Expected frequencies for faces 1 to 6
  chi_squared_stat = sum((observed - expected) ** 2 / expected for observed, expected in zip(observed_rolls, expected_rolls))
  df = num_faces - 1 # Degrees of freedom is number of categories - 1
  p_value = chi2.sf(chi_squared_stat, df) # Survival function, equal to 1 - cdf but more accurate for small p-values
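
#To turn the manual statistic into a decision, one option is to compare it with the critical value of the chi-squared distribution at the chosen alpha. A minimal sketch, assuming scipy.stats.chi2 and repeating the Example 1 values so it runs on its own; critical_value and reject_null are illustrative names:

```python
from scipy.stats import chi2

alpha = 0.05
observed_rolls = [22, 20, 18, 21, 19, 20]
expected_rolls = [20.0] * 6

chi_squared_stat = sum((o - e) ** 2 / e for o, e in zip(observed_rolls, expected_rolls))
df = len(observed_rolls) - 1

# Critical value: the statistic must exceed this to reject at level alpha
critical_value = chi2.ppf(1 - alpha, df)
# Equivalent route via the p-value (upper-tail probability)
p_value = chi2.sf(chi_squared_stat, df)

reject_null = chi_squared_stat > critical_value
print("Statistic:", chi_squared_stat, "Critical value:", round(critical_value, 3))
print("Reject null hypothesis:", reject_null)
```

#Both routes give the same decision: the statistic exceeds the critical value exactly when the p-value is below alpha.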

Example 2 Shortcut

  observed_rolls = [22, 20, 18, 21, 19, 20] # Observed frequencies
  expected_rolls = [20] * 6 # Expected frequencies for each face if die is fair
  chi_squared_stat, p_value = chisquare(f_obs=observed_rolls, f_exp=expected_rolls)
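
#When every category is expected to be equally likely, as with a fair die, chisquare can also be called without f_exp; per the SciPy documentation, the categories are then assumed to be equally likely. A short sketch checking that both calls agree (explicit, default and same_result are illustrative names):

```python
import numpy as np
from scipy.stats import chisquare

observed_rolls = [22, 20, 18, 21, 19, 20]

# Explicit expected counts: 120 rolls / 6 faces = 20 per face
explicit = chisquare(f_obs=observed_rolls, f_exp=[20] * 6)
# f_exp omitted: categories are assumed equally likely
default = chisquare(f_obs=observed_rolls)

same_result = bool(
    np.isclose(explicit.statistic, default.statistic)
    and np.isclose(explicit.pvalue, default.pvalue)
)
print("Same result:", same_result)
```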

#Imagine a basketball player made 100 shots in a game, with the following observed counts of successful shots by type:

#Shot Type           3-Pointer   Mid-Range   Layup   Free Throw
#Successful Shots    25          30          20      25
  # Observed counts of successful shots by type
  observed = np.array([25, 30, 20, 25])
  # Expected proportions under the null hypothesis:
  # 3-Pointer: 30% of total successful shots
  # Mid-Range: 40% of total successful shots
  # Layup: 20% of total successful shots
  # Free Throw: 10% of total successful shots
  total_count = sum(observed)
  expected = np.array([0.30 * total_count, 0.40 * total_count, 0.20 * total_count, 0.10 * total_count])
  # Named chi2_stat so it does not shadow scipy.stats.chi2
  chi2_stat, p = chisquare(f_obs=observed, f_exp=expected)
  print("Chi-Squared Statistic:", chi2_stat)
  print("p-value:", p)
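
#To see which shot types drive the result, the per-category terms (observed - expected)^2 / expected can be inspected alongside the test decision. A minimal sketch, assuming the same counts and proportions as above and a significance level of 0.05; shot_types and contributions are illustrative names:

```python
import numpy as np
from scipy.stats import chisquare

alpha = 0.05
shot_types = ["3-Pointer", "Mid-Range", "Layup", "Free Throw"]
observed = np.array([25, 30, 20, 25])
# Expected counts from the hypothesized proportions
expected = np.array([0.30, 0.40, 0.20, 0.10]) * observed.sum()

chi2_stat, p = chisquare(f_obs=observed, f_exp=expected)

# Each category's share of the statistic; the terms sum to chi2_stat
contributions = (observed - expected) ** 2 / expected
for name, value in zip(shot_types, contributions):
    print(f"{name}: {value:.2f}")

if p < alpha:
    print("Reject the null hypothesis: the shot distribution differs from the expected proportions")
else:
    print("Fail to reject the null hypothesis")
```

#Here the Free Throw category contributes the most (22.5 of a statistic of roughly 25.83), since 25 successful shots were observed against 10 expected.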

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.
