Two-Sample Z-Test with SciPy
To start, we import a few Python packages.
from statsmodels.stats.weightstats import ztest: Imports the function that performs a Z-test for comparing means.
import numpy as np: Imports NumPy for numerical operations.
from scipy.stats import norm: Imports the normal distribution functions.
from statsmodels.stats.weightstats import ztest
import numpy as np
from scipy.stats import norm
#import math
Then we set the significance level to 0.05.
alpha = 0.05
We set the random seed to 10 for reproducibility.
np.random.seed(10) # For reproducibility
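As an aside, np.random.seed sets NumPy's global legacy random state. Newer NumPy code often uses a local, seeded Generator instead. The following is a minimal sketch of that alternative, assuming NumPy 1.17 or later; the names rng, rng_again, draws and draws_again are illustrative, and the numbers drawn this way will not match the ones produced with np.random.seed(10) in this article.

```python
import numpy as np

# Hypothetical alternative to np.random.seed: a local, seeded Generator.
rng = np.random.default_rng(10)

# A second Generator built from the same seed produces identical draws,
# which is what makes the results reproducible.
rng_again = np.random.default_rng(10)

draws = rng.normal(loc=272, scale=25, size=5)
draws_again = rng_again.normal(loc=272, scale=25, size=5)

print(np.array_equal(draws, draws_again))
```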
Example 1
We create a list of numbers called sample_1.
sample_1 = [370, 395, 400, 405, 390, 385, 410, 395, 400, 380, 390, 400, 410, 415, 395, 405, 390, 400, 420, 375, 400, 385, 390, 395, 410, 405, 400, 395, 380, 400]
We also create a list of numbers called sample_2
sample_2 = [360, 375, 385, 390, 370, 380, 395, 390, 385, 375, 380, 395, 400, 405, 385, 395, 375, 385, 395, 370, 380, 395, 390, 385, 375, 380, 395, 400, 385, 395]
We calculate the mean of sample_1 by dividing the sum of sample_1 by the length of sample_1
mean_sample_1 = sum(sample_1) / len(sample_1) # Mean of Sample 1
We also calculate the mean of sample_2 by dividing the sum of sample_2 by the length of sample_2
mean_sample_2 = sum(sample_2) / len(sample_2) # Mean of Sample 2
print("Sample 1 Mean:", mean_sample_1) # Expected: 396.3

print("Sample 2 Mean:", mean_sample_2) # Expected: about 385.67

We assume a known population standard deviation of 15 for both samples.
std_dev = 15 # Given: standard deviation for both samples
Next, we calculate the number of observations in each sample.
n1, n2 = len(sample_1), len(sample_2)
Then we calculate the pooled standard error, that is, the standard error of the difference between the two sample means.
pooled_se = np.sqrt((std_dev**2 / n1) + (std_dev**2 / n2))
print("Pooled Standard Error:", round(pooled_se, 2)) # Expected: 3.87
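Because both samples are assumed to share the same known standard deviation, the standard error formula simplifies to std_dev * sqrt(1/n1 + 1/n2). The sketch below is a quick check of that identity using only the standard library and the values from this example; se_full and se_short are names introduced here for illustration.

```python
import math

std_dev = 15     # known standard deviation assumed in Example 1
n1, n2 = 30, 30  # sample sizes in Example 1

# Full form: add the two variances of the sample means, then take the root.
se_full = math.sqrt(std_dev**2 / n1 + std_dev**2 / n2)

# Simplified form, valid only because both samples share one std_dev.
se_short = std_dev * math.sqrt(1 / n1 + 1 / n2)

print(round(se_full, 2), round(se_short, 2))  # 3.87 3.87
```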

Here, we calculate the Z-statistic for a two-sample Z-test.
mean_sample_1 - mean_sample_2: Difference between the two sample means.
pooled_se: Standard error of the difference in means (computed earlier).
z_statistic = (mean_sample_1 - mean_sample_2) / pooled_se
print("Z-Statistic:", round(z_statistic, 2)) # Expected: 2.75

abs(z_statistic): Takes the absolute value of the Z-score.
norm.cdf(...): Calculates the cumulative probability up to that Z-score under the standard normal distribution.
1 - norm.cdf(...): Gets the probability in the tail beyond the Z-score.
2 * (...): Doubles it for the two-tailed test (since the difference could be in either direction).
p_value = 2 * (1 - norm.cdf(abs(z_statistic))) # Two-tailed test
print("P-Value:", round(p_value, 4)) # Expected: 0.0059
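SciPy also exposes the upper-tail probability directly as norm.sf (the survival function, equal to 1 - cdf). It avoids the subtraction and keeps its precision for very large Z-scores. The sketch below compares the two forms; z_example is an illustrative Z-score of similar size to the one above, not a value taken from the data.

```python
from scipy.stats import norm

z_example = 2.75  # illustrative Z-score, not computed from the samples

p_from_cdf = 2 * (1 - norm.cdf(abs(z_example)))
p_from_sf = 2 * norm.sf(abs(z_example))  # same tail area, no subtraction

print(round(p_from_cdf, 4), round(p_from_sf, 4))

# For an extreme Z-score the subtraction rounds to exactly zero,
# while the survival function still returns a tiny positive value.
print(1 - norm.cdf(10), norm.sf(10))
```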

if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
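An equivalent way to reach the same decision is to compare the absolute Z-score with the critical value for the chosen significance level. This is a minimal sketch, assuming a two-tailed test with alpha = 0.05; z_example is an illustrative Z-score and z_critical is a name introduced here.

```python
from scipy.stats import norm

alpha = 0.05
z_example = 2.75  # illustrative Z-score, not computed from the samples

# Two-tailed critical value: the point that leaves alpha / 2 in the upper tail.
z_critical = norm.ppf(1 - alpha / 2)
print(round(z_critical, 2))  # 1.96

if abs(z_example) > z_critical:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

Comparing against the critical value and comparing the p-value against alpha always lead to the same decision; they are two views of the same tail area.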

Finally, we run the same comparison with the ztest function from statsmodels.
z_stat, p_value = ztest(sample_1, sample_2, alternative='two-sided')
print(z_stat)

print(p_value)
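The values printed by ztest will not match the manual calculation exactly. To our understanding of its default settings (usevar='pooled', ddof=1), ztest estimates a pooled variance from the two samples instead of using the assumed standard deviation of 15. The sketch below reproduces that calculation with NumPy alone. It is an approximation of the library's behaviour under that assumption, not its actual source, and the names pooled_var, se_estimated, z_estimated and p_estimated are illustrative.

```python
import numpy as np
from scipy.stats import norm

sample_1 = [370, 395, 400, 405, 390, 385, 410, 395, 400, 380, 390, 400, 410, 415, 395,
            405, 390, 400, 420, 375, 400, 385, 390, 395, 410, 405, 400, 395, 380, 400]
sample_2 = [360, 375, 385, 390, 370, 380, 395, 390, 385, 375, 380, 395, 400, 405, 385,
            395, 375, 385, 395, 370, 380, 395, 390, 385, 375, 380, 395, 400, 385, 395]

x1 = np.asarray(sample_1, dtype=float)
x2 = np.asarray(sample_2, dtype=float)
n1, n2 = len(x1), len(x2)

# Pooled sample variance: summed squared deviations over n1 + n2 - 2.
ss1 = ((x1 - x1.mean()) ** 2).sum()
ss2 = ((x2 - x2.mean()) ** 2).sum()
pooled_var = (ss1 + ss2) / (n1 + n2 - 2)

# Standard error of the difference, estimated from the data.
se_estimated = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

z_estimated = (x1.mean() - x2.mean()) / se_estimated
p_estimated = 2 * norm.sf(abs(z_estimated))
print(z_estimated, p_estimated)
```

Since the spread of these samples is smaller than the assumed value of 15, the estimated standard error is smaller and the resulting Z-statistic is larger than the manual one.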

Example 2: Marathon times of two running clubs
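marathon_std = 30 # Assumed known population standard deviation used in the test
sample1 = np.random.normal(loc=272, scale=25, size=50) # Sample 1: drawn with mean 272, standard deviation 25
sample2 = np.random.normal(loc=255, scale=25, size=50) # Sample 2: drawn with mean 255, standard deviation 25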
# Calculate the mean and size of each sample
mean1, size1 = np.mean(sample1), len(sample1)
mean2, size2 = np.mean(sample2), len(sample2)
pooled_se = np.sqrt((marathon_std**2 / size1) + (marathon_std**2 / size2))
z_score = (mean1 - mean2) / pooled_se
p_value = 2 * (1 - norm.cdf(abs(z_score)))
if p_value < alpha:
    print("Reject the null hypothesis: The two sample means are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the two sample means.")
Reject the null hypothesis: The two sample means are significantly different.
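Both examples follow the same steps, so they can be wrapped in a small helper. The function below is a hypothetical sketch (two_sample_z_test is not part of SciPy or statsmodels), and it assumes a two-tailed test with a known standard deviation shared by both groups. The data in the usage example is made up for illustration and is not taken from the samples above.

```python
import numpy as np
from scipy.stats import norm


def two_sample_z_test(x, y, sigma, alpha=0.05):
    """Two-tailed two-sample Z-test with a known, shared standard deviation.

    Hypothetical helper: returns (z, p_value, reject_null).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standard error of the difference in means under a shared known sigma.
    se = sigma * np.sqrt(1 / len(x) + 1 / len(y))
    z = (x.mean() - y.mean()) / se
    p_value = 2 * norm.sf(abs(z))
    return z, p_value, bool(p_value < alpha)


# Usage with small made-up marathon times in minutes.
club_a = [272, 268, 281, 290, 265, 277]
club_b = [255, 249, 262, 258, 251, 260]
z, p, reject = two_sample_z_test(club_a, club_b, sigma=30)
print(round(z, 2), round(p, 4), reject)
```

With only six runners per club, the standard error is large, so even a sizeable gap between the means is not enough to reject the null hypothesis here.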
Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.