two sample z test scipy

To start, we’re going to import a few Python packages.

from statsmodels.stats.weightstats import ztest: Imports the function that performs a Z-test for comparing means.

import numpy as np: Imports NumPy for numerical operations.

from scipy.stats import norm: Imports the normal distribution functions.

				
from statsmodels.stats.weightstats import ztest
import numpy as np
from scipy.stats import norm

Then we set the significance level to 0.05.

				
					alpha = 0.05
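For a two-tailed test, the significance level corresponds to a critical Z-value: a statistic beyond it in either direction leads to rejecting the null hypothesis. A minimal sketch, assuming the same alpha as above (the name `z_critical` is introduced here for illustration):

```python
from scipy.stats import norm

alpha = 0.05  # same significance level as in the tutorial

# Two-tailed test: split alpha across both tails and look up the
# standard normal quantile that leaves alpha / 2 in the upper tail.
z_critical = norm.ppf(1 - alpha / 2)

print("Critical Z-value:", round(z_critical, 2))  # prints 1.96
```

Any Z-statistic whose absolute value exceeds this threshold gives a p-value below alpha.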

We set the random seed to 10 for reproducibility.

				
					np.random.seed(10)  # For reproducibility
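The seed only matters for the randomly generated data in Example 2. As a small illustration of what it does, resetting the seed replays the same draws (the names `first_draw` and `second_draw` are made up for this sketch):

```python
import numpy as np

np.random.seed(10)
first_draw = np.random.normal(size=3)

np.random.seed(10)  # resetting the seed restarts the random stream
second_draw = np.random.normal(size=3)

# Both arrays hold identical values because the seed was the same.
print(np.array_equal(first_draw, second_draw))  # prints True
```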

Example 1

We create a list of numbers called sample_1.

				
					sample_1 = [370, 395, 400, 405, 390, 385, 410, 395, 400, 380, 390, 400, 410, 415, 395, 405, 390, 400, 420, 375, 400, 385, 390, 395, 410, 405, 400, 395, 380, 400]

We also create a list of numbers called sample_2.

				
					sample_2 = [360, 375, 385, 390, 370, 380, 395, 390, 385, 375, 380, 395, 400, 405, 385, 395, 375, 385, 395, 370, 380, 395, 390, 385, 375, 380, 395, 400, 385, 395]

We calculate the mean of sample_1 by dividing the sum of sample_1 by its length.

				
					mean_sample_1 = sum(sample_1) / len(sample_1)  # Mean of Sample 1

We also calculate the mean of sample_2 by dividing the sum of sample_2 by its length.

				
					mean_sample_2 = sum(sample_2) / len(sample_2)  # Mean of Sample 2

				
					print("Sample 1 Mean:", mean_sample_1)  # Expected: about 396.33

				
					print("Sample 2 Mean:", mean_sample_2)  # Expected: about 385.67

We set the known population standard deviation to 15 for both samples.

				
					std_dev = 15  # Given: standard deviation for both samples

Next, we calculate the number of observations in each sample.

				
					n1, n2 = len(sample_1), len(sample_2)

Then we calculate the standard error of the difference between the two sample means (pooled_se in the code).

				
					pooled_se = np.sqrt((std_dev**2 / n1) + (std_dev**2 / n2))

				
					print("Pooled Standard Error:", round(pooled_se, 2))  # Expected: 3.87
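Written out, the standard error used above assumes both populations share the known standard deviation of 15. The symbols σ, n₁ and n₂ are notation added here, not names from the original code:

```latex
\mathrm{SE} = \sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}
            = \sqrt{\frac{15^2}{30} + \frac{15^2}{30}}
            = \sqrt{15} \approx 3.87
```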

Here, we calculate the Z-statistic for a two-sample Z-test.

mean_sample_1 - mean_sample_2: Difference between the two sample means.
pooled_se: Standard error of the difference in means (computed earlier).

				
					z_statistic = (mean_sample_1 - mean_sample_2) / pooled_se

				
					print("Z-Statistic:", round(z_statistic, 2))  # Expected: about 2.75
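In symbols, with x̄₁ and x̄₂ denoting the two sample means (notation introduced here), the statistic for the two lists above works out as:

```latex
z = \frac{\bar{x}_1 - \bar{x}_2}{\mathrm{SE}}
  = \frac{396.33 - 385.67}{3.873}
  \approx 2.75
```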

Next, we calculate the p-value for a two-tailed test.

abs(z_statistic): Takes the absolute value of the Z-score.
norm.cdf(...): Calculates the cumulative probability up to that Z-score under the standard normal distribution.
1 - norm.cdf(...): Gets the probability in the tail beyond the Z-score.
2 * (...): Doubles it for the two-tailed test (since the difference could be in either direction).

				
					p_value = 2 * (1 - norm.cdf(abs(z_statistic)))  # Two-tailed test

				
					print("P-Value:", round(p_value, 4))  # Expected: about 0.0059
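Computing `1 - norm.cdf(x)` loses precision when the Z-score is large, because the CDF gets very close to 1. SciPy's survival function `norm.sf` returns the same upper-tail probability directly. A short sketch, using a hypothetical Z-score of 2.75 in place of the computed value:

```python
from scipy.stats import norm

z_statistic = 2.75  # hypothetical value standing in for the computed statistic

# Two-tailed p-value computed two ways.
p_from_cdf = 2 * (1 - norm.cdf(abs(z_statistic)))
p_from_sf = 2 * norm.sf(abs(z_statistic))
print(p_from_cdf, p_from_sf)  # the two agree to many decimal places

# For a very large Z-score the CDF rounds to 1.0, so the subtraction
# loses the tail entirely, while sf still returns a tiny positive number.
tail_from_cdf = 1 - norm.cdf(10)
tail_from_sf = norm.sf(10)
print(tail_from_cdf, tail_from_sf)
```

For the Z-scores in this tutorial either form works; the survival function is mainly a safer habit for extreme values.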

				
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
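The steps above can be collected into one helper. This is a sketch, not part of the original walkthrough: the function name `two_sample_z_test` and its parameters are hypothetical, the input lists are made up, and it assumes both populations share a known standard deviation.

```python
import numpy as np
from scipy.stats import norm


def two_sample_z_test(sample_a, sample_b, sigma, alpha=0.05):
    """Two-tailed two-sample Z-test with a known, shared population std."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Standard error of the difference between the two sample means.
    se = np.sqrt(sigma**2 / n_a + sigma**2 / n_b)
    z = (np.mean(sample_a) - np.mean(sample_b)) / se
    # Two-tailed p-value from the standard normal distribution.
    p = 2 * norm.sf(abs(z))
    return z, p, p < alpha


# Small made-up samples, only to show the call.
z, p, reject = two_sample_z_test([12, 15, 14, 16], [10, 11, 9, 12], sigma=2)
print(round(z, 2), round(p, 4), reject)
```

Returning the decision alongside the statistic and p-value keeps the if/else out of the calling code, which is convenient when the same test is repeated on several pairs of samples.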

Quicker way to test it – not entirely precise

The ztest function in statsmodels.stats.weightstats does not accept a known population standard deviation. Instead, it estimates the standard error from the pooled variance of the two samples, so its Z-statistic and p-value differ from the manual calculation above.

				
					z_stat, p_value = ztest(sample_1, sample_2, alternative='two-sided')

				
					print(z_stat)

				
					print(p_value)
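To see where the difference comes from, the sketch below reproduces by hand what `ztest` does under its default `usevar='pooled'` setting, as we understand it: the standard error is built from the pooled sample variance instead of a known σ. The helper name `pooled_sample_z` is hypothetical and the short lists are made-up data.

```python
import numpy as np
from scipy.stats import norm


def pooled_sample_z(x1, x2):
    """Z-statistic using the pooled sample variance (no known sigma)."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    # Pooled variance: combined squared deviations over n1 + n2 - 2.
    ss1 = np.sum((x1 - x1.mean()) ** 2)
    ss2 = np.sum((x2 - x2.mean()) ** 2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
    z = (x1.mean() - x2.mean()) / se
    return z, 2 * norm.sf(abs(z))


z_pooled, p_pooled = pooled_sample_z([12, 15, 14, 16], [10, 11, 9, 12])
print(z_pooled, p_pooled)
```

Passing sample_1 and sample_2 to this helper should give values close to the ztest output. The manual calculation with a standard deviation of 15 gives a smaller statistic because that assumed value is larger than the spread observed in the two lists.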

Example 2: marathon times of two running clubs

				
					marathon_std = 30  # Population standard deviation assumed known for both clubs

				
					sample1 = np.random.normal(loc=272, scale=25, size=50)  # Sample 1: simulated with mean 272, std 25

				
					sample2 = np.random.normal(loc=255, scale=25, size=50)  # Sample 2: simulated with mean 255, std 25

				
# Calculate the mean and size of each sample
mean1, size1 = np.mean(sample1), len(sample1)

				
					mean2, size2 = np.mean(sample2), len(sample2)

				
					pooled_se = np.sqrt((marathon_std**2 / size1) + (marathon_std**2 / size2))

				
					z_score = (mean1 - mean2) / pooled_se

				
					p_value = 2 * (1 - norm.cdf(abs(z_score)))

				
if p_value < alpha:
    print("Reject the null hypothesis: The two sample means are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the two sample means.")

Reject the null hypothesis: The two sample means are significantly different.
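Because the Z-test treats the population standard deviation as known, it can be worth comparing that assumed value with the spread actually observed in the samples. A sketch along those lines, regenerating the same simulated data (the variable names ending in `_sd` are introduced here):

```python
import numpy as np

np.random.seed(10)  # same seed as the tutorial
sample1 = np.random.normal(loc=272, scale=25, size=50)
sample2 = np.random.normal(loc=255, scale=25, size=50)

marathon_std = 30  # standard deviation assumed known in the test

# Sample standard deviations (ddof=1 gives the unbiased variance estimate).
sample1_sd = np.std(sample1, ddof=1)
sample2_sd = np.std(sample2, ddof=1)

print("Assumed:", marathon_std)
print("Observed:", round(sample1_sd, 1), round(sample2_sd, 1))
```

If the observed values sit well below the assumed one, the standard error is overstated and the test is conservative: the Z-statistic is smaller than it would be with the observed spread.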


Ryan Nolan

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.
