Paired sign test in Python
First, we import the functions and libraries we need:
binomtest (from scipy.stats): performs an exact binomial test.
shapiro (from scipy.stats): performs the Shapiro-Wilk test, which checks for normality.
numpy as np: numerical operations and array handling.
from scipy.stats import binomtest, shapiro
import numpy as np
Here we set the significance level “alpha” to 0.05
alpha = 0.05
Example 1
Next, we create a NumPy array “before” with 10 numeric values.
before = np.array([120, 130, 135, 140, 135, 150, 140, 130, 145, 138])
Then we also create a NumPy array “after” with 10 numeric values.
after = np.array([118, 125, 130, 135, 133, 152, 137, 132, 146, 139])
Then we compute the element-wise difference between before and after, storing the result in differences.
differences = before - after
print(differences)

Then we count how many values in differences are positive and store the count in positive_diffs.
positive_diffs = np.sum(differences > 0)
We also count how many values in differences are negative and store the count in negative_diffs.
negative_diffs = np.sum(differences < 0)
Here we use the built-in min() function to take the smaller of the two counts as the test statistic.
test_statistic = min(positive_diffs, negative_diffs)
print(test_statistic)

Next, we calculate the number of non-zero differences by adding positive_diffs and negative_diffs. Zero differences (ties) carry no sign, so they are excluded from n.
n = positive_diffs + negative_diffs
Here, we perform a binomial test using:
test_statistic: the number of times the less common change occurred.
n: the total number of non-zero changes.
p=0.5: the expected probability of either sign under the null hypothesis.
alternative='two-sided': tests whether the proportion of changes in either direction differs significantly from 50%.
results = binomtest(test_statistic, n, p=0.5, alternative='two-sided')
p_value = results.pvalue
print(p_value)

if p_value < alpha:
    print("The median of the differences is not zero (there is a difference).")
else:
    print("No significant evidence that the median of the differences between the paired observations differs from zero (no difference).")
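The p-value from the binomial test above can be checked by hand. The sketch below is a minimal illustration using only the standard library; the helper name binom_two_sided_p is hypothetical and not part of SciPy. It assumes p = 0.5, where the binomial distribution is symmetric, so the two-sided p-value is twice the probability of the smaller tail (capped at 1).

```python
from math import comb


def binom_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n trials with p = 0.5.

    Hypothetical helper for illustration; valid only for p = 0.5,
    where the two tails are mirror images of each other.
    """
    k = min(k, n - k)  # work with the smaller tail
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k)
    return min(1.0, 2 * tail)


# Example 1: 6 positive and 4 negative differences, so k = 4 and n = 10.
p_manual = binom_two_sided_p(4, 10)
print(p_manual)  # 0.75390625
```

With 4 of 10 signs in the minority, a split at least this uneven happens about 75% of the time under the null hypothesis, so there is no evidence of a difference.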

Example 2: one-tailed test, zero differences, Shapiro-Wilk, ordinal data
We define a NumPy array “before” with 10 ordinal ratings.
before = np.array([3, 4, 2, 4, 3, 4, 5, 3, 4, 2])
We also define a NumPy array “after” with the 10 matching ratings.
after = np.array([2, 4, 3, 5, 3, 5, 5, 4, 5, 3])
Then we perform element-wise subtraction.
differences = before - after
print(differences)

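Next, we perform the Shapiro-Wilk test on the differences to check whether they are plausibly normally distributed. A p-value below alpha indicates a departure from normality, which is one reason to prefer the sign test over a paired t-test.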
shapiro_stat, shapiro_p_value = shapiro(differences)
print(shapiro_p_value)
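As a rough illustration of how the Shapiro-Wilk result might feed into the choice of test, the sketch below turns the p-value into an explicit decision. The variable normality_rejected and the printed wording are assumptions made for this example; the data are the Example 2 arrays.

```python
import numpy as np
from scipy.stats import shapiro

alpha = 0.05
before = np.array([3, 4, 2, 4, 3, 4, 5, 3, 4, 2])
after = np.array([2, 4, 3, 5, 3, 5, 5, 4, 5, 3])
differences = before - after

# Null hypothesis of Shapiro-Wilk: the sample comes from a normal distribution.
shapiro_stat, shapiro_p_value = shapiro(differences)
normality_rejected = shapiro_p_value < alpha  # illustrative flag, not a SciPy output

if normality_rejected:
    print("Normality is rejected; a non-parametric test such as the sign test is a safer choice.")
else:
    print("Normality is not rejected; with ordinal data the sign test may still be preferable.")
```

Because the ratings are ordinal and take only a few distinct values, the sign test remains a reasonable choice whichever branch is taken.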

Next, we count how many values in differences are positive.
positive_diffs = np.sum(differences > 0)
Here, we also count how many values in differences are negative.
negative_diffs = np.sum(differences < 0)
Next, we set test_statistic to the smaller of the two counts.
test_statistic = min(positive_diffs, negative_diffs)
print(test_statistic)

Then we calculate the total number of non-zero changes by adding positive_diffs and negative_diffs. The three zero differences in this example are dropped, so n is 7 rather than 10.
n = positive_diffs + negative_diffs
print(n)

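Next, we perform a one-sided binomial test (alternative='less') to check whether the less frequent sign occurs significantly less often than expected under the null hypothesis of equal probability (p = 0.5). Here the smaller count is the number of positive differences, so the test asks whether “before” exceeds “after” less often than chance would predict. The direction of a one-sided test should be chosen before looking at the data.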
results = binomtest(test_statistic, n=n, p=0.5, alternative='less')
p_value = results.pvalue
print(p_value)

if p_value < alpha:
    print("The median of the differences is less than zero (before tends to be lower than after).")
else:
    print("No significant evidence that the median of the differences between the paired observations is less than zero (no difference).")
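The steps of Example 2 can be wrapped into one reusable function. The sketch below is a hypothetical helper (paired_sign_test is not a SciPy function) and makes one deliberate design change: it always tests the count of positive differences rather than the smaller count, so the direction of a one-sided alternative is fixed in advance instead of being picked from the data.

```python
import numpy as np
from scipy.stats import binomtest


def paired_sign_test(before, after, alternative="two-sided"):
    """Sign test on before - after; zero differences (ties) are dropped.

    Hypothetical helper for illustration. `alternative` refers to the share
    of positive differences: 'less' means before > after occurs less often
    than half the time.
    """
    differences = np.asarray(before) - np.asarray(after)
    positives = int(np.sum(differences > 0))
    negatives = int(np.sum(differences < 0))
    n = positives + negatives  # ties carry no sign
    if n == 0:
        raise ValueError("all differences are zero; the sign test is undefined")
    result = binomtest(positives, n=n, p=0.5, alternative=alternative)
    return positives, negatives, result.pvalue


before = np.array([3, 4, 2, 4, 3, 4, 5, 3, 4, 2])
after = np.array([2, 4, 3, 5, 3, 5, 5, 4, 5, 3])
positives, negatives, p_value = paired_sign_test(before, after, alternative="less")
print(positives, negatives, p_value)
```

For these data there is 1 positive and 6 negative differences, and the one-sided p-value is about 0.0625, which is above alpha = 0.05.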

Example 3: statsmodels
Here, we import the sign test from statsmodels:
statsmodels: a Python library for statistical modeling.
sign_test (from statsmodels.stats.descriptivestats): performs the sign test.
import statsmodels
from statsmodels.stats.descriptivestats import sign_test
before = np.array([120, 130, 135, 140, 135, 150, 140, 130, 145, 138])
after = np.array([118, 125, 130, 135, 133, 152, 137, 132, 146, 139])
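The sign test is a non-parametric test used to determine whether there is a consistent difference between two paired samples, in this case the before and after measurements.
sign_test takes a single sample and a reference value mu0 (default 0), so we pass the paired differences. It returns the statistic M = (N(+) - N(-)) / 2 and the two-sided p-value.
sign_test(before - after, mu0=0)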
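To see what the two values returned by sign_test represent, the sketch below recomputes them by hand with the standard library only. It assumes the definition given in the statsmodels documentation, M = (N(+) - N(-)) / 2, with a two-sided binomial p-value at p = 0.5; the variable names are chosen for this example.

```python
from math import comb

before = [120, 130, 135, 140, 135, 150, 140, 130, 145, 138]
after = [118, 125, 130, 135, 133, 152, 137, 132, 146, 139]

differences = [b - a for b, a in zip(before, after)]
n_pos = sum(d > 0 for d in differences)  # N(+): before above after
n_neg = sum(d < 0 for d in differences)  # N(-): before below after

# Statistic as documented by statsmodels (assumption stated above).
m_stat = (n_pos - n_neg) / 2

# Two-sided p-value at p = 0.5: twice the smaller tail, capped at 1.
k, n = min(n_pos, n_neg), n_pos + n_neg
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)

print(m_stat, p_value)  # 1.0 0.75390625
```

The p-value matches the one from Example 1, since both approaches run the same binomial test on the same 6 positive and 4 negative signs.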

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.