Paired sign test in Python

First, we import the functions we need from SciPy, along with NumPy:

binomtest: for performing a binomial test

shapiro: for performing the Shapiro-Wilk test (tests for normality)

numpy as np: for numerical operations and array handling.

				
					from scipy.stats import binomtest, shapiro
import numpy as np
				
			

Here we set the significance level “alpha” to 0.05.

				
					alpha = 0.05
				
			

Example 1

Next, we create a NumPy array “before” with 10 numeric values.

				
					before = np.array([120, 130, 135, 140, 135, 150, 140, 130, 145, 138])
				
			

Then we also create a NumPy array “after” with 10 numeric values.

				
					after = np.array([118, 125, 130, 135, 133, 152, 137, 132, 146, 139])
				
			

Then we compute the element-wise difference between before and after and store the result in differences.

				
					differences = before - after
				
			
				
					print(differences)
				
			

Then we count how many values in differences are positive and store the count in positive_diffs.

				
					positive_diffs = np.sum(differences > 0)
				
			

We also count how many values in differences are negative and store the count in negative_diffs.

				
					negative_diffs = np.sum(differences < 0)
				
			

Here we use the built-in min() function to take the smaller of the two counts as the test statistic.

				
					test_statistic = min(positive_diffs, negative_diffs)
				
			
				
					print(test_statistic)
				
			

Next, we calculate the number of non-zero differences by adding positive_diffs and negative_diffs.

				
					n = positive_diffs + negative_diffs
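
Pairs with no change (a difference of exactly zero) carry no sign, so they are left out of n. A minimal sketch of that bookkeeping, using a made-up array `example_diffs` that is not part of the examples in this post:

```python
import numpy as np

# Hypothetical differences for illustration only; two ties (zeros) are included.
example_diffs = np.array([2, 0, -1, 3, 0, -4, 1])

n_positive = int(np.sum(example_diffs > 0))  # strictly positive differences
n_negative = int(np.sum(example_diffs < 0))  # strictly negative differences
n_ties = int(np.sum(example_diffs == 0))     # zeros are dropped from the test

n_nonzero = n_positive + n_negative

print(n_positive, n_negative, n_ties, n_nonzero)  # 3 2 2 5
```

Only the 5 non-zero differences would be passed to the binomial test as the number of trials.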
				
			

Here, we perform a binomial test using:

test_statistic: the number of times the less common change occurred.

n: the total number of changes (non-zero differences).

p=0.5: the expected probability under the null hypothesis.

alternative='two-sided': tests whether the proportion of changes in either direction differs significantly from 50%.

				
					results = binomtest(test_statistic, n, p=0.5, alternative='two-sided')
				
			
				
					p_value = results.pvalue
				
			
				
					print(p_value)
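
To see what the binomial test is doing, the two-sided p-value for Example 1 can be reproduced by hand. This is a sketch, not SciPy's implementation: it assumes p = 0.5, where the binomial distribution is symmetric, so the two-sided p-value is twice the lower tail (capped at 1). The values k = 4 and n = 10 are the test statistic and the number of non-zero differences from Example 1; the helper name `two_sided_sign_p` is made up for this illustration.

```python
from math import comb


def two_sided_sign_p(k: int, n: int) -> float:
    """Two-sided sign-test p-value for k <= n / 2, assuming p = 0.5."""
    # P(X <= k) for X ~ Binomial(n, 0.5): sum of C(n, i) / 2**n for i = 0..k
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Symmetry at p = 0.5: the upper tail mirrors the lower tail.
    return min(1.0, 2 * lower_tail)


# Example 1: 4 negative differences out of 10 non-zero differences.
manual_p = two_sided_sign_p(4, 10)
print(manual_p)  # 0.75390625
```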
				
			
				
if p_value < alpha:
    print("Reject H0: the median of the differences is not zero (there is a difference).")
else:
    print("Fail to reject H0: no evidence that the median of the differences differs from zero.")
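
The steps of Example 1 can be gathered into one function. This is a sketch for convenience, not a library API: the name `paired_sign_test` and its return value are choices made here, and it assumes both arrays have the same length.

```python
import numpy as np
from scipy.stats import binomtest


def paired_sign_test(before, after):
    """Two-sided paired sign test; returns (test_statistic, n, p_value)."""
    differences = np.asarray(before) - np.asarray(after)
    positive_diffs = int(np.sum(differences > 0))
    negative_diffs = int(np.sum(differences < 0))
    n = positive_diffs + negative_diffs  # zero differences are excluded
    if n == 0:
        raise ValueError("All differences are zero; the sign test is undefined.")
    test_statistic = min(positive_diffs, negative_diffs)
    result = binomtest(test_statistic, n, p=0.5, alternative="two-sided")
    return test_statistic, n, result.pvalue


before = np.array([120, 130, 135, 140, 135, 150, 140, 130, 145, 138])
after = np.array([118, 125, 130, 135, 133, 152, 137, 132, 146, 139])

stat, n, p_value = paired_sign_test(before, after)
print(stat, n, round(p_value, 4))  # 4 10 0.7539
```

With alpha = 0.05, a p-value of about 0.75 is far from significant, so the null hypothesis of a zero median difference is not rejected for these data.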
				
			

Example 2: one-tailed test with zero differences, a Shapiro-Wilk check, and ordinal data

We define a NumPy array for the before values.

				
					before = np.array([3, 4, 2, 4, 3, 4, 5, 3, 4, 2])
				
			

We also define a NumPy array for the after values.

				
					after = np.array([2, 4, 3, 5, 3, 5, 5, 4, 5, 3])
				
			

Then we perform element-wise subtraction.

				
					differences = before - after
				
			
				
					print(differences)
				
			

Next, we perform the Shapiro-Wilk test on the differences to check whether they follow a normal distribution.

				
					shapiro_stat, shapiro_p_value = shapiro(differences)
				
			
				
					print(shapiro_p_value)
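
One common way to read this p-value, sketched here as an assumption rather than a rule from this tutorial, is to compare it with alpha: a value below alpha suggests the differences are not normally distributed, which is one reason to prefer the sign test over a paired t-test. The variable name `looks_normal` is introduced only for this illustration.

```python
import numpy as np
from scipy.stats import shapiro

alpha = 0.05
before = np.array([3, 4, 2, 4, 3, 4, 5, 3, 4, 2])
after = np.array([2, 4, 3, 5, 3, 5, 5, 4, 5, 3])
differences = before - after

shapiro_stat, shapiro_p_value = shapiro(differences)

# Failing to reject normality is not proof of normality, especially with n = 10.
looks_normal = shapiro_p_value >= alpha
if looks_normal:
    print("No evidence against normality at the chosen alpha.")
else:
    print("The differences do not look normally distributed.")
```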
				
			

Next, we count how many values in differences are positive.

				
					positive_diffs = np.sum(differences > 0)
				
			

Here, we also count how many values in differences are negative.

				
					negative_diffs = np.sum(differences < 0)
				
			

Next, we set test_statistic to the smaller of the two counts.

				
					test_statistic = min(positive_diffs, negative_diffs)
				
			
				
					print(test_statistic)
				
			

Then we calculate the total number of non-zero changes by adding positive_diffs and negative_diffs.

				
					n = positive_diffs + negative_diffs
				
			
				
					print(n)
				
			

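Next, we perform a one-sided binomial test to check whether the number of less frequent changes (test_statistic) is significantly smaller than expected under the null hypothesis of equal probability (p = 0.5). Because test_statistic is always the smaller count, a one-sided test like this is only appropriate when the direction of the effect was specified before looking at the data.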

				
					results = binomtest(test_statistic, n=n, p=0.5, alternative='less')
				
			
				
					p_value = results.pvalue
				
			
				
					print(p_value)
				
			
				
if p_value < alpha:
    print("Reject H0: the less frequent direction of change occurs significantly less often than 50% of the time.")
else:
    print("Fail to reject H0: no evidence that the median of the differences differs from zero.")
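
Taking the smaller count as the statistic picks the direction after seeing the data. An alternative sketch, assuming the hypothesis was fixed in advance as "scores tend to increase from before to after" (an assumption made here, not stated in the example), counts the positive differences directly and tests whether there are fewer of them than chance would give:

```python
import numpy as np
from scipy.stats import binomtest

before = np.array([3, 4, 2, 4, 3, 4, 5, 3, 4, 2])
after = np.array([2, 4, 3, 5, 3, 5, 5, 4, 5, 3])
differences = before - after

positive_diffs = int(np.sum(differences > 0))  # pairs where before > after
n = int(np.sum(differences != 0))              # ties are excluded

# H1 (assumed in advance): after tends to exceed before,
# so positive differences should make up less than 50% of the changes.
result = binomtest(positive_diffs, n=n, p=0.5, alternative="less")
print(positive_diffs, n, round(result.pvalue, 4))  # 1 7 0.0625
```

For these data the positive count is also the smaller count, so the p-value is the same as with test_statistic; at alpha = 0.05 it is not significant.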
				
			

Example 3: using statsmodels

Here, we import statsmodels:

statsmodels: a Python library for statistical modeling.

sign_test: from statsmodels.stats.descriptivestats, performs the sign test for paired data.

				
					import statsmodels
from statsmodels.stats.descriptivestats import sign_test
				
			
				
					before = np.array([120, 130, 135, 140, 135, 150, 140, 130, 145, 138])
				
			
				
					after = np.array([118, 125, 130, 135, 133, 152, 137, 132, 146, 139])
				
			

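The sign_test function performs a non-parametric test used to determine whether there is a consistent difference between two paired samples, in this case the before and after measurements. Its documented signature is sign_test(samp, mu0=0); here after is passed as mu0, so each before value is compared with its paired after value.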

				
					sign_test(before, after)
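
According to the statsmodels documentation, sign_test returns two values: the statistic M = (N(+) - N(-)) / 2 and a p-value. The sketch below recomputes both by hand with NumPy and SciPy so they can be compared with what sign_test returns; it is a reconstruction for illustration, not the statsmodels source, and the names `n_pos`, `n_neg`, and `m_stat` are made up here.

```python
import numpy as np
from scipy.stats import binomtest

before = np.array([120, 130, 135, 140, 135, 150, 140, 130, 145, 138])
after = np.array([118, 125, 130, 135, 133, 152, 137, 132, 146, 139])
differences = before - after

n_pos = int(np.sum(differences > 0))  # N(+): before above after
n_neg = int(np.sum(differences < 0))  # N(-): before below after

m_stat = (n_pos - n_neg) / 2  # M = (N(+) - N(-)) / 2
p_value = binomtest(min(n_pos, n_neg), n_pos + n_neg, p=0.5).pvalue

print(m_stat, round(p_value, 4))  # 1.0 0.7539
```

The p-value matches the two-sided result from Example 1, which uses the same data.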
				
			

Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.
