
How can you perform one-tailed two-sample Kolmogorov–Smirnov Test in Python?

I'm trying to perform a two-sample KS test in Python 3 to detect any significant difference between two distributions. For convenience, let a and b be the two data columns from a .csv file that I'd like to compare. I simply ran the following code:

from scipy.stats import ks_2samp
ks_2samp(a, b)

The returned result contains the greatest distance between the two empirical CDFs (statistic) and the p-value (pvalue):

Ks_2sampResult(statistic=0.0329418537762845, pvalue=0.000127997328482532)

What I would like to know is: since ks_2samp appears to perform only the two-sided two-sample KS test, is there a way to perform a one-sided two-sample KS test in Python?

In addition, how can I find the position (the x-axis value) at which the greatest distance occurs?

jstaxlin asked Oct 28 '25 00:10


1 Answer

scipy.stats.ks_2samp already supports what you want. You just need to specify the direction of the test with the alternative parameter. Note that the direction refers to the empirical CDFs of the samples, not to the sample values themselves: a sample with larger values has a CDF that lies below the other one.

The alternative parameter is, however, only available since scipy 1.3.0.

ks_2samp(a, b, alternative='less')     # alternative: CDF of a lies below CDF of b (a tends to be larger)
ks_2samp(a, b, alternative='greater')  # alternative: CDF of a lies above CDF of b (a tends to be smaller)
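To see which direction each option tests, here is a minimal sketch on synthetic data. The samples, seed, and variable names are assumptions made for illustration, not the asker's data. Sample a is drawn from a normal distribution shifted toward larger values, so its empirical CDF should lie below that of b.

```python
import numpy as np
from scipy.stats import ks_2samp

# Synthetic data (assumption for illustration): a is shifted toward larger values
rng = np.random.default_rng(42)
a = rng.normal(loc=0.5, scale=1.0, size=500)
b = rng.normal(loc=0.0, scale=1.0, size=500)

# 'less': alternative is that the CDF of a lies below the CDF of b somewhere
p_less = ks_2samp(a, b, alternative="less").pvalue
# 'greater': alternative is that the CDF of a lies above the CDF of b somewhere
p_greater = ks_2samp(a, b, alternative="greater").pvalue

print(f"alternative='less':    p = {p_less:.3g}")
print(f"alternative='greater': p = {p_greater:.3g}")
```

With this setup the 'less' test should reject clearly while the 'greater' test should not, which matches the CDF-based reading of the parameter.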

Edit: To identify the x-value where the largest difference occurred, you can use this function (mainly copy-paste from the source of ks_2samp):

import numpy as np

def ks_2samp_x(data1, data2, alternative="two-sided"):
    # Returns the x-value at which the KS distance between the samples is attained
    data1 = np.sort(data1)
    data2 = np.sort(data2)
    n1 = data1.shape[0]
    n2 = data2.shape[0]

    data_all = np.concatenate([data1, data2])
    # using searchsorted solves equal data problem
    cdf1 = np.searchsorted(data1, data_all, side='right') / n1
    cdf2 = np.searchsorted(data2, data_all, side='right') / n2
    cddiffs = cdf1 - cdf2
    minS = np.argmin(cddiffs)   # ks_2samp uses np.min or np.max respectively
    maxS = np.argmax(cddiffs)   # now we get instead the index in data_all
    # two-sided: index of the largest absolute difference
    # (max(minS, maxS) would compare indices, not distances)
    absS = np.argmax(np.abs(cddiffs))
    alt2Dvalue = {'less': minS, 'greater': maxS, 'two-sided': absS}
    d_arg = alt2Dvalue[alternative]
    return data_all[d_arg]
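One way to check such a location is to return the distance along with it and compare that distance to the statistic reported by ks_2samp. The following is a sketch under assumptions: ks_location is a hypothetical helper name, the data are synthetic, and only the two-sided case is covered.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_location(data1, data2):
    """Hypothetical helper: return (x, d), the two-sided KS distance d
    and one observed point x at which it is attained."""
    data1 = np.sort(np.asarray(data1))
    data2 = np.sort(np.asarray(data2))
    data_all = np.concatenate([data1, data2])
    # Empirical CDFs of both samples, evaluated at every observed point
    cdf1 = np.searchsorted(data1, data_all, side="right") / data1.shape[0]
    cdf2 = np.searchsorted(data2, data_all, side="right") / data2.shape[0]
    diffs = np.abs(cdf1 - cdf2)
    idx = np.argmax(diffs)  # first index with the largest absolute difference
    return data_all[idx], diffs[idx]


# Synthetic samples (assumption for illustration)
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=300)
b = rng.normal(0.5, 1.0, size=300)

x_max, d_max = ks_location(a, b)
result = ks_2samp(a, b)
print(f"largest distance {d_max:.4f} at x = {x_max:.4f}")
print(f"ks_2samp statistic: {result.statistic:.4f}")
```

The two printed distances should agree. If several points share the same maximal distance, np.argmax reports only the first one. Recent SciPy releases also appear to expose a statistic_location attribute on the result object, so it is worth checking the documentation of your installed version before rolling your own.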

ascripter answered Oct 30 '25 16:10


