KS-Disc

What?

KS-Disc is a Python library for the discrete version of the Kolmogorov-Smirnov test. For reasons hidden from us lowly mortals, it is not included in SciPy, so I put together a simple version of it.

How?

Step One

It runs on Python 3 and is installed with pip:

pip install ksdisc

What can it do?

It can perform a 1-sample test, in which a sample is compared against an analytical CDF. It can also perform a 2-sample test, in which two samples are compared against each other using a permutation test.

1-Sample Test

The 1-sample test takes an observed sample and a CDF. It returns the p-value for the null hypothesis that the sample was drawn from the distribution described by that CDF.

ks_disc(y, cdf)

y is an n-length array, where each element is a sampled number drawn from a distribution.

cdf is a function that takes a number and returns the cumulative probability at that value. Note that it must be non-decreasing and return values between zero and one.
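If you only have the probabilities of each outcome, a function of the required shape can be built from them. The following is a minimal sketch; make_step_cdf is a hypothetical helper written for illustration and is not part of ksdisc.

```python
from bisect import bisect_right
from itertools import accumulate


def make_step_cdf(values, probs):
    """Build a step CDF for a discrete distribution (hypothetical helper).

    `values` must be sorted in ascending order and `probs` must hold the
    matching probabilities.
    """
    cumulative = list(accumulate(probs))

    def cdf(x):
        # Number of support points that are <= x
        k = bisect_right(values, x)
        if k == 0:
            return 0.0
        # Clamp to 1.0 to guard against floating-point overshoot
        return min(cumulative[k - 1], 1.0)

    return cdf


# Discrete uniform distribution on {1, 2, 3, 4}
uniform_cdf = make_step_cdf([1, 2, 3, 4], [0.25] * 4)
```

The returned function is flat between outcomes and jumps at each one, so it is non-decreasing and stays between zero and one.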

Example
from ksdisc import ks_disc
from random import randint

# 1-sample test
y = [randint(1, 3) for _ in range(20)]  # Discrete uniform on {1, 2, 3}
_cdf = lambda x: 0.0 if x < 0 else min(0.25 * x, 1.0)  # Equals the discrete uniform CDF on {1, 2, 3, 4} at integer x

out = ks_disc(y, _cdf)
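For intuition, the statistic behind a Kolmogorov-Smirnov test is the largest gap between the sample's empirical CDF and the reference CDF. Below is a minimal sketch of that idea, not necessarily how ks_disc computes it. The names ks_statistic and support are hypothetical; support lists the values at which the reference CDF jumps, which a plain function cannot tell us on its own.

```python
from bisect import bisect_right
from math import floor


def ks_statistic(y, cdf, support):
    """Largest absolute gap between the empirical CDF of `y` and `cdf`.

    Both functions are step functions, so it is enough to compare them at
    every point where either one jumps.
    """
    ys = sorted(y)
    n = len(ys)
    points = sorted(set(support) | set(ys))
    # bisect_right(ys, x) counts the observations that are <= x
    return max(abs(bisect_right(ys, x) / n - cdf(x)) for x in points)


# Step CDF of the discrete uniform distribution on {1, 2, 3, 4}
def step_cdf(x):
    return 0.0 if x < 1 else min(0.25 * floor(x), 1.0)


d = ks_statistic([4, 4, 4, 4], step_cdf, support=[1, 2, 3, 4])
```

Here every observation is 4, so the empirical CDF is still zero at 3 while the reference CDF has already reached 0.75. That gap would be missed if the two functions were compared only at the observed values.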

2-Sample Test

The 2-sample test takes two observed samples. It returns the p-value for the null hypothesis that both samples were drawn from the same distribution.

ks_disc_2sample(samples1, samples2)

samples1 is an n-length array, where each element is a sampled number drawn from a distribution.

samples2 is an m-length array, where each element is a sampled number drawn from a distribution.

Example
from ksdisc import ks_disc_2sample
from random import randint, random

# 2-sample test
samples1 = [randint(1, 15) for _ in range(1000)]  # Discrete uniform on {1, ..., 15}
samples2 = [randint(1, 15) if random() < 0.95 else 3 for _ in range(1000)]  # Same, but about 5% of draws are replaced by 3

out = ks_disc_2sample(samples1, samples2)
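To show what a permutation test on two samples looks like, here is a minimal sketch in plain Python. It is an assumption about the general technique, not the actual implementation of ks_disc_2sample. The function names, the n_permutations and seed parameters, and the add-one correction are all choices made for this illustration.

```python
import random
from bisect import bisect_right


def ks_2sample_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of `a` and `b`."""
    sa, sb = sorted(a), sorted(b)
    points = sorted(set(sa) | set(sb))
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in points
    )


def permutation_p_value(a, b, n_permutations=1000, seed=None):
    """Estimate a p-value by reshuffling the pooled observations."""
    rng = random.Random(seed)
    observed = ks_2sample_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable
        rng.shuffle(pooled)
        stat = ks_2sample_statistic(pooled[:len(a)], pooled[len(a):])
        if stat >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_permutations + 1)


p_same = permutation_p_value([1, 2, 3] * 10, [1, 2, 3] * 10, 200, seed=0)
p_diff = permutation_p_value([1] * 20, [2] * 20, 200, seed=0)
```

The p-value is the share of reshuffled splits whose statistic is at least as large as the observed one. Identical samples therefore give a p-value of 1, and clearly separated samples give a small one.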