
Anderson-Darling k-Sample Test

Stefan Kloppenborg, Jeffrey Borlik

20-Jan-2019

This vignette explores the Anderson–Darling k-Sample test. CMH-17-1G [1] provides a formulation for this test that appears different from the formulation given by Scholz and Stephens in their 1987 paper [2].

The two references use different nomenclature, which is summarized as follows:

| Term | CMH-17-1G | Scholz and Stephens |
|---|---|---|
| A sample | $i$ | $i$ |
| The number of samples | $k$ | $k$ |
| An observation within a sample | $j$ | $j$ |
| The number of observations within sample $i$ | $n_i$ | $n_i$ |
| The total number of observations within all samples | $n$ | $N$ |
| Distinct values in combined data, ordered | $z_{(1)}, \ldots, z_{(L)}$ | $Z^*_1, \ldots, Z^*_L$ |
| The number of distinct values in the combined data | $L$ | $L$ |

Given the possibility of ties in the data, the discrete version of the test must be used. Scholz and Stephens (1987) give the test statistic as:

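$$A^2_{akN} = \frac{N-1}{N} \sum_{i=1}^{k} \frac{1}{n_i} \sum_{j=1}^{L} \frac{l_j}{N} \frac{\left(N M_{aij} - n_i B_{aj}\right)^2}{B_{aj}\left(N - B_{aj}\right) - N l_j / 4}$$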
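As a minimal sketch, the statistic can be computed in plain Python. The definitions of $M_{aij}$, $B_{aj}$ and $l_j$ used here are taken from Scholz and Stephens (1987) and are not stated in this vignette: $M_{aij}$ counts the observations in sample $i$ below $Z^*_j$ plus half of those equal to it, $B_{aj}$ is the same count over the combined data, and $l_j$ is the number of combined observations equal to $Z^*_j$. The function name `a2_akn` is hypothetical; it is not part of cmstatr, kSamples or SciPy.

```python
from collections import Counter


def a2_akn(samples):
    """Discrete (midrank) Anderson-Darling k-sample statistic A^2_akN.

    `samples` is a list of k lists of numbers. Hypothetical helper for
    illustration; fails with ZeroDivisionError if all values are identical.
    """
    pooled = [x for s in samples for x in s]
    N = len(pooled)
    all_counts = Counter(pooled)          # l_j for each distinct value
    z_star = sorted(all_counts)           # Z*_1 < ... < Z*_L
    total = 0.0
    for s in samples:
        n_i = len(s)
        s_counts = Counter(s)             # f_ij for each distinct value
        below_all = 0                     # combined observations < Z*_j
        below_i = 0                       # sample-i observations < Z*_j
        inner = 0.0
        for z in z_star:
            l_j = all_counts[z]
            f_ij = s_counts[z]
            b_aj = below_all + l_j / 2    # midrank count, combined data
            m_aij = below_i + f_ij / 2    # midrank count, sample i
            num = (N * m_aij - n_i * b_aj) ** 2
            den = b_aj * (N - b_aj) - N * l_j / 4
            inner += (l_j / N) * num / den
            below_all += l_j
            below_i += f_ij
        total += inner / n_i
    return (N - 1) / N * total
```

For two tie-free samples `[[1, 2], [3, 4]]` the sketch returns $19/11 \approx 1.727$. Running totals replace repeated counting, so each sample is processed in one pass over the distinct values.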

CMH-17-1G gives the test statistic as:

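$$ADK = \frac{n-1}{n^2 (k-1)} \sum_{i=1}^{k} \frac{1}{n_i} \sum_{j=1}^{L} h_j \frac{\left(n F_{ij} - n_i H_j\right)^2}{H_j \left(n - H_j\right) - n h_j / 4}$$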

By inspection, the CMH-17-1G version of this test statistic contains an extra factor of $\frac{1}{k-1}$.
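Writing both prefactors out makes the factor explicit. This step assumes that the CMH-17-1G quantities $F_{ij}$, $H_j$ and $h_j$ are the same midrank counts as $M_{aij}$, $B_{aj}$ and $l_j$ (the notation table does not define them), together with $n = N$, so that the CMH-17-1G summand equals the $T_{ij}$ defined below:

```latex
\begin{aligned}
T_{ij} &= \frac{(N M_{aij} - n_i B_{aj})^2}{B_{aj}(N - B_{aj}) - N l_j / 4}, \\
A^2_{akN} &= \frac{N-1}{N} \sum_{i=1}^{k} \frac{1}{n_i} \sum_{j=1}^{L} \frac{l_j}{N}\, T_{ij}
           = \frac{N-1}{N^2} \sum_{i=1}^{k} \frac{1}{n_i} \sum_{j=1}^{L} l_j\, T_{ij}, \\
ADK &= \frac{n-1}{n^2 (k-1)} \sum_{i=1}^{k} \frac{1}{n_i} \sum_{j=1}^{L} h_j\, T_{ij}
     = \frac{1}{k-1} \cdot \frac{N-1}{N^2} \sum_{i=1}^{k} \frac{1}{n_i} \sum_{j=1}^{L} l_j\, T_{ij}
     = \frac{A^2_{akN}}{k-1}.
\end{aligned}
```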

Scholz and Stephens indicate that one rejects $H_0$ at a significance level of $\alpha$ when:

$$\frac{A^2_{akN} - (k-1)}{\sigma_N} \geq t_{k-1}(\alpha)$$

This can be rearranged to give a critical value:

$$A^2_{crit} = (k-1) + \sigma_N \, t_{k-1}(\alpha)$$
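Applying this critical value requires $\sigma_N$, the standard deviation of the statistic under $H_0$, which depends only on the sample sizes. The sketch below uses the variance expression from Scholz and Stephens (1987), which this vignette does not reproduce; it is stated here as an assumption (it is the same expression used by SciPy's `scipy.stats.anderson_ksamp`). The helper names `sigma_n_scholz` and `a2_crit` are hypothetical, and $t_{k-1}(\alpha)$ is passed in rather than computed.

```python
import math


def sigma_n_scholz(sizes):
    """Standard deviation sigma_N of A^2_akN under H0.

    Variance expression from Scholz and Stephens (1987); `sizes` holds the
    sample sizes n_1, ..., n_k and the total N must exceed 3.
    Hypothetical helper, for illustration only.
    """
    k = len(sizes)
    N = sum(sizes)
    H = sum(1 / n for n in sizes)
    h = sum(1 / i for i in range(1, N))            # i = 1 .. N-1
    g = sum(1 / ((N - i) * j)
            for i in range(1, N - 1)               # i = 1 .. N-2
            for j in range(i + 1, N))              # j = i+1 .. N-1
    a = (4 * g - 6) * (k - 1) + (10 - 6 * g) * H
    b = ((2 * g - 4) * k**2 + 8 * h * k + (2 * g - 14 * h - 4) * H
         - 8 * h + 4 * g - 6)
    c = ((6 * h + 2 * g - 2) * k**2 + (4 * h - 4 * g + 6) * k
         + (2 * h - 6) * H + 4 * h)
    d = (2 * h + 6) * k**2 - 4 * h * k
    var = (a * N**3 + b * N**2 + c * N + d) / ((N - 1) * (N - 2) * (N - 3))
    return math.sqrt(var)


def a2_crit(sizes, t_crit):
    """Critical value (k - 1) + sigma_N * t_{k-1}(alpha), Scholz-Stephens scale."""
    return (len(sizes) - 1) + sigma_n_scholz(sizes) * t_crit
```

For example, `a2_crit([500, 500], 2.718)` uses $t_1(0.025) \approx 1.96 + 1.149 - 0.391 = 2.718$ from the interpolation quoted by CMH-17-1G. For two samples of 500 the sketch gives $\sigma_N^2 \approx 0.577$, close to the large-sample value $2(\pi^2 - 9)/3 \approx 0.580$ for $k = 2$.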

CMH-17-1G gives the critical value for $ADK$ at $\alpha = 0.025$ as:

$$ADC = 1 + \sigma_n \left(1.96 + \frac{1.149}{\sqrt{k-1}} - \frac{0.391}{k-1}\right)$$

The definitions of $\sigma_n$ (CMH-17-1G) and $\sigma_N$ (Scholz and Stephens) differ by a factor of $k-1$: since $ADK = A^2_{akN}/(k-1)$, $\sigma_n = \sigma_N/(k-1)$.
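A short sketch can confirm that the CMH-17-1G critical value is the Scholz and Stephens critical value divided by $k-1$ once $\sigma_n = \sigma_N/(k-1)$ is substituted. The helper names below are hypothetical, and only the $\alpha = 0.025$ coefficients quoted above are used.

```python
import math


def t_m_025(m):
    """Interpolated t_m(0.025) as it appears in the CMH-17-1G critical value."""
    return 1.96 + 1.149 / math.sqrt(m) - 0.391 / m


def adc_cmh(sigma_n, k):
    """CMH-17-1G critical value ADC for alpha = 0.025 (sigma_n on the CMH scale)."""
    return 1 + sigma_n * t_m_025(k - 1)


def adc_from_scholz(sigma_N, k):
    """Scholz-Stephens critical value (k - 1) + sigma_N * t, divided by (k - 1)."""
    m = k - 1
    return (m + sigma_N * t_m_025(m)) / m
```

For any `sigma_N` and `k`, `adc_cmh(sigma_N / (k - 1), k)` and `adc_from_scholz(sigma_N, k)` agree up to floating-point rounding, which is the relationship stated above.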

The value in parentheses in the CMH-17-1G critical value corresponds to the interpolation formula for $t_m(\alpha)$, with $m = k-1$, given in Scholz and Stephens's paper. Note that this is not Student's t-distribution, but rather a distribution referred to as the $T_m$ distribution.

The cmstatr package uses the package kSamples to perform the k-sample Anderson–Darling tests. kSamples uses the original formulation from Scholz and Stephens, so its test statistic will differ from that given by software based on the CMH-17-1G formulation by a factor of $k-1$.

For comparison, SciPy’s implementation also uses the original Scholz and Stephens formulation. However, the statistic that it returns is the normalized statistic $[A^2_{akN} - (k-1)]/\sigma_N$, rather than kSamples’s $A^2_{akN}$ value. To be consistent, SciPy also returns the critical values $t_{k-1}(\alpha)$ directly. (Currently, SciPy also floors/caps the returned p-value at 0.1% / 25%.) The values of $k$ and $\sigma_N$ are available in cmstatr’s ad_ksample return value, if an exact comparison with SciPy is necessary.
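The three conventions can be sketched as rescalings of a single result. This minimal example assumes $A^2_{akN}$, $k$, $\sigma_N$ and $t_{k-1}(\alpha)$ are already known (for instance from cmstatr’s ad_ksample return value); the function name `conventions` and the numbers in the example are hypothetical placeholders, not output from any of the tools.

```python
def conventions(a2_akn, k, sigma_N, t_crit):
    """Return (statistic, critical value) pairs on three scales.

    a2_akn : statistic on the kSamples / Scholz and Stephens scale
    k      : number of samples
    sigma_N: standard deviation of A^2_akN under H0
    t_crit : t_{k-1}(alpha) for the chosen significance level
    Hypothetical helper, for illustration only.
    """
    m = k - 1
    return {
        # kSamples: A^2_akN against (k - 1) + sigma_N * t
        "kSamples": (a2_akn, m + sigma_N * t_crit),
        # SciPy: normalized statistic against t_{k-1}(alpha) itself
        "SciPy": ((a2_akn - m) / sigma_N, t_crit),
        # CMH-17-1G: both sides divided by (k - 1); sigma_n = sigma_N / (k - 1)
        "CMH-17-1G": (a2_akn / m, 1 + (sigma_N / m) * t_crit),
    }


if __name__ == "__main__":
    # Placeholder inputs, not real test output.
    for name, (stat, crit) in conventions(4.2, 3, 1.1, 2.577).items():
        print(name, round(stat, 4), round(crit, 4), "reject" if stat >= crit else "fail to reject")
```

Each mapping is an increasing transformation applied to both the statistic and its critical value, so every convention reaches the same decision.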

The conclusions drawn about the null hypothesis will, however, be the same whether R, CMH-17-1G, or SciPy is used.

References

[1] “Composite Materials Handbook, Volume 1. Polymer Matrix Composites Guideline for Characterization of Structural Materials,” SAE International, CMH-17-1G, Mar. 2012.
[2] F. W. Scholz and M. A. Stephens, “K-Sample Anderson–Darling Tests,” Journal of the American Statistical Association, vol. 82, no. 399, pp. 918–924, Sep. 1987.