Provides percentile computation.
There are several commonly used methods for estimating percentiles (a.k.a. quantiles) based on sample data. For large samples, the different methods agree closely, but when sample sizes are small, different methods will give significantly different results. The algorithm implemented here works as follows:
- Let n be the length of the (sorted) array and 0 < p <= 100 be the desired percentile.
- If n = 1, return the unique array element (regardless of the value of p); otherwise
- Compute the estimated percentile position pos = p * (n + 1) / 100 and the difference, d, between pos and floor(pos) (i.e. the fractional part of pos). If pos >= n, return the largest element in the array; otherwise
- Let lower be the element in position floor(pos) in the array and let upper be the next element in the array. Return lower + d * (upper - lower).
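The steps above can be sketched in Java roughly as follows. This is a minimal illustration, not the original library code: the names PercentileSketch and percentile are hypothetical, positions are assumed to be counted from 1, and the guard for pos < 1 is an added assumption that keeps the index in range for small p, since the description above does not cover that case.

```java
import java.util.Arrays;

final class PercentileSketch {

    /**
     * Estimates the p-th percentile of values using pos = p * (n + 1) / 100.
     * Hypothetical helper for illustration only.
     */
    static double percentile(double[] values, double p) {
        // Argument checks are an assumption; the description does not specify them.
        if (values == null || values.length == 0) {
            throw new IllegalArgumentException("values must be non-empty");
        }
        if (!(p > 0 && p <= 100)) {
            throw new IllegalArgumentException("p must satisfy 0 < p <= 100");
        }

        int n = values.length;
        if (n == 1) {
            return values[0]; // single element, regardless of p
        }

        // Work on a sorted copy so the caller's array is left untouched.
        double[] sorted = values.clone();
        Arrays.sort(sorted);

        double pos = p * (n + 1) / 100;
        double floorPos = Math.floor(pos);
        double d = pos - floorPos; // fractional part of pos

        if (pos < 1) {
            return sorted[0]; // assumption: clamp to the smallest element
        }
        if (pos >= n) {
            return sorted[n - 1]; // largest element
        }

        // pos is treated as a 1-based position, so shift by one for array access.
        int i = (int) floorPos;
        double lower = sorted[i - 1];
        double upper = sorted[i];
        return lower + d * (upper - lower);
    }

    public static void main(String[] args) {
        double[] data = {4, 0, 3, 1, 2};
        System.out.println(percentile(data, 50)); // prints 2.0
        System.out.println(percentile(data, 25)); // prints 0.5
    }
}
```

For the five-element sample, p = 25 gives pos = 1.5, so the result lies halfway between the first and second sorted elements.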
To compute percentiles, the data must be (totally) ordered. Input arrays are copied and then sorted using {@link java.util.Arrays#sort(double[])}. The ordering used by Arrays.sort(double[]) is the one determined by {@link java.lang.Double#compareTo(Double)}. This ordering makes Double.NaN larger than any other value (including Double.POSITIVE_INFINITY). Therefore, for example, the median (50th percentile) of {0, 1, 2, 3, 4, Double.NaN} evaluates to 2.5.
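The effect of this ordering can be checked with a short, self-contained sketch. The class NanOrderingDemo and its helpers are hypothetical names used only for illustration; the median helper applies the position rule pos = 50 * (n + 1) / 100 described earlier and assumes at least two elements.

```java
import java.util.Arrays;

final class NanOrderingDemo {

    /** Returns a sorted copy, leaving the input array unchanged. */
    static double[] sortedCopy(double[] values) {
        double[] copy = values.clone();
        Arrays.sort(copy); // uses the Double.compareTo ordering: NaN sorts last
        return copy;
    }

    /** Median via pos = 50 * (n + 1) / 100; illustrative, requires n >= 2. */
    static double median(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double[] sorted = sortedCopy(values);
        int n = sorted.length;
        double pos = 50.0 * (n + 1) / 100; // for n >= 2 this satisfies 1 <= pos < n
        int i = (int) Math.floor(pos);     // 1-based position of the lower element
        double d = pos - i;
        return sorted[i - 1] + d * (sorted[i] - sorted[i - 1]);
    }

    public static void main(String[] args) {
        double[] mixed = {Double.NaN, Double.POSITIVE_INFINITY, 1.0};
        System.out.println(Arrays.toString(sortedCopy(mixed))); // prints [1.0, Infinity, NaN]

        double[] sample = {0, 1, 2, 3, 4, Double.NaN};
        System.out.println(median(sample)); // prints 2.5
    }
}
```

Because NaN is placed at the end of the sorted array, it counts toward n but does not take part in the interpolation here, which is why the median comes out as 2.5 rather than NaN.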
Since percentile estimation usually involves interpolation between array elements, arrays containing NaN or infinite values will often result in NaN or infinite values returned.
Note that this implementation is not synchronized. If multiple threads access an instance of this class concurrently, and at least one of the threads invokes the increment() or clear() method, it must be synchronized externally.
This code is taken from Jakarta Commons. Since we only need this routine, there is not much point in taking the entire library. It's unlikely that the maths would change...
(C) Apache Software Foundation 2003-2004.