Provides percentile computation.
There are several commonly used methods for estimating percentiles (a.k.a. quantiles) based on sample data. For large samples, the different methods agree closely, but when sample sizes are small, different methods will give significantly different results. The algorithm implemented here works as follows:
- Let {@code n} be the length of the (sorted) array and {@code 0 < p <= 100} be the desired percentile.
- If {@code n = 1}, return the unique array element (regardless of the value of {@code p}); otherwise
- Compute the estimated percentile position {@code pos = p * (n + 1) / 100} and the difference, {@code d}, between {@code pos} and {@code floor(pos)} (i.e. the fractional part of {@code pos}).
- If {@code pos < 1}, return the smallest element in the array.
- Else if {@code pos >= n}, return the largest element in the array.
- Else let {@code lower} be the element in position {@code floor(pos)} in the array and let {@code upper} be the next element in the array. Return {@code lower + d * (upper - lower)}.
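The estimation steps listed above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name {@code PercentileSketch} and its method are hypothetical, positions are treated as 1-based, and the sketch sorts a full copy of the input for simplicity rather than using the selection approach of the class documented here.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating the estimation steps; not the library's code. */
final class PercentileSketch {

    /** Estimates the p-th percentile (0 < p <= 100) of a non-empty array. */
    static double percentile(double[] values, double p) {
        if (values.length == 0 || !(p > 0 && p <= 100)) {
            throw new IllegalArgumentException("need non-empty data and 0 < p <= 100");
        }
        // Assumption: a full sort of a copy stands in for partial ordering.
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        if (n == 1) {
            return sorted[0]; // single element, whatever p is
        }
        double pos = p * (n + 1) / 100.0;  // estimated (1-based) position
        double floorPos = Math.floor(pos);
        double d = pos - floorPos;         // fractional part of pos
        if (pos < 1) {
            return sorted[0];              // smallest element
        }
        if (pos >= n) {
            return sorted[n - 1];          // largest element
        }
        double lower = sorted[(int) floorPos - 1]; // 1-based position floor(pos)
        double upper = sorted[(int) floorPos];     // next element
        return lower + d * (upper - lower);
    }

    public static void main(String[] args) {
        double[] data = {4, 1, 3, 2, 0};
        System.out.println(percentile(data, 50)); // pos = 3.0, prints 2.0
        System.out.println(percentile(data, 25)); // pos = 1.5, prints 0.5
        System.out.println(percentile(data, 10)); // pos = 0.6 < 1, prints 0.0
        System.out.println(percentile(data, 90)); // pos = 5.4 >= n, prints 4.0
    }
}
```

With five elements, the 50th percentile lands exactly on position 3, so no interpolation takes place, while the 25th percentile falls halfway between the first two elements.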
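To compute percentiles, the data must be at least partially ordered. Input arrays are copied and recursively partitioned using an ordering definition. The ordering used by {@code Arrays.sort(double[])} is the one determined by {@link java.lang.Double#compareTo(Double)}. This ordering makes {@code Double.NaN} larger than any other value (including {@code Double.POSITIVE_INFINITY}). Therefore, for example, the median (50th percentile) of {@code {0, 1, 2, 3, 4, Double.NaN}} evaluates to {@code 2.5}: with {@code n = 6}, the position is {@code pos = 50 * 7 / 100 = 3.5}, halfway between the third and fourth elements, {@code 2} and {@code 3}, and the {@code NaN} in last place is never touched.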
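The effect of this ordering can be checked with the JDK alone. The following sketch uses a hypothetical class {@code NaNOrderingDemo}, and its median helper assumes an already sorted array with at least two elements. It sorts the example data with {@code Arrays.sort(double[])} and applies the position rule {@code pos = 50 * (n + 1) / 100}.

```java
import java.util.Arrays;

/** Hypothetical demo of how NaN sorts last and leaves this median unaffected. */
final class NaNOrderingDemo {

    /** Returns a sorted copy; Arrays.sort places NaN after every other value. */
    static double[] sortedCopy(double[] values) {
        double[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    /** Median via pos = 50 * (n + 1) / 100; assumes sorted input with n >= 2. */
    static double medianOfSorted(double[] sorted) {
        int n = sorted.length;
        double pos = 50.0 * (n + 1) / 100.0;
        int k = (int) Math.floor(pos);  // 1-based position of the lower element
        double d = pos - k;             // fractional part
        double lower = sorted[k - 1];
        double upper = sorted[k];
        return lower + d * (upper - lower);
    }

    public static void main(String[] args) {
        double[] sorted = sortedCopy(new double[] {Double.NaN, 3, 0, 4, 1, 2});
        System.out.println(Arrays.toString(sorted));
        // NaN compares greater than positive infinity under Double.compare.
        System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY) > 0);
        System.out.println(medianOfSorted(sorted)); // pos = 3.5, prints 2.5
    }
}
```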
Since percentile estimation usually involves interpolation between array elements, arrays containing {@code NaN} or infinite values will often result in {@code NaN} or infinite values returned.
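A short sketch shows how non-finite values propagate through the interpolation formula {@code lower + d * (upper - lower)} under IEEE 754 arithmetic. The class {@code NonFiniteInterpolationDemo} is hypothetical and evaluates only the formula as written in the description above; it makes no claim about how the library handles these cases internally.

```java
/** Hypothetical demo: the interpolation formula applied to non-finite inputs. */
final class NonFiniteInterpolationDemo {

    /** The interpolation step from the description, taken literally. */
    static double interpolate(double lower, double upper, double d) {
        return lower + d * (upper - lower);
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        // A NaN neighbour makes the result NaN.
        System.out.println(interpolate(4, Double.NaN, 0.5)); // NaN
        // An infinite upper neighbour makes the result infinite.
        System.out.println(interpolate(1, inf, 0.5));        // Infinity
        // Opposite infinities cancel into NaN (-Inf + Inf).
        System.out.println(interpolate(-inf, inf, 0.5));     // NaN
        // Even with d = 0, 0 * Infinity is NaN, so the finite lower value is lost.
        System.out.println(interpolate(2, inf, 0.0));        // NaN
    }
}
```

The last case is worth noting: when the formula is evaluated literally, a position that falls exactly on a finite element still yields {@code NaN} if the next element is infinite.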
Since 2.2, Percentile uses only selection instead of complete sorting and caches selection algorithm state between calls to the various {@code evaluate} methods. This greatly improves efficiency, both for single-percentile and for multiple-percentile computations. To maximize performance when multiple percentiles are computed based on the same data, users should set the data array once using either the {@link #evaluate(double[],double)} or the {@link #setData(double[])} method and thereafter call {@link #evaluate(double)} with just the percentile provided.
Note that this implementation is not synchronized. If multiple threads access an instance of this class concurrently, and at least one of the threads invokes the {@code increment()} or {@code clear()} method, it must be synchronized externally.
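External synchronization usually means that every thread takes the same lock before touching the shared instance. The sketch below illustrates the pattern with a hypothetical stand-in class, {@code Counter}, in place of a real statistic; the names {@code ExternalSyncDemo} and {@code runDemo} are likewise assumptions made for the example.

```java
/** Hypothetical demo of guarding a non-synchronized object with an external lock. */
final class ExternalSyncDemo {

    /** Stand-in for any non-synchronized, stateful object. */
    static final class Counter {
        private int count;

        void increment() {
            count++; // not atomic: unsafe without external locking
        }

        int get() {
            return count;
        }
    }

    /** Runs 4 threads that each call increment() 10,000 times under one lock. */
    static int runDemo() {
        final Counter shared = new Counter();
        final Object lock = new Object();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) {
                    synchronized (lock) { // every access goes through the same lock
                        shared.increment();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        synchronized (lock) {
            return shared.get();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints 40000
    }
}
```

Reads are guarded by the same lock as writes, so that a reading thread is guaranteed to see the latest state.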
@version $Id: Percentile.java 1244107 2012-02-14 16:17:55Z erans $