There are several commonly used methods for estimating percentiles (a.k.a. quantiles) based on sample data. For large samples, the different methods agree closely, but when sample sizes are small, different methods will give significantly different results. The algorithm implemented here works as follows:
1. Let n be the length of the (sorted) array and 0 < p <= 100 be the desired percentile.
2. If n = 1, return the unique array element (regardless of the value of p); otherwise
3. Compute the estimated percentile position pos = p * (n + 1) / 100 and the difference, d, between pos and floor(pos) (i.e. the fractional part of pos).
4. If pos < 1, return the smallest element in the array.
5. Else if pos >= n, return the largest element in the array.
6. Else let lower be the element in position floor(pos) in the array and let upper be the next element in the array. Return lower + d * (upper - lower).
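The estimation steps above can be sketched in a few lines of Java. This is a minimal illustration, not the library's implementation: the class name PercentileSketch and the method percentile are hypothetical, the sketch sorts a full copy of the input instead of using selection, and rejecting empty input is an assumption. The guard for pos < 1 returns the smallest element so that the 1-based lookup never falls before the first position.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating the estimation steps; not part of the library. */
final class PercentileSketch {
    private PercentileSketch() {}

    /** Estimates the p-th percentile (0 < p <= 100) of the given values. */
    static double percentile(double[] values, double p) {
        // Assumption: empty input and out-of-range p are rejected outright.
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        if (!(p > 0 && p <= 100)) {
            throw new IllegalArgumentException("p must satisfy 0 < p <= 100");
        }

        int n = values.length;
        if (n == 1) {
            return values[0];               // single element, regardless of p
        }

        double[] sorted = values.clone();   // never reorder the caller's array
        Arrays.sort(sorted);                // NaN sorts after +Infinity

        double pos = p * (n + 1) / 100.0;   // estimated 1-based position
        double fpos = Math.floor(pos);
        double d = pos - fpos;              // fractional part of pos

        if (pos < 1) {
            return sorted[0];               // before the first position
        }
        if (pos >= n) {
            return sorted[n - 1];           // at or past the last position
        }

        int i = (int) fpos;                 // 1-based position of lower
        double lower = sorted[i - 1];
        double upper = sorted[i];
        return lower + d * (upper - lower); // linear interpolation
    }
}
```

With this sketch, percentile(new double[] {0, 1, 2, 3, 4, Double.NaN}, 50) gives pos = 3.5, lower = 2, upper = 3 and therefore 2.5.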
To compute percentiles, the data must be at least partially ordered. Input arrays are copied and recursively partitioned using an ordering definition. The ordering used by Arrays.sort(double[]) is the one determined by {@link java.lang.Double#compareTo(Double)}. This ordering makes Double.NaN larger than any other value (including Double.POSITIVE_INFINITY). Therefore, for example, the median (50th percentile) of {0, 1, 2, 3, 4, Double.NaN} evaluates to 2.5.
Since percentile estimation usually involves interpolation between array elements, arrays containing NaN or infinite values will often result in NaN or infinite values returned.
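The ordering and the effect of interpolation on non-finite values can be checked directly with the JDK. The sketch below is illustrative only: the class NaNOrderingDemo and its methods sortedCopy and interpolate are hypothetical names, and interpolate simply restates the lower + d * (upper - lower) formula.

```java
import java.util.Arrays;

/** Hypothetical demo of NaN ordering and of interpolation with non-finite values. */
final class NaNOrderingDemo {
    private NaNOrderingDemo() {}

    /** Returns a sorted copy, using the same ordering as Double.compareTo. */
    static double[] sortedCopy(double[] values) {
        double[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    /** Linear interpolation between two neighbouring elements. */
    static double interpolate(double lower, double upper, double d) {
        return lower + d * (upper - lower);
    }

    public static void main(String[] args) {
        double[] sorted = sortedCopy(
                new double[] {Double.NaN, 4, 0, Double.POSITIVE_INFINITY, 2});
        // NaN is placed last, after positive infinity.
        System.out.println(Arrays.toString(sorted));
        // prints [0.0, 2.0, 4.0, Infinity, NaN]

        // Interpolating towards NaN or infinity propagates the non-finite value.
        System.out.println(interpolate(2.0, Double.NaN, 0.5));               // prints NaN
        System.out.println(interpolate(4.0, Double.POSITIVE_INFINITY, 0.5)); // prints Infinity
    }
}
```

Because NaN always ends up at the top of the ordering, it only affects percentiles whose position reaches the last elements; lower percentiles of the same array remain finite.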
Since 2.2, the Percentile implementation uses only selection instead of complete sorting, and caches the selection algorithm state between calls to the various {@code evaluate} methods when several percentiles are to be computed on the same data. This greatly improves efficiency, both for single-percentile and multiple-percentile computations. However, it also makes it necessary to be sure that the data passed to one call to {@code evaluate} is the same as the data associated with the cached algorithm state from the previous calls. By default, Percentile does this by checking the array reference itself and a checksum of its content. If the caller already knows that {@code evaluate} is being called on an immutable array, the checking time can be saved by calling the {@code evaluate} methods that do not
Note that this implementation is not synchronized. If multiple threads access an instance of this class concurrently, and at least one of the threads invokes the increment() or clear() method, it must be synchronized externally.
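One way to synchronize externally is to route every access through a single lock. The sketch below is an assumption-laden illustration, not library code: RunningSum is a hypothetical unsynchronized statistic standing in for the class described here, and SynchronizedStatSketch is a hypothetical wrapper that guards each call with the same lock object.

```java
/** Hypothetical wrapper showing external synchronization of an unsynchronized statistic. */
final class SynchronizedStatSketch {

    /** Hypothetical unsynchronized statistic with increment() and clear(). */
    static final class RunningSum {
        private double sum;
        private long n;

        void increment(double v) { sum += v; n++; }
        void clear() { sum = 0; n = 0; }
        double getResult() { return sum; }
        long getN() { return n; }
    }

    private final RunningSum delegate = new RunningSum();
    private final Object lock = new Object();

    // Every access, reads included, goes through the same lock.
    void increment(double v) { synchronized (lock) { delegate.increment(v); } }
    void clear() { synchronized (lock) { delegate.clear(); } }
    double getResult() { synchronized (lock) { return delegate.getResult(); } }
    long getN() { synchronized (lock) { return delegate.getN(); } }

    /** Increments from several threads and returns the number of recorded values. */
    static long demo(int threads, int perThread) {
        SynchronizedStatSketch stat = new SynchronizedStatSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    stat.increment(1.0);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();   // wait until all increments are done
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return stat.getN();
    }
}
```

Guarding the reads as well as the writes matters here: without the lock, a reader could observe a partially applied increment() or clear().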