MG4J returns, for each query and each document, a list of minimal intervals satisfying the query. Because of overlaps and long intervals, this list is not always the best way to show the result of a query to the user. Instances of this class select intervals using two parameters (maximum number of intervals and maximum interval length) and the following algorithm: intervals are enqueued in a queue ordered by length; then, they are extracted from the queue and added greedily to the result set as long as they do not overlap any interval already in the result set, they are not longer than the maximum length, and the result set contains fewer intervals than the maximum allowed.
If all intervals are longer than the maximum allowed length, then from the shortest interval we extract two new intervals, each half as long as the maximum allowed length, sharing the left and right extremes, respectively, with the original interval.
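The selection procedure described above can be sketched as follows. This is only an illustrative sketch, not the MG4J implementation: the class `GreedyIntervalSelection`, the type `Span` and the method `select` are hypothetical names. It assumes closed intervals whose length is `right - left + 1`, breaks ties between intervals of equal length by left extreme, and applies the maximum number of intervals to the fallback case as well; none of these choices is stated in the description.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class GreedyIntervalSelection {
    /** Hypothetical closed interval [left..right]; length is right - left + 1 (an assumption). */
    static final class Span {
        final int left;
        final int right;

        Span(int left, int right) {
            this.left = left;
            this.right = right;
        }

        int length() {
            return right - left + 1;
        }

        boolean overlaps(Span other) {
            return left <= other.right && other.left <= right;
        }

        @Override
        public String toString() {
            return "[" + left + ".." + right + "]";
        }
    }

    /** Selects at most maxIntervals non-overlapping intervals no longer than maxLength. */
    static List<Span> select(List<Span> candidates, int maxIntervals, int maxLength) {
        if (maxLength < 1) throw new IllegalArgumentException("maxLength must be positive");
        List<Span> result = new ArrayList<>();
        if (candidates.isEmpty() || maxIntervals < 1) return result;

        // Queue ordered by length; ties are broken by left extreme (an assumption).
        PriorityQueue<Span> queue = new PriorityQueue<>(
                Comparator.comparingInt(Span::length).thenComparingInt(s -> s.left));
        queue.addAll(candidates);

        Span shortest = queue.peek();
        if (shortest.length() > maxLength) {
            // Every interval is too long: cut two pieces, each half the maximum
            // length, sharing the left and right extreme of the shortest interval.
            int half = Math.max(1, maxLength / 2);
            result.add(new Span(shortest.left, shortest.left + half - 1));
            if (maxIntervals > 1) result.add(new Span(shortest.right - half + 1, shortest.right));
            return result;
        }

        while (!queue.isEmpty() && result.size() < maxIntervals) {
            Span candidate = queue.poll();
            // The queue is ordered by length, so every remaining interval is too long as well.
            if (candidate.length() > maxLength) break;
            boolean free = true;
            for (Span chosen : result) {
                if (candidate.overlaps(chosen)) {
                    free = false;
                    break;
                }
            }
            if (free) result.add(candidate);
        }
        return result;
    }
}
```

In this sketch the result is returned in order of selection, that is, shortest first; a caller that displays snippets would probably sort the selected intervals by left extreme afterwards.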
Warning: implementations of this class are not required to be thread-safe, but they provide {@linkplain it.unimi.dsi.lang.FlyweightPrototype flyweight copies} (actually, just copies, as no internal state is shared, but we implement the interface for consistency with the rest of the components used by a {@link it.unimi.dsi.mg4j.query.QueryEngine}). The {@link #copy()} method is strengthened so as to return an object implementing this interface.
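The copy-per-thread pattern implied by the warning can be sketched as follows. Everything here is a hypothetical stand-in: `Prototype` plays the role of a flyweight-prototype interface, `Selector` is a toy class with unsynchronized state, and `runInParallel` is an invented helper. None of them is part of MG4J. The point illustrated is only that `copy()` declares the concrete type as its return type, so each thread can obtain and use its own instance without casts or locking.

```java
public class PerThreadCopyDemo {
    /** Hypothetical stand-in for a flyweight-prototype interface. */
    interface Prototype<T extends Prototype<T>> {
        T copy();
    }

    /** Hypothetical selector: two parameters plus mutable, unsynchronized scratch state. */
    static final class Selector implements Prototype<Selector> {
        final int maxIntervals;
        final int maxLength;
        int calls; // not protected by any lock: one instance must not be shared across threads

        Selector(int maxIntervals, int maxLength) {
            this.maxIntervals = maxIntervals;
            this.maxLength = maxLength;
        }

        /** Strengthened return type: callers get a Selector, not just a Prototype. */
        @Override
        public Selector copy() {
            return new Selector(maxIntervals, maxLength);
        }
    }

    /** Gives each thread its own copy and returns the per-copy call counts. */
    static int[] runInParallel(Selector prototype, int threads, int callsPerThread) {
        Selector[] copies = new Selector[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final Selector own = prototype.copy();
            copies[i] = own;
            workers[i] = new Thread(() -> {
                for (int k = 0; k < callsPerThread; k++) own.calls++;
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int[] counts = new int[threads];
        for (int i = 0; i < threads; i++) counts[i] = copies[i].calls;
        return counts;
    }
}
```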