A class used to represent a set of many, potentially large, values (e.g. many long strings such as URLs), using a significantly smaller amount of memory.
The set is "lossy" in that it cannot definitively state that it does contain a value, but it can definitively say if a value is not in the set. It can therefore be used as a Bloom Filter.
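The "maybe or definitely not" behaviour can be illustrated with a minimal, self-contained sketch. It is not this class's implementation: the name `LossySetSketch`, the `Answer` enum, the single FNV-1a hash and the `String` values are all assumptions chosen for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical illustration of a lossy set; not the class documented here. */
final class LossySetSketch {
    /** A lossy set can only answer "maybe present" or "definitely absent". */
    enum Answer { MAYBE, NO }

    private final BitSet bits;
    private final int mask; // size is a power of two, so (hash & mask) selects a bit

    LossySetSketch(int sizePowerOfTwo) {
        if (Integer.bitCount(sizePowerOfTwo) != 1) {
            throw new IllegalArgumentException("size must be a positive power of two");
        }
        this.bits = new BitSet(sizePowerOfTwo);
        this.mask = sizePowerOfTwo - 1;
    }

    /** 32-bit FNV-1a hash; an assumption of this sketch, any well-mixed hash would do. */
    private static int hash(String value) {
        int h = 0x811C9DC5;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    /** Records only one bit per value, so the value itself is never stored. */
    void add(String value) {
        bits.set(hash(value) & mask);
    }

    /** A clear bit proves absence; a set bit may be caused by a different value. */
    Answer contains(String value) {
        return bits.get(hash(value) & mask) ? Answer.MAYBE : Answer.NO;
    }
}
```

Because two different values may map to the same bit, a `MAYBE` answer has to be confirmed against the real data, whereas a `NO` answer can be trusted and used to skip that lookup.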
Another application of the set is fuzzy counting: it can estimate reasonably accurately how many unique values are contained in the set. This class is NOT threadsafe.
Internally a Bitset is used to record values. Once a client has finished recording a stream of values, the {@link #downsize(float)} method can be used to create a suitably smaller set that is sized appropriately for the number of values recorded and the desired saturation level.
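The following sketch shows one way downsizing and unique-value estimation can work on a power-of-two bit set that records one bit per value. It is an illustration under stated assumptions, not a description of this class's internals: the class `DownsizeSketch` and its method names are hypothetical, and the estimate uses the standard linear-counting formula n = -m * ln(1 - X/m), where m is the set size and X the number of set bits.

```java
import java.util.BitSet;

/** Hypothetical helpers; names and approach are assumptions of this sketch. */
final class DownsizeSketch {

    /** Folds a power-of-two sized bit set into a smaller power-of-two size by masking bit indexes. */
    static BitSet fold(BitSet source, int newSizePowerOfTwo) {
        BitSet target = new BitSet(newSizePowerOfTwo);
        int mask = newSizePowerOfTwo - 1;
        for (int i = source.nextSetBit(0); i >= 0; i = source.nextSetBit(i + 1)) {
            target.set(i & mask); // bits that collide after masking merge into one
        }
        return target;
    }

    /** Fraction of bits that are set. */
    static double saturation(BitSet bits, int size) {
        return bits.cardinality() / (double) size;
    }

    /** Linear-counting estimate of distinct values, corrected for collisions. */
    static double estimateUniqueValues(BitSet bits, int size) {
        return -size * Math.log(1.0 - saturation(bits, size));
    }

    /** Halves the size for as long as the folded set stays at or below the target saturation. */
    static int smallestSizeFor(BitSet source, int sourceSize, float targetMaxSaturation) {
        int size = sourceSize;
        while (size > 1) {
            int candidate = size / 2;
            if (saturation(fold(source, candidate), candidate) > targetMaxSaturation) {
                break;
            }
            size = candidate;
        }
        return size;
    }
}
```

Folding preserves the "definitely not" guarantee, because every value that set a bit in the large set still maps to a set bit in the smaller one. Only the rate of false "maybe" answers rises as saturation increases, which is the trade-off the target saturation controls.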
@lucene.experimental