KEY CHANGES:
Instances of this class represent a set of character sequences (with false positives) using a Bloom filter. Because of the way Bloom filters work, you cannot remove elements.
Bloom filters have an expected error rate, depending on the number of hash functions used, on the filter size and on the number of elements in the filter. This implementation uses a variable optimal number of hash functions, depending on the expected number of elements. More precisely, a Bloom filter for n character sequences with d hash functions will use ln 2 dn ≈ 1.44 dn bits; false positives will happen with probability 2-d.
Hash functions are generated at creation time using universal hashing. Each hash function uses {@link #NUMBER_OF_WEIGHTS} random integers, which are cyclically multiplied by the character codes in a character sequence. The resulting integers are XOR-ed together.
This class exports access methods that are very similar to those of {@link java.util.Set}, but it does not implement that interface, as too many non-optional methods would be unimplementable (e.g., iterators). @author Sebastiano Vigna @contributor Gordon Mohr
|
|
|
|