An object is added by sending it through a number of hash functions, each of which returns an index into the bit set. The bit at each of these indices is set. We can query for an object by sending it through the same hash functions and then looking at the bit at each index that a hash function returned. If any of the bits is unset, we know that the object is not in the Bloom filter (otherwise all of its bits would already have been set). If all the bits are set, we assume that the object is present in the Bloom filter.
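The add and query steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: the class name `BloomFilter`, the method names, the default sizes, and the choice to derive the hash functions by salting SHA-256 are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: a bit set plus several hash functions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Hypothetical defaults, chosen only for illustration.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter would pack bits.
        self.bits = bytearray(num_bits)

    def _indices(self, item):
        # Assumption: simulate several hash functions by salting SHA-256
        # with the function number i.
        data = repr(item).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Set the bit at every index returned by the hash functions.
        for index in self._indices(item):
            self.bits[index] = 1

    def might_contain(self, item):
        # Any unset bit proves absence; all bits set only suggests presence.
        return all(self.bits[index] for index in self._indices(item))


bf = BloomFilter()
bf.add("apple")
print(bf.might_contain("apple"))   # True: an added object is always reported
print(bf.might_contain("banana"))  # usually False, but a false positive is possible
```

The query method is named `might_contain` rather than `contains` to reflect that a positive answer is only an assumption, while a negative answer is certain.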
We cannot know for sure that an object is in the Bloom filter just because all its bits are set. There may be many collisions in the hash space, and all the bits for some object might have been set by chance, through adding other objects, rather than by adding that particular object.
The advantage of a Bloom filter is that it can represent a set in significantly less space than the information-theoretic lower bound for a lossless representation. The price we pay for this is a certain amount of error in the query function. One nice feature of the Bloom filter is that its error is one-sided. This means that while the query function may return false positives (saying an object is present when it really isn't), it can never return false negatives (saying that an object is not present when it was already added).
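To give the "certain amount of error" a rough size, the standard textbook approximation for the false positive probability is shown here. The symbols are not from the text above and are introduced only for this estimate: m is the number of bits in the bit set, k is the number of hash functions, and n is the number of objects added. The estimate assumes the hash functions behave like independent, uniformly random functions, which real hash functions only approximate.

```latex
% Probability that one particular bit is still unset after adding n objects:
\[
  \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}
\]
% A false positive requires all k bits probed by the query to be set:
\[
  p_{\text{false positive}} \approx \left(1 - e^{-kn/m}\right)^{k}
\]
```

The false negative probability needs no estimate: it is exactly zero, because bits are only ever set and never cleared.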