An implementation of a page signature. It calculates an MD5 hash of a plain text "profile" of a page. In case there is no text, it calculates a hash using the {@link MD5Signature}.
The algorithm to calculate a page "profile" takes the plain text version of a page and performs the following steps:
QUANT = QUANT_RATE * maxFreq, where QUANT_RATE is 0.01f by default and maxFreq is the maximum token frequency. If maxFreq is higher than 1, then QUANT is always at least 2 (which means that tokens with frequency 1 are always discarded).
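The QUANT rule can be sketched in a few lines of Java. This is an illustrative sketch rather than the class's actual code: the class name QuantSketch and the methods quant and keepFrequent are hypothetical, and the rounding and the floor of 2 are assumptions inferred from the statement that tokens with frequency 1 are always discarded when maxFreq is higher than 1.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the documented class.
final class QuantSketch {
    // Default rate taken from the description above.
    static final float QUANT_RATE = 0.01f;

    // QUANT = QUANT_RATE * maxFreq, rounded to an integer.
    // Assumption: the result is raised to 2 whenever maxFreq > 1,
    // so that tokens seen only once fall below the threshold.
    static int quant(int maxFreq, float quantRate) {
        int q = Math.round(maxFreq * quantRate);
        if (q < 2) {
            q = (maxFreq > 1) ? 2 : 1;
        }
        return q;
    }

    // Keeps only the tokens whose frequency is at least QUANT.
    // Assumption: "discarded" means frequency < QUANT.
    static Map<String, Integer> keepFrequent(Map<String, Integer> freqs) {
        int maxFreq = 0;
        for (int f : freqs.values()) {
            maxFreq = Math.max(maxFreq, f);
        }
        int q = quant(maxFreq, QUANT_RATE);
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : freqs.entrySet()) {
            if (e.getValue() >= q) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }
}
```

Under these assumptions, a page whose most frequent token occurs 5 times gets QUANT = 2, so every token that occurs only once is dropped from the profile before hashing.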