An inclusive filter will accept only those tokens with which it was initialized. For example, an inclusive filter initialized with all of the words in the English dictionary would exclude all misspellings or foreign words in a token stream.
An exclusive filter will accept only those tokens that are not in the set with which it was initialized. An exclusive filter is often used with a list of common words that should be excluded, which is also known as a "stop list."
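The inclusive and exclusive behaviors can be sketched as a single set-membership test. The class below is a hypothetical stand-in written for illustration only; its name, constructor, and fields are assumptions and do not describe the actual implementation of this class.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of inclusive vs. exclusive filtering;
// not the actual TokenFilter implementation.
final class SetFilterSketch {
    private final Set<String> tokens;
    private final boolean exclude;

    // exclude == false: inclusive filter; exclude == true: exclusive filter
    SetFilterSketch(Set<String> tokens, boolean exclude) {
        this.tokens = new HashSet<>(tokens); // defensive copy
        this.exclude = exclude;
    }

    boolean accept(String token) {
        // Inclusive: accept members of the set.
        // Exclusive: accept everything that is not a member.
        return tokens.contains(token) != exclude;
    }
}
```

With the set {"the", "of"}, an exclusive instance behaves as a small stop list: it rejects "the" and accepts "filter", while an inclusive instance over the same set does the opposite.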
{@code TokenFilter} instances may be combined into a linear chain of filters. This allows a highly configurable filter to be built from multiple rules. Chained filters are evaluated in a linear order, and every filter in the chain must accept the token for the last filter to return {@code true}. If any of the earlier filters returns {@code false}, then the token is not accepted.
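One way to picture the chaining rule is a filter that holds an optional reference to the filter before it and consults that filter first. This is a minimal sketch under that assumption; the class name, the {@code parent} field, and the constructor shape are hypothetical and are not taken from this class.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of a linear filter chain;
// not the actual TokenFilter implementation.
final class ChainedFilterSketch {
    private final Set<String> tokens;
    private final boolean exclude;
    private final ChainedFilterSketch parent; // earlier filter, or null

    ChainedFilterSketch(Set<String> tokens, boolean exclude,
                        ChainedFilterSketch parent) {
        this.tokens = new HashSet<>(tokens);
        this.exclude = exclude;
        this.parent = parent;
    }

    boolean accept(String token) {
        // An earlier rejection short-circuits the rest of the chain.
        if (parent != null && !parent.accept(token)) {
            return false;
        }
        return tokens.contains(token) != exclude;
    }
}
```

In this sketch, calling {@code accept} on the last filter is enough to evaluate the whole chain, so callers only need to keep a reference to the final link.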
This class also provides a static utility function {@link #loadFromSpecification(String) loadFromSpecification} for initializing a chain of filters from a text configuration. This is intended to facilitate command-line tools that want to provide easily configurable filters. An example configuration might look like:
include=top-tokens.txt:test-words.txt,exclude=stop-words.txt
@see FilteredIterator
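The example configuration suggests a format of comma-separated rules, each an {@code include} or {@code exclude} keyword followed by {@code =} and a colon-separated list of file names. The sketch below only parses such a string into rule objects; it is an assumption about the format drawn from that one example, it does not read any files, and it is not the code behind {@code loadFromSpecification}. All names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for the specification format shown above;
// the grammar is inferred from the single example and may be incomplete.
final class FilterSpecSketch {

    static final class Rule {
        final boolean exclude;    // true for "exclude", false for "include"
        final List<String> files; // file names listed for this rule

        Rule(boolean exclude, List<String> files) {
            this.exclude = exclude;
            this.files = files;
        }
    }

    static List<Rule> parse(String spec) {
        List<Rule> rules = new ArrayList<>();
        for (String part : spec.split(",")) {
            String[] keyValue = part.split("=", 2);
            if (keyValue.length != 2) {
                throw new IllegalArgumentException("Malformed rule: " + part);
            }
            boolean exclude;
            if (keyValue[0].equals("include")) {
                exclude = false;
            } else if (keyValue[0].equals("exclude")) {
                exclude = true;
            } else {
                throw new IllegalArgumentException(
                    "Unknown rule type: " + keyValue[0]);
            }
            // Multiple files for one rule are separated by ':'
            rules.add(new Rule(exclude, Arrays.asList(keyValue[1].split(":"))));
        }
        return rules;
    }
}
```

Parsing the example string with this sketch yields two rules in order: an include rule listing two files and an exclude rule listing one.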