A LetterTokenizer is a tokenizer that divides text at non-letters. That's to say, it defines tokens as maximal strings of adjacent letters, as defined by java.lang.Character.isLetter() predicate.
Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.
You must specify the required {@link Version} compatibility when creating{@link LetterTokenizer}:
As of 3.1, {@link CharTokenizer} uses an int based API to normalize anddetect token characters. See {@link CharTokenizer#isTokenChar(int)} and{@link CharTokenizer#normalize(int)} for details.
All source code are property of their respective owners. Java is a trademark of Sun Microsystems, Inc and owned by ORACLE Inc. Contact coftware#gmail.com.