This is the mainstream {@link Tokenizer}. It implements the {@link Tokenizer} interface in a straightforward way, without highly specialized parsing optimizations.
Besides the {@link Tokenizer} interface, the class StandardTokenizer provides some basic features for cascading (nested) tokenizers. Consider the usual HTML pages found on the web today. Most of them are a mixture of regular HTML, cascading style sheets (CSS) and embedded JavaScript. These languages use different syntaxes, so one needs various tokenizers on the same input stream.
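The idea of cascading tokenizers can be illustrated with a minimal, self-contained sketch. It does not use the API of this class: `CascadingSketch`, `scanHtml` and `scanScript` are hypothetical names invented for the illustration, and the scanning rules are deliberately crude. An outer scanner handles the HTML parts and hands the body of a `<script>` element to a nested scanner that works on the same input, then resumes where the nested one stopped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: none of these names belong to the
// Tokenizer API documented here.
class CascadingSketch {

    // Nested scanner: splits script text into letter/digit runs and
    // single-character symbols, skipping whitespace.
    static void scanScript(String src, int start, int end, List<String> out) {
        int i = start;
        while (i < end) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetterOrDigit(c)) {
                int j = i;
                while (j < end && Character.isLetterOrDigit(src.charAt(j))) {
                    j++;
                }
                out.add("JS:" + src.substring(i, j));
                i = j;
            } else {
                out.add("JS:" + c);
                i++;
            }
        }
    }

    // Outer scanner: emits tags and text, and delegates <script> bodies.
    static List<String> scanHtml(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            if (src.charAt(i) == '<') {
                int close = src.indexOf('>', i);
                if (close < 0) {
                    close = src.length() - 1; // unterminated tag: take the rest
                }
                String tag = src.substring(i, close + 1);
                out.add("HTML:" + tag);
                i = close + 1;
                if (tag.equalsIgnoreCase("<script>")) {
                    int end = src.indexOf("</script>", i);
                    if (end < 0) {
                        end = src.length();
                    }
                    scanScript(src, i, end, out); // nested scanner, same input
                    i = end;                      // outer scanner resumes here
                }
            } else {
                int next = src.indexOf('<', i);
                if (next < 0) {
                    next = src.length();
                }
                String text = src.substring(i, next).trim();
                if (!text.isEmpty()) {
                    out.add("TEXT:" + text);
                }
                i = next;
            }
        }
        return out;
    }
}
```

Calling `CascadingSketch.scanHtml("<p>Hi</p><script>x = 1;</script>")` yields the HTML tags and the text `Hi` from the outer scanner, and the tokens `x`, `=`, `1` and `;` from the nested one. The point of the sketch is the handover: both scanners share one input and one read position, so neither has to copy or re-read the data.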
This {@link Tokenizer} implementation is not synchronized. Take care when using it with multiple threads.
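One common way to share an unsynchronized object between threads is to guard the complete use sequence with a single lock. The sketch below shows that pattern under stated assumptions: `WordScanner` is a hypothetical stand-in with mutable, unsynchronized state, not this class, and `SharedScanner` is an invented wrapper. Giving each thread its own instance would be an equally valid alternative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tokenizer with unsynchronized internal state.
class WordScanner {
    private String source = "";
    private int position = 0;

    void setSource(String text) {
        source = text;
        position = 0;
    }

    // Returns the next whitespace-delimited word, or null at end of input.
    String next() {
        while (position < source.length()
                && Character.isWhitespace(source.charAt(position))) {
            position++;
        }
        if (position >= source.length()) {
            return null;
        }
        int start = position;
        while (position < source.length()
                && !Character.isWhitespace(source.charAt(position))) {
            position++;
        }
        return source.substring(start, position);
    }
}

// Hypothetical wrapper: one lock covers setting the source and draining it,
// so two threads can never interleave their scans.
class SharedScanner {
    private final WordScanner scanner = new WordScanner();

    synchronized List<String> tokenize(String text) {
        scanner.setSource(text);
        List<String> words = new ArrayList<>();
        for (String w = scanner.next(); w != null; w = scanner.next()) {
            words.add(w);
        }
        return words;
    }
}
```

Locking each individual call would not be enough here, because another thread could still replace the source between two calls to `next()`. The lock therefore has to span the whole scan.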
@see Tokenizer
@see TokenizerProperties
@author Heiko Blau
This should be a good tokenizer for most European-language documents.
Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer.