This tokenizer needs a statistical model to tokenize text; the model reproduces the tokenization observed in the training data used to create it. The {@link TokenizerModel} class encapsulates the model and provides methods to create it from the binary representation.
A tokenizer instance is not thread safe. Each thread must instantiate its own tokenizer, but all of them can share one TokenizerModel
instance to save memory.
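The one-tokenizer-per-thread rule can be sketched with a `ThreadLocal`. This is a minimal illustration, not OpenNLP code: `SharedModel` and `StatefulTokenizer` are hypothetical stand-ins for `TokenizerModel` and `TokenizerME` so that the example runs without the library, and the stand-in splits on whitespace only instead of applying a statistical model. The class name `PerThreadTokenizerSketch` and its helper methods are likewise assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerThreadTokenizerSketch {

    // Hypothetical stand-in for TokenizerModel: immutable, so safe to share.
    static final class SharedModel {
        final String delimiterRegex;
        SharedModel(String delimiterRegex) { this.delimiterRegex = delimiterRegex; }
    }

    // Hypothetical stand-in for TokenizerME: holds mutable scratch state,
    // so a single instance must not be used by several threads at once.
    static final class StatefulTokenizer {
        private final SharedModel model;
        private final List<String> buffer = new ArrayList<>();
        StatefulTokenizer(SharedModel model) { this.model = model; }

        String[] tokenize(String text) {
            buffer.clear();
            for (String token : text.trim().split(model.delimiterRegex)) {
                if (!token.isEmpty()) {
                    buffer.add(token);
                }
            }
            return buffer.toArray(new String[0]);
        }
    }

    // One model for the whole application.
    static final SharedModel MODEL = new SharedModel("\\s+");

    // One tokenizer per thread, created lazily, all sharing MODEL.
    static final ThreadLocal<StatefulTokenizer> TOKENIZER =
            ThreadLocal.withInitial(() -> new StatefulTokenizer(MODEL));

    static String[] tokenize(String text) {
        return TOKENIZER.get().tokenize(text);
    }

    // Tokenizes each text on a worker thread and sums the token counts.
    static int countTokensConcurrently(List<String> texts, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String text : texts) {
                results.add(pool.submit(() -> tokenize(text).length));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

With OpenNLP on the classpath, the same pattern would presumably hold a single `TokenizerModel` in place of `MODEL` and construct a `TokenizerME` inside the `ThreadLocal` initializer.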
To train a new model, the {@link #train(String,ObjectStream,boolean,TrainingParameters)} method can be used.
Sample usage:
InputStream modelIn;
...
TokenizerModel model = new TokenizerModel(modelIn);
Tokenizer tokenizer = new TokenizerME(model);
String[] tokens = tokenizer.tokenize("A sentence to be tokenized.");

@see Tokenizer
@see TokenizerModel
@see TokenSample