During analysis, two threads are used to communicate with the TreeTagger process. One thread writes tokens to the TreeTagger process, while the other receives the analyzed tokens.
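The two-thread exchange described above can be sketched as follows. This is a minimal illustration, not this class's actual implementation: the class `TwoThreadExchange`, its methods, and the in-JVM "fake tagger" (which labels every token `TAG`) are hypothetical stand-ins so the example runs without a TreeTagger installation. The point is that output is drained concurrently while input is still being written, so neither side blocks on a full pipe.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TwoThreadExchange {

    /** Sends tokens on one thread and collects result lines on another. */
    static List<String> exchange(List<String> tokens, OutputStream toTagger, InputStream fromTagger) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Writer thread: one token per line, then close to signal end of input.
            Future<Void> writer = pool.submit((Callable<Void>) () -> {
                try (PrintWriter out = new PrintWriter(
                        new OutputStreamWriter(toTagger, StandardCharsets.UTF_8))) {
                    for (String token : tokens) {
                        out.println(token);
                    }
                }
                return null;
            });
            // Reader thread: drains the output while the writer is still sending.
            Future<List<String>> reader = pool.submit(() -> {
                List<String> lines = new ArrayList<>();
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(fromTagger, StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        lines.add(line);
                    }
                }
                return lines;
            });
            writer.get();
            return reader.get();
        } catch (Exception e) {
            throw new IllegalStateException("exchange failed", e);
        } finally {
            pool.shutdownNow();
        }
    }

    /** Runs the exchange against a hypothetical in-JVM stand-in for the external tagger. */
    static List<String> demo(List<String> tokens) {
        try {
            PipedOutputStream toTagger = new PipedOutputStream();
            PipedInputStream taggerStdin = new PipedInputStream(toTagger);
            PipedOutputStream taggerStdout = new PipedOutputStream();
            PipedInputStream fromTagger = new PipedInputStream(taggerStdout);

            // Fake tagger: echoes each token with a dummy tag and a lower-cased "lemma".
            Thread fakeTagger = new Thread(() -> {
                try (BufferedReader in = new BufferedReader(
                             new InputStreamReader(taggerStdin, StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(
                             new OutputStreamWriter(taggerStdout, StandardCharsets.UTF_8))) {
                    String token;
                    while ((token = in.readLine()) != null) {
                        out.println(token + "\tTAG\t" + token.toLowerCase(Locale.ROOT));
                    }
                } catch (IOException e) {
                    throw new IllegalStateException(e);
                }
            });
            fakeTagger.start();
            List<String> result = exchange(tokens, toTagger, fromTagger);
            fakeTagger.join();
            return result;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : demo(Arrays.asList("This", "is", "a", "test", "."))) {
            System.out.println(line);
        }
    }
}
```

With a real external process, the two streams passed to `exchange` would be the standard `Process.getOutputStream()` and `Process.getInputStream()` instead of the piped streams used here.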
For easy integration into applications, this class takes any object containing token information and either uses its {@link Object#toString()} method or a {@link TokenAdapter} set using {@link #setAdapter(TokenAdapter)} to extract the actual token. To receive the analyzed tokens, set a custom {@link TokenHandler} using {@link #setHandler(TokenHandler)}.
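The choice between `toString()` and an adapter can be illustrated with a small self-contained sketch. The interface `TextAdapter`, the class `TokenTextSketch`, and the `Span` token type are hypothetical stand-ins introduced for illustration only; they are not the library's own types, and the real {@link TokenAdapter} may have a different shape.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TokenTextSketch {

    /** Hypothetical stand-in for an adapter: maps a token object to its text. */
    interface TextAdapter<O> {
        String getText(O token);
    }

    /** Example token type whose toString() is not the surface form. */
    static final class Span {
        final String text;
        final int begin;

        Span(String text, int begin) {
            this.text = text;
            this.begin = begin;
        }

        @Override
        public String toString() {
            return "Span[" + begin + "]";
        }
    }

    /** Uses the adapter when one is given; otherwise falls back to toString(). */
    static <O> List<String> extract(List<O> tokens, TextAdapter<O> adapter) {
        List<String> texts = new ArrayList<>();
        for (O token : tokens) {
            texts.add(adapter != null ? adapter.getText(token) : token.toString());
        }
        return texts;
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(new Span("This", 0), new Span("is", 5));
        System.out.println(extract(spans, null));        // falls back to toString()
        System.out.println(extract(spans, s -> s.text)); // adapter extracts the surface form
    }
}
```

An adapter is worth setting whenever the token object's `toString()` is meant for debugging rather than for returning the surface form, as with `Span` here.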
By default, the TreeTagger executable is searched for in the directories indicated by the system property {@literal treetagger.home} and the environment variables {@literal TREETAGGER_HOME} and {@literal TAGDIR}, in this order. The {@link #setModel(String)} method expects the full path to a model file, optionally followed by a {@literal :} and the model encoding.
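The lookup order and the model specification format can be sketched as below. This is an illustration under stated assumptions, not the library's resolver code: the class `TreeTaggerLookupSketch` and both methods are hypothetical, and the rule for telling an encoding suffix apart from a Windows drive letter (a colon followed by a path separator belongs to the path) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Properties;

class TreeTaggerLookupSketch {

    /** Lists the configured directories in lookup order, skipping unset ones. */
    static List<String> candidateDirs(Properties props, Map<String, String> env) {
        String[] values = {
            props.getProperty("treetagger.home"), // 1. system property
            env.get("TREETAGGER_HOME"),           // 2. environment variable
            env.get("TAGDIR")                     // 3. environment variable
        };
        List<String> dirs = new ArrayList<>();
        for (String value : values) {
            if (value != null && !value.isEmpty()) {
                dirs.add(value);
            }
        }
        return dirs;
    }

    /** Splits "path:encoding" at the last colon; returns {path, encoding or null}. */
    static String[] splitModelSpec(String spec) {
        int colon = spec.lastIndexOf(':');
        if (colon < 0) {
            return new String[] {spec, null};
        }
        String tail = spec.substring(colon + 1);
        // Assumption: a tail containing a path separator (e.g. "C:\models\...") is not an encoding.
        if (tail.isEmpty() || tail.indexOf('/') >= 0 || tail.indexOf('\\') >= 0) {
            return new String[] {spec, null};
        }
        return new String[] {spec.substring(0, colon), tail};
    }

    public static void main(String[] args) {
        System.out.println(candidateDirs(System.getProperties(), System.getenv()));
        System.out.println(Arrays.toString(
                splitModelSpec("/treetagger/models/english.par:iso8859-1")));
    }
}
```

A caller would then probe each candidate directory in turn for the executable and stop at the first match.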
For additional flexibility, register a custom {@link ExecutableResolver} using {@link #setExecutableProvider(ExecutableResolver)} or a custom {@link ModelResolver} using {@link #setModelProvider(ModelResolver)}. Custom providers may extract models and executables from archives, or download them from some location, and install them temporarily or permanently in the file system. A custom model resolver may also be used to resolve a language code (e.g. {@literal en}) to a particular model.
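The idea of resolving a language code to a model can be sketched with a simple lookup table. The class `LanguageModelLookupSketch` and its methods are hypothetical and do not implement the library's {@link ModelResolver} interface, whose exact signature is not shown here; a real resolver would wrap logic like this behind that interface.

```java
import java.util.HashMap;
import java.util.Map;

class LanguageModelLookupSketch {

    // Maps a language code such as "en" to a model specification.
    private final Map<String, String> modelsByLanguage = new HashMap<>();

    /** Registers the model specification to use for a language code. */
    void register(String languageCode, String modelSpec) {
        modelsByLanguage.put(languageCode, modelSpec);
    }

    /** Returns the registered specification for a known code, else the input unchanged. */
    String resolve(String nameOrCode) {
        return modelsByLanguage.getOrDefault(nameOrCode, nameOrCode);
    }

    public static void main(String[] args) {
        LanguageModelLookupSketch lookup = new LanguageModelLookupSketch();
        lookup.register("en", "/treetagger/models/english.par:iso8859-1");
        System.out.println(lookup.resolve("en"));
        System.out.println(lookup.resolve("/treetagger/models/german.par"));
    }
}
```

Passing unknown names through unchanged keeps full model paths working alongside language codes.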
A simple illustration of how to use this class:
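    TreeTaggerWrapper<String> tt = new TreeTaggerWrapper<String>();
    try {
        // Model path, optionally followed by ':' and the model encoding.
        tt.setModel("/treetagger/models/english.par:iso8859-1");
        // The handler receives each analyzed token with its tag and lemma.
        tt.setHandler(new TokenHandler<String>() {
            public void token(String token, String pos, String lemma) {
                System.out.println(token + "\t" + pos + "\t" + lemma);
            }
        });
        tt.process(asList(new String[] {"This", "is", "a", "test", "."}));
    }
    finally {
        // Always release the TreeTagger process.
        tt.destroy();
    }

@author Richard Eckart de Castilho
@param <O> the token type.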