This is a pipeline that takes in a string and returns various analyzed linguistic forms. The string is tokenized by a TokenizerAnnotator, and then other sequence-model-style annotators can add things like lemmas, POS tags, and named entities. These are returned as a list of CoreLabels. Other analysis components build and store parse trees, dependency graphs, etc.
This class is designed to apply multiple Annotators to an Annotation. The idea is that you first build up the pipeline by adding Annotators, and then you take the objects you wish to annotate and pass them in and get in return a fully annotated object. At the command-line level you can, e.g., tokenize text with StanfordCoreNLP with a command like:
java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit -file document.txt
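To make the shape of that command concrete, here is a minimal sketch of how such flags could be read into a java.util.Properties object. It assumes each "-key value" pair becomes a property with the same name; FlagSketch and flagsToProperties are hypothetical names used only for illustration and are not part of CoreNLP.

```java
import java.util.Properties;

// Hypothetical helper, not a CoreNLP class: reads "-key value" pairs into Properties.
class FlagSketch {
    static Properties flagsToProperties(String... args) {
        Properties props = new Properties();
        for (int i = 0; i < args.length; i++) {
            if (!args[i].startsWith("-")) {
                continue; // skip stray values that do not follow a flag
            }
            String key = args[i].substring(1);
            // A flag followed by another flag (or by nothing) is stored as "true".
            boolean hasValue = i + 1 < args.length && !args[i + 1].startsWith("-");
            props.setProperty(key, hasValue ? args[++i] : "true");
        }
        return props;
    }

    public static void main(String[] argv) {
        Properties props = flagsToProperties(
                "-annotators", "tokenize,ssplit", "-file", "document.txt");
        // The annotator list is a single comma-separated string.
        for (String name : props.getProperty("annotators").split(",")) {
            System.out.println("annotator: " + name);
        }
        System.out.println("input file: " + props.getProperty("file"));
    }
}
```

Keeping the annotator list as one comma-separated value means the same setting can be given on the command line or in a properties file without changes.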
Please see the package level javadoc for sample usage and a more complete description.
The main entry point for the API is StanfordCoreNLP.process().
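The build-then-process flow described above can be illustrated with a small self-contained sketch. MiniAnnotation, MiniAnnotator, and MiniPipeline are hypothetical stand-ins written for this example; they are not the CoreNLP classes, and the two toy annotators (whitespace splitting and lowercasing) only loosely mimic tokenization and normalization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class PipelineSketch {
    // Hypothetical stand-in for an annotation: the text plus layers added by annotators.
    static class MiniAnnotation {
        final String text;
        List<String> tokens = new ArrayList<>();
        List<String> normalized = new ArrayList<>();
        MiniAnnotation(String text) { this.text = text; }
    }

    // Hypothetical stand-in for an annotator: enriches the annotation in place.
    interface MiniAnnotator {
        void annotate(MiniAnnotation annotation);
    }

    // The pipeline is itself an annotator that runs its members in insertion order.
    static class MiniPipeline implements MiniAnnotator {
        private final List<MiniAnnotator> annotators = new ArrayList<>();

        void addAnnotator(MiniAnnotator annotator) { annotators.add(annotator); }

        @Override
        public void annotate(MiniAnnotation annotation) {
            for (MiniAnnotator annotator : annotators) {
                annotator.annotate(annotation);
            }
        }

        // Convenience entry point: wrap raw text, run every annotator, return the result.
        MiniAnnotation process(String text) {
            MiniAnnotation annotation = new MiniAnnotation(text);
            annotate(annotation);
            return annotation;
        }
    }

    static MiniPipeline buildPipeline() {
        MiniPipeline pipeline = new MiniPipeline();
        // Toy tokenizer: split on whitespace, leaving tokens empty for blank input.
        pipeline.addAnnotator(a -> {
            String trimmed = a.text.trim();
            if (!trimmed.isEmpty()) {
                a.tokens = new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
            }
        });
        // Toy normalizer: depends on the tokens layer, so it must be added second.
        pipeline.addAnnotator(a -> {
            for (String token : a.tokens) {
                a.normalized.add(token.toLowerCase(Locale.ROOT));
            }
        });
        return pipeline;
    }

    public static void main(String[] args) {
        MiniAnnotation result = buildPipeline().process("The quick brown fox");
        System.out.println(result.tokens);
        System.out.println(result.normalized);
    }
}
```

The order of the addAnnotator calls matters in this sketch because the second annotator reads the layer written by the first. That dependency is why the pipeline is assembled once, up front, and then reused for each input.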
Implementation note: There are other annotation pipelines, but they don't extend this one. To find them, look for classes that implement Annotator and have "Pipeline" in their name.
@author Jenny Finkel
@author Anna Rafferty
@author Christopher Manning
@author Mihai Surdeanu
@author Steven Bethard