An Analysis Engine is a component responsible for analyzing unstructured information and for discovering and representing its semantic content. Unstructured information includes, but is not restricted to, text documents.
An AnalysisEngine operates on an "analysis structure" (implemented by {@link org.apache.uima.cas.CAS}). The CAS contains the artifact to be processed as well as semantic information already inferred from that artifact. The AnalysisEngine analyzes this information and adds new information to the CAS.
To create an instance of an Analysis Engine, an application should call {@link org.apache.uima.UIMAFramework#produceAnalysisEngine(ResourceSpecifier)}.
A typical application interacts with the Analysis Engine interface as follows:
1. Call {@link #newCAS()} to create a new Common Analysis System appropriate for this AnalysisEngine.
2. Use the {@link CAS} interface to populate the CAS with the artifact to be analyzed and any information known about this document (e.g. the language of a text document).
3. Optionally, create a {@link org.apache.uima.analysis_engine.ResultSpecification} that identifies the results you would like this AnalysisEngine to generate (e.g. people, places, and dates), and call the {@link #setResultSpecification(ResultSpecification)} method.
4. Call {@link #process(CAS)}; the AnalysisEngine will perform its analysis.
5. Retrieve the results from the {@link CAS}.
6. Call {@link CAS#reset()} to clear out the CAS and prepare for processing a new artifact.
7. Repeat steps 2 through 6 for each artifact to be processed.
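To make the interaction pattern concrete, here is a minimal, self-contained Java sketch of the steps above. It does not use UIMA: `ToyCas`, `ToyEngine`, `Annotation`, and their methods are hypothetical stand-ins whose names only echo `newCAS()`, `setResultSpecification`, `process`, and `CAS.reset()`, and the "analysis" itself (marking capitalized words as `Name` annotations) is invented for illustration. The sketch creates a single CAS and reuses it for every document by resetting it after each one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Toy model of the Analysis Engine workflow. All classes here are hypothetical
 * stand-ins, NOT the UIMA CAS or AnalysisEngine types.
 */
public class WorkflowSketch {

    /** Hypothetical annotation: a type name plus the text it covers. */
    static class Annotation {
        final String type;
        final String coveredText;

        Annotation(String type, String coveredText) {
            this.type = type;
            this.coveredText = coveredText;
        }
    }

    /** Hypothetical analysis structure: the artifact plus inferred annotations. */
    static class ToyCas {
        String text;
        String language;
        final List<Annotation> annotations = new ArrayList<>();

        /** Clears the artifact and all results so the object can be reused. */
        void reset() {
            text = null;
            language = null;
            annotations.clear();
        }
    }

    /** Hypothetical engine that tags capitalized words as "Name" annotations. */
    static class ToyEngine {
        int casesCreated = 0;                     // counts calls to newCas()
        Set<String> resultSpec = Set.of("Name");  // result types the caller asked for

        ToyCas newCas() {
            casesCreated++;
            return new ToyCas();
        }

        void setResultSpecification(Set<String> types) {
            resultSpec = types;
        }

        void process(ToyCas cas) {
            if (!resultSpec.contains("Name")) {
                return; // caller did not request the only type this engine produces
            }
            for (String token : cas.text.split("\\s+")) {
                if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
                    cas.annotations.add(new Annotation("Name", token));
                }
            }
        }
    }

    /** Runs the workflow over several documents, reusing one CAS throughout. */
    static List<String> analyzeAll(ToyEngine engine, List<String> docs) {
        List<String> found = new ArrayList<>();
        ToyCas cas = engine.newCas();                   // step 1: create the CAS once
        engine.setResultSpecification(Set.of("Name")); // step 3 (optional; done once here)
        for (String doc : docs) {
            cas.text = doc;                             // step 2: populate artifact
            cas.language = "en";                        //         and known metadata
            engine.process(cas);                        // step 4: analyze
            for (Annotation a : cas.annotations) {      // step 5: read the results
                found.add(a.coveredText);
            }
            cas.reset();                                // step 6: clear for the next artifact
        }                                               // step 7: repeat for each document
        return found;
    }

    public static void main(String[] args) {
        ToyEngine engine = new ToyEngine();
        List<String> names = analyzeAll(engine, List.of("Alice met bob", "then Carol left"));
        System.out.println(names + " using " + engine.casesCreated + " CAS");
    }
}
```

Running `main` prints `[Alice, Carol] using 1 CAS`: both documents were analyzed, and the engine's counter confirms that only one CAS was ever created.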
Important: It is highly recommended that you reuse CAS objects rather than calling newCAS() prior to each analysis. This is because CAS objects may be expensive to create and may consume a significant amount of memory.
Instead of using the {@link CAS} interface, applications may wish to use the Java-object-based {@link JCas} interface. In that case, the call to newCAS from step 1 above would be replaced by {@link #newJCas()}, and the {@link #process(JCas)} method would be used.
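The practical difference between the two styles is how annotation data is reached. The following plain-Java sketch contrasts them using hypothetical classes (`GenericAnnotation`, `Person`, and the type name `example.Person` are invented and are not part of UIMA): generic access looks features up by name at run time, whereas JCas-style access goes through a Java class per type with typed getters, so a misspelled feature name is caught by the compiler instead of surfacing as a missing value.

```java
import java.util.HashMap;
import java.util.Map;

public class TypedAccessSketch {

    /** Hypothetical CAS-style annotation: features are looked up by name at run time. */
    static class GenericAnnotation {
        final String typeName;
        final Map<String, Integer> intFeatures = new HashMap<>();

        GenericAnnotation(String typeName, int begin, int end) {
            this.typeName = typeName;
            intFeatures.put("begin", begin);
            intFeatures.put("end", end);
        }

        /** Returns the feature value, or null if no feature has that name. */
        Integer getIntFeature(String name) {
            return intFeatures.get(name); // a misspelled name silently yields null
        }
    }

    /**
     * Hypothetical JCas-style class: one Java class per type with typed getters,
     * so calling a misspelled getter fails at compile time.
     */
    static class Person {
        private final int begin;
        private final int end;

        Person(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        int getBegin() { return begin; }
        int getEnd() { return end; }
    }

    public static void main(String[] args) {
        GenericAnnotation g = new GenericAnnotation("example.Person", 0, 5);
        Person p = new Person(0, 5);
        System.out.println(g.getIntFeature("end") + " " + p.getEnd()); // 5 5
        System.out.println(g.getIntFeature("ned"));                    // null (typo not caught)
    }
}
```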
Analysis Engine implementations may or may not be capable of simultaneously processing multiple documents in a multithreaded environment. See the documentation associated with the implementation or factory method (e.g. {@link org.apache.uima.UIMAFramework#produceAnalysisEngine(ResourceSpecifier)}) that you are using.
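If the implementation you are using is not documented as safe for concurrent use, one generic Java approach is to give each worker thread its own engine instance rather than sharing one. The sketch below is plain Java, not a UIMA facility: `NonThreadSafeEngine` is a hypothetical stand-in whose mutable scratch buffer could be corrupted if two threads used the same instance at once, and `ThreadLocal.withInitial` lazily creates one instance per pool thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class PerThreadEngineSketch {

    /** Hypothetical engine with mutable per-document state: NOT safe to share. */
    static class NonThreadSafeEngine {
        private final StringBuilder scratch = new StringBuilder();

        int countTokens(String doc) {
            scratch.setLength(0);          // reuses internal state across calls
            scratch.append(doc.trim());
            return scratch.length() == 0 ? 0 : scratch.toString().split("\\s+").length;
        }
    }

    /** Counts how many engine instances have been created so far. */
    static final AtomicInteger enginesCreated = new AtomicInteger();

    /** Each worker thread lazily gets its own engine instead of sharing one. */
    static final ThreadLocal<NonThreadSafeEngine> ENGINE = ThreadLocal.withInitial(() -> {
        enginesCreated.incrementAndGet();
        return new NonThreadSafeEngine();
    });

    /** Processes documents on a fixed pool; each task uses its thread's engine. */
    static int totalTokens(List<String> docs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String doc : docs) {
                results.add(pool.submit(() -> ENGINE.get().countTokens(doc)));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // waits for each task to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = List.of("a b c", "d e", "f", "g h i j");
        System.out.println(totalTokens(docs, 2));
    }
}
```

With the four sample documents, `totalTokens(docs, 2)` returns 10, and at most two engine instances are created because the pool has two threads. Whether one instance per thread is affordable depends on how expensive the real engine is to initialize.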