A CollectionProcessingManager (CPM) manages the application of an {@link AnalysisEngine} to a collection of artifacts. For text analysis applications, this will be a collection of documents. The analysis results are then delivered to one or more {@link CasConsumer}s. The CPM is configured with an Analysis Engine and CAS Consumers by calling its {@link #setAnalysisEngine(AnalysisEngine)} and {@link #addCasConsumer(CasConsumer)} methods. Collection processing is then initiated by calling the {@link #process(CollectionReader)} or {@link #process(CollectionReader,int)} methods.
The process methods take a {@link CollectionReader} object as an argument. The Collection Reader retrieves each artifact from the collection as a {@link org.apache.uima.cas.CAS} object.
Listeners can register with the CPM by calling the {@link #addStatusCallbackListener(StatusCallbackListener)} method. These listeners receive status callbacks during processing. At any time, performance and progress reports are available from the {@link #getPerformanceReport()} and {@link #getProgress()} methods.
A CPM implementation may choose to implement parallelization of the processing, but this is not a requirement of the architecture.
Note that a CPM only supports processing one collection at a time. Attempting to reconfigure a CPM or start a new processing job while a previous job is still running will result in a {@link org.apache.uima.UIMA_IllegalStateException}. To process multiple collections simultaneously, instantiate and configure multiple instances of the CPM.
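The one-collection-at-a-time rule can be illustrated with a small, self-contained sketch. It does not use the UIMA API: `SingleJobManager`, `setEngine`, `startProcessing`, and `finishProcessing` are hypothetical stand-ins invented for illustration, and the standard `java.lang.IllegalStateException` stands in for `UIMA_IllegalStateException`. How a real CPM implementation tracks its state is not specified here.

```java
// Hypothetical stand-in, NOT the UIMA API: models a manager that rejects
// reconfiguration or a second job while one job is still running.
final class SingleJobManager {
    private boolean processing = false;
    private String engineName; // stands in for the configured Analysis Engine

    synchronized void setEngine(String name) {
        requireIdle("reconfigure");
        this.engineName = name;
    }

    synchronized void startProcessing() {
        requireIdle("start a new job");
        if (engineName == null) {
            throw new IllegalStateException("No engine configured");
        }
        processing = true;
    }

    // Called when the current job completes; the manager becomes usable again.
    synchronized void finishProcessing() {
        processing = false;
    }

    synchronized boolean isProcessing() {
        return processing;
    }

    // Caller must hold the lock; all callers are synchronized methods.
    private void requireIdle(String action) {
        if (processing) {
            throw new IllegalStateException(
                "Cannot " + action + " while a job is running");
        }
    }
}

final class SingleJobDemo {
    public static void main(String[] args) {
        // Two collections at once: use two independent manager instances.
        SingleJobManager first = new SingleJobManager();
        SingleJobManager second = new SingleJobManager();
        first.setEngine("engineA");
        second.setEngine("engineB");
        first.startProcessing();
        second.startProcessing(); // allowed: separate instance

        try {
            first.setEngine("engineC"); // rejected: first is still busy
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        first.finishProcessing();
        first.setEngine("engineC"); // allowed again once the job is done
    }
}
```

The design point is that each instance owns exactly one job, so concurrency across collections comes from creating more instances rather than from sharing one.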
A CollectionProcessingManager instance can be obtained by calling {@link org.apache.uima.UIMAFramework#newCollectionProcessingManager()}.