A CollectionProcessingEngine (CPE) processes a collection of artifacts (for text analysis applications, this will be a collection of documents) and produces collection-level results.
A CPE consists of a {@link org.apache.uima.collection.CollectionReader}, zero or more {@link org.apache.uima.analysis_engine.AnalysisEngine}s and zero or more {@link org.apache.uima.collection.CasConsumer}s. The Collection Reader is responsible for reading artifacts from a collection and setting up the CAS. The AnalysisEngines analyze each CAS and the results are passed on to the CAS Consumers. CAS Consumers perform analysis over multiple CASes and generally produce collection-level results in some application-specific data structure.
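The data flow described above can be sketched in plain Java. This is a minimal illustration only: `Cas`, `Reader`, `Engine`, and `Consumer` are hypothetical stand-ins defined here, not the UIMA interfaces, and the word-count task is an assumed example of a collection-level result.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CpeFlowSketch {
    /** Hypothetical stand-in for a CAS: one artifact plus its analysis results. */
    static final class Cas {
        final String documentText;
        final List<String> tokens = new ArrayList<>();
        Cas(String documentText) { this.documentText = documentText; }
    }

    /** Stand-in for a Collection Reader: reads artifacts and sets up a CAS for each. */
    interface Reader { boolean hasNext(); Cas next(); }

    /** Stand-in for an Analysis Engine: analyzes a single CAS in place. */
    interface Engine { void process(Cas cas); }

    /** Stand-in for a CAS Consumer: sees every CAS and accumulates results. */
    interface Consumer { void processCas(Cas cas); }

    /** Reader -> engines -> consumers, one CAS at a time (no parallelization). */
    static void run(Reader reader, List<Engine> engines, List<Consumer> consumers) {
        while (reader.hasNext()) {
            Cas cas = reader.next();
            for (Engine engine : engines) engine.process(cas);
            for (Consumer consumer : consumers) consumer.processCas(cas);
        }
    }

    /** Assumed example task: count tokens across the whole collection. */
    static Map<String, Integer> countTokens(List<String> documents) {
        Iterator<String> it = documents.iterator();
        Reader reader = new Reader() {
            public boolean hasNext() { return it.hasNext(); }
            public Cas next() { return new Cas(it.next()); }
        };
        // Per-CAS analysis: split the text on whitespace.
        Engine tokenizer = cas -> {
            for (String token : cas.documentText.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) cas.tokens.add(token);
            }
        };
        // Collection-level result kept in an application-specific structure.
        Map<String, Integer> counts = new TreeMap<>();
        Consumer counter = cas -> {
            for (String token : cas.tokens) counts.merge(token, 1, Integer::sum);
        };
        run(reader, List.of(tokenizer), List.of(counter));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countTokens(List.of("the cat", "the dog")));
        // prints {cat=1, dog=1, the=2}
    }
}
```

The tokenizer only ever sees one CAS, while the counter holds state across all of them, which mirrors the split between per-artifact analysis and collection-level results.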
Processing is started by calling the {@link #process()} method. Processing can be controlled via the {@link #pause()}, {@link #resume()}, and {@link #stop()} methods.
Listeners can register with the CPE by calling the {@link #addStatusCallbackListener(StatusCallbackListener)} method. These listeners receive status callbacks during processing. At any time, performance and progress reports are available from the {@link #getPerformanceReport()} and {@link #getProgress()} methods.
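The listener mechanism can be pictured with the following sketch. It does not use the UIMA types: `Listener` and `Engine` are hypothetical, reduced stand-ins, the callback names are only loosely modeled on the real listener interface, and processing runs synchronously on the calling thread, which is an assumption made to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;

class CpeListenerSketch {
    /** Hypothetical, reduced callback interface with simplified signatures. */
    interface Listener {
        void entityProcessComplete(String artifact);
        void collectionProcessComplete();
    }

    /** Hypothetical engine that notifies listeners as it works through a collection. */
    static final class Engine {
        private final List<Listener> listeners = new ArrayList<>();
        private int processed;

        void addStatusCallbackListener(Listener listener) { listeners.add(listener); }

        /** Simplified progress report: number of artifacts handled so far. */
        int getProcessedCount() { return processed; }

        void process(List<String> collection) {
            for (String artifact : collection) {
                processed++;
                for (Listener l : listeners) l.entityProcessComplete(artifact);
            }
            for (Listener l : listeners) l.collectionProcessComplete();
        }
    }

    /** Registers one listener, runs the job, and returns the callbacks in order. */
    static List<String> runAndRecord(List<String> collection) {
        List<String> events = new ArrayList<>();
        Engine engine = new Engine();
        engine.addStatusCallbackListener(new Listener() {
            public void entityProcessComplete(String artifact) { events.add("done:" + artifact); }
            public void collectionProcessComplete() { events.add("finished"); }
        });
        engine.process(collection);
        return events;
    }

    public static void main(String[] args) {
        System.out.println(runAndRecord(List.of("doc1", "doc2")));
        // prints [done:doc1, done:doc2, finished]
    }
}
```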
A CPE implementation may choose to implement parallelization of the processing, but this is not a requirement of the architecture.
Note that a CPE only supports processing one collection at a time. Attempting to start a new processing job while a previous processing job is running will result in an exception. Processing multiple collections simultaneously is done by instantiating and configuring multiple instances of the CPE.
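One way such a single-job rule could be enforced is shown below. This is a sketch under stated assumptions, not the behavior of any particular CPE implementation: the `Engine` class, its `finish()` method, and the choice of `IllegalStateException` are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class SingleJobSketch {
    /** Hypothetical engine that allows only one processing job at a time. */
    static final class Engine {
        private final AtomicBoolean running = new AtomicBoolean(false);

        /** Starts a job, or fails if one is already running (exception type is an assumption). */
        void process() {
            if (!running.compareAndSet(false, true)) {
                throw new IllegalStateException("a processing job is already running");
            }
        }

        /** Hypothetical hook marking the current job as complete. */
        void finish() { running.set(false); }

        boolean isProcessing() { return running.get(); }
    }

    /** A second start on the same instance is rejected while the first job runs. */
    static boolean secondStartRejected() {
        Engine engine = new Engine();
        engine.process();
        try {
            engine.process();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    /** Two collections are handled at once by using two separate instances. */
    static boolean separateInstancesRunTogether() {
        Engine first = new Engine();
        Engine second = new Engine();
        first.process();
        second.process();
        return first.isProcessing() && second.isProcessing();
    }

    public static void main(String[] args) {
        System.out.println(secondStartRejected());          // prints true
        System.out.println(separateInstancesRunTogether()); // prints true
    }
}
```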
A CollectionProcessingEngine instance can be obtained by calling {@link org.apache.uima.UIMAFramework#produceCollectionProcessingEngine(CpeDescription)}.