This implementation of IEmitter collects file names from an Amazon Kinesis stream to which an S3ManifestEmitter writes the names of the files it stores in Amazon S3. The RedshiftManifestEmitter processes this list of Amazon S3 file names, generates a manifest file, and performs an Amazon Redshift copy. The copy runs inside a transaction to prevent objects from being duplicated in Amazon Redshift.
The RedshiftManifestEmitter uses the following procedure:
- Write the manifest file to Amazon S3
- Begin an Amazon Redshift transaction
- If any of the files already exist in the Amazon Redshift file table, return and checkpoint (a previous transaction already completed successfully, so the files must not be copied again)
- Write the file names to the Amazon Redshift file table
- Run the Amazon Redshift copy from the manifest
- Commit the Amazon Redshift transaction
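The procedure above can be sketched in Java as follows. This is a minimal sketch, not the library's implementation: RedshiftSession is a hypothetical interface standing in for the SQL the emitter would issue over JDBC (a SELECT and an INSERT against the file table, then a COPY from the manifest), and FakeSession is an in-memory stand-in used only to show the control flow. Writing the manifest to Amazon S3 (the first step) is assumed to have happened before load is called.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ManifestLoadSketch {

    /** Hypothetical abstraction; a real version would run SQL over a JDBC connection. */
    interface RedshiftSession {
        void begin();
        boolean anyFileRecorded(List<String> fileNames); // look up names in the file table
        void recordFiles(List<String> fileNames);        // insert names into the file table
        void copyFromManifest(String manifestUrl);       // COPY ... MANIFEST into the data table
        void commit();
        void rollback();
    }

    /**
     * Returns true if the batch was copied, false if an earlier transaction already
     * loaded it. The caller checkpoints in both cases.
     */
    static boolean load(RedshiftSession db, String manifestUrl, List<String> fileNames) {
        db.begin();
        try {
            if (db.anyFileRecorded(fileNames)) {
                db.rollback(); // already loaded: end the transaction without changes
                return false;
            }
            db.recordFiles(fileNames);
            db.copyFromManifest(manifestUrl);
            db.commit();       // file names and copied rows become visible together
            return true;
        } catch (RuntimeException e) {
            db.rollback();     // neither the file names nor the copied rows persist
            throw e;
        }
    }

    /** In-memory fake: the file table is a set and copies are only counted. */
    static class FakeSession implements RedshiftSession {
        final Set<String> fileTable = new HashSet<>();
        Set<String> pending = new HashSet<>();
        int pendingCopies = 0;
        int committedCopies = 0;

        public void begin() { pending = new HashSet<>(); pendingCopies = 0; }
        public boolean anyFileRecorded(List<String> names) {
            for (String n : names) {
                if (fileTable.contains(n)) return true;
            }
            return false;
        }
        public void recordFiles(List<String> names) { pending.addAll(names); }
        public void copyFromManifest(String url) { pendingCopies++; }
        public void commit() { fileTable.addAll(pending); committedCopies += pendingCopies; }
        public void rollback() { pending.clear(); pendingCopies = 0; }
    }

    public static void main(String[] args) {
        FakeSession db = new FakeSession();
        List<String> batch = List.of("s3://my-bucket/a.txt", "s3://my-bucket/b.txt");
        System.out.println(load(db, "s3://my-bucket/manifests/m1", batch)); // true
        System.out.println(load(db, "s3://my-bucket/manifests/m1", batch)); // false: replay skipped
        System.out.println(db.committedCopies);                              // 1
    }
}
```

Because the file names and the copied rows commit in the same transaction, a replayed batch finds its names already in the file table and is skipped, while a failed copy leaves no file names behind, so the batch can be retried. Rolling back rather than committing on the "already loaded" path is a choice of this sketch; either ends the transaction without changing data.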
The RedshiftManifestEmitter requires an Amazon S3 bucket and endpoint to be configured, as well as the following Amazon Redshift settings:
- Amazon Redshift URL
- username and password
- data table and its key column (the data table stores the records loaded by the manifest copy)
- file table and its key column (the file table stores file names to prevent duplicate entries)
- mandatory flag for the Amazon Redshift copy
- the delimiter used to parse records when they are loaded into Amazon Redshift
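One way to hold these settings is sketched below, assuming they are supplied as java.util.Properties. The property keys (s3Bucket, redshiftUrl, redshiftFileTable, and so on), the default mandatory flag of true, and the default '|' delimiter are illustrative assumptions, not the library's actual configuration names or defaults.

```java
import java.util.Properties;

/** Hypothetical settings holder; property keys and defaults are illustrative only. */
public class RedshiftManifestConfig {
    final String s3Bucket, s3Endpoint;
    final String redshiftUrl, username, password;
    final String dataTable, dataKeyColumn;
    final String fileTable, fileKeyColumn;
    final boolean copyMandatory;
    final char delimiter;

    RedshiftManifestConfig(Properties p) {
        s3Bucket = required(p, "s3Bucket");
        s3Endpoint = required(p, "s3Endpoint");
        redshiftUrl = required(p, "redshiftUrl");
        username = required(p, "redshiftUsername");
        password = required(p, "redshiftPassword");
        dataTable = required(p, "redshiftDataTable");
        dataKeyColumn = required(p, "redshiftDataKeyColumn");
        fileTable = required(p, "redshiftFileTable");
        fileKeyColumn = required(p, "redshiftFileKeyColumn");
        // Assumed defaults for this sketch: mandatory manifest entries, '|' delimiter.
        copyMandatory = Boolean.parseBoolean(p.getProperty("redshiftCopyMandatory", "true"));
        String d = p.getProperty("redshiftDataDelimiter", "|");
        if (d.length() != 1) {
            throw new IllegalArgumentException("delimiter must be a single character: " + d);
        }
        delimiter = d.charAt(0);
    }

    // Fails fast when a required setting is absent or blank.
    private static String required(Properties p, String key) {
        String v = p.getProperty(key);
        if (v == null || v.isBlank()) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return v.trim();
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("s3Bucket", "my-bucket");
        p.setProperty("s3Endpoint", "https://s3.amazonaws.com");
        p.setProperty("redshiftUrl", "jdbc:redshift://example-host:5439/dev"); // placeholder
        p.setProperty("redshiftUsername", "loader");
        p.setProperty("redshiftPassword", "secret");
        p.setProperty("redshiftDataTable", "events");
        p.setProperty("redshiftDataKeyColumn", "event_id");
        p.setProperty("redshiftFileTable", "manifest_files");
        p.setProperty("redshiftFileKeyColumn", "file_name");
        RedshiftManifestConfig c = new RedshiftManifestConfig(p);
        System.out.println(c.fileTable + " " + c.delimiter + " " + c.copyMandatory); // prints "manifest_files | true"
    }
}
```

Validating every required setting in the constructor means a misconfigured emitter fails at startup rather than partway through a transaction.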
NOTE: The Amazon S3 bucket and the Amazon Redshift cluster that holds the tables must be in the same region for the manifest copy.
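The manifest file and the copy described in this section can be sketched as follows, assuming Amazon Redshift's JSON manifest format (an "entries" array whose objects carry a "url" and a "mandatory" field) and a COPY statement with the MANIFEST option. ManifestSketch and its methods are hypothetical illustrations rather than the library's API, and the credentials string in main is a placeholder.

```java
import java.util.List;

/** Hypothetical helper: builds a Redshift manifest and the COPY statement that reads it. */
public class ManifestSketch {

    // Minimal JSON string escaping: backslash and double quote only.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds {"entries":[{"url":"s3://bucket/key","mandatory":true}, ...]}. */
    public static String buildManifest(String bucket, List<String> keys, boolean mandatory) {
        StringBuilder sb = new StringBuilder("{\"entries\":[");
        for (int i = 0; i < keys.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"url\":\"s3://")
              .append(jsonEscape(bucket)).append('/').append(jsonEscape(keys.get(i)))
              .append("\",\"mandatory\":").append(mandatory).append('}');
        }
        return sb.append("]}").toString();
    }

    // Wraps a value in a SQL string literal, doubling any embedded single quotes.
    static String sqlQuote(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    /** COPY that loads the files listed in the manifest rather than a key prefix. */
    public static String buildCopy(String dataTable, String manifestUrl,
                                   String credentials, char delimiter) {
        return "COPY " + dataTable
             + " FROM " + sqlQuote(manifestUrl)
             + " CREDENTIALS " + sqlQuote(credentials)
             + " MANIFEST DELIMITER " + sqlQuote(String.valueOf(delimiter)) + ";";
    }

    public static void main(String[] args) {
        System.out.println(buildManifest("my-bucket", List.of("a.txt", "b.txt"), true));
        System.out.println(buildCopy("events", "s3://my-bucket/manifests/m1",
                "aws_access_key_id=<key>;aws_secret_access_key=<secret>", '|'));
    }
}
```

With mandatory set to true, COPY fails if a listed file cannot be found, which suits the transactional procedure: the whole load rolls back instead of silently skipping data.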