IlluminaBasecallsToSam transforms a lane of Illumina data file formats (bcl, locs, clocs, qseqs, etc.) into SAM or BAM file format.
In this application, barcode data is read from Illumina data file groups, each of which is associated with a tile. Each tile may contain data for any number of barcodes, and a single barcode's data may span multiple tiles. Once the barcode data is collected from files, each barcode's data is written to its own SAM/BAM. The barcode data must be written in order; this means that barcode data from each tile is sorted before it is written to file, and that if a barcode's data does span multiple tiles, data collected from each tile must be written in the order of the tiles themselves.
This class employs a number of private subclasses to achieve this goal. The TileReadAggregator controls the flow of operation. It is fed a number of Tiles which it uses to spawn TileReaders. TileReaders are responsible for reading Illumina data for their respective tiles from disk, and as they collect that data, it is fed back into the TileReadAggregator. When a TileReader completes a tile, it notifies the TileReadAggregator, which reviews what was read and conditionally queues its writing to disk, baring in mind the requirements of write-order described in the previous paragraph. As writes complete, the TileReadAggregator re-evalutes the state of reads/writes and may queue more writes. When all barcodes for all tiles have been written, the TileReadAggregator shuts down.
The TileReadAggregator controls task execution using a specialized ThreadPoolExecutor. It accepts special Runnables of type PriorityRunnable which allow a priority to be assigned to the runnable. When the ThreadPoolExecutor is assigning threads, it gives priority to those PriorityRunnables with higher priority values. In this application, TileReaders are assigned lowest priority, and write tasks are assigned high priority. It is designed in this fashion to minimize the amount of time data must remain in memory (write the data as soon as possible, then discard it from memory) while maximizing CPU usage.
@author jburke@broadinstitute.org
@author mccowan@broadinstitute.org