An input stream that decompresses from the BZip2 format (without the file header chars) to be read as any other stream.
The decompression requires large amounts of memory. Thus you should call the {@link #close() close()} method as soon as possible, to force CBZip2InputStream to release the allocated memory. See {@link CBZip2OutputStream CBZip2OutputStream} for information about memory usage.
CBZip2InputStream reads bytes from the compressed source stream exclusively via the single-byte {@link java.io.InputStream#read() read()} method. You should therefore consider using a buffered source stream.
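The effect of buffering the source can be illustrated with a minimal, self-contained sketch. It does not use CBZip2InputStream itself: the hypothetical helper drainByteByByte stands in for the decompressor's single-byte reads, and CountingInputStream is an illustrative wrapper that counts how many read calls reach the underlying source. The 8192-byte buffer size is an arbitrary assumption.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class BufferedSourceSketch {

    /** Hypothetical wrapper: counts every read call that reaches the source. */
    static final class CountingInputStream extends FilterInputStream {
        long calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Stand-in for the decompressor: pulls one byte at a time until EOF. */
    static long drainByteByByte(InputStream in) throws IOException {
        long n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    /** Returns how many read calls hit the source, with or without buffering. */
    static long callsReachingSource(byte[] data, boolean buffered) {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(data));
        // try-with-resources closes the stream as soon as reading is done,
        // which is also the pattern suggested for releasing memory promptly.
        try (InputStream in = buffered
                ? new BufferedInputStream(source, 8192)
                : source) {
            drainByteByByte(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        byte[] data = new byte[100_000];
        System.out.println("unbuffered calls: " + callsReachingSource(data, false));
        System.out.println("buffered calls:   " + callsReachingSource(data, true));
    }
}
```

With the real class, the same idea would presumably amount to passing a BufferedInputStream around the raw source to the constructor and opening the decompressing stream in a try-with-resources block, so that close() runs as early as possible.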
This Ant code was enhanced so that it can decompress individual blocks of bzip2 data. The current position in the stream is an important statistic for Hadoop. For example, LineRecordReader depends solely on the current position in the stream to report progress. The notion of position becomes complicated for compressed files: Hadoop splits are defined in terms of the compressed file, but a compressed file expands to a much larger amount of data.
We handle this problem as follows. At object creation time, we search for the next block start delimiter. Once such a marker is found, the stream stops there (any compressed data read during the search is discarded) and the position is updated, so the caller of this class can find out the stream location. At this point the stream is ready for actual reading (i.e. decompression) of data, and subsequent read calls return decompressed data. The position is updated only when the caller has read one byte past the end of the current block (the current block + 1 bytes). While a block is being read, the position is not updated, because the position can only be updated on block boundaries.
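The position rule described above can be modelled with a small toy class. This is only a sketch under stated assumptions, not the actual implementation: BlockPositionSketch, its fields, and the block layout (compressed end offsets and decompressed lengths) are hypothetical, and the progress helper merely shows how a caller might turn a compressed position into a fraction of a split.

```java
final class BlockPositionSketch {
    // Hypothetical block layout: where each block ends in the compressed
    // file, and how many decompressed bytes each block yields.
    private final long[] compressedEnds;
    private final int[] uncompressedLengths;

    private int blockIndex = 0;   // block currently being read
    private int readInBlock = 0;  // decompressed bytes handed out from it
    private long reportedPos;     // position visible to the caller

    BlockPositionSketch(long firstBlockStart, long[] compressedEnds,
                        int[] uncompressedLengths) {
        this.reportedPos = firstBlockStart;
        this.compressedEnds = compressedEnds;
        this.uncompressedLengths = uncompressedLengths;
    }

    /** Simulates reading one decompressed byte; returns false at end of data. */
    boolean readByte() {
        // The position moves only when the caller asks for a byte beyond the
        // current block, i.e. after reading "the current block + 1 bytes".
        while (blockIndex < compressedEnds.length
                && readInBlock == uncompressedLengths[blockIndex]) {
            reportedPos = compressedEnds[blockIndex];
            blockIndex++;
            readInBlock = 0;
        }
        if (blockIndex >= compressedEnds.length) {
            return false;
        }
        readInBlock++;
        return true;
    }

    /** Position in the compressed file, updated on block boundaries only. */
    long getPos() {
        return reportedPos;
    }

    /** Fraction of a split [start, end) consumed, given a compressed position. */
    static float progress(long start, long end, long pos) {
        if (start == end) {
            return 0.0f;
        }
        return Math.min(1.0f, (pos - start) / (float) (end - start));
    }

    public static void main(String[] args) {
        BlockPositionSketch s = new BlockPositionSketch(
                0L, new long[] {100L, 250L}, new int[] {3, 2});
        while (s.readByte()) {
            System.out.println("pos=" + s.getPos()
                    + " progress=" + progress(0L, 250L, s.getPos()));
        }
        System.out.println("final pos=" + s.getPos());
    }
}
```

In this model the reported position stays constant while a block is being consumed and jumps to the block's compressed end offset as soon as the first byte of the following block is requested, so any progress derived from it advances in block-sized steps.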
Instances of this class are not thread-safe.