RCFiles, short for Record Columnar Files, are flat files consisting of binary key/value pairs, and they share much similarity with SequenceFile.

RCFile stores the columns of a table in a record columnar way. It first partitions rows horizontally into row splits, and then vertically partitions each row split in a columnar way. RCFile stores the metadata of a row split as the key part of a record, and all the data of the row split as the value part. When writing, RCFile.Writer first holds the records' value bytes in memory, and closes the current row split once the raw byte size of the buffered records exceeds the parameter Writer.columnsBufferSize, which can be set like: conf.setInt(COLUMNS_BUFFER_SIZE_CONF_STR, 4 * 1024 * 1024).

RCFile provides the {@link org.apache.tajo.storage.v2.RCFile.Writer} and {@link org.apache.tajo.storage.v2.RCFile.Reader} classes for writing and reading, respectively.
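The horizontal-then-vertical partitioning described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Tajo implementation: the class RowSplitSketch, its RowSplit type, and the partition method are hypothetical names, values are plain strings, and the byte threshold merely plays the role that Writer.columnsBufferSize plays in the real writer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Toy model of the row-split layout; hypothetical, not the RCFile.Writer code. */
final class RowSplitSketch {

    /** One row split: columns.get(c) holds column c's values for every row in the split. */
    static final class RowSplit {
        final List<List<String>> columns = new ArrayList<>();
        int rowCount = 0;
    }

    private static RowSplit newSplit(int columnCount) {
        RowSplit split = new RowSplit();
        for (int c = 0; c < columnCount; c++) {
            split.columns.add(new ArrayList<>());
        }
        return split;
    }

    /**
     * Buffers rows column by column and closes the current split once the
     * buffered raw bytes exceed bufferSizeBytes (assumed stand-in for
     * Writer.columnsBufferSize).
     */
    static List<RowSplit> partition(List<String[]> rows, int columnCount, int bufferSizeBytes) {
        List<RowSplit> splits = new ArrayList<>();
        RowSplit current = newSplit(columnCount);
        int bufferedBytes = 0;
        for (String[] row : rows) {
            // Vertical partitioning: each field goes to its own column buffer.
            for (int c = 0; c < columnCount; c++) {
                current.columns.get(c).add(row[c]);
                bufferedBytes += row[c].getBytes(StandardCharsets.UTF_8).length;
            }
            current.rowCount++;
            // Horizontal partitioning: start a new split when the buffer overflows.
            if (bufferedBytes > bufferSizeBytes) {
                splits.add(current);
                current = newSplit(columnCount);
                bufferedBytes = 0;
            }
        }
        if (current.rowCount > 0) {
            splits.add(current); // flush the final, partially filled split
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            rows.add(new String[] {"a" + i, "b" + i});
        }
        for (RowSplit split : partition(rows, 2, 8)) {
            System.out.println(split.rowCount + " rows, columns = " + split.columns);
        }
    }
}
```

In this model a reader that needs only one column of a split can look at columns.get(c) alone, which is the access pattern the columnar layout is meant to serve.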
RCFile compresses values in a more fine-grained manner than record-level compression. However, it does not yet support compressing the key part. The actual compression algorithm used to compress keys and/or values can be specified by using the appropriate {@link org.apache.hadoop.io.compress.CompressionCodec}.
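One way to picture compression that is finer-grained than record level is to compress each column buffer of a row split on its own, so that a single column can be decompressed without touching the others. The sketch below assumes that reading of the text and uses java.util.zip.Deflater and Inflater from the JDK purely as a stand-in for a CompressionCodec; the class ColumnCompressionSketch and its methods are hypothetical and do not reflect the on-disk RCFile format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical per-column compression sketch; Deflater stands in for a CompressionCodec. */
final class ColumnCompressionSketch {

    /** Compresses one column buffer independently of all other columns. */
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /**
     * Decompresses one column buffer. rawLength is assumed to come from the
     * uncompressed key (metadata) part of the row split.
     */
    static byte[] decompress(byte[] compressed, int rawLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] raw = new byte[rawLength];
        int off = 0;
        try {
            while (off < rawLength && !inflater.finished()) {
                int n = inflater.inflate(raw, off, rawLength - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // input is truncated or unusable
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt column buffer", e);
        } finally {
            inflater.end();
        }
        if (off != rawLength) {
            throw new IllegalStateException("expected " + rawLength + " bytes, got " + off);
        }
        return raw;
    }

    /** Compresses every column of a row split separately. */
    static byte[][] compressColumns(byte[][] columns) {
        byte[][] packed = new byte[columns.length][];
        for (int c = 0; c < columns.length; c++) {
            packed[c] = compress(columns[c]);
        }
        return packed;
    }

    public static void main(String[] args) {
        byte[][] columns = {
            "alice,bob,carol".getBytes(StandardCharsets.UTF_8),
            "30,41,27".getBytes(StandardCharsets.UTF_8)
        };
        byte[][] packed = compressColumns(columns);
        // Only the second column is decompressed; the first stays packed.
        byte[] ages = decompress(packed[1], columns[1].length);
        System.out.println(new String(ages, StandardCharsets.UTF_8));
    }
}
```

The raw lengths are kept outside the compressed buffers here, which loosely mirrors the statement that the key part is not compressed.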
The {@link org.apache.tajo.storage.v2.RCFile.Reader} is used to read and interpret the bytes of an RCFile.
CompressionCodec class which is used for compression of keys and/or values (if compression is enabled).