InputFormat describes the input-specification for a Map-Reduce job.

The Map-Reduce framework relies on the InputFormat of the job to:

1. Validate the input-specification of the job.
2. Split up the input file(s) into logical {@link InputSplit}s, each of which is then assigned to an individual {@link Mapper}.
3. Provide the {@link RecordReader} implementation used to glean input records from the logical InputSplit for processing by the {@link Mapper}.

The default behavior of file-based {@link InputFormat}s, typically sub-classes of {@link FileInputFormat}, is to split the input into logical {@link InputSplit}s based on the total size, in bytes, of the input files. However, the {@link FileSystem} blocksize of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size.
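For illustration, a minimal sketch of raising that lower bound through the job configuration; the 64 MB figure is an arbitrary example value, not a recommended setting:

import org.apache.hadoop.mapred.JobConf;

public class SplitSizeExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf(SplitSizeExample.class);

    // Raise the lower bound on split size to 64 MB (example value only).
    // The FileSystem blocksize of the input still acts as the upper bound.
    conf.setLong("mapred.min.split.size", 64L * 1024 * 1024);
  }
}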
Clearly, logical splits based on input size are insufficient for many applications, since record boundaries must be respected. In such cases, the application also has to implement a {@link RecordReader}, on whom lies the responsibility to respect record boundaries and present a record-oriented view of the logical InputSplit to the individual task.
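As a sketch of how a file-based InputFormat can delegate record-boundary handling, the hypothetical subclass below simply returns a LineRecordReader; FileInputFormat, LineRecordReader, and the other types are from the org.apache.hadoop.mapred API, while the class name MyTextInputFormat is an assumption for this example:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical InputFormat: delegates to LineRecordReader, which skips
// ahead to the first newline after the split's start offset, so each task
// sees whole lines even when the logical split boundary falls mid-record.
public class MyTextInputFormat extends FileInputFormat<LongWritable, Text> {

  @Override
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    reporter.setStatus(split.toString());
    return new LineRecordReader(job, (FileSplit) split);
  }
}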
@see InputSplit
@see RecordReader
@see JobClient
@see FileInputFormat