Reducer implementations can access the {@link Configuration} for the job via the {@link JobContext#getConfiguration()} method.
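For instance, configuration values can be read once in {@link #setup(Context)}. A minimal sketch follows; the ThresholdReducer class and the "myjob.threshold" property are illustrative assumptions, not part of the Hadoop API:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer that only emits keys whose summed count
// meets a threshold supplied through the job Configuration.
public class ThresholdReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private int threshold;

  @Override
  protected void setup(Context context) {
    // getConfiguration() is inherited from JobContext via the reducer's Context.
    Configuration conf = context.getConfiguration();
    threshold = conf.getInt("myjob.threshold", 1); // hypothetical property name
  }

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    if (sum >= threshold) {
      context.write(key, new IntWritable(sum));
    }
  }
}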
Reducer has 3 primary phases:
1. Shuffle

The Reducer copies the sorted output from each {@link Mapper} using HTTP across the network.
2. Sort

The framework merge-sorts Reducer inputs by keys (since different Mappers may have output the same key). The shuffle and sort phases occur simultaneously, i.e. while outputs are being fetched they are merged.
SecondarySort

To achieve a secondary sort on the values returned by the value iterator, the application should extend the key with the secondary key and define a grouping comparator. The keys will be sorted using the entire key, but will be grouped using the grouping comparator to decide which keys and values are sent in the same call to reduce. The grouping comparator is specified via {@link Job#setGroupingComparatorClass(Class)}. The sort order is controlled by {@link Job#setSortComparatorClass(Class)}.
For example, say that you want to find duplicate web pages and tag them all with the url of the "best" known example. You would set up the job like:

  Map Input Key: url
  Map Input Value: document
  Map Output Key: document checksum, url pagerank
  Map Output Value: url
  Partitioner: by checksum
  OutputKeyComparator: by checksum and then decreasing pagerank
  OutputValueGroupingComparator: by checksum
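A sketch of how that setup might be wired into a {@link Job}. The ChecksumMapper, BestUrlReducer, ChecksumPartitioner, ChecksumPageRankKey, ChecksumPageRankComparator, and ChecksumGroupingComparator classes are all hypothetical names assumed to be defined elsewhere; only the Job methods are real API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class DedupDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "tag duplicate pages");
    job.setJarByClass(DedupDriver.class);

    // Hypothetical classes: a composite key carrying (checksum, pagerank),
    // plus the mapper, reducer, and partitioner written for this job.
    job.setMapperClass(ChecksumMapper.class);
    job.setReducerClass(BestUrlReducer.class);
    job.setPartitionerClass(ChecksumPartitioner.class);   // partition by checksum only
    job.setMapOutputKeyClass(ChecksumPageRankKey.class);
    job.setMapOutputValueClass(Text.class);

    // Sort the full key: checksum ascending, then pagerank descending,
    // so the "best" url is the first value seen in each reduce() call.
    job.setSortComparatorClass(ChecksumPageRankComparator.class);
    // Group by checksum alone: every url with the same checksum is routed
    // to a single reduce() call regardless of its pagerank.
    job.setGroupingComparatorClass(ChecksumGroupingComparator.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}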
3. Reduce

In this phase the {@link #reduce(Object,Iterable,Context)} method is called for each <key, (collection of values)> in the sorted inputs.
The output of the reduce task is typically written to a {@link RecordWriter} via {@link Context#write(Object,Object)}.
The output of the Reducer is not re-sorted.
Example:
public class IntSumReducer<Key> extends Reducer<Key, IntWritable, Key, IntWritable> {
  private IntWritable result = new IntWritable();

  public void reduce(Key key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    // Sum all counts emitted for this key.
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    // The new-API context emits via write(); there is no collect() method.
    context.write(key, result);
  }
}
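A minimal sketch of plugging this reducer into a job. TokenCountMapper (a mapper emitting <Text, IntWritable> pairs) and the argument handling are illustrative assumptions; the Job, FileInputFormat, and FileOutputFormat calls are real API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IntSumDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "int sum");
    job.setJarByClass(IntSumDriver.class);
    job.setMapperClass(TokenCountMapper.class); // hypothetical mapper emitting <Text, IntWritable>
    // Summing is commutative and associative, so the same class can also
    // pre-aggregate map output as a combiner.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}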
@see Mapper
@see Partitioner