The abstract Combiner class is used to build combiners for the {@link Job}.
These Combiners are distributed across the cluster and run alongside the {@link Mapper} implementations on the same nodes.
Combiners are called in a thread-safe way, so no internal locking is required.
Combiners are normally used to build intermediate results on the mapping nodes, lowering the traffic overhead between the different nodes before the reducing phase.
Combiners need to be able to combine data over multiple chunks to create a more streaming-like internal behavior.
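To make this chunking contract concrete, here is a minimal, hypothetical sketch of how a framework might drive a combiner such as the AvgCombiner from the example below. The driver itself (sendToReducer, CHUNK_SIZE, and the loop) is an illustrative assumption, not part of this API:
<pre>
// Hypothetical driver, for illustration only: sendToReducer and
// CHUNK_SIZE are made-up names, not part of this API.
static final int CHUNK_SIZE = 1000;

static void combineInChunks(Iterable<Map.Entry<String, Integer>> mappedPairs) {
    Combiner<String, Integer, Tuple<Long, Long>> combiner = new AvgCombiner();
    int seen = 0;
    for (Map.Entry<String, Integer> pair : mappedPairs) {
        // every intermediate key-value pair is combined into the current chunk
        combiner.combine(pair.getKey(), pair.getValue());
        if (++seen == CHUNK_SIZE) {
            // finalizeChunk() emits the intermediate result and must reset
            // the combiner's state so the next chunk starts clean
            sendToReducer(combiner.finalizeChunk());
            seen = 0;
        }
    }
    if (seen > 0) {
        sendToReducer(combiner.finalizeChunk()); // flush the last partial chunk
    }
}
</pre>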
A simple Combiner implementation, used in combination with a {@link Reducer}, could look like the following avg-function implementation:
<pre>
public class AvgCombiner
        extends Combiner<String, Integer, Tuple<Long, Long>> {

    private long count;
    private long amount;

    public void combine(String key, Integer value) {
        // accumulate one mapped value into the current chunk
        count++;
        amount += value;
    }

    public Tuple<Long, Long> finalizeChunk() {
        Tuple<Long, Long> tuple = new Tuple<>(count, amount);
        // reset the internal state so the next chunk starts clean
        count = 0;
        amount = 0;
        return tuple;
    }
}

public class SumReducer
        extends Reducer<String, Tuple<Long, Long>, Integer> {

    private long count;
    private long amount;

    public void reduce(String key, Tuple<Long, Long> value) {
        // merge the pre-combined chunk results
        count += value.getFirst();
        amount += value.getSecond();
    }

    public Integer finalizeReduce() {
        // explicit cast: a long cannot be auto-boxed into an Integer
        return (int) (amount / count);
    }
}
</pre>

@param <KeyIn> key type of the incoming keys
@param <ValueIn> value type of the incoming values
@param <ValueOut> value type of the combined values
@since 3.2
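In practice a Combiner is usually not handed to the job directly but created through a factory, so that each node (or key) gets its own stateful instance. The following sketch assumes a CombinerFactory interface with a newCombiner(key) method and a fluent builder on {@link Job}; these names are assumptions for illustration, not confirmed by this class:
<pre>
// Sketch under an assumed API: CombinerFactory and the fluent Job builder
// methods shown in the comment are illustrative names only.
public class AvgCombinerFactory
        implements CombinerFactory<String, Integer, Tuple<Long, Long>> {

    public Combiner<String, Integer, Tuple<Long, Long>> newCombiner(String key) {
        // a fresh combiner per key keeps the counters independent
        return new AvgCombiner();
    }
}

// job.mapper(new SomeMapper())
//    .combiner(new AvgCombinerFactory())
//    .reducer(new SumReducerFactory())
//    .submit();
</pre>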