Reducer implementations can access the {@link Configuration} for the job via the {@link JobContext#getConfiguration()} method.
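For instance, configuration values can be read once in {@link #setup(Context)}. A minimal sketch follows; the ThresholdReducer class and the "myjob.threshold" property are illustrative assumptions, not part of the Hadoop API:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer that only emits keys whose summed count
// meets a threshold supplied through the job Configuration.
public class ThresholdReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private int threshold;

  @Override
  protected void setup(Context context) {
    // getConfiguration() is inherited from JobContext via the reducer's Context.
    Configuration conf = context.getConfiguration();
    threshold = conf.getInt("myjob.threshold", 1); // hypothetical property name
  }

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    if (sum >= threshold) {
      context.write(key, new IntWritable(sum));
    }
  }
}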
Reducer has 3 primary phases:
1. Shuffle

The Reducer copies the sorted output from each {@link Mapper} using HTTP across the network.
2. Sort

The framework merge-sorts Reducer inputs by keys (since different Mappers may have output the same key). The shuffle and sort phases occur simultaneously, i.e. while outputs are being fetched they are merged.
SecondarySort

To achieve a secondary sort on the values returned by the value iterator, the application should extend the key with the secondary key and define a grouping comparator. The keys will be sorted using the entire key, but will be grouped using the grouping comparator to decide which keys and values are sent in the same call to reduce. The grouping comparator is specified via {@link Job#setGroupingComparatorClass(Class)}. The sort order is controlled by {@link Job#setSortComparatorClass(Class)}.
For example, say that you want to find duplicate web pages and tag them all with the url of the "best" known example. You would set up the job like:

  Map Input Key: url
  Map Input Value: document
  Map Output Key: document checksum, url pagerank
  Map Output Value: url
  Partitioner: by checksum
  OutputKeyComparator: by checksum and then decreasing pagerank
  OutputValueGroupingComparator: by checksum
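A sketch of how that setup might be wired into a {@link Job}. The ChecksumMapper, BestUrlReducer, ChecksumPartitioner, ChecksumPageRankKey, ChecksumPageRankComparator, and ChecksumGroupingComparator classes are all hypothetical names assumed to be defined elsewhere; only the Job methods are real API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class DedupDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "tag duplicate pages");
    job.setJarByClass(DedupDriver.class);

    // Hypothetical classes: a composite key carrying (checksum, pagerank),
    // plus the mapper, reducer, and partitioner written for this job.
    job.setMapperClass(ChecksumMapper.class);
    job.setReducerClass(BestUrlReducer.class);
    job.setPartitionerClass(ChecksumPartitioner.class);   // partition by checksum only
    job.setMapOutputKeyClass(ChecksumPageRankKey.class);
    job.setMapOutputValueClass(Text.class);

    // Sort the full key: checksum ascending, then pagerank descending,
    // so the "best" url is the first value seen in each reduce() call.
    job.setSortComparatorClass(ChecksumPageRankComparator.class);
    // Group by checksum alone: every url with the same checksum is routed
    // to a single reduce() call regardless of its pagerank.
    job.setGroupingComparatorClass(ChecksumGroupingComparator.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}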
3. Reduce

In this phase the {@link #reduce(Object,Iterable,Context)} method is called for each <key, (collection of values)> in the sorted inputs.
The output of the reduce task is typically written to a {@link RecordWriter} via {@link Context#write(Object,Object)}.
The output of the Reducer is not re-sorted.
Example:
public class IntSumReducer<Key> extends Reducer<Key, IntWritable, Key, IntWritable> {
  private IntWritable result = new IntWritable();

  public void reduce(Key key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    // Sum all counts emitted for this key.
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    // The new-API context emits via write(); there is no collect() method.
    context.write(key, result);
  }
}
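A minimal sketch of plugging this reducer into a job. TokenCountMapper (a mapper emitting <Text, IntWritable> pairs) and the argument handling are illustrative assumptions; the Job, FileInputFormat, and FileOutputFormat calls are real API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IntSumDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "int sum");
    job.setJarByClass(IntSumDriver.class);
    job.setMapperClass(TokenCountMapper.class); // hypothetical mapper emitting <Text, IntWritable>
    // Summing is commutative and associative, so the same class can also
    // pre-aggregate map output as a combiner.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}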
@see Mapper
@see Partitioner