Examples of com.datasalt.pangool.tuplemr.TupleMRBuilder$Input

com.datasalt.pangool.tuplemr.TupleMRBuilder

TupleMRBuilder creates Tuple-based Map-Reduce jobs.

One of the key concepts of Tuple-based Map-Reduce is that Hadoop Key-Value pairs are no longer used. Instead, they are replaced by tuples.
Tuples (see {@link ITuple}) are just ordered lists of elements whose types are defined in a {@link Schema}. TupleMRBuilder contains several methods to define how grouping and sorting among tuples will be performed, avoiding the complex task of defining custom binary {@link SortComparator}, {@link GroupComparator} and {@link TupleHashPartitioner} implementations.

A Tuple-based Map-Reduce job, in its simplest form, requires the following to be defined:

Intermediate schemas:
A schema specifies the names and types of a Tuple's fields. Several schemas can be defined in order to perform joins among different input data. At least one schema must be specified using {@link #addIntermediateSchema(Schema)}.
Group-by fields:
Needed to specify how the tuples will be grouped. Tuples with the same values in the group-by fields will be grouped and reduced together in the Reduce phase.
Tuple-based Mapper:
The job needs to specify a {@link TupleMapper} instance, the Tuple-based implementation of Hadoop's {@link Mapper}. Unlike Hadoop's Mappers, Tuple-based mappers are configured using stateful serializable instances and not static class definitions.
Tuple-based Reducer:
Similar to mapper instances, the job needs to specify a {@link TupleReducer} instance, the Tuple-based implementation of Hadoop's {@link Reducer}.
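
The point about stateful serializable instances can be illustrated with plain Java serialization. This is only a sketch: PrefixMapper, serialize and deserialize are hypothetical names and do not belong to Pangool, and the byte-array round trip is an assumption standing in for however a configured instance actually reaches the tasks. What it shows is that state set in the constructor travels with the instance, which a bare class reference could not carry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class StatefulMapperSketch {

  // Hypothetical mapper-like class (NOT Pangool's TupleMapper): its behaviour
  // depends on state passed to the constructor.
  static class PrefixMapper implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String prefix;

    PrefixMapper(String prefix) {
      this.prefix = prefix;
    }

    String map(String value) {
      return prefix + value;
    }
  }

  // Turns a configured instance into bytes (assumed stand-in for shipping it to a task).
  static byte[] serialize(Serializable obj) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
    }
    return bytes.toByteArray();
  }

  // Rebuilds the instance from bytes, as a task would before using it.
  static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    PrefixMapper configured = new PrefixMapper("row:");
    PrefixMapper restored = (PrefixMapper) deserialize(serialize(configured));
    System.out.println(restored.map("42")); // prints "row:42"
  }
}
```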

@see ITuple @see Schema @see TupleMapper @see TupleReducer
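
As a rough illustration of what group-by fields mean, the following standalone sketch groups tuples in memory using only the JDK. It does not use Pangool: the FIELDS list is a hypothetical stand-in for a Schema, tuples are modelled as plain lists, and groupKey and groupBy are illustrative names. A real job would perform this grouping across the shuffle, not in a single map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupBySketch {

  // Hypothetical stand-in for a schema: only the ordered field names.
  static final List<String> FIELDS = Arrays.asList("user", "country", "clicks");

  // Extracts the values of the group-by fields from one tuple, in the order given.
  static List<Object> groupKey(List<Object> tuple, List<String> groupByFields) {
    List<Object> key = new ArrayList<>();
    for (String field : groupByFields) {
      int index = FIELDS.indexOf(field);
      if (index < 0) {
        throw new IllegalArgumentException("Unknown field: " + field);
      }
      key.add(tuple.get(index));
    }
    return key;
  }

  // Collects tuples that share the same group-by values, preserving first-seen order.
  static Map<List<Object>, List<List<Object>>> groupBy(
      List<List<Object>> tuples, List<String> groupByFields) {
    Map<List<Object>, List<List<Object>>> groups = new LinkedHashMap<>();
    for (List<Object> tuple : tuples) {
      groups.computeIfAbsent(groupKey(tuple, groupByFields), k -> new ArrayList<>()).add(tuple);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<List<Object>> tuples = Arrays.asList(
        Arrays.<Object>asList("ana", "ES", 3),
        Arrays.<Object>asList("bob", "US", 5),
        Arrays.<Object>asList("ana", "ES", 7));

    Map<List<Object>, List<List<Object>>> groups =
        groupBy(tuples, Arrays.asList("user", "country"));

    // Each entry corresponds to one group that would be reduced together.
    for (Map.Entry<List<Object>, List<List<Object>>> entry : groups.entrySet()) {
      System.out.println(entry.getKey() + " -> " + entry.getValue().size() + " tuple(s)");
    }
  }
}
```

Here the two "ana"/"ES" tuples fall into one group and the "bob"/"US" tuple into another, which mirrors how tuples sharing the same group-by values are handed to the same reduce call.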


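      // Fragment of a larger method: pangoolSchema, inputPath, outP, groupBy, skipHeading and
      // the separator/quotes/escape settings are defined in code not shown here.
      Schema schema = new Schema("sch", Fields.parse(pangoolSchema));
      Path inputP = new Path(inputPath);

      // Use the Pangool API to parse delimited text (CSV, etc.) into tuples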
      TupleMRBuilder builder = new TupleMRBuilder(conf);
      TupleTextInputFormat parsingInputFormat = new TupleTextInputFormat(schema, skipHeading, false,
          separator.charAt(0), quotes.charAt(0), escape.charAt(0), FieldSelector.NONE, null);
      TupleTextOutputFormat outputFormat = new TupleTextOutputFormat(schema, false, separator.charAt(0),
          quotes.charAt(0), escape.charAt(0));


      builder.addIntermediateSchema(schema);
      builder.addInput(inputP, parsingInputFormat, new IdentityTupleMapper());
      builder.setGroupByFields(groupBy);
      builder.setOutput(outP, outputFormat, ITuple.class, NullWritable.class);
      builder.setTupleReducer(new IdentityTupleReducer());
      builder.setJarByClass(this.getClass());
      
      builder.createJob().waitForCompletion(true);
    }


    return 1;
  }
