The Checkpoint pipe, if supported by the current planner, will force data to be persisted at the point in the tuple stream where an instance of Checkpoint is inserted into the pipe assembly.
If a checkpoint {@link cascading.tap.Tap} is added to the {@link cascading.flow.FlowDef} via the {@link cascading.flow.FlowDef#addCheckpoint(Checkpoint, cascading.tap.Tap)} method, that Tap instance will be used to capture the intermediate result sets.
It is required that any Scheme used as a checkpoint must source {@link cascading.tuple.Fields#UNKNOWN} and sink {@link cascading.tuple.Fields#ALL}.
If used with a {@link cascading.scheme.hadoop.TextDelimited} {@link cascading.scheme.Scheme} class and the {@code hasHeader} value is {@code true}, a header with the resolved field names will be written to the file.
This is especially useful for debugging complex flows.
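For example, a named checkpoint with its own Tap might be wired up as follows. This is a sketch only: the pipe names, field names, paths, and the {@code sourceTap}/{@code sinkTap} variables are illustrative, and the code assumes the Cascading Hadoop platform classes are on the classpath.

```java
// illustrative pipe assembly; names and fields are assumptions
Pipe assembly = new Pipe( "words" );
assembly = new Each( assembly, new Fields( "line" ),
  new RegexSplitGenerator( new Fields( "word" ), "\\s+" ) );

// insert a named checkpoint; tuples reaching this point will be persisted
Checkpoint checkpoint = new Checkpoint( "afterSplit", assembly );

assembly = new GroupBy( checkpoint, new Fields( "word" ) );
assembly = new Every( assembly, Fields.ALL, new Count(), Fields.ALL );

// a TextDelimited checkpoint tap; hasHeader = true writes the resolved field names
Tap checkpointTap = new Hfs( new TextDelimited( true, "\t" ), "checkpoints/afterSplit" );

FlowDef flowDef = FlowDef.flowDef()
  .addSource( assembly, sourceTap )   // sourceTap/sinkTap defined elsewhere
  .addTailSink( assembly, sinkTap )
  .addCheckpoint( checkpoint, checkpointTap );

Flow flow = new HadoopFlowConnector().connect( flowDef );
```

Because the checkpoint Tap captures the intermediate tuple stream verbatim, inspecting {@code checkpoints/afterSplit} is a convenient way to verify the data mid-flow.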
For the {@link cascading.flow.hadoop.HadoopFlowConnector} and Hadoop platform, a Checkpoint will force a new MapReduce job ({@link cascading.flow.hadoop.HadoopFlowStep}) into the {@link cascading.flow.Flow} plan.
This can be important when used in conjunction with a {@link HashJoin} where the operations upstream from the HashJoin significantly filter the data, allowing the remainder to fit in memory.
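That pattern might be sketched as below: the accumulated side of the join is filtered and then checkpointed, so the filtered (now small) result is materialized at a job boundary before the HashJoin loads it into memory. The pipe names, fields, and filter are assumptions for illustration.

```java
// streamed (large) side of the join
Pipe lhs = new Pipe( "lhs" );

// accumulated side: heavily filtered upstream of the join
Pipe rhs = new Pipe( "rhs" );
rhs = new Each( rhs, new Fields( "status" ), new RegexFilter( "active" ) ); // drops most tuples
rhs = new Checkpoint( "filteredRhs", rhs ); // forces a MapReduce boundary here

// the small, checkpointed rhs stream is accumulated in memory by HashJoin
Pipe joined = new HashJoin( lhs, new Fields( "id" ), rhs, new Fields( "id" ),
  new Fields( "lhsId", "lhsValue", "rhsId", "rhsValue" ) );
```

Without the checkpoint, the planner may otherwise place the filter and the join in the same job, giving the HashJoin no materialized, pre-filtered input to accumulate.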