The Checkpoint pipe, if supported by the current planner, will force data to be persisted at the point in the tuple stream where an instance of Checkpoint is inserted into the pipe assembly.
If a checkpoint {@link cascading.tap.Tap} is added to the {@link cascading.flow.FlowDef} via the {@link cascading.flow.FlowDef#addCheckpoint(Checkpoint, cascading.tap.Tap)} method, that Tap instance will be used to capture the intermediate result sets.
It is required that any Scheme used as a checkpoint must source {@link cascading.tuple.Fields#UNKNOWN} and sink {@link cascading.tuple.Fields#ALL}.
If used with a {@link cascading.scheme.hadoop.TextDelimited} {@link cascading.scheme.Scheme} class and the {@code hasHeader} value is {@code true}, a header with the resolved field names will be written to the file.
This is especially useful for debugging complex flows.
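For example, a named checkpoint with its own Tap might be wired up as follows. This is a sketch only: the pipe names, field names, paths, and the {@code sourceTap}/{@code sinkTap} variables are illustrative, and the code assumes the Cascading Hadoop platform classes are on the classpath.

```java
// illustrative pipe assembly; names and fields are assumptions
Pipe assembly = new Pipe( "words" );
assembly = new Each( assembly, new Fields( "line" ),
  new RegexSplitGenerator( new Fields( "word" ), "\\s+" ) );

// insert a named checkpoint; tuples reaching this point will be persisted
Checkpoint checkpoint = new Checkpoint( "afterSplit", assembly );

assembly = new GroupBy( checkpoint, new Fields( "word" ) );
assembly = new Every( assembly, Fields.ALL, new Count(), Fields.ALL );

// a TextDelimited checkpoint tap; hasHeader = true writes the resolved field names
Tap checkpointTap = new Hfs( new TextDelimited( true, "\t" ), "checkpoints/afterSplit" );

FlowDef flowDef = FlowDef.flowDef()
  .addSource( assembly, sourceTap )   // sourceTap/sinkTap defined elsewhere
  .addTailSink( assembly, sinkTap )
  .addCheckpoint( checkpoint, checkpointTap );

Flow flow = new HadoopFlowConnector().connect( flowDef );
```

Because the checkpoint Tap captures the intermediate tuple stream verbatim, inspecting {@code checkpoints/afterSplit} is a convenient way to verify the data mid-flow.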
For the {@link cascading.flow.hadoop.HadoopFlowConnector} and Hadoop platform, a Checkpoint will force a new MapReduce job ({@link cascading.flow.hadoop.HadoopFlowStep}) into the {@link cascading.flow.Flow} plan.
This can be important when used in conjunction with a {@link HashJoin} where the operations upstream from the HashJoin significantly filter the data, allowing the remainder to fit in memory.
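That pattern might be sketched as below: the accumulated side of the join is filtered and then checkpointed, so the filtered (now small) result is materialized at a job boundary before the HashJoin loads it into memory. The pipe names, fields, and filter are assumptions for illustration.

```java
// streamed (large) side of the join
Pipe lhs = new Pipe( "lhs" );

// accumulated side: heavily filtered upstream of the join
Pipe rhs = new Pipe( "rhs" );
rhs = new Each( rhs, new Fields( "status" ), new RegexFilter( "active" ) ); // drops most tuples
rhs = new Checkpoint( "filteredRhs", rhs ); // forces a MapReduce boundary here

// the small, checkpointed rhs stream is accumulated in memory by HashJoin
Pipe joined = new HashJoin( lhs, new Fields( "id" ), rhs, new Fields( "id" ),
  new Fields( "lhsId", "lhsValue", "rhsId", "rhsValue" ) );
```

Without the checkpoint, the planner may otherwise place the filter and the join in the same job, giving the HashJoin no materialized, pre-filtered input to accumulate.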