A Scheme defines what is stored in a {@link Tap} instance by declaring the {@link Tuple} field names, and by parsing the incoming {@link Tuple} stream or rendering the outgoing one.
A Scheme defines the type of resource that data will be sourced from or sunk to.
The default sourceFields are {@link Fields#UNKNOWN} and the default sinkFields are {@link Fields#ALL}.
Any given sourceFields only label the values in the {@link Tuple}s as they are sourced. They do not necessarily filter the output, since a given implementation may choose to collapse values and ignore keys depending on the format.
If the sinkFields are {@link Fields#ALL}, the Cascading planner will attempt to resolve the actual field names and make them available via the {@link cascading.scheme.SinkCall#getOutgoingEntry()} method. Sometimes this may not be possible, for example when the {@link Tap#openForWrite(cascading.flow.FlowProcess)} method is called directly from user code, without planner intervention.
If the sinkFields are a valid selector, the {@link #sink(cascading.flow.FlowProcess,SinkCall)} method will only see the expected fields.
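The difference between labelling on the source side and selecting on the sink side can be illustrated with a small stand-alone model. This is only a sketch in plain Java and does not use the Cascading API: the class {@code SchemeFieldsSketch} and the helpers {@code labelSource} and {@code selectForSink} are hypothetical names, a {@code Map} stands in for a tuple entry, and a {@code null} selector stands in for {@link Fields#ALL}.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names exist in Cascading.
public class SchemeFieldsSketch {

    // Source side: attach the declared names to the parsed values by position.
    // Every value is kept; the names are labels, not a filter.
    static Map<String, Object> labelSource(List<String> sourceFields, List<?> values) {
        if (sourceFields.size() != values.size()) {
            throw new IllegalArgumentException("field count does not match value count");
        }
        Map<String, Object> entry = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            entry.put(sourceFields.get(i), values.get(i));
        }
        return entry;
    }

    // Sink side: a null selector (standing in for Fields.ALL) passes every field
    // through; otherwise only the selected fields are kept, in selector order.
    static Map<String, Object> selectForSink(Map<String, Object> outgoing, List<String> sinkFields) {
        if (sinkFields == null) {
            return new LinkedHashMap<>(outgoing);
        }
        Map<String, Object> selected = new LinkedHashMap<>();
        for (String name : sinkFields) {
            if (!outgoing.containsKey(name)) {
                throw new IllegalArgumentException("unknown field: " + name);
            }
            selected.put(name, outgoing.get(name));
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Object> entry =
            labelSource(List.of("date", "count", "status"), List.of("2024-01-01", 42, "ok"));
        System.out.println(entry);                                          // all three values, labelled
        System.out.println(selectForSink(entry, List.of("status", "date"))); // only the selected fields
        System.out.println(selectForSink(entry, null));                     // everything, as with ALL
    }
}
```

In this model the source step never drops a value, while the sink step narrows the entry to the selector, which mirrors the asymmetry described above. A real Scheme implementation is free to behave differently depending on its format.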
Setting the {@code numSinkParts} value to 1 (one) attempts to ensure the output resource has only one part. In the case of MapReduce, this is only a suggestion on the Map side; on the Reduce side it does this by setting the number of reducers to the given value. This may affect performance, so use it with caution. Note that setting numSinkParts does not force the planner to insert a final Reduce operation in the job, so numSinkParts may be ignored entirely if the final job is Map only. To force the Flow to have a final Reduce, add a {@link cascading.pipe.GroupBy} to the assembly before sinking.