See the {@link FlowDef} class for a fluent way to define a new Flow.
Use the FlowConnector to link source and sink {@link Tap} instances with an assembly of {@link Pipe} instances into an executable {@link cascading.flow.Flow}.
FlowConnector invokes a planner for the target execution environment.
For executing Flows in local memory against local files, see {@link cascading.flow.local.LocalFlowConnector}.
For Apache Hadoop, see the {@link cascading.flow.hadoop.HadoopFlowConnector}. Or if you have a pre-existing custom Hadoop job to execute, see {@link cascading.flow.hadoop.MapReduceFlow}, which doesn't require a planner.
Note that all {@code connect} methods take a single {@code tail} or an array of {@code tail} Pipe instances. "tail" refers to the last connected Pipe instances in a pipe-assembly. Pipe-assemblies are graphs of objects with "heads" and "tails". From a given "tail", all connected "heads" can be found, but not the reverse, so "tails" must be supplied by the user.
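The asymmetry between "heads" and "tails" can be illustrated with a minimal sketch. It does not use the Cascading API: {@code Node} and {@code findHeads} are hypothetical names, and the sketch assumes only that each element of an assembly holds references to its predecessors and none to its successors.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TailWalkSketch {
  // Hypothetical stand-in for a pipe: it knows its predecessors only.
  static final class Node {
    final String name;
    final List<Node> previous;

    Node(String name, Node... previous) {
      this.name = name;
      this.previous = Arrays.asList(previous);
    }
  }

  // Walks backwards from the given tails and collects the names of all
  // nodes that have no predecessors, i.e. the heads.
  static Set<String> findHeads(Node... tails) {
    Set<String> heads = new TreeSet<>();
    Set<Node> visited = new HashSet<>();
    Deque<Node> pending = new ArrayDeque<>(Arrays.asList(tails));

    while (!pending.isEmpty()) {
      Node current = pending.pop();
      if (!visited.add(current)) {
        continue; // already handled via another branch
      }
      if (current.previous.isEmpty()) {
        heads.add(current.name);
      } else {
        current.previous.forEach(pending::push);
      }
    }
    return heads;
  }

  public static void main(String[] args) {
    Node lhs = new Node("lhs");
    Node rhs = new Node("rhs");
    Node join = new Node("join", lhs, rhs);
    Node tail = new Node("tail", join);

    System.out.println(findHeads(tail)); // prints [lhs, rhs]
  }
}
```

Starting from {@code lhs} alone, nothing in this structure leads forward to {@code join} or {@code tail}, which is why the caller has to hand over the tails.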
The FlowConnector and the underlying execution framework (Hadoop or local mode) can be configured via a {@link Map} or {@link Properties} instance given to the constructor.
This properties map must be populated before constructing a FlowConnector instance. Many planner-specific properties can be set through the {@link FlowConnectorProps} fluent interface.
Some planners have required properties. Hadoop expects {@link AppProps#setApplicationJarPath(java.util.Map,String)} or {@link AppProps#setApplicationJarClass(java.util.Map,Class)} to be set.
Any properties set and passed through the FlowConnector constructor will be global to all Flow instances created through that FlowConnector instance. Some properties are set on the {@link FlowDef} and apply only to the resulting Flow instance.
These properties are used to influence the current planner and are also passed down to the execution framework to override any default values. For example, when using the Hadoop planner, the number of reducers or mappers can be set by using platform-specific properties.
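As a minimal sketch of populating the properties before a connector is constructed, the example below uses only {@code java.util}. The class and method names are hypothetical, and the key {@code mapreduce.job.reduces} is an assumption standing in for a platform-specific property (it is the Hadoop 2.x name for the reducer count); check the target platform for the keys it actually honors.

```java
import java.util.Map;
import java.util.Properties;

class ConnectorPropertiesSketch {
  // Builds the complete property set up front, before any connector exists.
  static Properties buildProperties() {
    Properties properties = new Properties();
    // Assumed platform-specific key; replace with the one your platform uses.
    properties.setProperty("mapreduce.job.reduces", "4");
    return properties;
  }

  public static void main(String[] args) {
    Properties properties = buildProperties();

    // Properties extends Hashtable<Object,Object>, so the same instance can
    // be handed to anything that accepts a Map<Object,Object>.
    Map<Object, Object> asMap = properties;

    System.out.println(asMap.get("mapreduce.job.reduces")); // prints 4
  }
}
```

The populated instance would then be given to the connector constructor, after which every Flow created through that connector sees the same values.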
Custom operations (Functions, Filters, etc.) may also retrieve these property values at runtime through calls to {@link cascading.flow.FlowProcess#getProperty(String)} or {@link FlowProcess#getStringProperty(String)}.
Most applications will need to call {@link cascading.property.AppProps#setApplicationJarClass(java.util.Map,Class)} or {@link cascading.property.AppProps#setApplicationJarPath(java.util.Map,String)} so that the correct application jar file is passed through to all child processes. The Class or path must reference the custom application jar, not a Cascading library class or jar. The easiest approach is to give {@code setApplicationJarClass} the Class that contains your static {@code main} method and let Cascading work out which jar to use.
Note that Map
By default, all {@link cascading.operation.Assertion}s are planned into the resulting Flow instance. This can be changed for a given Flow by calling {@link FlowDef#setAssertionLevel(cascading.operation.AssertionLevel)} or globally via {@link FlowConnectorProps#setAssertionLevel(cascading.operation.AssertionLevel)}.
Also by default, all {@link cascading.operation.Debug}s are planned into the resulting Flow instance. This can be changed for a given Flow by calling {@link FlowDef#setDebugLevel(cascading.operation.DebugLevel)} or globally via {@link FlowConnectorProps#setDebugLevel(cascading.operation.DebugLevel)}.
As of version 3.0, custom {@link cascading.flow.planner.rule.RuleRegistry} instances can be provided to customize a given planner.
@see cascading.flow.local.LocalFlowConnector
@see cascading.flow.hadoop.HadoopFlowConnector