file://...
will denote Lfs.

Call {@link #setTemporaryDirectory(java.util.Map,String)} to use a temporary file directory path other than the current Hadoop default path.

By default, Cascading on Hadoop assumes that any source or sink Tap using the {@code file://} URI scheme intends to read files from the local client filesystem (for example when using the {@code Lfs} Tap) where the Hadoop job jar is started. It will therefore force any MapReduce job reading or writing {@code file://} resources to run in Hadoop "standalone mode" so that the file can be read.

To change this behavior, call {@link HfsProps#setLocalModeScheme(java.util.Map,String)} to set a different scheme value, or set it to "none" to disable the behavior entirely, for the case where the file to be read is available on every Hadoop processing node at the exact same path.

Hfs can optionally combine multiple small files (or a series of small "blocks") into larger "splits". This reduces the number of map tasks created by Hadoop and can improve application performance. To enable this, call {@link HfsProps#setUseCombinedInput(boolean)} with {@code true}. By default, combining splits into larger ones is disabled.
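The local mode rule described above can be illustrated with a small, self-contained sketch. It is not Cascading source code: the class {@code LocalModeSketch} and the method {@code forcesLocalMode} are hypothetical names, and the property key string is an assumption that should be checked against {@code HfsProps} in the Cascading version in use. The sketch only shows the idea that a Tap forces standalone mode when its URI scheme matches the configured local mode scheme, which defaults to {@code file}.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Illustration only: a simplified model of the rule, not Cascading's implementation.
class LocalModeSketch {
    // Assumed property key; verify against HfsProps in your Cascading version.
    static final String LOCAL_MODE_SCHEME = "cascading.hadoop.localmode.scheme";

    /** Returns true if a Tap at the given URI would force Hadoop standalone mode. */
    static boolean forcesLocalMode(Map<Object, Object> properties, String tapUri) {
        // When nothing is configured, the file:// scheme is treated as local.
        Object configured = properties.get(LOCAL_MODE_SCHEME);
        String localScheme = configured == null ? "file" : configured.toString();

        // A value such as "none" never matches a real URI scheme,
        // which is how it disables the behavior in this model.
        String scheme = URI.create(tapUri).getScheme();
        return scheme != null && scheme.equalsIgnoreCase(localScheme);
    }

    public static void main(String[] args) {
        Map<Object, Object> properties = new HashMap<>();

        // Default: file:// resources force standalone mode, hdfs:// resources do not.
        System.out.println(forcesLocalMode(properties, "file:///tmp/input.txt"));
        System.out.println(forcesLocalMode(properties, "hdfs://namenode/data/input"));

        // With the scheme set to "none", file:// no longer forces standalone mode.
        properties.put(LOCAL_MODE_SCHEME, "none");
        System.out.println(forcesLocalMode(properties, "file:///tmp/input.txt"));
    }
}
```

In a real application the property would be set through {@link HfsProps#setLocalModeScheme(java.util.Map,String)} on the properties map handed to the flow connector, rather than by writing the key directly.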