A Tap represents the physical data source or sink in a connected {@link cascading.flow.Flow}. That is, a source Tap is the head end of a connected {@link Pipe} and {@link Tuple} stream, and a sink Tap is the tail end. Different Tap types manage files on a local disk, a distributed disk, remote storage such as Amazon S3, or an FTP server. A Tap abstracts away the complexity of connecting to these kinds of data sources.
A Tap takes a {@link Scheme} instance, which is used to identify the type of resource (text file, binary file, etc.). The Tap itself is responsible for how the resource is reached.
By default, when planning a Flow, Tap equality is a function of the {@link #getIdentifier()} and {@link #getScheme()} values. That is, two Tap instances are considered the same Tap if they sink/source the same resource and sink/source the same fields.
Some more advanced taps, like a database tap, may need to extend equality to include any filtering, like the {@code where} clause in a SQL statement, so that two taps reading from the same SQL table with different filters aren't considered equal.
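The equality rules above can be illustrated with a small, self-contained sketch. It does not use the Cascading classes: {@code SketchTap} and {@code SketchJdbcTap} are hypothetical stand-ins, and plain strings stand in for the identifier, the Scheme, and the {@code where} clause.

```java
import java.util.Objects;

// Hypothetical stand-in for a Tap; not the Cascading class.
class SketchTap {
    private final String identifier; // stands in for getIdentifier()
    private final String scheme;     // stands in for getScheme(); assumed to be a String here

    SketchTap(String identifier, String scheme) {
        this.identifier = identifier;
        this.scheme = scheme;
    }

    // Default rule: equal when both the resource and the scheme match.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (other == null || getClass() != other.getClass()) return false;
        SketchTap that = (SketchTap) other;
        return Objects.equals(identifier, that.identifier)
            && Objects.equals(scheme, that.scheme);
    }

    @Override
    public int hashCode() {
        return Objects.hash(identifier, scheme);
    }
}

// Hypothetical database tap that extends equality with its filter.
class SketchJdbcTap extends SketchTap {
    private final String whereClause; // assumed filter, e.g. "year = 2020"

    SketchJdbcTap(String identifier, String scheme, String whereClause) {
        super(identifier, scheme);
        this.whereClause = whereClause;
    }

    @Override
    public boolean equals(Object other) {
        // super.equals already guarantees 'other' has the same runtime class.
        if (!super.equals(other)) return false;
        return Objects.equals(whereClause, ((SketchJdbcTap) other).whereClause);
    }

    @Override
    public int hashCode() {
        return Objects.hash(super.hashCode(), whereClause);
    }
}
```

In this sketch, two {@code SketchJdbcTap} instances pointing at the same table with the same scheme compare equal only when their filters also match. Keeping {@code hashCode} consistent with {@code equals} matters if taps are stored in hash-based collections.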
Taps are also used to determine dependencies between two or more {@link Flow} instances when used with a {@link cascading.cascade.Cascade}. In that case the {@link #getFullIdentifier(Object)} value is used and the Scheme is ignored.
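As a rough illustration of identifier-based dependencies, the sketch below treats one flow as depending on another when it reads a resource the other writes. {@code SketchFlow} and {@code dependsOn} are hypothetical names; this is an assumption about the general idea, not how Cascade is implemented. Only full identifiers are compared, so the Scheme plays no part.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a Flow: it keeps only the full identifiers of its taps.
class SketchFlow {
    final String name;
    final Set<String> sourceIds; // full identifiers of source taps
    final Set<String> sinkIds;   // full identifiers of sink taps

    SketchFlow(String name, String[] sourceIds, String[] sinkIds) {
        this.name = name;
        this.sourceIds = new HashSet<>(Arrays.asList(sourceIds));
        this.sinkIds = new HashSet<>(Arrays.asList(sinkIds));
    }

    // True if this flow sources at least one resource that 'upstream' sinks.
    boolean dependsOn(SketchFlow upstream) {
        return !Collections.disjoint(sourceIds, upstream.sinkIds);
    }
}
```

For example, a flow that sinks to {@code hdfs://nn/staged} would need to run before a flow that sources from {@code hdfs://nn/staged}, even if the two taps declare different fields.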