Class Unique {@link SubAssembly} is used to filter all duplicates out of a tuple stream.
Typically, finding unique values in a tuple stream relies on a {@link GroupBy} and a {@link FirstNBuffer} {@link cascading.operation.Buffer} operation.
If the {@code include} value is set to {@link Include#NO_NULLS}, any tuple consisting of only {@code null} values will be removed from the stream.
This SubAssembly uses the {@link FilterPartialDuplicates} {@link cascading.operation.Filter} to remove as many observed duplicates as possible before the GroupBy operator, reducing IO over the network.
This strategy is similar to using {@code combiners}, except that no sorting or serialization is invoked, which results in a much simpler mechanism.
The {@code threshold} value tells the underlying FilterPartialDuplicates how many values to cache for duplicate comparison before dropping values from the LRU cache.
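
The caching behavior described above can be illustrated with a minimal sketch. This is not the actual FilterPartialDuplicates implementation; the class name {@code PartialDuplicateFilterSketch} and the method {@code isDuplicate} are hypothetical, and the sketch assumes the cache is a plain access-ordered {@code LinkedHashMap} bounded by {@code threshold}.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of an LRU-bounded duplicate filter.
// Not the Cascading implementation; names here are assumptions.
final class PartialDuplicateFilterSketch {
    private final Map<Object, Boolean> cache;

    PartialDuplicateFilterSketch(final int threshold) {
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry evicts it once the cache exceeds threshold.
        this.cache = new LinkedHashMap<Object, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Object, Boolean> eldest) {
                return size() > threshold;
            }
        };
    }

    // Returns true when the value is still in the cache and can be dropped
    // before grouping. Returns false for a value not currently cached,
    // which may still be a duplicate of a value that was already evicted.
    boolean isDuplicate(Object value) {
        // put() returns the previous mapping, or null if the key was absent.
        return cache.put(value, Boolean.TRUE) != null;
    }
}
```

Because evicted values are forgotten, a filter of this kind removes only some duplicates. The downstream GroupBy is still what guarantees uniqueness; a larger {@code threshold} trades memory for fewer duplicates crossing the network.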