Typically, finding the max value of a field in a tuple stream relies on a {@link cascading.pipe.GroupBy} and a {@link cascading.operation.aggregator.MaxValue} {@link cascading.operation.Aggregator} operation.
This SubAssembly also uses the {@link cascading.pipe.assembly.MaxBy.MaxPartials} {@link cascading.pipe.assembly.AggregateBy.Functor} to track the maximum value before the GroupBy operator, which reduces IO over the network.
This strategy is similar to using {@code combiners}, except that no sorting or serialization is invoked, which results in a much simpler mechanism.
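As a rough illustration of why tracking partial maxima before the GroupBy is safe, here is a minimal plain-Java sketch. It assumes an in-memory list per partition stands in for one task's input; it does not use the Cascading API, and the names `PartialMaxDemo`, `Pair`, `partialMax`, and `finalMax` are hypothetical. Because max is associative and commutative, the max of the per-partition maxima equals the max over all records.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of partial (pre-GroupBy) max aggregation; not Cascading code.
public class PartialMaxDemo {
    // A (key, value) record; a hypothetical stand-in for a Cascading Tuple.
    record Pair(String key, int value) {}

    // Before the GroupBy: each partition keeps one running max per key,
    // so at most one record per key leaves the partition.
    static Map<String, Integer> partialMax(List<Pair> partition) {
        Map<String, Integer> partial = new HashMap<>();
        for (Pair p : partition) {
            partial.merge(p.key(), p.value(), Integer::max);
        }
        return partial;
    }

    // After the GroupBy: take the max of the partial maxima for each key.
    static Map<String, Integer> finalMax(List<Map<String, Integer>> partials) {
        Map<String, Integer> result = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((k, v) -> result.merge(k, v, Integer::max));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Pair> part1 = List.of(new Pair("a", 3), new Pair("b", 7), new Pair("a", 9));
        List<Pair> part2 = List.of(new Pair("a", 4), new Pair("b", 2));
        Map<String, Integer> result = finalMax(List.of(partialMax(part1), partialMax(part2)));
        System.out.println(result.get("a") + " " + result.get("b")); // prints "9 7"
    }
}
```

Here the first partition forwards two records instead of three; with many repeated keys per partition, that reduction is where the network IO saving comes from.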
The {@code threshold} value tells the underlying MaxPartials functions how many unique keys to accumulate in the LRU cache before emitting the least recently used entry.
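The threshold-bounded LRU behaviour can be sketched with `java.util.LinkedHashMap` in access order, overriding `removeEldestEntry` so that the evicted running max is emitted downstream. This is a hypothetical sketch (`LruPartialMax`, `accept`, `flush`, and `emitted` are illustrative names), not the actual MaxPartials implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a threshold-bounded LRU cache of running maxima per key.
public class LruPartialMax {
    // Stands in for the downstream tuple stream that feeds the GroupBy.
    private final List<String> emitted = new ArrayList<>();
    private final LinkedHashMap<String, Integer> cache;

    public LruPartialMax(int threshold) {
        // accessOrder = true: iteration runs from least to most recently accessed.
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                // Invoked by put() after a new key is inserted; evict once over the threshold.
                if (size() > threshold) {
                    emit(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    private void emit(String key, Integer max) {
        emitted.add(key + "=" + max);
    }

    // Fold one incoming value into the running max for its key.
    public void accept(String key, int value) {
        Integer current = cache.get(key); // get() also marks the key as recently used
        cache.put(key, current == null ? value : Math.max(current, value));
    }

    // At the end of the input, emit every partial max still held in the cache.
    public void flush() {
        cache.forEach(this::emit);
        cache.clear();
    }

    public List<String> emitted() {
        return emitted;
    }
}
```

With a threshold of 2, feeding `a=1, b=5, a=3, c=2` evicts `b=5` when `c` arrives (since `a` was touched more recently), and `flush()` then emits `a=3` and `c=2`. An evicted key may reappear later and be emitted a second time; that is harmless because the aggregation after the GroupBy takes the max over all partial values for a key.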
By default, the {@code threshold} is taken from the {@link #AGGREGATE_BY_THRESHOLD} System property when it is set; otherwise {@link AggregateBy#DEFAULT_THRESHOLD} is used.

@see AggregateBy