{ (a, b, c), (null, null, null), (a, b, null), (a, null, c), (a, null, null), (null, b, c), (null, null, c), (null, b, null) }
The "all" marker is null by default, but can be set to an arbitrary string by invoking a constructor (via a DEFINE). The constructor takes a single argument, the string you want to represent "all".
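The expansion described above can be sketched in plain Java. This is only an illustration of the idea, not the piggybank implementation: the class `CubeSketch` and the method `cube` are hypothetical names, and the bitmask enumeration is an assumed approach. Each of the 2^n masks selects which dimensions are replaced by the "all" marker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not the actual CubeDimensions code.
public class CubeSketch {

    // Returns every combination of the given dimensions in which any
    // subset of positions is replaced by allMarker (null by default).
    public static List<List<String>> cube(List<String> dims, String allMarker) {
        int n = dims.size();
        List<List<String>> out = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> row = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                // A set bit i means dimension i is rolled up to "all".
                row.add((mask & (1 << i)) != 0 ? allMarker : dims.get(i));
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        // Three dimensions yield 2^3 = 8 combinations.
        for (List<String> row : cube(Arrays.asList("a", "b", "c"), null)) {
            System.out.println(row);
        }
    }
}
```

Passing a string such as "ALL" as the second argument mirrors what the constructor argument does for the UDF: the same eight combinations are produced, with "ALL" in place of null.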
Usage goes something like this:
{@code
events = load '/logs/events' using EventLoader() as (lang, event, app_id, measure);
cubed = foreach events generate
  FLATTEN(piggybank.CubeDimensions(lang, event, app_id))
    as (lang, event, app_id),
  measure;
cube = foreach (group cubed by (lang, event, app_id) parallel $P) generate
  flatten(group) as (lang, event, app_id),
  COUNT_STAR(cubed),
  SUM(cubed.measure);
store cube into 'event_cube';
}
Note: doing this with non-algebraic aggregations on large data can result in very slow reducers, because the group in which every dimension is "all", (null, null, null) by default, receives every record in your relation.