Cube operator syntax
{@code alias = CUBE rel BY { CUBE | ROLLUP }(col_ref) [, { CUBE | ROLLUP }(col_ref) ...];}
alias - output alias
CUBE - operator
rel - input relation
BY - operator
CUBE | ROLLUP - cube or rollup operation
col_ref - column references, *, or a range in the schema referred to by rel
The cube and rollup computations, which use the UDFs {@link org.apache.pig.builtin.CubeDimensions} and {@link org.apache.pig.builtin.RollupDimensions}, can be expressed as follows:
{@code
events = LOAD '/logs/events' USING EventLoader() AS (lang, event, app_id, event_id, total);
eventcube = CUBE events BY CUBE(lang, event), ROLLUP(app_id, event_id);
result = FOREACH eventcube GENERATE FLATTEN(group) as (lang, event, app_id, event_id), COUNT_STAR(cube), SUM(cube.total);
STORE result INTO 'cuberesult';
}
In the above example, CUBE(lang, event) generates every combination of aggregations: {(lang, event), (lang, ), ( , event), ( , )}. For n dimensions, 2^n combinations of aggregations are generated.
Similarly, ROLLUP(app_id, event_id) generates aggregations in hierarchical order, from the most detailed level to the most general (grand total) level: {(app_id, event_id), (app_id, ), ( , )}. For n dimensions, n+1 combinations of aggregations are generated.
The output of the above example query contains the following combinations of aggregations:
{(lang, event, app_id, event_id), (lang, , app_id, event_id), ( , event, app_id, event_id), ( , , app_id, event_id),
(lang, event, app_id, ), (lang, , app_id, ), ( , event, app_id, ), ( , , app_id, ),
(lang, event, , ), (lang, , , ), ( , event, , ), ( , , , )}
The total number of combinations is 2^n * (m+1), where n is the number of CUBE dimensions and m is the number of ROLLUP dimensions; in this example, 2^2 * (2+1) = 12.
Because the cube and rollup clauses use null to represent "all" values of a dimension, any null values that occur in the dimension data are converted to "unknown" before the cube or rollup is computed.
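The counting rules above can be checked with a small enumeration. The following Python sketch is only an illustration of which grouping combinations the example query produces; it is not how Pig implements CUBE or ROLLUP, and the helper names (cube_groups, rollup_groups, combined_groups) are hypothetical. None stands in for the null that marks an aggregated ("all") dimension.

```python
from itertools import product

def cube_groups(dims):
    """All 2^n groupings: each dimension is either kept or replaced by None ("all")."""
    return [
        tuple(d if keep else None for d, keep in zip(dims, mask))
        for mask in product((True, False), repeat=len(dims))
    ]

def rollup_groups(dims):
    """The n+1 hierarchical groupings, from the most detailed level to the grand total."""
    n = len(dims)
    return [tuple(dims[:k]) + (None,) * (n - k) for k in range(n, -1, -1)]

def combined_groups(cube_dims, rollup_dims):
    """Pair every cube grouping with every rollup grouping (hypothetical helper)."""
    return [c + r for r in rollup_groups(rollup_dims) for c in cube_groups(cube_dims)]

groups = combined_groups(("lang", "event"), ("app_id", "event_id"))
for g in groups:
    print(g)
```

For the example query this yields 2^2 * (2+1) = 12 distinct tuples, starting with the fully detailed grouping and ending with the grand total in which every dimension is None.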