AggregatorStageTransformer
Passo a passo de como iniciar sua jornada na Dadosfera
AggregatorStageTransformer
Overview
Transforms DataStage AGGREGATOR stages to SQL GROUP BY statements. Performs grouping and aggregation operations (COUNT, SUM, AVG, MAX, MIN).
Stage Type: AGGREGATOR (group operator)
Output: SQL SELECT with GROUP BY
Capabilities
- Single and multiple grouping keys
- COUNT(*) aggregation
- SUM, AVG, MAX, MIN aggregations
- Automatic output schema generation
- Supports composite GROUP BY clauses
DataStage Stage Example
Input (.dsx format)
#### STAGE: AggregateData
## Operator
group
## Operator options
-hash
-key 'CUSTOMER_ID'
-key 'PRODUCT_ID'
-countField 'SALES_COUNT'
-sumField 'SALE_AMOUNT' 'TOTAL_SALES'
-avgField 'SALE_AMOUNT' 'AVG_SALES'
## General options
[ident('AggregateData'); jobmon_ident('AggregateData')]
## Inputs
0< [] 'SalesData:input.v'
## Outputs
0> [] 'AggregateData:Output.v'
Input Schema
[
{"name": "CUSTOMER_ID", "type": "string"},
{"name": "PRODUCT_ID", "type": "string"},
{"name": "SALE_AMOUNT", "type": "number"},
{"name": "SALE_DATE", "type": "date"}
]Generated SQL Output
SELECT
CUSTOMER_ID,
PRODUCT_ID,
COUNT(*) AS SALES_COUNT,
SUM(SALE_AMOUNT) AS TOTAL_SALES,
AVG(SALE_AMOUNT) AS AVG_SALES
FROM salesdata_V0S8
GROUP BY CUSTOMER_ID, PRODUCT_IDOutput Schema
[
{"name": "CUSTOMER_ID", "type": "string"},
{"name": "PRODUCT_ID", "type": "string"},
{"name": "SALES_COUNT", "type": "number"},
{"name": "TOTAL_SALES", "type": "number"},
{"name": "AVG_SALES", "type": "number"}
]How It Works
1. Identify Grouping Keys
Parses -key options to identify GROUP BY columns:
-key 'CUSTOMER_ID'
-key 'PRODUCT_ID'
Generates:
GROUP BY CUSTOMER_ID, PRODUCT_ID2. Extract Aggregation Functions
Parses aggregation options:
| Option | SQL | Example |
|---|---|---|
-countField 'ROW_COUNT' | COUNT(*) AS ROW_COUNT | Count rows per group |
-sumField 'AMOUNT' 'TOTAL' | SUM(AMOUNT) AS TOTAL | Sum of AMOUNT column |
-avgField 'AMOUNT' 'AVG_AMT' | AVG(AMOUNT) AS AVG_AMT | Average of AMOUNT |
-maxField 'VALUE' 'MAX_VAL' | MAX(VALUE) AS MAX_VAL | Maximum VALUE |
-minField 'VALUE' 'MIN_VAL' | MIN(VALUE) AS MIN_VAL | Minimum VALUE |
3. Build SELECT Clause
Combines grouping keys and aggregations:
SELECT
<grouping_key_1>,
<grouping_key_2>,
COUNT(*) AS <count_field>,
SUM(<input_col>) AS <sum_field>,
...
FROM input_table
GROUP BY <grouping_key_1>, <grouping_key_2>4. Infer Output Schema
- Grouping keys inherit types from input schema
- COUNT fields typed as INTEGER
- SUM/AVG fields typed as DECIMAL
- MAX/MIN fields inherit input column type
Aggregation Functions
COUNT
-countField 'ROW_COUNT'
Generates:
COUNT(*) AS ROW_COUNTCounts rows per group.
SUM
-sumField 'SALE_AMOUNT' 'TOTAL_SALES'
Generates:
SUM(SALE_AMOUNT) AS TOTAL_SALESSums values of SALE_AMOUNT column per group.
AVG
-avgField 'SALE_AMOUNT' 'AVG_SALES'
Generates:
AVG(SALE_AMOUNT) AS AVG_SALESCalculates average of SALE_AMOUNT per group.
MAX
-maxField 'SALE_DATE' 'LATEST_DATE'
Generates:
MAX(SALE_DATE) AS LATEST_DATEFinds maximum value per group.
MIN
-minField 'SALE_DATE' 'EARLIEST_DATE'
Generates:
MIN(SALE_DATE) AS EARLIEST_DATEFinds minimum value per group.
Single Key Example
Input
-key 'REGION_CODE'
-countField 'REGION_COUNT'
Generated SQL
SELECT
REGION_CODE,
COUNT(*) AS REGION_COUNT
FROM input_table
GROUP BY REGION_CODEComposite Key Example
Input
-key 'REGION_CODE'
-key 'CATEGORY_CODE'
-key 'PERIOD_CODE'
-countField 'ROW_COUNT'
Generated SQL
SELECT
REGION_CODE,
CATEGORY_CODE,
PERIOD_CODE,
COUNT(*) AS ROW_COUNT
FROM input_table
GROUP BY REGION_CODE, CATEGORY_CODE, PERIOD_CODEMultiple Aggregations
You can combine multiple aggregation functions:
Input
-key 'CUSTOMER_ID'
-countField 'ORDER_COUNT'
-sumField 'ORDER_VALUE' 'TOTAL_VALUE'
-avgField 'ORDER_VALUE' 'AVG_VALUE'
-maxField 'ORDER_DATE' 'LAST_ORDER_DATE'
-minField 'ORDER_DATE' 'FIRST_ORDER_DATE'
Generated SQL
SELECT
CUSTOMER_ID,
COUNT(*) AS ORDER_COUNT,
SUM(ORDER_VALUE) AS TOTAL_VALUE,
AVG(ORDER_VALUE) AS AVG_VALUE,
MAX(ORDER_DATE) AS LAST_ORDER_DATE,
MIN(ORDER_DATE) AS FIRST_ORDER_DATE
FROM input_table
GROUP BY CUSTOMER_IDLimitations
- ⚠️ No HAVING clause - Post-aggregation filters not supported (use downstream FILTER stage)
- ⚠️ No complex expressions - Aggregations only work on simple column references, not expressions
- ⚠️ No DISTINCT aggregations - COUNT(DISTINCT col) not supported
- ⚠️ Hash partitioning ignored -
-hashoption is DataStage-specific and not converted to SQL - ⚠️ No ROLLUP/CUBE - Advanced grouping sets not supported
Related Transformers
- TransformerStageTransformer - For post-aggregation column derivations
- ModifyStageTransformer - For post-aggregation column filtering
- RemoveDuplicatesTransformer - For simple deduplication without aggregation
Updated 5 days ago
