FunnelStageTransformer
Passo a passo de como iniciar sua jornada na Dadosfera
FunnelStageTransformer
Overview
Transforms DataStage FUNNEL stages to SQL UNION ALL statements. Combines multiple inputs with identical schemas into a single output dataset.
Stage Type: FUNNEL (funnel operator)
Output: SQL UNION ALL
Capabilities
- Combines 2 or more input datasets
- UNION ALL (preserves duplicates)
- Column filtering via keep clause
- Preserves all rows from all inputs
- Assumes identical schemas across inputs
DataStage Stage Example
Input (.dsx format)
#### STAGE: CombineData
## Operator
funnel
## General options
[ident('CombineData'); jobmon_ident('CombineData')]
## Inputs
0< [] 'DataSource1:output.v'
1< [] 'DataSource2:output.v'
2< [] 'DataSource3:output.v'
3< [] 'DataSource4:output.v'
## Outputs
0> [modify (
keep
EVENT_DATE,PERIOD,REGION,SEGMENT,
PRODUCT_CODE,DEMAND_TYPE,PROJECT_NAME,PROCESS_NAME,
MULTIPLIER_VALUE,CLASSIFICATION,QUANTITY;
)] 'CombineData:output.v'
Input Schemas (All Identical)
[
{"name": "EVENT_DATE", "type": "date"},
{"name": "PERIOD", "type": "string"},
{"name": "REGION", "type": "string"},
{"name": "SEGMENT", "type": "string"},
{"name": "PRODUCT_CODE", "type": "string"},
{"name": "DEMAND_TYPE", "type": "string"},
{"name": "PROJECT_NAME", "type": "string"},
{"name": "PROCESS_NAME", "type": "string"},
{"name": "MULTIPLIER_VALUE", "type": "number"},
{"name": "CLASSIFICATION", "type": "string"},
{"name": "QUANTITY", "type": "number"}
]Generated SQL Output
SELECT EVENT_DATE, PERIOD, REGION, SEGMENT, PRODUCT_CODE, DEMAND_TYPE, PROJECT_NAME, PROCESS_NAME, MULTIPLIER_VALUE, CLASSIFICATION, QUANTITY FROM datasource1_V0S23
UNION ALL
SELECT EVENT_DATE, PERIOD, REGION, SEGMENT, PRODUCT_CODE, DEMAND_TYPE, PROJECT_NAME, PROCESS_NAME, MULTIPLIER_VALUE, CLASSIFICATION, QUANTITY FROM datasource2_V14S0
UNION ALL
SELECT EVENT_DATE, PERIOD, REGION, SEGMENT, PRODUCT_CODE, DEMAND_TYPE, PROJECT_NAME, PROCESS_NAME, MULTIPLIER_VALUE, CLASSIFICATION, QUANTITY FROM datasource3_V15S0
UNION ALL
SELECT EVENT_DATE, PERIOD, REGION, SEGMENT, PRODUCT_CODE, DEMAND_TYPE, PROJECT_NAME, PROCESS_NAME, MULTIPLIER_VALUE, CLASSIFICATION, QUANTITY FROM datasource4_V19S0Output Schema
[
{"name": "EVENT_DATE", "type": "date"},
{"name": "PERIOD", "type": "string"},
{"name": "REGION", "type": "string"},
{"name": "SEGMENT", "type": "string"},
{"name": "PRODUCT_CODE", "type": "string"},
{"name": "DEMAND_TYPE", "type": "string"},
{"name": "PROJECT_NAME", "type": "string"},
{"name": "PROCESS_NAME", "type": "string"},
{"name": "MULTIPLIER_VALUE", "type": "number"},
{"name": "CLASSIFICATION", "type": "string"},
{"name": "QUANTITY", "type": "number"}
]How It Works
1. Identify Inputs
Parses all input links:
## Inputs
0< [] 'DataSource1:output.v'
1< [] 'DataSource2:output.v'
2< [] 'DataSource3:output.v'
2. Extract Keep Clause
Reads column list from modify section:
keep
EVENT_DATE,PERIOD,REGION,SEGMENT,
PRODUCT_CODE,QUANTITY;
3. Generate UNION ALL
Creates SELECT for each input and combines with UNION ALL:
SELECT <columns> FROM input1
UNION ALL
SELECT <columns> FROM input2
UNION ALL
SELECT <columns> FROM input34. Preserve Schema
Output schema matches first input (all inputs must have identical schemas).
UNION ALL vs UNION
The FUNNEL stage uses UNION ALL, which:
- Preserves all rows including duplicates
- Faster than UNION (no deduplication overhead)
- Order not guaranteed unless ORDER BY is added
If you need to remove duplicates, use RemoveDuplicatesTransformer after the Funnel stage.
Keep Clause
With Keep Clause
keep
EVENT_DATE,REGION,SEGMENT,QUANTITY;
Only specified columns appear in SELECT:
SELECT EVENT_DATE, REGION, SEGMENT, QUANTITY FROM input1
UNION ALL
SELECT EVENT_DATE, REGION, SEGMENT, QUANTITY FROM input2Without Keep Clause
## Outputs
0> [] 'CombineData:output.v'
All columns are selected:
SELECT * FROM input1
UNION ALL
SELECT * FROM input2Multiple Inputs Example
2 Inputs
## Inputs
0< [] 'Source1:output.v'
1< [] 'Source2:output.v'
5 Inputs
## Inputs
0< [] 'Source1:output.v'
1< [] 'Source2:output.v'
2< [] 'Source3:output.v'
3< [] 'Source4:output.v'
4< [] 'Source5:output.v'
Each input becomes a SELECT in the UNION ALL chain.
Limitations
- ⚠️ Identical schemas required - All inputs must have the same columns and types
- ⚠️ No ORDER BY - Output order is not guaranteed (use downstream SORT stage if needed)
- ⚠️ Minimum 2 inputs - Funnel requires at least 2 input datasets
- ⚠️ No deduplication - Uses UNION ALL (preserves duplicates); use RemoveDuplicatesTransformer if needed
Related Transformers
- RemoveDuplicatesTransformer - For removing duplicates after UNION ALL
- AggregatorStageTransformer - For grouping/aggregating combined data
- ModifyStageTransformer - For column filtering after combining
Updated 5 days ago
