FunnelStageTransformer

Passo a passo de como iniciar sua jornada na Dadosfera

FunnelStageTransformer

Overview

Transforms DataStage FUNNEL stages to SQL UNION ALL statements. Combines multiple inputs with identical schemas into a single output dataset.

Stage Type: FUNNEL (funnel operator) Output: SQL UNION ALL

Capabilities

  • Combines 2 or more input datasets
  • UNION ALL (preserves duplicates)
  • Column filtering via keep clause
  • Preserves all rows from all inputs
  • Assumes identical schemas across inputs

DataStage Stage Example

Input (.dsx format)

#### STAGE: CombineData
## Operator
funnel
## General options
[ident('CombineData'); jobmon_ident('CombineData')]
## Inputs
0< [] 'DataSource1:output.v'
1< [] 'DataSource2:output.v'
2< [] 'DataSource3:output.v'
3< [] 'DataSource4:output.v'
## Outputs
0> [modify (
keep
  EVENT_DATE,PERIOD,REGION,SEGMENT,
  PRODUCT_CODE,DEMAND_TYPE,PROJECT_NAME,PROCESS_NAME,
  MULTIPLIER_VALUE,CLASSIFICATION,QUANTITY;
)] 'CombineData:output.v'

Input Schemas (All Identical)

[
  {"name": "EVENT_DATE", "type": "date"},
  {"name": "PERIOD", "type": "string"},
  {"name": "REGION", "type": "string"},
  {"name": "SEGMENT", "type": "string"},
  {"name": "PRODUCT_CODE", "type": "string"},
  {"name": "DEMAND_TYPE", "type": "string"},
  {"name": "PROJECT_NAME", "type": "string"},
  {"name": "PROCESS_NAME", "type": "string"},
  {"name": "MULTIPLIER_VALUE", "type": "number"},
  {"name": "CLASSIFICATION", "type": "string"},
  {"name": "QUANTITY", "type": "number"}
]

Generated SQL Output

SELECT EVENT_DATE, PERIOD, REGION, SEGMENT, PRODUCT_CODE, DEMAND_TYPE, PROJECT_NAME, PROCESS_NAME, MULTIPLIER_VALUE, CLASSIFICATION, QUANTITY FROM datasource1_V0S23
UNION ALL
SELECT EVENT_DATE, PERIOD, REGION, SEGMENT, PRODUCT_CODE, DEMAND_TYPE, PROJECT_NAME, PROCESS_NAME, MULTIPLIER_VALUE, CLASSIFICATION, QUANTITY FROM datasource2_V14S0
UNION ALL
SELECT EVENT_DATE, PERIOD, REGION, SEGMENT, PRODUCT_CODE, DEMAND_TYPE, PROJECT_NAME, PROCESS_NAME, MULTIPLIER_VALUE, CLASSIFICATION, QUANTITY FROM datasource3_V15S0
UNION ALL
SELECT EVENT_DATE, PERIOD, REGION, SEGMENT, PRODUCT_CODE, DEMAND_TYPE, PROJECT_NAME, PROCESS_NAME, MULTIPLIER_VALUE, CLASSIFICATION, QUANTITY FROM datasource4_V19S0

Output Schema

[
  {"name": "EVENT_DATE", "type": "date"},
  {"name": "PERIOD", "type": "string"},
  {"name": "REGION", "type": "string"},
  {"name": "SEGMENT", "type": "string"},
  {"name": "PRODUCT_CODE", "type": "string"},
  {"name": "DEMAND_TYPE", "type": "string"},
  {"name": "PROJECT_NAME", "type": "string"},
  {"name": "PROCESS_NAME", "type": "string"},
  {"name": "MULTIPLIER_VALUE", "type": "number"},
  {"name": "CLASSIFICATION", "type": "string"},
  {"name": "QUANTITY", "type": "number"}
]

How It Works

1. Identify Inputs

Parses all input links:

## Inputs
0< [] 'DataSource1:output.v'
1< [] 'DataSource2:output.v'
2< [] 'DataSource3:output.v'

2. Extract Keep Clause

Reads column list from modify section:

keep
  EVENT_DATE,PERIOD,REGION,SEGMENT,
  PRODUCT_CODE,QUANTITY;

3. Generate UNION ALL

Creates SELECT for each input and combines with UNION ALL:

SELECT <columns> FROM input1
UNION ALL
SELECT <columns> FROM input2
UNION ALL
SELECT <columns> FROM input3

4. Preserve Schema

Output schema matches first input (all inputs must have identical schemas).

UNION ALL vs UNION

The FUNNEL stage uses UNION ALL, which:

  • Preserves all rows including duplicates
  • Faster than UNION (no deduplication overhead)
  • Order not guaranteed unless ORDER BY is added

If you need to remove duplicates, use RemoveDuplicatesTransformer after the Funnel stage.

Keep Clause

With Keep Clause

keep
  EVENT_DATE,REGION,SEGMENT,QUANTITY;

Only specified columns appear in SELECT:

SELECT EVENT_DATE, REGION, SEGMENT, QUANTITY FROM input1
UNION ALL
SELECT EVENT_DATE, REGION, SEGMENT, QUANTITY FROM input2

Without Keep Clause

## Outputs
0> [] 'CombineData:output.v'

All columns are selected:

SELECT * FROM input1
UNION ALL
SELECT * FROM input2

Multiple Inputs Example

2 Inputs

## Inputs
0< [] 'Source1:output.v'
1< [] 'Source2:output.v'

5 Inputs

## Inputs
0< [] 'Source1:output.v'
1< [] 'Source2:output.v'
2< [] 'Source3:output.v'
3< [] 'Source4:output.v'
4< [] 'Source5:output.v'

Each input becomes a SELECT in the UNION ALL chain.

Limitations

  • ⚠️ Identical schemas required - All inputs must have the same columns and types
  • ⚠️ No ORDER BY - Output order is not guaranteed (use downstream SORT stage if needed)
  • ⚠️ Minimum 2 inputs - Funnel requires at least 2 input datasets
  • ⚠️ No deduplication - Uses UNION ALL (preserves duplicates); use RemoveDuplicatesTransformer if needed

Related Transformers

  • RemoveDuplicatesTransformer - For removing duplicates after UNION ALL
  • AggregatorStageTransformer - For grouping/aggregating combined data
  • ModifyStageTransformer - For column filtering after combining