LookupStageTransformer
Passo a passo de como iniciar sua jornada na Dadosfera
LookupStageTransformer
Overview
Transforms DataStage LOOKUP stages to SQL multi-table JOIN statements. Handles multiple reference tables with single or composite join keys and optional column mappings (view clause).
Stage Type: LOOKUP (lookup operator)
Output: SQL SELECT with multiple JOINs
Capabilities
- Multiple reference table lookups (1 primary + N reference tables)
- Single and composite join keys
- View mappings for column renames (e.g.,
MATCH_KEY=REF_KEY) - LEFT JOIN and INNER JOIN support
- Column filtering via keep clause
- Automatic duplicate column renaming (adds
_refNsuffix)
DataStage Stage Example
Input (.dsx format)
#### STAGE: LookupData
## Operator
lookup
## Operator options
-table
-key ID_NUMBER
-ifNotFound continue
-table
-key CATEGORY_CODE
-ifNotFound continue
-table
-key REGION_CODE
-key STATUS_CODE
-ifNotFound continue
## General options
[ident('LookupData'); jobmon_ident('LookupData')]
## Inputs
0< [] 'PrimaryData:input.v'
1< [view (ID_NUMBER=MATCH_ID;)] 'RefTable1:ref1.v'
2< [] 'RefTable2:ref2.v'
3< [] 'RefTable3:ref3.v'
## Outputs
0> [modify (
keep
RECORD_ID,ID_NUMBER,STATUS_CODE,
CATEGORY_CODE,REGION_CODE,DATE_FIELD,
AMOUNT,REF1_VALUE,REF2_DESC,
REF3_NAME,REF3_CODE;
)] 'LookupData:Output.v'
Input Schemas
Primary Input (PrimaryData):
[
{"name": "RECORD_ID", "type": "string"},
{"name": "ID_NUMBER", "type": "string"},
{"name": "STATUS_CODE", "type": "string"},
{"name": "CATEGORY_CODE", "type": "string"},
{"name": "REGION_CODE", "type": "string"},
{"name": "DATE_FIELD", "type": "date"},
{"name": "AMOUNT", "type": "number"}
]Reference Input 1 (RefTable1):
[
{"name": "MATCH_ID", "type": "string"},
{"name": "REF1_VALUE", "type": "string"}
]Reference Input 2 (RefTable2):
[
{"name": "CATEGORY_CODE", "type": "string"},
{"name": "REF2_DESC", "type": "string"}
]Reference Input 3 (RefTable3):
[
{"name": "REGION_CODE", "type": "string"},
{"name": "STATUS_CODE", "type": "string"},
{"name": "REF3_NAME", "type": "string"},
{"name": "REF3_CODE", "type": "string"}
]Generated SQL Output
SELECT
a.RECORD_ID,
a.ID_NUMBER,
a.STATUS_CODE,
a.CATEGORY_CODE,
a.REGION_CODE,
a.DATE_FIELD,
a.AMOUNT,
ref1.REF1_VALUE,
ref2.REF2_DESC,
ref3.REF3_NAME,
ref3.REF3_CODE
FROM primarydata_V0S5 AS a
LEFT JOIN reftable1_V0S12 AS ref1
ON a.ID_NUMBER = ref1.MATCH_ID
LEFT JOIN reftable2_V0S18 AS ref2
ON a.CATEGORY_CODE = ref2.CATEGORY_CODE
LEFT JOIN reftable3_V0S24 AS ref3
ON a.REGION_CODE = ref3.REGION_CODE
AND a.STATUS_CODE = ref3.STATUS_CODEOutput Schema
[
{"name": "RECORD_ID", "type": "string"},
{"name": "ID_NUMBER", "type": "string"},
{"name": "STATUS_CODE", "type": "string"},
{"name": "CATEGORY_CODE", "type": "string"},
{"name": "REGION_CODE", "type": "string"},
{"name": "DATE_FIELD", "type": "date"},
{"name": "AMOUNT", "type": "number"},
{"name": "REF1_VALUE", "type": "string"},
{"name": "REF2_DESC", "type": "string"},
{"name": "REF3_NAME", "type": "string"},
{"name": "REF3_CODE", "type": "string"}
]How It Works
1. Identify Primary Input
The first input (index 0) is the primary dataset, aliased as a:
0< [] 'PrimaryData:input.v' ← Primary input
2. Parse Reference Tables
Each -table marker starts a new reference table configuration:
-table ← Reference 1
-key ID_NUMBER
-ifNotFound continue
-table ← Reference 2
-key CATEGORY_CODE
-ifNotFound continue
3. Extract Join Keys
Parses -key options for each reference table:
-key ID_NUMBER ← Single key
For composite keys:
-key REGION_CODE ← Composite key (multiple conditions)
-key STATUS_CODE
4. Apply View Mappings
View clause in input definition maps column names:
1< [view (ID_NUMBER=MATCH_ID;)] 'RefTable1:ref1.v'
Means: Join primary.ID_NUMBER = ref1.MATCH_ID
Without view mapping, column names must match exactly.
5. Determine Join Type
-ifNotFound continue→LEFT JOIN(NULL if no match)-ifNotFound fail→INNER JOIN(skip rows with no match)
6. Apply Keep Clause
Filters output columns based on keep list in modify section:
keep
RECORD_ID,ID_NUMBER,STATUS_CODE,
REF1_VALUE,REF2_DESC,REF3_NAME;
Only these columns appear in SELECT.
7. Handle Duplicate Columns
If both primary and reference have the same column name:
- First occurrence keeps original name
- Subsequent occurrences get
_refNsuffix
Example:
-- Both primary and ref3 have STATUS_CODE
a.STATUS_CODE, -- From primary
ref3.STATUS_CODE AS STATUS_CODE_ref3 -- From ref3 (renamed)View Mappings
View mappings allow column renames in join conditions:
Without View Mapping
1< [] 'RefTable:ref.v'
-key CUSTOMER_ID
Generates:
ON a.CUSTOMER_ID = ref1.CUSTOMER_IDWith View Mapping
1< [view (CUSTOMER_ID=CSP;)] 'RefTable:ref.v'
-key CUSTOMER_ID
Generates:
ON a.CUSTOMER_ID = ref1.CSPThe syntax is: primary_col=reference_col
Composite Keys
Multiple -key options for a single reference table create composite join conditions:
Input
-table
-key REGION_CODE
-key PERIOD_CODE
-ifNotFound continue
Generated SQL
LEFT JOIN reference_table AS ref1
ON a.REGION_CODE = ref1.REGION_CODE
AND a.PERIOD_CODE = ref1.PERIOD_CODEJoin Type Determination
LEFT JOIN (continue)
-ifNotFound continue
Returns all primary rows; NULL for unmatched reference columns.
INNER JOIN (fail)
-ifNotFound fail
Returns only rows with matches in all reference tables.
Multiple Reference Tables
The LOOKUP stage supports many reference tables (tested with 13+ references):
## Inputs
0< [] 'Primary:input.v' ← Primary
1< [] 'Ref1:ref1.v' ← Reference 1
2< [] 'Ref2:ref2.v' ← Reference 2
3< [] 'Ref3:ref3.v' ← Reference 3
...
N< [] 'RefN:refN.v' ← Reference N
Each reference gets its own -table section in operator options.
Limitations
- ⚠️ Order matters - Reference tables must appear in operator_options in the same order as input links
- ⚠️ No complex join conditions - Only equality joins supported (no inequalities or functions)
- ⚠️ View mappings - Only supports simple column renames, not expressions
- ⚠️ Keep clause required - Without keep clause, all columns from all tables are included (can cause conflicts)
Related Transformers
- JoinStageTransformer - For joining two datasets with explicit join syntax
- ModifyStageTransformer - For post-lookup column filtering
- TransformerStageTransformer - For post-lookup column derivations
Updated 5 days ago
