LookupStageTransformer

Passo a passo de como iniciar sua jornada na Dadosfera

LookupStageTransformer

Overview

Transforms DataStage LOOKUP stages to SQL multi-table JOIN statements. Handles multiple reference tables with single or composite join keys and optional column mappings (view clause).

Stage Type: LOOKUP (lookup operator) Output: SQL SELECT with multiple JOINs

Capabilities

  • Multiple reference table lookups (1 primary + N reference tables)
  • Single and composite join keys
  • View mappings for column renames (e.g., MATCH_KEY=REF_KEY)
  • LEFT JOIN and INNER JOIN support
  • Column filtering via keep clause
  • Automatic duplicate column renaming (adds _refN suffix)

DataStage Stage Example

Input (.dsx format)

#### STAGE: LookupData
## Operator
lookup
## Operator options
-table
-key ID_NUMBER
-ifNotFound continue
-table
-key CATEGORY_CODE
-ifNotFound continue
-table
-key REGION_CODE
-key STATUS_CODE
-ifNotFound continue
## General options
[ident('LookupData'); jobmon_ident('LookupData')]
## Inputs
0< [] 'PrimaryData:input.v'
1< [view (ID_NUMBER=MATCH_ID;)] 'RefTable1:ref1.v'
2< [] 'RefTable2:ref2.v'
3< [] 'RefTable3:ref3.v'
## Outputs
0> [modify (
keep
  RECORD_ID,ID_NUMBER,STATUS_CODE,
  CATEGORY_CODE,REGION_CODE,DATE_FIELD,
  AMOUNT,REF1_VALUE,REF2_DESC,
  REF3_NAME,REF3_CODE;
)] 'LookupData:Output.v'

Input Schemas

Primary Input (PrimaryData):

[
  {"name": "RECORD_ID", "type": "string"},
  {"name": "ID_NUMBER", "type": "string"},
  {"name": "STATUS_CODE", "type": "string"},
  {"name": "CATEGORY_CODE", "type": "string"},
  {"name": "REGION_CODE", "type": "string"},
  {"name": "DATE_FIELD", "type": "date"},
  {"name": "AMOUNT", "type": "number"}
]

Reference Input 1 (RefTable1):

[
  {"name": "MATCH_ID", "type": "string"},
  {"name": "REF1_VALUE", "type": "string"}
]

Reference Input 2 (RefTable2):

[
  {"name": "CATEGORY_CODE", "type": "string"},
  {"name": "REF2_DESC", "type": "string"}
]

Reference Input 3 (RefTable3):

[
  {"name": "REGION_CODE", "type": "string"},
  {"name": "STATUS_CODE", "type": "string"},
  {"name": "REF3_NAME", "type": "string"},
  {"name": "REF3_CODE", "type": "string"}
]

Generated SQL Output

SELECT
  a.RECORD_ID,
  a.ID_NUMBER,
  a.STATUS_CODE,
  a.CATEGORY_CODE,
  a.REGION_CODE,
  a.DATE_FIELD,
  a.AMOUNT,
  ref1.REF1_VALUE,
  ref2.REF2_DESC,
  ref3.REF3_NAME,
  ref3.REF3_CODE
FROM primarydata_V0S5 AS a
LEFT JOIN reftable1_V0S12 AS ref1
  ON a.ID_NUMBER = ref1.MATCH_ID
LEFT JOIN reftable2_V0S18 AS ref2
  ON a.CATEGORY_CODE = ref2.CATEGORY_CODE
LEFT JOIN reftable3_V0S24 AS ref3
  ON a.REGION_CODE = ref3.REGION_CODE
  AND a.STATUS_CODE = ref3.STATUS_CODE

Output Schema

[
  {"name": "RECORD_ID", "type": "string"},
  {"name": "ID_NUMBER", "type": "string"},
  {"name": "STATUS_CODE", "type": "string"},
  {"name": "CATEGORY_CODE", "type": "string"},
  {"name": "REGION_CODE", "type": "string"},
  {"name": "DATE_FIELD", "type": "date"},
  {"name": "AMOUNT", "type": "number"},
  {"name": "REF1_VALUE", "type": "string"},
  {"name": "REF2_DESC", "type": "string"},
  {"name": "REF3_NAME", "type": "string"},
  {"name": "REF3_CODE", "type": "string"}
]

How It Works

1. Identify Primary Input

The first input (index 0) is the primary dataset, aliased as a:

0< [] 'PrimaryData:input.v'  ← Primary input

2. Parse Reference Tables

Each -table marker starts a new reference table configuration:

-table                    ← Reference 1
-key ID_NUMBER
-ifNotFound continue

-table                    ← Reference 2
-key CATEGORY_CODE
-ifNotFound continue

3. Extract Join Keys

Parses -key options for each reference table:

-key ID_NUMBER           ← Single key

For composite keys:

-key REGION_CODE         ← Composite key (multiple conditions)
-key STATUS_CODE

4. Apply View Mappings

View clause in input definition maps column names:

1< [view (ID_NUMBER=MATCH_ID;)] 'RefTable1:ref1.v'

Means: Join primary.ID_NUMBER = ref1.MATCH_ID

Without view mapping, column names must match exactly.

5. Determine Join Type

  • -ifNotFound continueLEFT JOIN (NULL if no match)
  • -ifNotFound failINNER JOIN (skip rows with no match)

6. Apply Keep Clause

Filters output columns based on keep list in modify section:

keep
  RECORD_ID,ID_NUMBER,STATUS_CODE,
  REF1_VALUE,REF2_DESC,REF3_NAME;

Only these columns appear in SELECT.

7. Handle Duplicate Columns

If both primary and reference have the same column name:

  • First occurrence keeps original name
  • Subsequent occurrences get _refN suffix

Example:

-- Both primary and ref3 have STATUS_CODE
a.STATUS_CODE,              -- From primary
ref3.STATUS_CODE AS STATUS_CODE_ref3  -- From ref3 (renamed)

View Mappings

View mappings allow column renames in join conditions:

Without View Mapping

1< [] 'RefTable:ref.v'
-key CUSTOMER_ID

Generates:

ON a.CUSTOMER_ID = ref1.CUSTOMER_ID

With View Mapping

1< [view (CUSTOMER_ID=CSP;)] 'RefTable:ref.v'
-key CUSTOMER_ID

Generates:

ON a.CUSTOMER_ID = ref1.CSP

The syntax is: primary_col=reference_col

Composite Keys

Multiple -key options for a single reference table create composite join conditions:

Input

-table
-key REGION_CODE
-key PERIOD_CODE
-ifNotFound continue

Generated SQL

LEFT JOIN reference_table AS ref1
  ON a.REGION_CODE = ref1.REGION_CODE
  AND a.PERIOD_CODE = ref1.PERIOD_CODE

Join Type Determination

LEFT JOIN (continue)

-ifNotFound continue

Returns all primary rows; NULL for unmatched reference columns.

INNER JOIN (fail)

-ifNotFound fail

Returns only rows with matches in all reference tables.

Multiple Reference Tables

The LOOKUP stage supports many reference tables (tested with 13+ references):

## Inputs
0< [] 'Primary:input.v'      ← Primary
1< [] 'Ref1:ref1.v'           ← Reference 1
2< [] 'Ref2:ref2.v'           ← Reference 2
3< [] 'Ref3:ref3.v'           ← Reference 3
...
N< [] 'RefN:refN.v'           ← Reference N

Each reference gets its own -table section in operator options.

Limitations

  • ⚠️ Order matters - Reference tables must appear in operator_options in the same order as input links
  • ⚠️ No complex join conditions - Only equality joins supported (no inequalities or functions)
  • ⚠️ View mappings - Only supports simple column renames, not expressions
  • ⚠️ Keep clause required - Without keep clause, all columns from all tables are included (can cause conflicts)

Related Transformers

  • JoinStageTransformer - For joining two datasets with explicit join syntax
  • ModifyStageTransformer - For post-lookup column filtering
  • TransformerStageTransformer - For post-lookup column derivations