JoinStageTransformer

Passo a passo de como iniciar sua jornada na Dadosfera

JoinStageTransformer

Overview

Transforms DataStage JOIN stages to SQL JOIN statements. Supports LEFT, INNER, RIGHT, and FULL OUTER joins with single or multiple join keys.

Stage Type: JOIN (leftouterjoin, innerjoin, rightouterjoin, fullouterjoin operators) Output: SQL SELECT with JOIN

Capabilities

  • LEFT OUTER JOIN, INNER JOIN, RIGHT JOIN, FULL OUTER JOIN
  • Single and composite join keys
  • Column filtering via keep clause
  • Automatic duplicate column renaming (adds suffix)
  • Handles null-safe joins

DataStage Stage Example

Input (.dsx format)

#### STAGE: JoinRecords
## Operator
leftouterjoin
## Operator options
-key 'ID_NUMBER'
## General options
[ident('JoinRecords'); jobmon_ident('JoinRecords')]
## Inputs
0< [] 'SourceData:Sorted.v'
1< [] 'DimensionData:Sorted.v'
## Outputs
0> [-pp; modify (
  PLAN_KEY:nullable string[max=8]=PLAN_KEY;
keep
  RECORD_ID,STATUS_CODE,TYPE_VALUE,
  ID_NUMBER,CATEGORY_CODE,DATE_FIELD,
  TIME_FIELD,AMOUNT,REGION_CODE,
  CONTRACT_KEY,PLAN_KEY,ACCOUNT_KEY,
  CUSTOMER_KEY;
)] 'JoinRecords:Output.v'

Input Schemas

Left Input (SourceData):

[
  {"name": "RECORD_ID", "type": "string"},
  {"name": "STATUS_CODE", "type": "string"},
  {"name": "TYPE_VALUE", "type": "string"},
  {"name": "ID_NUMBER", "type": "string"},
  {"name": "CATEGORY_CODE", "type": "string"},
  {"name": "DATE_FIELD", "type": "date"},
  {"name": "TIME_FIELD", "type": "time"},
  {"name": "AMOUNT", "type": "number"}
]

Right Input (DimensionData):

[
  {"name": "ID_NUMBER", "type": "string"},
  {"name": "CONTRACT_KEY", "type": "number"},
  {"name": "PLAN_KEY", "type": "number"},
  {"name": "ACCOUNT_KEY", "type": "number"},
  {"name": "CUSTOMER_KEY", "type": "number"},
  {"name": "REGION_CODE", "type": "string"}
]

Generated SQL Output

SELECT
  sourcedata_V0S3.RECORD_ID,
  sourcedata_V0S3.STATUS_CODE,
  sourcedata_V0S3.TYPE_VALUE,
  sourcedata_V0S3.ID_NUMBER,
  sourcedata_V0S3.CATEGORY_CODE,
  sourcedata_V0S3.DATE_FIELD,
  sourcedata_V0S3.TIME_FIELD,
  sourcedata_V0S3.AMOUNT,
  dimensiondata_V0S10.REGION_CODE,
  dimensiondata_V0S10.CONTRACT_KEY,
  dimensiondata_V0S10.PLAN_KEY,
  dimensiondata_V0S10.ACCOUNT_KEY,
  dimensiondata_V0S10.CUSTOMER_KEY
FROM sourcedata_V0S3
LEFT OUTER JOIN dimensiondata_V0S10
  ON sourcedata_V0S3.ID_NUMBER = dimensiondata_V0S10.ID_NUMBER

Output Schema

[
  {"name": "RECORD_ID", "type": "string"},
  {"name": "STATUS_CODE", "type": "string"},
  {"name": "TYPE_VALUE", "type": "string"},
  {"name": "ID_NUMBER", "type": "string"},
  {"name": "CATEGORY_CODE", "type": "string"},
  {"name": "DATE_FIELD", "type": "date"},
  {"name": "TIME_FIELD", "type": "time"},
  {"name": "AMOUNT", "type": "number"},
  {"name": "REGION_CODE", "type": "string"},
  {"name": "CONTRACT_KEY", "type": "number"},
  {"name": "PLAN_KEY", "type": "number"},
  {"name": "ACCOUNT_KEY", "type": "number"},
  {"name": "CUSTOMER_KEY", "type": "number"}
]

How It Works

1. Identify Join Type

Maps DataStage operator to SQL join type:

  • leftouterjoinLEFT OUTER JOIN
  • innerjoinINNER JOIN
  • rightouterjoinRIGHT OUTER JOIN
  • fullouterjoinFULL OUTER JOIN

2. Extract Join Keys

Parses the -key option to identify join columns:

-key 'ID_NUMBER'

Supports multiple keys:

-key 'FIELD1,FIELD2'

3. Build JOIN Condition

Creates ON clause with equality conditions:

ON left.ID_NUMBER = right.ID_NUMBER

For composite keys:

ON left.KEY1 = right.KEY1
   AND left.KEY2 = right.KEY2

4. Apply Keep Clause

Filters output columns based on keep list in modify section.

5. Handle Duplicate Columns

If both inputs have the same column name:

  • First occurrence keeps original name
  • Second occurrence gets _right suffix

Example:

-- Both tables have ID_NUMBER
sourcedata.ID_NUMBER,           -- From left
dimensiondata.ID_NUMBER AS ID_NUMBER_right  -- From right (renamed)

Join Types

LEFT OUTER JOIN

-operator leftouterjoin

Returns all rows from left input, matching rows from right (NULL if no match).

INNER JOIN

-operator innerjoin

Returns only rows with matches in both inputs.

RIGHT OUTER JOIN

-operator rightouterjoin

Returns all rows from right input, matching rows from left (NULL if no match).

FULL OUTER JOIN

-operator fullouterjoin

Returns all rows from both inputs (NULL where no match).

Composite Keys Example

Input

-key 'REGION_CODE,PERIOD_CODE'

Generated SQL

SELECT ...
FROM left_table
LEFT OUTER JOIN right_table
  ON left_table.REGION_CODE = right_table.REGION_CODE
     AND left_table.PERIOD_CODE = right_table.PERIOD_CODE

Limitations

  • ⚠️ Sort requirement - DataStage JOIN stages require pre-sorted inputs; Snowflake SQL does not require sorting
  • ⚠️ Partitioning ignored - DataStage partition/sort operators before JOIN are not converted
  • ⚠️ Keep clause required - Output columns must be explicitly listed in keep clause
  • ⚠️ No complex join conditions - Only equality joins supported (no inequalities or functions)

Related Transformers

  • LookupStageTransformer - Similar to JOIN but optimized for multiple reference tables
  • TransformerStageTransformer - For post-join column derivations
  • ModifyStageTransformer - For post-join column filtering