JoinStageTransformer
Passo a passo de como iniciar sua jornada na Dadosfera
JoinStageTransformer
Overview
Transforms DataStage JOIN stages to SQL JOIN statements. Supports LEFT, INNER, RIGHT, and FULL OUTER joins with single or multiple join keys.
Stage Type: JOIN (leftouterjoin, innerjoin, rightouterjoin, fullouterjoin operators)
Output: SQL SELECT with JOIN
Capabilities
- LEFT OUTER JOIN, INNER JOIN, RIGHT JOIN, FULL OUTER JOIN
- Single and composite join keys
- Column filtering via keep clause
- Automatic duplicate column renaming (adds suffix)
- Handles null-safe joins
DataStage Stage Example
Input (.dsx format)
#### STAGE: JoinRecords
## Operator
leftouterjoin
## Operator options
-key 'ID_NUMBER'
## General options
[ident('JoinRecords'); jobmon_ident('JoinRecords')]
## Inputs
0< [] 'SourceData:Sorted.v'
1< [] 'DimensionData:Sorted.v'
## Outputs
0> [-pp; modify (
PLAN_KEY:nullable string[max=8]=PLAN_KEY;
keep
RECORD_ID,STATUS_CODE,TYPE_VALUE,
ID_NUMBER,CATEGORY_CODE,DATE_FIELD,
TIME_FIELD,AMOUNT,REGION_CODE,
CONTRACT_KEY,PLAN_KEY,ACCOUNT_KEY,
CUSTOMER_KEY;
)] 'JoinRecords:Output.v'
Input Schemas
Left Input (SourceData):
[
{"name": "RECORD_ID", "type": "string"},
{"name": "STATUS_CODE", "type": "string"},
{"name": "TYPE_VALUE", "type": "string"},
{"name": "ID_NUMBER", "type": "string"},
{"name": "CATEGORY_CODE", "type": "string"},
{"name": "DATE_FIELD", "type": "date"},
{"name": "TIME_FIELD", "type": "time"},
{"name": "AMOUNT", "type": "number"}
]Right Input (DimensionData):
[
{"name": "ID_NUMBER", "type": "string"},
{"name": "CONTRACT_KEY", "type": "number"},
{"name": "PLAN_KEY", "type": "number"},
{"name": "ACCOUNT_KEY", "type": "number"},
{"name": "CUSTOMER_KEY", "type": "number"},
{"name": "REGION_CODE", "type": "string"}
]Generated SQL Output
SELECT
sourcedata_V0S3.RECORD_ID,
sourcedata_V0S3.STATUS_CODE,
sourcedata_V0S3.TYPE_VALUE,
sourcedata_V0S3.ID_NUMBER,
sourcedata_V0S3.CATEGORY_CODE,
sourcedata_V0S3.DATE_FIELD,
sourcedata_V0S3.TIME_FIELD,
sourcedata_V0S3.AMOUNT,
dimensiondata_V0S10.REGION_CODE,
dimensiondata_V0S10.CONTRACT_KEY,
dimensiondata_V0S10.PLAN_KEY,
dimensiondata_V0S10.ACCOUNT_KEY,
dimensiondata_V0S10.CUSTOMER_KEY
FROM sourcedata_V0S3
LEFT OUTER JOIN dimensiondata_V0S10
ON sourcedata_V0S3.ID_NUMBER = dimensiondata_V0S10.ID_NUMBEROutput Schema
[
{"name": "RECORD_ID", "type": "string"},
{"name": "STATUS_CODE", "type": "string"},
{"name": "TYPE_VALUE", "type": "string"},
{"name": "ID_NUMBER", "type": "string"},
{"name": "CATEGORY_CODE", "type": "string"},
{"name": "DATE_FIELD", "type": "date"},
{"name": "TIME_FIELD", "type": "time"},
{"name": "AMOUNT", "type": "number"},
{"name": "REGION_CODE", "type": "string"},
{"name": "CONTRACT_KEY", "type": "number"},
{"name": "PLAN_KEY", "type": "number"},
{"name": "ACCOUNT_KEY", "type": "number"},
{"name": "CUSTOMER_KEY", "type": "number"}
]How It Works
1. Identify Join Type
Maps DataStage operator to SQL join type:
leftouterjoin→LEFT OUTER JOINinnerjoin→INNER JOINrightouterjoin→RIGHT OUTER JOINfullouterjoin→FULL OUTER JOIN
2. Extract Join Keys
Parses the -key option to identify join columns:
-key 'ID_NUMBER'
Supports multiple keys:
-key 'FIELD1,FIELD2'
3. Build JOIN Condition
Creates ON clause with equality conditions:
ON left.ID_NUMBER = right.ID_NUMBERFor composite keys:
ON left.KEY1 = right.KEY1
AND left.KEY2 = right.KEY24. Apply Keep Clause
Filters output columns based on keep list in modify section.
5. Handle Duplicate Columns
If both inputs have the same column name:
- First occurrence keeps original name
- Second occurrence gets
_rightsuffix
Example:
-- Both tables have ID_NUMBER
sourcedata.ID_NUMBER, -- From left
dimensiondata.ID_NUMBER AS ID_NUMBER_right -- From right (renamed)Join Types
LEFT OUTER JOIN
-operator leftouterjoin
Returns all rows from left input, matching rows from right (NULL if no match).
INNER JOIN
-operator innerjoin
Returns only rows with matches in both inputs.
RIGHT OUTER JOIN
-operator rightouterjoin
Returns all rows from right input, matching rows from left (NULL if no match).
FULL OUTER JOIN
-operator fullouterjoin
Returns all rows from both inputs (NULL where no match).
Composite Keys Example
Input
-key 'REGION_CODE,PERIOD_CODE'
Generated SQL
SELECT ...
FROM left_table
LEFT OUTER JOIN right_table
ON left_table.REGION_CODE = right_table.REGION_CODE
AND left_table.PERIOD_CODE = right_table.PERIOD_CODELimitations
- ⚠️ Sort requirement - DataStage JOIN stages require pre-sorted inputs; Snowflake SQL does not require sorting
- ⚠️ Partitioning ignored - DataStage partition/sort operators before JOIN are not converted
- ⚠️ Keep clause required - Output columns must be explicitly listed in keep clause
- ⚠️ No complex join conditions - Only equality joins supported (no inequalities or functions)
Related Transformers
- LookupStageTransformer - Similar to JOIN but optimized for multiple reference tables
- TransformerStageTransformer - For post-join column derivations
- ModifyStageTransformer - For post-join column filtering
Updated 5 days ago
