TransformerStageTransformer
Overview
Transforms DataStage TRANSFORMER stages into SQL SELECT statements. This is a hybrid transformer: it handles simple column passthroughs deterministically and uses an LLM to translate complex C transformation code.
Stage Type: TRANSFORMER
Output: SQL SELECT with column derivations
Capabilities
Deterministic (No LLM)
- Simple column passthroughs (LinkName.ColumnName pattern)
- Column renames (output_col = input_col)
- Constraints (WHERE filters)
- Null checks and basic conditions
- Single output link transformations
Hybrid (Uses LLM)
- Complex C transformation code
- DataStage BASIC expressions
- Custom derivations with functions
- Uses GPT-5-nano for fast translation
- Parallel LLM calls for multiple derivations
DataStage Stage Example
Input (.dsx format)
Orchestrate Section:
#### STAGE: TransformData
## Operator
transform
## Operator options
-flag run
-name 'V0S107_Transform_Data'
-oldnullhandling
## General options
[ident('TransformData'); jobmon_ident('TransformData')]
## Inputs
0< [] 'SourceData:In_Transform.v'
## Outputs
0> [] 'TransformData:Output_Link.v'
C Transformation Code (in DSRECORD):
// define our input/output link names
inputname 0 In_Transform;
outputname 0 Output_Link;
initialize {
// define our control variables
int8 RowRejected0;
int8 NullSetVar0;
}
mainloop {
// initialise the rejected row variable
RowRejected0 = 1;
// evaluate columns (no constraints) for link: Output_Link
Output_Link.ID_FIELD = In_Transform.ID_COL;
Output_Link.KEY_1 = In_Transform.KEY_FIELD_1;
Output_Link.KEY_2 = In_Transform.KEY_FIELD_2;
Output_Link.KEY_3 = In_Transform.KEY_FIELD_3;
Output_Link.DESC_FIELD = In_Transform.DESCRIPTION;
Output_Link.VALUE_FIELD = In_Transform.VALUE_COL;
writerecord 0;
RowRejected0 = 0;
}
finish {
}
Generated SQL Output
SELECT
ID_COL AS ID_FIELD,
KEY_FIELD_1 AS KEY_1,
KEY_FIELD_2 AS KEY_2,
KEY_FIELD_3 AS KEY_3,
DESCRIPTION AS DESC_FIELD,
VALUE_COL AS VALUE_FIELD
FROM sourcedata_V0S8
Output Schema
[
{"name": "ID_FIELD", "type": "string"},
{"name": "KEY_1", "type": "number"},
{"name": "KEY_2", "type": "number"},
{"name": "KEY_3", "type": "number"},
{"name": "DESC_FIELD", "type": "string"},
{"name": "VALUE_FIELD", "type": "number"}
]
How It Works
1. Parse C Transformation Code
Extracts the mainloop section containing derivation logic:
Output_Link.ID_FIELD = In_Transform.ID_COL;
2. Detect Pattern Type
- Simple assignment (deterministic): output.col = input.col
- Column rename (deterministic): output.new_name = input.old_name
- Complex expression (LLM): output.col = func(input.col1, input.col2)
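Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names (extract_mainloop, classify_derivations), the regular expressions, and the "passthrough" / "rename" / "llm" labels are hypothetical and are not taken from the transformer's actual implementation. It assumes one derivation per line, as in the example above.

```python
import re

# Hypothetical patterns for this sketch: "Link.Column = <expression>;" and "Link.Column".
ASSIGNMENT_RE = re.compile(r"^(\w+)\.(\w+)\s*=\s*(.+);$")
COLUMN_REF_RE = re.compile(r"^(\w+)\.(\w+)$")


def extract_mainloop(code: str) -> str:
    """Return the text between the braces of the mainloop block."""
    start = code.index("mainloop")
    open_brace = code.index("{", start)
    depth = 0
    for pos in range(open_brace, len(code)):
        if code[pos] == "{":
            depth += 1
        elif code[pos] == "}":
            depth -= 1
            if depth == 0:
                return code[open_brace + 1:pos]
    raise ValueError("unbalanced braces in mainloop")


def classify_derivations(mainloop: str, output_link: str) -> list:
    """Label each assignment to output_link as passthrough, rename, or llm."""
    results = []
    for raw in mainloop.splitlines():
        line = raw.strip()
        match = ASSIGNMENT_RE.match(line)
        if not match or match.group(1) != output_link:
            continue  # comments, control variables, writerecord, etc.
        target, expr = match.group(2), match.group(3).strip()
        ref = COLUMN_REF_RE.match(expr)
        if ref is None:
            kind = "llm"  # anything beyond a plain column reference
        elif ref.group(2) == target:
            kind = "passthrough"
        else:
            kind = "rename"
        results.append({"output": target, "expression": expr, "kind": kind})
    return results
```

Applied to the example stage, every derivation would be labelled "rename", so no LLM call would be needed for it.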
3. Generate SQL
For simple assignments:
SELECT
input_col AS output_col
FROM upstream_stage
For complex expressions (requires LLM):
SELECT
<translated_expression> AS output_col
FROM upstream_stage
Deterministic vs LLM Decision
| C Code Pattern | Method | Example |
|---|---|---|
| output.col = input.col | Deterministic | ID_FIELD = ID_COL |
| output.col = input.col1 | Deterministic (rename) | KEY_1 = KEY_FIELD_1 |
| output.col = SetNull() | Deterministic | NULL AS col |
| if (condition) ... | Deterministic (WHERE) | Converted to WHERE clause |
| output.col = Trim(input.col) | LLM | Requires expression translation |
| output.col = func(...) | LLM | Custom DataStage functions |
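The deterministic rows of the table can be illustrated with a small Python sketch that turns derivations into a SELECT statement. The names build_select, select_item, and NeedsLLM are hypothetical and chosen for this example only; the sketch covers plain column references and SetNull(), and it deliberately leaves constraints (WHERE) and every LLM-translated expression out of scope.

```python
import re

# Hypothetical pattern: a plain "Link.Column" reference.
COLUMN_REF_RE = re.compile(r"^\w+\.(\w+)$")


class NeedsLLM(Exception):
    """Raised for expressions this deterministic sketch cannot translate."""


def select_item(output_col: str, expression: str) -> str:
    """Translate one derivation into a SELECT list item."""
    expression = expression.strip()
    if expression == "SetNull()":
        return f"NULL AS {output_col}"
    ref = COLUMN_REF_RE.match(expression)
    if ref is None:
        raise NeedsLLM(expression)  # e.g. Trim(...) or a custom function
    input_col = ref.group(1)
    # A passthrough keeps its name; a rename needs an alias.
    return input_col if input_col == output_col else f"{input_col} AS {output_col}"


def build_select(derivations, upstream: str) -> str:
    """derivations is a list of (output_col, expression) pairs."""
    items = [select_item(out, expr) for out, expr in derivations]
    return "SELECT\n  " + ",\n  ".join(items) + f"\nFROM {upstream}"
```

Raising an exception for anything that is not a plain column reference keeps the deterministic path strict, so the caller can route only those expressions to the LLM.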
Performance Optimization
When LLM is required:
- Uses GPT-5-nano for fast, cheap translations
- Parallel calls for multiple derivations
- Caches common expression patterns
- Falls back to LLM-free passthrough if translation fails
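The parallel-call and caching behaviour listed above could look roughly like the following Python sketch. It is an assumption-laden illustration: CachedTranslator and translate_many are hypothetical names, the translate callable stands in for whatever LLM client is actually used, and a failed translation is simply reported as None so that the caller can apply its own fallback, since the exact fallback behaviour is not specified here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


class CachedTranslator:
    """Translate expressions in parallel and remember successful results."""

    def __init__(self, translate: Callable[[str], str], max_workers: int = 4):
        self._translate = translate  # hypothetical stand-in for the LLM call
        self._cache: Dict[str, str] = {}
        self._max_workers = max_workers

    def _safe(self, expression: str) -> Optional[str]:
        try:
            return self._translate(expression)
        except Exception:
            return None  # signal failure; the caller decides how to fall back

    def translate_many(self, expressions: List[str]) -> Dict[str, Optional[str]]:
        # Deduplicate while preserving order, and skip anything already cached.
        missing = [e for e in dict.fromkeys(expressions) if e not in self._cache]
        with ThreadPoolExecutor(max_workers=self._max_workers) as pool:
            for expr, sql in zip(missing, pool.map(self._safe, missing)):
                if sql is not None:
                    self._cache[expr] = sql
        return {e: self._cache.get(e) for e in expressions}
```

Threads fit this case because each call mostly waits on the network; only successful translations are cached, so a transient failure can be retried on the next run.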
Limitations
- ⚠️ Multiple outputs not supported - Transformer stages with multiple output links (e.g., Output_Link1, Output_Link2) cannot be converted. Each transformer must have exactly one output link.
- ⚠️ Transform parameters (argruns) not supported - Transformer stages that use runtime parameters via argrun options cannot be converted deterministically
- Constraints - Only simple constraints converted to WHERE; complex logic ignored
- Stage variables - Not supported (requires LLM or manual review)
- Loop logic - For/while loops not supported
- Reject links - Ignored (all rows assumed to pass)
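Some of these limitations can be detected before conversion is attempted. The Python sketch below is a hypothetical pre-check, not part of the transformer itself: find_unsupported is an invented name, it relies on the outputname declaration syntax shown in the example stage, and it only covers the single-output rule and for/while loops. Detecting argrun options, stage variables, or reject links is not attempted.

```python
import re


def find_unsupported(code: str) -> list:
    """Return human-readable reasons why this C transformation code may not convert."""
    # Drop // comments so that words inside them do not trigger false positives.
    stripped = re.sub(r"//.*", "", code)
    issues = []

    # Each output link is declared as: outputname <index> <LinkName>;
    outputs = re.findall(r"^\s*outputname\s+\d+\s+(\w+)\s*;", stripped, re.MULTILINE)
    if len(outputs) != 1:
        issues.append(f"expected exactly one output link, found {len(outputs)}")

    if re.search(r"\b(?:for|while)\s*\(", stripped):
        issues.append("loop logic (for/while) is not supported")

    return issues
```

An empty list means none of the checked limitations were found; it does not guarantee that the stage converts cleanly.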
Workarounds
Multiple Outputs
If a transformer has multiple outputs, split into separate stages:
- Original: 1 transformer → 2 outputs
- Workaround: 2 transformers → 1 output each
Transform Parameters
If a transformer uses argrun parameters:
- Replace with job parameters ([&"param"])
- Or hardcode values if they don't change
Related Transformers
- CopyStageTransformer - For simple dataset reads without transformation
- ModifyStageTransformer - For column renames without C code
- ContainerTransformer (Tableau) - Similar hybrid approach for Tableau nodes
