TransformerStageTransformer

Overview

Transforms DataStage TRANSFORMER stages into SQL SELECT statements. This is a hybrid transformer: it handles simple column passthroughs deterministically and uses an LLM to translate complex C transformation code.

Stage Type: TRANSFORMER
Output: SQL SELECT with column derivations

Capabilities

Deterministic (No LLM)

  • Simple column passthroughs (LinkName.ColumnName pattern)
  • Column renames (output_col = input_col)
  • Constraints (WHERE filters)
  • Null checks and basic conditions
  • Single output link transformations

Hybrid (Uses LLM)

  • Complex C transformation code
  • DataStage BASIC expressions
  • Custom derivations with functions
  • Uses GPT-5-nano for fast translation
  • Parallel LLM calls for multiple derivations

DataStage Stage Example

Input (.dsx format)

Orchestrate Section:

#### STAGE: TransformData
## Operator
transform
## Operator options
-flag run
-name 'V0S107_Transform_Data'
-oldnullhandling
## General options
[ident('TransformData'); jobmon_ident('TransformData')]
## Inputs
0< [] 'SourceData:In_Transform.v'
## Outputs
0> [] 'TransformData:Output_Link.v'

C Transformation Code (in DSRECORD):

// define our input/output link names
inputname 0 In_Transform;
outputname 0 Output_Link;

initialize {
  // define our control variables
  int8 RowRejected0;
  int8 NullSetVar0;
}

mainloop {
  // initialise the rejected row variable
  RowRejected0 = 1;

  // evaluate columns (no constraints) for link: Output_Link
  Output_Link.ID_FIELD = In_Transform.ID_COL;
  Output_Link.KEY_1 = In_Transform.KEY_FIELD_1;
  Output_Link.KEY_2 = In_Transform.KEY_FIELD_2;
  Output_Link.KEY_3 = In_Transform.KEY_FIELD_3;
  Output_Link.DESC_FIELD = In_Transform.DESCRIPTION;
  Output_Link.VALUE_FIELD = In_Transform.VALUE_COL;
  writerecord 0;
  RowRejected0 = 0;
}

finish {
}

Generated SQL Output

SELECT
  ID_COL AS ID_FIELD,
  KEY_FIELD_1 AS KEY_1,
  KEY_FIELD_2 AS KEY_2,
  KEY_FIELD_3 AS KEY_3,
  DESCRIPTION AS DESC_FIELD,
  VALUE_COL AS VALUE_FIELD
FROM sourcedata_V0S8

Output Schema

[
  {"name": "ID_FIELD", "type": "string"},
  {"name": "KEY_1", "type": "number"},
  {"name": "KEY_2", "type": "number"},
  {"name": "KEY_3", "type": "number"},
  {"name": "DESC_FIELD", "type": "string"},
  {"name": "VALUE_FIELD", "type": "number"}
]

How It Works

1. Parse C Transformation Code

Extracts the mainloop section containing derivation logic:

Output_Link.ID_FIELD = In_Transform.ID_COL;
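
The extraction step could be sketched roughly as follows. This is a minimal illustration, not the transformer's actual implementation: the function names `extract_mainloop` and `extract_derivations` are hypothetical, and the sketch assumes every derivation is a single-line `Link.Column = expression;` assignment like the ones in the example above.

```python
import re

# Hypothetical helpers for illustration only; not the transformer's real API.

def extract_mainloop(code: str) -> str:
    """Return the body of the `mainloop { ... }` block, or '' if there is none."""
    match = re.search(r"mainloop\s*\{", code)
    if not match:
        return ""
    depth = 1
    start = match.end()
    # Walk forward and track brace depth so nested blocks stay inside the body.
    for i in range(start, len(code)):
        if code[i] == "{":
            depth += 1
        elif code[i] == "}":
            depth -= 1
            if depth == 0:
                return code[start:i]
    return ""

# Matches lines such as: Output_Link.ID_FIELD = In_Transform.ID_COL;
# Control variables (RowRejected0 = 1;) have no link prefix and are skipped.
ASSIGNMENT = re.compile(r"^[ \t]*(\w+)\.(\w+)\s*=\s*(.+?);[ \t]*$", re.MULTILINE)

def extract_derivations(code: str) -> list[tuple[str, str, str]]:
    """Return (output_link, output_column, expression) for each derivation."""
    return [m.groups() for m in ASSIGNMENT.finditer(extract_mainloop(code))]
```

Applied to the C transformation code from the stage example, this would yield one tuple per output column, starting with `("Output_Link", "ID_FIELD", "In_Transform.ID_COL")`.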

2. Detect Pattern Type

  • Simple assignment (deterministic): output.col = input.col
  • Column rename (deterministic): output.new_name = input.old_name
  • Complex expression (LLM): output.col = func(input.col1, input.col2)
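
As a rough sketch of this detection step, the three pattern types can be told apart with one regular expression. The `classify` function and its return labels are assumptions made for illustration; the real detection rules may be stricter or cover more cases.

```python
import re

# A bare `Link.Column` reference with nothing around it.
SIMPLE_REF = re.compile(r"^(\w+)\.(\w+)$")

def classify(output_col: str, expression: str) -> str:
    """Label a derivation as 'passthrough', 'rename', or 'complex' (hypothetical labels)."""
    match = SIMPLE_REF.match(expression.strip())
    if match is None:
        # Anything beyond a plain column reference would need LLM translation.
        return "complex"
    input_col = match.group(2)
    return "passthrough" if input_col == output_col else "rename"
```

For example, `classify("KEY_1", "In_Transform.KEY_FIELD_1")` is labeled a rename, while `classify("NAME", "Trim(In_Transform.NAME)")` is labeled complex.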

3. Generate SQL

For simple assignments:

SELECT
  input_col AS output_col
FROM upstream_stage

For complex expressions (requires LLM):

SELECT
  <translated_expression> AS output_col
FROM upstream_stage
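
Assembling the final statement from already-resolved derivations could look like the sketch below. `build_select` is a hypothetical helper, and it assumes each expression has already been reduced to valid SQL (a bare input column for deterministic cases, or the translated expression for LLM cases).

```python
def build_select(derivations: list[tuple[str, str]], upstream_table: str) -> str:
    """Build a SELECT from (output_column, sql_expression) pairs.

    Illustrative only: identifiers are not quoted or validated here.
    """
    if not derivations:
        raise ValueError("a transformer needs at least one output column")
    columns = ",\n".join(f"  {expr} AS {out}" for out, expr in derivations)
    return f"SELECT\n{columns}\nFROM {upstream_table}"
```

Calling `build_select([("ID_FIELD", "ID_COL"), ("KEY_1", "KEY_FIELD_1")], "sourcedata_V0S8")` produces a statement with the same shape as the generated SQL output in the stage example.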

Deterministic vs LLM Decision

| C Code Pattern | Method | Example |
| --- | --- | --- |
| output.col = input.col | Deterministic | ID_FIELD = ID_COL |
| output.col = input.col1 | Deterministic (rename) | KEY_1 = KEY_FIELD_1 |
| output.col = SetNull() | Deterministic | NULL AS col |
| if (condition) ... | Deterministic (WHERE) | Converted to WHERE clause |
| output.col = Trim(input.col) | LLM | Requires expression translation |
| output.col = func(...) | LLM | Custom DataStage functions |

Performance Optimization

When LLM is required:

  • Uses GPT-5-nano for fast, cheap translations
  • Parallel calls for multiple derivations
  • Caches common expression patterns
  • Falls back to LLM-free passthrough if translation fails
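
The combination of parallel calls, caching, and failure handling can be sketched as below. This is an assumption-heavy illustration: `translate_all` is a hypothetical helper, `translate` stands in for whatever function performs the actual LLM call, and failed translations are simply reported as `None` so that the caller can apply its fallback.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

def translate_all(
    expressions: List[str],
    translate: Callable[[str], str],
    cache: Optional[Dict[str, str]] = None,
    max_workers: int = 8,
) -> Dict[str, Optional[str]]:
    """Translate expressions concurrently, reusing cached results.

    Returns a mapping from expression to SQL, or None where translation failed.
    """
    cache = {} if cache is None else cache
    # Deduplicate while preserving order; skip anything already cached.
    pending = list(dict.fromkeys(e for e in expressions if e not in cache))

    def safe(expr: str) -> Optional[str]:
        try:
            return translate(expr)
        except Exception:
            return None  # the caller decides how to fall back

    if pending:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            for expr, sql in zip(pending, pool.map(safe, pending)):
                if sql is not None:
                    cache[expr] = sql  # only successful results are cached
    return {expr: cache.get(expr) for expr in expressions}
```

Threads suit this workload because each call spends most of its time waiting on the network. Passing the same `cache` dictionary across stages lets repeated expressions skip the LLM entirely.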

Limitations

  • ⚠️ Multiple outputs not supported - Transformer stages with multiple output links (e.g., Output_Link1, Output_Link2) cannot be converted. Each transformer must have exactly one output link.
  • ⚠️ Transform parameters (argruns) not supported - Transformer stages that use runtime parameters via argrun options cannot be converted deterministically.
  • Constraints - Only simple constraints are converted to WHERE clauses; complex constraint logic is ignored
  • Stage variables - Not supported (requires LLM or manual review)
  • Loop logic - For/while loops not supported
  • Reject links - Ignored (all rows assumed to pass)
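
A pre-check for the single-output limitation could be sketched as follows. It assumes each output link is declared with an `outputname <index> <name>;` line, as in the C transformation code shown earlier; `output_links` and `require_single_output` are hypothetical names.

```python
import re

# Matches declarations such as: outputname 0 Output_Link;
OUTPUTNAME = re.compile(r"^[ \t]*outputname\s+\d+\s+(\w+)\s*;", re.MULTILINE)

def output_links(code: str) -> list[str]:
    """Return the names of all output links declared in the transformation code."""
    return OUTPUTNAME.findall(code)

def require_single_output(code: str) -> str:
    """Return the only output link name, or raise if there is not exactly one."""
    links = output_links(code)
    if len(links) != 1:
        raise ValueError(
            f"expected exactly one output link, found {len(links)}: {links}"
        )
    return links[0]
```

Running such a check before conversion makes the failure explicit, so the stage can be split manually as described in the workarounds.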

Workarounds

Multiple Outputs

If a transformer has multiple outputs, split it into separate stages, one per output link:

  • Original: 1 transformer → 2 outputs
  • Workaround: 2 transformers → 1 output each

Transform Parameters

If a transformer uses argrun parameters:

  • Replace with job parameters ([&"param"])
  • Or hardcode values if they don't change

Related Transformers

  • CopyStageTransformer - For simple dataset reads without transformation
  • ModifyStageTransformer - For column renames without C code
  • ContainerTransformer (Tableau) - Similar hybrid approach for Tableau nodes