ModifyStageTransformer

Passo a passo de como iniciar sua jornada na Dadosfera

ModifyStageTransformer

Overview

Transforms DataStage MODIFY stages to SQL SELECT statements. Handles column renames, column selection (keep), and column exclusion (drop).

Stage Type: MODIFY Output: SQL SELECT with column operations

Capabilities

  • Column renames (NEW_NAME = OLD_NAME)
  • Column drops (DROP clause)
  • Column keep/selection (implicit from rename list)
  • Pass-through columns (no modification)
  • Type conversions (when specified)

DataStage Stage Example

Input (.dsx format)

#### STAGE: ModifyColumns
## Operator
modify '
## Operator options
DROP SOURCE_ID
;
CODE_VALUE = CODE_VALUE
;
'
## General options
[ident('ModifyColumns'); jobmon_ident('ModifyColumns')]
## Inputs
0< [] 'InputData:In_Modify.v'
## Outputs
0> [] 'ModifyColumns:Output.v'

Input Schema (from upstream stage)

[
  {"name": "CODE_VALUE", "type": "string"},
  {"name": "SOURCE_ID", "type": "string"},
  {"name": "REFERENCE_KEY", "type": "number"}
]

Generated SQL Output

SELECT
  CODE_VALUE,
  REFERENCE_KEY
FROM inputdata_V0S37

Output Schema

[
  {"name": "CODE_VALUE", "type": "string"},
  {"name": "REFERENCE_KEY", "type": "number"}
]

How It Works

1. Parse Modify Options

Extracts operations from the modify string:

DROP SOURCE_ID       ← Column to exclude
;
CODE_VALUE = CODE_VALUE  ← Column rename (or passthrough)
;

2. Apply DROP Operations

Removes specified columns from output:

  • DROP SOURCE_IDSOURCE_ID excluded from SELECT

3. Apply Renames

Processes rename statements:

  • CODE_VALUE = CODE_VALUE → Passthrough (no rename needed)
  • NEW_COL = OLD_COLOLD_COL AS NEW_COL

4. Include Remaining Columns

Columns not mentioned in DROP or rename are passed through automatically:

  • REFERENCE_KEY → Included in SELECT (not dropped, not renamed)

5. Generate SELECT

SELECT
  <renamed_columns>,
  <passthrough_columns>
FROM upstream_stage

Modify Operations

DROP Column

DROP TEMP_FIELD
;

Excludes the column from output.

Rename Column

NEW_NAME = OLD_NAME
;

Renames OLD_NAME to NEW_NAME in output.

Passthrough (Identity)

FIELD_NAME = FIELD_NAME
;

Keeps column with same name (redundant but valid).

Multiple Operations

DROP FIELD_A
;
NEW_B = OLD_B
;
NEW_C = OLD_C
;

Type Conversions

The MODIFY stage can also convert types:

VALUE_STR = STRING(VALUE_NUM, 10)
;

Note: Complex type conversions may require LLM assistance for expression translation.

Schema Inference

Output schema is built from:

  1. Input schema (from upstream stage)
  2. Minus dropped columns
  3. Plus renamed columns (with new names)
  4. Types preserved from input (unless conversion specified)

Limitations

  • Complex expressions - Only simple renames supported deterministically; complex derivations require LLM
  • Type conversions - Limited to simple CAST operations; complex conversions need LLM
  • Conditional logic - Not supported (use TRANSFORMER stage)
  • Multiple outputs - MODIFY supports only one output link

Common Use Cases

1. Drop Temporary Columns

DROP TEMP_COL1
;
DROP TEMP_COL2
;

2. Rename for Clarity

ID_FIELD = ID_COL
;
DATE_FIELD = DATE_COL
;

3. Clean Schema

DROP AUDIT_TIMESTAMP
;
DROP ETL_BATCH_ID
;
FINAL_VALUE = VALUE_FIELD
;

Related Transformers

  • CopyStageTransformer - For reading data with column filtering
  • TransformerStageTransformer - For complex transformations with C code
  • InputStageTransformer - Can include modify operations in orchestrate code