CopyStageTransformer

Passo a passo de como iniciar sua jornada na Dadosfera

CopyStageTransformer

Overview

Transforms DataStage COPY stages to SQL SELECT statements. Handles both dataset inputs (mapped to Snowflake tables) and stage inputs (CTE references).

Stage Type: COPY Output: SQL SELECT with column filtering and renaming

Capabilities

  • Maps DataStage datasets to Snowflake tables
  • Reads from upstream stages via schema_registry
  • Applies keep clause column filtering
  • Handles column type definitions from modify section
  • Deduplicates columns for stages with multiple outputs

DataStage Stage Example

Input (.dsx format)

#### STAGE: CopyDimension
## Operator
copy
## General options
[ident('CopyDimension')]
## Inputs
0< [ds] '[&"$PathDataSetWarehouse"]/DimLookup.ds'
## Outputs
0> [-pp; modify (
  SOURCE_KEY:nullable decimal[38,10]=SOURCE_KEY;
  DIMENSION_KEY:nullable decimal[15,0]=DIMENSION_KEY;
keep
  SOURCE_KEY,DIMENSION_KEY;
)] 'CopyDimension:Out_Process.v'

Dataset Mapping Configuration

dataset_table_mapping = {
    'DimLookup.ds': 'PUBLIC.DIM_LOOKUP_TABLE'
}

Generated SQL Output

SELECT
    SOURCE_KEY,
    DIMENSION_KEY
FROM PUBLIC.DIM_LOOKUP_TABLE

Output Schema

[
  {
    "name": "SOURCE_KEY",
    "type": "number"
  },
  {
    "name": "DIMENSION_KEY",
    "type": "number"
  }
]

How It Works

  1. Reads dataset path from inputs: [ds] '[&"$PathDataSetWarehouse"]/DimLookup.ds'
  2. Maps dataset to Snowflake table using dataset_table_mapping
  3. Extracts column types from modify section:
    • decimal[38,10]number
    • decimal[15,0]number
    • string[max=N]string
  4. Applies keep filter to select only specified columns: SOURCE_KEY, DIMENSION_KEY
  5. Generates SELECT with filtered column list from mapped table

Type Mapping

DataStage types are converted to Snowflake types:

DataStage TypeSnowflake Type
decimal[p,s]number
string[max=N]string
int32, int64number
datedate
timestamptimestamp

Column Filtering

The transformer supports three filtering modes:

  1. Keep clause - Only specified columns are selected

    keep
      SOURCE_KEY,DIMENSION_KEY;
  2. Drop clause - All columns except specified ones

    drop
      TEMP_COLUMN;
  3. No filter - All columns from source

Reading from Stages vs Datasets

Dataset Input

## Inputs
0< [ds] '[&"$PathDataSetDwfatos"]/MyDataset.ds'

Requires mapping in dataset_table_mapping.

Stage Input

## Inputs
0< [] 'PreviousStage:output.v'

Reads schema from schema_registry and references upstream CTE.

Limitations

  • Requires dataset_table_mapping for dataset inputs (no automatic table discovery)
  • Column type inference depends on modify section or schema_registry being populated
  • Multiple output links result in deduplicated column list
  • View mappings not supported (column renames in COPY should use downstream MODIFY stage)

Related Transformers

  • ModifyStageTransformer - For column renames and filters without reading datasets
  • ImportStageTransformer - For file inputs (CSV, sequential files)
  • InputStageTransformer - For Oracle table/SQL inputs