CopyStageTransformer | Conversor DDF | Dadosfera Documentation

Passo a passo de como iniciar sua jornada na Dadosfera

CopyStageTransformer

Overview

Transforms DataStage COPY stages to SQL SELECT statements. Handles both dataset inputs (mapped to Snowflake tables) and stage inputs (CTE references).

Stage Type: COPY Output: SQL SELECT with column filtering and renaming

Capabilities

Maps DataStage datasets to Snowflake tables
Reads from upstream stages via schema_registry
Applies keep clause column filtering
Handles column type definitions from modify section
Deduplicates columns for stages with multiple outputs

DataStage Stage Example

Input (.dsx format)

#### STAGE: CopyDimension
## Operator
copy
## General options
[ident('CopyDimension')]
## Inputs
0< [ds] '[&"$PathDataSetWarehouse"]/DimLookup.ds'
## Outputs
0> [-pp; modify (
  SOURCE_KEY:nullable decimal[38,10]=SOURCE_KEY;
  DIMENSION_KEY:nullable decimal[15,0]=DIMENSION_KEY;
keep
  SOURCE_KEY,DIMENSION_KEY;
)] 'CopyDimension:Out_Process.v'

Dataset Mapping Configuration

dataset_table_mapping = {
    'DimLookup.ds': 'PUBLIC.DIM_LOOKUP_TABLE'
}

Generated SQL Output

SELECT
    SOURCE_KEY,
    DIMENSION_KEY
FROM PUBLIC.DIM_LOOKUP_TABLE

Output Schema

[
  {
    "name": "SOURCE_KEY",
    "type": "number"
  },
  {
    "name": "DIMENSION_KEY",
    "type": "number"
  }
]

How It Works

Reads dataset path from inputs: [ds] '[&"$PathDataSetWarehouse"]/DimLookup.ds'
Maps dataset to Snowflake table using dataset_table_mapping
Extracts column types from modify section:
- decimal[38,10] → number
- decimal[15,0] → number
- string[max=N] → string
Applies keep filter to select only specified columns: SOURCE_KEY, DIMENSION_KEY
Generates SELECT with filtered column list from mapped table

Type Mapping

DataStage types are converted to Snowflake types:

DataStage Type	Snowflake Type
`decimal[p,s]`	`number`
`string[max=N]`	`string`
`int32`, `int64`	`number`
`date`	`date`
`timestamp`	`timestamp`

Column Filtering

The transformer supports three filtering modes:

Keep clause - Only specified columns are selected
```
keep
  SOURCE_KEY,DIMENSION_KEY;
```
Drop clause - All columns except specified ones
```
drop
  TEMP_COLUMN;
```
No filter - All columns from source

Reading from Stages vs Datasets

Dataset Input

## Inputs
0< [ds] '[&"$PathDataSetDwfatos"]/MyDataset.ds'

Requires mapping in dataset_table_mapping.

Stage Input

## Inputs
0< [] 'PreviousStage:output.v'

Reads schema from schema_registry and references upstream CTE.

Limitations

Requires dataset_table_mapping for dataset inputs (no automatic table discovery)
Column type inference depends on modify section or schema_registry being populated
Multiple output links result in deduplicated column list
View mappings not supported (column renames in COPY should use downstream MODIFY stage)

Related Transformers

ModifyStageTransformer - For column renames and filters without reading datasets
ImportStageTransformer - For file inputs (CSV, sequential files)
InputStageTransformer - For Oracle table/SQL inputs

Updated about 1 month ago