CopyStageTransformer
Passo a passo de como iniciar sua jornada na Dadosfera
CopyStageTransformer
Overview
Transforms DataStage COPY stages to SQL SELECT statements. Handles both dataset inputs (mapped to Snowflake tables) and stage inputs (CTE references).
Stage Type: COPY
Output: SQL SELECT with column filtering and renaming
Capabilities
- Maps DataStage datasets to Snowflake tables
- Reads from upstream stages via schema_registry
- Applies
keepclause column filtering - Handles column type definitions from modify section
- Deduplicates columns for stages with multiple outputs
DataStage Stage Example
Input (.dsx format)
#### STAGE: CopyDimension
## Operator
copy
## General options
[ident('CopyDimension')]
## Inputs
0< [ds] '[&"$PathDataSetWarehouse"]/DimLookup.ds'
## Outputs
0> [-pp; modify (
SOURCE_KEY:nullable decimal[38,10]=SOURCE_KEY;
DIMENSION_KEY:nullable decimal[15,0]=DIMENSION_KEY;
keep
SOURCE_KEY,DIMENSION_KEY;
)] 'CopyDimension:Out_Process.v'
Dataset Mapping Configuration
dataset_table_mapping = {
'DimLookup.ds': 'PUBLIC.DIM_LOOKUP_TABLE'
}Generated SQL Output
SELECT
SOURCE_KEY,
DIMENSION_KEY
FROM PUBLIC.DIM_LOOKUP_TABLEOutput Schema
[
{
"name": "SOURCE_KEY",
"type": "number"
},
{
"name": "DIMENSION_KEY",
"type": "number"
}
]How It Works
- Reads dataset path from inputs:
[ds] '[&"$PathDataSetWarehouse"]/DimLookup.ds' - Maps dataset to Snowflake table using
dataset_table_mapping - Extracts column types from modify section:
decimal[38,10]→numberdecimal[15,0]→numberstring[max=N]→string
- Applies keep filter to select only specified columns:
SOURCE_KEY, DIMENSION_KEY - Generates SELECT with filtered column list from mapped table
Type Mapping
DataStage types are converted to Snowflake types:
| DataStage Type | Snowflake Type |
|---|---|
decimal[p,s] | number |
string[max=N] | string |
int32, int64 | number |
date | date |
timestamp | timestamp |
Column Filtering
The transformer supports three filtering modes:
-
Keep clause - Only specified columns are selected
keep SOURCE_KEY,DIMENSION_KEY; -
Drop clause - All columns except specified ones
drop TEMP_COLUMN; -
No filter - All columns from source
Reading from Stages vs Datasets
Dataset Input
## Inputs
0< [ds] '[&"$PathDataSetDwfatos"]/MyDataset.ds'
Requires mapping in dataset_table_mapping.
Stage Input
## Inputs
0< [] 'PreviousStage:output.v'
Reads schema from schema_registry and references upstream CTE.
Limitations
- Requires dataset_table_mapping for dataset inputs (no automatic table discovery)
- Column type inference depends on modify section or schema_registry being populated
- Multiple output links result in deduplicated column list
- View mappings not supported (column renames in COPY should use downstream MODIFY stage)
Related Transformers
- ModifyStageTransformer - For column renames and filters without reading datasets
- ImportStageTransformer - For file inputs (CSV, sequential files)
- InputStageTransformer - For Oracle table/SQL inputs
Updated 5 days ago
