# CopyStageTransformer

## Overview

Transforms DataStage COPY stages into SQL SELECT statements. Handles both dataset inputs (mapped to Snowflake tables) and stage inputs (CTE references).

**Stage Type:** `COPY`
**Output:** SQL SELECT with column filtering and renaming

## Capabilities

* Maps DataStage datasets to Snowflake tables
* Reads from upstream stages via the `schema_registry`
* Applies `keep` clause column filtering
* Handles column type definitions from the modify section
* Deduplicates columns for stages with multiple outputs

## DataStage Stage Example

### Input (.dsx format)

```
#### STAGE: CopyDimension
## Operator
copy
## General options
[ident('CopyDimension')]
## Inputs
0< [ds] '[&"$PathDataSetWarehouse"]/DimLookup.ds'
## Outputs
0> [-pp; modify (
  SOURCE_KEY:nullable decimal[38,10]=SOURCE_KEY;
  DIMENSION_KEY:nullable decimal[15,0]=DIMENSION_KEY;
  keep SOURCE_KEY,DIMENSION_KEY;
)] 'CopyDimension:Out_Process.v'
```

### Dataset Mapping Configuration

```python
dataset_table_mapping = {
    'DimLookup.ds': 'PUBLIC.DIM_LOOKUP_TABLE'
}
```

### Generated SQL Output

```sql
SELECT
    SOURCE_KEY,
    DIMENSION_KEY
FROM PUBLIC.DIM_LOOKUP_TABLE
```

### Output Schema

```json
[
  { "name": "SOURCE_KEY", "type": "number" },
  { "name": "DIMENSION_KEY", "type": "number" }
]
```

## How It Works

1. **Reads the dataset path** from the inputs: `[ds] '[&"$PathDataSetWarehouse"]/DimLookup.ds'`
2. **Maps the dataset to a Snowflake table** using `dataset_table_mapping`
3. **Extracts column types** from the modify section:
   * `decimal[38,10]` → `number`
   * `decimal[15,0]` → `number`
   * `string[max=N]` → `string`
4. **Applies the keep filter** to select only the specified columns: `SOURCE_KEY, DIMENSION_KEY`
5. **Generates the SELECT** with the filtered column list from the mapped table

## Type Mapping

DataStage types are converted to Snowflake types:

| DataStage Type | Snowflake Type |
|----------------|----------------|
| `decimal[p,s]` | `number` |
| `string[max=N]` | `string` |
| `int32`, `int64` | `number` |
| `date` | `date` |
| `timestamp` | `timestamp` |

## Column Filtering

The transformer supports three filtering modes:

1. **Keep clause** - only the specified columns are selected

   ```
   keep SOURCE_KEY,DIMENSION_KEY;
   ```

2. **Drop clause** - all columns except the specified ones

   ```
   drop TEMP_COLUMN;
   ```

3. **No filter** - all columns from the source

## Reading from Stages vs Datasets

### Dataset Input

```
## Inputs
0< [ds] '[&"$PathDataSetDwfatos"]/MyDataset.ds'
```

Requires a mapping in `dataset_table_mapping`.

### Stage Input

```
## Inputs
0< [] 'PreviousStage:output.v'
```

Reads the schema from the `schema_registry` and references the upstream CTE.

## Limitations

* **Requires `dataset_table_mapping`** for dataset inputs (no automatic table discovery)
* **Column type inference** depends on the modify section or a populated `schema_registry`
* **Multiple output links** result in a deduplicated column list
* **View mappings not supported** (column renames in COPY should use a downstream MODIFY stage)

## Related Transformers

* **ModifyStageTransformer** - for column renames and filters without reading datasets
* **ImportStageTransformer** - for file inputs (CSV, sequential files)
* **InputStageTransformer** - for Oracle table/SQL inputs
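
As a rough illustration, the type-mapping rules in the table above can be sketched in Python. The function name, the parsing of the bracketed part, and the fallback to `string` are assumptions for illustration, not the transformer's actual API.

```python
# Illustrative sketch of the DataStage -> Snowflake type mapping table.
# SIMPLE_TYPES, map_datastage_type, and the `string` fallback are assumed names
# and behavior, not the real implementation.
SIMPLE_TYPES = {
    "int32": "number",
    "int64": "number",
    "date": "date",
    "timestamp": "timestamp",
}

def map_datastage_type(ds_type: str) -> str:
    """Map a DataStage type (e.g. 'decimal[38,10]', 'string[max=50]')
    to the Snowflake type used in the output schema."""
    # Strip any bracketed precision/length suffix: 'decimal[38,10]' -> 'decimal'
    base = ds_type.split("[", 1)[0].strip().lower()
    if base == "decimal":
        return "number"    # decimal[p,s] -> number
    if base == "string":
        return "string"    # string[max=N] -> string
    return SIMPLE_TYPES.get(base, "string")  # assumed default: string
```

With this sketch, `map_datastage_type("decimal[15,0]")` yields `"number"`, matching the output schema shown in the example.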
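
The three filtering modes (keep, drop, no filter) can be summarized in a small helper. This is a hypothetical sketch assuming `keep` takes precedence when both clauses are present; the real transformer's precedence rules and function names may differ.

```python
def apply_column_filter(columns, keep=None, drop=None):
    """Hypothetical sketch of the three filtering modes:
    keep clause, drop clause, or no filter (pass-through)."""
    if keep:
        # Keep clause: only the listed columns survive, in source order
        keep_set = set(keep)
        return [c for c in columns if c in keep_set]
    if drop:
        # Drop clause: everything except the listed columns
        drop_set = set(drop)
        return [c for c in columns if c not in drop_set]
    # No filter: all columns from the source
    return list(columns)
```

For the example stage, `apply_column_filter(cols, keep=["SOURCE_KEY", "DIMENSION_KEY"])` reproduces the two-column SELECT list.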
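
Finally, the SELECT generation step, including the column deduplication that the document describes for stages with multiple output links, might look like the following. The function name and SQL formatting are illustrative assumptions.

```python
def build_copy_select(table: str, columns: list) -> str:
    """Illustrative sketch: generate the SELECT for a COPY stage,
    deduplicating columns while preserving first-seen order
    (as described for stages with multiple output links)."""
    deduped = list(dict.fromkeys(columns))  # order-preserving dedup
    return "SELECT\n    " + ",\n    ".join(deduped) + f"\nFROM {table}"
```

Given duplicate columns collected from two output links, e.g. `["SOURCE_KEY", "DIMENSION_KEY", "SOURCE_KEY"]`, the sketch emits each column once, matching the generated SQL in the example.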