# CopyStageTransformer

## Overview

Transforms DataStage COPY stages into SQL SELECT statements. Handles both dataset inputs (mapped to Snowflake tables) and stage inputs (CTE references).

**Stage Type:** `COPY`
**Output:** SQL SELECT with column filtering and renaming

## Capabilities

* Maps DataStage datasets to Snowflake tables
* Reads from upstream stages via the `schema_registry`
* Applies `keep` clause column filtering
* Handles column type definitions from the modify section
* Deduplicates columns for stages with multiple outputs

## DataStage Stage Example

### Input (.dsx format)

```
#### STAGE: CopyDimension
## Operator
copy
## General options
[ident('CopyDimension')]
## Inputs
0< [ds] '[&"$PathDataSetWarehouse"]/DimLookup.ds'
## Outputs
0> [-pp; modify (
  SOURCE_KEY:nullable decimal[38,10]=SOURCE_KEY;
  DIMENSION_KEY:nullable decimal[15,0]=DIMENSION_KEY;
  keep SOURCE_KEY,DIMENSION_KEY;
)] 'CopyDimension:Out_Process.v'
```

### Dataset Mapping Configuration

```python
dataset_table_mapping = {
    'DimLookup.ds': 'PUBLIC.DIM_LOOKUP_TABLE'
}
```

### Generated SQL Output

```sql
SELECT
    SOURCE_KEY,
    DIMENSION_KEY
FROM PUBLIC.DIM_LOOKUP_TABLE
```

### Output Schema

```json
[
  { "name": "SOURCE_KEY", "type": "number" },
  { "name": "DIMENSION_KEY", "type": "number" }
]
```

## How It Works

1. **Reads the dataset path** from the inputs: `[ds] '[&"$PathDataSetWarehouse"]/DimLookup.ds'`
2. **Maps the dataset to a Snowflake table** using `dataset_table_mapping`
3. **Extracts column types** from the modify section:
   * `decimal[38,10]` → `number`
   * `decimal[15,0]` → `number`
   * `string[max=N]` → `string`
4. **Applies the keep filter** to select only the specified columns: `SOURCE_KEY, DIMENSION_KEY`
5. **Generates the SELECT** with the filtered column list from the mapped table

## Type Mapping

DataStage types are converted to Snowflake types:

| DataStage Type | Snowflake Type |
|----------------|----------------|
| `decimal[p,s]` | `number` |
| `string[max=N]` | `string` |
| `int32`, `int64` | `number` |
| `date` | `date` |
| `timestamp` | `timestamp` |

## Column Filtering

The transformer supports three filtering modes:

1. **Keep clause** - only the specified columns are selected

   ```
   keep SOURCE_KEY,DIMENSION_KEY;
   ```

2. **Drop clause** - all columns except the specified ones

   ```
   drop TEMP_COLUMN;
   ```

3. **No filter** - all columns from the source

## Reading from Stages vs Datasets

### Dataset Input

```
## Inputs
0< [ds] '[&"$PathDataSetDwfatos"]/MyDataset.ds'
```

Requires a mapping in `dataset_table_mapping`.

### Stage Input

```
## Inputs
0< [] 'PreviousStage:output.v'
```

Reads the schema from the `schema_registry` and references the upstream CTE.

## Limitations

* **Requires `dataset_table_mapping`** for dataset inputs (no automatic table discovery)
* **Column type inference** depends on the modify section or a populated `schema_registry`
* **Multiple output links** result in a deduplicated column list
* **View mappings not supported** (column renames in COPY should use a downstream MODIFY stage)

## Related Transformers

* **ModifyStageTransformer** - for column renames and filters without reading datasets
* **ImportStageTransformer** - for file inputs (CSV, sequential files)
* **InputStageTransformer** - for Oracle table/SQL inputs
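
As a rough illustration, the type-mapping rules in the table above can be sketched in Python. The function name, the parsing of the bracketed part, and the fallback to `string` are assumptions for illustration, not the transformer's actual API.

```python
# Illustrative sketch of the DataStage -> Snowflake type mapping table.
# SIMPLE_TYPES, map_datastage_type, and the `string` fallback are assumed names
# and behavior, not the real implementation.
SIMPLE_TYPES = {
    "int32": "number",
    "int64": "number",
    "date": "date",
    "timestamp": "timestamp",
}

def map_datastage_type(ds_type: str) -> str:
    """Map a DataStage type (e.g. 'decimal[38,10]', 'string[max=50]')
    to the Snowflake type used in the output schema."""
    # Strip any bracketed precision/length suffix: 'decimal[38,10]' -> 'decimal'
    base = ds_type.split("[", 1)[0].strip().lower()
    if base == "decimal":
        return "number"    # decimal[p,s] -> number
    if base == "string":
        return "string"    # string[max=N] -> string
    return SIMPLE_TYPES.get(base, "string")  # assumed default: string
```

With this sketch, `map_datastage_type("decimal[15,0]")` yields `"number"`, matching the output schema shown in the example.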
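
The three filtering modes (keep, drop, no filter) can be summarized in a small helper. This is a hypothetical sketch assuming `keep` takes precedence when both clauses are present; the real transformer's precedence rules and function names may differ.

```python
def apply_column_filter(columns, keep=None, drop=None):
    """Hypothetical sketch of the three filtering modes:
    keep clause, drop clause, or no filter (pass-through)."""
    if keep:
        # Keep clause: only the listed columns survive, in source order
        keep_set = set(keep)
        return [c for c in columns if c in keep_set]
    if drop:
        # Drop clause: everything except the listed columns
        drop_set = set(drop)
        return [c for c in columns if c not in drop_set]
    # No filter: all columns from the source
    return list(columns)
```

For the example stage, `apply_column_filter(cols, keep=["SOURCE_KEY", "DIMENSION_KEY"])` reproduces the two-column SELECT list.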
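
Finally, the SELECT generation step, including the column deduplication that the document describes for stages with multiple output links, might look like the following. The function name and SQL formatting are illustrative assumptions.

```python
def build_copy_select(table: str, columns: list) -> str:
    """Illustrative sketch: generate the SELECT for a COPY stage,
    deduplicating columns while preserving first-seen order
    (as described for stages with multiple output links)."""
    deduped = list(dict.fromkeys(columns))  # order-preserving dedup
    return "SELECT\n    " + ",\n    ".join(deduped) + f"\nFROM {table}"
```

Given duplicate columns collected from two output links, e.g. `["SOURCE_KEY", "DIMENSION_KEY", "SOURCE_KEY"]`, the sketch emits each column once, matching the generated SQL in the example.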