ImportStageTransformer

Overview

Transforms DataStage IMPORT stages (CSV/sequential file readers) to SQL SELECT statements. Maps file paths to pre-loaded Snowflake tables.

Stage Type: INPUT (import operator with PxSequentialFile)
Output: SQL SELECT from mapped Snowflake table

Recommended Approach

For IMPORT stages reading CSV or sequential files, we strongly recommend:

  1. Ingest the files into Snowflake using the Dadosfera Ingestion module
  2. Map the file path to the target Snowflake table in the input mapping configuration
  3. The converter will generate a simple SELECT from the target table

This approach ensures data is already loaded in Snowflake before the converted SQL runs.

Example Workflow

Original (DataStage):

File: /data/input/sales_data.csv
Schema: ID, NAME, AMOUNT

Recommended:

  1. Ingest the file using the Dadosfera Ingestion module:

    • Source: /data/input/sales_data.csv
    • Target: PUBLIC.SALES_DATA_TABLE
    • Run ingestion pipeline
  2. Configure input mapping:

    {
      "/data/input/sales_data.csv": "PUBLIC.SALES_DATA_TABLE"
    }
  3. Converter generates:

    SELECT ID, NAME, AMOUNT FROM PUBLIC.SALES_DATA_TABLE

Capabilities

  • Maps CSV/sequential file paths to Snowflake tables
  • Preserves column schema from file definition
  • Handles delimited files (comma, semicolon, tab, etc.)
  • Supports nullable columns
  • Column list from inline schema or -firstLineColumnNames

DataStage Stage Example

Input (.dsx format)

#### STAGE: ImportFileData
## Operator
import
## Operator options
-schema record
  {final_delim=end, record_delim_string='\\r\\n', delim=';', quote=none}
(
  REGION_CODE:nullable ustring[max=50];
  SEGMENT_CODE:nullable ustring[max=50];
  PRODUCT_NAME:nullable ustring[max=255];
  VALUE_AMOUNT:nullable ustring[max=255];
  STATUS_FLAG:nullable ustring[max=10];
)
-rejects continue
-reportProgress yes
-firstLineColumnNames
-file '[&\"$PathDataFiles\"]/input_data/processed/[&\"file_name\"].csv'

## General options
[ident('ImportFileData'); jobmon_ident('ImportFileData')]
## Outputs
0> [] 'ImportFileData:output.v'

Input Mapping Configuration

{
  "[&\"$PathDataFiles\"]/input_data/processed/[&\"file_name\"].csv": "PUBLIC.INPUT_DATA_TABLE"
}

Generated SQL Output

SELECT
  REGION_CODE,
  SEGMENT_CODE,
  PRODUCT_NAME,
  VALUE_AMOUNT,
  STATUS_FLAG
FROM PUBLIC.INPUT_DATA_TABLE

Output Schema

[
  {"name": "REGION_CODE", "type": "string"},
  {"name": "SEGMENT_CODE", "type": "string"},
  {"name": "PRODUCT_NAME", "type": "string"},
  {"name": "VALUE_AMOUNT", "type": "string"},
  {"name": "STATUS_FLAG", "type": "string"}
]

How It Works

1. Extract File Path

Parses the -file option to identify the source file:

-file '[&\"$PathDataFiles\"]/input_data/processed/sales_2024.csv'
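
A minimal Python sketch of what this extraction could look like (the function name and regex are illustrative, not the converter's actual implementation):

import re

# Illustrative sketch: the converter's real parsing logic may differ.
def extract_file_path(operator_options: str) -> str | None:
    # Capture the single-quoted value that follows the -file option.
    match = re.search(r"-file\s+'([^']+)'", operator_options)
    return match.group(1) if match else None

options = "-file '/data/input/sales_data.csv'"
print(extract_file_path(options))  # /data/input/sales_data.csv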

2. Lookup Snowflake Table

Uses the input mapping to find the corresponding Snowflake table:

{
  "[&\"$PathDataFiles\"]/input_data/processed/sales_2024.csv": "PUBLIC.SALES_2024_TABLE"
}
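
Conceptually this is a plain dictionary lookup against the input mapping configuration; a hedged Python sketch (the names input_mapping and lookup_table are illustrative):

input_mapping = {
    "/data/input/sales_data.csv": "PUBLIC.SALES_DATA_TABLE",
}

def lookup_table(file_path: str) -> str:
    # An unmapped file cannot be converted: its data was never ingested.
    if file_path not in input_mapping:
        raise KeyError(f"No Snowflake table mapped for file: {file_path}")
    return input_mapping[file_path]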

3. Extract Schema

Reads column definitions from inline schema:

-schema record
  {final_delim=end, record_delim_string='\\r\\n', delim=';', quote=none}
(
  REGION_CODE:nullable ustring[max=50];
  SEGMENT_CODE:nullable ustring[max=50];
  ...
)
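
A simplified Python sketch of pulling column names and nullability out of the schema block (illustrative only; the converter's real parser handles the full record syntax):

import re

def parse_schema_columns(schema_body: str) -> list[dict]:
    # Matches definitions such as "REGION_CODE:nullable ustring[max=50];"
    pattern = re.compile(r"^\s*(\w+):(nullable\s+)?(\w+)", re.MULTILINE)
    return [
        {"name": name, "nullable": bool(nullable), "datastage_type": ds_type}
        for name, nullable, ds_type in pattern.findall(schema_body)
    ]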

4. Generate SELECT Statement

Creates a SQL statement selecting the specified columns from the mapped table:

SELECT
  REGION_CODE,
  SEGMENT_CODE,
  ...
FROM PUBLIC.SALES_2024_TABLE
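
The final assembly step is simple string building; a minimal sketch (the function name build_select is illustrative):

def build_select(columns: list[str], table: str) -> str:
    column_list = ",\n  ".join(columns)
    return f"SELECT\n  {column_list}\nFROM {table}"

print(build_select(["REGION_CODE", "SEGMENT_CODE"], "PUBLIC.SALES_2024_TABLE"))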

File Options

Delimiter Configuration

-schema record
  {delim=';'}          ← Semicolon delimiter
  {delim=','}          ← Comma delimiter (CSV)
  {delim='\\t'}        ← Tab delimiter (TSV)

Record Delimiter

{record_delim_string='\\r\\n'}   ← Windows line endings (CRLF)
{record_delim_string='\\n'}      ← Unix line endings (LF)

Quote Handling

{quote='"'}          ← Double quote for text fields
{quote=none}         ← No quoting

First Line Options

-firstLineColumnNames    ← Skip first row (contains headers)

Without this option, every row, including the header line, is treated as data.
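
Because the recommended workflow loads files before the converted SQL runs, these parse options apply at ingestion time rather than in the generated SELECT. Assuming the ingestion pipeline reads CSVs through a Snowflake file format (an assumption about the ingestion backend, not something the converter emits), the options correspond roughly as follows:

# Rough, illustrative correspondence between DataStage parse options and
# Snowflake CSV file format options; actual ingestion settings may differ.
DATASTAGE_TO_SNOWFLAKE_CSV_OPTIONS = {
    "delim=';'": "FIELD_DELIMITER = ';'",
    "record_delim_string='\\r\\n'": "RECORD_DELIMITER = '\\r\\n'",
    "quote='\"'": "FIELD_OPTIONALLY_ENCLOSED_BY = '\"'",
    "quote=none": "FIELD_OPTIONALLY_ENCLOSED_BY = NONE",
    "-firstLineColumnNames": "SKIP_HEADER = 1",
}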

Column Type Mapping

DataStage types in schema are converted to Snowflake types:

DataStage Type    Snowflake Type
ustring[max=N]    string
string[max=N]     string
int32             number
int64             number
decimal[p,s]      number
date              date
timestamp         timestamp

All types can be marked nullable.
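
The mapping above, expressed as a small Python lookup (a sketch; the dictionary and function names are illustrative and the converter's internal structure may differ):

DATASTAGE_TO_SNOWFLAKE_TYPES = {
    "ustring": "string",
    "string": "string",
    "int32": "number",
    "int64": "number",
    "decimal": "number",
    "date": "date",
    "timestamp": "timestamp",
}

def map_type(ds_type: str) -> str:
    # Drop length/precision qualifiers such as "[max=50]" or "[10,2]".
    base = ds_type.split("[", 1)[0]
    return DATASTAGE_TO_SNOWFLAKE_TYPES.get(base, "string")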

Input Mapping Examples

Simple File Mapping

{
  "/data/sales.csv": "PUBLIC.SALES_TABLE"
}

Parameterized File Path

DataStage often uses parameters in file paths:

-file '[&\"$BasePath\"]/input/[&\"region\"]_data.csv'

Map these paths either by replacing the parameters with their actual values or by using pattern matching:

{
  "[&\"$BasePath\"]/input/[&\"region\"]_data.csv": "PUBLIC.REGION_DATA_TABLE"
}

Multiple Files

{
  "/data/sales_2023.csv": "PUBLIC.SALES_2023",
  "/data/sales_2024.csv": "PUBLIC.SALES_2024",
  "/data/customers.csv": "PUBLIC.CUSTOMERS"
}

Reject Handling

-rejects continue

A DataStage option that continues processing when malformed records are encountered. In the converted pipeline this is handled during the ingestion phase, not in the SQL conversion.
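
If the ingestion pipeline loads files with Snowflake COPY INTO (an assumption about the ingestion backend), the closest equivalent is the ON_ERROR copy option, sketched below:

# Hedged illustration: "-rejects continue" roughly corresponds to telling the
# load to skip malformed rows instead of aborting, e.g. ON_ERROR = 'CONTINUE'
# in a Snowflake COPY INTO statement issued by the ingestion pipeline.
INGESTION_ON_ERROR = "CONTINUE"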

Limitations

  • ⚠️ Requires input mapping - File paths must be mapped to Snowflake tables; no automatic file reading in SQL
  • ⚠️ Pre-ingestion required - Files must be loaded into Snowflake using the Dadosfera Ingestion module before SQL execution
  • ⚠️ Parameter resolution - DataStage job parameters in file paths must be resolved or mapped appropriately
  • ⚠️ No file format conversion - CSV parsing, delimiter handling, and data type conversions must be done during ingestion, not in SQL
  • ⚠️ Reject records - Malformed record handling must be configured in the ingestion pipeline

Best Practices

When to Use IMPORT Transformer

  • Simple file reads with pre-defined schema
  • Files already ingested into Snowflake tables
  • Straightforward column mappings (1:1 from file to table)

Pre-Ingestion with Dadosfera

Always use the Dadosfera Ingestion module to:

  1. Load CSV/sequential files into Snowflake tables
  2. Handle schema mapping and data type conversions
  3. Configure error handling for malformed records
  4. Set up incremental loads if needed

Then configure input_mapping to point to the ingested tables.

Related Transformers

  • InputStageTransformer - For Oracle database inputs (not files)
  • CopyStageTransformer - For reading DataStage datasets (not files)
  • ModifyStageTransformer - For post-import column filtering/renaming