ImportStageTransformer
Overview
Transforms DataStage IMPORT stages (CSV/sequential file readers) to SQL SELECT statements. Maps file paths to pre-loaded Snowflake tables.
Stage Type: INPUT (import operator with PxSequentialFile)
Output: SQL SELECT from mapped Snowflake table
Recommended Approach
For IMPORT stages reading CSV or sequential files, we strongly recommend:
- Ingest the files into Snowflake using the Dadosfera Ingestion module
- Map the file path to the target Snowflake table in the input mapping configuration
- The converter will generate a simple SELECT from the target table
This approach ensures data is already loaded in Snowflake before the converted SQL runs.
Example Workflow
Original (DataStage):
File: /data/input/sales_data.csv
Schema: ID, NAME, AMOUNT
Recommended:
1. Ingest the file using the Dadosfera Ingestion module:
   - Source: /data/input/sales_data.csv
   - Target: PUBLIC.SALES_DATA_TABLE
   - Run the ingestion pipeline
2. Configure the input mapping:
   { "/data/input/sales_data.csv": "PUBLIC.SALES_DATA_TABLE" }
3. The converter generates:
   SELECT ID, NAME, AMOUNT FROM PUBLIC.SALES_DATA_TABLE
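The conversion itself reduces to a lookup plus string formatting. The snippet below is a minimal sketch of that idea, not the converter's actual code; the names `generate_select`, `input_mapping`, and `columns` are illustrative.

```python
# Minimal sketch: the file is already ingested, so conversion is just
# "look up the mapped table, then SELECT the declared columns from it".

def generate_select(file_path: str, columns: list[str], input_mapping: dict[str, str]) -> str:
    # Resolve the DataStage file path to its pre-loaded Snowflake table.
    table = input_mapping[file_path]
    # Emit a plain SELECT over the columns declared in the file schema.
    return f"SELECT {', '.join(columns)} FROM {table}"

input_mapping = {"/data/input/sales_data.csv": "PUBLIC.SALES_DATA_TABLE"}
columns = ["ID", "NAME", "AMOUNT"]

print(generate_select("/data/input/sales_data.csv", columns, input_mapping))
# SELECT ID, NAME, AMOUNT FROM PUBLIC.SALES_DATA_TABLE
```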
Capabilities
- Maps CSV/sequential file paths to Snowflake tables
- Preserves column schema from file definition
- Handles delimited files (CSV, semicolon, tab, etc.)
- Supports nullable columns
- Column list from inline schema or -firstLineColumnNames
DataStage Stage Example
Input (.dsx format)
#### STAGE: ImportFileData
## Operator
import
## Operator options
-schema record
{final_delim=end, record_delim_string='\\r\\n', delim=';', quote=none}
(
REGION_CODE:nullable ustring[max=50];
SEGMENT_CODE:nullable ustring[max=50];
PRODUCT_NAME:nullable ustring[max=255];
VALUE_AMOUNT:nullable ustring[max=255];
STATUS_FLAG:nullable ustring[max=10];
)
-rejects continue
-reportProgress yes
-firstLineColumnNames
-file '[&\"$PathDataFiles\"]/input_data/processed/[&\"file_name\"].csv'
## General options
[ident('ImportFileData'); jobmon_ident('ImportFileData')]
## Outputs
0> [] 'ImportFileData:output.v'
Input Mapping Configuration
{
"[&\"$PathDataFiles\"]/input_data/processed/[&\"file_name\"].csv": "PUBLIC.INPUT_DATA_TABLE"
}
Generated SQL Output
SELECT
REGION_CODE,
SEGMENT_CODE,
PRODUCT_NAME,
VALUE_AMOUNT,
STATUS_FLAG
FROM PUBLIC.INPUT_DATA_TABLE
Output Schema
[
{"name": "REGION_CODE", "type": "string"},
{"name": "SEGMENT_CODE", "type": "string"},
{"name": "PRODUCT_NAME", "type": "string"},
{"name": "VALUE_AMOUNT", "type": "string"},
{"name": "STATUS_FLAG", "type": "string"}
]
How It Works
1. Extract File Path
Parses the -file option to identify the source file:
-file '[&\"$PathDataFiles\"]/input_data/processed/sales_2024.csv'
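As an illustration only (the regular expression and function name below are assumptions, not the converter's internals), this step can be thought of as a single pattern match over the operator options:

```python
import re

def extract_file_path(operator_options: str) -> str | None:
    # Pull the quoted path out of the "-file '<path>'" option.
    match = re.search(r"-file\s+'([^']+)'", operator_options)
    return match.group(1) if match else None

# Operator options exactly as they appear in the .dsx stage above.
options = r"""-file '[&\"$PathDataFiles\"]/input_data/processed/sales_2024.csv'"""
print(extract_file_path(options))
# [&\"$PathDataFiles\"]/input_data/processed/sales_2024.csv
```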
2. Lookup Snowflake Table
Uses input mapping to find corresponding Snowflake table:
{
"[&\"$PathDataFiles\"]/input_data/processed/sales_2024.csv": "PUBLIC.SALES_2024_TABLE"
}
3. Extract Schema
Reads column definitions from inline schema:
-schema record
{final_delim=end, record_delim_string='\\r\\n', delim=';', quote=none}
(
REGION_CODE:nullable ustring[max=50];
SEGMENT_CODE:nullable ustring[max=50];
...
)
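A simplified sketch of this step, assuming one `NAME:type;` definition per line; the function name is illustrative:

```python
import re

def extract_columns(schema_text: str) -> list[str]:
    # Each column definition looks like "NAME:nullable ustring[max=50];".
    # Capture the identifier before the first colon on each definition line.
    return re.findall(r"^\s*(\w+)\s*:", schema_text, flags=re.MULTILINE)

schema = """
REGION_CODE:nullable ustring[max=50];
SEGMENT_CODE:nullable ustring[max=50];
PRODUCT_NAME:nullable ustring[max=255];
"""
print(extract_columns(schema))
# ['REGION_CODE', 'SEGMENT_CODE', 'PRODUCT_NAME']
```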
4. Generate SELECT Statement
Creates SQL selecting specified columns from mapped table:
SELECT
REGION_CODE,
SEGMENT_CODE,
...
FROM PUBLIC.SALES_2024_TABLE
File Options
Delimiter Configuration
-schema record
{delim=';'} ← Semicolon delimiter
{delim=','} ← Comma delimiter (CSV)
{delim='\\t'} ← Tab delimiter (TSV)
Record Delimiter
{record_delim_string='\\r\\n'} ← Windows line endings (CRLF)
{record_delim_string='\\n'} ← Unix line endings (LF)
Quote Handling
{quote='"'} ← Double quote for text fields
{quote=none} ← No quoting
First Line Options
-firstLineColumnNames ← Skip first row (contains headers)
Without this option, all rows are treated as data.
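These options matter at ingestion time rather than in the generated SQL, but for reference, here is a small sketch of how the option block could be read into a dictionary (hypothetical helper with simplified parsing):

```python
def parse_record_options(schema_header: str) -> dict[str, str]:
    # Turn "{final_delim=end, record_delim_string='\r\n', delim=';', quote=none}"
    # into a plain dict of option name -> value (surrounding quotes stripped).
    body = schema_header.strip().strip("{}")
    options = {}
    for part in body.split(","):
        key, _, value = part.partition("=")
        options[key.strip()] = value.strip().strip("'")
    return options

header = r"{final_delim=end, record_delim_string='\r\n', delim=';', quote=none}"
print(parse_record_options(header))
# {'final_delim': 'end', 'record_delim_string': '\\r\\n', 'delim': ';', 'quote': 'none'}
```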
Column Type Mapping
DataStage types in schema are converted to Snowflake types:
| DataStage Type | Snowflake Type |
|---|---|
| ustring[max=N] | string |
| string[max=N] | string |
| int32 | number |
| int64 | number |
| decimal[p,s] | number |
| date | date |
| timestamp | timestamp |
All types can be marked nullable.
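A sketch of how the mapping table above could be applied to a column definition (the dictionary and helper below are illustrative, not the converter's internal names):

```python
import re

# DataStage base type -> Snowflake type, following the table above.
TYPE_MAP = {
    "ustring": "string",
    "string": "string",
    "int32": "number",
    "int64": "number",
    "decimal": "number",
    "date": "date",
    "timestamp": "timestamp",
}

def map_type(datastage_type: str) -> str:
    # Drop "nullable" and any length/precision suffix such as [max=50] or [10,2],
    # then look up the remaining base type (defaulting to string).
    base = re.sub(r"\[.*\]", "", datastage_type.replace("nullable", "")).strip()
    return TYPE_MAP.get(base, "string")

print(map_type("nullable ustring[max=50]"))  # string
print(map_type("decimal[10,2]"))             # number
```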
Input Mapping Examples
Simple File Mapping
{
"/data/sales.csv": "PUBLIC.SALES_TABLE"
}
Parameterized File Path
DataStage often uses parameters in file paths:
-file '[&\"$BasePath\"]/input/[&\"region\"]_data.csv'
Map by replacing parameters with actual values or use pattern matching:
{
"[&\"$BasePath\"]/input/[&\"region\"]_data.csv": "PUBLIC.REGION_DATA_TABLE"
}
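One possible way to resolve these tokens before the mapping lookup, assuming the parameter values are known at conversion time (the token pattern and helper below are illustrative):

```python
import re

def resolve_parameters(file_path: str, params: dict[str, str]) -> str:
    # Replace each [&"name"] or [&"$name"] token with its supplied value.
    return re.sub(r'\[&\\?"(\$?\w+)\\?"\]', lambda m: params[m.group(1)], file_path)

path = r'[&\"$BasePath\"]/input/[&\"region\"]_data.csv'
params = {"$BasePath": "/data", "region": "south"}
print(resolve_parameters(path, params))
# /data/input/south_data.csv
```

The resolved path is then used as the input-mapping key, exactly like a literal file path.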
Multiple Files
{
"/data/sales_2023.csv": "PUBLIC.SALES_2023",
"/data/sales_2024.csv": "PUBLIC.SALES_2024",
"/data/customers.csv": "PUBLIC.CUSTOMERS"
}
Reject Handling
-rejects continue
DataStage option to continue processing when encountering malformed records. In Snowflake, this is handled during the ingestion phase (not in SQL conversion).
Limitations
- ⚠️ Requires input mapping - File paths must be mapped to Snowflake tables; no automatic file reading in SQL
- ⚠️ Pre-ingestion required - Files must be loaded into Snowflake using Dadosfera Ingestion module before SQL execution
- ⚠️ Parameter resolution - DataStage job parameters in file paths must be resolved or mapped appropriately
- ⚠️ No file format conversion - CSV parsing, delimiter handling, and data type conversions must be done during ingestion, not in SQL
- ⚠️ Reject records - Malformed record handling must be configured in the ingestion pipeline
Best Practices
When to Use IMPORT Transformer
- ✅ Simple file reads with pre-defined schema
- ✅ Files already ingested into Snowflake tables
- ✅ Straightforward column mappings (1:1 from file to table)
Pre-Ingestion with Dadosfera
Always use the Dadosfera Ingestion module to:
- Load CSV/sequential files into Snowflake tables
- Handle schema mapping and data type conversions
- Configure error handling for malformed records
- Set up incremental loads if needed
Then configure input_mapping to point to the ingested tables.
Related Transformers
- InputStageTransformer - For Oracle database inputs (not files)
- CopyStageTransformer - For reading DataStage datasets (not files)
- ModifyStageTransformer - For post-import column filtering/renaming