InputStageTransformer
Passo a passo de como iniciar sua jornada na Dadosfera
InputStageTransformer
Overview
Transforms DataStage INPUT stages (Oracle Connector) to Snowflake SQL. Converts Oracle SQL syntax to Snowflake-compatible SQL using sqlglot library.
Stage Type: INPUT (pxbridge operator with Oracle connector)
Output: Snowflake SQL SELECT statement
Recommended Approach
For INPUT stages with complex Oracle SQL queries, we strongly recommend:
- Create an Oracle view with the original query
- Ingest the view data into Snowflake using Dadosfera Ingestion module
- Replace the INPUT stage with a simple
SELECT * FROM target_table
This approach avoids SQL conversion issues and ensures data integrity.
Example Workflow
Original (Oracle):
-- Complex Oracle query with UDFs, DECODE, etc.
SELECT ... FROM ... WHERE ... (complex logic)Recommended:
-
Create Oracle view:
CREATE VIEW ORACLE_SCHEMA.VW_SOURCE_DATA AS SELECT ... (original complex query) -
Ingest to Snowflake using Dadosfera Ingestion:
- Configure Oracle source connection
- Select
ORACLE_SCHEMA.VW_SOURCE_DATAas source - Define target:
SNOWFLAKE_SCHEMA.SOURCE_DATA - Run ingestion pipeline
-
Replace INPUT stage SQL:
SELECT * FROM SNOWFLAKE_SCHEMA.SOURCE_DATA
Capabilities (When Auto-Conversion is Needed)
- Oracle SQL to Snowflake SQL conversion
- Handles Oracle-specific functions (NVL, SUBSTR, etc.)
- Converts data type syntax
- Preserves complex CASE statements
- Maintains WHERE clause filters
- Supports nested subqueries
DataStage Stage Example
Input (.dsx format)
#### STAGE: InputData
## Operator
pxbridge
## Operator options
-connector '{name=OracleConnector}'
-XMLProperties '<SelectStatement>
SELECT
ID_FIELD,
STATUS_CODE,
NVL(DATE_FIELD, CURRENT_DATE) AS DATE_FIELD,
SUBSTR(NAME_FIELD, 1, 10) AS SHORT_NAME,
CASE
WHEN TYPE_CODE = 'A' THEN '1'
WHEN TYPE_CODE = 'B' THEN '2'
ELSE '0'
END AS TYPE_VALUE
FROM SOURCE_SCHEMA.SOURCE_TABLE
WHERE STATUS = 'ACTIVE'
AND ROWNUM < 10000
</SelectStatement>'
## General options
[ident('InputData'); jobmon_ident('InputData')]
## Outputs
0> [] 'InputData:Output.v'
Oracle SQL (Input)
SELECT
ID_FIELD,
STATUS_CODE,
NVL(DATE_FIELD, CURRENT_DATE) AS DATE_FIELD,
SUBSTR(NAME_FIELD, 1, 10) AS SHORT_NAME,
CASE
WHEN TYPE_CODE = 'A' THEN '1'
WHEN TYPE_CODE = 'B' THEN '2'
ELSE '0'
END AS TYPE_VALUE
FROM SOURCE_SCHEMA.SOURCE_TABLE
WHERE STATUS = 'ACTIVE'
AND ROWNUM < 10000Generated Snowflake SQL (Output)
SELECT
ID_FIELD,
STATUS_CODE,
COALESCE(DATE_FIELD, CURRENT_DATE) AS DATE_FIELD,
SUBSTRING(NAME_FIELD, 1, 10) AS SHORT_NAME,
CASE
WHEN TYPE_CODE = 'A' THEN '1'
WHEN TYPE_CODE = 'B' THEN '2'
ELSE '0'
END AS TYPE_VALUE
FROM SOURCE_SCHEMA.SOURCE_TABLE
WHERE
STATUS = 'ACTIVE'
AND ROWNUM < 10000Output Schema
[
{"name": "ID_FIELD", "type": "string"},
{"name": "STATUS_CODE", "type": "string"},
{"name": "DATE_FIELD", "type": "date"},
{"name": "SHORT_NAME", "type": "string"},
{"name": "TYPE_VALUE", "type": "string"}
]How It Works (Auto-Conversion)
1. Extract Oracle SQL
Parses the XML properties to extract the SelectStatement containing Oracle SQL.
2. Detect Oracle Functions
Identifies Oracle-specific syntax that needs conversion:
NVL()→COALESCE()SUBSTR()→SUBSTRING()||(concatenation) →CONCAT()SYSDATE→CURRENT_TIMESTAMPTO_CHAR(),TO_DATE()→ Snowflake equivalents
3. Convert Using sqlglot
Uses the sqlglot library to parse and transpile:
import sqlglot
snowflake_sql = sqlglot.transpile(
oracle_sql,
read='oracle',
write='snowflake'
)[0]4. Handle Oracle UDFs
Detects user-defined functions that cannot be automatically converted:
- Logs warnings for manual review
- Preserves function calls as-is
- Marks for post-processing
5. Extract Schema
Attempts to infer schema from:
- Column aliases in SELECT
- DSSchema in stage metadata
- Column metadata from Oracle connector
Oracle to Snowflake Function Mapping
| Oracle Function | Snowflake Equivalent | Example |
|---|---|---|
NVL(a, b) | COALESCE(a, b) | COALESCE(DATE_FIELD, CURRENT_DATE) |
SUBSTR(str, pos, len) | SUBSTRING(str, pos, len) | SUBSTRING(NAME, 1, 10) |
a || b | CONCAT(a, b) | CONCAT(FIRST_NAME, LAST_NAME) |
SYSDATE | CURRENT_TIMESTAMP | CURRENT_TIMESTAMP |
ROWNUM | ROW_NUMBER() or LIMIT | Context-dependent |
DECODE(x, a, b, c) | CASE WHEN x=a THEN b ELSE c END | See CASE conversion |
Limitations
- ⚠️ Complex queries - For complex Oracle SQL with UDFs, PL/SQL, or advanced features, use the recommended view + Dadosfera Ingestion approach
- ⚠️ Oracle UDFs - Custom Oracle functions cannot be automatically converted; require manual review
- ⚠️ Complex PL/SQL - Procedural logic not supported (use stored procedures in Snowflake)
- ⚠️ ROWNUM - Simple ROWNUM < N converted; complex ROWNUM logic may need manual adjustment
- ⚠️ Hierarchical Queries -
CONNECT BYnot supported by sqlglot; requires rewrite - ⚠️ Flashback Queries - Oracle-specific time travel syntax not convertible
- ⚠️ Partitioned Outer Joins - Oracle syntax
(+)may not convert correctly
Best Practices
When to Use Auto-Conversion
- ✅ Simple SELECT statements
- ✅ Standard Oracle functions (NVL, SUBSTR, etc.)
- ✅ Basic CASE/WHERE/JOIN logic
- ✅ No custom UDFs
When to Use View + Dadosfera Ingestion
- ✅ Complex queries with UDFs
- ✅ Queries using DECODE, CONNECT BY, (+) joins
- ✅ Queries with PL/SQL elements
- ✅ Queries that parse fixed-width files (SUBSTR on LINHA column)
- ✅ Any query where automatic conversion is uncertain
Warnings Generated
The transformer logs warnings for:
- Unrecognized Oracle functions
- Failed conversions
- Ambiguous syntax
- Missing schema information
Example warning:
⚠️ Oracle UDF detected: CUSTOM_FUNCTION() - requires manual review
⚠️ Recommendation: Create Oracle view and ingest using Dadosfera Ingestion module
Related Transformers
- ImportStageTransformer - For file-based inputs (CSV, sequential)
- CopyStageTransformer - For DataStage dataset inputs
- TransformerStageTransformer - For post-processing with derivations
Updated 5 days ago
