InputStageTransformer

Passo a passo de como iniciar sua jornada na Dadosfera

InputStageTransformer

Overview

Transforms DataStage INPUT stages (Oracle Connector) to Snowflake SQL. Converts Oracle SQL syntax to Snowflake-compatible SQL using sqlglot library.

Stage Type: INPUT (pxbridge operator with Oracle connector) Output: Snowflake SQL SELECT statement

Recommended Approach

For INPUT stages with complex Oracle SQL queries, we strongly recommend:

  1. Create an Oracle view with the original query
  2. Ingest the view data into Snowflake using Dadosfera Ingestion module
  3. Replace the INPUT stage with a simple SELECT * FROM target_table

This approach avoids SQL conversion issues and ensures data integrity.

Example Workflow

Original (Oracle):

-- Complex Oracle query with UDFs, DECODE, etc.
SELECT ... FROM ... WHERE ... (complex logic)

Recommended:

  1. Create Oracle view:

    CREATE VIEW ORACLE_SCHEMA.VW_SOURCE_DATA AS
    SELECT ... (original complex query)
  2. Ingest to Snowflake using Dadosfera Ingestion:

    • Configure Oracle source connection
    • Select ORACLE_SCHEMA.VW_SOURCE_DATA as source
    • Define target: SNOWFLAKE_SCHEMA.SOURCE_DATA
    • Run ingestion pipeline
  3. Replace INPUT stage SQL:

    SELECT * FROM SNOWFLAKE_SCHEMA.SOURCE_DATA

Capabilities (When Auto-Conversion is Needed)

  • Oracle SQL to Snowflake SQL conversion
  • Handles Oracle-specific functions (NVL, SUBSTR, etc.)
  • Converts data type syntax
  • Preserves complex CASE statements
  • Maintains WHERE clause filters
  • Supports nested subqueries

DataStage Stage Example

Input (.dsx format)

#### STAGE: InputData
## Operator
pxbridge
## Operator options
-connector '{name=OracleConnector}'
-XMLProperties '<SelectStatement>
SELECT
  ID_FIELD,
  STATUS_CODE,
  NVL(DATE_FIELD, CURRENT_DATE) AS DATE_FIELD,
  SUBSTR(NAME_FIELD, 1, 10) AS SHORT_NAME,
  CASE
    WHEN TYPE_CODE = 'A' THEN '1'
    WHEN TYPE_CODE = 'B' THEN '2'
    ELSE '0'
  END AS TYPE_VALUE
FROM SOURCE_SCHEMA.SOURCE_TABLE
WHERE STATUS = 'ACTIVE'
  AND ROWNUM < 10000
</SelectStatement>'

## General options
[ident('InputData'); jobmon_ident('InputData')]
## Outputs
0> [] 'InputData:Output.v'

Oracle SQL (Input)

SELECT
  ID_FIELD,
  STATUS_CODE,
  NVL(DATE_FIELD, CURRENT_DATE) AS DATE_FIELD,
  SUBSTR(NAME_FIELD, 1, 10) AS SHORT_NAME,
  CASE
    WHEN TYPE_CODE = 'A' THEN '1'
    WHEN TYPE_CODE = 'B' THEN '2'
    ELSE '0'
  END AS TYPE_VALUE
FROM SOURCE_SCHEMA.SOURCE_TABLE
WHERE STATUS = 'ACTIVE'
  AND ROWNUM < 10000

Generated Snowflake SQL (Output)

SELECT
  ID_FIELD,
  STATUS_CODE,
  COALESCE(DATE_FIELD, CURRENT_DATE) AS DATE_FIELD,
  SUBSTRING(NAME_FIELD, 1, 10) AS SHORT_NAME,
  CASE
    WHEN TYPE_CODE = 'A' THEN '1'
    WHEN TYPE_CODE = 'B' THEN '2'
    ELSE '0'
  END AS TYPE_VALUE
FROM SOURCE_SCHEMA.SOURCE_TABLE
WHERE
  STATUS = 'ACTIVE'
  AND ROWNUM < 10000

Output Schema

[
  {"name": "ID_FIELD", "type": "string"},
  {"name": "STATUS_CODE", "type": "string"},
  {"name": "DATE_FIELD", "type": "date"},
  {"name": "SHORT_NAME", "type": "string"},
  {"name": "TYPE_VALUE", "type": "string"}
]

How It Works (Auto-Conversion)

1. Extract Oracle SQL

Parses the XML properties to extract the SelectStatement containing Oracle SQL.

2. Detect Oracle Functions

Identifies Oracle-specific syntax that needs conversion:

  • NVL()COALESCE()
  • SUBSTR()SUBSTRING()
  • || (concatenation) → CONCAT()
  • SYSDATECURRENT_TIMESTAMP
  • TO_CHAR(), TO_DATE() → Snowflake equivalents

3. Convert Using sqlglot

Uses the sqlglot library to parse and transpile:

import sqlglot

snowflake_sql = sqlglot.transpile(
    oracle_sql,
    read='oracle',
    write='snowflake'
)[0]

4. Handle Oracle UDFs

Detects user-defined functions that cannot be automatically converted:

  • Logs warnings for manual review
  • Preserves function calls as-is
  • Marks for post-processing

5. Extract Schema

Attempts to infer schema from:

  • Column aliases in SELECT
  • DSSchema in stage metadata
  • Column metadata from Oracle connector

Oracle to Snowflake Function Mapping

Oracle FunctionSnowflake EquivalentExample
NVL(a, b)COALESCE(a, b)COALESCE(DATE_FIELD, CURRENT_DATE)
SUBSTR(str, pos, len)SUBSTRING(str, pos, len)SUBSTRING(NAME, 1, 10)
a || bCONCAT(a, b)CONCAT(FIRST_NAME, LAST_NAME)
SYSDATECURRENT_TIMESTAMPCURRENT_TIMESTAMP
ROWNUMROW_NUMBER() or LIMITContext-dependent
DECODE(x, a, b, c)CASE WHEN x=a THEN b ELSE c ENDSee CASE conversion

Limitations

  • ⚠️ Complex queries - For complex Oracle SQL with UDFs, PL/SQL, or advanced features, use the recommended view + Dadosfera Ingestion approach
  • ⚠️ Oracle UDFs - Custom Oracle functions cannot be automatically converted; require manual review
  • ⚠️ Complex PL/SQL - Procedural logic not supported (use stored procedures in Snowflake)
  • ⚠️ ROWNUM - Simple ROWNUM < N converted; complex ROWNUM logic may need manual adjustment
  • ⚠️ Hierarchical Queries - CONNECT BY not supported by sqlglot; requires rewrite
  • ⚠️ Flashback Queries - Oracle-specific time travel syntax not convertible
  • ⚠️ Partitioned Outer Joins - Oracle syntax (+) may not convert correctly

Best Practices

When to Use Auto-Conversion

  • ✅ Simple SELECT statements
  • ✅ Standard Oracle functions (NVL, SUBSTR, etc.)
  • ✅ Basic CASE/WHERE/JOIN logic
  • ✅ No custom UDFs

When to Use View + Dadosfera Ingestion

  • Complex queries with UDFs
  • Queries using DECODE, CONNECT BY, (+) joins
  • Queries with PL/SQL elements
  • Queries that parse fixed-width files (SUBSTR on LINHA column)
  • Any query where automatic conversion is uncertain

Warnings Generated

The transformer logs warnings for:

  • Unrecognized Oracle functions
  • Failed conversions
  • Ambiguous syntax
  • Missing schema information

Example warning:

⚠️ Oracle UDF detected: CUSTOM_FUNCTION() - requires manual review
⚠️ Recommendation: Create Oracle view and ingest using Dadosfera Ingestion module

Related Transformers

  • ImportStageTransformer - For file-based inputs (CSV, sequential)
  • CopyStageTransformer - For DataStage dataset inputs
  • TransformerStageTransformer - For post-processing with derivations