Create Pipeline

Create a new pipeline

Concepts

Pipeline Structure

A pipeline is a data integration workflow that extracts data from a source and loads it into Snowflake. Each pipeline can contain multiple jobs, where each job represents a single table or entity extraction.

Pipeline
├── Job 1 (e.g., users table)
├── Job 2 (e.g., orders table)
└── Job 3 (e.g., products table)

Important: All jobs within a pipeline should use the same connector type (e.g., all JDBC, all S3, or all Singer).
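The pipeline/job structure above can be sketched as a small payload builder. This is an illustrative helper, not an official client: the helper names (`make_pipeline`, `make_jdbc_job`) are hypothetical, and only fields that appear in the examples in this document are used.

```python
import uuid

def make_pipeline(name, jobs, description="", cron=None):
    """Assemble a pipeline payload: one pipeline wrapping multiple jobs.

    All jobs should share the same connector type (e.g., all JDBC).
    """
    return {
        "id": str(uuid.uuid4()),
        "name": name,
        "description": description,
        "cron": cron,  # None => manual execution only
        "jobs": jobs,
    }

def make_jdbc_job(plugin, schema, table, load_type, config_id):
    """One job = one table extraction (JDBC-style fields only)."""
    return {
        "input": {
            "plugin": plugin,
            "table_schema": schema,
            "table_name": table,
            "load_type": load_type,
            "auth_parameters": {
                "auth_type": "connection_manager",
                "config_id": config_id,
            },
            "network_config": None,
        },
        "transformations": [],
        "output": {"plugin": "dadosfera_snowflake"},
    }

# Three jobs, one connector type, one pipeline.
jobs = [
    make_jdbc_job("postgresql", "public", t, "full_load",
                  "1761160648374_kq2ovtdm_postgresql-1.0.0")
    for t in ("users", "orders", "products")
]
pipeline = make_pipeline("shop_tables", jobs, cron="0 6 * * *")
```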

Connector Types

Connector Type                 | Sync Mode Field    | Options
JDBC (PostgreSQL, MySQL, etc.) | load_type          | full_load, incremental, incremental_with_qualify
S3/Azure (CSV, JSON, Parquet)  | replication_method | INCREMENTAL (default), FULL_TABLE
Singer (SaaS APIs)             | replication_method | INCREMENTAL, FULL_TABLE
MongoDB                        | replication_method | FULL_TABLE

Sync Modes Explained

JDBC Load Types:

  • full_load - Extracts all data on each run (truncate and reload)
  • incremental - Extracts only new/modified records based on an incremental column (e.g., updated_at)
  • incremental_with_qualify - Incremental extraction with deduplication using primary keys
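Conceptually, an incremental load filters the source table on the incremental column using the last stored high-water mark. The sketch below shows roughly what such a query looks like; the SQL the connector actually generates may differ.

```python
def incremental_query(schema, table, column, last_value):
    """Build a query that fetches only rows modified after the
    previously stored high-water mark (illustrative only)."""
    return (
        f"SELECT * FROM {schema}.{table} "
        f"WHERE {column} > '{last_value}' "
        f"ORDER BY {column}"
    )

q = incremental_query("public", "users", "updated_at",
                      "2024-06-01T00:00:00Z")
```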

S3/Azure Replication Methods:

  • INCREMENTAL (default) - Only processes files modified since the last run, based on file LastModified timestamp
  • FULL_TABLE - Processes all files on each run
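For file-based sources, the INCREMENTAL method amounts to comparing each file's LastModified timestamp against the time of the last successful run. The selection logic can be sketched over listed file metadata (this is not the connector's actual code, just the idea):

```python
from datetime import datetime, timezone

def files_to_process(files, last_run):
    """Keep only files modified after the last successful run."""
    return [f["Key"] for f in files if f["LastModified"] > last_run]

# Simulated S3 listing metadata (keys and timestamps are examples).
listing = [
    {"Key": "raw/sales_data/2024-01.csv",
     "LastModified": datetime(2024, 1, 31, tzinfo=timezone.utc)},
    {"Key": "raw/sales_data/2024-02.csv",
     "LastModified": datetime(2024, 2, 29, tzinfo=timezone.utc)},
]
new_files = files_to_process(
    listing, datetime(2024, 2, 1, tzinfo=timezone.utc))
```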

Singer Replication Methods:

  • INCREMENTAL - Extracts only new/modified records since last sync (uses connector's internal state)
  • FULL_TABLE - Extracts all records on each run

Request Body

Field       | Type   | Required | Description
id          | string | Yes      | Pipeline UUID
name        | string | Yes      | Pipeline name (unique per customer)
description | string | No       | Pipeline description
cron        | string | No       | Cron expression for scheduling (null for manual execution)
jobs        | array  | Yes      | Array of job configurations

Note: The customer_id is automatically injected by Maestro from the authenticated user's JWT. You do not need to provide it in the request body.

Note: The config_id in auth_parameters refers to the connection ID created in the Dadosfera Connections area. The format follows the pattern: {timestamp}_{random_id}_{connector_name}-{version} (e.g., 1761160648374_kq2ovtdm_postgresql-1.0.0).
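The required fields and the config_id pattern above can be checked client-side before sending the request. The sketch below is a sanity check derived from the table and notes in this section (the function name and the exact regex are illustrative; the server performs its own validation):

```python
import re

# Pattern from the note above: {timestamp}_{random_id}_{connector_name}-{version}
CONFIG_ID_RE = re.compile(r"^\d+_[a-z0-9]+_[a-z0-9-]+-\d+\.\d+\.\d+$")

def validate_pipeline(payload):
    """Return a list of problems found (empty list = looks OK)."""
    errors = []
    for field in ("id", "name", "jobs"):
        if not payload.get(field):
            errors.append(f"missing required field: {field}")
    for i, job in enumerate(payload.get("jobs", [])):
        cid = (job.get("input", {})
                  .get("auth_parameters", {})
                  .get("config_id", ""))
        if cid and not CONFIG_ID_RE.match(cid):
            errors.append(f"job {i}: config_id does not match expected pattern")
    return errors

errors = validate_pipeline({
    "id": "1113e943-2187-4fdd-9c2c-54338fedaeef",
    "name": "postgres_users_pipeline",
    "jobs": [{"input": {"auth_parameters": {
        "config_id": "1761160648374_kq2ovtdm_postgresql-1.0.0"}}}],
})
```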


JDBC Connectors

PostgreSQL - Incremental Load

{
  "id": "1113e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "postgres_users_pipeline",
  "description": "Pipeline to ingest users table from PostgreSQL",
  "cron": "0 6 * * *",
  "jobs": [
    {
      "input": {
        "plugin": "postgresql",
        "table_schema": "public",
        "table_name": "users",
        "incremental_column_name": "updated_at",
        "incremental_column_type": "timestamp with time zone",
        "load_type": "incremental",
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1761160648374_kq2ovtdm_postgresql-1.0.0"
        },
        "network_config": null
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

PostgreSQL - Incremental with Qualify (Deduplication)

Use incremental_with_qualify when you need to deduplicate records based on primary keys. This is useful for tables where the same record can be updated multiple times.
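The deduplication step keeps, for each primary-key combination, only the most recent row by the incremental column — presumably what the "qualify" in the name refers to (Snowflake's QUALIFY ROW_NUMBER() = 1 pattern). A Python sketch of the semantics, with made-up sample rows:

```python
def deduplicate(rows, primary_keys, order_column):
    """Keep the latest row per primary-key combination."""
    latest = {}
    for row in rows:
        key = tuple(row[k] for k in primary_keys)
        if key not in latest or row[order_column] > latest[key][order_column]:
            latest[key] = row
    return list(latest.values())

# The same order (id=1) updated twice; only the latest version survives.
rows = [
    {"id": 1, "status": "pending", "updated_at": "2024-03-01"},
    {"id": 1, "status": "shipped", "updated_at": "2024-03-05"},
    {"id": 2, "status": "pending", "updated_at": "2024-03-02"},
]
deduped = deduplicate(rows, ["id"], "updated_at")
```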

{
  "id": "2223e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "postgres_orders_deduplicated",
  "description": "Pipeline with deduplication using primary keys",
  "cron": null,
  "jobs": [
    {
      "input": {
        "plugin": "postgresql",
        "table_schema": "public",
        "table_name": "orders",
        "incremental_column_name": "updated_at",
        "incremental_column_type": "timestamp with time zone",
        "load_type": "incremental_with_qualify",
        "primary_keys": ["id"],
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1761160648374_kq2ovtdm_postgresql-1.0.0"
        },
        "network_config": null
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

PostgreSQL - Full Load with Custom Output Table

By default, tables are created in the PUBLIC schema with the source table name. You can customize the output table name and schema using the raw configuration in the output:

{
  "id": "3333e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "postgres_customers_bronze",
  "description": "Full load with custom output table name and schema",
  "cron": "0 0 * * 0",
  "jobs": [
    {
      "input": {
        "plugin": "postgresql",
        "table_schema": "public",
        "table_name": "customers",
        "load_type": "full_load",
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1761160648374_kq2ovtdm_postgresql-1.0.0"
        },
        "network_config": null
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake",
        "raw": {
          "table_name": "customers",
          "table_schema": "BRONZE"
        }
      }
    }
  ]
}

PostgreSQL - Incremental with Qualify and Custom Tables (Bronze/Gold)

For incremental_with_qualify, you can customize both the raw (staging) table and the qualify (deduplicated) table. This is useful for implementing medallion architecture patterns:

  • raw: The staging table where all incremental records are inserted (default: PUBLIC schema)
  • qualify: The final deduplicated table (default: STAGED schema)

{
  "id": "3334e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "orders_medallion_architecture",
  "description": "Incremental with qualify using Bronze/Gold medallion architecture",
  "cron": "0 */2 * * *",
  "jobs": [
    {
      "input": {
        "plugin": "postgresql",
        "table_schema": "public",
        "table_name": "orders",
        "incremental_column_name": "updated_at",
        "incremental_column_type": "timestamp with time zone",
        "load_type": "incremental_with_qualify",
        "primary_keys": ["order_id"],
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1761160648374_kq2ovtdm_postgresql-1.0.0"
        },
        "network_config": null
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake",
        "raw": {
          "table_name": "staging_orders",
          "table_schema": "BRONZE"
        },
        "qualify": {
          "table_name": "orders",
          "table_schema": "GOLD"
        }
      }
    }
  ]
}

Data Flow: Source → BRONZE.staging_orders (raw inserts) → GOLD.orders (deduplicated)

PostgreSQL - With Column Selection

{
  "id": "4443e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "postgres_selected_columns",
  "description": "Pipeline with specific column selection",
  "cron": null,
  "jobs": [
    {
      "input": {
        "plugin": "postgresql",
        "table_schema": "public",
        "table_name": "transactions",
        "incremental_column_name": "created_at",
        "incremental_column_type": "timestamp with time zone",
        "load_type": "incremental",
        "column_include_list": [
          "id",
          "user_id",
          "amount",
          "created_at"
        ],
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1761160648374_kq2ovtdm_postgresql-1.0.0"
        },
        "network_config": null
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

MySQL - Incremental with Qualify (Multiple Primary Keys)

{
  "id": "5553e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "mysql_user_events",
  "description": "Pipeline with composite primary key deduplication",
  "cron": "0 */4 * * *",
  "jobs": [
    {
      "input": {
        "plugin": "mysql",
        "table_schema": "ecommerce",
        "table_name": "user_events",
        "incremental_column_name": "event_timestamp",
        "incremental_column_type": "datetime",
        "load_type": "incremental_with_qualify",
        "primary_keys": ["tenant_id", "user_id", "event_id"],
        "column_include_list": [
          "tenant_id",
          "user_id",
          "event_id",
          "event_type",
          "event_timestamp",
          "event_data"
        ],
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1762345678901_abc123xy_mysql-1.0.0"
        },
        "network_config": null
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

MongoDB - Full Table Replication

{
  "id": "6663e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "mongodb_companies",
  "description": "Pipeline to ingest MongoDB collection",
  "cron": null,
  "jobs": [
    {
      "input": {
        "plugin": "mongodb",
        "sync_database": "production",
        "table_name": "companies",
        "replication_method": "FULL_TABLE",
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1763456789012_def456ab_mongodb-1.0.0"
        },
        "config": {},
        "network_config": null
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

S3/Cloud Storage Connectors

S3 - CSV Files (Incremental - Default)

{
  "id": "7773e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "s3_csv_sales",
  "description": "Pipeline to ingest CSV files from S3 (incremental based on file modification time)",
  "cron": "0 8 * * *",
  "jobs": [
    {
      "input": {
        "plugin": "csv",
        "source_bucket": "my-data-bucket",
        "source_prefix": "raw/sales_data",
        "replication_method": "INCREMENTAL",
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1764567890123_ghi789cd_s3-1.0.0"
        },
        "file_format_params": {
          "file_format": "csv",
          "sep": ",",
          "header": true,
          "encoding": "UTF-8"
        }
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

S3 - CSV Files (Full Load)

{
  "id": "7774e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "s3_csv_full_load",
  "description": "Pipeline to ingest all CSV files on each run",
  "cron": "0 0 * * 0",
  "jobs": [
    {
      "input": {
        "plugin": "csv",
        "source_bucket": "my-data-bucket",
        "source_prefix": "archives/historical",
        "replication_method": "FULL_TABLE",
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1764567890123_ghi789cd_s3-1.0.0"
        },
        "file_format_params": {
          "file_format": "csv",
          "sep": ";",
          "header": true,
          "encoding": "UTF-8"
        }
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

S3 - JSON Files

{
  "id": "8883e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "s3_json_events",
  "description": "Pipeline to ingest JSON files from S3",
  "cron": null,
  "jobs": [
    {
      "input": {
        "plugin": "json",
        "source_bucket": "my-data-bucket",
        "source_prefix": "events/2024",
        "replication_method": "INCREMENTAL",
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1764567890123_ghi789cd_s3-1.0.0"
        },
        "file_format_params": {
          "file_format": "json",
          "encoding": "UTF-8"
        }
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

S3 - Parquet Files

{
  "id": "9993e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "s3_parquet_analytics",
  "description": "Pipeline to ingest Parquet files from S3",
  "cron": "0 6 * * *",
  "jobs": [
    {
      "input": {
        "plugin": "parquet",
        "source_bucket": "analytics-bucket",
        "source_prefix": "processed/daily",
        "replication_method": "INCREMENTAL",
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1764567890123_ghi789cd_s3-1.0.0"
        },
        "file_format_params": {
          "file_format": "parquet"
        }
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

Azure Blob Storage - Parquet Files

{
  "id": "aaa3e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "azure_parquet_data",
  "description": "Pipeline to ingest Parquet from Azure Blob Storage",
  "cron": null,
  "jobs": [
    {
      "input": {
        "plugin": "parquet",
        "container_name": "data-container",
        "source_prefix": "datasets/marketing",
        "replication_method": "INCREMENTAL",
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1765678901234_jkl012ef_azure-blob-1.0.0"
        },
        "file_format_params": {
          "file_format": "parquet"
        }
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

Singer Connectors

Facebook Ads - Incremental

{
  "id": "bbb3e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "facebook_ads_insights",
  "description": "Pipeline to ingest Facebook Ads data incrementally",
  "cron": "0 8 * * *",
  "jobs": [
    {
      "input": {
        "plugin": "facebook",
        "table_name": "ads_insights",
        "replication_method": "INCREMENTAL",
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1766789012345_mno345gh_facebook-1.0.0"
        },
        "config": {
          "start_date": "2024-01-01T00:00:00Z"
        }
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

HubSpot - Incremental

{
  "id": "ccc3e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "hubspot_contacts",
  "description": "Pipeline to ingest HubSpot CRM data",
  "cron": "0 6 * * *",
  "jobs": [
    {
      "input": {
        "plugin": "hubspot",
        "table_name": "contacts",
        "replication_method": "INCREMENTAL",
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1767890123456_pqr678ij_hubspot-1.0.0"
        },
        "config": {
          "start_date": "2024-01-01T00:00:00Z",
          "disable_collection": true
        }
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

Salesforce - Bulk API

{
  "id": "ddd3e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "salesforce_opportunities",
  "description": "Pipeline to ingest Salesforce data using Bulk API",
  "cron": "0 7 * * *",
  "jobs": [
    {
      "input": {
        "plugin": "salesforce",
        "table_name": "opportunities",
        "replication_method": "INCREMENTAL",
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1768901234567_stu901kl_salesforce-1.0.0"
        },
        "config": {
          "start_date": "2024-01-01T00:00:00Z",
          "api_type": "bulk"
        }
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

Zendesk - Full Table

{
  "id": "eee3e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "zendesk_tickets",
  "description": "Pipeline to ingest all Zendesk tickets (full table)",
  "cron": "0 */6 * * *",
  "jobs": [
    {
      "input": {
        "plugin": "zendesk",
        "table_name": "tickets",
        "replication_method": "FULL_TABLE",
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1769012345678_vwx234mn_zendesk-1.0.0"
        },
        "config": {
          "start_date": "2024-01-01T00:00:00Z",
          "request_timeout": 300
        }
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

VTEX - Incremental Orders

{
  "id": "eee4e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "vtex_orders",
  "description": "Pipeline to ingest VTEX orders incrementally",
  "cron": "0 */2 * * *",
  "jobs": [
    {
      "input": {
        "plugin": "vtex",
        "table_name": "orders",
        "replication_method": "INCREMENTAL",
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1769112345678_vtx567op_vtex-1.0.0"
        },
        "config": {
          "start_date": "2024-01-01",
          "window_size": 1
        }
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

Google Sheets

{
  "id": "fff3e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "google_sheets_budget",
  "description": "Pipeline to ingest Google Sheets data",
  "cron": "0 9 * * 1",
  "jobs": [
    {
      "input": {
        "plugin": "google-sheets",
        "table_name": "budget_2024",
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1770123456789_yza567qr_google-sheets-1.0.0"
        },
        "config": {
          "start_date": "2024-01-01",
          "spreadsheet_id": "1FojlvtLwS0-BzGS37R0jEXtwSHqSiO1Uw-7RKQQO-C4"
        }
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

Google Analytics 4 (GA4)

{
  "id": "0003e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "ga4_website_analytics",
  "description": "Pipeline to ingest Google Analytics 4 data",
  "cron": "0 10 * * *",
  "jobs": [
    {
      "input": {
        "plugin": "ga4",
        "table_name": "website_events",
        "auth_parameters": {
          "auth_type": "connection_manager",
          "config_id": "1771234567890_bcd890st_ga4-1.0.0"
        },
        "config": {
          "start_date": "2024-01-01",
          "property_id": "123456789"
        }
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

Generic REST API

{
  "id": "1114e943-2187-4fdd-9c2c-54338fedaeef",
  "name": "rest_api_custom",
  "description": "Pipeline to ingest data from a custom REST API",
  "cron": "0 */2 * * *",
  "jobs": [
    {
      "input": {
        "plugin": "rest-api-msdk",
        "table_name": "api_data",
        "config": {
          "api_url": "https://api.example.com",
          "streams": [
            {
              "name": "users",
              "path": "/api/v1/users",
              "records_path": "$.data[*]",
              "primary_keys": ["id"]
            }
          ]
        }
      },
      "transformations": [],
      "output": {
        "plugin": "dadosfera_snowflake"
      }
    }
  ]
}

Response

Success (200)

{
  "status": true,
  "exception_type": null,
  "traceback": null,
  "data": {
    "pipeline_id": "1113e943-2187-4fdd-9c2c-54338fedaeef"
  }
}

Error - Pipeline Already Exists (400)

{
  "status": false,
  "exception_type": "PipelineAlreadyExists",
  "traceback": "Pipeline with name 'postgres_users_pipeline' already exists",
  "data": null
}

Error - Validation Error (400)

{
  "status": false,
  "exception_type": "PipelineValidationError",
  "traceback": "Invalid load_type: must be one of full_load, incremental, incremental_with_qualify",
  "data": null
}

Error - Table Name Conflict (409)

{
  "detail": {
    "status": false,
    "exception_type": "RepositoryResourceAlreadyExists",
    "traceback": "Table name 'TB__USERS' already exists for another pipeline",
    "data": null
  }
}
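A client can branch on the status flag and exception_type shown in the responses above. Note that the 409 conflict wraps the envelope in a detail object. The handler below is a minimal illustrative sketch (the function name and return shape are not part of the API):

```python
def handle_response(status_code, body):
    """Summarize a create-pipeline response into (ok, detail)."""
    # 409 conflict responses wrap the envelope in a "detail" object.
    envelope = body.get("detail", body)
    if status_code == 200 and envelope.get("status"):
        return True, envelope["data"]["pipeline_id"]
    return False, envelope.get("exception_type", "UnknownError")

ok, detail = handle_response(200, {
    "status": True, "exception_type": None, "traceback": None,
    "data": {"pipeline_id": "1113e943-2187-4fdd-9c2c-54338fedaeef"},
})
```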