Pipelines
What is a pipeline?
A data pipeline is the process that moves data from a source to a destination, with or without processing and transformation along the way. At Dadosfera, we follow the EL(T) paradigm: data is Extracted and Loaded first, and Transformed afterwards inside the Platform.
A pipeline at Dadosfera has its own monitoring metrics and specific properties, such as name, description, status, and execution history.
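To make the EL(T) flow and these pipeline properties concrete, here is a minimal sketch in Python. All names (`Pipeline`, `run`, the metadata fields' shapes) are hypothetical illustrations for this document, not Dadosfera's actual API.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical pipeline metadata, mirroring the properties listed above.
@dataclass
class Pipeline:
    name: str
    description: str
    status: str = "active"
    execution_history: list = field(default_factory=list)

    def run(self, source_rows):
        """EL(T): extract and load first, transform afterwards in the Platform."""
        raw = list(source_rows)   # Extract: read rows from the source
        loaded = list(raw)        # Load: land the raw data as-is
        transformed = [
            {k.lower(): v for k, v in row.items()}  # (T)ransform after loading
            for row in loaded
        ]
        self.execution_history.append(
            {"ran_at": datetime.now().isoformat(), "rows": len(transformed)}
        )
        return transformed

# Usage: each run is recorded in the pipeline's execution history.
pipeline = Pipeline(name="orders", description="Daily orders from the ERP")
pipeline.run([{"ID": 1, "Total": 9.9}])
print(pipeline.status, len(pipeline.execution_history))
```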
Recurrent batch data collection is set up by creating a pipeline: you select a source, and the collected data then evolves within the Platform through the stages described below.
Stages
Loading data into the Platform essentially consists of the following steps (sketched in the example after the list):
- Registering or choosing a registered data source;
- Defining the general information of the pipeline;
- Inserting the pipeline settings (which vary according to the type of source);
- Defining the entities, columns, and synchronization mode (which varies according to the type of source);
- Creating micro-transformations (optional);
- Choosing the frequency of collection.
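As a sketch of how these steps might come together, the dictionary below describes one hypothetical pipeline definition. The keys and values are illustrative assumptions, not the Platform's actual configuration schema; as noted above, settings and synchronization modes vary according to the type of source.

```python
# Hypothetical pipeline definition covering the steps above (illustrative only).
pipeline_definition = {
    # 1. Registered data source (assumed identifier)
    "source": {"type": "postgresql", "connection_id": "erp-production"},
    # 2. General information of the pipeline
    "general": {"name": "orders", "description": "Daily orders from the ERP"},
    # 3. Settings, which vary according to the type of source
    "settings": {"schema": "public"},
    # 4. Entities, columns, and synchronization mode
    "entities": [
        {
            "name": "orders",
            "columns": ["id", "customer_id", "total", "created_at"],
            "sync_mode": "incremental",  # e.g. full table vs. incremental
        }
    ],
    # 5. Optional micro-transformations applied to the collected data
    "micro_transformations": [{"column": "total", "cast": "decimal"}],
    # 6. Frequency of collection
    "schedule": {"frequency": "daily", "time": "02:00"},
}
```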
Supported Data
| Classification | Data Type |
| --- | --- |
| Numeric | number, decimal, numeric, int, integer, bigint, smallint, byteint, float, float4, float8, double, double precision, real |
| String and Binary | varchar, char, character, string, text, binary, varbinary |
| Logical | boolean |
| Date and Time | date, datetime, time, timestamp, timestamp_ltz, timestamp_ntz, timestamp_tz |
| Semi-structured | variant, object, array |
| Geospatial | geography |
Unsupported Data
| Classification | Data Type |
| --- | --- |
| LOB (Large Object) | blob, clob |
| Others | enum, user-defined data type |
To learn more about supported data types, see the "Data types" topic in the Snowflake documentation.
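As a quick illustration of how these classifications could be used, the snippet below checks whether a column's declared type is supported before it is included in a pipeline. The type names come from the tables above; the helper function itself is a hypothetical convenience, not part of the Platform.

```python
# Type names taken from the tables above; the helper is illustrative only.
SUPPORTED_TYPES = {
    "number", "decimal", "numeric", "int", "integer", "bigint", "smallint",
    "byteint", "float", "float4", "float8", "double", "double precision",
    "real", "varchar", "char", "character", "string", "text", "binary",
    "varbinary", "boolean", "date", "datetime", "time", "timestamp",
    "timestamp_ltz", "timestamp_ntz", "timestamp_tz", "variant", "object",
    "array", "geography",
}
UNSUPPORTED_TYPES = {"blob", "clob", "enum"}  # plus user-defined types

def is_supported(data_type: str) -> bool:
    """Return True if the declared column type is in the supported list."""
    return data_type.strip().lower() in SUPPORTED_TYPES

print(is_supported("TIMESTAMP_TZ"))  # True
print(is_supported("clob"))          # False
```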