[EN] Pipelines

What is a pipeline?

A data pipeline is the process by which data from a source is directed to a destination with or without prior processing and transformation. At Dadosfera, we use the EL(T) paradigm.
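As a rough illustration of the EL(T) idea, the sketch below extracts rows, loads them unchanged, and only then applies an optional transformation at the destination. The function `run_elt`, the in-memory `source` and `destination` lists, and `trim_names` are hypothetical stand-ins and not part of the Dadosfera Platform.

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]


def run_elt(
    extract: Callable[[], Iterable[Row]],
    load: Callable[[List[Row]], None],
    transform: Optional[Callable[[], None]] = None,
) -> int:
    """Extract rows, load them as-is, then optionally transform at the destination."""
    rows = list(extract())      # E: read rows from the source
    load(rows)                  # L: write the raw rows to the destination
    if transform is not None:
        transform()             # (T): optional step, runs only after loading
    return len(rows)


# In-memory stand-ins for a real source and destination (assumptions).
source: List[Row] = [{"id": 1, "name": " ana "}, {"id": 2, "name": "bob"}]
destination: List[Row] = []


def load_copy(rows: List[Row]) -> None:
    # Copy each row so later transformations do not touch the source.
    destination.extend(dict(row) for row in rows)


def trim_names() -> None:
    # Example transformation applied to data that is already loaded.
    for row in destination:
        row["name"] = str(row["name"]).strip()


loaded = run_elt(lambda: source, load_copy, trim_names)
```

Passing `transform=None` gives a plain extract-and-load run, which is the sense in which the "T" is optional.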

Each pipeline at Dadosfera has its own monitoring metrics and properties, such as name, description, status, and execution history.

Recurring batch data collection is configured by creating a pipeline. The pipeline is defined starting from the selected source, and the collected data then advances through the Platform in the following stages.

Stages

Loading data into the Platform consists of:

  • Registering or choosing a registered data source;
  • Defining the general information of the pipeline;
  • Inserting the pipeline settings (which vary according to the type of source);
  • Defining the entities, columns, and synchronization mode (which varies according to the type of source);
  • Creating a micro-transformation (optional);
  • Choosing the frequency of collection.
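The stages above can be pictured as one record that is filled in step by step. The following is a minimal sketch under assumptions: the class names, field names, and example values (such as the `"incremental"` synchronization mode and the `"daily"` frequency) are hypothetical and do not reflect the Platform's actual API or option names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EntitySelection:
    name: str            # entity (for example, a table) to collect
    columns: List[str]   # columns chosen for that entity
    sync_mode: str       # hypothetical value, e.g. "full" or "incremental"


@dataclass
class PipelineDefinition:
    source: str                       # registered data source
    name: str                         # general information
    description: str                  # general information
    settings: Dict[str, str]          # source-specific settings
    entities: List[EntitySelection]   # entities, columns, and sync mode
    frequency: str                    # how often collection runs
    micro_transformation: Optional[str] = None  # optional stage

    def missing_steps(self) -> List[str]:
        """Return the required stages that have not been filled in yet."""
        missing = []
        if not self.source:
            missing.append("source")
        if not self.name:
            missing.append("general information")
        if not self.entities:
            missing.append("entities")
        if not self.frequency:
            missing.append("frequency")
        return missing


pipeline = PipelineDefinition(
    source="orders_db",
    name="orders_daily",
    description="Daily load of order tables",
    settings={"schema": "public"},
    entities=[EntitySelection("orders", ["id", "total"], "incremental")],
    frequency="daily",
)
```

Here `pipeline.missing_steps()` returns an empty list because every required stage has a value; the micro-transformation is left as `None` since that stage is optional.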

Supported Data

Classification     | Data Type
Numeric            | number, decimal, numeric, int, integer, bigint, smallint, byteint, float, float4, float8, double, double precision, real
String and Binary  | varchar, char, character, string, text, binary, varbinary
Logical            | boolean
Date and Time      | date, datetime, time, timestamp, timestamp_ltz, timestamp_ntz, timestamp_tz
Semi-structured    | variant, object, array
Geospatial         | geography

Unsupported Data

Classification      | Data Type
LOB (Large Object)  | blob, clob
Others              | enum, user-defined data type

To learn more about supported data types, see the "Data types" topic in the Snowflake documentation.
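Before creating a pipeline, it can help to check a source's column types against the supported list. The helper below is a small sketch, not a Platform feature: `SUPPORTED_TYPES` and `classify` are hypothetical names, and the sets simply transcribe the supported types listed above. Any type that is not listed, including blob, clob, enum, and user-defined types, yields `None`.

```python
from typing import Dict, Optional, Set

# Supported types, transcribed from the "Supported Data" list.
SUPPORTED_TYPES: Dict[str, Set[str]] = {
    "Numeric": {
        "number", "decimal", "numeric", "int", "integer", "bigint",
        "smallint", "byteint", "float", "float4", "float8", "double",
        "double precision", "real",
    },
    "String and Binary": {
        "varchar", "char", "character", "string", "text", "binary", "varbinary",
    },
    "Logical": {"boolean"},
    "Date and Time": {
        "date", "datetime", "time", "timestamp", "timestamp_ltz",
        "timestamp_ntz", "timestamp_tz",
    },
    "Semi-structured": {"variant", "object", "array"},
    "Geospatial": {"geography"},
}


def classify(type_name: str) -> Optional[str]:
    """Return the classification of a supported type, or None if it is not listed."""
    # Normalize case and drop any length/precision suffix such as "(255)".
    base = type_name.strip().lower().split("(")[0].strip()
    for classification, names in SUPPORTED_TYPES.items():
        if base in names:
            return classification
    return None


# Example: collect the columns whose types are not in the supported list.
columns = {"id": "BIGINT", "payload": "BLOB", "created_at": "TIMESTAMP_NTZ(9)"}
unsupported = [col for col, typ in columns.items() if classify(typ) is None]
```

With the sample `columns` mapping, only `payload` ends up in `unsupported`, since `blob` belongs to the LOB group.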