[EN] Scheduler

About

Configure the desired frequency for your pipeline to run. You can choose from the presented options or insert a custom frequency using a cron expression.

For more detail on the possibilities and limitations of scheduling, see the official Airflow documentation.

📘

  • The default time zone used in the frequency is UTC.

  • All frequency methods define when extractions will start. They do not control how long the replication job will run or when the data will actually be at the destination.
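Because schedules are interpreted in UTC, a local start time has to be converted before it is entered. The sketch below shows one way to do that conversion, assuming a fixed local offset of UTC-3. The helper name `local_time_to_utc` and the sample date are hypothetical and are not part of Dadosfera.

```python
from datetime import datetime, timedelta, timezone


def local_time_to_utc(hour, minute, utc_offset_hours):
    """Convert a local wall-clock time at a fixed UTC offset to (hour, minute) in UTC."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # The date is arbitrary; only the time of day matters for a fixed offset.
    local = datetime(2024, 1, 15, hour, minute, tzinfo=local_tz)
    as_utc = local.astimezone(timezone.utc)
    return as_utc.hour, as_utc.minute


# Assumption: the pipeline should start at 06:00 local time in a UTC-3 zone.
utc_hour, utc_minute = local_time_to_utc(6, 0, -3)
print(f"{utc_minute} {utc_hour} * * *")  # prints "0 9 * * *"
```

A conversion that crosses midnight also shifts the calendar day, so day-based schedules may need their day adjusted as well. Time zones with daylight saving time do not have a fixed offset and would need two different UTC times over the year.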

How to Configure Scheduling

In the last step of creating the pipeline, you can choose from the available intuitive options or use the 'Custom' option, in which a cron expression is manually entered.

Single Extraction

Choose a single initial data extraction when no schedule is needed. This cold load is meant for static data: the collection runs only once, right after the pipeline is created. You can still run the collection again at any time through a manual sync.

Custom Scheduling

It's possible to specify granular start times for your data extraction. Using a cron expression, you can specify the exact hours, days of the week, and days of the month when data extraction should start. Dadosfera uses the Quartz standard for cron scheduling.

Syntax

A cron expression is composed of five fields, separated by spaces, that describe when the extraction starts. The fields must appear in the following order, and an expression must have all five fields to be considered valid:

[minutes] [hours] [day of the month] [month] [day of the week]

Allowed Characters

Field | Allowed Values | Special Characters Allowed in Dadosfera
Minutes | 0-59 | n/a
Hours | 0-23 | -
Day of the Month | 1-31 | , - * /
Month | 1-12 | , - * /
Day of the Week | 0-6 | , - * /
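As a minimal sketch of the syntax rules above, the snippet below splits an expression into named fields in the listed order and checks plain numeric values against the allowed ranges from the table. The names `FIELD_RANGES` and `split_cron` are hypothetical, and this is not Dadosfera's own validator.

```python
# Field order and allowed value ranges, copied from the table above.
FIELD_RANGES = {
    "minutes": (0, 59),
    "hours": (0, 23),
    "day_of_month": (1, 31),
    "month": (1, 12),
    "day_of_week": (0, 6),
}


def split_cron(expression):
    """Split a cron expression into named fields and range-check plain numbers."""
    tokens = expression.split()
    if len(tokens) != len(FIELD_RANGES):
        raise ValueError(f"expected {len(FIELD_RANGES)} fields, got {len(tokens)}")
    fields = dict(zip(FIELD_RANGES, tokens))
    for name, token in fields.items():
        low, high = FIELD_RANGES[name]
        # Only plain numbers are checked here; special characters pass through.
        if token.isdigit() and not low <= int(token) <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {token}")
    return fields


print(split_cron("0 6 * * 1"))
```

Tokens that contain special characters are returned unchanged, so a stricter check would also need to verify which characters each field accepts.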

📘

  • Currently, the shortest interval at which Dadosfera updates data is hourly.

  • On the Basic plan, the shortest interval is daily. If you need to run your collection more often, please contact the sales team.

Special Character Descriptions

  • ASTERISK
    Selects all values within a field.
    Examples:
    * in the Month field means "every month"
    * in the Day of the Week field means "every day of the week"

  • COMMA
    Specifies a list of two or more values.
    Examples:
    1,2,5 in the Month field means “the months of January, February, and May”
    1,5 in the Day of the Week field means “the days Monday and Friday”

  • HYPHEN
    Specifies a range of values.
    Examples:
    5-8 in the Hour field means "hours 5, 6, 7, and 8"
    1-3 in the Day of the Week field means "the days Monday, Tuesday, and Wednesday"

  • SLASH
    Specifies increments from a starting value. Formatted as: <start_value>/<value_to_increment>
    Examples:
    0/15 in the Minute field means "the minutes 0, 15, 30, and 45"
    3/6 in the Hour field means “every 6 hours starting at the third hour”
    1/5 in the Day of the Month field means "every 5 days starting on the first day of the month"
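The special characters described above can be illustrated with a short sketch that expands a single field into the list of values it selects. The function name `expand_field` is hypothetical, and the rules are an approximation of common cron behavior rather than Dadosfera's actual parser.

```python
def expand_field(token, low, high):
    """Expand one cron field into the sorted list of values it selects."""
    values = set()
    for part in token.split(","):  # COMMA: a list of values
        base, _, step_text = part.partition("/")  # SLASH: optional increment
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError(f"increment must be positive: {part}")
        if base == "*":  # ASTERISK: every value in the field
            start, end = low, high
        elif "-" in base:  # HYPHEN: an inclusive range
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            # "3/6" runs from 3 to the end of the range; a bare "3" is only 3.
            end = high if step_text else start
        if not low <= start <= end <= high:
            raise ValueError(f"value out of range {low}-{high}: {part}")
        values.update(range(start, end + 1, step))
    return sorted(values)


print(expand_field("0/15", 0, 59))  # [0, 15, 30, 45]
print(expand_field("3/6", 0, 23))   # [3, 9, 15, 21]
print(expand_field("5-8", 0, 23))   # [5, 6, 7, 8]
```

The second and third arguments are the allowed range of the field being expanded, for example 0 and 59 for minutes or 1 and 12 for months.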

Examples

Run at midnight UTC every day

Minutes | Hours | Day of the Month | Month | Day of the Week
0 | 0 | * | * | *

Run at six o'clock UTC every day

Minutes | Hours | Day of the Month | Month | Day of the Week
0 | 6 | * | * | *

Run every Monday at six o'clock UTC

Minutes | Hours | Day of the Month | Month | Day of the Week
0 | 6 | * | * | 1

Run at six o'clock UTC on the 1st of each month

Minutes | Hours | Day of the Month | Month | Day of the Week
0 | 6 | 1 | * | *

Run at 22:00 UTC, from Monday to Friday

Minutes | Hours | Day of the Month | Month | Day of the Week
0 | 22 | * | * | 1-5

Run at midnight and noon UTC, on the 1st of the month, every 2 months

Minutes | Hours | Day of the Month | Month | Day of the Week
0 | 0,12 | 1 | */2 | *
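To check when expressions like the ones above would next start, a brute-force sketch can step forward one minute at a time in UTC until every field matches. The names `expand` and `next_start` are hypothetical. The sketch assumes that 0 means Sunday in the Day of the Week field (consistent with 1 meaning Monday in the examples) and that both day fields must match, which is a simplification: real cron implementations may treat the two day fields differently when both are restricted.

```python
from datetime import datetime, timedelta, timezone

# Allowed ranges in field order: minutes, hours, day of month, month, day of week.
RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand(token, low, high):
    """Expand one field ("*", lists, ranges, increments) into a set of values."""
    values = set()
    for part in token.split(","):
        base, _, step = part.partition("/")
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-", 1))
        else:
            start = int(base)
            end = high if step else start
        values.update(range(start, end + 1, int(step or 1)))
    return values


def next_start(expression, after):
    """Return the first matching minute strictly after `after` (UTC)."""
    minutes, hours, days, months, weekdays = (
        expand(token, low, high)
        for token, (low, high) in zip(expression.split(), RANGES)
    )
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):  # search at most one year ahead
        if (candidate.minute in minutes and candidate.hour in hours
                and candidate.day in days and candidate.month in months
                and candidate.isoweekday() % 7 in weekdays):  # Sunday -> 0
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError("no start time found within one year")


# 2024-01-01 00:00 UTC was a Monday.
reference = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
print(next_start("0 6 * * 1", reference))     # 2024-01-01 06:00:00+00:00
print(next_start("0 22 * * 1-5", reference))  # 2024-01-01 22:00:00+00:00
```

Stepping minute by minute is slow but easy to verify, which suits a quick sanity check better than production use.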

References

  1. Cron Expression Translator - A free cron expression translator.

  2. Tool to Learn, Build, and Test Regular Expressions - A tool to learn, build, and test Regular Expressions.

Done! The collection will now run at the scheduled day and time.

If you want to run the pipeline immediately, you can do so manually: go to "Pipelines", then "List", and select "Synchronize Pipeline".

After a few minutes, your catalog will be updated in the exploration tab as a Data Asset.