[EN] Scheduler
About
Configure how often your pipeline runs. You can choose one of the preset options or enter a custom frequency using a cron expression.
For details on the possibilities and limitations of scheduling, see the official Airflow documentation.
The default time zone used in the frequency is UTC.
All frequency methods define when extractions start. They do not control how long the replication job runs or when the data actually arrives at the destination.
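Because frequencies are interpreted in UTC, a local start time has to be converted before it is entered in a schedule. The following is a minimal sketch of that conversion, not part of Dadosfera: the fixed UTC-3 offset and the function name `local_to_utc_hour_minute` are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: the local zone is a fixed UTC-3 offset.
LOCAL_TZ = timezone(timedelta(hours=-3))


def local_to_utc_hour_minute(hour: int, minute: int = 0) -> tuple[int, int]:
    """Return the (hour, minute) in UTC for a local wall-clock time."""
    # The date is arbitrary; a fixed offset does not depend on it.
    local = datetime(2024, 1, 1, hour, minute, tzinfo=LOCAL_TZ)
    utc = local.astimezone(timezone.utc)
    return utc.hour, utc.minute


print(local_to_utc_hour_minute(6))   # (9, 0)
print(local_to_utc_hour_minute(22))  # (1, 0)
```

Note that a late local time can fall on the following day in UTC (22:00 at UTC-3 is 01:00 UTC), which matters when the schedule also restricts the day of the week or the day of the month.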
How to Configure Scheduling
In the last step of creating the pipeline, choose one of the preset options or select 'Custom' to enter a cron expression manually.
Single Extraction
Choose a single initial data extraction without configuring a schedule. This cold load is suited to collecting static data: after the pipeline is created, the collection runs only once. You can still trigger a manual sync to run the collection again.
Custom Scheduling
It's possible to specify granular start times for your data extraction. Using a cron expression, you can specify the exact hours, days of the week, and days of the month when data extraction should start. Dadosfera uses the Quartz standard for cron scheduling.
Syntax
A cron expression is composed of five fields separated by spaces. The fields must appear in the following order, and an expression must contain all five fields to be considered valid:
[minutes] [hours] [day of the month] [month] [day of the week]
Allowed Characters
Field | Allowed Values | Special Characters Allowed in Dadosfera |
---|---|---|
Minutes | 0-59 | n/a |
Hours | 0-23 | - |
Day of the Month | 1-31 | , - * / |
Month | 1-12 | , - * / |
Day of the Week | 0 - 6 | , - * / |
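The field order and value ranges in the table can be checked before an expression is submitted. The sketch below is illustrative only and is not Dadosfera's own validation: the name `check_expression` is hypothetical, it assumes the five fields listed above, and it does not enforce which special characters each field permits or any plan limits.

```python
import re

# Field names and value ranges, in the order the expression expects them.
FIELDS = [
    ("minutes", 0, 59),
    ("hours", 0, 23),
    ("day of the month", 1, 31),
    ("month", 1, 12),
    ("day of the week", 0, 6),
]


def check_expression(expression: str) -> list[str]:
    """Return the problems found; an empty list means none were found."""
    parts = expression.split()
    if len(parts) != len(FIELDS):
        return [f"expected {len(FIELDS)} fields, got {len(parts)}"]
    problems = []
    for (name, low, high), part in zip(FIELDS, parts):
        # Only digits and the special characters , - * / are accepted here.
        if not re.fullmatch(r"[\d,\-*/]+", part):
            problems.append(f"{name}: unexpected characters in {part!r}")
            continue
        for item in part.split(","):
            # The number after "/" is an increment, so it is not range-checked.
            base = item.split("/")[0]
            for number in re.findall(r"\d+", base):
                if not low <= int(number) <= high:
                    problems.append(f"{name}: {number} is outside {low}-{high}")
    return problems


print(check_expression("0 6 * * 1"))   # []
print(check_expression("0 25 * * *"))  # ['hours: 25 is outside 0-23']
print(check_expression("0 6 * *"))     # ['expected 5 fields, got 4']
```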
Currently, the shortest interval at which Dadosfera updates data is hourly.
If your contracted plan is Basic, the shortest interval is daily. If you need to collect data more frequently, please contact the sales team.
Special Character Descriptions
- ASTERISK (*)
  Selects all values within a field.
  Examples:
  "*" in the Month field means "every month"
  "*" in the Day of the Week field means "every day of the week"
- COMMA (,)
  Specifies a list of two or more values.
  Examples:
  1,2,5 in the Month field means "the months of January, February, and May"
  1,5 in the Day of the Week field means "the days Monday and Friday"
- HYPHEN (-)
  Specifies a range of values.
  Examples:
  5-8 in the Hour field means "hours 5, 6, 7, and 8"
  1-3 in the Day of the Week field means "the days Monday, Tuesday, and Wednesday"
- SLASH (/)
  Specifies increments. Formatted as: <start_value>/<value_to_increment>
  Examples:
  0/15 in the Minute field means "the minutes 0, 15, 30, and 45"
  3/6 in the Hour field means "every 6 hours starting at the third hour"
  1/5 in the Day of the Month field means "every 5 days starting on the first day of the month"
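The four special characters can be read as rules for expanding one field into the concrete values it selects. The sketch below illustrates those rules under the descriptions given here; `expand_field` is a hypothetical helper, not a Dadosfera or Airflow function, and it performs no validation of its input.

```python
def expand_field(field: str, low: int, high: int) -> list[int]:
    """Expand one cron field into the sorted values it selects.

    `low` and `high` are the allowed bounds of the field, e.g. 0 and 23 for hours.
    """
    values = set()
    for item in field.split(","):          # comma: list of values
        base, _, step_text = item.strip().partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":                     # asterisk: every value in the field
            start, end = low, high
        elif "-" in base:                   # hyphen: inclusive range
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            # "start/step" runs to the end of the field; a bare number is itself.
            end = high if step_text else start
        values.update(range(start, end + 1, step))
    return sorted(values)


print(expand_field("0/15", 0, 59))   # [0, 15, 30, 45]
print(expand_field("3/6", 0, 23))    # [3, 9, 15, 21]
print(expand_field("5-8", 0, 23))    # [5, 6, 7, 8]
print(expand_field("1,2,5", 1, 12))  # [1, 2, 5]
```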
Examples
Run at midnight UTC every day
Minutes | Hours | Day of the Month | Month | Day of the Week |
---|---|---|---|---|
0 | 0 | * | * | * |
Run at six o'clock UTC every day
Minutes | Hours | Day of the Month | Month | Day of the Week |
---|---|---|---|---|
0 | 6 | * | * | * |
Run every Monday at six o'clock UTC
Minutes | Hours | Day of the Month | Month | Day of the Week |
---|---|---|---|---|
0 | 6 | * | * | 1 |
Run at six o'clock UTC on the 1st of each month
Minutes | Hours | Day of the Month | Month | Day of the Week |
---|---|---|---|---|
0 | 6 | 1 | * | * |
Run at 22:00 UTC, from Monday to Friday
Minutes | Hours | Day of the Month | Month | Day of the Week |
---|---|---|---|---|
0 | 22 | * | * | 1-5 |
Run at midnight and noon UTC on the 1st of the month, every 2 months
Minutes | Hours | Day of the Month | Month | Day of the Week |
---|---|---|---|---|
0 | 0,12 | 1 | */2 | * |
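Each example table corresponds to a single line of text: the five field values joined by spaces, with "*" standing for "all values" in a field. The sketch below shows that mapping for the examples in this section; the names `EXAMPLES` and `to_expression` are illustrative only.

```python
# (description, (minutes, hours, day of the month, month, day of the week))
EXAMPLES = [
    ("Midnight UTC every day", ("0", "0", "*", "*", "*")),
    ("06:00 UTC every day", ("0", "6", "*", "*", "*")),
    ("06:00 UTC every Monday", ("0", "6", "*", "*", "1")),
    ("06:00 UTC on the 1st of each month", ("0", "6", "1", "*", "*")),
    ("22:00 UTC, Monday to Friday", ("0", "22", "*", "*", "1-5")),
    ("00:00 and 12:00 UTC on the 1st, every 2 months", ("0", "0,12", "1", "*/2", "*")),
]


def to_expression(fields: tuple[str, ...]) -> str:
    """Join the field values in order, separated by single spaces."""
    return " ".join(fields)


for description, fields in EXAMPLES:
    print(f"{to_expression(fields):<16}{description}")
```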
References
- Cron Expression Translator - A free cron expression translator.
- Tool to Learn, Build, and Test Regular Expressions - A tool to learn, build, and test Regular Expressions.
Done! The collection will now run at the scheduled day and time.
If you want to run the pipeline immediately, you can do so manually. Go to "Pipelines", then "List", and select "Synchronize Pipeline".
After a few minutes, your catalog will be updated in the exploration tab as a Data Asset.