[EN] Dadosfera Infrastructure Guide
This guide describes Dadosfera's infrastructure and add-on combinations. For prices and conditions, please contact us.
Modules
Module | Description | Basic Pro. | Standard | Advanced | Enterprise* |
---|---|---|---|---|---|
Collection | Data collection using ready-to-use connectors. | Shared (Small) | Shared (Medium) | Shared (Medium) | Dedicated (Large +) |
Processing | Create Data Transformation flows in Python, R or Julia within Dadosfera. | Not Available | Optional (Medium) | Dedicated (Large) | Dedicated (Large +) |
Query | Materialization and creation of data views using SQL. Also used for Massive Data Transformation. | Shared (Small) | Shared (Small) | Dedicated (Small) | Dedicated with Auto-Scaling** (Small +) |
Visualization | Construction of Descriptive Analyses such as Reports, Dashboards, and Graphs. | Dedicated (Small) | Dedicated (Medium) | Dedicated (Large) | Dedicated (Large +) |
Intelligence | Creation of notebooks and ML/AI models and availability of Data Apps in R Shiny or Streamlit. | Not Available | Optional (Medium) | Dedicated (Large) | Dedicated (Large +) |
[*] The Enterprise plan offers the flexibility to scale all of the above resources horizontally (in number of nodes) and vertically (in node size). For customized proposals, contact our Commercial team.
[**] The Enterprise plan can be configured with Auto-Scaling, giving the client access to a multi-node cluster that scales according to processing needs.
Important
Dadosfera guarantees the isolation of all client data, regardless of the use of shared or non-shared cloud resources.
- Definition of Clusters
  - Dedicated Cluster: Resources are exclusive per client.
  - Shared Cluster: Resources are shared with other clients.
Infrastructure Specification by Module
Below, we specify the amount of CPU and RAM resources available for each module, whether in a cluster or not.
Module | Small | Medium | Large |
---|---|---|---|
Collection | 2 vCPU/4.00 GB | 2 vCPU/8.00 GB | 4 vCPU/16.00 GB |
Processing | - | 4 vCPU/16.00 GB | 8 vCPU/32.00 GB |
Query* | 8 vCPU/16.00 GB | 16 vCPU/32.00 GB | 32 vCPU/64.00 GB |
Catalog | 2x (2 vCPU/1.00 GB) | 2x (2 vCPU/2.00 GB) | 2x (2 vCPU/4.00 GB) |
Visualization | 2x (2 vCPU/2.00 GB) | 2x (2 vCPU/4.00 GB) | 2x (2 vCPU/8.00 GB) |
Intelligence | - | 4 vCPU/16.00 GB | 8 vCPU/32.00 GB |
[*] By default, Dadosfera provides only 1 node for the Query module. For Autoscaling or Multi-Node Cluster, contact us.
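As a rough aid for capacity planning, the table above can be expressed as a small lookup. This is only a sketch: the `NodeSpec` class, the `SPECS` dictionary, and the `describe` helper are hypothetical names, not part of any Dadosfera API. It also assumes that the "2x" entries mean two replicas of the stated size, and it reads the Large entries for Catalog and Visualization the same way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeSpec:
    vcpu: int          # vCPUs per replica
    ram_gb: float      # RAM in GB per replica
    replicas: int = 1  # assumption: "2x (...)" in the table means two replicas

    @property
    def total_vcpu(self) -> int:
        return self.vcpu * self.replicas

    @property
    def total_ram_gb(self) -> float:
        return self.ram_gb * self.replicas


# Values transcribed from the table above; None marks a size that is not offered.
SPECS: dict[str, dict[str, Optional[NodeSpec]]] = {
    "Collection": {"Small": NodeSpec(2, 4), "Medium": NodeSpec(2, 8), "Large": NodeSpec(4, 16)},
    "Processing": {"Small": None, "Medium": NodeSpec(4, 16), "Large": NodeSpec(8, 32)},
    "Query": {"Small": NodeSpec(8, 16), "Medium": NodeSpec(16, 32), "Large": NodeSpec(32, 64)},
    "Catalog": {"Small": NodeSpec(2, 1, 2), "Medium": NodeSpec(2, 2, 2), "Large": NodeSpec(2, 4, 2)},
    "Visualization": {"Small": NodeSpec(2, 2, 2), "Medium": NodeSpec(2, 4, 2), "Large": NodeSpec(2, 8, 2)},
    "Intelligence": {"Small": None, "Medium": NodeSpec(4, 16), "Large": NodeSpec(8, 32)},
}


def describe(module: str, size: str) -> str:
    """Return a one-line summary of the total resources for a module and size."""
    spec = SPECS[module][size]
    if spec is None:
        return f"{module} is not offered in size {size}"
    return f"{module} ({size}): {spec.total_vcpu} vCPU / {spec.total_ram_gb:.2f} GB in total"


if __name__ == "__main__":
    print(describe("Query", "Large"))
    print(describe("Visualization", "Medium"))
    print(describe("Processing", "Small"))
```

The Query figures describe the single default node; a multi-node or auto-scaling cluster would multiply them by the node count agreed with Dadosfera.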
Service Availability
To check the availability of the underlying data warehouse (DW) service, refer to that service's documentation.
Massive Transformation Module
The Massive Transformation module allows the use of the computational power of the Query (MPP/DW) module to perform transformations of large volumes of data, in seconds, using Python and/or R.
Enterprise Implementation
To create a 100% dedicated Dadosfera environment, an Implementation phase is required, executed by the Dadosfera Professional Services team. The implementation takes between 30 and 90 days, depending on the complexity of the project, and covers the following scope:
- Creation of an exclusive Cloud Account for the client
- Governance Checklist for the new account
- Identity and Access Configuration
- Infrastructure Provisioning (via IaC)
- Deployment of Dadosfera Software in the modules
- MPP Configuration
- Module Configuration
- Standard Implementation (Standard Setup)
Additional Features and Components
The additional infrastructure components and features defined below enable different forms of secure integration between Dadosfera and its clients' environments.
Definition:
Feature/Resource | Description | Module |
---|---|---|
VPN | A Virtual Private Network allows secure access to a private network from a public or unsecure network, ensuring that the data sent and received by Dadosfera is encrypted and secure. | Connections |
VPC Peering (AWS) | VPC Peering is a network connection between two VPCs that allows traffic routing between them using private IPv4 or IPv6 addresses. Traffic stays on the private AWS network instead of crossing the public internet, and the peering connection itself carries no charge (standard AWS data transfer rates may still apply). | Connections |
Fixed IP | A fixed IP is a permanent internet address that remains the same over time, unlike dynamic IPs that change. Dadosfera provides these IPs so that clients can allowlist them in their firewalls and open their data sources safely. | Connections |
SSH Tunneling | SSH Tunneling, or SSH port forwarding, is a method of transporting arbitrary network data over an encrypted SSH connection. It can provide a secure path for data transmission. | Connections |
Row Level Security (RLS) | RLS is a feature in databases that restricts access to data rows based on user roles or permissions. Some users have restricted visibility of data based on these roles. | Visualization |
Dedicated Spark Cluster | Dadosfera can provision a Dedicated Spark Cluster so that clients can port their existing Spark data transformation code. | Transformation |
Optimized Snowpark Cluster | An Optimized Snowpark Cluster is a cluster specially configured for Snowpark, a service in the Snowflake Data Cloud that allows users to process large volumes of data using Python, Java, or Scala. | Transformation |
Extended Backup / Disaster Recovery | Refers to strategies and procedures put in place to recover and protect an enterprise's IT infrastructure in the event of a disaster. Included for up to 90 days in the Enterprise Tier. | Query |
Multi-cluster warehouse | In cloud data platforms like Snowflake, a multi-cluster warehouse allows simultaneous processing and performance enhancement by using multiple computing clusters. | Query |
Up to 90 days of Time-Travel | In data systems like Snowflake, Time-Travel refers to the ability to access historical data within a certain period, in this case, up to 90 days. | Query |
Annual Rotation of Encrypted Data Keys | An annual rekey refers to the practice of changing encryption keys on an annual basis. It is a recommended security practice to protect sensitive data. | Query |
Materialized Views | In databases, a Materialized View is a database object that contains the results of a query and can be updated as the data changes. They are often used to improve query performance. | Query |
Search Optimization | A feature that enables performant use of the MPP as a full-text search database. | Query |
Dynamic Data Masking | A Column Level Security (CLS) feature that uses masking policies to selectively mask plain text data in table columns and views during query execution. | Query |
External Data Tokenization | Allows accounts to tokenize data before loading it into Snowflake and detokenize the data during query execution. Tokenization is the process of removing sensitive data, replacing it with an indecipherable token. External Tokenization uses masking policies with external functions. | Query |
GPU | Possibility of using GPU in the infrastructure for Training and Inference of ML/AI models. | Intelligence |
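To make the difference between Row Level Security and Dynamic Data Masking more concrete, here is a conceptual sketch in plain Python. It is not how Dadosfera or the underlying data warehouse implements these features; the sample rows, role names, and the `ROW_ACCESS`, `UNMASKED_ROLES`, `mask_email`, and `query` names are all hypothetical. The point is only that RLS decides which rows a role sees, while masking decides how a column's value is shown at query time.

```python
# Hypothetical sample data standing in for a table.
ROWS = [
    {"region": "south", "customer": "Ana", "email": "ana@example.com"},
    {"region": "north", "customer": "Bruno", "email": "bruno@example.com"},
]

# Hypothetical policies: regions each role may read, and roles that see emails in clear text.
ROW_ACCESS = {"analyst_south": {"south"}, "admin": {"south", "north"}}
UNMASKED_ROLES = {"admin"}


def mask_email(value: str) -> str:
    """Hide the local part of an email address, keeping the domain visible."""
    local, _, domain = value.partition("@")
    return "*" * len(local) + "@" + domain


def query(role: str) -> list[dict]:
    """Return the rows visible to a role, with masking applied at read time."""
    allowed_regions = ROW_ACCESS.get(role, set())
    result = []
    for row in ROWS:
        if row["region"] not in allowed_regions:
            continue  # row level security: the row is filtered out entirely
        visible = dict(row)  # copy, so the stored data is never modified
        if role not in UNMASKED_ROLES:
            visible["email"] = mask_email(row["email"])  # dynamic masking
        result.append(visible)
    return result


if __name__ == "__main__":
    print(query("analyst_south"))
    print(query("admin"))
```

Note that the stored rows stay untouched in both cases: the policies are applied when the data is read, which is what distinguishes masking from tokenizing data before it is loaded.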
Availability
Feature/Resource | Module | Basic Pro. | Standard | Advanced | Enterprise |
---|---|---|---|---|---|
VPN | Connections | Optional | Included | Included | Included |
VPC Peering | Connections | Not Available | Optional | Included | Included |
Fixed IP | Connections | Optional | Included | Included | Included |
Row Level Security (RLS) | Visualization | Not Available | Optional | Included | Included |
SSH Tunneling | Connections | Optional | Optional | Included | Included |
Backup / Disaster Recovery** | Query | Not Available | Optional | Included | Included |
Dedicated Spark Cluster | Transformation | Not Available | Optional | Optional | Included |
Optimized Snowpark Cluster* | Transformation | Not Available | Not Available | Optional | Optional |
Multi-cluster warehouse | Query | Not Available | Not Available | Optional | Included |
Up to 90 days of Time-Travel | Query | Not Available | Not Available | Optional | Included |
Annual Rotation of Encrypted Data Key | Query | Not Available | Not Available | Optional | Included |
Materialized Views | Query | Not Available | Not Available | Optional | Included |
Search Optimization | Query | Not Available | Not Available | Optional | Included |
Dynamic Data Masking | Query | Not Available | Not Available | Optional | Included |
External Data Tokenization | Query | Not Available | Not Available | Optional | Included |
GPU | Intelligence | Not Available | Not Available | Optional | Optional |
[*] Only available for Clusters with nodes larger than Medium.
[**] Fail-Safe of up to 7 days, available immediately. Historical (Glacier) backups are available on request, within up to 7 days.
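When comparing plans, it can help to ask which is the lowest tier that covers a given set of features. The sketch below encodes the availability table above as data and answers that question; the `AVAILABILITY` dictionary and the `lowest_tier` function are hypothetical helpers for illustration, not a Dadosfera API, and footnote conditions (such as node size requirements) are not modeled.

```python
from typing import Optional

TIERS = ("Basic Pro.", "Standard", "Advanced", "Enterprise")

NA, OPT, INC = "Not Available", "Optional", "Included"

# Transcribed from the availability table above, one entry per tier in TIERS order.
AVAILABILITY = {
    "VPN": (OPT, INC, INC, INC),
    "VPC Peering": (NA, OPT, INC, INC),
    "Fixed IP": (OPT, INC, INC, INC),
    "Row Level Security (RLS)": (NA, OPT, INC, INC),
    "SSH Tunneling": (OPT, OPT, INC, INC),
    "Backup / Disaster Recovery": (NA, OPT, INC, INC),
    "Dedicated Spark Cluster": (NA, OPT, OPT, INC),
    "Optimized Snowpark Cluster": (NA, NA, OPT, OPT),
    "Multi-cluster warehouse": (NA, NA, OPT, INC),
    "Up to 90 days of Time-Travel": (NA, NA, OPT, INC),
    "Annual Rotation of Encrypted Data Key": (NA, NA, OPT, INC),
    "Materialized Views": (NA, NA, OPT, INC),
    "Search Optimization": (NA, NA, OPT, INC),
    "Dynamic Data Masking": (NA, NA, OPT, INC),
    "External Data Tokenization": (NA, NA, OPT, INC),
    "GPU": (NA, NA, OPT, OPT),
}


def lowest_tier(features: list[str], accept_optional: bool = True) -> Optional[str]:
    """Return the first tier in which every requested feature is obtainable.

    With accept_optional=False, only features marked "Included" count.
    Returns None when no tier satisfies the request.
    """
    acceptable = {INC, OPT} if accept_optional else {INC}
    for index, tier in enumerate(TIERS):
        if all(AVAILABILITY[name][index] in acceptable for name in features):
            return tier
    return None


if __name__ == "__main__":
    wanted = ["VPN", "Materialized Views"]
    print(lowest_tier(wanted))
    print(lowest_tier(wanted, accept_optional=False))
```

Optional features are typically priced separately, so the `accept_optional` flag separates "can be contracted in this tier" from "comes with this tier".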
Cloud and Availability Regions
- Dadosfera SaaS (Basic Professional, Standard, and Advanced tiers) is available on Amazon Web Services (AWS), region US East (N. Virginia), us-east-1.
The Enterprise tier is available as follows:
Cloud Provider | Available Modules | Region |
---|---|---|
Amazon Web Services (AWS) | All | us-east-1 |
Google Cloud Platform (GCP) | Intelligence, Transformation, Query (DW) | us-east-1 |
Azure Cloud Platform (Azure) | Query (DW/Massive Transformation) | us-east-1 |
For availability in other regions and cloud providers, contact us.