[EN] Dadosfera Infrastructure Guide

This guide describes the infrastructure and add-on combinations available in Dadosfera's plans. For pricing and terms, please contact us.

Modules

| Module | Description | Basic Pro. | Standard | Advanced | Enterprise* |
|---|---|---|---|---|---|
| Collection | Data collection using ready-to-use connectors. | Shared (Small) | Shared (Medium) | Shared (Medium) | Dedicated (Large +) |
| Processing | Creation of data transformation flows in Python, R, or Julia within Dadosfera. | Not Available | Optional (Medium) | Dedicated (Large) | Dedicated (Large +) |
| Query | Query module for materializing and creating data views using SQL. Also used for Massive Data Transformation. | Shared (Small) | Shared (Small) | Dedicated (Small) | Dedicated with Auto-Scaling** (Small +) |
| Visualization | Construction of descriptive analyses such as reports, dashboards, and charts. | Dedicated (Small) | Dedicated (Medium) | Dedicated (Large) | Dedicated (Large +) |
| Intelligence | Creation of notebooks and ML/AI models, and deployment of Data Apps in R Shiny or Streamlit. | Not Available | Optional (Medium) | Dedicated (Large) | Dedicated (Large +) |

[*] The Enterprise plan offers the flexibility to scale all of the resources above horizontally (number of nodes) and vertically (node size). For customized proposals, contact our Commercial team.

[**] In the Enterprise plan, auto-scaling can be configured. The client then has access to a multi-node cluster whose nodes scale according to processing needs.
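
As a rough illustration of the auto-scaling behavior described above, the sketch below computes a target node count from the current workload. It assumes a hypothetical policy in which each node handles a fixed number of queued queries and the cluster stays between a minimum and a maximum size. The function name, parameters, and default limits are illustrative only and do not describe Dadosfera's actual scaling rules.

```python
import math


def target_node_count(
    queued_queries: int,
    queries_per_node: int,
    min_nodes: int = 1,
    max_nodes: int = 4,
) -> int:
    """Hypothetical scaling rule: one node per `queries_per_node` queued
    queries, clamped to the range [min_nodes, max_nodes]."""
    if queries_per_node <= 0:
        raise ValueError("queries_per_node must be positive")
    needed = math.ceil(queued_queries / queries_per_node)
    # Never drop below the minimum or grow beyond the configured maximum.
    return max(min_nodes, min(max_nodes, needed))


print(target_node_count(0, 8))    # idle cluster stays at the minimum: 1
print(target_node_count(20, 8))   # ceil(20 / 8) = 3 nodes
print(target_node_count(100, 8))  # demand exceeds the cap: 4
```

Bounding the result on both sides keeps an idle cluster from shrinking to zero and prevents a load spike from provisioning an unbounded number of nodes.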

Important

Dadosfera guarantees that all client data is isolated, whether the client uses shared or dedicated cloud resources.

Definition of Clusters

  • Dedicated Cluster: resources are exclusive to a single client.
  • Shared Cluster: resources are shared among multiple clients.

Infrastructure Specification by Module

The table below lists the vCPU and RAM available to each module at each size, whether the module runs in a shared or a dedicated cluster.

| Module | Small | Medium | Large |
|---|---|---|---|
| Collection | 2 vCPU / 4.00 GB | 2 vCPU / 8.00 GB | 4 vCPU / 16.00 GB |
| Processing | - | 4 vCPU / 16.00 GB | 8 vCPU / 32.00 GB |
| Query* | 8 vCPU / 16.00 GB | 16 vCPU / 32.00 GB | 32 vCPU / 64.00 GB |
| Catalog | 2x (2 vCPU / 1.00 GB) | 2x (2 vCPU / 2.00 GB) | 2x (2 vCPU / 4.00 GB) |
| Visualization | 2x (2 vCPU / 2.00 GB) | 2x (2 vCPU / 4.00 GB) | 2x (2 vCPU / 8.00 GB) |
| Intelligence | - | 4 vCPU / 16.00 GB | 8 vCPU / 32.00 GB |

[*] By default, the Query module runs on a single node. For auto-scaling or a multi-node cluster, contact us.
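
The "2x" entries in the specification table list two nodes, each with the stated resources. Assuming that reading, the sketch below encodes the table as a Python dictionary and computes the total vCPU and RAM a module receives at a given size. The `SPECS` structure and the `total_resources` function are illustrative and are not part of any Dadosfera API.

```python
# Hypothetical encoding of the "Infrastructure Specification by Module" table.
# Each size maps to (node count, vCPU per node, RAM in GB per node);
# "2x" entries are read as two nodes. Sizes shown as "-" are omitted.
SPECS: dict[str, dict[str, tuple[int, int, float]]] = {
    "Collection": {"Small": (1, 2, 4.0), "Medium": (1, 2, 8.0), "Large": (1, 4, 16.0)},
    "Processing": {"Medium": (1, 4, 16.0), "Large": (1, 8, 32.0)},
    "Query": {"Small": (1, 8, 16.0), "Medium": (1, 16, 32.0), "Large": (1, 32, 64.0)},
    "Catalog": {"Small": (2, 2, 1.0), "Medium": (2, 2, 2.0), "Large": (2, 2, 4.0)},
    "Visualization": {"Small": (2, 2, 2.0), "Medium": (2, 2, 4.0), "Large": (2, 2, 8.0)},
    "Intelligence": {"Medium": (1, 4, 16.0), "Large": (1, 8, 32.0)},
}


def total_resources(module: str, size: str) -> tuple[int, float]:
    """Return (total vCPU, total RAM in GB) for a module at a given size.

    Raises KeyError for module/size combinations the table does not offer.
    """
    nodes, vcpu_per_node, ram_gb_per_node = SPECS[module][size]
    return nodes * vcpu_per_node, nodes * ram_gb_per_node


print(total_resources("Visualization", "Large"))  # (4, 16.0)
print(total_resources("Query", "Medium"))         # (16, 32.0)
```

For example, Visualization at Large size runs two nodes of 2 vCPU / 8.00 GB each, for a total of 4 vCPU and 16.00 GB.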

Service Availability

For the availability of the underlying DW service, refer to the DW provider's service availability documentation.


Massive Transformation Module

The Massive Transformation module lets you use the compute power of the Query (MPP/DW) module to transform large volumes of data with Python and/or R.

Enterprise Implementation

A fully dedicated Dadosfera environment requires an Implementation phase carried out by the Dadosfera Professional Services team. The implementation takes 30 to 90 days, depending on project complexity, and covers the following scope:

  • Creation of an exclusive Cloud Account for the client
  • Governance Checklist for the new account
  • Identity and Access Configuration
  • Infrastructure Provisioning (via IaC)
  • Deployment of Dadosfera Software in the modules
  • MPP Configuration
  • Module Configuration
  • Standard Implementation (Standard Setup)

Additional Features and Components

The following additional infrastructure components and features extend Dadosfera. Several of them enable secure integration between Dadosfera and client environments.

Definition:

| Feature/Resource | Description | Module |
|---|---|---|
| VPN | A Virtual Private Network provides secure access to a private network from a public or insecure network, ensuring that the data Dadosfera sends and receives is encrypted. | Connections |
| VPC Peering (AWS) | VPC Peering is a network connection between two VPCs that routes traffic between them using private IPv4 or IPv6 addresses. Traffic does not cross the public internet, and AWS does not charge for peering traffic that stays within the same Availability Zone. | Connections |
| Fixed IP | A fixed IP is a permanent internet address that, unlike a dynamic IP, does not change over time. Dadosfera provides these IPs for firewall allowlisting, so clients can safely open their data sources to Dadosfera only. | Connections |
| SSH Tunneling | SSH Tunneling, or SSH port forwarding, transports arbitrary network data over an encrypted SSH connection, providing a secure path for data transmission. | Connections |
| Row Level Security (RLS) | RLS restricts access to data rows based on user roles or permissions, so each user sees only the rows their role allows. | Visualization |
| Dedicated Spark Cluster | Dadosfera can provision a dedicated Spark cluster to run clients' existing Spark data transformation code. | Transformation |
| Optimized Snowpark Cluster | A cluster specially configured for Snowpark, the Snowflake Data Cloud framework for processing large volumes of data using Python, Java, or Scala. | Transformation |
| Extended Backup / Disaster Recovery | Strategies and procedures for protecting and recovering an enterprise's IT infrastructure in the event of a disaster. Included for up to 90 days in the Enterprise tier. | Query |
| Multi-cluster warehouse | In cloud data platforms such as Snowflake, a multi-cluster warehouse uses multiple compute clusters to process concurrent queries, improving performance under high concurrency. | Query |
| Up to 90 days of Time-Travel | In data platforms such as Snowflake, Time-Travel is the ability to access historical data within a retention period, in this case up to 90 days. | Query |
| Annual Rotation of Encrypted Data Keys | Annual rekeying replaces encryption keys every year, a recommended security practice for protecting sensitive data. | Query |
| Materialized Views | A materialized view is a database object that stores the results of a query and is updated as the underlying data changes. Materialized views are often used to improve query performance. | Query |
| Search Optimization | A feature that speeds up highly selective queries, such as point lookups and text searches, so the MPP can serve search workloads efficiently. | Query |
| Dynamic Data Masking | A Column Level Security (CLS) feature that uses masking policies to selectively mask plain-text data in table and view columns at query time. | Query |
| External Data Tokenization | Lets accounts tokenize data before loading it into Snowflake and detokenize it at query time. Tokenization replaces sensitive data with an undecipherable token. External Tokenization uses masking policies with external functions. | Query |
| GPU | GPUs in the infrastructure for training and inference of ML/AI models. | Intelligence |
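
Row Level Security and Dynamic Data Masking both act at query time. RLS decides which rows a user may see, and masking rewrites the values of sensitive columns. The sketch below reproduces that logic in plain Python over an in-memory list of rows, only to show how the two policies combine. The roles, the region rule, and the masking format are hypothetical. In Dadosfera, RLS is enforced by the Visualization module and masking by policies in the Query (MPP) module, not by client code like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    name: str
    role: str                     # hypothetical roles: "admin" or "analyst"
    region: Optional[str] = None  # region an analyst is allowed to see


# Illustrative in-memory "table"; in practice the data lives in the DW.
ROWS = [
    {"customer": "Ana", "region": "South", "email": "ana@example.com"},
    {"customer": "Bruno", "region": "North", "email": "bruno@example.com"},
]


def row_policy(user: User, row: dict) -> bool:
    """Row Level Security: admins see every row, analysts only their region."""
    return user.role == "admin" or row["region"] == user.region


def mask_email(user: User, value: str) -> str:
    """Dynamic masking: hide the local part of the e-mail for non-admins."""
    if user.role == "admin":
        return value
    _, _, domain = value.partition("@")
    return "*****@" + domain


def run_query(user: User, rows: list) -> list:
    """Filter rows first (RLS), then mask the sensitive column (CLS)."""
    return [
        {**row, "email": mask_email(user, row["email"])}
        for row in rows
        if row_policy(user, row)
    ]


analyst = User("carla", "analyst", region="South")
print(run_query(analyst, ROWS))
# [{'customer': 'Ana', 'region': 'South', 'email': '*****@example.com'}]
```

Applying the row filter before masking means masked values are computed only for rows the user is allowed to see. The source rows are never modified.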

Availability

| Feature/Resource | Module | Basic Pro. | Standard | Advanced | Enterprise |
|---|---|---|---|---|---|
| VPN | Connections | Optional | Included | Included | Included |
| VPC Peering | Connections | Not Available | Optional | Included | Included |
| Fixed IP | Connections | Optional | Included | Included | Included |
| Row Level Security (RLS) | Visualization | Not Available | Optional | Included | Included |
| SSH Tunneling | Connections | Optional | Optional | Included | Included |
| Backup / Disaster Recovery** | Query | Not Available | Optional | Included | Included |
| Dedicated Spark Cluster | Transformation | Not Available | Optional | Optional | Included |
| Optimized Snowpark Cluster* | Transformation | Not Available | Not Available | Optional | Optional |
| Multi-cluster warehouse | Query | Not Available | Not Available | Optional | Included |
| Up to 90 days of Time-Travel | Query | Not Available | Not Available | Optional | Included |
| Annual Rotation of Encrypted Data Keys | Query | Not Available | Not Available | Optional | Included |
| Materialized Views | Query | Not Available | Not Available | Optional | Included |
| Search Optimization | Query | Not Available | Not Available | Optional | Included |
| Dynamic Data Masking | Query | Not Available | Not Available | Optional | Included |
| External Data Tokenization | Query | Not Available | Not Available | Optional | Included |
| GPU | Intelligence | Not Available | Not Available | Optional | Optional |

[*] Only available for Clusters with nodes larger than Medium.

[**] Fail-Safe: up to 7 days, with immediate availability. Historical backups (Glacier): available on request, within up to 7 days.
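
The footnote above describes recovery tiers with different access times. The sketch below shows one way to decide which tier would serve a restore request, based on how old the data is. It follows Snowflake's model in assuming that the fail-safe window starts when the Time-Travel retention period ends, and that anything older must be requested from the historical archive. The function and tier labels are hypothetical, and actual retention periods depend on the plan.

```python
def recovery_path(age_days: float, time_travel_days: int, fail_safe_days: int = 7) -> str:
    """Pick the recovery tier for data deleted or changed `age_days` ago.

    Assumes (Snowflake-style) that fail-safe starts when Time-Travel ends
    and that anything older must be requested from the historical archive.
    """
    if age_days < 0:
        raise ValueError("age_days cannot be negative")
    if age_days <= time_travel_days:
        return "time-travel"      # still inside the Time-Travel retention period
    if age_days <= time_travel_days + fail_safe_days:
        return "fail-safe"        # the up-to-7-day fail-safe tier from the footnote
    return "archive-request"      # historical (Glacier) copy, on request


print(recovery_path(30, time_travel_days=90))   # time-travel
print(recovery_path(95, time_travel_days=90))   # fail-safe
print(recovery_path(120, time_travel_days=90))  # archive-request
```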

Cloud and Availability Regions

  • Dadosfera SaaS (Basic Professional, Standard, and Advanced tiers) is available on Amazon Web Services (AWS) in the US East (N. Virginia) region, us-east-1.

For the Enterprise tier, availability is as follows:

| Cloud Provider | Available Modules | Region |
|---|---|---|
| Amazon Web Services (AWS) | All | us-east-1 |
| Google Cloud Platform (GCP) | Intelligence, Transformation, Query (DW) | us-east-1 |
| Microsoft Azure | Query (DW/Massive Transformation) | us-east-1 |

For availability in other regions and cloud providers, contact us.