AggregateTransformer
Passo a passo de como iniciar sua jornada na Dadosfera
AggregateTransformer
Overview
Transforms Tableau Aggregate nodes to SQL GROUP BY statements with aggregation functions. Groups rows by specified columns and applies aggregate calculations (SUM, AVG, COUNT, MIN, MAX, COUNTD).
Node Type: .v1.Aggregate / .v2018_2_3.SuperAggregate
Output: SQL SELECT with GROUP BY
Capabilities
- Group by one or more columns
- Aggregate functions: SUM, AVG, COUNT, MIN, MAX, COUNTD
- Multiple aggregations on different columns
- Column renaming in aggregations
- Handles large grouping sets (10+ columns)
Tableau Node Example
Example 1: Sum Sales by Region and Date
Input (Tableau format)
{
"id": "40f4a735-e0f1-440b-a3d3-e2f3d85b8b2d",
"name": "Aggregate Sales",
"node_type": ".v2018_2_3.SuperAggregate",
"base_type": "superNode",
"actionNode": {
"nodeType": ".v1.Aggregate",
"groupByFields": [
{
"function": "GroupBy",
"columnName": "Year",
"newColumnName": null
},
{
"function": "GroupBy",
"columnName": "Month",
"newColumnName": null
},
{
"function": "GroupBy",
"columnName": "Region",
"newColumnName": null
},
{
"function": "GroupBy",
"columnName": "Division",
"newColumnName": null
},
{
"function": "GroupBy",
"columnName": "Customer Name",
"newColumnName": null
},
{
"function": "GroupBy",
"columnName": "Product Code",
"newColumnName": null
}
],
"aggregateFields": [
{
"function": "SUM",
"columnName": "Sales Amount",
"newColumnName": null
}
]
}
}Input Schema
[
{"name": "year", "type": "integer"},
{"name": "month", "type": "integer"},
{"name": "region", "type": "string"},
{"name": "division", "type": "string"},
{"name": "customer_name", "type": "string"},
{"name": "product_code", "type": "string"},
{"name": "sales_amount", "type": "real"},
{"name": "invoice_date", "type": "date"},
{"name": "units", "type": "integer"}
]Generated SQL Output
SELECT
year,
month,
region,
division,
customer_name,
product_code,
SUM(sales_amount) AS sales_amount
FROM sales_data_V0S8
GROUP BY year, month, region, division, customer_name, product_codeOutput Schema
{
"year": "integer",
"month": "integer",
"region": "string",
"division": "string",
"customer_name": "string",
"product_code": "string",
"sales_amount": "real"
}Example 2: Multiple Aggregations
Input (Tableau format)
{
"id": "5a6b7c8d-9e0f-1a2b-3c4d-5e6f7a8b9c0d",
"name": "Product Statistics",
"node_type": ".v1.Aggregate",
"actionNode": {
"groupByFields": [
{
"function": "GroupBy",
"columnName": "Product Category",
"newColumnName": null
},
{
"function": "GroupBy",
"columnName": "Region",
"newColumnName": null
}
],
"aggregateFields": [
{
"function": "SUM",
"columnName": "Sales Amount",
"newColumnName": "Total Sales"
},
{
"function": "AVG",
"columnName": "Unit Price",
"newColumnName": "Average Price"
},
{
"function": "COUNT",
"columnName": "Order ID",
"newColumnName": "Order Count"
},
{
"function": "MAX",
"columnName": "Order Date",
"newColumnName": "Latest Order"
}
]
}
}Generated SQL Output
SELECT
product_category,
region,
SUM(sales_amount) AS total_sales,
AVG(unit_price) AS average_price,
COUNT(order_id) AS order_count,
MAX(order_date) AS latest_order
FROM orders_V0S12
GROUP BY product_category, regionExample 3: Count Distinct (COUNTD)
Input (Tableau format)
{
"id": "6b7c8d9e-0f1a-2b3c-4d5e-6f7a8b9c0d1e",
"name": "Customer Analytics",
"node_type": ".v1.Aggregate",
"actionNode": {
"groupByFields": [
{
"function": "GroupBy",
"columnName": "Region",
"newColumnName": null
},
{
"function": "GroupBy",
"columnName": "Year",
"newColumnName": null
}
],
"aggregateFields": [
{
"function": "COUNTD",
"columnName": "Customer ID",
"newColumnName": "Unique Customers"
},
{
"function": "SUM",
"columnName": "Revenue",
"newColumnName": "Total Revenue"
}
]
}
}Generated SQL Output
SELECT
region,
year,
COUNT(DISTINCT customer_id) AS unique_customers,
SUM(revenue) AS total_revenue
FROM transactions_V0S5
GROUP BY region, yearExample 4: MIN and MAX Aggregations
Input (Tableau format)
{
"id": "7c8d9e0f-1a2b-3c4d-5e6f-7a8b9c0d1e2f",
"name": "Price Range Analysis",
"node_type": ".v1.Aggregate",
"actionNode": {
"groupByFields": [
{
"function": "GroupBy",
"columnName": "Product Category",
"newColumnName": null
}
],
"aggregateFields": [
{
"function": "MIN",
"columnName": "Unit Price",
"newColumnName": "Min Price"
},
{
"function": "MAX",
"columnName": "Unit Price",
"newColumnName": "Max Price"
},
{
"function": "AVG",
"columnName": "Unit Price",
"newColumnName": "Avg Price"
}
]
}
}Generated SQL Output
SELECT
product_category,
MIN(unit_price) AS min_price,
MAX(unit_price) AS max_price,
AVG(unit_price) AS avg_price
FROM products_V0S3
GROUP BY product_categoryHow It Works
1. Identify Grouping Columns
Extracts columns from groupByFields:
{
"groupByFields": [
{
"function": "GroupBy",
"columnName": "Region"
},
{
"function": "GroupBy",
"columnName": "Product Category"
}
]
}2. Identify Aggregations
Extracts aggregate functions from aggregateFields:
{
"aggregateFields": [
{
"function": "SUM",
"columnName": "Sales Amount",
"newColumnName": "Total Sales"
}
]
}3. Sanitize Field Names
Converts field names to SQL identifiers:
"Sales Amount" → sales_amount
"Product Category" → product_category
4. Apply Column Renames
Uses newColumnName if specified:
{
"function": "SUM",
"columnName": "Sales Amount",
"newColumnName": "Total Sales" ← Rename
}Generates:
SUM(sales_amount) AS total_sales5. Generate GROUP BY Statement
Creates SQL with GROUP BY clause:
SELECT
<grouping_columns>,
<aggregate_functions>
FROM input_table
GROUP BY <grouping_columns>Aggregate Functions
SUM - Total Sum
SUM(sales_amount) AS total_salesCalculates the sum of all values in the column.
AVG - Average
AVG(unit_price) AS average_priceCalculates the mean (average) of all values in the column.
COUNT - Row Count
COUNT(order_id) AS order_countCounts the number of non-NULL values in the column.
COUNTD - Distinct Count
COUNT(DISTINCT customer_id) AS unique_customersCounts the number of unique (distinct) values in the column.
MIN - Minimum Value
MIN(unit_price) AS min_priceFinds the smallest value in the column.
MAX - Maximum Value
MAX(order_date) AS latest_orderFinds the largest value in the column.
Column Renaming
Without Rename (newColumnName = null)
{
"function": "SUM",
"columnName": "Sales Amount",
"newColumnName": null
}Uses original column name:
SUM(sales_amount) AS sales_amountWith Rename (newColumnName specified)
{
"function": "SUM",
"columnName": "Sales Amount",
"newColumnName": "Total Sales"
}Uses new name:
SUM(sales_amount) AS total_salesLarge Grouping Sets
Aggregate transformer handles large numbers of grouping columns efficiently:
Example with 15 GROUP BY columns:
SELECT
year, month, region, division, customer_name,
product_code, material_number, order_type,
sales_rep_name, sales_rep_number, city,
state, manager_name, manager_number, invoice_date,
SUM(sales_amount) AS sales_amount
FROM sales_V0S15
GROUP BY year, month, region, division, customer_name,
product_code, material_number, order_type,
sales_rep_name, sales_rep_number, city,
state, manager_name, manager_number, invoice_dateNo Grouping (Aggregate All Rows)
If no groupByFields are specified, aggregates over all rows:
{
"groupByFields": [],
"aggregateFields": [
{
"function": "SUM",
"columnName": "Sales Amount"
}
]
}Generated SQL:
SELECT
SUM(sales_amount) AS sales_amount
FROM sales_V0S8No GROUP BY clause is added (aggregates entire table).
Annotations Support
Aggregate nodes support before/after action annotations for additional renames:
beforeActionAnnotations
Renames applied before aggregation:
{
"before_action_annotations": [
{
"columnName": "Old Name",
"newColumnName": "New Name"
}
]
}afterActionAnnotations
Renames applied after aggregation:
{
"after_action_annotations": [
{
"columnName": "sales_amount",
"newColumnName": "revenue"
}
]
}Limitations
- ⚠️ No HAVING clause - Post-aggregation filtering must be done with downstream Container node
- ⚠️ No ROLLUP/CUBE - Advanced grouping sets not supported; use multiple aggregate nodes
- ⚠️ No custom expressions - Aggregate functions operate on single columns only
Best Practices
When to Use Aggregate Transformer
- ✅ Summarizing data by dimensions (region, product, time period)
- ✅ Calculating totals (sum, count) across groups
- ✅ Finding min/max values within categories
- ✅ Counting distinct values (unique customers, products, etc.)
Optimizing Aggregations
- Reduce grouping columns if possible - fewer dimensions = faster queries
- Filter before aggregating - use upstream Container node to filter rows first
- Use appropriate aggregations - COUNTD is slower than COUNT; use only when needed
Post-Aggregation Filtering
To filter after aggregation (HAVING clause equivalent):
- Add Aggregate node
- Add downstream Container node with FilterOperation
-- Example: Only regions with > 1000 total sales
SELECT * FROM (
SELECT region, SUM(sales) AS total_sales
FROM data
GROUP BY region
) WHERE total_sales > 1000Related Transformers
- ContainerTransformer - For filtering rows before or after aggregation
- UnionTransformer - For combining aggregated data from multiple sources
- JoinTransformer - For joining aggregated results with dimension tables
Updated 5 days ago
