AggregateTransformer

Passo a passo de como iniciar sua jornada na Dadosfera

AggregateTransformer

Overview

Transforms Tableau Aggregate nodes to SQL GROUP BY statements with aggregation functions. Groups rows by specified columns and applies aggregate calculations (SUM, AVG, COUNT, MIN, MAX, COUNTD).

Node Type: .v1.Aggregate / .v2018_2_3.SuperAggregate Output: SQL SELECT with GROUP BY

Capabilities

  • Group by one or more columns
  • Aggregate functions: SUM, AVG, COUNT, MIN, MAX, COUNTD
  • Multiple aggregations on different columns
  • Column renaming in aggregations
  • Handles large grouping sets (10+ columns)

Tableau Node Example

Example 1: Sum Sales by Region and Date

Input (Tableau format)

{
  "id": "40f4a735-e0f1-440b-a3d3-e2f3d85b8b2d",
  "name": "Aggregate Sales",
  "node_type": ".v2018_2_3.SuperAggregate",
  "base_type": "superNode",
  "actionNode": {
    "nodeType": ".v1.Aggregate",
    "groupByFields": [
      {
        "function": "GroupBy",
        "columnName": "Year",
        "newColumnName": null
      },
      {
        "function": "GroupBy",
        "columnName": "Month",
        "newColumnName": null
      },
      {
        "function": "GroupBy",
        "columnName": "Region",
        "newColumnName": null
      },
      {
        "function": "GroupBy",
        "columnName": "Division",
        "newColumnName": null
      },
      {
        "function": "GroupBy",
        "columnName": "Customer Name",
        "newColumnName": null
      },
      {
        "function": "GroupBy",
        "columnName": "Product Code",
        "newColumnName": null
      }
    ],
    "aggregateFields": [
      {
        "function": "SUM",
        "columnName": "Sales Amount",
        "newColumnName": null
      }
    ]
  }
}

Input Schema

[
  {"name": "year", "type": "integer"},
  {"name": "month", "type": "integer"},
  {"name": "region", "type": "string"},
  {"name": "division", "type": "string"},
  {"name": "customer_name", "type": "string"},
  {"name": "product_code", "type": "string"},
  {"name": "sales_amount", "type": "real"},
  {"name": "invoice_date", "type": "date"},
  {"name": "units", "type": "integer"}
]

Generated SQL Output

SELECT
  year,
  month,
  region,
  division,
  customer_name,
  product_code,
  SUM(sales_amount) AS sales_amount
FROM sales_data_V0S8
GROUP BY year, month, region, division, customer_name, product_code

Output Schema

{
  "year": "integer",
  "month": "integer",
  "region": "string",
  "division": "string",
  "customer_name": "string",
  "product_code": "string",
  "sales_amount": "real"
}

Example 2: Multiple Aggregations

Input (Tableau format)

{
  "id": "5a6b7c8d-9e0f-1a2b-3c4d-5e6f7a8b9c0d",
  "name": "Product Statistics",
  "node_type": ".v1.Aggregate",
  "actionNode": {
    "groupByFields": [
      {
        "function": "GroupBy",
        "columnName": "Product Category",
        "newColumnName": null
      },
      {
        "function": "GroupBy",
        "columnName": "Region",
        "newColumnName": null
      }
    ],
    "aggregateFields": [
      {
        "function": "SUM",
        "columnName": "Sales Amount",
        "newColumnName": "Total Sales"
      },
      {
        "function": "AVG",
        "columnName": "Unit Price",
        "newColumnName": "Average Price"
      },
      {
        "function": "COUNT",
        "columnName": "Order ID",
        "newColumnName": "Order Count"
      },
      {
        "function": "MAX",
        "columnName": "Order Date",
        "newColumnName": "Latest Order"
      }
    ]
  }
}

Generated SQL Output

SELECT
  product_category,
  region,
  SUM(sales_amount) AS total_sales,
  AVG(unit_price) AS average_price,
  COUNT(order_id) AS order_count,
  MAX(order_date) AS latest_order
FROM orders_V0S12
GROUP BY product_category, region

Example 3: Count Distinct (COUNTD)

Input (Tableau format)

{
  "id": "6b7c8d9e-0f1a-2b3c-4d5e-6f7a8b9c0d1e",
  "name": "Customer Analytics",
  "node_type": ".v1.Aggregate",
  "actionNode": {
    "groupByFields": [
      {
        "function": "GroupBy",
        "columnName": "Region",
        "newColumnName": null
      },
      {
        "function": "GroupBy",
        "columnName": "Year",
        "newColumnName": null
      }
    ],
    "aggregateFields": [
      {
        "function": "COUNTD",
        "columnName": "Customer ID",
        "newColumnName": "Unique Customers"
      },
      {
        "function": "SUM",
        "columnName": "Revenue",
        "newColumnName": "Total Revenue"
      }
    ]
  }
}

Generated SQL Output

SELECT
  region,
  year,
  COUNT(DISTINCT customer_id) AS unique_customers,
  SUM(revenue) AS total_revenue
FROM transactions_V0S5
GROUP BY region, year

Example 4: MIN and MAX Aggregations

Input (Tableau format)

{
  "id": "7c8d9e0f-1a2b-3c4d-5e6f-7a8b9c0d1e2f",
  "name": "Price Range Analysis",
  "node_type": ".v1.Aggregate",
  "actionNode": {
    "groupByFields": [
      {
        "function": "GroupBy",
        "columnName": "Product Category",
        "newColumnName": null
      }
    ],
    "aggregateFields": [
      {
        "function": "MIN",
        "columnName": "Unit Price",
        "newColumnName": "Min Price"
      },
      {
        "function": "MAX",
        "columnName": "Unit Price",
        "newColumnName": "Max Price"
      },
      {
        "function": "AVG",
        "columnName": "Unit Price",
        "newColumnName": "Avg Price"
      }
    ]
  }
}

Generated SQL Output

SELECT
  product_category,
  MIN(unit_price) AS min_price,
  MAX(unit_price) AS max_price,
  AVG(unit_price) AS avg_price
FROM products_V0S3
GROUP BY product_category

How It Works

1. Identify Grouping Columns

Extracts columns from groupByFields:

{
  "groupByFields": [
    {
      "function": "GroupBy",
      "columnName": "Region"
    },
    {
      "function": "GroupBy",
      "columnName": "Product Category"
    }
  ]
}

2. Identify Aggregations

Extracts aggregate functions from aggregateFields:

{
  "aggregateFields": [
    {
      "function": "SUM",
      "columnName": "Sales Amount",
      "newColumnName": "Total Sales"
    }
  ]
}

3. Sanitize Field Names

Converts field names to SQL identifiers:

"Sales Amount" → sales_amount
"Product Category" → product_category

4. Apply Column Renames

Uses newColumnName if specified:

{
  "function": "SUM",
  "columnName": "Sales Amount",
  "newColumnName": "Total Sales"    ← Rename
}

Generates:

SUM(sales_amount) AS total_sales

5. Generate GROUP BY Statement

Creates SQL with GROUP BY clause:

SELECT
  <grouping_columns>,
  <aggregate_functions>
FROM input_table
GROUP BY <grouping_columns>

Aggregate Functions

SUM - Total Sum

SUM(sales_amount) AS total_sales

Calculates the sum of all values in the column.

AVG - Average

AVG(unit_price) AS average_price

Calculates the mean (average) of all values in the column.

COUNT - Row Count

COUNT(order_id) AS order_count

Counts the number of non-NULL values in the column.

COUNTD - Distinct Count

COUNT(DISTINCT customer_id) AS unique_customers

Counts the number of unique (distinct) values in the column.

MIN - Minimum Value

MIN(unit_price) AS min_price

Finds the smallest value in the column.

MAX - Maximum Value

MAX(order_date) AS latest_order

Finds the largest value in the column.

Column Renaming

Without Rename (newColumnName = null)

{
  "function": "SUM",
  "columnName": "Sales Amount",
  "newColumnName": null
}

Uses original column name:

SUM(sales_amount) AS sales_amount

With Rename (newColumnName specified)

{
  "function": "SUM",
  "columnName": "Sales Amount",
  "newColumnName": "Total Sales"
}

Uses new name:

SUM(sales_amount) AS total_sales

Large Grouping Sets

Aggregate transformer handles large numbers of grouping columns efficiently:

Example with 15 GROUP BY columns:

SELECT
  year, month, region, division, customer_name,
  product_code, material_number, order_type,
  sales_rep_name, sales_rep_number, city,
  state, manager_name, manager_number, invoice_date,
  SUM(sales_amount) AS sales_amount
FROM sales_V0S15
GROUP BY year, month, region, division, customer_name,
         product_code, material_number, order_type,
         sales_rep_name, sales_rep_number, city,
         state, manager_name, manager_number, invoice_date

No Grouping (Aggregate All Rows)

If no groupByFields are specified, aggregates over all rows:

{
  "groupByFields": [],
  "aggregateFields": [
    {
      "function": "SUM",
      "columnName": "Sales Amount"
    }
  ]
}

Generated SQL:

SELECT
  SUM(sales_amount) AS sales_amount
FROM sales_V0S8

No GROUP BY clause is added (aggregates entire table).

Annotations Support

Aggregate nodes support before/after action annotations for additional renames:

beforeActionAnnotations

Renames applied before aggregation:

{
  "before_action_annotations": [
    {
      "columnName": "Old Name",
      "newColumnName": "New Name"
    }
  ]
}

afterActionAnnotations

Renames applied after aggregation:

{
  "after_action_annotations": [
    {
      "columnName": "sales_amount",
      "newColumnName": "revenue"
    }
  ]
}

Limitations

  • ⚠️ No HAVING clause - Post-aggregation filtering must be done with downstream Container node
  • ⚠️ No ROLLUP/CUBE - Advanced grouping sets not supported; use multiple aggregate nodes
  • ⚠️ No custom expressions - Aggregate functions operate on single columns only

Best Practices

When to Use Aggregate Transformer

  • Summarizing data by dimensions (region, product, time period)
  • Calculating totals (sum, count) across groups
  • Finding min/max values within categories
  • Counting distinct values (unique customers, products, etc.)

Optimizing Aggregations

  1. Reduce grouping columns if possible - fewer dimensions = faster queries
  2. Filter before aggregating - use upstream Container node to filter rows first
  3. Use appropriate aggregations - COUNTD is slower than COUNT; use only when needed

Post-Aggregation Filtering

To filter after aggregation (HAVING clause equivalent):

  1. Add Aggregate node
  2. Add downstream Container node with FilterOperation
-- Example: Only regions with > 1000 total sales
SELECT * FROM (
  SELECT region, SUM(sales) AS total_sales
  FROM data
  GROUP BY region
) WHERE total_sales > 1000

Related Transformers

  • ContainerTransformer - For filtering rows before or after aggregation
  • UnionTransformer - For combining aggregated data from multiple sources
  • JoinTransformer - For joining aggregated results with dimension tables