JoinTransformer

Passo a passo de como iniciar sua jornada na Dadosfera

JoinTransformer

Overview

Transforms Tableau Join nodes to SQL JOIN statements. Combines two input datasets based on join conditions with support for multiple join types (INNER, LEFT, RIGHT, FULL, ANTI joins).

Node Type: .v1.SimpleJoin / .v2018_2_3.SuperJoin Output: SQL SELECT with JOIN

Capabilities

  • Join types: INNER, LEFT, RIGHT, FULL, LEFT ANTI, RIGHT ANTI
  • Multiple join conditions (AND logic)
  • Equality comparisons (==)
  • Duplicate column name handling with suffixes
  • Table aliasing for clarity

Tableau Node Example

Example 1: Inner Join on Multiple Columns

Input (Tableau format)

{
  "id": "7bf017f8-2d70-411f-8af2-a831301b1c66",
  "name": "Join Sales by Period",
  "node_type": ".v2018_2_3.SuperJoin",
  "base_type": "superNode",
  "actionNode": {
    "nodeType": ".v1.SimpleJoin",
    "joinType": "inner",
    "conditions": [
      {
        "comparator": "==",
        "leftExpression": "[Year]",
        "rightExpression": "[Year]"
      },
      {
        "comparator": "==",
        "leftExpression": "[Month]",
        "rightExpression": "[Month]"
      }
    ]
  }
}

Input Schemas

Left Input (Sales Data):

[
  {"name": "year", "type": "integer"},
  {"name": "month", "type": "integer"},
  {"name": "region", "type": "string"},
  {"name": "sales_amount", "type": "real"},
  {"name": "customer_name", "type": "string"}
]

Right Input (Targets Data):

[
  {"name": "year", "type": "integer"},
  {"name": "month", "type": "integer"},
  {"name": "target_amount", "type": "real"}
]

Generated SQL Output

SELECT
  a.region,
  a.sales_amount,
  a.customer_name,
  a.month,
  a.year,
  b.target_amount,
  b.ano AS year_1,
  b.mes AS month_1
FROM sales_data_V0S12 AS a
INNER JOIN targets_V0S15 AS b
  ON a.year = b.year AND a.month = b.month

Output Schema

{
  "region": "string",
  "sales_amount": "real",
  "customer_name": "string",
  "month": "integer",
  "year": "integer",
  "target_amount": "real",
  "year_1": "integer",
  "month_1": "integer"
}

Example 2: Left Join (Keep All Left Rows)

Input (Tableau format)

{
  "id": "8c9d0e1f-2a3b-4c5d-6e7f-8a9b0c1d2e3f",
  "name": "Join Customers with Orders",
  "node_type": ".v1.SimpleJoin",
  "actionNode": {
    "joinType": "left",
    "conditions": [
      {
        "comparator": "==",
        "leftExpression": "[Customer ID]",
        "rightExpression": "[Customer ID]"
      }
    ]
  }
}

Generated SQL Output

SELECT
  a.customer_id,
  a.customer_name,
  a.email,
  b.order_id,
  b.order_date,
  b.order_amount,
  b.customer_id AS customer_id_1
FROM customers_V0S5 AS a
LEFT JOIN orders_V0S8 AS b
  ON a.customer_id = b.customer_id

Example 3: Right Join

Input (Tableau format)

{
  "id": "9d0e1f2a-3b4c-5d6e-7f8a-9b0c1d2e3f4a",
  "name": "Right Join Products",
  "node_type": ".v1.SimpleJoin",
  "actionNode": {
    "joinType": "right",
    "conditions": [
      {
        "comparator": "==",
        "leftExpression": "[Product Code]",
        "rightExpression": "[Product Code]"
      }
    ]
  }
}

Generated SQL Output

SELECT
  a.product_code,
  a.sales_amount,
  b.product_code AS product_code_1,
  b.product_name,
  b.category
FROM sales_V0S3 AS a
RIGHT JOIN products_V0S7 AS b
  ON a.product_code = b.product_code

Example 4: Full Outer Join

Input (Tableau format)

{
  "id": "0e1f2a3b-4c5d-6e7f-8a9b-0c1d2e3f4a5b",
  "name": "Full Join Transactions",
  "node_type": ".v1.SimpleJoin",
  "actionNode": {
    "joinType": "full",
    "conditions": [
      {
        "comparator": "==",
        "leftExpression": "[Transaction ID]",
        "rightExpression": "[Transaction ID]"
      }
    ]
  }
}

Generated SQL Output

SELECT
  a.transaction_id,
  a.amount,
  a.date,
  b.transaction_id AS transaction_id_1,
  b.reconciled,
  b.status
FROM source_a_V0S10 AS a
FULL OUTER JOIN source_b_V0S12 AS b
  ON a.transaction_id = b.transaction_id

Example 5: Left Anti Join (Rows Only in Left)

Input (Tableau format)

{
  "id": "1f2a3b4c-5d6e-7f8a-9b0c-1d2e3f4a5b6c",
  "name": "Find Unmatched Customers",
  "node_type": ".v1.SimpleJoin",
  "actionNode": {
    "joinType": "leftanti",
    "conditions": [
      {
        "comparator": "==",
        "leftExpression": "[Customer ID]",
        "rightExpression": "[Customer ID]"
      }
    ]
  }
}

Generated SQL Output

SELECT
  a.customer_id,
  a.customer_name,
  a.region
FROM all_customers_V0S4 AS a
LEFT ANTI JOIN active_customers_V0S8 AS b
  ON a.customer_id = b.customer_id

Example 6: Right Anti Join (Rows Only in Right)

Input (Tableau format)

{
  "id": "2a3b4c5d-6e7f-8a9b-0c1d-2e3f4a5b6c7d",
  "name": "Find Orphaned Records",
  "node_type": ".v1.SimpleJoin",
  "actionNode": {
    "joinType": "rightanti",
    "conditions": [
      {
        "comparator": "==",
        "leftExpression": "[Order ID]",
        "rightExpression": "[Order ID]"
      }
    ]
  }
}

Generated SQL Output

SELECT
  b.order_id,
  b.status,
  b.created_date
FROM orders_V0S6 AS a
RIGHT ANTI JOIN orphaned_orders_V0S9 AS b
  ON a.order_id = b.order_id

Example 7: Single Condition Join

Input (Tableau format)

{
  "id": "3b4c5d6e-7f8a-9b0c-1d2e-3f4a5b6c7d8e",
  "name": "Join by Region",
  "node_type": ".v1.SimpleJoin",
  "actionNode": {
    "joinType": "inner",
    "conditions": [
      {
        "comparator": "==",
        "leftExpression": "[Region Code]",
        "rightExpression": "[Region Code]"
      }
    ]
  }
}

Generated SQL Output

SELECT
  a.region_code,
  a.sales_amount,
  b.region_code AS region_code_1,
  b.region_name,
  b.manager
FROM sales_V0S2 AS a
INNER JOIN regions_V0S5 AS b
  ON a.region_code = b.region_code

How It Works

1. Identify Join Type

Reads joinType from actionNode:

{
  "joinType": "inner"
}

Supported types:

  • inner → INNER JOIN
  • left → LEFT JOIN
  • right → RIGHT JOIN
  • full → FULL OUTER JOIN
  • leftanti → LEFT ANTI JOIN
  • rightanti → RIGHT ANTI JOIN

2. Parse Join Conditions

Extracts conditions array:

{
  "conditions": [
    {
      "comparator": "==",
      "leftExpression": "[Year]",
      "rightExpression": "[Year]"
    },
    {
      "comparator": "==",
      "leftExpression": "[Month]",
      "rightExpression": "[Month]"
    }
  ]
}

3. Convert to SQL ON Clause

Translates conditions to SQL:

[Year] == [Year]  →  a.year = b.year
[Month] == [Month]  →  a.month = b.month

Multiple conditions are combined with AND:

ON a.year = b.year AND a.month = b.month

4. Handle Duplicate Column Names

When both inputs have same column names:

  • Join columns: Appear from both sides (with suffix on right side)
  • Non-join columns: If duplicated, suffix added to right side

Example:

Left: year, month, sales_amount
Right: year, month, target_amount

Output:
- year (from left)
- month (from left)
- sales_amount (from left)
- target_amount (from right)
- year_1 (from right, suffixed)
- month_1 (from right, suffixed)

5. Generate Table Aliases

Creates table aliases for clarity:

  • Left input: AS a
  • Right input: AS b

All column references prefixed with alias:

a.year, a.month, b.target_amount

6. Build Complete JOIN Statement

SELECT
  a.column1,
  a.column2,
  b.column3,
  b.column1 AS column1_1  -- duplicate, suffixed
FROM left_input AS a
<JOIN_TYPE> JOIN right_input AS b
  ON <join_conditions>

Join Types Comparison

INNER JOIN

Returns only rows with matches in both tables.

Left: 100 rows
Right: 80 rows
Result: 50 rows (only matches)

LEFT JOIN

Returns all rows from left table, NULL for non-matching right rows.

Left: 100 rows
Right: 80 rows
Result: 100 rows (all left + matches)

RIGHT JOIN

Returns all rows from right table, NULL for non-matching left rows.

Left: 100 rows
Right: 80 rows
Result: 80 rows (all right + matches)

FULL OUTER JOIN

Returns all rows from both tables, NULL where no match.

Left: 100 rows
Right: 80 rows
Result: 130 rows (all rows from both)

LEFT ANTI JOIN

Returns only rows from left table with NO match in right table.

Left: 100 rows
Right: 80 rows
Result: 50 rows (left-only rows)

RIGHT ANTI JOIN

Returns only rows from right table with NO match in left table.

Left: 100 rows
Right: 80 rows
Result: 30 rows (right-only rows)

Multiple Join Conditions

Join conditions are combined with AND logic:

{
  "conditions": [
    {"leftExpression": "[Year]", "rightExpression": "[Year]"},
    {"leftExpression": "[Month]", "rightExpression": "[Month]"},
    {"leftExpression": "[Region]", "rightExpression": "[Region]"}
  ]
}

Converts to:

ON a.year = b.year
   AND a.month = b.month
   AND a.region = b.region

All conditions must be satisfied for a match.

Duplicate Column Suffix Pattern

When duplicate column names exist:

Pattern 1: Join Column Duplication

Join on: Customer ID
Left has: customer_id
Right has: customer_id

Output:
- customer_id (from left)
- customer_id_1 (from right)

Pattern 2: Non-Join Column Duplication

Left has: amount, date
Right has: amount, date

Output:
- amount (from left)
- date (from left)
- amount_1 (from right)
- date_1 (from right)

Pattern 3: Multiple Duplicates

Left has: id, name, value
Right has: id, name, status

Output:
- id (from left)
- name (from left)
- value (from left)
- status (from right)
- id_1 (from right)
- name_1 (from right)

Suffix increments if _1 already exists: _2, _3, etc.

Limitations

  • ⚠️ Equality comparisons only - Only == comparator supported; no <, >, !=, etc.
  • ⚠️ Two inputs only - Joins exactly two datasets; chain multiple joins for 3+ tables
  • ⚠️ AND logic only - Multiple conditions combined with AND; OR logic not supported
  • ⚠️ No complex expressions - Join conditions must be simple column references

Best Practices

When to Use Join Transformer

  • Combining related datasets - Sales with targets, customers with orders
  • Enriching data - Adding dimension data to fact tables
  • Finding matches/non-matches - ANTI joins for data quality checks

Join Performance Optimization

  1. Filter before joining - Use upstream Container nodes to reduce row counts
  2. Join on indexed columns - Use primary/foreign key columns when possible
  3. Choose correct join type - Use INNER JOIN when possible (faster than OUTER)
  4. Minimize duplicate columns - Select only needed columns in upstream nodes

Chaining Multiple Joins

To join 3+ tables:

Table A → Join (A + B) → Join (AB + C) → Join (ABC + D)

Each join adds one more table to the result.

Handling Duplicate Names

If duplicate suffixing is undesirable:

  1. Use upstream RenameColumn action to rename columns before join
  2. Use downstream RenameColumn to clean up suffixed names after join

Related Transformers

  • UnionTransformer - For combining datasets vertically (stacking rows)
  • AggregateTransformer - For aggregating joined results
  • ContainerTransformer - For filtering before or after joins