JoinTransformer
Passo a passo de como iniciar sua jornada na Dadosfera
JoinTransformer
Overview
Transforms Tableau Join nodes to SQL JOIN statements. Combines two input datasets based on join conditions with support for multiple join types (INNER, LEFT, RIGHT, FULL, ANTI joins).
Node Type: .v1.SimpleJoin / .v2018_2_3.SuperJoin
Output: SQL SELECT with JOIN
Capabilities
- Join types: INNER, LEFT, RIGHT, FULL, LEFT ANTI, RIGHT ANTI
- Multiple join conditions (AND logic)
- Equality comparisons (==)
- Duplicate column name handling with suffixes
- Table aliasing for clarity
Tableau Node Example
Example 1: Inner Join on Multiple Columns
Input (Tableau format)
{
"id": "7bf017f8-2d70-411f-8af2-a831301b1c66",
"name": "Join Sales by Period",
"node_type": ".v2018_2_3.SuperJoin",
"base_type": "superNode",
"actionNode": {
"nodeType": ".v1.SimpleJoin",
"joinType": "inner",
"conditions": [
{
"comparator": "==",
"leftExpression": "[Year]",
"rightExpression": "[Year]"
},
{
"comparator": "==",
"leftExpression": "[Month]",
"rightExpression": "[Month]"
}
]
}
}Input Schemas
Left Input (Sales Data):
[
{"name": "year", "type": "integer"},
{"name": "month", "type": "integer"},
{"name": "region", "type": "string"},
{"name": "sales_amount", "type": "real"},
{"name": "customer_name", "type": "string"}
]Right Input (Targets Data):
[
{"name": "year", "type": "integer"},
{"name": "month", "type": "integer"},
{"name": "target_amount", "type": "real"}
]Generated SQL Output
SELECT
a.region,
a.sales_amount,
a.customer_name,
a.month,
a.year,
b.target_amount,
b.ano AS year_1,
b.mes AS month_1
FROM sales_data_V0S12 AS a
INNER JOIN targets_V0S15 AS b
ON a.year = b.year AND a.month = b.monthOutput Schema
{
"region": "string",
"sales_amount": "real",
"customer_name": "string",
"month": "integer",
"year": "integer",
"target_amount": "real",
"year_1": "integer",
"month_1": "integer"
}Example 2: Left Join (Keep All Left Rows)
Input (Tableau format)
{
"id": "8c9d0e1f-2a3b-4c5d-6e7f-8a9b0c1d2e3f",
"name": "Join Customers with Orders",
"node_type": ".v1.SimpleJoin",
"actionNode": {
"joinType": "left",
"conditions": [
{
"comparator": "==",
"leftExpression": "[Customer ID]",
"rightExpression": "[Customer ID]"
}
]
}
}Generated SQL Output
SELECT
a.customer_id,
a.customer_name,
a.email,
b.order_id,
b.order_date,
b.order_amount,
b.customer_id AS customer_id_1
FROM customers_V0S5 AS a
LEFT JOIN orders_V0S8 AS b
ON a.customer_id = b.customer_idExample 3: Right Join
Input (Tableau format)
{
"id": "9d0e1f2a-3b4c-5d6e-7f8a-9b0c1d2e3f4a",
"name": "Right Join Products",
"node_type": ".v1.SimpleJoin",
"actionNode": {
"joinType": "right",
"conditions": [
{
"comparator": "==",
"leftExpression": "[Product Code]",
"rightExpression": "[Product Code]"
}
]
}
}Generated SQL Output
SELECT
a.product_code,
a.sales_amount,
b.product_code AS product_code_1,
b.product_name,
b.category
FROM sales_V0S3 AS a
RIGHT JOIN products_V0S7 AS b
ON a.product_code = b.product_codeExample 4: Full Outer Join
Input (Tableau format)
{
"id": "0e1f2a3b-4c5d-6e7f-8a9b-0c1d2e3f4a5b",
"name": "Full Join Transactions",
"node_type": ".v1.SimpleJoin",
"actionNode": {
"joinType": "full",
"conditions": [
{
"comparator": "==",
"leftExpression": "[Transaction ID]",
"rightExpression": "[Transaction ID]"
}
]
}
}Generated SQL Output
SELECT
a.transaction_id,
a.amount,
a.date,
b.transaction_id AS transaction_id_1,
b.reconciled,
b.status
FROM source_a_V0S10 AS a
FULL OUTER JOIN source_b_V0S12 AS b
ON a.transaction_id = b.transaction_idExample 5: Left Anti Join (Rows Only in Left)
Input (Tableau format)
{
"id": "1f2a3b4c-5d6e-7f8a-9b0c-1d2e3f4a5b6c",
"name": "Find Unmatched Customers",
"node_type": ".v1.SimpleJoin",
"actionNode": {
"joinType": "leftanti",
"conditions": [
{
"comparator": "==",
"leftExpression": "[Customer ID]",
"rightExpression": "[Customer ID]"
}
]
}
}Generated SQL Output
SELECT
a.customer_id,
a.customer_name,
a.region
FROM all_customers_V0S4 AS a
LEFT ANTI JOIN active_customers_V0S8 AS b
ON a.customer_id = b.customer_idExample 6: Right Anti Join (Rows Only in Right)
Input (Tableau format)
{
"id": "2a3b4c5d-6e7f-8a9b-0c1d-2e3f4a5b6c7d",
"name": "Find Orphaned Records",
"node_type": ".v1.SimpleJoin",
"actionNode": {
"joinType": "rightanti",
"conditions": [
{
"comparator": "==",
"leftExpression": "[Order ID]",
"rightExpression": "[Order ID]"
}
]
}
}Generated SQL Output
SELECT
b.order_id,
b.status,
b.created_date
FROM orders_V0S6 AS a
RIGHT ANTI JOIN orphaned_orders_V0S9 AS b
ON a.order_id = b.order_idExample 7: Single Condition Join
Input (Tableau format)
{
"id": "3b4c5d6e-7f8a-9b0c-1d2e-3f4a5b6c7d8e",
"name": "Join by Region",
"node_type": ".v1.SimpleJoin",
"actionNode": {
"joinType": "inner",
"conditions": [
{
"comparator": "==",
"leftExpression": "[Region Code]",
"rightExpression": "[Region Code]"
}
]
}
}Generated SQL Output
SELECT
a.region_code,
a.sales_amount,
b.region_code AS region_code_1,
b.region_name,
b.manager
FROM sales_V0S2 AS a
INNER JOIN regions_V0S5 AS b
ON a.region_code = b.region_codeHow It Works
1. Identify Join Type
Reads joinType from actionNode:
{
"joinType": "inner"
}Supported types:
inner→ INNER JOINleft→ LEFT JOINright→ RIGHT JOINfull→ FULL OUTER JOINleftanti→ LEFT ANTI JOINrightanti→ RIGHT ANTI JOIN
2. Parse Join Conditions
Extracts conditions array:
{
"conditions": [
{
"comparator": "==",
"leftExpression": "[Year]",
"rightExpression": "[Year]"
},
{
"comparator": "==",
"leftExpression": "[Month]",
"rightExpression": "[Month]"
}
]
}3. Convert to SQL ON Clause
Translates conditions to SQL:
[Year] == [Year] → a.year = b.year
[Month] == [Month] → a.month = b.month
Multiple conditions are combined with AND:
ON a.year = b.year AND a.month = b.month4. Handle Duplicate Column Names
When both inputs have same column names:
- Join columns: Appear from both sides (with suffix on right side)
- Non-join columns: If duplicated, suffix added to right side
Example:
Left: year, month, sales_amount
Right: year, month, target_amount
Output:
- year (from left)
- month (from left)
- sales_amount (from left)
- target_amount (from right)
- year_1 (from right, suffixed)
- month_1 (from right, suffixed)
5. Generate Table Aliases
Creates table aliases for clarity:
- Left input:
AS a - Right input:
AS b
All column references prefixed with alias:
a.year, a.month, b.target_amount6. Build Complete JOIN Statement
SELECT
a.column1,
a.column2,
b.column3,
b.column1 AS column1_1 -- duplicate, suffixed
FROM left_input AS a
<JOIN_TYPE> JOIN right_input AS b
ON <join_conditions>Join Types Comparison
INNER JOIN
Returns only rows with matches in both tables.
Left: 100 rows
Right: 80 rows
Result: 50 rows (only matches)
LEFT JOIN
Returns all rows from left table, NULL for non-matching right rows.
Left: 100 rows
Right: 80 rows
Result: 100 rows (all left + matches)
RIGHT JOIN
Returns all rows from right table, NULL for non-matching left rows.
Left: 100 rows
Right: 80 rows
Result: 80 rows (all right + matches)
FULL OUTER JOIN
Returns all rows from both tables, NULL where no match.
Left: 100 rows
Right: 80 rows
Result: 130 rows (all rows from both)
LEFT ANTI JOIN
Returns only rows from left table with NO match in right table.
Left: 100 rows
Right: 80 rows
Result: 50 rows (left-only rows)
RIGHT ANTI JOIN
Returns only rows from right table with NO match in left table.
Left: 100 rows
Right: 80 rows
Result: 30 rows (right-only rows)
Multiple Join Conditions
Join conditions are combined with AND logic:
{
"conditions": [
{"leftExpression": "[Year]", "rightExpression": "[Year]"},
{"leftExpression": "[Month]", "rightExpression": "[Month]"},
{"leftExpression": "[Region]", "rightExpression": "[Region]"}
]
}Converts to:
ON a.year = b.year
AND a.month = b.month
AND a.region = b.regionAll conditions must be satisfied for a match.
Duplicate Column Suffix Pattern
When duplicate column names exist:
Pattern 1: Join Column Duplication
Join on: Customer ID
Left has: customer_id
Right has: customer_id
Output:
- customer_id (from left)
- customer_id_1 (from right)
Pattern 2: Non-Join Column Duplication
Left has: amount, date
Right has: amount, date
Output:
- amount (from left)
- date (from left)
- amount_1 (from right)
- date_1 (from right)
Pattern 3: Multiple Duplicates
Left has: id, name, value
Right has: id, name, status
Output:
- id (from left)
- name (from left)
- value (from left)
- status (from right)
- id_1 (from right)
- name_1 (from right)
Suffix increments if _1 already exists: _2, _3, etc.
Limitations
- ⚠️ Equality comparisons only - Only
==comparator supported; no<,>,!=, etc. - ⚠️ Two inputs only - Joins exactly two datasets; chain multiple joins for 3+ tables
- ⚠️ AND logic only - Multiple conditions combined with AND; OR logic not supported
- ⚠️ No complex expressions - Join conditions must be simple column references
Best Practices
When to Use Join Transformer
- ✅ Combining related datasets - Sales with targets, customers with orders
- ✅ Enriching data - Adding dimension data to fact tables
- ✅ Finding matches/non-matches - ANTI joins for data quality checks
Join Performance Optimization
- Filter before joining - Use upstream Container nodes to reduce row counts
- Join on indexed columns - Use primary/foreign key columns when possible
- Choose correct join type - Use INNER JOIN when possible (faster than OUTER)
- Minimize duplicate columns - Select only needed columns in upstream nodes
Chaining Multiple Joins
To join 3+ tables:
Table A → Join (A + B) → Join (AB + C) → Join (ABC + D)
Each join adds one more table to the result.
Handling Duplicate Names
If duplicate suffixing is undesirable:
- Use upstream RenameColumn action to rename columns before join
- Use downstream RenameColumn to clean up suffixed names after join
Related Transformers
- UnionTransformer - For combining datasets vertically (stacking rows)
- AggregateTransformer - For aggregating joined results
- ContainerTransformer - For filtering before or after joins
Updated 5 days ago
