From 04ca572349f46bfddfa9ceb2f841c87315b551f6 Mon Sep 17 00:00:00 2001
From: mirnawong1
Date: Thu, 19 Dec 2024 11:45:18 +0000
Subject: [PATCH] add more detail

---
 website/docs/docs/build/join-logic.md | 87 +++++++++++++++++++++++----
 1 file changed, 75 insertions(+), 12 deletions(-)

diff --git a/website/docs/docs/build/join-logic.md b/website/docs/docs/build/join-logic.md
index 99d63b38657..22249938b56 100644
--- a/website/docs/docs/build/join-logic.md
+++ b/website/docs/docs/build/join-logic.md
@@ -10,24 +10,24 @@ Joins are a powerful part of MetricFlow and simplify the process of making all v
 
 Joins use `entities` defined in your semantic model configs as the join keys between tables. Assuming entities are defined in the semantic model, MetricFlow creates a graph using the semantic models as nodes and the join paths as edges to perform joins automatically. MetricFlow chooses the appropriate join type and avoids fan-out or chasm joins with other tables based on the entity types.
 
-
-What are fan-out or chasm joins?
-
-— Fan-out joins are when one row in a table is joined to multiple rows in another table, resulting in more output rows than input rows.
-— Chasm joins are when two tables have a many-to-many relationship through an intermediate table, and the join results in duplicate or missing data.
-
-
-
+
+- Fan-out joins are when one row in a table is joined to multiple rows in another table, resulting in more output rows than input rows.
+- Chasm joins are when two tables have a many-to-many relationship through an intermediate table, and the join results in duplicate or missing data.
+
+
 
 ## Types of joins
 
 :::tip Joins are auto-generated
 MetricFlow automatically generates the necessary joins to the defined semantic objects, eliminating the need for you to create new semantic models or configuration files.
 
-This document explains the different types of joins that can be used with entities and how to query them using the CLI.
+This section explains the different types of joins that can be used with entities and how to query them.
 :::
 
-MetricFlow primarily uses left joins for joins, and restricts the use of fan-out and chasm joins. Refer to the table below to identify which joins are or aren't allowed based on specific entity types to prevent the creation of risky joins.
+- MetricFlow primarily uses left joins for joins.
+- For queries that involve multiple `fct` models, MetricFlow uses full outer joins.
+- It restricts the use of fan-out and chasm joins.
+
+Refer to the following table to identify which joins are or aren't allowed based on specific entity types to prevent the creation of risky joins.
 
 | entity type - Table A | entity type - Table B | Join type |
 |---------------------------|---------------------------|----------------------|
@@ -39,9 +39,28 @@ MetricFlow primarily uses left joins for joins, and restricts the use of fan-out
 | Unique | Foreign | ❌ Fan-out (Not allowed) |
 | Foreign | Primary | ✅ Left |
 | Foreign | Unique | ✅ Left |
-| Foreign | Foreign | ❌ Fan-out (Not allowed) |
+| Foreign | Foreign | ❌ Fan-out (Not allowed) |
+
+This table primarily represents left joins unless otherwise specified. For scenarios involving multiple `fct` models, MetricFlow uses full outer joins.
+
+### Explanation of joins
+
+- **Left joins** — MetricFlow defaults to left joins when joining `fct` and `dim` models. Left joins make sure all rows from the "base" table are retained, while matching rows are included from the joined table.
+- **Full outer joins** — For queries that involve multiple `fct` models, MetricFlow uses full outer joins to ensure all data points are captured, even when some `dim` or `fct` models are missing in certain tables.
+
+Refer to [SQL examples](#sql-examples) for more information on how MetricFlow handles joins in practice.
 
-### Example
+### Semantic validation
+
+MetricFlow performs semantic validation by executing `explain` queries in the data platform to ensure that the generated SQL gets executed without errors. This validation includes:
+
+- Verifying that all referenced tables and columns exist.
+- Ensuring the data platform supports SQL functions, such as `date_diff(x, y)`.
+- Checking for ambiguous joins or paths in multi-hop joins.
+
+If validation fails, MetricFlow surfaces errors for users to address before executing the query.
+
+## Example
 
 The following example uses two semantic models with a common entity and shows a MetricFlow query that requires a join between the two semantic models. The two semantic models are:
 - `transactions`
@@ -83,6 +102,50 @@ dbt sl query --metrics average_purchase_price --group-by metric_time,user_id__ty
 mf query --metrics average_purchase_price --group-by metric_time,user_id__type # In dbt Core
 ```
 
+#### SQL examples
+
+The following tabs provide SQL examples for both left joins and full outer joins, showing how MetricFlow handles these scenarios in practice.
+
+
+
+
+Following the previous example using the `transactions` and `user_signup` semantic models, this shows a left join between those two semantic models.
+
+```sql
+select
+  transactions.user_id,
+  avg(transactions.purchase_price) as average_purchase_price,
+  user_signup.type
+from transactions
+left outer join user_signup
+  on transactions.user_id = user_signup.user_id
+where transactions.purchase_price is not null
+group by
+  transactions.user_id,
+  user_signup.type;
+```
+
+
+
+
+If you have multiple `fct` models, let's say `sales` and `returns`, MetricFlow uses full outer joins to ensure all data points are captured.
+
+This example shows a full outer join between the `sales` and `returns` semantic models.
+
+```sql
+select
+  sales.user_id,
+  sales.total_sales,
+  returns.total_returns
+from sales
+full outer join returns
+  on sales.user_id = returns.user_id
+where sales.user_id is not null or returns.user_id is not null;
+```
+
+
+
+
 ## Multi-hop joins
 
 MetricFlow allows users to join measures and dimensions across a graph of entities by moving from one table to another within a graph. This is referred to as "multi-hop join".
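
As a rough illustration of the multi-hop idea described above, the following sketch chains two left joins so that a measure on one table can be grouped by a dimension two hops away. The `users` and `countries` tables and the `country_id` and `country_name` columns are hypothetical names chosen for this example; they are not part of the patch, and this is not the SQL that MetricFlow itself generates.

```sql
-- Hypothetical tables for illustration only; not MetricFlow-generated SQL.
-- Hop 1: transactions.user_id (foreign) -> users.user_id (primary)
-- Hop 2: users.country_id (foreign)     -> countries.country_id (primary)
select
  countries.country_name,
  avg(transactions.purchase_price) as average_purchase_price
from transactions
left outer join users
  on transactions.user_id = users.user_id
left outer join countries
  on users.country_id = countries.country_id
group by
  countries.country_name;
```

Both hops go from a foreign entity to a primary entity, which the join table earlier in this patch lists as an allowed left join, so neither hop should multiply the rows of `transactions`.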