docs: Add outer_coalesce join strategy in the user guide #15405

Merged
merged 2 commits into from
Mar 31, 2024
7 changes: 7 additions & 0 deletions docs/src/python/user-guide/transformations/joins.py
@@ -41,6 +41,13 @@
print(df_outer_join)
# --8<-- [end:outer]

# --8<-- [start:outer_coalesce]
df_outer_coalesce_join = df_customers.join(
df_orders, on="customer_id", how="outer_coalesce"
)
print(df_outer_coalesce_join)
# --8<-- [end:outer_coalesce]

# --8<-- [start:df3]
df_colors = pl.DataFrame(
{
16 changes: 15 additions & 1 deletion docs/src/rust/user-guide/transformations/joins.rs
@@ -58,12 +58,26 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
df_orders.clone().lazy(),
[col("customer_id")],
[col("customer_id")],
- JoinArgs::new(JoinType::Outer { coalesce: true }),
+ JoinArgs::new(JoinType::Outer { coalesce: false }),
)
.collect()?;
println!("{}", &df_outer_join);
// --8<-- [end:outer]

// --8<-- [start:outer_coalesce]
let df_outer_join = df_customers
.clone()
.lazy()
.join(
df_orders.clone().lazy(),
[col("customer_id")],
[col("customer_id")],
JoinArgs::new(JoinType::Outer { coalesce: true }),
)
.collect()?;
println!("{}", &df_outer_join);
// --8<-- [end:outer_coalesce]

// --8<-- [start:df3]
let df_colors = df!(
"color"=> &["red", "blue", "green"],
29 changes: 21 additions & 8 deletions docs/user-guide/transformations/joins.md
@@ -4,14 +4,15 @@

Polars supports the following join strategies by specifying the `how` argument:

| Strategy | Description |
| -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `inner` | Returns rows with matching keys in _both_ frames. Non-matching rows in either the left or right frame are discarded. |
| `left` | Returns all rows in the left dataframe, whether or not a match in the right-frame is found. Non-matching rows have their right columns null-filled. |
| `outer` | Returns all rows from both the left and right dataframe. If no match is found in one frame, columns from the other frame are null-filled. |
| `cross` | Returns the Cartesian product of all rows from the left frame with all rows from the right frame. Duplicate rows are retained; the table length of `A` cross-joined with `B` is always `len(A) × len(B)`. |
| `semi` | Returns all rows from the left frame in which the join key is also present in the right frame. |
| `anti` | Returns all rows from the left frame in which the join key is _not_ present in the right frame. |
| Strategy | Description |
| ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `inner` | Returns rows with matching keys in _both_ frames. Non-matching rows in either the left or right frame are discarded. |
| `left` | Returns all rows in the left dataframe, whether or not a match in the right-frame is found. Non-matching rows have their right columns null-filled. |
| `outer` | Returns all rows from both the left and right dataframe. If no match is found in one frame, columns from the other frame are null-filled. |
| `outer_coalesce` | Returns all rows from both the left and right dataframe. This is similar to `outer`, but with the key columns being merged. |
| `cross` | Returns the Cartesian product of all rows from the left frame with all rows from the right frame. Duplicate rows are retained; the table length of `A` cross-joined with `B` is always `len(A) × len(B)`. |
| `semi` | Returns all rows from the left frame in which the join key is also present in the right frame. |
| `anti` | Returns all rows from the left frame in which the join key is _not_ present in the right frame. |

### Inner join

@@ -62,6 +63,18 @@ The `outer` join produces a `DataFrame` that contains all the rows from both `Da
--8<-- "python/user-guide/transformations/joins.py:outer"
```

### Outer coalesce join

The `outer_coalesce` join keeps all rows from both `DataFrames`, like an `outer` join, but merges the join keys into a single column by coalescing their values. Each row takes its key from whichever frame supplies one, so the key column is null only when neither side provides a value. Let's compare it with the outer join using the two `DataFrames` we used above:

{{code_block('user-guide/transformations/joins','outer_coalesce',['join'])}}

```python exec="on" result="text" session="user-guide/transformations/joins"
--8<-- "python/user-guide/transformations/joins.py:outer_coalesce"
```

In contrast to an `outer` join, where `customer_id` and `customer_id_right` columns would remain separate, the `outer_coalesce` join merges these columns into a single `customer_id` column.
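
To make the coalescing step concrete, here is a minimal pure-Python sketch of the semantics described above. It is not how Polars implements the join: the `full_join` helper, the sample rows, and the `_right` suffix handling are assumptions made for illustration, and the row order is not meant to match Polars' output.

```python
def full_join(left, right, key, coalesce=False):
    """Full (outer) join of two lists of dicts on `key`.

    Illustrative only: assumes non-empty inputs, uniform columns per side,
    and no column names shared between the two sides other than `key`.
    """
    left_cols = list(left[0])
    right_cols = list(right[0])

    def build(l_row, r_row):
        # Left columns first; a missing side is null-filled with None.
        out = {c: (l_row[c] if l_row is not None else None) for c in left_cols}
        for c in right_cols:
            value = r_row[c] if r_row is not None else None
            if c != key:
                out[c] = value
            elif not coalesce:
                # Plain outer join: keep the right key as a separate column.
                out[f"{key}_right"] = value
            elif out[key] is None:
                # Coalesce: fall back to the right key when the left is null.
                out[key] = value
        return out

    rows, matched = [], set()
    for l_row in left:
        # Null keys never match, so they are excluded from the lookup.
        hits = [
            i for i, r_row in enumerate(right)
            if l_row[key] is not None and r_row[key] == l_row[key]
        ]
        matched.update(hits)
        if hits:
            rows.extend(build(l_row, right[i]) for i in hits)
        else:
            rows.append(build(l_row, None))
    # Right rows that matched nothing are appended with null left columns.
    rows.extend(build(None, r) for i, r in enumerate(right) if i not in matched)
    return rows


# Hypothetical sample data: customer 3 has no order, and one order
# refers to a customer (4) that is missing from `customers`.
customers = [
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "Bob"},
    {"customer_id": 3, "name": "Charlie"},
]
orders = [
    {"order_id": "a", "customer_id": 1, "amount": 100},
    {"order_id": "b", "customer_id": 2, "amount": 200},
    {"order_id": "c", "customer_id": 4, "amount": 300},
]

plain = full_join(customers, orders, "customer_id")
merged = full_join(customers, orders, "customer_id", coalesce=True)

print([(r["customer_id"], r["customer_id_right"]) for r in plain])
# [(1, 1), (2, 2), (3, None), (None, 4)]
print([r["customer_id"] for r in merged])
# [1, 2, 3, 4]
```

In the plain result each unmatched row has a null in one of the two key columns, whereas the coalesced result carries a single, fully populated `customer_id` column.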

### Cross join

A `cross` join is a Cartesian product of the two `DataFrames`. This means that every row in the left `DataFrame` is joined with every row in the right `DataFrame`. The `cross` join is useful for creating a `DataFrame` with all possible combinations of the columns in two `DataFrames`. Let's take for example the following two `DataFrames`.