Re-organize the post based on feedback

posit-dev · Jan 24, 2025 · acca7ef
1 parent faf11c4
Showing 1 changed file with 41 additions and 28 deletions.
diff --git a/docs/blog/locbody-mask/index.qmd b/docs/blog/locbody-mask/index.qmd
@@ -2,19 +2,18 @@
 title: "Style Table Body with `mask=` in `loc.body()`"
 html-table-processing: none
 author: Rich Iannone, Michael Chow and Jerry Wu
-date: 2025-01-23
+date: 2025-01-24
 freeze: true
 jupyter: python3
 format:
   html:
     code-summary: "Show the Code"
 ---
 
-In Great Tables `0.16.0`, we introduced the `mask=` parameter in `loc.body()`, enabling users to apply conditional formatting to rows on a per-column basis more efficiently when working with a Polars DataFrame. This post demonstrates three approaches to styling the table body, so you can compare methods and choose the one that best fits your needs:
+In Great Tables `0.16.0`, we introduced the `mask=` parameter in `loc.body()`, enabling users to apply conditional formatting to rows on a per-column basis more efficiently when working with a Polars DataFrame. This post will demonstrate how it works and compare it with the "old-fashioned" approach:
 
-* **Using a for-loop:** Repeatedly call `GT.tab_style()` for each column.
-* **Utilizing the `locations=` parameter in `GT.tab_style()`:** Pass a list of `loc.body()` objects.
 * **Leveraging the `mask=` parameter in `loc.body()`:** Use Polars expressions for streamlined styling.
+* **Utilizing the `locations=` parameter in `GT.tab_style()`:** Pass a list of `loc.body()` objects.
 
 Let’s dive in.
 
@@ -36,55 +35,69 @@ df_mini = (
     .select(["mfr", "drivetrain", *year_cols])
 )
 
-gt = GT(df_mini, rowname_col="drivetrain", groupname_col="mfr")
+gt = GT(df_mini).tab_stub(rowname_col="drivetrain", groupname_col="mfr").opt_stylize(color="cyan")
 gt
 ```
 
 The numbers in the cells represent the average horsepower for each combination of `mfr` and `drivetrain` for a specific year.
 
-In the following section, we'll demonstrate three different ways to highlight the cell text in red if the average horsepower exceeds 650.
+### Leveraging the `mask=` parameter in `loc.body()`
+The `mask=` parameter in `loc.body()` accepts a Polars expression that evaluates to a boolean result for each cell.
 
-### Using a for-loop: Repeatedly call `GT.tab_style()` for each column
-The most intuitive way is to call `GT.tab_style()` for each column. Here's how:
-```{python}
-gt1 = gt # <1>
-for col in year_cols:
-    gt1 = gt1.tab_style(
-        style=style.text(color="red"),
-        locations=loc.body(columns=col, rows=pl.col(col).gt(650))
-    )
-gt1
-```
-1. Since we want to keep `gt` intact for later use, we will modify `gt1` in this approach instead.
+Here’s how we can use it to achieve two goals:
 
+* Highlight the cell text in red if the column is numeric and the cell value exceeds 650.
+* Fill the cell background with black if the cell value is missing in the last two columns (`2016.0` and `2017.0`).
 
-### Utilizing the `locations=` parameter in `GT.tab_style()`: Pass a list of `loc.body()` objects
-A more concise method is to pass a list of `loc.body()` objects to the `locations=` parameter in `GT.tab_style()`, as shown below:
 ```{python}
 (
     gt.tab_style(
         style=style.text(color="red"),
-        locations=[
-            loc.body(columns=col, rows=pl.col(col).gt(650))
-            for col in year_cols
-        ],
+        locations=loc.body(mask=cs.numeric().gt(650))
+    ).tab_style(
+        style=style.fill(color="black"),
+        locations=loc.body(mask=pl.nth(-2, -1).is_null()),
     )
 )
 ```
 
+In this example:
 
-### Leveraging the `mask=` parameter in `loc.body()`: Use Polars expressions for streamlined styling
-The most modern approach is to pass a Polars expression to the `mask=` parameter in `loc.body()`, as shown below:
+* `cs.numeric()` targets numerical columns, and `.gt(650)` checks if the cell value is greater than 650.
+* `pl.nth(-2, -1)` targets the last two columns, and `.is_null()` identifies missing values.
+
+Did you notice that we can use Polars selectors and expressions to dynamically identify columns at runtime? This is a killer feature when working with pivoted data, where the column names depend on the values in the data.
+
+The `mask=` parameter acts as syntactic sugar, streamlining the process and removing the need to loop through columns manually.
+
+::: {.callout-warning collapse="false"}
+## Using `mask=` Independently
+`mask=` should not be used in combination with the `columns=` or `rows=` arguments. Attempting to do so will raise a `ValueError`.
+:::
+
+### Utilizing the `locations=` parameter in `GT.tab_style()`
+A more "old-fashioned" approach involves passing a list of `loc.body()` objects to the `locations=` parameter in `GT.tab_style()`:
 ```{python}
+# | eval: false
 (
     gt.tab_style(
         style=style.text(color="red"),
-        locations=loc.body(mask=cs.numeric().gt(650))
+        locations=[loc.body(columns=col, rows=pl.col(col).gt(650))
+                   for col in year_cols],
+    ).tab_style(
+        style=style.fill(color="black"),
+        locations=[loc.body(columns=col, rows=pl.col(col).is_null())
+                   for col in year_cols[-2:]],
     )
 )
 ```
 
-In this example, `loc.body()` is smart enough to automatically target the rows where the cell value exceeds 650 for each numerical column. In general, you can think of `mask=` as a syntactic sugar that Great Tables provides to save you from having to manually loop through the columns.
+This approach, though functional, demands additional effort:
+
+* Explicitly preparing the column names in advance.
+* Specifying the `columns=` and `rows=` arguments for each `loc.body()` in the loop.
+
+Overall, it is more verbose and less efficient than the `mask=` approach.
 
 ### Wrapping up
 We extend our gratitude to [@igorcalabria](https://github.com/igorcalabria) for suggesting this feature in [#389](https://github.com/posit-dev/great-tables/issues/389) and providing an insightful explanation of its utility. A special thanks to [@henryharbeck](https://github.com/henryharbeck) for providing the second approach.
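
To make the `mask=` semantics described in this post concrete, here is a minimal, library-free sketch of what a per-cell boolean mask conceptually selects. It is not how Great Tables or Polars implement it: the helpers `numeric_columns()` and `cell_mask()` and the tiny `table` of horsepower values are hypothetical, made up purely for illustration. The first call plays the role of `cs.numeric().gt(650)`, and the second plays the role of `pl.nth(-2, -1).is_null()`.

```python
# Illustrative sketch only: plain Python, no Great Tables or Polars involved.
# All names and values below are hypothetical.

def numeric_columns(table):
    """Rough stand-in for a numeric column selector: keep columns whose
    non-missing values are all ints or floats."""
    return [
        name
        for name, values in table.items()
        if all(v is None or isinstance(v, (int, float)) for v in values)
    ]


def cell_mask(table, columns, predicate):
    """Return (row_index, column_name) pairs for cells where predicate is True."""
    hits = []
    for col in columns:
        for row_index, value in enumerate(table[col]):
            if predicate(value):
                hits.append((row_index, col))
    return hits


# Made-up data shaped like the pivoted table in the post (None = missing).
table = {
    "drivetrain": ["awd", "rwd", "rwd"],
    "2015.0": [None, 562.0, 661.0],
    "2016.0": [700.0, None, 591.0],
    "2017.0": [647.0, 710.0, None],
}

# Cells that would get red text: numeric columns, value greater than 650.
red_cells = cell_mask(
    table, numeric_columns(table), lambda v: v is not None and v > 650
)

# Cells that would get a black fill: missing values in the last two columns.
black_cells = cell_mask(table, list(table)[-2:], lambda v: v is None)

print(red_cells)
print(black_cells)
```

The point of the sketch is that a mask names individual cells rather than whole rows or columns, which is why a single expression can replace the per-column loop of the `locations=` approach.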