diff --git a/docs/blog/locbody-mask/index.qmd b/docs/blog/locbody-mask/index.qmd
index 4526c4ddd..d141c5c48 100644
--- a/docs/blog/locbody-mask/index.qmd
+++ b/docs/blog/locbody-mask/index.qmd
@@ -2,7 +2,7 @@
 title: "Style Table Body with `mask=` in `loc.body()`"
 html-table-processing: none
 author: Rich Iannone, Michael Chow and Jerry Wu
-date: 2025-01-23
+date: 2025-01-24
 freeze: true
 jupyter: python3
 format:
@@ -10,11 +10,10 @@ format:
     code-summary: "Show the Code"
 ---
 
-In Great Tables `0.16.0`, we introduced the `mask=` parameter in `loc.body()`, enabling users to apply conditional formatting to rows on a per-column basis more efficiently when working with a Polars DataFrame. This post demonstrates three approaches to styling the table body, so you can compare methods and choose the one that best fits your needs:
+In Great Tables `0.16.0`, we introduced the `mask=` parameter in `loc.body()`, enabling users to apply conditional formatting to rows on a per-column basis more efficiently when working with a Polars DataFrame. This post will demonstrate how it works and compare it with the "old-fashioned" approach:
 
-* **Using a for-loop:** Repeatedly call `GT.tab_style()` for each column.
-* **Utilizing the `locations=` parameter in `GT.tab_style()`:** Pass a list of `loc.body()` objects.
 * **Leveraging the `mask=` parameter in `loc.body()`:** Use Polars expressions for streamlined styling.
+* **Utilizing the `locations=` parameter in `GT.tab_style()`:** Pass a list of `loc.body()` objects.
 
 Let’s dive in.
 
@@ -36,55 +35,69 @@ df_mini = (
     .select(["mfr", "drivetrain", *year_cols])
 )
 
-gt = GT(df_mini, rowname_col="drivetrain", groupname_col="mfr")
+gt = GT(df_mini).tab_stub(rowname_col="drivetrain", groupname_col="mfr").opt_stylize(color="cyan")
 gt
 ```
 
 The numbers in the cells represent the average horsepower for each combination of `mfr` and `drivetrain` for a specific year.
 
-In the following section, we'll demonstrate three different ways to highlight the cell text in red if the average horsepower exceeds 650.
+### Leveraging the `mask=` parameter in `loc.body()`
+The `mask=` parameter in `loc.body()` accepts a Polars expression that evaluates to a boolean result for each cell.
 
-### Using a for-loop: Repeatedly call `GT.tab_style()` for each column
-The most intuitive way is to call `GT.tab_style()` for each column. Here's how:
-```{python}
-gt1 = gt # <1>
-for col in year_cols:
-    gt1 = gt1.tab_style(
-        style=style.text(color="red"),
-        locations=loc.body(columns=col, rows=pl.col(col).gt(650))
-    )
-gt1
-```
-1. Since we want to keep `gt` intact for later use, we will modify `gt1` in this approach instead.
+Here’s how we can use it to achieve the two goals:
+* Highlight the cell text in red if the column datatype is numerical and the cell value exceeds 650.
+* Fill the background color as black if the cell value is missing in the last two columns (`2016.0` and `2017.0`).
 
-### Utilizing the `locations=` parameter in `GT.tab_style()`: Pass a list of `loc.body()` objects
-A more concise method is to pass a list of `loc.body()` objects to the `locations=` parameter in `GT.tab_style()`, as shown below:
 ```{python}
 (
     gt.tab_style(
         style=style.text(color="red"),
-        locations=[
-            loc.body(columns=col, rows=pl.col(col).gt(650))
-            for col in year_cols
-        ],
+        locations=loc.body(mask=cs.numeric().gt(650))
+    ).tab_style(
+        style=style.fill(color="black"),
+        locations=loc.body(mask=pl.nth(-2, -1).is_null()),
     )
 )
 ```
+In this example:
 
-### Leveraging the `mask=` parameter in `loc.body()`: Use Polars expressions for streamlined styling
-The most modern approach is to pass a Polars expression to the `mask=` parameter in `loc.body()`, as shown below:
+* `cs.numeric()` targets numerical columns, and `.gt(650)` checks if the cell value is greater than 650.
+* `pl.nth(-2, -1)` targets the last two columns, and `.is_null()` identifies missing values.
+
+Did you notice that we can use Polars selectors and expressions to dynamically identify columns at runtime? This is definitely a killer feature when working with pivoted operations.
+
+The `mask=` parameter acts as a syntactic sugar, streamlining the process and removing the need to loop through columns manually.
+
+::: {.callout-warning collapse="false"}
+## Using `mask=` Independently
+`mask=` should not be used in combination with the `columns` or `rows` arguments. Attempting to do so will raise a `ValueError`.
+:::
+
+### Utilizing the `locations=` parameter in `GT.tab_style()`
+A more "old-fashioned" approach involves passing a list of `loc.body()` objects to the `locations=` parameter in `GT.tab_style()`:
 ```{python}
+# | eval: false
 (
     gt.tab_style(
         style=style.text(color="red"),
-        locations=loc.body(mask=cs.numeric().gt(650))
+        locations=[loc.body(columns=col, rows=pl.col(col).gt(650))
+                   for col in year_cols],
+    ).tab_style(
+        style=style.fill(color="black"),
+        locations=[loc.body(columns=col, rows=pl.col(col).is_null())
+                   for col in year_cols[-2:]],
     )
 )
 ```
 
-In this example, `loc.body()` is smart enough to automatically target the rows where the cell value exceeds 650 for each numerical column. In general, you can think of `mask=` as a syntactic sugar that Great Tables provides to save you from having to manually loop through the columns.
+This approach, though functional, demands additional effort:
+
+* Explicitly preparing the column names in advance.
+* Specifying the `columns=` and `rows=` arguments for each `loc.body()` in the loop.
+
+While effective, it is less efficient and more verbose compared to the first approach.
 
 ### Wrapping up
 We extend our gratitude to [@igorcalabria](https://github.com/igorcalabria) for suggesting this feature in [#389](https://github.com/posit-dev/great-tables/issues/389) and providing an insightful explanation of its utility.
 
+A special thanks to [@henryharbeck](https://github.com/henryharbeck) for providing the second approach.