tables_contest_2021_blog_post.Rmd

---
title: |
  [![Ketchbrook Analytics Logo](www/ka_logo_long.jpg "Visit Ketchbrook Analytics"){width=100%}](https://www.ketchbrookanalytics.com)
  <hr>
  Managing Credit Risk with {migrate} + {gt}
subtitle: "An Introduction to Building Beautiful State Transition Matrices"
output: 
  prettydoc::html_pretty:
    theme: architect
    highlight: github
knit: (function(inputFile, encoding) {
  rmarkdown::render(inputFile, output_file = "index.html", encoding = encoding, output_dir = "docs") })
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

First and foremost... here's the final product:

```{r echo=FALSE, warning=FALSE, message=FALSE, fig.alt="Risk Rating Migration Matrix built with {migrate} and {gt}"}
library(htmltools)
htmltools::includeHTML(path = "final_gt.html")
```

## Background

Recently, [Ketchbrook Analytics](https://www.ketchbrookanalytics.com) released *version 0.4.0* of the [**{migrate}** package](https://github.com/mthomas-ketchbrook/migrate/releases/tag/0.4.0). To celebrate this new release, we wanted to showcase **{migrate}**'s functionality as part of our submission to [*RStudio's 2021 Table Contest*](https://blog.rstudio.com/2021/09/30/rstudio-table-contest-2021/).

In this blog, we are going to demonstrate how to use **{migrate}** and the [**{gt}** package](https://gt.rstudio.com/) together to create beautiful *state transition matrices*.

### What is a *State Transition Matrix*?

Great question! A *state transition matrix* is a rectangular ${n}$ x ${n}$ matrix showing the quantity of a variable $x$ for each state ${N}$ at time $t_0$ and later at time $t_1$.

For example, in the *state transition matrix* at the top of this article, 18.45% of the Principal Balance that started in state **AAA** at 2021-06-30 moved to state **AA** at 2021-09-30.

Imagine that you are a bank that issues credit cards to customers. It is important for you to quantify and manage the risk in your credit card portfolio. To do this, every quarter you run a credit report for each customer that tells you if their credit score category is **"Poor"**, **"Fair"**, **"Good"**, **"Very Good"**, or **"Exceptional"**. A state transition matrix can show you the changes in outstanding customer balances across each credit score category from one quarter to the next (i.e., "How much of our outstanding balances went from **Poor** to **Fair** during the period?"), or simply the number of customers who went from one risk rating to another during that time period (i.e., "How many customers went from **Good** to **Poor** during the period?").

The *starting* state is represented on the left-hand vertical axis of the matrix, while the *ending* state is located on the column headers (horizontally). Because of this, in transition matrices where `percent = TRUE` (i.e., calculated on a percentage basis), rows will sum to 100%, but columns will not.

*Note: in this article, we will use the terms "transition" and "migration" interchangably.*

## Getting Started

In addition to {migrate} and {gt}, we will use the {tibble}, {dplyr}, and {scales} packages for our analysis

```{r libraries, warning=FALSE, message=FALSE}
library(migrate)   # easily calculate state transitions  
library(gt)   # make nice static tables
library(tibble)   # manage row / column name conversions
library(dplyr)   # data prep
library(scales)   # format values (e.g., percentages)
```

## Data Preparation

We will be analyzing the built-in data set from the **{migrate}** package

```{r mock_credit_data}
# Load package data
data(mock_credit)

# Check out a small chunk of the data
mock_credit %>% 
  dplyr::arrange(customer_id, date) %>% 
  dplyr::slice(9:18) %>% 
  knitr::kable()
```

Note that the *risk_rating* variable, which we will use to represent the *state*, is (importantly) ordinal

```{r show_ordinal}
unique(mock_credit$risk_rating)
```

Using the `migrate()` function, we can calculate the percentage of the total **principal_balance** for each *starting* risk rating that ended up in each *ending* **risk rating**

```{r mock_credit_migration}
# Calculate the migration
migration <- mock_credit %>% 
  migrate::migrate(
    id = customer_id, 
    time = date, 
    state = risk_rating, 
    metric = principal_balance
  )

# Show the first few rows of 'migration'
migration %>% 
  dplyr::slice(1:5) %>% 
  knitr::kable()
```

Once we have calculated the migration, we can build the rectangular transition matrix using the `build_matrix()` function:

```{r mock_credit_matrix}
# Build the migration matrix
matrix <- migration %>% 
  migrate::build_matrix(
    state_start = risk_rating_start, 
    state_end = risk_rating_end, 
    metric = principal_balance
  ) %>% 
  tibble::as_tibble(rownames = NA)   # keep the row names

# View the matrix
matrix
```

## Building the {gt} Table

Now it's time to let the **{gt}** package work its magic!

```{r mock_credit_gt}
# Build the base table
gt <- matrix %>% 
  gt::gt(
    rownames_to_stub = TRUE
  )

# View the base table
gt
```

We can see that we get a basic, un-formatted {gt} table... we need to spruce it up.

### Decreasing Risk Transitions

First, let's apply conditional formatting to the *good* values in the matrix (i.e., where the risk state went from more risky to less risky)

```{r mock_credit_green_vals}
# Capture the unique percentage *principal_balance* amounts representing risk state 
# decreases (i.e., the *id* got less risky from time 0 to time 1)
green_values <- migration %>% 
  dplyr::filter(risk_rating_start > risk_rating_end) %>% 
  dplyr::pull(principal_balance) %>% 
  unique() 

# Create a color palette function to scale input values from very light green to 
# green
green_pal <- scales::col_numeric(
  palette = c("#f2ffed", "green"), 
  domain = range(min(green_values), max(green_values))
)
```

```{r mock_credit_green_loop}
# Apply the `green_pal()` function to the relevant values in the matrix;
# The nested for loops ensure the formatting applies to cells below & to the 
# left of the matrix diagonal
for (i in 1:(ncol(matrix) - 1)) {
  
  for (j in (i + 1):(nrow(matrix))) {
    
    cur_val <- as.data.frame(matrix)[j, i]
    
    gt <- gt %>%
      gt::tab_style(
        style = gt::cell_fill(
          color = as.character(green_pal(cur_val)),
        ),
        locations = gt::cells_body(
          columns = names(matrix)[i],
          rows = j
        )
      )
    
  }
  
}

# View the updated table
gt
```

### Increasing Risk Transitions

Let's now repeat the process for the *bad* transitions in the matrix

```{r mock_credit_red}
# Capture the unique percentage *principal_balance* amounts representing risk state 
# increases (i.e., the *id* got more risky from time 0 to time 1)
red_values <- migration %>% 
  dplyr::filter(risk_rating_start < risk_rating_end) %>% 
  dplyr::pull(principal_balance) %>% 
  unique() 


red_pal <- scales::col_numeric(
  palette = c("#ffe7e6", "#ff746b"), 
  domain = range(min(red_values), max(red_values))
)

# Apply the `red_pal()` function to the relevant values in the matrix;
# The nested for loops ensure the formatting applies to cells above & to the 
# right of the matrix diagonal
for (i in 2:(ncol(matrix))) {
  
  for (j in (1:(i - 1))) {
    
    cur_val <- as.data.frame(matrix)[j, i]
    
    gt <- gt %>%
      gt::tab_style(
        style = gt::cell_fill(
          color = as.character(red_pal(cur_val)),
        ),
        locations = gt::cells_body(
          columns = names(matrix)[i],
          rows = j
        )
      )
    
  }
  
}

# View the updated table
gt
```

### Formatting Diagonal, Zeros, and Percentages

The many zero values make the table look congested. We can de-clutter the table some by formatting the zero values. First, we can overwrite their background fill color as white

```{r mock_credit_zeros_fill}
# Format all the zero values in the table to be filled white
for (i in 1:ncol(matrix)) {
  
  gt <- gt %>% 
    gt::tab_style(
      style = gt::cell_fill(color = "white"),
      locations = gt::cells_body(
        columns = names(matrix)[i],
        rows = eval(parse(text = paste0(names(matrix)[i], " == 0")))
      )
    )
  
}

# Format all the values on the diagonal to be filled white
for (i in 1:ncol(matrix)) {
    
    gt <- gt %>% 
      gt::tab_style(
        style = gt::cell_fill(color = "white"),
        locations = gt::cells_body(
          columns = names(matrix)[i],
          rows = i
        )
      )
    
  }

# View the updated table
gt
```

Then we can format the text in each cell so that all of the values display as percentages, except for zeros

```{r mock_credit_zeros_replace}
# Create the string to replace zeros with
zero_replace <- paste(
  rep("-", 6), 
  collapse = ""
)

# Format non-zero cell values as percentages, and apply 'zero_replace' to zeros
gt <- gt %>% 
  gt::fmt(
    columns = gt::everything(), 
    fns = function(x) ifelse(x == 0, zero_replace, scales::percent(x, accuracy = 0.01))
  )

# View the updated table
gt
```

## Final Table Formatting

Now that we have the contents of our rows, columns, and cell values in a good state, we can format the title, background colors, add images, etc. The table looks good... but let's make it great!

We want to embed the hex logos for the **{migrate}** and **{gt}** packages as *clickable* links on either side of the title, so we first built the HTML strings that will be used 

```{r hex_imgs}
# Build the HTML to display the {migrate} hex logo to the left of the title
migrate_hex_html <- paste0(
  "<a href=https://github.com/mthomas-ketchbrook/migrate#migrate->", 
  "<img src='https://raw.githubusercontent.com/mthomas-ketchbrook/migrate/master/man/figures/logo.png' ", 
  "style='height:60px; float:left; vertical-align:middle; padding-left: 10px;'>", 
  "</a>"
)

# Build the HTML to display the {gt} hex logo to the right of the title
gt_hex_html <- paste0(
  "<a href=https://gt.rstudio.com/>", 
  "<img src='https://raw.githubusercontent.com/rstudio/gt/master/man/figures/logo.svg' ", 
  "style='height:60px; float:right; vertical-align:middle; padding-right: 10px;'>", 
  "</a>"
)
```

Lastly, we can apply these hex logos and some final formatting touches to the remainder of the table

```{r final_format}
out <- gt %>% 
  gt::tab_spanner(
    label = "ENDING RISK RATING", 
    columns = -1
  ) %>% 
  gt::tab_stubhead(
    label = "STARTING RISK RATING"
  ) %>% 
  gt::tab_header(
    title = gt::md(
      paste(
        migrate_hex_html,
        "**Risk Rating Migration**", 
        gt_hex_html, 
        "<br>*2021-06-30* &#10145;&#65039; *2021-09-30*"
      )
    )
  ) %>% 
  gt::tab_style(
    style = list(
      gt::cell_text(align = "center")
    ),
    locations = gt::cells_stub(rows = TRUE)
  ) %>% 
  gt::tab_options(
    heading.background.color = "#627D9F", 
    stub.background.color = "#343635", 
    column_labels.background.color = "#343635"
  )

# View the final gt
out
```

We did it!  Make sure to save the final product some place safe!

```{r save}
gt::gtsave(
  out,
  filename = "final_gt.html"
)
```

## Analyzing Migration of Connecticut Town Bond Ratings

In the State of Connecticut, each town has its own bond rating, which gets re-assessed periodically by agencies such as [Moody's](https://www.moodys.com/) and [Standard & Poor's](https://www.spglobal.com/). We can see in the [open dataset](https://data.ct.gov/Local-Government/Municipal-Fiscal-Indicators-Bond-Ratings-2019/3w9d-7jbi/data) that most town bond ratings were issued in July 2019 and January 2021. Let's analyze the bond rating migration across these two time-points!

First, since **Moody's** and **Standard & Poor's** have different ratings scales, we will have to filter our data to choose just one agency. For the sake of this analysis we chose **Moody's**.

### Data Gathering

Using the {RSocrata} package that plays nicely with the Socrata API that [data.ct.gov](https://data.ct.gov) was built on, we can easily query this dataset from R

```{r socrata}
library(RSocrata)   # for communicating with the Socrata API

# Define the columns we want to return from the dataset
query_cols <- paste(
  "town", 
  "rating_date", 
  "bond_rating", 
  sep = ", "
)

# Send the query to the API and return the data frame
df <- RSocrata::read.socrata(
  url = glue::glue(
    "https://data.ct.gov/resource/3w9d-7jbi.csv?", 
    "$where=rating_agency like '%Moody%'&",   # filter for only Moody's Ratings
    "$select={query_cols}"   # select only desired columns
  )
)

# View the first few rows of data
head(df) %>% 
  knitr::kable()
```

Next, we need to establish the factor levels and order for the bond ratings. We can get this information from [Moody's *Rating Scale and Definitions* fact sheet](https://www.moodys.com/sites/products/productattachments/ap075378_1_1408_ki.pdf)

```{r moodys_ratings}
# Establish the Moody's ratings, in order from best (least risky) to worst
moodys_ratings <- c(
  "Aaa", "Aa1", "Aa2", "Aa3", "A1", "A2", "A3", 
  "Baa1", "Baa2", "Baa3", "Ba1", "Ba2", "Ba3", "B1", "B2", "B3", 
  "Caa1", "Caa2", "Caa3", "Ca", "C"
)
```

### Data Prep

To clean up our data before building our migration matrix, we need to do a few things:

1. Convert **rating_date** column variable to type *'Date'*
2. Convert **bond_rating** column variable to type *'Ordered Factor'*
2. Remove any duplicate **town**/**rating_date** combinations, in case any town received two Moody's ratings on the same day (we would have to choose one)

```{r prep_ct}
# Perform some data prep steps
clean_df <- df %>% 
  dplyr::mutate(
    rating_date = as.Date(rating_date), 
    bond_rating = factor(bond_rating, levels = moodys_ratings, ordered = TRUE)
  ) %>% 
  dplyr::arrange(town, rating_date) %>% 
  dplyr::distinct(town, rating_date, .keep_all = TRUE)
```

Now we are ready to employ **{migrate}** to build our migration matrix. Instead of using `percent = TRUE`, we will set that argument to `FALSE` so that the matrix displays the number (count) of towns that moved from each bond rating at 2019-07-01 to each bond rating at 2021-01-01

### Building the Base Migration Matrix Table

```{r mig_matrix_ct}
# Calculate the migration (Note: when we don't provide a `metric` argument, 
# `migrate()` uses the count of the *id* column values)
mig <- clean_df %>% 
  migrate::migrate(
    id = town, 
    time = rating_date, 
    state = bond_rating, 
    percent = FALSE
  )

# Build the migration matrix
mat <- mig %>% 
  migrate::build_matrix(
    state_start = bond_rating_start, 
    state_end = bond_rating_end, 
    metric = count
  ) %>% 
  tibble::as_tibble(rownames = NA) 

# Build the base table
gt <- mat %>% 
  gt::gt(
    rownames_to_stub = TRUE
  )

# View the base table
gt
```

We can see that we have quite a few more risk states (bond ratings) than we did in the previous `mock_credit` dataset.

Instead of re-executing the for-loops for applying the conditional formatting in our in-line code, we developed a nice custom function containing all of that logic, [`fmt_migrate()`](https://github.com/mthomas-ketchbrook/migrate_table/blob/master/R/fmt_migrate.R)

### Conditional Formatting with a Custom Function

```{r import_fun}
# Import custom function
source("R/fmt_migrate.R")

# Apply the conditional formatting to the base table
gt <- gt %>% 
  fmt_migrate(
    migrated_data = mig, 
    matrix_data = mat
  )

# Vew the table
gt
```

Voila! This table is less exciting than the previous table we built, because there were only two towns whose Moody's bond rating changed from 2019 to 2021 -- all other towns' ratings stayed the same.

### Adding Final Touches

Still, we can apply some final formatting touches similar to our previous table by replacing zero values, adding the Connecticut State Flag image to the title, and more

```{r final_formatting_ct}

# Build the string to replace zeros with
zero_replace <- paste(
  rep("-", 3), 
  collapse = ""
)

# Replace zeros
gt <- gt %>% 
  gt::fmt(
    columns = gt::everything(), 
    fns = function(x) ifelse(x == 0, zero_replace, x)
  )

# Create the HTML for the CT State Flag image with an embedded hyperlink to the 
# bond ratings data set
ct_flag_html <- paste0(
  "<a href=https://data.ct.gov/Local-Government/Municipal-Fiscal-Indicators-Bond-Ratings-2019/3w9d-7jbi/data>", 
  "<img src='https://upload.wikimedia.org/wikipedia/commons/thumb/9/96/Flag_of_Connecticut.svg/1200px-Flag_of_Connecticut.svg.png' ", 
  "style='height:60px; float:left; vertical-align:middle; padding-left: 10px;'>", 
  "</a>"
)

# Add some final touches
gt <- gt %>% 
  gt::tab_spanner(
    label = "ENDING BOND RATING", 
    columns = -1
  ) %>% 
  gt::tab_stubhead(
    label = "STARTING BOND RATING"
  ) %>% 
  gt::tab_header(
    title = gt::md(
      paste(
        ct_flag_html,
        "**Bond Rating Migration**", 
        "<br>*2020-01-01* &#10145;&#65039; *2021-07-01*"
      )
    )
  ) %>% 
  gt::tab_style(
    style = list(
      gt::cell_text(align = "center")
    ),
    locations = gt::cells_stub(rows = TRUE)
  ) %>% 
  gt::fmt_missing(
    columns = gt::everything(), 
    missing_text = 0
  ) %>% 
  gt::cols_width(
    eval(parse(text = paste0("c(", paste(colnames(mat), collapse = ", "), ") ~ gt::px(60)")))
  ) %>% 
  gt::tab_options(
    heading.background.color = "#FFD40A", 
    stub.background.color = "#042984", 
    column_labels.background.color = "#042984", 
    container.width = gt::px(1000)
  )

# View the final table
gt
```

From this table, we can quickly glean a few insights:

* Most towns' bond ratings stayed the same from July 2019 to January 2021, with rating **Aa2** being the most common rating (30 towns began with this rating, and all 30 of those ended with the same rating)
* One town's bond rating worsened from **Baa2** to **Baa3**, which is a one-step rating decline
* One town's bond rating improved from **B1** to **Ba3**, which is a one-step rating improvement

*** 

Questions about anything we did?  Don't hesitate to reach out!
[info@ketchbrookanalytics.com](mailto:info@ketchbrookanalytics.com?subject=RStudio%202021%20Table%20Contest)