-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathtables_contest_2021_blog_post.Rmd
551 lines (433 loc) · 17.7 KB
/
tables_contest_2021_blog_post.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
---
title: |
[![Ketchbrook Analytics Logo](www/ka_logo_long.jpg "Visit Ketchbrook Analytics"){width=100%}](https://www.ketchbrookanalytics.com)
<hr>
Managing Credit Risk with {migrate} + {gt}
subtitle: "An Introduction to Building Beautiful State Transition Matrices"
output:
prettydoc::html_pretty:
theme: architect
highlight: github
knit: (function(inputFile, encoding) {
rmarkdown::render(inputFile, output_file = "index.html", encoding = encoding, output_dir = "docs") })
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
First and foremost... here's the final product:
```{r echo=FALSE, warning=FALSE, message=FALSE, fig.alt="Risk Rating Migration Matrix built with {migrate} and {gt}"}
library(htmltools)
htmltools::includeHTML(path = "final_gt.html")
```
## Background
Recently, [Ketchbrook Analytics](https://www.ketchbrookanalytics.com) released *version 0.4.0* of the [**{migrate}** package](https://github.com/mthomas-ketchbrook/migrate/releases/tag/0.4.0). To celebrate this new release, we wanted to showcase **{migrate}**'s functionality as part of our submission to [*RStudio's 2021 Table Contest*](https://blog.rstudio.com/2021/09/30/rstudio-table-contest-2021/).
In this blog, we are going to demonstrate how to use **{migrate}** and the [**{gt}** package](https://gt.rstudio.com/) together to create beautiful *state transition matrices*.
### What is a *State Transition Matrix*?
Great question! A *state transition matrix* is a rectangular ${n}$ x ${n}$ matrix showing the quantity of a variable $x$ for each state ${N}$ at time $t_0$ and later at time $t_1$.
For example, in the *state transition matrix* at the top of this article, 18.45% of the Principal Balance that started in state **AAA** at 2021-06-30 moved to state **AA** at 2021-09-30.
Imagine that you are a bank that issues credit cards to customers. It is important for you to quantify and manage the risk in your credit card portfolio. To do this, every quarter you run a credit report for each customer that tells you if their credit score category is **"Poor"**, **"Fair"**, **"Good"**, **"Very Good"**, or **"Exceptional"**. A state transition matrix can show you the changes in outstanding customer balances across each credit score category from one quarter to the next (i.e., "How much of our outstanding balances went from **Poor** to **Fair** during the period?"), or simply the number of customers who went from one risk rating to another during that time period (i.e., "How many customers went from **Good** to **Poor** during the period?").
The *starting* state is represented on the left-hand vertical axis of the matrix, while the *ending* state is located on the column headers (horizontally). Because of this, in transition matrices where `percent = TRUE` (i.e., calculated on a percentage basis), rows will sum to 100%, but columns will not.
*Note: in this article, we will use the terms "transition" and "migration" interchangably.*
## Getting Started
In addition to {migrate} and {gt}, we will use the {tibble}, {dplyr}, and {scales} packages for our analysis
```{r libraries, warning=FALSE, message=FALSE}
library(migrate) # easily calculate state transitions
library(gt) # make nice static tables
library(tibble) # manage row / column name conversions
library(dplyr) # data prep
library(scales) # format values (e.g., percentages)
```
## Data Preparation
We will be analyzing the built-in data set from the **{migrate}** package
```{r mock_credit_data}
# Load package data
data(mock_credit)
# Check out a small chunk of the data
mock_credit %>%
dplyr::arrange(customer_id, date) %>%
dplyr::slice(9:18) %>%
knitr::kable()
```
Note that the *risk_rating* variable, which we will use to represent the *state*, is (importantly) ordinal
```{r show_ordinal}
unique(mock_credit$risk_rating)
```
Using the `migrate()` function, we can calculate the percentage of the total **principal_balance** for each *starting* risk rating that ended up in each *ending* **risk rating**
```{r mock_credit_migration}
# Calculate the migration
migration <- mock_credit %>%
migrate::migrate(
id = customer_id,
time = date,
state = risk_rating,
metric = principal_balance
)
# Show the first few rows of 'migration'
migration %>%
dplyr::slice(1:5) %>%
knitr::kable()
```
Once we have calculated the migration, we can build the rectangular transition matrix using the `build_matrix()` function:
```{r mock_credit_matrix}
# Build the migration matrix
matrix <- migration %>%
migrate::build_matrix(
state_start = risk_rating_start,
state_end = risk_rating_end,
metric = principal_balance
) %>%
tibble::as_tibble(rownames = NA) # keep the row names
# View the matrix
matrix
```
## Building the {gt} Table
Now it's time to let the **{gt}** package work its magic!
```{r mock_credit_gt}
# Build the base table
gt <- matrix %>%
gt::gt(
rownames_to_stub = TRUE
)
# View the base table
gt
```
We can see that we get a basic, un-formatted {gt} table... we need to spruce it up.
### Decreasing Risk Transitions
First, let's apply conditional formatting to the *good* values in the matrix (i.e., where the risk state went from more risky to less risky)
```{r mock_credit_green_vals}
# Capture the unique percentage *principal_balance* amounts representing risk state
# decreases (i.e., the *id* got less risky from time 0 to time 1)
green_values <- migration %>%
dplyr::filter(risk_rating_start > risk_rating_end) %>%
dplyr::pull(principal_balance) %>%
unique()
# Create a color palette function to scale input values from very light green to
# green
green_pal <- scales::col_numeric(
palette = c("#f2ffed", "green"),
domain = range(min(green_values), max(green_values))
)
```
```{r mock_credit_green_loop}
# Apply the `green_pal()` function to the relevant values in the matrix;
# The nested for loops ensure the formatting applies to cells below & to the
# left of the matrix diagonal
for (i in 1:(ncol(matrix) - 1)) {
for (j in (i + 1):(nrow(matrix))) {
cur_val <- as.data.frame(matrix)[j, i]
gt <- gt %>%
gt::tab_style(
style = gt::cell_fill(
color = as.character(green_pal(cur_val)),
),
locations = gt::cells_body(
columns = names(matrix)[i],
rows = j
)
)
}
}
# View the updated table
gt
```
### Increasing Risk Transitions
Let's now repeat the process for the *bad* transitions in the matrix
```{r mock_credit_red}
# Capture the unique percentage *principal_balance* amounts representing risk state
# increases (i.e., the *id* got more risky from time 0 to time 1)
red_values <- migration %>%
dplyr::filter(risk_rating_start < risk_rating_end) %>%
dplyr::pull(principal_balance) %>%
unique()
red_pal <- scales::col_numeric(
palette = c("#ffe7e6", "#ff746b"),
domain = range(min(red_values), max(red_values))
)
# Apply the `red_pal()` function to the relevant values in the matrix;
# The nested for loops ensure the formatting applies to cells above & to the
# right of the matrix diagonal
for (i in 2:(ncol(matrix))) {
for (j in (1:(i - 1))) {
cur_val <- as.data.frame(matrix)[j, i]
gt <- gt %>%
gt::tab_style(
style = gt::cell_fill(
color = as.character(red_pal(cur_val)),
),
locations = gt::cells_body(
columns = names(matrix)[i],
rows = j
)
)
}
}
# View the updated table
gt
```
### Formatting Diagonal, Zeros, and Percentages
The many zero values make the table look congested. We can de-clutter the table some by formatting the zero values. First, we can overwrite their background fill color as white
```{r mock_credit_zeros_fill}
# Format all the zero values in the table to be filled white
for (i in 1:ncol(matrix)) {
gt <- gt %>%
gt::tab_style(
style = gt::cell_fill(color = "white"),
locations = gt::cells_body(
columns = names(matrix)[i],
rows = eval(parse(text = paste0(names(matrix)[i], " == 0")))
)
)
}
# Format all the values on the diagonal to be filled white
for (i in 1:ncol(matrix)) {
gt <- gt %>%
gt::tab_style(
style = gt::cell_fill(color = "white"),
locations = gt::cells_body(
columns = names(matrix)[i],
rows = i
)
)
}
# View the updated table
gt
```
Then we can format the text in each cell so that all of the values display as percentages, except for zeros
```{r mock_credit_zeros_replace}
# Create the string to replace zeros with
zero_replace <- paste(
rep("-", 6),
collapse = ""
)
# Format non-zero cell values as percentages, and apply 'zero_replace' to zeros
gt <- gt %>%
gt::fmt(
columns = gt::everything(),
fns = function(x) ifelse(x == 0, zero_replace, scales::percent(x, accuracy = 0.01))
)
# View the updated table
gt
```
## Final Table Formatting
Now that we have the contents of our rows, columns, and cell values in a good state, we can format the title, background colors, add images, etc. The table looks good... but let's make it great!
We want to embed the hex logos for the **{migrate}** and **{gt}** packages as *clickable* links on either side of the title, so we first built the HTML strings that will be used
```{r hex_imgs}
# Build the HTML to display the {migrate} hex logo to the left of the title
migrate_hex_html <- paste0(
"<a href=https://github.com/mthomas-ketchbrook/migrate#migrate->",
"<img src='https://raw.githubusercontent.com/mthomas-ketchbrook/migrate/master/man/figures/logo.png' ",
"style='height:60px; float:left; vertical-align:middle; padding-left: 10px;'>",
"</a>"
)
# Build the HTML to display the {gt} hex logo to the right of the title
gt_hex_html <- paste0(
"<a href=https://gt.rstudio.com/>",
"<img src='https://raw.githubusercontent.com/rstudio/gt/master/man/figures/logo.svg' ",
"style='height:60px; float:right; vertical-align:middle; padding-right: 10px;'>",
"</a>"
)
```
Lastly, we can apply these hex logos and some final formatting touches to the remainder of the table
```{r final_format}
out <- gt %>%
gt::tab_spanner(
label = "ENDING RISK RATING",
columns = -1
) %>%
gt::tab_stubhead(
label = "STARTING RISK RATING"
) %>%
gt::tab_header(
title = gt::md(
paste(
migrate_hex_html,
"**Risk Rating Migration**",
gt_hex_html,
"<br>*2021-06-30* ➡️ *2021-09-30*"
)
)
) %>%
gt::tab_style(
style = list(
gt::cell_text(align = "center")
),
locations = gt::cells_stub(rows = TRUE)
) %>%
gt::tab_options(
heading.background.color = "#627D9F",
stub.background.color = "#343635",
column_labels.background.color = "#343635"
)
# View the final gt
out
```
We did it! Make sure to save the final product some place safe!
```{r save}
gt::gtsave(
out,
filename = "final_gt.html"
)
```
## Analyzing Migration of Connecticut Town Bond Ratings
In the State of Connecticut, each town has its own bond rating, which gets re-assessed periodically by agencies such as [Moody's](https://www.moodys.com/) and [Standard & Poor's](https://www.spglobal.com/). We can see in the [open dataset](https://data.ct.gov/Local-Government/Municipal-Fiscal-Indicators-Bond-Ratings-2019/3w9d-7jbi/data) that most town bond ratings were issued in July 2019 and January 2021. Let's analyze the bond rating migration across these two time-points!
First, since **Moody's** and **Standard & Poor's** have different ratings scales, we will have to filter our data to choose just one agency. For the sake of this analysis we chose **Moody's**.
### Data Gathering
Using the {RSocrata} package that plays nicely with the Socrata API that [data.ct.gov](https://data.ct.gov) was built on, we can easily query this dataset from R
```{r socrata}
library(RSocrata) # for communicating with the Socrata API
# Define the columns we want to return from the dataset
query_cols <- paste(
"town",
"rating_date",
"bond_rating",
sep = ", "
)
# Send the query to the API and return the data frame
df <- RSocrata::read.socrata(
url = glue::glue(
"https://data.ct.gov/resource/3w9d-7jbi.csv?",
"$where=rating_agency like '%Moody%'&", # filter for only Moody's Ratings
"$select={query_cols}" # select only desired columns
)
)
# View the first few rows of data
head(df) %>%
knitr::kable()
```
Next, we need to establish the factor levels and order for the bond ratings. We can get this information from [Moody's *Rating Scale and Definitions* fact sheet](https://www.moodys.com/sites/products/productattachments/ap075378_1_1408_ki.pdf)
```{r moodys_ratings}
# Establish the Moody's ratings, in order from best (least risky) to worst
moodys_ratings <- c(
"Aaa", "Aa1", "Aa2", "Aa3", "A1", "A2", "A3",
"Baa1", "Baa2", "Baa3", "Ba1", "Ba2", "Ba3", "B1", "B2", "B3",
"Caa1", "Caa2", "Caa3", "Ca", "C"
)
```
### Data Prep
To clean up our data before building our migration matrix, we need to do a few things:
1. Convert **rating_date** column variable to type *'Date'*
2. Convert **bond_rating** column variable to type *'Ordered Factor'*
2. Remove any duplicate **town**/**rating_date** combinations, in case any town received two Moody's ratings on the same day (we would have to choose one)
```{r prep_ct}
# Perform some data prep steps
clean_df <- df %>%
dplyr::mutate(
rating_date = as.Date(rating_date),
bond_rating = factor(bond_rating, levels = moodys_ratings, ordered = TRUE)
) %>%
dplyr::arrange(town, rating_date) %>%
dplyr::distinct(town, rating_date, .keep_all = TRUE)
```
Now we are ready to employ **{migrate}** to build our migration matrix. Instead of using `percent = TRUE`, we will set that argument to `FALSE` so that the matrix displays the number (count) of towns that moved from each bond rating at 2019-07-01 to each bond rating at 2021-01-01
### Building the Base Migration Matrix Table
```{r mig_matrix_ct}
# Calculate the migration (Note: when we don't provide a `metric` argument,
# `migrate()` uses the count of the *id* column values)
mig <- clean_df %>%
migrate::migrate(
id = town,
time = rating_date,
state = bond_rating,
percent = FALSE
)
# Build the migration matrix
mat <- mig %>%
migrate::build_matrix(
state_start = bond_rating_start,
state_end = bond_rating_end,
metric = count
) %>%
tibble::as_tibble(rownames = NA)
# Build the base table
gt <- mat %>%
gt::gt(
rownames_to_stub = TRUE
)
# View the base table
gt
```
We can see that we have quite a few more risk states (bond ratings) than we did in the previous `mock_credit` dataset.
Instead of re-executing the for-loops for applying the conditional formatting in our in-line code, we developed a nice custom function containing all of that logic, [`fmt_migrate()`](https://github.com/mthomas-ketchbrook/migrate_table/blob/master/R/fmt_migrate.R)
### Conditional Formatting with a Custom Function
```{r import_fun}
# Import custom function
source("R/fmt_migrate.R")
# Apply the conditional formatting to the base table
gt <- gt %>%
fmt_migrate(
migrated_data = mig,
matrix_data = mat
)
# Vew the table
gt
```
Voila! This table is less exciting than the previous table we built, because there were only two towns whose Moody's bond rating changed from 2019 to 2021 -- all other towns' ratings stayed the same.
### Adding Final Touches
Still, we can apply some final formatting touches similar to our previous table by replacing zero values, adding the Connecticut State Flag image to the title, and more
```{r final_formatting_ct}
# Build the string to replace zeros with
zero_replace <- paste(
rep("-", 3),
collapse = ""
)
# Replace zeros
gt <- gt %>%
gt::fmt(
columns = gt::everything(),
fns = function(x) ifelse(x == 0, zero_replace, x)
)
# Create the HTML for the CT State Flag image with an embedded hyperlink to the
# bond ratings data set
ct_flag_html <- paste0(
"<a href=https://data.ct.gov/Local-Government/Municipal-Fiscal-Indicators-Bond-Ratings-2019/3w9d-7jbi/data>",
"<img src='https://upload.wikimedia.org/wikipedia/commons/thumb/9/96/Flag_of_Connecticut.svg/1200px-Flag_of_Connecticut.svg.png' ",
"style='height:60px; float:left; vertical-align:middle; padding-left: 10px;'>",
"</a>"
)
# Add some final touches
gt <- gt %>%
gt::tab_spanner(
label = "ENDING BOND RATING",
columns = -1
) %>%
gt::tab_stubhead(
label = "STARTING BOND RATING"
) %>%
gt::tab_header(
title = gt::md(
paste(
ct_flag_html,
"**Bond Rating Migration**",
"<br>*2020-01-01* ➡️ *2021-07-01*"
)
)
) %>%
gt::tab_style(
style = list(
gt::cell_text(align = "center")
),
locations = gt::cells_stub(rows = TRUE)
) %>%
gt::fmt_missing(
columns = gt::everything(),
missing_text = 0
) %>%
gt::cols_width(
eval(parse(text = paste0("c(", paste(colnames(mat), collapse = ", "), ") ~ gt::px(60)")))
) %>%
gt::tab_options(
heading.background.color = "#FFD40A",
stub.background.color = "#042984",
column_labels.background.color = "#042984",
container.width = gt::px(1000)
)
# View the final table
gt
```
From this table, we can quickly glean a few insights:
* Most towns' bond ratings stayed the same from July 2019 to January 2021, with rating **Aa2** being the most common rating (30 towns began with this rating, and all 30 of those ended with the same rating)
* One town's bond rating worsened from **Baa2** to **Baa3**, which is a one-step rating decline
* One town's bond rating improved from **B1** to **Ba3**, which is a one-step rating improvement
***
Questions about anything we did? Don't hesitate to reach out!
[[email protected]](mailto:[email protected]?subject=RStudio%202021%20Table%20Contest)