maskr introduces the masked
class that suppresses or “masks” printing
of certain elements of an atomic vector while keeping the underlying
data available for computation. Masked vectors can include numeric,
logical character or factor data.
maskr can be installed from CRAN using:
install.packages('maskr')
You can also install the development version of maskr from GitHub:
devtools::install_github('inpowell/maskr')
library(maskr)
Masked vectors can be built from atomic vectors using masked()
:
x <- 0:8
masked(x, 0 < x & x < 5)
#> <integer+masked[9]>
#> 0 n.p. n.p. n.p. n.p. 5 6 7 8
Here, the masked cells are indicated by “n.p.” (not published) by
default. We can change how they are presented using the
maskr.replacement
option:
options(maskr.replacement = '*')
masked(x, 0 < x & x < 5)
#> <integer+masked[9]>
#> 0 * * * * 5 6 7 8
options(maskr.replacement = NULL)
Other types of atomic vectors can be masked as well:
masked(letters, letters %in% c('a', 'e', 'i', 'o', 'u'))
#> <character+masked[26]>
#> n.p. b c d n.p. f g h n.p. j k l m n n.p. p
#> q r s t n.p. v w x y z
We can also use this to control which data gets displayed in data frames and cross-tables.
tabular <- tibble::tibble(
Activity = gl(4, 4, 16, labels = c("I", "II", "III", "Total")),
Region = gl(4, 1, 16, labels = c("A", "B", "C", "Total")),
Count = as.integer(c(
10, 25, 5, 40,
16, 13, 11, 40,
17, 20, 24, 61,
43, 58, 40, 141
))
)
suppress <- rep(FALSE, 16L)
suppress[c(5, 8, 9, 12)] <- TRUE
tabular$Count <- masked(tabular$Count, suppress)
tabular
#> # A tibble: 16 × 3
#> Activity Region Count
#> <fct> <fct> <int+msk>
#> 1 I A 10
#> 2 I B 25
#> 3 I C 5
#> 4 I Total 40
#> 5 II A n.p.
#> 6 II B 13
#> 7 II C 11
#> 8 II Total n.p.
#> 9 III A n.p.
#> 10 III B 20
#> 11 III C 24
#> 12 III Total n.p.
#> 13 Total A 43
#> 14 Total B 58
#> 15 Total C 40
#> 16 Total Total 141
This works with tidyverse reshaping functions like pivot_wider()
:
tidyr::pivot_wider(tabular, names_from = 'Region', values_from = 'Count')
#> # A tibble: 4 × 5
#> Activity A B C Total
#> <fct> <int+msk> <int+msk> <int+msk> <int+msk>
#> 1 I 10 25 5 40
#> 2 II n.p. 13 11 n.p.
#> 3 III n.p. 20 24 n.p.
#> 4 Total 43 58 40 141
Masked vectors support basic arithmetic, so for example we can find percentages while maintaining the correct masking pattern.
tabular |>
dplyr::group_by(Activity) |>
dplyr::mutate(Percent = 100 * Count / Count[Region == 'Total'])
#> # A tibble: 16 × 4
#> # Groups: Activity [4]
#> Activity Region Count Percent
#> <fct> <fct> <int+msk> <dbl+msk>
#> 1 I A 10 25
#> 2 I B 25 62.5
#> 3 I C 5 12.5
#> 4 I Total 40 100
#> 5 II A n.p. n.p.
#> 6 II B 13 n.p.
#> 7 II C 11 n.p.
#> 8 II Total n.p. n.p.
#> 9 III A n.p. n.p.
#> 10 III B 20 n.p.
#> 11 III C 24 n.p.
#> 12 III Total n.p. n.p.
#> 13 Total A 43 30.5
#> 14 Total B 58 41.1
#> 15 Total C 40 28.4
#> 16 Total Total 141 100
Notice that where we have divided by a masked cell, the percentage is also masked.
Using masked vectors, as opposed to just replacing values we want to suppress with missing values, means we can always recover our data before we publish it:
tabular$Count <- unmask(tabular$Count)
tabular
#> # A tibble: 16 × 3
#> Activity Region Count
#> <fct> <fct> <int>
#> 1 I A 10
#> 2 I B 25
#> 3 I C 5
#> 4 I Total 40
#> 5 II A 16
#> 6 II B 13
#> 7 II C 11
#> 8 II Total 40
#> 9 III A 17
#> 10 III B 20
#> 11 III C 24
#> 12 III Total 61
#> 13 Total A 43
#> 14 Total B 58
#> 15 Total C 40
#> 16 Total Total 141