Commit

rename *_missing_reasons to *_channels
khusmann committed Mar 2, 2024
1 parent f59fcec commit 472bc40
Showing 9 changed files with 44 additions and 42 deletions.
4 changes: 2 additions & 2 deletions NAMESPACE
Original file line number Diff line number Diff line change
@@ -3,7 +3,7 @@
S3method(tbl_format_footer,deinterlaced_df)
S3method(tbl_format_header,deinterlaced_df)
S3method(tbl_format_setup,deinterlaced_df)
export(coalesce_missing_reasons)
export(coalesce_channels)
export(deinterlace_type_convert)
export(drop_missing_cols)
export(drop_value_cols)
@@ -16,7 +16,7 @@ export(icol_integer)
export(icol_logical)
export(icol_number)
export(icol_time)
export(interlace_missing_reasons)
export(interlace_channels)
export(interlacer_example)
export(missing_cols)
export(missing_names)
8 changes: 4 additions & 4 deletions R/coalesce_missing_reasons.R → R/coalesce_channels.R
@@ -4,7 +4,7 @@
#'
#' Mutations of deinterlaced data frames can result in variables that either
#' have both values and missing reasons, or no values and no missing reasons.
#' `coalesce_missing_reasons()` takes care of both situations. In the case where
#' `coalesce_channels()` takes care of both situations. In the case where
#' there is both a value and missing reason, it will choose which to keep based
#' on the `keep` parameter. In the case where no value or missing reason exists, it
#' will fill the missing reason with the `default_reason` parameter.
@@ -25,10 +25,10 @@
#' @return A deinterlaced tibble.
#'
#' @export
coalesce_missing_reasons <- function(
coalesce_channels <- function(
x,
keep = c("values", "missing"),
default_reason = getOption("default_missing_reason")
default_reason = getOption("default_missing_reason"),
keep = c("values", "missing")
) {
default_reason <- factor(default_reason %||% "UNKNOWN_REASON")
keep <- match.arg(keep)
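
Because this hunk swaps the order of `keep` and `default_reason`, a caller that passes the second argument positionally would bind it to a different parameter after the change. The following is a minimal sketch of that effect using a hypothetical stub, `coalesce_channels_stub()`, which only mirrors the new argument order and is not the package's implementation:

```r
# Hypothetical stub: same argument order as the new signature in this commit.
# The body only reports how the arguments were matched.
coalesce_channels_stub <- function(x,
                                   default_reason = "UNKNOWN_REASON",
                                   keep = c("values", "missing")) {
  keep <- match.arg(keep)
  list(default_reason = default_reason, keep = keep)
}

# Named arguments are unaffected by the reordering.
named <- coalesce_channels_stub(NULL, keep = "missing")

# A positional second argument now binds to `default_reason`, not `keep`.
positional <- coalesce_channels_stub(NULL, "missing")

str(named)
str(positional)
```

Passing `keep` and `default_reason` by name, as the tests and the vignette in this commit do, sidesteps the issue.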
4 changes: 2 additions & 2 deletions R/deinterlaced_df.R
@@ -73,7 +73,7 @@ abort_if_deinterlace_df_problems <- function(x, call = caller_call()) {

if (length(df_problems) > 0) {
cli_abort(
c(df_problems[[1]], "i" = "Run `coalesce_missing_reasons()` to fix."),
c(df_problems[[1]], "i" = "Run `coalesce_channels()` to fix."),
call = call
)
}
@@ -101,7 +101,7 @@ tbl_format_footer.deinterlaced_df <- function(x, setup, ...) {
extra <- format_bullets_raw(
c(
"x" = glue("Warning: {setup$interlaced_probs[[1]]}"),
"i" = glue("Run `coalesce_missing_reasons()` to fix.")
"i" = glue("Run `coalesce_channels()` to fix.")
)
)
} else {
2 changes: 1 addition & 1 deletion R/interlace_missing_reasons.R → R/interlace_channels.R
@@ -12,7 +12,7 @@
#' that contain both values and missing reasons.
#'
#' @export
interlace_missing_reasons <- function(x) {
interlace_channels <- function(x) {
abort_if_deinterlace_df_problems(x)

# TODO: this is another function that would benefit from native speedup
20 changes: 10 additions & 10 deletions man/coalesce_missing_reasons.Rd → man/coalesce_channels.Rd

Some generated files are not rendered by default.

@@ -3,7 +3,7 @@ test_that("nop if no changes are necessary", {
a = c(1, NA),
.a. = factor(c(NA, "UNKNOWN_REASON"))
) |>
coalesce_missing_reasons()
coalesce_channels()

expect_equal(result, result)
})
@@ -13,7 +13,7 @@ test_that("new missing value reasons make values disappear when keep=missing", {
a = c(1, 2),
.a. = factor(c(NA, "UNKNOWN_REASON"))
) |>
coalesce_missing_reasons(keep = "missing")
coalesce_channels(keep = "missing")

expected <- tibble(
a = c(1, NA),
@@ -28,7 +28,7 @@ test_that("new missing value reasons disappear if value available", {
a = c(1, 2),
.a. = factor(c(NA, "UNKNOWN_REASON"))
) |>
coalesce_missing_reasons()
coalesce_channels()

expected <- tibble(
a = c(1, 2),
@@ -43,7 +43,7 @@ test_that("missing (missing value) reasons result in default reason", {
a = c(1, NA),
.a. = factor(c(NA, NA))
) |>
coalesce_missing_reasons()
coalesce_channels()

expected <- tibble(
a = c(1, NA),
4 changes: 2 additions & 2 deletions tests/testthat/test-read.R
@@ -38,7 +38,7 @@ test_that("global missing reasons load properly", {

expect_equal(
result_raw,
interlace_missing_reasons(result),
interlace_channels(result),
ignore_attr = TRUE
)
})
@@ -84,7 +84,7 @@ test_that("column-level missing reasons can be specified with icol_*", {

expect_equal(
result_raw,
interlace_missing_reasons(result),
interlace_channels(result),
ignore_attr = TRUE
)
})
28 changes: 15 additions & 13 deletions vignettes/mutations.Rmd
@@ -12,6 +12,7 @@ knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(dplyr)
library(interlacer)
```

@@ -24,7 +25,7 @@ Similarly, if a variable is missing its value AND its missing reason, it's
probably a sign we made a mistake somewhere.

This means whenever we `mutate()` the values of a variable, the missing reasons
are properly updated, and vice versa. To illustrate this, let's load some
must also be updated, and vice versa. To illustrate this, let's load some
example data:

```{r}
@@ -100,18 +101,18 @@ reason are absent. In next part of the mutation, we fill in the
`TECHNICAL_ERROR` missing reason for these rows into `.favorite_color.`,
resulting in a well-formed deinterlaced dataframe.

## An easier way
## An easier way with `coalesce_channels()`

As you can imagine, manually fixing the value & missing reason structure
of your dataframe for every mutation you do can get cumbersome! Luckily,
interlacer provides an easier way via `coalesce_missing_reasons()`:
interlacer provides an easier way via `coalesce_channels()`:

```{r}
df |>
mutate(
.age. = "REDACTED",
) |>
coalesce_missing_reasons(keep = "missing")
coalesce_channels(keep = "missing")
df |>
mutate(
@@ -121,10 +122,10 @@ df |>
NA
)
) |>
coalesce_missing_reasons(default_reason = "TECHNICAL_ERROR")
coalesce_channels(default_reason = "TECHNICAL_ERROR")
```

`coalesce_missing_reasons()` should be run every time you mutate something in
`coalesce_channels()` should be run every time you mutate something in
a deinterlaced dataframe. It accepts two arguments, `keep` and `default_reason`.
With these parameters set, it fixes both possible problem cases as follows:

@@ -139,21 +140,22 @@ Case 2: NEITHER a value nor a missing reason exists

These rules allow us to mutate our deinterlaced variables without needing to
specify BOTH the values and missing reason actions -- we only need to think
about our operation in one channel, and then a call to `coalesce_missing_reasons()`
about our operation in one channel, and then a call to `coalesce_channels()`
takes care of the other.
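
To make the two rules concrete, here is a minimal base R sketch that applies them to a single value vector and its missing reason vector. The helper `coalesce_pair()` is hypothetical and written only for illustration; it is not how interlacer implements `coalesce_channels()`, and it assumes the behavior described above (Case 1 is resolved by `keep`, Case 2 is filled with `default_reason`).

```r
# Hypothetical helper: applies the two coalescing rules to one value/reason
# pair of vectors. Illustrative only, not interlacer's implementation.
coalesce_pair <- function(value, reason,
                          default_reason = "UNKNOWN_REASON",
                          keep = c("values", "missing")) {
  keep <- match.arg(keep)
  reason <- as.character(reason)

  # Case 1: BOTH a value and a missing reason exist; keep only one of them.
  both <- !is.na(value) & !is.na(reason)
  if (keep == "values") {
    reason[both] <- NA_character_
  } else {
    value[both] <- NA
  }

  # Case 2: NEITHER a value nor a missing reason exists; use the default.
  neither <- is.na(value) & is.na(reason)
  reason[neither] <- default_reason

  list(value = value, reason = factor(reason))
}

res <- coalesce_pair(
  value = c(1, 2, NA),
  reason = c(NA, "REDACTED", NA),
  keep = "missing"
)
res$value   # 1 NA NA
res$reason  # <NA> REDACTED UNKNOWN_REASON
```

With `keep = "missing"`, the second element loses its value because a reason was supplied, and the third element, which had neither channel, receives the default reason.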

## Creating New Columns

`coalesce_missing_reasons()` will also automatically create missing reason
`coalesce_channels()` will also automatically create missing reason
columns if they don't already exist. This is useful for adding new
variables to your dataframe:

```{r}
df |>
mutate(
person_type = if_else(age < 18, "CHILD", "ADULT"),
.after = person_id
) %>%
coalesce_missing_reasons(default_reason = "AGE_UNAVAILABLE")
coalesce_channels(default_reason = "AGE_UNAVAILABLE")
```
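
As a rough picture of what happens to the new column, the base R sketch below builds the companion missing reason column by hand. It assumes the `.name.` naming convention for missing reason columns that appears elsewhere in this package's examples; the data frame and the manual steps are hypothetical and only approximate what `coalesce_channels()` does for us.

```r
# Toy data: one age is unavailable.
df_sketch <- data.frame(age = c(12, NA, 40))

# New value column; rows with a missing age get a missing person_type.
df_sketch$person_type <- ifelse(df_sketch$age < 18, "CHILD", "ADULT")

# Create the companion missing reason column only if it does not exist yet,
# filling the default reason wherever the new value is missing.
reason_col <- ".person_type."
if (!reason_col %in% names(df_sketch)) {
  df_sketch[[reason_col]] <- factor(
    ifelse(is.na(df_sketch$person_type), "AGE_UNAVAILABLE", NA_character_)
  )
}

df_sketch
```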

## Writing interlaced files
@@ -168,15 +170,15 @@ write_interlaced_csv(df, "interlaced_output.csv")
This will combine the value and missing reasons into interlaced character
columns, and write the result as a csv. Alternatively, if you want to
re-interlace the columns without writing to a file for more control in the
writing process, you can use `interlace_missing_reasons()`:
writing process, you can use `interlace_channels()`:

```{r}
interlace_missing_reasons(df)
interlace_channels(df)
```
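
Conceptually, re-interlacing a variable means collapsing its value channel and its missing reason channel back into one character column. The sketch below shows the idea for a single pair of vectors; `interlace_pair()` is a hypothetical helper, not part of interlacer, and it assumes the pair is already well-formed (each element has either a value or a missing reason, never both).

```r
# Hypothetical helper: merges one value vector and its missing reasons into a
# single character vector. Illustrative only, not interlacer's implementation.
interlace_pair <- function(value, reason) {
  out <- as.character(value)
  is_missing <- is.na(value)
  # Wherever the value is missing, substitute the missing reason.
  out[is_missing] <- as.character(reason)[is_missing]
  out
}

interlaced <- interlace_pair(
  value = c(21, NA, 35),
  reason = factor(c(NA, "REFUSED", NA))
)
interlaced  # "21" "REFUSED" "35"
```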

## Final note: Setting the global default reason

By default, `coalesce_missing_reasons()` will use `UNKNOWN_REASON` as the
By default, `coalesce_channels()` will use `UNKNOWN_REASON` as the
default missing reason. Sometimes you want to use a different default value,
to act as the "catch-all" missing reason, so you don't have to constantly
specify it. To do this, set the global `default_missing_reason` option:
@@ -187,5 +189,5 @@ options(default_missing_reason = -99)
tibble(
a = c(1,2,3, NA, 5)
) |>
coalesce_missing_reasons()
coalesce_channels()
```
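
The fallback itself can be pictured as a simple lookup: read the option and use `UNKNOWN_REASON` when it is unset. The sketch below is modeled on the `getOption("default_missing_reason")` default and the `%||%` fallback visible in the function source; `resolve_default_reason()` is a hypothetical name used only here, and `%||%` is defined locally so the example does not depend on rlang or on a recent R version.

```r
# Null-coalescing helper, defined locally for a self-contained example.
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical helper: resolves the default reason the same way the function
# signature and first line of `coalesce_channels()` appear to.
resolve_default_reason <- function() {
  factor(getOption("default_missing_reason") %||% "UNKNOWN_REASON")
}

old <- options(default_missing_reason = NULL)  # make sure the option is unset
unset_result <- as.character(resolve_default_reason())

options(default_missing_reason = -99)
set_result <- as.character(resolve_default_reason())

options(old)  # restore the previous setting

unset_result  # "UNKNOWN_REASON"
set_result    # "-99"
```

Note that the reason is stored as a factor, so a numeric option such as `-99` ends up as the level `"-99"`.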
