add extra math note
khusmann committed Mar 6, 2024
1 parent a00d220 commit 72fd7cb
Showing 2 changed files with 34 additions and 1 deletion.
33 changes: 33 additions & 0 deletions vignettes/coded-data.Rmd
@@ -109,6 +109,26 @@ df_coded |>
)
```

In fact, ANY math you do without filtering for missing codes potentially ruins
the integrity of your data:

```{r}
# This will add 1 to the age values, but ALSO add one to all of the missing
# reason codes, resulting in corrupted data!
df_coded |>
  mutate(
    age_next_year = age + 1,
    .after = person_id
  )

# This will give you your intended result, but it's easy to forget
df_coded |>
  mutate(
    age_next_year = if_else(age < 0, age, age + 1),
    .after = person_id
  )
```

Have you ever thought you had a significant result, only to find that it was
driven by stray missing reason codes still interlaced with your values? It's a
bad time.
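
To make that failure mode concrete, here is a minimal base R sketch. It is an illustration only and is not part of the vignette's own code. The `age` vector and the codes `-99` and `-98` are made up for the example; the only thing taken from the text above is the convention that missing reason codes are negative.

```r
# Hypothetical data: real ages with negative missing reason codes interlaced.
# The specific codes (-99, -98) are assumptions made for this sketch.
age <- c(34, 51, -99, 28, -98, 45, -99, 39)

# Naive summary: the codes are treated as if they were real ages.
naive_mean <- mean(age)

# Guarded summary: mask the negative codes as NA before doing any math.
age_clean <- ifelse(age < 0, NA, age)
clean_mean <- mean(age_clean, na.rm = TRUE)

print(c(naive = naive_mean, clean = clean_mean))
```

With only three stray codes among eight values, the naive mean is negative, which is impossible for an age. The same kind of distortion can feed silently into any downstream test or model.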
@@ -153,6 +173,19 @@ df_decoded_deinterlaced |>
)
```

Other operations work with similar ease:

```{r}
df_decoded_deinterlaced |>
  mutate(
    age_next_year = age + 1,
    .after = person_id
  ) |>
  coalesce_channels(default_reason = "AGE_UNAVAILABLE")
```



## Numeric codes with character missing reasons (SAS, Stata)

Like SPSS, SAS and Stata will encode factor levels as numeric values, but
2 changes: 1 addition & 1 deletion vignettes/mutations.Rmd
@@ -165,7 +165,7 @@ df |>
coalesce_channels(default_reason = "AGE_UNAVAILABLE")
```

-## Column joins
+## Joining columns

`coalesce_channels()` should also be used when joining new columns onto
an interlaced data frame, to fill in missing reasons when no matches are found: