diff --git a/404.html b/404.html
index 18fc697..7e95c80 100644
--- a/404.html
+++ b/404.html
@@ -31,7 +31,7 @@
interlacer
- 0.2.2
+ 0.3.0
@@ -49,7 +49,7 @@
Articles
@@ -59,7 +59,14 @@
-
+
diff --git a/LICENSE.html b/LICENSE.html
index e7f299c..4ee1984 100644
--- a/LICENSE.html
+++ b/LICENSE.html
@@ -10,7 +10,7 @@
interlacer
- 0.2.2
+ 0.3.0
@@ -27,7 +27,7 @@
Articles
@@ -35,7 +35,13 @@
-
+
@@ -43,7 +49,7 @@
diff --git a/articles/coded-data.html b/articles/coded-data.html
index e94027c..7c005a9 100644
--- a/articles/coded-data.html
+++ b/articles/coded-data.html
@@ -33,7 +33,7 @@
interlacer
-
0.2.2
+
0.3.0
@@ -51,7 +51,7 @@
Articles
@@ -61,7 +61,14 @@
-
+
@@ -76,7 +83,7 @@
Coded Data
-
+ Source: vignettes/coded-data.Rmd
coded-data.Rmd
@@ -84,27 +91,33 @@
In addition to interlacing values and missing reasons, many
statistical software packages will store categorical values and missing
-reasons as alphanumeric codes. These codes are often chosen so that
-numeric comparisons or casts can be used to determine if a value
-represents a real value or missing reason. Like 8-character variable
-name limits, this practice comes from a historical need to save digital
-storage space even if it made analyses less readable and more
-error-prone.
-Even though storage is cheap these days, coded formats continue to be
-the standard format used by statistical software packages like SPSS,
-SAS, and Stata. This article will describe these common coding schemes
-and how they can be decoded and deinterlaced to make them easier to work
-with in R.
+reasons as alphanumeric codes. Working with these files can be a pain
+because the codes are often arbitrary magic
+numbers that obfuscate the meaning of your syntax and results.
+To facilitate working with such data, interlacer provides a new
+cfactor
type. The cfactor
allows you to attach
+labels to coded data and work with it as a regular R
+factor
. Unlike a regular R factor
, however, a
+cfactor
can be converted back into its coded representation
+at any time (whereas R factor
values lose their original
+codes).
+
+
⚠️ ⚠️ ⚠️ WARNING ⚠️ ⚠️ ⚠️
+
+
The cfactor
type is a highly experimental feature (even
+compared to the rest of interlacer) and has not been thoroughly tested!
+I’m sharing it in a super pre-alpha, unstable state to get feedback on
+it before I invest more time polishing its implementation.
+
-
Numeric codes with negative missing reasons (SPSS)
+SPSS-style codes
- It’s extremely common to find data sources that encode all
-categorical responses as numeric values, with negative values
-representing missing reason codes. SPSS is one such example. Here’s an
-SPSS-formatted version of the colors.csv
example:
+
As a motivating example, consider this coded version of the
+colors.csv
example:
library ( readr )
-library ( interlacer , warn.conflicts = FALSE )
+library ( dplyr , warn.conflicts = FALSE )
+library ( interlacer , warn.conflicts = FALSE )
read_file (
interlacer_example ( "colors_coded.csv" )
@@ -134,211 +147,148 @@ Numeric codes with neg
2
: RED
3
: YELLOW
-This format gives you the ability to load everything as a numeric
-type:
+This style of coding, with positive values representing categorical
+levels and negative values representing missing values, is a common
+format used by SPSS.
+These data can be loaded as interlaced numeric values as follows:
-( df_coded <- read_csv (
+( df_coded <- read_interlaced_csv (
interlacer_example ( "colors_coded.csv" ) ,
- col_types = "n"
+ na = c ( - 99 , - 98 , - 97 )
) )
#> # A tibble: 11 × 3
-#> person_id age favorite_color
-#> <dbl> <dbl> <dbl>
-#> 1 1 20 1
-#> 2 2 -98 1
-#> 3 3 21 -98
-#> 4 4 30 -97
-#> 5 5 1 -99
-#> 6 6 41 2
-#> 7 7 50 -97
-#> 8 8 30 3
-#> 9 9 -98 -98
-#> 10 10 -97 2
-#> 11 11 10 -98
-To test if a value is a missing code, you can check if it’s less than
-0:
+#> person_id age favorite_color
+#> <dbl,int> <dbl,int> <dbl,int>
+#> 1 1 20 1
+#> 2 2 <-98> 1
+#> 3 3 21 <-98>
+#> 4 4 30 <-97>
+#> 5 5 1 <-99>
+#> 6 6 41 2
+#> 7 7 50 <-97>
+#> 8 8 30 3
+#> 9 9 <-98> <-98>
+#> 10 10 <-97> 2
+#> 11 11 10 <-98>
+
This representation is awkward to work with because the codes are
+meaningless and obfuscate the significance of any code you write or any
+results you output. If you wanted to select everyone with a
+BLUE
favorite color, for example, you would write:
-library ( dplyr , warn.conflicts = FALSE )
-
-df_coded |>
- mutate (
- age = if_else ( age > 0 , age , NA )
- ) |>
- summarize (
- mean_age = mean ( age , na.rm = TRUE ) ,
- n = n ( ) ,
- .by = favorite_color
- ) |>
- arrange ( favorite_color )
-#> # A tibble: 6 × 3
-#> favorite_color mean_age n
-#> <dbl> <dbl> <int>
-#> 1 -99 1 1
-#> 2 -98 15.5 3
-#> 3 -97 40 2
-#> 4 1 20 2
-#> 5 2 41 2
-#> 6 3 30 1
-
The downsides of this approach are twofold: 1) all of your values and
-missing reasons become codes you have to remember and 2) it’s really
-easy to make mistakes.
-
What sort of mistakes? Well, because everything is numeric, there’s
-nothing stopping us from treating missing reason codes as if they are
-regular values… If you forget to remove your missing reason codes, R
-will still happily compute aggregations using the negative numbers!
+
df_coded |>
+ filter ( favorite_color == 1 )
+#> # A tibble: 2 × 3
+#> person_id age favorite_color
+#> <dbl,int> <dbl,int> <dbl,int>
+#> 1 1 20 1
+#> 2 2 <-98> 1
+Similarly, if you wanted to filter for OMITTED
favorite
+colors, you would write:
df_coded |>
- mutate (
-# age = if_else(age > 0, age, NA)
- ) |>
- summarize (
- mean_age = mean ( age , na.rm = TRUE ) ,
- n = n ( ) ,
- .by = favorite_color
- ) |>
- arrange ( favorite_color )
-#> # A tibble: 6 × 3
-#> favorite_color mean_age n
-#> <dbl> <dbl> <int>
-#> 1 -99 1 1
-#> 2 -98 -22.3 3
-#> 3 -97 40 2
-#> 4 1 -39 2
-#> 5 2 -28 2
-#> 6 3 30 1
-In fact, ANY math you do without filtering for missing codes
-potentially ruins the integrity of your data:
+ filter ( favorite_color == na ( - 97 ) )
+#> # A tibble: 2 × 3
+#> person_id age favorite_color
+#> <dbl,int> <dbl,int> <dbl,int>
+#> 1 4 30 <-97>
+#> 2 7 50 <-97>
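Without a labelled type, one workaround is to keep a hand-written codebook next to the data and translate codes on the fly. The following is a minimal base R sketch of that idea, not part of interlacer; the `codebook` vector and the hard-coded `favorite_color` codes are illustrative assumptions copied from the example file shown above.

```r
# Hypothetical hand-written codebook for the value codes (assumption for
# illustration; the codes mirror the colors_coded.csv example).
codebook <- c(BLUE = 1, RED = 2, YELLOW = 3)

# Raw favorite_color codes, including the negative missing reason codes.
favorite_color <- c(1, 1, -98, -97, -99, 2, -97, 3, -98, 2, -98)

# Translate codes to labels; codes not in the codebook become NA.
labels <- names(codebook)[match(favorite_color, codebook)]

# Select rows by label instead of by magic number.
blue_rows <- which(labels == "BLUE")

print(labels)
print(blue_rows)
```

The lookup has to be repeated, and kept in sync with the codebook, everywhere the column is used, which is the bookkeeping a labelled type is meant to remove.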
+To make these data more ergonomic to work with, you can use
+interlacer’s v_col_cfactor()
and
+na_col_cfactor()
collector types to load these values as a
+cfactor
instead, which allows you to associate codes with
+human-readable labels:
-# This will add 1 to the age values, but ALSO add one to all of the missing
-# reason codes, resulting in corrupted data!
-df_coded |>
- mutate (
- age_next_year = age + 1 ,
- )
-#> # A tibble: 11 × 4
-#> person_id age favorite_color age_next_year
-#> <dbl> <dbl> <dbl> <dbl>
-#> 1 1 20 1 21
-#> 2 2 -98 1 -97
-#> 3 3 21 -98 22
-#> 4 4 30 -97 31
-#> 5 5 1 -99 2
-#> 6 6 41 2 42
-#> 7 7 50 -97 51
-#> 8 8 30 3 31
-#> 9 9 -98 -98 -97
-#> 10 10 -97 2 -96
-#> 11 11 10 -98 11
-
-
-# This will give you your intended result, but it's easy to forget
-df_coded |>
- mutate (
- age_next_year = if_else ( age < 0 , age , age + 1 ) ,
- )
-#> # A tibble: 11 × 4
-#> person_id age favorite_color age_next_year
-#> <dbl> <dbl> <dbl> <dbl>
-#> 1 1 20 1 21
-#> 2 2 -98 1 -98
-#> 3 3 21 -98 22
-#> 4 4 30 -97 31
-#> 5 5 1 -99 2
-#> 6 6 41 2 42
-#> 7 7 50 -97 51
-#> 8 8 30 3 31
-#> 9 9 -98 -98 -98
-#> 10 10 -97 2 -97
-#> 11 11 10 -98 11
-Have you ever thought you had a significant result, only to find that
-it’s only because there are some stray missing reason codes still
-interlaced with your values? It’s a bad time.
-You’re much better off loading these formats with interlacer as
-factors, then converting the codes into labels:
-
( df_decoded <- read_interlaced_csv (
interlacer_example ( "colors_coded.csv" ) ,
- na = c ( - 99 , - 98 , - 97 ) ,
- show_col_types = FALSE ,
-) |>
- mutate (
- across (
- everything ( ) ,
- \( x ) map_na_channel (
- x ,
- \( v ) factor (
- v ,
- levels = c ( - 99 , - 98 , - 97 ) ,
- labels = c ( "N/A" , "REFUSED" , "OMITTED" ) ,
- )
- )
- ) ,
- favorite_color = map_value_channel (
- favorite_color ,
- \( v ) factor (
- v ,
- levels = c ( 1 , 2 , 3 ) ,
- labels = c ( "BLUE" , "RED" , "YELLOW" )
- )
- ) ,
- ) )
+ col_types = x_cols (
+ favorite_color = v_col_cfactor ( codes = c ( BLUE = 1 , RED = 2 , YELLOW = 3 ) ) ,
+ ) ,
+ na = na_col_cfactor ( REFUSED = - 99 , OMITTED = - 98 , `N/A` = - 97 )
+) )
#> # A tibble: 11 × 3
-#> person_id age favorite_color
-#> <dbl,fct> <dbl,fct> <fct,fct>
-#> 1 1 20 BLUE
-#> 2 2 <REFUSED> BLUE
-#> 3 3 21 <REFUSED>
-#> 4 4 30 <OMITTED>
-#> 5 5 1 <N/A>
-#> 6 6 41 RED
-#> 7 7 50 <OMITTED>
-#> 8 8 30 YELLOW
-#> 9 9 <REFUSED> <REFUSED>
-#> 10 10 <OMITTED> RED
-#> 11 11 10 <REFUSED>
-Now aggregations won’t mix up values and missing codes, and you won’t
-have to keep cross-referencing your codebook to know what values
-mean:
+#> person_id age favorite_color
+#> <dbl,cfct> <dbl,cfct> <cfct,cfct>
+#> 1 1 20 BLUE
+#> 2 2 <OMITTED> BLUE
+#> 3 3 21 <OMITTED>
+#> 4 4 30 <N/A>
+#> 5 5 1 <REFUSED>
+#> 6 6 41 RED
+#> 7 7 50 <N/A>
+#> 8 8 30 YELLOW
+#> 9 9 <OMITTED> <OMITTED>
+#> 10 10 <N/A> RED
+#> 11 11 10 <OMITTED>
+Now human-readable labels, instead of the magic codes, can be used
+when working with the data:
+
+df_decoded |>
+ filter ( favorite_color == "BLUE" )
+#> # A tibble: 2 × 3
+#> person_id age favorite_color
+#> <dbl,cfct> <dbl,cfct> <cfct,cfct>
+#> 1 1 20 BLUE
+#> 2 2 <OMITTED> BLUE
+
+
+df_decoded |>
+ filter ( favorite_color == na ( "OMITTED" ) )
+#> # A tibble: 3 × 3
+#> person_id age favorite_color
+#> <dbl,cfct> <dbl,cfct> <cfct,cfct>
+#> 1 3 21 <OMITTED>
+#> 2 9 <OMITTED> <OMITTED>
+#> 3 11 10 <OMITTED>
+But you can still convert the labels of values or missing reasons
+back to codes if you wish, using as.codes()
. The following
+will convert the missing reason channel of age
and the
+value channel of the favorite_color
into their coded
+representation:
df_decoded |>
- summarize (
- mean_age = mean ( age , na.rm = TRUE ) ,
- n = n ( ) ,
- .by = favorite_color
- ) |>
- arrange ( favorite_color )
-#> # A tibble: 6 × 3
-#> favorite_color mean_age n
-#> <fct,fct> <dbl> <int>
-#> 1 BLUE 20 2
-#> 2 RED 41 2
-#> 3 YELLOW 30 1
-#> 4 <N/A> 1 1
-#> 5 <REFUSED> 15.5 3
-#> 6 <OMITTED> 40 2
-Other operations work with similar ease:
+ mutate (
+ age = map_na_channel ( age , as.codes ) ,
+ favorite_color = map_value_channel ( favorite_color , as.codes )
+ )
+#> # A tibble: 11 × 3
+#> person_id age favorite_color
+#> <dbl,cfct> <dbl,int> <int,cfct>
+#> 1 1 20 1
+#> 2 2 <-98> 1
+#> 3 3 21 <OMITTED>
+#> 4 4 30 <N/A>
+#> 5 5 1 <REFUSED>
+#> 6 6 41 2
+#> 7 7 50 <N/A>
+#> 8 8 30 3
+#> 9 9 <-98> <OMITTED>
+#> 10 10 <-97> 2
+#> 11 11 10 <OMITTED>
+To recode all cfactor
channels in a data frame into
+their coded representation you can do the following:
df_decoded |>
mutate (
- age_next_year = age + 1 ,
+ across_value_channels ( where_value_channel ( is.cfactor ) , as.codes ) ,
+ across_na_channels ( where_na_channel ( is.cfactor ) , as.codes ) ,
)
-#> # A tibble: 11 × 4
-#> person_id age favorite_color age_next_year
-#> <dbl,fct> <dbl,fct> <fct,fct> <dbl>
-#> 1 1 20 BLUE 21
-#> 2 2 <REFUSED> BLUE NA
-#> 3 3 21 <REFUSED> 22
-#> 4 4 30 <OMITTED> 31
-#> 5 5 1 <N/A> 2
-#> 6 6 41 RED 42
-#> 7 7 50 <OMITTED> 51
-#> 8 8 30 YELLOW 31
-#> 9 9 <REFUSED> <REFUSED> NA
-#> 10 10 <OMITTED> RED NA
-#> 11 11 10 <REFUSED> 11
+#> # A tibble: 11 × 3
+#> person_id age favorite_color
+#> <dbl,int> <dbl,int> <int,int>
+#> 1 1 20 1
+#> 2 2 <-98> 1
+#> 3 3 21 <-98>
+#> 4 4 30 <-97>
+#> 5 5 1 <-99>
+#> 6 6 41 2
+#> 7 7 50 <-97>
+#> 8 8 30 3
+#> 9 9 <-98> <-98>
+#> 10 10 <-97> 2
+#> 11 11 10 <-98>
-
Numeric codes with character missing reasons (SAS, Stata)
+SAS- and Stata-style codes
Like SPSS, SAS and Stata will encode factor levels as numeric values,
but instead of representing missing reasons as negative codes, they are
@@ -360,139 +310,183 @@
Numeric codes wi
#> 9,.a,.a
#> 10,.b,2
#> 11,10,.a
-Here, the same value codes are used as the previous example, except
-the missing reasons are coded as follows:
+In this example, the same value coding scheme is used for
+favorite_color
as the previous example, except the missing
+reason channels are coded as follows:
-"."
: N/A
-".a"
: REFUSED
-".b"
: OMITTED
+“.”: N/A
+“.a”: REFUSED
+“.b”: OMITTED
-To handle these missing reasons without interlacer, columns must be
-loaded as character vectors:
+These data can be easily loaded by interlacer into a
+cfactor
missing reason channel as follows:
-( df_coded_char <- read_csv (
- interlacer_example ( "colors_coded_char.csv" ) ,
- col_types = "c"
-) )
-#> # A tibble: 11 × 3
-#> person_id age favorite_color
-#> <chr> <chr> <chr>
-#> 1 1 20 1
-#> 2 2 .a 1
-#> 3 3 21 .a
-#> 4 4 30 .b
-#> 5 5 1 .
-#> 6 6 41 2
-#> 7 7 50 .b
-#> 8 8 30 3
-#> 9 9 .a .a
-#> 10 10 .b 2
-#> 11 11 10 .a
-To test if a value is missing, they can be cast to numeric types. If
-the cast fails, you know it’s a missing code. If it is successful, you
-know it’s a coded value.
-
-df_coded_char |>
- mutate (
- age = if_else ( ! is.na ( as.numeric ( age ) ) , as.numeric ( age ) , NA )
- ) |>
- summarize (
- mean_age = mean ( age , na.rm = TRUE ) ,
- n = n ( ) ,
- .by = favorite_color
- ) |>
- arrange ( favorite_color )
-#> Warning: There were 2 warnings in `mutate()`.
-#> The first warning was:
-#> ℹ In argument: `age = if_else(!is.na(as.numeric(age)), as.numeric(age), NA)`.
-#> Caused by warning in `is_logical()`:
-#> ! NAs introduced by coercion
-#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
-#> # A tibble: 6 × 3
-#> favorite_color mean_age n
-#> <chr> <dbl> <int>
-#> 1 . 1 1
-#> 2 .a 15.5 3
-#> 3 .b 40 2
-#> 4 1 20 2
-#> 5 2 41 2
-#> 6 3 30 1
-Although the character missing codes help prevent us from mistakenly
-including missing codes in value aggregations, having to cast our
-columns to numeric all the time to check for missingness is hardly
-ergonomic, and generates annoying warnings. Like before, it’s easier to
-import with interlacer and decode the values and missing reasons:
-
read_interlaced_csv (
interlacer_example ( "colors_coded_char.csv" ) ,
- na = c ( "." , ".a" , ".b" ) ,
- show_col_types = FALSE ,
-) |>
- mutate (
- across (
- everything ( ) ,
- \( x ) map_na_channel (
- x ,
- \( v ) factor (
- v ,
- levels = c ( "." , ".a" , ".b" ) ,
- labels = c ( "N/A" , "REFUSED" , "OMITTED" )
- )
- )
- ) ,
- favorite_color = map_value_channel (
- favorite_color ,
- \( v ) factor (
- v ,
- levels = c ( 1 , 2 , 3 ) ,
- labels = c ( "BLUE" , "RED" , "YELLOW" )
- )
- )
- )
+ col_types = x_cols (
+ favorite_color = v_col_cfactor ( codes = c ( BLUE = 1 , RED = 2 , YELLOW = 3 ) ) ,
+ ) ,
+ na = c ( `N/A` = "." , REFUSED = ".a" , OMITTED = ".b" ) ,
+)
#> # A tibble: 11 × 3
-#> person_id age favorite_color
-#> <dbl,fct> <dbl,fct> <fct,fct>
-#> 1 1 20 BLUE
-#> 2 2 <REFUSED> BLUE
-#> 3 3 21 <REFUSED>
-#> 4 4 30 <OMITTED>
-#> 5 5 1 <N/A>
-#> 6 6 41 RED
-#> 7 7 50 <OMITTED>
-#> 8 8 30 YELLOW
-#> 9 9 <REFUSED> <REFUSED>
-#> 10 10 <OMITTED> RED
-#> 11 11 10 <REFUSED>
+#> person_id age favorite_color
+#> <dbl,cfct> <dbl,cfct> <cfct,cfct>
+#> 1 1 20 BLUE
+#> 2 2 <REFUSED> BLUE
+#> 3 3 21 <REFUSED>
+#> 4 4 30 <OMITTED>
+#> 5 5 1 <N/A>
+#> 6 6 41 RED
+#> 7 7 50 <OMITTED>
+#> 8 8 30 YELLOW
+#> 9 9 <REFUSED> <REFUSED>
+#> 10 10 <OMITTED> RED
+#> 11 11 10 <REFUSED>
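For intuition, the separation of values from character missing codes can be sketched in base R alone. This is only an illustration of the idea under stated assumptions, not how interlacer is implemented; `raw_age`, `age_value`, and `age_reason` are hypothetical names, and the data is the `age` column of the example file typed in by hand.

```r
# Raw `age` column of colors_coded_char.csv, read as character strings.
raw_age <- c("20", ".a", "21", "30", "1", "41", "50", "30", ".a", ".b", "10")

# Missing reason codes and their labels, as listed above.
missing_codes <- c(`N/A` = ".", REFUSED = ".a", OMITTED = ".b")

# A cell is a missing reason if it matches one of the codes.
is_missing <- raw_age %in% missing_codes

# Value channel: numeric where a real value exists, NA otherwise.
age_value <- rep(NA_real_, length(raw_age))
age_value[!is_missing] <- as.numeric(raw_age[!is_missing])

# Missing reason channel: labelled reason where the value is missing, NA otherwise.
age_reason <- factor(
  names(missing_codes)[match(raw_age, missing_codes)],
  levels = names(missing_codes)
)

print(data.frame(raw_age, age_value, age_reason))
```

Converting only the non-missing cells avoids the coercion warnings that `as.numeric()` would emit on strings such as ".a".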
-
Encoding a decoded & deinterlaced data frame.
+The cfactor
type
- Re-coding and re-interlacing a data frame can be done as follows:
+
The cfactor
is an extension of base R’s
+factor
type. A cfactor is created from numeric
or
+character
codes using the cfactor()
+function:
+
+( example_cfactor <- cfactor (
+ c ( 10 , 20 , 30 , 10 , 20 , 30 ) ,
+ codes = c ( LEVEL_A = 10 , LEVEL_B = 20 , LEVEL_C = 30 )
+) )
+#> <cfactor<int+bd96a>[6]>
+#> [1] LEVEL_A LEVEL_B LEVEL_C LEVEL_A LEVEL_B LEVEL_C
+#>
+#> Categorical levels:
+#> label code
+#> LEVEL_A 10
+#> LEVEL_B 20
+#> LEVEL_C 30
+
+
+( example_cfactor2 <- cfactor (
+ c ( "a" , "b" , "c" , "a" , "b" , "c" ) ,
+ codes = c ( LEVEL_A = "a" , LEVEL_B = "b" , LEVEL_C = "c" )
+) )
+#> <cfactor<chr+99cda>[6]>
+#> [1] LEVEL_A LEVEL_B LEVEL_C LEVEL_A LEVEL_B LEVEL_C
+#>
+#> Categorical levels:
+#> label code
+#> LEVEL_A a
+#> LEVEL_B b
+#> LEVEL_C c
+
cfactor
vectors can be used wherever regular base R
+factor
types are used, because they are fully-compatible
+factor
types:
+
+levels ( example_cfactor )
+#> [1] "LEVEL_A" "LEVEL_B" "LEVEL_C"
+
+
+levels ( example_cfactor2 )
+#> [1] "LEVEL_A" "LEVEL_B" "LEVEL_C"
+
But unlike a regular factor
, a cfactor
+additionally stores the codes for the factor levels. This means you can
+convert it back into its coded representation at any time, if
+desired:
+
+codes ( example_cfactor )
+#> LEVEL_A LEVEL_B LEVEL_C
+#> 10 20 30
+
+as.codes ( example_cfactor )
+#> [1] 10 20 30 10 20 30
+
+
+codes ( example_cfactor2 )
+#> LEVEL_A LEVEL_B LEVEL_C
+#> "a" "b" "c"
+
+as.codes ( example_cfactor2 )
+#> [1] "a" "b" "c" "a" "b" "c"
+
IMPORTANT: The as.numeric()
and
+as.integer()
functions do not convert a
+cfactor
with numeric codes into its coded representation.
+Instead, in order to retain full compatibility with the base R
+factor
type, they always return a result coded by the
+index of each level in the factor:
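The same behaviour can be seen with a plain base R factor. The sketch below uses only base R, so it does not exercise `cfactor` itself; `code_map` and `f` are hypothetical names chosen to mirror the `LEVEL_A`/`LEVEL_B`/`LEVEL_C` example above.

```r
# Hypothetical label-to-code mapping, mirroring the example above.
code_map <- c(LEVEL_A = 10, LEVEL_B = 20, LEVEL_C = 30)

# A plain base R factor built from the codes 10/20/30.
f <- factor(
  c(10, 20, 30, 10, 20, 30),
  levels = unname(code_map),
  labels = names(code_map)
)

# as.integer() returns the position of each level, not the original code.
level_index <- as.integer(f)

# Recovering the codes from a plain factor needs an explicit lookup by label.
recovered <- unname(code_map[as.character(f)])

print(level_index)
print(recovered)
```

For a `cfactor`, `as.codes()` is the function that returns the coded representation, as shown earlier in this section.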
+
+
+
When the levels are changed, the cfactor
will drop its
+codes and degrade into a regular R factor:
+
+cfactor_copy <- example_cfactor
-df_decoded |>
+# cfactor_copy is a cfactor and a factor
+is.cfactor ( cfactor_copy )
+#> [1] TRUE
+
+
+levels ( cfactor_copy )
+#> [1] "LEVEL_A" "LEVEL_B" "LEVEL_C"
+
+codes ( cfactor_copy )
+#> LEVEL_A LEVEL_B LEVEL_C
+#> 10 20 30
+
+
+# modify the levels of the cfactor as if it was a regular factor
+levels ( cfactor_copy ) <- c ( "C" , "B" , "A" )
+
+# now cfactor_copy is just a regular factor
+is.cfactor ( cfactor_copy )
+#> [1] FALSE
+
+
+levels ( cfactor_copy )
+#> [1] "C" "B" "A"
+
+codes ( cfactor_copy )
+#> NULL
+
Finally, if you have a base R factor
or character vector
+of labels, you can add codes to them via as.cfactor()
:
+
+as.cfactor (
+ c ( "LEVEL_A" , "LEVEL_B" , "LEVEL_C" , "LEVEL_A" , "LEVEL_B" , "LEVEL_C" ) ,
+ codes = c ( LEVEL_A = 10 , LEVEL_B = 20 , LEVEL_C = 30 )
+)
+#> <cfactor<int+bd96a>[6]>
+#> [1] LEVEL_A LEVEL_B LEVEL_C LEVEL_A LEVEL_B LEVEL_C
+#>
+#> Categorical levels:
+#> label code
+#> LEVEL_A 10
+#> LEVEL_B 20
+#> LEVEL_C 30
+
+
+
Re-coding and writing an interlaced data frame.
+
+
Re-coding and writing an interlaced data frame is as simple as
+calling as.codes()
on all cfactor
type value
+and missing reason channels, and then calling one of the
+write_interlaced_*()
family of functions:
+
+df_decoded |>
mutate (
- across (
- everything ( ) ,
- \( x ) map_na_channel (
- x ,
- \( v ) fct_recode ( v ,
- `-99` = "N/A" ,
- `-98` = "REFUSED" ,
- `-97` = "OMITTED"
- )
- )
- ) ,
- favorite_color = map_value_channel (
- favorite_color ,
- \( v ) fct_recode (
- v ,
- `1` = "BLUE" ,
- `2` = "RED" ,
- `3` = "YELLOW"
- )
- )
+ across_value_channels ( where_value_channel ( is.cfactor ) , as.codes ) ,
+ across_na_channels ( where_na_channel ( is.cfactor ) , as.codes ) ,
) |>
write_interlaced_csv ( "output.csv" )
@@ -502,7 +496,7 @@ haven
The haven package has
functions for loading native SPSS, SAS, and Stata file formats
into special data frames that use column attributes and special values
-to keep track of interlaced values and missing reasons. For a complete
+to keep track of value labels and missing reasons. For a complete
discussion of how this compares to interlacer’s approach, see
vignette("other-approaches")
.
diff --git a/articles/extended-column-types.html b/articles/extended-column-types.html
new file mode 100644
index 0000000..56c2f4c
--- /dev/null
+++ b/articles/extended-column-types.html
@@ -0,0 +1,395 @@
+
+
+
+
+
+
+
+
+Extended Column Types • interlacer
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Skip to contents
+
+
+
+
+
+
+
+
+
+
+
+
+
+Like the readr::read_*()
family of functions,
+read_interlaced_*()
will automatically guess column types
+by default:
+
+library ( interlacer , warn.conflicts = FALSE )
+
+( read_interlaced_csv (
+ interlacer_example ( "colors.csv" ) ,
+ na = c ( "REFUSED" , "OMITTED" , "N/A" )
+) )
+#> # A tibble: 11 × 3
+#> person_id age favorite_color
+#> <dbl,fct> <dbl,fct> <chr,fct>
+#> 1 1 20 BLUE
+#> 2 2 <REFUSED> BLUE
+#> 3 3 21 <REFUSED>
+#> 4 4 30 <OMITTED>
+#> 5 5 1 <N/A>
+#> 6 6 41 RED
+#> 7 7 50 <OMITTED>
+#> 8 8 30 YELLOW
+#> 9 9 <REFUSED> <REFUSED>
+#> 10 10 <OMITTED> RED
+#> 11 11 10 <REFUSED>
+As with readr, these column type guesses can be overridden using the
+col_types
parameter with a readr::cols()
+column specification:
+
+library ( readr )
+
+( read_interlaced_csv (
+ interlacer_example ( "colors.csv" ) ,
+ col_types = cols (
+ person_id = col_integer ( ) ,
+ age = col_number ( ) ,
+ favorite_color = col_factor ( levels = c ( "BLUE" , "RED" , "YELLOW" , "GREEN" ) )
+ ) ,
+ na = c ( "REFUSED" , "OMITTED" , "N/A" )
+) )
+#> # A tibble: 11 × 3
+#> person_id age favorite_color
+#> <int,fct> <dbl,fct> <fct,fct>
+#> 1 1 20 BLUE
+#> 2 2 <REFUSED> BLUE
+#> 3 3 21 <REFUSED>
+#> 4 4 30 <OMITTED>
+#> 5 5 1 <N/A>
+#> 6 6 41 RED
+#> 7 7 50 <OMITTED>
+#> 8 8 30 YELLOW
+#> 9 9 <REFUSED> <REFUSED>
+#> 10 10 <OMITTED> RED
+#> 11 11 10 <REFUSED>
+
+
+x_cols
: extended cols
specifications
+
+
When you need more fine-grained control over value and missing reason
+channel types, you can use an x_cols()
specification, an
+extension of readr’s readr::cols()
system. With
+x_cols()
, you can control both the value channel and na
+channel types in the columns of the resulting data frame.
+
This is useful when you have missing reasons that only apply to
+particular items as opposed to the file as a whole. For example, say we
+had a measure with the following two items:
+
+What is your current stress level?
+
+
+
+Low
+Moderate
+High
+I don’t know
+I don’t understand the question
+
+
+
+How well do you feel you manage your time and responsibilities
+today?
+
+
+
+Poorly
+Fairly well
+Well
+Very well
+Does not apply (Today was a vacation day)
+Does not apply (Other reason)
+
+
+
As you can see, both items have two selection choices that should be
+mapped to missing reasons. These can be specified with the
+x_cols()
as follows:
+
+( df_stress <- read_interlaced_csv (
+ interlacer_example ( "stress.csv" ) ,
+ col_types = x_cols (
+ person_id = x_col (
+ v_col_integer ( ) ,
+ na_col_none ( )
+ ) ,
+ current_stress = x_col (
+ v_col_factor ( levels = c ( "LOW" , "MODERATE" , "HIGH" ) ) ,
+ na_col_factor ( "DONT_KNOW" , "DONT_UNDERSTAND" )
+ ) ,
+ time_management = x_col (
+ v_col_factor ( levels = c ( "POORLY" , "FAIRLY_WELL" , "WELL" , "VERY_WELL" ) ) ,
+ na_col_factor ( "NA_VACATION" , "NA_OTHER" )
+ )
+ )
+) )
+#> # A tibble: 8 × 3
+#> person_id current_stress time_management
+#> <int> <fct,fct> <fct,fct>
+#> 1 1 LOW VERY_WELL
+#> 2 2 MODERATE POORLY
+#> 3 3 <DONT_KNOW> <NA_OTHER>
+#> 4 4 HIGH POORLY
+#> 5 5 <DONT_UNDERSTAND> <NA_OTHER>
+#> 6 6 LOW <NA_VACATION>
+#> 7 7 MODERATE WELL
+#> 8 8 <DONT_KNOW> FAIRLY_WELL
+
Like readr’s readr::cols()
function, each named argument to
+x_cols()
describes a column in the resulting data frame.
+Value and missing reason channel types are declared via calls to
+v_col_*()
and na_col_*()
respectively, which
+are assembled by x_col()
.
+
v_col_*()
types mirror readr’s readr::col_*
+column “collectors”. So v_col_double()
is equivalent to
+readr::col_double()
, v_col_character()
is
+equivalent to readr::col_character()
, etc. See vroom’s
+documentation for a list of
+available column types .
+
na_col_*()
collectors allow you to declare the missing
+reason channel type of the loaded column, and the values that should be
+interpreted as missing reasons. Currently, there are five options:
+
+na_col_default()
: Use the collector defined by the
+na =
argument in the read_interlaced_*()
+function
+na_col_none()
: Load the column without a missing
+reason channel.
+na_col_factor()
: Use a factor missing reason
+channel. Character arguments passed form the levels of the factor.
+(e.g. na_col_factor("REFUSED", "OMITTED", "N/A")
)
+na_col_integer()
: Use an integer na channel. Numeric
+arguments passed are the values to be interpreted as missing values.
+(e.g. na_col_integer(-99, -98, -97)
)
+na_col_cfactor()
: Use a cfactor
na
+channel. (cfactor
types will be covered in the next
+vignette, vignette("coded-data")
)
+
+
The following example shows some of these collectors in action. In
+this example we use a coded version of the colors.csv
+example data, to demonstrate integer missing reason types:
+
+read_interlaced_csv (
+ interlacer_example ( "colors_coded.csv" ) ,
+ col_types = x_cols (
+ person_id = x_col ( v_col_integer ( ) , na_col_none ( ) ) ,
+ age = x_col ( v_col_double ( ) , na_col_integer ( - 99 , - 98 , - 97 ) ) ,
+ favorite_color = x_col ( v_col_integer ( ) , na_col_integer ( - 99 , - 98 , - 97 ) )
+ )
+)
+#> # A tibble: 11 × 3
+#> person_id age favorite_color
+#> <int> <dbl,int> <int,int>
+#> 1 1 20 1
+#> 2 2 <-98> 1
+#> 3 3 21 <-98>
+#> 4 4 30 <-97>
+#> 5 5 1 <-99>
+#> 6 6 41 2
+#> 7 7 50 <-97>
+#> 8 8 30 3
+#> 9 9 <-98> <-98>
+#> 10 10 <-97> 2
+#> 11 11 10 <-98>
+
+
+
Shortcuts
+
+
+
Default collector types
+
+
Like readr’s cols()
function, the x_cols()
+function accepts a .default
argument that specifies a
+default value collector. The na =
argument is similarly
+used to specify a default missing reason collector to be used when no
+na_col_*()
is specified, or when it is set to
+na_col_default()
.
+
By taking advantage of these defaults, the specification in the last
+example could have been equivalently written as:
+
+read_interlaced_csv (
+ interlacer_example ( "colors_coded.csv" ) ,
+ col_types = x_cols (
+ .default = v_col_integer ( ) ,
+ person_id = x_col ( v_col_integer ( ) , na_col_none ( ) ) ,
+ age = v_col_double ( ) ,
+ ) ,
+ na = na_col_integer ( - 99 , - 98 , - 97 )
+)
+#> # A tibble: 11 × 3
+#> person_id age favorite_color
+#> <int> <dbl,int> <int,int>
+#> 1 1 20 1
+#> 2 2 <-98> 1
+#> 3 3 21 <-98>
+#> 4 4 30 <-97>
+#> 5 5 1 <-99>
+#> 6 6 41 2
+#> 7 7 50 <-97>
+#> 8 8 30 3
+#> 9 9 <-98> <-98>
+#> 10 10 <-97> 2
+#> 11 11 10 <-98>
+
+
+
Concise value and missing reason specifications
+
+
Like readr, value collectors can be specified using characters. For
+example, instead of v_col_integer()
, you can use
+"i"
. See vroom’s documentation for a complete
+list of these shortcuts .
+
Similarly, missing reason collectors can be specified by providing a
+vector of the missing values; the collector type is inferred via the
+type of the vector. The conversions are as follows:
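+
+NULL: equivalent to na_col_none()
+a numeric vector, e.g. c(-99, -98, -97): equivalent to na_col_integer(-99, -98, -97)
+a character vector, e.g. c("REFUSED", "OMITTED", "N/A"): equivalent to na_col_factor("REFUSED", "OMITTED", "N/A")
+a named vector, e.g. c(`N/A` = ".", REFUSED = ".a", OMITTED = ".b"): loaded as a cfactor missing reason channel, as with na_col_cfactor()
+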
Using these shortcuts, the previous example could have equivalently
+been written in more compact form as follows:
+
+read_interlaced_csv (
+ interlacer_example ( "colors_coded.csv" ) ,
+ col_types = x_cols (
+ .default = "i" ,
+ person_id = x_col ( "i" , NULL ) ,
+ age = "d" ,
+ ) ,
+ na = c ( - 99 , - 98 , - 97 )
+)
+#> # A tibble: 11 × 3
+#> person_id age favorite_color
+#> <int> <dbl,int> <int,int>
+#> 1 1 20 1
+#> 2 2 <-98> 1
+#> 3 3 21 <-98>
+#> 4 4 30 <-97>
+#> 5 5 1 <-99>
+#> 6 6 41 2
+#> 7 7 50 <-97>
+#> 8 8 30 3
+#> 9 9 <-98> <-98>
+#> 10 10 <-97> 2
+#> 11 11 10 <-98>
+
+
+
+
Next steps
+
+
In this vignette we covered how the column types for values and
+missing reasons can be explicitly specified using collectors. We also
+illustrated how column-level missing values can be specified by creating
+extended column type specifications using x_cols()
.
+
In the final examples, we used an example data set with coded values
+and missing reasons. Coded values are especially common in data sets
+produced by SPSS, SAS, and Stata. interlacer provides a special column
+type to make working with this sort of data easier: the
+cfactor
type. This will be covered in the next vignette,
+vignette("coded-data")
.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/articles/index.html b/articles/index.html
index 8d4f2e7..7402f52 100644
--- a/articles/index.html
+++ b/articles/index.html
@@ -10,7 +10,7 @@
interlacer
- 0.2.2
+ 0.3.0
@@ -27,7 +27,7 @@
Articles
@@ -35,7 +35,13 @@
-
+
@@ -49,7 +55,7 @@
Articles
- NA Column Types
+ Extended Column Types
Coded Data
diff --git a/articles/interlacer.html b/articles/interlacer.html
index 647f2ba..cb58443 100644
--- a/articles/interlacer.html
+++ b/articles/interlacer.html
@@ -33,7 +33,7 @@
interlacer
- 0.2.2
+ 0.3.0
@@ -51,7 +51,7 @@
Articles
@@ -61,7 +61,14 @@
-
+
@@ -76,7 +83,7 @@
Introduction to interlacer
-
+ Source: vignettes/interlacer.Rmd
interlacer.Rmd
@@ -86,7 +93,7 @@
as special values or codes. For example, consider the following CSV:
library ( readr )
-library ( interlacer , warn.conflicts = FALSE )
+library ( interlacer , warn.conflicts = FALSE )
read_file ( interlacer_example ( "colors.csv" ) ) |>
cat ( )
@@ -112,8 +119,7 @@
diff --git a/articles/other-approaches.html b/articles/other-approaches.html
index a405f01..aa60f85 100644
--- a/articles/other-approaches.html
+++ b/articles/other-approaches.html
@@ -33,7 +33,7 @@
interlacer
- 0.2.2
+ 0.3.0
@@ -51,7 +51,7 @@
Articles
@@ -61,7 +61,14 @@
-
+
@@ -76,7 +83,7 @@
Other Approaches
-
+ Source: vignettes/other-approaches.Rmd
other-approaches.Rmd
@@ -107,7 +114,7 @@ “Labelled” missing value
haven::read_spss()
, values and missing reasons are loaded
into a single interlaced numeric vector:
-
-
-
Questions for the future
-
-
-
1. More flexible missing reason channel types?
-
-
Earlier versions allowed arbitrary types to occupy the missing reason
-channel (i.e. it was a fully generic Result<Value, Missing> type).
-I ended up constricting the missing reason channel to only allow
-integer
or factor
types to help simplify the
-na_cols()
specifications. When arbitrary types are allowed,
-the na_cols()
specs become quite long (e.g.
-column_name = factor(levels=c("REASON_1", "REASON_2")))
).
-As far as I can tell, in 99.9% of the time, it is preferable to use
-integer
and factor
missing reason channels
-over double
and character
ones, so for now
-I’ve made the executive decision to only allow integer
and
-factor
types.
-
-
-
2. A better na_cols()
specification?
-
-
Right now, missing values are supplied in a separate argument from
-col_types
. This means custom missing values get pretty far
-separated from their col_type
definitions:
-
-read_interlaced_csv (
- interlacer_example ( "stress.csv" ) ,
- col_types = cols (
- person_id = col_integer ( ) ,
- current_stress = col_factor (
- levels = c ( "LOW" , "MODERATE" , "HIGH" )
- ) ,
- time_management = col_factor (
- levels = c ( "POORLY" , "FAIRLY_WELL" , "WELL" , "VERY_WELL" )
- )
- ) ,
- na = na_cols (
- .default = c ( "REFUSED" , "OMITTED" , "N/A" ) ,
- current_stress = c ( .default , "DONT_KNOW" , "DONT_UNDERSTAND" ) ,
- time_management = c ( .default , "NA_VACATION" , "NA_OTHER" )
- )
-)
-#> # A tibble: 8 × 3
-#> person_id current_stress time_management
-#> <int,fct> <fct,fct> <fct,fct>
-#> 1 1 LOW VERY_WELL
-#> 2 2 MODERATE POORLY
-#> 3 3 <DONT_KNOW> <NA_OTHER>
-#> 4 4 HIGH POORLY
-#> 5 5 <DONT_UNDERSTAND> <NA_OTHER>
-#> 6 6 LOW <NA_VACATION>
-#> 7 7 MODERATE WELL
-#> 8 8 <OMITTED> FAIRLY_WELL
-
In an earlier version I created an extension of readr collectors, a
-family of icol_*
types, that allowed you to do something
-like this:
-
-read_interlaced_csv (
- interlacer_example ( "stress.csv" ) ,
- col_types = cols (
- person_id = col_integer ( ) ,
- current_stress = icol_factor (
- levels = c ( "LOW" , "MODERATE" , "HIGH" ) ,
- na = c ( "DONT_KNOW" , "DONT_UNDERSTAND" )
- ) ,
- time_management = col_factor (
- levels = c ( "POORLY" , "FAIRLY_WELL" , "WELL" , "VERY_WELL" ) ,
- na = c ( "NA_VACATION" , "NA_OTHER" )
- )
- ) ,
- na = c ( "REFUSED" , "OMITTED" , "N/A" )
-)
-
…I can’t decide which interface I like better. Although the latter
-approach feels cleaner because it folds custom missing reasons into the
-cols
definition, one disadvantage is that it cannot
-overwrite missing values (e.g. I cannot set the missing reason on
-person_id
to NULL
as long as there’s a default
-missing reason specified). It also feels a little “hackish” to extend
-readr’s types in this way; I think making use of the na
-parameter in my own na_cols()
function provides me with a
-little bit more insulation to changes from readr.
-
Anyway, if you have thoughts or opinions on any of these things, I’d
-really appreciate your feedback !
-
diff --git a/authors.html b/authors.html
index b3041f2..aca55f5 100644
--- a/authors.html
+++ b/authors.html
@@ -10,7 +10,7 @@
interlacer
- 0.2.2
+ 0.3.0
@@ -27,7 +27,7 @@
Articles
@@ -35,7 +35,13 @@
-
+
@@ -56,18 +62,18 @@ Authors
Citation
-
+
Source: DESCRIPTION
Husmann K (2024).
interlacer: Read Tabular Data With Interlaced Values And Missing Reasons .
-R package version 0.2.2, http://kylehusmann.com/interlacer/ .
+R package version 0.3.0, https://kylehusmann.com/interlacer, https://github.com/khusmann/interlacer .
@Manual{,
title = {interlacer: Read Tabular Data With Interlaced Values And Missing Reasons},
author = {Kyle Husmann},
year = {2024},
- note = {R package version 0.2.2},
- url = {http://kylehusmann.com/interlacer/},
+ note = {R package version 0.3.0, https://kylehusmann.com/interlacer},
+ url = {https://github.com/khusmann/interlacer},
}
On this page
diff --git a/index.html b/index.html
index 118472e..1924b73 100644
--- a/index.html
+++ b/index.html
@@ -33,7 +33,7 @@
interlacer
- 0.2.2
+ 0.3.0
@@ -51,7 +51,7 @@
Articles
@@ -61,7 +61,14 @@
-
+
@@ -79,7 +86,7 @@
Unlike a regular column, however, the missing reasons are still available. This means you can still filter data frames on variables by specific missing reasons, or generate summary statistics with breakdowns by missing reason. In other words, you no longer have to constantly manually include / exclude missing reasons in computations by filtering them with awkward string comparisons or type conversions… everything just works!
In addition to the introduction in vignette("interlacer")
be sure to also check out:
@@ -102,7 +109,7 @@ Usage
To use interlacer, load it into your current R session:
+library ( interlacer , warn.conflicts = FALSE )
interlacer supports the following file formats with these read_interlaced_*()
functions, which extend the readr::read_*()
family of functions:
read_interlaced_csv()
@@ -134,8 +141,7 @@ Usage
read_csv (
interlacer_example ( "colors.csv" ) ,
- na = c ( "REFUSED" , "OMITTED" , "N/A" ) ,
- show_col_types = FALSE ,
+ na = c ( "REFUSED" , "OMITTED" , "N/A" )
)
#> # A tibble: 11 × 3
#> person_id age favorite_color
@@ -155,8 +161,7 @@ Usage
( ex <- read_interlaced_csv (
interlacer_example ( "colors.csv" ) ,
- na = c ( "REFUSED" , "OMITTED" , "N/A" ) ,
- show_col_types = FALSE ,
+ na = c ( "REFUSED" , "OMITTED" , "N/A" )
) )
#> # A tibble: 11 × 3
#> person_id age favorite_color
@@ -200,7 +205,7 @@ Usage
mean_age = mean ( age , na.rm = TRUE ) ,
n = n ( ) ,
.by = favorite_color
- ) %>%
+ ) |>
arrange ( favorite_color )
#> # A tibble: 6 × 3
#> favorite_color mean_age n
@@ -287,7 +292,15 @@ Acknowledgements
The development of this software was supported, in whole or in part, by the Institute of Education Sciences, U.S. Department of Education, through Grant R305A170047 to The Pennsylvania State University. The opinions expressed are those of the authors and do not represent the views of the Institute or the U.S. Department of Education.
-
+
+
+
License
Full license
diff --git a/pkgdown.yml b/pkgdown.yml
index b6d698e..746bddc 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -3,10 +3,10 @@ pkgdown: 2.0.9
pkgdown_sha: ~
articles:
coded-data: coded-data.html
+ extended-column-types: extended-column-types.html
interlacer: interlacer.html
- na-column-types: na-column-types.html
other-approaches: other-approaches.html
-last_built: 2024-06-07T23:46Z
+last_built: 2024-06-17T22:30Z
urls:
reference: http://kylehusmann.com/interlacer/reference
article: http://kylehusmann.com/interlacer/articles
diff --git a/reference/across_value_channels.html b/reference/across_value_channels.html
new file mode 100644
index 0000000..4343d5d
--- /dev/null
+++ b/reference/across_value_channels.html
@@ -0,0 +1,150 @@
+
+Apply a function across the value or missing reason channels of multiple columns — across_value_channels • interlacer
+ Skip to contents
+
+
+
+
+
+
+
+
+
across_value_channels()
and across_na_channels()
are simple wrappers for
+dplyr::across()
that apply transformations to value or missing reason
+channels, respectively.
+
+
+
+
Usage
+
across_value_channels ( .cols , .fns , .names = NULL , .unpack = FALSE )
+
+across_na_channels ( .cols , .fns , .names = NULL , .unpack = FALSE )
+
+
+
+
Arguments
+
.cols
+<tidy-select
> Columns to transform.
+You can't select grouping columns because they are already automatically
+handled by the verb (i.e. summarise()
or mutate()
).
+
+
+.fns
+Functions to apply to each of the selected columns.
+Possible values are:
A function, e.g. mean
.
+A purrr-style lambda, e.g. ~ mean(.x, na.rm = TRUE)
+A named list of functions or lambdas, e.g.
+list(mean = mean, n_miss = ~ sum(is.na(.x)))
. Each function is applied
+to each column, and the output is named by combining the function name
+and the column name using the glue specification in .names
.
+Within these functions you can use cur_column()
and cur_group()
+to access the current column and grouping keys respectively.
+
+
+.names
+A glue specification that describes how to name the output
+columns. This can use {.col}
to stand for the selected column name, and
+{.fn}
to stand for the name of the function being applied. The default
+(NULL
) is equivalent to "{.col}"
for the single function case and
+"{.col}_{.fn}"
for the case where a list is used for .fns
.
+
+
+.unpack
+
+Optionally unpack data frames returned by functions in
+.fns
, which expands the df-columns out into individual columns, retaining
+the number of rows in the data frame.
If FALSE
, the default, no unpacking is done.
+If TRUE
, unpacking is done with a default glue specification of
+"{outer}_{inner}"
.
+Otherwise, a single glue specification can be supplied to describe how to
+name the unpacked columns. This can use {outer}
to refer to the name
+originally generated by .names
, and {inner}
to refer to the names of
+the data frame you are unpacking.
+
+
+
+
+
Value
+
+
+
Like dplyr::across()
, across_value_channels()
and
+across_na_channels()
return a tibble with one column for each column in
+.cols
and each function in .fns
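As a rough illustration of how these wrappers might be combined with where_value_channel(), the sketch below adds 1 to every numeric value channel and is intended to leave the missing reason channels untouched. It assumes the bundled colors.csv example and the na values used on the package home page; the names ex and bumped are hypothetical, and the printed result is omitted because it has not been verified here.

```r
library(dplyr, warn.conflicts = FALSE)
library(interlacer, warn.conflicts = FALSE)

# Load the example data with its missing reasons kept in a separate channel
ex <- read_interlaced_csv(
  interlacer_example("colors.csv"),
  na = c("REFUSED", "OMITTED", "N/A")
)

# Apply the function only to value channels of columns whose
# value channel is numeric (assumed usage, based on the signatures above)
bumped <- ex |>
  mutate(
    across_value_channels(
      where_value_channel(is.numeric),
      \(x) x + 1
    )
  )
```

across_na_channels() and where_na_channel() take the same arguments, so the same pattern should apply to the missing reason channel.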
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/reference/as.cfactor.html b/reference/as.cfactor.html
new file mode 100644
index 0000000..d13bbba
--- /dev/null
+++ b/reference/as.cfactor.html
@@ -0,0 +1,114 @@
+
+cfactor coercion — as.cfactor • interlacer
+ Skip to contents
+
+
+
+
+
+
+
+
+
Add codes to a vector of labels
+
+
+
+
Usage
+
as.cfactor ( x , codes = NULL , ordered = is.ordered ( x ) )
+
+# S3 method for factor
+as.cfactor ( x , codes = NULL , ordered = is.ordered ( x ) )
+
+as.cordered ( x , codes = NULL )
+
+
+
+
Arguments
+
x
+a vector of values representing labels for factor levels
+
+
+codes
+named vector of unique codes that declares the mapping of labels
+to codes
+
+
+ordered
+logical flag to determine if the codes should be regarded as
+ordered (in the order given).
+
+
+
+
Value
+
+
+
a new cfactor
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/reference/as.codes.html b/reference/as.codes.html
new file mode 100644
index 0000000..7792188
--- /dev/null
+++ b/reference/as.codes.html
@@ -0,0 +1,107 @@
+
+Convert a cfactor vector into a vector of its codes — as.codes • interlacer
+ Skip to contents
+
+
+
+
+
+
+
+
+
+
+
Usage
+
as.codes ( x , ... )
+
+# S3 method for interlacer_interlaced
+as.codes ( x , ... )
+
+# S3 method for interlacer_cfactor
+as.codes ( x , ... )
+
+
+
+
Arguments
+
x
+a cfactor()
+
+
+...
+additional arguments (not used)
+
+
+
+
Value
+
+
+
a vector of coded values
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/reference/as.x_col_spec.html b/reference/as.x_col_spec.html
new file mode 100644
index 0000000..a73d587
--- /dev/null
+++ b/reference/as.x_col_spec.html
@@ -0,0 +1,108 @@
+
+Extended column specification coercions — as.x_col_spec • interlacer
+ Skip to contents
+
+
+
+
+
+
+
+
+
Coerce an object into a column specification. This is used internally to
+parse the col_types
argument in the read_interlaced_*()
family of
+functions, so that it can accept a readr::cols()
specification or list()
.
+
+
+
+
+
+
Arguments
+
x
+a value to coerce into an extended column specification
+
+
+
+
Value
+
+
+
an extended column specification
+
+
+
Details
+
It is an S3 function so that other packages may use the col_types
argument
+with their own custom objects.
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/reference/as.x_collector.html b/reference/as.x_collector.html
new file mode 100644
index 0000000..bc4411c
--- /dev/null
+++ b/reference/as.x_collector.html
@@ -0,0 +1,102 @@
+
+Collector shortcuts — as.na_collector • interlacer
+ Skip to contents
+
+
+
+
+
+
+
+
+
The as.*_collector
functions are used internally to enable shortcuts and
+defaults when specifying extended collectors. See
+vignette("extended-column-types")
for a full discussion.
+
+
+
+
Usage
+
as.na_collector ( x )
+
+as.value_collector ( x )
+
+as.x_collector ( x )
+
+
+
+
Arguments
+
x
+a value to convert into an extended collector, value collector,
+or missing reason collector.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/reference/cfactor.html b/reference/cfactor.html
new file mode 100644
index 0000000..8ab524b
--- /dev/null
+++ b/reference/cfactor.html
@@ -0,0 +1,115 @@
+
+Coded factors — cfactor • interlacer
+ Skip to contents
+
+
+
+
+
+
+
+
+
+
+
Usage
+
cfactor ( x = unspecified ( ) , codes , ordered = FALSE )
+
+cordered ( x , codes )
+
+is.cfactor ( x )
+
+is.cordered ( x )
+
+
+
+
Arguments
+
x
+a vector of character or numeric codes
+
+
+codes
+named vector of unique codes that declares the mapping of labels
+to codes
+
+
+ordered
+logical flag to determine if the codes should be regarded as
+ordered (in the order given).
+
+
+
+
Value
+
+
+
a new cfactor
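A minimal sketch of how the pieces documented here might fit together, assuming the color codes from the coded-data article (1: BLUE, 2: RED, 3: YELLOW). The variable names color_codes and colors are hypothetical, and output is deliberately not shown.

```r
library(interlacer, warn.conflicts = FALSE)

# Named vector mapping labels to codes
color_codes <- c(BLUE = 1, RED = 2, YELLOW = 3)

# Build a cfactor from a vector of raw codes
colors <- cfactor(c(1, 2, 3, 1), codes = color_codes)

levels(colors)   # character labels
codes(colors)    # named vector of codes
as.codes(colors) # back to the coded representation
```

The round trip through as.codes() is the main difference from a regular factor, which does not keep the original codes.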
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/reference/codes-set.html b/reference/codes-set.html
new file mode 100644
index 0000000..01ac725
--- /dev/null
+++ b/reference/codes-set.html
@@ -0,0 +1,91 @@
+
+Set the codes for a `cfactor` — codes<- • interlacer
+ Skip to contents
+
+
+
+
+
+
+
+
+
Set the codes for a cfactor
, similar to levels<-()
+
+
+
+
+
+
Arguments
+
value
+a named vector of codes for the cfactor
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/reference/codes.html b/reference/codes.html
new file mode 100644
index 0000000..3d3b557
--- /dev/null
+++ b/reference/codes.html
@@ -0,0 +1,108 @@
+
+cfactor attributes — codes • interlacer
+ Skip to contents
+
+
+
+
+
+
+
+
+
Return the levels or codes of a cfactor
+
+
+
+
Usage
+
codes ( x , ... )
+
+# S3 method for interlacer_cfactor
+levels ( x )
+
+
+
+
Arguments
+
x
+a cfactor
+
+
+...
+additional arguments (not used)
+
+
+
+
Value
+
+
+
levels()
returns the levels of the cfactor
(as a vector of
+character labels); codes()
returns a named vector representing the codes
+for the cfactor
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/reference/flatten_channels.html b/reference/flatten_channels.html
index c7962dc..0d551a2 100644
--- a/reference/flatten_channels.html
+++ b/reference/flatten_channels.html
@@ -14,7 +14,7 @@
interlacer
- 0.2.2
+ 0.3.0
@@ -31,7 +31,7 @@
Articles
@@ -39,7 +39,13 @@
-
+
@@ -47,7 +53,7 @@
diff --git a/reference/index.html b/reference/index.html
index bea6a81..875c705 100644
--- a/reference/index.html
+++ b/reference/index.html
@@ -10,7 +10,7 @@
interlacer
- 0.2.2
+ 0.3.0
@@ -27,7 +27,7 @@
Articles
@@ -35,7 +35,13 @@
-
+
@@ -68,24 +74,50 @@ Reading and writing interlaced data
Interlace a deinterlaced data frame and write it to a file
- na_cols()
as.na_col_spec()
is.na_col_spec()
+ parse_interlaced()
- Create an NA column specification
+ Parse a character
vector into an interlaced
vector type
- na_spec()
+ interlacer_example()
+
+ Get a path to one of interlacer's example data sets
+
+
Extended column type specification
+
+
+
+
+
+
Tidy helpers for interlaced
types
+
+
Wrappers for tidyselect selectors and dplyr verbs that target value or missing reason channels
+
+
+
+
The cfactor
type
+
+
Functions for working with the cfactor
type
+
+
+
diff --git a/reference/interlaced.html b/reference/interlaced.html
index 6a2f485..95e8ec6 100644
--- a/reference/interlaced.html
+++ b/reference/interlaced.html
@@ -16,7 +16,7 @@
interlacer
- 0.2.2
+ 0.3.0
@@ -33,7 +33,7 @@
Articles
@@ -41,7 +41,13 @@
-
+
@@ -49,7 +55,7 @@
diff --git a/reference/interlacer_example.html b/reference/interlacer_example.html
index b18e811..63d506d 100644
--- a/reference/interlacer_example.html
+++ b/reference/interlacer_example.html
@@ -12,7 +12,7 @@
interlacer
- 0.2.2
+ 0.3.0
@@ -29,7 +29,7 @@
Articles
@@ -37,7 +37,13 @@
-
+
@@ -45,7 +51,7 @@
diff --git a/reference/is.empty.html b/reference/is.empty.html
index 6ff5ab5..0b1319c 100644
--- a/reference/is.empty.html
+++ b/reference/is.empty.html
@@ -1,7 +1,7 @@
-NA missing reasons — is.empty • interlacer
+Test if a value is missing and lacks a missing reason — is.empty • interlacer
+Set the factor level attributes of interlaced vectors — levels<-.interlacer_interlaced • interlacer
+ Skip to contents
+
+
+
+
+
+
+
+
+
Set the factor level attributes of interlaced
vectors
+
+
+
+
Usage
+
# S3 method for interlacer_interlaced
+levels ( x ) <- value
+
+na_levels ( x ) <- value
+
+
+
+
Arguments
+
value
+A vector of new levels to set
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/reference/levels.interlacer_interlaced.html b/reference/levels.interlacer_interlaced.html
new file mode 100644
index 0000000..10241a4
--- /dev/null
+++ b/reference/levels.interlacer_interlaced.html
@@ -0,0 +1,109 @@
+
+Factor level attributes of interlaced vectors — levels.interlacer_interlaced • interlacer
+ Skip to contents
+
+
+
+
+
+
+
+
+
The base S3 levels()
function is overloaded for interlaced
vectors, so
+when the value channel is a factor type, levels()
will return its levels.
+Similarly na_levels()
will return the levels for the missing reason
+channel.
+
+
+
+
Usage
+
# S3 method for interlacer_interlaced
+levels ( x )
+
+na_levels ( x )
+
+
+
+
Arguments
+
x
+an interlaced
vector
+
+
+
+
Value
+
+
+
The levels of the value or missing reason channel
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/reference/map_value_channel.html b/reference/map_value_channel.html
index f5ac7e1..ad55a79 100644
--- a/reference/map_value_channel.html
+++ b/reference/map_value_channel.html
@@ -1,7 +1,7 @@
-interlaced functional utilities — map_value_channel • interlacer
+Apply a function to one of the channels of an interlaced vector — map_value_channel • interlacer
-Lift values to missing reasons — na • interlacer
+Interpret a value as a missing reason — na • interlacer
@@ -10,7 +10,7 @@
interlacer
- 0.2.2
+ 0.3.0
@@ -27,7 +27,7 @@
Articles
@@ -35,15 +35,21 @@
-
+
diff --git a/reference/na_collectors.html b/reference/na_collectors.html
new file mode 100644
index 0000000..25a2f3c
--- /dev/null
+++ b/reference/na_collectors.html
@@ -0,0 +1,121 @@
+
+Missing reason collectors — na_collectors • interlacer
+ Skip to contents
+
+
+
+
+
+
+
+
+
Missing reason collectors are used in extended column specifications to
+specify the type of a column's missing reason channel.
+
+
+
+
Usage
+
na_col_default ( )
+
+na_col_none ( )
+
+na_col_integer ( ... )
+
+na_col_factor ( ... )
+
+na_col_cfactor ( ... )
+
+
+
+
Arguments
+
...
+values to interpret as missing values. In the case of
+na_col_cfactor()
, arguments must be named.
+
+
+
+
Value
+
+
+
a new missing reason collector object
+
+
+
Details
+
na_col_default()
is used to signal that the missing reason type should
+inherit the specification provided in the na =
argument of the
+calling read_interlaced_*()
function
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/reference/parse_interlaced.html b/reference/parse_interlaced.html
index e896eda..b7563fa 100644
--- a/reference/parse_interlaced.html
+++ b/reference/parse_interlaced.html
@@ -12,7 +12,7 @@
interlacer
- 0.2.2
+ 0.3.0
@@ -29,7 +29,7 @@
Articles
@@ -37,7 +37,13 @@
-
+
@@ -45,7 +51,7 @@
@@ -56,7 +62,7 @@
Usage
-
parse_interlaced ( x , na , .value_col = col_guess ( ) )
+
diff --git a/reference/read_interlaced_delim.html b/reference/read_interlaced_delim.html
index f8244e7..a5d5d51 100644
--- a/reference/read_interlaced_delim.html
+++ b/reference/read_interlaced_delim.html
@@ -16,7 +16,7 @@
interlacer
-
0.2.2
+
0.3.0
@@ -33,7 +33,7 @@
Articles
@@ -41,7 +41,13 @@
-
+
@@ -49,7 +55,7 @@
@@ -73,7 +79,7 @@ Usage
col_select = NULL ,
id = NULL ,
locale = readr :: default_locale ( ) ,
- na = c ( "" , "NA" ) ,
+ na = na_col_none ( ) ,
comment = "" ,
trim_ws = FALSE ,
skip = 0 ,
@@ -92,7 +98,7 @@ Usage
col_select = NULL ,
id = NULL ,
locale = readr :: default_locale ( ) ,
- na = c ( "" , "NA" ) ,
+ na = na_col_none ( ) ,
quote = "\"" ,
comment = "" ,
trim_ws = TRUE ,
@@ -112,7 +118,7 @@ Usage
col_select = NULL ,
id = NULL ,
locale = readr :: default_locale ( ) ,
- na = c ( "" , "NA" ) ,
+ na = na_col_none ( ) ,
quote = "\"" ,
comment = "" ,
trim_ws = TRUE ,
@@ -132,7 +138,7 @@ Usage
col_select = NULL ,
id = NULL ,
locale = readr :: default_locale ( ) ,
- na = c ( "" , "NA" ) ,
+ na = na_col_none ( ) ,
quote = "\"" ,
comment = "" ,
trim_ws = TRUE ,
@@ -154,7 +160,7 @@ Usage
id = NULL ,
skip = 0 ,
n_max = Inf ,
- na = c ( "" , "NA" ) ,
+ na = na_col_none ( ) ,
quote = "\"" ,
comment = "" ,
skip_empty_rows = TRUE ,
@@ -274,7 +280,7 @@ Argumentsna_cols() or a character or numeric
+An NA col spec defined by na_cols()
or a character or numeric
vector of values to interpret as missing values.
diff --git a/reference/reexports.html b/reference/reexports.html
index 8cf04e1..1b93a28 100644
--- a/reference/reexports.html
+++ b/reference/reexports.html
@@ -6,10 +6,6 @@
as.factor, as.ordered
- readr
-as.col_spec, col_character, col_date, col_datetime, col_double, col_factor, col_guess, col_integer, col_logical, col_number, col_skip, col_time, cols, cols_condense, cols_only, spec
-
-
vctrs
vec_c
@@ -21,10 +17,6 @@
as.factor, as.ordered
- readr
-as.col_spec, col_character, col_date, col_datetime, col_double, col_factor, col_guess, col_integer, col_logical, col_number, col_skip, col_time, cols, cols_condense, cols_only, spec
-
-
vctrs
vec_c
@@ -40,7 +32,7 @@
interlacer
- 0.2.2
+ 0.3.0
@@ -57,7 +49,7 @@
Articles
@@ -65,7 +57,13 @@
-
+
@@ -73,7 +71,7 @@
@@ -84,10 +82,6 @@
as.factor
, as.ordered
- readr
-as.col_spec
, col_character
, col_date
, col_datetime
, col_double
, col_factor
, col_guess
, col_integer
, col_logical
, col_number
, col_skip
, col_time
, cols
, cols_condense
, cols_only
, spec
-
-
vctrs
vec_c
diff --git a/reference/value_channel.html b/reference/value_channel.html
index 0cb3ea9..3023a7a 100644
--- a/reference/value_channel.html
+++ b/reference/value_channel.html
@@ -20,7 +20,7 @@
interlacer
- 0.2.2
+ 0.3.0
@@ -37,7 +37,7 @@
Articles
@@ -45,7 +45,13 @@
-
+
@@ -53,7 +59,7 @@
diff --git a/reference/value_collectors.html b/reference/value_collectors.html
new file mode 100644
index 0000000..2e32a5b
--- /dev/null
+++ b/reference/value_collectors.html
@@ -0,0 +1,153 @@
+
+Value collectors — value_collectors • interlacer
+ Skip to contents
+
+
+
+
+
+
+
+
+
Value collectors are used in extended column specifications to specify the
+value type of a column. They are thin wrappers around readr's col_*()
+collector types.
+
+
+
+
Usage
+
v_col_guess ( )
+
+v_col_cfactor ( codes , ordered = FALSE )
+
+v_col_character ( )
+
+v_col_date ( format = "" )
+
+v_col_datetime ( format = "" )
+
+v_col_double ( )
+
+v_col_factor ( levels = NULL , ordered = FALSE )
+
+v_col_integer ( )
+
+v_col_big_integer ( )
+
+v_col_logical ( )
+
+v_col_number ( )
+
+v_col_skip ( )
+
+v_col_time ( format = "" )
+
+
+
+
Arguments
+
codes
+A named vector of unique codes that declares the mapping of labels
+to codes.
+
+
+ordered
+Is it an ordered factor?
+
+
+format
+A format specification, as described in readr::col_datetime()
+
+
+levels
+Character vector of the allowed levels. When levels = NULL
+(the default), levels are discovered from the unique values of x, in the
+order in which they appear in x.
+
+
+
+
Value
+
+
+
a new value collector object
+
+
+
Details
+
In addition to all of the column types supported by readr, interlacer
+can also load cfactor()
types via v_col_cfactor()
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/reference/where_value_channel.html b/reference/where_value_channel.html
new file mode 100644
index 0000000..56c8b78
--- /dev/null
+++ b/reference/where_value_channel.html
@@ -0,0 +1,105 @@
+
+Select variables with a function applied on value or missing reason channels — where_value_channel • interlacer
+ Skip to contents
+
+
+
+
+
+
+
+
+
where_value_channel()
and where_na_channel()
are simple wrappers for
+tidyselect::where()
that apply the selection function to the value or
+missing reason channel of columns, respectively.
+
+
+
+
Usage
+
where_value_channel ( fn )
+
+where_na_channel ( fn )
+
+
+
+
Arguments
+
fn
+A function that returns TRUE
or FALSE
(technically, a
+predicate function). Can also be a purrr-like formula.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/reference/write_interlaced_delim.html b/reference/write_interlaced_delim.html
index ff1362d..8407811 100644
--- a/reference/write_interlaced_delim.html
+++ b/reference/write_interlaced_delim.html
@@ -16,7 +16,7 @@
interlacer
- 0.2.2
+ 0.3.0
@@ -33,7 +33,7 @@
Articles
@@ -41,7 +41,13 @@
-
+
@@ -49,7 +55,7 @@
diff --git a/reference/x_col.html b/reference/x_col.html
new file mode 100644
index 0000000..2ccab8a
--- /dev/null
+++ b/reference/x_col.html
@@ -0,0 +1,113 @@
+
+Construct an extended collector for an extended column specification — x_col • interlacer
+ Skip to contents
+
+
+
+
+
+
+
+
+
Extended collectors are used in x_cols()
column specifications to indicate
+which value and missing reason channel types should be used when loading
+data with read_interlaced_*()
.
+
+
+
+
+
+
Arguments
+
value_collector
+a value collector
+
+
+na_collector
+a missing reason collector
+
+
+
+
Value
+
+
+
a new extended collector object
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/reference/x_cols.html b/reference/x_cols.html
new file mode 100644
index 0000000..5f693f5
--- /dev/null
+++ b/reference/x_cols.html
@@ -0,0 +1,122 @@
+
+Construct an extended column specification — x_cols • interlacer
+ Skip to contents
+
+
+
+
+
+
+
+
+
Extended column specifications are used in the read_interlaced_*()
family
+of functions in the col_types
argument to specify the value and missing
+reason channel types.
+
+
+
+
Usage
+
x_cols ( ... , .default = v_col_guess ( ) )
+
+x_cols_only ( ... )
+
+
+
+
Arguments
+
...
+a named argument list of extended collectors or value collectors.
+
+
+.default
+a default value collector
+
+
+
+
Value
+
+
+
a new extended column specification
+
+
+
Details
+
Like readr::cols()
, x_cols()
includes all the columns in the input data,
+guessing the column types as the default, and creating missing reason
+channels according to the na =
argument in the read function.
+x_cols_only()
only includes the columns you specify, like
+readr::cols_only()
. In general, you can substitute list()
for x_cols()
+without changing the behavior.
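To make the description above concrete, here is a hedged sketch of an extended column specification for the coded colors example. It assumes the bundled colors_coded.csv file and the code tables given in the coded-data article; the name df is hypothetical and the resulting tibble is not reproduced here.

```r
library(interlacer, warn.conflicts = FALSE)

df <- read_interlaced_csv(
  interlacer_example("colors_coded.csv"),
  # Columns not listed fall back to the default value collector, v_col_guess()
  col_types = x_cols(
    favorite_color = v_col_cfactor(codes = c(BLUE = 1, RED = 2, YELLOW = 3))
  ),
  # Missing reason codes shared by all columns; arguments must be named
  na = na_col_cfactor(`N/A` = -99, REFUSED = -98, OMITTED = -97)
)

# Inspect the full specification that was used
x_spec(df)
```

Swapping x_cols() for x_cols_only() here should keep only favorite_color, mirroring readr::cols_only().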
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/reference/x_spec.html b/reference/x_spec.html
new file mode 100644
index 0000000..5d65a64
--- /dev/null
+++ b/reference/x_spec.html
@@ -0,0 +1,104 @@
+
+Examine the extended column specification for a data frame — x_spec • interlacer
+ Skip to contents
+
+
+
+
+
+
+
+
+
x_spec()
extracts the full extended column specification from a tibble
+created with the read_interlaced_*()
family of functions.
+
+
+
+
+
+
Arguments
+
x
+a data frame loaded by read_interlaced_*()
+
+
+
+
Value
+
+
+
An extended column specification object
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/search.json b/search.json
index 90bcf38..3960250 100644
--- a/search.json
+++ b/search.json
@@ -1 +1 @@
-[{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"Apache License","title":"Apache License","text":"Version 2.0, January 2004 ","code":""},{"path":[]},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_1-definitions","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"1. Definitions","title":"Apache License","text":"“License” shall mean terms conditions use, reproduction, distribution defined Sections 1 9 document. “Licensor” shall mean copyright owner entity authorized copyright owner granting License. “Legal Entity” shall mean union acting entity entities control, controlled , common control entity. purposes definition, “control” means () power, direct indirect, cause direction management entity, whether contract otherwise, (ii) ownership fifty percent (50%) outstanding shares, (iii) beneficial ownership entity. “” (“”) shall mean individual Legal Entity exercising permissions granted License. “Source” form shall mean preferred form making modifications, including limited software source code, documentation source, configuration files. “Object” form shall mean form resulting mechanical transformation translation Source form, including limited compiled object code, generated documentation, conversions media types. “Work” shall mean work authorship, whether Source Object form, made available License, indicated copyright notice included attached work (example provided Appendix ). “Derivative Works” shall mean work, whether Source Object form, based (derived ) Work editorial revisions, annotations, elaborations, modifications represent, whole, original work authorship. purposes License, Derivative Works shall include works remain separable , merely link (bind name) interfaces , Work Derivative Works thereof. 
“Contribution” shall mean work authorship, including original version Work modifications additions Work Derivative Works thereof, intentionally submitted Licensor inclusion Work copyright owner individual Legal Entity authorized submit behalf copyright owner. purposes definition, “submitted” means form electronic, verbal, written communication sent Licensor representatives, including limited communication electronic mailing lists, source code control systems, issue tracking systems managed , behalf , Licensor purpose discussing improving Work, excluding communication conspicuously marked otherwise designated writing copyright owner “Contribution.” “Contributor” shall mean Licensor individual Legal Entity behalf Contribution received Licensor subsequently incorporated within Work.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_2-grant-of-copyright-license","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"2. Grant of Copyright License","title":"Apache License","text":"Subject terms conditions License, Contributor hereby grants perpetual, worldwide, non-exclusive, -charge, royalty-free, irrevocable copyright license reproduce, prepare Derivative Works , publicly display, publicly perform, sublicense, distribute Work Derivative Works Source Object form.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_3-grant-of-patent-license","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"3. 
Grant of Patent License","title":"Apache License","text":"Subject terms conditions License, Contributor hereby grants perpetual, worldwide, non-exclusive, -charge, royalty-free, irrevocable (except stated section) patent license make, made, use, offer sell, sell, import, otherwise transfer Work, license applies patent claims licensable Contributor necessarily infringed Contribution(s) alone combination Contribution(s) Work Contribution(s) submitted. institute patent litigation entity (including cross-claim counterclaim lawsuit) alleging Work Contribution incorporated within Work constitutes direct contributory patent infringement, patent licenses granted License Work shall terminate date litigation filed.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_4-redistribution","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"4. Redistribution","title":"Apache License","text":"may reproduce distribute copies Work Derivative Works thereof medium, without modifications, Source Object form, provided meet following conditions: () must give recipients Work Derivative Works copy License; (b) must cause modified files carry prominent notices stating changed files; (c) must retain, Source form Derivative Works distribute, copyright, patent, trademark, attribution notices Source form Work, excluding notices pertain part Derivative Works; (d) Work includes “NOTICE” text file part distribution, Derivative Works distribute must include readable copy attribution notices contained within NOTICE file, excluding notices pertain part Derivative Works, least one following places: within NOTICE text file distributed part Derivative Works; within Source form documentation, provided along Derivative Works; , within display generated Derivative Works, wherever third-party notices normally appear. contents NOTICE file informational purposes modify License. 
may add attribution notices within Derivative Works distribute, alongside addendum NOTICE text Work, provided additional attribution notices construed modifying License. may add copyright statement modifications may provide additional different license terms conditions use, reproduction, distribution modifications, Derivative Works whole, provided use, reproduction, distribution Work otherwise complies conditions stated License.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_5-submission-of-contributions","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"5. Submission of Contributions","title":"Apache License","text":"Unless explicitly state otherwise, Contribution intentionally submitted inclusion Work Licensor shall terms conditions License, without additional terms conditions. Notwithstanding , nothing herein shall supersede modify terms separate license agreement may executed Licensor regarding Contributions.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_6-trademarks","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"6. Trademarks","title":"Apache License","text":"License grant permission use trade names, trademarks, service marks, product names Licensor, except required reasonable customary use describing origin Work reproducing content NOTICE file.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_7-disclaimer-of-warranty","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"7. 
Disclaimer of Warranty","title":"Apache License","text":"Unless required applicable law agreed writing, Licensor provides Work (Contributor provides Contributions) “” BASIS, WITHOUT WARRANTIES CONDITIONS KIND, either express implied, including, without limitation, warranties conditions TITLE, NON-INFRINGEMENT, MERCHANTABILITY, FITNESS PARTICULAR PURPOSE. solely responsible determining appropriateness using redistributing Work assume risks associated exercise permissions License.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_8-limitation-of-liability","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"8. Limitation of Liability","title":"Apache License","text":"event legal theory, whether tort (including negligence), contract, otherwise, unless required applicable law (deliberate grossly negligent acts) agreed writing, shall Contributor liable damages, including direct, indirect, special, incidental, consequential damages character arising result License use inability use Work (including limited damages loss goodwill, work stoppage, computer failure malfunction, commercial damages losses), even Contributor advised possibility damages.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_9-accepting-warranty-or-additional-liability","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"9. Accepting Warranty or Additional Liability","title":"Apache License","text":"redistributing Work Derivative Works thereof, may choose offer, charge fee , acceptance support, warranty, indemnity, liability obligations /rights consistent License. However, accepting obligations, may act behalf sole responsibility, behalf Contributor, agree indemnify, defend, hold Contributor harmless liability incurred , claims asserted , Contributor reason accepting warranty additional liability. 
END TERMS CONDITIONS","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"appendix-how-to-apply-the-apache-license-to-your-work","dir":"","previous_headings":"","what":"APPENDIX: How to apply the Apache License to your work","title":"Apache License","text":"apply Apache License work, attach following boilerplate notice, fields enclosed brackets [] replaced identifying information. (Don’t include brackets!) text enclosed appropriate comment syntax file format. also recommend file class name description purpose included “printed page” copyright notice easier identification within third-party archives.","code":"Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"numeric-codes-with-negative-missing-reasons-spss","dir":"Articles","previous_headings":"","what":"Numeric codes with negative missing reasons (SPSS)","title":"Coded Data","text":"’s extremely common find data sources encode categorical responses numeric values, negative values representing missing reason codes. SPSS one example. ’s SPSS-formatted version colors.csv example: missing reasons : -99: N/-98: REFUSED -97: OMITTED colors coded: 1: BLUE 2: RED 3: YELLOW format gives ability load everything numeric type: test value missing code, can check ’s less 0: downsides approach twofold: 1) values missing reasons become codes remember 2) ’s really easy make mistakes. sort mistakes? 
Well, everything numeric, ’s nothing stopping us treating missing reason codes regular values… forget remove missing reason codes, R still happily compute aggregations using negative numbers! fact, math without filtering missing codes potentially ruins integrity data: ever thought significant result, find ’s stray missing reason codes still interlaced values? ’s bad time. ’re much better loading formats interlacer factors, converting codes labels: Now aggregations won’t mix values missing codes, won’t keep cross-referencing codebook know values mean: operations work similar ease:","code":"library(readr) library(interlacer, warn.conflicts = FALSE) read_file( interlacer_example(\"colors_coded.csv\") ) |> cat() #> person_id,age,favorite_color #> 1,20,1 #> 2,-98,1 #> 3,21,-98 #> 4,30,-97 #> 5,1,-99 #> 6,41,2 #> 7,50,-97 #> 8,30,3 #> 9,-98,-98 #> 10,-97,2 #> 11,10,-98 (df_coded <- read_csv( interlacer_example(\"colors_coded.csv\"), col_types = \"n\" )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 #> 2 2 -98 1 #> 3 3 21 -98 #> 4 4 30 -97 #> 5 5 1 -99 #> 6 6 41 2 #> 7 7 50 -97 #> 8 8 30 3 #> 9 9 -98 -98 #> 10 10 -97 2 #> 11 11 10 -98 library(dplyr, warn.conflicts = FALSE) df_coded |> mutate( age = if_else(age > 0, age, NA) ) |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> # A tibble: 6 × 3 #> favorite_color mean_age n #> #> 1 -99 1 1 #> 2 -98 15.5 3 #> 3 -97 40 2 #> 4 1 20 2 #> 5 2 41 2 #> 6 3 30 1 df_coded |> mutate( # age = if_else(age > 0, age, NA) ) |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> # A tibble: 6 × 3 #> favorite_color mean_age n #> #> 1 -99 1 1 #> 2 -98 -22.3 3 #> 3 -97 40 2 #> 4 1 -39 2 #> 5 2 -28 2 #> 6 3 30 1 # This will add 1 to the age values, but ALSO add one to all of the missing # reason codes, resulting in corrupted data! 
df_coded |> mutate( age_next_year = age + 1, ) #> # A tibble: 11 × 4 #> person_id age favorite_color age_next_year #> #> 1 1 20 1 21 #> 2 2 -98 1 -97 #> 3 3 21 -98 22 #> 4 4 30 -97 31 #> 5 5 1 -99 2 #> 6 6 41 2 42 #> 7 7 50 -97 51 #> 8 8 30 3 31 #> 9 9 -98 -98 -97 #> 10 10 -97 2 -96 #> 11 11 10 -98 11 # This will give you your intended result, but it's easy to forget df_coded |> mutate( age_next_year = if_else(age < 0, age, age + 1), ) #> # A tibble: 11 × 4 #> person_id age favorite_color age_next_year #> #> 1 1 20 1 21 #> 2 2 -98 1 -98 #> 3 3 21 -98 22 #> 4 4 30 -97 31 #> 5 5 1 -99 2 #> 6 6 41 2 42 #> 7 7 50 -97 51 #> 8 8 30 3 31 #> 9 9 -98 -98 -98 #> 10 10 -97 2 -97 #> 11 11 10 -98 11 (df_decoded <- read_interlaced_csv( interlacer_example(\"colors_coded.csv\"), na = c(-99, -98, -97), show_col_types = FALSE, ) |> mutate( across( everything(), \\(x) map_na_channel( x, \\(v) factor( v, levels = c(-99, -98, -97), labels = c(\"N/A\", \"REFUSED\", \"OMITTED\"), ) ) ), favorite_color = map_value_channel( favorite_color, \\(v) factor( v, levels = c(1, 2, 3), labels = c(\"BLUE\", \"RED\", \"YELLOW\") ) ), )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 BLUE #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 RED #> 11 11 10 df_decoded |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> # A tibble: 6 × 3 #> favorite_color mean_age n #> #> 1 BLUE 20 2 #> 2 RED 41 2 #> 3 YELLOW 30 1 #> 4 1 1 #> 5 15.5 3 #> 6 40 2 df_decoded |> mutate( age_next_year = age + 1, ) #> # A tibble: 11 × 4 #> person_id age favorite_color age_next_year #> #> 1 1 20 BLUE 21 #> 2 2 BLUE NA #> 3 3 21 22 #> 4 4 30 31 #> 5 5 1 2 #> 6 6 41 RED 42 #> 7 7 50 51 #> 8 8 30 YELLOW 31 #> 9 9 NA #> 10 10 RED NA #> 11 11 10 
11"},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"numeric-codes-with-character-missing-reasons-sas-stata","dir":"Articles","previous_headings":"","what":"Numeric codes with character missing reasons (SAS, Stata)","title":"Coded Data","text":"Like SPSS, SAS Stata encode factor levels numeric values, instead representing missing reasons negative codes, given character codes: , value codes used previous example, except missing reasons coded follows: \".\": N/\".\": REFUSED \".b\": OMITTED handle missing reasons without interlacer, columns must loaded character vectors: test value missing, can cast numeric types. cast fails, know ’s missing code. successful, know ’s coded value. Although character missing codes help prevent us mistakenly including missing codes value aggregations, cast columns numeric time check missingness hardly ergonomic, generates annoying warnings. Like , ’s easier import interlacer decode values missing reasons:","code":"read_file( interlacer_example(\"colors_coded_char.csv\") ) |> cat() #> person_id,age,favorite_color #> 1,20,1 #> 2,.a,1 #> 3,21,.a #> 4,30,.b #> 5,1,. #> 6,41,2 #> 7,50,.b #> 8,30,3 #> 9,.a,.a #> 10,.b,2 #> 11,10,.a (df_coded_char <- read_csv( interlacer_example(\"colors_coded_char.csv\"), col_types = \"c\" )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 #> 2 2 .a 1 #> 3 3 21 .a #> 4 4 30 .b #> 5 5 1 . #> 6 6 41 2 #> 7 7 50 .b #> 8 8 30 3 #> 9 9 .a .a #> 10 10 .b 2 #> 11 11 10 .a df_coded_char |> mutate( age = if_else(!is.na(as.numeric(age)), as.numeric(age), NA) ) |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> Warning: There were 2 warnings in `mutate()`. #> The first warning was: #> ℹ In argument: `age = if_else(!is.na(as.numeric(age)), as.numeric(age), NA)`. #> Caused by warning in `is_logical()`: #> ! NAs introduced by coercion #> ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning. 
#> # A tibble: 6 × 3 #> favorite_color mean_age n #> #> 1 . 1 1 #> 2 .a 15.5 3 #> 3 .b 40 2 #> 4 1 20 2 #> 5 2 41 2 #> 6 3 30 1 read_interlaced_csv( interlacer_example(\"colors_coded_char.csv\"), na = c(\".\", \".a\", \".b\"), show_col_types = FALSE, ) |> mutate( across( everything(), \\(x) map_na_channel( x, \\(v) factor( v, levels = c(\".\", \".a\", \".b\"), labels = c(\"N/A\", \"REFUSED\", \"OMITTED\") ) ) ), favorite_color = map_value_channel( favorite_color, \\(v) factor( v, levels = c(1, 2, 3), labels = c(\"BLUE\", \"RED\", \"YELLOW\") ) ) ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 BLUE #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 RED #> 11 11 10 "},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"encoding-a-decoded-deinterlaced-data-frame-","dir":"Articles","previous_headings":"","what":"Encoding a decoded & deinterlaced data frame.","title":"Coded Data","text":"Re-coding re-interlacing data frame can done follows:","code":"library(forcats) df_decoded |> mutate( across( everything(), \\(x) map_na_channel( x, \\(v) fct_recode(v, `-99` = \"N/A\", `-98` = \"REFUSED\", `-97` = \"OMITTED\" ) ) ), favorite_color = map_value_channel( favorite_color, \\(v) fct_recode( v, `1` = \"BLUE\", `2` = \"RED\", `3` = \"YELLOW\" ) ) ) |> write_interlaced_csv(\"output.csv\")"},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"haven","dir":"Articles","previous_headings":"","what":"haven","title":"Coded Data","text":"haven package functions loading native SPSS, SAS, Stata native file formats special data frames use column attributes special values keep track interlaced values missing reasons. 
complete discussion compares interlacer’s approach, see vignette(\"-approaches\").","code":""},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"aggregations-with-missing-reasons","dir":"Articles","previous_headings":"","what":"Aggregations with missing reasons","title":"Introduction to interlacer","text":"Now, interested values source data, functionality need. wanted know values NA? Although information encoded source data, lost missing reasons converted NA values. example, consider favorite_color column. many respondents REFUSED give favorite color? many people just OMITTED answer? question N/respondents (e.g. wasn’t survey form)? mean respondent age groups? current dataframe gets us part way: can see, converted missing reasons single NA, can answer questions missingness general, rather work specific reasons stored source data. Unfortunately, try load data missing reasons intact, lose something else: type information values. Now access missing reasons, columns character vectors. means order anything values, always filter missing reasons, cast remaining values desired type: gives us information want, cumbersome. Notice ’s distinction favorite color values missing reasons! Things start get really complex different columns different sets possible missing reasons. 
means lot type conversion gymnastics switch value types missing types.","code":"library(dplyr, warn.conflicts = FALSE) df_simple |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> # A tibble: 4 × 3 #> favorite_color mean_age n #> #> 1 BLUE 20 2 #> 2 RED 41 2 #> 3 YELLOW 30 1 #> 4 NA 22.4 6 (df_with_missing <- read_csv( interlacer_example(\"colors.csv\"), col_types = cols(.default = \"c\"), show_col_types = FALSE )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 REFUSED BLUE #> 3 3 21 REFUSED #> 4 4 30 OMITTED #> 5 5 1 N/A #> 6 6 41 RED #> 7 7 50 OMITTED #> 8 8 30 YELLOW #> 9 9 REFUSED REFUSED #> 10 10 OMITTED RED #> 11 11 10 REFUSED reasons <- c(\"REFUSED\", \"OMITTED\", \"N/A\") df_with_missing |> mutate( age_values = as.numeric(if_else(age %in% reasons, NA, age)), ) |> summarize( mean_age = mean(age_values, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> # A tibble: 6 × 3 #> favorite_color mean_age n #> #> 1 BLUE 20 2 #> 2 N/A 1 1 #> 3 OMITTED 40 2 #> 4 RED 41 2 #> 5 REFUSED 15.5 3 #> 6 YELLOW 30 1"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"the-interlacer-approach","dir":"Articles","previous_headings":"Aggregations with missing reasons","what":"The interlacer approach","title":"Introduction to interlacer","text":"interlacer built based insight everything becomes much tidy, simple, expressive explicitly work values missing reasons separate channels variable. interlacer introduces new interlaced column type facilitates . read_interlaced_* functions interlacer import data new column type. can see column headers, column loaded composed two channels: value channel, missing reason channel. channel can type. 
age column, example, double values factor missing reasons: channels can explicitly accessed using value_channel() na_channel() helper functions: helpers rarely needed, however, computations automatically operate interlaced column’s value channel, ignore missing reasons channel. following compute mean age, without missing reasons interfering: (equivalently used value_channel() helper achieve result, albeit verbosity): Although missing reasons excluded computations, still treated unique values. means group age get breakdown unique missing reasons, rather lumped single NA: can see, can generate report , without needing type gymnastics! Also, values neatly distinguished missing reasons.","code":"(df <- read_interlaced_csv( interlacer_example(\"colors.csv\"), na = c(\"REFUSED\", \"OMITTED\", \"N/A\"), show_col_types = FALSE )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 BLUE #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 RED #> 11 11 10 df$age #> [11]> #> [1] 20 21 30 1 41 50 #> [8] 30 10 #> NA levels: REFUSED OMITTED N/A value_channel(df$age) #> [1] 20 NA 21 30 1 41 50 30 NA NA 10 na_channel(df$age) #> [1] REFUSED REFUSED #> [10] OMITTED #> Levels: REFUSED OMITTED N/A mean(df$age, na.rm = TRUE) #> [1] 25.375 mean(value_channel(df$age), na.rm = TRUE) #> [1] 25.375 df |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> # A tibble: 6 × 3 #> favorite_color mean_age n #> #> 1 BLUE 20 2 #> 2 RED 41 2 #> 3 YELLOW 30 1 #> 4 15.5 3 #> 5 40 2 #> 6 1 1"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"filtering-based-on-missing-reasons","dir":"Articles","previous_headings":"","what":"Filtering based on missing reasons","title":"Introduction to interlacer","text":"interlaced columns also helpful creating samples inclusion / exclusion criteria based missing reasons. 
example, using example data, say wanted create sample respondents REFUSED give age. indicate value interpreted missing reason, can use na() function value: people REFUSED report age favorite color? ’s also possible combine value conditions missing reason conditions. example, select everyone REFUSED give favorite color, 20 years old:","code":"df |> filter(age == na(\"REFUSED\")) #> # A tibble: 2 × 3 #> person_id age favorite_color #> #> 1 2 BLUE #> 2 9 # na_channel() can also be used to get an equivalent result: df |> filter(na_channel(age) == \"REFUSED\") #> # A tibble: 2 × 3 #> person_id age favorite_color #> #> 1 2 BLUE #> 2 9 df |> filter(age == na(\"REFUSED\") & favorite_color == na(\"REFUSED\")) #> # A tibble: 1 × 3 #> person_id age favorite_color #> #> 1 9 df |> filter(age > 20 & favorite_color == na(\"REFUSED\")) #> # A tibble: 1 × 3 #> person_id age favorite_color #> #> 1 3 21 "},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"mutations","dir":"Articles","previous_headings":"","what":"Mutations","title":"Introduction to interlacer","text":"might expect, na() function can used values mutations. following pipeline replace favorite color respondents missing value \"REDACTED\" Conditionals also work exactly expect mutations. following replace favorite color respondents age < 18 missing reason \"REDACTED_UNDERAGE\". 
Respondents missing age replaced \"REDACTED_MISSING_AGE\" following mutation create new column called person_type \"CHILD\" age < 18, \"ADULT\" age >= 18, missing reason \"AGE_UNAVAILABLE\" age missing: Important note: must use dplyr::if_else() interlaced vectors instead R’s base::ifelse() function, base function strips missing reason channel due fundamental limitation base R.","code":"df |> mutate( favorite_color = na(\"REDACTED\") ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> ??,fct> #> 1 1 20 #> 2 2 #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 #> 7 7 50 #> 8 8 30 #> 9 9 #> 10 10 #> 11 11 10 df |> mutate( favorite_color = if_else( age < 18, na(\"REDACTED_UNDERAGE\"), favorite_color, missing = na(\"REDACTED_MISSING_AGE\") ) ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 #> 11 11 10 df |> mutate( person_type = if_else( age < 18, \"CHILD\", \"ADULT\", missing = na(\"AGE_UNAVAILABLE\") ), ) #> # A tibble: 11 × 4 #> person_id age favorite_color person_type #> #> 1 1 20 BLUE ADULT #> 2 2 BLUE #> 3 3 21 ADULT #> 4 4 30 ADULT #> 5 5 1 CHILD #> 6 6 41 RED ADULT #> 7 7 50 ADULT #> 8 8 30 YELLOW ADULT #> 9 9 #> 10 10 RED #> 11 11 10 CHILD"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"empty-cells-na-missing-reasons","dir":"Articles","previous_headings":"","what":"Empty cells (NA missing reasons)","title":"Introduction to interlacer","text":"cell column missing value missing reason, cell considered “empty”. values can occur missing reasons specified. example, include missing = argument second example previous section, get following result: Empty values can detected using .empty() function: Raw NA values also considered “empty”: Empty values often occur result joins, dplyr::*_join() family functions missing = parameter, like dplyr::if_else() . 
example, say following data frame wanted join sample: ’re missing condition information respondents, show empty values join data frame sample: can remedy replacing empty values join:","code":"df |> mutate( favorite_color = if_else( age < 18, na(\"REDACTED_UNDERAGE\"), favorite_color, ) ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 <> #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 <> #> 10 10 <> #> 11 11 10 df |> mutate( favorite_color = if_else( age < 18, na(\"REDACTED_UNDERAGE\"), favorite_color, ) ) |> filter(is.empty(favorite_color)) #> # A tibble: 3 × 3 #> person_id age favorite_color #> #> 1 2 <> #> 2 9 <> #> 3 10 <> # regular values are neither missing nor empty is.na(42) #> [1] FALSE is.empty(42) #> [1] FALSE # na(\"REASON\") is a missing value, but is not an empty value is.na(na(\"REASON\")) #> [1] TRUE is.empty(na(\"REASON\")) #> [1] FALSE # na(NA) values are considered missing and empty is.na(na(NA)) #> [1] TRUE is.empty(na(NA)) #> [1] TRUE # regular NA values are also missing and empty is.na(NA) #> [1] TRUE is.empty(NA) #> [1] TRUE conditions <- tribble( ~person_id, ~condition, 1, \"TREATMENT\", 2, \"CONTROL\", 3, na(\"TECHNICAL_ERROR\"), 6, \"CONTROL\", 8, \"TREATMENT\", ) df |> left_join(conditions, by = join_by(person_id)) #> # A tibble: 11 × 4 #> person_id age favorite_color condition #> #> 1 1 20 BLUE TREATMENT #> 2 2 BLUE CONTROL #> 3 3 21 #> 4 4 30 <> #> 5 5 1 <> #> 6 6 41 RED CONTROL #> 7 7 50 <> #> 8 8 30 YELLOW TREATMENT #> 9 9 <> #> 10 10 RED <> #> 11 11 10 <> df |> left_join(conditions, by = join_by(person_id)) |> mutate( condition = if_else(is.empty(condition), na(\"LEFT_STUDY\"), condition), ) #> # A tibble: 11 × 4 #> person_id age favorite_color condition #> #> 1 1 20 BLUE TREATMENT #> 2 2 BLUE CONTROL #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED CONTROL #> 7 7 50 #> 8 8 30 YELLOW TREATMENT #> 9 9 #> 10 10 RED #> 11 11 10 
"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"writing-interlaced-files","dir":"Articles","previous_headings":"","what":"Writing interlaced files","title":"Introduction to interlacer","text":"’ve made changes data, probably want save . interlacer provides write_interlaced_* family functions : combine value missing reasons interlaced character columns, write result csv. Alternatively, want re-interlace columns without writing file control writing process, can use flatten_channels(): value missing reason channels data frames interlaced vectors can similarly accessed using value_channel() na_channel() helper functions:","code":"write_interlaced_csv(df, \"interlaced_output.csv\") flatten_channels(df) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 REFUSED BLUE #> 3 3 21 REFUSED #> 4 4 30 OMITTED #> 5 5 1 N/A #> 6 6 41 RED #> 7 7 50 OMITTED #> 8 8 30 YELLOW #> 9 9 REFUSED REFUSED #> 10 10 OMITTED RED #> 11 11 10 REFUSED # (it works on single vectors as well) flatten_channels(df$age) #> [1] \"20\" \"REFUSED\" \"21\" \"30\" \"1\" \"41\" \"50\" #> [8] \"30\" \"REFUSED\" \"OMITTED\" \"10\" value_channel(df) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 NA BLUE #> 3 3 21 NA #> 4 4 30 NA #> 5 5 1 NA #> 6 6 41 RED #> 7 7 50 NA #> 8 8 30 YELLOW #> 9 9 NA NA #> 10 10 NA RED #> 11 11 10 NA na_channel(df) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 NA NA NA #> 2 NA REFUSED NA #> 3 NA NA REFUSED #> 4 NA NA OMITTED #> 5 NA NA N/A #> 6 NA NA NA #> 7 NA NA OMITTED #> 8 NA NA NA #> 9 NA REFUSED REFUSED #> 10 NA OMITTED NA #> 11 NA NA REFUSED"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"next-steps","dir":"Articles","previous_headings":"","what":"Next steps","title":"Introduction to interlacer","text":"far, ’ve covered interlacer’s read_interlaced_* family functions enabled us load interlaced columns contain separate channels value missing 
reasons. interlaced type enables us create tidy type-aware pipelines can flexibly consider variable’s value missing reasons. examples vignette, column types automatically detected. explicitly specify value missing column types, (specify individual missing reasons specific columns), interlacer extends readr’s collector() system. covered next vignette, vignette(\"na-column-types\").","code":""},{"path":"http://kylehusmann.com/interlacer/articles/na-column-types.html","id":"na-collector-types","dir":"Articles","previous_headings":"","what":"NA collector types","title":"NA Column Types","text":"addition standard readr::col_* column specification types, interlacer provides ability specify missing reasons column level, using na parameter. useful missing reasons apply particular items opposed file whole. example, say measure following two items: current stress level? Low Moderate High don’t know don’t understand question well feel manage time responsibilities today? Poorly Fairly well Well well apply (Today vacation day) apply (reason) can see, items two selection choices mapped missing reasons. can specified na_cols() function, works similarly readr’s cols() function: Setting na type NULL indicates column loaded regular type instead interlaced one. following load person_id regular, non-interlaced type:","code":"(df_stress <- read_interlaced_csv( interlacer_example(\"stress.csv\"), col_types = cols( person_id = col_integer(), current_stress = col_factor( levels = c(\"LOW\", \"MODERATE\", \"HIGH\") ), time_management = col_factor( levels = c(\"POORLY\", \"FAIRLY_WELL\", \"WELL\", \"VERY_WELL\") ) ), na = na_cols( .default = c(\"REFUSED\", \"OMITTED\", \"N/A\"), current_stress = c(.default, \"DONT_KNOW\", \"DONT_UNDERSTAND\"), time_management = c(.default, \"NA_VACATION\", \"NA_OTHER\") ) )) #> # A tibble: 8 × 3 #> person_id current_stress time_management #>