diff --git a/404.html b/404.html index 18fc697..7e95c80 100644 --- a/404.html +++ b/404.html @@ -31,7 +31,7 @@ interlacer - 0.2.2 + 0.3.0 + + + + + +
+ + + + +
+
+ + + +

Like the readr::read_*() family of functions, +read_interlaced_*() will automatically guess column types +by default:

+
+library(interlacer, warn.conflicts = FALSE)
+
+(read_interlaced_csv(
+  interlacer_example("colors.csv"),
+  na = c("REFUSED", "OMITTED", "N/A")
+))
+#> # A tibble: 11 × 3
+#>    person_id       age favorite_color
+#>    <dbl,fct> <dbl,fct> <chr,fct>     
+#>  1         1        20 BLUE          
+#>  2         2 <REFUSED> BLUE          
+#>  3         3        21 <REFUSED>     
+#>  4         4        30 <OMITTED>     
+#>  5         5         1 <N/A>         
+#>  6         6        41 RED           
+#>  7         7        50 <OMITTED>     
+#>  8         8        30 YELLOW        
+#>  9         9 <REFUSED> <REFUSED>     
+#> 10        10 <OMITTED> RED           
+#> 11        11        10 <REFUSED>
+

As with readr, these column type guesses can be overridden using the +col_types parameter with a readr::cols() +column specification:

+
+library(readr)
+
+(read_interlaced_csv(
+  interlacer_example("colors.csv"),
+  col_types = cols(
+    person_id = col_integer(),
+    age = col_number(),
+    favorite_color = col_factor(levels = c("BLUE", "RED", "YELLOW", "GREEN"))
+  ),
+  na = c("REFUSED", "OMITTED", "N/A")
+))
+#> # A tibble: 11 × 3
+#>    person_id       age favorite_color
+#>    <int,fct> <dbl,fct> <fct,fct>     
+#>  1         1        20 BLUE          
+#>  2         2 <REFUSED> BLUE          
+#>  3         3        21 <REFUSED>     
+#>  4         4        30 <OMITTED>     
+#>  5         5         1 <N/A>         
+#>  6         6        41 RED           
+#>  7         7        50 <OMITTED>     
+#>  8         8        30 YELLOW        
+#>  9         9 <REFUSED> <REFUSED>     
+#> 10        10 <OMITTED> RED           
+#> 11        11        10 <REFUSED>
+
+

+x_cols: extended cols specifications +

+

When you need more fine-grained control over value and missing reason +channel types, you can use an x_cols() specification, an +extension of readr’s readr::cols() system. With +x_cols(), you can control both the value channel and na +channel types in the columns of the resulting data frame.

+

This is useful when you have missing reasons that only apply to +particular items as opposed to the file as a whole. For example, say we +had a measure with the following two items:

+
    +
  1. What is your current stress level?
  2. +
+
+
    +
  1. Low
  2. +
  3. Moderate
  4. +
  5. High
  6. +
  7. I don’t know
  8. +
  9. I don’t understand the question
  10. +
+
+
    +
  1. How well do you feel you manage your time and responsibilities +today?
  2. +
+
+
    +
  1. Poorly
  2. +
  3. Fairly well
  4. +
  5. Well
  6. +
  7. Very well
  8. +
  9. Does not apply (Today was a vacation day)
  10. +
  11. Does not apply (Other reason)
  12. +
+
+

As you can see, both items have two selection choices that should be +mapped to missing reasons. These can be specified with the +x_cols() as follows:

+
+(df_stress <- read_interlaced_csv(
+  interlacer_example("stress.csv"),
+  col_types = x_cols(
+    person_id = x_col(
+      v_col_integer(),
+      na_col_none()
+    ),
+    current_stress = x_col(
+      v_col_factor(levels = c("LOW", "MODERATE", "HIGH")),
+      na_col_factor("DONT_KNOW", "DONT_UNDERSTAND")
+    ),
+    time_management = x_col(
+      v_col_factor(levels = c("POORLY", "FAIRLY_WELL", "WELL", "VERY_WELL")),
+      na_col_factor("NA_VACATION", "NA_OTHER")
+    )
+  )
+))
+#> # A tibble: 8 × 3
+#>   person_id current_stress    time_management
+#>       <int> <fct,fct>         <fct,fct>      
+#> 1         1 LOW               VERY_WELL      
+#> 2         2 MODERATE          POORLY         
+#> 3         3 <DONT_KNOW>       <NA_OTHER>     
+#> 4         4 HIGH              POORLY         
+#> 5         5 <DONT_UNDERSTAND> <NA_OTHER>     
+#> 6         6 LOW               <NA_VACATION>  
+#> 7         7 MODERATE          WELL           
+#> 8         8 <DONT_KNOW>       FAIRLY_WELL
+

Like readr’s readr::cols() function, each named argument to +x_cols() describes a column in the resulting data frame. +Value and missing reason channel types are declared via calls to +v_col_*() and na_col_*() respectively, which +are assembled by x_col().

+

v_col_*() types mirror readr’s readr::col_* +column “collectors”. So v_col_double() is equivalent to +readr::col_double(), v_col_character() is +equivalent to readr::col_character(), etc. See vroom’s +documentation for a list of +available column types.

+

na_col_*() collectors allow you to declare the missing +reason channel type of the loaded column, and the values that should be +interpreted as missing reasons. Currently, there are five options:

+
    +
  1. na_col_default(): Use the collector defined by the +na = argument in the read_interlaced_*() +function

  2. +
  3. na_col_none(): Load the column without a missing +reason channel.

  4. +
  5. na_col_factor(): Use a factor missing reason +channel. Character arguments passed form the levels of the factor. +(e.g. na_col_factor("REFUSED", "OMITTED", "N/A"))

  6. +
  7. na_col_integer(): Use an integer na channel. Numeric +arguments passed are the values to be interpreted as missing values. +(e.g. na_col_integer(-99, -98, -97))

  8. +
  9. na_col_cfactor(): Use a cfactor na +channel. (cfactor types will be covered in the next +vignette, vignette("coded-data"))

  10. +
+

The following example shows some of these collectors in action. In +this example we use a coded version of the colors.csv +example data, to demonstrate integer missing reason types:

+
+read_interlaced_csv(
+  interlacer_example("colors_coded.csv"),
+  col_types = x_cols(
+    person_id = x_col(v_col_integer(), na_col_none()),
+    age = x_col(v_col_double(), na_col_integer(-99, -98, -97)),
+    favorite_color = x_col(v_col_integer(), na_col_integer(-99, -98, -97))
+  )
+)
+#> # A tibble: 11 × 3
+#>    person_id       age favorite_color
+#>        <int> <dbl,int>      <int,int>
+#>  1         1        20              1
+#>  2         2     <-98>              1
+#>  3         3        21          <-98>
+#>  4         4        30          <-97>
+#>  5         5         1          <-99>
+#>  6         6        41              2
+#>  7         7        50          <-97>
+#>  8         8        30              3
+#>  9         9     <-98>          <-98>
+#> 10        10     <-97>              2
+#> 11        11        10          <-98>
+
+
+

Shortcuts +

+
+

Default collector types +

+

Like readr’s cols() function, the x_cols() +function accepts a .default argument that specifies a +default value collector. The na = argument is similarly +used to specify a default missing reason collector to be used when no +na_col_*() is specified, or when it is set to +na_col_default().

+

By taking advantage of these defaults, the specification in the last +example could have been equivalently written as:

+
+read_interlaced_csv(
+  interlacer_example("colors_coded.csv"),
+  col_types = x_cols(
+    .default = v_col_integer(),
+    person_id = x_col(v_col_integer(), na_col_none()),
+    age = v_col_double(),
+  ),
+  na = na_col_integer(-99, -98, -97)
+)
+#> # A tibble: 11 × 3
+#>    person_id       age favorite_color
+#>        <int> <dbl,int>      <int,int>
+#>  1         1        20              1
+#>  2         2     <-98>              1
+#>  3         3        21          <-98>
+#>  4         4        30          <-97>
+#>  5         5         1          <-99>
+#>  6         6        41              2
+#>  7         7        50          <-97>
+#>  8         8        30              3
+#>  9         9     <-98>          <-98>
+#> 10        10     <-97>              2
+#> 11        11        10          <-98>
+
+
+

Concise value and missing reason specifications +

+

Like readr, value collectors can be specified using characters. For +example, instead of v_col_integer(), you can use +"i". See vroom’s documentation for a complete +list of these shortcuts.

+

Similarly, missing reason collectors can be specified by providing a +vector of the missing values; the collector type is inferred via the +type of the vector. The conversions are as follows:

+ +

Using these shortcuts, the previous example could have equivalently +been written in more compact form as follows:

+
+read_interlaced_csv(
+  interlacer_example("colors_coded.csv"),
+  col_types = x_cols(
+    .default = "i",
+    person_id = x_col("i", NULL),
+    age = "d",
+  ),
+  na = c(-99, -98, -97)
+)
+#> # A tibble: 11 × 3
+#>    person_id       age favorite_color
+#>        <int> <dbl,int>      <int,int>
+#>  1         1        20              1
+#>  2         2     <-98>              1
+#>  3         3        21          <-98>
+#>  4         4        30          <-97>
+#>  5         5         1          <-99>
+#>  6         6        41              2
+#>  7         7        50          <-97>
+#>  8         8        30              3
+#>  9         9     <-98>          <-98>
+#> 10        10     <-97>              2
+#> 11        11        10          <-98>
+
+
+
+

Next steps +

+

In this vignette we covered how the column types for values and +missing reasons can be explicitly specified using collectors. We also +illustrated how column-level missing values can be specified by creating +extended column type specifications using x_cols().

+

In the final examples, we used an example data set with coded values +and missing reasons. Coded values are especially common in data sets +produced by SPSS, SAS, and Stata. interlacer provides a special column +type to make working with this sort of data easier: the +cfactor type. This will be covered in the next vignette, +vignette("coded-data").

+
+
+
+ + + +
+ + + +
+
+ + + + + + + diff --git a/articles/index.html b/articles/index.html index 8d4f2e7..7402f52 100644 --- a/articles/index.html +++ b/articles/index.html @@ -10,7 +10,7 @@ interlacer - 0.2.2 + 0.3.0 + + + + + +
+
+
+ +
+

across_value_channels() and across_na_channels() are simple wrappers for +dplyr::across() that apply transformations to value or missing reason +channels, respectively.

+
+ +
+

Usage

+
across_value_channels(.cols, .fns, .names = NULL, .unpack = FALSE)
+
+across_na_channels(.cols, .fns, .names = NULL, .unpack = FALSE)
+
+ +
+

Arguments

+
.cols
+

<tidy-select> Columns to transform. +You can't select grouping columns because they are already automatically +handled by the verb (i.e. summarise() or mutate()).

+ + +
.fns
+

Functions to apply to each of the selected columns. +Possible values are:

  • A function, e.g. mean.

  • +
  • A purrr-style lambda, e.g. ~ mean(.x, na.rm = TRUE)

  • +
  • A named list of functions or lambdas, e.g. +list(mean = mean, n_miss = ~ sum(is.na(.x))). Each function is applied +to each column, and the output is named by combining the function name +and the column name using the glue specification in .names.

  • +

Within these functions you can use cur_column() and cur_group() +to access the current column and grouping keys respectively.

+ + +
.names
+

A glue specification that describes how to name the output +columns. This can use {.col} to stand for the selected column name, and +{.fn} to stand for the name of the function being applied. The default +(NULL) is equivalent to "{.col}" for the single function case and +"{.col}_{.fn}" for the case where a list is used for .fns.

+ + +
.unpack
+

[Experimental]

+

Optionally unpack data frames returned by functions in +.fns, which expands the df-columns out into individual columns, retaining +the number of rows in the data frame.

  • If FALSE, the default, no unpacking is done.

  • +
  • If TRUE, unpacking is done with a default glue specification of +"{outer}_{inner}".

  • +
  • Otherwise, a single glue specification can be supplied to describe how to +name the unpacked columns. This can use {outer} to refer to the name +originally generated by .names, and {inner} to refer to the names of +the data frame you are unpacking.

  • +
+ +
+
+

Value

+ + +

Like dplyr::across(), across_value_channels() and +across_na_channels() return a tibble with one column for each column in +.cols and each function in .fns.

+ + +
+
+

See also

+

Other interlaced tidy helpers: +where_value_channel()
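
The description above can be illustrated with a minimal sketch. It assumes the bundled colors.csv example file and the read_interlaced_csv() call shown in the extended column types vignette; the name df_months is hypothetical, and the exact printed result is not reproduced here because it has not been verified against this release.

```r
library(interlacer, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# Load the example data with file-level missing reasons, as in the vignette.
df <- read_interlaced_csv(
  interlacer_example("colors.csv"),
  na = c("REFUSED", "OMITTED", "N/A")
)

# Apply a plain function to the value channel of `age` (years to months).
# `df_months` is a hypothetical name used only for this sketch.
df_months <- df |>
  mutate(across_value_channels(age, function(v) v * 12))

df_months
```

A plain function is used rather than a purrr-style lambda to keep the sketch independent of how lambdas are forwarded to dplyr::across().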

+
+ +
+ + +
+ + + +
+ + + + + + + diff --git a/reference/as.cfactor.html b/reference/as.cfactor.html new file mode 100644 index 0000000..d13bbba --- /dev/null +++ b/reference/as.cfactor.html @@ -0,0 +1,114 @@ + +cfactor coercion — as.cfactor • interlacer + Skip to contents + + +
+
+
+ +
+

Add codes to a vector of labels

+
+ +
+

Usage

+
as.cfactor(x, codes = NULL, ordered = is.ordered(x))
+
+# S3 method for factor
+as.cfactor(x, codes = NULL, ordered = is.ordered(x))
+
+as.cordered(x, codes = NULL)
+
+ +
+

Arguments

+
x
+

a vector of values representing labels for factor levels

+ + +
codes
+

named vector of unique codes that declares the mapping of labels +to codes

+ + +
ordered
+

logical flag to determine if the codes should be regarded as +ordered (in the order given).

+ +
+
+

Value

+ + +

a new cfactor
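
As a hedged illustration of the arguments above, the following sketch attaches codes to an ordinary factor. The labels and codes are made-up values in the style of the stress example from the vignette, and stress_labels and stress_coded are hypothetical names.

```r
library(interlacer, warn.conflicts = FALSE)

# An ordinary factor of labels (illustrative values, not package data).
stress_labels <- factor(
  c("LOW", "HIGH", "LOW", "MODERATE"),
  levels = c("LOW", "MODERATE", "HIGH")
)

# `codes` is a named vector declaring the mapping of labels to codes.
stress_coded <- as.cfactor(
  stress_labels,
  codes = c(LOW = 1, MODERATE = 2, HIGH = 3)
)

is.cfactor(stress_coded)
```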

+ + +
+ +
+ + +
+ + + +
+ + + + + + + diff --git a/reference/as.codes.html b/reference/as.codes.html new file mode 100644 index 0000000..7792188 --- /dev/null +++ b/reference/as.codes.html @@ -0,0 +1,107 @@ + +Convert a cfactor vector into a vector of its codes — as.codes • interlacer + Skip to contents + + +
+
+
+ +
+

Convert a cfactor vector into a vector of its codes.

+
+ +
+

Usage

+
as.codes(x, ...)
+
+# S3 method for interlacer_interlaced
+as.codes(x, ...)
+
+# S3 method for interlacer_cfactor
+as.codes(x, ...)
+
+ +
+

Arguments

+
x
+

a cfactor()

+ + +
...
+

additional arguments (not used)

+ +
+
+

Value

+ + +

a vector of coded values

+
+ +
+ + +
+ + + +
+ + + + + + + diff --git a/reference/as.x_col_spec.html b/reference/as.x_col_spec.html new file mode 100644 index 0000000..a73d587 --- /dev/null +++ b/reference/as.x_col_spec.html @@ -0,0 +1,108 @@ + +Extended column specification coercions — as.x_col_spec • interlacer + Skip to contents + + +
+
+
+ +
+

Coerce an object into a column specification. This is used internally to +parse the col_types argument in the read_interlaced_*() family of +functions, so that it can accept a readr::cols() specification or list().

+
+ +
+

Usage

+
as.x_col_spec(x)
+
+ +
+

Arguments

+
x
+

a value to coerce into an extended column specification

+ +
+
+

Value

+ + +

an extended column specification

+
+
+

Details

+

It is an S3 function so that other packages may use the col_types argument +with their own custom objects.

+
+ +
+ + +
+ + + +
+ + + + + + + diff --git a/reference/as.x_collector.html b/reference/as.x_collector.html new file mode 100644 index 0000000..bc4411c --- /dev/null +++ b/reference/as.x_collector.html @@ -0,0 +1,102 @@ + +Collector shortcuts — as.na_collector • interlacer + Skip to contents + + +
+
+
+ +
+

The as.*_collector functions are used internally to enable shortcuts and +defaults when specifying extended collectors. See +vignette("extended-column-types") for a full discussion.

+
+ +
+

Usage

+
as.na_collector(x)
+
+as.value_collector(x)
+
+as.x_collector(x)
+
+ +
+

Arguments

+
x
+

a value to convert into an extended collector, value collector, +or missing reason collector.

+ +
+ +
+ + +
+ + + +
+ + + + + + + diff --git a/reference/cfactor.html b/reference/cfactor.html new file mode 100644 index 0000000..8ab524b --- /dev/null +++ b/reference/cfactor.html @@ -0,0 +1,115 @@ + +Coded factors — cfactor • interlacer + Skip to contents + + +
+
+
+ +
+

TODO: Write this

+
+ +
+

Usage

+
cfactor(x = unspecified(), codes, ordered = FALSE)
+
+cordered(x, codes)
+
+is.cfactor(x)
+
+is.cordered(x)
+
+ +
+

Arguments

+
x
+

a vector of character or numeric codes

+ + +
codes
+

named vector of unique codes that declares the mapping of labels +to codes

+ + +
ordered
+

logical flag to determine if the codes should be regarded as +ordered (in the order given).

+ +
+
+

Value

+ + +

a new cfactor
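
A minimal sketch of constructing a cfactor from numeric codes follows, based only on the argument descriptions above and on the codes(), levels() and as.codes() reference pages. The codes and labels are illustrative, and stress is a hypothetical name.

```r
library(interlacer, warn.conflicts = FALSE)

# `x` holds the raw codes; `codes` names each code with its label.
stress <- cfactor(
  c(1, 3, 1, 2),
  codes = c(LOW = 1, MODERATE = 2, HIGH = 3)
)

levels(stress)    # the character labels
codes(stress)     # the named vector mapping labels to codes
as.codes(stress)  # the underlying coded values
```

Keeping the codes alongside the labels is what distinguishes a cfactor from a regular factor, so the original coded representation can be recovered with as.codes().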

+ + +
+ +
+ + +
+ + + +
+ + + + + + + diff --git a/reference/codes-set.html b/reference/codes-set.html new file mode 100644 index 0000000..01ac725 --- /dev/null +++ b/reference/codes-set.html @@ -0,0 +1,91 @@ + +Set the codes for a `cfactor`` — codes<- • interlacer + Skip to contents + + +
+
+
+ +
+

Set the codes for a cfactor, similar to levels<-()

+
+ +
+

Usage

+
codes(x) <- value
+
+ +
+

Arguments

+
value
+

a named vector of codes for the cfactor

+ +
+ +
+ + +
+ + + +
+ + + + + + + diff --git a/reference/codes.html b/reference/codes.html new file mode 100644 index 0000000..3d3b557 --- /dev/null +++ b/reference/codes.html @@ -0,0 +1,108 @@ + +cfactor attributes — codes • interlacer + Skip to contents + + +
+
+
+ +
+

Return the levels or codes of a cfactor

+
+ +
+

Usage

+
codes(x, ...)
+
+# S3 method for interlacer_cfactor
+levels(x)
+
+ +
+

Arguments

+
x
+

a cfactor

+ + +
...
+

additional arguments (not used)

+ +
+
+

Value

+ + +

levels() returns the levels of the cfactor (as a vector of +character labels); codes() returns a named vector representing the codes +for the cfactor

+ + +
+ +
+ + +
+ + + +
+ + + + + + + diff --git a/reference/flatten_channels.html b/reference/flatten_channels.html index c7962dc..0d551a2 100644 --- a/reference/flatten_channels.html +++ b/reference/flatten_channels.html @@ -14,7 +14,7 @@ interlacer - 0.2.2 + 0.3.0
+ + + + + +
+
+
+ +
+

Set the factor level attributes of interlaced vectors

+
+ +
+

Usage

+
# S3 method for interlacer_interlaced
+levels(x) <- value
+
+na_levels(x) <- value
+
+ +
+

Arguments

+
value
+

A vector of new levels to set

+ +
+ +
+ + +
+ + + +
+ + + + + + + diff --git a/reference/levels.interlacer_interlaced.html b/reference/levels.interlacer_interlaced.html new file mode 100644 index 0000000..10241a4 --- /dev/null +++ b/reference/levels.interlacer_interlaced.html @@ -0,0 +1,109 @@ + +Factor level attributes of interlaced vectors — levels.interlacer_interlaced • interlacer + Skip to contents + + +
+
+
+ +
+

The base S3 levels() function is overloaded for interlaced vectors, so +when the value channel is a factor type, levels() will return its levels. +Similarly na_levels() will return the levels for the missing reason +channel.

+
+ +
+

Usage

+
# S3 method for interlacer_interlaced
+levels(x)
+
+na_levels(x)
+
+ +
+

Arguments

+
x
+

an interlaced vector

+ +
+
+

Value

+ + +

The levels of the values or missing reason channel
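
The following sketch shows both accessors on a column loaded with factor value and missing reason channels. It reuses the stress.csv specification from the extended column types vignette; nothing here is new API, but the printed output is omitted because it has not been checked against this release.

```r
library(interlacer, warn.conflicts = FALSE)

# Same specification as in vignette("extended-column-types").
df_stress <- read_interlaced_csv(
  interlacer_example("stress.csv"),
  col_types = x_cols(
    person_id = x_col(v_col_integer(), na_col_none()),
    current_stress = x_col(
      v_col_factor(levels = c("LOW", "MODERATE", "HIGH")),
      na_col_factor("DONT_KNOW", "DONT_UNDERSTAND")
    ),
    time_management = x_col(
      v_col_factor(levels = c("POORLY", "FAIRLY_WELL", "WELL", "VERY_WELL")),
      na_col_factor("NA_VACATION", "NA_OTHER")
    )
  )
)

levels(df_stress$current_stress)     # levels of the value channel
na_levels(df_stress$current_stress)  # levels of the missing reason channel
```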

+
+ +
+ + +
+ + + +
+ + + + + + + diff --git a/reference/map_value_channel.html b/reference/map_value_channel.html index f5ac7e1..ad55a79 100644 --- a/reference/map_value_channel.html +++ b/reference/map_value_channel.html @@ -1,7 +1,7 @@ interlaced functional utilities — map_value_channel • interlacerApply a function to one of the channels of an interlaced vector — map_value_channel • interlacerLift values to missing reasons — na • interlacerInterpret a value as a missing reason — na • interlacer @@ -10,7 +10,7 @@ interlacer - 0.2.2 + 0.3.0 + + + + + +
+
+
+ +
+

Missing reason collectors are used in extended column specifications to +specify the type of a column's missing reason channel.

+
+ +
+

Usage

+
na_col_default()
+
+na_col_none()
+
+na_col_integer(...)
+
+na_col_factor(...)
+
+na_col_cfactor(...)
+
+ +
+

Arguments

+
...
+

values to interpret as missing values. In the case of +na_col_cfactor(), arguments must be named.

+ +
+
+

Value

+ + +

a new missing reason collector object

+
+
+

Details

+

na_col_default() is used to signal that the missing reason type should +inherit the specification provided in the na = argument of the +calling read_interlaced_*() function
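
As a sketch of this inheritance, the call below mixes an explicit na_col_default(), an omitted missing reason collector (x_col() defaults to na_col_default() according to its usage), and na_col_none(). It assumes the colors_coded.csv example file used in the vignette; df_coded is a hypothetical name.

```r
library(interlacer, warn.conflicts = FALSE)

df_coded <- read_interlaced_csv(
  interlacer_example("colors_coded.csv"),
  col_types = x_cols(
    # No missing reason channel for the id column.
    person_id = x_col(v_col_integer(), na_col_none()),
    # Explicitly inherit the collector given in `na =` below.
    age = x_col(v_col_double(), na_col_default()),
    # Omitting the second argument also falls back to na_col_default().
    favorite_color = x_col(v_col_integer())
  ),
  na = na_col_integer(-99, -98, -97)
)

df_coded
```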

+
+
+

See also

+

Other collectors: +value_collectors, +x_col()

+
+ +
+ + +
+ + + +
+ + + + + + + diff --git a/reference/parse_interlaced.html b/reference/parse_interlaced.html index e896eda..b7563fa 100644 --- a/reference/parse_interlaced.html +++ b/reference/parse_interlaced.html @@ -12,7 +12,7 @@ interlacer - 0.2.2 + 0.3.0
+ + + + + +
+
+
+ +
+

Value collectors are used in extended column specifications to specify the +value type of a column. They are thin wrappers around readr's col_*() +collector types.

+
+ +
+

Usage

+
v_col_guess()
+
+v_col_cfactor(codes, ordered = FALSE)
+
+v_col_character()
+
+v_col_date(format = "")
+
+v_col_datetime(format = "")
+
+v_col_double()
+
+v_col_factor(levels = NULL, ordered = FALSE)
+
+v_col_integer()
+
+v_col_big_integer()
+
+v_col_logical()
+
+v_col_number()
+
+v_col_skip()
+
+v_col_time(format = "")
+
+ +
+

Arguments

+
codes
+

A named vector of unique codes that declares the mapping of labels +to codes.

+ + +
ordered
+

Is it an ordered factor?

+ + +
format
+

A format specification, as described in readr::col_datetime()

+ + +
levels
+

Character vector of the allowed levels. When levels = NULL +(the default), levels are discovered from the unique values of x, in the +order in which they appear in x.

+ +
+
+

Value

+ + +

a new value collector object

+
+
+

Details

+

In addition to all of the column types supported by readr, interlacer +can also load cfactor() types via v_col_cfactor().
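
A hedged sketch of loading coded data with v_col_cfactor() follows. The code-to-label mapping (1: BLUE, 2: RED, 3: YELLOW; -99: N/A, -98: REFUSED, -97: OMITTED) is taken from the coded data vignette text for colors_coded.csv. Passing an na_col_cfactor() collector through na = is an assumption, by analogy with the na_col_integer() example in the extended column types vignette; df_colors is a hypothetical name.

```r
library(interlacer, warn.conflicts = FALSE)

df_colors <- read_interlaced_csv(
  interlacer_example("colors_coded.csv"),
  col_types = x_cols(
    person_id = x_col(v_col_integer(), na_col_none()),
    age = v_col_double(),
    # Value codes are declared as a named vector of label = code.
    favorite_color = v_col_cfactor(codes = c(BLUE = 1, RED = 2, YELLOW = 3))
  ),
  # Arguments to na_col_cfactor() must be named (label = code).
  na = na_col_cfactor(REFUSED = -98, OMITTED = -97, `N/A` = -99)
)

df_colors
```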

+
+
+

See also

+

Other collectors: +na_collectors, +x_col()

+
+ +
+ + +
+ + + +
+ + + + + + + diff --git a/reference/where_value_channel.html b/reference/where_value_channel.html new file mode 100644 index 0000000..56c8b78 --- /dev/null +++ b/reference/where_value_channel.html @@ -0,0 +1,105 @@ + +Select variables with a function applied on value or missing reason channels — where_value_channel • interlacer + Skip to contents + + +
+
+
+ +
+

where_value_channel() and where_na_channel() are simple wrappers for +tidyselect::where() that apply the selection function to the value or +missing reason channel of columns, respectively.

+
+ +
+

Usage

+
where_value_channel(fn)
+
+where_na_channel(fn)
+
+ +
+

Arguments

+
fn
+

A function that returns TRUE or FALSE (technically, a +predicate function). Can also be a purrr-like formula.

+ +
+
+

See also

+

Other interlaced tidy helpers: +across_value_channels()
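
A minimal sketch of selecting columns by the type of their value channel, assuming the colors.csv example file from the vignette, where person_id and age are guessed as double and favorite_color as character. numeric_cols is a hypothetical name.

```r
library(interlacer, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

df <- read_interlaced_csv(
  interlacer_example("colors.csv"),
  na = c("REFUSED", "OMITTED", "N/A")
)

# Keep only the columns whose value channel is numeric.
numeric_cols <- df |>
  select(where_value_channel(is.numeric))

names(numeric_cols)
```

where_na_channel() would be used the same way, with the predicate applied to the missing reason channel instead.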

+
+ +
+ + +
+ + + +
+ + + + + + + diff --git a/reference/write_interlaced_delim.html b/reference/write_interlaced_delim.html index ff1362d..8407811 100644 --- a/reference/write_interlaced_delim.html +++ b/reference/write_interlaced_delim.html @@ -16,7 +16,7 @@ interlacer - 0.2.2 + 0.3.0 + + + + + +
+
+
+ +
+

Extended collectors are used in x_cols() column specifications to indicate +which value and missing reason channel types should be used when loading +data with read_interlaced_*().

+
+ +
+

Usage

+
x_col(value_collector, na_collector = na_col_default())
+
+ +
+

Arguments

+
value_collector
+

a value collector

+ + +
na_collector
+

a missing reason collector

+ +
+
+

Value

+ + +

a new extended collector object

+
+
+

See also

+

Other collectors: +na_collectors, +value_collectors

+
+ +
+ + +
+ + + +
+ + + + + + + diff --git a/reference/x_cols.html b/reference/x_cols.html new file mode 100644 index 0000000..5f693f5 --- /dev/null +++ b/reference/x_cols.html @@ -0,0 +1,122 @@ + +Construct an extended column specification — x_cols • interlacer + Skip to contents + + +
+
+
+ +
+

Extended column specifications are used in the read_interlaced_*() family +of functions in the col_types argument to specify the value and missing +reason channel types.

+
+ +
+

Usage

+
x_cols(..., .default = v_col_guess())
+
+x_cols_only(...)
+
+ +
+

Arguments

+
...
+

a named argument list of extended collectors or value collectors.

+ + +
.default
+

a default value collector

+ +
+
+

Value

+ + +

a new extended column specification

+
+
+

Details

+

Like readr::cols(), x_cols() includes all the columns in the input data, +guessing the column types as the default, and creating missing reason +channels according to the na = argument in the read function. +x_cols_only() only includes the columns you specify, like +readr::cols_only(). In general, you can substitute list() for x_cols() +without changing the behavior.
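
The two points above can be sketched as follows, assuming the colors_coded.csv example file from the vignette. The names df_list and df_only are hypothetical.

```r
library(interlacer, warn.conflicts = FALSE)

# A plain list() in place of x_cols().
df_list <- read_interlaced_csv(
  interlacer_example("colors_coded.csv"),
  col_types = list(
    person_id = x_col(v_col_integer(), na_col_none()),
    age = v_col_double(),
    favorite_color = v_col_integer()
  ),
  na = na_col_integer(-99, -98, -97)
)

# x_cols_only() keeps just the columns that are named.
df_only <- read_interlaced_csv(
  interlacer_example("colors_coded.csv"),
  col_types = x_cols_only(
    age = x_col(v_col_double(), na_col_integer(-99, -98, -97))
  )
)

names(df_list)
names(df_only)
```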

+
+ + +
+ + +
+ + + +
+ + + + + + + diff --git a/reference/x_spec.html b/reference/x_spec.html new file mode 100644 index 0000000..5d65a64 --- /dev/null +++ b/reference/x_spec.html @@ -0,0 +1,104 @@ + +Examine the extended column specification for a data frame — x_spec • interlacer + Skip to contents + + +
+
+
+ +
+

x_spec() extracts the full extended column specification from a tibble +created with the read_interlaced_*() family of functions.

+
+ +
+

Usage

+
x_spec(x)
+
+ +
+

Arguments

+
x
+

a data frame loaded by read_interlaced_*()

+ +
+
+

Value

+ + +

An extended column specification object
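
A short sketch of retrieving the specification after a read, assuming the colors.csv example file from the vignette. The name spec is hypothetical, and the printed form of the specification is not shown because it has not been verified here.

```r
library(interlacer, warn.conflicts = FALSE)

df <- read_interlaced_csv(
  interlacer_example("colors.csv"),
  na = c("REFUSED", "OMITTED", "N/A")
)

# Extract the full extended column specification, including guessed types.
spec <- x_spec(df)
spec
```

This is a convenient starting point for writing an explicit x_cols() specification once the guessed types have been reviewed.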

+
+
+

See also

+ +
+ +
+ + +
+ + + +
+ + + + + + + diff --git a/search.json b/search.json index 90bcf38..3960250 100644 --- a/search.json +++ b/search.json @@ -1 +1 @@ -[{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"Apache License","title":"Apache License","text":"Version 2.0, January 2004 ","code":""},{"path":[]},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_1-definitions","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"1. Definitions","title":"Apache License","text":"“License” shall mean terms conditions use, reproduction, distribution defined Sections 1 9 document. “Licensor” shall mean copyright owner entity authorized copyright owner granting License. “Legal Entity” shall mean union acting entity entities control, controlled , common control entity. purposes definition, “control” means () power, direct indirect, cause direction management entity, whether contract otherwise, (ii) ownership fifty percent (50%) outstanding shares, (iii) beneficial ownership entity. “” (“”) shall mean individual Legal Entity exercising permissions granted License. “Source” form shall mean preferred form making modifications, including limited software source code, documentation source, configuration files. “Object” form shall mean form resulting mechanical transformation translation Source form, including limited compiled object code, generated documentation, conversions media types. “Work” shall mean work authorship, whether Source Object form, made available License, indicated copyright notice included attached work (example provided Appendix ). “Derivative Works” shall mean work, whether Source Object form, based (derived ) Work editorial revisions, annotations, elaborations, modifications represent, whole, original work authorship. purposes License, Derivative Works shall include works remain separable , merely link (bind name) interfaces , Work Derivative Works thereof. 
“Contribution” shall mean work authorship, including original version Work modifications additions Work Derivative Works thereof, intentionally submitted Licensor inclusion Work copyright owner individual Legal Entity authorized submit behalf copyright owner. purposes definition, “submitted” means form electronic, verbal, written communication sent Licensor representatives, including limited communication electronic mailing lists, source code control systems, issue tracking systems managed , behalf , Licensor purpose discussing improving Work, excluding communication conspicuously marked otherwise designated writing copyright owner “Contribution.” “Contributor” shall mean Licensor individual Legal Entity behalf Contribution received Licensor subsequently incorporated within Work.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_2-grant-of-copyright-license","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"2. Grant of Copyright License","title":"Apache License","text":"Subject terms conditions License, Contributor hereby grants perpetual, worldwide, non-exclusive, -charge, royalty-free, irrevocable copyright license reproduce, prepare Derivative Works , publicly display, publicly perform, sublicense, distribute Work Derivative Works Source Object form.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_3-grant-of-patent-license","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"3. 
Grant of Patent License","title":"Apache License","text":"Subject terms conditions License, Contributor hereby grants perpetual, worldwide, non-exclusive, -charge, royalty-free, irrevocable (except stated section) patent license make, made, use, offer sell, sell, import, otherwise transfer Work, license applies patent claims licensable Contributor necessarily infringed Contribution(s) alone combination Contribution(s) Work Contribution(s) submitted. institute patent litigation entity (including cross-claim counterclaim lawsuit) alleging Work Contribution incorporated within Work constitutes direct contributory patent infringement, patent licenses granted License Work shall terminate date litigation filed.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_4-redistribution","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"4. Redistribution","title":"Apache License","text":"may reproduce distribute copies Work Derivative Works thereof medium, without modifications, Source Object form, provided meet following conditions: () must give recipients Work Derivative Works copy License; (b) must cause modified files carry prominent notices stating changed files; (c) must retain, Source form Derivative Works distribute, copyright, patent, trademark, attribution notices Source form Work, excluding notices pertain part Derivative Works; (d) Work includes “NOTICE” text file part distribution, Derivative Works distribute must include readable copy attribution notices contained within NOTICE file, excluding notices pertain part Derivative Works, least one following places: within NOTICE text file distributed part Derivative Works; within Source form documentation, provided along Derivative Works; , within display generated Derivative Works, wherever third-party notices normally appear. contents NOTICE file informational purposes modify License. 
may add attribution notices within Derivative Works distribute, alongside addendum NOTICE text Work, provided additional attribution notices construed modifying License. may add copyright statement modifications may provide additional different license terms conditions use, reproduction, distribution modifications, Derivative Works whole, provided use, reproduction, distribution Work otherwise complies conditions stated License.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_5-submission-of-contributions","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"5. Submission of Contributions","title":"Apache License","text":"Unless explicitly state otherwise, Contribution intentionally submitted inclusion Work Licensor shall terms conditions License, without additional terms conditions. Notwithstanding , nothing herein shall supersede modify terms separate license agreement may executed Licensor regarding Contributions.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_6-trademarks","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"6. Trademarks","title":"Apache License","text":"License grant permission use trade names, trademarks, service marks, product names Licensor, except required reasonable customary use describing origin Work reproducing content NOTICE file.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_7-disclaimer-of-warranty","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"7. 
Disclaimer of Warranty","title":"Apache License","text":"Unless required applicable law agreed writing, Licensor provides Work (Contributor provides Contributions) “” BASIS, WITHOUT WARRANTIES CONDITIONS KIND, either express implied, including, without limitation, warranties conditions TITLE, NON-INFRINGEMENT, MERCHANTABILITY, FITNESS PARTICULAR PURPOSE. solely responsible determining appropriateness using redistributing Work assume risks associated exercise permissions License.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_8-limitation-of-liability","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"8. Limitation of Liability","title":"Apache License","text":"event legal theory, whether tort (including negligence), contract, otherwise, unless required applicable law (deliberate grossly negligent acts) agreed writing, shall Contributor liable damages, including direct, indirect, special, incidental, consequential damages character arising result License use inability use Work (including limited damages loss goodwill, work stoppage, computer failure malfunction, commercial damages losses), even Contributor advised possibility damages.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_9-accepting-warranty-or-additional-liability","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"9. Accepting Warranty or Additional Liability","title":"Apache License","text":"redistributing Work Derivative Works thereof, may choose offer, charge fee , acceptance support, warranty, indemnity, liability obligations /rights consistent License. However, accepting obligations, may act behalf sole responsibility, behalf Contributor, agree indemnify, defend, hold Contributor harmless liability incurred , claims asserted , Contributor reason accepting warranty additional liability. 
END TERMS CONDITIONS","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"appendix-how-to-apply-the-apache-license-to-your-work","dir":"","previous_headings":"","what":"APPENDIX: How to apply the Apache License to your work","title":"Apache License","text":"apply Apache License work, attach following boilerplate notice, fields enclosed brackets [] replaced identifying information. (Don’t include brackets!) text enclosed appropriate comment syntax file format. also recommend file class name description purpose included “printed page” copyright notice easier identification within third-party archives.","code":"Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"numeric-codes-with-negative-missing-reasons-spss","dir":"Articles","previous_headings":"","what":"Numeric codes with negative missing reasons (SPSS)","title":"Coded Data","text":"’s extremely common find data sources encode categorical responses numeric values, negative values representing missing reason codes. SPSS one example. ’s SPSS-formatted version colors.csv example: missing reasons : -99: N/-98: REFUSED -97: OMITTED colors coded: 1: BLUE 2: RED 3: YELLOW format gives ability load everything numeric type: test value missing code, can check ’s less 0: downsides approach twofold: 1) values missing reasons become codes remember 2) ’s really easy make mistakes. sort mistakes? 
Well, everything numeric, ’s nothing stopping us treating missing reason codes regular values… forget remove missing reason codes, R still happily compute aggregations using negative numbers! fact, math without filtering missing codes potentially ruins integrity data: ever thought significant result, find ’s stray missing reason codes still interlaced values? ’s bad time. ’re much better loading formats interlacer factors, converting codes labels: Now aggregations won’t mix values missing codes, won’t keep cross-referencing codebook know values mean: operations work similar ease:","code":"library(readr) library(interlacer, warn.conflicts = FALSE) read_file( interlacer_example(\"colors_coded.csv\") ) |> cat() #> person_id,age,favorite_color #> 1,20,1 #> 2,-98,1 #> 3,21,-98 #> 4,30,-97 #> 5,1,-99 #> 6,41,2 #> 7,50,-97 #> 8,30,3 #> 9,-98,-98 #> 10,-97,2 #> 11,10,-98 (df_coded <- read_csv( interlacer_example(\"colors_coded.csv\"), col_types = \"n\" )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 #> 2 2 -98 1 #> 3 3 21 -98 #> 4 4 30 -97 #> 5 5 1 -99 #> 6 6 41 2 #> 7 7 50 -97 #> 8 8 30 3 #> 9 9 -98 -98 #> 10 10 -97 2 #> 11 11 10 -98 library(dplyr, warn.conflicts = FALSE) df_coded |> mutate( age = if_else(age > 0, age, NA) ) |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> # A tibble: 6 × 3 #> favorite_color mean_age n #> #> 1 -99 1 1 #> 2 -98 15.5 3 #> 3 -97 40 2 #> 4 1 20 2 #> 5 2 41 2 #> 6 3 30 1 df_coded |> mutate( # age = if_else(age > 0, age, NA) ) |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> # A tibble: 6 × 3 #> favorite_color mean_age n #> #> 1 -99 1 1 #> 2 -98 -22.3 3 #> 3 -97 40 2 #> 4 1 -39 2 #> 5 2 -28 2 #> 6 3 30 1 # This will add 1 to the age values, but ALSO add one to all of the missing # reason codes, resulting in corrupted data! 
df_coded |> mutate( age_next_year = age + 1, ) #> # A tibble: 11 × 4 #> person_id age favorite_color age_next_year #> #> 1 1 20 1 21 #> 2 2 -98 1 -97 #> 3 3 21 -98 22 #> 4 4 30 -97 31 #> 5 5 1 -99 2 #> 6 6 41 2 42 #> 7 7 50 -97 51 #> 8 8 30 3 31 #> 9 9 -98 -98 -97 #> 10 10 -97 2 -96 #> 11 11 10 -98 11 # This will give you your intended result, but it's easy to forget df_coded |> mutate( age_next_year = if_else(age < 0, age, age + 1), ) #> # A tibble: 11 × 4 #> person_id age favorite_color age_next_year #> #> 1 1 20 1 21 #> 2 2 -98 1 -98 #> 3 3 21 -98 22 #> 4 4 30 -97 31 #> 5 5 1 -99 2 #> 6 6 41 2 42 #> 7 7 50 -97 51 #> 8 8 30 3 31 #> 9 9 -98 -98 -98 #> 10 10 -97 2 -97 #> 11 11 10 -98 11 (df_decoded <- read_interlaced_csv( interlacer_example(\"colors_coded.csv\"), na = c(-99, -98, -97), show_col_types = FALSE, ) |> mutate( across( everything(), \\(x) map_na_channel( x, \\(v) factor( v, levels = c(-99, -98, -97), labels = c(\"N/A\", \"REFUSED\", \"OMITTED\"), ) ) ), favorite_color = map_value_channel( favorite_color, \\(v) factor( v, levels = c(1, 2, 3), labels = c(\"BLUE\", \"RED\", \"YELLOW\") ) ), )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 BLUE #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 RED #> 11 11 10 df_decoded |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> # A tibble: 6 × 3 #> favorite_color mean_age n #> #> 1 BLUE 20 2 #> 2 RED 41 2 #> 3 YELLOW 30 1 #> 4 1 1 #> 5 15.5 3 #> 6 40 2 df_decoded |> mutate( age_next_year = age + 1, ) #> # A tibble: 11 × 4 #> person_id age favorite_color age_next_year #> #> 1 1 20 BLUE 21 #> 2 2 BLUE NA #> 3 3 21 22 #> 4 4 30 31 #> 5 5 1 2 #> 6 6 41 RED 42 #> 7 7 50 51 #> 8 8 30 YELLOW 31 #> 9 9 NA #> 10 10 RED NA #> 11 11 10 
11"},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"numeric-codes-with-character-missing-reasons-sas-stata","dir":"Articles","previous_headings":"","what":"Numeric codes with character missing reasons (SAS, Stata)","title":"Coded Data","text":"Like SPSS, SAS Stata encode factor levels numeric values, instead representing missing reasons negative codes, given character codes: , value codes used previous example, except missing reasons coded follows: \".\": N/\".\": REFUSED \".b\": OMITTED handle missing reasons without interlacer, columns must loaded character vectors: test value missing, can cast numeric types. cast fails, know ’s missing code. successful, know ’s coded value. Although character missing codes help prevent us mistakenly including missing codes value aggregations, cast columns numeric time check missingness hardly ergonomic, generates annoying warnings. Like , ’s easier import interlacer decode values missing reasons:","code":"read_file( interlacer_example(\"colors_coded_char.csv\") ) |> cat() #> person_id,age,favorite_color #> 1,20,1 #> 2,.a,1 #> 3,21,.a #> 4,30,.b #> 5,1,. #> 6,41,2 #> 7,50,.b #> 8,30,3 #> 9,.a,.a #> 10,.b,2 #> 11,10,.a (df_coded_char <- read_csv( interlacer_example(\"colors_coded_char.csv\"), col_types = \"c\" )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 #> 2 2 .a 1 #> 3 3 21 .a #> 4 4 30 .b #> 5 5 1 . #> 6 6 41 2 #> 7 7 50 .b #> 8 8 30 3 #> 9 9 .a .a #> 10 10 .b 2 #> 11 11 10 .a df_coded_char |> mutate( age = if_else(!is.na(as.numeric(age)), as.numeric(age), NA) ) |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> Warning: There were 2 warnings in `mutate()`. #> The first warning was: #> ℹ In argument: `age = if_else(!is.na(as.numeric(age)), as.numeric(age), NA)`. #> Caused by warning in `is_logical()`: #> ! NAs introduced by coercion #> ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning. 
#> # A tibble: 6 × 3 #> favorite_color mean_age n #> #> 1 . 1 1 #> 2 .a 15.5 3 #> 3 .b 40 2 #> 4 1 20 2 #> 5 2 41 2 #> 6 3 30 1 read_interlaced_csv( interlacer_example(\"colors_coded_char.csv\"), na = c(\".\", \".a\", \".b\"), show_col_types = FALSE, ) |> mutate( across( everything(), \\(x) map_na_channel( x, \\(v) factor( v, levels = c(\".\", \".a\", \".b\"), labels = c(\"N/A\", \"REFUSED\", \"OMITTED\") ) ) ), favorite_color = map_value_channel( favorite_color, \\(v) factor( v, levels = c(1, 2, 3), labels = c(\"BLUE\", \"RED\", \"YELLOW\") ) ) ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 BLUE #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 RED #> 11 11 10 "},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"encoding-a-decoded-deinterlaced-data-frame-","dir":"Articles","previous_headings":"","what":"Encoding a decoded & deinterlaced data frame.","title":"Coded Data","text":"Re-coding re-interlacing data frame can done follows:","code":"library(forcats) df_decoded |> mutate( across( everything(), \\(x) map_na_channel( x, \\(v) fct_recode(v, `-99` = \"N/A\", `-98` = \"REFUSED\", `-97` = \"OMITTED\" ) ) ), favorite_color = map_value_channel( favorite_color, \\(v) fct_recode( v, `1` = \"BLUE\", `2` = \"RED\", `3` = \"YELLOW\" ) ) ) |> write_interlaced_csv(\"output.csv\")"},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"haven","dir":"Articles","previous_headings":"","what":"haven","title":"Coded Data","text":"haven package functions loading native SPSS, SAS, Stata native file formats special data frames use column attributes special values keep track interlaced values missing reasons. 
complete discussion compares interlacer’s approach, see vignette(\"-approaches\").","code":""},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"aggregations-with-missing-reasons","dir":"Articles","previous_headings":"","what":"Aggregations with missing reasons","title":"Introduction to interlacer","text":"Now, interested values source data, functionality need. wanted know values NA? Although information encoded source data, lost missing reasons converted NA values. example, consider favorite_color column. many respondents REFUSED give favorite color? many people just OMITTED answer? question N/respondents (e.g. wasn’t survey form)? mean respondent age groups? current dataframe gets us part way: can see, converted missing reasons single NA, can answer questions missingness general, rather work specific reasons stored source data. Unfortunately, try load data missing reasons intact, lose something else: type information values. Now access missing reasons, columns character vectors. means order anything values, always filter missing reasons, cast remaining values desired type: gives us information want, cumbersome. Notice ’s distinction favorite color values missing reasons! Things start get really complex different columns different sets possible missing reasons. 
means lot type conversion gymnastics switch value types missing types.","code":"library(dplyr, warn.conflicts = FALSE) df_simple |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> # A tibble: 4 × 3 #> favorite_color mean_age n #> #> 1 BLUE 20 2 #> 2 RED 41 2 #> 3 YELLOW 30 1 #> 4 NA 22.4 6 (df_with_missing <- read_csv( interlacer_example(\"colors.csv\"), col_types = cols(.default = \"c\"), show_col_types = FALSE )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 REFUSED BLUE #> 3 3 21 REFUSED #> 4 4 30 OMITTED #> 5 5 1 N/A #> 6 6 41 RED #> 7 7 50 OMITTED #> 8 8 30 YELLOW #> 9 9 REFUSED REFUSED #> 10 10 OMITTED RED #> 11 11 10 REFUSED reasons <- c(\"REFUSED\", \"OMITTED\", \"N/A\") df_with_missing |> mutate( age_values = as.numeric(if_else(age %in% reasons, NA, age)), ) |> summarize( mean_age = mean(age_values, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> # A tibble: 6 × 3 #> favorite_color mean_age n #> #> 1 BLUE 20 2 #> 2 N/A 1 1 #> 3 OMITTED 40 2 #> 4 RED 41 2 #> 5 REFUSED 15.5 3 #> 6 YELLOW 30 1"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"the-interlacer-approach","dir":"Articles","previous_headings":"Aggregations with missing reasons","what":"The interlacer approach","title":"Introduction to interlacer","text":"interlacer built based insight everything becomes much tidy, simple, expressive explicitly work values missing reasons separate channels variable. interlacer introduces new interlaced column type facilitates . read_interlaced_* functions interlacer import data new column type. can see column headers, column loaded composed two channels: value channel, missing reason channel. channel can type. 
age column, example, double values factor missing reasons: channels can explicitly accessed using value_channel() na_channel() helper functions: helpers rarely needed, however, computations automatically operate interlaced column’s value channel, ignore missing reasons channel. following compute mean age, without missing reasons interfering: (equivalently used value_channel() helper achieve result, albeit verbosity): Although missing reasons excluded computations, still treated unique values. means group age get breakdown unique missing reasons, rather lumped single NA: can see, can generate report , without needing type gymnastics! Also, values neatly distinguished missing reasons.","code":"(df <- read_interlaced_csv( interlacer_example(\"colors.csv\"), na = c(\"REFUSED\", \"OMITTED\", \"N/A\"), show_col_types = FALSE )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 BLUE #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 RED #> 11 11 10 df$age #> [11]> #> [1] 20 21 30 1 41 50 #> [8] 30 10 #> NA levels: REFUSED OMITTED N/A value_channel(df$age) #> [1] 20 NA 21 30 1 41 50 30 NA NA 10 na_channel(df$age) #> [1] REFUSED REFUSED #> [10] OMITTED #> Levels: REFUSED OMITTED N/A mean(df$age, na.rm = TRUE) #> [1] 25.375 mean(value_channel(df$age), na.rm = TRUE) #> [1] 25.375 df |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> # A tibble: 6 × 3 #> favorite_color mean_age n #> #> 1 BLUE 20 2 #> 2 RED 41 2 #> 3 YELLOW 30 1 #> 4 15.5 3 #> 5 40 2 #> 6 1 1"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"filtering-based-on-missing-reasons","dir":"Articles","previous_headings":"","what":"Filtering based on missing reasons","title":"Introduction to interlacer","text":"interlaced columns also helpful creating samples inclusion / exclusion criteria based missing reasons. 
example, using example data, say wanted create sample respondents REFUSED give age. indicate value interpreted missing reason, can use na() function value: people REFUSED report age favorite color? ’s also possible combine value conditions missing reason conditions. example, select everyone REFUSED give favorite color, 20 years old:","code":"df |> filter(age == na(\"REFUSED\")) #> # A tibble: 2 × 3 #> person_id age favorite_color #> #> 1 2 BLUE #> 2 9 # na_channel() can also be used to get an equivalent result: df |> filter(na_channel(age) == \"REFUSED\") #> # A tibble: 2 × 3 #> person_id age favorite_color #> #> 1 2 BLUE #> 2 9 df |> filter(age == na(\"REFUSED\") & favorite_color == na(\"REFUSED\")) #> # A tibble: 1 × 3 #> person_id age favorite_color #> #> 1 9 df |> filter(age > 20 & favorite_color == na(\"REFUSED\")) #> # A tibble: 1 × 3 #> person_id age favorite_color #> #> 1 3 21 "},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"mutations","dir":"Articles","previous_headings":"","what":"Mutations","title":"Introduction to interlacer","text":"might expect, na() function can used values mutations. following pipeline replace favorite color respondents missing value \"REDACTED\" Conditionals also work exactly expect mutations. following replace favorite color respondents age < 18 missing reason \"REDACTED_UNDERAGE\". 
Respondents missing age replaced \"REDACTED_MISSING_AGE\" following mutation create new column called person_type \"CHILD\" age < 18, \"ADULT\" age >= 18, missing reason \"AGE_UNAVAILABLE\" age missing: Important note: must use dplyr::if_else() interlaced vectors instead R’s base::ifelse() function, base function strips missing reason channel due fundamental limitation base R.","code":"df |> mutate( favorite_color = na(\"REDACTED\") ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 #> 2 2 #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 #> 7 7 50 #> 8 8 30 #> 9 9 #> 10 10 #> 11 11 10 df |> mutate( favorite_color = if_else( age < 18, na(\"REDACTED_UNDERAGE\"), favorite_color, missing = na(\"REDACTED_MISSING_AGE\") ) ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 #> 11 11 10 df |> mutate( person_type = if_else( age < 18, \"CHILD\", \"ADULT\", missing = na(\"AGE_UNAVAILABLE\") ), ) #> # A tibble: 11 × 4 #> person_id age favorite_color person_type #> #> 1 1 20 BLUE ADULT #> 2 2 BLUE #> 3 3 21 ADULT #> 4 4 30 ADULT #> 5 5 1 CHILD #> 6 6 41 RED ADULT #> 7 7 50 ADULT #> 8 8 30 YELLOW ADULT #> 9 9 #> 10 10 RED #> 11 11 10 CHILD"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"empty-cells-na-missing-reasons","dir":"Articles","previous_headings":"","what":"Empty cells (NA missing reasons)","title":"Introduction to interlacer","text":"cell column missing value missing reason, cell considered “empty”. values can occur missing reasons specified. example, include missing = argument second example previous section, get following result: Empty values can detected using .empty() function: Raw NA values also considered “empty”: Empty values often occur result joins, dplyr::*_join() family functions missing = parameter, like dplyr::if_else() . 
example, say following data frame wanted join sample: ’re missing condition information respondents, show empty values join data frame sample: can remedy replacing empty values join:","code":"df |> mutate( favorite_color = if_else( age < 18, na(\"REDACTED_UNDERAGE\"), favorite_color, ) ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 <> #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 <> #> 10 10 <> #> 11 11 10 df |> mutate( favorite_color = if_else( age < 18, na(\"REDACTED_UNDERAGE\"), favorite_color, ) ) |> filter(is.empty(favorite_color)) #> # A tibble: 3 × 3 #> person_id age favorite_color #> #> 1 2 <> #> 2 9 <> #> 3 10 <> # regular values are neither missing nor empty is.na(42) #> [1] FALSE is.empty(42) #> [1] FALSE # na(\"REASON\") is a missing value, but is not an empty value is.na(na(\"REASON\")) #> [1] TRUE is.empty(na(\"REASON\")) #> [1] FALSE # na(NA) values are considered missing and empty is.na(na(NA)) #> [1] TRUE is.empty(na(NA)) #> [1] TRUE # regular NA values are also missing and empty is.na(NA) #> [1] TRUE is.empty(NA) #> [1] TRUE conditions <- tribble( ~person_id, ~condition, 1, \"TREATMENT\", 2, \"CONTROL\", 3, na(\"TECHNICAL_ERROR\"), 6, \"CONTROL\", 8, \"TREATMENT\", ) df |> left_join(conditions, by = join_by(person_id)) #> # A tibble: 11 × 4 #> person_id age favorite_color condition #> #> 1 1 20 BLUE TREATMENT #> 2 2 BLUE CONTROL #> 3 3 21 #> 4 4 30 <> #> 5 5 1 <> #> 6 6 41 RED CONTROL #> 7 7 50 <> #> 8 8 30 YELLOW TREATMENT #> 9 9 <> #> 10 10 RED <> #> 11 11 10 <> df |> left_join(conditions, by = join_by(person_id)) |> mutate( condition = if_else(is.empty(condition), na(\"LEFT_STUDY\"), condition), ) #> # A tibble: 11 × 4 #> person_id age favorite_color condition #> #> 1 1 20 BLUE TREATMENT #> 2 2 BLUE CONTROL #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED CONTROL #> 7 7 50 #> 8 8 30 YELLOW TREATMENT #> 9 9 #> 10 10 RED #> 11 11 10 
"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"writing-interlaced-files","dir":"Articles","previous_headings":"","what":"Writing interlaced files","title":"Introduction to interlacer","text":"’ve made made changes data, probably want save . interlacer provides write_interlaced_* family functions : combine value missing reasons interlaced character columns, write result csv. Alternatively, want re-interlace columns without writing file control writing process, can use flatten_channels(): value missing reason channels data frames interlaced vectors can similarly accessed using value_channel() na_channel() helper functions:","code":"write_interlaced_csv(df, \"interlaced_output.csv\") flatten_channels(df) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 REFUSED BLUE #> 3 3 21 REFUSED #> 4 4 30 OMITTED #> 5 5 1 N/A #> 6 6 41 RED #> 7 7 50 OMITTED #> 8 8 30 YELLOW #> 9 9 REFUSED REFUSED #> 10 10 OMITTED RED #> 11 11 10 REFUSED # (it works on single vectors as well) flatten_channels(df$age) #> [1] \"20\" \"REFUSED\" \"21\" \"30\" \"1\" \"41\" \"50\" #> [8] \"30\" \"REFUSED\" \"OMITTED\" \"10\" value_channel(df) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 NA BLUE #> 3 3 21 NA #> 4 4 30 NA #> 5 5 1 NA #> 6 6 41 RED #> 7 7 50 NA #> 8 8 30 YELLOW #> 9 9 NA NA #> 10 10 NA RED #> 11 11 10 NA na_channel(df) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 NA NA NA #> 2 NA REFUSED NA #> 3 NA NA REFUSED #> 4 NA NA OMITTED #> 5 NA NA N/A #> 6 NA NA NA #> 7 NA NA OMITTED #> 8 NA NA NA #> 9 NA REFUSED REFUSED #> 10 NA OMITTED NA #> 11 NA NA REFUSED"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"next-steps","dir":"Articles","previous_headings":"","what":"Next steps","title":"Introduction to interlacer","text":"far, ’ve covered interlacer’s read_interlaced_* family functions enabled us load interlaced columns contain separate challens value missing 
reasons. interlaced type enables us create tidy type-aware pipelines can flexibly consider variable’s value missing reasons. examples vignette, column types automatically detected. explicitly specify value missing column types, (specify individual missing reasons specific columns), interlacer extends readr’s collector() system. covered next vignette, vignette(\"na-column-types\").","code":""},{"path":"http://kylehusmann.com/interlacer/articles/na-column-types.html","id":"na-collector-types","dir":"Articles","previous_headings":"","what":"NA collector types","title":"NA Column Types","text":"addition standard readr::col_* column specification types, interlacer provides ability specify missing reasons column level, using na parameter. useful missing reasons apply particular items opposed file whole. example, say measure following two items: current stress level? Low Moderate High don’t know don’t understand question well feel manage time responsibilities today? Poorly Fairly well Well well apply (Today vacation day) apply (reason) can see, items two selection choices mapped missing reasons. can specified na_cols() function, works similarly readr’s cols() function: Setting na type NULL indicates column loaded regular type instead interlaced one. 
following load person_id regular, non-interlaced type:","code":"(df_stress <- read_interlaced_csv( interlacer_example(\"stress.csv\"), col_types = cols( person_id = col_integer(), current_stress = col_factor( levels = c(\"LOW\", \"MODERATE\", \"HIGH\") ), time_management = col_factor( levels = c(\"POORLY\", \"FAIRLY_WELL\", \"WELL\", \"VERY_WELL\") ) ), na = na_cols( .default = c(\"REFUSED\", \"OMITTED\", \"N/A\"), current_stress = c(.default, \"DONT_KNOW\", \"DONT_UNDERSTAND\"), time_management = c(.default, \"NA_VACATION\", \"NA_OTHER\") ) )) #> # A tibble: 8 × 3 #> person_id current_stress time_management #> #> 1 1 LOW VERY_WELL #> 2 2 MODERATE POORLY #> 3 3 #> 4 4 HIGH POORLY #> 5 5 #> 6 6 LOW #> 7 7 MODERATE WELL #> 8 8 FAIRLY_WELL read_interlaced_csv( interlacer_example(\"colors_coded.csv\"), na = na_cols( .default = c(-99, -98, -97), person_id = NULL, ), show_col_types = FALSE ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 #> 2 2 <-98> 1 #> 3 3 21 <-98> #> 4 4 30 <-97> #> 5 5 1 <-99> #> 6 6 41 2 #> 7 7 50 <-97> #> 8 8 30 3 #> 9 9 <-98> <-98> #> 10 10 <-97> 2 #> 11 11 10 <-98>"},{"path":"http://kylehusmann.com/interlacer/articles/na-column-types.html","id":"next-steps","dir":"Articles","previous_headings":"","what":"Next steps","title":"NA Column Types","text":"vignette covered column types values missing reasons can explicitly specified using collectors. also illustrated column-level missing values can specified creating missing channel specification using na_cols(). final example, used example data set coded values missing reasons. Coded values especially common data sets produced SPSS, SAS, Stata. 
recipes working coded data like , check next vignette, vignette(\"coded-data\").","code":""},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"haven-and-labelled","dir":"Articles","previous_headings":"","what":"haven and labelled","title":"Other Approaches","text":"haven labelled packages rely two functions creating vectors interlace values missing reasons: haven::labelled_spss() haven::tagged_na(). Although create haven_labelled vectors, use different methods representing missing values.","code":""},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"labelled-missing-values-havenlabelled_spss","dir":"Articles","previous_headings":"haven and labelled","what":"“Labelled” missing values (haven::labelled_spss())","title":"Other Approaches","text":"SPSS files loaded haven via haven::read_spss(), values missing reasons loaded single interlaced numeric vector: just numeric vector though, haven::labelled_spss() numeric vector, attributes describing value missing value codes: attributes adjust behavior functions like .na(): makes easy check value missing reason, still filter missing reasons aggregations: ’s little bit improvement working raw coded values, can use .na(), codes get labels, don’t constantly looking codes codebook. still falls short interlacer’s functionality two key reasons: Reason 1: interlacer, value column can whatever type want: numeric, character, factor, etc. labelled missing reasons, values missing reasons need type, usually numeric codes. creates lot type gymnastics potential errors ’re manipulating . Reason 2: Even missing values labelled labelled_spss type, aggregations math operations protected. 
forget take missing values, get incorrect results / corrupted data:","code":"library(interlacer, warn.conflicts = FALSE) library(haven) library(dplyr) #> #> Attaching package: 'dplyr' #> The following objects are masked from 'package:stats': #> #> filter, lag #> The following objects are masked from 'package:base': #> #> intersect, setdiff, setequal, union (df_spss <- read_spss( interlacer_example(\"colors.sav\"), user_na = TRUE )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 [BLUE] #> 2 2 -98 (NA) [REFUSED] 1 [BLUE] #> 3 3 21 -98 (NA) [REFUSED] #> 4 4 30 -97 (NA) [OMITTED] #> 5 5 1 -99 (NA) [N/A] #> 6 6 41 2 [RED] #> 7 7 50 -97 (NA) [OMITTED] #> 8 8 30 3 [YELLOW] #> 9 9 -98 (NA) [REFUSED] -98 (NA) [REFUSED] #> 10 10 -97 (NA) [OMITTED] 2 [RED] #> 11 11 10 -98 (NA) [REFUSED] attributes(df_spss$favorite_color) #> $label #> [1] \"Favorite color\" #> #> $na_range #> [1] -Inf 0 #> #> $class #> [1] \"haven_labelled_spss\" \"haven_labelled\" \"vctrs_vctr\" #> [4] \"double\" #> #> $format.spss #> [1] \"F8.2\" #> #> $labels #> BLUE RED YELLOW N/A REFUSED OMITTED #> 1 2 3 -99 -98 -97 is.na(df_spss$favorite_color) #> [1] FALSE FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE df_spss |> mutate( age_values = if_else(is.na(age), NA, age), favorite_color_missing_reasons = if_else( is.na(favorite_color), favorite_color, NA ) ) |> summarize( mean_age = mean(age_values, na.rm = TRUE), n = n(), .by = favorite_color_missing_reasons ) #> # A tibble: 4 × 3 #> favorite_color_missing_reasons mean_age n #> #> 1 NA 30.3 5 #> 2 -98 (NA) [REFUSED] 15.5 3 #> 3 -97 (NA) [OMITTED] 40 2 #> 4 -99 (NA) [N/A] 1 1 df_spss |> mutate( age_next_year = if_else(is.na(age), NA, age + 1), .after = person_id ) #> # A tibble: 11 × 4 #> person_id age_next_year age favorite_color #> #> 1 1 21 20 1 [BLUE] #> 2 2 NA -98 (NA) [REFUSED] 1 [BLUE] #> 3 3 22 21 -98 (NA) [REFUSED] #> 4 4 31 30 -97 (NA) [OMITTED] #> 5 5 2 1 -99 (NA) [N/A] #> 6 6 42 41 2 [RED] #> 7 7 51 50 -97 (NA) [OMITTED] 
#> 8 8 31 30 3 [YELLOW] #> 9 9 NA -98 (NA) [REFUSED] -98 (NA) [REFUSED] #> 10 10 NA -97 (NA) [OMITTED] 2 [RED] #> 11 11 11 10 -98 (NA) [REFUSED] df_spss |> mutate( favorite_color_missing_reasons = if_else( is.na(favorite_color), favorite_color, NA ) ) |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color_missing_reasons ) #> # A tibble: 4 × 3 #> favorite_color_missing_reasons mean_age n #> #> 1 NA -20.8 5 #> 2 -98 (NA) [REFUSED] -22.3 3 #> 3 -97 (NA) [OMITTED] 40 2 #> 4 -99 (NA) [N/A] 1 1 df_spss |> mutate( age_next_year = age + 1, .after = person_id ) #> # A tibble: 11 × 4 #> person_id age_next_year age favorite_color #> #> 1 1 21 20 1 [BLUE] #> 2 2 -97 -98 (NA) [REFUSED] 1 [BLUE] #> 3 3 22 21 -98 (NA) [REFUSED] #> 4 4 31 30 -97 (NA) [OMITTED] #> 5 5 2 1 -99 (NA) [N/A] #> 6 6 42 41 2 [RED] #> 7 7 51 50 -97 (NA) [OMITTED] #> 8 8 31 30 3 [YELLOW] #> 9 9 -97 -98 (NA) [REFUSED] -98 (NA) [REFUSED] #> 10 10 -96 -97 (NA) [OMITTED] 2 [RED] #> 11 11 11 10 -98 (NA) [REFUSED]"},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"tagged-missing-values-haventagged_na","dir":"Articles","previous_headings":"haven and labelled","what":"“Tagged” missing values (haven::tagged_na())","title":"Other Approaches","text":"loading Stata SAS files, haven uses “tagged missingness” approach mirror values handled Stata SAS: approach deviously clever. takes advantage way NaN floating point values stored memory, make possible different “flavors” NA values. (info done, check tagged_na.c source code haven) still act like regular NA values… now can include single character “tag” (usually letter -z). means work .na() include missing reason codes aggregations! Unfortunately, can’t group , dplyr::group_by() tag-aware. 
:( Another limitation approach requires values types numeric, trick “tagging” NA values depends peculiarities floating point values stored memory.","code":"(df_stata <- read_stata( interlacer_example(\"colors.dta\") )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 [BLUE] #> 2 2 NA(a) [REFUSED] 1 [BLUE] #> 3 3 21 NA(a) [REFUSED] #> 4 4 30 NA(b) [OMITTED] #> 5 5 1 NA #> 6 6 41 2 [RED] #> 7 7 50 NA(b) [OMITTED] #> 8 8 30 3 [YELLOW] #> 9 9 NA(a) [REFUSED] NA(a) [REFUSED] #> 10 10 NA(b) [OMITTED] 2 [RED] #> 11 11 10 NA(a) [REFUSED] is.na(df_stata$age) #> [1] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE mean(df_stata$age, na.rm = TRUE) #> [1] 25.375 df_stata |> mutate( favorite_color_missing_reasons = if_else( is.na(favorite_color), favorite_color, NA ) ) |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color_missing_reasons ) #> # A tibble: 1 × 3 #> favorite_color_missing_reasons mean_age n #> #> 1 NA 25.4 11"},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"declared","dir":"Articles","previous_headings":"","what":"declared","title":"Other Approaches","text":"declared package uses function declared::declared() constructing interlaced vectors: declared vectors similar haven_labelled_spss vectors, except critical innovation: store actual NA values missing values, keep track missing reasons entirely attributes object: means aggregations work exactly expect!","code":"library(declared) #> #> Attaching package: 'declared' #> The following object is masked from 'package:interlacer': #> #> is.empty (dcl <- declared(c(1, 2, 3, -99, -98), na_values = c(-99, -98))) #> [5]> #> [1] 1 2 3 NA(-99) NA(-98) #> Missing values: -99, -98 # All the missing reason info is tracked in the attributes attributes(dcl) #> $na_index #> -99 -98 #> 4 5 #> #> $na_values #> [1] -99 -98 #> #> $date #> [1] FALSE #> #> $class #> [1] \"declared\" \"numeric\" # The data stored has actual NA values, so it works 
as you would expect # with summary stats like `mean()`, etc. attributes(dcl) <- NULL dcl #> [1] 1 2 3 NA NA dcl <- declared(c(1, 2, 3, -99, -98), na_values = c(-99, -98)) sum(dcl, na.rm = TRUE) #> [1] 6"},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"interlacer","dir":"Articles","previous_headings":"","what":"interlacer","title":"Other Approaches","text":"interlacer builds ideas haven, labelled, declared following goals:","code":""},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"be-fully-generic-add-a-missing-value-channel-to-any-vector-type","dir":"Articles","previous_headings":"interlacer","what":"1. Be fully generic: Add a missing value channel to any vector type","title":"Other Approaches","text":"mentioned , haven::labelled_spss() works numeric character types, haven::tagged_na() works numeric types. declared::declared() supports numeric, character date types. interlaced types, contrast, can imbue vector type missing value channel: Like declared vectors, missing reasons tracked attributes. 
unlike declared, missing reasons stored entirely separate channel rather tracking indices: data structure drives functional API, described (3) .","code":"interlaced(list(TRUE, FALSE, \"reason\"), na = \"reason\") #> [3]> #> [1] TRUE FALSE #> NA levels: reason interlaced(c(\"2020-01-01\", \"2020-01-02\", \"reason\"), na = \"reason\") |> map_value_channel(as.Date) #> [3]> #> [1] 2020-01-01 2020-01-02 #> NA levels: reason interlaced(c(\"red\", \"green\", \"reason\"), na = \"reason\") |> map_value_channel(factor) #> [3]> #> [1] red green #> Levels: green red #> NA levels: reason (int <- interlaced(c(1,2,3, -99, -98), na = c(-99, -98))) #> [5]> #> [1] 1 2 3 <-99> <-98> attributes(int) #> $na_channel_values #> [1] NA NA NA -99 -98 #> #> $class #> [1] \"interlacer_interlaced\" \"vctrs_vctr\" \"numeric\" attributes(int) <- NULL int #> [1] 1 2 3 NA NA"},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"provide-functions-for-reading-writing-interlaced-csv-files-not-just-spss-sas-stata-files","dir":"Articles","previous_headings":"interlacer","what":"2. Provide functions for reading / writing interlaced CSV files (not just SPSS / SAS / Stata files)","title":"Other Approaches","text":"See interlacer::read_interlaced_csv(), etc.","code":""},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"provide-a-functional-api-that-integrates-well-into-tidy-pipelines","dir":"Articles","previous_headings":"interlacer","what":"3. Provide a functional API that integrates well into tidy pipelines","title":"Other Approaches","text":"interlacer provides functions facilitate working interlaced type Result type, well-understood abstraction functional programming. functions na() map_value_channel() map_na_channel() come influence. na() function creates interlaced type “lifting” value missing reason channel. approach helps create safer separation value missing reason channels, ’s always clear channel ’re making comparisons . 
example: Similarly, map_value_channel() map_na_channel() allow safely mutate particular channel, without touching values channel. interface especially useful tidy pipelines. Finally, interlaced type based vctrs type system, plays nicely packages tidyverse.","code":"# haven labelled_spss(c(-99, 1, 2), na_values = -99) == 1 # value channel comparison #> [1] FALSE TRUE FALSE labelled_spss(c(-99, 1, 2), na_values = -99) == -99 # na channel comparison #> [1] TRUE FALSE FALSE # declared declared(c(-99, 1, 2), na_values = -99) == 1 # value channel comparison #> [1] FALSE TRUE FALSE declared(c(-99, 1, 2), na_values = -99) == -99 # na channel comparison #> [1] TRUE FALSE FALSE # interlacer interlaced(c(-99, 1, 2), na = -99) == 1 # value channel comparison #> [1] FALSE TRUE FALSE interlaced(c(-99, 1, 2), na = -99) == na(-99) # na channel comparison #> [1] TRUE FALSE FALSE"},{"path":[]},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"more-flexible-missing-reason-channel-types","dir":"Articles","previous_headings":"Questions for the future","what":"1. More flexible missing reason channel types?","title":"Other Approaches","text":"Earlier versions allowed arbitrary types occupy missing reason channel (.e. fully generic Result type). ended constricting missing reason channel allow integer factor types help simplify na_cols() specifications. arbitrary types allowed, na_cols() specs become quite long (e.g. column_name = factor(levels=c(\"REASON_1\", \"REASON_2\")))). far can tell, 99.9% time, preferable use integer factor missing reason channels double character ones, now ’ve made executive decision allow integer factor types.","code":""},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"a-better-na_cols-specification","dir":"Articles","previous_headings":"Questions for the future","what":"2. 
A better na_cols() specification?","title":"Other Approaches","text":"Right now, missing values supplied separate argument col_types. means custom missing values get pretty far separated col_type definitions: earlier version created extension readr collectors, family icol_* types, allowed something like : …can’t decide interface like better. Although latter approach feels cleaner folds custom missing reasons cols definition, one disadvantage overwrite missing values (e.g. set missing reason person_id NULL long ’s default missing reason specified). also feels little “hackish” extend readr’s types way; think making use na parameter na_cols() function provides little bit insulation changes readr. Anyway, thoughts opinions things, ’d really appreciate feedback!","code":"read_interlaced_csv( interlacer_example(\"stress.csv\"), col_types = cols( person_id = col_integer(), current_stress = col_factor( levels = c(\"LOW\", \"MODERATE\", \"HIGH\") ), time_management = col_factor( levels = c(\"POORLY\", \"FAIRLY_WELL\", \"WELL\", \"VERY_WELL\") ) ), na = na_cols( .default = c(\"REFUSED\", \"OMITTED\", \"N/A\"), current_stress = c(.default, \"DONT_KNOW\", \"DONT_UNDERSTAND\"), time_management = c(.default, \"NA_VACATION\", \"NA_OTHER\") ) ) #> # A tibble: 8 × 3 #> person_id current_stress time_management #> #> 1 1 LOW VERY_WELL #> 2 2 MODERATE POORLY #> 3 3 #> 4 4 HIGH POORLY #> 5 5 #> 6 6 LOW #> 7 7 MODERATE WELL #> 8 8 FAIRLY_WELL read_interlaced_csv( interlacer_example(\"stress.csv\"), col_types = cols( person_id = col_integer(), current_stress = icol_factor( levels = c(\"LOW\", \"MODERATE\", \"HIGH\"), na = c(\"DONT_KNOW\", \"DONT_UNDERSTAND\") ), time_management = col_factor( levels = c(\"POORLY\", \"FAIRLY_WELL\", \"WELL\", \"VERY_WELL\"), na = c(\"NA_VACATION\", \"NA_OTHER\") ) ), na = c(\"REFUSED\", \"OMITTED\", \"N/A\") )"},{"path":"http://kylehusmann.com/interlacer/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and 
Citation","text":"Kyle Husmann. Author, maintainer.","code":""},{"path":"http://kylehusmann.com/interlacer/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Husmann K (2024). interlacer: Read Tabular Data Interlaced Values Missing Reasons. R package version 0.2.2, http://kylehusmann.com/interlacer/.","code":"@Manual{, title = {interlacer: Read Tabular Data With Interlaced Values And Missing Reasons}, author = {Kyle Husmann}, year = {2024}, note = {R package version 0.2.2}, url = {http://kylehusmann.com/interlacer/}, }"},{"path":"http://kylehusmann.com/interlacer/index.html","id":"interlacer-","dir":"","previous_headings":"","what":"Read Tabular Data With Interlaced Values And Missing Reasons","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"value missing data, sometimes want know missing. Many textual tabular data sources encode missing reasons special values interlaced regular values column (e.g. N/, REFUSED, -99, etc.). Unfortunately, missing reasons lost values converted single NA type. Working missing reasons R traditionally requires loading variables character vectors bunch string comparisons type conversions make sense . interlacer provides functions load variables interlaced data sources special interlaced column type holds values NA reasons separate channels variable. contexts, can treat interlaced columns regular values: take mean interlaced column, example, get mean values, without missing reasons interfering computation. Unlike regular column, however, missing reasons still available. means can still filter data frames variables specific missing reasons, generate summary statistics breakdowns missing reason. words, longer constantly manually include / exclude missing reasons computations filtering awkward string comparisons type conversions… everything just works! 
addition introduction vignette(\"interlacer\") sure also check : vignette(\"na-column-types\") see handle variable-level missing reasons vignette(\"coded-data\") recipies working coded data (e.g. data produced SPSS, SAS Stata) vignette(\"-approaches\") deep dive interlacer’s approach compares approaches representing manipulating missing reasons alongside data values","code":""},{"path":"http://kylehusmann.com/interlacer/index.html","id":"id_️-️-️-warning-️-️-️","dir":"","previous_headings":"","what":"⚠️ ⚠️ ⚠️ WARNING ⚠️ ⚠️ ⚠️","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"library currently experimental stages, aware interface quite likely change future. meantime, please try let know think!","code":""},{"path":"http://kylehusmann.com/interlacer/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"easiest way get interlacer install via devtools:","code":"install.packages(\"devtools\") # If devtools is not already installed devtools::install_github(\"khusmann/interlacer\")"},{"path":"http://kylehusmann.com/interlacer/index.html","id":"usage","dir":"","previous_headings":"","what":"Usage","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"use interlacer, load current R session: interlacer supports following file formats read_interlaced_*() functions, extend readr::read_*() family functions: read_interlaced_csv() read_interlaced_tsv() read_interlaced_csv2() read_interlaced_delim() quick demo, consider following example file bundled interlacer: csv file, values interlaced three possible missing reasons: REFUSED, OMITTED, N/. readr, loading data result data frame missing reasons replaced NA: interlacer, missing reasons preserved: can see, printout column defined two types: type values, type missing reasons. 
age column, example, type double values, type factor missing reasons: Computations automatically operate values: missing reasons still ! indicate value treated missing reason instead regular value, can use na() function. following, example, filter data set individuals REFUSED give favorite color: ’s pipeline compute breakdown mean age respondents favorite color, separate categories missing reason: just scratches surface can done interlacer… check vignette(\"interlacer\") complete overview!","code":"library(interlacer, warn.conflicts = FALSE) library(dplyr, warn.conflicts = FALSE) library(readr) read_file(interlacer_example(\"colors.csv\")) |> cat() #> person_id,age,favorite_color #> 1,20,BLUE #> 2,REFUSED,BLUE #> 3,21,REFUSED #> 4,30,OMITTED #> 5,1,N/A #> 6,41,RED #> 7,50,OMITTED #> 8,30,YELLOW #> 9,REFUSED,REFUSED #> 10,OMITTED,RED #> 11,10,REFUSED read_csv( interlacer_example(\"colors.csv\"), na = c(\"REFUSED\", \"OMITTED\", \"N/A\"), show_col_types = FALSE, ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 NA BLUE #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 NA #> 10 10 NA RED #> 11 11 10 (ex <- read_interlaced_csv( interlacer_example(\"colors.csv\"), na = c(\"REFUSED\", \"OMITTED\", \"N/A\"), show_col_types = FALSE, )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 BLUE #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 RED #> 11 11 10 ex$age #> [11]> #> [1] 20 21 30 1 41 50 #> [8] 30 10 #> NA levels: REFUSED OMITTED N/A mean(ex$age, na.rm = TRUE) #> [1] 25.375 ex |> filter(favorite_color == na(\"REFUSED\")) #> # A tibble: 3 × 3 #> person_id age favorite_color #> #> 1 3 21 #> 2 9 #> 3 11 10 ex |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color ) %>% arrange(favorite_color) #> # A tibble: 6 × 3 #> favorite_color mean_age n #> #> 1 BLUE 20 2 #> 2 RED 41 2 #> 3 YELLOW 30 1 #> 4 15.5 3 #> 5 40 2 #> 6 1 
1"},{"path":"http://kylehusmann.com/interlacer/index.html","id":"known-issues","dir":"","previous_headings":"","what":"Known Issues","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"base functions, like base::ifelse(), drop missing reason channel interlaced types, converting regular vectors example: due limitation R. run , use tidyverse equivalent function. Tidyverse functions designed correctly handle type conversions. example, can use dplyr::if_else(): Performance large data sets may notice large datasets interlacer runs significantly slower readr / vroom. Although interlacer uses vroom hood load delimited data, able take advantage many optimizations vroom currently support column-level missing values. soon vroom supports column-level missing values, able remedy !","code":"ex |> mutate( favorite_color = ifelse(age < 18, na(\"REDACTED\"), favorite_color) ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 #> 11 11 10 ex |> mutate( favorite_color = if_else( age < 18, na(\"REDACTED_UNDERAGE\"), favorite_color, missing = na(\"REDACTED_MISSING_AGE\") ) ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 #> 11 11 10 "},{"path":"http://kylehusmann.com/interlacer/index.html","id":"related-work","dir":"","previous_headings":"","what":"Related work","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"interlacer inspired haven, labelled, declared packages. packages provide similar functionality interlacer, focused providing compatibility missing reason data imported SPSS, SAS, Stata. interlacer slightly different aims: fully generic: Add missing value channel vector type. 
Provide functions reading / writing interlaced CSV files (just SPSS / SAS / Stata files) Provide functional API integrates well tidy pipelines Future versions interlacer provide functions convert packages’ types. detailed discussion, see vignette(\"-approaches\").","code":""},{"path":"http://kylehusmann.com/interlacer/index.html","id":"acknowledgements","dir":"","previous_headings":"","what":"Acknowledgements","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"development software supported, whole part, Institute Education Sciences, U.S. Department Education, Grant R305A170047 Pennsylvania State University. opinions expressed authors represent views Institute U.S. Department Education.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/flatten_channels.html","id":null,"dir":"Reference","previous_headings":"","what":"Flatten an interlaced vector — flatten_channels","title":"Flatten an interlaced vector — flatten_channels","text":"flatten_channels() flattens interlaced vector single channel. useful step right writing interlaced vector file, example.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/flatten_channels.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Flatten an interlaced vector — flatten_channels","text":"","code":"flatten_channels(x, ...)"},{"path":"http://kylehusmann.com/interlacer/reference/flatten_channels.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Flatten an interlaced vector — flatten_channels","text":"x interlaced vector ... 
Additional arguments, used","code":""},{"path":"http://kylehusmann.com/interlacer/reference/flatten_channels.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Flatten an interlaced vector — flatten_channels","text":"vector, flattened","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlaced.html","id":null,"dir":"Reference","previous_headings":"","what":"Construct an interlaced vector — interlaced","title":"Construct an interlaced vector — interlaced","text":"interlaced type extends vectors adding \"missing reason\" channel can used distinguish different types missingness. interlaced() function constructs new interlaced vector vector list values.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlaced.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Construct an interlaced vector — interlaced","text":"","code":"interlaced(x, na = NULL) as.interlaced(x, na = NULL, ...) # S3 method for default as.interlaced(x, na = NULL, ...) # S3 method for interlacer_interlaced as.interlaced(x, ...) # S3 method for data.frame as.interlaced(x, ...) is.interlaced(x)"},{"path":"http://kylehusmann.com/interlacer/reference/interlaced.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Construct an interlaced vector — interlaced","text":"x vector list values na vector values interpret missing values ... 
Additional arguments, used","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlaced.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Construct an interlaced vector — interlaced","text":"interlaced vector","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlacer_example.html","id":null,"dir":"Reference","previous_headings":"","what":"Get a path to one of interlacer's example data sets — interlacer_example","title":"Get a path to one of interlacer's example data sets — interlacer_example","text":"interlacer comes bundled number sample files inst/extdata directory. function make easy access","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlacer_example.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get a path to one of interlacer's example data sets — interlacer_example","text":"","code":"interlacer_example(file = NULL)"},{"path":"http://kylehusmann.com/interlacer/reference/interlacer_example.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get a path to one of interlacer's example data sets — interlacer_example","text":"file Name file. 
NULL, example files listed.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlacer_example.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get a path to one of interlacer's example data sets — interlacer_example","text":"","code":"interlacer_example() #> [1] \"colors.csv\" \"colors.dta\" \"colors.sav\" #> [4] \"colors_coded.csv\" \"colors_coded_char.csv\" \"stress.csv\" interlacer_example(\"colors.csv\") #> [1] \"/home/runner/work/_temp/Library/interlacer/extdata/colors.csv\""},{"path":"http://kylehusmann.com/interlacer/reference/is.empty.html","id":null,"dir":"Reference","previous_headings":"","what":"NA missing reasons — is.empty","title":"NA missing reasons — is.empty","text":"value missing value missing reason, considered \"empty\". .empty() checks type values. Regular NA values (missing reasons) also considered \"empty\".","code":""},{"path":"http://kylehusmann.com/interlacer/reference/is.empty.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"NA missing reasons — is.empty","text":"","code":"is.empty(x)"},{"path":"http://kylehusmann.com/interlacer/reference/is.empty.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"NA missing reasons — is.empty","text":"x vector","code":""},{"path":"http://kylehusmann.com/interlacer/reference/is.empty.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"NA missing reasons — is.empty","text":"logical vector length x, containing TRUE empty elements, FALSE otherwise.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/map_value_channel.html","id":null,"dir":"Reference","previous_headings":"","what":"interlaced functional utilities — map_value_channel","title":"interlaced functional utilities — map_value_channel","text":"map_value_channel() modifies values interlaced vector. 
map_na_channel() modifies missing reason channel interlaced vector.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/map_value_channel.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"interlaced functional utilities — map_value_channel","text":"","code":"map_value_channel(x, fn) map_na_channel(x, fn)"},{"path":"http://kylehusmann.com/interlacer/reference/map_value_channel.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"interlaced functional utilities — map_value_channel","text":"x interlaced vector fn function maps values missing reasons new values","code":""},{"path":"http://kylehusmann.com/interlacer/reference/map_value_channel.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"interlaced functional utilities — map_value_channel","text":"new interlaced vector, modified according supplied function","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na.html","id":null,"dir":"Reference","previous_headings":"","what":"Lift values to missing reasons — na","title":"Lift values to missing reasons — na","text":"na() lifts value interlaced missing reason channel.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Lift values to missing reasons — na","text":"","code":"na(x = unspecified())"},{"path":"http://kylehusmann.com/interlacer/reference/na.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Lift values to missing reasons — na","text":"x character numeric value","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Lift values to missing reasons — na","text":"interlaced 
value","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na_cols.html","id":null,"dir":"Reference","previous_headings":"","what":"Create an NA column specification — na_cols","title":"Create an NA column specification — na_cols","text":"na_cols() creates specification NA channel missing reason loading data read_interlaced_*() family functions.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na_cols.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create an NA column specification — na_cols","text":"","code":"na_cols(...) as.na_col_spec(x) is.na_col_spec(x)"},{"path":"http://kylehusmann.com/interlacer/reference/na_cols.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create an NA column specification — na_cols","text":"... Named vectors use missing reasons loading interlaced columns. Use name .default set default NA values columns. x Named list construct NA spec , vector values used spec .default equal values.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na_levels.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the factor levels of the value or missing reason channel — na_levels","title":"Get the factor levels of the value or missing reason channel — na_levels","text":"base S3 levels() function overloaded interlaced vectors, value channel factor type, levels() return levels. 
Similarly na_levels() return levels missing reason channel, factor type.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na_levels.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the factor levels of the value or missing reason channel — na_levels","text":"","code":"na_levels(x) na_levels(x) <- value # S3 method for interlacer_interlaced levels(x) # S3 method for interlacer_interlaced levels(x) <- value"},{"path":"http://kylehusmann.com/interlacer/reference/na_levels.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the factor levels of the value or missing reason channel — na_levels","text":"x interlaced vector value new levels set","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na_levels.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the factor levels of the value or missing reason channel — na_levels","text":"levels values missing reason channel","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na_spec.html","id":null,"dir":"Reference","previous_headings":"","what":"Examine the NA spec of a data frame — na_spec","title":"Examine the NA spec of a data frame — na_spec","text":"Like readr::spec(), na_spec() extracts NA column specification tibble created read_interlaced_*","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na_spec.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Examine the NA spec of a data frame — na_spec","text":"","code":"na_spec(x)"},{"path":"http://kylehusmann.com/interlacer/reference/na_spec.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Examine the NA spec of a data frame — na_spec","text":"x data frame object extract 
","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na_spec.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Examine the NA spec of a data frame — na_spec","text":"na_col_spec object","code":""},{"path":"http://kylehusmann.com/interlacer/reference/parse_interlaced.html","id":null,"dir":"Reference","previous_headings":"","what":"Parse a character vector into an interlaced vector type — parse_interlaced","title":"Parse a character vector into an interlaced vector type — parse_interlaced","text":"parse_interlaced converts character vector interlaced vector parsing readr collector type.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/parse_interlaced.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Parse a character vector into an interlaced vector type — parse_interlaced","text":"","code":"parse_interlaced(x, na, .value_col = col_guess())"},{"path":"http://kylehusmann.com/interlacer/reference/parse_interlaced.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Parse a character vector into an interlaced vector type — parse_interlaced","text":"x character vector na vector values interpret missing values .value_col collector parse character values (e.g. 
readr::col_double(), readr::col_integer(), etc.)","code":""},{"path":"http://kylehusmann.com/interlacer/reference/parse_interlaced.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Parse a character vector into an interlaced vector type — parse_interlaced","text":"interlaced vector","code":""},{"path":"http://kylehusmann.com/interlacer/reference/read_interlaced_delim.html","id":null,"dir":"Reference","previous_headings":"","what":"Read a delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","title":"Read a delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","text":"read_interlaced_*(), family functions extend readr's read_delim(), read_csv, etc. functions use data sources values interlaced missing reasons. functions return tibble interlaced columns.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/read_interlaced_delim.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read a delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","text":"","code":"read_interlaced_delim( file, delim = NULL, quote = \"\\\"\", escape_backslash = FALSE, escape_double = TRUE, col_names = TRUE, col_types = NULL, col_select = NULL, id = NULL, locale = readr::default_locale(), na = c(\"\", \"NA\"), comment = \"\", trim_ws = FALSE, skip = 0, n_max = Inf, guess_max = min(1000, n_max), name_repair = \"unique\", progress = readr::show_progress(), show_col_types = readr::should_show_types(), skip_empty_rows = TRUE ) read_interlaced_csv( file, col_names = TRUE, col_types = NULL, col_select = NULL, id = NULL, locale = readr::default_locale(), na = c(\"\", \"NA\"), quote = \"\\\"\", comment = \"\", trim_ws = TRUE, skip = 0, n_max = Inf, guess_max = min(1000, n_max), name_repair = \"unique\", progress = readr::show_progress(), show_col_types = readr::should_show_types(), skip_empty_rows = TRUE ) 
read_interlaced_csv2( file, col_names = TRUE, col_types = NULL, col_select = NULL, id = NULL, locale = readr::default_locale(), na = c(\"\", \"NA\"), quote = \"\\\"\", comment = \"\", trim_ws = TRUE, skip = 0, n_max = Inf, guess_max = min(1000, n_max), name_repair = \"unique\", progress = readr::show_progress(), show_col_types = readr::should_show_types(), skip_empty_rows = TRUE ) read_interlaced_tsv( file, col_names = TRUE, col_types = NULL, col_select = NULL, id = NULL, locale = readr::default_locale(), na = c(\"\", \"NA\"), quote = \"\\\"\", comment = \"\", trim_ws = TRUE, skip = 0, n_max = Inf, guess_max = min(1000, n_max), name_repair = \"unique\", progress = readr::show_progress(), show_col_types = readr::should_show_types(), skip_empty_rows = TRUE ) interlaced_vroom( file, delim = NULL, col_names = TRUE, col_types = NULL, col_select = NULL, id = NULL, skip = 0, n_max = Inf, na = c(\"\", \"NA\"), quote = \"\\\"\", comment = \"\", skip_empty_rows = TRUE, trim_ws = TRUE, escape_double = TRUE, escape_backslash = FALSE, locale = vroom::default_locale(), guess_max = 100, progress = vroom::vroom_progress(), show_col_types = NULL, .name_repair = \"unique\" )"},{"path":"http://kylehusmann.com/interlacer/reference/read_interlaced_delim.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read a delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","text":"file Either path file, connection, literal data (either single string raw vector). Files ending .gz, .bz2, .xz, .zip automatically uncompressed. Files starting http://, https://, ftp://, ftps:// automatically downloaded. Remote gz files can also automatically downloaded decompressed. Literal data useful examples tests. recognised literal data, input must either wrapped (), string containing least one new line, vector containing least one string new line. Using value clipboard() read system clipboard. 
delim Single character used separate fields within record. quote Single character used quote strings. escape_backslash file use backslashes escape special characters? general escape_double backslashes can used escape delimiter character, quote character, add special characters like \\\\n. escape_double file escape quotes doubling ? .e. option TRUE, value \"\"\"\" represents single quote, \\\". col_names Either TRUE, FALSE character vector column names. TRUE, first row input used column names, included data frame. FALSE, column names generated automatically: X1, X2, X3 etc. col_names character vector, values used names columns, first row input read first row output data frame. Missing (NA) column names generate warning, filled dummy names ...1, ...2 etc. Duplicate column names generate warning made unique, see name_repair control done. col_types One NULL, cols() specification, string. See vignette(\"readr\") details. NULL, column types inferred guess_max rows input, interspersed throughout file. convenient (fast), robust. guessed types wrong, need increase guess_max supply correct types . Column specifications created list() cols() must contain one column specification column. want read subset columns, use cols_only(). Alternatively, can use compact string representation character represents one column: c = character = integer n = number d = double l = logical f = factor D = date T = date time t = time ? = guess _ - = skip default, reading file without column specification print message showing readr guessed . remove message, set show_col_types = FALSE set options(readr.show_col_types = FALSE). col_select Columns include results. can use mini-language dplyr::select() refer columns name. Use c() use one selection expression. Although usage less common, col_select also accepts numeric column index. See ?tidyselect::language full details selection language. id name column store file path. useful reading multiple input files data file paths, data collection date. 
NULL (default) extra column created. locale locale controls defaults vary place place. default locale US-centric (like R), can use locale() create locale controls things like default time zone, encoding, decimal mark, big mark, day/month names. na NA col spec defined na_cols() character numeric vector values interpret missing values. comment string used identify comments. text comment characters silently ignored. trim_ws leading trailing whitespace (ASCII spaces tabs) trimmed field parsing ? skip Number lines skip reading data. comment supplied commented lines ignored skipping. n_max Maximum number lines read. guess_max Maximum number lines use guessing column types. never use number lines read. See vignette(\"column-types\", package = \"readr\") details. name_repair, .name_repair Handling column names. default behaviour ensure column names \"unique\". Various repair strategies supported: \"minimal\": name repair checks, beyond basic existence names. \"unique\" (default value): Make sure names unique empty. \"check_unique\": name repair, check unique. \"unique_quiet\": Repair unique strategy, quietly. \"universal\": Make names unique syntactic. \"universal_quiet\": Repair universal strategy, quietly. function: Apply custom name repair (e.g., name_repair = make.names names style base R). purrr-style anonymous function, see rlang::as_function(). argument passed repair vctrs::vec_as_names(). See details terms strategies used enforce . progress Display progress bar? default display interactive session knitting document. automatic progress bar can disabled setting option readr.show_progress FALSE. show_col_types FALSE, show guessed column types. TRUE always show column types, even supplied. NULL (default) show column types explicitly supplied col_types argument. skip_empty_rows blank rows ignored altogether? .e. option TRUE blank rows represented . 
FALSE represented NA values columns.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/read_interlaced_delim.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read an delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","text":"tibble(), interlaced columns.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/read_interlaced_delim.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read an delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","text":"","code":"# Beep boop"},{"path":"http://kylehusmann.com/interlacer/reference/reexports.html","id":null,"dir":"Reference","previous_headings":"","what":"Objects exported from other packages — reexports","title":"Objects exported from other packages — reexports","text":"objects imported packages. Follow links see documentation. generics .factor, .ordered readr .col_spec, col_character, col_date, col_datetime, col_double, col_factor, col_guess, col_integer, col_logical, col_number, col_skip, col_time, cols, cols_condense, cols_only, spec vctrs vec_c","code":""},{"path":"http://kylehusmann.com/interlacer/reference/value_channel.html","id":null,"dir":"Reference","previous_headings":"","what":"Access the channels of an interlaced vector — value_channel","title":"Access the channels of an interlaced vector — value_channel","text":"value_channel() returns value channel interlaced vector na_channel() returns missing reason channel interlaced vector","code":""},{"path":"http://kylehusmann.com/interlacer/reference/value_channel.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Access the channels of an interlaced vector — value_channel","text":"","code":"value_channel(x, ...) 
na_channel(x, ...)"},{"path":"http://kylehusmann.com/interlacer/reference/value_channel.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Access the channels of an interlaced vector — value_channel","text":"x interlaced vector ... Additional arguments, used","code":""},{"path":"http://kylehusmann.com/interlacer/reference/value_channel.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Access the channels of an interlaced vector — value_channel","text":"value missing reasons channel","code":""},{"path":"http://kylehusmann.com/interlacer/reference/write_interlaced_delim.html","id":null,"dir":"Reference","previous_headings":"","what":"Interlace a deinterlaced data frame and write it to a file — write_interlaced_delim","title":"Interlace a deinterlaced data frame and write it to a file — write_interlaced_delim","text":"write_interlaced_*() family functions take data frame interlaced columns, flatten interlaced columns, write file. Non-interlaced columns just pass . 
behavior functions match similarly named counterparts readr.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/write_interlaced_delim.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Interlace a deinterlaced data frame and write it to a file — write_interlaced_delim","text":"","code":"write_interlaced_delim( x, file, delim = \" \", empty = \"NA\", append = FALSE, col_names = !append, quote = c(\"needed\", \"all\", \"none\"), escape = c(\"double\", \"backslash\", \"none\"), eol = \"\\n\", num_threads = readr::readr_threads(), progress = readr::show_progress() ) write_interlaced_csv( x, file, empty = \"NA\", append = FALSE, col_names = !append, quote = c(\"needed\", \"all\", \"none\"), escape = c(\"double\", \"backslash\", \"none\"), eol = \"\\n\", num_threads = readr::readr_threads(), progress = readr::show_progress() ) write_interlaced_csv2( x, file, empty = \"NA\", append = FALSE, col_names = !append, quote = c(\"needed\", \"all\", \"none\"), escape = c(\"double\", \"backslash\", \"none\"), eol = \"\\n\", num_threads = readr::readr_threads(), progress = readr::show_progress() ) write_interlaced_excel_csv( x, file, empty = \"NA\", append = FALSE, col_names = !append, quote = c(\"needed\", \"all\", \"none\"), escape = c(\"double\", \"backslash\", \"none\"), eol = \"\\n\", num_threads = readr::readr_threads(), progress = readr::show_progress() ) write_interlaced_excel_csv2( x, file, empty = \"NA\", append = FALSE, col_names = !append, quote = c(\"needed\", \"all\", \"none\"), escape = c(\"double\", \"backslash\", \"none\"), eol = \"\\n\", num_threads = readr::readr_threads(), progress = readr::show_progress() ) write_interlaced_tsv( x, file, empty = \"NA\", append = FALSE, col_names = !append, quote = c(\"needed\", \"all\", \"none\"), escape = c(\"double\", \"backslash\", \"none\"), eol = \"\\n\", num_threads = readr::readr_threads(), progress = readr::show_progress() 
)"},{"path":"http://kylehusmann.com/interlacer/reference/write_interlaced_delim.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interlace a deinterlaced data frame and write it to a file — write_interlaced_delim","text":"x data frame tibble write disk. file File connection write . delim Delimiter used separate values. Defaults \" \" write_delim(), \",\" write_excel_csv() \";\" write_excel_csv2(). Must single character. empty String used empty values (NA values non-interlaced columns). Defaults NA. append FALSE, overwrite existing file. TRUE, append existing file. cases, file exist new file created. col_names FALSE, column names included top file. TRUE, column names included. specified, col_names take opposite value given append. quote handle fields contain characters need quoted. needed - Values quoted needed: contain delimiter, quote, newline. - Quote fields. none - Never quote fields. escape type escape use quotes data. double - quotes escaped doubling . backslash - quotes escaped preceding backslash. none - quotes escaped. eol end line character use. commonly either \"\\n\" Unix style newlines, \"\\r\\n\" Windows style newlines. num_threads Number threads use reading materializing vectors. data contains newlines within fields parser automatically forced use single thread . progress Display progress bar? default display interactive session knitting document. display updated every 50,000 values display estimated reading time 5 seconds . 
automatic progress bar can disabled setting option readr.show_progress FALSE.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/write_interlaced_delim.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Interlace a deinterlaced data frame and write it to a file — write_interlaced_delim","text":"write_interlaced_* returns input x invisibly","code":""}] +[{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"Apache License","title":"Apache License","text":"Version 2.0, January 2004 ","code":""},{"path":[]},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_1-definitions","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"1. Definitions","title":"Apache License","text":"“License” shall mean terms conditions use, reproduction, distribution defined Sections 1 9 document. “Licensor” shall mean copyright owner entity authorized copyright owner granting License. “Legal Entity” shall mean union acting entity entities control, controlled , common control entity. purposes definition, “control” means () power, direct indirect, cause direction management entity, whether contract otherwise, (ii) ownership fifty percent (50%) outstanding shares, (iii) beneficial ownership entity. “” (“”) shall mean individual Legal Entity exercising permissions granted License. “Source” form shall mean preferred form making modifications, including limited software source code, documentation source, configuration files. “Object” form shall mean form resulting mechanical transformation translation Source form, including limited compiled object code, generated documentation, conversions media types. “Work” shall mean work authorship, whether Source Object form, made available License, indicated copyright notice included attached work (example provided Appendix ). 
“Derivative Works” shall mean work, whether Source Object form, based (derived ) Work editorial revisions, annotations, elaborations, modifications represent, whole, original work authorship. purposes License, Derivative Works shall include works remain separable , merely link (bind name) interfaces , Work Derivative Works thereof. “Contribution” shall mean work authorship, including original version Work modifications additions Work Derivative Works thereof, intentionally submitted Licensor inclusion Work copyright owner individual Legal Entity authorized submit behalf copyright owner. purposes definition, “submitted” means form electronic, verbal, written communication sent Licensor representatives, including limited communication electronic mailing lists, source code control systems, issue tracking systems managed , behalf , Licensor purpose discussing improving Work, excluding communication conspicuously marked otherwise designated writing copyright owner “Contribution.” “Contributor” shall mean Licensor individual Legal Entity behalf Contribution received Licensor subsequently incorporated within Work.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_2-grant-of-copyright-license","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"2. Grant of Copyright License","title":"Apache License","text":"Subject terms conditions License, Contributor hereby grants perpetual, worldwide, non-exclusive, -charge, royalty-free, irrevocable copyright license reproduce, prepare Derivative Works , publicly display, publicly perform, sublicense, distribute Work Derivative Works Source Object form.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_3-grant-of-patent-license","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"3. 
Grant of Patent License","title":"Apache License","text":"Subject terms conditions License, Contributor hereby grants perpetual, worldwide, non-exclusive, -charge, royalty-free, irrevocable (except stated section) patent license make, made, use, offer sell, sell, import, otherwise transfer Work, license applies patent claims licensable Contributor necessarily infringed Contribution(s) alone combination Contribution(s) Work Contribution(s) submitted. institute patent litigation entity (including cross-claim counterclaim lawsuit) alleging Work Contribution incorporated within Work constitutes direct contributory patent infringement, patent licenses granted License Work shall terminate date litigation filed.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_4-redistribution","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"4. Redistribution","title":"Apache License","text":"may reproduce distribute copies Work Derivative Works thereof medium, without modifications, Source Object form, provided meet following conditions: () must give recipients Work Derivative Works copy License; (b) must cause modified files carry prominent notices stating changed files; (c) must retain, Source form Derivative Works distribute, copyright, patent, trademark, attribution notices Source form Work, excluding notices pertain part Derivative Works; (d) Work includes “NOTICE” text file part distribution, Derivative Works distribute must include readable copy attribution notices contained within NOTICE file, excluding notices pertain part Derivative Works, least one following places: within NOTICE text file distributed part Derivative Works; within Source form documentation, provided along Derivative Works; , within display generated Derivative Works, wherever third-party notices normally appear. contents NOTICE file informational purposes modify License. 
may add attribution notices within Derivative Works distribute, alongside addendum NOTICE text Work, provided additional attribution notices construed modifying License. may add copyright statement modifications may provide additional different license terms conditions use, reproduction, distribution modifications, Derivative Works whole, provided use, reproduction, distribution Work otherwise complies conditions stated License.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_5-submission-of-contributions","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"5. Submission of Contributions","title":"Apache License","text":"Unless explicitly state otherwise, Contribution intentionally submitted inclusion Work Licensor shall terms conditions License, without additional terms conditions. Notwithstanding , nothing herein shall supersede modify terms separate license agreement may executed Licensor regarding Contributions.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_6-trademarks","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"6. Trademarks","title":"Apache License","text":"License grant permission use trade names, trademarks, service marks, product names Licensor, except required reasonable customary use describing origin Work reproducing content NOTICE file.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_7-disclaimer-of-warranty","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"7. 
Disclaimer of Warranty","title":"Apache License","text":"Unless required applicable law agreed writing, Licensor provides Work (Contributor provides Contributions) “” BASIS, WITHOUT WARRANTIES CONDITIONS KIND, either express implied, including, without limitation, warranties conditions TITLE, NON-INFRINGEMENT, MERCHANTABILITY, FITNESS PARTICULAR PURPOSE. solely responsible determining appropriateness using redistributing Work assume risks associated exercise permissions License.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_8-limitation-of-liability","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"8. Limitation of Liability","title":"Apache License","text":"event legal theory, whether tort (including negligence), contract, otherwise, unless required applicable law (deliberate grossly negligent acts) agreed writing, shall Contributor liable damages, including direct, indirect, special, incidental, consequential damages character arising result License use inability use Work (including limited damages loss goodwill, work stoppage, computer failure malfunction, commercial damages losses), even Contributor advised possibility damages.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_9-accepting-warranty-or-additional-liability","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"9. Accepting Warranty or Additional Liability","title":"Apache License","text":"redistributing Work Derivative Works thereof, may choose offer, charge fee , acceptance support, warranty, indemnity, liability obligations /rights consistent License. However, accepting obligations, may act behalf sole responsibility, behalf Contributor, agree indemnify, defend, hold Contributor harmless liability incurred , claims asserted , Contributor reason accepting warranty additional liability. 
END TERMS CONDITIONS","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"appendix-how-to-apply-the-apache-license-to-your-work","dir":"","previous_headings":"","what":"APPENDIX: How to apply the Apache License to your work","title":"Apache License","text":"apply Apache License work, attach following boilerplate notice, fields enclosed brackets [] replaced identifying information. (Don’t include brackets!) text enclosed appropriate comment syntax file format. also recommend file class name description purpose included “printed page” copyright notice easier identification within third-party archives.","code":"Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"warning","dir":"Articles","previous_headings":"","what":"⚠️ ⚠️ ⚠️ WARNING ⚠️ ⚠️ ⚠️","title":"Coded Data","text":"cfactor type highly experimental feature (even compared rest interlacer) thoroughly tested! 
’m sharing super pre-alpha, unstable state get feedback invest time polishing implementation.","code":""},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"spss-style-codes","dir":"Articles","previous_headings":"","what":"SPSS-style codes","title":"Coded Data","text":"motivating example, consider coded version colors.csv example: missing reasons : -99: N/-98: REFUSED -97: OMITTED colors coded: 1: BLUE 2: RED 3: YELLOW style coding, positive values representing categorical levels negative values representing missing values, common format used SPSS. data can loaded interlaced numeric values follows: representation awkward work codes meaningless obfuscate significance code write results output. wanted select everyone BLUE favorite color, example, write: Similarly, wanted filter OMITTED favorite colors, write: make data ergnomic work , can use interlacer’s v_col_cfactor() na_col_cfactor() collector types load values cfactor instead, allows associate codes human-readable labels: Now human-readable labels, instead magic codes, can used working data: can still convert labels values missing reasons back codes wish, using .codes(). 
following convert missing reason channel age value channel favorite_color coded representation: recode cfactor channels data frame coded representation can following:","code":"library(readr) library(dplyr, warn.conflicts = FALSE) library(interlacer, warn.conflicts = FALSE) read_file( interlacer_example(\"colors_coded.csv\") ) |> cat() #> person_id,age,favorite_color #> 1,20,1 #> 2,-98,1 #> 3,21,-98 #> 4,30,-97 #> 5,1,-99 #> 6,41,2 #> 7,50,-97 #> 8,30,3 #> 9,-98,-98 #> 10,-97,2 #> 11,10,-98 (df_coded <- read_interlaced_csv( interlacer_example(\"colors_coded.csv\"), na = c(-99, -98, -97) )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 #> 2 2 <-98> 1 #> 3 3 21 <-98> #> 4 4 30 <-97> #> 5 5 1 <-99> #> 6 6 41 2 #> 7 7 50 <-97> #> 8 8 30 3 #> 9 9 <-98> <-98> #> 10 10 <-97> 2 #> 11 11 10 <-98> df_coded |> filter(favorite_color == 1) #> # A tibble: 2 × 3 #> person_id age favorite_color #> #> 1 1 20 1 #> 2 2 <-98> 1 df_coded |> filter(favorite_color == na(-97)) #> # A tibble: 2 × 3 #> person_id age favorite_color #> #> 1 4 30 <-97> #> 2 7 50 <-97> (df_decoded <- read_interlaced_csv( interlacer_example(\"colors_coded.csv\"), col_types = x_cols( favorite_color = v_col_cfactor(codes = c(BLUE = 1, RED = 2, YELLOW = 3)), ), na = na_col_cfactor(REFUSED = -99, OMITTED = -98, `N/A` = -97) )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 BLUE #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 RED #> 11 11 10 df_decoded |> filter(favorite_color == \"BLUE\") #> # A tibble: 2 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 BLUE df_decoded |> filter(favorite_color == na(\"OMITTED\")) #> # A tibble: 3 × 3 #> person_id age favorite_color #> #> 1 3 21 #> 2 9 #> 3 11 10 df_decoded |> mutate( age = map_na_channel(age, as.codes), favorite_color = map_value_channel(favorite_color, as.codes) ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 #> 2 2 <-98> 1 #> 3 3 21 #> 4 4 
30 #> 5 5 1 #> 6 6 41 2 #> 7 7 50 #> 8 8 30 3 #> 9 9 <-98> #> 10 10 <-97> 2 #> 11 11 10 df_decoded |> mutate( across_value_channels(where_value_channel(is.cfactor), as.codes), across_na_channels(where_na_channel(is.cfactor), as.codes), ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 #> 2 2 <-98> 1 #> 3 3 21 <-98> #> 4 4 30 <-97> #> 5 5 1 <-99> #> 6 6 41 2 #> 7 7 50 <-97> #> 8 8 30 3 #> 9 9 <-98> <-98> #> 10 10 <-97> 2 #> 11 11 10 <-98>"},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"sas--and-stata-style-codes","dir":"Articles","previous_headings":"","what":"SAS- and Stata-style codes","title":"Coded Data","text":"Like SPSS, SAS Stata encode factor levels numeric values, instead representing missing reasons negative codes, given character codes: example, value coding scheme used favorite_color previous example, except missing reason channels coded follows: “.”: N/“.”: REFUSED “.b”: OMITTED data can easily loaded interlacer cfactor missing reason channel follows:","code":"read_file( interlacer_example(\"colors_coded_char.csv\") ) |> cat() #> person_id,age,favorite_color #> 1,20,1 #> 2,.a,1 #> 3,21,.a #> 4,30,.b #> 5,1,. #> 6,41,2 #> 7,50,.b #> 8,30,3 #> 9,.a,.a #> 10,.b,2 #> 11,10,.a read_interlaced_csv( interlacer_example(\"colors_coded_char.csv\"), col_types = x_cols( favorite_color = v_col_cfactor(codes = c(BLUE = 1, RED = 2, YELLOW = 3)), ), na = c(`N/A` = \".\", REFUSED = \".a\", OMITTED = \".b\"), ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 BLUE #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 RED #> 11 11 10 "},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"the-cfactor-type","dir":"Articles","previous_headings":"","what":"The cfactor type","title":"Coded Data","text":"cfactor extension base R’s factor type. 
created numeric character codes using cfactor() function: cfactor vectors can used wherever regular base R factor types used, fully-compatible factor types: unlike regular factor, cfactor additionally stores codes factor levels. means can convert back coded representation time, desired: IMPORTANT: .numeric() .integer() functions convert cfactor numeric codes coded representation. Instead, order retain full compatibility base R factor type, always returns result coded index level factor: levels changed, cfactor drop codes degrade regular R factor: Finally, base R factor character vector labels, can add codes via .cfactor():","code":"(example_cfactor <- cfactor( c(10, 20, 30, 10, 20, 30), codes = c(LEVEL_A = 10, LEVEL_B = 20, LEVEL_C = 30) )) #> [6]> #> [1] LEVEL_A LEVEL_B LEVEL_C LEVEL_A LEVEL_B LEVEL_C #> #> Categorical levels: #> label code #> LEVEL_A 10 #> LEVEL_B 20 #> LEVEL_C 30 (example_cfactor2 <- cfactor( c(\"a\", \"b\", \"c\", \"a\", \"b\", \"c\"), codes = c(LEVEL_A = \"a\", LEVEL_B = \"b\", LEVEL_C = \"c\") )) #> [6]> #> [1] LEVEL_A LEVEL_B LEVEL_C LEVEL_A LEVEL_B LEVEL_C #> #> Categorical levels: #> label code #> LEVEL_A a #> LEVEL_B b #> LEVEL_C c is.factor(example_cfactor) #> [1] TRUE levels(example_cfactor) #> [1] \"LEVEL_A\" \"LEVEL_B\" \"LEVEL_C\" is.factor(example_cfactor2) #> [1] TRUE levels(example_cfactor2) #> [1] \"LEVEL_A\" \"LEVEL_B\" \"LEVEL_C\" codes(example_cfactor) #> LEVEL_A LEVEL_B LEVEL_C #> 10 20 30 as.codes(example_cfactor) #> [1] 10 20 30 10 20 30 codes(example_cfactor2) #> LEVEL_A LEVEL_B LEVEL_C #> \"a\" \"b\" \"c\" as.codes(example_cfactor2) #> [1] \"a\" \"b\" \"c\" \"a\" \"b\" \"c\" as.numeric(example_cfactor) #> [1] 1 2 3 1 2 3 as.numeric(example_cfactor2) #> [1] 1 2 3 1 2 3 cfactor_copy <- example_cfactor # cfactory_copy is a cfactor and a factor is.cfactor(cfactor_copy) #> [1] TRUE is.factor(cfactor_copy) #> [1] TRUE levels(cfactor_copy) #> [1] \"LEVEL_A\" \"LEVEL_B\" \"LEVEL_C\" codes(cfactor_copy) #> LEVEL_A LEVEL_B LEVEL_C 
#> 10 20 30 # modify the levels of the cfactor as if it was a regular factor levels(cfactor_copy) <- c(\"C\", \"B\", \"A\") # now cfactor_copy is just a regular factor is.cfactor(cfactor_copy) #> [1] FALSE is.factor(cfactor_copy) #> [1] TRUE levels(cfactor_copy) #> [1] \"C\" \"B\" \"A\" codes(cfactor_copy) #> NULL as.cfactor( c(\"LEVEL_A\", \"LEVEL_B\", \"LEVEL_C\", \"LEVEL_A\", \"LEVEL_B\", \"LEVEL_C\"), codes = c(LEVEL_A = 10, LEVEL_B = 20, LEVEL_C = 30) ) #> [6]> #> [1] LEVEL_A LEVEL_B LEVEL_C LEVEL_A LEVEL_B LEVEL_C #> #> Categorical levels: #> label code #> LEVEL_A 10 #> LEVEL_B 20 #> LEVEL_C 30"},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"re-coding-and-writing-an-interlaced-data-frame-","dir":"Articles","previous_headings":"","what":"Re-coding and writing an interlaced data frame.","title":"Coded Data","text":"Re-coding writing interlaced data frame simple calling .codes() cfactor type value missing reason channels, calling one write_interlaced_*() family functions:","code":"df_decoded |> mutate( across_value_channels(where_value_channel(is.cfactor), as.codes), across_na_channels(where_na_channel(is.cfactor), as.codes), ) |> write_interlaced_csv(\"output.csv\")"},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"haven","dir":"Articles","previous_headings":"","what":"haven","title":"Coded Data","text":"haven package functions loading native SPSS, SAS, Stata native file formats special data frames use column attributes special values keep track value labels missing reasons. 
complete discussion compares interlacer’s approach, see vignette(\"-approaches\").","code":""},{"path":"http://kylehusmann.com/interlacer/articles/extended-column-types.html","id":"x_cols-extended-cols-specifications","dir":"Articles","previous_headings":"","what":"x_cols: extended cols specifications","title":"Extended Column Types","text":"need fine-grained control value missing reason channel types, can use x_cols() specification, extension readr’s readr::cols() system. x_cols(), can control value channel na channel types columns resulting data frame. useful missing reasons apply particular items opposed file whole. example, say measure following two items: current stress level? Low Moderate High don’t know don’t understand question well feel manage time responsibilities today? Poorly Fairly well Well well apply (Today vacation day) apply (reason) can see, items two selection choices mapped missing reasons. can specified x_cols() follows: Like readr’s readr::cols() function, named x_cols() describes column resulting data frame. Value missing reason channel types declared via calls v_col_*() na_col_*() respectively, assembled x_col(). v_col_*() types mirror readr’s readr::col_* column “collectors”. v_col_double() equivalent readr::col_double(), v_col_character() equivalent readr::col_character(), etc. See vroom’s documentation list available column types. na_col_*() collectors allow declare missing reason channel type loaded column, values interpreted missing reasons. Currently, five options: na_col_default(): Use collector defined na = argument read_interlaced_*() function na_col_none(): Load column without missing reason channel. na_col_factor(): Use factor missing reason channel. Character arguments passed form levels factor. (e.g. na_col_factor(\"REFUSED\", \"OMITTED\", \"N/\")) na_col_integer(): Use integer na channel. Numeric arguments passed values interpreted missing values. (e.g. na_col_integer(-99, -98, -97))) na_col_cfactor(): Use cfactor na channel. 
(cfactor types covered next vignette,vignette(\"coded-data\")) following example shows collectors action. example use coded version colors.csv example data, demonstrate integer missing reason types:","code":"(df_stress <- read_interlaced_csv( interlacer_example(\"stress.csv\"), col_types = x_cols( person_id = x_col( v_col_integer(), na_col_none() ), current_stress = x_col( v_col_factor(levels = c(\"LOW\", \"MODERATE\", \"HIGH\")), na_col_factor(\"DONT_KNOW\", \"DONT_UNDERSTAND\") ), time_management = x_col( v_col_factor(levels = c(\"POORLY\", \"FAIRLY_WELL\", \"WELL\", \"VERY_WELL\")), na_col_factor(\"NA_VACATION\", \"NA_OTHER\") ) ) )) #> # A tibble: 8 × 3 #> person_id current_stress time_management #> #> 1 1 LOW VERY_WELL #> 2 2 MODERATE POORLY #> 3 3 #> 4 4 HIGH POORLY #> 5 5 #> 6 6 LOW #> 7 7 MODERATE WELL #> 8 8 FAIRLY_WELL read_interlaced_csv( interlacer_example(\"colors_coded.csv\"), col_types = x_cols( person_id = x_col(v_col_integer(), na_col_none()), age = x_col(v_col_double(), na_col_integer(-99, -98, -97)), favorite_color = x_col(v_col_integer(), na_col_integer(-99, -98, -97)) ) ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 #> 2 2 <-98> 1 #> 3 3 21 <-98> #> 4 4 30 <-97> #> 5 5 1 <-99> #> 6 6 41 2 #> 7 7 50 <-97> #> 8 8 30 3 #> 9 9 <-98> <-98> #> 10 10 <-97> 2 #> 11 11 10 <-98>"},{"path":[]},{"path":"http://kylehusmann.com/interlacer/articles/extended-column-types.html","id":"default-collector-types","dir":"Articles","previous_headings":"Shortcuts","what":"Default collector types","title":"Extended Column Types","text":"Like readr’s cols() function, x_cols() function accepts .default argument specifies default value collector. na = argument similarly used specify default missing reason collector used na_col_*() specified, set na_col_default(). 
taking advantage defaults, specification last example equivalently written :","code":"read_interlaced_csv( interlacer_example(\"colors_coded.csv\"), col_types = x_cols( .default = v_col_integer(), person_id = x_col(v_col_integer(), na_col_none()), age = v_col_double(), ), na = na_col_integer(-99, -98, -97) ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 #> 2 2 <-98> 1 #> 3 3 21 <-98> #> 4 4 30 <-97> #> 5 5 1 <-99> #> 6 6 41 2 #> 7 7 50 <-97> #> 8 8 30 3 #> 9 9 <-98> <-98> #> 10 10 <-97> 2 #> 11 11 10 <-98>"},{"path":"http://kylehusmann.com/interlacer/articles/extended-column-types.html","id":"concise-value-and-missing-reason-specifications","dir":"Articles","previous_headings":"Shortcuts","what":"Concise value and missing reason specifications","title":"Extended Column Types","text":"Like readr, value collectors can specified using characters. example, instead v_col_integer(), can use \"\". See vroom’s documentation complete list shortcuts. Similarly, missing reason collectors can specified providing vector missing values; collector type inferred via type vector. conversions follows: na_col_none(): NULL na_col_factor(): character vector, e.g. c(\"REFUSED\", \"OMITTED\", \"N/\") na_col_integer(): numeric vector, e.g. c(-99, -98, -97)) na_col_cfactor(): named character numeric vector, e.g. 
c(REFUSED = -99, OMITTED = -98, `N/` = -97) Using shortcuts, previous example equivalently written compact form follows:","code":"read_interlaced_csv( interlacer_example(\"colors_coded.csv\"), col_types = x_cols( .default = \"i\", person_id = x_col(\"i\", NULL), age = \"d\", ), na = c(-99, -98, -97) ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 #> 2 2 <-98> 1 #> 3 3 21 <-98> #> 4 4 30 <-97> #> 5 5 1 <-99> #> 6 6 41 2 #> 7 7 50 <-97> #> 8 8 30 3 #> 9 9 <-98> <-98> #> 10 10 <-97> 2 #> 11 11 10 <-98>"},{"path":"http://kylehusmann.com/interlacer/articles/extended-column-types.html","id":"next-steps","dir":"Articles","previous_headings":"","what":"Next steps","title":"Extended Column Types","text":"vignette covered column types values missing reasons can explicitly specified using collectors. also illustrated column-level missing values can specified creating extended column type specifications using x_cols(). final examples, used example data set coded values missing reasons. Coded values especially common data sets produced SPSS, SAS, Stata. interlacer provides special column type make working sort data easier: cfactor type. covered next vignette, vignette(\"coded-data\").","code":""},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"aggregations-with-missing-reasons","dir":"Articles","previous_headings":"","what":"Aggregations with missing reasons","title":"Introduction to interlacer","text":"Now, interested values source data, functionality need. wanted know values NA? Although information encoded source data, lost missing reasons converted NA values. example, consider favorite_color column. many respondents REFUSED give favorite color? many people just OMITTED answer? question N/respondents (e.g. wasn’t survey form)? mean respondent age groups? current dataframe gets us part way: can see, converted missing reasons single NA, can answer questions missingness general, rather work specific reasons stored source data. 
Unfortunately, try load data missing reasons intact, lose something else: type information values. Now access missing reasons, columns character vectors. means order anything values, always filter missing reasons, cast remaining values desired type: gives us information want, cumbersome. Notice ’s distinction favorite color values missing reasons! Things start get really complex different columns different sets possible missing reasons. means lot type conversion gymnastics switch value types missing types.","code":"library(dplyr, warn.conflicts = FALSE) df_simple |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> # A tibble: 4 × 3 #> favorite_color mean_age n #> #> 1 BLUE 20 2 #> 2 RED 41 2 #> 3 YELLOW 30 1 #> 4 NA 22.4 6 (df_with_missing <- read_csv( interlacer_example(\"colors.csv\"), col_types = cols(.default = \"c\") )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 REFUSED BLUE #> 3 3 21 REFUSED #> 4 4 30 OMITTED #> 5 5 1 N/A #> 6 6 41 RED #> 7 7 50 OMITTED #> 8 8 30 YELLOW #> 9 9 REFUSED REFUSED #> 10 10 OMITTED RED #> 11 11 10 REFUSED reasons <- c(\"REFUSED\", \"OMITTED\", \"N/A\") df_with_missing |> mutate( age_values = as.numeric(if_else(age %in% reasons, NA, age)), ) |> summarize( mean_age = mean(age_values, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> # A tibble: 6 × 3 #> favorite_color mean_age n #> #> 1 BLUE 20 2 #> 2 N/A 1 1 #> 3 OMITTED 40 2 #> 4 RED 41 2 #> 5 REFUSED 15.5 3 #> 6 YELLOW 30 1"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"the-interlacer-approach","dir":"Articles","previous_headings":"Aggregations with missing reasons","what":"The interlacer approach","title":"Introduction to interlacer","text":"interlacer built based insight everything becomes much tidy, simple, expressive explicitly work values missing reasons separate channels variable. 
interlacer introduces new interlaced column type facilitates . read_interlaced_* functions interlacer import data new column type. can see column headers, column loaded composed two channels: value channel, missing reason channel. channel can type. age column, example, double values factor missing reasons: channels can explicitly accessed using value_channel() na_channel() helper functions: helpers rarely needed, however, computations automatically operate interlaced column’s value channel, ignore missing reasons channel. following compute mean age, without missing reasons interfering: (equivalently used value_channel() helper achieve result, albeit verbosity): Although missing reasons excluded computations, still treated unique values. means group age get breakdown unique missing reasons, rather lumped single NA: can see, can generate report , without needing type gymnastics! Also, values neatly distinguished missing reasons.","code":"(df <- read_interlaced_csv( interlacer_example(\"colors.csv\"), na = c(\"REFUSED\", \"OMITTED\", \"N/A\") )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 BLUE #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 RED #> 11 11 10 df$age #> [11]> #> [1] 20 21 30 1 41 50 #> [8] 30 10 #> NA levels: REFUSED OMITTED N/A value_channel(df$age) #> [1] 20 NA 21 30 1 41 50 30 NA NA 10 na_channel(df$age) #> [1] REFUSED REFUSED #> [10] OMITTED #> Levels: REFUSED OMITTED N/A mean(df$age, na.rm = TRUE) #> [1] 25.375 mean(value_channel(df$age), na.rm = TRUE) #> [1] 25.375 df |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> # A tibble: 6 × 3 #> favorite_color mean_age n #> #> 1 BLUE 20 2 #> 2 RED 41 2 #> 3 YELLOW 30 1 #> 4 15.5 3 #> 5 40 2 #> 6 1 1"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"filtering-based-on-missing-reasons","dir":"Articles","previous_headings":"","what":"Filtering based 
on missing reasons","title":"Introduction to interlacer","text":"interlaced columns also helpful creating samples inclusion / exclusion criteria based missing reasons. example, using example data, say wanted create sample respondents REFUSED give age. indicate value interpreted missing reason, can use na() function value: people REFUSED report age favorite color? ’s also possible combine value conditions missing reason conditions. example, select everyone REFUSED give favorite color, 20 years old:","code":"df |> filter(age == na(\"REFUSED\")) #> # A tibble: 2 × 3 #> person_id age favorite_color #> #> 1 2 BLUE #> 2 9 # na_channel() can also be used to get an equivalent result: df |> filter(na_channel(age) == \"REFUSED\") #> # A tibble: 2 × 3 #> person_id age favorite_color #> #> 1 2 BLUE #> 2 9 df |> filter(age == na(\"REFUSED\") & favorite_color == na(\"REFUSED\")) #> # A tibble: 1 × 3 #> person_id age favorite_color #> #> 1 9 df |> filter(age > 20 & favorite_color == na(\"REFUSED\")) #> # A tibble: 1 × 3 #> person_id age favorite_color #> #> 1 3 21 "},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"mutations","dir":"Articles","previous_headings":"","what":"Mutations","title":"Introduction to interlacer","text":"might expect, na() function can used values mutations. following pipeline replace favorite color respondents missing value \"REDACTED\" Conditionals also work exactly expect mutations. following replace favorite color respondents age < 18 missing reason \"REDACTED_UNDERAGE\". 
Respondents missing age replaced \"REDACTED_MISSING_AGE\" following mutation create new column called person_type \"CHILD\" age < 18, \"ADULT\" age >= 18, missing reason \"AGE_UNAVAILABLE\" age missing: Important note: must use dplyr::if_else() interlaced vectors instead R’s base::ifelse() function, base function strips missing reason channel due fundamental limitation base R.","code":"df |> mutate( favorite_color = na(\"REDACTED\") ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 #> 2 2 #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 #> 7 7 50 #> 8 8 30 #> 9 9 #> 10 10 #> 11 11 10 df |> mutate( favorite_color = if_else( age < 18, na(\"REDACTED_UNDERAGE\"), favorite_color, missing = na(\"REDACTED_MISSING_AGE\") ) ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 #> 11 11 10 df |> mutate( person_type = if_else( age < 18, \"CHILD\", \"ADULT\", missing = na(\"AGE_UNAVAILABLE\") ), ) #> # A tibble: 11 × 4 #> person_id age favorite_color person_type #> #> 1 1 20 BLUE ADULT #> 2 2 BLUE #> 3 3 21 ADULT #> 4 4 30 ADULT #> 5 5 1 CHILD #> 6 6 41 RED ADULT #> 7 7 50 ADULT #> 8 8 30 YELLOW ADULT #> 9 9 #> 10 10 RED #> 11 11 10 CHILD"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"empty-cells-na-missing-reasons","dir":"Articles","previous_headings":"","what":"Empty cells (NA missing reasons)","title":"Introduction to interlacer","text":"cell column missing value missing reason, cell considered “empty”. values can occur missing reasons specified. example, include missing = argument second example previous section, get following result: Empty values can detected using .empty() function: Raw NA values also considered “empty”: Empty values often occur result joins, dplyr::*_join() family functions missing = parameter, like dplyr::if_else() . 
example, say following data frame wanted join sample: ’re missing condition information respondents, show empty values join data frame sample: can remedy replacing empty values join:","code":"df |> mutate( favorite_color = if_else( age < 18, na(\"REDACTED_UNDERAGE\"), favorite_color, ) ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 <> #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 <> #> 10 10 <> #> 11 11 10 df |> mutate( favorite_color = if_else( age < 18, na(\"REDACTED_UNDERAGE\"), favorite_color, ) ) |> filter(is.empty(favorite_color)) #> # A tibble: 3 × 3 #> person_id age favorite_color #> #> 1 2 <> #> 2 9 <> #> 3 10 <> # regular values are neither missing nor empty is.na(42) #> [1] FALSE is.empty(42) #> [1] FALSE # na(\"REASON\") is a missing value, but is not an empty value is.na(na(\"REASON\")) #> [1] TRUE is.empty(na(\"REASON\")) #> [1] FALSE # na(NA) values are considered missing and empty is.na(na(NA)) #> [1] TRUE is.empty(na(NA)) #> [1] TRUE # regular NA values are also missing and empty is.na(NA) #> [1] TRUE is.empty(NA) #> [1] TRUE conditions <- tribble( ~person_id, ~condition, 1, \"TREATMENT\", 2, \"CONTROL\", 3, na(\"TECHNICAL_ERROR\"), 6, \"CONTROL\", 8, \"TREATMENT\", ) df |> left_join(conditions, by = join_by(person_id)) #> # A tibble: 11 × 4 #> person_id age favorite_color condition #> #> 1 1 20 BLUE TREATMENT #> 2 2 BLUE CONTROL #> 3 3 21 #> 4 4 30 <> #> 5 5 1 <> #> 6 6 41 RED CONTROL #> 7 7 50 <> #> 8 8 30 YELLOW TREATMENT #> 9 9 <> #> 10 10 RED <> #> 11 11 10 <> df |> left_join(conditions, by = join_by(person_id)) |> mutate( condition = if_else(is.empty(condition), na(\"LEFT_STUDY\"), condition), ) #> # A tibble: 11 × 4 #> person_id age favorite_color condition #> #> 1 1 20 BLUE TREATMENT #> 2 2 BLUE CONTROL #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED CONTROL #> 7 7 50 #> 8 8 30 YELLOW TREATMENT #> 9 9 #> 10 10 RED #> 11 11 10 
"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"writing-interlaced-files","dir":"Articles","previous_headings":"","what":"Writing interlaced files","title":"Introduction to interlacer","text":"’ve made made changes data, probably want save . interlacer provides write_interlaced_* family functions : combine value missing reasons interlaced character columns, write result csv. Alternatively, want re-interlace columns without writing file control writing process, can use flatten_channels(): value missing reason channels data frames interlaced vectors can similarly accessed using value_channel() na_channel() helper functions:","code":"write_interlaced_csv(df, \"interlaced_output.csv\") flatten_channels(df) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 REFUSED BLUE #> 3 3 21 REFUSED #> 4 4 30 OMITTED #> 5 5 1 N/A #> 6 6 41 RED #> 7 7 50 OMITTED #> 8 8 30 YELLOW #> 9 9 REFUSED REFUSED #> 10 10 OMITTED RED #> 11 11 10 REFUSED # (it works on single vectors as well) flatten_channels(df$age) #> [1] \"20\" \"REFUSED\" \"21\" \"30\" \"1\" \"41\" \"50\" #> [8] \"30\" \"REFUSED\" \"OMITTED\" \"10\" value_channel(df) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 NA BLUE #> 3 3 21 NA #> 4 4 30 NA #> 5 5 1 NA #> 6 6 41 RED #> 7 7 50 NA #> 8 8 30 YELLOW #> 9 9 NA NA #> 10 10 NA RED #> 11 11 10 NA na_channel(df) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 NA NA NA #> 2 NA REFUSED NA #> 3 NA NA REFUSED #> 4 NA NA OMITTED #> 5 NA NA N/A #> 6 NA NA NA #> 7 NA NA OMITTED #> 8 NA NA NA #> 9 NA REFUSED REFUSED #> 10 NA OMITTED NA #> 11 NA NA REFUSED"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"next-steps","dir":"Articles","previous_headings":"","what":"Next steps","title":"Introduction to interlacer","text":"far, ’ve covered interlacer’s read_interlaced_* family functions enabled us load interlaced columns contain separate challens value missing 
reasons. interlaced type enables us create tidy type-aware pipelines can flexibly consider variable’s value missing reasons. examples vignette, column types automatically detected. explicitly specify value missing column types, (specify individual missing reasons specific columns), interlacer extends readr’s collector() system. covered next vignette, vignette(\"na-column-types\").","code":""},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"haven-and-labelled","dir":"Articles","previous_headings":"","what":"haven and labelled","title":"Other Approaches","text":"haven labelled packages rely two functions creating vectors interlace values missing reasons: haven::labelled_spss() haven::tagged_na(). Although create haven_labelled vectors, use different methods representing missing values.","code":""},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"labelled-missing-values-havenlabelled_spss","dir":"Articles","previous_headings":"haven and labelled","what":"“Labelled” missing values (haven::labelled_spss())","title":"Other Approaches","text":"SPSS files loaded haven via haven::read_spss(), values missing reasons loaded single interlaced numeric vector: just numeric vector though, haven::labelled_spss() numeric vector, attributes describing value missing value codes: attributes adjust behavior functions like .na(): makes easy check value missing reason, still filter missing reasons aggregations: ’s little bit improvement working raw coded values, can use .na(), codes get labels, don’t constantly looking codes codebook. still falls short interlacer’s functionality two key reasons: Reason 1: interlacer, value column can whatever type want: numeric, character, factor, etc. labelled missing reasons, values missing reasons need type, usually numeric codes. creates lot type gymnastics potential errors ’re manipulating . 
Reason 2: Even missing values labelled labelled_spss type, aggregations math operations protected. forget take missing values, get incorrect results / corrupted data:","code":"library(interlacer, warn.conflicts = FALSE) library(haven) library(dplyr) #> #> Attaching package: 'dplyr' #> The following objects are masked from 'package:stats': #> #> filter, lag #> The following objects are masked from 'package:base': #> #> intersect, setdiff, setequal, union (df_spss <- read_spss( interlacer_example(\"colors.sav\"), user_na = TRUE )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 [BLUE] #> 2 2 -98 (NA) [REFUSED] 1 [BLUE] #> 3 3 21 -98 (NA) [REFUSED] #> 4 4 30 -97 (NA) [OMITTED] #> 5 5 1 -99 (NA) [N/A] #> 6 6 41 2 [RED] #> 7 7 50 -97 (NA) [OMITTED] #> 8 8 30 3 [YELLOW] #> 9 9 -98 (NA) [REFUSED] -98 (NA) [REFUSED] #> 10 10 -97 (NA) [OMITTED] 2 [RED] #> 11 11 10 -98 (NA) [REFUSED] attributes(df_spss$favorite_color) #> $label #> [1] \"Favorite color\" #> #> $na_range #> [1] -Inf 0 #> #> $class #> [1] \"haven_labelled_spss\" \"haven_labelled\" \"vctrs_vctr\" #> [4] \"double\" #> #> $format.spss #> [1] \"F8.2\" #> #> $labels #> BLUE RED YELLOW N/A REFUSED OMITTED #> 1 2 3 -99 -98 -97 is.na(df_spss$favorite_color) #> [1] FALSE FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE df_spss |> mutate( age_values = if_else(is.na(age), NA, age), favorite_color_missing_reasons = if_else( is.na(favorite_color), favorite_color, NA ) ) |> summarize( mean_age = mean(age_values, na.rm = TRUE), n = n(), .by = favorite_color_missing_reasons ) #> # A tibble: 4 × 3 #> favorite_color_missing_reasons mean_age n #> #> 1 NA 30.3 5 #> 2 -98 (NA) [REFUSED] 15.5 3 #> 3 -97 (NA) [OMITTED] 40 2 #> 4 -99 (NA) [N/A] 1 1 df_spss |> mutate( age_next_year = if_else(is.na(age), NA, age + 1), .after = person_id ) #> # A tibble: 11 × 4 #> person_id age_next_year age favorite_color #> #> 1 1 21 20 1 [BLUE] #> 2 2 NA -98 (NA) [REFUSED] 1 [BLUE] #> 3 3 22 21 -98 (NA) [REFUSED] #> 4 4 31 30 
-97 (NA) [OMITTED] #> 5 5 2 1 -99 (NA) [N/A] #> 6 6 42 41 2 [RED] #> 7 7 51 50 -97 (NA) [OMITTED] #> 8 8 31 30 3 [YELLOW] #> 9 9 NA -98 (NA) [REFUSED] -98 (NA) [REFUSED] #> 10 10 NA -97 (NA) [OMITTED] 2 [RED] #> 11 11 11 10 -98 (NA) [REFUSED] df_spss |> mutate( favorite_color_missing_reasons = if_else( is.na(favorite_color), favorite_color, NA ) ) |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color_missing_reasons ) #> # A tibble: 4 × 3 #> favorite_color_missing_reasons mean_age n #> #> 1 NA -20.8 5 #> 2 -98 (NA) [REFUSED] -22.3 3 #> 3 -97 (NA) [OMITTED] 40 2 #> 4 -99 (NA) [N/A] 1 1 df_spss |> mutate( age_next_year = age + 1, .after = person_id ) #> # A tibble: 11 × 4 #> person_id age_next_year age favorite_color #> #> 1 1 21 20 1 [BLUE] #> 2 2 -97 -98 (NA) [REFUSED] 1 [BLUE] #> 3 3 22 21 -98 (NA) [REFUSED] #> 4 4 31 30 -97 (NA) [OMITTED] #> 5 5 2 1 -99 (NA) [N/A] #> 6 6 42 41 2 [RED] #> 7 7 51 50 -97 (NA) [OMITTED] #> 8 8 31 30 3 [YELLOW] #> 9 9 -97 -98 (NA) [REFUSED] -98 (NA) [REFUSED] #> 10 10 -96 -97 (NA) [OMITTED] 2 [RED] #> 11 11 11 10 -98 (NA) [REFUSED]"},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"tagged-missing-values-haventagged_na","dir":"Articles","previous_headings":"haven and labelled","what":"“Tagged” missing values (haven::tagged_na())","title":"Other Approaches","text":"loading Stata SAS files, haven uses “tagged missingness” approach mirror values handled Stata SAS: approach deviously clever. takes advantage way NaN floating point values stored memory, make possible different “flavors” NA values. (info done, check tagged_na.c source code haven) still act like regular NA values… now can include single character “tag” (usually letter -z). means work .na() include missing reason codes aggregations! Unfortunately, can’t group , dplyr::group_by() tag-aware. 
:( Another limitation approach requires values types numeric, trick “tagging” NA values depends peculiarities floating point values stored memory.","code":"(df_stata <- read_stata( interlacer_example(\"colors.dta\") )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 [BLUE] #> 2 2 NA(a) [REFUSED] 1 [BLUE] #> 3 3 21 NA(a) [REFUSED] #> 4 4 30 NA(b) [OMITTED] #> 5 5 1 NA #> 6 6 41 2 [RED] #> 7 7 50 NA(b) [OMITTED] #> 8 8 30 3 [YELLOW] #> 9 9 NA(a) [REFUSED] NA(a) [REFUSED] #> 10 10 NA(b) [OMITTED] 2 [RED] #> 11 11 10 NA(a) [REFUSED] is.na(df_stata$age) #> [1] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE mean(df_stata$age, na.rm = TRUE) #> [1] 25.375 df_stata |> mutate( favorite_color_missing_reasons = if_else( is.na(favorite_color), favorite_color, NA ) ) |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color_missing_reasons ) #> # A tibble: 1 × 3 #> favorite_color_missing_reasons mean_age n #> #> 1 NA 25.4 11"},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"declared","dir":"Articles","previous_headings":"","what":"declared","title":"Other Approaches","text":"declared package uses functiondeclared::declared() constructing interlaced vectors: declared vectors similar haven_labelled_spss vectors, except critical innovation: store actual NA values missing values, keep track missing reasons entirely attributes object: means aggregations work exactly expect!","code":"library(declared) #> #> Attaching package: 'declared' #> The following object is masked from 'package:interlacer': #> #> is.empty (dcl <- declared(c(1, 2, 3, -99, -98), na_values = c(-99, -98))) #> [5]> #> [1] 1 2 3 NA(-99) NA(-98) #> Missing values: -99, -98 # All the missing reason info is tracked in the attributes attributes(dcl) #> $na_index #> -99 -98 #> 4 5 #> #> $na_values #> [1] -99 -98 #> #> $date #> [1] FALSE #> #> $class #> [1] \"declared\" \"numeric\" # The data stored has actual NA values, so it works 
as you would expect # with summary stats like `mean()`, etc. attributes(dcl) <- NULL dcl #> [1] 1 2 3 NA NA dcl <- declared(c(1, 2, 3, -99, -98), na_values = c(-99, -98)) sum(dcl, na.rm = TRUE) #> [1] 6"},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"interlacer","dir":"Articles","previous_headings":"","what":"interlacer","title":"Other Approaches","text":"interlacer builds ideas haven, labelled, declared following goals:","code":""},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"be-fully-generic-add-a-missing-value-channel-to-any-vector-type","dir":"Articles","previous_headings":"interlacer","what":"1. Be fully generic: Add a missing value channel to any vector type","title":"Other Approaches","text":"mentioned , haven::labelled_spss() works numeric character types, haven::tagged_na() works numeric types. declared::declared() supports numeric, character date types. interlaced types, contrast, can imbue vector type missing value channel: Like declared vectors, missing reasons tracked attributes. 
unlike declared, missing reasons stored entirely separate channel rather tracking indices: data structure drives functional API, described (3) .","code":"interlaced(list(TRUE, FALSE, \"reason\"), na = \"reason\") #> [3]> #> [1] TRUE FALSE #> NA levels: reason interlaced(c(\"2020-01-01\", \"2020-01-02\", \"reason\"), na = \"reason\") |> map_value_channel(as.Date) #> [3]> #> [1] 2020-01-01 2020-01-02 #> NA levels: reason interlaced(c(\"red\", \"green\", \"reason\"), na = \"reason\") |> map_value_channel(factor) #> [3]> #> [1] red green #> Levels: green red #> NA levels: reason (int <- interlaced(c(1,2,3, -99, -98), na = c(-99, -98))) #> [5]> #> [1] 1 2 3 <-99> <-98> attributes(int) #> $na_channel_values #> [1] NA NA NA -99 -98 #> #> $class #> [1] \"interlacer_interlaced\" \"vctrs_vctr\" \"numeric\" attributes(int) <- NULL int #> [1] 1 2 3 NA NA"},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"provide-functions-for-reading-writing-interlaced-csv-files-not-just-spss-sas-stata-files","dir":"Articles","previous_headings":"interlacer","what":"2. Provide functions for reading / writing interlaced CSV files (not just SPSS / SAS / Stata files)","title":"Other Approaches","text":"See interlacer::read_interlaced_csv(), etc.","code":""},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"provide-a-functional-api-that-integrates-well-into-tidy-pipelines","dir":"Articles","previous_headings":"interlacer","what":"3. Provide a functional API that integrates well into tidy pipelines","title":"Other Approaches","text":"interlacer provides functions facilitate working interlaced type Result type, well-understood abstraction functional programming. functions na() map_value_channel() map_na_channel() come influence. na() function creates interlaced type “lifting” value missing reason channel. approach helps create safer separation value missing reason channels, ’s always clear channel ’re making comparisons . 
example: Similarly, map_value_channel() map_na_channel() allow safely mutate particular channel, without touching values channel. interface especially useful tidy pipelines. Finally, interlaced type based vctrs type system, plays nicely packages tidyverse.","code":"# haven labelled_spss(c(-99, 1, 2), na_values = -99) == 1 # value channel comparison #> [1] FALSE TRUE FALSE labelled_spss(c(-99, 1, 2), na_values = -99) == -99 # na channel comparison #> [1] TRUE FALSE FALSE # declared declared(c(-99, 1, 2), na_values = -99) == 1 # value channel comparison #> [1] FALSE TRUE FALSE declared(c(-99, 1, 2), na_values = -99) == -99 # na channel comparison #> [1] TRUE FALSE FALSE # interlacer interlaced(c(-99, 1, 2), na = -99) == 1 # value channel comparison #> [1] FALSE TRUE FALSE interlaced(c(-99, 1, 2), na = -99) == na(-99) # na channel comparison #> [1] TRUE FALSE FALSE"},{"path":"http://kylehusmann.com/interlacer/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Kyle Husmann. Author, maintainer.","code":""},{"path":"http://kylehusmann.com/interlacer/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Husmann K (2024). interlacer: Read Tabular Data Interlaced Values Missing Reasons. R package version 0.3.0, https://kylehusmann.com/interlacer, https://github.com/khusmann/interlacer.","code":"@Manual{, title = {interlacer: Read Tabular Data With Interlaced Values And Missing Reasons}, author = {Kyle Husmann}, year = {2024}, note = {R package version 0.3.0, https://kylehusmann.com/interlacer}, url = {https://github.com/khusmann/interlacer}, }"},{"path":"http://kylehusmann.com/interlacer/index.html","id":"interlacer-","dir":"","previous_headings":"","what":"Read Tabular Data With Interlaced Values And Missing Reasons","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"value missing data, sometimes want know missing. 
Many textual tabular data sources encode missing reasons special values interlaced regular values column (e.g. N/, REFUSED, -99, etc.). Unfortunately, missing reasons lost values converted single NA type. Working missing reasons R traditionally requires loading variables character vectors bunch string comparisons type conversions make sense . interlacer provides functions load variables interlaced data sources special interlaced column type holds values NA reasons separate channels variable. contexts, can treat interlaced columns regular values: take mean interlaced column, example, get mean values, without missing reasons interfering computation. Unlike regular column, however, missing reasons still available. means can still filter data frames variables specific missing reasons, generate summary statistics breakdowns missing reason. words, longer constantly manually include / exclude missing reasons computations filtering awkward string comparisons type conversions… everything just works! addition introduction vignette(\"interlacer\") sure also check : vignette(\"extended-column-types\") see handle variable-level missing reasons vignette(\"coded-data\") recipies working coded data (e.g. data produced SPSS, SAS Stata) vignette(\"-approaches\") deep dive interlacer’s approach compares approaches representing manipulating missing reasons alongside data values","code":""},{"path":"http://kylehusmann.com/interlacer/index.html","id":"id_️-️-️-warning-️-️-️","dir":"","previous_headings":"","what":"⚠️ ⚠️ ⚠️ WARNING ⚠️ ⚠️ ⚠️","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"library currently experimental stages, aware interface quite likely change future. 
meantime, please try let know think!","code":""},{"path":"http://kylehusmann.com/interlacer/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"easiest way get interlacer install via devtools:","code":"install.packages(\"devtools\") # If devtools is not already installed devtools::install_github(\"khusmann/interlacer\")"},{"path":"http://kylehusmann.com/interlacer/index.html","id":"usage","dir":"","previous_headings":"","what":"Usage","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"use interlacer, load current R session: interlacer supports following file formats read_interlaced_*() functions, extend readr::read_*() family functions: read_interlaced_csv() read_interlaced_tsv() read_interlaced_csv2() read_interlaced_delim() quick demo, consider following example file bundled interlacer: csv file, values interlaced three possible missing reasons: REFUSED, OMITTED, N/. readr, loading data result data frame missing reasons replaced NA: interlacer, missing reasons preserved: can see, printout column defined two types: type values, type missing reasons. age column, example, type double values, type factor missing reasons: Computations automatically operate values: missing reasons still ! indicate value treated missing reason instead regular value, can use na() function. 
following, example, filter data set individuals REFUSED give favorite color: ’s pipeline compute breakdown mean age respondents favorite color, separate categories missing reason: just scratches surface can done interlacer… check vignette(\"interlacer\") complete overview!","code":"library(interlacer, warn.conflicts = FALSE) library(dplyr, warn.conflicts = FALSE) library(readr) read_file(interlacer_example(\"colors.csv\")) |> cat() #> person_id,age,favorite_color #> 1,20,BLUE #> 2,REFUSED,BLUE #> 3,21,REFUSED #> 4,30,OMITTED #> 5,1,N/A #> 6,41,RED #> 7,50,OMITTED #> 8,30,YELLOW #> 9,REFUSED,REFUSED #> 10,OMITTED,RED #> 11,10,REFUSED read_csv( interlacer_example(\"colors.csv\"), na = c(\"REFUSED\", \"OMITTED\", \"N/A\") ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 NA BLUE #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 NA #> 10 10 NA RED #> 11 11 10 (ex <- read_interlaced_csv( interlacer_example(\"colors.csv\"), na = c(\"REFUSED\", \"OMITTED\", \"N/A\") )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 BLUE #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 RED #> 11 11 10 ex$age #> [11]> #> [1] 20 21 30 1 41 50 #> [8] 30 10 #> NA levels: REFUSED OMITTED N/A mean(ex$age, na.rm = TRUE) #> [1] 25.375 ex |> filter(favorite_color == na(\"REFUSED\")) #> # A tibble: 3 × 3 #> person_id age favorite_color #> #> 1 3 21 #> 2 9 #> 3 11 10 ex |> summarize( mean_age = mean(age, na.rm = TRUE), n = n(), .by = favorite_color ) |> arrange(favorite_color) #> # A tibble: 6 × 3 #> favorite_color mean_age n #> #> 1 BLUE 20 2 #> 2 RED 41 2 #> 3 YELLOW 30 1 #> 4 15.5 3 #> 5 40 2 #> 6 1 1"},{"path":"http://kylehusmann.com/interlacer/index.html","id":"known-issues","dir":"","previous_headings":"","what":"Known Issues","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"base functions, like base::ifelse(), drop missing reason channel 
interlaced types, converting regular vectors example: due limitation R. run , use tidyverse equivalent function. Tidyverse functions designed correctly handle type conversions. example, can use dplyr::if_else(): Performance large data sets may notice large datasets interlacer runs significantly slower readr / vroom. Although interlacer uses vroom hood load delimited data, able take advantage many optimizations vroom currently support column-level missing values. soon vroom supports column-level missing values, able remedy !","code":"ex |> mutate( favorite_color = ifelse(age < 18, na(\"REDACTED\"), favorite_color) ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 #> 11 11 10 ex |> mutate( favorite_color = if_else( age < 18, na(\"REDACTED_UNDERAGE\"), favorite_color, missing = na(\"REDACTED_MISSING_AGE\") ) ) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 #> 10 10 #> 11 11 10 "},{"path":"http://kylehusmann.com/interlacer/index.html","id":"related-work","dir":"","previous_headings":"","what":"Related work","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"interlacer inspired haven, labelled, declared packages. packages provide similar functionality interlacer, focused providing compatibility missing reason data imported SPSS, SAS, Stata. interlacer slightly different aims: fully generic: Add missing value channel vector type. Provide functions reading / writing interlaced CSV files (just SPSS / SAS / Stata files) Provide functional API integrates well tidy pipelines Future versions interlacer provide functions convert packages’ types. 
detailed discussion, see vignette(\"-approaches\").","code":""},{"path":"http://kylehusmann.com/interlacer/index.html","id":"acknowledgements","dir":"","previous_headings":"","what":"Acknowledgements","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"development software supported, whole part, Institute Education Sciences, U.S. Department Education, Grant R305A170047 Pennsylvania State University. opinions expressed authors represent views Institute U.S. Department Education.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/across_value_channels.html","id":null,"dir":"Reference","previous_headings":"","what":"Apply a function across the value or missing reason channels of multiple columns — across_value_channels","title":"Apply a function across the value or missing reason channels of multiple columns — across_value_channels","text":"across_value_channels() across_na_channels() simple wrappers dplyr::across() applies transformations value missing reason channels, respectively.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/across_value_channels.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Apply a function across the value or missing reason channels of multiple columns — across_value_channels","text":"","code":"across_value_channels(.cols, .fns, .names = NULL, .unpack = FALSE) across_na_channels(.cols, .fns, .names = NULL, .unpack = FALSE)"},{"path":"http://kylehusmann.com/interlacer/reference/across_value_channels.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Apply a function across the value or missing reason channels of multiple columns — across_value_channels","text":".cols Columns transform. select grouping columns already automatically handled verb (.e. summarise() mutate()). .fns Functions apply selected columns. Possible values : function, e.g. mean. purrr-style lambda, e.g. 
~ mean(.x, na.rm = TRUE) named list functions lambdas, e.g. list(mean = mean, n_miss = ~ sum(is.na(.x))). function applied column, output named combining function name column name using glue specification .names. Within functions can use cur_column() cur_group() access current column grouping keys respectively. .names glue specification describes name output columns. can use {.col} stand selected column name, {.fn} stand name function applied. default (NULL) equivalent \"{.col}\" single function case \"{.col}_{.fn}\" case list used .fns. .unpack Optionally unpack data frames returned functions .fns, expands df-columns individual columns, retaining number rows data frame. FALSE, default, unpacking done. TRUE, unpacking done default glue specification \"{outer}_{inner}\". Otherwise, single glue specification can supplied describe name unpacked columns. can use {outer} refer name originally generated .names, {inner} refer names data frame unpacking.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/across_value_channels.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Apply a function across the value or missing reason channels of multiple columns — across_value_channels","text":"like dplyr::across(), across_value_channels() across_na_channels() return tibble one column column .cols function .fns","code":""},{"path":[]},{"path":"http://kylehusmann.com/interlacer/reference/as.cfactor.html","id":null,"dir":"Reference","previous_headings":"","what":"cfactor coercion — as.cfactor","title":"cfactor coercion — as.cfactor","text":"Add codes vector labels","code":""},{"path":"http://kylehusmann.com/interlacer/reference/as.cfactor.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"cfactor coercion — as.cfactor","text":"","code":"as.cfactor(x, codes = NULL, ordered = is.ordered(x)) # S3 method for factor as.cfactor(x, codes = NULL, ordered = is.ordered(x)) as.cordered(x, codes = 
NULL)"},{"path":"http://kylehusmann.com/interlacer/reference/as.cfactor.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"cfactor coercion — as.cfactor","text":"x vector values representing labels factor levels codes named vector unique codes declares mapping labels codes ordered logical flag determine codes regarded ordered (order given).","code":""},{"path":"http://kylehusmann.com/interlacer/reference/as.cfactor.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"cfactor coercion — as.cfactor","text":"new cfactor","code":""},{"path":"http://kylehusmann.com/interlacer/reference/as.codes.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert a cfactor vector into a vector of its codes — as.codes","title":"Convert a cfactor vector into a vector of its codes — as.codes","text":"TODO: Write ","code":""},{"path":"http://kylehusmann.com/interlacer/reference/as.codes.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert a cfactor vector into a vector of its codes — as.codes","text":"","code":"as.codes(x, ...) # S3 method for interlacer_interlaced as.codes(x, ...) # S3 method for interlacer_cfactor as.codes(x, ...)"},{"path":"http://kylehusmann.com/interlacer/reference/as.codes.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert a cfactor vector into a vector of its codes — as.codes","text":"x cfactor() ... 
additional arguments (used)","code":""},{"path":"http://kylehusmann.com/interlacer/reference/as.codes.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert a cfactor vector into a vector of its codes — as.codes","text":"vector coded values","code":""},{"path":"http://kylehusmann.com/interlacer/reference/as.x_col_spec.html","id":null,"dir":"Reference","previous_headings":"","what":"Extended column specification coercions — as.x_col_spec","title":"Extended column specification coercions — as.x_col_spec","text":"Coerce object column specification. used internally parse col_types argument read_interlaced_*() family functions, can accept readr::cols() specification list().","code":""},{"path":"http://kylehusmann.com/interlacer/reference/as.x_col_spec.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extended column specification coercions — as.x_col_spec","text":"","code":"as.x_col_spec(x)"},{"path":"http://kylehusmann.com/interlacer/reference/as.x_col_spec.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extended column specification coercions — as.x_col_spec","text":"x value coerce extended column specification","code":""},{"path":"http://kylehusmann.com/interlacer/reference/as.x_col_spec.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Extended column specification coercions — as.x_col_spec","text":"extended column specification","code":""},{"path":"http://kylehusmann.com/interlacer/reference/as.x_col_spec.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Extended column specification coercions — as.x_col_spec","text":"S3 function packages may use col_types argument custom objects.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/as.x_collector.html","id":null,"dir":"Reference","previous_headings":"","what":"Collector shortcuts — 
as.na_collector","title":"Collector shortcuts — as.na_collector","text":"as.*_collector functions used internally enable shortcuts defaults specifying extended collectors. See vignette(\"extended-column-types\") full discussion.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/as.x_collector.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Collector shortcuts — as.na_collector","text":"","code":"as.na_collector(x) as.value_collector(x) as.x_collector(x)"},{"path":"http://kylehusmann.com/interlacer/reference/as.x_collector.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Collector shortcuts — as.na_collector","text":"x value convert extended collector, value collector, missing reason collector.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/cfactor.html","id":null,"dir":"Reference","previous_headings":"","what":"Coded factors — cfactor","title":"Coded factors — cfactor","text":"TODO: Write ","code":""},{"path":"http://kylehusmann.com/interlacer/reference/cfactor.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Coded factors — cfactor","text":"","code":"cfactor(x = unspecified(), codes, ordered = FALSE) cordered(x, codes) is.cfactor(x) is.cordered(x)"},{"path":"http://kylehusmann.com/interlacer/reference/cfactor.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Coded factors — cfactor","text":"x vector character numeric codes codes named vector unique codes declares mapping labels codes ordered logical flag determine codes regarded ordered (order given).","code":""},{"path":"http://kylehusmann.com/interlacer/reference/cfactor.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Coded factors — cfactor","text":"new 
cfactor","code":""},{"path":"http://kylehusmann.com/interlacer/reference/codes-set.html","id":null,"dir":"Reference","previous_headings":"","what":"Set the codes for a `cfactor` — codes<-","title":"Set the codes for a `cfactor` — codes<-","text":"Set codes cfactor, similar levels<-()","code":""},{"path":"http://kylehusmann.com/interlacer/reference/codes-set.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Set the codes for a `cfactor` — codes<-","text":"","code":"codes(x) <- value"},{"path":"http://kylehusmann.com/interlacer/reference/codes-set.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Set the codes for a `cfactor` — codes<-","text":"value named vector codes cfactor","code":""},{"path":"http://kylehusmann.com/interlacer/reference/codes.html","id":null,"dir":"Reference","previous_headings":"","what":"cfactor attributes — codes","title":"cfactor attributes — codes","text":"Return levels codes cfactor","code":""},{"path":"http://kylehusmann.com/interlacer/reference/codes.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"cfactor attributes — codes","text":"","code":"codes(x, ...) # S3 method for interlacer_cfactor levels(x)"},{"path":"http://kylehusmann.com/interlacer/reference/codes.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"cfactor attributes — codes","text":"x cfactor ... 
additional arguments (used)","code":""},{"path":"http://kylehusmann.com/interlacer/reference/codes.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"cfactor attributes — codes","text":"levels() returns levels cfactor (vector character labels); codes() returns named vector representing codes cfactor","code":""},{"path":"http://kylehusmann.com/interlacer/reference/flatten_channels.html","id":null,"dir":"Reference","previous_headings":"","what":"Flatten an interlaced vector — flatten_channels","title":"Flatten an interlaced vector — flatten_channels","text":"flatten_channels() flattens interlaced vector single channel. useful step right writing interlaced vector file, example.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/flatten_channels.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Flatten an interlaced vector — flatten_channels","text":"","code":"flatten_channels(x, ...)"},{"path":"http://kylehusmann.com/interlacer/reference/flatten_channels.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Flatten an interlaced vector — flatten_channels","text":"x interlaced vector ... Additional arguments, used","code":""},{"path":"http://kylehusmann.com/interlacer/reference/flatten_channels.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Flatten an interlaced vector — flatten_channels","text":"vector, flattened","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlaced.html","id":null,"dir":"Reference","previous_headings":"","what":"Construct an interlaced vector — interlaced","title":"Construct an interlaced vector — interlaced","text":"interlaced type extends vectors adding \"missing reason\" channel can used distinguish different types missingness. 
interlaced() function constructs new interlaced vector vector list values.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlaced.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Construct an interlaced vector — interlaced","text":"","code":"interlaced(x, na = NULL) as.interlaced(x, na = NULL, ...) # S3 method for default as.interlaced(x, na = NULL, ...) # S3 method for interlacer_interlaced as.interlaced(x, ...) # S3 method for data.frame as.interlaced(x, ...) is.interlaced(x)"},{"path":"http://kylehusmann.com/interlacer/reference/interlaced.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Construct an interlaced vector — interlaced","text":"x vector list values na vector values interpret missing values ... Additional arguments, used","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlaced.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Construct an interlaced vector — interlaced","text":"interlaced vector","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlacer_example.html","id":null,"dir":"Reference","previous_headings":"","what":"Get a path to one of interlacer's example data sets — interlacer_example","title":"Get a path to one of interlacer's example data sets — interlacer_example","text":"interlacer comes bundled number sample files inst/extdata directory. 
function make easy access","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlacer_example.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get a path to one of interlacer's example data sets — interlacer_example","text":"","code":"interlacer_example(file = NULL)"},{"path":"http://kylehusmann.com/interlacer/reference/interlacer_example.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get a path to one of interlacer's example data sets — interlacer_example","text":"file Name file. NULL, example files listed.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlacer_example.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get a path to one of interlacer's example data sets — interlacer_example","text":"","code":"interlacer_example() #> [1] \"colors.csv\" \"colors.dta\" \"colors.sav\" #> [4] \"colors_coded.csv\" \"colors_coded_char.csv\" \"stress.csv\" interlacer_example(\"colors.csv\") #> [1] \"/home/runner/work/_temp/Library/interlacer/extdata/colors.csv\""},{"path":"http://kylehusmann.com/interlacer/reference/is.empty.html","id":null,"dir":"Reference","previous_headings":"","what":"Test if a value is missing and lacks a missing reason — is.empty","title":"Test if a value is missing and lacks a missing reason — is.empty","text":"value missing value missing reason, considered \"empty\". is.empty() checks type values. 
Regular NA values (missing reasons) also considered \"empty\".","code":""},{"path":"http://kylehusmann.com/interlacer/reference/is.empty.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Test if a value is missing and lacks a missing reason — is.empty","text":"","code":"is.empty(x)"},{"path":"http://kylehusmann.com/interlacer/reference/is.empty.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Test if a value is missing and lacks a missing reason — is.empty","text":"x vector","code":""},{"path":"http://kylehusmann.com/interlacer/reference/is.empty.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Test if a value is missing and lacks a missing reason — is.empty","text":"logical vector length x, containing TRUE empty elements, FALSE otherwise.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/levels-set-.interlacer_interlaced.html","id":null,"dir":"Reference","previous_headings":"","what":"Set the factor level attributes of interlaced vectors — levels<-.interlacer_interlaced","title":"Set the factor level attributes of interlaced vectors — levels<-.interlacer_interlaced","text":"Set factor level attributes interlaced vectors","code":""},{"path":"http://kylehusmann.com/interlacer/reference/levels-set-.interlacer_interlaced.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Set the factor level attributes of interlaced vectors — levels<-.interlacer_interlaced","text":"","code":"# S3 method for interlacer_interlaced levels(x) <- value na_levels(x) <- value"},{"path":"http://kylehusmann.com/interlacer/reference/levels-set-.interlacer_interlaced.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Set the factor level attributes of interlaced vectors — levels<-.interlacer_interlaced","text":"value vector new levels 
set","code":""},{"path":"http://kylehusmann.com/interlacer/reference/levels.interlacer_interlaced.html","id":null,"dir":"Reference","previous_headings":"","what":"Factor level attributes of interlaced vectors — levels.interlacer_interlaced","title":"Factor level attributes of interlaced vectors — levels.interlacer_interlaced","text":"base S3 levels() function overloaded interlaced vectors, value channel factor type, levels() return levels. Similarly na_levels() return levels missing reason channel.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/levels.interlacer_interlaced.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Factor level attributes of interlaced vectors — levels.interlacer_interlaced","text":"","code":"# S3 method for interlacer_interlaced levels(x) na_levels(x)"},{"path":"http://kylehusmann.com/interlacer/reference/levels.interlacer_interlaced.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Factor level attributes of interlaced vectors — levels.interlacer_interlaced","text":"x interlaced vector","code":""},{"path":"http://kylehusmann.com/interlacer/reference/levels.interlacer_interlaced.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Factor level attributes of interlaced vectors — levels.interlacer_interlaced","text":"levels values missing reason channel","code":""},{"path":"http://kylehusmann.com/interlacer/reference/map_value_channel.html","id":null,"dir":"Reference","previous_headings":"","what":"Apply a function to one of the channels of an interlaced vector — map_value_channel","title":"Apply a function to one of the channels of an interlaced vector — map_value_channel","text":"map_value_channel() modifies values interlaced vector. 
map_na_channel() modifies missing reason channel interlaced vector.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/map_value_channel.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Apply a function to one of the channels of an interlaced vector — map_value_channel","text":"","code":"map_value_channel(x, fn) map_na_channel(x, fn)"},{"path":"http://kylehusmann.com/interlacer/reference/map_value_channel.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Apply a function to one of the channels of an interlaced vector — map_value_channel","text":"x interlaced vector fn function maps values missing reasons new values","code":""},{"path":"http://kylehusmann.com/interlacer/reference/map_value_channel.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Apply a function to one of the channels of an interlaced vector — map_value_channel","text":"new interlaced vector, modified according supplied function","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na.html","id":null,"dir":"Reference","previous_headings":"","what":"Interpret a value as a missing reason — na","title":"Interpret a value as a missing reason — na","text":"na() lifts value interlaced missing reason channel.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Interpret a value as a missing reason — na","text":"","code":"na(x = unspecified())"},{"path":"http://kylehusmann.com/interlacer/reference/na.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interpret a value as a missing reason — na","text":"x character numeric value","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Interpret a value as a missing reason — 
na","text":"interlaced value","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na_collectors.html","id":null,"dir":"Reference","previous_headings":"","what":"Missing reason collectors — na_collectors","title":"Missing reason collectors — na_collectors","text":"Missing reason collectors used extended column specifications specify type column's missing reason channel.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na_collectors.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Missing reason collectors — na_collectors","text":"","code":"na_col_default() na_col_none() na_col_integer(...) na_col_factor(...) na_col_cfactor(...)"},{"path":"http://kylehusmann.com/interlacer/reference/na_collectors.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Missing reason collectors — na_collectors","text":"... values interpret missing values. case na_col_cfactor(), arguments must named.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na_collectors.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Missing reason collectors — na_collectors","text":"new missing reason collector object","code":""},{"path":"http://kylehusmann.com/interlacer/reference/na_collectors.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Missing reason collectors — na_collectors","text":"na_col_default() used signal missing reason type inherit specification provided na = argument calling read_interlaced_*() function","code":""},{"path":[]},{"path":"http://kylehusmann.com/interlacer/reference/parse_interlaced.html","id":null,"dir":"Reference","previous_headings":"","what":"Parse a character vector into an interlaced vector type — parse_interlaced","title":"Parse a character vector into an interlaced vector type — parse_interlaced","text":"parse_interlaced converts character vector interlaced vector parsing readr 
collector type.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/parse_interlaced.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Parse a character vector into an interlaced vector type — parse_interlaced","text":"","code":"parse_interlaced(x, na, .default = v_col_guess())"},{"path":"http://kylehusmann.com/interlacer/reference/parse_interlaced.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Parse a character vector into an interlaced vector type — parse_interlaced","text":"x character vector na missing reason collector (e.g. na_col_integer()), one shortcuts (e.g. list missing values) .default value collector parse character values (e.g. v_col_double(), v_col_integer(), etc.)","code":""},{"path":"http://kylehusmann.com/interlacer/reference/parse_interlaced.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Parse a character vector into an interlaced vector type — parse_interlaced","text":"interlaced vector","code":""},{"path":"http://kylehusmann.com/interlacer/reference/read_interlaced_delim.html","id":null,"dir":"Reference","previous_headings":"","what":"Read an delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","title":"Read an delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","text":"read_interlaced_*(), family functions extend readr's read_delim(), read_csv, etc. functions use data sources values interlaced missing reasons. 
functions return tibble interlaced columns.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/read_interlaced_delim.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read an delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","text":"","code":"read_interlaced_delim( file, delim = NULL, quote = \"\\\"\", escape_backslash = FALSE, escape_double = TRUE, col_names = TRUE, col_types = NULL, col_select = NULL, id = NULL, locale = readr::default_locale(), na = na_col_none(), comment = \"\", trim_ws = FALSE, skip = 0, n_max = Inf, guess_max = min(1000, n_max), name_repair = \"unique\", progress = readr::show_progress(), show_col_types = readr::should_show_types(), skip_empty_rows = TRUE ) read_interlaced_csv( file, col_names = TRUE, col_types = NULL, col_select = NULL, id = NULL, locale = readr::default_locale(), na = na_col_none(), quote = \"\\\"\", comment = \"\", trim_ws = TRUE, skip = 0, n_max = Inf, guess_max = min(1000, n_max), name_repair = \"unique\", progress = readr::show_progress(), show_col_types = readr::should_show_types(), skip_empty_rows = TRUE ) read_interlaced_csv2( file, col_names = TRUE, col_types = NULL, col_select = NULL, id = NULL, locale = readr::default_locale(), na = na_col_none(), quote = \"\\\"\", comment = \"\", trim_ws = TRUE, skip = 0, n_max = Inf, guess_max = min(1000, n_max), name_repair = \"unique\", progress = readr::show_progress(), show_col_types = readr::should_show_types(), skip_empty_rows = TRUE ) read_interlaced_tsv( file, col_names = TRUE, col_types = NULL, col_select = NULL, id = NULL, locale = readr::default_locale(), na = na_col_none(), quote = \"\\\"\", comment = \"\", trim_ws = TRUE, skip = 0, n_max = Inf, guess_max = min(1000, n_max), name_repair = \"unique\", progress = readr::show_progress(), show_col_types = readr::should_show_types(), skip_empty_rows = TRUE ) interlaced_vroom( file, delim = NULL, col_names = TRUE, col_types = NULL, 
col_select = NULL, id = NULL, skip = 0, n_max = Inf, na = na_col_none(), quote = \"\\\"\", comment = \"\", skip_empty_rows = TRUE, trim_ws = TRUE, escape_double = TRUE, escape_backslash = FALSE, locale = vroom::default_locale(), guess_max = 100, progress = vroom::vroom_progress(), show_col_types = NULL, .name_repair = \"unique\" )"},{"path":"http://kylehusmann.com/interlacer/reference/read_interlaced_delim.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read an delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","text":"file Either path file, connection, literal data (either single string raw vector). Files ending .gz, .bz2, .xz, .zip automatically uncompressed. Files starting http://, https://, ftp://, ftps:// automatically downloaded. Remote gz files can also automatically downloaded decompressed. Literal data useful examples tests. recognised literal data, input must either wrapped (), string containing least one new line, vector containing least one string new line. Using value clipboard() read system clipboard. delim Single character used separate fields within record. quote Single character used quote strings. escape_backslash file use backslashes escape special characters? general escape_double backslashes can used escape delimiter character, quote character, add special characters like \\\\n. escape_double file escape quotes doubling ? .e. option TRUE, value \"\"\"\" represents single quote, \\\". col_names Either TRUE, FALSE character vector column names. TRUE, first row input used column names, included data frame. FALSE, column names generated automatically: X1, X2, X3 etc. col_names character vector, values used names columns, first row input read first row output data frame. Missing (NA) column names generate warning, filled dummy names ...1, ...2 etc. Duplicate column names generate warning made unique, see name_repair control done. 
col_types One NULL, cols() specification, string. See vignette(\"readr\") details. NULL, column types inferred guess_max rows input, interspersed throughout file. convenient (fast), robust. guessed types wrong, need increase guess_max supply correct types . Column specifications created list() cols() must contain one column specification column. want read subset columns, use cols_only(). Alternatively, can use compact string representation character represents one column: c = character = integer n = number d = double l = logical f = factor D = date T = date time t = time ? = guess _ - = skip default, reading file without column specification print message showing readr guessed . remove message, set show_col_types = FALSE set options(readr.show_col_types = FALSE). col_select Columns include results. can use mini-language dplyr::select() refer columns name. Use c() use one selection expression. Although usage less common, col_select also accepts numeric column index. See ?tidyselect::language full details selection language. id name column store file path. useful reading multiple input files data file paths, data collection date. NULL (default) extra column created. locale locale controls defaults vary place place. default locale US-centric (like R), can use locale() create locale controls things like default time zone, encoding, decimal mark, big mark, day/month names. na NA col spec defined na_cols() character numeric vector values interpret missing values. comment string used identify comments. text comment characters silently ignored. trim_ws leading trailing whitespace (ASCII spaces tabs) trimmed field parsing ? skip Number lines skip reading data. comment supplied commented lines ignored skipping. n_max Maximum number lines read. guess_max Maximum number lines use guessing column types. never use number lines read. See vignette(\"column-types\", package = \"readr\") details. name_repair, .name_repair Handling column names. 
default behaviour ensure column names \"unique\". Various repair strategies supported: \"minimal\": name repair checks, beyond basic existence names. \"unique\" (default value): Make sure names unique empty. \"check_unique\": name repair, check unique. \"unique_quiet\": Repair unique strategy, quietly. \"universal\": Make names unique syntactic. \"universal_quiet\": Repair universal strategy, quietly. function: Apply custom name repair (e.g., name_repair = make.names names style base R). purrr-style anonymous function, see rlang::as_function(). argument passed repair vctrs::vec_as_names(). See details terms strategies used enforce . progress Display progress bar? default display interactive session knitting document. automatic progress bar can disabled setting option readr.show_progress FALSE. show_col_types FALSE, show guessed column types. TRUE always show column types, even supplied. NULL (default) show column types explicitly supplied col_types argument. skip_empty_rows blank rows ignored altogether? .e. option TRUE blank rows represented . FALSE represented NA values columns.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/read_interlaced_delim.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read an delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","text":"tibble(), interlaced columns.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/read_interlaced_delim.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read an delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","text":"","code":"# Beep boop"},{"path":"http://kylehusmann.com/interlacer/reference/reexports.html","id":null,"dir":"Reference","previous_headings":"","what":"Objects exported from other packages — reexports","title":"Objects exported from other packages — reexports","text":"objects imported packages. 
Follow links see documentation. generics .factor, .ordered vctrs vec_c","code":""},{"path":"http://kylehusmann.com/interlacer/reference/value_channel.html","id":null,"dir":"Reference","previous_headings":"","what":"Access the channels of an interlaced vector — value_channel","title":"Access the channels of an interlaced vector — value_channel","text":"value_channel() returns value channel interlaced vector na_channel() returns missing reason channel interlaced vector","code":""},{"path":"http://kylehusmann.com/interlacer/reference/value_channel.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Access the channels of an interlaced vector — value_channel","text":"","code":"value_channel(x, ...) na_channel(x, ...)"},{"path":"http://kylehusmann.com/interlacer/reference/value_channel.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Access the channels of an interlaced vector — value_channel","text":"x interlaced vector ... Additional arguments, used","code":""},{"path":"http://kylehusmann.com/interlacer/reference/value_channel.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Access the channels of an interlaced vector — value_channel","text":"value missing reasons channel","code":""},{"path":"http://kylehusmann.com/interlacer/reference/value_collectors.html","id":null,"dir":"Reference","previous_headings":"","what":"Value collectors — value_collectors","title":"Value collectors — value_collectors","text":"Value collectors used extended column specifications specify value type column. 
think wrappers around readr's col_*() collector types.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/value_collectors.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Value collectors — value_collectors","text":"","code":"v_col_guess() v_col_cfactor(codes, ordered = FALSE) v_col_character() v_col_date(format = \"\") v_col_datetime(format = \"\") v_col_double() v_col_factor(levels = NULL, ordered = FALSE) v_col_integer() v_col_big_integer() v_col_logical() v_col_number() v_col_skip() v_col_time(format = \"\")"},{"path":"http://kylehusmann.com/interlacer/reference/value_collectors.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Value collectors — value_collectors","text":"codes named vector unique codes declares mapping labels codes. ordered ordered factor? format format specification, described readr::col_datetime() levels Character vector allowed levels. levels = NULL (default), levels discovered unique values x, order appear x.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/value_collectors.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Value collectors — value_collectors","text":"new value collector object","code":""},{"path":"http://kylehusmann.com/interlacer/reference/value_collectors.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Value collectors — value_collectors","text":"addition column types supported readr, interlacer additionally can load cfactor() types via v_col_cfactor()","code":""},{"path":[]},{"path":"http://kylehusmann.com/interlacer/reference/where_value_channel.html","id":null,"dir":"Reference","previous_headings":"","what":"Select variables with a function applied on value or missing reason channels — where_value_channel","title":"Select variables with a function applied on value or missing reason channels — 
where_value_channel","text":"where_value_channel() where_na_channel() simple wrappers tidyselect::() apply selection function value missing reason channel columns, respectively.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/where_value_channel.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Select variables with a function applied on value or missing reason channels — where_value_channel","text":"","code":"where_value_channel(fn) where_na_channel(fn)"},{"path":"http://kylehusmann.com/interlacer/reference/where_value_channel.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Select variables with a function applied on value or missing reason channels — where_value_channel","text":"fn function returns TRUE FALSE (technically, predicate function). Can also purrr-like formula.","code":""},{"path":[]},{"path":"http://kylehusmann.com/interlacer/reference/write_interlaced_delim.html","id":null,"dir":"Reference","previous_headings":"","what":"Interlace a deinterlaced data frame and write it to a file — write_interlaced_delim","title":"Interlace a deinterlaced data frame and write it to a file — write_interlaced_delim","text":"write_interlaced_*() family functions take data frame interlaced columns, flatten interlaced columns, write file. Non-interlaced columns just pass . 
behavior functions match similarly named counterparts readr.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/write_interlaced_delim.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Interlace a deinterlaced data frame and write it to a file — write_interlaced_delim","text":"","code":"write_interlaced_delim( x, file, delim = \" \", empty = \"NA\", append = FALSE, col_names = !append, quote = c(\"needed\", \"all\", \"none\"), escape = c(\"double\", \"backslash\", \"none\"), eol = \"\\n\", num_threads = readr::readr_threads(), progress = readr::show_progress() ) write_interlaced_csv( x, file, empty = \"NA\", append = FALSE, col_names = !append, quote = c(\"needed\", \"all\", \"none\"), escape = c(\"double\", \"backslash\", \"none\"), eol = \"\\n\", num_threads = readr::readr_threads(), progress = readr::show_progress() ) write_interlaced_csv2( x, file, empty = \"NA\", append = FALSE, col_names = !append, quote = c(\"needed\", \"all\", \"none\"), escape = c(\"double\", \"backslash\", \"none\"), eol = \"\\n\", num_threads = readr::readr_threads(), progress = readr::show_progress() ) write_interlaced_excel_csv( x, file, empty = \"NA\", append = FALSE, col_names = !append, quote = c(\"needed\", \"all\", \"none\"), escape = c(\"double\", \"backslash\", \"none\"), eol = \"\\n\", num_threads = readr::readr_threads(), progress = readr::show_progress() ) write_interlaced_excel_csv2( x, file, empty = \"NA\", append = FALSE, col_names = !append, quote = c(\"needed\", \"all\", \"none\"), escape = c(\"double\", \"backslash\", \"none\"), eol = \"\\n\", num_threads = readr::readr_threads(), progress = readr::show_progress() ) write_interlaced_tsv( x, file, empty = \"NA\", append = FALSE, col_names = !append, quote = c(\"needed\", \"all\", \"none\"), escape = c(\"double\", \"backslash\", \"none\"), eol = \"\\n\", num_threads = readr::readr_threads(), progress = readr::show_progress() 
)"},{"path":"http://kylehusmann.com/interlacer/reference/write_interlaced_delim.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interlace a deinterlaced data frame and write it to a file — write_interlaced_delim","text":"x data frame tibble write disk. file File connection write . delim Delimiter used separate values. Defaults \" \" write_delim(), \",\" write_excel_csv() \";\" write_excel_csv2(). Must single character. empty String used empty values (NA values non-interlaced columns). Defaults NA. append FALSE, overwrite existing file. TRUE, append existing file. cases, file exist new file created. col_names FALSE, column names included top file. TRUE, column names included. specified, col_names take opposite value given append. quote handle fields contain characters need quoted. needed - Values quoted needed: contain delimiter, quote, newline. - Quote fields. none - Never quote fields. escape type escape use quotes data. double - quotes escaped doubling . backslash - quotes escaped preceding backslash. none - quotes escaped. eol end line character use. commonly either \"\\n\" Unix style newlines, \"\\r\\n\" Windows style newlines. num_threads Number threads use reading materializing vectors. data contains newlines within fields parser automatically forced use single thread . progress Display progress bar? default display interactive session knitting document. display updated every 50,000 values display estimated reading time 5 seconds . 
automatic progress bar can disabled setting option readr.show_progress FALSE.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/write_interlaced_delim.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Interlace a deinterlaced data frame and write it to a file — write_interlaced_delim","text":"write_interlaced_* returns input x invisibly","code":""},{"path":"http://kylehusmann.com/interlacer/reference/x_col.html","id":null,"dir":"Reference","previous_headings":"","what":"Construct an extended collector for an extended column specification — x_col","title":"Construct an extended collector for an extended column specification — x_col","text":"Extended collectors used x_cols() column specifications indicate value missing reason channel types used loading data read_interlaced_*().","code":""},{"path":"http://kylehusmann.com/interlacer/reference/x_col.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Construct an extended collector for an extended column specification — x_col","text":"","code":"x_col(value_collector, na_collector = na_col_default())"},{"path":"http://kylehusmann.com/interlacer/reference/x_col.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Construct an extended collector for an extended column specification — x_col","text":"value_collector value collector na_collector missing reason collector","code":""},{"path":"http://kylehusmann.com/interlacer/reference/x_col.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Construct an extended collector for an extended column specification — x_col","text":"new extended collector object","code":""},{"path":[]},{"path":"http://kylehusmann.com/interlacer/reference/x_cols.html","id":null,"dir":"Reference","previous_headings":"","what":"Construct an extended column specification — x_cols","title":"Construct an extended column specification — 
x_cols","text":"Extended column specifications used read_interlaced_*() family functions col_types argument specify value missing reason channel types.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/x_cols.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Construct an extended column specification — x_cols","text":"","code":"x_cols(..., .default = v_col_guess()) x_cols_only(...)"},{"path":"http://kylehusmann.com/interlacer/reference/x_cols.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Construct an extended column specification — x_cols","text":"... named argument list extended collectors value collectors. .default default value collector","code":""},{"path":"http://kylehusmann.com/interlacer/reference/x_cols.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Construct an extended column specification — x_cols","text":"new extended column specification","code":""},{"path":"http://kylehusmann.com/interlacer/reference/x_cols.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Construct an extended column specification — x_cols","text":"Like readr::cols(), x_cols() includes columns input data, guessing column types default, creating missing reason channels according na = argument read function. x_cols_only() includes columns specify, like readr::cols_only(). 
general, can substitute list() x_cols() without changing behavior.","code":""},{"path":[]},{"path":"http://kylehusmann.com/interlacer/reference/x_spec.html","id":null,"dir":"Reference","previous_headings":"","what":"Examine the extended column specification for a data frame — x_spec","title":"Examine the extended column specification for a data frame — x_spec","text":"x_spec() extracts full extended column specification tibble created read_interlaced_*() family functions.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/x_spec.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Examine the extended column specification for a data frame — x_spec","text":"","code":"x_spec(x)"},{"path":"http://kylehusmann.com/interlacer/reference/x_spec.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Examine the extended column specification for a data frame — x_spec","text":"x data frame loaded read_interlaced_*()","code":""},{"path":"http://kylehusmann.com/interlacer/reference/x_spec.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Examine the extended column specification for a data frame — x_spec","text":"extended column specification object","code":""},{"path":[]}] diff --git a/sitemap.xml b/sitemap.xml index 9f23ab6..c50ebbc 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -10,13 +10,13 @@ http://kylehusmann.com/interlacer/articles/coded-data.html - http://kylehusmann.com/interlacer/articles/index.html + http://kylehusmann.com/interlacer/articles/extended-column-types.html - http://kylehusmann.com/interlacer/articles/interlacer.html + http://kylehusmann.com/interlacer/articles/index.html - http://kylehusmann.com/interlacer/articles/na-column-types.html + http://kylehusmann.com/interlacer/articles/interlacer.html http://kylehusmann.com/interlacer/articles/other-approaches.html @@ -27,6 +27,30 @@ http://kylehusmann.com/interlacer/index.html + + 
http://kylehusmann.com/interlacer/reference/across_value_channels.html + + + http://kylehusmann.com/interlacer/reference/as.cfactor.html + + + http://kylehusmann.com/interlacer/reference/as.codes.html + + + http://kylehusmann.com/interlacer/reference/as.x_col_spec.html + + + http://kylehusmann.com/interlacer/reference/as.x_collector.html + + + http://kylehusmann.com/interlacer/reference/cfactor.html + + + http://kylehusmann.com/interlacer/reference/codes-set.html + + + http://kylehusmann.com/interlacer/reference/codes.html + http://kylehusmann.com/interlacer/reference/flatten_channels.html @@ -43,19 +67,19 @@ http://kylehusmann.com/interlacer/reference/is.empty.html - http://kylehusmann.com/interlacer/reference/map_value_channel.html + http://kylehusmann.com/interlacer/reference/levels-set-.interlacer_interlaced.html - http://kylehusmann.com/interlacer/reference/na.html + http://kylehusmann.com/interlacer/reference/levels.interlacer_interlaced.html - http://kylehusmann.com/interlacer/reference/na_cols.html + http://kylehusmann.com/interlacer/reference/map_value_channel.html - http://kylehusmann.com/interlacer/reference/na_levels.html + http://kylehusmann.com/interlacer/reference/na.html - http://kylehusmann.com/interlacer/reference/na_spec.html + http://kylehusmann.com/interlacer/reference/na_collectors.html http://kylehusmann.com/interlacer/reference/parse_interlaced.html @@ -69,7 +93,22 @@ http://kylehusmann.com/interlacer/reference/value_channel.html + + http://kylehusmann.com/interlacer/reference/value_collectors.html + + + http://kylehusmann.com/interlacer/reference/where_value_channel.html + http://kylehusmann.com/interlacer/reference/write_interlaced_delim.html + + http://kylehusmann.com/interlacer/reference/x_col.html + + + http://kylehusmann.com/interlacer/reference/x_cols.html + + + http://kylehusmann.com/interlacer/reference/x_spec.html +