diff --git a/index.html b/index.html
index 08bfee7..a81dedc 100644
--- a/index.html
+++ b/index.html
@@ -162,7 +162,7 @@
-With interlacer, we get this instead:
+With interlacer, we get a “deinterlaced data frame” instead:
(ex <- read_interlaced_csv(
interlacer_example("colors.csv"),
@@ -182,7 +182,7 @@ Usage
#> 9 9 <NA> NA REFUSED <NA> REFUSED
#> 10 10 <NA> NA OMITTED RED <NA>
#> 11 11 <NA> 10 <NA> <NA> REFUSED
-As you can see, each source variable is loaded into a “deinterlaced data frame”. Deinterlaced data frames have two columns for each variable: one for values, and another for missing reasons. Missing reason columns are denoted by column names surrounded by dots (e.g. .age. is the missing reason for the age column). When a value is NA, it always has a reason in the missing reason column. Similarly, when a missing reason is NA, it always has a value in the value column.
+Deinterlaced data frames have two columns for each variable: one for values, and another for missing reasons. Missing reason columns are denoted by column names surrounded by dots (e.g. .age. is the missing reason for the age column). When a value is NA, it always has a reason in the missing reason column. Similarly, when a missing reason is NA, it always has a value in the value column.
This allows us to separately reference values and missing reasons in a tidy and type-aware manner. For example, if I wanted to get a breakdown of the mean age of respondents missing a report of their favorite color, grouped by the missing reason, it would simply be:
ex |>
diff --git a/pkgdown.yml b/pkgdown.yml
index df4ed9f..b36031f 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -7,7 +7,7 @@ articles:
interlacer: interlacer.html
mutations: mutations.html
other-approaches: other-approaches.html
-last_built: 2024-03-05T22:26Z
+last_built: 2024-03-05T22:28Z
urls:
reference: http://kylehusmann.com/interlacer/reference
article: http://kylehusmann.com/interlacer/articles
diff --git a/search.json b/search.json
index 762b866..f71e1be 100644
--- a/search.json
+++ b/search.json
@@ -1 +1 @@
-[{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"Apache License","title":"Apache License","text":"Version 2.0, January 2004 ","code":""},{"path":[]},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_1-definitions","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"1. Definitions","title":"Apache License","text":"“License” shall mean terms conditions use, reproduction, distribution defined Sections 1 9 document. “Licensor” shall mean copyright owner entity authorized copyright owner granting License. “Legal Entity” shall mean union acting entity entities control, controlled , common control entity. purposes definition, “control” means () power, direct indirect, cause direction management entity, whether contract otherwise, (ii) ownership fifty percent (50%) outstanding shares, (iii) beneficial ownership entity. “” (“”) shall mean individual Legal Entity exercising permissions granted License. “Source” form shall mean preferred form making modifications, including limited software source code, documentation source, configuration files. “Object” form shall mean form resulting mechanical transformation translation Source form, including limited compiled object code, generated documentation, conversions media types. “Work” shall mean work authorship, whether Source Object form, made available License, indicated copyright notice included attached work (example provided Appendix ). “Derivative Works” shall mean work, whether Source Object form, based (derived ) Work editorial revisions, annotations, elaborations, modifications represent, whole, original work authorship. purposes License, Derivative Works shall include works remain separable , merely link (bind name) interfaces , Work Derivative Works thereof. 
“Contribution” shall mean work authorship, including original version Work modifications additions Work Derivative Works thereof, intentionally submitted Licensor inclusion Work copyright owner individual Legal Entity authorized submit behalf copyright owner. purposes definition, “submitted” means form electronic, verbal, written communication sent Licensor representatives, including limited communication electronic mailing lists, source code control systems, issue tracking systems managed , behalf , Licensor purpose discussing improving Work, excluding communication conspicuously marked otherwise designated writing copyright owner “Contribution.” “Contributor” shall mean Licensor individual Legal Entity behalf Contribution received Licensor subsequently incorporated within Work.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_2-grant-of-copyright-license","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"2. Grant of Copyright License","title":"Apache License","text":"Subject terms conditions License, Contributor hereby grants perpetual, worldwide, non-exclusive, -charge, royalty-free, irrevocable copyright license reproduce, prepare Derivative Works , publicly display, publicly perform, sublicense, distribute Work Derivative Works Source Object form.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_3-grant-of-patent-license","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"3. 
Grant of Patent License","title":"Apache License","text":"Subject terms conditions License, Contributor hereby grants perpetual, worldwide, non-exclusive, -charge, royalty-free, irrevocable (except stated section) patent license make, made, use, offer sell, sell, import, otherwise transfer Work, license applies patent claims licensable Contributor necessarily infringed Contribution(s) alone combination Contribution(s) Work Contribution(s) submitted. institute patent litigation entity (including cross-claim counterclaim lawsuit) alleging Work Contribution incorporated within Work constitutes direct contributory patent infringement, patent licenses granted License Work shall terminate date litigation filed.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_4-redistribution","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"4. Redistribution","title":"Apache License","text":"may reproduce distribute copies Work Derivative Works thereof medium, without modifications, Source Object form, provided meet following conditions: () must give recipients Work Derivative Works copy License; (b) must cause modified files carry prominent notices stating changed files; (c) must retain, Source form Derivative Works distribute, copyright, patent, trademark, attribution notices Source form Work, excluding notices pertain part Derivative Works; (d) Work includes “NOTICE” text file part distribution, Derivative Works distribute must include readable copy attribution notices contained within NOTICE file, excluding notices pertain part Derivative Works, least one following places: within NOTICE text file distributed part Derivative Works; within Source form documentation, provided along Derivative Works; , within display generated Derivative Works, wherever third-party notices normally appear. contents NOTICE file informational purposes modify License. 
may add attribution notices within Derivative Works distribute, alongside addendum NOTICE text Work, provided additional attribution notices construed modifying License. may add copyright statement modifications may provide additional different license terms conditions use, reproduction, distribution modifications, Derivative Works whole, provided use, reproduction, distribution Work otherwise complies conditions stated License.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_5-submission-of-contributions","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"5. Submission of Contributions","title":"Apache License","text":"Unless explicitly state otherwise, Contribution intentionally submitted inclusion Work Licensor shall terms conditions License, without additional terms conditions. Notwithstanding , nothing herein shall supersede modify terms separate license agreement may executed Licensor regarding Contributions.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_6-trademarks","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"6. Trademarks","title":"Apache License","text":"License grant permission use trade names, trademarks, service marks, product names Licensor, except required reasonable customary use describing origin Work reproducing content NOTICE file.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_7-disclaimer-of-warranty","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"7. 
Disclaimer of Warranty","title":"Apache License","text":"Unless required applicable law agreed writing, Licensor provides Work (Contributor provides Contributions) “” BASIS, WITHOUT WARRANTIES CONDITIONS KIND, either express implied, including, without limitation, warranties conditions TITLE, NON-INFRINGEMENT, MERCHANTABILITY, FITNESS PARTICULAR PURPOSE. solely responsible determining appropriateness using redistributing Work assume risks associated exercise permissions License.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_8-limitation-of-liability","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"8. Limitation of Liability","title":"Apache License","text":"event legal theory, whether tort (including negligence), contract, otherwise, unless required applicable law (deliberate grossly negligent acts) agreed writing, shall Contributor liable damages, including direct, indirect, special, incidental, consequential damages character arising result License use inability use Work (including limited damages loss goodwill, work stoppage, computer failure malfunction, commercial damages losses), even Contributor advised possibility damages.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_9-accepting-warranty-or-additional-liability","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"9. Accepting Warranty or Additional Liability","title":"Apache License","text":"redistributing Work Derivative Works thereof, may choose offer, charge fee , acceptance support, warranty, indemnity, liability obligations /rights consistent License. However, accepting obligations, may act behalf sole responsibility, behalf Contributor, agree indemnify, defend, hold Contributor harmless liability incurred , claims asserted , Contributor reason accepting warranty additional liability. 
END TERMS CONDITIONS","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"appendix-how-to-apply-the-apache-license-to-your-work","dir":"","previous_headings":"","what":"APPENDIX: How to apply the Apache License to your work","title":"Apache License","text":"apply Apache License work, attach following boilerplate notice, fields enclosed brackets [] replaced identifying information. (Don’t include brackets!) text enclosed appropriate comment syntax file format. also recommend file class name description purpose included “printed page” copyright notice easier identification within third-party archives.","code":"Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"numeric-codes-with-negative-missing-reasons-spss","dir":"Articles","previous_headings":"","what":"Numeric codes with negative missing reasons (SPSS)","title":"Coded Data","text":"’s extremely common find data sources encode categorical responses numeric values, negative values representing missing reason codes. SPSS one example. ’s SPSS-formatted version colors.csv example: missing reasons : -99: N/-98: REFUSED -97: OMITTED colors coded: 1: BLUE 2: RED 3: YELLOW format gives ability load everything numeric type: test value missing code, can check ’s less 0: downsides approach twofold: 1) values missing reasons become codes remember 2) ’s really easy make mistakes. sort mistakes? 
Well, everything numeric, ’s nothing stopping us treating missing reason codes regular values… forget remove missing reason codes, R still happily compute aggregations using negative numbers! ever thought significant result, find ’s stray missing reason codes still interlaced values? ’s bad time. ’re much better loading formats interlacer, converting codes labelled factor levels: Now aggregations won’t mix values missing codes, won’t keep cross-referencing codebook know values mean:","code":"library(readr) library(interlacer) read_file( interlacer_example(\"colors_coded.csv\") ) |> cat() #> person_id,age,favorite_color #> 1,20,1 #> 2,-98,1 #> 3,21,-98 #> 4,30,-97 #> 5,1,-99 #> 6,41,2 #> 7,50,-97 #> 8,30,3 #> 9,-98,-98 #> 10,-97,2 #> 11,10,-98 (df_coded <- read_csv( interlacer_example(\"colors_coded.csv\"), col_types = \"n\" )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 #> 2 2 -98 1 #> 3 3 21 -98 #> 4 4 30 -97 #> 5 5 1 -99 #> 6 6 41 2 #> 7 7 50 -97 #> 8 8 30 3 #> 9 9 -98 -98 #> 10 10 -97 2 #> 11 11 10 -98 library(dplyr, warn.conflicts = FALSE) df_coded |> mutate( favorite_color_missing = if_else(favorite_color < 0, favorite_color, NA), age = if_else(age > 0, age, NA) ) |> summarize( mean_age = mean(age, na.rm=T), n = n(), .by = favorite_color_missing ) #> # A tibble: 4 × 3 #> favorite_color_missing mean_age n #> #> 1 NA 30.3 5 #> 2 -98 15.5 3 #> 3 -97 40 2 #> 4 -99 1 1 df_coded |> mutate( favorite_color_missing = if_else(favorite_color < 0, favorite_color, NA), # age = if_else(age > 0, age, NA) ) |> summarize( mean_age = mean(age, na.rm=T), n = n(), .by = favorite_color_missing ) #> # A tibble: 4 × 3 #> favorite_color_missing mean_age n #> #> 1 NA -20.8 5 #> 2 -98 -22.3 3 #> 3 -97 40 2 #> 4 -99 1 1 library(forcats) (df_decoded_deinterlaced <- read_interlaced_csv( interlacer_example(\"colors_coded.csv\"), na = c(\"-99\", \"-98\", \"-97\") ) |> mutate( across( missing_cols(), \\(x) fct_recode(x, `N/A` = \"-99\", REFUSED = \"-98\", OMITTED = 
\"-97\", ) ), favorite_color = fct_recode( as.character(favorite_color), BLUE = \"1\", RED = \"2\", YELLOW = \"3\", ) )) #> # An deinterlaced tibble: 11 × 6 #> person_id .person_id. age .age. favorite_color .favorite_color. #> #> 1 1 NA 20 NA BLUE NA #> 2 2 NA NA REFUSED BLUE NA #> 3 3 NA 21 NA NA REFUSED #> 4 4 NA 30 NA NA OMITTED #> 5 5 NA 1 NA NA N/A #> 6 6 NA 41 NA RED NA #> 7 7 NA 50 NA NA OMITTED #> 8 8 NA 30 NA YELLOW NA #> 9 9 NA NA REFUSED NA REFUSED #> 10 10 NA NA OMITTED RED NA #> 11 11 NA 10 NA NA REFUSED df_decoded_deinterlaced |> summarize( mean_age = mean(age, na.rm=T), n = n(), .by = .favorite_color. ) #> # A tibble: 4 × 3 #> .favorite_color. mean_age n #> #> 1 NA 30.3 5 #> 2 REFUSED 15.5 3 #> 3 OMITTED 40 2 #> 4 N/A 1 1"},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"numeric-codes-with-character-missing-reasons-sas-stata","dir":"Articles","previous_headings":"","what":"Numeric codes with character missing reasons (SAS, Stata)","title":"Coded Data","text":"Like SPSS, SAS Stata encode factor levels numeric values, instead representing missing reasons negative codes, given character codes: , value codes used previous example, except missing reasons coded follows: “.”: N/“.”: REFUSED “.b”: OMITTED handle missing reasons without interlacer, columns must loaded character vectors: test value missing, can cast numeric types. cast fails, know ’s missing code. successful, know ’s coded value. Although character missing codes help prevent us mistakenly including missing codes value aggregations, cast columns numeric time check missingness hardly ergonomic, generates annoying warnings. Like , ’s easier import interlacer decode values missing reasons:","code":"read_file( interlacer_example(\"colors_coded_char.csv\") ) |> cat() #> person_id,age,favorite_color #> 1,20,1 #> 2,.a,1 #> 3,21,.a #> 4,30,.b #> 5,1,. 
#> 6,41,2 #> 7,50,.b #> 8,30,3 #> 9,.a,.a #> 10,.b,2 #> 11,10,.a (df_coded_char <- read_csv( interlacer_example(\"colors_coded_char.csv\"), col_types = \"c\" )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 #> 2 2 .a 1 #> 3 3 21 .a #> 4 4 30 .b #> 5 5 1 . #> 6 6 41 2 #> 7 7 50 .b #> 8 8 30 3 #> 9 9 .a .a #> 10 10 .b 2 #> 11 11 10 .a df_coded_char |> mutate( favorite_color_missing = if_else( is.na(as.numeric(favorite_color)), favorite_color, NA ), age = if_else(!is.na(as.numeric(age)), as.numeric(age), NA) ) |> summarize( mean_age = mean(age, na.rm=T), n = n(), .by = favorite_color_missing ) #> Warning: There were 3 warnings in `mutate()`. #> The first warning was: #> ℹ In argument: `favorite_color_missing = #> if_else(is.na(as.numeric(favorite_color)), favorite_color, NA)`. #> Caused by warning in `is_logical()`: #> ! NAs introduced by coercion #> ℹ Run `dplyr::last_dplyr_warnings()` to see the 2 remaining warnings. #> # A tibble: 4 × 3 #> favorite_color_missing mean_age n #> #> 1 NA 30.3 5 #> 2 .a 15.5 3 #> 3 .b 40 2 #> 4 . 1 1 read_interlaced_csv( interlacer_example(\"colors_coded_char.csv\"), na = c(\".\", \".a\", \".b\") ) |> mutate( across( missing_cols(), \\(x) fct_recode(x, `N/A` = \".\", REFUSED = \".a\", OMITTED = \".b\", ) ), favorite_color = fct_recode( as.character(favorite_color), BLUE = \"1\", RED = \"2\", YELLOW = \"3\", ) ) #> # An deinterlaced tibble: 11 × 6 #> person_id .person_id. age .age. favorite_color .favorite_color. 
#> #> 1 1 NA 20 NA BLUE NA #> 2 2 NA NA REFUSED BLUE NA #> 3 3 NA 21 NA NA REFUSED #> 4 4 NA 30 NA NA OMITTED #> 5 5 NA 1 NA NA N/A #> 6 6 NA 41 NA RED NA #> 7 7 NA 50 NA NA OMITTED #> 8 8 NA 30 NA YELLOW NA #> 9 9 NA NA REFUSED NA REFUSED #> 10 10 NA NA OMITTED RED NA #> 11 11 NA 10 NA NA REFUSED"},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"encoding-a-decoded-deinterlaced-data-frame-","dir":"Articles","previous_headings":"","what":"Encoding a decoded & deinterlaced data frame.","title":"Coded Data","text":"Re-coding re-interlacing data frame easily done follows:","code":"df_decoded_deinterlaced |> mutate( across( missing_cols(), \\(x) fct_recode(x, `-99` = \"N/A\", `-98` = \"REFUSED\", `-97` = \"OMITTED\" ) ), favorite_color = fct_recode( favorite_color, `1` = \"BLUE\", `2` = \"RED\", `3` = \"YELLOW\" ) ) |> write_interlaced_csv(\"output.csv\")"},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"haven","dir":"Articles","previous_headings":"","what":"haven","title":"Coded Data","text":"haven package functions loading native SPSS, SAS, Stata native file formats special data frames use column attributes special values keep track interlaces values missing reasons. complete discussion compares interlacer’s approach, see vignette(\"-approaches\"). Future versions interlacer ability convert haven data frames deinterlaced data frames, want gauge interest feature invest time implement . feature ’d use, please let know!","code":""},{"path":"http://kylehusmann.com/interlacer/articles/column-types.html","id":"interlaced-column-types","dir":"Articles","previous_headings":"","what":"Interlaced Column Types","title":"Interlaced Column Types","text":"addition standard readr::col_* column specification types, interlacer provides interlaced column types enable specify missing reasons column level. useful missing reasons apply particular items opposed file whole. example, say measure following two items: current stress level? 
Low Moderate High don’t know don’t understand question well feel manage time responsibilities today? Poorly Fairly well Well well apply (Today vacation day) apply (reason) can see, items two selection choices mapped missing reasons. specify missing reasons variable level, icol_*() family column specification types can used. extend readr’s col_*() column types adding parameter specifying missing values unique particular variable: icol_factor() column spec works just like readr::col_factor(), additionally accepts na argument specifying missing values variable level. specify missing reasons variable-level like , available levels resulting missing reason column correctly show possible missing reasons variable: comparison, loaded variable-level missing reasons file-level level missing reasons, variable missing reasons possible levels, even didn’t apply particular variable:","code":"(df_stress <- read_interlaced_csv( interlacer_example(\"stress.csv\"), col_types = cols( person_id = col_integer(), current_stress = icol_factor( levels = c(\"LOW\", \"MODERATE\", \"HIGH\"), na = c(\"DONT_KNOW\", \"DONT_UNDERSTAND\") ), time_management = icol_factor( levels = c(\"POORLY\", \"FAIRLY_WELL\", \"WELL\", \"VERY_WELL\"), na = c(\"NA_VACATION\", \"NA_OTHER\") ) ), na = c( \"REFUSED\", \"OMITTED\", \"N/A\" ) )) #> # An deinterlaced tibble: 8 × 6 #> person_id .person_id. current_stress .current_stress. time_management #> #> 1 1 NA LOW NA VERY_WELL #> 2 2 NA MODERATE NA POORLY #> 3 3 NA NA DONT_KNOW NA #> 4 4 NA HIGH NA POORLY #> 5 5 NA NA DONT_UNDERSTAND NA #> 6 6 NA LOW NA NA #> 7 7 NA MODERATE NA WELL #> 8 8 NA NA OMITTED FAIRLY_WELL #> # ℹ 1 more variable: .time_management. levels(df_stress$.person_id.) #> [1] \"REFUSED\" \"OMITTED\" \"N/A\" levels(df_stress$.current_stress.) #> [1] \"DONT_KNOW\" \"DONT_UNDERSTAND\" \"REFUSED\" \"OMITTED\" #> [5] \"N/A\" levels(df_stress$.time_management.) 
#> [1] \"NA_VACATION\" \"NA_OTHER\" \"REFUSED\" \"OMITTED\" \"N/A\" df_stress_file <- read_interlaced_csv( interlacer_example(\"stress.csv\"), na = c( \"REFUSED\", \"OMITTED\", \"N/A\", \"DONT_KNOW\", \"DONT_UNDERSTAND\", \"NA_VACATION\", \"NA_OTHER\" ) ) levels(df_stress_file$.person_id.) #> [1] \"REFUSED\" \"OMITTED\" \"N/A\" \"DONT_KNOW\" #> [5] \"DONT_UNDERSTAND\" \"NA_VACATION\" \"NA_OTHER\" levels(df_stress_file$.current_stress.) #> [1] \"REFUSED\" \"OMITTED\" \"N/A\" \"DONT_KNOW\" #> [5] \"DONT_UNDERSTAND\" \"NA_VACATION\" \"NA_OTHER\" levels(df_stress_file$.time_management.) #> [1] \"REFUSED\" \"OMITTED\" \"N/A\" \"DONT_KNOW\" #> [5] \"DONT_UNDERSTAND\" \"NA_VACATION\" \"NA_OTHER\""},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"aggregations-with-missing-reasons","dir":"Articles","previous_headings":"","what":"Aggregations with missing reasons","title":"Introduction to interlacer","text":"Now, interested values source data, functionality need. wanted know values NA? Although information encoded source data, lost missing reasons converted NA values. example, consider favorite_color column. many respondents REFUSED give favorite color? many people just OMITTED answer? question N/respondents (e.g. wasn’t survey form)? mean respondent age groups? current dataframe gets us part way: can see, converted missing reasons single NA, can answer questions missingness general, rather work specific reasons stored source data. Unfortunately, try load data missing reasons intact, lose something else: type information values. Now access missing reasons, columns character vectors. means order anything values, always filter missing reasons, cast remaining values desired type: gives us information want, cumbersome starts get really complex different columns different sets possible missing reasons. 
means lot type conversion gymnastics switch value types missing types.","code":"library(dplyr, warn.conflicts = FALSE) df |> mutate( favorite_color_missing = is.na(favorite_color) ) |> summarize( mean_age = mean(age, na.rm = T), n = n(), .by = favorite_color_missing ) #> # A tibble: 2 × 3 #> favorite_color_missing mean_age n #> #> 1 FALSE 30.3 5 #> 2 TRUE 22.4 6 (df_with_missing <- read_csv( interlacer_example(\"colors.csv\"), col_types = cols(.default = \"c\") )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 REFUSED BLUE #> 3 3 21 REFUSED #> 4 4 30 OMITTED #> 5 5 1 N/A #> 6 6 41 RED #> 7 7 50 OMITTED #> 8 8 30 YELLOW #> 9 9 REFUSED REFUSED #> 10 10 OMITTED RED #> 11 11 10 REFUSED reasons <- c(\"REFUSED\", \"OMITTED\", \"N/A\") df_with_missing |> mutate( age_values = as.numeric(if_else(age %in% reasons, NA, age)), favorite_color_missing_reasons = if_else( favorite_color %in% reasons, favorite_color, NA ) ) |> summarize( mean_age = mean(age_values, na.rm=T), n = n(), .by = favorite_color_missing_reasons ) #> # A tibble: 4 × 3 #> favorite_color_missing_reasons mean_age n #> #> 1 NA 30.3 5 #> 2 REFUSED 15.5 3 #> 3 OMITTED 40 2 #> 4 N/A 1 1"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"the-interlacer-approach","dir":"Articles","previous_headings":"Aggregations with missing reasons","what":"The interlacer approach","title":"Introduction to interlacer","text":"Interlacer built based insight everything becomes much tidy, simple, expressive explicitly work values missing reasons separate channels variable. functions read_interlaced_* functions interlacer : deinterlace variables interlaced data sources two columns per variable: one holding values, one holding missing reasons. can see, missing reasons columns denoted names surrounded dots: .age. column holds missing reasons age variable, . Now, missing reason information need right fingertips, value types preserved. 
make report , run: get results without needing type gymnastics!","code":"(df_deinterlaced <- read_interlaced_csv( interlacer_example(\"colors.csv\"), na = c(\"REFUSED\", \"OMITTED\", \"N/A\"), )) #> # An deinterlaced tibble: 11 × 6 #> person_id .person_id. age .age. favorite_color .favorite_color. #> #> 1 1 NA 20 NA BLUE NA #> 2 2 NA NA REFUSED BLUE NA #> 3 3 NA 21 NA NA REFUSED #> 4 4 NA 30 NA NA OMITTED #> 5 5 NA 1 NA NA N/A #> 6 6 NA 41 NA RED NA #> 7 7 NA 50 NA NA OMITTED #> 8 8 NA 30 NA YELLOW NA #> 9 9 NA NA REFUSED NA REFUSED #> 10 10 NA NA OMITTED RED NA #> 11 11 NA 10 NA NA REFUSED df_deinterlaced |> summarize( mean_age = mean(age, na.rm=T), n = n(), .by = .favorite_color. ) #> # A tibble: 4 × 3 #> .favorite_color. mean_age n #> #> 1 NA 30.3 5 #> 2 REFUSED 15.5 3 #> 3 OMITTED 40 2 #> 4 N/A 1 1"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"filtering-based-on-missing-reasons","dir":"Articles","previous_headings":"","what":"Filtering based on missing reasons","title":"Introduction to interlacer","text":"separate columns values missing reasons also helpful creating samples inclusion / exclusion criteria based missing reasons. example, using example data, say wanted create sample respondents REFUSED give age? people REFUSED report age favorite color? separate columns, can combine value conditions missing reason conditions. example, select everyone REFUSED give favorite color, 20 years old: ’ve created sample, ready start analyzing data, typically don’t need keep missing reasons around anymore. Interlacer provides convenient drop_missing_cols() function take care :","code":"df_deinterlaced |> filter(.age. == \"REFUSED\") #> # An deinterlaced tibble: 2 × 6 #> person_id .person_id. age .age. favorite_color .favorite_color. #> #> 1 2 NA NA REFUSED BLUE NA #> 2 9 NA NA REFUSED NA REFUSED df_deinterlaced |> filter(.age. == \"REFUSED\" & .favorite_color. == \"REFUSED\") #> # An deinterlaced tibble: 1 × 6 #> person_id .person_id. 
age .age. favorite_color .favorite_color. #> #> 1 9 NA NA REFUSED NA REFUSED df_deinterlaced |> filter(age > 20 & .favorite_color. == \"REFUSED\") #> # An deinterlaced tibble: 1 × 6 #> person_id .person_id. age .age. favorite_color .favorite_color. #> #> 1 3 NA 21 NA NA REFUSED df_deinterlaced |> filter(.age. == \"REFUSED\") |> drop_missing_cols() #> # A tibble: 2 × 3 #> person_id age favorite_color #> #> 1 2 NA BLUE #> 2 9 NA NA"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"next-steps","dir":"Articles","previous_headings":"","what":"Next steps","title":"Introduction to interlacer","text":"far, ’ve covered interlacer’s read_interlaced_* family functions enabled us deinterlace value missing reason channels interlaced data sources separate dataframe columns. Separate value missing reason columns enable us create tidy type-aware aggregation filtering pipelines can simultaneously consider variable’s value missing reasons. ’s well good, happens want make modifications data? want add variables dataframe, replace values missing reasons, missing reasons values? Inevitably, ’ll create situations simultaneously value missing reason, neither value missing reason: Notice warnings! operations produce dataframes don’t conform rule “one value missing reason per variable row”. manually solve manually fixing corresponding column, output hints, interlacer provides easier way way function coalesce_channels(). next vignette, vignette(\"mutations\"), show works!","code":"# Value and missing reason: df_deinterlaced |> mutate( .age. = \"REDACTED\" ) #> Warning: Column `age` has rows with both values and missing reasons #> ℹ Run `coalesce_channels()` to fix. #> # An deinterlaced tibble: 11 × 6 #> person_id .person_id. age .age. favorite_color .favorite_color. 
#> #> 1 1 NA 20 REDACTED BLUE NA #> 2 2 NA NA REDACTED BLUE NA #> 3 3 NA 21 REDACTED NA REFUSED #> 4 4 NA 30 REDACTED NA OMITTED #> 5 5 NA 1 REDACTED NA N/A #> 6 6 NA 41 REDACTED RED NA #> 7 7 NA 50 REDACTED NA OMITTED #> 8 8 NA 30 REDACTED YELLOW NA #> 9 9 NA NA REDACTED NA REFUSED #> 10 10 NA NA REDACTED RED NA #> 11 11 NA 10 REDACTED NA REFUSED # No value, no missing reason: df_deinterlaced |> mutate( favorite_color = na_if(favorite_color, \"BLUE\") ) #> Warning: Column `favorite_color` has rows without values or missing reasons #> ℹ Run `coalesce_channels()` to fix. #> # An deinterlaced tibble: 11 × 6 #> person_id .person_id. age .age. favorite_color .favorite_color. #> #> 1 1 NA 20 NA NA NA #> 2 2 NA NA REFUSED NA NA #> 3 3 NA 21 NA NA REFUSED #> 4 4 NA 30 NA NA OMITTED #> 5 5 NA 1 NA NA N/A #> 6 6 NA 41 NA RED NA #> 7 7 NA 50 NA NA OMITTED #> 8 8 NA 30 NA YELLOW NA #> 9 9 NA NA REFUSED NA REFUSED #> 10 10 NA NA OMITTED RED NA #> 11 11 NA 10 NA NA REFUSED"},{"path":"http://kylehusmann.com/interlacer/articles/mutations.html","id":"an-easier-way-with-coalesce_channels","dir":"Articles","previous_headings":"","what":"An easier way with coalesce_channels()","title":"Mutating Values and Missing Reasons","text":"can imagine, manually fixing value & missing reason structure data frame every mutation can get cumbersome! Luckily, interlacer provides easier way via coalesce_channels(): coalesce_channels() run every time mutate something deinterlaced data frame. accepts two arguments keep, default_reason. fixes possible problem cases follows: Case 1: value missing reason exists Keep value keep = 'value' Keep missing reason keep = 'missing' Case 2: NEITHER value missing reason exists Fill missing reason default_reason rules allow us mutate deinterlaced variables without needing specify values missing reason actions – need think intended operation context one channel, call coalesce_channels() can take care us. 
’s ’d use coalesce_channels() two examples previous section:","code":"df |> mutate( .age. = \"REDACTED\", ) |> coalesce_channels(keep = \"missing\") #> # An deinterlaced tibble: 11 × 6 #> person_id .person_id. age .age. favorite_color .favorite_color. #> #> 1 1 NA NA REDACTED BLUE NA #> 2 2 NA NA REDACTED BLUE NA #> 3 3 NA NA REDACTED NA REFUSED #> 4 4 NA NA REDACTED NA OMITTED #> 5 5 NA NA REDACTED NA N/A #> 6 6 NA NA REDACTED RED NA #> 7 7 NA NA REDACTED NA OMITTED #> 8 8 NA NA REDACTED YELLOW NA #> 9 9 NA NA REDACTED NA REFUSED #> 10 10 NA NA REDACTED RED NA #> 11 11 NA NA REDACTED NA REFUSED df |> mutate( favorite_color = if_else( favorite_color %in% c(\"RED\", \"YELLOW\"), favorite_color, NA ) ) |> coalesce_channels(default_reason = \"TECHNICAL_ERROR\") #> # An deinterlaced tibble: 11 × 6 #> person_id .person_id. age .age. favorite_color .favorite_color. #> #> 1 1 NA 20 NA NA TECHNICAL_ERROR #> 2 2 NA NA REFUSED NA TECHNICAL_ERROR #> 3 3 NA 21 NA NA REFUSED #> 4 4 NA 30 NA NA OMITTED #> 5 5 NA 1 NA NA N/A #> 6 6 NA 41 NA RED NA #> 7 7 NA 50 NA NA OMITTED #> 8 8 NA 30 NA YELLOW NA #> 9 9 NA NA REFUSED NA REFUSED #> 10 10 NA NA OMITTED RED NA #> 11 11 NA 10 NA NA REFUSED"},{"path":"http://kylehusmann.com/interlacer/articles/mutations.html","id":"creating-new-columns","dir":"Articles","previous_headings":"","what":"Creating New Columns","title":"Mutating Values and Missing Reasons","text":"coalesce_channels() also automatically create missing reason columns don’t automatically exist. useful adding new variables data frame:","code":"df |> mutate( person_type = if_else(age < 18, \"CHILD\", \"ADULT\"), .after = person_id ) |> coalesce_channels(default_reason = \"AGE_UNAVAILABLE\") #> # An deinterlaced tibble: 11 × 8 #> person_id .person_id. person_type .person_type. age .age. 
favorite_color #> #> 1 1 NA ADULT NA 20 NA BLUE #> 2 2 NA NA AGE_UNAVAILABLE NA REFUS… BLUE #> 3 3 NA ADULT NA 21 NA NA #> 4 4 NA ADULT NA 30 NA NA #> 5 5 NA CHILD NA 1 NA NA #> 6 6 NA ADULT NA 41 NA RED #> 7 7 NA ADULT NA 50 NA NA #> 8 8 NA ADULT NA 30 NA YELLOW #> 9 9 NA NA AGE_UNAVAILABLE NA REFUS… NA #> 10 10 NA NA AGE_UNAVAILABLE NA OMITT… RED #> 11 11 NA CHILD NA 10 NA NA #> # ℹ 1 more variable: .favorite_color. "},{"path":"http://kylehusmann.com/interlacer/articles/mutations.html","id":"writing-interlaced-files","dir":"Articles","previous_headings":"","what":"Writing interlaced files","title":"Mutating Values and Missing Reasons","text":"’ve made made changes data, probably want save . Interlacer provides write_interlaced_* family functions : combine value missing reasons interlaced character columns, write result csv. Alternatively, want re-interlace columns without writing file control writing process, can use interlace_channels():","code":"write_interlaced_csv(df, \"interlaced_output.csv\") interlace_channels(df) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 REFUSED BLUE #> 3 3 21 REFUSED #> 4 4 30 OMITTED #> 5 5 1 N/A #> 6 6 41 RED #> 7 7 50 OMITTED #> 8 8 30 YELLOW #> 9 9 REFUSED REFUSED #> 10 10 OMITTED RED #> 11 11 10 REFUSED"},{"path":"http://kylehusmann.com/interlacer/articles/mutations.html","id":"final-note-setting-the-global-default-reason","dir":"Articles","previous_headings":"","what":"Final note: Setting the global default reason","title":"Mutating Values and Missing Reasons","text":"default, coalesce_channels() use UNKNOWN_REASON default missing reason. Sometimes want use different default value, act “catch-” missing reason, don’t constantly specify . , set global default_missing_reason option:","code":"options(default_missing_reason = -99) tibble( a = c(1,2,3, NA, 5) ) |> coalesce_channels() #> # An deinterlaced tibble: 5 × 2 #> a .a. 
#> #> 1 1 NA #> 2 2 NA #> 3 3 NA #> 4 NA -99 #> 5 5 NA"},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"labelled-missing-values","dir":"Articles","previous_headings":"","what":"“Labelled” missing values","title":"Other Approaches","text":"SPSS *.sav files loaded haven via haven::read_spss(), values missing reasons loaded single interlaced numeric vector: just numeric vector though, haven::labelled_spss() numeric vector, attributes describing value missing value codes: attributes adjust behavior functions like .na(): still usual gymnastics pipelines: ’s little bit improvement working raw coded values, can use .na(), codes get labels, don’t constantly looking codes codebook. still falls short interlacer approach two key reasons. Reason 1: interlacer approach separate columns values missing reasons, value column can whatever type want: numeric, character, factor, etc. labelled, values missing reasons need type, usually numeric codes. creates lot type gymnastics potential errors ’re manipulating . Reason 2: Keeping interlaced columns, even missing values labelled, means aggregations protected. 
forget take missing values, get incorrect results:","code":"library(interlacer) library(haven) library(dplyr) #> #> Attaching package: 'dplyr' #> The following objects are masked from 'package:stats': #> #> filter, lag #> The following objects are masked from 'package:base': #> #> intersect, setdiff, setequal, union (df_spss <- read_spss( interlacer_example(\"colors.sav\"), user_na = TRUE )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 [BLUE] #> 2 2 -98 (NA) [REFUSED] 1 [BLUE] #> 3 3 21 -98 (NA) [REFUSED] #> 4 4 30 -97 (NA) [OMITTED] #> 5 5 1 -99 (NA) [N/A] #> 6 6 41 2 [RED] #> 7 7 50 -97 (NA) [OMITTED] #> 8 8 30 3 [YELLOW] #> 9 9 -98 (NA) [REFUSED] -98 (NA) [REFUSED] #> 10 10 -97 (NA) [OMITTED] 2 [RED] #> 11 11 10 -98 (NA) [REFUSED] attributes(df_spss$favorite_color) #> $label #> [1] \"Favorite color\" #> #> $na_range #> [1] -Inf 0 #> #> $class #> [1] \"haven_labelled_spss\" \"haven_labelled\" \"vctrs_vctr\" #> [4] \"double\" #> #> $format.spss #> [1] \"F8.2\" #> #> $labels #> BLUE RED YELLOW N/A REFUSED OMITTED #> 1 2 3 -99 -98 -97 is.na(df_spss$favorite_color) #> [1] FALSE FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE df_spss |> mutate( age_values = if_else(is.na(age), NA, age), favorite_color_missing_reasons = if_else( is.na(favorite_color), favorite_color, NA ) ) |> summarize( mean_age = mean(age_values, na.rm=T), n = n(), .by = favorite_color_missing_reasons ) #> # A tibble: 4 × 3 #> favorite_color_missing_reasons mean_age n #> #> 1 NA 30.3 5 #> 2 -98 (NA) [REFUSED] 15.5 3 #> 3 -97 (NA) [OMITTED] 40 2 #> 4 -99 (NA) [N/A] 1 1 df_spss |> mutate( favorite_color_missing_reasons = if_else( is.na(favorite_color), favorite_color, NA ) ) |> summarize( mean_age = mean(age, na.rm=T), n = n(), .by = favorite_color_missing_reasons ) #> # A tibble: 4 × 3 #> favorite_color_missing_reasons mean_age n #> #> 1 NA -20.8 5 #> 2 -98 (NA) [REFUSED] -22.3 3 #> 3 -97 (NA) [OMITTED] 40 2 #> 4 -99 (NA) [N/A] 1 
1"},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"tagged-missing-values","dir":"Articles","previous_headings":"","what":"“Tagged” missing values","title":"Other Approaches","text":"loading Stata SAS files, haven uses “tagged missingness” approach mirror values handled Stata SAS: approach deviously clever. takes advantage way NaN floating point values stored memory, make possible different “flavors” NA values. (info done, check tagged_na.c source code haven) still act like regular NA values… now can include single character “tag” (usually letter -z). means work .na() include missing reason codes aggregations! Unfortunately, can’t group , dplyr::group_by() missing tag-aware :( Another limitation approach requires values types numeric, trick “tagging” NA values depends peculiarities floating point values stored memory. , keeping separate columns values missing reasons solves issues.","code":"(df_stata <- read_stata( interlacer_example(\"colors.dta\") )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 [BLUE] #> 2 2 NA(a) [REFUSED] 1 [BLUE] #> 3 3 21 NA(a) [REFUSED] #> 4 4 30 NA(b) [OMITTED] #> 5 5 1 NA #> 6 6 41 2 [RED] #> 7 7 50 NA(b) [OMITTED] #> 8 8 30 3 [YELLOW] #> 9 9 NA(a) [REFUSED] NA(a) [REFUSED] #> 10 10 NA(b) [OMITTED] 2 [RED] #> 11 11 10 NA(a) [REFUSED] is.na(df_stata$age) #> [1] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE mean(df_stata$age, na.rm=TRUE) #> [1] 25.375 df_stata |> mutate( favorite_color_missing_reasons = if_else( is.na(favorite_color), favorite_color, NA ) ) |> summarize( mean_age = mean(age, na.rm=T), n = n(), .by = favorite_color_missing_reasons ) #> # A tibble: 1 × 3 #> favorite_color_missing_reasons mean_age n #> #> 1 NA 25.4 11"},{"path":"http://kylehusmann.com/interlacer/articles/other-approaches.html","id":"the-ideal-approach","dir":"Articles","previous_headings":"","what":"The “ideal” approach","title":"Other Approaches","text":"biggest downside keeping separate 
columns values missing reasons invalid states come start trying mutate data frames. coalesce_channels() helps lot, ’s pragmatic solution, ideal one. think ideal way handle missing reasons implement proper generic Result type natively R’s type system. real Result type act similar haven’s haven::tagged_na(), container type value, missing values. early attempt library, tried using nested data frames effect: sort works, can use $v $m reference separate channels data frame. Unfortunately requires creating separate columns grouping: mutations get ugly, even though ’re “correct” strongly-typed functional programming perspective… implement somehow custom native type R, ’d want syntax something like instead: “ideal” book: can use values usual, anytime want access “missing reason” channel, can wrap missing_reason() (similar haven::tagged_na() works). ’s type safe super ergonomic. implementing major headache involve intimate knowledge R internals… (@Hadley Wickham miracle ’re reading , talk sometime??) ’m using present current “deinterlaced data frame” approach. easy understand use, even though ’s “perfect” strongly typed functional programming perspective. 
’s enough demand missing-reason-aware tooling R though, might convince go “generic tagged type” rabbit hole… Please drop line let know think!","code":"df_interlaced <- read_interlaced_csv( interlacer_example(\"colors.csv\"), na = c(\"REFUSED\", \"OMITTED\", \"N/A\") ) (df_nested <- tibble( person_id = tibble( v = df_interlaced$person_id, m = df_interlaced$.person_id., ), age = tibble( v = df_interlaced$age, m = df_interlaced$.age., ), favorite_color = tibble( v = df_interlaced$favorite_color, m = df_interlaced$.favorite_color., ) )) #> # A tibble: 11 × 3 #> person_id$v $m age$v $m favorite_color$v $m #> #> 1 1 NA 20 NA BLUE NA #> 2 2 NA NA REFUSED BLUE NA #> 3 3 NA 21 NA NA REFUSED #> 4 4 NA 30 NA NA OMITTED #> 5 5 NA 1 NA NA N/A #> 6 6 NA 41 NA RED NA #> 7 7 NA 50 NA NA OMITTED #> 8 8 NA 30 NA YELLOW NA #> 9 9 NA NA REFUSED NA REFUSED #> 10 10 NA NA OMITTED RED NA #> 11 11 NA 10 NA NA REFUSED df_nested |> mutate( favorite_color_missing = favorite_color$m ) |> summarize( mean_age = mean(age$v, na.rm=T), n = n(), .by = favorite_color_missing ) #> # A tibble: 4 × 3 #> favorite_color_missing mean_age n #> #> 1 NA 30.3 5 #> 2 REFUSED 15.5 3 #> 3 OMITTED 40 2 #> 4 N/A 1 1 df_nested |> mutate( favorite_color = if_else( favorite_color$v %in% c(\"RED\", \"YELLOW\"), tibble(v = favorite_color$v, m = NA), tibble(v = NA, m = \"TECHNICAL_ERROR\") ) ) #> # A tibble: 11 × 3 #> person_id$v $m age$v $m favorite_color$v $m #> #> 1 1 NA 20 NA NA TECHNICAL_ERROR #> 2 2 NA NA REFUSED NA TECHNICAL_ERROR #> 3 3 NA 21 NA NA TECHNICAL_ERROR #> 4 4 NA 30 NA NA TECHNICAL_ERROR #> 5 5 NA 1 NA NA TECHNICAL_ERROR #> 6 6 NA 41 NA RED NA #> 7 7 NA 50 NA NA TECHNICAL_ERROR #> 8 8 NA 30 NA YELLOW NA #> 9 9 NA NA REFUSED NA TECHNICAL_ERROR #> 10 10 NA NA OMITTED RED NA #> 11 11 NA 10 NA NA TECHNICAL_ERROR df_mutated <- df |> mutate( favorite_color = if_else( favorite_color %in% c(\"RED\", \"YELLOW\"), favorite_color, missing_reason(\"TECHNICAL_ERROR\") ) df_mutated |> summarize( mean_age = 
mean(age, na.rm=T), n = n(), .by = missing_reason(favorite_color) )"},{"path":"http://kylehusmann.com/interlacer/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Kyle Husmann. Author, maintainer.","code":""},{"path":"http://kylehusmann.com/interlacer/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Husmann K (2024). interlacer: Read Tabular Data Interlaced Values Missing Reasons. R package version 0.1.0, http://kylehusmann.com/interlacer/.","code":"@Manual{, title = {interlacer: Read Tabular Data With Interlaced Values And Missing Reasons}, author = {Kyle Husmann}, year = {2024}, note = {R package version 0.1.0}, url = {http://kylehusmann.com/interlacer/}, }"},{"path":[]},{"path":"http://kylehusmann.com/interlacer/index.html","id":"overview","dir":"","previous_headings":"","what":"Overview","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"value missing data, sometimes want know missing. Many textual tabular data sources encode missing reasons special values interlaced regular values column (e.g. N/, REFUSED, -99, etc.). Unfortunately, missing reasons lost values converted single NA type. Working missing reasons R traditionally requires loading variables character vectors bunch string comparisons type conversions make sense . Interlacer created based insight values missing reasons can handled separate channels variable. Interlacer provides functions load variables interlaced data sources two separate columns: One containing variable’s values, containing missing reasons. turns , structure gives us extremely powerful expressive way simultaneously work values missing reasons tidy pipelines, described vignette(\"interlacer\"). (tldr: allows us interact variable Result type, abstraction often found functional programming) library currently experimental stages, aware interface likely change future. 
meantime, please try let know think!","code":""},{"path":"http://kylehusmann.com/interlacer/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"","code":"# The easiest way to get interlacer is to install via devtools: install.packages(\"devtools\") # If devtools is not already installed devtools::install_github(\"khusmann/interlacer\")"},{"path":"http://kylehusmann.com/interlacer/index.html","id":"usage","dir":"","previous_headings":"","what":"Usage","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"use interlacer, load current R session: interlacer supports following file formats read_interlaced_*() functions, extend readr::read_*() family functions: read_interlaced_csv() read_interlaced_tsv() read_interlaced_csv2() read_interlaced_delim() quick demo, consider following example file bundled interlacer: csv file, values interlaced three possible missing reasons: REFUSED, OMITTED, N/. readr, loading data result data frame like : interlacer, get instead: can see, source variable loaded “deinterlaced data frame”. Deinterlaced data frames two columns variable: one values, another missing reasons. Missing reason columns denoted column names surrounded dots (e.g. .age. missing reason age column). value NA, always reason missing reason column. Similarly, missing reason NA, always value value column. allows us separately reference values missing reasons tidy type-aware manner. example, wanted get breakdown mean age respondents missing report favorite color, grouped missing reason, simply : (Note category result refers mean age responses without missing color values, .e. available favorite color responses). 
just scratches surface can done interlacer… check vignette(\"interlacer\") complete overview!","code":"library(interlacer) library(readr) #> Warning: package 'readr' was built under R version 4.2.3 read_file(interlacer_example(\"colors.csv\")) |> cat() #> person_id,age,favorite_color #> 1,20,BLUE #> 2,REFUSED,BLUE #> 3,21,REFUSED #> 4,30,OMITTED #> 5,1,N/A #> 6,41,RED #> 7,50,OMITTED #> 8,30,YELLOW #> 9,REFUSED,REFUSED #> 10,OMITTED,RED #> 11,10,REFUSED read_csv( interlacer_example(\"colors.csv\"), na = c(\"REFUSED\", \"OMITTED\", \"N/A\") ) #> Rows: 11 Columns: 3 #> ── Column specification ──────────────────────────────────────────────────────── #> Delimiter: \",\" #> chr (1): favorite_color #> dbl (2): person_id, age #> #> ℹ Use `spec()` to retrieve the full column specification for this data. #> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 NA BLUE #> 3 3 21 #> 4 4 30 #> 5 5 1 #> 6 6 41 RED #> 7 7 50 #> 8 8 30 YELLOW #> 9 9 NA #> 10 10 NA RED #> 11 11 10 (ex <- read_interlaced_csv( interlacer_example(\"colors.csv\"), na = c(\"REFUSED\", \"OMITTED\", \"N/A\") )) #> # An deinterlaced tibble: 11 × 6 #> person_id .person_id. age .age. favorite_color .favorite_color. #> #> 1 1 20 BLUE #> 2 2 NA REFUSED BLUE #> 3 3 21 REFUSED #> 4 4 30 OMITTED #> 5 5 1 N/A #> 6 6 41 RED #> 7 7 50 OMITTED #> 8 8 30 YELLOW #> 9 9 NA REFUSED REFUSED #> 10 10 NA OMITTED RED #> 11 11 10 REFUSED ex |> summarize( mean_age = mean(age, na.rm=T), n = n(), .by = .favorite_color. ) #> # A tibble: 4 × 3 #> .favorite_color. mean_age n #> #> 1 30.3 5 #> 2 REFUSED 15.5 3 #> 3 OMITTED 40 2 #> 4 N/A 1 1"},{"path":"http://kylehusmann.com/interlacer/index.html","id":"known-issues","dir":"","previous_headings":"","what":"Known Issues","title":"Read Tabular Data With Interlaced Values And Missing Reasons","text":"Large data frames (many columns & rows) slow run print interlacer. 
Deinterlaced data frames validated check conform rule “one value missing reason per row”, check done completely R. key places (noted source) extremely benefit native implementation, make library much snappy. invest time though, want get enough feedback users package stabilize current approach / API. (find package useful, please let know!)","code":""},{"path":"http://kylehusmann.com/interlacer/reference/coalesce_channels.html","id":null,"dir":"Reference","previous_headings":"","what":"Coalesce missing reasons in a data frame — coalesce_channels","title":"Coalesce missing reasons in a data frame — coalesce_channels","text":"Mutations deinterlaced data frames can result variables either values missing reasons, values missing reasons. `coalesce_channels()` takes care situations. case value missing reason, choose keep based `keep` paramter. case value missing reason exists, fill missing reason `default_reason` parameter. Mutations can also create new value columns without companion missing reason columns. case, new missing reason created filled `default_reason` wherever missing values value column. ( behavior can also used stub missing reason columns value-data frames)","code":""},{"path":"http://kylehusmann.com/interlacer/reference/coalesce_channels.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Coalesce missing reasons in a data frame — coalesce_channels","text":"","code":"coalesce_channels( x, default_reason = getOption(\"default_missing_reason\"), keep = c(\"values\", \"missing\") )"},{"path":"http://kylehusmann.com/interlacer/reference/coalesce_channels.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Coalesce missing reasons in a data frame — coalesce_channels","text":"x data frame default_reason variable missing value missing reason, default missing reason fill . keep variable value missing reason, choose keep. 
(properly formed deinterlaced data frame values missing reasons)","code":""},{"path":"http://kylehusmann.com/interlacer/reference/coalesce_channels.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Coalesce missing reasons in a data frame — coalesce_channels","text":"deinterlaced tibble.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/deinterlace_type_convert.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert character columns and deinterlace missing reasons in existing data\nframe — deinterlace_type_convert","title":"Convert character columns and deinterlace missing reasons in existing data\nframe — deinterlace_type_convert","text":"simple wrapper `readr::type_convert()` deinterlaces missing reasons addition parsing values.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/deinterlace_type_convert.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert character columns and deinterlace missing reasons in existing data\nframe — deinterlace_type_convert","text":"","code":"deinterlace_type_convert(x, col_types = NULL, na = c(\"\", \"NA\"), ...)"},{"path":"http://kylehusmann.com/interlacer/reference/deinterlace_type_convert.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert character columns and deinterlace missing reasons in existing data\nframe — deinterlace_type_convert","text":"x data frame col_types One `NULL`, [readr::cols()] specification, string. na Character vector strings interpret missing values. ... 
additional parameters pass `readr::type_convert()`","code":""},{"path":"http://kylehusmann.com/interlacer/reference/deinterlace_type_convert.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert character columns and deinterlace missing reasons in existing data\nframe — deinterlace_type_convert","text":"[tibble()].separate columns values missing reasons variable.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/drop_missing_cols.html","id":null,"dir":"Reference","previous_headings":"","what":"Drop missing reasons from a deinterlaced data frame — drop_missing_cols","title":"Drop missing reasons from a deinterlaced data frame — drop_missing_cols","text":"Drop missing reason value columns deinterlaced data frame, turning regular data frame unlabelled `NA` values.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/drop_missing_cols.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Drop missing reasons from a deinterlaced data frame — drop_missing_cols","text":"","code":"drop_missing_cols(x) drop_value_cols(x)"},{"path":"http://kylehusmann.com/interlacer/reference/drop_missing_cols.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Drop missing reasons from a deinterlaced data frame — drop_missing_cols","text":"x data frame","code":""},{"path":"http://kylehusmann.com/interlacer/reference/drop_missing_cols.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Drop missing reasons from a deinterlaced data frame — drop_missing_cols","text":"tibble without missing reason columns.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/icol_logical.html","id":null,"dir":"Reference","previous_headings":"","what":"Interlaced collectors for read_interlaced_* — icol_logical","title":"Interlaced collectors for read_interlaced_* — icol_logical","text":"Interlaced collector extend `readr` collector 
types (e.g. `col_double()`) allow column-level missing value specifications.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/icol_logical.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Interlaced collectors for read_interlaced_* — icol_logical","text":"","code":"icol_logical(na) icol_integer(na) icol_double(na) icol_character(na) icol_factor(na, levels = NULL, ordered = FALSE) icol_date(na, format = \"\") icol_time(na, format = \"\") icol_datetime(na, format = \"\") icol_number(na)"},{"path":"http://kylehusmann.com/interlacer/reference/icol_logical.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interlaced collectors for read_interlaced_* — icol_logical","text":"na Character vector strings interpret column-level missing values levels Character vector allowed levels. levels = NULL (default), levels discovered unique values x, order appear x. ordered ordered factor? format format specification, described . set \"\", date times parsed ISO8601, dates times used date time formats specified locale(). 
Unlike strptime(), format specification must match complete string.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlace_channels.html","id":null,"dir":"Reference","previous_headings":"","what":"Re-interlacce a deinterlaced data frame — interlace_channels","title":"Re-interlacce a deinterlaced data frame — interlace_channels","text":"function take deinterlaced data frame re-interlace combining value misisng reason column pairs single character columns.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlace_channels.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Re-interlacce a deinterlaced data frame — interlace_channels","text":"","code":"interlace_channels(x)"},{"path":"http://kylehusmann.com/interlacer/reference/interlace_channels.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Re-interlacce a deinterlaced data frame — interlace_channels","text":"x deinterlaced data frame","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlace_channels.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Re-interlacce a deinterlaced data frame — interlace_channels","text":"interlaced data frame, , data frame character columns contain values missing reasons.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlacer_example.html","id":null,"dir":"Reference","previous_headings":"","what":"Get path to interlacer example — interlacer_example","title":"Get path to interlacer example — interlacer_example","text":"interlacer comes bundled number sample files `inst/extdata` directory. 
function make easy access","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlacer_example.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get path to interlacer example — interlacer_example","text":"","code":"interlacer_example(file = NULL)"},{"path":"http://kylehusmann.com/interlacer/reference/interlacer_example.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get path to interlacer example — interlacer_example","text":"file Name file. NULL, example files listed.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/interlacer_example.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get path to interlacer example — interlacer_example","text":"","code":"interlacer_example() #> [1] \"colors.csv\" \"colors.dta\" \"colors.sav\" #> [4] \"colors_coded.csv\" \"colors_coded_char.csv\" \"stress.csv\" interlacer_example(\"colors.csv\") #> [1] \"/home/runner/work/_temp/Library/interlacer/extdata/colors.csv\""},{"path":"http://kylehusmann.com/interlacer/reference/missing_cols.html","id":null,"dir":"Reference","previous_headings":"","what":"Selection helpers for deinterlaced data frames — missing_cols","title":"Selection helpers for deinterlaced data frames — missing_cols","text":"tidy selection helpers match missing reason value columns deinterlaced data frame * `missing_cols()` selects missing reason columns. 
* `value_cols()` selects value columns.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/missing_cols.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Selection helpers for deinterlaced data frames — missing_cols","text":"","code":"missing_cols(vars = NULL) value_cols(vars = NULL)"},{"path":"http://kylehusmann.com/interlacer/reference/missing_cols.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Selection helpers for deinterlaced data frames — missing_cols","text":"vars character vector variable names. supplied, variables taken current selection context (established functions like select() pivot_longer()).","code":""},{"path":"http://kylehusmann.com/interlacer/reference/missing_names.html","id":null,"dir":"Reference","previous_headings":"","what":"The names of an deinterlaced data frame — missing_names","title":"The names of an deinterlaced data frame — missing_names","text":"Functions get names missing reason columns value columns deinterlaced data frame","code":""},{"path":"http://kylehusmann.com/interlacer/reference/missing_names.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"The names of an deinterlaced data frame — missing_names","text":"","code":"missing_names(x) value_names(x)"},{"path":"http://kylehusmann.com/interlacer/reference/missing_names.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"The names of an deinterlaced data frame — missing_names","text":"x deinterlaced data frame","code":""},{"path":"http://kylehusmann.com/interlacer/reference/missing_names.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"The names of an deinterlaced data frame — missing_names","text":"vector missing reason value column 
names.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/read_interlaced_delim.html","id":null,"dir":"Reference","previous_headings":"","what":"Read an delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","title":"Read an delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","text":"`read_interlaced_*()`, family functions extend `readr`'s `read_delim()`, `read_csv`, etc. functions use data sources values interlaced missing reasons. functions return tibble two columns interlaced source column: column values, column missing reasons. Missing reason columns named taking value column name surrounding dots (e.g. missing reasons \"col_name\" read column named \".col_name.\")","code":""},{"path":"http://kylehusmann.com/interlacer/reference/read_interlaced_delim.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read an delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","text":"","code":"read_interlaced_delim( file, delim = NULL, col_types = NULL, col_select = NULL, na = c(\"\", \"NA\"), ... ) read_interlaced_csv( file, col_types = NULL, col_select = NULL, na = c(\"\", \"NA\"), ... ) read_interlaced_csv2( file, col_types = NULL, col_select = NULL, na = c(\"\", \"NA\"), ... ) read_interlaced_tsv( file, col_types = NULL, col_select = NULL, na = c(\"\", \"NA\"), ... )"},{"path":"http://kylehusmann.com/interlacer/reference/read_interlaced_delim.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read an delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","text":"file Either path file, connection, literal data (either single string raw vector). delim Single character used separate fields within record. col_types One `NULL`, [readr::cols()] specification, string. addition `col_*` specifiers provided `readr`, `icol_*()` specifiers may used. 
See `vignette(\"interlacer\")` details. col_select Columns include results. [reader::read_delim], can use mini-language [dplyr::select()] refer columns name. na Character vector strings interpret missing values. values become factor levels missing reason column. ... Additional parameters pass `read_delim`","code":""},{"path":"http://kylehusmann.com/interlacer/reference/read_interlaced_delim.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read an delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","text":"deinterlaced [tibble()], , tibble separate columns values missing reasonskfor variable.","code":""},{"path":"http://kylehusmann.com/interlacer/reference/read_interlaced_delim.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read an delimited file with interlaced missing reasons into a tibble — read_interlaced_delim","text":"","code":"# Beep boop"},{"path":"http://kylehusmann.com/interlacer/reference/write_interlaced_delim.html","id":null,"dir":"Reference","previous_headings":"","what":"Interlace a deinterlaced data frame and write it to a file — write_interlaced_delim","title":"Interlace a deinterlaced data frame and write it to a file — write_interlaced_delim","text":"`write_interlaced_*()` family functions take deinterlaced data frame, re-interlace , write flie. behavior functions match similarly named counterparts [readr].","code":""},{"path":"http://kylehusmann.com/interlacer/reference/write_interlaced_delim.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Interlace a deinterlaced data frame and write it to a file — write_interlaced_delim","text":"","code":"write_interlaced_delim(x, file, delim = \" \", ...) write_interlaced_csv(x, file, ...) write_interlaced_csv2(x, file, ...) write_interlaced_excel_csv(x, file, ...) write_interlaced_excel_csv2(x, file, ...) 
write_interlaced_tsv(x, file, ...)"},{"path":"http://kylehusmann.com/interlacer/reference/write_interlaced_delim.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Interlace a deinterlaced data frame and write it to a file — write_interlaced_delim","text":"x data frame tibble write disk file File connection write delim Delimiter used separate values. Defaults \" \" `write_interlaced_delim()`, \",\" `write_interlaced_excel_csv()` \";\" `write_interlaced_excel_csv2()`. Must single character. ... Additional parameters pass [readr]","code":""},{"path":"http://kylehusmann.com/interlacer/reference/write_interlaced_delim.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Interlace a deinterlaced data frame and write it to a file — write_interlaced_delim","text":"`write_interlaced_*` returns input x invisibly","code":""}]
+[{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"Apache License","title":"Apache License","text":"Version 2.0, January 2004 ","code":""},{"path":[]},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_1-definitions","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"1. Definitions","title":"Apache License","text":"“License” shall mean terms conditions use, reproduction, distribution defined Sections 1 9 document. “Licensor” shall mean copyright owner entity authorized copyright owner granting License. “Legal Entity” shall mean union acting entity entities control, controlled , common control entity. purposes definition, “control” means () power, direct indirect, cause direction management entity, whether contract otherwise, (ii) ownership fifty percent (50%) outstanding shares, (iii) beneficial ownership entity. “” (“”) shall mean individual Legal Entity exercising permissions granted License. “Source” form shall mean preferred form making modifications, including limited software source code, documentation source, configuration files. “Object” form shall mean form resulting mechanical transformation translation Source form, including limited compiled object code, generated documentation, conversions media types. “Work” shall mean work authorship, whether Source Object form, made available License, indicated copyright notice included attached work (example provided Appendix ). “Derivative Works” shall mean work, whether Source Object form, based (derived ) Work editorial revisions, annotations, elaborations, modifications represent, whole, original work authorship. purposes License, Derivative Works shall include works remain separable , merely link (bind name) interfaces , Work Derivative Works thereof. 
“Contribution” shall mean work authorship, including original version Work modifications additions Work Derivative Works thereof, intentionally submitted Licensor inclusion Work copyright owner individual Legal Entity authorized submit behalf copyright owner. purposes definition, “submitted” means form electronic, verbal, written communication sent Licensor representatives, including limited communication electronic mailing lists, source code control systems, issue tracking systems managed , behalf , Licensor purpose discussing improving Work, excluding communication conspicuously marked otherwise designated writing copyright owner “Contribution.” “Contributor” shall mean Licensor individual Legal Entity behalf Contribution received Licensor subsequently incorporated within Work.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_2-grant-of-copyright-license","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"2. Grant of Copyright License","title":"Apache License","text":"Subject terms conditions License, Contributor hereby grants perpetual, worldwide, non-exclusive, -charge, royalty-free, irrevocable copyright license reproduce, prepare Derivative Works , publicly display, publicly perform, sublicense, distribute Work Derivative Works Source Object form.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_3-grant-of-patent-license","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"3. 
Grant of Patent License","title":"Apache License","text":"Subject terms conditions License, Contributor hereby grants perpetual, worldwide, non-exclusive, -charge, royalty-free, irrevocable (except stated section) patent license make, made, use, offer sell, sell, import, otherwise transfer Work, license applies patent claims licensable Contributor necessarily infringed Contribution(s) alone combination Contribution(s) Work Contribution(s) submitted. institute patent litigation entity (including cross-claim counterclaim lawsuit) alleging Work Contribution incorporated within Work constitutes direct contributory patent infringement, patent licenses granted License Work shall terminate date litigation filed.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_4-redistribution","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"4. Redistribution","title":"Apache License","text":"may reproduce distribute copies Work Derivative Works thereof medium, without modifications, Source Object form, provided meet following conditions: () must give recipients Work Derivative Works copy License; (b) must cause modified files carry prominent notices stating changed files; (c) must retain, Source form Derivative Works distribute, copyright, patent, trademark, attribution notices Source form Work, excluding notices pertain part Derivative Works; (d) Work includes “NOTICE” text file part distribution, Derivative Works distribute must include readable copy attribution notices contained within NOTICE file, excluding notices pertain part Derivative Works, least one following places: within NOTICE text file distributed part Derivative Works; within Source form documentation, provided along Derivative Works; , within display generated Derivative Works, wherever third-party notices normally appear. contents NOTICE file informational purposes modify License. 
may add attribution notices within Derivative Works distribute, alongside addendum NOTICE text Work, provided additional attribution notices construed modifying License. may add copyright statement modifications may provide additional different license terms conditions use, reproduction, distribution modifications, Derivative Works whole, provided use, reproduction, distribution Work otherwise complies conditions stated License.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_5-submission-of-contributions","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"5. Submission of Contributions","title":"Apache License","text":"Unless explicitly state otherwise, Contribution intentionally submitted inclusion Work Licensor shall terms conditions License, without additional terms conditions. Notwithstanding , nothing herein shall supersede modify terms separate license agreement may executed Licensor regarding Contributions.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_6-trademarks","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"6. Trademarks","title":"Apache License","text":"License grant permission use trade names, trademarks, service marks, product names Licensor, except required reasonable customary use describing origin Work reproducing content NOTICE file.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_7-disclaimer-of-warranty","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"7. 
Disclaimer of Warranty","title":"Apache License","text":"Unless required applicable law agreed writing, Licensor provides Work (Contributor provides Contributions) “” BASIS, WITHOUT WARRANTIES CONDITIONS KIND, either express implied, including, without limitation, warranties conditions TITLE, NON-INFRINGEMENT, MERCHANTABILITY, FITNESS PARTICULAR PURPOSE. solely responsible determining appropriateness using redistributing Work assume risks associated exercise permissions License.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_8-limitation-of-liability","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"8. Limitation of Liability","title":"Apache License","text":"event legal theory, whether tort (including negligence), contract, otherwise, unless required applicable law (deliberate grossly negligent acts) agreed writing, shall Contributor liable damages, including direct, indirect, special, incidental, consequential damages character arising result License use inability use Work (including limited damages loss goodwill, work stoppage, computer failure malfunction, commercial damages losses), even Contributor advised possibility damages.","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"id_9-accepting-warranty-or-additional-liability","dir":"","previous_headings":"Terms and Conditions for use, reproduction, and distribution","what":"9. Accepting Warranty or Additional Liability","title":"Apache License","text":"redistributing Work Derivative Works thereof, may choose offer, charge fee , acceptance support, warranty, indemnity, liability obligations /rights consistent License. However, accepting obligations, may act behalf sole responsibility, behalf Contributor, agree indemnify, defend, hold Contributor harmless liability incurred , claims asserted , Contributor reason accepting warranty additional liability. 
END TERMS CONDITIONS","code":""},{"path":"http://kylehusmann.com/interlacer/LICENSE.html","id":"appendix-how-to-apply-the-apache-license-to-your-work","dir":"","previous_headings":"","what":"APPENDIX: How to apply the Apache License to your work","title":"Apache License","text":"apply Apache License work, attach following boilerplate notice, fields enclosed brackets [] replaced identifying information. (Don’t include brackets!) text enclosed appropriate comment syntax file format. also recommend file class name description purpose included “printed page” copyright notice easier identification within third-party archives.","code":"Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"numeric-codes-with-negative-missing-reasons-spss","dir":"Articles","previous_headings":"","what":"Numeric codes with negative missing reasons (SPSS)","title":"Coded Data","text":"’s extremely common find data sources encode categorical responses numeric values, negative values representing missing reason codes. SPSS one example. ’s SPSS-formatted version colors.csv example: missing reasons : -99: N/-98: REFUSED -97: OMITTED colors coded: 1: BLUE 2: RED 3: YELLOW format gives ability load everything numeric type: test value missing code, can check ’s less 0: downsides approach twofold: 1) values missing reasons become codes remember 2) ’s really easy make mistakes. sort mistakes? 
Well, everything numeric, ’s nothing stopping us treating missing reason codes regular values… forget remove missing reason codes, R still happily compute aggregations using negative numbers! ever thought significant result, find ’s stray missing reason codes still interlaced values? ’s bad time. ’re much better loading formats interlacer, converting codes labelled factor levels: Now aggregations won’t mix values missing codes, won’t keep cross-referencing codebook know values mean:","code":"library(readr) library(interlacer) read_file( interlacer_example(\"colors_coded.csv\") ) |> cat() #> person_id,age,favorite_color #> 1,20,1 #> 2,-98,1 #> 3,21,-98 #> 4,30,-97 #> 5,1,-99 #> 6,41,2 #> 7,50,-97 #> 8,30,3 #> 9,-98,-98 #> 10,-97,2 #> 11,10,-98 (df_coded <- read_csv( interlacer_example(\"colors_coded.csv\"), col_types = \"n\" )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 #> 2 2 -98 1 #> 3 3 21 -98 #> 4 4 30 -97 #> 5 5 1 -99 #> 6 6 41 2 #> 7 7 50 -97 #> 8 8 30 3 #> 9 9 -98 -98 #> 10 10 -97 2 #> 11 11 10 -98 library(dplyr, warn.conflicts = FALSE) df_coded |> mutate( favorite_color_missing = if_else(favorite_color < 0, favorite_color, NA), age = if_else(age > 0, age, NA) ) |> summarize( mean_age = mean(age, na.rm=T), n = n(), .by = favorite_color_missing ) #> # A tibble: 4 × 3 #> favorite_color_missing mean_age n #> #> 1 NA 30.3 5 #> 2 -98 15.5 3 #> 3 -97 40 2 #> 4 -99 1 1 df_coded |> mutate( favorite_color_missing = if_else(favorite_color < 0, favorite_color, NA), # age = if_else(age > 0, age, NA) ) |> summarize( mean_age = mean(age, na.rm=T), n = n(), .by = favorite_color_missing ) #> # A tibble: 4 × 3 #> favorite_color_missing mean_age n #> #> 1 NA -20.8 5 #> 2 -98 -22.3 3 #> 3 -97 40 2 #> 4 -99 1 1 library(forcats) (df_decoded_deinterlaced <- read_interlaced_csv( interlacer_example(\"colors_coded.csv\"), na = c(\"-99\", \"-98\", \"-97\") ) |> mutate( across( missing_cols(), \\(x) fct_recode(x, `N/A` = \"-99\", REFUSED = \"-98\", OMITTED = 
\"-97\", ) ), favorite_color = fct_recode( as.character(favorite_color), BLUE = \"1\", RED = \"2\", YELLOW = \"3\", ) )) #> # An deinterlaced tibble: 11 × 6 #> person_id .person_id. age .age. favorite_color .favorite_color. #> #> 1 1 NA 20 NA BLUE NA #> 2 2 NA NA REFUSED BLUE NA #> 3 3 NA 21 NA NA REFUSED #> 4 4 NA 30 NA NA OMITTED #> 5 5 NA 1 NA NA N/A #> 6 6 NA 41 NA RED NA #> 7 7 NA 50 NA NA OMITTED #> 8 8 NA 30 NA YELLOW NA #> 9 9 NA NA REFUSED NA REFUSED #> 10 10 NA NA OMITTED RED NA #> 11 11 NA 10 NA NA REFUSED df_decoded_deinterlaced |> summarize( mean_age = mean(age, na.rm=T), n = n(), .by = .favorite_color. ) #> # A tibble: 4 × 3 #> .favorite_color. mean_age n #> #> 1 NA 30.3 5 #> 2 REFUSED 15.5 3 #> 3 OMITTED 40 2 #> 4 N/A 1 1"},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"numeric-codes-with-character-missing-reasons-sas-stata","dir":"Articles","previous_headings":"","what":"Numeric codes with character missing reasons (SAS, Stata)","title":"Coded Data","text":"Like SPSS, SAS Stata encode factor levels numeric values, instead representing missing reasons negative codes, given character codes: , value codes used previous example, except missing reasons coded follows: “.”: N/“.”: REFUSED “.b”: OMITTED handle missing reasons without interlacer, columns must loaded character vectors: test value missing, can cast numeric types. cast fails, know ’s missing code. successful, know ’s coded value. Although character missing codes help prevent us mistakenly including missing codes value aggregations, cast columns numeric time check missingness hardly ergonomic, generates annoying warnings. Like , ’s easier import interlacer decode values missing reasons:","code":"read_file( interlacer_example(\"colors_coded_char.csv\") ) |> cat() #> person_id,age,favorite_color #> 1,20,1 #> 2,.a,1 #> 3,21,.a #> 4,30,.b #> 5,1,. 
#> 6,41,2 #> 7,50,.b #> 8,30,3 #> 9,.a,.a #> 10,.b,2 #> 11,10,.a (df_coded_char <- read_csv( interlacer_example(\"colors_coded_char.csv\"), col_types = \"c\" )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 1 #> 2 2 .a 1 #> 3 3 21 .a #> 4 4 30 .b #> 5 5 1 . #> 6 6 41 2 #> 7 7 50 .b #> 8 8 30 3 #> 9 9 .a .a #> 10 10 .b 2 #> 11 11 10 .a df_coded_char |> mutate( favorite_color_missing = if_else( is.na(as.numeric(favorite_color)), favorite_color, NA ), age = if_else(!is.na(as.numeric(age)), as.numeric(age), NA) ) |> summarize( mean_age = mean(age, na.rm=T), n = n(), .by = favorite_color_missing ) #> Warning: There were 3 warnings in `mutate()`. #> The first warning was: #> ℹ In argument: `favorite_color_missing = #> if_else(is.na(as.numeric(favorite_color)), favorite_color, NA)`. #> Caused by warning in `is_logical()`: #> ! NAs introduced by coercion #> ℹ Run `dplyr::last_dplyr_warnings()` to see the 2 remaining warnings. #> # A tibble: 4 × 3 #> favorite_color_missing mean_age n #> #> 1 NA 30.3 5 #> 2 .a 15.5 3 #> 3 .b 40 2 #> 4 . 1 1 read_interlaced_csv( interlacer_example(\"colors_coded_char.csv\"), na = c(\".\", \".a\", \".b\") ) |> mutate( across( missing_cols(), \\(x) fct_recode(x, `N/A` = \".\", REFUSED = \".a\", OMITTED = \".b\", ) ), favorite_color = fct_recode( as.character(favorite_color), BLUE = \"1\", RED = \"2\", YELLOW = \"3\", ) ) #> # An deinterlaced tibble: 11 × 6 #> person_id .person_id. age .age. favorite_color .favorite_color. 
#> #> 1 1 NA 20 NA BLUE NA #> 2 2 NA NA REFUSED BLUE NA #> 3 3 NA 21 NA NA REFUSED #> 4 4 NA 30 NA NA OMITTED #> 5 5 NA 1 NA NA N/A #> 6 6 NA 41 NA RED NA #> 7 7 NA 50 NA NA OMITTED #> 8 8 NA 30 NA YELLOW NA #> 9 9 NA NA REFUSED NA REFUSED #> 10 10 NA NA OMITTED RED NA #> 11 11 NA 10 NA NA REFUSED"},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"encoding-a-decoded-deinterlaced-data-frame-","dir":"Articles","previous_headings":"","what":"Encoding a decoded & deinterlaced data frame.","title":"Coded Data","text":"Re-coding re-interlacing data frame easily done follows:","code":"df_decoded_deinterlaced |> mutate( across( missing_cols(), \\(x) fct_recode(x, `-99` = \"N/A\", `-98` = \"REFUSED\", `-97` = \"OMITTED\" ) ), favorite_color = fct_recode( favorite_color, `1` = \"BLUE\", `2` = \"RED\", `3` = \"YELLOW\" ) ) |> write_interlaced_csv(\"output.csv\")"},{"path":"http://kylehusmann.com/interlacer/articles/coded-data.html","id":"haven","dir":"Articles","previous_headings":"","what":"haven","title":"Coded Data","text":"haven package functions loading native SPSS, SAS, Stata native file formats special data frames use column attributes special values keep track interlaces values missing reasons. complete discussion compares interlacer’s approach, see vignette(\"-approaches\"). Future versions interlacer ability convert haven data frames deinterlaced data frames, want gauge interest feature invest time implement . feature ’d use, please let know!","code":""},{"path":"http://kylehusmann.com/interlacer/articles/column-types.html","id":"interlaced-column-types","dir":"Articles","previous_headings":"","what":"Interlaced Column Types","title":"Interlaced Column Types","text":"addition standard readr::col_* column specification types, interlacer provides interlaced column types enable specify missing reasons column level. useful missing reasons apply particular items opposed file whole. example, say measure following two items: current stress level? 
Low Moderate High don’t know don’t understand question well feel manage time responsibilities today? Poorly Fairly well Well well apply (Today vacation day) apply (reason) can see, items two selection choices mapped missing reasons. specify missing reasons variable level, icol_*() family column specification types can used. extend readr’s col_*() column types adding parameter specifying missing values unique particular variable: icol_factor() column spec works just like readr::col_factor(), additionally accepts na argument specifying missing values variable level. specify missing reasons variable-level like , available levels resulting missing reason column correctly show possible missing reasons variable: comparison, loaded variable-level missing reasons file-level level missing reasons, variable missing reasons possible levels, even didn’t apply particular variable:","code":"(df_stress <- read_interlaced_csv( interlacer_example(\"stress.csv\"), col_types = cols( person_id = col_integer(), current_stress = icol_factor( levels = c(\"LOW\", \"MODERATE\", \"HIGH\"), na = c(\"DONT_KNOW\", \"DONT_UNDERSTAND\") ), time_management = icol_factor( levels = c(\"POORLY\", \"FAIRLY_WELL\", \"WELL\", \"VERY_WELL\"), na = c(\"NA_VACATION\", \"NA_OTHER\") ) ), na = c( \"REFUSED\", \"OMITTED\", \"N/A\" ) )) #> # An deinterlaced tibble: 8 × 6 #> person_id .person_id. current_stress .current_stress. time_management #> #> 1 1 NA LOW NA VERY_WELL #> 2 2 NA MODERATE NA POORLY #> 3 3 NA NA DONT_KNOW NA #> 4 4 NA HIGH NA POORLY #> 5 5 NA NA DONT_UNDERSTAND NA #> 6 6 NA LOW NA NA #> 7 7 NA MODERATE NA WELL #> 8 8 NA NA OMITTED FAIRLY_WELL #> # ℹ 1 more variable: .time_management. levels(df_stress$.person_id.) #> [1] \"REFUSED\" \"OMITTED\" \"N/A\" levels(df_stress$.current_stress.) #> [1] \"DONT_KNOW\" \"DONT_UNDERSTAND\" \"REFUSED\" \"OMITTED\" #> [5] \"N/A\" levels(df_stress$.time_management.) 
#> [1] \"NA_VACATION\" \"NA_OTHER\" \"REFUSED\" \"OMITTED\" \"N/A\" df_stress_file <- read_interlaced_csv( interlacer_example(\"stress.csv\"), na = c( \"REFUSED\", \"OMITTED\", \"N/A\", \"DONT_KNOW\", \"DONT_UNDERSTAND\", \"NA_VACATION\", \"NA_OTHER\" ) ) levels(df_stress_file$.person_id.) #> [1] \"REFUSED\" \"OMITTED\" \"N/A\" \"DONT_KNOW\" #> [5] \"DONT_UNDERSTAND\" \"NA_VACATION\" \"NA_OTHER\" levels(df_stress_file$.current_stress.) #> [1] \"REFUSED\" \"OMITTED\" \"N/A\" \"DONT_KNOW\" #> [5] \"DONT_UNDERSTAND\" \"NA_VACATION\" \"NA_OTHER\" levels(df_stress_file$.time_management.) #> [1] \"REFUSED\" \"OMITTED\" \"N/A\" \"DONT_KNOW\" #> [5] \"DONT_UNDERSTAND\" \"NA_VACATION\" \"NA_OTHER\""},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"aggregations-with-missing-reasons","dir":"Articles","previous_headings":"","what":"Aggregations with missing reasons","title":"Introduction to interlacer","text":"Now, interested values source data, functionality need. wanted know values NA? Although information encoded source data, lost missing reasons converted NA values. example, consider favorite_color column. many respondents REFUSED give favorite color? many people just OMITTED answer? question N/respondents (e.g. wasn’t survey form)? mean respondent age groups? current dataframe gets us part way: can see, converted missing reasons single NA, can answer questions missingness general, rather work specific reasons stored source data. Unfortunately, try load data missing reasons intact, lose something else: type information values. Now access missing reasons, columns character vectors. means order anything values, always filter missing reasons, cast remaining values desired type: gives us information want, cumbersome starts get really complex different columns different sets possible missing reasons. 
means lot type conversion gymnastics switch value types missing types.","code":"library(dplyr, warn.conflicts = FALSE) df |> mutate( favorite_color_missing = is.na(favorite_color) ) |> summarize( mean_age = mean(age, na.rm = T), n = n(), .by = favorite_color_missing ) #> # A tibble: 2 × 3 #> favorite_color_missing mean_age n #> #> 1 FALSE 30.3 5 #> 2 TRUE 22.4 6 (df_with_missing <- read_csv( interlacer_example(\"colors.csv\"), col_types = cols(.default = \"c\") )) #> # A tibble: 11 × 3 #> person_id age favorite_color #> #> 1 1 20 BLUE #> 2 2 REFUSED BLUE #> 3 3 21 REFUSED #> 4 4 30 OMITTED #> 5 5 1 N/A #> 6 6 41 RED #> 7 7 50 OMITTED #> 8 8 30 YELLOW #> 9 9 REFUSED REFUSED #> 10 10 OMITTED RED #> 11 11 10 REFUSED reasons <- c(\"REFUSED\", \"OMITTED\", \"N/A\") df_with_missing |> mutate( age_values = as.numeric(if_else(age %in% reasons, NA, age)), favorite_color_missing_reasons = if_else( favorite_color %in% reasons, favorite_color, NA ) ) |> summarize( mean_age = mean(age_values, na.rm=T), n = n(), .by = favorite_color_missing_reasons ) #> # A tibble: 4 × 3 #> favorite_color_missing_reasons mean_age n #> #> 1 NA 30.3 5 #> 2 REFUSED 15.5 3 #> 3 OMITTED 40 2 #> 4 N/A 1 1"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"the-interlacer-approach","dir":"Articles","previous_headings":"Aggregations with missing reasons","what":"The interlacer approach","title":"Introduction to interlacer","text":"Interlacer built based insight everything becomes much tidy, simple, expressive explicitly work values missing reasons separate channels variable. functions read_interlaced_* functions interlacer : deinterlace variables interlaced data sources two columns per variable: one holding values, one holding missing reasons. can see, missing reasons columns denoted names surrounded dots: .age. column holds missing reasons age variable, . Now, missing reason information need right fingertips, value types preserved. 
make report , run: get results without needing type gymnastics!","code":"(df_deinterlaced <- read_interlaced_csv( interlacer_example(\"colors.csv\"), na = c(\"REFUSED\", \"OMITTED\", \"N/A\"), )) #> # An deinterlaced tibble: 11 × 6 #> person_id .person_id. age .age. favorite_color .favorite_color. #> #> 1 1 NA 20 NA BLUE NA #> 2 2 NA NA REFUSED BLUE NA #> 3 3 NA 21 NA NA REFUSED #> 4 4 NA 30 NA NA OMITTED #> 5 5 NA 1 NA NA N/A #> 6 6 NA 41 NA RED NA #> 7 7 NA 50 NA NA OMITTED #> 8 8 NA 30 NA YELLOW NA #> 9 9 NA NA REFUSED NA REFUSED #> 10 10 NA NA OMITTED RED NA #> 11 11 NA 10 NA NA REFUSED df_deinterlaced |> summarize( mean_age = mean(age, na.rm=T), n = n(), .by = .favorite_color. ) #> # A tibble: 4 × 3 #> .favorite_color. mean_age n #> #> 1 NA 30.3 5 #> 2 REFUSED 15.5 3 #> 3 OMITTED 40 2 #> 4 N/A 1 1"},{"path":"http://kylehusmann.com/interlacer/articles/interlacer.html","id":"filtering-based-on-missing-reasons","dir":"Articles","previous_headings":"","what":"Filtering based on missing reasons","title":"Introduction to interlacer","text":"separate columns values missing reasons also helpful creating samples inclusion / exclusion criteria based missing reasons. example, using example data, say wanted create sample respondents REFUSED give age? people REFUSED report age favorite color? separate columns, can combine value conditions missing reason conditions. example, select everyone REFUSED give favorite color, 20 years old: ’ve created sample, ready start analyzing data, typically don’t need keep missing reasons around anymore. Interlacer provides convenient drop_missing_cols() function take care :","code":"df_deinterlaced |> filter(.age. == \"REFUSED\") #> # An deinterlaced tibble: 2 × 6 #> person_id .person_id. age .age. favorite_color .favorite_color. #> #> 1 2 NA NA REFUSED BLUE NA #> 2 9 NA NA REFUSED NA REFUSED df_deinterlaced |> filter(.age. == \"REFUSED\" & .favorite_color. == \"REFUSED\") #> # An deinterlaced tibble: 1 × 6 #> person_id .person_id. 
age .age. favorite_color .favorite_color. #>