Merge pull request #4 from peterdesmet/main

Typo fixes + opinionated styling choices
khusmann · Jun 7, 2024 · 6cd6863
2 parents c07ebde + de00986

Showing 6 changed files with 42 additions and 44 deletions.
diff --git a/README.Rmd b/README.Rmd
@@ -135,7 +135,7 @@ ex$age
 Computations automatically operate on values:
 
 ```{r}
-mean(ex$age, na.rm=TRUE)
+mean(ex$age, na.rm = TRUE)
 ```
 
 But the missing reasons are still there! To indicate a value should be treated
@@ -156,7 +156,7 @@ reason:
 ```{r}
 ex |>
   summarize(
-    mean_age = mean(age, na.rm=T),
+    mean_age = mean(age, na.rm = TRUE),
     n = n(),
     .by = favorite_color
   ) %>%
@@ -202,7 +202,7 @@ ex |>
 You may notice that on large datasets `interlacer` runs significantly slower
 than `readr` / `vroom`. Although `interlacer` uses `vroom` under the hood to load 
 delimited data, it is not able to take advantage of many of its optimizations
-because `vroom` does not
+because `vroom` 
 [does not currently support](https://github.com/tidyverse/vroom/issues/532)
 column-level missing values.  As soon as `vroom` supports column-level
 missing values, I will be able to remedy this!

diff --git a/README.md b/README.md
@@ -172,7 +172,7 @@ ex$age
 Computations automatically operate on values:
 
 ``` r
-mean(ex$age, na.rm=TRUE)
+mean(ex$age, na.rm = TRUE)
 #> [1] 25.375
 ```
 
@@ -199,7 +199,7 @@ missing reason:
 ``` r
 ex |>
   summarize(
-    mean_age = mean(age, na.rm=T),
+    mean_age = mean(age, na.rm = TRUE),
     n = n(),
     .by = favorite_color
   ) %>%
@@ -282,7 +282,7 @@ ex |>
 You may notice that on large datasets `interlacer` runs significantly
 slower than `readr` / `vroom`. Although `interlacer` uses `vroom` under
 the hood to load delimited data, it is not able to take advantage of
-many of its optimizations because `vroom` does not [does not currently
+many of its optimizations because `vroom` [does not currently
 support](https://github.com/tidyverse/vroom/issues/532) column-level
 missing values. As soon as `vroom` supports column-level missing values,
 I will be able to remedy this!

diff --git a/vignettes/coded-data.Rmd b/vignettes/coded-data.Rmd
@@ -47,19 +47,19 @@ read_file(
 
 Where missing reasons are:
 
-> -99: N/A
+> `-99`: N/A
 >
-> -98: REFUSED
+> `-98`: REFUSED
 >
-> -97: OMITTED
+> `-97`: OMITTED
 
 And colors are coded:
 
-> 1: BLUE
+> `1`: BLUE
 >
-> 2: RED
+> `2`: RED
 >
-> 3: YELLOW
+> `3`: YELLOW
 
 This format gives you the ability to load everything as a numeric type:
 
@@ -80,7 +80,7 @@ df_coded |>
     age = if_else(age > 0, age, NA)
   ) |>
   summarize(
-    mean_age = mean(age, na.rm=T),
+    mean_age = mean(age, na.rm = TRUE),
     n = n(),
     .by = favorite_color
   ) |>
@@ -102,7 +102,7 @@ df_coded |>
 #    age = if_else(age > 0, age, NA)
   ) |>
   summarize(
-    mean_age = mean(age, na.rm=T),
+    mean_age = mean(age, na.rm = TRUE),
     n = n(),
     .by = favorite_color
   ) |>
@@ -169,7 +169,7 @@ keep cross-referencing your codebook to know what values mean:
 ```{r}
 df_decoded |>
   summarize(
-    mean_age = mean(age, na.rm=TRUE),
+    mean_age = mean(age, na.rm = TRUE),
     n = n(),
     .by = favorite_color
   ) |>
@@ -185,8 +185,6 @@ df_decoded |>
   )
 ```
 
-
-
 ## Numeric codes with character missing reasons (SAS, Stata)
 
 Like SPSS, SAS and Stata will encode factor levels as numeric values, but
@@ -203,12 +201,11 @@ read_file(
 Here, the same value codes are used as the previous example, except the missing
 reasons are coded as follows:
 
-> ".": N/A
+> `"."`: N/A
 >
-> ".a": REFUSED
+> `".a"`: REFUSED
 >
-> ".b": OMITTED
-
+> `".b"`: OMITTED
 
 To handle these missing reasons without interlacer, columns must be loaded as
 character vectors:
@@ -229,7 +226,7 @@ df_coded_char |>
     age = if_else(!is.na(as.numeric(age)), as.numeric(age), NA)
   ) |>
   summarize(
-    mean_age = mean(age, na.rm=T),
+    mean_age = mean(age, na.rm = TRUE),
     n = n(),
     .by = favorite_color
   ) |>

diff --git a/vignettes/interlacer.Rmd b/vignettes/interlacer.Rmd
@@ -63,7 +63,7 @@ library(dplyr, warn.conflicts = FALSE)
 
 df_simple |>
   summarize(
-    mean_age = mean(age, na.rm = T),
+    mean_age = mean(age, na.rm = TRUE),
     n = n(),
     .by = favorite_color
   ) |>
@@ -98,7 +98,7 @@ df_with_missing |>
     age_values = as.numeric(if_else(age %in% reasons, NA, age)),
   ) |>
   summarize(
-    mean_age = mean(age_values, na.rm=T),
+    mean_age = mean(age_values, na.rm = TRUE),
     n = n(),
     .by = favorite_color
   ) |>
@@ -169,7 +169,7 @@ the unique missing reasons, rather than being lumped into a single `NA`:
 ```{r}
 df |>
   summarize(
-    mean_age = mean(age, na.rm=T),
+    mean_age = mean(age, na.rm = TRUE),
     n = n(),
     .by = favorite_color
   ) |>
@@ -392,4 +392,4 @@ In all the examples in this vignette, column types were automatically detected.
 To explicitly specify value and missing column types, (and specify individual
 missing reasons for specific columns), interlacer extends
 `readr`'s `collector()` system. This will be covered in the next vignette,
-`vignette("na-column-types")`
+`vignette("na-column-types")`.
diff --git a/vignettes/na-column-types.Rmd b/vignettes/na-column-types.Rmd
@@ -54,14 +54,16 @@ This is useful when you have missing reasons that only apply to particular items
 as opposed to the file as a whole. For example, say we had a measure with the
 following two items:
 
-> 1. What is your current stress level?
+1. What is your current stress level?
+
 > a. Low
 > b. Moderate
 > c. High
 > d. I don't know
 > e. I don't understand the question
-> 
-> 2. How well do you feel you manage your time and responsibilities today?
+
+2. How well do you feel you manage your time and responsibilities today?
+
 > a. Poorly
 > b. Fairly well
 > c. Well

diff --git a/vignettes/other-approaches.Rmd b/vignettes/other-approaches.Rmd
@@ -78,7 +78,7 @@ df_spss |>
     )
   ) |>
   summarize(
-    mean_age = mean(age_values, na.rm=T),
+    mean_age = mean(age_values, na.rm = TRUE),
     n = n(),
     .by = favorite_color_missing_reasons
   )
@@ -103,7 +103,7 @@ This creates a lot more type gymnastics and potential errors when you're
 manipulating them.
 
 Reason 2: Even when the missing values are labelled in the `labelled_spss` type,
-aggregations and other math operatiosn are not protected. If you forget
+aggregations and other math operations are not protected. If you forget
 to take out your missing values, you get incorrect results / corrupted data:
 
 ```{r}
@@ -114,7 +114,7 @@ df_spss |>
     )
   ) |>
   summarize(
-    mean_age = mean(age, na.rm=T),
+    mean_age = mean(age, na.rm = TRUE),
     n = n(),
     .by = favorite_color_missing_reasons
   )
@@ -151,11 +151,11 @@ character "tag" (usually a letter from a-z). This means that they work with
 ```{r}
 is.na(df_stata$age)
 
-mean(df_stata$age, na.rm=TRUE)
+mean(df_stata$age, na.rm = TRUE)
 ```
 
 Unfortunately, you can't group by them, because `dplyr::group_by()` is not
-missing tag-aware :(
+tag-aware. :(
 
 ```{r}
 df_stata |>
@@ -165,7 +165,7 @@ df_stata |>
     )
   ) |>
   summarize(
-    mean_age = mean(age, na.rm=T),
+    mean_age = mean(age, na.rm = TRUE),
     n = n(),
     .by = favorite_color_missing_reasons
   )
@@ -195,7 +195,6 @@ of the object:
 # All the missing reason info is tracked in the attributes
 attributes(dcl)
 
-
 # The data stored has actual NA values, so it works as you would expect
 # with summary stats like `mean()`, etc.
 attributes(dcl) <- NULL
@@ -207,15 +206,15 @@ This means aggregations work exactly as you would expect!
 ```{r}
 dcl <- declared(c(1, 2, 3, -99, -98), na_values = c(-99, -98))
 
-sum(dcl, na.rm=TRUE)
+sum(dcl, na.rm = TRUE)
 ```
 
 ## interlacer
 
 interlacer builds on the ideas of haven, labelled, and declared with following
 goals:
 
-1. Be fully generic: Add a missing value channel to *any* vector type.
+### 1. Be fully generic: Add a missing value channel to *any* vector type
 
 As mentioned above, `haven::labelled_spss()` only works with `numeric`
 and `character` types, and `haven::tagged_na()` only works with `numeric` types. 
@@ -250,12 +249,12 @@ int
 
 This data structure drives their functional API, described in (3) below.
 
-2. Provide functions for reading / writing interlaced CSV files (not just SPSS
+### 2. Provide functions for reading / writing interlaced CSV files (not just SPSS
 / SAS / Stata files)
 
-(See `interlacer::read_interlaced_csv()`, etc.)
+See `interlacer::read_interlaced_csv()`, etc.
 
-3. Provide a functional API that integrates well into tidy pipelines
+### 3. Provide a functional API that integrates well into tidy pipelines
 
 interlacer provides functions to facilitate working with the `interlaced` type
 as a [Result type](https://en.wikipedia.org/wiki/Result_type),
@@ -292,7 +291,7 @@ plays nicely with all the packages in the tidyverse.
 
 ## Questions for the future
 
-1. More flexible missing reason channel types?
+### 1. More flexible missing reason channel types?
 
 Earlier versions allowed arbitrary types to occupy
 the missing reason channel (i.e. it was a fully generic Result<Value, Missing>
@@ -305,9 +304,9 @@ tell, in 99.9% of the time, it is preferable to use `integer` and `factor`
 missing reason channels over `double` and `character` ones, so for now I've
 made the executive decision to only allow `integer` and `factor` types.
 
-2. A better `na_cols()` specification?
+### 2. A better `na_cols()` specification?
 
-Right now, missing values are supplied in `na` a separate argument from
+Right now, missing values are supplied in a separate argument from
 `col_types`. This means custom missing values get pretty far separated from
 their `col_type` definitions: