Specific objectives are to:

1. Explain what functional programming, vectorization, and functionals
are within R and identify when code is a functional or uses
functional programming. Then apply this knowledge using the
`{purrr}` package.
2. Review the split-apply-combine technique and identify how these
concepts connect to functional programming.
3. Apply functional programming to summarize data using the
split-apply-combine technique.

## Functional programming
for you and for us as instructors). And because we use Git, nothing is
truly gone so you can always go back to the text later. Next, we restart
the R session with {{< var keybind.restart-r >}}.

Before we use the `map()` functional, we need to get a vector or list
of all the dataset files available to us. We will return to using the
`{fs}` package, which has a function called `dir_ls()` that finds files
of a certain pattern. So, let's add `library(fs)` to the `setup` code
chunk. Then, go to the bottom of the `doc/learning.qmd` document, create
a new header called `## Using map`, and create a code chunk below that
with {{< var keybind.chunk >}}.
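
As a rough sketch, the chunk could end up looking something like this
(the exact regular expression and options used in the session may
differ):

```{r}
#| eval: false
user_info_files <- dir_ls("data-raw/mmash",
  regexp = "user_info.csv",
  recurse = TRUE
)
user_info_files
```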

The `dir_ls()` function takes the path that we want to search
(`data-raw/mmash/`), uses the argument `regexp` (short for [regular
user_info_files
head(gsub(".*\\/data-raw", "data-raw", user_info_files), 3)
```

Alright, we now have all the files ready to give to `map()`. But before
using it, we'll need to add `{purrr}`, where `map()` comes from, as a
package dependency by going to the **Console** and running:

``` {.r filename="Console"}
usethis::use_package("purrr")
```

Since `{purrr}` is part of the `{tidyverse}`, we don't need to load it
with `library()`. So let's try it!

```{r}
#| filename: "doc/learning.qmd"
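# A sketch of the call that goes in this chunk (the import function's
# name is an assumption, following the course's `import_` naming pattern):
user_info_list <- map(user_info_files, import_user_info)
user_info_list
```
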
datasets! But we're missing an important bit of information: The user
ID. A powerful feature of the `{purrr}` package is that it has other
functions to make it easier to work with functionals. We know `map()`
always outputs a list. But what we want is a single data frame at the
end that also contains the user ID.

The function that will take a list and convert it into a data frame is
called `list_rbind()` to bind ("stack") by rows or `list_cbind()` to
we can move on and open up the `data-raw/mmash.R` script. If not, it
means that there is an issue in your code and that it won't be
reproducible.

Before continuing, we'll collect our imported packages at the top of the
script by adding the `library(fs)` line right below `library(here)`.
Then, inside `data-raw/mmash.R`, copy and paste the two lines of code
that create the `user_info_df` and `saliva_df` to the bottom of the
script (i.e., the two lines in the code chunk above). Afterwards, go to
the top of the script and, right below the `library(fs)` code, add these
two lines of code, so it looks like this:

``` {.r filename="data-raw/mmash.R"}
library(here)
technique, which we covered in the beginner R course. The method is:
3. Combine the results to present them together (e.g. into a data frame
that you can use to make a plot or table).

So when you split data into multiple groups, you create a list (or a
*vector*) that you can then use (with the *map* functional) to apply a
statistical technique to each group through *vectorization*. This
technique works really well for a range of tasks, including for our task
of summarizing some of the MMASH data so we can merge it all into one
dataset.
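
As a small, self-contained sketch of the idea (using the built-in
`mtcars` data rather than the MMASH data):

```{r}
#| eval: false
mtcars |>
  # Split: one data frame per number of cylinders.
  split(mtcars$cyl) |>
  # Apply: summarise each piece with a functional.
  map(\(piece) tibble(mean_mpg = mean(piece$mpg))) |>
  # Combine: stack the pieces back into one data frame.
  list_rbind(names_to = "cyl")
```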

## Summarising data through functionals {#sec-summarise-with-functionals}

::: {.callout-note appearance="minimal" collapse="true"}
## Instructor note

Before starting this section, ask how many have used the pipe before. If
everyone has, then move on. If some haven't, very briefly explain it,
but **do not** use much time on it since we will be using it shortly and
they will see how it works then. We covered this in the introduction
course, so we should not cover it again here.
:::

Functionals and vectorization are integral components of how R works and
they appear throughout many of R's functions and packages. They are
particularly used throughout the `{tidyverse}` packages like `{dplyr}`.
Let's get into some more advanced features of `{dplyr}` functions that
work as functionals.

Before we continue, re-run the code for getting `user_info_df` since you
had restarted the R session previously.
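
If it is no longer in your environment, that code was roughly the
following (the name of the import function is an assumption here, based
on the functions created earlier in the course):

```{r}
#| eval: false
user_info_df <- map(user_info_files, import_user_info) |>
  list_rbind(names_to = "file_path_id")
```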

Since we're going to use `{dplyr}`, we need to add it as a dependency by
typing this in the **Console**:
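
``` {.r filename="Console"}
usethis::use_package("dplyr")
```
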
the [Data Management and
Wrangling](https://r-cubed-intro.rostools.org/sessions/data-management.html#managing-and-working-with-data-in-r)
session of the beginner course). The common usage of these verbs is
through acting on and directly using the column names (e.g. without `"`
quotes around the column name, like with
`saliva_df |> select(cortisol_norm)`). But many `{dplyr}` verbs can also
take functions as input, especially when using the column selection
helpers from the `{tidyselect}` package.
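
For instance, a small sketch of handing a function to a selection helper
(rather than typing exact column names) could be:

```{r}
#| eval: false
# Select columns by a property, using a function, instead of by name.
saliva_df |>
  select(where(is.numeric))
```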

Likewise, with functions like `summarise()`, if you want to, for example,
calculate the mean of cortisol in the saliva dataset, you would usually
saliva_df |>

But instead, there is the `across()` function that works like `map()`
and allows you to calculate the mean across whichever columns you want.
In many ways, `across()` is similar to `map()`, particularly in the
arguments you give it and in the sense that it is a functional. But they
are used in different settings: `across()` works well with columns
within a data frame and within a `mutate()` or `summarise()`, while
`map()` is more generic.
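
For example, a minimal sketch with the saliva data could look like this:

```{r}
#| eval: false
saliva_df |>
  summarise(across(cortisol_norm, list(mean = mean, sd = sd)))
```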

::: callout-note
## Reading task: \~2 minutes
way, we use the split-apply-combine technique. Let's first summarise by
taking the mean of `ibi_s` (which is the inter-beat interval in
seconds).

```{r}
#| filename: "doc/learning.qmd"
#| eval: false
rr_df <- import_multiple_files("RR.csv", import_rr)
rr_df |>
  group_by(file_path_id, day) |>
  summarise(across(ibi_s, list(mean = mean)))
```

rr_df <- import_multiple_files("RR.csv", import_rr)
rr_df |>
  group_by(file_path_id, day) |>
  summarise(across(ibi_s, list(mean = mean))) |>
  trim_filepath_for_book()
```

While there are no missing values here, let's add the argument
#| eval: false
rr_df |>
  group_by(file_path_id, day) |>
  summarise(across(ibi_s, list(mean = \(x) mean(x, na.rm = TRUE))))
```

```{r admin-rr-summarise-na-rm-for-book}
#| echo: false
rr_df |>
  group_by(file_path_id, day) |>
  summarise(across(ibi_s, list(mean = \(x) mean(x, na.rm = TRUE)))) |>
  trim_filepath_for_book()
```

summarised_rr_df <- rr_df |>
  group_by(file_path_id, day) |>
  summarise(
    across(ibi_s, list(
      mean = \(x) mean(x, na.rm = TRUE),
      sd = \(x) sd(x, na.rm = TRUE)
    ))
  )
summarised_rr_df
```

summarised_rr_df <- rr_df |>
  group_by(file_path_id, day) |>
  summarise(
    across(ibi_s, list(
      mean = \(x) mean(x, na.rm = TRUE),
      sd = \(x) sd(x, na.rm = TRUE)
    ))
  )
summarised_rr_df |>
  trim_filepath_for_book()
```

The `ungroup()` function does not provide any visual indication of what
is happening.
However, in the background, it removes certain metadata that the
`group_by()` function added.
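
As a quick illustration (this check isn't part of the session's code),
`group_vars()` shows the grouping metadata before and after:

```{r}
#| eval: false
# group_by() attaches grouping metadata to the data frame ...
rr_df |>
  group_by(file_path_id, day) |>
  group_vars()

# ... and ungroup() removes it again, even though the data look the same.
rr_df |>
  group_by(file_path_id, day) |>
  ungroup() |>
  group_vars()
```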

::: {.callout-note appearance="default"}
By default, using `group_by()` continues the grouping effect of later
code, like `mutate()` and `summarise()`. Normally we would end a
`group_by()` by using `ungroup()`, especially if we want to do multiple
wrangling functions on the same grouping. Because sometimes, especially
after using `summarise()`, we don't need to keep the grouping. So we can
use the `.groups = "drop"` argument in `summarise()` to end the
grouping.
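
As a sketch, that would look something like:

```{r}
#| eval: false
rr_df |>
  group_by(file_path_id, day) |>
  summarise(
    across(ibi_s, list(mean = \(x) mean(x, na.rm = TRUE))),
    .groups = "drop"
  )
```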
:::

Before continuing, let's run `{styler}` with {{< var keybind.styler >}}
and knit the Quarto document with {{< var keybind.render >}} to confirm
that everything runs as it should. If the knitting works, then switch to
changes to the Git history with {{< var keybind.git >}}.
rm(actigraph_df, rr_df)
save.image(here::here("_temp/functionals.RData"))
```
