Specific objectives are to:

1. Explain what functional programming, vectorization, and functionals
are within R and identify when code is a functional or uses
functional programming. Then apply this knowledge using the
`{purrr}` package.
2. Review the split-apply-combine technique and identify how these
concepts connect to functional programming.
3. Apply functional programming to summarize data using the
split-apply-combine technique.

## Functional programming
for you and for us as instructors). And because we use Git, nothing is
truly gone so you can always go back to the text later. Next, we restart
the R session with {{< var keybind.restart-r >}}.

Before we use the `map()` functional, we need to get a vector or list
of all the dataset files available to us. We will return to using the
`{fs}` package, which has a function called `dir_ls()` that finds files
of a certain pattern. So, let's add `library(fs)` to the `setup` code
chunk. Then, go to the bottom of the `doc/learning.qmd` document, create
a new header called `## Using map`, and create a code chunk below that
with {{< var keybind.chunk >}}.
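
As a rough sketch, the chunk could end up looking something like this
(the exact regular expression and options used in the session may
differ):

```{r}
#| eval: false
user_info_files <- dir_ls("data-raw/mmash",
  regexp = "user_info.csv",
  recurse = TRUE
)
user_info_files
```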

The `dir_ls()` function takes the path that we want to search
(`data-raw/mmash/`), uses the argument `regexp` (short for [regular
user_info_files
head(gsub(".*\\/data-raw", "data-raw", user_info_files), 3)
```

Alright, we now have all the files ready to give to `map()`. But before
using it, we'll need to add `{purrr}`, where `map()` comes from, as a
package dependency by going to the **Console** and running:

``` {.r filename="Console"}
usethis::use_package("purrr")
```

Since `{purrr}` is part of the `{tidyverse}`, we don't need to load it
with `library()`. So let's try it!

```{r}
#| filename: "doc/learning.qmd"
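# A sketch of the call that goes in this chunk (the import function's
# name is an assumption, following the course's `import_` naming pattern):
user_info_list <- map(user_info_files, import_user_info)
user_info_list
```
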
datasets! But we're missing an important bit of information: The user
ID. A powerful feature of the `{purrr}` package is that it has other
functions to make it easier to work with functionals. We know `map()`
always outputs a list. But what we want is a single data frame at the
end that also contains the user ID.

The function that will take a list and convert it into a data frame is
called `list_rbind()` to bind ("stack") by rows or `list_cbind()` to
we can move on and open up the `data-raw/mmash.R` script. If not, it
means that there is an issue in your code and that it won't be
reproducible.

Before continuing, we'll collect our imported packages at the top of the
script by adding the `library(fs)` line right below `library(here)`.
Then, inside `data-raw/mmash.R`, copy and paste the two lines of code
that create the `user_info_df` and `saliva_df` to the bottom of the
script (i.e., the two lines in the code chunk above). Afterwards, go to
the top of the script and, right below the `library(fs)` code, add these
two lines of code, so it looks like this:

``` {.r filename="data-raw/mmash.R"}
library(here)
technique, which we covered in the beginner R course. The method is:
3. Combine the results to present them together (e.g. into a data frame
that you can use to make a plot or table).

So when you split data into multiple groups, you create a list (or a
*vector*) that you can then use (with the *map* functional) to apply a
statistical technique to each group through *vectorization*. This
technique works really well for a range of tasks, including for our task
of summarizing some of the MMASH data so we can merge it all into one
dataset.
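
As a small, self-contained sketch of the idea (using the built-in
`mtcars` data rather than the MMASH data):

```{r}
#| eval: false
mtcars |>
  # Split: one data frame per number of cylinders.
  split(mtcars$cyl) |>
  # Apply: summarise each piece with a functional.
  map(\(piece) tibble(mean_mpg = mean(piece$mpg))) |>
  # Combine: stack the pieces back into one data frame.
  list_rbind(names_to = "cyl")
```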

## Summarising data through functionals {#sec-summarise-with-functionals}

::: {.callout-note appearance="minimal" collapse="true"}
## Instructor note

Before starting this section, ask how many have used the pipe before. If
everyone has, then move on. If some haven't, very briefly explain it,
but **do not** use much time on it since we will be using it shortly and
they will see how it works then. We covered this in the introduction
course, so we should not cover it again here.
:::

Functionals and vectorization are integral components of how R works and
they appear throughout many of R's functions and packages. They are
particularly used throughout the `{tidyverse}` packages like `{dplyr}`.
Let's get into some more advanced features of `{dplyr}` functions that
work as functionals.

Before we continue, re-run the code for getting `user_info_df` since you
had restarted the R session previously.
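
If it is no longer in your environment, that code was roughly the
following (the name of the import function is an assumption here, based
on the functions created earlier in the course):

```{r}
#| eval: false
user_info_df <- map(user_info_files, import_user_info) |>
  list_rbind(names_to = "file_path_id")
```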

Since we're going to use `{dplyr}`, we need to add it as a dependency by
typing this in the **Console**:
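
``` {.r filename="Console"}
usethis::use_package("dplyr")
```
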
the [Data Management and
Wrangling](https://r-cubed-intro.rostools.org/sessions/data-management.html#managing-and-working-with-data-in-r)
session of the beginner course). The common usage of these verbs is
through acting on and directly using the column names (e.g. without `"`
quotes around the column name, like with
`saliva_df |> select(cortisol_norm)`). But many `{dplyr}` verbs can also
take functions as input, especially when using the column selection
helpers from the `{tidyselect}` package.
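
For instance, a small sketch of handing a function to a selection helper
(rather than typing exact column names) could be:

```{r}
#| eval: false
# Select columns by a property, using a function, instead of by name.
saliva_df |>
  select(where(is.numeric))
```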

Likewise, with functions like `summarise()`, if you want to, for example,
calculate the mean of cortisol in the saliva dataset, you would usually
saliva_df |>

But instead, there is the `across()` function that works like `map()`
and allows you to calculate the mean across whichever columns you want.
In many ways, `across()` is similar to `map()`, particularly in the
arguments you give it and in the sense that it is a functional. But they
are used in different settings: `across()` works well with columns
within a data frame and within a `mutate()` or `summarise()`, while
`map()` is more generic.
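
For example, a minimal sketch with the saliva data could look like this:

```{r}
#| eval: false
saliva_df |>
  summarise(across(cortisol_norm, list(mean = mean, sd = sd)))
```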

::: callout-note
## Reading task: \~2 minutes
way, we use the split-apply-combine technique. Let's first summarise by
taking the mean of `ibi_s` (which is the inter-beat interval in
seconds).

```{r}
#| filename: "doc/learning.qmd"
#| eval: false
rr_df <- import_multiple_files("RR.csv", import_rr)
rr_df |>
  group_by(file_path_id, day) |>
  summarise(across(ibi_s, list(mean = mean)))
```

rr_df <- import_multiple_files("RR.csv", import_rr)
rr_df |>
  group_by(file_path_id, day) |>
  summarise(across(ibi_s, list(mean = mean))) |>
  trim_filepath_for_book()
```

While there are no missing values here, let's add the argument
#| eval: false
rr_df |>
  group_by(file_path_id, day) |>
  summarise(across(ibi_s, list(mean = \(x) mean(x, na.rm = TRUE))))
```

```{r admin-rr-summarise-na-rm-for-book}
#| echo: false
rr_df |>
  group_by(file_path_id, day) |>
  summarise(across(ibi_s, list(mean = \(x) mean(x, na.rm = TRUE)))) |>
  trim_filepath_for_book()
```

summarised_rr_df <- rr_df |>
  group_by(file_path_id, day) |>
  summarise(
    across(ibi_s, list(
      mean = \(x) mean(x, na.rm = TRUE),
      sd = \(x) sd(x, na.rm = TRUE)
    ))
  )
summarised_rr_df
```

summarised_rr_df <- rr_df |>
  group_by(file_path_id, day) |>
  summarise(
    across(ibi_s, list(
      mean = \(x) mean(x, na.rm = TRUE),
      sd = \(x) sd(x, na.rm = TRUE)
    ))
  )
summarised_rr_df |>
  trim_filepath_for_book()
```

The `ungroup()` function does not provide any visual indication of what
is happening.
However, in the background, it removes certain metadata that the
`group_by()` function added.
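
As a quick illustration (this check isn't part of the session's code),
`group_vars()` shows the grouping metadata before and after:

```{r}
#| eval: false
# group_by() attaches grouping metadata to the data frame ...
rr_df |>
  group_by(file_path_id, day) |>
  group_vars()

# ... and ungroup() removes it again, even though the data look the same.
rr_df |>
  group_by(file_path_id, day) |>
  ungroup() |>
  group_vars()
```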

::: {.callout-note appearance="default"}
By default, using `group_by()` continues the grouping effect of later
code, like `mutate()` and `summarise()`. Normally we would end a
`group_by()` by using `ungroup()`, especially if we want to do multiple
wrangling functions on the same grouping. Because sometimes, especially
after using `summarise()`, we don't need to keep the grouping. So we can
use the `.groups = "drop"` argument in `summarise()` to end the
grouping.
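
As a sketch, that would look something like:

```{r}
#| eval: false
rr_df |>
  group_by(file_path_id, day) |>
  summarise(
    across(ibi_s, list(mean = \(x) mean(x, na.rm = TRUE))),
    .groups = "drop"
  )
```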
:::

Before continuing, let's run `{styler}` with {{< var keybind.styler >}}
and knit the Quarto document with {{< var keybind.render >}} to confirm
that everything runs as it should. If the knitting works, then switch to
changes to the Git history with {{< var keybind.git >}}.
rm(actigraph_df, rr_df)
save.image(here::here("_temp/functionals.RData"))
```
