From 86ed6697a166e09f696cb2d431b98369e5c7e02e Mon Sep 17 00:00:00 2001 From: "Luke W. Johnston" Date: Mon, 28 Oct 2024 09:54:08 -0400 Subject: [PATCH] style: :lipstick: show where code should be written in Closes #149 --- preamble/pre-course.qmd | 12 +++-- sessions/pipelines.qmd | 70 +++++++++++++++++----------- sessions/smoother-collaboration.qmd | 5 ++ sessions/stats-analyses-basic.qmd | 29 ++++++++++-- sessions/stats-analyses-multiple.qmd | 20 +++++++- 5 files changed, 99 insertions(+), 37 deletions(-) diff --git a/preamble/pre-course.qmd b/preamble/pre-course.qmd index ec06516..5388191 100644 --- a/preamble/pre-course.qmd +++ b/preamble/pre-course.qmd @@ -113,7 +113,7 @@ If you *have not* used it, please do these tasks: from the introduction course. Specifically, type in the RStudio Console: - ``` r + ```{.r filename="Console"} # There will be a pop-up to type in your name (first and # last), as well as your email r3::setup_git_config() @@ -134,7 +134,7 @@ If you *have not* used it, please do these tasks: Regardless of whether you've done the steps above or not, *everyone* needs to run: -``` r +```{.r filename="Console"} r3::check_setup() ``` @@ -207,6 +207,7 @@ When the RStudio Project opens up again, run these commands in the R Console to finish the setup: ```{r more-project-setup} +#| filename: Console #| purl: true # Add Git to the project prodigenr::setup_with_git() @@ -258,6 +259,7 @@ Please do these two tasks: `doc/` folder. ```{r add-rmarkdown-doc} +#| filename: Console #| purl: true r3::create_qmd_doc() ``` @@ -295,6 +297,7 @@ your `AdvancedR3` R Project, go to the Console pane in RStudio and type out: ```{r create-data-raw} +#| filename: Console #| purl: true usethis::use_data_raw("nmr-omics") ``` @@ -313,7 +316,7 @@ The R script should have opened up for you, otherwise, go into the thing to do is delete all the code in the script. Then, copy and paste the code below into the script. 
-```{r insert-data-raw-script, file=here::here('data-raw/nmr-omics.R')} +```{r insert-data-raw-script, file=here::here('data-raw/nmr-omics.R'), filename="data-raw/nmr-omics.R"} ``` ```{r purl-only-paste-processing-code} @@ -367,6 +370,7 @@ ignore the files created in the `data-raw/nmr-omics/` folder. In the Console, type out the code below. You only need to do this once. ```{r ignore-xlsx-files} +#| filename: Console #| purl: true usethis::use_git_ignore("data-raw/nmr-omics/") ``` @@ -380,7 +384,7 @@ git_ci( ) ``` -``` r +```{.r filename="Console"} r3::check_project_setup_advanced() ``` diff --git a/sessions/pipelines.qmd b/sessions/pipelines.qmd index 649b2b8..7bf8610 100644 --- a/sessions/pipelines.qmd +++ b/sessions/pipelines.qmd @@ -202,6 +202,7 @@ file with: ```{r add-targets-as-dep} #| purl: true +#| filename: Console use_package("targets") ``` @@ -210,6 +211,7 @@ to start using it! ```{r use-targets} #| purl: true +#| filename: Console targets::use_targets() ``` @@ -354,6 +356,7 @@ project library. `{tidyverse}` is a special "meta"-package so we need to add it to the `"depends"` section of the `DESCRIPTION` file. ```{r add-tidyverse-deps} +#| filename: Console #| purl: true use_package("tidyverse", "depends") ``` @@ -365,7 +368,7 @@ Now, let's start doing some data analysis so that we can add to our pipeline later on. First, open up the `doc/learning.qmd` file and create a new header and code chunk at the bottom of the file. -```` +````{.markdown filename="doc/learning.qmd"} ```{{r setup}} library(tidyverse) @@ -407,6 +410,7 @@ column (`across(where(is.numeric))`) to `round()` to 1 digit. Let's write out the code! ```{r mean-sd-by-each-metabolite} +#| filename: "doc/learning.qmd" #| eval: true lipidomics %>% group_by(metabolite) %>% @@ -429,7 +433,14 @@ to take the code we wrote above and convert it into a function. Complete these tasks: 1. Wrap the code with `function() {...}` and name the new function - `descriptive_stats`. + `descriptive_stats`. 
Here is some scaffolding to help you get started: + + ```{.r filename="doc/learning.qmd"} + descriptive_stats <- function(___) { + ___ + } + ``` + 2. Replace `lipidomics` with `data` and put `data` as an argument inside the brackets of `function()`. 3. Add `dplyr::` to the start of each `{dplyr}` function used inside @@ -458,14 +469,6 @@ be the output of the function. This is called an "implicit return" and we will be using this feature throughout the rest of this course. ::: -Here is some scaffolding to help you get started: - -``` r -descriptive_stats <- function(___) { - ___ -} -``` - ```{r solution-descriptive-stats} #| eval: true #| code-fold: true @@ -495,7 +498,9 @@ target to load the lipidomic data. In the second, replace it with the what the target output is, we can add `df_` to remind us that it is a data frame. It should look like: -``` r +```{r} +#| eval: false +#| filename: "targets.R" list( tar_target( name = lipidomics, @@ -511,7 +516,9 @@ list( Let's run `{targets}` to see what happens! You can either use {{< var keybind.targets-make >}} or run this code in the Console: -``` r +```{r} +#| eval: false +#| filename: "Console" targets::tar_make() ``` @@ -524,7 +531,9 @@ data file. We can accomplish this by using the argument `format = "file"` inside the `tar_target()` before loading the data using `{readr}`. -``` r +```{r} +#| eval: false +#| filename: "targets.R" list( tar_target( name = file, @@ -586,7 +595,7 @@ explicitly called via `::` (e.g. `%>%`). 
For now, we only need to add We can now put this code in the `packages` argument of `tar_option_set()` in the `_targets.R` file: -``` r +```{.r filename="targets.R"} packages = c("tibble", "dplyr") ``` @@ -616,6 +625,7 @@ automatically): ```{r visualize-targets} #| purl: true +#| filename: Console targets::tar_visnetwork() ``` @@ -623,6 +633,7 @@ Or to see what pipeline targets are outdated: ```{r outdated-targets} #| purl: true +#| filename: Console targets::tar_outdated() ``` @@ -646,10 +657,11 @@ write this code, let's add it to our `DESCRIPTION` file. ```{r add-ggplot2-deps} #| purl: true +#| filename: Console use_package("ggplot2") ``` -Next, we'll switch back to `doc/lesson.qmd` and write the code to this +Next, we'll switch back to `doc/learning.qmd` and write the code to this plot of the distribution of each metabolite. We'll use `geom_histogram()`, nothing too fancy. And since the data is already in long format, we can easily use `facet_wrap()` to create a plot for each @@ -659,6 +671,7 @@ have the same range of values (some are small, others are quite large). ```{r histogram-metabolites} #| fig-cap: "Histograms showing the distribution of all metabolites in the lipidomics dataset." #| eval: true +#| filename: "doc/learning.qmd" metabolite_distribution_plot <- ggplot(lipidomics, aes(x = value)) + geom_histogram() + facet_wrap(vars(metabolite), scales = "free") @@ -676,8 +689,16 @@ convert it into a function. Just like you did with the `descriptive_stats()` function in the exercise above, complete these tasks: -1. Wrap the plot code inside `doc/lesson.qmd` with `function() {...}` +1. Wrap the plot code inside `doc/learning.qmd` with `function() {...}` and name the new function `plot_distributions`. + Use this scaffolding code to help guide you to write the code into a + function. + + ```{.r filename="doc/learning.qmd"} + plot_distributions <- function(___) { + ___ + } + ``` 2. 
Replace `lipidomics` with `data` and put `data` as an argument inside the brackets of `function()`. 3. Add `ggplot2::` to the start of each `{ggplot2}` function used @@ -697,15 +718,6 @@ tasks: 8. Save both files and then open the Git interface and commit the changes you made to them with {{< var keybind.git >}}. -Use this scaffolding code to help guide you to write the code into a -function. - -``` r -plot_distributions <- function(___) { - ___ -} -``` - ```{r solution-new-function-descriptive-plots} #| eval: false #| code-fold: true @@ -732,7 +744,7 @@ this `tar_target()` item within the `list()` inside `_targets.R`. To make it easier to track things, add `fig_` to the start of the `name` given. -``` r +```{.r filename="targets.R"} list( ..., tar_target( @@ -788,6 +800,7 @@ install the helper package `{tarchetypes}` first, as well as the ```{r tarchetypes-deps} #| purl: true +#| filename: Console use_package("tarchetypes") use_package("quarto") ``` @@ -801,7 +814,7 @@ needs, and the file path to the Quarto file. Again, like the other using the `doc/learning.qmd` as a sandbox, we won't include it as a pipeline target. Instead we will use the `doc/learning.qmd` file: -``` r +```{.r filename="targets.R"} list( ..., tar_quarto( @@ -828,7 +841,7 @@ use of the `targets::tar_read()`. -```` +````{.markdown filename="doc/learning.qmd"} --- # YAML header --- @@ -892,6 +905,7 @@ string, you can use columns from a data frame, like `value_mean`. 
So we can use it to format the final table text to be `mean value (SD value)`: ```{r stats-to-table} +#| filename: "doc/learning.qmd" targets::tar_read(df_stats_by_metabolite) %>% mutate(MeanSD = glue::glue("{value_mean} ({value_sd})")) %>% select(Metabolite = metabolite, `Mean SD` = MeanSD) %>% diff --git a/sessions/smoother-collaboration.qmd b/sessions/smoother-collaboration.qmd index af08b95..9592f67 100644 --- a/sessions/smoother-collaboration.qmd +++ b/sessions/smoother-collaboration.qmd @@ -249,6 +249,7 @@ these tasks: #| code-fold: true #| code-summary: "**Click for the solution**. Only click if you are really struggling or are out of time for the exercise." #| purl: true +#| filename: Console usethis::use_package("stringr") usethis::use_package("readxl") usethis::use_package("dplyr") @@ -319,6 +320,7 @@ and add some code into it. ```{r add-usethis-rprofile} #| eval: false +#| filename: Console usethis::use_usethis() ``` @@ -335,6 +337,7 @@ Let's restart R with {{< var keybind.restart-r >}} before using `use_package()` to add `{usethis}` as a workflow dependency. ```{r suggests-dep} +#| filename: Console #| eval: false #| purl: true use_package("usethis", "suggests") @@ -429,6 +432,7 @@ it to the `DESCRIPTION` file. ```{r add-styler-dep} #| eval: false #| purl: true +#| filename: Console use_package("styler", "suggests") ``` @@ -462,6 +466,7 @@ If you wanted to run `{styler}` on all the files, we can use: ```{r style-dir} #| eval: false +#| filename: Console styler::style_dir() ``` diff --git a/sessions/stats-analyses-basic.qmd b/sessions/stats-analyses-basic.qmd index 813808c..b7eba4f 100644 --- a/sessions/stats-analyses-basic.qmd +++ b/sessions/stats-analyses-basic.qmd @@ -366,6 +366,7 @@ find this "store" of outputs, and import the data using `tar_read()`. We also need to add `library(tidymodels)` to the `setup` code chunk. Copy and paste this code chunk below into the Quarto file. 
+```` {.markdown filename="doc/learning.qmd"} ```{{r setup}} targets::tar_config_set(store = here::here("_targets")) library(tidyverse) @@ -374,6 +375,7 @@ library(tidymodels) source(here::here("R/functions.R")) lipidomics <- tar_read(lipidomics) ``` +```` Since we will be using `{tidymodels}`, we need to install it, as well as explicitly add the `{parsnip}`, `{recipes}`, and `{workflows}` packages. @@ -382,6 +384,7 @@ is a "meta-package". We might need to force installing it with `pak::pak("tidymodels")`. ```{r tidymodels-to-deps} +#| filename: Console #| purl: true #| eval: false use_package("tidymodels", "depends") @@ -395,7 +398,7 @@ Before continuing, let's **commit** the changes to the Git history with {{< var keybind.git >}}. Next, in the `doc/learning.qmd` file, on the bottom of the document create a new header and code chunk: -```` +````{.markdown filename="doc/learning.qmd"} ## Building the model ```{{r}} @@ -406,6 +409,7 @@ bottom of the document create a new header and code chunk: In the new code chunk, we will set up the model specs: ```{r logistic-reg-specs} +#| filename: "doc/learning.qmd" log_reg_specs <- logistic_reg() %>% set_engine("glm") log_reg_specs @@ -431,6 +435,7 @@ can be fixed with `{recipes}`. Can you spot them? ```{r print-lipidomics-data} #| column: page-inset-right +#| filename: Console lipidomics ``` @@ -449,6 +454,7 @@ that there seems to be a data input error, since there are three `Cholesterol` values, while all other metabolites only have one: ```{r too-many-cholesterols} +#| filename: "doc/learning.qmd" lipidomics %>% count(code, metabolite) %>% filter(n > 1) @@ -477,6 +483,7 @@ by setting the `values_fn` with `mean`. ```{r lipidomic-to-wider} #| column: page-inset-right +#| filename: "doc/learning.qmd" lipidomics_wide <- lipidomics %>% mutate(metabolite = snakecase::to_snake_case(metabolite)) %>% pivot_wider( @@ -495,6 +502,7 @@ moving them over into the `R/functions.R` file. 
```{r first-snakecase-fn} #| column: page-inset-right +#| filename: "doc/learning.qmd" column_values_to_snake_case <- function(data) { data %>% dplyr::mutate(metabolite = snakecase::to_snake_case(metabolite)) @@ -571,6 +579,7 @@ We can use curly-curly (combined with `across()`) to apply ```{r second-snakecase-fn} #| column: page-inset-right +#| filename: "doc/learning.qmd" column_values_to_snake_case <- function(data, columns) { data %>% dplyr::mutate(dplyr::across({{ columns }}, snakecase::to_snake_case)) @@ -583,10 +592,10 @@ lipidomics %>% Move this new function over into the `R/functions.R` file, add Roxygen documentation with {{< var keybind.roxygen >}}, style using {{< var keybind.styler >}}, `source()` the modified `R/functions.R` file -with {{< var keybind.source >}}, and add the new function above the -`pivot_wider()` code in the `doc/learning.qmd` file. +with {{< var keybind.source >}} ```{r new-function-column-values-to-snakecase} +#| filename: "R/functions.R" #' Convert a column's character values to snakecase format. #' #' @param data The lipidomics dataset. @@ -663,6 +672,7 @@ predictors, we can explicitly select which variables are which. This has some nice features that we will use later on. ```{r recipes-without-formula} +#| filename: "doc/learning.qmd" recipe(lipidomics_wide) %>% update_role(metabolite_cholesterol, age, gender, new_role = "predictor") %>% update_role(class, new_role = "outcome") @@ -708,6 +718,7 @@ distribution. This means we can more easily compare values between variables. We can add this to the end of the recipe: ```{r recipes-with-step-normalize} +#| filename: "doc/learning.qmd" recipe(lipidomics_wide) %>% update_role(metabolite_cholesterol, age, gender, new_role = "predictor") %>% update_role(class, new_role = "outcome") %>% @@ -722,6 +733,7 @@ might use a different metabolite later. Note, when adding all the from the `{tidyselect}` package. 
```{r new-function-create-recipe-spec} +#| filename: "R/functions.R" #' A transformation recipe to pre-process the data. #' #' @param data The lipidomics dataset. @@ -740,6 +752,7 @@ create_recipe_spec <- function(data, metabolite_variable) { And test it out: ```{r use-create-recipe-specs-fn} +#| filename: "doc/learning.qmd" #| column: page-inset-right recipe_specs <- lipidomics_wide %>% create_recipe_spec(metabolite_cholesterol) @@ -777,6 +790,7 @@ slightly different types). All model workflows need to start with ```{r use-workflow-for-model} #| column: page-inset-right +#| filename: "doc/learning.qmd" workflow() %>% add_model(log_reg_specs) %>% add_recipe(recipe_specs) @@ -788,6 +802,7 @@ workflow that we've used before, where the function should ultimately be inside the `R/functions.R` file. ```{r new-function-create-model-workflow} +#| filename: "R/functions.R" #' Create a workflow object of the model and transformations. #' #' @param model_specs The model specs @@ -809,6 +824,7 @@ creation from scratch: ```{r full-model-workflow-from-almost-scratch} #| column: page-inset-right +#| filename: "doc/learning.qmd" model_workflow <- create_model_workflow( logistic_reg() %>% set_engine("glm"), @@ -823,6 +839,7 @@ Now, we can do the final thing: Fitting the data to the model with ```{r fit-model-workflow-to-data} #| column: page-inset-right +#| filename: "doc/learning.qmd" fitted_model <- model_workflow %>% fit(lipidomics_wide) fitted_model @@ -836,6 +853,7 @@ the `extract_fit_parsnip()` function. ```{r extract-model-fit} #| column: page-inset-right +#| filename: "doc/learning.qmd" fitted_model %>% extract_fit_parsnip() ``` @@ -848,6 +866,7 @@ dependencies: ```{r broom-to-deps} #| purl: true #| eval: false +#| filename: Console use_package("broom") ``` @@ -861,6 +880,7 @@ coefficient. 
Here we choose `exponentiate = TRUE`: ```{r tidy-up-model-results} #| column: page-inset-right +#| filename: "doc/learning.qmd" fitted_model %>% extract_fit_parsnip() %>% tidy(exponentiate = TRUE) @@ -872,6 +892,7 @@ thing here: Make another function (and move it to `R/functions.R`)! :stuck_out_tongue: ```{r new-function-tidy-model-output} +#| filename: "R/functions.R" #' Create a tidy output of the model results. #' #' @param workflow_fitted_model The model workflow object that has been fitted. @@ -889,6 +910,7 @@ Replacing the code in the `doc/learning.qmd` file to use the function. ```{r use-tidy-model-output-fn} #| column: page-inset-right +#| filename: "doc/learning.qmd" fitted_model %>% tidy_model_output() ``` @@ -897,6 +919,7 @@ If we revise the code so it is one pipe, it would look like: ```{r single-pipe-model-results} #| column: page-inset-right +#| filename: "doc/learning.qmd" create_model_workflow( logistic_reg() %>% set_engine("glm"), diff --git a/sessions/stats-analyses-multiple.qmd b/sessions/stats-analyses-multiple.qmd index 349481e..aeaaa6c 100644 --- a/sessions/stats-analyses-multiple.qmd +++ b/sessions/stats-analyses-multiple.qmd @@ -89,7 +89,7 @@ the `lipidomics_wide` dataset. However, these types of long form. So we'll start with the original `lipidomics` dataset. Create a header and code chunk at the end of the `doc/learning.qmd` file: -```` +````{.markdown filename="doc/learning.qmd"} ## Running multiple models ```{{r}} @@ -101,6 +101,7 @@ The first thing we want to do is convert the metabolite names into snake case: ```{r chain-col-to-snakecase} +#| filename: "doc/learning.qmd" lipidomics %>% column_values_to_snake_case(metabolite) ``` @@ -122,6 +123,7 @@ dependency: ```{r purrr-to-deps} #| purl: true #| eval: false +#| filename: Console use_package("purrr") ``` @@ -131,6 +133,7 @@ three. 
```{r chain-split-by-metabolite} #| eval: false +#| filename: "doc/learning.qmd" lipidomics %>% column_values_to_snake_case(metabolite) %>% group_split(metabolite) @@ -151,6 +154,7 @@ the first three): ```{r chain-map-to-wider} #| eval: false +#| filename: "doc/learning.qmd" lipidomics %>% column_values_to_snake_case(metabolite) %>% group_split(metabolite) %>% @@ -176,6 +180,7 @@ into the `R/functions.R` file, and then `source()` the file with {{< var keybind.source >}}. ```{r new-function-split-by-metabolite} +#| filename: "R/functions.R" #' Convert the long form dataset into a list of wide form data frames. #' #' @param data The lipidomics dataset. @@ -193,6 +198,7 @@ split_by_metabolite <- function(data) { In the `doc/learning.qmd`, use the new function in the code: ```{r split-by-metabolite} +#| filename: "doc/learning.qmd" #| eval: false lipidomics %>% split_by_metabolite() @@ -217,6 +223,7 @@ move into the `R/functions.R` file, and then `source()` the file with {{< var keybind.source >}}. ```{r new-function-generate-model-results} +#| filename: "R/functions.R" #' Generate the results of a model #' #' @param data The lipidomics dataset. @@ -239,6 +246,7 @@ Then we add it to the end of the pipe, but using `map()` and `list_rbind()` to convert to a data frame: ```{r chain-generate-model-results} +#| filename: "doc/learning.qmd" lipidomics %>% split_by_metabolite() %>% map(generate_model_results) %>% @@ -250,6 +258,7 @@ let's keep only the `term` rows that are metabolites using `filter()` and `str_detect()`. ```{r chain-filter-terms} +#| filename: "doc/learning.qmd" model_estimates <- lipidomics %>% split_by_metabolite() %>% map(generate_model_results) %>% @@ -268,6 +277,7 @@ create a duplicate column of `metabolite` called `term` (to match the `model_estimates`) using `mutate()`. 
```{r duplicate-original-vars} +#| filename: "doc/learning.qmd" lipidomics %>% select(metabolite) %>% mutate(term = metabolite) @@ -277,6 +287,7 @@ Right after that we will use our custom `column_values_to_snake_case()` function on the `term` column. ```{r dup-column-to-snakecase} +#| filename: "doc/learning.qmd" lipidomics %>% select(metabolite) %>% mutate(term = metabolite) %>% @@ -287,6 +298,7 @@ We can see that we are missing the `metabolite_` text before each snake case'd name, so we can add that with `mutate()` and `str_c()`: ```{r dup-column-append-metabolite} +#| filename: "doc/learning.qmd" lipidomics %>% select(metabolite) %>% mutate(term = metabolite) %>% @@ -300,6 +312,7 @@ There are 504 rows, but we only need the unique values of `term` and only the `metabolite` and `term` variables. ```{r dup-column-distinct} +#| filename: "doc/learning.qmd" lipidomics %>% mutate(term = metabolite) %>% column_values_to_snake_case(term) %>% @@ -310,6 +323,7 @@ lipidomics %>% The last step is to `right_join()` with the `model_estimates`: ```{r dup-column-full-join} +#| filename: "doc/learning.qmd" lipidomics %>% mutate(term = metabolite) %>% column_values_to_snake_case(term) %>% @@ -470,7 +484,7 @@ inside the `## Results` section. We'll want to use `tar_read(df_model_estimates)` so that `{targets}` is aware that the R Markdown file is dependent on this target. -```` +````{.markdown filename="doc/learning.qmd"} ### Figure of model estimates ```{{r}} @@ -522,6 +536,7 @@ dot-whisker plots, the "geom" we would use is called but adding `std.error` instead. ```{r plot-estimates-pointrange-only} +#| filename: "doc/learning.qmd" plot_estimates <- model_estimates %>% ggplot(aes( x = estimate, @@ -541,6 +556,7 @@ eventually need to troubleshoot this issue, but for now, let's restrict the x axis to be between 0 and 5. ```{r plot-estimates-coord-fixed} +#| filename: "doc/learning.qmd" plot_estimates + coord_fixed(xlim = c(0, 5)) ```