Update 03-counterfactuals.qmd #245

Merged 3 commits on Jun 20, 2024
2 changes: 0 additions & 2 deletions .github/workflows/quarto.yaml
@@ -31,8 +31,6 @@ jobs:

- name: Install Google Fonts
run: |
brew tap homebrew/cask
brew tap homebrew/cask-fonts
brew install font-open-sans

- name: Query dependencies
4 changes: 2 additions & 2 deletions _freeze/chapters/03-counterfactuals/execute-results/html.json

Large diffs are not rendered by default.

8 changes: 5 additions & 3 deletions _freeze/index/execute-results/html.json
@@ -1,8 +1,10 @@
{
"hash": "ef3157b3e2d1d97cd016993a6b9fcb07",
"hash": "b13605468b718b90b4a5eec92fa59ff7",
"result": {
"markdown": "# Preface {.unnumbered}\n\nWelcome to *Causal Inference in R*.\nAnswering causal questions is critical for scientific and business purposes, but techniques like randomized clinical trials and A/B testing are not always practical or successful.\nThe tools in this book will allow readers better make causal inferences with observational data with the R programming language.\nBy its end, we hope to help you:\n\n1. Ask better causal questions.\n2. Understand the assumptions needed for causal inference\n3. Identify the target population for which you want to make inferences\n4. Fit causal models and check their problems\n5. Conduct sensitivity analyses where the techniques we use might be imperfect\n\nThis book is for both academic researchers and data scientists.\nAlthough the questions may differ between these settings, many techniques are the same: causal inference is as helpful for asking questions about cancer as it is about clicks.\nWe use a mix of examples from medicine, economics, and tech to demonstrate that you need a clear causal question and a willingness to be transparent about your assumptions.\n\nYou'll learn a lot in this book, but ironically, you won't learn much about conducting randomized trials, one of the best tools for causal inferences.\nRandomized trials, and their cousins, A/B tests (standard in the tech world), are compelling because they alleviate many of the assumptions we need to make for valid inferences.\nThey are also sufficiently complex in design to merit their own learning resources.\nInstead, we'll focus on observational data where we don't usually benefit from randomization.\nIf you're interested in randomization techniques, don't put away this resource just yet: many causal inference techniques designed for observational data improve randomized analyses, too.\n\nWe're making a few assumptions about you as a reader:\n\n1. 
You're familiar with the [tidyverse](https://www.tidyverse.org/) ecosystem of R packages and their general philosophy. For instance, we use a lot of dplyr and ggplot2 in this book, but we won't explain their basic grammar. To learn more about starting with the tidyverse, we recommend [R for Data Science](https://r4ds.hadley.nz/).\n2. You're familiar with basic statistical modeling in R. For instance, we'll fit many models with `lm()` and `glm()`, but we won't discuss how they work. If you want to learn more about R's powerful modeling functions, we recommend reading [\"A Review of R Modeling Fundamentals\"](https://www.tmwr.org/base-r.html) in [Tidy Modeling with R](https://www.tmwr.org).\n3. We also assume you have familiarity with other R basics, such as [writing functions](https://r4ds.hadley.nz/functions.html). [R for Data Science](https://r4ds.hadley.nz/) is also a good resource for these topics. (For a deeper dive into the R programming language, we recommend [Advanced R](https://adv-r.hadley.nz/index.html), although we don't assume you have mastered its material for this book).\n\nWe'll also use tools from the tidymodels ecosystem, a set of R packages for modeling related to the tidyverse.\nWe don't assume you have used them before.\ntidymodels also focuses on predictive modeling, so many of its tools aren't appropriate for this book.\nNevertheless, if you are interested in this topic, we recommend [Tidy Modeling with R](https://www.tmwr.org).\n\nThere are also several other excellent books on causal inference.\nThis book is different in its focus on R, but it's still helpful to see this area from other perspectives.\nA few books you might like:\n\n- [*Causal Inference: What If?*](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/)\n- [*Causal Inference: The Mixtape*](https://mixtape.scunning.com/)\n- [*The Effect*](https://theeffectbook.net/)\n\nThe first book is focused on epidemiology.\nThe latter two are focused on econometrics.\nWe also 
recommend *The Book of Why* @pearl2018why for more on causal diagrams.\n\n## Conventions\n\n### Modern R Features\n\nWe use two modern R features in R 4.1.0 and above in this book.\nThe first is the native pipe, `|>`.\nThis R feature is similar to the tidyverse's `%>%`, with which you may be more familiar.\nIn typical cases, the two work interchangeably.\nOne notable difference is that `|>` uses the `_` symbol to direct the pipe's results, e.g., `.df |> lm(y ~ x, data = _)`.\nSee [this Tidyverse Blog post](https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/) for more on this topic.\n\nAnother modern R feature we use is the native lambda, a way of writing short functions that looks like `\\(.x) do_something(.x)`.\nIt is similar to purrr's `~` lambda notation.\nIt's also helpful to realize the native lambda is identical to `function(.x) do_something(.x)`, where `\\` is shorthand for `function`.\nSee [R for Data Science's chapter on iteration](https://r4ds.hadley.nz/iteration.html) for more on this topic.\n\n## Theming\n\nThe plots in this book use a consistent theme that we don't include in every code chunk, meaning if you run the code for a visualization, you might get a slightly different-looking result.\nWe set the following defaults related to ggplot2:\n\n<!-- TODO: make sure these are up to date -->\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(\n # set default colors in ggplot2 to colorblind-friendly\n # Okabe-Ito and Viridis palettes\n ggplot2.discrete.colour = ggokabeito::palette_okabe_ito(),\n ggplot2.discrete.fill = ggokabeito::palette_okabe_ito(),\n ggplot2.continuous.colour = \"viridis\",\n ggplot2.continuous.fill = \"viridis\",\n # set theme font and size\n book.base_family = \"sans\",\n book.base_size = 14\n)\n\nlibrary(ggplot2)\n\n# set default theme\ntheme_set(\n theme_minimal(\n base_size = getOption(\"book.base_size\"),\n base_family = getOption(\"book.base_family\")\n ) %+replace%\n theme(\n panel.grid.minor = element_blank(),\n 
legend.position = \"bottom\"\n )\n)\n```\n:::\n\n\nWe also mask a few functions from ggdag that we like to customize:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntheme_dag <- function() {\n ggdag::theme_dag(base_family = getOption(\"book.base_family\"))\n}\n\ngeom_dag_label_repel <- function(..., seed = 10) {\n ggdag::geom_dag_label_repel(\n aes(x, y, label = label),\n box.padding = 3.5,\n inherit.aes = FALSE,\n max.overlaps = Inf,\n family = getOption(\"book.base_family\"),\n seed = seed,\n label.size = NA,\n label.padding = 0.1,\n size = getOption(\"book.base_size\") / 3,\n ...\n )\n}\n```\n:::\n",
"supporting": [],
"markdown": "# Preface {.unnumbered}\n\nWelcome to *Causal Inference in R*.\nAnswering causal questions is critical for scientific and business purposes, but techniques like randomized clinical trials and A/B testing are not always practical or successful.\nThe tools in this book will allow readers to better make causal inferences with observational data with the R programming language.\nBy its end, we hope to help you:\n\n1. Ask better causal questions.\n2. Understand the assumptions needed for causal inference\n3. Identify the target population for which you want to make inferences\n4. Fit causal models and check their problems\n5. Conduct sensitivity analyses where the techniques we use might be imperfect\n\nThis book is for both academic researchers and data scientists.\nAlthough the questions may differ between these settings, many techniques are the same: causal inference is as helpful for asking questions about cancer as it is about clicks.\nWe use a mix of examples from medicine, economics, and tech to demonstrate that you need a clear causal question and a willingness to be transparent about your assumptions.\n\nYou'll learn a lot in this book, but ironically, you won't learn much about conducting randomized trials, one of the best tools for causal inferences.\nRandomized trials, and their cousins, A/B tests (standard in the tech world), are compelling because they alleviate many of the assumptions we need to make for valid inferences.\nThey are also sufficiently complex in design to merit their own learning resources.\nInstead, we'll focus on observational data where we don't usually benefit from randomization.\nIf you're interested in randomization techniques, don't put away this resource just yet: many causal inference techniques designed for observational data improve randomized analyses, too.\n\nWe're making a few assumptions about you as a reader:\n\n1. 
You're familiar with the [tidyverse](https://www.tidyverse.org/) ecosystem of R packages and their general philosophy. For instance, we use a lot of dplyr and ggplot2 in this book, but we won't explain their basic grammar. To learn more about starting with the tidyverse, we recommend [*R for Data Science*](https://r4ds.hadley.nz/).\n2. You're familiar with basic statistical modeling in R. For instance, we'll fit many models with `lm()` and `glm()`, but we won't discuss how they work. If you want to learn more about R's powerful modeling functions, we recommend reading [\"A Review of R Modeling Fundamentals\"](https://www.tmwr.org/base-r.html) in [*Tidy Modeling with R*](https://www.tmwr.org).\n3. We also assume you have familiarity with other R basics, such as [writing functions](https://r4ds.hadley.nz/functions.html). [*R for Data Science*](https://r4ds.hadley.nz/) is also a good resource for these topics. (For a deeper dive into the R programming language, we recommend [*Advanced R*](https://adv-r.hadley.nz/index.html), although we don't assume you have mastered its material for this book).\n\nWe'll also use tools from the tidymodels ecosystem, a set of R packages for modeling related to the tidyverse.\nWe don't assume you have used them before.\ntidymodels also focuses on predictive modeling, so many of its tools aren't appropriate for this book.\nNevertheless, if you are interested in this topic, we recommend [*Tidy Modeling with R*](https://www.tmwr.org).\n\nThere are also several other excellent books on causal inference.\nThis book is different in its focus on R, but it's still helpful to see this area from other perspectives.\nA few books you might like:\n\n- [*Causal Inference: What If?*](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/)\n- [*Causal Inference: The Mixtape*](https://mixtape.scunning.com/)\n- [*The Effect*](https://theeffectbook.net/)\n\nThe first book is focused on epidemiology.\nThe latter two are focused on 
econometrics.\nWe also recommend *The Book of Why* @pearl2018why for more on causal diagrams.\n\n## Conventions\n\n### Modern R Features\n\nWe use two modern R features in R 4.1.0 and above in this book.\nThe first is the native pipe, `|>`.\nThis R feature is similar to the tidyverse's `%>%`, with which you may be more familiar.\nIn typical cases, the two work interchangeably.\nOne notable difference is that `|>` uses the `_` symbol to direct the pipe's results, e.g., `.df |> lm(y ~ x, data = _)`.\nSee [this Tidyverse Blog post](https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/) for more on this topic.\n\nAnother modern R feature we use is the native lambda, a way of writing short functions that looks like `\\(.x) do_something(.x)`.\nIt is similar to purrr's `~` lambda notation.\nIt's also helpful to realize the native lambda is identical to `function(.x) do_something(.x)`, where `\\` is shorthand for `function`.\nSee [R for Data Science's chapter on iteration](https://r4ds.hadley.nz/iteration.html) for more on this topic.\n\n## Theming\n\nThe plots in this book use a consistent theme that we don't include in every code chunk, meaning if you run the code for a visualization, you might get a slightly different-looking result.\nWe set the following defaults related to ggplot2:\n\n<!-- TODO: make sure these are up to date -->\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(\n # set default colors in ggplot2 to colorblind-friendly\n # Okabe-Ito and Viridis palettes\n ggplot2.discrete.colour = ggokabeito::palette_okabe_ito(),\n ggplot2.discrete.fill = ggokabeito::palette_okabe_ito(),\n ggplot2.continuous.colour = \"viridis\",\n ggplot2.continuous.fill = \"viridis\",\n # set theme font and size\n book.base_family = \"sans\",\n book.base_size = 14\n)\n\nlibrary(ggplot2)\n\n# set default theme\ntheme_set(\n theme_minimal(\n base_size = getOption(\"book.base_size\"),\n base_family = getOption(\"book.base_family\")\n ) %+replace%\n theme(\n panel.grid.minor = 
element_blank(),\n legend.position = \"bottom\"\n )\n)\n```\n:::\n\n\nWe also mask a few functions from ggdag that we like to customize:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntheme_dag <- function() {\n ggdag::theme_dag(base_family = getOption(\"book.base_family\"))\n}\n\ngeom_dag_label_repel <- function(..., seed = 10) {\n ggdag::geom_dag_label_repel(\n aes(x, y, label = label),\n box.padding = 3.5,\n inherit.aes = FALSE,\n max.overlaps = Inf,\n family = getOption(\"book.base_family\"),\n seed = seed,\n label.size = NA,\n label.padding = 0.1,\n size = getOption(\"book.base_size\") / 3,\n ...\n )\n}\n```\n:::\n",
"supporting": [
"index_files"
],
"filters": [
"rmarkdown/pagebreak.lua"
],
12 changes: 6 additions & 6 deletions chapters/03-counterfactuals.qmd
@@ -227,7 +227,7 @@ data_observed <- data |>
# change the exposure to randomized, generate from a binomial distribution
# with a probability 0.5 for being in either group
exposure = case_when(
- rbinom(10, 1, 0.5) == 1 ~ "chocolate",
+ rbinom(n(), 1, 0.5) == 1 ~ "chocolate",
TRUE ~ "vanilla"
),
observed_outcome = case_when(
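The main change in chapters/03-counterfactuals.qmd replaces the hardcoded draw count `10` in `rbinom()` with `n()`. A minimal sketch of why that matters, assuming dplyr is installed and using hypothetical `people` and `more_people` tibbles in place of the chapter's `data`: inside `mutate()`, `n()` returns the number of rows in the current data (or group), so the number of random draws always matches the data, while a hardcoded `10` only works when the data has exactly 10 rows.

```r
library(dplyr)

# Hypothetical stand-ins for the chapter's data: one table with 10 rows, one with 20
people <- tibble(id = 1:10)
more_people <- tibble(id = 1:20)

# Randomize exposure with one Bernoulli(0.5) draw per row, as in the updated chapter
randomize <- function(df) {
  df |>
    mutate(
      exposure = case_when(
        rbinom(n(), 1, 0.5) == 1 ~ "chocolate",
        TRUE ~ "vanilla"
      )
    )
}

set.seed(11)
randomized <- randomize(people)   # 10 draws for 10 rows
scaled <- randomize(more_people)  # 20 draws for 20 rows, no code change needed

# The old version always makes 10 draws, so mutate() rejects the
# 10-element result for a 20-row table and raises an error
hardcoded <- tryCatch(
  more_people |>
    mutate(
      exposure = case_when(
        rbinom(10, 1, 0.5) == 1 ~ "chocolate",
        TRUE ~ "vanilla"
      )
    ),
  error = function(e) e
)

inherits(hardcoded, "error") # TRUE
```

When the data has exactly 10 rows, both versions make the same number of draws, so `n()` mainly guards against the row count changing later.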
@@ -314,8 +314,8 @@ set.seed(11)
data_observed <- data |>
mutate(
exposure_unobserved = case_when(
- rbinom(10, 1, 0.25) == 1 ~ "chocolate (spoiled)",
- rbinom(10, 1, 0.25) == 1 ~ "chocolate",
+ rbinom(n(), 1, 0.25) == 1 ~ "chocolate (spoiled)",
+ rbinom(n(), 1, 0.25) == 1 ~ "chocolate",
TRUE ~ "vanilla"
),
observed_outcome = case_when(
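In the spoiled-chocolate hunk, `case_when()` checks its conditions in order and keeps the first one that is `TRUE` for each row, so the two independent `rbinom(n(), 1, 0.25)` draws do not give each chocolate category a 25% share. A minimal sketch, using a hypothetical 100,000-row tibble (not the chapter's data) so the proportions are stable: about 25% of rows should be spoiled chocolate, about 0.75 * 0.25 = 18.75% regular chocolate, and the remaining 0.75 * 0.75 = 56.25% vanilla. A large simulation like this only runs because the draw count comes from `n()`.

```r
library(dplyr)

set.seed(11)
# Hypothetical large table, used only to make the proportions stable
sim <- tibble(id = seq_len(100000)) |>
  mutate(
    exposure_unobserved = case_when(
      # checked first: about 25% of all rows
      rbinom(n(), 1, 0.25) == 1 ~ "chocolate (spoiled)",
      # only reached by the remaining 75%, so about 18.75% of all rows
      rbinom(n(), 1, 0.25) == 1 ~ "chocolate",
      # everything else: about 56.25% of all rows
      TRUE ~ "vanilla"
    )
  )

# empirical share of each exposure category
shares <- prop.table(table(sim$exposure_unobserved))
shares
```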
@@ -355,7 +355,7 @@ set.seed(11)
data_observed <- data |>
mutate(
exposure = case_when(
- rbinom(10, 1, 0.5) == 1 ~ "chocolate",
+ rbinom(n(), 1, 0.5) == 1 ~ "chocolate",
TRUE ~ "vanilla"
),
exposure_partner =
@@ -452,9 +452,9 @@ data_observed <- data |>
prefer_chocolate = y_chocolate > y_vanilla,
exposure = case_when(
# people who like chocolate more chose that 80% of the time
- prefer_chocolate ~ ifelse(rbinom(10, 1, 0.8), "chocolate", "vanilla"),
+ prefer_chocolate ~ ifelse(rbinom(n(), 1, 0.8), "chocolate", "vanilla"),
# people who like vanilla more chose that 80% of the time
- !prefer_chocolate ~ ifelse(rbinom(10, 1, 0.8), "vanilla", "chocolate")
+ !prefer_chocolate ~ ifelse(rbinom(n(), 1, 0.8), "vanilla", "chocolate")
),
observed_outcome = case_when(
exposure == "chocolate" ~ y_chocolate,
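The last hunk makes exposure depend on each person's preference, so exposure is no longer randomized. A minimal sketch with a hypothetical 10,000-row simulation (the `rnorm()` potential outcomes are assumptions, not the chapter's values) checks that about 80% of people end up with the flavor they prefer, which is the selection mechanism the comments in the hunk describe.

```r
library(dplyr)

set.seed(123)
# Hypothetical potential outcomes for 10,000 people (illustrative values only)
sim <- tibble(
  y_chocolate = rnorm(10000, mean = 5),
  y_vanilla = rnorm(10000, mean = 5)
) |>
  mutate(
    prefer_chocolate = y_chocolate > y_vanilla,
    exposure = case_when(
      # people who like chocolate more choose it 80% of the time
      prefer_chocolate ~ ifelse(rbinom(n(), 1, 0.8) == 1, "chocolate", "vanilla"),
      # people who like vanilla more choose it 80% of the time
      !prefer_chocolate ~ ifelse(rbinom(n(), 1, 0.8) == 1, "vanilla", "chocolate")
    ),
    # TRUE when the chosen flavor matches the preferred flavor
    chose_preferred = (exposure == "chocolate") == prefer_chocolate
  )

# share of people who chose the flavor they prefer; should be close to 0.8
mean(sim$chose_preferred)
```

Because each `case_when()` branch draws a full `n()`-length vector and keeps only the positions where its condition holds, every row gets exactly one 80/20 draw from the branch that matches its preference.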