diff --git a/.github/workflows/quarto.yaml b/.github/workflows/quarto.yaml index ef4c7249..8fa7e1e0 100644 --- a/.github/workflows/quarto.yaml +++ b/.github/workflows/quarto.yaml @@ -31,8 +31,6 @@ jobs: - name: Install Google Fonts run: | - brew tap homebrew/cask - brew tap homebrew/cask-fonts brew install font-open-sans - name: Query dependencies diff --git a/_freeze/chapters/03-counterfactuals/execute-results/html.json b/_freeze/chapters/03-counterfactuals/execute-results/html.json index 57e9c9d1..eda24059 100644 --- a/_freeze/chapters/03-counterfactuals/execute-results/html.json +++ b/_freeze/chapters/03-counterfactuals/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "4e09aee172d6a4be4bd0de6a0280898e", + "hash": "621b9d86b8f405d857ab549998bdb303", "result": { - "markdown": "# Estimating counterfactuals {#sec-counterfactuals}\n\n\n\n\n\n## Potential Outcomes {#sec-potential}\n\nLet's begin by thinking about the philosophical concept of a *potential outcome.* Prior to some \"cause\" occurring, for example receiving some exposure, the *potential outcomes* are all of the potential things that could occur depending on what you are exposed to.\nFor simplicity, let's assume an exposure has two levels:\n\n- $X=1$ if you are exposed\n\n- $X=0$ if you are not exposed\n\nUnder this simple scenario, there are two potential outcomes:\n\n- $Y(1)$ the potential outcome if you are exposed\n\n- $Y(0)$ the potential outcome if you are not exposed\n\nOnly *one* of these potential outcomes will actually be realized, the one corresponding to the exposure that actually occurred, and therefore only one is observable.\nIt is important to remember that these exposures are defined at a particular instance in time, so only one can happen to any individual.\nIn the case of a binary exposure, this leaves one potential outcome as *observable* and one *missing.* In fact, early causal inference methods were often framed as missing data problems; we need to make certain assumptions about 
the *missing counterfactuals*, the value of the potential outcome corresponding to the exposure(s) that did not occur.\n\nOur causal effect of interest is often some difference in potential outcomes $Y(1) - Y(0)$, averaged over a particular population.\n\n## Counterfactuals\n\nConceptually, the missing counterfactual outcome is one that would have occurred under a different set of circumstances.\nIn causal inference, we *wish* we could observe the counterfactual outcome that would have occurred in an alternate universe where the exposure status for a given observation was flipped.\nTo do this, we attempt to control for all factors that are related to an exposure and outcome such that we can *construct* (or estimate) such a counterfactual outcome.\n\nLet's think about a specific example.\nIce-T, best known as an American rapper and Fin on Law and Order: SVU, co-authored a book titled \"Split Decision: Life Stories\", published in 2022.\nHere is the synopsis:\n\n> **Award-winning actor, rapper, and producer Ice-T unveils a compelling memoir of his early life robbing jewelry stores until he found fame and fortune---while a handful of bad choices sent his former crime partner down an incredibly different path.**\\\n> \\\n> Ice-T rose to fame in the late 1980s, earning acclaim for his music before going on to enthrall television audiences as Odafin \"Fin\" Tutuola in *Law & Order: Special Victims Unit*.\n> But it could have gone much differently.\\\n>\n> \\\n> In this \"poignant and powerful\" (*Library Journal*, starred review) memoir, Ice-T and Spike, his former crime partner---collaborating with *New York Times* bestselling author Douglas Century---relate the shocking stories of their shared pasts, and how just a handful of decisions led to their incredibly different lives.\n> Both grew up in violent, gang-controlled Los Angeles neighborhoods and worked together to orchestrate a series of jewelry heists.\\\n>\n> \\\n> But while Ice-T was discovered rapping in a club 
and got his first record deal, Spike was caught for a jewelry robbery and did three years in prison.\n> As his music career began to take off, Ice made the decision to abandon the criminal life; Spike continued to plan increasingly ingenious and risky jewel heists.\n> And in 1992, after one of Spike's robberies ended tragically, he was sentenced to thirty-five years to life.\n> While he sat behind bars, he watched his former partner rise to fame in music, movies, and television.\\\n>\n> \\\n> \"Propulsive\" (*Publishers Weekly*, starred review), timely, and thoughtful, two men with two very different lives reveal how their paths might have very well been reversed if they made different choices.\n> All it took was a *split decision*.\n> [@split]\n\nThis premise is compelling because it implies that we are observing a *counterfactual*.\nThe book begins by setting up all the ways Ice-T and his friend Spike were similar prior to some important moment (both grew up in Los Angeles neighborhoods, both were involved with gangs, both worked together to orchestrate a series of jewelry heists, etc).\nThen something happens -- Ice-T makes a decision to abandon criminal life and Spike makes the opposite decision.\nWhat happens next for Ice-T includes fame and fortune, while Spike ends up with 35 years to life in prison.\nThis book is attempting a small study, two people who prior to some event were the same and after were different -- Spike's outcomes serve as the counterfactual to Ice-T's.\n\n::: {#tbl-causal-map layout-ncol=\"1\"}\n\n```{mermaid}\n%%| echo: false\nflowchart LR\nA{Ice-T} --> |observed| B(Abandons criminal life)\nA -.-> |missing counterfactual| C(Does one more heist)\nC -.-> D[35 years in prison]\nB --> E[Fame & Fortune]\n\nclassDef grey fill:#ddd\nclass D,C grey\n```\n\n```{mermaid}\n%%| echo: false\nflowchart LR\nA{Spike} -.-> |missing counterfactual| B(Abandons criminal life)\nA --> |observed| C(Does one more heist)\nC --> D[35 years in prison]\nB -.-> 
E[Fame & Fortune]\nclassDef grey fill:#ddd\nclass E,B grey\n```\n\n\nIce-T and Spike Causal Map\n:::\n\nIn practice, this is what we attempt to do with causal inference techniques.\nEven randomized trials are limited to a single factual world, so we compare the average effects of groups with different exposures.\nNow that we have this concrete example of an attempt to construct a counterfactual scenario in the \"real world\", there are several issues that we can immediately see, highlighting the difficulty in drawing such inference.\nFirst, while the synopsis implies that the two individuals were similar prior to the precipitating event that dictated their future opposite directions, we can easily identify factors in which perhaps they differed.\nIce-T decided to leave his life of crime, but that wasn't the only factor in his success: he had enough musical talent to make a career of it.\nDid Spike have Ice-T's musical talent?\nCan we really conclude that his life would have turned out exactly like Ice-T's if he had made the exact same choices as Ice-T?\nIf we want to truly estimate the causal effect of the decision to leave criminal life on Ice-T's future outcomes, we would need to observe his ultimate course both under making the decision and not.\nOf course this is not possible, so what can we do?\nPerhaps we can find someone else who is exactly like Ice-T who did not make the same decision and see how they fare.\nOf course, Ice-T is unique; it would be challenging to find someone exactly like him.\nAgain, this is attempted with Spike, and even so presents challenges.\nOften, instead of relying on a single individual, we rely on many individuals.\nWe could conduct an experiment where we *randomize* many individuals to leave criminal life (or not) and see how this impacts their outcomes *on average* (this randomized trial seems to present some ethical issues; perhaps we need to look to *observational* studies to help answer this question).\nIn any case, we must 
rely on statistical techniques to help construct these unobservable counterfactuals.\n\n### Potential Outcomes Simulation {#sec-po-sim}\n\nLet's suppose some happiness index, from 1-10, exists.\nWe are interested in assessing whether eating chocolate ice cream versus vanilla will increase happiness.\nWe have 10 individuals with two potential outcomes for each: one is what their happiness would be if they ate chocolate ice cream (defined as `y_chocolate` in the code below), and one is what their happiness would be if they ate vanilla ice cream (defined as `y_vanilla` in the code below). We can define the true causal effect of eating chocolate ice cream (versus vanilla) on happiness for each individual as the difference between the two (@tbl-po).\n\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\ndata <- data.frame(\n  id = 1:10,\n  y_chocolate = c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6),\n  y_vanilla = c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)\n)\n\ndata <- data |>\n  mutate(causal_effect = y_chocolate - y_vanilla)\n\ndata\n```\n:::\n\n::: {#tbl-po .cell tbl-cap='Potential Outcomes Simulation: The causal effect of eating chocolate (versus vanilla) ice cream on happiness'}\n::: {.cell-output-display}\n`````{=html}\n
Potential Outcomes
Causal Effect
id $$Y_i(\\textrm{chocolate})$$ $$Y_i(\\textrm{vanilla})$$ $$Y_i(\\textrm{chocolate}) - Y_i(\\textrm{vanilla})$$
1 4 1 3
2 4 3 1
3 6 4 2
4 5 5 0
5 6 5 1
6 5 6 -1
7 6 8 -2
8 7 6 1
9 5 3 2
10 6 5 1
\n\n`````\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\ndata |>\n  summarize(\n    avg_chocolate = mean(y_chocolate),\n    avg_vanilla = mean(y_vanilla),\n    avg_causal_effect = mean(causal_effect)\n  )\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n  avg_chocolate avg_vanilla avg_causal_effect\n1           5.4         4.6               0.8\n```\n\n\n:::\n:::\n\n\nFor example, examining @tbl-po, the causal effect of eating chocolate ice cream (versus vanilla) for individual `4` is 0, whereas the causal effect for individual `9` is 2. The *average* potential happiness after eating chocolate is 5.4 and the *average* potential happiness after eating vanilla is 4.6. The *average* treatment effect of eating chocolate (versus vanilla) ice cream among the ten individuals in this study is 0.8. \n\nIn reality, we cannot observe both potential outcomes; at any moment in time, each individual in our study can only eat *one* flavor of ice cream. Suppose we let our participants choose which ice cream they wanted to eat and each chose their favorite (i.e., they knew which would make them \"happier\" and picked that one). 
Now what we *observe* is shown in @tbl-obs.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_observed <- data |>\n mutate(\n exposure = case_when(\n # people who like chocolate more chose that\n y_chocolate > y_vanilla ~ \"chocolate\",\n # people who like vanilla more chose that\n y_vanilla >= y_chocolate ~ \"vanilla\"\n ),\n observed_outcome = case_when(\n exposure == \"chocolate\" ~ y_chocolate,\n exposure == \"vanilla\" ~ y_vanilla\n )\n ) |>\n # we can only observe the exposure and one potential outcome\n select(id, exposure, observed_outcome)\ndata_observed\n```\n:::\n\n::: {#tbl-obs .cell tbl-cap='Potential Outcomes Simulation: The observed exposure and outcome used to estimate the effect of eating chocolate (versus vanilla) ice cream on happiness'}\n::: {.cell-output-display}\n`````{=html}\n\n \n\n\n\n\n\n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
Exposure
Observed Outcome
id $$X_i$$ $$Y_i$$
1 chocolate 4
2 chocolate 4
3 chocolate 6
4 vanilla 5
5 chocolate 6
6 vanilla 6
7 vanilla 8
8 chocolate 7
9 chocolate 5
10 chocolate 6
\n\n`````\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_observed |>\n  group_by(exposure) |>\n  summarise(avg_outcome = mean(observed_outcome))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n  exposure  avg_outcome\n  \n1 chocolate        5.43\n2 vanilla          6.33\n```\n\n\n:::\n:::\n\n\nNow, the *observed* average outcome among those who ate chocolate ice cream is 5.4 (coincidentally, almost the same as the true average potential outcome), while the *observed* average outcome among those who ate vanilla is 6.3 -- quite different from the *actual* average (4.6). The estimated causal effect here could be calculated as 5.4 - 6.3 = -0.9. \n\nIt turns out that these 10 participants *chose* which ice cream they wanted to eat and they always chose to eat their favorite! This artificially made it look like eating vanilla ice cream would increase happiness in this population when in fact we know the opposite is true. The next section will discuss which assumptions need to be true in order to allow us to *accurately* estimate causal effects using observed data. 
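To see exactly where the self-selection goes wrong, one option is to average *both* potential outcomes within each choice group. This is a sketch reusing the simulated `data` and the flavor-choice rule from above (it assumes `dplyr` is loaded, as elsewhere in this chapter, and peeks at the unobservable potential outcomes, which only a simulation lets us do):

```r
library(dplyr)

data <- data.frame(
  id = 1:10,
  y_chocolate = c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6),
  y_vanilla = c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)
)

# group by the flavor each participant would choose, then average *both*
# potential outcomes within each group
data |>
  mutate(exposure = if_else(y_chocolate > y_vanilla, "chocolate", "vanilla")) |>
  group_by(exposure) |>
  summarise(
    avg_y_chocolate = mean(y_chocolate),
    avg_y_vanilla = mean(y_vanilla)
  )
```

The vanilla-choosers average 6.33 under vanilla, but the chocolate-choosers would have averaged only about 3.86 had they eaten vanilla, so neither group is a good stand-in for the other's counterfactual.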
As a sneak peek: our issue here was how the exposure was decided. If instead we *randomized* who ate chocolate versus vanilla ice cream, we would (on average, with a large enough sample) recover the true causal effect.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## we are doing something *random* so let's set a seed so we always observe the\n## same result each time we run the code\nset.seed(11)\ndata_observed <- data |>\n  mutate(\n    # change the exposure to randomized, generate from a binomial distribution\n    # with a probability 0.5 for being in either group\n    exposure = case_when(\n      rbinom(10, 1, 0.5) == 1 ~ \"chocolate\",\n      TRUE ~ \"vanilla\"\n    ),\n    observed_outcome = case_when(\n      exposure == \"chocolate\" ~ y_chocolate,\n      exposure == \"vanilla\" ~ y_vanilla\n    )\n  ) |>\n  # we can only observe the exposure and one potential outcome\n  select(id, exposure, observed_outcome)\ndata_observed |>\n  group_by(exposure) |>\n  summarise(avg_outcome = mean(observed_outcome))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n  exposure  avg_outcome\n  \n1 chocolate        5.33\n2 vanilla          4.71\n```\n\n\n:::\n:::\n\n\n## Causal Assumptions {#sec-assump}\n\nLike most statistical approaches, the validity of a causal analysis depends on how well certain assumptions are met.\nAs mentioned in @sec-potential, the potential outcomes framework envisions that each individual possesses a range of potential outcomes for every conceivable value of some input.\nFor instance, as in the scenario previously described with two exposure levels (exposed: 1 and unexposed: 0), we can define potential outcomes for exposure ($Y(1)$) and no exposure ($Y(0)$), and subsequently analyze the difference between these outcomes, i.e., $Y(1) - Y(0)$, to comprehend the impact of the input (the exposure) on the outcome, $Y$.\nAt any given time, only one of these *potential outcomes* is observable -- namely, the outcome tied to the actual exposure the individual underwent.\nUnder certain assumptions, 
we can leverage data from individuals exposed to different inputs to compare the average differences in their observed outcomes.\nThe most common assumptions across the approaches we describe in this book are:\n\n1.  **Consistency**: We assume that the causal question you claim you are answering is consistent with the one you are *actually* answering with your analysis. Mathematically, this means that $Y_{obs} = X Y(1) + (1 - X) Y(0)$; in other words, the outcome you observe is exactly equal to the potential outcome under the exposure you received. Two common ways to discuss this assumption are: \n    * **Well defined exposure**: We assume that for each value of the exposure, there is no difference between subjects in the delivery of that exposure.\nPut another way, multiple versions of the treatment do not exist. \n    * **No interference**: We assume that the outcome (technically all *potential* outcomes, regardless of whether they are observed) for any subject does not depend on another subject's exposure.\n  \n::: callout-tip\n## Jargon\n\nAssumption 1 is sometimes referred to as the *stable-unit-treatment-value assumption* or SUTVA [@imbens2015causal].\nLikewise, these assumptions are sometimes referred to as *identifiability conditions* since we need them to hold in order to identify causal estimates.\n:::\n\n2.  **Exchangeability**: We assume that within levels of relevant variables (confounders), exposed and unexposed subjects have an equal likelihood of experiencing any outcome prior to exposure; i.e. the exposed and unexposed subjects are exchangeable.\nThis assumption is sometimes referred to as **no unmeasured confounding**.\n\n3.  
**Positivity**: We assume that within each level and combination of the study variables used to achieve exchangeability, there are exposed and unexposed subjects.\nSaid differently, each individual has some chance of experiencing every available exposure level.\nSometimes this is referred to as the **probabilistic** assumption.\n\n\n\n\n::: callout-note\n## Apples-to-apples\n\nPractically, most of the assumptions we need to make for causal inference are so we can make an *apples-to-apples* comparison: we want to make sure we're comparing individuals that are similar --- who would serve as good proxies for each other's counterfactuals. \n\nThe phrase *apples-to-apples* stems from the saying \"comparing apples to oranges\", i.e., comparing two things that are incomparable. \n\nThat's only one way to say it. [There are a lot of variations worldwide](https://en.wikipedia.org/wiki/Apples_and_oranges). Here are some other things people incorrectly compare:\n\n* Cheese and chalk (UK English)\n* Apples and pears (German)\n* Potatoes and sweet potatoes (Latin American Spanish)\n* Grandmothers and toads (Serbian)\n* Horses and donkeys (Hindi)\n:::\n\n### Causal Assumptions Simulation\n\nLet's bring back our simulation from @sec-po-sim. Recall that we have individuals who will either eat chocolate or vanilla ice cream and we are interested in assessing the causal effect of this exposure on their happiness. Let's see how violations of each assumption may impact the estimation of the causal effect.\n\n#### Consistency violation\n\nTwo ways the consistency assumption can be violated are (1) lack of a well defined exposure and (2) interference. Let's see how these impact our ability to accurately estimate a causal effect.\n\n##### Well defined exposure\n\nSuppose that there were in fact two containers of chocolate ice cream, one of which was spoiled. 
Now, having an exposure of \"chocolate\" could mean different things depending on where the individual's scoop came from (regular chocolate ice cream, or spoiled chocolate ice cream), yet we are lumping them all together under a single umbrella (hence the violation: we have \"multiple versions of treatment\"). You can see how this falls under consistency because the issue here is that the potential outcome we think we are estimating is not the one we are actually observing.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata <- data.frame(\n  id = 1:10,\n  y_spoiledchocolate = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),\n  y_chocolate = c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6),\n  y_vanilla = c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)\n) |>\n  mutate(causal_effect = y_chocolate - y_vanilla)\n\nset.seed(11)\ndata_observed <- data |>\n  mutate(\n    exposure_unobserved = case_when(\n      rbinom(10, 1, 0.25) == 1 ~ \"chocolate (spoiled)\",\n      rbinom(10, 1, 0.25) == 1 ~ \"chocolate\",\n      TRUE ~ \"vanilla\"\n    ),\n    observed_outcome = case_when(\n      exposure_unobserved == \"chocolate (spoiled)\" ~ y_spoiledchocolate,\n      exposure_unobserved == \"chocolate\" ~ y_chocolate,\n      exposure_unobserved == \"vanilla\" ~ y_vanilla\n    ),\n    exposure = case_when(\n      exposure_unobserved %in% c(\"chocolate (spoiled)\", \"chocolate\") ~ \"chocolate\",\n      exposure_unobserved == \"vanilla\" ~ \"vanilla\"\n    )\n  ) |>\n  select(id, exposure, observed_outcome)\n\ndata_observed |>\n  group_by(exposure) |>\n  summarise(avg_outcome = mean(observed_outcome))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n  exposure  avg_outcome\n  \n1 chocolate        2.75\n2 vanilla          4.67\n```\n\n\n:::\n:::\n\n\nWe know the *true* average causal effect of (unspoiled) chocolate in the sample is 0.8; however, our estimated causal effect (because our data are not consistent with the question we are asking) is -1.9. 
This demonstrates what can go wrong when *well defined exposure* is violated.\n\n##### Interference \n\nInterference would mean that an individual's exposure impacts another's potential outcome. For example, let's say each individual has a partner, and their potential outcome depends on both what flavor of ice cream they ate *and* what flavor their partner ate. In the simulation below, having a partner who received a different flavor of ice cream increases happiness by two units.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata <- data.frame(\n  id = 1:10,\n  partner_id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),\n  y_chocolate_chocolate = c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6),\n  y_chocolate_vanilla = c(6, 6, 8, 7, 8, 7, 8, 9, 7, 8),\n  y_vanilla_chocolate = c(3, 5, 6, 7, 7, 8, 10, 8, 5, 7),\n  y_vanilla_vanilla = c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)\n)\n\nset.seed(11)\ndata_observed <- data |>\n  mutate(\n    exposure = case_when(\n      rbinom(10, 1, 0.5) == 1 ~ \"chocolate\",\n      TRUE ~ \"vanilla\"\n    ),\n    exposure_partner =\n      c(\"vanilla\", \"vanilla\", \"vanilla\", \"chocolate\", \"chocolate\", \"vanilla\", \"vanilla\", \"vanilla\", \"vanilla\", \"chocolate\"),\n    observed_outcome = case_when(\n      exposure == \"chocolate\" & exposure_partner == \"chocolate\" ~ y_chocolate_chocolate,\n      exposure == \"chocolate\" & exposure_partner == \"vanilla\" ~ y_chocolate_vanilla,\n      exposure == \"vanilla\" & exposure_partner == \"chocolate\" ~ y_vanilla_chocolate,\n      exposure == \"vanilla\" & exposure_partner == \"vanilla\" ~ y_vanilla_vanilla\n    )\n  ) |>\n  select(id, exposure, observed_outcome)\n\ndata_observed |>\n  group_by(exposure) |>\n  summarise(avg_outcome = mean(observed_outcome))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n  exposure  avg_outcome\n  \n1 chocolate        7.33\n2 vanilla          5.57\n```\n\n\n:::\n:::\n\nNow our estimated causal effect (because interference exists) is 1.8. This demonstrates what can go wrong when *interference* occurs. 
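For reference, one natural target effect here is what would happen if *everyone* (both partners) ate chocolate versus if *everyone* ate vanilla, in which case only the `y_chocolate_chocolate` and `y_vanilla_vanilla` potential outcomes are relevant. A sketch using the `data` frame defined just above (again assuming `dplyr` is loaded):

```r
library(dplyr)

data <- data.frame(
  id = 1:10,
  y_chocolate_chocolate = c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6),
  y_vanilla_vanilla = c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)
)

# the average effect of everyone eating chocolate versus everyone eating vanilla
data |>
  summarise(avg_causal_effect = mean(y_chocolate_chocolate - y_vanilla_vanilla))
```

Under this target, the true average causal effect is still 0.8, so the naive estimate of 1.8 is badly biased by interference.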
One of the main ways to combat interference is to change the *unit* under consideration. Here, each individual, each unique *id*, is considered a unit, and there is interference between units (i.e. between partners). If instead we consider each *partner pair* as a unit and randomize the pairs rather than the individuals, we solve the interference issue, as there is no interference *between* different partner sets. This is sometimes referred to as a *cluster randomized trial*. What we decide to do within each cluster may depend on the causal question at hand. For example, if we want to know what would happen if *everyone* ate chocolate ice cream versus if *everyone* ate vanilla, we would want to randomize both partners to either chocolate or vanilla, as seen below.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(11)\n\n## we are now randomizing the *partners* not the individuals\npartners <- data.frame(\n  partner_id = 1:5,\n  exposure = case_when(\n    rbinom(5, 1, 0.5) == 1 ~ \"chocolate\",\n    TRUE ~ \"vanilla\"\n  )\n)\ndata_observed <- data |>\n  left_join(partners, by = \"partner_id\") |>\n  mutate(\n    # all partners have the same exposure\n    exposure_partner = exposure,\n    observed_outcome = case_when(\n      exposure == \"chocolate\" & exposure_partner == \"chocolate\" ~ y_chocolate_chocolate,\n      exposure == \"vanilla\" & exposure_partner == \"vanilla\" ~ y_vanilla_vanilla\n    )\n  ) |>\n  select(id, exposure, observed_outcome)\n\ndata_observed |>\n  group_by(exposure) |>\n  summarise(avg_outcome = mean(observed_outcome))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n  exposure  avg_outcome\n  \n1 chocolate        5.5 \n2 vanilla          4.38\n```\n\n\n:::\n:::\n\n\n#### Exchangeability violation\n\nWe have actually already seen an example of an exchangeability violation in @sec-po-sim. 
In that example, participants were able to choose the ice cream that they wanted to eat, so people who were more likely to have a positive effect from eating chocolate chose that, and those more likely to have a positive effect from eating vanilla chose that. \n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata <- data.frame(\n  id = 1:10,\n  y_chocolate = c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6),\n  y_vanilla = c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)\n)\ndata_observed <- data |>\n  mutate(\n    exposure = case_when(\n      # people who like chocolate more chose that\n      y_chocolate > y_vanilla ~ \"chocolate\",\n      # people who like vanilla more chose that\n      y_vanilla >= y_chocolate ~ \"vanilla\"\n    ),\n    observed_outcome = case_when(\n      exposure == \"chocolate\" ~ y_chocolate,\n      exposure == \"vanilla\" ~ y_vanilla\n    )\n  ) |>\n  select(id, exposure, observed_outcome)\n\ndata_observed |>\n  group_by(exposure) |>\n  summarise(avg_outcome = mean(observed_outcome))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n  exposure  avg_outcome\n  \n1 chocolate        5.43\n2 vanilla          6.33\n```\n\n\n:::\n:::\n\n\nHow could we correct this? If we had some people who preferred chocolate ice cream but ended up taking vanilla instead, we could *adjust* for the preference, and the effect conditioned on this would no longer have an exchangeability issue. It turns out that this example, as we have constructed it, doesn't lend itself to this solution because participants chose their preferred flavor 100% of the time, making this *also* a positivity violation. 
Let's say instead of everyone picking their flavor of preference 100% of the time, they just had an 80% chance of picking that flavor.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata <- data.frame(\n  id = 1:10,\n  y_chocolate = c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6),\n  y_vanilla = c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)\n)\n\nset.seed(11)\ndata_observed <- data |>\n  mutate(\n    prefer_chocolate = y_chocolate > y_vanilla,\n    exposure = case_when(\n      # people who like chocolate more chose that 80% of the time\n      prefer_chocolate ~ ifelse(rbinom(10, 1, 0.8), \"chocolate\", \"vanilla\"),\n      # people who like vanilla more chose that 80% of the time\n      !prefer_chocolate ~ ifelse(rbinom(10, 1, 0.8), \"vanilla\", \"chocolate\")\n    ),\n    observed_outcome = case_when(\n      exposure == \"chocolate\" ~ y_chocolate,\n      exposure == \"vanilla\" ~ y_vanilla\n    )\n  ) |>\n  select(id, prefer_chocolate, exposure, observed_outcome)\n\nlm(\n  observed_outcome ~ I(exposure == \"chocolate\") + prefer_chocolate,\n  data_observed\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n\nCall:\nlm(formula = observed_outcome ~ I(exposure == \"chocolate\") + \n    prefer_chocolate, data = data_observed)\n\nCoefficients:\n                   (Intercept)  \n                         6.156  \nI(exposure == \"chocolate\")TRUE  \n                         0.531  \n          prefer_chocolateTRUE  \n                        -1.469  \n```\n\n\n:::\n:::\n\nAfter *adjusting* for this variable (chocolate preference), we approximately recover the correct causal effect. 
This value is not exactly the same as the truth we obtain with the (unobservable) potential outcomes because we are dealing with a small sample -- as our sample size increases, this will get closer to the truth.\n\nCausal assumptions can be difficult to verify and may not hold for many data collection strategies.\nWe cannot overstate the importance of checking these criteria to the extent possible!\nFollowing the recipes in this book is unlikely to give valid answers if the causal assumptions are badly violated.\n", + "markdown": "# Estimating counterfactuals {#sec-counterfactuals}\n\n\n\n\n\n## Potential Outcomes {#sec-potential}\n\nLet's begin by thinking about the philosophical concept of a *potential outcome.* Prior to some \"cause\" occurring, for example receiving some exposure, the *potential outcomes* are all of the potential things that could occur depending on what you are exposed to.\nFor simplicity, let's assume an exposure has two levels:\n\n- $X=1$ if you are exposed\n\n- $X=0$ if you are not exposed\n\nUnder this simple scenario, there are two potential outcomes:\n\n- $Y(1)$ the potential outcome if you are exposed\n\n- $Y(0)$ the potential outcome if you are not exposed\n\nOnly *one* of these potential outcomes will actually be realized, the one corresponding to the exposure that actually occurred, and therefore only one is observable.\nIt is important to remember that these exposures are defined at a particular instance in time, so only one can happen to any individual.\nIn the case of a binary exposure, this leaves one potential outcome as *observable* and one *missing.* In fact, early causal inference methods were often framed as missing data problems; we need to make certain assumptions about the *missing counterfactuals*, the value of the potential outcome corresponding to the exposure(s) that did not occur.\n\nOur causal effect of interest is often some difference in potential outcomes $Y(1) - Y(0)$, averaged over a particular 
population.\n\n## Counterfactuals\n\nConceptually, the missing counterfactual outcome is one that would have occurred under a different set of circumstances.\nIn causal inference, we *wish* we could observe the counterfactual outcome that would have occurred in an alternate universe where the exposure status for a given observation was flipped.\nTo do this, we attempt to control for all factors that are related to an exposure and outcome such that we can *construct* (or estimate) such a counterfactual outcome.\n\nLet's think about a specific example.\nIce-T, best known as an American rapper and Fin on Law and Order: SVU, co-authored a book titled \"Split Decision: Life Stories\", published in 2022.\nHere is the synopsis:\n\n> **Award-winning actor, rapper, and producer Ice-T unveils a compelling memoir of his early life robbing jewelry stores until he found fame and fortune---while a handful of bad choices sent his former crime partner down an incredibly different path.**\\\n> \\\n> Ice-T rose to fame in the late 1980s, earning acclaim for his music before going on to enthrall television audiences as Odafin \"Fin\" Tutuola in *Law & Order: Special Victims Unit*.\n> But it could have gone much differently.\\\n>\n> \\\n> In this \"poignant and powerful\" (*Library Journal*, starred review) memoir, Ice-T and Spike, his former crime partner---collaborating with *New York Times* bestselling author Douglas Century---relate the shocking stories of their shared pasts, and how just a handful of decisions led to their incredibly different lives.\n> Both grew up in violent, gang-controlled Los Angeles neighborhoods and worked together to orchestrate a series of jewelry heists.\\\n>\n> \\\n> But while Ice-T was discovered rapping in a club and got his first record deal, Spike was caught for a jewelry robbery and did three years in prison.\n> As his music career began to take off, Ice made the decision to abandon the criminal life; Spike continued to plan increasingly ingenious 
and risky jewel heists.\n> And in 1992, after one of Spike's robberies ended tragically, he was sentenced to thirty-five years to life.\n> While he sat behind bars, he watched his former partner rise to fame in music, movies, and television.\\\n>\n> \\\n> \"Propulsive\" (*Publishers Weekly*, starred review), timely, and thoughtful, two men with two very different lives reveal how their paths might have very well been reversed if they made different choices.\n> All it took was a *split decision*.\n> [@split]\n\nThis premise is compelling because it implies that we are observing a *counterfactual*.\nThe book begins by setting up all the ways Ice-T and his friend Spike were similar prior to some important moment (both grew up in Los Angeles neighborhoods, both were involved with gangs, both worked together to orchestrate a series of jewelry heists, etc).\nThen something happens -- Ice-T makes a decision to abandon criminal life and Spike makes the opposite decision.\nWhat happens next for Ice-T includes fame and fortune, while Spike ends up with 35 years to life in prison.\nThis book is attempting a small study, two people who prior to some event were the same and after were different -- Spike's outcomes serve as the counterfactual to Ice-T's.\n\n::: {#tbl-causal-map layout-ncol=\"1\"}\n\n```{mermaid}\n%%| echo: false\nflowchart LR\nA{Ice-T} --> |observed| B(Abandons criminal life)\nA -.-> |missing counterfactual| C(Does one more heist)\nC -.-> D[35 years in prison]\nB --> E[Fame & Fortune]\n\nclassDef grey fill:#ddd\nclass D,C grey\n```\n\n```{mermaid}\n%%| echo: false\nflowchart LR\nA{Spike} -.-> |missing counterfactual| B(Abandons criminal life)\nA --> |observed| C(Does one more heist)\nC --> D[35 years in prison]\nB -.-> E[Fame & Fortune]\nclassDef grey fill:#ddd\nclass E,B grey\n```\n\n\nIce-T and Spike Causal Map\n:::\n\nIn practice, this is what we attempt to do with causal inference techniques.\nEven randomized trials are limited to a single factual world, so 
we compare the average effects of groups with different exposures.\nNow, with this concrete example of an attempt to construct a counterfactual scenario in the \"real world,\" there are several issues that we can immediately see, highlighting the difficulty of drawing such an inference.\nFirst, while the synopsis implies that the two individuals were similar prior to the precipitating event that dictated their future opposite directions, we can easily identify factors in which perhaps they differed.\nIce-T decided to leave his life of crime, but that wasn't the only factor in his success: he had enough musical talent to make a career of it.\nDid Spike have Ice-T's musical talent?\nCan we really conclude that his life would have turned out exactly like Ice-T's if he had made the exact same choices as Ice-T?\nIf we want to truly estimate the causal effect of the decision to leave criminal life on Ice-T's future outcomes, we would need to observe his ultimate course both under making the decision and not.\nOf course, this is not possible, so what can we do?\nPerhaps we can find someone else who is exactly like Ice-T but who did not make the same decision and see how they fare.\nOf course, Ice-T is unique; it would be challenging to find someone exactly like him.\nAgain, this is attempted with Spike, and even so presents challenges.\nOften, instead of relying on a single individual, we rely on many individuals.\nWe could conduct an experiment where we *randomize* many individuals to leave criminal life (or not) and see how this impacts their outcomes *on average* (this randomized trial seems to present some ethical issues; perhaps we need to look to *observational* studies to help answer this question).\nIn any case, we must rely on statistical techniques to help construct these unobservable counterfactuals.\n\n### Potential Outcomes Simulation {#sec-po-sim}\n\nLet's suppose some happiness index, from 1-10, exists.\nWe are interested in assessing whether eating chocolate 
ice cream versus vanilla will increase happiness.\nWe have 10 individuals with two potential outcomes for each, one is what their happiness would be if they ate chocolate ice cream, (defined as `y_chocolate` in the code below), and one is what their happiness would be if they ate vanilla ice cream (defined as `y_vanilla` in the code below). We can define the true causal effect of eating chocolate ice cream (versus vanilla) on happiness for each individual as the difference between the two (@tbl-po).\n\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\ndata <- data.frame(\n id = 1:10,\n y_chocolate = c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6),\n y_vanilla = c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)\n)\n\ndata <- data |>\n mutate(causal_effect = y_chocolate - y_vanilla)\n\ndata\n```\n:::\n\n::: {#tbl-po .cell tbl-cap='Potential Outcomes Simulation: The causal effect of eating chocolate (versus vanilla) ice cream on happiness'}\n::: {.cell-output-display}\n`````{=html}\n\n \n\n\n\n\n\n \n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
Potential Outcomes
Causal Effect
id $$Y_i(\\textrm{chocolate})$$ $$Y_i(\\textrm{vanilla})$$ $$Y_i(\\textrm{chocolate}) - Y_i(\\textrm{vanilla})$$
1 4 1 3
2 4 3 1
3 6 4 2
4 5 5 0
5 6 5 1
6 5 6 -1
7 6 8 -2
8 7 6 1
9 5 3 2
10 6 5 1
\n\n`````\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\ndata |>\n summarize(\n avg_chocolate = mean(y_chocolate),\n avg_vanilla = mean(y_vanilla),\n avg_causal_effect = mean(causal_effect)\n )\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n avg_chocolate avg_vanilla avg_causal_effect\n1 5.4 4.6 0.8\n```\n\n\n:::\n:::\n\n\nFor example, examining @tbl-po, the causal effect of eating chocolate ice cream (versus vanilla) for individual `4` is 0, whereas the causal effect for individual `9` is 2. The *average* potential happiness after eating chocolate is 5.4 and the *average* potential happiness after eating vanilla is 4.6. The *average* treatment effect of eating chocolate (versus vanilla) ice cream among the ten individuals in this study is 0.8. \n\nIn reality, we cannot observe both potential outcomes: at any moment in time, each individual in our study can only eat *one* flavor of ice cream. Suppose we let our participants choose which ice cream they wanted to eat and each chose their favorite (i.e., they knew which would make them \"happier\" and picked that one). 
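\n\nAs a quick sketch of this missing-data framing (re-using the simulated `data` from above, with dplyr loaded as elsewhere in this chapter), once each participant picks a flavor, the other potential outcome is effectively missing:\n\n::: {.cell}\n\n```{.r .cell-code}\n# mask the potential outcome that goes unobserved for each individual\ndata |>\n mutate(\n exposure = ifelse(y_chocolate > y_vanilla, \"chocolate\", \"vanilla\"),\n y_chocolate = ifelse(exposure == \"chocolate\", y_chocolate, NA),\n y_vanilla = ifelse(exposure == \"vanilla\", y_vanilla, NA)\n ) |>\n select(id, exposure, y_chocolate, y_vanilla)\n```\n:::\n\n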
Now what we *observe* is shown in @tbl-obs.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_observed <- data |>\n mutate(\n exposure = case_when(\n # people who like chocolate more chose that\n y_chocolate > y_vanilla ~ \"chocolate\",\n # people who like vanilla more chose that\n y_vanilla >= y_chocolate ~ \"vanilla\"\n ),\n observed_outcome = case_when(\n exposure == \"chocolate\" ~ y_chocolate,\n exposure == \"vanilla\" ~ y_vanilla\n )\n ) |>\n # we can only observe the exposure and one potential outcome\n select(id, exposure, observed_outcome)\ndata_observed\n```\n:::\n\n::: {#tbl-obs .cell tbl-cap='Potential Outcomes Simulation: The observed exposure and outcome used to estimate the effect of eating chocolate (versus vanilla) ice cream on happiness'}\n::: {.cell-output-display}\n`````{=html}\n\n \n\n\n\n\n\n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
Exposure
Observed Outcome
id $$X_i$$ $$Y_i$$
1 chocolate 4
2 chocolate 4
3 chocolate 6
4 vanilla 5
5 chocolate 6
6 vanilla 6
7 vanilla 8
8 chocolate 7
9 chocolate 5
10 chocolate 6
\n\n`````\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_observed |>\n group_by(exposure) |>\n summarise(avg_outcome = mean(observed_outcome))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n exposure avg_outcome\n \n1 chocolate 5.43\n2 vanilla 6.33\n```\n\n\n:::\n:::\n\n\nNow, the *observed* average outcome among those who ate chocolate ice cream is 5.4 (the same as the true average potential outcome), while the *observed* average outcome among those who ate vanilla is 6.3 -- quite different from the *actual* average (4.6). The estimated causal effect here could be calculated as 5.4 - 6.3 = -0.9. \n\nIt turns out that these 10 participants *chose* which ice cream they wanted to eat and they always chose to eat their favorite! This artificially made it look like eating vanilla ice cream would increase happiness in this population when, in fact, we know the opposite is true. The next section will discuss which assumptions need to be true in order to allow us to *accurately* estimate causal effects using observed data. 
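\n\nTo see the size of this distortion directly, here is a small sketch (re-using the simulated potential outcomes, which we could never see in practice) comparing the naive observed contrast with the true average treatment effect:\n\n::: {.cell}\n\n```{.r .cell-code}\n# naive contrast among the self-selected groups vs. the true average effect\nchose_chocolate <- data$y_chocolate > data$y_vanilla\nnaive_contrast <- mean(data$y_chocolate[chose_chocolate]) -\n mean(data$y_vanilla[!chose_chocolate])\ntrue_effect <- mean(data$y_chocolate - data$y_vanilla)\nc(naive = naive_contrast, truth = true_effect)\n# the naive contrast is about -0.9, while the truth is 0.8\n```\n:::\n\n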
As a sneak peek, our issue here was how the exposure was decided: if instead we *randomized* who ate chocolate versus vanilla ice cream, we would (on average, with a large enough sample) recover the true causal effect.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## we are doing something *random* so let's set a seed so we always observe the\n## same result each time we run the code\nset.seed(11)\ndata_observed <- data |>\n mutate(\n # change the exposure to randomized, generate from a binomial distribution\n # with a probability 0.5 for being in either group\n exposure = case_when(\n rbinom(n(), 1, 0.5) == 1 ~ \"chocolate\",\n TRUE ~ \"vanilla\"\n ),\n observed_outcome = case_when(\n exposure == \"chocolate\" ~ y_chocolate,\n exposure == \"vanilla\" ~ y_vanilla\n )\n ) |>\n # we can only observe the exposure and one potential outcome\n select(id, exposure, observed_outcome)\ndata_observed |>\n group_by(exposure) |>\n summarise(avg_outcome = mean(observed_outcome))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n exposure avg_outcome\n \n1 chocolate 5.33\n2 vanilla 4.71\n```\n\n\n:::\n:::\n\n\n## Causal Assumptions {#sec-assump}\n\nLike most statistical approaches, the validity of a causal analysis depends on how well certain assumptions are met.\nAs mentioned in @sec-potential, the potential outcomes framework envisions that each individual possesses a range of potential outcomes for every conceivable value of some input.\nFor instance, as in the scenario previously described with two exposure levels (exposed: 1 and unexposed: 0), we can define potential outcomes for exposure ($Y(1)$) and no exposure ($Y(0)$), and subsequently analyze the difference between these outcomes, i.e., $Y(1) - Y(0)$, to comprehend the impact of the input (the exposure) on the outcome, $Y$.\nAt any given time, only one of these *potential outcomes* is observable -- namely, the outcome tied to the actual exposure the individual underwent.\nUnder certain 
assumptions, we can leverage data from individuals exposed to different inputs to compare the average differences in their observed outcomes.\nThe most common assumptions across the approaches we describe in this book are:\n\n1. **Consistency**: We assume that the causal question you claim you are answering is consistent with the one you are *actually* answering with your analysis. Mathematically, this means that $Y_{obs} = X \\times Y(1) + (1 - X) \\times Y(0)$; in other words, the outcome you observe is exactly equal to the potential outcome under the exposure you received. Two common ways to discuss this assumption are: \n * **Well defined exposure**: We assume that for each value of the exposure, there is no difference between subjects in the delivery of that exposure.\nPut another way, multiple versions of the treatment do not exist. \n * **No interference**: We assume that the outcome (technically all *potential* outcomes, regardless of whether they are observed) for any subject does not depend on another subject's exposure.\n \n::: callout-tip\n## Jargon\n\nAssumption 1 is sometimes referred to as the *stable unit treatment value assumption*, or SUTVA [@imbens2015causal].\nLikewise, these assumptions are sometimes referred to as *identifiability conditions* since we need them to hold in order to identify causal estimates.\n:::\n\n2. **Exchangeability**: We assume that within levels of relevant variables (confounders), exposed and unexposed subjects have an equal likelihood of experiencing any outcome prior to exposure; i.e., the exposed and unexposed subjects are exchangeable.\nThis assumption is sometimes referred to as **no unmeasured confounding**.\n\n3. 
**Positivity**: We assume that within each level and combination of the study variables used to achieve exchangeability, there are exposed and unexposed subjects.\nSaid differently, each individual has some chance of experiencing every available exposure level.\nSometimes this is referred to as the **probabilistic** assumption.\n\n\n\n\n::: callout-note\n## Apples-to-apples\n\nPractically, most of the assumptions we need to make for causal inference are so we can make an *apples-to-apples* comparison: we want to make sure we're comparing individuals that are similar --- who would serve as good proxies for each other's counterfactuals. \n\nThe phrase *apples-to-apples* stems from the saying \"comparing apples to oranges\", i.e., comparing two things that are incomparable. \n\nThat's only one way to say it. [There are a lot of variations worldwide](https://en.wikipedia.org/wiki/Apples_and_oranges). Here are some other things people incorrectly compare:\n\n* Cheese and chalk (UK English)\n* Apples and pears (German)\n* Potatoes and sweet potatoes (Latin American Spanish)\n* Grandmothers and toads (Serbian)\n* Horses and donkeys (Hindi)\n:::\n\n### Causal Assumptions Simulation\n\nLet's bring back our simulation from @sec-po-sim. Recall that we have individuals who will either eat chocolate or vanilla ice cream and we are interested in assessing the causal effect of this exposure on their happiness. Let's see how violations of each assumption may impact the estimation of the causal effect.\n\n#### Consistency violation\n\nTwo ways the consistency assumption can be violated are (1) lack of a well defined exposure and (2) interference. Let's see how these impact our ability to accurately estimate a causal effect.\n\n##### Well defined exposure\n\nSuppose that there were in fact two containers of chocolate ice cream, one of which was spoiled. 
In this case, the exposure \"chocolate\" could mean different things depending on where the individual's scoop came from (regular chocolate ice cream or spoiled chocolate ice cream), yet we are lumping them all together under a single umbrella (hence the violation: we have \"multiple versions of treatment\"). You can see how this falls under consistency because the issue here is that the potential outcome we think we are estimating is not the one we are actually observing.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata <- data.frame(\n id = 1:10,\n y_spoiledchocolate = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),\n y_chocolate = c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6),\n y_vanilla = c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)\n) |>\n mutate(causal_effect = y_chocolate - y_vanilla)\n\nset.seed(11)\ndata_observed <- data |>\n mutate(\n exposure_unobserved = case_when(\n rbinom(n(), 1, 0.25) == 1 ~ \"chocolate (spoiled)\",\n rbinom(n(), 1, 0.25) == 1 ~ \"chocolate\",\n TRUE ~ \"vanilla\"\n ),\n observed_outcome = case_when(\n exposure_unobserved == \"chocolate (spoiled)\" ~ y_spoiledchocolate,\n exposure_unobserved == \"chocolate\" ~ y_chocolate,\n exposure_unobserved == \"vanilla\" ~ y_vanilla\n ),\n exposure = case_when(\n exposure_unobserved %in% c(\"chocolate (spoiled)\", \"chocolate\") ~ \"chocolate\",\n exposure_unobserved == \"vanilla\" ~ \"vanilla\"\n )\n ) |>\n select(id, exposure, observed_outcome)\n\ndata_observed |>\n group_by(exposure) |>\n summarise(avg_outcome = mean(observed_outcome))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n exposure avg_outcome\n \n1 chocolate 2.75\n2 vanilla 4.67\n```\n\n\n:::\n:::\n\n\nWe know the *true* average causal effect of (unspoiled) chocolate in the sample is 0.8; however, our estimated causal effect (because our data are not consistent with the question we are asking) is -1.9. 
This demonstrates what can go wrong when *well defined exposure* is violated.\n\n##### Interference \n\nInterference would mean that an individual's exposure impacts another's potential outcome. For example, let's say each individual has a partner, and their potential outcome depends on both what flavor of ice cream they ate *and* what flavor their partner ate. Specifically, in the simulation below, having a partner who received a different flavor of ice cream increases happiness by two units.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata <- data.frame(\n id = 1:10,\n partner_id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),\n y_chocolate_chocolate = c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6),\n y_chocolate_vanilla = c(6, 6, 8, 7, 8, 7, 8, 9, 7, 8),\n y_vanilla_chocolate = c(3, 5, 6, 7, 7, 8, 10, 8, 5, 7),\n y_vanilla_vanilla = c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)\n)\n\nset.seed(11)\ndata_observed <- data |>\n mutate(\n exposure = case_when(\n rbinom(n(), 1, 0.5) == 1 ~ \"chocolate\",\n TRUE ~ \"vanilla\"\n ),\n exposure_partner =\n c(\"vanilla\", \"vanilla\", \"vanilla\", \"chocolate\", \"chocolate\", \"vanilla\", \"vanilla\", \"vanilla\", \"vanilla\", \"chocolate\"),\n observed_outcome = case_when(\n exposure == \"chocolate\" & exposure_partner == \"chocolate\" ~ y_chocolate_chocolate,\n exposure == \"chocolate\" & exposure_partner == \"vanilla\" ~ y_chocolate_vanilla,\n exposure == \"vanilla\" & exposure_partner == \"chocolate\" ~ y_vanilla_chocolate,\n exposure == \"vanilla\" & exposure_partner == \"vanilla\" ~ y_vanilla_vanilla\n )\n ) |>\n select(id, exposure, observed_outcome)\n\ndata_observed |>\n group_by(exposure) |>\n summarise(avg_outcome = mean(observed_outcome))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n exposure avg_outcome\n \n1 chocolate 7.33\n2 vanilla 5.57\n```\n\n\n:::\n:::\n\nNow our estimated causal effect (because interference exists) is 1.8. This demonstrates what can go wrong when *interference* occurs. 
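\n\nUnder interference, the causal contrast must specify *both* partners' exposures. For reference, the true \"both partners eat chocolate\" versus \"both partners eat vanilla\" effect in this simulation can be computed from the (unobservable) potential outcomes -- a quick sketch:\n\n::: {.cell}\n\n```{.r .cell-code}\n# true contrast if every partner pair ate chocolate vs if every pair ate vanilla\nmean(data$y_chocolate_chocolate) - mean(data$y_vanilla_vanilla)\n# 0.8\n```\n:::\n\n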
One of the main ways to combat interference is to change the *unit* under consideration. Here, each individual, each unique *id*, is considered a unit, and there is interference between units (i.e., between partners). If instead we consider each *partner* as a unit and randomize the partners rather than the individuals, we solve the interference issue, as there is no interference *between* different partner sets. This is sometimes referred to as a *cluster randomized trial*. What we decide to do within each cluster may depend on the causal question at hand. For example, if we want to know what would happen if *everyone* ate chocolate ice cream versus if *everyone* ate vanilla, we would want to randomize both partners to either chocolate or vanilla, as seen below.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(11)\n\n## we are now randomizing the *partners* not the individuals\npartners <- data.frame(\n partner_id = 1:5,\n exposure = case_when(\n rbinom(5, 1, 0.5) == 1 ~ \"chocolate\",\n TRUE ~ \"vanilla\"\n )\n)\ndata_observed <- data |>\n left_join(partners, by = \"partner_id\") |>\n mutate(\n # all partners have the same exposure\n exposure_partner = exposure,\n observed_outcome = case_when(\n exposure == \"chocolate\" & exposure_partner == \"chocolate\" ~ y_chocolate_chocolate,\n exposure == \"vanilla\" & exposure_partner == \"vanilla\" ~ y_vanilla_vanilla\n )\n ) |>\n select(id, exposure, observed_outcome)\n\ndata_observed |>\n group_by(exposure) |>\n summarise(avg_outcome = mean(observed_outcome))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n exposure avg_outcome\n \n1 chocolate 5.5 \n2 vanilla 4.38\n```\n\n\n:::\n:::\n\n\n#### Exchangeability violation\n\nWe have actually already seen an example of an exchangeability violation in @sec-po-sim. 
In that example, participants were able to choose the ice cream that they wanted to eat, so people who were more likely to have a positive effect from eating chocolate chose that, and those more likely to have a positive effect from eating vanilla chose that. \n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata <- data.frame(\n id = 1:10,\n y_chocolate = c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6),\n y_vanilla = c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)\n)\ndata_observed <- data |>\n mutate(\n exposure = case_when(\n # people who like chocolate more chose that\n y_chocolate > y_vanilla ~ \"chocolate\",\n # people who like vanilla more chose that\n y_vanilla >= y_chocolate ~ \"vanilla\"\n ),\n observed_outcome = case_when(\n exposure == \"chocolate\" ~ y_chocolate,\n exposure == \"vanilla\" ~ y_vanilla\n )\n ) |>\n select(id, exposure, observed_outcome)\n\ndata_observed |>\n group_by(exposure) |>\n summarise(avg_outcome = mean(observed_outcome))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n exposure avg_outcome\n \n1 chocolate 5.43\n2 vanilla 6.33\n```\n\n\n:::\n:::\n\n\nHow could we correct this? If we had some people who preferred chocolate ice cream but ended up taking vanilla instead, we could *adjust* for the preference, and the effect conditioned on this would no longer have an exchangeability issue. It turns out that this example, as we have constructed it, doesn't lend itself to this solution because participants chose their preferred flavor 100% of the time, making this *also* a positivity violation. \n\n#### Positivity violation\n\nAs stated above, the previous example violates both *exchangeability* and *positivity*. How could we fix it? As long as *some* people chose outside their preference with some probability (even if it is small!), we can remove this violation. 
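\n\nOne way to see the positivity problem concretely is to cross-tabulate preference and exposure in the original 100%-self-selection setup -- a quick sketch using base R:\n\n::: {.cell}\n\n```{.r .cell-code}\n# each preference stratum contains only one exposure level;\n# the zero cells in this table are the positivity violation\nprefer_chocolate <- data$y_chocolate > data$y_vanilla\ntable(prefer_chocolate, ifelse(prefer_chocolate, \"chocolate\", \"vanilla\"))\n```\n:::\n\n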
Let's say instead of everyone picking their flavor of preference 100% of the time, they just had an 80% chance of picking that flavor.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata <- data.frame(\n id = 1:10,\n y_chocolate = c(4, 4, 6, 5, 6, 5, 6, 7, 5, 6),\n y_vanilla = c(1, 3, 4, 5, 5, 6, 8, 6, 3, 5)\n)\n\nset.seed(11)\ndata_observed <- data |>\n mutate(\n prefer_chocolate = y_chocolate > y_vanilla,\n exposure = case_when(\n # people who like chocolate more chose that 80% of the time\n prefer_chocolate ~ ifelse(rbinom(n(), 1, 0.8), \"chocolate\", \"vanilla\"),\n # people who like vanilla more chose that 80% of the time\n !prefer_chocolate ~ ifelse(rbinom(n(), 1, 0.8), \"vanilla\", \"chocolate\")\n ),\n observed_outcome = case_when(\n exposure == \"chocolate\" ~ y_chocolate,\n exposure == \"vanilla\" ~ y_vanilla\n )\n ) |>\n select(id, prefer_chocolate, exposure, observed_outcome)\n\nlm(\n observed_outcome ~ I(exposure == \"chocolate\") + prefer_chocolate,\n data_observed\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n\nCall:\nlm(formula = observed_outcome ~ I(exposure == \"chocolate\") + \n prefer_chocolate, data = data_observed)\n\nCoefficients:\n (Intercept) \n 6.156 \nI(exposure == \"chocolate\")TRUE \n 0.531 \n prefer_chocolateTRUE \n -1.469 \n```\n\n\n:::\n:::\n\nAfter *adjusting* for this variable (chocolate preference), we recover the correct causal effect. 
This value is not exactly the same as the truth we obtain with the (unobservable) potential outcomes because we are dealing with a small sample -- as our sample size increases, this will get closer to the truth.\n\nCausal assumptions can be difficult to verify and may not hold for many data collection strategies.\nWe cannot overstate the importance of checking these criteria to the extent possible!\nFollowing any of the recipes in this book is unlikely to give valid answers if the causal assumptions are badly violated.\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/index/execute-results/html.json b/_freeze/index/execute-results/html.json index a4695ac6..b4e533c9 100644 --- a/_freeze/index/execute-results/html.json +++ b/_freeze/index/execute-results/html.json @@ -1,8 +1,10 @@ { - "hash": "ef3157b3e2d1d97cd016993a6b9fcb07", + "hash": "b13605468b718b90b4a5eec92fa59ff7", "result": { - "markdown": "# Preface {.unnumbered}\n\nWelcome to *Causal Inference in R*.\nAnswering causal questions is critical for scientific and business purposes, but techniques like randomized clinical trials and A/B testing are not always practical or successful.\nThe tools in this book will allow readers better make causal inferences with observational data with the R programming language.\nBy its end, we hope to help you:\n\n1. Ask better causal questions.\n2. Understand the assumptions needed for causal inference\n3. Identify the target population for which you want to make inferences\n4. Fit causal models and check their problems\n5. 
Conduct sensitivity analyses where the techniques we use might be imperfect\n\nThis book is for both academic researchers and data scientists.\nAlthough the questions may differ between these settings, many techniques are the same: causal inference is as helpful for asking questions about cancer as it is about clicks.\nWe use a mix of examples from medicine, economics, and tech to demonstrate that you need a clear causal question and a willingness to be transparent about your assumptions.\n\nYou'll learn a lot in this book, but ironically, you won't learn much about conducting randomized trials, one of the best tools for causal inferences.\nRandomized trials, and their cousins, A/B tests (standard in the tech world), are compelling because they alleviate many of the assumptions we need to make for valid inferences.\nThey are also sufficiently complex in design to merit their own learning resources.\nInstead, we'll focus on observational data where we don't usually benefit from randomization.\nIf you're interested in randomization techniques, don't put away this resource just yet: many causal inference techniques designed for observational data improve randomized analyses, too.\n\nWe're making a few assumptions about you as a reader:\n\n1. You're familiar with the [tidyverse](https://www.tidyverse.org/) ecosystem of R packages and their general philosophy. For instance, we use a lot of dplyr and ggplot2 in this book, but we won't explain their basic grammar. To learn more about starting with the tidyverse, we recommend [R for Data Science](https://r4ds.hadley.nz/).\n2. You're familiar with basic statistical modeling in R. For instance, we'll fit many models with `lm()` and `glm()`, but we won't discuss how they work. If you want to learn more about R's powerful modeling functions, we recommend reading [\"A Review of R Modeling Fundamentals\"](https://www.tmwr.org/base-r.html) in [Tidy Modeling with R](https://www.tmwr.org).\n3. 
We also assume you have familiarity with other R basics, such as [writing functions](https://r4ds.hadley.nz/functions.html). [R for Data Science](https://r4ds.hadley.nz/) is also a good resource for these topics. (For a deeper dive into the R programming language, we recommend [Advanced R](https://adv-r.hadley.nz/index.html), although we don't assume you have mastered its material for this book).\n\nWe'll also use tools from the tidymodels ecosystem, a set of R packages for modeling related to the tidyverse.\nWe don't assume you have used them before.\ntidymodels also focuses on predictive modeling, so many of its tools aren't appropriate for this book.\nNevertheless, if you are interested in this topic, we recommend [Tidy Modeling with R](https://www.tmwr.org).\n\nThere are also several other excellent books on causal inference.\nThis book is different in its focus on R, but it's still helpful to see this area from other perspectives.\nA few books you might like:\n\n- [*Causal Inference: What If?*](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/)\n- [*Causal Inference: The Mixtape*](https://mixtape.scunning.com/)\n- [*The Effect*](https://theeffectbook.net/)\n\nThe first book is focused on epidemiology.\nThe latter two are focused on econometrics.\nWe also recommend *The Book of Why* @pearl2018why for more on causal diagrams.\n\n## Conventions\n\n### Modern R Features\n\nWe use two modern R features in R 4.1.0 and above in this book.\nThe first is the native pipe, `|>`.\nThis R feature is similar to the tidyverse's `%>%`, with which you may be more familiar.\nIn typical cases, the two work interchangeably.\nOne notable difference is that `|>` uses the `_` symbol to direct the pipe's results, e.g., `.df |> lm(y ~ x, data = _)`.\nSee [this Tidyverse Blog post](https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/) for more on this topic.\n\nAnother modern R feature we use is the native lambda, a way of writing short functions that looks 
like `\\(.x) do_something(.x)`.\nIt is similar to purrr's `~` lambda notation.\nIt's also helpful to realize the native lambda is identical to `function(.x) do_something(.x)`, where `\\` is shorthand for `function`.\nSee [R for Data Science's chapter on iteration](https://r4ds.hadley.nz/iteration.html) for more on this topic.\n\n## Theming\n\nThe plots in this book use a consistent theme that we don't include in every code chunk, meaning if you run the code for a visualization, you might get a slightly different-looking result.\nWe set the following defaults related to ggplot2:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(\n # set default colors in ggplot2 to colorblind-friendly\n # Okabe-Ito and Viridis palettes\n ggplot2.discrete.colour = ggokabeito::palette_okabe_ito(),\n ggplot2.discrete.fill = ggokabeito::palette_okabe_ito(),\n ggplot2.continuous.colour = \"viridis\",\n ggplot2.continuous.fill = \"viridis\",\n # set theme font and size\n book.base_family = \"sans\",\n book.base_size = 14\n)\n\nlibrary(ggplot2)\n\n# set default theme\ntheme_set(\n theme_minimal(\n base_size = getOption(\"book.base_size\"),\n base_family = getOption(\"book.base_family\")\n ) %+replace%\n theme(\n panel.grid.minor = element_blank(),\n legend.position = \"bottom\"\n )\n)\n```\n:::\n\n\nWe also mask a few functions from ggdag that we like to customize:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntheme_dag <- function() {\n ggdag::theme_dag(base_family = getOption(\"book.base_family\"))\n}\n\ngeom_dag_label_repel <- function(..., seed = 10) {\n ggdag::geom_dag_label_repel(\n aes(x, y, label = label),\n box.padding = 3.5,\n inherit.aes = FALSE,\n max.overlaps = Inf,\n family = getOption(\"book.base_family\"),\n seed = seed,\n label.size = NA,\n label.padding = 0.1,\n size = getOption(\"book.base_size\") / 3,\n ...\n )\n}\n```\n:::\n", - "supporting": [], + "markdown": "# Preface {.unnumbered}\n\nWelcome to *Causal Inference in R*.\nAnswering causal questions is critical for 
scientific and business purposes, but techniques like randomized clinical trials and A/B testing are not always practical or successful.\nThe tools in this book will allow readers to better make causal inferences with observational data using the R programming language.\nBy its end, we hope to help you:\n\n1. Ask better causal questions.\n2. Understand the assumptions needed for causal inference.\n3. Identify the target population for which you want to make inferences.\n4. Fit causal models and check their problems.\n5. Conduct sensitivity analyses where the techniques we use might be imperfect.\n\nThis book is for both academic researchers and data scientists.\nAlthough the questions may differ between these settings, many techniques are the same: causal inference is as helpful for asking questions about cancer as it is about clicks.\nWe use a mix of examples from medicine, economics, and tech to demonstrate that you need a clear causal question and a willingness to be transparent about your assumptions.\n\nYou'll learn a lot in this book, but ironically, you won't learn much about conducting randomized trials, one of the best tools for causal inferences.\nRandomized trials, and their cousins, A/B tests (standard in the tech world), are compelling because they alleviate many of the assumptions we need to make for valid inferences.\nThey are also sufficiently complex in design to merit their own learning resources.\nInstead, we'll focus on observational data where we don't usually benefit from randomization.\nIf you're interested in randomization techniques, don't put away this resource just yet: many causal inference techniques designed for observational data improve randomized analyses, too.\n\nWe're making a few assumptions about you as a reader:\n\n1. You're familiar with the [tidyverse](https://www.tidyverse.org/) ecosystem of R packages and their general philosophy. 
For instance, we use a lot of dplyr and ggplot2 in this book, but we won't explain their basic grammar. To learn more about starting with the tidyverse, we recommend [*R for Data Science*](https://r4ds.hadley.nz/).\n2. You're familiar with basic statistical modeling in R. For instance, we'll fit many models with `lm()` and `glm()`, but we won't discuss how they work. If you want to learn more about R's powerful modeling functions, we recommend reading [\"A Review of R Modeling Fundamentals\"](https://www.tmwr.org/base-r.html) in [*Tidy Modeling with R*](https://www.tmwr.org).\n3. We also assume you have familiarity with other R basics, such as [writing functions](https://r4ds.hadley.nz/functions.html). [*R for Data Science*](https://r4ds.hadley.nz/) is also a good resource for these topics. (For a deeper dive into the R programming language, we recommend [*Advanced R*](https://adv-r.hadley.nz/index.html), although we don't assume you have mastered its material for this book.)\n\nWe'll also use tools from the tidymodels ecosystem, a set of R packages for modeling related to the tidyverse.\nWe don't assume you have used them before.\ntidymodels also focuses on predictive modeling, so many of its tools aren't appropriate for this book.\nNevertheless, if you are interested in this topic, we recommend [*Tidy Modeling with R*](https://www.tmwr.org).\n\nThere are also several other excellent books on causal inference.\nThis book is different in its focus on R, but it's still helpful to see this area from other perspectives.\nA few books you might like:\n\n- [*Causal Inference: What If?*](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/)\n- [*Causal Inference: The Mixtape*](https://mixtape.scunning.com/)\n- [*The Effect*](https://theeffectbook.net/)\n\nThe first book is focused on epidemiology.\nThe latter two are focused on econometrics.\nWe also recommend *The Book of Why* [@pearl2018why] for more on causal diagrams.\n\n## Conventions\n\n### Modern R 
Features\n\nWe use two modern R features in this book, available in R 4.1.0 and above.\nThe first is the native pipe, `|>`.\nThis R feature is similar to the tidyverse's `%>%`, with which you may be more familiar.\nIn typical cases, the two work interchangeably.\nOne notable difference is that `|>` uses `_` as a placeholder to direct where the piped result goes (this placeholder requires R 4.2.0 or above), e.g., `.df |> lm(y ~ x, data = _)`.\nSee [this Tidyverse Blog post](https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/) for more on this topic.\n\nAnother modern R feature we use is the native lambda, a way of writing short functions that looks like `\\(.x) do_something(.x)`.\nIt is similar to purrr's `~` lambda notation.\nIt's also helpful to realize that the native lambda is identical to `function(.x) do_something(.x)`, where `\\` is shorthand for `function`.\nSee [R for Data Science's chapter on iteration](https://r4ds.hadley.nz/iteration.html) for more on this topic.\n\n## Theming\n\nThe plots in this book use a consistent theme that we don't include in every code chunk, so if you run the code for a visualization, you might get a slightly different-looking result.\nWe set the following defaults related to ggplot2:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(\n # set default colors in ggplot2 to colorblind-friendly\n # Okabe-Ito and Viridis palettes\n ggplot2.discrete.colour = ggokabeito::palette_okabe_ito(),\n ggplot2.discrete.fill = ggokabeito::palette_okabe_ito(),\n ggplot2.continuous.colour = \"viridis\",\n ggplot2.continuous.fill = \"viridis\",\n # set theme font and size\n book.base_family = \"sans\",\n book.base_size = 14\n)\n\nlibrary(ggplot2)\n\n# set default theme\ntheme_set(\n theme_minimal(\n base_size = getOption(\"book.base_size\"),\n base_family = getOption(\"book.base_family\")\n ) %+replace%\n theme(\n panel.grid.minor = element_blank(),\n legend.position = \"bottom\"\n )\n)\n```\n:::\n\n\nWe also mask a few functions from ggdag that we like to customize:\n\n\n::: {.cell}\n\n```{.r 
.cell-code}\ntheme_dag <- function() {\n ggdag::theme_dag(base_family = getOption(\"book.base_family\"))\n}\n\ngeom_dag_label_repel <- function(..., seed = 10) {\n ggdag::geom_dag_label_repel(\n aes(x, y, label = label),\n box.padding = 3.5,\n inherit.aes = FALSE,\n max.overlaps = Inf,\n family = getOption(\"book.base_family\"),\n seed = seed,\n label.size = NA,\n label.padding = 0.1,\n size = getOption(\"book.base_size\") / 3,\n ...\n )\n}\n```\n:::\n", + "supporting": [ + "index_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/chapters/03-counterfactuals.qmd b/chapters/03-counterfactuals.qmd index 026dc78b..c7e9e995 100644 --- a/chapters/03-counterfactuals.qmd +++ b/chapters/03-counterfactuals.qmd @@ -227,7 +227,7 @@ data_observed <- data |> # change the exposure to randomized, generate from a binomial distribution # with a probability 0.5 for being in either group exposure = case_when( - rbinom(10, 1, 0.5) == 1 ~ "chocolate", + rbinom(n(), 1, 0.5) == 1 ~ "chocolate", TRUE ~ "vanilla" ), observed_outcome = case_when( @@ -314,8 +314,8 @@ set.seed(11) data_observed <- data |> mutate( exposure_unobserved = case_when( - rbinom(10, 1, 0.25) == 1 ~ "chocolate (spoiled)", - rbinom(10, 1, 0.25) == 1 ~ "chocolate", + rbinom(n(), 1, 0.25) == 1 ~ "chocolate (spoiled)", + rbinom(n(), 1, 0.25) == 1 ~ "chocolate", TRUE ~ "vanilla" ), observed_outcome = case_when( @@ -355,7 +355,7 @@ set.seed(11) data_observed <- data |> mutate( exposure = case_when( - rbinom(10, 1, 0.5) == 1 ~ "chocolate", + rbinom(n(), 1, 0.5) == 1 ~ "chocolate", TRUE ~ "vanilla" ), exposure_partner = @@ -452,9 +452,9 @@ data_observed <- data |> prefer_chocolate = y_chocolate > y_vanilla, exposure = case_when( # people who like chocolate more chose that 80% of the time - prefer_chocolate ~ ifelse(rbinom(10, 1, 0.8), "chocolate", "vanilla"), + prefer_chocolate ~ ifelse(rbinom(n(), 1, 0.8), "chocolate", "vanilla"), # people who like vanilla more chose that 80% of the time - !prefer_chocolate ~ 
ifelse(rbinom(10, 1, 0.8), "vanilla", "chocolate") + !prefer_chocolate ~ ifelse(rbinom(n(), 1, 0.8), "vanilla", "chocolate") ), observed_outcome = case_when( exposure == "chocolate" ~ y_chocolate,
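The hunks above repeatedly replace a hard-coded `rbinom(10, ...)` with `rbinom(n(), ...)`. Inside `dplyr::mutate()`, `n()` returns the number of rows in the current data (or group), so the simulation draws exactly one value per row; a hard-coded `10` only works while the data happens to have ten rows, and modern dplyr errors on any other size. A minimal sketch of the difference, using a hypothetical `tibble` not taken from the book:

```r
library(dplyr)

set.seed(11)
data <- tibble(id = 1:25) # hypothetical data with more than 10 rows

data_observed <- data |>
  mutate(
    # rbinom(10, 1, 0.5) would produce a length-10 vector here, which
    # dplyr rejects against 25 rows; rbinom(n(), 1, 0.5) draws one
    # value per row regardless of how many rows the data has
    exposure = if_else(rbinom(n(), 1, 0.5) == 1, "chocolate", "vanilla")
  )

nrow(data_observed) # 25: one simulated exposure per row
```

This is why the patched code keeps working even if the example data changes size later in the chapter.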