diff --git a/_freeze/chapters/06-not-just-a-stats-problem/execute-results/html.json b/_freeze/chapters/06-not-just-a-stats-problem/execute-results/html.json index e35f1de..b810fd3 100644 --- a/_freeze/chapters/06-not-just-a-stats-problem/execute-results/html.json +++ b/_freeze/chapters/06-not-just-a-stats-problem/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "607998bc55e24bae1b0b77cdec2e83dd", + "hash": "c348c5e891d5e67c7c0c1a4ee0d70664", "result": { - "markdown": "# Causal inference is not (just) a statistical problem {#sec-quartets}\n\n\n\n\n\n## The Causal Quartet\n\nWe now have the tools to look at something we've alluded to thus far in the book: causal inference is not (just) a statistical problem.\nOf course, we use statistics to answer causal questions.\nIt's necessary to answer most questions, even if the statistics are basic (as they often are in randomized designs).\nHowever, statistics alone do not allow us to address all of the assumptions of causal inference.\n\nIn 1973, Francis Anscombe introduced a set of four datasets called *Anscombe's Quartet*.\nThese data illustrated an important lesson: summary statistics alone cannot help you understand data; you must also visualize your data.\nIn the plots in @fig-anscombe, each data set has remarkably similar summary statistics, including means and correlations that are nearly identical.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(quartets)\n\nanscombe_quartet |> \n ggplot(aes(x, y)) + \n geom_point() + \n geom_smooth(method = \"lm\", se = FALSE) + \n facet_wrap(~ dataset)\n```\n\n::: {.cell-output-display}\n![Anscombe's Quartet, a set of four datasets with nearly identical summary statistics. 
Anscombe's point was that one must visualize the data to understand it.](06-not-just-a-stats-problem_files/figure-html/fig-anscombe-1.png){#fig-anscombe width=672}\n:::\n:::\n\n\nThe Datasaurus Dozen is a modern take on Anscombe's Quartet.\nThe mean, standard deviation, and correlation are nearly identical in each dataset, but the visualizations are very different.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(datasauRus)\n\n# roughly the same correlation in each dataset\ndatasaurus_dozen |> \n group_by(dataset) |> \n summarize(cor = round(cor(x, y), 2))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 13 × 2\n dataset cor\n \n 1 away -0.06\n 2 bullseye -0.07\n 3 circle -0.07\n 4 dino -0.06\n 5 dots -0.06\n 6 h_lines -0.06\n 7 high_lines -0.07\n 8 slant_down -0.07\n 9 slant_up -0.07\n10 star -0.06\n11 v_lines -0.07\n12 wide_lines -0.07\n13 x_shape -0.07\n```\n\n\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\ndatasaurus_dozen |> \n ggplot(aes(x, y)) + \n geom_point() + \n facet_wrap(~ dataset)\n```\n\n::: {.cell-output-display}\n![The Datasaurus Dozen, a set of datasets with nearly identical summary statistics. The Datasaurus Dozen is a modern version of Anscombe's Quartet. 
It's actually a baker's dozen, but who's counting?](06-not-just-a-stats-problem_files/figure-html/fig-datasaurus-1.png){#fig-datasaurus width=672}\n:::\n:::\n\n\nIn causal inference, however, even visualization is insufficient to untangle causal effects.\nAs we visualized in DAGs in @sec-dags, background knowledge is required to infer causation from correlation [@onthei1999].\n\nInspired by Anscombe's quartet, the *causal quartet* has many of the same properties of Anscombe's quartet and the Datasaurus Dozen: the numerical summaries of the variables in the dataset are the same [@dagostinomcgowan2023].\nUnlike these data, the causal quartet also *look* the same as each other.\nThe difference is the causal structure that generated each dataset.\n@fig-causal_quartet_hidden shows four datasets where the observational relationship between `exposure` and `outcome` is virtually identical.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncausal_quartet |> \n # hide the dataset names\n mutate(dataset = as.integer(factor(dataset))) |> \n group_by(dataset) |>\n mutate(exposure = scale(exposure), outcome = scale(outcome)) |> \n ungroup() |> \n ggplot(aes(exposure, outcome)) + \n geom_point() + \n geom_smooth(method = \"lm\", se = FALSE) + \n facet_wrap(~ dataset)\n```\n\n::: {.cell-output-display}\n![The Causal Quartet, four data sets with nearly identical summary statistics and visualizations. 
The causal structure of each dataset is different, and data alone cannot tell us which is which.](06-not-just-a-stats-problem_files/figure-html/fig-causal_quartet_hidden-1.png){#fig-causal_quartet_hidden width=672}\n:::\n:::\n\n\nThe question for each dataset is whether to adjust for a third variable, `covariate`.\nIs `covariate` a confounder?\nA mediator?\nA collider?\nWe can't use data to figure this problem out.\nIn @tbl-quartet_lm, it's not clear which effect is correct.\nLikewise, the correlation between `exposure` and `covariate` is no help: they're all the same!\n\n\n::: {#tbl-quartet_lm .cell tbl-cap='The causal quartet, with the estimated effect of `exposure` on `outcome` with and without adjustment for `covariate`. The unadjusted estimate is identical for all four datasets, as is the correlation between `exposure` and `covariate`. The adjusted estimate varies. Without background knowledge, it\\'s not clear which is right.'}\n::: {.cell-output-display}\n\n```{=html}\n
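<table>
<thead><tr><th>Dataset</th><th>Not adjusting for covariate</th><th>Adjusting for covariate</th><th>Correlation of exposure and covariate</th></tr></thead>
<tbody>
<tr><td>1</td><td>1.00</td><td>0.55</td><td>0.70</td></tr>
<tr><td>2</td><td>1.00</td><td>0.50</td><td>0.70</td></tr>
<tr><td>3</td><td>1.00</td><td>0.00</td><td>0.70</td></tr>
<tr><td>4</td><td>1.00</td><td>0.88</td><td>0.70</td></tr>
</tbody>
</table>\n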
\n```\n\n:::\n:::\n\n\n::: callout-warning\n## The ten percent rule\n\nThe ten percent rule is a common technique in epidemiology and other fields to determine whether a variable is a confounder.\nThe ten percent rule says that you should include a variable in your model if including it changes the effect estimate by more than ten percent.\nThe problem is, it doesn't work.\n*Every* example in the causal quartet causes a more than ten percent change.\nAs we know, this leads to the wrong answer in some of the datasets.\nEven the reverse technique, *excluding* a variable when it's *less* than ten percent, can cause trouble because many minor confounding effects can add up to more considerable bias.\n\n\n::: {#tbl-quartet_ten_percent .cell tbl-cap='The percent change in the coefficient for `exposure` when including `covariate` in the model.'}\n::: {.cell-output-display}\n\n```{=html}\n
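<table>
<thead><tr><th>Dataset</th><th>Percent change</th></tr></thead>
<tbody>
<tr><td>1</td><td>44.6%</td></tr>
<tr><td>2</td><td>49.7%</td></tr>
<tr><td>3</td><td>99.8%</td></tr>
<tr><td>4</td><td>12.5%</td></tr>
</tbody>
</table>\n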
\n```\n\n:::\n:::\n\n:::\n\nWhile the visual relationship between `covariate` and `exposure` is not identical between datasets, all have the same correlation.\nIn @fig-causal_quartet_covariate, the standardized relationship between the two is identical.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncausal_quartet |> \n # hide the dataset names\n mutate(dataset = as.integer(factor(dataset))) |> \n group_by(dataset) |> \n summarize(cor = round(cor(covariate, exposure), 2))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 4 × 2\n dataset cor\n \n1 1 0.7\n2 2 0.7\n3 3 0.7\n4 4 0.7\n```\n\n\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\ncausal_quartet |> \n # hide the dataset names\n mutate(dataset = as.integer(factor(dataset))) |> \n group_by(dataset) |>\n mutate(covariate = scale(covariate), exposure = scale(exposure)) |> \n ungroup() |> \n ggplot(aes(covariate, exposure)) + \n geom_point() + \n geom_smooth(method = \"lm\", se = FALSE) + \n facet_wrap(~ dataset) \n```\n\n::: {.cell-output-display}\n![The correlation is the same in each dataset, but the visual relationship is not. However, the differences in the plots are not enough information to determine whether `covariate` is a confounder, mediator, or collider.](06-not-just-a-stats-problem_files/figure-html/fig-causal_quartet_covariate-1.png){#fig-causal_quartet_covariate width=672}\n:::\n:::\n\n\n::: {.callout-tip} \n## Why did we standardize the coefficients?\n\nStandardizing numeric variables to have a mean of 0 and standard deviation of 1, as implemented in `scale()`, is a common technique in statistics. It's useful for a variety of reasons, but we chose to scale the variables here to emphasize the identical correlation between `covariate` and `exposure` in each dataset. If we didn't scale the variables, the correlation would be the same, but the plots would look different because their standard deviation are different. 
The beta coefficient in an OLS model is calculated with information about the covariance and the standard deviation of the variable, so scaling it makes the coefficient identical to the Pearson's correlation.\n\n@fig-causal_quartet_covariate_unscaled shows the unscaled relationship between `covariate` and `exposure`. Now, we see some differences: dataset 4 seems to have more variance in `covariate`, but that's not actionable information. In fact, it's a mathematical artifact of the data generating process.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncausal_quartet |> \n # hide the dataset names\n mutate(dataset = as.integer(factor(dataset))) |> \n ggplot(aes(covariate, exposure)) + \n geom_point() + \n geom_smooth(method = \"lm\", se = FALSE) + \n facet_wrap(~ dataset)\n```\n\n::: {.cell-output-display}\n![](06-not-just-a-stats-problem_files/figure-html/fig-causal_quartet_covariate_unscaled-1.png){#fig-causal_quartet_covariate_unscaled width=672}\n:::\n:::\n\n:::\n\nLet's reveal the labels of the datasets, representing the causal structure of the dataset.\nIn @fig-causal_quartet, `covariate` plays a different role in each dataset.\nIn 1 and 4, it's a collider (we *shouldn't* adjust for it).\nIn 2, it's a confounder (we *should* adjust for it).\nIn 3, it's a mediator (it depends on the research question).\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncausal_quartet |> \n ggplot(aes(exposure, outcome)) + \n geom_point() + \n geom_smooth(method = \"lm\", se = FALSE) + \n facet_wrap(~ dataset)\n```\n\n::: {.cell-output-display}\n![The Causal Quartet, revealed. The first and last datasets are types of collider bias; we should *not* control for `covariate`. In the second dataset, `covariate` is a confounder, and we *should* control for it. 
In the third dataset, `covariate` is a mediator, and we should control for it if we want the direct effect, but not if we want the total effect.](06-not-just-a-stats-problem_files/figure-html/fig-causal_quartet-1.png){#fig-causal_quartet width=672}\n:::\n:::\n\n\nWhat can we do if the data can't distinguish these causal structures?\nThe best answer is to have a good sense of the data-generating mechanism.\nIn @fig-quartet-dag, we show the DAG for each dataset.\nOnce we compile a DAG for each dataset, we only need to query the DAG for the correct adjustment set, assuming the DAG is right.\n\n\n::: {#fig-quartet-dag .cell layout-ncol=\"2\"}\n::: {.cell-output-display}\n![The DAG for dataset 1, where `covariate` (c) is a collider. We should *not* adjust for `covariate`, which is a descendant of `exposure` (e) and `outcome` (o).](06-not-just-a-stats-problem_files/figure-html/fig-quartet-dag-1.png){#fig-quartet-dag-1 width=288}\n:::\n\n::: {.cell-output-display}\n![The DAG for dataset 2, where `covariate` (c) is a confounder. `covariate` is a mutual cause of `exposure` (e) and `outcome` (o), representing a backdoor path, so we *must* adjust for it to get the right answer.](06-not-just-a-stats-problem_files/figure-html/fig-quartet-dag-2.png){#fig-quartet-dag-2 width=288}\n:::\n\n::: {.cell-output-display}\n![The DAG for dataset 3, where `covariate` (c) is a mediator. `covariate` is a descendant of `exposure` (e) and a cause of `outcome` (o). The path through `covariate` is the indirect path, and the path through `exposure` is the direct path. We should adjust for `covariate` if we want the direct effect, but not if we want the total effect.](06-not-just-a-stats-problem_files/figure-html/fig-quartet-dag-3.png){#fig-quartet-dag-3 width=288}\n:::\n\n::: {.cell-output-display}\n![The DAG for dataset 4, where `covariate` (c) is a collider via M-Bias. Although `covariate` happens before both `outcome` (o) and `exposure` (e), it's still a collider. 
We should *not* adjust for `covariate`, particularly since we can't control for the bias via `u1` and `u2`, which are unmeasured.](06-not-just-a-stats-problem_files/figure-html/fig-quartet-dag-4.png){#fig-quartet-dag-4 width=288}\n:::\n\nThe DAGs for the Causal Quartet.\n:::\n\n\nThe data generating mechanism[^06-not-just-a-stats-problem-1] in the DAGs matches what generated the datasets, so we can use the DAGs to determine the correct effect: unadjusted in datasets 1 and 4 and adjusted in dataset 2.\nFor dataset 3, it depends on which mediation effect we want: adjusted for the direct effect and unadjusted for the total effect.\n\n[^06-not-just-a-stats-problem-1]: See @dagostinomcgowan2023 for the models that generated the datasets.\n\n\n::: {#tbl-quartets_true_effects .cell tbl-cap='The data generating mechanism and true causal effects in each dataset. Sometimes, the unadjusted effect is the same, and sometimes it is not, depending on the mechanism and question.'}\n::: {.cell-output-display}\n\n```{=html}\n
<table>
<thead><tr><th>Data generating mechanism</th><th>Correct causal model</th><th>Correct causal effect</th></tr></thead>
<tbody>
<tr><td>(1) Collider</td><td>outcome ~ exposure</td><td>1</td></tr>
<tr><td>(2) Confounder</td><td>outcome ~ exposure; covariate</td><td>0.5</td></tr>
<tr><td>(3) Mediator</td><td>Direct effect: outcome ~ exposure; covariate, Total Effect: outcome ~ exposure</td><td>Direct effect: 0, Total effect: 1</td></tr>
<tr><td>(4) M-Bias</td><td>outcome ~ exposure</td><td>1</td></tr>
</tbody>
</table>\n
\n```\n\n:::\n:::\n\n\n## Time as a heuristic for causal structure\n\nHopefully, we have convinced you of the usefulness of DAGs.\nHowever, constructing correct DAGs is a challenging endeavor.\nIn the causal quartet, we knew the DAGs because we generated the data.\nWe need background knowledge to assemble a candidate causal structure in real life.\nFor some questions, such background knowledge is not available.\nFor others, we may worry about the complexity of the causal structure, particularly when variables mutually evolve with each other, as in @fig-feedback-loop.\n\nOne heuristic is particularly useful when a DAG is incomplete or uncertain: time.\nBecause causality is temporal, a cause must precede an effect.\nMany, but not all, problems in deciding if we should adjust for a confounder are solved by simply putting the variables in order by time.\nTime order is also one of the most critical assumptions you can visualize in a DAG, so it's an excellent place to start, regardless of the completeness of the DAG.\n\nConsider @fig-quartets-time-ordered-1, a time-ordered version of the collider DAG where the covariate is measured at both baseline and follow-up.\nThe original DAG actually represents the *second* measurement, where the covariate is a descendant of both the outcome and exposure.\nIf, however, we control for the same covariate as measured at the start of the study (@fig-quartets-time-ordered-2), it cannot be a descendant of the outcome at follow-up because it has yet to happen.\nThus, when you are missing background knowledge as to the causal structure of the covariate, you can use time-ordering as a defensive measure to avoid bias.\nOnly control for variables that precede the outcome.\n\n\n::: {#fig-quartets-time-ordered .cell layout-ncol=\"2\"}\n::: {.cell-output-display}\n![In a time-ordered version of the collider DAG, controlling for the covariate at follow-up induces 
bias.](06-not-just-a-stats-problem_files/figure-html/fig-quartets-time-ordered-1.png){#fig-quartets-time-ordered-1 width=384}\n:::\n\n::: {.cell-output-display}\n![Conversely, controlling for the covariate as measured at baseline does not induce bias because it is not a descendant of the outcome.](06-not-just-a-stats-problem_files/figure-html/fig-quartets-time-ordered-2.png){#fig-quartets-time-ordered-2 width=384}\n:::\n\nA time-ordered version of the collider DAG where each variable is measured twice. `covariate` at follow-up is a collider, but `covariate` at baseline is not.\n:::\n\n\n::: callout-warning\n## Don't adjust for the future\n\nThe time-ordering heuristic relies on a simple rule: don't adjust for the future.\n:::\n\nThe quartets package's `causal_quartet_time` has time-ordered measurements of each variable for the four datasets.\nEach has a `*_baseline` and `*_followup` measurement.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncausal_quartet_time\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 400 × 12\n covariate_baseline exposure_baseline\n \n 1 -0.0963 -1.43 \n 2 -1.11 0.0593 \n 3 0.647 0.370 \n 4 0.755 0.00471\n 5 1.19 0.340 \n 6 -0.588 -3.61 \n 7 -1.13 1.44 \n 8 0.689 1.02 \n 9 -1.49 -2.43 \n10 -2.78 -1.26 \n# ℹ 390 more rows\n# ℹ 10 more variables: outcome_baseline ,\n# covariate_followup , exposure_followup ,\n# outcome_followup , exposure_mid ,\n# covariate_mid , outcome_mid , u1 ,\n# u2 , dataset \n```\n\n\n:::\n:::\n\n\nUsing the formula `outcome_followup ~ exposure_baseline + covariate_baseline` works for three out of four datasets.\nEven though `covariate_baseline` is only in the adjustment set for the second dataset, it's not a collider in two of the other datasets, so it's not a problem.\n\n\n::: {#tbl-quartet_time_adjusted .cell tbl-cap='The adjusted effect of `exposure_baseline` on `outcome_followup` in each dataset. 
The effect adjusted for `covariate_baseline` is correct for three out of four datasets.'}\n::: {.cell-output-display}\n\n```{=html}\n
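<table>
<thead><tr><th>Dataset</th><th>Adjusted effect</th><th>Truth</th></tr></thead>
<tbody>
<tr><td>(1) Collider</td><td>1.00</td><td>1.00</td></tr>
<tr><td>(2) Confounder</td><td>0.50</td><td>0.50</td></tr>
<tr><td>(3) Mediator</td><td>1.00</td><td>1.00</td></tr>
<tr><td>(4) M-Bias</td><td>0.88</td><td>1.00</td></tr>
</tbody>
</table>\n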
\n```\n\n:::\n:::\n\n\nWhere it fails is in dataset 4, the M-bias example.\nIn this case, `covariate_baseline` is still a collider because the collision occurs before both the exposure and outcome.\nAs we discussed in @sec-m-bias, however, if you are in doubt whether something is genuinely M-bias, it is better to adjust for it than not.\nConfounding bias tends to be worse, and meaningful M-bias is probably rare in real life.\nAs the actual causal structure deviates from perfect M-bias, the severity of the bias tends to decrease.\nSo, if it is clearly M-bias, don't adjust for the variable.\nIf it's not clear, adjust for it.\n\n::: callout-tip\nRemember as well that it is possible to block bias induced by adjusting for a collider in certain circumstances because collider bias is just another open path.\nIf we had `u1` and `u2`, we could control for `covariate` while blocking potential collider bias.\nIn other words, sometimes, when we open a path, we can close it again.\n\n\n::: {.cell}\n\n:::\n\n:::\n\n## Causal and Predictive Models, Revisited {#sec-causal-pred-revisit}\n\n### Prediction metrics\n\nPredictive measurements also fail to distinguish between the four datasets.\nIn @tbl-quartet_time_predictive, we show the difference in a couple of standard predictive metrics when we add `covariate` to the model.\nIn each dataset, `covariate` adds information to the model because it contains associational information about the outcome [^06-not-just-a-stats-problem-2].\nThe RMSE goes down, indicating a better fit, and the R^2^ goes up, showing more variance explained.\nThe coefficients for `covariate` represent the information about `outcome` it contains; they don't tell us from where in the causal structure that information originates.\nCorrelation isn't causation, and neither is prediction.\nIn the case of the collider data set, it's not even a helpful prediction tool because you wouldn't have `covariate` at the time of prediction, given that it happens after the 
exposure and outcome.\n\n[^06-not-just-a-stats-problem-2]: For M-bias, including `covariate` in the model is helpful to the extent that it has information about `u2`, one of the causes of the outcome.\n In this case, the data generating mechanism was such that `covariate` contains more information from `u1` than `u2`, so it doesn't add as much predictive value.\n Random noise represents most of what `u2` doesn't account for.\n\n\n::: {#tbl-quartet_time_predictive .cell tbl-cap='The difference in predictive metrics on `outcome` in each dataset with and without `covariate`. In each dataset, `covariate` adds information to the model, but this offers little guidance regarding the proper causal model.'}\n::: {.cell-output-display}\n\n```{=html}\n
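<table>
<thead><tr><th>Dataset</th><th>RMSE</th><th>R<sup>2</sup></th></tr></thead>
<tbody>
<tr><td>(1) Collider</td><td>−0.14</td><td>0.12</td></tr>
<tr><td>(2) Confounder</td><td>−0.20</td><td>0.14</td></tr>
<tr><td>(3) Mediator</td><td>−0.48</td><td>0.37</td></tr>
<tr><td>(4) M-Bias</td><td>−0.01</td><td>0.01</td></tr>
</tbody>
</table>\n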
\n```\n\n:::\n:::\n\n\n### The Table Two Fallacy[^06-not-just-a-stats-problem-3]\n\n[^06-not-just-a-stats-problem-3]: If you recall, the Table Two Fallacy is named after the tendency in health research journals to have a complete set of model coefficients in the second table of an article.\n See @Westreich2013 for a detailed discussion of the Table Two Fallacy.\n\nRelatedly, model coefficients for variables *other* than those of the causes we're interested in can be difficult to interpret.\nIn a model with `outcome ~ exposure + covariate`, it's tempting to present the coefficient of `covariate` as well as `exposure`.\nThe problem, as discussed in @sec-pred-or-explain, is that the causal structure for the effect of `covariate` on `outcome` may differ from that of `exposure` on `outcome`.\nLet's consider a variation of the quartet DAGs with other variables.\n\nFirst, let's start with the confounder DAG.\nIn @fig-quartet_confounder, we see that `covariate` is a confounder.\nIf this DAG represents the complete causal structure for `outcome`, the model `outcome ~ exposure + covariate` will give an unbiased estimate of the effect of `exposure` on `outcome`, assuming we've met other assumptions of the modeling process.\nThe adjustment set for `covariate`'s effect on `outcome` is empty, and `exposure` is not a collider, so controlling for it does not induce bias[^06-not-just-a-stats-problem-4].\nBut look again.\n`exposure` is a mediator for `covariate`'s effect on `outcome`; some of the total effect is mediated through `exposure`, while there is also a direct effect of `covariate` on `outcome`. **Both estimates are unbiased, but they are different *types* of estimates**. 
The effect of `exposure` on `outcome` is the *total effect* of that relationship, while the effect of `covariate` on `outcome` is the *direct effect*.\n\n[^06-not-just-a-stats-problem-4]: Additionally, OLS produces a *collapsable* effect.\n Other effects, like the odds and hazards ratios, are *non-collapsable*, meaning including unrelated variables in the model *can* change the effect estimate.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![The DAG for dataset 2, where `covariate` is a confounder. If you look closely, you'll realize that, from the perspective of the effect of `covariate` on the `outcome`, `exposure` is a *mediator*.](06-not-just-a-stats-problem_files/figure-html/fig-quartet_confounder-1.png){#fig-quartet_confounder width=288}\n:::\n:::\n\n\nWhat if we add `q`, a mutual cause of `covariate` and `outcome`?\nIn @fig-quartet_confounder_q, the adjustment sets are still different.\nThe adjustment set for `outcome ~ exposure` is still the same: `{covariate}`.\nThe `outcome ~ covariate` adjustment set is `{q}`.\nIn other words, `q` is a confounder for `covariate`'s effect on `outcome`.\nThe model `outcome ~ exposure + covariate` will produce the correct effect for `exposure` but not for the direct effect of `covariate`.\nNow, we have a situation where `covariate` not only answers a different type of question than `exposure` but is also biased by the absence of `q`.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![A modification of the DAG for dataset 2, where `covariate` is a confounder. 
Now, the relationship between `covariate` and `outcome` is confounded by `q`, a variable not necessary to calculate the unbiased effect of `exposure` on `outcome`.](06-not-just-a-stats-problem_files/figure-html/fig-quartet_confounder_q-1.png){#fig-quartet_confounder_q width=336}\n:::\n:::\n\n\nSpecifying a single causal model is deeply challenging.\nHaving a single model answer multiple causal questions is exponentially more difficult.\nIf attempting to do so, apply the same scrutiny to both[^06-not-just-a-stats-problem-5] questions.\nIs it possible to have a single adjustment set that answers both questions?\nIf not, specify two models or forego one of the questions.\nIf so, you need to ensure that the estimates answer the correct question.\nWe'll also discuss *joint* causal effects in @sec-interaction.\n\n[^06-not-just-a-stats-problem-5]: Practitioners of *casual* inference will interpret *many* effects from a single model in this way, but we consider this an act of bravado.\n\nUnfortunately, algorithms for detecting adjustment sets for multiple exposures and effect types are not well-developed, so you may need to rely on your knowledge of the causal structure in determining the intersection of the adjustment sets.\n", + "markdown": "# Causal inference is not (just) a statistical problem {#sec-quartets}\n\n\n\n\n\n## The Causal Quartet\n\nWe now have the tools to look at something we've alluded to thus far in the book: causal inference is not (just) a statistical problem.\nOf course, we use statistics to answer causal questions.\nIt's necessary to answer most questions, even if the statistics are basic (as they often are in randomized designs).\nHowever, statistics alone do not allow us to address all of the assumptions of causal inference.\n\nIn 1973, Francis Anscombe introduced a set of four datasets called *Anscombe's Quartet*.\nThese data illustrated an important lesson: summary statistics alone cannot help you understand data; you must also visualize your 
data.\nIn the plots in @fig-anscombe, each data set has remarkably similar summary statistics, including means and correlations that are nearly identical.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(quartets)\n\nanscombe_quartet |> \n ggplot(aes(x, y)) + \n geom_point() + \n geom_smooth(method = \"lm\", se = FALSE) + \n facet_wrap(~ dataset)\n```\n\n::: {.cell-output-display}\n![Anscombe's Quartet, a set of four datasets with nearly identical summary statistics. Anscombe's point was that one must visualize the data to understand it.](06-not-just-a-stats-problem_files/figure-html/fig-anscombe-1.png){#fig-anscombe width=672}\n:::\n:::\n\n\nThe Datasaurus Dozen is a modern take on Anscombe's Quartet.\nThe mean, standard deviation, and correlation are nearly identical in each dataset, but the visualizations are very different.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(datasauRus)\n\n# roughly the same correlation in each dataset\ndatasaurus_dozen |> \n group_by(dataset) |> \n summarize(cor = round(cor(x, y), 2))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 13 × 2\n dataset cor\n \n 1 away -0.06\n 2 bullseye -0.07\n 3 circle -0.07\n 4 dino -0.06\n 5 dots -0.06\n 6 h_lines -0.06\n 7 high_lines -0.07\n 8 slant_down -0.07\n 9 slant_up -0.07\n10 star -0.06\n11 v_lines -0.07\n12 wide_lines -0.07\n13 x_shape -0.07\n```\n\n\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\ndatasaurus_dozen |> \n ggplot(aes(x, y)) + \n geom_point() + \n facet_wrap(~ dataset)\n```\n\n::: {.cell-output-display}\n![The Datasaurus Dozen, a set of datasets with nearly identical summary statistics. The Datasaurus Dozen is a modern version of Anscombe's Quartet. 
It's actually a baker's dozen, but who's counting?](06-not-just-a-stats-problem_files/figure-html/fig-datasaurus-1.png){#fig-datasaurus width=672}\n:::\n:::\n\n\nIn causal inference, however, even visualization is insufficient to untangle causal effects.\nAs we visualized in DAGs in @sec-dags, background knowledge is required to infer causation from correlation [@onthei1999].\n\nInspired by Anscombe's quartet, the *causal quartet* has many of the same properties of Anscombe's quartet and the Datasaurus Dozen: the numerical summaries of the variables in the dataset are the same [@dagostinomcgowan2023].\nUnlike these data, the causal quartet also *look* the same as each other.\nThe difference is the causal structure that generated each dataset.\n@fig-causal_quartet_hidden shows four datasets where the observational relationship between `exposure` and `outcome` is virtually identical.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncausal_quartet |> \n # hide the dataset names\n mutate(dataset = as.integer(factor(dataset))) |> \n group_by(dataset) |>\n mutate(exposure = scale(exposure), outcome = scale(outcome)) |> \n ungroup() |> \n ggplot(aes(exposure, outcome)) + \n geom_point() + \n geom_smooth(method = \"lm\", se = FALSE) + \n facet_wrap(~ dataset)\n```\n\n::: {.cell-output-display}\n![The Causal Quartet, four data sets with nearly identical summary statistics and visualizations. 
The causal structure of each dataset is different, and data alone cannot tell us which is which.](06-not-just-a-stats-problem_files/figure-html/fig-causal_quartet_hidden-1.png){#fig-causal_quartet_hidden width=672}\n:::\n:::\n\n\nThe question for each dataset is whether to adjust for a third variable, `covariate`.\nIs `covariate` a confounder?\nA mediator?\nA collider?\nWe can't use data to figure this problem out.\nIn @tbl-quartet_lm, it's not clear which effect is correct.\nLikewise, the correlation between `exposure` and `covariate` is no help: they're all the same!\n\n\n::: {#tbl-quartet_lm .cell tbl-cap='The causal quartet, with the estimated effect of `exposure` on `outcome` with and without adjustment for `covariate`. The unadjusted estimate is identical for all four datasets, as is the correlation between `exposure` and `covariate`. The adjusted estimate varies. Without background knowledge, it\\'s not clear which is right.'}\n::: {.cell-output-display}\n\n```{=html}\n
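<table>
<thead><tr><th>Dataset</th><th>Not adjusting for covariate</th><th>Adjusting for covariate</th><th>Correlation of exposure and covariate</th></tr></thead>
<tbody>
<tr><td>1</td><td>1.00</td><td>0.55</td><td>0.70</td></tr>
<tr><td>2</td><td>1.00</td><td>0.50</td><td>0.70</td></tr>
<tr><td>3</td><td>1.00</td><td>0.00</td><td>0.70</td></tr>
<tr><td>4</td><td>1.00</td><td>0.88</td><td>0.70</td></tr>
</tbody>
</table>\n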
\n```\n\n:::\n:::\n\n\n::: callout-warning\n## The ten percent rule\n\nThe ten percent rule is a common technique in epidemiology and other fields to determine whether a variable is a confounder.\nThe ten percent rule says that you should include a variable in your model if including it changes the effect estimate by more than ten percent.\nThe problem is, it doesn't work.\n*Every* example in the causal quartet causes a more than ten percent change.\nAs we know, this leads to the wrong answer in some of the datasets.\nEven the reverse technique, *excluding* a variable when it's *less* than ten percent, can cause trouble because many minor confounding effects can add up to more considerable bias.\n\n\n::: {#tbl-quartet_ten_percent .cell tbl-cap='The percent change in the coefficient for `exposure` when including `covariate` in the model.'}\n::: {.cell-output-display}\n\n```{=html}\n
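<table>
<thead><tr><th>Dataset</th><th>Percent change</th></tr></thead>
<tbody>
<tr><td>1</td><td>44.6%</td></tr>
<tr><td>2</td><td>49.7%</td></tr>
<tr><td>3</td><td>99.8%</td></tr>
<tr><td>4</td><td>12.5%</td></tr>
</tbody>
</table>\n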
\n```\n\n:::\n:::\n\n:::\n\nWhile the visual relationship between `covariate` and `exposure` is not identical between datasets, all have the same correlation.\nIn @fig-causal_quartet_covariate, the standardized relationship between the two is identical.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncausal_quartet |> \n # hide the dataset names\n mutate(dataset = as.integer(factor(dataset))) |> \n group_by(dataset) |> \n summarize(cor = round(cor(covariate, exposure), 2))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 4 × 2\n dataset cor\n \n1 1 0.7\n2 2 0.7\n3 3 0.7\n4 4 0.7\n```\n\n\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\ncausal_quartet |> \n # hide the dataset names\n mutate(dataset = as.integer(factor(dataset))) |> \n group_by(dataset) |>\n mutate(covariate = scale(covariate), exposure = scale(exposure)) |> \n ungroup() |> \n ggplot(aes(covariate, exposure)) + \n geom_point() + \n geom_smooth(method = \"lm\", se = FALSE) + \n facet_wrap(~ dataset) \n```\n\n::: {.cell-output-display}\n![The correlation is the same in each dataset, but the visual relationship is not. However, the differences in the plots are not enough information to determine whether `covariate` is a confounder, mediator, or collider.](06-not-just-a-stats-problem_files/figure-html/fig-causal_quartet_covariate-1.png){#fig-causal_quartet_covariate width=672}\n:::\n:::\n\n\n::: {.callout-tip} \n## Why did we standardize the coefficients?\n\nStandardizing numeric variables to have a mean of 0 and standard deviation of 1, as implemented in `scale()`, is a common technique in statistics. It's useful for a variety of reasons, but we chose to scale the variables here to emphasize the identical correlation between `covariate` and `exposure` in each dataset. If we didn't scale the variables, the correlation would be the same, but the plots would look different because their standard deviation are different. 
The beta coefficient in an OLS model is calculated from the covariance of the two variables and the variance of the predictor, so when both variables are scaled, the coefficient is identical to Pearson's correlation.\n\n@fig-causal_quartet_covariate_unscaled shows the unscaled relationship between `covariate` and `exposure`. Now, we see some differences: dataset 4 seems to have more variance in `covariate`, but that's not actionable information. In fact, it's a mathematical artifact of the data generating process.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncausal_quartet |> \n # hide the dataset names\n mutate(dataset = as.integer(factor(dataset))) |> \n ggplot(aes(covariate, exposure)) + \n geom_point() + \n geom_smooth(method = \"lm\", se = FALSE) + \n facet_wrap(~ dataset)\n```\n\n::: {.cell-output-display}\n![@fig-causal_quartet_covariate, unscaled](06-not-just-a-stats-problem_files/figure-html/fig-causal_quartet_covariate_unscaled-1.png){#fig-causal_quartet_covariate_unscaled width=672}\n:::\n:::\n\n:::\n\nLet's reveal the labels of the datasets, which represent the causal structure of each dataset.\nAs @fig-causal_quartet reveals, `covariate` plays a different role in each dataset.\nIn 1 and 4, it's a collider (we *shouldn't* adjust for it).\nIn 2, it's a confounder (we *should* adjust for it).\nIn 3, it's a mediator (whether we adjust for it depends on the research question).\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncausal_quartet |> \n ggplot(aes(exposure, outcome)) + \n geom_point() + \n geom_smooth(method = \"lm\", se = FALSE) + \n facet_wrap(~ dataset)\n```\n\n::: {.cell-output-display}\n![The Causal Quartet, revealed. The first and last datasets are types of collider bias; we should *not* control for `covariate`. In the second dataset, `covariate` is a confounder, and we *should* control for it. 
In the third dataset, `covariate` is a mediator, and we should control for it if we want the direct effect, but not if we want the total effect.](06-not-just-a-stats-problem_files/figure-html/fig-causal_quartet-1.png){#fig-causal_quartet width=672}\n:::\n:::\n\n\nWhat can we do if the data can't distinguish these causal structures?\nThe best answer is to have a good sense of the data-generating mechanism.\nIn @fig-quartet-dag, we show the DAG for each dataset.\nOnce we compile a DAG for each dataset, we only need to query the DAG for the correct adjustment set, assuming the DAG is right.\n\n\n::: {#fig-quartet-dag .cell layout-ncol=\"2\"}\n::: {.cell-output-display}\n![The DAG for dataset 1, where `covariate` (c) is a collider. We should *not* adjust for `covariate`, which is a descendant of `exposure` (e) and `outcome` (o).](06-not-just-a-stats-problem_files/figure-html/fig-quartet-dag-1.png){#fig-quartet-dag-1 width=288}\n:::\n\n::: {.cell-output-display}\n![The DAG for dataset 2, where `covariate` (c) is a confounder. `covariate` is a mutual cause of `exposure` (e) and `outcome` (o), representing a backdoor path, so we *must* adjust for it to get the right answer.](06-not-just-a-stats-problem_files/figure-html/fig-quartet-dag-2.png){#fig-quartet-dag-2 width=288}\n:::\n\n::: {.cell-output-display}\n![The DAG for dataset 3, where `covariate` (c) is a mediator. `covariate` is a descendant of `exposure` (e) and a cause of `outcome` (o). The path through `covariate` is the indirect path, and the path straight from `exposure` to `outcome` is the direct path. We should adjust for `covariate` if we want the direct effect, but not if we want the total effect.](06-not-just-a-stats-problem_files/figure-html/fig-quartet-dag-3.png){#fig-quartet-dag-3 width=288}\n:::\n\n::: {.cell-output-display}\n![The DAG for dataset 4, where `covariate` (c) is a collider via M-Bias. Although `covariate` happens before both `outcome` (o) and `exposure` (e), it's still a collider. 
We should *not* adjust for `covariate`, particularly since we can't control for the bias via `u1` and `u2`, which are unmeasured.](06-not-just-a-stats-problem_files/figure-html/fig-quartet-dag-4.png){#fig-quartet-dag-4 width=288}\n:::\n\nThe DAGs for the Causal Quartet.\n:::\n\n\nThe data generating mechanism[^06-not-just-a-stats-problem-1] in the DAGs matches what generated the datasets, so we can use the DAGs to determine the correct effect: unadjusted in datasets 1 and 4 and adjusted in dataset 2.\nFor dataset 3, it depends on which mediation effect we want: adjusted for the direct effect and unadjusted for the total effect.\n\n[^06-not-just-a-stats-problem-1]: See @dagostinomcgowan2023 for the models that generated the datasets.\n\n\n::: {#tbl-quartets_true_effects .cell tbl-cap='The data generating mechanism and true causal effects in each dataset. Sometimes, the unadjusted effect is the same, and sometimes it is not, depending on the mechanism and question.'}\n::: {.cell-output-display}\n\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n\n\n \n\n\n \n\n\n \n\n\n \n \n \n
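<table>
<thead><tr><th>Data generating mechanism</th><th>Correct causal model</th><th>Correct causal effect</th></tr></thead>
<tbody>
<tr><td>(1) Collider</td><td>outcome ~ exposure</td><td>1</td></tr>
<tr><td>(2) Confounder</td><td>outcome ~ exposure; covariate</td><td>0.5</td></tr>
<tr><td>(3) Mediator</td><td>Direct effect: outcome ~ exposure; covariate, Total effect: outcome ~ exposure</td><td>Direct effect: 0, Total effect: 1</td></tr>
<tr><td>(4) M-Bias</td><td>outcome ~ exposure</td><td>1</td></tr>
</tbody>
</table>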
\n
\n```\n\n:::\n:::\n\n\n## Time as a heuristic for causal structure\n\nHopefully, we have convinced you of the usefulness of DAGs.\nHowever, constructing correct DAGs is a challenging endeavor.\nIn the causal quartet, we knew the DAGs because we generated the data.\nWe need background knowledge to assemble a candidate causal structure in real life.\nFor some questions, such background knowledge is not available.\nFor others, we may worry about the complexity of the causal structure, particularly when variables mutually evolve with each other, as in @fig-feedback-loop.\n\nOne heuristic is particularly useful when a DAG is incomplete or uncertain: time.\nBecause causality is temporal, a cause must precede an effect.\nMany, but not all, problems in deciding if we should adjust for a confounder are solved by simply putting the variables in order by time.\nTime order is also one of the most critical assumptions you can visualize in a DAG, so it's an excellent place to start, regardless of the completeness of the DAG.\n\nConsider @fig-quartets-time-ordered-1, a time-ordered version of the collider DAG where the covariate is measured at both baseline and follow-up.\nThe original DAG actually represents the *second* measurement, where the covariate is a descendant of both the outcome and exposure.\nIf, however, we control for the same covariate as measured at the start of the study (@fig-quartets-time-ordered-2), it cannot be a descendant of the outcome at follow-up because it has yet to happen.\nThus, when you are missing background knowledge as to the causal structure of the covariate, you can use time-ordering as a defensive measure to avoid bias.\nOnly control for variables that precede the outcome.\n\n\n::: {#fig-quartets-time-ordered .cell layout-ncol=\"2\"}\n::: {.cell-output-display}\n![In a time-ordered version of the collider DAG, controlling for the covariate at follow-up induces 
bias.](06-not-just-a-stats-problem_files/figure-html/fig-quartets-time-ordered-1.png){#fig-quartets-time-ordered-1 width=384}\n:::\n\n::: {.cell-output-display}\n![Conversely, controlling for the covariate as measured at baseline does not induce bias because it is not a descendant of the outcome.](06-not-just-a-stats-problem_files/figure-html/fig-quartets-time-ordered-2.png){#fig-quartets-time-ordered-2 width=384}\n:::\n\nA time-ordered version of the collider DAG where each variable is measured twice. `covariate` at follow-up is a collider, but `covariate` at baseline is not.\n:::\n\n\n::: callout-warning\n## Don't adjust for the future\n\nThe time-ordering heuristic relies on a simple rule: don't adjust for the future.\n:::\n\nThe quartets package's `causal_quartet_time` has time-ordered measurements of each variable for the four datasets.\nEach has a `*_baseline` and `*_followup` measurement.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncausal_quartet_time\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 400 × 12\n covariate_baseline exposure_baseline\n \n 1 -0.0963 -1.43 \n 2 -1.11 0.0593 \n 3 0.647 0.370 \n 4 0.755 0.00471\n 5 1.19 0.340 \n 6 -0.588 -3.61 \n 7 -1.13 1.44 \n 8 0.689 1.02 \n 9 -1.49 -2.43 \n10 -2.78 -1.26 \n# ℹ 390 more rows\n# ℹ 10 more variables: outcome_baseline ,\n# covariate_followup , exposure_followup ,\n# outcome_followup , exposure_mid ,\n# covariate_mid , outcome_mid , u1 ,\n# u2 , dataset \n```\n\n\n:::\n:::\n\n\nUsing the formula `outcome_followup ~ exposure_baseline + covariate_baseline` works for three out of four datasets.\nEven though `covariate_baseline` is only in the adjustment set for the second dataset, it's not a collider in two of the other datasets, so adjusting for it there is not a problem.\n\n\n::: {#tbl-quartet_time_adjusted .cell tbl-cap='The adjusted effect of `exposure_baseline` on `outcome_followup` in each dataset. 
The effect adjusted for `covariate_baseline` is correct for three out of four datasets.'}\n::: {.cell-output-display}\n\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n\n\n \n\n\n \n\n\n \n\n\n \n \n \n
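<table>
<thead><tr><th>Dataset</th><th>Adjusted effect</th><th>Truth</th></tr></thead>
<tbody>
<tr><td>(1) Collider</td><td>1.00</td><td>1.00</td></tr>
<tr><td>(2) Confounder</td><td>0.50</td><td>0.50</td></tr>
<tr><td>(3) Mediator</td><td>1.00</td><td>1.00</td></tr>
<tr><td>(4) M-Bias</td><td>0.88</td><td>1.00</td></tr>
</tbody>
</table>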
\n
\n```\n\n:::\n:::\n\n\nWhere it fails is in dataset 4, the M-bias example.\nIn this case, `covariate_baseline` is still a collider because the collision occurs before both the exposure and outcome.\nAs we discussed in @sec-m-bias, however, if you are in doubt whether something is genuinely M-bias, it is better to adjust for it than not.\nConfounding bias tends to be worse, and meaningful M-bias is probably rare in real life.\nAs the actual causal structure deviates from perfect M-bias, the severity of the bias tends to decrease.\nSo, if it is clearly M-bias, don't adjust for the variable.\nIf it's not clear, adjust for it.\n\n::: callout-tip\nRemember as well that it is possible to block bias induced by adjusting for a collider in certain circumstances because collider bias is just another open path.\nIf we had `u1` and `u2`, we could control for `covariate` while blocking potential collider bias.\nIn other words, sometimes, when we open a path, we can close it again.\n\n\n::: {.cell}\n\n:::\n\n:::\n\n## Causal and Predictive Models, Revisited {#sec-causal-pred-revisit}\n\n### Prediction metrics\n\nPredictive measurements also fail to distinguish between the four datasets.\nIn @tbl-quartet_time_predictive, we show the difference in a couple of standard predictive metrics when we add `covariate` to the model.\nIn each dataset, `covariate` adds information to the model because it contains associational information about the outcome [^06-not-just-a-stats-problem-2].\nThe RMSE goes down, indicating a better fit, and the R^2^ goes up, showing more variance explained.\nThe coefficients for `covariate` represent the information about `outcome` it contains; they don't tell us from where in the causal structure that information originates.\nCorrelation isn't causation, and neither is prediction.\nIn the case of the collider data set, it's not even a helpful prediction tool because you wouldn't have `covariate` at the time of prediction, given that it happens after the 
exposure and outcome.\n\n[^06-not-just-a-stats-problem-2]: For M-bias, including `covariate` in the model is helpful to the extent that it has information about `u2`, one of the causes of the outcome.\n In this case, the data generating mechanism was such that `covariate` contains more information from `u1` than `u2`, so it doesn't add as much predictive value.\n Random noise represents most of what `u2` doesn't account for.\n\n\n::: {#tbl-quartet_time_predictive .cell tbl-cap='The difference in predictive metrics on `outcome` in each dataset with and without `covariate`. In each dataset, `covariate` adds information to the model, but this offers little guidance regarding the proper causal model.'}\n::: {.cell-output-display}\n\n```{=html}\n
\n\n\n \n \n \n \n \n \n \n \n \n \n\n\n \n\n\n \n\n\n \n\n\n \n \n \n
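<table>
<thead><tr><th>Dataset</th><th>RMSE</th><th>R<sup>2</sup></th></tr></thead>
<tbody>
<tr><td>(1) Collider</td><td>−0.14</td><td>0.12</td></tr>
<tr><td>(2) Confounder</td><td>−0.20</td><td>0.14</td></tr>
<tr><td>(3) Mediator</td><td>−0.48</td><td>0.37</td></tr>
<tr><td>(4) M-Bias</td><td>−0.01</td><td>0.01</td></tr>
</tbody>
</table>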
\n
\n```\n\n:::\n:::\n\n\n### The Table Two Fallacy[^06-not-just-a-stats-problem-3]\n\n[^06-not-just-a-stats-problem-3]: If you recall, the Table Two Fallacy is named after the tendency in health research journals to have a complete set of model coefficients in the second table of an article.\n See @Westreich2013 for a detailed discussion of the Table Two Fallacy.\n\nRelatedly, model coefficients for variables *other* than the causes we're interested in can be difficult to interpret.\nIn a model with `outcome ~ exposure + covariate`, it's tempting to present the coefficient of `covariate` as well as `exposure`.\nThe problem, as discussed in @sec-pred-or-explain, is that the causal structure for the effect of `covariate` on `outcome` may differ from that of `exposure` on `outcome`.\nLet's consider a variation of the quartet DAGs with other variables.\n\nFirst, let's start with the confounder DAG.\nIn @fig-quartet_confounder, we see that `covariate` is a confounder.\nIf this DAG represents the complete causal structure for `outcome`, the model `outcome ~ exposure + covariate` will give an unbiased estimate of the effect of `exposure` on `outcome`, assuming we've met other assumptions of the modeling process.\nThe adjustment set for `covariate`'s effect on `outcome` is empty, and `exposure` is not a collider, so controlling for it does not induce bias[^06-not-just-a-stats-problem-4].\nBut look again.\n`exposure` is a mediator for `covariate`'s effect on `outcome`; some of the total effect is mediated through `exposure`, while there is also a direct effect of `covariate` on `outcome`. **Both estimates are unbiased, but they are different *types* of estimates**. 
The effect of `exposure` on `outcome` is the *total effect* of that relationship, while the effect of `covariate` on `outcome` is the *direct effect*.\n\n[^06-not-just-a-stats-problem-4]: Additionally, OLS produces a *collapsible* effect.\n Other effects, like the odds ratio and the hazard ratio, are *non-collapsible*, meaning including unrelated variables in the model *can* change the effect estimate.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![The DAG for dataset 2, where `covariate` is a confounder. If you look closely, you'll realize that, from the perspective of the effect of `covariate` on the `outcome`, `exposure` is a *mediator*.](06-not-just-a-stats-problem_files/figure-html/fig-quartet_confounder-1.png){#fig-quartet_confounder width=288}\n:::\n:::\n\n\nWhat if we add `q`, a mutual cause of `covariate` and `outcome`?\nIn @fig-quartet_confounder_q, the adjustment sets are still different.\nThe adjustment set for `outcome ~ exposure` is still the same: `{covariate}`.\nThe `outcome ~ covariate` adjustment set is `{q}`.\nIn other words, `q` is a confounder for `covariate`'s effect on `outcome`.\nThe model `outcome ~ exposure + covariate` will produce the correct effect for `exposure` but not for the direct effect of `covariate`.\nNow, we have a situation where the coefficient for `covariate` not only answers a different type of question than the coefficient for `exposure` but is also biased by the absence of `q`.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![A modification of the DAG for dataset 2, where `covariate` is a confounder. 
Now, the relationship between `covariate` and `outcome` is confounded by `q`, a variable not necessary to calculate the unbiased effect of `exposure` on `outcome`.](06-not-just-a-stats-problem_files/figure-html/fig-quartet_confounder_q-1.png){#fig-quartet_confounder_q width=336}\n:::\n:::\n\n\nSpecifying a single causal model is deeply challenging.\nHaving a single model answer multiple causal questions is exponentially more difficult.\nIf attempting to do so, apply the same scrutiny to both[^06-not-just-a-stats-problem-5] questions.\nIs it possible to have a single adjustment set that answers both questions?\nIf not, specify two models or forego one of the questions.\nIf so, you need to ensure that the estimates answer the correct question.\nWe'll also discuss *joint* causal effects in @sec-interaction.\n\n[^06-not-just-a-stats-problem-5]: Practitioners of *casual* inference will interpret *many* effects from a single model in this way, but we consider this an act of bravado.\n\nUnfortunately, algorithms for detecting adjustment sets for multiple exposures and effect types are not well-developed, so you may need to rely on your knowledge of the causal structure in determining the intersection of the adjustment sets.\n", "supporting": [ "06-not-just-a-stats-problem_files" ], diff --git a/chapters/06-not-just-a-stats-problem.qmd b/chapters/06-not-just-a-stats-problem.qmd index 39097ad..492c62b 100644 --- a/chapters/06-not-just-a-stats-problem.qmd +++ b/chapters/06-not-just-a-stats-problem.qmd @@ -168,6 +168,7 @@ Standardizing numeric variables to have a mean of 0 and standard deviation of 1, ```{r} #| label: fig-causal_quartet_covariate_unscaled #| message: false +#| fig-cap: "@fig-causal_quartet_covariate, unscaled" causal_quartet |> # hide the dataset names mutate(dataset = as.integer(factor(dataset))) |>