diff --git a/chapters/21-sensitivity.qmd b/chapters/21-sensitivity.qmd
index af09bfc..7bc4752 100644
--- a/chapters/21-sensitivity.qmd
+++ b/chapters/21-sensitivity.qmd
@@ -775,6 +775,293 @@ For instance, you may feel like these two DAGs are equally plausible or want to
 Thus far, we've probed some of the assumptions we've made about the causal structure of the question. We can take this further using quantitative bias analysis, which uses mathematical assumptions to see how results would change under different conditions.
 
-### Tipping point analyses
+### Sensitivity analyses for unmeasured confounders
+
+Sensitivity analyses for unmeasured confounding are important tools in observational studies for assessing how robust findings are to potential unmeasured factors [@d2022sensitivity]. These analyses rely on three key components:
+
+1) the observed exposure-outcome effect after adjusting for measured confounders,
+2) the estimated relationship between a hypothetical unmeasured confounder and the exposure, and
+3) the estimated relationship between that unmeasured confounder and the outcome.
+
+By specifying plausible values for these relationships, researchers can quantify how much the observed effect might change if such an unmeasured confounder existed.
+
+Let's think about why this works in the context of our example above. Let's assume @fig-dag-magic-sens displays the true relationship between our exposure and outcome of interest. Suppose we hadn't measured historic high temperature, denoted by the dashed lines; we have an unmeasured confounder. The three key components above are described by 1) the arrow between 'Extra Magic Morning' and 'Average wait', 2) the arrow between 'Historic high temperature' and 'Extra Magic Morning', and 3) the arrow between 'Historic high temperature' and 'Average wait'.
+
+```{r}
+#| label: fig-dag-magic-sens
+#| echo: false
+#| fig-cap: >
+#|   The original proposed DAG for the relationship between Extra Magic Hours
+#|   in the morning at a particular park and the average wait
+#|   time between 9 am and 10 am, with the line type of each edge displaying
+#|   what is measured. Here, we have not measured Historic high temperature.
+#|   As before, we are saying that we believe 1) Extra Magic Hours impacts
+#|   average wait time and 2) both Extra Magic Hours and average wait time
+#|   are determined by the time the park closes, historic high temperatures,
+#|   and ticket season.
+curvatures <- rep(0, 7)
+curvatures[5] <- 0.3
+emm_wait_dag |>
+  tidy_dagitty() |>
+  node_status() |>
+  mutate(linetype = ifelse(name == "park_temperature_high", "dashed", "solid")) |>
+  ggplot(
+    aes(x, y, xend = xend, yend = yend, color = status, edge_linetype = linetype)
+  ) +
+  geom_dag_edges_arc(curvature = curvatures, edge_color = "grey80") +
+  geom_dag_point() +
+  geom_dag_text_repel(aes(label = label), size = 3.8, seed = 1630, color = "#494949") +
+  scale_color_okabe_ito(na.value = "grey90") +
+  theme_dag() +
+  theme(legend.position = "none") +
+  coord_cartesian(clip = "off") +
+  scale_x_continuous(
+    limits = c(-1.25, 2.25),
+    breaks = c(-1, 0, 1, 2)
+  )
+```
+
+Various methods are available depending on the type of outcome (e.g., continuous, binary, time-to-event) and how much is known about potential unmeasured confounders. While these analyses cannot prove the absence of unmeasured confounding, they provide valuable insight into how sensitive results are to violations of the "no unmeasured confounders" assumption that is crucial for causal inference in observational studies.
+
+#### Observed exposure-outcome effect
+
+The first component, the observed exposure-outcome effect, is the proposed causal effect of interest, i.e., the effect you would like to perform a sensitivity analysis on.
+The effect itself will depend on the choice of outcome model, which in turn often depends on the distribution of the outcome and the desired effect measure:
+
+1. For continuous outcomes: linear models or generalized linear models (GLMs) with a Gaussian distribution and identity link are used, typically estimating a coefficient.
+
+2. For binary outcomes, we have a few choices:
+
+* GLMs with binomial distribution and log link
+* GLMs with Poisson distribution and log link
+* GLMs with binomial distribution and logit link
+
+These estimate coefficients, which can be exponentiated to obtain risk ratios (log link models) or odds ratios (logit link models).
+
+3. For time-to-event outcomes: Cox proportional hazards models are used, with the hazard ratio obtained by exponentiating the coefficient.
+
+Let's use the analysis from @tbl-alt-sets where we only adjusted for 'Time park closed' and 'Ticket season'. According to @fig-dag-magic-sens, we know 'Historic high temperature' is also a confounder, but it is *unmeasured*, so we cannot include it in our practical adjustment set. This resulted in an observed effect of `r round(effects[2], 2)`.
+
+#### Unmeasured confounder-exposure effect
+
+The relationship between an unmeasured confounder and the exposure can be characterized in three ways:
+
+1. For a binary unmeasured confounder:
+
+* Prevalence of the unmeasured confounder in the exposed group
+* Prevalence of the unmeasured confounder in the unexposed group
+
+2. For a continuous unmeasured confounder (assuming a normal distribution and unit variance):
+
+* Difference in means of the unmeasured confounder between exposed and unexposed groups
+
+3. Distribution-agnostic approach:
+
+* Partial $R^2$, representing the proportion of variation in the exposure explained by the unmeasured confounder after accounting for measured confounders
+
+These characterizations allow researchers to specify the unmeasured confounder-exposure relationship in sensitivity analyses, accommodating different types of confounders and levels of knowledge about their distribution.
+
+Our unmeasured confounder here, 'Historic high temperature', is continuous. For this example, let's assume it is normally distributed. We assume "unit variance" (a variance of 1) because it makes it easier to talk about the impact of the confounder in standard-deviation terms. Let's assume that on days with extra magic morning hours the historic high temperature is normally distributed with a mean of 80.5 degrees and a standard deviation of 9 degrees. Likewise, assume that on days without extra magic morning hours the historic high temperature is normally distributed with a mean of 82 degrees and a standard deviation of 9 degrees. We can convert these to 'unit variance' normally distributed variables by dividing by that standard deviation, 9 (sometimes we refer to this as *standardizing* our variable); this gives us a standardized mean of 8.94 for days with extra magic morning hours and 9.11 for the others, or a mean difference of -0.17. Hold on to this number; we'll use it in conjunction with the next section for our sensitivity analysis.
+
+#### Unmeasured confounder-outcome effect
+
+The relationship between an unmeasured confounder and the outcome can be quantified in two main ways:
+
+1. Coefficient-based approach: estimate the coefficient for an unmeasured confounder in a fully adjusted outcome model. You can also estimate the exponentiated coefficient (risk ratio, odds ratio, or hazard ratio).
+
+2. Distribution-agnostic approach (for continuous outcomes): use partial $R^2$, representing the proportion of variation in the outcome explained by the unmeasured confounder after accounting for the exposure and measured confounders.
+
+Let's take the coefficient-based approach. In our case, we need to estimate what we think the coefficient between our standardized 'Historic high temperature' variable and our outcome would be after adjusting for our exposure as well as the other measured confounders (in this case, ticket season and the time the park closed). Another way to describe this effect in the context of this problem is: "How would the average posted wait time change if we changed the historic high temperature by one standard deviation, after adjusting for whether there were extra magic morning hours, the park close time, and the ticket season?" Let's suppose we think this would change by -2.3 minutes. That is, if the historic high temperature is one standard deviation unit higher (in our scenario, 9 degrees warmer), we expect this to decrease the average posted wait time by 2.3 minutes.
+
+For a mathematical explanation of these quantities, see @d2022sensitivity.
+
+#### Putting the components together
+
+Once we have estimated the above three quantities, we can calculate an updated effect estimate between the exposure and outcome that takes into account an unmeasured factor like the one specified. We can use the {tipr} R package to perform these analyses. The functions in the {tipr} package follow a unified grammar, with names of the form `{action}_{effect}_with_{what}`.
+
+For example, to adjust (`action`) a coefficient (`effect`) with a binary unmeasured confounder (`what`), we use the function `adjust_coef_with_binary()`.
+
+Below is a copy of the table included in @lucy2022tipr about this package.
+
++----------+--------------------+----------------------------------------------+
+| category | Function term      | Use                                          |
++==========+====================+==============================================+
+|**action**| `adjust`           | These functions adjust observed effects,     |
+|          |                    | requiring both the unmeasured                |
+|          |                    | confounder-exposure relationship and         |
+|          |                    | unmeasured confounder-outcome relationship to|
+|          |                    | be specified.                                |
++----------+--------------------+----------------------------------------------+
+|          | `tip`              | These functions tip observed effects. Only   |
+|          |                    | one relationship, either the unmeasured      |
+|          |                    | confounder-exposure relationship or          |
+|          |                    | unmeasured confounder-outcome relationship,  |
+|          |                    | needs to be specified.                       |
++----------+--------------------+----------------------------------------------+
+|**effect**| `coef`             | These functions specify an observed          |
+|          |                    | coefficient from a linear, log-linear,       |
+|          |                    | logistic, or Cox proportional hazards model  |
++----------+--------------------+----------------------------------------------+
+|          | `rr`               | These functions specify an observed          |
+|          |                    | relative risk                                |
++----------+--------------------+----------------------------------------------+
+|          | `or`               | These functions specify an observed          |
+|          |                    | odds ratio                                   |
++----------+--------------------+----------------------------------------------+
+|          | `hr`               | These functions specify an observed          |
+|          |                    | hazard ratio                                 |
++----------+--------------------+----------------------------------------------+
+|**what**  | `continuous`       | These functions specify an unmeasured        |
+|          |                    | standardized Normally distributed confounder.|
+|          |                    | These functions will include the parameters  |
+|          |                    | `exposure_confounder_effect` and             |
+|          |                    | `confounder_outcome_effect`                  |
++----------+--------------------+----------------------------------------------+
+|          | `binary`           | These functions specify an unmeasured binary |
+|          |                    | confounder. These functions will include the |
+|          |                    | parameters `exposed_confounder_prev`,        |
+|          |                    | `unexposed_confounder_prev`, and             |
+|          |                    | `confounder_outcome_effect`                  |
++----------+--------------------+----------------------------------------------+
+|          | `r2`               | These functions specify an unmeasured        |
+|          |                    | confounder parameterized by specifying the   |
+|          |                    | percent of variation in the exposure /       |
+|          |                    | outcome explained by the unmeasured          |
+|          |                    | confounder. These functions will include the |
+|          |                    | parameters `confounder_exposure_r2` and      |
+|          |                    | `confounder_outcome_r2`                      |
++----------+--------------------+----------------------------------------------+
+: Grammar of `tipr` functions. {#tbl-sens}
+
+You can find full documentation here: [r-causal.github.io/tipr/](https://r-causal.github.io/tipr/)
+
+#### Example
+
+OK, now we have everything we need to perform our sensitivity analysis. @tbl-sens gives us the grammar; here, we will use the `adjust_coef()` function from the {tipr} R package. Let's plug in the quantities we established above for each of the three parameters, `effect_observed`, `exposure_confounder_effect`, and `confounder_outcome_effect`.
+
+```{r}
+library(tipr)
+adjust_coef(
+  effect_observed = 6.58,
+  exposure_confounder_effect = -.17,
+  confounder_outcome_effect = -2.3
+)
+```
+
+Examining this output, we see that if there were an unmeasured confounder like the one we specified above, our observed effect, 6.58, would be attenuated to 6.19. That is, rather than the effect of extra magic morning hours increasing the average posted wait time at 9 am by 6.58 minutes, the true effect would be 6.19 minutes, assuming our specifications about the unmeasured confounder were correct. Take a look at @tbl-alt-sets; this number should look familiar.
+
+In this case, our "guesses" about the relationship between our unmeasured confounder and the exposure and outcome were accurate because it was, in fact, measured!
+In reality, we often have to use other techniques to come up with these guesses. Sometimes, we have access to different data or previous studies that would allow us to quantify these effects. Sometimes, we have some information about one of the effects but not the other (i.e., we have a guess for the impact of the historic high temperature on the average posted wait time, but not on whether there are extra magic morning hours). In these cases, one solution is an *array*-based approach, where we specify the effect we are sure of and vary the one we are not. Let's see an example of that. We can plot this array to help us see the impact of this potential confounder. Examining @fig-sens-array, we can see, for example, that if there was a one standard deviation difference in historic high temperature, such that days with extra magic morning hours were 9 degrees cooler on average than days without them, the true causal effect of extra magic morning hours on the average posted wait time at 9 am would be 4.28 minutes, rather than the observed 6.58 minutes.
+
+```{r}
+#| label: fig-sens-array
+#| fig-cap: >
+#|   Impact of a normally distributed unmeasured confounder with an assumed
+#|   confounder-outcome effect of -2.3 on an observed coefficient of 6.58 (dashed line).
+#|   The x-axis shows the assumed relationship between the exposure and unmeasured confounder.
+#|   The y-axis shows the corresponding relationship between the exposure and outcome after
+#|   adjusting for the unmeasured confounder.
+
+library(tipr)
+adjust_df <- adjust_coef(
+  effect_observed = 6.58,
+  exposure_confounder_effect = seq(0, -1, by = -0.05),
+  confounder_outcome_effect = -2.3,
+  verbose = FALSE
+)
+
+ggplot(
+  adjust_df,
+  aes(
+    x = exposure_confounder_effect,
+    y = effect_adjusted
+  )
+) +
+  geom_hline(yintercept = 6.58, lty = 2) +
+  geom_point() +
+  geom_line() +
+  labs(
+    x = "Exposure - unmeasured confounder effect",
+    y = "Adjusted Effect"
+  )
+```
+
+Many times, we find ourselves without any certainty about a potential unmeasured confounder's impact on the exposure and outcome. One approach could be to decide on a range of *both* values to examine. @fig-sens-array-2 is an example of this. One thing we can learn by looking at this graph is that the adjusted effect will cross the null when a one standard deviation change in the historic high temperature changes the average posted wait time by at least ~7 minutes and there is around a one standard deviation difference between the average historic temperature on extra magic morning days compared to those without. This is known as a *tipping point*.
+
+```{r}
+#| label: fig-sens-array-2
+#| fig-cap: >
+#|   Impact of a normally distributed unmeasured confounder with an observed coefficient of
+#|   6.58 (dashed line).
+#|   The dotted line shows where the effect crosses the *null*, that is, where the adjusted
+#|   effect is actually 0.
+#|   The x-axis shows the assumed relationship between the exposure and unmeasured confounder;
+#|   each line represents a different relationship between the unmeasured confounder and the
+#|   outcome, varied from -1 to -7, as labeled on the left.
+#|   The y-axis shows the corresponding relationship between the exposure and outcome after
+#|   adjusting for each unmeasured confounder.
+
+library(tipr)
+adjust_df <- adjust_coef(
+  effect_observed = 6.58,
+  exposure_confounder_effect = rep(seq(0, -1, by = -0.05), each = 7),
+  confounder_outcome_effect = rep(seq(-1, -7, by = -1), times = 21),
+  verbose = FALSE
+)
+
+ggplot(
+  adjust_df,
+  aes(
+    x = exposure_confounder_effect,
+    y = effect_adjusted,
+    group = confounder_outcome_effect
+  )
+) +
+  geom_hline(yintercept = 6.58, lty = 2) +
+  geom_hline(yintercept = 0, lty = 3) +
+  geom_point() +
+  geom_line() +
+  geom_label(
+    data = adjust_df[141:147, ],
+    aes(
+      x = exposure_confounder_effect,
+      y = effect_adjusted,
+      label = confounder_outcome_effect
+    )
+  ) +
+  labs(
+    x = "Exposure - unmeasured confounder effect",
+    y = "Adjusted Effect"
+  )
+```
+
+#### Tipping point analyses
+
+Tipping point sensitivity analyses aim to determine the characteristics of an unmeasured confounder that would change the observed effect to a specific value, often the null. Instead of exploring a range of values for unknown sensitivity parameters, a tipping point analysis identifies the value that would "tip" the observed effect. This approach can be applied to point estimates or confidence interval bounds. The analysis calculates the smallest possible effect of an unmeasured confounder that would cause this tipping. By rearranging the equations and setting the adjusted effect to the null (or any other value of interest), we can solve for a single sensitivity parameter, given the others. The {tipr} R package also provides functions to perform these calculations for various scenarios, including different effect measures, confounder types, and known relationships.
+
+Using the example above, let's use the `tip_coef()` function to see what would tip our observed coefficient. For this, we only need to specify one of the two relationships, either the exposure-unmeasured confounder effect *or* the unmeasured confounder-outcome effect; the function will calculate the value of the other that would result in "tipping" the observed effect to the null.
+Let's first replicate what we saw in @fig-sens-array-2. Let's specify an unmeasured confounder-outcome effect of -7 minutes. The output below tells us that if there is an unmeasured confounder with an effect on the outcome of -7 minutes, it would need a standardized mean difference between exposure groups of -0.94 in order to tip our observed effect of 6.58 minutes.
+
+```{r}
+tip_coef(
+  effect_observed = 6.58,
+  confounder_outcome_effect = -7
+)
+```
+
+Instead, let's suppose we were pretty sure about the unmeasured confounder-outcome effect we assumed above, -2.3, but we weren't sure what to expect in terms of the relationship between historic high temperature and whether the day had extra magic morning hours. Let's see how big the difference would have to be for a confounder like this to make our observed effect of 6.58 minutes tip to 0 minutes.
+
+```{r}
+tip_coef(
+  effect_observed = 6.58,
+  confounder_outcome_effect = -2.3
+)
+```
+
+This shows that we would need an effect of -2.86 between the exposure and the confounder. In other words, for this particular example, we would need the average difference in historic temperature to be roughly 26 degrees (-2.86 times our standard deviation of 9) to change our effect to the null. This is a pretty huge, and potentially implausible, effect. If we were missing historic high temperature, and assuming we are correct about our -2.3 estimate, we could feel pretty confident that missing this variable is not skewing our result so much that we are directionally wrong.
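+
+The arithmetic behind these adjustments for a standardized, Normally distributed confounder is simple enough to check by hand. Below is a minimal sketch of that algebra (a simplification for intuition, not the actual {tipr} internals; see @d2022sensitivity for the derivation). The helper `adjust_coef_by_hand()` is our own, invented for illustration.
+
+```{r}
+# Adjusted effect = observed effect minus the product of the two
+# confounder relationships (illustrative helper, not part of {tipr})
+adjust_coef_by_hand <- function(effect_observed,
+                                exposure_confounder_effect,
+                                confounder_outcome_effect) {
+  effect_observed - exposure_confounder_effect * confounder_outcome_effect
+}
+
+# Recovers the adjusted estimate from above: 6.58 - (-0.17 * -2.3) = 6.19
+adjust_coef_by_hand(6.58, -0.17, -2.3)
+
+# Setting the adjusted effect to 0 and solving for the exposure-confounder
+# effect gives the tipping point we found with tip_coef(): 6.58 / -2.3 = -2.86
+6.58 / -2.3
+```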
+ + ### Other types of QBA diff --git a/citations.bib b/citations.bib index 90ac73d..8a61edf 100644 --- a/citations.bib +++ b/citations.bib @@ -714,6 +714,17 @@ @Article{keil2014 Month="Nov" } +@article{d2022sensitivity, + title={Sensitivity analyses for unmeasured confounders}, + author={D'Agostino McGowan, Lucy}, + journal={Current Epidemiology Reports}, + volume={9}, + number={4}, + pages={361--375}, + year={2022}, + publisher={Springer} +} + @article{dagostinomcgowan2023, title = {Causal Inference Is Not Just a Statistics Problem}, author = {{D{\textquoteright}Agostino McGowan}, Lucy and Gerke, Travis and Barrett, Malcolm},