Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-collapsibility: additive vs multiplicative models #234

Closed
malcolmbarrett opened this issue Mar 12, 2024 · 5 comments · Fixed by #271
Closed

Non-collapsibility: additive vs multiplicative models #234

malcolmbarrett opened this issue Mar 12, 2024 · 5 comments · Fixed by #271
Assignees

Comments

@malcolmbarrett
Copy link
Collaborator

malcolmbarrett commented Mar 12, 2024

I think I have a working simulation that shows the contrast in non-collapsibility effects in models that were generated with additive and multiplicative models.

@LucyMcGowan, what do you think?

One thing I find weird but may be doing incorrectly: the LPM works for the additive model but not for the multiplicative model. I would have guessed using the LPM would workish for the multiplicative model because even though additive, it's collapsible. Could it be due to something in the generating process? Is my expectation wrong about what log odds ratio it should produce?

set.seed(12345)
library(tidyverse)
n <- 1000000

multiplicative_data <- tibble(
  x1 = rnorm(n, mean = 0, sd = 1),
  x2 = rnorm(n, mean = 0, sd = 1),
  x3 = rnorm(n, mean = 0, sd = 1)
)

# baseline log-odds
beta0 <- -1
# coef log-odds ratio
beta1 <- 1.1
beta2 <- -1.1
beta3 <- 0.5

# calculate multiplicative probability on logistic scale
prob <- with(
  multiplicative_data, 
  plogis(beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3 + rnorm(n, sd = .1))
)

multiplicative_data <- multiplicative_data |>
  mutate(y = rbinom(n, 1, prob))

# answer is wrong without all three
glm(y ~ x1, data = multiplicative_data, family = binomial())$coefficients
#> (Intercept)          x1 
#>  -0.7873634   0.8763435
glm(y ~ x1 + x2 + x3, data = multiplicative_data, family = binomial())$coefficients
#> (Intercept)          x1          x2          x3 
#>  -0.9956394   1.0980438  -1.1027489   0.4968647

# using a linear probability model
coefs <- lm(y ~ x1, data = multiplicative_data)$coefficients
coefs
#> (Intercept)          x1 
#>   0.3368481   0.1696988
# log odds ratio. Also wrong?
log((coefs[2] / (2 - coefs[2])) / (coefs[1] / (1 - coefs[1])))
#>        x1 
#> -1.700838


additive_data <- tibble(
  x1 = rnorm(n, mean = 0, sd = 1),
  x2 = rnorm(n, mean = 0, sd = 1),
  x3 = rnorm(n, mean = 0, sd = 1)
)

# Baseline probability
beta0 <- 0.5
# Coef risk difference
beta1 <- 0.1  
beta2 <- 0.1  
beta3 <- 0.1

# Calculate probabilities using a linear combination of predictors
# Clamp to 0 and 1
prob <- with(additive_data, beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3)
prob <- pmin(pmax(prob, 0), 1)

additive_data <- additive_data |>
  mutate(y = rbinom(n, 1, prob))

# expected log odds ratio of each coef:
# log((.6/(1-.6)) / (.5/(1-.5))) = 0.4055
# answer is wrong with more than one predictor of y
glm(y ~ x1, data = additive_data, family = binomial())$coefficients
#>  (Intercept)           x1 
#> -0.001851411  0.417459645
glm(y ~ x1 + x2 + x3, data = additive_data, family = binomial())$coefficients
#>  (Intercept)           x1           x2           x3 
#> -0.001874116  0.455620087  0.452392691  0.454884135

# correct on the linear, collapsible scale
lm(y ~ x1, data = additive_data)$coefficients
#> (Intercept)          x1 
#>   0.4995522   0.1001733
lm(y ~ x1 + x2 + x3, data = additive_data)$coefficients
#> (Intercept)          x1          x2          x3 
#>  0.49955491  0.10017435  0.09946685  0.09999777

Created on 2024-03-12 with reprex v2.1.0


<sup>Created on 2024-03-12 with [reprex v2.1.0](https://reprex.tidyverse.org)</sup>
@malcolmbarrett
Copy link
Collaborator Author

Note: additive model is collapsible but wrong, e.g. adding the other two will give the same (wrong?) answer. It's because it's the wrong data gen mechanism

@malcolmbarrett
Copy link
Collaborator Author

log odds calc is wrong for the first one because it doesn't include the baseline probability

@LucyMcGowan
Copy link
Collaborator

LucyMcGowan commented Mar 12, 2024

just to follow up on your comment, I think this should be:

log(((coefs[1] + coefs[2]) / (1 - (coefs[1] + coefs[2]))) / (coefs[1] / (1 - coefs[1])))

but also this is kind of a weird thing to do here because it will be different for every difference in x1 I think (like here we are saying the difference between x1 = 0 and x1 = 1, but because the odds ratios are not linear if instead you looked at the difference between x1 = 1 and x1 = 2 it would be a different value

@LucyMcGowan
Copy link
Collaborator

I also don't think you want random error here:

  plogis(beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3 + rnorm(n, sd = .1))

because the randomness is already baked in by pulling from plogis, and then also by running through rbinom. I think maybe would make more sense to just have the randomness come from the draw from the binomial distribution, so something like this:

prob <- with(
  multiplicative_data, 
  1 / (1 + exp(-(beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3))
)

malcolmbarrett added a commit that referenced this issue Mar 14, 2024
update collapsibility comment (related to #234)
@LucyMcGowan
Copy link
Collaborator

LucyMcGowan commented Mar 15, 2024

Ah maybe I’m wrong! Is plogis deterministic? This seems to say yes! Edit: oh yeah of course it is 🫣 my poor little brain hahaha. I still think we don’t need rnorm in there since we already do rbinom but plogis is good 😂

https://x.com/noah_greifer/status/1768505576143659132

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants