Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Difference results marginaleffects vs riskRegression/predict (cox) #1237

Open
csthiago opened this issue Oct 21, 2024 · 6 comments
Open

Difference results marginaleffects vs riskRegression/predict (cox) #1237

csthiago opened this issue Oct 21, 2024 · 6 comments

Comments

@csthiago
Copy link

Hi Vincent,

I was testing the marginaleffects with cox regression.
I get slightly different results that using predict from survival (point estimate).


library(survival)
library(riskRegression)
library(marginaleffects)
set.seed(100)

dt <- survival::bladder
dt$rx <- as.factor(dt$rx)
fit <- coxph(formula = Surv(stop,event)~ rx+size+number,data=dt,y=TRUE,x=TRUE)

avg_comparisons(fit, variables = "rx",newdata = datagrid(stop=20),
            type="survival")
rx1_mean <- predict(fit, type="survival",
        newdata = dt |> mutate(stop=20,
                               rx="1")) |> 
  mean()
rx2_mean <- predict(fit, type="survival",
        newdata = dt |> mutate(stop=20,
                               rx="2")) |> 
  mean()
rx2_mean-rx1_mean
ateFit1a <- ate(fit, data = dt, treatment = "rx", times = 20)
summary(ateFit1a)

The value I get from avg_comparisons is
image
The point estimate using predict is 0.1034387
and the values from riskRegression ATE is:
image

Do you have any idea why the difference?

Thank you very much

@vincentarelbundock
Copy link
Owner

datagrid() sets all unnamed variables to their mean or mode. That could be a difference, but I don't know what the ate() function does.

@csthiago
Copy link
Author

the ate calculates the average treatment effect using g-formula (or ipw or double robust).
I have redone using dt > mutate and the PE is equal (silly me). The CI is different, but the riskRegression uses influence function instead delta.
Just to confirm one point.
To calculate risk difference for cox regression (in each time), the sytanx would be:

avg_comparisons(fit, variables = "rx",by="time",
            type="survival")

and for risk ratio would be type="lp" ?

Thanks

@vincentarelbundock
Copy link
Owner

As always, this function computes a difference on the specified scale. Here, type="lp" means a difference between the two groups on the linear probability scale.

You can compute ratios instead of differences on any scale, by specifying type with avg_comparisons(fit, variables = "rx", comparison = "ratio")

@ngreifer
Copy link
Contributor

To do g-computation using avg_comparisons() to get the same estimate as the ate() function, you only need to add grid_type = "counterfactual" to datagrid. So running the following should yield the correct estimate:

avg_comparisons(fit, variables = "rx", newdata = datagrid(stop = 20, grid_type = "counterfactual"),
                type = "survival")

Note that the predictions are on the survival probability scale when setting type = "survival", whereas for ate(), the predictions are on the risk scale (1 - survival). That means to get the risk difference with the correct sign, you need to manually program the comparison:

avg_comparisons(fit, variables = "rx", newdata = datagrid(stop = 20, grid_type = "counterfactual"),
                type = "survival", comparison = \(hi, lo) (1 - mean(hi)) - (1 - mean(lo)))

To get the risk ratio, you again need to manually program the comparison because the usual method of setting comparison = "ratio" computes the survival ratio. So the risk ratio would be:

avg_comparisons(fit, variables = "rx", newdata = datagrid(stop = 20, grid_type = "counterfactual"),
                type = "survival", comparison = \(hi, lo) (1 - mean(hi)) / (1 - mean(lo)))

This uses a symmetric confidence interval around the ratio, which may not be accurate. Instead, we can compute the symmetric confidence interval around the log of the risk ratio and then exponentiate that:

avg_comparisons(fit, variables = "rx",newdata = datagrid(stop = 20, grid_type = "counterfactual"),
                type = "survival", comparison = \(hi, lo) log((1 - mean(hi))/(1 - mean(lo))), transform = "exp")

The point estimate will be the same, and agrees with that of ate(). I can't speak to which method of computing the standard error is more accurate (delta method vs. influence function).

Note, to get the marginal risk predictions using avg_predictions(), you can supply the same arguments (without comparisons) but to get the risks rather than survival probabilities you would use byfun, e.g.,

avg_predictions(fit, variables = "rx", newdata = datagrid(stop = 20, grid_type = "counterfactual"),
                type = "survival", byfun = \(...) 1 - weighted.mean(...))

@csthiago
Copy link
Author

Thank you very much, @ngreifer !

@vincentarelbundock
Copy link
Owner

Thanks @csthiago for raising this and @ngreifer for sovling it! I really appreciate it.

I'll leave this open for now as I'll soon be working on a potential coxph vignette. Might refer back to this eventually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants