
DHARMa results interpretation for a glmmTMB model fitting beta distribution #450

Open
Agiacobino opened this issue Oct 31, 2024 · 6 comments


@Agiacobino

I am trying to evaluate the effect of different chemical treatments against the mite Varroa in adult bees during a 56 days treatment period.

The Varroa sample is taken from each colony by collecting about 300 bees, which are inspected to count the number of mites per 300 bees. We then calculated the proportion of mites/bees for each colony.

So we sampled the colonies on day 0 (treatment day), day 7, day 21, day 56 and day 70. Each time we also counted the number of brood frames in the colony, because brood availability allows the mite to reproduce.

I fitted a model using glmmTMB, including a zero-inflation intercept since treated colonies produce a lot of zeros.
Fixed effects: treatment, day of sampling, the interaction between treatment and day, and the amount of brood as a covariate. I also included the colony ID as a random effect.

Fullmodel <- glmmTMB(mites_adult_bees ~ Treatment * day + brood_frames + (1 | Col_ID), ziformula = ~1, family = beta_family(), data = dflong)

To assess model fit I am using the DHARMa package and obtained the following residuals plot:
[model residuals plot]

However, I am having a hard time interpreting whether the results suggest that my data can be analyzed with a beta distribution.

TIA,

Agostina

@melina-leite
Collaborator

Hi @Agiacobino,

I didn't understand what your observation is: is it the number of mites/300 bees for each colony? If so, you have ONE observation per colony, so why use colony as a random effect (Are you trying to model it as an Observation Level Random Effect (OLRE)?)? If not, could you please explain your observation again?

Did you test for zero inflation before adjusting a zero-inflated beta regression? If you adjust a beta without zero-inflation, you could test it with testZeroInflation.

Now, looking just at the main residual plot (there are other plots and tests you could use to better investigate the patterns, see the vignette), it seems you have dispersion problems. If you run testDispersion() you will see the dispersion parameter and whether it points to over- or underdispersion. Also, the significance of the tests depends heavily on the sample size: if it is large, any small departure from a dispersion of 1 will be considered significant. So, I suggest you judge the dispersion by the value itself, not just its significance.
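As a minimal sketch of that workflow (assuming `Fullmodel` is the glmmTMB fit from the post above):

```r
library(DHARMa)

# Simulate scaled residuals from the fitted model once,
# then reuse the simulation object for all plots and tests.
sim <- simulateResiduals(Fullmodel)
plot(sim)                # main QQ + residual-vs-predicted plots
testDispersion(sim)      # dispersion parameter and two-sided test
testZeroInflation(sim)   # observed vs. simulated zeros
```

Passing the simulation object (rather than refitting inside each test) keeps all diagnostics based on the same set of simulations.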

Best,
Melina

@Agiacobino
Author

Hi @melina-leite thanks for your response.

My observation is the #mites/#bees (I counted the bees, so I have the exact number in each sample), so my response is a proportion for each colony. I use colony as a random effect because I measured each colony 5 times over a period of 70 days (day 0 (treatment day), day 7, day 21, day 56 and day 70).

I could not run the model without zero inflation because I got the following error:
Error in eval(family$initialize): y values must be 0 < y < 1

The model was:
Fullmodelnozi <- glmmTMB(mites_adult_bees ~ Treatment * day + brood_frames + (1 | Col_ID), family = beta_family(), data = dflong)

After day 0 (treatment day), a lot of colonies have 0 mites because of treatment efficacy; that is why I thought zero inflation was a better approach.

I ran the model first mentioned in the previous post:
Fullmodel <- glmmTMB(mites_adult_bees ~ Treatment * day + brood_frames + (1 | Col_ID), ziformula = ~1, family = beta_family(), data = dflong)

Then I ran the plots and tests (the residual plot is not exactly the same, because we decided to remove one problematic colony that had an issue not connected to the trial, so it changed slightly).

diagnose(
Fullmodel,
eval_eps = 1e-05,
evec_eps = 0.01,
big_coef = 10,
big_sd_log10 = 3,
big_zstat = 5,
check_coefs = TRUE,
check_zstats = TRUE,
check_hessian = TRUE,
check_scales = TRUE,
explain = TRUE
)

Outcome:
Unusually large Z-statistics (|x|>5):

                       (Intercept)                 dayprop_mites_adult_56 
                         -9.979968                               5.452730 
            dayprop_mites_adult_70     TreatmentOA:dayprop_mites_adult_56 
                          7.765163                              -5.142275 
TreatmentOA:dayprop_mites_adult_70 TreatmentAmi+OA:dayprop_mites_adult_70 
                         -5.182280                              -5.109453 
                    zi~(Intercept)                          d~(Intercept) 
                        -10.028616                              38.333521 

Large Z-statistics (estimate/std err) suggest a possible failure of the Wald
approximation - often also associated with parameters that are at or near the edge
of their range (e.g. random-effects standard deviations approaching 0).
(Alternately, they may simply represent very well-estimated parameters; intercepts
of non-centered models may fall in this category.) While the Wald p-values and
standard errors listed in summary() may be unreliable, profile confidence intervals
(see ?confint.glmmTMB) and likelihood ratio test p-values derived by comparing
models (e.g. ?drop1) are probably still OK. (Note that the LRT is conservative when
the null value is on the boundary, e.g. a variance or zero-inflation value of 0
(Self and Liang 1987; Stram and Lee 1994; Goldman and Whelan 2000); in simple cases
these p-values are approximately twice as large as they should be.)

#Residuals

model_res<-simulateResiduals(Fullmodel)
plot(model_res)

[residual plot]

testDispersion(Fullmodel)

data: simulationOutput
dispersion = 2.3443, p-value = 0.016
alternative hypothesis: two.sided

testZeroInflation(Fullmodel)

data: simulationOutput
ratioObsSim = 0.99146, p-value = 1 ## this makes sense since I adjusted for zero inflation
alternative hypothesis: two.sided

I agree that all of this seems to be connected to dispersion problems. I include the outcome of the test you suggested, but the diagnose output is also highlighting some issues. However, the main goal in the end is to have a good model for treatment comparisons, which we normally fit with simpler models, except that this approach seems to suit the response variable better.

I am happy to share the script and data if it might help, but I don't want to take up much of your time.

Thanks!!

Agostina

@melina-leite
Collaborator

Hi @Agiacobino,

Yes, I missed the part where you have repeated measures of the same colony! You are right to include colony as a random intercept, and I was also wrong to suggest you try a non-zero-inflated model, given that you are using a beta distribution (it can't have zeros!).

Indeed, you have a large overdispersion parameter.

I was wondering if you could use a binomial proportion, but that doesn't seem to be the case unless you counted just the presence of mites per bee; then the response variable would be the proportion of bees with mites (bees with mites / total bees). Alternatively (if bees can have more than one mite), I thought about using a negative binomial distribution with the number of mites as the response variable, and including the log of the number of bees counted as an offset in the model (in the end, the offset works like a way to model the average number of mites per bee). Does that make sense?
The negative binomial usually deals very well with zeros and overdispersion problems.
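A rough sketch of what I mean, using the column names from your posts (adjust to your data):

```r
library(glmmTMB)

# Negative binomial count model; log(bees) as an offset turns the
# fitted counts into mites-per-bee rates:
#   log(E[mites]) = X*beta + log(bees)  =>  E[mites]/bees = exp(X*beta)
nb_sketch <- glmmTMB(
  mites ~ Treatment * day + brood_frames + offset(log(bees)) + (1 | Col_ID),
  family = nbinom2,
  data   = dflong
)
```

The offset coefficient is fixed at 1, so it is not estimated; it only rescales the counts by the number of bees inspected.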

@Agiacobino
Author

Hi Melina, thanks a lot for your insights and suggestions.

I will work on the negative binomial model and probably contact you again so we can check both models, if that is ok with you.

Thanks!

Agostina

@Agiacobino
Author

Hi Melina, I ran the models following your instructions.

The model fit improved significantly.

I first ran the negative binomial model without zero inflation, then tested the zero-inflation parameter, and it was not significant. However, the residual plot for the non-zi model showed outliers. So I ran a zi model, and its residuals did not show outliers.
The summaries of the two models are quite similar, so I just wanted to double check which model fits better from the residual point of view.

Model without zi

Fullmodel_nb <- glmmTMB(mites ~ Treatment * day + brood_frames +
                          offset(log(bees)) + (1 | Col_ID),
                        data = dflong,
                        ziformula = ~0,
                        family = nbinom2)

test zero inflation

testZeroInflation(Fullmodel_nb)

DHARMa zero-inflation test via comparison to expected zeros with simulation under H0
= fitted model

data: simulationOutput
ratioObsSim = 1.0922, p-value = 0.528
alternative hypothesis: two.sided

[residual plot, no-zero-inflation model]

Model with zi

Fullmodel_nb_zi <- glmmTMB(mites ~ Treatment * day + brood_frames +
                             offset(log(bees)) + (1 | Col_ID),
                           data = dflong,
                           ziformula = ~1,
                           family = nbinom2)

[residual plot, zero-inflation model]

So, my next question would be: which type, nbinom1 or nbinom2, should I use?

I ran the following code to see mean vs. variance by treatment and day, compared to theoretical NB1 and NB2 curves, following the Owls example (a zero-inflated generalized linear mixed model; Bolker et al., 2012).

library(plyr)
library(ggplot2)

mean_var <- ddply(dflong,
                  .(Treatment:day),
                  summarise,
                  mitesmean = mean(mites, na.rm = TRUE),
                  mitesvar  = var(mites, na.rm = TRUE))

q1 <- qplot(mitesmean, mitesvar, data = mean_var)

print(q1 +
  ## linear (quasi-Poisson/NB1) fit
  geom_smooth(method = "lm", formula = y ~ x - 1) +
  ## semi-quadratic (NB2/LNP)
  geom_smooth(method = "lm", formula = y ~ I(x^2) + offset(x) - 1,
              colour = "purple"))

The plot:
[mean-variance plot with NB1 and NB2 fits]

I am sorry to keep asking you, but it seems that makes a difference because I run both models and the results are quite different.

Thanks!!

Agostina

@melina-leite
Collaborator

Hi @Agiacobino,

As far as I can see from your residual plots and tests, the model without zero inflation is fine. Those two red dots (outliers) are not a problem for your analysis, and the test doesn't even detect an "excess of outliers." See the help for testOutliers to understand how that test is generated.

Regarding which model to select (if zero-inflated or not or which parametrization of the negative binomial), I quote Florian from other issues:

#442 "residual checks are not a model selection criteria, as they do not account for complexity. Thus, if you want to know if you should add complexity to a model, use a model selection criteria."

#356 "For model selection, use tools like AUC or likelihood ratio tests*. For mixed models, the problem here is that the df are often not clear, so you have to be careful. DHARMa has a function for a simulated likelihood ratio test https://rdrr.io/cran/DHARMa/man/simulateLRT.html that circumvents this problem also works for comparing models with different variance structures."

*or AIC model selection criterion
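To illustrate (a sketch only; assuming `Fullmodel_nb` is the nbinom2 fit from the post above, and refitting it with the nbinom1 parametrization via update()):

```r
library(glmmTMB)
library(DHARMa)

# Same fixed and random structure, two variance parametrizations:
m_nb1 <- update(Fullmodel_nb, family = nbinom1)  # variance = mu * (1 + phi)
m_nb2 <- Fullmodel_nb                            # variance = mu * (1 + mu/theta)

# Information-criterion comparison (lower is better):
AIC(m_nb1, m_nb2)

# Alternatively, DHARMa's simulated likelihood ratio test:
# simulateLRT(m_nb1, m_nb2)
```

Since the two parametrizations have the same number of parameters, the AIC comparison here reduces to comparing log-likelihoods.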

Best,
Melina
