From b77daf74a2fc0e64decc672d534afd2687d67293 Mon Sep 17 00:00:00 2001
From: ahisi
Date: Sat, 13 Jan 2024 10:47:37 +0100
Subject: [PATCH] Final proofreading - Corrections

---
 published-202312-favrot-hierarchical-supp.qmd | 12 ++++++------
 published-202312-favrot-hierarchical.qmd      |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/published-202312-favrot-hierarchical-supp.qmd b/published-202312-favrot-hierarchical-supp.qmd
index dd49697..36fc63b 100644
--- a/published-202312-favrot-hierarchical-supp.qmd
+++ b/published-202312-favrot-hierarchical-supp.qmd
@@ -5,18 +5,18 @@ library(ggplot2)
 
 ## Model adjustment and comparison to the negative binomial model { .unnumbered }
 
-To check the model's fit to the data, we performed a posterior predictive check of our model to check that the data were compatible with the model assumptions. To do so, we computed the probability of exceeding each individual data with the fitted model (2). Note that the number of pest individuals per plant are not available in practice; the data correspond to observed numbers of pest individuals for groups of $N_i$ plants. Based on the posterior probability check, the computed probabilities were all falling in the range 0.22-0.93 (except for the observations equal to 0, for which the probability of being greater was equal to 1), and were thus not extreme. This result indicates that the model specified is not incompatible with the observed data and that the over-dispersion was correctly taken into account.
+To check the model's fit to the data, we performed a posterior predictive check of our model to check that the data were compatible with the model assumptions. To do so, we computed the probability of exceeding each individual data with the fitted model (2). Note that the number of pest individuals per plant are not available in practice; the data correspond to observed numbers of pest individuals for groups of $N_i$ plants. Based on the posterior probability check (@fig-posterior-pred-check), the computed probabilities were all falling in the range 0.22-0.93 (except for the observations equal to 0, for which the probability of being greater was equal to 1), and were thus not extreme. This result indicates that the model specified is not incompatible with the observed data and that the over-dispersion was correctly taken into account.
 
-We also fitted a new model including a negative binomial distribution instead of a Poisson distribution. The results were almost identical between both types of model. See the figure below.
+We also fitted a new model including a negative binomial distribution instead of a Poisson distribution. The results were almost identical between both types of model (@fig-obs-vs-pred).
 
 
 ```{r, echo = FALSE}
 MCMC_samples_of_predictions = readRDS(file = "data/MCMC_samples_of_predictions.rds")
 ```
 
 
-```{r fig-S1, echo = FALSE}
+```{r fig-S1, echo = FALSE, fig.height = 4}
 #| label: fig-posterior-pred-check
-#| fig-cap: "Posterior predictive check, The X-axis is on a logarithmic scale and represents the number of aphids increased by 1."
+#| fig-cap: "Posterior predictive check, The x-axis is on a logarithmic scale and represents the number of aphids increased by 1."
 ggplot(MCMC_samples_of_predictions) +
   geom_point(aes(x = Y + 1, y = posterior_predictive_check)) +
@@ -27,9 +27,9 @@ ggplot(MCMC_samples_of_predictions) +
 ```
 
 
-```{r fig-S2, echo = FALSE}
+```{r fig-S2, echo = FALSE, fig.height = 4}
 #| label: fig-obs-vs-pred
-#| fig-cap: "Observed vs predicted values"
+#| fig-cap: "Observed vs predicted values."
 ggplot(MCMC_samples_of_predictions) +
   geom_point(aes(x = Y, y = prediction)) +
   xlab("Observed number of aphids") +
diff --git a/published-202312-favrot-hierarchical.qmd b/published-202312-favrot-hierarchical.qmd
index 2232732..2eb619f 100644
--- a/published-202312-favrot-hierarchical.qmd
+++ b/published-202312-favrot-hierarchical.qmd
@@ -93,7 +93,7 @@ Pest prevalence and intensity are commonly measured in factorial field trials to
 
 ## Description of the data
 
-Data are collected in 32 field trials conducted in France, Belgium and the Netherlands to compare several treatments against aphids in sugar beets. Each trial consists in a plot located in a given site at a given year (site-year) divided into one to four blocks. Each of these blocks is itself divided into strips where different treatments are tested, one of these treatments being an untreated control and the others corresponding to different types of insecticide. In each strip of each block, the number of aphids is counted on a sample of 10 beet plants (intensity). The number of infested plants (prevalence) is measured as well, but only in 15 trials out of 32. The total numbers of intensity and prevalence data are equal to 1128 and 561, respectively. Note that the number of aphids is not counted on each beet plant but in the whole plant sample. Intensity and prevalence are monitored at different times after treatments. As shown in @fig-one A, the data set is unbalanced as less data are available for the treatment Mavrik-jet compared to the others. @fig-one-1 B shows that the intensity and prevalence tend to increase with time.
+Data are collected in 32 field trials conducted in France, Belgium and the Netherlands to compare several treatments against aphids in sugar beets. Each trial consists in a plot located in a given site at a given year (site-year) divided into one to four blocks. Each of these blocks is itself divided into strips where different treatments are tested, one of these treatments being an untreated control and the others corresponding to different types of insecticide. In each strip of each block, the number of aphids is counted on a sample of 10 beet plants (intensity). The number of infested plants (prevalence) is measured as well, but only in 15 trials out of 32. The total numbers of intensity and prevalence data are equal to 1128 and 561, respectively. Note that the number of aphids is not counted on each beet plant but in the whole plant sample. Intensity and prevalence are monitored at different times after treatments. As shown in @fig-one A, the data set is unbalanced as less data are available for the treatment Mavrik-jet compared to the others. @fig-one B shows that the intensity and prevalence tend to increase with time.
 
 
 
@@ -787,7 +787,7 @@ tab7 %>%
 In case A (@tbl-params-future), the distribution of Z is such that Z is rarely close to 1 and often lower than 0.5 (@fig-height A.2). In case C, the distribution of Z is such that Z is often very close to 1 (100% of plants infested). Case B is intermediate. The accuracy of the estimated values of the model parameters $\gamma$ is better with scenario "100% Y - 0% Z" than with scenario "0% Y - 100% Z", for all number of trials. The advantage of "100% Y - 0% Z" is stronger in case of high pest prevalence (i.e., cases B and C) but very small in case of low pest prevalence (case A). For example, with 20 trials, the mean absolute error is 27% lower in the scenario "100% Y - 0% Z" than in "0% Y - 100% Z" for parameter set C (0.55 vs. 0.75), 10% lower for parameter set B (0.55 vs. 0.62), and not different for parameter set A (0.64). The "100% W" scenario leads to similar results as "100% Y - 0% Z", regardless of $\alpha_0$ and the number of trials. Results obtained with "100% Y / 100% Z" are generally similar to those obtained with "100% Y - 0% Z" and "100% W" but better than those obtained with the scenario "0% Y - 100% Z" in cases B and C. Here again, results show that 20 trials are not sufficient to obtain accurate parameter estimates.
 
 ```{r figure8, cache = FALSE, out.width="100%", fig.height = 6, echo = FALSE}
-#| label: fig-eight
+#| label: fig-height
 #| fig-cap: "Comparison of the \"100% Y - 0% Z\", \"0% Y - 100% Z\", \"100% Y / 100% Z\" and \"100% W\" scenarios according to the distribution of $Z$ and the number of trials, using the $E_{\\gamma}$ criterion (@eq-E_gamma). **A**, **B** and **C** correspond to different $Z$ distributions which are given by A.2, B.2 and C.2 (distribution for a number of trials equal to 40). A, B and C respectively correspond to $\\alpha_0 =$ -1, 1 and 2. The details of the simulation parameters are given in @tbl-params-future. A1, B1 and C1 represent the absolute error $E_{\\gamma}$ averaged over the 974 simulated data sets as a function of the number of trials. Colors correspond to the different scenarios."
 res_simu_q3_1 = readRDS(file = "results/Simulations/res_simu_q3_1.rds");