From 9d744756dd7ab97bf2b31d5c47bca1430a2154df Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Wed, 8 Jan 2025 14:16:07 -0800 Subject: [PATCH] reviewed ch IBD and latin sq design --- chapters/incomplete-block-design.qmd | 30 ++++----- chapters/latin-design.qmd | 95 ++++++++++++++++++---------- chapters/repeated-measures.qmd | 8 +-- docs/search.json | 43 ++++++++----- 4 files changed, 106 insertions(+), 70 deletions(-) diff --git a/chapters/incomplete-block-design.qmd b/chapters/incomplete-block-design.qmd index b61f37e..f2771eb 100644 --- a/chapters/incomplete-block-design.qmd +++ b/chapters/incomplete-block-design.qmd @@ -18,7 +18,6 @@ Incomplete block designs are grouped into two groups: (1) balanced lattice desig In alpha-lattice design, the blocks are grouped into complete replicates. These designs are also termed as "resolvable incomplete block designs" or "partially balanced incomplete block designs" [@paterson]. This design has been more commonly used instead of balanced IBD because of it's practicability, flexibility, and versatility. -To avoid having a disconnected design, a balanced incomplete block design can be used. ### Statistical Model @@ -93,7 +92,6 @@ desplot::desplot(dat, text=gen, cex=1, out1=block, out2=gen, out2.gpar=list(col = "black", lwd = 1, lty = 1), main="Incomplete block design") - # desplot::desplot(dat, yield~col*row, # text=gen, shorten='none', cex=.6, out1=block, # aspect=252/96, # true aspect @@ -232,7 +230,20 @@ emmeans(model_icbd1, ~ gen) ### Partially Balanced IBD (Alpha Lattice Design) -The data used in this example is published in *Cyclic and Computer Generated Designs* [@john_cyclic]. The data in this trial was laid out in an alpha lattice design. This trial data had 24 genotypes ("gen"), 6 incomplete blocks, each replicated 3 times. 
+The statistical model for partially balanced design includes: + +$$y_{ij(l)} = \mu + \alpha_i + \beta_{i(l)} + \tau_j + \epsilon_{ij(l)}$$ + +Where: + +$\mu$ = overall experimental mean +$\alpha$ = replicate effect (random) +$\beta$ = incomplete block effect (random) +$\tau$ = treatment effect (fixed) +$\epsilon_{ij(l)}$ = intra-block residual + + +The data used in this example is published in *Cyclic and Computer Generated Designs* [@john_cyclic]. The trial was laid out in an alpha lattice design. This trial data had 24 genotypes ("gen"), 6 incomplete blocks, each replicated 3 times. Let's start analyzing this example first by loading the required libraries for linear mixed models: @@ -324,7 +335,7 @@ The response variables seems to follow a normal distribution curve, with fewer v ### lme4 ```{r} -mod_alpha <- lmer(yield ~ gen + (1|rep:block), +mod_alpha <- lmer(yield ~ gen + (1|rep/block), data = data1, na.action = na.exclude) tidy(mod_alpha) @@ -338,15 +349,6 @@ mod_alpha1 <- lme(yield ~ gen, data = data1, na.action = na.exclude) tidy(mod_alpha1) - -## need to try pdIdent here -# model_lme <-lme(yield ~ gen, -# random = list(one = pdBlocked(list( -# pdIdent(~ 0 + rep:block)))), -# data = data1 %>% mutate(one = factor(1))) -# -# summary(model_lme) - ``` ::: @@ -366,7 +368,6 @@ check_model(mod_alpha1, check = c('normality', 'linearity')) ``` ::: - #### Inference Let's ANOVA table using `anova()` from lmer and lme models, respectively. @@ -380,7 +381,6 @@ anova(mod_alpha, type = "1") #### nlme ```{r} anova(mod_alpha1, type = "sequential") -#anova(model_lme, type = "sequential") ``` ::: diff --git a/chapters/latin-design.qmd b/chapters/latin-design.qmd index a02db45..a63f90f 100644 --- a/chapters/latin-design.qmd +++ b/chapters/latin-design.qmd @@ -8,24 +8,27 @@ par(mar=c(5.1, 6, 4.1, 2.1)) ## Background -Latin square design In the Latin Square design, two blocking factors are arranged across the row and the column of the square. 
This allows blocking of two nuisance factors across rows and columns to reduce even more experimental error. The requirement of Latin square design is that all t treatments appears only once in each row and column and number of replications is equal to number of treatments. +In the Latin Square design, two blocking factors are arranged across the rows and columns of the square. This allows blocking of two nuisance factors to further reduce experimental error. The requirement of a Latin square design is that each of the t treatments appears exactly once in each row and column, so the number of replications equals the number of treatments. Advantages of Latin square design are: + 1. The design is particularly appropriate for comparing t treatment means in the presence of two sources of extraneous variation, each measured at t levels. + 2. The analysis is quite simple. -Disadvantage: -1. A Latin square can be constructed for any value of t, however, it is best suited for comparing t treatments when 5≤t≤10. +Disadvantages: + +1. A Latin square can be constructed for any value of t; however, it is best suited for comparing t treatments when 5 ≤ t ≤ 10. 2. Any additional extraneous sources of variability tend to inflate the error term, making it more difficult to detect differences among the treatment means. -3. The effect of each treatment on the response must be approximately the same across rows and columns. +3. The effect of each treatment on the response must be approximately the same across the rows and columns. Statistical model for a response in Latin square design is: $Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \epsilon_{ijk}$ -where, $\mu$ is the experiment mean, $\alpha_i's$ are treatment effects, $\beta$ and $\gamma$ are the row- and column specific effects. +where $\mu$ is the experiment mean, $\alpha_i$ represents the treatment effects, and $\beta_j$ and $\gamma_k$ are the row- and column-specific effects. 
Assumptions of this design includes normality and independent distribution of error ($\epsilon_{ijk}$) terms. And there is no interaction between two blocking (rows & columns) factors and treatments. @@ -40,6 +43,7 @@ Let's start the analysis firstly by loading the required libraries: library(lme4); library(lmerTest); library(emmeans); library(performance) library(dplyr); library(broom.mixed); library(agridat); library(desplot) ``` + ### nlme ```{r, message=FALSE, warning=FALSE} @@ -53,6 +57,7 @@ The data used in this example is extracted from the `agridat` package. In this e ```{r} dat <- agridat::goulden.latin ``` + | | | |-------|-------------------------------| | trt | treatment factor, 5 levels | @@ -63,10 +68,13 @@ dat <- agridat::goulden.latin : Table of variables in the data set {tbl-latin} ### Data integrity checks + Firstly, let's verify the class of variables in the dataset using `str()` function in base R + ```{r} str(dat) ``` + Here yield and trt are classified as numeric and factor variables, respectively, as needed. But we need to change 'row' and 'col' from integer t factor/character. ```{r} @@ -75,28 +83,38 @@ dat1 <- dat |> col = as.factor(col)) ``` -Next, to verify if the data meets the assumption of the Latin square design let's plot the field layout for this experiment. -```{r} -desplot::desplot(data = dat, flip = TRUE, - form = yield ~ row + col, - out1 = row, out1.gpar=list(col="black", lwd=3), - out2 = col, out2.gpar=list(col="black", lwd=3), - text = trt, cex = 1, shorten = "no", - main = "Field layout", - show.key = FALSE) +Next, to verify if the data meets the assumption of the Latin square design let's plot the field layout for this experiment. -``` +```{r, echo=FALSE, warning=FALSE} -This looks great! Here we can see that there are equal number of treatments, rows, and columns. Treatments were randomized in such a way that one treatment doesn't appear more than once in each row and column. 
+desplot::desplot(data = dat1, flip = TRUE, + form = trt ~ col + row, + text = trt, cex = 0.7, shorten = "no", + out1 = trt, + # out2 = block, + main = "Latin Square Design", show.key = FALSE) +# desplot::desplot(data = dat, flip = TRUE, +# form = yield ~ row + col, +# out1 = row, out1.gpar=list(col="black", lwd=3), +# out2 = col, out2.gpar=list(col="black", lwd=3), +# text = trt, cex = 1, shorten = "no", +# main = "Field layout", +# show.key = FALSE) +``` + +This looks great! Here we can see that there is an equal number (5) of treatments, rows, and columns. Treatments were randomized in such a way that one treatment doesn't appear more than once in each row and column. Next step is to check if there are any missing values in response variable. + ```{r} apply(dat, 2, function(x) sum(is.na(x))) ``` -And we do not have any missing values in the data. + +No missing values were detected in this data set. Before fitting the model, let's create a histogram of response variable to see if there are extreme values. + ```{r, echo=FALSE} #| label: lattice_design #| fig-cap: "Histogram of the dependent variable." @@ -110,8 +128,11 @@ hist(dat$yield, main = "", xlab = "yield") ``` ### Model fitting + Here we will fit a model to evaluate the impact of fungicide treatments on wheat yield with trt as a fixed effect and row & col as a random effect. +VarCorr(m1_b) + ::: panel-tabset ### lme4 @@ -119,81 +140,87 @@ Here we will fit a model to evaluate the impact of fungicide treatments on wheat m1_a <- lmer(yield ~ trt + (1|row) + (1|col), data = dat1, na.action = na.exclude) -tidy(m1_a) +summary(m1_a) ``` ### nlme + ```{r} -dat$dummy <- factor(1) m1_b <- lme(yield ~ trt, random =list(~1|row, ~1|col), - #list(dummy = pdBlocked(list( - # pdIdent(~row - 1), - # pdIdent(~col - 1)))), data = dat, na.action = na.exclude) summary(m1_b) -#VarCorr(m1_b) ``` ::: ### Check Model Assumptions -::: panel-tabset +This step involves inspection of model residuals
by using `check_model()` function from the "performance" package. + +:::: panel-tabset #### lme4 + ```{r, fig.height=3} check_model(m1_a, check = c("linearity", "normality")) ``` #### nlme -::: {layout-ncol=2 .column-body} - +::: {.column-body layout-ncol="2"} ```{r echo=FALSE, eval=FALSE} par(mar=c(5.1, 5, 2.1, 2.1)) plot(residuals(m1_b), xlab = "fitted values", ylab = "residuals", cex.lab = 1.8, cex.axis = 1.5); abline(0,0) ``` - ```{r echo=FALSE, eval=FALSE} par(mar=c(5.1, 5, 2.1, 2.1)) qqvals <- qqnorm(residuals(m1_b), plot.it=FALSE) qqplot(qqvals$x, qqvals$y, xlab = "Theoretical Quantiles", ylab = "Sample Quantiles", cex.lab = 1.7, cex.axis = 1.5); qqline(residuals(m1_b)) ``` -::: +::: ```{r, fig.height=3} check_model(m1_b, check = c("linearity", "normality")) ``` -::: +:::: + +These visuals suggest that the assumptions of the linear model have been met. ### Inference -We can look look at the analysis of variance for treatment effect on yield using `anova()` function. + +We can now proceed to the variance partitioning. In this case, we will use `anova()` with `type = "1"` or `type = "sequential"` for `lmer()` and `lme()` models, respectively. ::: panel-tabset #### lme4 -```{r, fig.height=3} + +```{r} anova(m1_a, type = "1") ``` #### nlme -```{r, fig.height=3} + +```{r} anova(m1_b, type = "sequential") ``` ::: -Here we observed a significant impact on fungicide treatment on crop yield. Let's have a look at the estimated marginal means of wheat yield with each treatment using `emmeans()` function. +Both models detected a significant effect of fungicide treatment on crop yield. Let's have a look at the estimated marginal means of wheat yield with each treatment using `emmeans()` function. 
::: panel-tabset #### lme4 + ```{r, fig.height=3} emmeans(m1_a, ~ trt) ``` #### nlme + ```{r, fig.height=3} emmeans(m1_b, ~ trt) ``` -::: \ No newline at end of file +::: + +We see that wheat yield was higher with the 'C' fungicide treatment than with the other fungicides applied in this study, which suggests that fungicide 'C' was more effective in controlling stem rust in wheat. diff --git a/chapters/repeated-measures.qmd b/chapters/repeated-measures.qmd index 57136b5..30fa9f7 100644 --- a/chapters/repeated-measures.qmd +++ b/chapters/repeated-measures.qmd @@ -4,11 +4,11 @@ source(here::here("settings.r")) ``` -In the previous chapters we covered how to run linear mixed models for different experiment designs. All of the examples in those chapters were independent measure designs, where each subject was assigned to a different treatment. Now we will move on to experiment with repeated measures random effects. +In the previous chapters we covered how to run linear mixed models for different experiment designs. All of the examples in those chapters were independent measures designs, where each subject was assigned to a different treatment. Now we will move on to experiments with repeated measures effects. -Studies that involve repeated observations of the exact same experimental units require a repeated measures component to properly model correlations across time with the experiment unit. This is common in any studies that are evaluated across different time periods. For example, if samples are collected over the different time periods from same subject, we have to repeated measures effect while analyzing the main effects. +Studies that involve repeated observations of the exact same experimental units (or subjects) require a repeated measures component in the analysis to properly model correlations across time within each subject. This is common in studies that are evaluated across different time periods. 
For example, if samples are collected over the different time periods from same subject, we have to model the repeated measures effect while analyzing the main effects. -In these models, the 'iid' assumption (idependently and identically distributed) is being violated, so we need to introduce specialized covariance structures that can account for these correlations between error terms. +In these models, the 'iid' assumption (independently and identically distributed) is being violated often, so we need to introduce specialized covariance structures that can account for these correlations between error terms. There are several types of covariance structures: @@ -97,7 +97,6 @@ ggplot(data = dat, aes(y = y, x = factweek, fill = variety)) + ``` Looks like variety '1' has the lowest yield and showed drastic reduction in yield over weeks compared to other varieties. - One last step before we fit model is to look at the distribution of response variable. ```{r, eval=FALSE} @@ -224,7 +223,6 @@ Firstly, we need to look at the class of variables in the data set. ```{r} str(Yield) ``` - We will now convert the fertilizer and Rep into factor. In addition, we need to create a new factor variable (sample_time1) to analyze the time effect. ::: column-margin diff --git a/docs/search.json b/docs/search.json index 339de94..91707a1 100644 --- a/docs/search.json +++ b/docs/search.json @@ -192,7 +192,7 @@ "href": "chapters/incomplete-block-design.html", "title": "9  Incomplete Block Design", "section": "", - "text": "9.1 Background\nThe block design in Chapter 4 was complete, meaning that every block contained all the treatments. In practice, it may not be possible to have too many treatments in each block. Sometimes, there are also situations where it is advised to not have many treatments in each block.\nIn such cases, incomplete block designs are used where we have to decide what subset of treatments to be used in an individual block. This will work well if we enough blocks. 
However, if we only have small number of blocks, there would be the risk that certain quantities are not estimable anymore.\nIncomplete block designs are grouped into balanced lattice design and partially balanced (or alpha-lattice) designs.\nTo avoid having a disconnected design, a balanced incomplete block design can be used\nThe statistical model for balanced incomplete block design is:\n\\[y_{ij} = \\mu + \\alpha_i + \\beta_j + \\epsilon_{ij}\\] Where:\n\\(\\mu\\) = overall experimental mean \\(\\alpha\\) = treatment effects (fixed) \\(\\beta\\) = block effects (random) \\(\\epsilon\\) = error terms\n\\[ \\epsilon \\sim N(0, \\sigma)\\]\n\\[ \\beta \\sim N(0, \\sigma_b)\\] There are few key points that we need to keep in mind while designing incomplete block designs:\nAn excellent description of incomplete block design is provided in ANOVA and Mixed Models by Lukas Meier.\nThe balanced incomplete block designs are guided by strict principles and guidelines including: the number of treatments must be a perfect square (e.g. 25, 36, and so on); number of replicates must be equal to no. of blocks +1;", + "text": "9.1 Background\nThe block design described in Chapter 4 was complete, meaning that each block contained each treatment level at least once. In practice, it may not be possible or advisable to include all treatments in each block, either due to limitations in treatment availability (e.g. limited seed stocks) or the block size becomes too large to serve its original goals of controlling for spatial variation.\nIn such cases, incomplete block designs (IBD) can be used. Incomplete block designs break the experiment into many smaller incomplete blocks that are nested within standard RCBD-style blocks and assigns a subset of the treatment levels to each incomplete block. 
There are several different approaches Patterson and Williams (1976) for how to assign treatment levels to incomplete blocks and these designs impact the final statistical analysis (and if all treatments included in the experimental design are estimable). An excellent description of incomplete block design is provided in ANOVA and Mixed Models by Lukas Meier.\nIncomplete block designs are grouped into two groups: (1) balanced lattice designs; and (2) partially balanced (also commonly called alpha-lattice) designs. Balanced IBD designs have been previously called “lattice designs” [need refs], but we are not using that term to avoid confusion with alpha-lattice designs, a term that is commonly used.\nIn alpha-lattice design, the blocks are grouped into complete replicates. These designs are also termed as “resolvable incomplete block designs” or “partially balanced incomplete block designs” (paterson?). This design has been more commonly used instead of balanced IBD because of it’s practicability, flexibility, and versatility.", "crumbs": [ "Experiment designs", "9  Incomplete Block Design" @@ -203,7 +203,7 @@ "href": "chapters/incomplete-block-design.html#background", "title": "9  Incomplete Block Design", "section": "", - "text": "A drawback of this design is that block effect and treatment effects are confounded.\nTo eliminate the block effects, better compare treatments within a block.\nNo treatment should appear twice in any block as they contributes nothing to within block comparisons.\n\n\n\n\n\n\n\n\n\nA note\n\n\n\nBecause the blocks are incomplete, the Type I and Type III sums of squares will be different. 
That is, the missing treatments in each block represent missing observations (but not missing ‘at random’).", + "text": "9.1.1 Statistical Model\nThe statistical model for a balanced incomplete block design is:\n\\[y_{ij} = \\mu + \\alpha_i + \\beta_j + \\epsilon_{ij}\\]\nWhere:\n\\(\\mu\\) = overall experimental mean\n\\(\\alpha\\) = treatment effects (fixed)\n\\(\\beta\\) = block effects (random)\n\\(\\epsilon\\) = error terms\n\\[ \\epsilon \\sim N(0, \\sigma)\\]\n\\[ \\beta \\sim N(0, \\sigma_b)\\]\nThere are few key points that we need to keep in mind while designing incomplete block experiments:\n\nA drawback of this design is that block effect and treatment effects are confounded.\nTo remove the block effects, it is better compare treatments within a block.\nNo treatment should appear twice in any block as it contributes nothing to within block comparisons.\n\nThe balanced incomplete block designs are guided by strict principles and guidelines including: the number of treatments must be a perfect square (e.g. 25, 36, and so on), and number of replicates must be equal to number of blocks +1.\n\n\n\n\n\n\nNote on Sums of Squares\n\n\n\nBecause the blocks are incomplete, the Type I and Type III sums of squares will be different even when there is no missing data from a trail. That is because the missing treatments in each block represent missing observations (even though they are not missing ‘at random’).", "crumbs": [ "Experiment designs", "9  Incomplete Block Design" @@ -501,7 +501,7 @@ "href": "chapters/repeated-measures.html#rcbd-repeated-measures", "title": "11  Repeated measures mixed models", "section": "12.1 RCBD Repeated Measures", - "text": "12.1 RCBD Repeated Measures\nThe example shown below contains data from a sorghum trial laid out as a randomized complete block design (5 blocks) with variety (4 varieties) treatment effect. 
The response variable ‘y’ is the leaf area index assessed in five consecutive weeks on each plot.\nWe need to have time as numeric and factor variable. In the model, to assess the week effect, week was used as a factor (factweek). For the correlation matrix, week needs to be numeric (week).\n\ndat <- agriTutorial::sorghum %>% \n mutate(week = as.numeric(factweek),\n block = as.character(varblock)) \n\n\nTable of variables in the data set\n\n\nblock\nblocking unit\n\n\nReplicate\nreplication unit\n\n\nWeek\nTime points when data was collected\n\n\nvariety\ntreatment factor, 4 levels\n\n\ny\nyield (lbs)\n\n\n\n\n12.1.1 Data Integrity Checks\nLet’s do preliminary data check including evaluating data structure, distribution of treatments, number of missing values, and distribution of response variable.\n\nstr(dat)\n\n'data.frame': 100 obs. of 9 variables:\n $ y : num 5 4.84 4.02 3.75 3.13 4.42 4.3 3.67 3.23 2.83 ...\n $ variety : Factor w/ 4 levels \"1\",\"2\",\"3\",\"4\": 1 1 1 1 1 1 1 1 1 1 ...\n $ Replicate: Factor w/ 5 levels \"1\",\"2\",\"3\",\"4\",..: 1 1 1 1 1 2 2 2 2 2 ...\n $ factweek : Factor w/ 5 levels \"1\",\"2\",\"3\",\"4\",..: 1 2 3 4 5 1 2 3 4 5 ...\n $ factplot : Factor w/ 20 levels \"1\",\"2\",\"3\",\"4\",..: 1 1 1 1 1 2 2 2 2 2 ...\n $ varweek : int 1 2 3 4 5 1 2 3 4 5 ...\n $ varblock : int 1 1 1 1 1 2 2 2 2 2 ...\n $ week : num 1 2 3 4 5 1 2 3 4 5 ...\n $ block : chr \"1\" \"1\" \"1\" \"1\" ...\n\n\nIn this data, we have block, factplot, factweek as factor variables and y & week as numeric.\n\ntable(dat$variety, dat$block)\n\n \n 1 2 3 4 5\n 1 5 5 5 5 5\n 2 5 5 5 5 5\n 3 5 5 5 5 5\n 4 5 5 5 5 5\n\n\nThe cross tabulation shows a equal number of varieties in each block.\n\nggplot(data = dat, aes(y = y, x = factweek, fill = variety)) +\n geom_boxplot() + \n #scale_fill_brewer(palette=\"Dark2\") +\n scale_fill_viridis_d(option = \"F\") +\n theme_bw()\n\n\n\n\n\n\n\n\nLooks like variety ‘1’ has the lowest yield and showed drastic reduction in yield over 
weeks compared to other varieties.\nOne last step before we fit model is to look at the distribution of response variable.\n\nhist(dat$y, main = \"\", xlab = \"yield\")\n\n\n\n\n\n\n\n\n\n\nFigure 12.1: Histogram of the dependent variable.\n\n\n\n\n\n\n12.1.2 Model Building\nLet’s fit the basic model first using lme() from the nlme package.\n\nlm1 <- lme(y ~ variety + factweek + variety:factweek, random = ~1|block/factplot,\n data = dat,\n na.action = na.exclude)\n\nThe model fitted above doesn’t account for the repeated measures effect. To account for the variation caused by repeated measurements, we can model the correlation among responses for a given subject which is plot (factor variable) in this case.\nBy adding this correlation structure, what we are implying is to keep each plot independent, but to allowing AR1 or compound symmetry correlations between responses for a given subject, here time variable is week and it must be numeric.\n\ncs1 <- corAR1(form = ~ week|block/factplot, value = 0.2, fixed = FALSE)\ncs2 <- corCompSymm(form = ~ week|block/factplot, value = 0.2, fixed = FALSE)\n\nIn the code chunk above, we fitted two correlation structures including AR1 and compound symmetry matrices. Next we will update the model lm1, with these two matrices. In nlme, please search the help tool to know more about functions for different correlation structure classes.\n\nlm2 <- update(lm1, corr = cs1)\nlm3 <- update(lm1, corr= cs2)\n\nNow let’s compare how model fitness differs among models with no correlation structure (lm1), with AR1 correlation structure (lm2), and with compound symmetry structure (lm3). 
We will compare these models by using anova() or by compare_performance() function from the ‘performance’ library.\n\nanovaperformance\n\n\n\nanova(lm1, lm2, lm3)\n\n Model df AIC BIC logLik Test L.Ratio p-value\nlm1 1 23 18.837478 73.62409 13.58126 \nlm2 2 24 -2.347391 54.82125 25.17370 1 vs 2 23.18487 <.0001\nlm3 3 24 20.837478 78.00612 13.58126 \n\n\n\n\n\nresult <- compare_performance(lm1, lm2, lm3)\n\nSome of the nested models seem to be identical and probably only vary in\n their random effects.\n\nprint_md(result)\n\n\nComparison of Model Performance Indices\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nModel\nAIC (weights)\nAICc (weights)\nBIC (weights)\nR2 (cond.)\nR2 (marg.)\nICC\nRMSE\nSigma\n\n\n\n\nlm1\nlme\n-50.5 (<.001)\n-36.0 (<.001)\n9.4 (<.001)\n0.99\n0.37\n0.98\n0.10\n0.13\n\n\nlm2\nlme\n-77.5 (>.999)\n-61.5 (>.999)\n-15.0 (>.999)\n0.97\n0.41\n0.95\n0.15\n0.18\n\n\nlm3\nlme\n-48.5 (<.001)\n-32.5 (<.001)\n14.0 (<.001)\n0.98\n0.37\n0.98\n0.11\n0.14\n\n\n\n\n\n\n\n\nWe prefer to chose model with lower AIC and BIC values. 
In this scenario, we will move forward with lm2 model containing AR1 structure.\nLet’s run a tidy() on lm2 model to look at the estimates for random and fixed effects.\n\ntidy(lm2)\n\nWarning in tidy.lme(lm2): ran_pars not yet implemented for multiple levels of\nnesting\n\n\n# A tibble: 20 × 7\n effect term estimate std.error df statistic p.value\n <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed (Intercept) 4.24 0.291 64 14.6 5.44e-22\n 2 fixed variety2 0.906 0.114 12 7.94 4.05e- 6\n 3 fixed variety3 0.646 0.114 12 5.66 1.05e- 4\n 4 fixed variety4 0.912 0.114 12 8.00 3.78e- 6\n 5 fixed factweek2 -0.196 0.0571 64 -3.44 1.04e- 3\n 6 fixed factweek3 -0.836 0.0755 64 -11.1 1.60e-16\n 7 fixed factweek4 -1.16 0.0867 64 -13.3 4.00e-20\n 8 fixed factweek5 -1.54 0.0943 64 -16.3 1.57e-24\n 9 fixed variety2:factweek2 0.0280 0.0807 64 0.347 7.30e- 1\n10 fixed variety3:factweek2 0.382 0.0807 64 4.73 1.26e- 5\n11 fixed variety4:factweek2 -0.0140 0.0807 64 -0.174 8.63e- 1\n12 fixed variety2:factweek3 0.282 0.107 64 2.64 1.03e- 2\n13 fixed variety3:factweek3 0.662 0.107 64 6.20 4.55e- 8\n14 fixed variety4:factweek3 0.388 0.107 64 3.64 5.55e- 4\n15 fixed variety2:factweek4 0.228 0.123 64 1.86 6.77e- 2\n16 fixed variety3:factweek4 0.744 0.123 64 6.06 7.86e- 8\n17 fixed variety4:factweek4 0.390 0.123 64 3.18 2.28e- 3\n18 fixed variety2:factweek5 0.402 0.133 64 3.01 3.70e- 3\n19 fixed variety3:factweek5 0.672 0.133 64 5.04 4.11e- 6\n20 fixed variety4:factweek5 0.222 0.133 64 1.66 1.01e- 1\n\n\n\n\n12.1.3 Check Model Assumptions\n\ncheck_model(lm2, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n12.1.4 Inference\nThe ANOVA table suggests a highly significant effect of the variety, week, and variety x week interaction effect.\n\nanova(lm2, type = \"marginal\")\n\n numDF denDF F-value p-value\n(Intercept) 1 64 212.10509 <.0001\nvariety 3 12 28.28895 <.0001\nfactweek 4 64 74.79758 <.0001\nvariety:factweek 12 64 7.03546 <.0001\n\n\nWe can estimate the marginal means for 
variety and week effect and their interaction using emmeans() function.\n\nmean_1 <- emmeans(lm2, ~ variety)\n\nNOTE: Results may be misleading due to involvement in interactions\n\nmean_1\n\n variety emmean SE df lower.CL upper.CL\n 1 3.50 0.288 4 2.70 4.29\n 2 4.59 0.288 4 3.79 5.39\n 3 4.63 0.288 4 3.84 5.43\n 4 4.61 0.288 4 3.81 5.40\n\nResults are averaged over the levels of: factweek \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\nmean_2 <- emmeans(lm2, ~ variety*factweek)\nmean_2\n\n variety factweek emmean SE df lower.CL upper.CL\n 1 1 4.24 0.291 4 3.43 5.05\n 2 1 5.15 0.291 4 4.34 5.96\n 3 1 4.89 0.291 4 4.08 5.70\n 4 1 5.15 0.291 4 4.35 5.96\n 1 2 4.05 0.291 4 3.24 4.85\n 2 2 4.98 0.291 4 4.17 5.79\n 3 2 5.07 0.291 4 4.27 5.88\n 4 2 4.94 0.291 4 4.14 5.75\n 1 3 3.41 0.291 4 2.60 4.21\n 2 3 4.59 0.291 4 3.79 5.40\n 3 3 4.71 0.291 4 3.91 5.52\n 4 3 4.71 0.291 4 3.90 5.51\n 1 4 3.09 0.291 4 2.28 3.89\n 2 4 4.22 0.291 4 3.41 5.03\n 3 4 4.48 0.291 4 3.67 5.28\n 4 4 4.39 0.291 4 3.58 5.20\n 1 5 2.70 0.291 4 1.89 3.51\n 2 5 4.01 0.291 4 3.20 4.82\n 3 5 4.02 0.291 4 3.21 4.83\n 4 5 3.83 0.291 4 3.03 4.64\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\n\n\n\nTime variable\n\n\n\nHere is a quick step to make sure your fitting model correctly: make sure to have two time variables in your data one being numeric (e.g. ‘day’ as number) and other being factor/character(e.g. ‘day_factor’ as a factor/character). Where, numeric variable is used for fitting correlation matrix and factor/character variable used in model statement to evaluate the time variable effect on response variable.", + "text": "12.1 RCBD Repeated Measures\nThe example shown below contains data from a sorghum trial laid out as a randomized complete block design (5 blocks) with variety (4 varieties) treatment effect. 
The response variable ‘y’ is the leaf area index assessed in five consecutive weeks on each plot.\nWe need to have time as numeric and factor variable. In the model, to assess the week effect, week was used as a factor (factweek). For the correlation matrix, week needs to be numeric (week).\n\ndat <- agriTutorial::sorghum %>% \n mutate(week = as.numeric(factweek),\n block = as.character(varblock)) \n\n\nTable of variables in the data set\n\n\nblock\nblocking unit\n\n\nReplicate\nreplication unit\n\n\nWeek\nTime points when data was collected\n\n\nvariety\ntreatment factor, 4 levels\n\n\ny\nyield (lbs)\n\n\n\n\n12.1.1 Data Integrity Checks\nLet’s do preliminary data check including evaluating data structure, distribution of treatments, number of missing values, and distribution of response variable.\n\nstr(dat)\n\n'data.frame': 100 obs. of 9 variables:\n $ y : num 5 4.84 4.02 3.75 3.13 4.42 4.3 3.67 3.23 2.83 ...\n $ variety : Factor w/ 4 levels \"1\",\"2\",\"3\",\"4\": 1 1 1 1 1 1 1 1 1 1 ...\n $ Replicate: Factor w/ 5 levels \"1\",\"2\",\"3\",\"4\",..: 1 1 1 1 1 2 2 2 2 2 ...\n $ factweek : Factor w/ 5 levels \"1\",\"2\",\"3\",\"4\",..: 1 2 3 4 5 1 2 3 4 5 ...\n $ factplot : Factor w/ 20 levels \"1\",\"2\",\"3\",\"4\",..: 1 1 1 1 1 2 2 2 2 2 ...\n $ varweek : int 1 2 3 4 5 1 2 3 4 5 ...\n $ varblock : int 1 1 1 1 1 2 2 2 2 2 ...\n $ week : num 1 2 3 4 5 1 2 3 4 5 ...\n $ block : chr \"1\" \"1\" \"1\" \"1\" ...\n\n\nIn this data, we have block, factplot, factweek as factor variables and y & week as numeric.\n\ntable(dat$variety, dat$block)\n\n \n 1 2 3 4 5\n 1 5 5 5 5 5\n 2 5 5 5 5 5\n 3 5 5 5 5 5\n 4 5 5 5 5 5\n\n\nThe cross tabulation shows a equal number of varieties in each block.\n\nggplot(data = dat, aes(y = y, x = factweek, fill = variety)) +\n geom_boxplot() + \n #scale_fill_brewer(palette=\"Dark2\") +\n scale_fill_viridis_d(option = \"F\") +\n theme_bw()\n\n\n\n\n\n\n\n\nLooks like variety ‘1’ has the lowest yield and showed drastic reduction in yield over 
weeks compared to other varieties. One last step before we fit model is to look at the distribution of response variable.\n\nhist(dat$y, main = \"\", xlab = \"yield\")\n\n\n\n\n\n\n\n\n\n\nFigure 12.1: Histogram of the dependent variable.\n\n\n\n\n\n\n12.1.2 Model Building\nLet’s fit the basic model first using lme() from the nlme package.\n\nlm1 <- lme(y ~ variety + factweek + variety:factweek, random = ~1|block/factplot,\n data = dat,\n na.action = na.exclude)\n\nThe model fitted above doesn’t account for the repeated measures effect. To account for the variation caused by repeated measurements, we can model the correlation among responses for a given subject which is plot (factor variable) in this case.\nBy adding this correlation structure, what we are implying is to keep each plot independent, but to allowing AR1 or compound symmetry correlations between responses for a given subject, here time variable is week and it must be numeric.\n\ncs1 <- corAR1(form = ~ week|block/factplot, value = 0.2, fixed = FALSE)\ncs2 <- corCompSymm(form = ~ week|block/factplot, value = 0.2, fixed = FALSE)\n\nIn the code chunk above, we fitted two correlation structures including AR1 and compound symmetry matrices. Next we will update the model lm1, with these two matrices. In nlme, please search the help tool to know more about functions for different correlation structure classes.\n\nlm2 <- update(lm1, corr = cs1)\nlm3 <- update(lm1, corr= cs2)\n\nNow let’s compare how model fitness differs among models with no correlation structure (lm1), with AR1 correlation structure (lm2), and with compound symmetry structure (lm3). 
We will compare these models by using anova() or by compare_performance() function from the ‘performance’ library.\n\nanovaperformance\n\n\n\nanova(lm1, lm2, lm3)\n\n Model df AIC BIC logLik Test L.Ratio p-value\nlm1 1 23 18.837478 73.62409 13.58126 \nlm2 2 24 -2.347391 54.82125 25.17370 1 vs 2 23.18487 <.0001\nlm3 3 24 20.837478 78.00612 13.58126 \n\n\n\n\n\nresult <- compare_performance(lm1, lm2, lm3)\n\nSome of the nested models seem to be identical and probably only vary in\n their random effects.\n\nprint_md(result)\n\n\nComparison of Model Performance Indices\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nModel\nAIC (weights)\nAICc (weights)\nBIC (weights)\nR2 (cond.)\nR2 (marg.)\nICC\nRMSE\nSigma\n\n\n\n\nlm1\nlme\n-50.5 (<.001)\n-36.0 (<.001)\n9.4 (<.001)\n0.99\n0.37\n0.98\n0.10\n0.13\n\n\nlm2\nlme\n-77.5 (>.999)\n-61.5 (>.999)\n-15.0 (>.999)\n0.97\n0.41\n0.95\n0.15\n0.18\n\n\nlm3\nlme\n-48.5 (<.001)\n-32.5 (<.001)\n14.0 (<.001)\n0.98\n0.37\n0.98\n0.11\n0.14\n\n\n\n\n\n\n\n\nWe prefer to chose model with lower AIC and BIC values. 
In this scenario, we will move forward with lm2 model containing AR1 structure.\nLet’s run a tidy() on lm2 model to look at the estimates for random and fixed effects.\n\ntidy(lm2)\n\nWarning in tidy.lme(lm2): ran_pars not yet implemented for multiple levels of\nnesting\n\n\n# A tibble: 20 × 7\n effect term estimate std.error df statistic p.value\n <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed (Intercept) 4.24 0.291 64 14.6 5.44e-22\n 2 fixed variety2 0.906 0.114 12 7.94 4.05e- 6\n 3 fixed variety3 0.646 0.114 12 5.66 1.05e- 4\n 4 fixed variety4 0.912 0.114 12 8.00 3.78e- 6\n 5 fixed factweek2 -0.196 0.0571 64 -3.44 1.04e- 3\n 6 fixed factweek3 -0.836 0.0755 64 -11.1 1.60e-16\n 7 fixed factweek4 -1.16 0.0867 64 -13.3 4.00e-20\n 8 fixed factweek5 -1.54 0.0943 64 -16.3 1.57e-24\n 9 fixed variety2:factweek2 0.0280 0.0807 64 0.347 7.30e- 1\n10 fixed variety3:factweek2 0.382 0.0807 64 4.73 1.26e- 5\n11 fixed variety4:factweek2 -0.0140 0.0807 64 -0.174 8.63e- 1\n12 fixed variety2:factweek3 0.282 0.107 64 2.64 1.03e- 2\n13 fixed variety3:factweek3 0.662 0.107 64 6.20 4.55e- 8\n14 fixed variety4:factweek3 0.388 0.107 64 3.64 5.55e- 4\n15 fixed variety2:factweek4 0.228 0.123 64 1.86 6.77e- 2\n16 fixed variety3:factweek4 0.744 0.123 64 6.06 7.86e- 8\n17 fixed variety4:factweek4 0.390 0.123 64 3.18 2.28e- 3\n18 fixed variety2:factweek5 0.402 0.133 64 3.01 3.70e- 3\n19 fixed variety3:factweek5 0.672 0.133 64 5.04 4.11e- 6\n20 fixed variety4:factweek5 0.222 0.133 64 1.66 1.01e- 1\n\n\n\n\n12.1.3 Check Model Assumptions\n\ncheck_model(lm2, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n12.1.4 Inference\nThe ANOVA table suggests a highly significant effect of the variety, week, and variety x week interaction effect.\n\nanova(lm2, type = \"marginal\")\n\n numDF denDF F-value p-value\n(Intercept) 1 64 212.10509 <.0001\nvariety 3 12 28.28895 <.0001\nfactweek 4 64 74.79758 <.0001\nvariety:factweek 12 64 7.03546 <.0001\n\n\nWe can estimate the marginal means for 
variety and week effect and their interaction using emmeans() function.\n\nmean_1 <- emmeans(lm2, ~ variety)\n\nNOTE: Results may be misleading due to involvement in interactions\n\nmean_1\n\n variety emmean SE df lower.CL upper.CL\n 1 3.50 0.288 4 2.70 4.29\n 2 4.59 0.288 4 3.79 5.39\n 3 4.63 0.288 4 3.84 5.43\n 4 4.61 0.288 4 3.81 5.40\n\nResults are averaged over the levels of: factweek \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\nmean_2 <- emmeans(lm2, ~ variety*factweek)\nmean_2\n\n variety factweek emmean SE df lower.CL upper.CL\n 1 1 4.24 0.291 4 3.43 5.05\n 2 1 5.15 0.291 4 4.34 5.96\n 3 1 4.89 0.291 4 4.08 5.70\n 4 1 5.15 0.291 4 4.35 5.96\n 1 2 4.05 0.291 4 3.24 4.85\n 2 2 4.98 0.291 4 4.17 5.79\n 3 2 5.07 0.291 4 4.27 5.88\n 4 2 4.94 0.291 4 4.14 5.75\n 1 3 3.41 0.291 4 2.60 4.21\n 2 3 4.59 0.291 4 3.79 5.40\n 3 3 4.71 0.291 4 3.91 5.52\n 4 3 4.71 0.291 4 3.90 5.51\n 1 4 3.09 0.291 4 2.28 3.89\n 2 4 4.22 0.291 4 3.41 5.03\n 3 4 4.48 0.291 4 3.67 5.28\n 4 4 4.39 0.291 4 3.58 5.20\n 1 5 2.70 0.291 4 1.89 3.51\n 2 5 4.01 0.291 4 3.20 4.82\n 3 5 4.02 0.291 4 3.21 4.83\n 4 5 3.83 0.291 4 3.03 4.64\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\n\n\n\nTime variable\n\n\n\nHere is a quick step to make sure your fitting model correctly: make sure to have two time variables in your data one being numeric (e.g. ‘day’ as number) and other being factor/character(e.g. ‘day_factor’ as a factor/character). 
Where, numeric variable is used for fitting correlation matrix and factor/character variable used in model statement to evaluate the time variable effect on response variable.", "crumbs": [ "11  Repeated Measures" ] @@ -549,11 +549,11 @@ { "objectID": "chapters/additional-resources.html", "href": "chapters/additional-resources.html", - "title": "14  Additional Resources", + "title": "15  Additional Resources", "section": "", - "text": "14.1 Further Reading", + "text": "15.1 Further Reading", "crumbs": [ - "14  Additional Resources" + "15  Additional Resources" ] }, { @@ -579,21 +579,21 @@ { "objectID": "chapters/additional-resources.html#further-reading", "href": "chapters/additional-resources.html#further-reading", - "title": "14  Additional Resources", + "title": "15  Additional Resources", "section": "", - "text": "lme4 vignette for fitting linear mixed models\nMixed-Effects Models in S and S-PLUS thee book for nlme, by José C. Pinheiro and Douglas M. Bates. We used this book extensively for developing this guide. Sadly, it’s both out of print and we could not find a free copy online. However, there are affordable used copies available.\nMixed Effects Models and Extensions in Ecology with R by Alain F. Zuur, Elena N. Ieno, Neil Walker, Anatoly A. Saveliev, and Graham M. Smith.", + "text": "lme4 vignette for fitting linear mixed models\nMixed-Effects Models in S and S-PLUS thee book for nlme, by José C. Pinheiro and Douglas M. Bates. We used this book extensively for developing this guide. Sadly, it’s both out of print and we could not find a free copy online. However, there are affordable used copies available.\nMixed Effects Models and Extensions in Ecology with R by Alain F. Zuur, Elena N. Ieno, Neil Walker, Anatoly A. Saveliev, and Graham M. 
Smith.\nANOVA and Mixed Models by Lukas Meier", "crumbs": [ - "14  Additional Resources" + "15  Additional Resources" ] }, { "objectID": "chapters/additional-resources.html#other-resources", "href": "chapters/additional-resources.html#other-resources", - "title": "14  Additional Resources", - "section": "14.2 Other Resources", - "text": "14.2 Other Resources\n\nEasy Stats a collection of R packages to assist in statistical modelling, with a big focus on linear models.\nMixed Model CRAN Task View a curated list of R packages relevant to mixed modelling. This is a great place to start\nR-SIG-mixed-models mailing list for help and discussion of mixed-model-related questions, course announcements, etc\nGrammar of Experimental Designs by Emi Tanaka. This has a great description of basic principles of experimental design.", + "title": "15  Additional Resources", + "section": "15.2 Other Resources", + "text": "15.2 Other Resources\n\nEasy Stats a collection of R packages to assist in statistical modelling, with a big focus on linear models.\nMixed Model CRAN Task View a curated list of R packages relevant to mixed modelling. This is a great place to start\nR-SIG-mixed-models mailing list for help and discussion of mixed-model-related questions, course announcements, etc\nGrammar of Experimental Designs by Emi Tanaka. This has a great description of basic principles of experimental design.", "crumbs": [ - "14  Additional Resources" + "15  Additional Resources" ] }, { @@ -775,7 +775,7 @@ "href": "chapters/latin-design.html", "title": "10  Latin Square Design", "section": "", - "text": "10.1 Background\nLatin square design In the Latin Square design, two blocking factors are arranged across the row and the column of the square. This allows blocking of two nuisance factors across rows and columns to reduce even more experimental error. 
The requirement of Latin square design is that all t treatments appears only once in each row and column and number of replications is equal to number of treatments.\nAdvantages of Latin square design are: 1. The design is particularly appropriate for comparing t treatment means in the presence of two sources of extraneous variation, each measured at t levels. 2. The analysis is quite simple.\nDisadvantage: 1. A Latin square can be constructed for any value of t, however, it is best suited for comparing t treatments when 5≤t≤10.\nStatistical model for a response in Latin square design is:\n\\(Y_{ijk} = \\mu + \\alpha_i + \\beta_j + \\gamma_k + \\epsilon_{ijk}\\)\nwhere, \\(\\mu\\) is the experiment mean, \\(\\alpha_i's\\) are treatment effects, \\(\\beta\\) and \\(\\gamma\\) are the row- and column specific effects.\nAssumptions of this design includes normality and independent distribution of error (\\(\\epsilon_{ijk}\\)) terms. And there is no interaction between two blocking (rows & columns) factors and treatments.", + "text": "10.1 Background\nIn the Latin Square design, two blocking factors are arranged across the row and the column of the square. This allows blocking of two nuisance factors across rows and columns to reduce even more experimental error. The requirement of Latin square design is that all t treatments appears only once in each row and column and number of replications is equal to number of treatments.\nAdvantages of Latin square design are:\nDisadvantages:\nStatistical model for a response in Latin square design is:\n\\(Y_{ijk} = \\mu + \\alpha_i + \\beta_j + \\gamma_k + \\epsilon_{ijk}\\)\nwhere, \\(\\mu\\) is the experiment mean, \\(\\alpha_i's\\) represents treatment effect, \\(\\beta\\) and \\(\\gamma\\) are the row- and column specific effects.\nAssumptions of this design includes normality and independent distribution of error (\\(\\epsilon_{ijk}\\)) terms. 
And there is no interaction between two blocking (rows & columns) factors and treatments.", "crumbs": [ "Experiment designs", "10  Latin Square Design" @@ -786,7 +786,7 @@ "href": "chapters/latin-design.html#example-analysis", "title": "10  Latin Square Design", "section": "10.2 Example Analysis", - "text": "10.2 Example Analysis\nLet’s start the analysis firstly by loading the required libraries:\n\nlme4nlme\n\n\n\nlibrary(lme4); library(lmerTest); library(emmeans); library(performance)\nlibrary(dplyr); library(broom.mixed); library(agridat); library(desplot)\n\n\n\n\nlibrary(nlme); library(broom.mixed); library(emmeans); library(performance)\nlibrary(dplyr); library(agridat); library(desplot)\n\n\n\n\nThe data used in this example is extracted from the agridat package. In this experiment, 5 treatments (A = Dusted before rains. B = Dusted after rains. C = Dusted once each week. D = Drifting, once each week. E = Not dusted) were tested to control stem rust in wheat.\n\ndat <- agridat::goulden.latin\n\n\nTable of variables in the data set\n\n\ntrt\ntreatment factor, 5 levels\n\n\nrow\nrow position for each plot\n\n\ncol\ncolumn position for each plot\n\n\nyield\nwheat yield\n\n\n\n\n10.2.1 Data integrity checks\nFirstly, let’s verify the class of variables in the dataset using str() function in base R\n\nstr(dat)\n\n'data.frame': 25 obs. of 4 variables:\n $ trt : Factor w/ 5 levels \"A\",\"B\",\"C\",\"D\",..: 2 3 4 5 1 4 1 3 2 5 ...\n $ yield: num 4.9 9.3 7.6 5.3 9.3 6.4 4 15.4 7.6 6.3 ...\n $ row : int 5 4 3 2 1 5 4 3 2 1 ...\n $ col : int 1 1 1 1 1 2 2 2 2 2 ...\n\n\nHere yield and trt are classified as numeric and factor variables, respectively, as needed. 
But we need to change ‘row’ and ‘col’ from integer t factor/character.\n\ndat1 <- dat |> \n mutate(row = as.factor(row),\n col = as.factor(col))\n\nNext, to verify if the data meets the assumption of the Latin square design let’s plot the field layout for this experiment.\n\ndesplot::desplot(data = dat, flip = TRUE,\n form = yield ~ row + col, \n out1 = row, out1.gpar=list(col=\"black\", lwd=3),\n out2 = col, out2.gpar=list(col=\"black\", lwd=3),\n text = trt, cex = 1, shorten = \"no\",\n main = \"Field layout\", \n show.key = FALSE)\n\n\n\n\n\n\n\n\nThis looks great! Here we can see that there are equal number of treatments, rows, and columns. Treatments were randomized in such a way that one treatment doesn’t appear more than once in each row and column.\nNext step is to check if there are any missing values in response variable.\n\napply(dat, 2, function(x) sum(is.na(x)))\n\n trt yield row col \n 0 0 0 0 \n\n\nAnd we do not have any missing values in the data.\nBefore fitting the model, let’s create a histogram of response variable to see if there are extreme values.\n\n\n\n\n\n\nHistogram of the dependent variable.\n\n\n\n\nhist(dat$yield, main = \"\", xlab = \"yield\")\n\n\n\n10.2.2 Model fitting\nHere we will fit a model to evaluate the impact of fungicide treatments on wheat yield with trt as a fixed effect and row & col as a random effect.\n\nlme4nlme\n\n\n\nm1_a <- lmer(yield ~ trt + (1|row) + (1|col),\n data = dat1,\n na.action = na.exclude)\ntidy(m1_a) \n\n# A tibble: 8 × 8\n effect group term estimate std.error statistic df p.value\n <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n1 fixed <NA> (Intercept) 6.84 0.942 7.26 11.9 1.03e-5\n2 fixed <NA> trtB -0.380 0.967 -0.393 12.0 7.01e-1\n3 fixed <NA> trtC 6.28 0.967 6.50 12.0 2.96e-5\n4 fixed <NA> trtD 1.12 0.967 1.16 12.0 2.69e-1\n5 fixed <NA> trtE -1.92 0.967 -1.99 12.0 7.04e-2\n6 ran_pars row sd__(Intercept) 1.37 NA NA NA NA \n7 ran_pars col sd__(Intercept) 0.483 NA NA NA NA \n8 ran_pars Residual 
sd__Observation 1.53 NA NA NA NA \n\n\n\n\n\ndat$dummy <- factor(1)\nm1_b <- lme(yield ~ trt,\n random =list(~1|row, ~1|col),\n #list(dummy = pdBlocked(list(\n # pdIdent(~row - 1),\n # pdIdent(~col - 1)))),\n data = dat, \n na.action = na.exclude)\n\nsummary(m1_b)\n\nLinear mixed-effects model fit by REML\n Data: dat \n AIC BIC logLik\n 106.0974 114.0633 -45.04872\n\nRandom effects:\n Formula: ~1 | row\n (Intercept)\nStdDev: 1.344469\n\n Formula: ~1 | col %in% row\n (Intercept) Residual\nStdDev: 1.494696 0.628399\n\nFixed effects: yield ~ trt \n Value Std.Error DF t-value p-value\n(Intercept) 6.84 0.9419764 16 7.261328 0.0000\ntrtB -0.38 1.0254756 16 -0.370560 0.7158\ntrtC 6.28 1.0254756 16 6.123987 0.0000\ntrtD 1.12 1.0254756 16 1.092176 0.2909\ntrtE -1.92 1.0254756 16 -1.872302 0.0796\n Correlation: \n (Intr) trtB trtC trtD \ntrtB -0.544 \ntrtC -0.544 0.500 \ntrtD -0.544 0.500 0.500 \ntrtE -0.544 0.500 0.500 0.500\n\nStandardized Within-Group Residuals:\n Min Q1 Med Q3 Max \n-0.5686726 -0.2469684 -0.1061146 0.2349101 0.7617205 \n\nNumber of Observations: 25\nNumber of Groups: \n row col %in% row \n 5 25 \n\n#VarCorr(m1_b)\n\n\n\n\n\n\n10.2.3 Check Model Assumptions\n\nlme4nlme\n\n\n\ncheck_model(m1_a, check = c(\"linearity\", \"normality\"))\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(m1_b, check = c(\"linearity\", \"normality\"))\n\n\n\n\n\n\n\n\n\n\n\n\n\n10.2.4 Inference\nWe can look look at the analysis of variance for treatment effect on yield using anova() function.\n\nlme4nlme\n\n\n\nanova(m1_a, type = \"1\")\n\nType I Analysis of Variance Table with Satterthwaite's method\n Sum Sq Mean Sq NumDF DenDF F value Pr(>F) \ntrt 196.61 49.152 4 12 21.032 2.366e-05 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n\n\n\n\nanova(m1_b, type = \"sequential\")\n\n numDF denDF F-value p-value\n(Intercept) 1 16 132.38123 <.0001\ntrt 4 16 18.69608 <.0001\n\n\n\n\n\nHere we observed a significant impact on fungicide treatment on crop yield. 
Let’s have a look at the estimated marginal means of wheat yield with each treatment using emmeans() function.\n\nlme4nlme\n\n\n\nemmeans(m1_a, ~ trt)\n\n trt emmean SE df lower.CL upper.CL\n A 6.84 0.942 11.9 4.79 8.89\n B 6.46 0.942 11.9 4.41 8.51\n C 13.12 0.942 11.9 11.07 15.17\n D 7.96 0.942 11.9 5.91 10.01\n E 4.92 0.942 11.9 2.87 6.97\n\nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemmeans(m1_b, ~ trt)\n\n trt emmean SE df lower.CL upper.CL\n A 6.84 0.942 4 4.22 9.46\n B 6.46 0.942 4 3.84 9.08\n C 13.12 0.942 4 10.50 15.74\n D 7.96 0.942 4 5.34 10.58\n E 4.92 0.942 4 2.30 7.54\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95", + "text": "10.2 Example Analysis\nLet’s start the analysis firstly by loading the required libraries:\n\nlme4nlme\n\n\n\nlibrary(lme4); library(lmerTest); library(emmeans); library(performance)\nlibrary(dplyr); library(broom.mixed); library(agridat); library(desplot)\n\n\n\n\nlibrary(nlme); library(broom.mixed); library(emmeans); library(performance)\nlibrary(dplyr); library(agridat); library(desplot)\n\n\n\n\nThe data used in this example is extracted from the agridat package. In this experiment, 5 treatments (A = Dusted before rains. B = Dusted after rains. C = Dusted once each week. D = Drifting, once each week. E = Not dusted) were tested to control stem rust in wheat.\n\ndat <- agridat::goulden.latin\n\n\nTable of variables in the data set\n\n\ntrt\ntreatment factor, 5 levels\n\n\nrow\nrow position for each plot\n\n\ncol\ncolumn position for each plot\n\n\nyield\nwheat yield\n\n\n\n\n10.2.1 Data integrity checks\nFirstly, let’s verify the class of variables in the dataset using str() function in base R\n\nstr(dat)\n\n'data.frame': 25 obs. 
of 4 variables:\n $ trt : Factor w/ 5 levels \"A\",\"B\",\"C\",\"D\",..: 2 3 4 5 1 4 1 3 2 5 ...\n $ yield: num 4.9 9.3 7.6 5.3 9.3 6.4 4 15.4 7.6 6.3 ...\n $ row : int 5 4 3 2 1 5 4 3 2 1 ...\n $ col : int 1 1 1 1 1 2 2 2 2 2 ...\n\n\nHere yield and trt are classified as numeric and factor variables, respectively, as needed. But we need to change ‘row’ and ‘col’ from integer to factor/character.\n\ndat1 <- dat |> \n mutate(row = as.factor(row),\n col = as.factor(col))\n\nNext, to verify that the data meet the assumptions of the Latin square design, let’s plot the field layout for this experiment.\n\n\n\n\n\n\n\n\n\nThis looks great! Here we can see that there is an equal number (5) of treatments, rows, and columns. Treatments were randomized in such a way that no treatment appears more than once in each row and column.\nThe next step is to check if there are any missing values in the response variable.\n\napply(dat, 2, function(x) sum(is.na(x)))\n\n trt yield row col \n 0 0 0 0 \n\n\nNo missing values were detected in this data set.\nBefore fitting the model, let’s create a histogram of the response variable to see if there are extreme values.\n\n\n\n\n\n\nHistogram of the dependent variable.\n\n\n\n\nhist(dat$yield, main = \"\", xlab = \"yield\")\n\n\n\n10.2.2 Model fitting\nHere we will fit a model to evaluate the impact of fungicide treatments on wheat yield with trt as a fixed effect and row & col as random effects.\nVarCorr(m1_b)\n\nlme4nlme\n\n\n\nm1_a <- lmer(yield ~ trt + (1|row) + (1|col),\n data = dat1,\n na.action = na.exclude)\nsummary(m1_a) \n\nLinear mixed model fit by REML. 
t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: yield ~ trt + (1 | row) + (1 | col)\n Data: dat1\n\nREML criterion at convergence: 89.8\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-1.3994 -0.5383 -0.1928 0.5220 1.8429 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n row (Intercept) 1.8660 1.3660 \n col (Intercept) 0.2336 0.4833 \n Residual 2.3370 1.5287 \nNumber of obs: 25, groups: row, 5; col, 5\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 6.8400 0.9420 11.9446 7.261 1.03e-05 ***\ntrtB -0.3800 0.9669 12.0000 -0.393 0.7012 \ntrtC 6.2800 0.9669 12.0000 6.495 2.96e-05 ***\ntrtD 1.1200 0.9669 12.0000 1.158 0.2692 \ntrtE -1.9200 0.9669 12.0000 -1.986 0.0704 . \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) trtB trtC trtD \ntrtB -0.513 \ntrtC -0.513 0.500 \ntrtD -0.513 0.500 0.500 \ntrtE -0.513 0.500 0.500 0.500\n\n\n\n\n\nm1_b <- lme(yield ~ trt,\n random =list(~1|row, ~1|col),\n data = dat, \n na.action = na.exclude)\n\nsummary(m1_b)\n\nLinear mixed-effects model fit by REML\n Data: dat \n AIC BIC logLik\n 106.0974 114.0633 -45.04872\n\nRandom effects:\n Formula: ~1 | row\n (Intercept)\nStdDev: 1.344469\n\n Formula: ~1 | col %in% row\n (Intercept) Residual\nStdDev: 1.494696 0.628399\n\nFixed effects: yield ~ trt \n Value Std.Error DF t-value p-value\n(Intercept) 6.84 0.9419764 16 7.261328 0.0000\ntrtB -0.38 1.0254756 16 -0.370560 0.7158\ntrtC 6.28 1.0254756 16 6.123987 0.0000\ntrtD 1.12 1.0254756 16 1.092176 0.2909\ntrtE -1.92 1.0254756 16 -1.872302 0.0796\n Correlation: \n (Intr) trtB trtC trtD \ntrtB -0.544 \ntrtC -0.544 0.500 \ntrtD -0.544 0.500 0.500 \ntrtE -0.544 0.500 0.500 0.500\n\nStandardized Within-Group Residuals:\n Min Q1 Med Q3 Max \n-0.5686726 -0.2469684 -0.1061146 0.2349101 0.7617205 \n\nNumber of Observations: 25\nNumber of Groups: \n row col %in% row \n 5 25 \n\n\n\n\n\n\n\n10.2.3 Check Model Assumptions\nThis step involves inspection 
of model residuals using the check_model() function from the “performance” package.\n\nlme4nlme\n\n\n\ncheck_model(m1_a, check = c(\"linearity\", \"normality\"))\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(m1_b, check = c(\"linearity\", \"normality\"))\n\n\n\n\n\n\n\n\n\n\n\nThese visuals imply that assumptions of linear model have been met.\n\n\n10.2.4 Inference\nWe can now proceed to the variance partitioning. In this case, we will use anova() with type = \"1\" or type = \"sequential\" for lmer() and lme() models, respectively.\n\nlme4nlme\n\n\n\nanova(m1_a, type = \"1\")\n\nType I Analysis of Variance Table with Satterthwaite's method\n Sum Sq Mean Sq NumDF DenDF F value Pr(>F) \ntrt 196.61 49.152 4 12 21.032 2.366e-05 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n\n\n\n\nanova(m1_b, type = \"sequential\")\n\n numDF denDF F-value p-value\n(Intercept) 1 16 132.38123 <.0001\ntrt 4 16 18.69608 <.0001\n\n\n\n\n\nBoth models detected a significant effect of fungicide treatment on wheat yield. Let’s have a look at the estimated marginal means of wheat yield with each treatment using emmeans() function.\n\nlme4nlme\n\n\n\nemmeans(m1_a, ~ trt)\n\n trt emmean SE df lower.CL upper.CL\n A 6.84 0.942 11.9 4.79 8.89\n B 6.46 0.942 11.9 4.41 8.51\n C 13.12 0.942 11.9 11.07 15.17\n D 7.96 0.942 11.9 5.91 10.01\n E 4.92 0.942 11.9 2.87 6.97\n\nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemmeans(m1_b, ~ trt)\n\n trt emmean SE df lower.CL upper.CL\n A 6.84 0.942 4 4.22 9.46\n B 6.46 0.942 4 3.84 9.08\n C 13.12 0.942 4 10.50 15.74\n D 7.96 0.942 4 5.34 10.58\n E 4.92 0.942 4 2.30 7.54\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\nWe see that wheat yield was higher with ‘C’ fungicide treatment compared to other fungicides applied in this study. 
Which implies that ‘C’ fungicide was more efficient in controlling the stem rust in wheat.", "crumbs": [ "Experiment designs", "10  Latin Square Design" @@ -797,7 +797,7 @@ "href": "chapters/latin-design.html#background", "title": "10  Latin Square Design", "section": "", - "text": "Any additional extraneous sources of variability tend to inflate the error term, making it more difficult to detect differences among the treatment means.\nThe effect of each treatment on the response must be approximately the same across rows and columns.", + "text": "The design is particularly appropriate for comparing t treatment means in the presence of two sources of extraneous variation, each measured at t levels.\nThe analysis is quite simple.\n\n\n\nA Latin square can be constructed for any value of t, however, it is best suited for comparing t treatments when 5≤ t≤ 10.\nAny additional extraneous sources of variability tend to inflate the error term, making it more difficult to detect differences among the treatment means.\nThe effect of each treatment on the response must be approximately same across the rows and columns.", "crumbs": [ "Experiment designs", "10  Latin Square Design" @@ -953,5 +953,16 @@ "crumbs": [ "12  Marginal Means and Contrasts" ] + }, + { + "objectID": "chapters/incomplete-block-design.html#examples-analyses", + "href": "chapters/incomplete-block-design.html#examples-analyses", + "title": "9  Incomplete Block Design", + "section": "9.2 Examples Analyses", + "text": "9.2 Examples Analyses\n\n9.2.1 Balanced Incomplete Block Design\nWe will demonstrate an example data set designed in a balanced incomplete block design. 
First, load the libraries required for analysis and estimation.\n\nlme4nlme\n\n\n\nlibrary(lme4); library(lmerTest); library(emmeans)\nlibrary(dplyr); library(broom.mixed); library(performance)\n\n\n\n\nlibrary(nlme); library(broom.mixed); library(emmeans)\nlibrary(dplyr); library(performance)\n\n\n\n\nThe data used for this example analysis was extracted from the agridat package. This example is comprised of soybean balanced incomplete block experiment.\n\ndat <- agridat::weiss.incblock\n\n\nTable of variables in the data set\n\n\nblock\nblocking unit\n\n\ngen\ngenotype (variety) factor\n\n\nrow\nrow position for each plot\n\n\ncol\ncolumn position for each plot\n\n\nyield\ngrain yield in bu/ac\n\n\n\n\n\n\n\n\n\n\n\n\n\n9.2.1.1 Data integrity checks\nWe will start inspecting the data set firstly by looking at the class of each variable:\n\nstr(dat)\n\n'data.frame': 186 obs. of 5 variables:\n $ block: Factor w/ 31 levels \"B01\",\"B02\",\"B03\",..: 1 2 3 4 5 6 7 8 9 10 ...\n $ gen : Factor w/ 31 levels \"G01\",\"G02\",\"G03\",..: 24 15 20 18 20 5 22 1 9 14 ...\n $ yield: num 29.8 24.2 30.5 20 35.2 25 23.6 23.6 29.3 25.5 ...\n $ row : int 42 36 30 24 18 12 6 42 36 30 ...\n $ col : int 1 1 1 1 1 1 1 2 2 2 ...\n\n\nThe variables we need for the model are block, genand yield. The block and gen are classified as factor variables and yield is numeric. Therefore, we do not need to change class of any of the required variables.\nNext, let’s check the independent variables. 
We can look at this by running a cross tabulations among block and gen factors.\n\nagg_tbl <- dat %>% group_by(gen) %>% \n summarise(total_count=n(),\n .groups = 'drop')\nagg_tbl\n\n# A tibble: 31 × 2\n gen total_count\n <fct> <int>\n 1 G01 6\n 2 G02 6\n 3 G03 6\n 4 G04 6\n 5 G05 6\n 6 G06 6\n 7 G07 6\n 8 G08 6\n 9 G09 6\n10 G10 6\n# ℹ 21 more rows\n\n\n\nagg_df <- aggregate(dat$gen, by=list(dat$block), FUN=length)\nagg_df\n\n Group.1 x\n1 B01 6\n2 B02 6\n3 B03 6\n4 B04 6\n5 B05 6\n6 B06 6\n7 B07 6\n8 B08 6\n9 B09 6\n10 B10 6\n11 B11 6\n12 B12 6\n13 B13 6\n14 B14 6\n15 B15 6\n16 B16 6\n17 B17 6\n18 B18 6\n19 B19 6\n20 B20 6\n21 B21 6\n22 B22 6\n23 B23 6\n24 B24 6\n25 B25 6\n26 B26 6\n27 B27 6\n28 B28 6\n29 B29 6\n30 B30 6\n31 B31 6\n\n\nThere are 31 varieties (levels of gen) and it is perfectly balanced, with exactly one observation per treatment per block.\nWe can calculate the sum of missing values in variables in this data set to evaluate the extent of missing values in different variables:\n\napply(dat, 2, function(x) sum(is.na(x)))\n\nblock gen yield row col \n 0 0 0 0 0 \n\n\nNo missing data!\nLast, let’s plot a histogram of the dependent variable. This is a quick check before analysis to see if there is any strong deviation in values.\n\n\n\n\n\n\n\n\n\nFigure 9.1: Histogram of the dependent variable.\n\n\n\n\n\nhist(dat$yield, main = \"\", xlab = \"yield\")\n\nResponse variable values fall within expected range, with few extreme values on right tail. This data set is ready for analysis!\n\n\n9.2.1.2 Model Building\nWe will be evaluating the response of yield as affected by gen (fixed effect) and block (random effect).\n\n\nPlease note that incomplete block effect can be analyzed as a fixed (intra-block analysis) or a random (inter-block analysis) effect. 
When we consider block as a random effect, the mean values of a block also contain information about the treatment effects.\n\nlme4nlme\n\n\n\nmodel_icbd <- lmer(yield ~ gen + (1|block),\n data = dat, \n na.action = na.exclude)\ntidy(model_icbd)\n\n# A tibble: 33 × 8\n effect group term estimate std.error statistic df p.value\n <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed <NA> (Intercept) 24.6 0.922 26.7 153. 2.30e-59\n 2 fixed <NA> genG02 2.40 1.17 2.06 129. 4.17e- 2\n 3 fixed <NA> genG03 8.04 1.17 6.88 129. 2.31e-10\n 4 fixed <NA> genG04 2.37 1.17 2.03 129. 4.42e- 2\n 5 fixed <NA> genG05 1.60 1.17 1.37 129. 1.73e- 1\n 6 fixed <NA> genG06 7.39 1.17 6.32 129. 3.82e- 9\n 7 fixed <NA> genG07 -0.419 1.17 -0.359 129. 7.20e- 1\n 8 fixed <NA> genG08 3.04 1.17 2.60 129. 1.04e- 2\n 9 fixed <NA> genG09 4.84 1.17 4.14 129. 6.22e- 5\n10 fixed <NA> genG10 -0.0429 1.17 -0.0367 129. 9.71e- 1\n# ℹ 23 more rows\n\n\n\n\n\nmodel_icbd1 <- lme(yield ~ gen,\n random = ~ 1|block,\n data = dat, \n na.action = na.exclude)\ntidy(model_icbd1)\n\n# A tibble: 33 × 8\n effect group term estimate std.error df statistic p.value\n <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed <NA> (Intercept) 24.6 0.922 125 26.7 2.10e-53\n 2 fixed <NA> genG02 2.40 1.17 125 2.06 4.18e- 2\n 3 fixed <NA> genG03 8.04 1.17 125 6.88 2.54e-10\n 4 fixed <NA> genG04 2.37 1.17 125 2.03 4.43e- 2\n 5 fixed <NA> genG05 1.60 1.17 125 1.37 1.73e- 1\n 6 fixed <NA> genG06 7.39 1.17 125 6.32 4.11e- 9\n 7 fixed <NA> genG07 -0.419 1.17 125 -0.359 7.20e- 1\n 8 fixed <NA> genG08 3.04 1.17 125 2.60 1.04e- 2\n 9 fixed <NA> genG09 4.84 1.17 125 4.14 6.33e- 5\n10 fixed <NA> genG10 -0.0429 1.17 125 -0.0367 9.71e- 1\n# ℹ 23 more rows\n\n\n\n\n\n\n\n9.2.1.3 Check Model Assumptions\nLet’s verify the assumption of linear mixed models including normal distribution and constant variance of residuals.\n\nlme4nlme\n\n\n\ncheck_model(model_icbd, check = c('normality', 
'linearity'))\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(model_icbd1, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\n\n\nHere we observed a right skewness in residuals, this can be resolved by using data transformation e.g. log transformation of response variable. Please refer to chapter to read more about data transformation.\n\n\n9.2.1.4 Inference\nWe can extract information about ANOVA using anova().\n\nlme4nlme\n\n\n\nanova(model_icbd, type = \"1\")\n\nType I Analysis of Variance Table with Satterthwaite's method\n Sum Sq Mean Sq NumDF DenDF F value Pr(>F) \ngen 1901.1 63.369 30 129.06 17.675 < 2.2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n\n\n\n\nanova(model_icbd1, type = \"sequential\")\n\n numDF denDF F-value p-value\n(Intercept) 1 125 4042.016 <.0001\ngen 30 125 17.675 <.0001\n\n\n\n\n\nLet’s look at the estimated marginal means of yield for each variety (gen).\n\nlme4nlme\n\n\n\nemmeans(model_icbd, ~ gen)\n\n gen emmean SE df lower.CL upper.CL\n G01 24.6 0.923 153 22.7 26.4\n G02 27.0 0.923 153 25.2 28.8\n G03 32.6 0.923 153 30.8 34.4\n G04 26.9 0.923 153 25.1 28.8\n G05 26.2 0.923 153 24.4 28.0\n G06 32.0 0.923 153 30.1 33.8\n G07 24.2 0.923 153 22.3 26.0\n G08 27.6 0.923 153 25.8 29.4\n G09 29.4 0.923 153 27.6 31.2\n G10 24.5 0.923 153 22.7 26.4\n G11 27.1 0.923 153 25.2 28.9\n G12 29.3 0.923 153 27.4 31.1\n G13 29.9 0.923 153 28.1 31.8\n G14 24.2 0.923 153 22.4 26.1\n G15 26.1 0.923 153 24.3 27.9\n G16 25.9 0.923 153 24.1 27.8\n G17 19.7 0.923 153 17.9 21.5\n G18 25.7 0.923 153 23.9 27.5\n G19 29.0 0.923 153 27.2 30.9\n G20 33.2 0.923 153 31.3 35.0\n G21 31.1 0.923 153 29.3 32.9\n G22 25.2 0.923 153 23.3 27.0\n G23 29.8 0.923 153 28.0 31.6\n G24 33.6 0.923 153 31.8 35.5\n G25 27.0 0.923 153 25.2 28.8\n G26 27.1 0.923 153 25.3 29.0\n G27 23.8 0.923 153 22.0 25.6\n G28 26.5 0.923 153 24.6 28.3\n G29 24.8 0.923 153 22.9 26.6\n G30 36.2 0.923 153 34.4 38.0\n G31 27.1 0.923 153 25.3 28.9\n\nDegrees-of-freedom method: 
kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemmeans(model_icbd1, ~ gen)\n\n gen emmean SE df lower.CL upper.CL\n G01 24.6 0.922 30 22.7 26.5\n G02 27.0 0.922 30 25.1 28.9\n G03 32.6 0.922 30 30.7 34.5\n G04 26.9 0.922 30 25.1 28.8\n G05 26.2 0.922 30 24.3 28.1\n G06 32.0 0.922 30 30.1 33.8\n G07 24.2 0.922 30 22.3 26.0\n G08 27.6 0.922 30 25.7 29.5\n G09 29.4 0.922 30 27.5 31.3\n G10 24.5 0.922 30 22.6 26.4\n G11 27.1 0.922 30 25.2 28.9\n G12 29.3 0.922 30 27.4 31.1\n G13 29.9 0.922 30 28.1 31.8\n G14 24.2 0.922 30 22.4 26.1\n G15 26.1 0.922 30 24.2 28.0\n G16 25.9 0.922 30 24.0 27.8\n G17 19.7 0.922 30 17.8 21.6\n G18 25.7 0.922 30 23.8 27.6\n G19 29.0 0.922 30 27.2 30.9\n G20 33.2 0.922 30 31.3 35.0\n G21 31.1 0.922 30 29.2 33.0\n G22 25.2 0.922 30 23.3 27.1\n G23 29.8 0.922 30 27.9 31.7\n G24 33.6 0.922 30 31.8 35.5\n G25 27.0 0.922 30 25.1 28.9\n G26 27.1 0.922 30 25.3 29.0\n G27 23.8 0.922 30 21.9 25.7\n G28 26.5 0.922 30 24.6 28.4\n G29 24.8 0.922 30 22.9 26.6\n G30 36.2 0.922 30 34.3 38.1\n G31 27.1 0.922 30 25.2 29.0\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\n\n\n\n9.2.2 Partially Balanced IBD (Alpha Lattice Design)\nThe statistical model for partially balanced design includes:\n\\[y_{ij(l)} = \\mu + \\alpha_i + \\beta_{i(l)} + \\tau_j + \\epsilon_{ij(l)}\\]\nWhere:\n\\(\\mu\\) = overall experimental mean\n\\(\\alpha\\) = replicate effect (random)\n\\(\\beta\\) = incomplete block effect (random)\n\\(\\tau\\) = treatment effect (fixed)\n\\(\\epsilon_{ij(l)}\\) = intra-block residual\nThe data used in this example is published in Cyclic and Computer Generated Designs (John and Williams 1995). The trial was laid out in an alpha lattice design. 
This trial data had 24 genotypes (“gen”), 6 incomplete blocks, each replicated 3 times.\nLet’s start analyzing this example first by loading the required libraries for linear mixed models:\n\nlme4nlme\n\n\n\nlibrary(lme4); library(lmerTest); library(emmeans)\nlibrary(dplyr); library(broom.mixed); library(performance)\n\n\n\n\nlibrary(nlme); library(broom.mixed); library(emmeans)\nlibrary(dplyr); library(performance)\n\n\n\n\n\ndata1 <- agridat::john.alpha\n\n\nTable of variables in the data set\n\n\nblock\nincomplete blocking unit\n\n\ngen\ngenotype (variety) factor\n\n\nrow\nrow position for each plot\n\n\ncol\ncolumn position for each plot\n\n\nyield\ngrain yield in tonnes/ha\n\n\n\n\n\n\n\n\n\n\n\n\n\n9.2.2.1 Data integrity checks\nLet’s look into the structure of the data first to verify the class of the variables.\n\nstr(data1)\n\n'data.frame': 72 obs. of 7 variables:\n $ plot : int 1 2 3 4 5 6 7 8 9 10 ...\n $ rep : Factor w/ 3 levels \"R1\",\"R2\",\"R3\": 1 1 1 1 1 1 1 1 1 1 ...\n $ block: Factor w/ 6 levels \"B1\",\"B2\",\"B3\",..: 1 1 1 1 2 2 2 2 3 3 ...\n $ gen : Factor w/ 24 levels \"G01\",\"G02\",\"G03\",..: 11 4 5 22 21 10 20 2 23 14 ...\n $ yield: num 4.12 4.45 5.88 4.58 4.65 ...\n $ row : int 1 2 3 4 5 6 7 8 9 10 ...\n $ col : int 1 1 1 1 1 1 1 1 1 1 ...\n\n\nNext step is to evaluate the independent variables. 
First, check the number of replicates per treatment (each treatment should be replicated 3 times).\n\nagg_tbl <- data1 %>% group_by(gen) %>% \n summarise(total_count=n(),\n .groups = 'drop')\nagg_tbl\n\n# A tibble: 24 × 2\n gen total_count\n <fct> <int>\n 1 G01 3\n 2 G02 3\n 3 G03 3\n 4 G04 3\n 5 G05 3\n 6 G06 3\n 7 G07 3\n 8 G08 3\n 9 G09 3\n10 G10 3\n# ℹ 14 more rows\n\n\nThis looks balanced, as expected.\nAlso, let’s look at the number of observations per block label.\n\nagg_blk <- aggregate(data1$gen, by=list(data1$block), FUN=length)\nagg_blk\n\n Group.1 x\n1 B1 12\n2 B2 12\n3 B3 12\n4 B4 12\n5 B5 12\n6 B6 12\n\n\nEach block label has 12 observations. Block labels are reused across the 3 replicates, so each incomplete block within a replicate contains 4 genotypes, and all incomplete blocks are the same size.\nLastly, before fitting the model, it’s a good idea to look at the distribution of the dependent variable, yield.\n\n\n\n\n\n\n\n\n\nFigure 9.2: Histogram of the dependent variable.\n\n\n\n\n\nhist(data1$yield, main = \"\", xlab = \"yield\")\n\nThe response variable appears to follow a normal distribution, with fewer values at the extreme low and high ends.\n\n\n9.2.2.2 Model Building\n\nlme4nlme\n\n\n\nmod_alpha <- lmer(yield ~ gen + (1|rep/block),\n data = data1, \n na.action = na.exclude)\ntidy(mod_alpha)\n\n# A tibble: 27 × 8\n effect group term estimate std.error statistic df p.value\n <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed <NA> (Intercept) 5.11 0.276 18.5 6.19 0.00000118 \n 2 fixed <NA> genG02 -0.629 0.269 -2.34 38.2 0.0248 \n 3 fixed <NA> genG03 -1.61 0.268 -6.00 37.7 0.000000590\n 4 fixed <NA> genG04 -0.618 0.268 -2.30 37.7 0.0269 \n 5 fixed <NA> genG05 -0.0705 0.258 -0.274 34.8 0.786 \n 6 fixed <NA> genG06 -0.571 0.268 -2.13 37.7 0.0398 \n 7 fixed <NA> genG07 -0.997 0.258 -3.87 34.8 0.000457 \n 8 fixed <NA> genG08 -0.580 0.268 -2.16 37.7 0.0370 \n 9 fixed <NA> genG09 -1.61 0.258 -6.21 35.3 0.000000390\n10 fixed <NA> genG10 -0.735 0.259 -2.83 35.9 0.00754 \n# ℹ 17 more rows\n\n\n\n\n\nmod_alpha1 <- 
lme(yield ~ gen,\n random = ~ 1|rep/block,\n data = data1, \n na.action = na.exclude)\ntidy(mod_alpha1)\n\nWarning in tidy.lme(mod_alpha1): ran_pars not yet implemented for multiple\nlevels of nesting\n\n\n# A tibble: 24 × 7\n effect term estimate std.error df statistic p.value\n <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed (Intercept) 5.11 0.276 31 18.5 2.63e-18\n 2 fixed genG02 -0.629 0.269 31 -2.34 2.61e- 2\n 3 fixed genG03 -1.61 0.268 31 -6.00 1.23e- 6\n 4 fixed genG04 -0.618 0.268 31 -2.30 2.81e- 2\n 5 fixed genG05 -0.0705 0.258 31 -0.274 7.86e- 1\n 6 fixed genG06 -0.571 0.268 31 -2.13 4.12e- 2\n 7 fixed genG07 -0.997 0.258 31 -3.87 5.23e- 4\n 8 fixed genG08 -0.580 0.268 31 -2.16 3.84e- 2\n 9 fixed genG09 -1.61 0.258 31 -6.21 6.71e- 7\n10 fixed genG10 -0.735 0.259 31 -2.83 8.05e- 3\n# ℹ 14 more rows\n\n\n\n\n\n\n\n9.2.2.3 Check Model Assumptions\nLet’s verify the assumptions of linear mixed models, including normality and constant variance of the residuals.\n\nlme4nlme\n\n\n\ncheck_model(mod_alpha, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(mod_alpha1, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n#check_model(model_lme, check = c('normality', 'linearity'))\n\n\n\n\n\n\n9.2.2.4 Inference\nLet’s obtain the ANOVA table using anova() for the lmer and lme models, respectively.\n\nlme4nlme\n\n\n\nanova(mod_alpha, type = \"1\")\n\nType I Analysis of Variance Table with Satterthwaite's method\n Sum Sq Mean Sq NumDF DenDF F value Pr(>F) \ngen 10.679 0.46429 23 34.902 5.4478 4.229e-06 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\n\n\n\n\nanova(mod_alpha1, type = \"sequential\")\n\n numDF denDF F-value p-value\n(Intercept) 1 31 470.9507 <.0001\ngen 23 31 5.4478 <.0001\n\n\n\n\n\nLet’s look at the estimated marginal means of yield for each variety (gen).\n\nlme4nlme\n\n\n\nemmeans(mod_alpha, ~ gen)\n\n gen emmean SE df lower.CL upper.CL\n G01 5.11 0.279 6.20 4.43 5.78\n G02 4.48 0.279 6.20 3.80 5.15\n G03 3.50 0.279 6.20 2.82 4.18\n G04 4.49 0.279 6.20 3.81 5.17\n G05 5.04 0.278 6.19 4.36 5.71\n G06 4.54 0.278 6.19 3.86 5.21\n G07 4.11 0.279 6.20 3.43 4.79\n G08 4.53 0.279 6.20 3.85 5.20\n G09 3.50 0.278 6.19 2.83 4.18\n G10 4.37 0.279 6.20 3.70 5.05\n G11 4.28 0.279 6.20 3.61 4.96\n G12 4.76 0.279 6.20 4.08 5.43\n G13 4.76 0.278 6.19 4.08 5.43\n G14 4.78 0.278 6.19 4.10 5.45\n G15 4.97 0.278 6.19 4.29 5.65\n G16 4.73 0.279 6.20 4.05 5.41\n G17 4.60 0.278 6.19 3.93 5.28\n G18 4.36 0.279 6.20 3.69 5.04\n G19 4.84 0.278 6.19 4.16 5.52\n G20 4.04 0.278 6.19 3.36 4.72\n G21 4.80 0.278 6.19 4.12 5.47\n G22 4.53 0.278 6.19 3.85 5.20\n G23 4.25 0.278 6.19 3.58 4.93\n G24 4.15 0.279 6.20 3.48 4.83\n\nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemmeans(mod_alpha1, ~ gen)\n\n gen emmean SE df lower.CL upper.CL\n G01 5.11 0.276 2 3.92 6.30\n G02 4.48 0.276 2 3.29 5.67\n G03 3.50 0.276 2 2.31 4.69\n G04 4.49 0.276 2 3.30 5.68\n G05 5.04 0.276 2 3.85 6.22\n G06 4.54 0.276 2 3.35 5.72\n G07 4.11 0.276 2 2.92 5.30\n G08 4.53 0.276 2 3.34 5.72\n G09 3.50 0.276 2 2.31 4.69\n G10 4.37 0.276 2 3.19 5.56\n G11 4.28 0.276 2 3.10 5.47\n G12 4.76 0.276 2 3.57 5.94\n G13 4.76 0.276 2 3.57 5.95\n G14 4.78 0.276 2 3.59 5.96\n G15 4.97 0.276 2 3.78 6.16\n G16 4.73 0.276 2 3.54 5.92\n G17 4.60 0.276 2 3.42 5.79\n G18 4.36 0.276 2 3.17 5.55\n G19 4.84 0.276 2 3.65 6.03\n G20 4.04 0.276 2 2.85 5.23\n G21 4.80 0.276 2 3.61 5.98\n G22 4.53 0.276 2 3.34 5.72\n G23 4.25 0.276 2 3.06 5.44\n G24 4.15 0.276 2 2.97 5.34\n\nDegrees-of-freedom method: containment \nConfidence 
level used: 0.95 \n\n\n\n\n\n\n\n\n\nJohn, JA, and ER Williams. 1995. Cyclic and Computer Generated Designs. 2nd ed. New York: Chapman; Hall/CRC Press. https://doi.org/10.1201/b15075.\n\n\nPatterson, H. D., and E. R. Williams. 1976. “A New Class of Resolvable Incomplete Block Designs.” Biometrika 63 (1): 83–92. https://doi.org/10.2307/2335087.\n\n\nYates, F. 1936. “A New Method of Arranging Variety Trials Involving a Large Number of Varieties.” J Agric Sci 26: 424–55.", + "crumbs": [ + "Experiment designs", + "9  Incomplete Block Design" + ] } ] \ No newline at end of file