reviewed ch IBD and latin sq design

IdahoAgStats · Jan 8, 2025 · 9d74475
1 parent 300c35d
commit 9d74475

Showing 4 changed files with 106 additions and 70 deletions.
diff --git a/chapters/incomplete-block-design.qmd b/chapters/incomplete-block-design.qmd
@@ -18,7 +18,6 @@ Incomplete block designs are grouped into two groups: (1) balanced lattice desig
 
 In alpha-lattice design, the blocks are grouped into complete replicates. These designs are also termed "resolvable incomplete block designs" or "partially balanced incomplete block designs" [@paterson]. This design is more commonly used than the balanced IBD because of its practicability, flexibility, and versatility. 
 
-To avoid having a disconnected design, a balanced incomplete block design can be used.
 
 ### Statistical Model
 
@@ -93,7 +92,6 @@ desplot::desplot(dat,
          text=gen, cex=1, out1=block,
         out2=gen, out2.gpar=list(col = "black", lwd = 1, lty = 1),
          main="Incomplete block design")
-
 # desplot::desplot(dat, yield~col*row,
 #           text=gen, shorten='none', cex=.6, out1=block,
 #           aspect=252/96, # true aspect
@@ -232,7 +230,20 @@ emmeans(model_icbd1, ~ gen)
 
 ### Partially Balanced IBD (Alpha Lattice Design)
 
-The data used in this example is published in *Cyclic and Computer Generated Designs* [@john_cyclic]. The data in this trial was laid out in an alpha lattice design. This trial data had 24 genotypes ("gen"), 6 incomplete blocks, each replicated 3 times. 
+The statistical model for a partially balanced incomplete block design is:
+
+$$y_{ij(l)} = \mu + \alpha_i + \beta_{i(l)} + \tau_j + \epsilon_{ij(l)}$$ 
+
+Where:
+
+$\mu$ = overall experimental mean  
+$\alpha_i$ = effect of replicate $i$ (random)  
+$\beta_{i(l)}$ = effect of incomplete block $l$ within replicate $i$ (random)  
+$\tau_j$ = effect of treatment $j$ (fixed)  
+$\epsilon_{ij(l)}$ = intra-block residual  
+
+
+The data used in this example is published in *Cyclic and Computer Generated Designs* [@john_cyclic]. The trial was laid out in an alpha lattice design with 24 genotypes ("gen"), replicated 3 times, with 6 incomplete blocks in each replicate. 
 
 Let's start analyzing this example first by loading the required libraries for linear mixed models:
 
@@ -324,7 +335,7 @@ The response variables seems to follow a normal distribution curve, with fewer v
 ### lme4
 
 ```{r}
-mod_alpha <- lmer(yield ~ gen + (1|rep:block),
+mod_alpha <- lmer(yield ~ gen + (1|rep/block),
                    data = data1, 
                    na.action = na.exclude)
 tidy(mod_alpha)
@@ -338,15 +349,6 @@ mod_alpha1 <- lme(yield ~ gen,
                   data = data1, 
                   na.action = na.exclude)
 tidy(mod_alpha1)
-
-## need to try pdIdent here
-# model_lme <-lme(yield ~  gen,
-#               random = list(one = pdBlocked(list(
-#          pdIdent(~ 0 + rep:block)))),
-#         data = data1 %>% mutate(one = factor(1)))
-# 
-# summary(model_lme)
-
 ```
 :::
 
@@ -366,7 +368,6 @@ check_model(mod_alpha1, check = c('normality', 'linearity'))
 ```
 :::
 
-
 #### Inference
 
 Let's generate the ANOVA table using `anova()` for the lmer and lme models, respectively.
@@ -380,7 +381,6 @@ anova(mod_alpha, type = "1")
 #### nlme
 ```{r}
 anova(mod_alpha1, type = "sequential")
-#anova(model_lme, type = "sequential")
 ```
 :::
 

diff --git a/chapters/latin-design.qmd b/chapters/latin-design.qmd
@@ -8,24 +8,27 @@ par(mar=c(5.1, 6, 4.1, 2.1))
 
 ## Background
 
-Latin square design In the Latin Square design, two blocking factors are arranged across the row and the column of the square. This allows blocking of two nuisance factors across rows and columns to reduce even more experimental error. The requirement of Latin square design is that all t treatments appears only once in each row and column and number of replications is equal to number of treatments.
+In the Latin square design, two blocking factors are arranged across the rows and columns of the square. This allows two nuisance factors to be blocked at the same time, which further reduces experimental error. The design requires that each of the t treatments appears exactly once in each row and each column, so the number of replications equals the number of treatments.
 
 Advantages of Latin square design are:
+
 1.  The design is particularly appropriate for comparing t treatment means in the presence of two sources of extraneous variation, each measured at t levels.
+
 2.  The analysis is quite simple.
 
-Disadvantage: 
-1. A Latin square can be constructed for any value of t, however, it is best suited for comparing t treatments when 5≤t≤10.
+Disadvantages:
+
+1.  A Latin square can be constructed for any value of t; however, it is best suited for comparing t treatments when 5 ≤ t ≤ 10.
 
 2.  Any additional extraneous sources of variability tend to inflate the error term, making it more difficult to detect differences among the treatment means.
 
-3.  The effect of each treatment on the response must be approximately the same across rows and columns.
+3.  The effect of each treatment on the response must be approximately the same across rows and columns.
 
 Statistical model for a response in Latin square design is:
 
 $Y_{ijk} = \mu + \alpha_i + \beta_j +  \gamma_k + \epsilon_{ijk}$
 
-where, $\mu$ is the experiment mean, $\alpha_i's$ are treatment effects, $\beta$ and $\gamma$ are the row- and column specific effects.
+where $\mu$ is the experiment mean, $\alpha_i$ is the effect of treatment $i$, and $\beta_j$ and $\gamma_k$ are the row- and column-specific effects.
 
 Assumptions of this design include normality and independence of the error terms ($\epsilon_{ijk}$), and no interaction between the two blocking factors (rows and columns) and the treatments.
 
@@ -40,6 +43,7 @@ Let's start the analysis firstly by loading the required libraries:
 library(lme4); library(lmerTest); library(emmeans); library(performance)
 library(dplyr); library(broom.mixed); library(agridat); library(desplot)
 ```
+
 ### nlme
 
 ```{r, message=FALSE, warning=FALSE}
@@ -53,6 +57,7 @@ The data used in this example is extracted from the `agridat` package. In this e
 ```{r}
 dat <- agridat::goulden.latin
 ```
+
 |       |                               |
 |-------|-------------------------------|
 | trt   | treatment factor, 5 levels    |
@@ -63,10 +68,13 @@ dat <- agridat::goulden.latin
 : Table of variables in the data set {tbl-latin}
 
 ### Data integrity checks
+
 Firstly, let's verify the class of variables in the dataset using `str()` function in base R
+
 ```{r}
 str(dat)
 ```
+
 Here yield and trt are classified as numeric and factor variables, respectively, as needed. But we need to change 'row' and 'col' from integer to factor/character.
 
 ```{r}
@@ -75,28 +83,38 @@ dat1 <- dat |>
                col = as.factor(col))
 ```
 
-Next, to verify if the data meets the assumption of the Latin square design let's plot the field layout for this experiment. 
-```{r}
-desplot::desplot(data = dat, flip = TRUE,
-        form = yield ~ row + col, 
-        out1 = row, out1.gpar=list(col="black", lwd=3),
-        out2 = col, out2.gpar=list(col="black", lwd=3),
-        text = trt, cex = 1, shorten = "no",
-        main = "Field layout", 
-        show.key = FALSE)
+Next, to verify if the data meets the assumption of the Latin square design let's plot the field layout for this experiment.
 
-```
+```{r, echo=FALSE, warning=FALSE}
 
-This looks great! Here we can see that there are equal number of treatments, rows, and columns. Treatments were randomized in such a way that one treatment doesn't appear more than once in each row and column. 
+desplot::desplot(data = dat1, flip = TRUE,
+        form = trt ~ col + row,         
+        text = trt, cex = 0.7, shorten = "no", 
+        out1 = trt,                          
+       # out2 = block,  
+        main = "Latin Square Design", show.key = FALSE) 
+# desplot::desplot(data = dat, flip = TRUE,
+#         form = yield ~ row + col, 
+#         out1 = row, out1.gpar=list(col="black", lwd=3),
+#         out2 = col, out2.gpar=list(col="black", lwd=3),
+#         text = trt, cex = 1, shorten = "no",
+#         main = "Field layout", 
+#         show.key = FALSE)
 
+```
+
+This looks great! Here we can see that there is an equal number (5) of treatments, rows, and columns. Treatments were randomized so that each treatment appears exactly once in each row and each column.
 
 Next step is to check if there are any missing values in response variable.
+
 ```{r}
 apply(dat, 2, function(x) sum(is.na(x)))
 ```
-And we do not have any missing values in the data.
+
+No missing values were detected in this data set.
 
 Before fitting the model, let's create a histogram of response variable to see if there are extreme values.
+
 ```{r, echo=FALSE}
 #| label: lattice_design
 #| fig-cap: "Histogram of the dependent variable."
@@ -110,90 +128,99 @@ hist(dat$yield, main = "", xlab = "yield")
 ```
 
 ### Model fitting
+
 Here we will fit a model to evaluate the impact of fungicide treatments on wheat yield with trt as a fixed effect and row & col as a random effect.
 
+After fitting, the variance components of the nlme model can be inspected with `VarCorr(m1_b)`.
+
 ::: panel-tabset
 ### lme4
 
 ```{r}
 m1_a <- lmer(yield ~ trt + (1|row) + (1|col),
            data = dat1,
            na.action = na.exclude)
-tidy(m1_a) 
+summary(m1_a) 
 ```
 
 ### nlme
+
 ```{r}
-dat$dummy <- factor(1)
 m1_b <- lme(yield ~ trt,
           random =list(~1|row, ~1|col),
-                  #list(dummy = pdBlocked(list(
-                   #               pdIdent(~row - 1),
-                    #              pdIdent(~col - 1)))),
           data = dat, 
           na.action = na.exclude)
 
 summary(m1_b)
-#VarCorr(m1_b)
 ```
 :::
 
 ### Check Model Assumptions
 
-::: panel-tabset
+This step involves inspecting the model residuals using the `check_model()` function from the 'performance' package.
+
+:::: panel-tabset
 #### lme4
+
 ```{r, fig.height=3}
 check_model(m1_a, check = c("linearity", "normality"))
 ```
 
 #### nlme
 
-::: {layout-ncol=2 .column-body}
-
+::: {.column-body layout-ncol="2"}
 ```{r echo=FALSE, eval=FALSE}
 par(mar=c(5.1, 5, 2.1, 2.1))
 plot(residuals(m1_b), xlab = "fitted values", ylab = "residuals",
      cex.lab = 1.8, cex.axis = 1.5); abline(0,0)
 ```
 
-
 ```{r echo=FALSE, eval=FALSE}
 par(mar=c(5.1, 5, 2.1, 2.1))
 qqvals <- qqnorm(residuals(m1_b), plot.it=FALSE)
 qqplot(qqvals$x, qqvals$y, xlab = "Theoretical Quantiles", ylab = "Sample Quantiles", cex.lab = 1.7, cex.axis = 1.5); qqline(residuals(m1_b))
 ```
-::: 
+:::
 
 ```{r, fig.height=3}
 check_model(m1_b, check = c("linearity", "normality"))
 ```
-:::
+::::
+
+These plots suggest that the assumptions of the linear model have been met.
 
 ### Inference
-We can look look at the analysis of variance for treatment effect on yield using `anova()` function.
+
+We can now proceed to variance partitioning. In this case, we will use `anova()` with `type = "1"` or `type = "sequential"` for the `lmer()` and `lme()` models, respectively.
 
 ::: panel-tabset
 #### lme4
-```{r, fig.height=3}
+
+```{r}
 anova(m1_a, type = "1")
 ```
 
 #### nlme
-```{r, fig.height=3}
+
+```{r}
 anova(m1_b, type = "sequential")
 ```
 :::
 
-Here we observed a significant impact on fungicide treatment on crop yield. Let's have a look at the estimated marginal means of wheat yield with each treatment using `emmeans()` function.
+Both models detected a significant effect of fungicide treatment on crop yield. Let's look at the estimated marginal means of wheat yield for each treatment using the `emmeans()` function.
 
 ::: panel-tabset
 #### lme4
+
 ```{r, fig.height=3}
 emmeans(m1_a, ~ trt)
 ```
 
 #### nlme
+
 ```{r, fig.height=3}
 emmeans(m1_b, ~ trt)
 ```
-:::
+:::
+
+Wheat yield was higher with fungicide 'C' than with the other fungicides applied in this study, which suggests that fungicide 'C' was more effective at controlling stem rust in wheat.
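
The layout requirement checked visually in the data integrity section (each treatment exactly once per row and per column) can also be checked numerically. The sketch below is one way to do it in base R. The helper `is_latin_square()` and the cyclic 5 x 5 `layout` are hypothetical illustrations, not part of the chapter and not the `goulden.latin` data.

```r
# Hypothetical 5 x 5 cyclic Latin square used only to illustrate the check.
n_trt <- 5
layout <- expand.grid(row = 1:n_trt, col = 1:n_trt)
layout$trt <- LETTERS[((layout$row + layout$col - 2) %% n_trt) + 1]

# A layout is a Latin square if every treatment occurs exactly once
# in every row and exactly once in every column.
is_latin_square <- function(d) {
  row_ok <- all(table(d$row, d$trt) == 1)
  col_ok <- all(table(d$col, d$trt) == 1)
  row_ok && col_ok
}

valid_result <- is_latin_square(layout)

# Break the design: the plot in row 1, column 1 receives the treatment
# already present in row 2, column 1, so it is duplicated in row 1 and column 1.
broken <- layout
broken$trt[1] <- broken$trt[2]
broken_result <- is_latin_square(broken)

print(c(valid = valid_result, broken = broken_result))
```

Assuming the chapter's data frame keeps the column names `row`, `col`, and `trt`, the same helper could be applied to it directly.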
diff --git a/chapters/repeated-measures.qmd b/chapters/repeated-measures.qmd
@@ -4,11 +4,11 @@
 source(here::here("settings.r"))
 ```
 
-In the previous chapters we covered how to run linear mixed models for different experiment designs. All of the examples in those chapters were independent measure designs, where each subject was assigned to a different treatment. Now we will move on to experiment with repeated measures random effects.
+In the previous chapters we covered how to run linear mixed models for different experiment designs. All of the examples in those chapters were independent measure designs, where each subject was assigned to a different treatment. Now we will move on to experiments with repeated measures effects.
 
-Studies that involve repeated observations of the exact same experimental units require a repeated measures component to properly model correlations across time with the experiment unit. This is common in any studies that are evaluated across different time periods. For example, if samples are collected over the different time periods from same subject, we have to repeated measures effect while analyzing the main effects.
+Studies that involve repeated observations of the exact same experimental units (or subjects) require a repeated measures component in the analysis to properly model correlations across time within each subject. This is common in any study that is evaluated across different time periods. For example, if samples are collected over different time periods from the same subject, we have to model the repeated measures effect while analyzing the main effects.
 
-In these models, the 'iid' assumption (idependently and identically distributed) is being violated, so we need to introduce specialized covariance structures that can account for these correlations between error terms.
+In these models, the 'iid' assumption (independently and identically distributed) is often violated, so we need to introduce specialized covariance structures that can account for these correlations between error terms.
 
 There are several types of covariance structures:
 
@@ -97,7 +97,6 @@ ggplot(data = dat, aes(y = y, x = factweek, fill = variety)) +
 ```
 
 Looks like variety '1' has the lowest yield and showed drastic reduction in yield over weeks compared to other varieties.
-
 One last step before we fit model is to look at the distribution of response variable.
 
 ```{r, eval=FALSE}
@@ -224,7 +223,6 @@ Firstly, we need to look at the class of variables in the data set.
 ```{r}
 str(Yield)
 ```
-
 We will now convert the fertilizer and Rep into factor. In addition, we need to create a new factor variable (sample_time1) to analyze the time effect.
 
 ::: column-margin