[
  {
    "objectID": "index.html",
    "href": "index.html",
    "title": "Field Guide to the R Mixed Model Wilderness",
    "section": "",
    "text": "Preface\n“Path in the Wilderness” by Erich Taeubel, Jr.\nRunning mixed models in R is no easy task. There are dozens of packages supporting these aims, each with varying functionality, syntax, and conventions. The linear mixed model ecosystem in R consists of over 80 libraries that either construct and solve mixed model equations or serve as helper packages that process the results from mixed model analysis. These libraries provide a patchwork of overlapping and unique functionality regarding the fundamental structure of mixed models: allowable distributions, nested and crossed random effects, heterogeneous error structures and other facets. No single library has all possible functionality enabled.\nThis patchwork of packages makes it very challenging for statisticians to conduct mixed model analysis and to teach others how to run mixed models in R. The purpose of this guide is to provide some recipes for handling common analytical scenarios that require mixed models. As a field guide, it is intended to be succinct, and to help researchers meet their analytic goals.\nIn general, the content from this website may not be copied or reproduced without attribution. However, the example code and required data sets to run the code are MIT licensed. These can be accessed on GitHub.",
    "crumbs": [
      "Preface"
    ]
  },
  {
    "objectID": "index.html#what-this-does-not-cover",
    "href": "index.html#what-this-does-not-cover",
    "title": "Field Guide to the R Mixed Model Wilderness",
    "section": "What This Does Not Cover",
    "text": "What This Does Not Cover\n\nGeneralized linear models where the response variable does not follow a normal distribution. We do address cases of unequal variance, but if another distribution and/or a link function is required for the model, that is not addressed in this guide.\nBasic principles of experimental design. We assume you know this, but if you do not, please check out the Grammar of Experimental Design for guidance on these topics.\nInstructions on using R. We assume familiarity with R. If you need help in learning R, there are numerous guides, including our introductory R course.",
    "crumbs": [
      "Preface"
    ]
  },
  {
    "objectID": "index.html#notice",
    "href": "index.html#notice",
    "title": "Field Guide to the R Mixed Model Wilderness",
    "section": "Notice!",
    "text": "Notice!\nThis is a work-in-progress and will be updated over time.",
    "crumbs": [
      "Preface"
    ]
  },
  {
    "objectID": "chapters/intro.html",
    "href": "chapters/intro.html",
    "title": "1  Introduction",
    "section": "",
    "text": "1.1 Terms\nThis guide is focused on frequentist implementations of mixed models in R, covering different scenarios common in the agricultural and life sciences.\nThis is not intended to be a guide to the theory of mixed models; it is focused on implementations of models only.\nPlease read this section and refer back to it when you forget what these terms mean.",
    "crumbs": [
      "<span class='chapter-number'>1</span>  <span class='chapter-title'>Introduction</span>"
    ]
  },
  {
    "objectID": "chapters/intro.html#terms",
    "href": "chapters/intro.html#terms",
    "title": "1  Introduction",
    "section": "",
    "text": "Table 1.1: Term definitions\n\n\n\n\n\n\n\n\n\nTerm\nDefinition\n\n\n\n\nRandom effect\nAn independent variable where the levels being estimated compose a random sample from a population whose variance will be estimated\n\n\nFixed effect\nAn independent variable with specific, predefined levels to estimate\n\n\nExperimental unit\nThe smallest unit being used for analysis. This could be an animal, a field plot, a person, a meat or muscle sample. The unit may be assessed multiple times or at multiple points in time. When the analysis is all said and done, the predictions occur at this level.",
    "crumbs": [
      "<span class='chapter-number'>1</span>  <span class='chapter-title'>Introduction</span>"
    ]
  },
  {
    "objectID": "chapters/intro.html#packages",
    "href": "chapters/intro.html#packages",
    "title": "1  Introduction",
    "section": "1.2 Packages",
    "text": "1.2 Packages\n\n1.2.1 Table of required packages for modelling\n\n\n\nTable 1.2: Table of required packages\n\n\n\n\n\nPackage\nPurpose\n\n\n\n\nlme4 (Bates et al. 2015)\nmain package for linear mixed models\n\n\nlmerTest (Kuznetsova, Brockhoff, and Christensen 2017)\nfor computing p-values when using lme4\n\n\nnlme (J. Pinheiro, Bates, and R Core Team 2023; J. C. Pinheiro and Bates 2000)\nmain package for linear mixed models and one of the recommended packages bundled with R\n\n\nemmeans (Lenth 2022)\nfor estimating fixed effects, their confidence intervals and conducting contrasts\n\n\nbroom.mixed (Bolker and Robinson 2024)\npackage for presenting the model summary output into a tidy workflow.\n\n\nDHARMa (Hartig 2022)\nfor evaluating residuals (error terms) in generalized linear models\n\n\nperformance (Lüdecke et al. 2021)\nFor creating diagnostic plots or to compute fit measures\n\n\n\n\n\n\n\n\n1.2.2 Optional packages\n\n\n\nTable 1.3: Table of optional packages\n\n\n\n\n\nPackage Name\nFunction\n\n\nhere\nFor setting the working directory\n\n\nggplot2\nplotting\n\n\ndesplot\nplotting\n\n\nagridat\nto download example dataset\n\n\nagricolae\nto download example dataset\n\n\n\n\n\n\nThis entire guide will use the here package for loading data. If you can load your data fine without this package, please carry on. ‘here’ is certainly not required for running mixed models.\n\n\n\n\nBates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.\n\n\nBolker, Ben, and David Robinson. 2024. Broom.mixed: Tidying Methods for Mixed Models. https://CRAN.R-project.org/package=broom.mixed.\n\n\nHartig, Florian. 2022. DHARMa: Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models. https://CRAN.R-project.org/package=DHARMa.\n\n\nKuznetsova, Alexandra, Per B. Brockhoff, and Rune H. B. Christensen. 2017. 
“lmerTest Package: Tests in Linear Mixed Effects Models.” Journal of Statistical Software 82 (13): 1–26. https://doi.org/10.18637/jss.v082.i13.\n\n\nLenth, Russell V. 2022. Emmeans: Estimated Marginal Means, Aka Least-Squares Means. https://CRAN.R-project.org/package=emmeans.\n\n\nLüdecke, Daniel, Mattan S. Ben-Shachar, Indrajeet Patil, Philip Waggoner, and Dominique Makowski. 2021. “performance: An R Package for Assessment, Comparison and Testing of Statistical Models.” Journal of Open Source Software 6 (60): 3139. https://doi.org/10.21105/joss.03139.\n\n\nPinheiro, José C., and Douglas M. Bates. 2000. Mixed-Effects Models in S and S-PLUS. New York: Springer. https://doi.org/10.1007/b98882.\n\n\nPinheiro, José, Douglas Bates, and R Core Team. 2023. Nlme: Linear and Nonlinear Mixed Effects Models. https://CRAN.R-project.org/package=nlme.",
    "crumbs": [
      "<span class='chapter-number'>1</span>  <span class='chapter-title'>Introduction</span>"
    ]
  },
  {
    "objectID": "chapters/analysis-tips.html",
    "href": "chapters/analysis-tips.html",
    "title": "2  Tips on Analysis",
    "section": "",
    "text": "Below are some things our office frequently says to researchers.\n\n2.0.1 Think About Your Analytical Goals\nThroughout this guide, we have tried to explicitly state the goals of each analysis. This helps inform how to approach the analysis of an experiment. It can be difficult, especially for new scientists-in-training (i.e. graduate students), to understand what it is they want to estimate. You may have been handed a data set you had no role in generating and told to “analyze this” with no additional context. Or perhaps you may have conducted a large study that has some overall goals that are lofty, yet vague. And now you must translate the vague aims into clear statistical questions.\nIt can be helpful to think about the exact results you are hoping to get. What does this look like exactly? Do you want to estimate the changes in plant diversity as the result of a herbicide spraying program? Do you want to find out if a fertilizer treatment changed protein content in a crop and by how much? Do you want to know about changes in human diet due to an intervention? What are the quantifiable differences that you and/or experts in your domain would find meaningful?\nConsider what the results would look like for (1) the best case scenario where your wildest research dreams come true, and (2) null results, when you find out that your treatment or intervention had no effect. It’s very helpful to understand and recognize exactly what both situations look like.\nBy “consider”, we mean: imagine the final plot or table, or summary sentence you want to present, either in a peer-reviewed manuscript, or some output for stakeholders. From this, you can work backwards to determine the analytical approach needed to arrive at that desired final output. 
Or you may determine that your data are unsuitable to generate the desired output, in which case, it’s best to determine that as soon as possible.\nBy “consider”, we also mean: imagine exactly what the spreadsheet of results would contain after a successful trial. What columns are present and what data are in those cells? If you are planning an experiment, this can help ensure you plan it properly to actually test whatever it is you want to evaluate. If the experiment is done, this enables you to evaluate if you have the information present to test your hypothesis.\nTaking the time to reflect on exactly what you want to analyze can save time and prevent you from doing unneeded analyses that don’t serve this final goal. There is rarely (never?) one way to analyze an experiment or a data set, so use your limited time wisely and focus on what matters to you most.\n\n\n2.0.2 Know That Data Cleaning is Time Consuming\n\n\n\n\n\n\n\n\n\nFigure 2.1: How you will spend your time\n\n\n\n\nData cleaning has occupied and will continue to occupy the majority of researchers’ time when conducting an analysis. Truly, we are sorry for this. But, please know it is not you, it is the nature of data. Plan for and prepare yourself mentally to spend time cleaning and preparing your data for analysis.1 This will likely take way longer than the actual analysis! It is needed to ensure you can actually get correct results in an analysis, and hence data cleaning is worth the time it requires.\n1 For an excellent set of basic instructions on data preparation, please see: Broman, K. W., & Woo, K. H. (2018). Data Organization in Spreadsheets. 
The American Statistician, 72(1), 2–10.\n\n2.0.3 Interpret ANOVA and P-values with Caution\n\nInformally, a p-value is the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.\n---American Statistical Association\n\nThe great majority of researchers are deeply interested in p-values. This is not a bad thing per se, but sometimes the focus is so strong it comes at the expense of other valuable pieces of information, like treatment estimates! Russ Lenth, author of the emmeans package, refers to this particular practice as “star gazing”.\nIt is important to evaluate why you want to do ANOVA, what extra information it will bring and what you plan to do with those results. Sometimes, researchers want to conduct an ANOVA even though the original goals of analysis were reached without it. Running an ANOVA may increase or decrease confidence in your other results. That is not at all what ANOVA is intended to do, nor is this what p-values can tell us. ANOVA compares across group variation to within group variation. It cannot tell us if anything is the ‘same’ (there’s a separate branch of analysis, ‘equivalence testing’, for that), and it cannot tell us specifically what is different, unless you are fortunate enough to only have 2 levels in your treatment structure. P-values provide no guarantee that something is truly different or not; they only quantify the probability you could have observed these results by chance.\nThe American Statistical Association recommends that “Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.”2 That article also explains what p-values are telling us and how to avoid committing analytical errors and/or misinterpreting p-values. 
If you have time to read the full article, it will benefit your research!\n2 Wasserstein, R. L., & Lazar, N. A. (2016). The ASA Statement on p-Values: Context, Process, and Purpose. The American Statistician, 70(2), 129–133.\nThe main problematic behavior I see is researchers using p-values as the sole criterion on whether to present results: “We wanted to test if x, y and z had an effect. We ran some model and found that only x had a significant effect, and those results indicate…” (while results with a p-value &gt; 0.05 are ignored).\nA better option would be to discuss the results of the analysis and how they addressed the research questions: how did the dependent variable change (or not change) as a result of the treatments/interventions/independent variables? What are the parameters or treatment predictions and what do they tell us with regard to the research goals? And to bolster those estimates, what are the confidence intervals on those estimates? What are the p-values for the statistical tests? P-values can support the results and conclusions, but the main results desired by a researcher are usually the estimates themselves - so lead with that!\nTo learn more about common pitfalls in interpreting p-values, check out our blog post on the subject and/or this paper3 on the subject.\n3 Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, Altman DG. (2016) Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 31(4):337-50.\n\n2.0.4 Comments on Hypothesis Testing and Usage of Treatment Letters\nOften, I see researchers use compact letter display (e.g. “A”, “B”, “C”, ….) for indicating differences among treatments. This makes for concise presentation of results in tables and figures, but it can both kill statistical power and miss nuance in the results.\n\n\n\nImage from a paper published in 2024. 
Although this was a fully crossed factorial experiment, compact letter display was implemented across all treatment combinations, resulting in some nonsensical comparisons among some more informative contrasts. What a waste.\nImplementing compact letter display can kill statistical power (the probability of detecting true differences) because it requires all pairwise comparisons to be made. Doing this, especially when there are many treatment levels, has its perils. The biggest problem is that this creates a multiple testing problem. The RCBD example in this guide has 42 treatments, resulting in a total of 861 comparisons (\\(=42*(42-1)/2\\)), that are then adjusted for multiple tests. With that many tests, a severe adjustment is likely and hence things that are different are not detected. With so many tests, it could be that there is an overall effect due to treatment, but they all share the same letter!\nThe second problem is one of interpretation. Just because two treatments or varieties share a letter does not mean they are equivalent. It only means that they were not found to be different. A funny distinction, but alas. There is an entire branch of statistics, ‘equivalence testing’, devoted to just this topic - how to test if two things are actually the same. This involves the user declaring a maximum allowable numeric difference for a variable in order to determine if two items are statistically different or equivalent - something that these pairwise comparisons are not doing.\nAnother problem is that doing all pairwise comparisons may not align with experimental goals. In many circumstances, not every pairwise combination is of any interest or relevance to the study. Additionally, complex treatment structure may necessitate custom contrasts that highlight differences between the marginal estimate of multiple treatments versus another. For example, there may be 2 levels of ‘high’ nitrogen fertilizer treatment with two different sources (i.e. 
types of fertilizer). A researcher may want to contrast those two levels together against ‘low’ nitrogen treatment levels.\nOften, researchers have embedded additional structure in the treatments that is not fully reflected in the statistical model. For example, perhaps a study is looking at five different intercropping mixtures, two that incorporate a legume and 3 that do not. Conducting all pairwise comparisons will miss estimating the difference between including a legume in an intercropping mix and not including one. Soil fertility and other agronomic studies often have complex treatment structure. When it is not practical or financially feasible to have a full factorial experiment, embedding different treatment combinations in the main factor of analysis can accomplish this. This is a good study design approach, but compact letter display is an inefficient way to report results. In such cases, custom contrasts are a better choice for hypothesis testing. The emmeans chapter covers how to do this.\n\n\n2.0.5 Final Thoughts\nGood statistical analysis requires a thoughtful, intentional approach. If you have gone to the trouble to conduct a well designed experiment or assemble a useful data set, take the time and effort to analyze it properly.\n\n\n\n\nBroman, Karl W., and Kara H. Woo. 2018. “Data Organization in Spreadsheets.” The American Statistician 72 (1): 2–10. https://doi.org/10.1080/00031305.2017.1375989.\n\n\nGreenland, Sander, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman. 2016. “Statistical Tests, P Values, Confidence Intervals, and Power: A Guide to Misinterpretations.” European Journal of Epidemiology 31 (4): 337–50. https://doi.org/10.1007/s10654-016-0149-3.\n\n\nWasserstein, Ronald L., and Nicole A. Lazar. 2016. “The ASA Statement on p-Values: Context, Process, and Purpose.” The American Statistician 70 (2): 129–33. https://doi.org/10.1080/00031305.2016.1154108.",
    "crumbs": [
      "<span class='chapter-number'>2</span>  <span class='chapter-title'>Tao of Analysis</span>"
    ]
  },
  {
    "objectID": "chapters/background.html",
    "href": "chapters/background.html",
    "title": "3  Mixed model theory and background",
    "section": "",
    "text": "3.1 Model\nMixed-effects models are called “mixed” because they simultaneously model fixed and random effects. Fixed effects (e.g. treatments) represent population-level (average) effects that should persist across experiments. Fixed effects are similar to the parameters found in “traditional” regression techniques like ordinary least squares. Random effects are discrete units sampled from some population (e.g. plots, participants), and thus they are inherently categorical.\nRecall simple linear regression with intercept (\\(\\beta_0\\)) and slope (\\(\\beta_1\\)) effect for subject \\(i\\). The slope and intercept are chosen so that the residual sum of squares is minimized.\n\\[  Y = \\beta_0 + \\beta_1 X + \\epsilon \\]\nIf we consider this model in a mixed model framework, \\(\\beta_0\\) and \\(\\beta_1\\) are considered fixed effects (also known as the population-averaged values) and \\(b_i\\) is a random effect for subject \\(i\\). The random effect can be thought of as each subject’s deviation from the fixed intercept parameter. The key assumption about \\(b_i\\) is that it is independent, identically and normally distributed with a mean of zero and associated variance. Random effects are especially useful when we have (1) lots of levels (e.g., many species or blocks), (2) relatively little data on each level (although we need multiple samples from most of the levels), and (3) uneven sampling across levels.\nFor example, if we let the intercept be a random effect, it takes the form:\n\\[  Y = \\beta_0 + b_i + \\beta_1 X + \\epsilon \\]\nIn this model, predictions would vary depending on each subject’s random intercept term, but slopes would be the same.\nIn the second case, we can have a fixed intercept and a random slope. The model will be:\n\\[  Y = \\beta_0 + (\\beta_1 + b_i)(X) + \\epsilon\\]\nIn this model, \\(b_i\\) is a random effect for subject \\(i\\). 
Predictions would vary with the random slope term, but the intercept would be the same:\nThe third case is a mixed model with a random slope and a random intercept:\n\\[  Y = (\\beta_0 + a_i) + (\\beta_1 + b_i)(X) + \\epsilon\\]\nIn this model, \\(a_i\\) and \\(b_i\\) are random effects for subject \\(i\\) applied to the intercept and slope, respectively. Predictions would vary depending on each subject’s slope and intercept terms:",
    "crumbs": [
      "<span class='chapter-number'>3</span>  <span class='chapter-title'>Mixed Model Background</span>"
    ]
  },
  {
    "objectID": "chapters/background.html#model",
    "href": "chapters/background.html#model",
    "title": "3  Mixed model theory and background",
    "section": "",
    "text": "Example mixed model with random intercepts but identical slopes.\n\n\n\n\n\n\n\n\n\n\nMixed model with random slopes but identical intercepts.\n\n\n\n\n\n\n\n\n\n\nMixed Model with random intercept and slope",
    "crumbs": [
      "<span class='chapter-number'>3</span>  <span class='chapter-title'>Mixed Model Background</span>"
    ]
  },
  {
    "objectID": "chapters/background.html#formula-notation",
    "href": "chapters/background.html#formula-notation",
    "title": "3  Mixed model theory and background",
    "section": "3.2 R Formula Syntax for Random and Fixed Effects",
    "text": "3.2 R Formula Syntax for Random and Fixed Effects\nFormula notation is often used in the R syntax for linear models. It looks like this: \\(Y ~ X\\), where \\(Y\\) is the dependent variable (the response) and \\(X\\) is/are the independent variable(s), that is, the experimental treatments or interventions.\n\nmy_formula &lt;- formula(Y ~ treatment1 + treatment2)\nclass(my_formula)\n\n[1] \"formula\"\n\n\nThe package ‘lme4’ has some additional conventions regarding the formula. Random effects are put in parentheses and a 1| is used to denote random intercepts (rather than random slopes). The table below provides several examples of random effects in mixed models. The names of grouping factors are denoted g, g1, and g2, and covariates as x.\n\n\n\n\n\n\n\n\nFormula\nAlternative\nMeaning\n\n\n\n\n(1|g)\n1 + (1|g)\nRandom intercept with a fixed mean\n\n\n(1|g1/g2)\n(1|g1) + (1|g1:g2)\nIntercept varying among g1 and g2 within g1\n\n\n(1|g1) + (1|g2)\n1 + (1|g1) + (1|g2)\nIntercept varying among g1 and g2\n\n\nx + (x|g)\n1 + x + (1 + x|g)\nCorrelated random intercept and slope\n\n\nx + (x||g)\n1 + x + (1|g) + (0 + x|g)\nUncorrelated random intercept and slope\n\n\n\nThe first example, (1|g), suffices for most models and is the only structure used in this guide.",
    "crumbs": [
      "<span class='chapter-number'>3</span>  <span class='chapter-title'>Mixed Model Background</span>"
    ]
  },
  {
    "objectID": "chapters/rcbd.html",
    "href": "chapters/rcbd.html",
    "title": "4  Randomized Complete Block Design",
    "section": "",
    "text": "4.1 Background\nThis is a simple model that can serve as a good entrance point to mixed models.\nRandomized complete block design (RCBD) is a very common design where experimental treatments are applied at random to experimental units within each block. The block can represent a spatial or temporal unit or even different technicians taking data. The blocks are intended to control for a nuisance source of variation, such as changes over time, spatial variation, changes in equipment or operators, or myriad other causes. They are a random effect where the actual blocks used in the study are a random sample of a distribution of other blocks.\nThe statistical model:\n\\[y_{ij} = \\mu + \\alpha_i + \\beta_j + \\epsilon_{ij}\\]\nWhere:\n\\(\\mu\\) = overall experimental mean\n\\(\\alpha\\) = treatment effects (fixed)\n\\(\\beta\\) = block effects (random)\n\\(\\epsilon\\) = error terms\n\\[ \\epsilon \\sim N(0, \\sigma)\\]\n\\[ \\beta \\sim N(0, \\sigma_b)\\]\nBoth the overall error and the block effects are assumed to be normally distributed with a mean of zero and standard deviations of \\(\\sigma\\) and \\(\\sigma_b\\), respectively.",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>4</span>  <span class='chapter-title'>Randomized Complete Block Design</span>"
    ]
  },
  {
    "objectID": "chapters/rcbd.html#background",
    "href": "chapters/rcbd.html#background",
    "title": "4  Randomized Complete Block Design",
    "section": "",
    "text": "‘iid’ assumption for error terms\n\n\n\nIn this model, the error terms, \\(\\epsilon\\), are assumed to be “iid”, that is, independently and identically distributed. This means they have constant variance and each individual error term is independent from the others.\nThis guide will later address examples when this assumption is violated and how to handle it.",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>4</span>  <span class='chapter-title'>Randomized Complete Block Design</span>"
    ]
  },
  {
    "objectID": "chapters/rcbd.html#example-analysis",
    "href": "chapters/rcbd.html#example-analysis",
    "title": "4  Randomized Complete Block Design",
    "section": "4.2 Example Analysis",
    "text": "4.2 Example Analysis\nFirst, load the libraries for analysis and estimation:\n\nlme4\n\nlibrary(lme4); library(lmerTest); library(emmeans)\nlibrary(dplyr); library(performance)\n\n\nnlme\n\nlibrary(nlme); library(performance); library(emmeans)\nlibrary(dplyr)\n\n\n\n\nNext, let’s load some data. It is located here if you want to download it yourself (recommended).\nThis data set is for a single wheat variety trial conducted in Aberdeen, Idaho in 2015. The trial includes 4 blocks and 42 different treatments (wheat varieties in this case). This experiment consists of a series of plots (the experimental unit) laid out in a rectangular grid in a farm field. The goal of this analysis is to estimate the yield of each variety and to determine the rankings of each variety for the variable.\n\nvar_trial &lt;- read.csv(here::here(\"data\", \"aberdeen2015.csv\"))\n\n\nTable of variables in the data set\n\n\n\n\n\n\nblock\nblocking unit\n\n\nrange\ncolumn position for each plot\n\n\nrow\nrow position for each plot\n\n\nvariety\ncrop variety (the treatment) being evaluated\n\n\nstand_pct\npercentage of the plot with actual plants growing in it\n\n\ndays_to_heading_julian\nJulian days (starting January 1st) until plot “headed” (first spike emerged)\n\n\nlodging\npercentage of plants in the plot that fell down and hence could not be harvested\n\n\nyield_bu_a\nyield (bushels per acre)\n\n\n\nThere are several variables present that are not useful for this analysis. The only variables we are concerned with are block, variety, yield_bu_a, and test_weight.\n\n4.2.1 Data integrity checks\nThe first thing is to make sure the data is what we expect. There are four steps:\n\nmake sure data are the expected data type\ncheck the extent of missing data\ninspect the independent variables and make sure the expected levels are present in the data\ninspect the dependent variable to ensure its distribution is following expectations\n\n\nstr(var_trial)\n\n'data.frame':   168 obs. 
of  10 variables:\n $ block                 : int  4 4 4 4 4 4 4 4 4 4 ...\n $ range                 : int  1 1 1 1 1 1 1 1 1 1 ...\n $ row                   : int  1 2 3 4 5 6 7 8 9 10 ...\n $ variety               : chr  \"DAS004\" \"Kaseberg\" \"Bruneau\" \"OR2090473\" ...\n $ stand_pct             : int  100 98 96 100 98 100 100 100 99 100 ...\n $ days_to_heading_julian: int  149 146 149 146 146 151 145 145 146 146 ...\n $ height                : int  39 35 33 31 33 44 30 36 36 29 ...\n $ lodging               : int  0 0 0 0 0 0 0 0 0 0 ...\n $ yield_bu_a            : num  128 130 119 115 141 ...\n $ test_weight           : num  56.4 55 55.3 54.1 54.1 56.4 54.7 57.5 56.1 53.8 ...\n\n\nThese look okay except for block, which is currently coded as integer (numeric). We don’t want to run a regression on block as a number, where block 2 would have twice the effect of block 1, and so on. So, converting it to a character will fix that. It can also be converted to a factor, but character variables are a bit easier to work with, and ultimately, equivalent to factor conversion.\n\nvar_trial$block &lt;- as.character(var_trial$block)\n\nNext, check the independent variables. 
Running a cross tabulation is often sufficient to ascertain this.\n\ntable(var_trial$variety, var_trial$block)\n\n                        \n                         1 2 3 4\n  06-03303B              1 1 1 1\n  Bobtail                1 1 1 1\n  Brundage               1 1 1 1\n  Bruneau                1 1 1 1\n  DAS003                 1 1 1 1\n  DAS004                 1 1 1 1\n  Eltan                  1 1 1 1\n  IDN-01-10704A          1 1 1 1\n  IDN-02-29001A          1 1 1 1\n  IDO1004                1 1 1 1\n  IDO1005                1 1 1 1\n  Jasper                 1 1 1 1\n  Kaseberg               1 1 1 1\n  LCS Artdeco            1 1 1 1\n  LCS Biancor            1 1 1 1\n  LCS Drive              1 1 1 1\n  LOR-833                1 1 1 1\n  LOR-913                1 1 1 1\n  LOR-978                1 1 1 1\n  Madsen                 1 1 1 1\n  Madsen / Eltan (50/50) 1 1 1 1\n  Mary                   1 1 1 1\n  Norwest Duet           1 1 1 1\n  Norwest Tandem         1 1 1 1\n  OR2080637              1 1 1 1\n  OR2080641              1 1 1 1\n  OR2090473              1 1 1 1\n  OR2100940              1 1 1 1\n  Rosalyn                1 1 1 1\n  Stephens               1 1 1 1\n  SY  Ovation            1 1 1 1\n  SY 107                 1 1 1 1\n  SY Assure              1 1 1 1\n  UI Castle CLP          1 1 1 1\n  UI Magic CLP           1 1 1 1\n  UI Palouse             1 1 1 1\n  UI Sparrow             1 1 1 1\n  UI-WSU Huffman         1 1 1 1\n  WB 456                 1 1 1 1\n  WB 528                 1 1 1 1\n  WB1376 CLP             1 1 1 1\n  WB1529                 1 1 1 1\n\n\nThere are 42 varieties and there appear to be no misspellings among them that might confuse R into thinking varieties are different when they are actually the same. R is sensitive to case and white space, which can make it easy to create near duplicate treatments, such as “eltan”, “Eltan” and “Eltan ” (with a trailing space). There is no evidence of that in this data set. 
Additionally, it is perfectly balanced, with exactly one observation per treatment per block. Please note that this does not tell us anything about the extent of missing data.\n\n\n\n\n\n\nMissing Data\n\n\n\nHere is a quick check to count the number of missing data in each column. This is not needed for the data sets in this tutorial that have already been comprehensively examined, but it is helpful to check that the level of missingness displayed in an R session is what you expect.\n\napply(var_trial, 2, function(x) sum(is.na(x)))\n\n                 block                  range                    row \n                     0                      0                      0 \n               variety              stand_pct days_to_heading_julian \n                     0                      0                      0 \n                height                lodging             yield_bu_a \n                     0                      0                      0 \n           test_weight \n                     0 \n\n\nAlas, no missing data!\n\n\nIf there were independent variables with a continuous distribution (a covariate), plot those data.\nLast, check the dependent variable. A histogram is often quite sufficient to accomplish this. This is designed to be a quick check, so no need to spend time making the plot look good.\n\n\n\n\n\n\n\n\n\nFigure 4.1: Histogram of the dependent variable.\n\n\n\n\n\nhist(var_trial$yield_bu_a, main = \"\", xlab = \"yield\")\n\nThe values roughly fall into the range we expect. We (the authors) know this from talking with the person who generated the data, not through our own intuition. There are no large spikes of points at a single value (indicating something odd), nor are there any extreme values (low or high) that might indicate problems.\nData are not expected to be normally distributed at this point, so don’t bother running any Shapiro-Wilk tests. 
This histogram is a check to ensure the data are entered correctly and appear valid. It requires a mixture of domain knowledge and statistical training to know this, but over time, if you look at these plots with regularity, you will gain a feel for what your data should look like at this stage.\nThese are not complicated checks. They are designed to be done quickly and should be done for every analysis if you have not already inspected the data this way. We do this before every analysis and often discover surprising things! Best to discover these things early, since they are likely to impact the final analysis.\nThis data set is ready for analysis!\n\n\n4.2.2 Model Building\n\n\nRecall the model:\n\\[y_{ij} = \\mu + \\alpha_i + \\beta_j + \\epsilon_{ij}\\]\nFor this model, \\(\\alpha_i\\) is the variety effect (fixed) and \\(\\beta_j\\) is the block effect (random).\nHere is the R syntax for the RCBD statistical model:\n\nlme4nlme\n\n\n\nmodel_rcbd_lmer &lt;- lmer(yield_bu_a ~ variety + (1|block),\n                   data = var_trial, \n                   na.action = na.exclude)\n\n\n\n\nmodel_rcbd_lme &lt;- lme(yield_bu_a ~ variety,\n                  random = ~ 1|block,\n                  data = var_trial, \n                  na.action = na.exclude)\n\n\n\n\nThe parentheses are used to indicate that ‘block’ is a random effect, and this particular notation (1|block) indicates that a ‘random intercept’ model is being fit. This is the most common approach. 
It means there is one overall effect fit for each block.\nWe use the argument na.action = na.exclude as instruction for how to handle missing data: conduct the analysis, adjusting as needed for the missing data, and when predictions or residuals are output, pad them in the appropriate places for missing data so they can be easily merged into the main data set if need be.\n\n\n4.2.3 Check Model Assumptions\n\n\nR syntax for checking model assumptions is the same for lme4 and nlme.\nRemember those iid assumptions? Let’s make sure we actually met them.\n\n4.2.3.1 Old Way\nThere are special plotting functions written for lme4 and nlme objects (i.e., plot(lmer_object)) for checking homoscedasticity (constant variance).\n\n\n\n\n\n\n\n\n\nFigure 4.2: Plot of residuals versus fitted values\n\n\n\n\n\nplot(model_rcbd_lmer, resid(., scaled=TRUE) ~ fitted(.), \n     xlab = \"fitted values\", ylab = \"studentized residuals\")\n\nWe are looking for a random and uniform distribution of points. This looks good!\nChecking normality requires first extracting the model residuals with resid() and then generating a qq-plot and line.\n\n\n\n\n\n\n\n\n\nFigure 4.3: QQ-plot of residuals\n\n\n\n\n\nqqnorm(resid(model_rcbd_lmer), main = NULL); qqline(resid(model_rcbd_lmer))\n\nThis is reasonably good. Things do tend to fall apart a little at the tails, so this is not concerning.\n\n\n4.2.3.2 New Way\nNowadays, we can take advantage of the performance package, which provides a comprehensive suite of diagnostic plots.\n\n\nPlease look for check_model() in the help tab to find what other checks you can perform using this function. 
If you would like to check all assumptions you can use the argument check = \"all\".\n\ncheck_model(model_rcbd_lmer, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\n4.2.4 Inference\n\n\nR syntax for estimating model marginal means is the same for lme4 and nlme.\nEstimates for each treatment level can be obtained with the ‘emmeans’ package.\n\nrcbd_emm &lt;- emmeans(model_rcbd_lmer, ~ variety)\nas.data.frame(rcbd_emm) %&gt;% arrange(desc(emmean))\n\n variety                  emmean       SE    df  lower.CL upper.CL\n Rosalyn                155.2703 7.212203 77.85 140.91149 169.6292\n IDO1005                153.5919 7.212203 77.85 139.23310 167.9508\n OR2080641              152.6942 7.212203 77.85 138.33536 167.0530\n Bobtail                151.6403 7.212203 77.85 137.28149 165.9992\n UI Sparrow             151.6013 7.212203 77.85 137.24245 165.9601\n Kaseberg               150.9768 7.212203 77.85 136.61794 165.3356\n IDN-01-10704A          148.9861 7.212203 77.85 134.62729 163.3450\n 06-03303B              148.8300 7.212203 77.85 134.47116 163.1888\n WB1529                 148.2445 7.212203 77.85 133.88568 162.6034\n DAS003                 145.2000 7.212203 77.85 130.84116 159.5588\n IDN-02-29001A          144.5755 7.212203 77.85 130.21665 158.9343\n Bruneau                143.9900 7.212203 77.85 129.63116 158.3488\n SY 107                 143.6387 7.212203 77.85 129.27987 157.9975\n WB 528                 142.9752 7.212203 77.85 128.61633 157.3340\n OR2080637              141.7652 7.212203 77.85 127.40633 156.1240\n Jasper                 141.2968 7.212203 77.85 126.93794 155.6556\n UI Magic CLP           139.5403 7.212203 77.85 125.18149 153.8992\n Madsen                 139.2671 7.212203 77.85 124.90826 153.6259\n LCS Biancor            139.1110 7.212203 77.85 124.75213 153.4698\n SY  Ovation            138.6426 7.212203 77.85 124.28375 153.0014\n OR2090473              137.8229 7.212203 77.85 123.46407 152.1817\n Madsen / Eltan (50/50) 136.9642 
7.212203 77.85 122.60536 151.3230\n UI-WSU Huffman         135.4810 7.212203 77.85 121.12213 149.8398\n Mary                   134.8564 7.212203 77.85 120.49762 149.2153\n Norwest Tandem         134.3490 7.212203 77.85 119.99020 148.7079\n Brundage               134.0758 7.212203 77.85 119.71697 148.4346\n IDO1004                132.5145 7.212203 77.85 118.15568 146.8733\n DAS004                 132.2413 7.212203 77.85 117.88245 146.6001\n Norwest Duet           132.0852 7.212203 77.85 117.72633 146.4440\n Eltan                  131.4606 7.212203 77.85 117.10181 145.8195\n LCS Artdeco            130.8361 7.212203 77.85 116.47729 145.1950\n UI Palouse             130.4848 7.212203 77.85 116.12600 144.8437\n LOR-978                130.4458 7.212203 77.85 116.08697 144.8046\n LCS Drive              128.7674 7.212203 77.85 114.40858 143.1262\n Stephens               127.1671 7.212203 77.85 112.80826 141.5259\n OR2100940              126.1523 7.212203 77.85 111.79342 140.5111\n UI Castle CLP          125.5277 7.212203 77.85 111.16891 139.8866\n WB1376 CLP             123.6932 7.212203 77.85 109.33439 138.0521\n LOR-833                122.7565 7.212203 77.85 108.39762 137.1153\n LOR-913                118.7752 7.212203 77.85 104.41633 133.1340\n WB 456                 118.4629 7.212203 77.85 104.10407 132.8217\n SY Assure              111.0468 7.212203 77.85  96.68794 125.4056\n\nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\n\nThis table indicates the estimated marginal means (“emmeans”, sometimes called “least squares means”), the standard error (“SE”) of those means, the degrees of freedom and the upper and lower bounds of the 95% confidence interval. As an additional step, the emmeans were sorted from largest to smallest.\nAt this point, the analysis goals have been met: we know the estimated means for each treatment and their rankings.\nIf you want to run ANOVA, it can be done quite easily. 
For the lmer() model, the output below reports the Satterthwaite method of degrees of freedom approximation.\n\n\nThe Type I method is sometimes referred to as the “sequential” sum of squares, because it involves a process of adding terms to the model one at a time. Type I sum of squares is the default hypothesis testing method used by the anova() function. This only matters when a data set is unbalanced across treatments, either due to design or missing data points.\n\nlme4nlme\n\n\n\nanova(model_rcbd_lmer, type = \"1\")\n\nType I Analysis of Variance Table with Satterthwaite's method\n        Sum Sq Mean Sq NumDF DenDF F value    Pr(&gt;F)    \nvariety  18354  447.65    41   123  2.4528 8.017e-05 ***\n---\nSignif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n\n\n\n\nanova(model_rcbd_lme, type = \"sequential\")\n\n            numDF denDF   F-value p-value\n(Intercept)     1   123 2514.1283  &lt;.0001\nvariety        41   123    2.4528   1e-04\n\n\n\n\n\n\n\n\n\n\n\nna.action = na.exclude\n\n\n\nYou may have noticed the final argument for na.action in the model statement:\nmodel_rcbd_lmer &lt;- lmer(yield_bu_a ~ variety + (1|block),\n                   data = var_trial, \n                   na.action = na.exclude)\nThe argument na.action = na.exclude provides instructions for how to handle missing data. na.exclude removes the missing data points before proceeding with the analysis. When any observation-level model output is generated (e.g. predictions, residuals), it is padded in the appropriate place to account for missing data. This is handy because it makes it easier to add those results to the original data set if so desired.\nSince there are no missing data, this step was not strictly necessary, but it’s a good habit to be in.",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>4</span>  <span class='chapter-title'>Randomized Complete Block Design</span>"
    ]
  },
  {
    "objectID": "chapters/factorial-design.html",
    "href": "chapters/factorial-design.html",
    "title": "5  RCBD Design with Several Crossed Factors",
    "section": "",
    "text": "5.1 Background\nFactorial design involves studying the impact of multiple factors simultaneously. Each factor can have multiple levels, and combinations of these levels form the experimental conditions. This design allows us to understand the main effects of individual factors and their interactions on the response variable. The statistical model for factorial design is: \\[y_{ij} = \\mu +  \\tau_i+ \\beta_j + \\tau_i\\beta_j + \\epsilon_{ij}\\] Where: \\(\\mu\\) = experiment mean, \\(\\tau\\) = effect of factor A, \\(\\beta\\) = effect of factor B, and \\(\\tau\\beta\\) = interaction effect of factor A and B.\nAssumptions of this model includes: independent and identically distributed error terms with a constant variance.",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>5</span>  <span class='chapter-title'>Factorial RCBD Design</span>"
    ]
  },
  {
    "objectID": "chapters/factorial-design.html#example-analysis",
    "href": "chapters/factorial-design.html#example-analysis",
    "title": "5  RCBD Design with Several Crossed Factors",
    "section": "5.2 Example Analysis",
    "text": "5.2 Example Analysis\nFirst step is to load the libraries required for the analysis:\n\nlme4nlme\n\n\n\nlibrary(lme4); library(lmerTest); library(emmeans)\nlibrary(dplyr); library(broom.mixed); library(performance)\n\n\n\n\nlibrary(nlme); library(broom.mixed); library(emmeans)\nlibrary(dplyr); library(performance)\n\n\n\n\nNext, we will load the dataset named ‘cochran.factorial’ from the ‘agridat’ package. This data comprises a yield response of beans to different levels of manure (d), nitrogen (n), phosphorus. The goal of this analysis is the estimate the effect of d, n, p, k, and their interactions on bean yield.\nNote, while importing the data, d, n, p, and k were converted into factor variables using the mutate() function from dplyr package. This helps in reducing the extra steps of converting each single variable to factor manually.\n\nlibrary(agridat)\ndata1 &lt;- agridat::cochran.factorial %&gt;% \n  mutate(d = as.factor(d),\n         n = as.factor(n),\n         p = as.factor(p),\n         k = as.factor(k))\n\n\nTable of variables in the data set\n\n\nblock\nblocking unit\n\n\nrep\nreplication unit\n\n\ntrt\ntreatment factor, 16 levels\n\n\nd\ndung treatment, 2 levels\n\n\nn\nnitrogen treatment, 2 levels\n\n\np\nphosphorus treatment, 2 levels\n\n\nk\npotassium treatment, 2 levels\n\n\nyield\nyield (lbs)\n\n\n\nThe objective of this example is evaluate the individual and interactive effect of “d”, “n”, “p”, and “k” treatments on yield.\n\n5.2.1 Data Integrity Checks\nFirst step is to Verify the class of variables, where rep, block, d, n, p, and k are supposed to be a factor/character and yield should be numeric/integer.\n\nstr(data1)\n\n'data.frame':   32 obs. 
of  8 variables:\n $ rep  : Factor w/ 2 levels \"R1\",\"R2\": 1 1 1 1 1 1 1 1 1 1 ...\n $ block: Factor w/ 2 levels \"B1\",\"B2\": 1 1 1 1 1 1 1 1 2 2 ...\n $ trt  : Factor w/ 16 levels \"(1)\",\"d\",\"dk\",..: 15 10 2 14 5 6 9 11 8 12 ...\n $ yield: int  45 55 53 36 41 48 55 42 50 44 ...\n $ d    : Factor w/ 2 levels \"0\",\"1\": 2 2 1 2 1 1 1 2 1 2 ...\n $ n    : Factor w/ 2 levels \"0\",\"1\": 2 2 2 1 1 1 2 1 2 1 ...\n $ p    : Factor w/ 2 levels \"0\",\"1\": 1 2 2 1 2 1 1 2 1 2 ...\n $ k    : Factor w/ 2 levels \"0\",\"1\": 2 1 2 1 1 2 1 2 2 1 ...\n\n\nThis looks good.\nThe next step is to inspect the independent variables and make sure the expected levels are present in the data.\n\ntable(data1$d, data1$n, data1$p, data1$k)\n\n, ,  = 0,  = 0\n\n   \n    0 1\n  0 2 2\n  1 2 2\n\n, ,  = 1,  = 0\n\n   \n    0 1\n  0 2 2\n  1 2 2\n\n, ,  = 0,  = 1\n\n   \n    0 1\n  0 2 2\n  1 2 2\n\n, ,  = 1,  = 1\n\n   \n    0 1\n  0 2 2\n  1 2 2\n\n\nThe design looks well balanced.\nThe last step is to inspect the dependent variable to check that its distribution is roughly bell-shaped and not strongly skewed.\n\n\n\n\n\n\n\n\n\nFigure 5.1: Histogram of the dependent variable.\n\n\n\n\n\nhist(data1$yield)\n\nNo extreme (low or high) yield values were observed in the data.\n\n\n5.2.2 Model fitting\nModel fitting with R is exactly the same as shown in previous chapters: we need to include all effects, as well as their interactions, which are represented using the colon indicator ‘:’. 
Therefore, the model syntax is:\nyield ~ d + n + p + k + d:n + d:p + d:k + n:p + n:k + p:k + d:n:p + d:n:k + d:p:k + n:p:k + d:n:p:k\nwhich can be abbreviated as:\nyield ~ d*n*p*k\n\nlme4nlme\n\n\n\nmodel1_lmer &lt;- lmer(yield ~ d*n*p*k + (1|block),\n                   data = data1, \n                   na.action = na.exclude)\ntidy(model1_lmer)\n\n# A tibble: 18 × 8\n   effect   group    term           estimate std.error statistic    df   p.value\n   &lt;chr&gt;    &lt;chr&gt;    &lt;chr&gt;             &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;\n 1 fixed    &lt;NA&gt;     (Intercept)      49          3.70   13.2     16.0  4.91e-10\n 2 fixed    &lt;NA&gt;     d1               -9.5        5.24   -1.81    16.0  8.84e- 2\n 3 fixed    &lt;NA&gt;     n1                0.500      5.24    0.0955  16.0  9.25e- 1\n 4 fixed    &lt;NA&gt;     p1              -11.5        5.24   -2.20    16.0  4.31e- 2\n 5 fixed    &lt;NA&gt;     k1                1.00       5.24    0.191   16.0  8.51e- 1\n 6 fixed    &lt;NA&gt;     d1:n1            13.5        7.82    1.73    16.0  1.03e- 1\n 7 fixed    &lt;NA&gt;     d1:p1            15.5        7.82    1.98    16.0  6.49e- 2\n 8 fixed    &lt;NA&gt;     n1:p1             9.50       7.82    1.22    16.0  2.42e- 1\n 9 fixed    &lt;NA&gt;     d1:k1             4.00       7.82    0.512   16.0  6.16e- 1\n10 fixed    &lt;NA&gt;     n1:k1             0.500      7.82    0.0639  16.0  9.50e- 1\n11 fixed    &lt;NA&gt;     p1:k1             3.00       7.82    0.384   16.0  7.06e- 1\n12 fixed    &lt;NA&gt;     d1:n1:p1        -14.5       12.1    -1.19    16.0  2.50e- 1\n13 fixed    &lt;NA&gt;     d1:n1:k1        -17.0       12.1    -1.40    16.0  1.81e- 1\n14 fixed    &lt;NA&gt;     d1:p1:k1         -7.00      12.1    -0.576   16.0  5.72e- 1\n15 fixed    &lt;NA&gt;     n1:p1:k1         -4.50      12.1    -0.370   16.0  7.16e- 1\n16 fixed    &lt;NA&gt;     d1:n1:p1:k1      25.0       19.9     1.26    16.0  2.27e- 1\n17 ran_pars block    sd__(Intercep…    1.26 
     NA      NA       NA   NA       \n18 ran_pars Residual sd__Observati…    4.92      NA      NA       NA   NA       \n\n\n\n\n\nmodel2_lme &lt;- lme(yield ~ d*n*p*k,\n              random = ~ 1|block,\n              data = data1, \n              na.action = na.exclude)\ntidy(model2_lme)\n\n# A tibble: 18 × 8\n   effect   group    term           estimate std.error    df statistic   p.value\n   &lt;chr&gt;    &lt;chr&gt;    &lt;chr&gt;             &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;\n 1 fixed    &lt;NA&gt;     (Intercept)      49          4.79    15   10.2      3.66e-8\n 2 fixed    &lt;NA&gt;     d1               -9.5        6.77    15   -1.40     1.81e-1\n 3 fixed    &lt;NA&gt;     n1                0.500      6.77    15    0.0739   9.42e-1\n 4 fixed    &lt;NA&gt;     p1              -11.5        6.77    15   -1.70     1.10e-1\n 5 fixed    &lt;NA&gt;     k1                1.00       6.77    15    0.148    8.85e-1\n 6 fixed    &lt;NA&gt;     d1:n1            13.5       11.6     15    1.16     2.63e-1\n 7 fixed    &lt;NA&gt;     d1:p1            15.5       11.6     15    1.34     2.02e-1\n 8 fixed    &lt;NA&gt;     n1:p1             9.50      11.6     15    0.818    4.26e-1\n 9 fixed    &lt;NA&gt;     d1:k1             4.00      11.6     15    0.345    7.35e-1\n10 fixed    &lt;NA&gt;     n1:k1             0.500     11.6     15    0.0431   9.66e-1\n11 fixed    &lt;NA&gt;     p1:k1             3.00      11.6     15    0.258    8.00e-1\n12 fixed    &lt;NA&gt;     d1:n1:p1        -14.5       21.0     15   -0.690    5.01e-1\n13 fixed    &lt;NA&gt;     d1:n1:k1        -17.0       21.0     15   -0.809    4.31e-1\n14 fixed    &lt;NA&gt;     d1:p1:k1         -7.00      21.0     15   -0.333    7.44e-1\n15 fixed    &lt;NA&gt;     n1:p1:k1         -4.50      21.0     15   -0.214    8.33e-1\n16 fixed    &lt;NA&gt;     d1:n1:p1:k1      25.0       39.7     15    0.630    5.38e-1\n17 ran_pars block    sd_(Intercept)    3.28      NA       NA   NA  
     NA      \n18 ran_pars Residual sd_Observation    4.92      NA       NA   NA       NA      \n\n\n\n\n\n\n\n\n\n\n\n\n\nNote\n\n\n\nInstead of the summary() function, we used the tidy() function from the ‘broom.mixed’ package to get a short summary output of the model.\n\n\n\n\n5.2.3 Check Model Assumptions\n\nlme4nlme\n\n\n\ncheck_model(model1_lmer, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(model2_lme, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\nThe linearity and homogeneity of variance plots show no trend. The normal Q-Q plots for the overall residuals and for the random effects all fall nearly on a straight line so we can be satisfied with that.\n\n\n5.2.4 Inference\nWe can get an ANOVA table for the linear mixed model using the function anova(), which works for both lmer() and lme() models. The lme4 example here uses car::Anova() to request Type III Wald F tests.\n\nlme4nlme\n\n\n\ncar::Anova(model1_lmer, type = 'III', test.statistic=\"F\")\n\nAnalysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)\n\nResponse: yield\n                   F Df Df.res    Pr(&gt;F)    \n(Intercept) 175.2030  1 20.439 1.729e-11 ***\nd             3.2928  1 20.439   0.08429 .  \nn             0.0091  1 20.439   0.92484    \np             4.8252  1 20.439   0.03974 *  \nk             0.0365  1 20.439   0.85040    \nd:n           2.9812  1 25.421   0.09637 .  \nd:p           3.9300  1 25.421   0.05834 .  \nn:p           1.4763  1 25.421   0.23552    \nd:k           0.2617  1 25.421   0.61335    \nn:k           0.0041  1 25.421   0.94951    \np:k           0.1472  1 25.421   0.70440    \nd:n:p         1.4251  1 37.012   0.24016    \nd:n:k         1.9589  1 37.012   0.16996    \nd:p:k         0.3321  1 37.012   0.56789    \nn:p:k         0.1373  1 37.012   0.71313    \nd:n:p:k       1.5778  1 66.709   0.21346    \n---\nSignif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\n\n\n\n\nanova(model2_lme, type = \"marginal\")\n\n            numDF denDF   F-value p-value\n(Intercept)     1    15 104.83445  &lt;.0001\nd               1    15   1.97029  0.1808\nn               1    15   0.00546  0.9421\np               1    15   2.88720  0.1099\nk               1    15   0.02183  0.8845\nd:n             1    15   1.35278  0.2630\nd:p             1    15   1.78330  0.2017\nn:p             1    15   0.66990  0.4259\nd:k             1    15   0.11876  0.7352\nn:k             1    15   0.00186  0.9662\np:k             1    15   0.06680  0.7996\nd:n:p           1    15   0.47580  0.5009\nd:n:k           1    15   0.65401  0.4313\nd:p:k           1    15   0.11089  0.7437\nn:p:k           1    15   0.04583  0.8334\nd:n:p:k         1    15   0.39719  0.5380\n\n\n\n\n\nLet’s find estimates for some of the factors such as n, p, and n:k interaction effect. This will help us look at the combined effect of n & k on bean yield.\n\nlme4nlme\n\n\n\nemmeans(model1_lmer, specs = ~ n)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n n emmean   SE df lower.CL upper.CL\n 0   43.8 1.52 37     40.7     46.8\n 1   50.1 1.52 37     47.0     53.2\n\nResults are averaged over the levels of: d, p, k \nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\nemmeans(model1_lmer, specs = ~ p)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n p emmean   SE df lower.CL upper.CL\n 0   47.4 1.52 37     44.3     50.5\n 1   46.5 1.52 37     43.4     49.6\n\nResults are averaged over the levels of: d, n, k \nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\nemmeans(model1_lmer, specs = ~ n:k)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n n k emmean   SE   df lower.CL upper.CL\n 0 0   42.4 1.95 25.4     38.4     46.4\n 1 0   50.8 1.95 25.4     46.7     54.8\n 0 1   45.1 1.95 25.4     41.1     49.1\n 1 1   49.5 1.95 25.4     45.5     
53.5\n\nResults are averaged over the levels of: d, p \nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemmeans(model2_lme, specs = ~ n)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n n emmean   SE df lower.CL upper.CL\n 0   43.8 2.63  1     10.4     77.1\n 1   50.1 2.63  1     16.7     83.5\n\nResults are averaged over the levels of: d, p, k \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\nemmeans(model2_lme, specs = ~ p)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n p emmean   SE df lower.CL upper.CL\n 0   47.4 2.63  1     14.0     80.8\n 1   46.5 2.63  1     13.1     79.9\n\nResults are averaged over the levels of: d, n, k \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\nemmeans(model2_lme, specs = ~ n:k)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n n k emmean  SE df lower.CL upper.CL\n 0 0   42.4 2.9  1     5.50     79.2\n 1 0   50.8 2.9  1    13.88     87.6\n 0 1   45.1 2.9  1     8.25     82.0\n 1 1   49.5 2.9  1    12.63     86.4\n\nResults are averaged over the levels of: d, p \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\nIn summary, while working with factorial designs make sure to carefully interpret ANOVA and estimated marginal means for main and interaction effects.",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>5</span>  <span class='chapter-title'>Factorial RCBD Design</span>"
    ]
  },
  {
    "objectID": "chapters/split-plot-design.html",
    "href": "chapters/split-plot-design.html",
    "title": "6  Split Plot Design",
    "section": "",
    "text": "6.1 Details for Split Plot Designs\nSplit-plot design is frequently used for factorial experiments. Such design may incorporate one or more of the completely randomized (CRD), completely randomized block (RCBD). The main principle is that there are whole plots or whole units, to which the levels of one or more factors are applied. Thus each whole plot becomes a block for the subplot treatments.\nThe statistical model structure this design:\n\\[y_{ijk} = \\mu + \\alpha_i + \\beta_k + (\\alpha_j\\beta_k) + \\epsilon_{ij} + \\delta_{ijk} \\] Where:\n\\(\\mu\\)= overall experimental mean, \\(\\alpha\\) = main effect of whole plot (fixed), \\(\\beta\\) = main effect of subplot (fixed), \\(\\alpha\\)\\(\\tau\\) = interaction between factors A and B, \\(\\epsilon_{ij}\\) = whole plot error, \\(\\delta_{ijk}\\) = subplot error.\n\\[ \\epsilon \\sim N(0, \\sigma_\\epsilon)\\]\n\\[\\ \\delta  \\sim N(0, \\sigma_\\delta)\\]\nBoth the error and the rep effects are assumed to be normally distributed with a mean of zero and standard deviations of \\(\\sigma_\\epsilon\\) and \\(\\sigma_\\delta\\), respectively.\nThis is also referred as “Split-Block RCB” design. The statistical model structure for split plot design:\n\\[y_{ijk} = \\mu + \\rho_j +  \\alpha_i + \\beta_k + (\\alpha_i\\beta_k) + \\epsilon_{ij} + \\delta_{ijk}\\] Where:\n\\(\\mu\\) = overall experimental mean, \\(\\rho\\) = block effect (random), \\(\\alpha\\) = main effect of whole plot (fixed), \\(\\beta\\) = main effect of subplot (fixed), \\(\\alpha\\)\\(\\beta\\) = interaction between factors A and B, \\(\\epsilon_{ij}\\) = whole plot error, \\(\\delta_{ijk}\\) = subplot error.\n\\[ \\epsilon \\sim N(0, \\sigma_\\epsilon)\\]\n\\[\\ \\delta  \\sim N(0, \\sigma_\\delta)\\]\nBoth the overall error and the rep effects are assumed to be normally distributed with a mean of zero and standard deviations of \\(\\sigma\\) and \\(\\sigma_\\delta\\), respectively.",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>6</span>  <span class='chapter-title'>Split Plot Design</span>"
    ]
  },
  {
    "objectID": "chapters/split-plot-design.html#details-for-split-plot-designs",
    "href": "chapters/split-plot-design.html#details-for-split-plot-designs",
    "title": "6  Split Plot Design",
    "section": "",
    "text": "Whole Plot Randomized as a completely randomized design\n\n\n\n\n\n\n\n\nWhole Plot Randomized as an RCBD\n\n\n\n\n\n\n\n\n\n\n\n\n\n‘iid’ assumption for error terms\n\n\n\nIn these model, the error terms, \\(\\epsilon\\) are assumed to be “iid”, that is, independently and identically distributed. This means they have constant variance and they each individual error term is independent from the others.",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>6</span>  <span class='chapter-title'>Split Plot Design</span>"
    ]
  },
  {
    "objectID": "chapters/split-plot-design.html#analysis-examples",
    "href": "chapters/split-plot-design.html#analysis-examples",
    "title": "6  Split Plot Design",
    "section": "6.2 Analysis Examples",
    "text": "6.2 Analysis Examples\nLoad required libraries\n\nlme4nlme\n\n\n\nlibrary(lme4); library(lmerTest); library(emmeans)\nlibrary(dplyr); library(performance); library(ggplot2)\nlibrary(broom.mixed)\n\n\n\n\nlibrary(nlme); library(performance); library(emmeans)\nlibrary(dplyr); library(ggplot2); library(broom.mixed)\n\n\n\n\n\n6.2.1 Example model for CRD Split Plot Designs\nLet’s import height data. It is located here if you want to download it yourself (recommended).\nThe data (Height data) for this example involves a CRD split plot designed experiment. Treatments are 4 Timings (times) and 8 managements (manage). The whole plots are times and management represents subplot and 3 replications.\n\nheight_data &lt;- readxl::read_excel(here::here(\"data\", \"height_data.xlsx\"))\n\n\nTable of variables in the oat data set\n\n\nrep\nreplication unit\n\n\ntime\nMain plot with 4 levels\n\n\nManage\nSplit-plot with 8 levels\n\n\nsample\ntwo sampling units per each rep\n\n\nheight\nyield (lbs per acre)\n\n\n\n\n6.2.1.1 Data integrity checks\n\nRun a cross tabulation using table() to check the arrangement of whole-plots and sub-plots.\n\n\ntable(height_data$time, height_data$manage)\n\n    \n     M1 M2 M3 M4 M5 M6 M7 M8\n  T1  6  6  6  6  6  6  6  6\n  T2  6  6  6  6  6  6  6  6\n  T3  6  6  6  6  6  6  6  6\n  T4  6  6  6  6  6  6  6  6\n\n\nThe levels of whole plots and subplots are balanced.\n\nLook at structure of the data using str(), this will help in identifying class of the variable. 
In this data set, the class of the whole-plot, sub-plot, and block variables should be factor/character and the response variable (height) should be numeric.\n\n\nstr(height_data)\n\ntibble [192 × 5] (S3: tbl_df/tbl/data.frame)\n $ time  : chr [1:192] \"T1\" \"T1\" \"T1\" \"T1\" ...\n $ manage: chr [1:192] \"M1\" \"M2\" \"M3\" \"M4\" ...\n $ rep   : chr [1:192] \"R1\" \"R1\" \"R1\" \"R1\" ...\n $ sample: chr [1:192] \"S1\" \"S1\" \"S1\" \"S1\" ...\n $ height: num [1:192] 104.5 92.3 96.8 94.7 105.7 ...\n\n\nThe ‘time’, ‘manage’, and ‘rep’ variables are character and the variable ‘height’ is numeric. The data are structured as needed.\n\nCheck the number of missing values in each column.\n\n\napply(height_data, 2, function(x) sum(is.na(x)))\n\n  time manage    rep sample height \n     0      0      0      0      0 \n\n\n\nExploratory boxplot to look at the height observations at different times with variable managements.\n\n\nggplot(data = height_data, aes(y = height, x = time)) +\n  geom_boxplot(aes(fill = manage), alpha = 0.6)\n\n\n\n\n\n\n\n\nLast, check the dependent variable by plotting a histogram of height data.\n\n\n\n\n\n\n\n\n\nFigure 6.1: Histogram of the dependent variable.\n\n\n\n\n\nhist(height_data$height, main = \"\", xlab = \"height\")\n\nThe distribution of height data looks close to normal.\n\n\n6.2.1.2 Model building\n\n\nRecall the model:\n\\[y_{ijk} = \\mu + \\gamma_i +  \\alpha_j + \\beta_k + (\\alpha_j\\beta_k) + \\epsilon_{ijk}\\]\nFor this model, \\(\\gamma\\) = block/rep effect (random), \\(\\alpha\\) = main effect of whole plot (fixed), \\(\\beta\\) = main effect of subplot (fixed), \\(\\alpha\\)\\(\\beta\\) = interaction between factors A and B (fixed).\nIn order to test the main effects of the treatments as well as the interaction between the two factors, we can specify that in the model as: time + manage + time:manage or time*manage.\nWhen dealing with a split plot design across reps or blocks, the random effects need to be nested hierarchically, from largest unit 
to smallest. For example, in this example the random effects will be designated as (1 | rep/time). This implies that we use random intercept at each of the rep and time (whole-plot) level.\n\nlme4nlme\n\n\n\nmodel_lmer &lt;- lmer(height ~ time*manage + (1|rep/time), data = height_data)\ntidy(model_lmer)\n\n# A tibble: 35 × 8\n   effect group term        estimate std.error statistic     df    p.value\n   &lt;chr&gt;  &lt;chr&gt; &lt;chr&gt;          &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;  &lt;dbl&gt;      &lt;dbl&gt;\n 1 fixed  &lt;NA&gt;  (Intercept)   108.        3.19    33.9     4.38 0.00000181\n 2 fixed  &lt;NA&gt;  timeT2          3.18      2.63     1.21  104.   0.229     \n 3 fixed  &lt;NA&gt;  timeT3         -2.25      2.63    -0.855 104.   0.394     \n 4 fixed  &lt;NA&gt;  timeT4          1.28      2.63     0.488 104.   0.627     \n 5 fixed  &lt;NA&gt;  manageM2       -4.45      2.55    -1.74  152.   0.0832    \n 6 fixed  &lt;NA&gt;  manageM3       -5.30      2.55    -2.08  152.   0.0395    \n 7 fixed  &lt;NA&gt;  manageM4       -6.18      2.55    -2.42  152.   0.0166    \n 8 fixed  &lt;NA&gt;  manageM5       -5.02      2.55    -1.97  152.   0.0511    \n 9 fixed  &lt;NA&gt;  manageM6       -3.42      2.55    -1.34  152.   0.183     \n10 fixed  &lt;NA&gt;  manageM7       -9.75      2.55    -3.82  152.   0.000193  \n# ℹ 25 more rows\n\n\n\n\n\nmodel_lme &lt;-lme(height ~ time*manage,\n             random = ~ 1|rep/time, data = height_data)\n\ntidy(model_lme)\n\nWarning in tidy.lme(model_lme): ran_pars not yet implemented for multiple\nlevels of nesting\n\n\n# A tibble: 32 × 7\n   effect term        estimate std.error    df statistic  p.value\n   &lt;chr&gt;  &lt;chr&gt;          &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt;\n 1 fixed  (Intercept)   108.        
3.19   152    33.9   9.59e-73\n 2 fixed  timeT2          3.18      2.63     6     1.21  2.72e- 1\n 3 fixed  timeT3         -2.25      2.63     6    -0.855 4.25e- 1\n 4 fixed  timeT4          1.28      2.63     6     0.488 6.43e- 1\n 5 fixed  manageM2       -4.45      2.55   152    -1.74  8.32e- 2\n 6 fixed  manageM3       -5.30      2.55   152    -2.08  3.95e- 2\n 7 fixed  manageM4       -6.18      2.55   152    -2.42  1.66e- 2\n 8 fixed  manageM5       -5.02      2.55   152    -1.97  5.11e- 2\n 9 fixed  manageM6       -3.42      2.55   152    -1.34  1.83e- 1\n10 fixed  manageM7       -9.75      2.55   152    -3.82  1.93e- 4\n# ℹ 22 more rows\n\n\n\n\n\n\n\n6.2.1.3 Check Model Assumptions\nBefore interpreting the model we should investigate the assumptions of the model to ensure any conclusions we draw are valid. The assumptions that we can check are: 1. homogeneity (equal variance), 2. normality of residuals, and 3. values with high leverage.\nWe will use the check_model() function from the ‘performance’ package. The plots generated using this code give a visual check of various assumptions including normality of residuals, normality of random effects, heteroscedasticity, homogeneity of variance, and multicollinearity.\n\nlme4nlme\n\n\n\ncheck_model(model_lmer, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(model_lme, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\nIn this case the residuals fit the assumptions of the model well.\n\n\n6.2.1.4 Inference\nThe anova() function prints the rows of the analysis of variance table for whole-plot, sub-plot, and their interactions. 
We observed a significant effect of the manage factor only.\n\nlme4nlme\n\n\n\ncar::Anova(model_lmer, type = 'III', test.statistics = \"F\")\n\nAnalysis of Deviance Table (Type III Wald chisquare tests)\n\nResponse: height\n                Chisq Df Pr(&gt;Chisq)    \n(Intercept) 1148.5658  1    &lt; 2e-16 ***\ntime           4.5139  3    0.21105    \nmanage        15.9090  7    0.02596 *  \ntime:manage   24.3349 21    0.27711    \n---\nSignif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n\n\n\n\nanova(model_lme, type = \"marginal\")\n\n            numDF denDF   F-value p-value\n(Intercept)     1   152 1148.6202  &lt;.0001\ntime            3     6    1.5046  0.3061\nmanage          7   152    2.2727  0.0315\ntime:manage    21   152    1.1588  0.2955\n\n\n\n\n\nWe can further compute estimated marginal means for each fixed effect and interaction effect using emmeans().\n\nlme4nlme\n\n\n\nm1 &lt;- emmeans(model_lmer, ~ time)\n\nNOTE: Results may be misleading due to involvement in interactions\n\nm1\n\n time emmean  SE   df lower.CL upper.CL\n T1      103 2.7 2.27     92.8      114\n T2      106 2.7 2.27     95.5      116\n T3      100 2.7 2.27     89.8      111\n T4      104 2.7 2.27     94.0      115\n\nResults are averaged over the levels of: manage \nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nm2 &lt;- emmeans(model_lme, ~ time)\n\nNOTE: Results may be misleading due to involvement in interactions\n\nm2\n\n time emmean  SE df lower.CL upper.CL\n T1      103 2.7  2     91.6      115\n T2      106 2.7  2     94.2      118\n T3      100 2.7  2     88.6      112\n T4      104 2.7  2     92.8      116\n\nResults are averaged over the levels of: manage \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\nFurther, pairwise comparisons or contrasts can be analyzed using the estimated means. In this model, the ‘time’ factor has 4 levels. 
We can use the pairs() function to evaluate pairwise comparisons among the different ‘time’ levels.\nHere’s an example using the pairs() function to compare differences in height among time points.\n\nlme4nlme\n\n\n\npairs(m1)\n\n contrast estimate   SE df t.ratio p.value\n T1 - T2     -2.68 1.11  6  -2.426  0.1719\n T1 - T3      2.95 1.11  6   2.665  0.1287\n T1 - T4     -1.21 1.11  6  -1.091  0.7072\n T2 - T3      5.63 1.11  6   5.091  0.0089\n T2 - T4      1.48 1.11  6   1.334  0.5767\n T3 - T4     -4.15 1.11  6  -3.756  0.0358\n\nResults are averaged over the levels of: manage \nDegrees-of-freedom method: kenward-roger \nP value adjustment: tukey method for comparing a family of 4 estimates \n\n\n\n\n\npairs(m2)\n\n contrast estimate   SE df t.ratio p.value\n T1 - T2     -2.68 1.11  6  -2.426  0.1719\n T1 - T3      2.95 1.11  6   2.665  0.1287\n T1 - T4     -1.21 1.11  6  -1.091  0.7072\n T2 - T3      5.63 1.11  6   5.091  0.0089\n T2 - T4      1.48 1.11  6   1.334  0.5767\n T3 - T4     -4.15 1.11  6  -3.756  0.0358\n\nResults are averaged over the levels of: manage \nDegrees-of-freedom method: containment \nP value adjustment: tukey method for comparing a family of 4 estimates \n\n\n\n\n\n\n\n\n\n\n\npairs()\n\n\n\nThe default p-value adjustment in the pairs() function is “tukey”; other options include “holm”, “hochberg”, “BH”, “BY”, and “none”. In addition, it’s okay to use this function when the independent variable has few levels (2-4). For variables with many levels, it’s better to use custom contrasts. For more information on custom contrasts, please visit Chapter 12.\n\n\n\n\n\n6.2.2 Example model for RCBD Split Plot Designs\nThe oats data used in this example is from the MASS package. The design is an RCBD split plot with 6 blocks, 3 main plots and 4 subplots. 
The primary outcome variable was oat yield.\n\nTable of variables in the oat data set\n\n\nblock\nblocking unit\n\n\nVariety (V)\nMain plot with 3 levels\n\n\nNitrogen (N)\nSplit-plot with 4 levels\n\n\nyield (Y)\nyield (lbs per acre)\n\n\n\nThe objective of this analysis is to study the impact of different varieties and nitrogen application rates on oat yields.\nTo fully examine oat yield in response to varieties and nutrient levels in a split plot design, we will need to statistically analyze and compare the effects of varieties (main plot), nutrient levels (subplot), and their interaction.\n\nlibrary(MASS)\ndata(\"oats\")\nhead(oats,5)\n\n  B           V      N   Y\n1 I     Victory 0.0cwt 111\n2 I     Victory 0.2cwt 130\n3 I     Victory 0.4cwt 157\n4 I     Victory 0.6cwt 174\n5 I Golden.rain 0.0cwt 117\n\n\n\n6.2.2.1 Data integrity checks\nLet’s look at the structure of the data. The variables “B”, “V”, and “N” need to be factors and “Y” should be numeric.\n\nstr(oats)\n\n'data.frame':   72 obs. of  4 variables:\n $ B: Factor w/ 6 levels \"I\",\"II\",\"III\",..: 1 1 1 1 1 1 1 1 1 1 ...\n $ V: Factor w/ 3 levels \"Golden.rain\",..: 3 3 3 3 1 1 1 1 2 2 ...\n $ N: Factor w/ 4 levels \"0.0cwt\",\"0.2cwt\",..: 1 2 3 4 1 2 3 4 1 2 ...\n $ Y: int  111 130 157 174 117 114 161 141 105 140 ...\n\n\nNext, run the table() command to verify the levels of main-plots and sub-plots.\n\ntable(oats$V, oats$N)\n\n             \n              0.0cwt 0.2cwt 0.4cwt 0.6cwt\n  Golden.rain      6      6      6      6\n  Marvellous       6      6      6      6\n  Victory          6      6      6      6\n\n\n\nCheck the number of missing values in each column.\n\n\napply(oats, 2, function(x) sum(is.na(x)))\n\nB V N Y \n0 0 0 0 \n\n\nLast, check the dependent variable by plotting a histogram of yield data.\n\n\n\n\n\n\n\n\n\nFigure 6.2: Histogram of the dependent variable.\n\n\n\n\n\nhist(oats$Y, main = \"\", xlab = \"yield\")\n\n\n\n6.2.2.2 Model Building\nWe are evaluating the effect of V, N and 
their interaction on yield. The 1|B/V implies that random intercepts vary with block and V within each block.\n\n\nRecall the model:\n\\[y_{ijk} = \\mu + \\rho_j +  \\alpha_i + \\beta_k + (\\alpha_i\\beta_k) + \\epsilon_{ij} + \\delta_{ijk}\\] Where:\n\\(\\mu\\) = overall experimental mean, \\(\\rho\\) = block effect (random), \\(\\alpha\\) = main effect of whole plot (fixed), \\(\\beta\\) = main effect of subplot (fixed), \\(\\alpha\\)\\(\\beta\\) = interaction between factors A and B, \\(\\epsilon_{ij}\\) = whole plot error, \\(\\delta_{ijk}\\) = subplot error.\n\nlme4nlme\n\n\n\nmodel2_lmer &lt;- lmer(Y ~  V + N + V:N + (1|B/V), \n                   data = oats, \n                   na.action = na.exclude)\ntidy(model2_lmer)\n\n# A tibble: 15 × 8\n   effect   group    term            estimate std.error statistic    df  p.value\n   &lt;chr&gt;    &lt;chr&gt;    &lt;chr&gt;              &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;    &lt;dbl&gt;\n 1 fixed    &lt;NA&gt;     (Intercept)       80.0        9.11    8.78    16.1  1.55e-7\n 2 fixed    &lt;NA&gt;     VMarvellous        6.67       9.72    0.686   30.2  4.98e-1\n 3 fixed    &lt;NA&gt;     VVictory          -8.50       9.72   -0.875   30.2  3.89e-1\n 4 fixed    &lt;NA&gt;     N0.2cwt           18.5        7.68    2.41    45.0  2.02e-2\n 5 fixed    &lt;NA&gt;     N0.4cwt           34.7        7.68    4.51    45.0  4.58e-5\n 6 fixed    &lt;NA&gt;     N0.6cwt           44.8        7.68    5.84    45.0  5.48e-7\n 7 fixed    &lt;NA&gt;     VMarvellous:N0…    3.33      10.9     0.307   45.0  7.60e-1\n 8 fixed    &lt;NA&gt;     VVictory:N0.2c…   -0.333     10.9    -0.0307  45.0  9.76e-1\n 9 fixed    &lt;NA&gt;     VMarvellous:N0…   -4.17      10.9    -0.383   45.0  7.03e-1\n10 fixed    &lt;NA&gt;     VVictory:N0.4c…    4.67      10.9     0.430   45.0  6.70e-1\n11 fixed    &lt;NA&gt;     VMarvellous:N0…   -4.67      10.9    -0.430   45.0  6.70e-1\n12 fixed    &lt;NA&gt;     VVictory:N0.6c…    2.17      
10.9     0.199   45.0  8.43e-1\n13 ran_pars V:B      sd__(Intercept)   10.3       NA      NA       NA   NA      \n14 ran_pars B        sd__(Intercept)   14.6       NA      NA       NA   NA      \n15 ran_pars Residual sd__Observation   13.3       NA      NA       NA   NA      \n\n\n\n\n\nmodel2_lme &lt;- lme(Y ~  V + N + V:N ,\n                  random = ~1|B/V,\n                  data = oats, \n                  na.action = na.exclude)\ntidy(model2_lme)\n\nWarning in tidy.lme(model2_lme): ran_pars not yet implemented for multiple\nlevels of nesting\n\n\n# A tibble: 12 × 7\n   effect term                estimate std.error    df statistic  p.value\n   &lt;chr&gt;  &lt;chr&gt;                  &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt;\n 1 fixed  (Intercept)           80          9.11    45    8.78   2.56e-11\n 2 fixed  VMarvellous            6.67       9.72    10    0.686  5.08e- 1\n 3 fixed  VVictory              -8.50       9.72    10   -0.875  4.02e- 1\n 4 fixed  N0.2cwt               18.5        7.68    45    2.41   2.02e- 2\n 5 fixed  N0.4cwt               34.7        7.68    45    4.51   4.58e- 5\n 6 fixed  N0.6cwt               44.8        7.68    45    5.84   5.48e- 7\n 7 fixed  VMarvellous:N0.2cwt    3.33      10.9     45    0.307  7.60e- 1\n 8 fixed  VVictory:N0.2cwt      -0.333     10.9     45   -0.0307 9.76e- 1\n 9 fixed  VMarvellous:N0.4cwt   -4.17      10.9     45   -0.383  7.03e- 1\n10 fixed  VVictory:N0.4cwt       4.67      10.9     45    0.430  6.70e- 1\n11 fixed  VMarvellous:N0.6cwt   -4.67      10.9     45   -0.430  6.70e- 1\n12 fixed  VVictory:N0.6cwt       2.17      10.9     45    0.199  8.43e- 1\n\n\n\n\n\n\n\n6.2.2.3 Check Model Assumptions\nAs shown in example 1, We need to verify the normality of residuals and homogeneous variance. 
Here we are using the check_model() function from the performance package.\n\nlme4nlme\n\n\n\ncheck_model(model2_lmer, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(model2_lme, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\n\n\n6.2.2.4 Inference\nWe can evaluate the model for the analysis of variance, for V, N and their interaction effect.\n\nlme4nlme\n\n\n\ncar::Anova(model2_lmer, type = \"III\", test.statistics = \"F\")\n\nAnalysis of Deviance Table (Type III Wald chisquare tests)\n\nResponse: Y\n              Chisq Df Pr(&gt;Chisq)    \n(Intercept) 77.1664  1  &lt; 2.2e-16 ***\nV            2.4491  2     0.2939    \nN           39.0683  3  1.679e-08 ***\nV:N          1.8169  6     0.9357    \n---\nSignif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n\n\n\n\nanova(model2_lme, type = \"marginal\")\n\n            numDF denDF  F-value p-value\n(Intercept)     1    45 77.16729  &lt;.0001\nV               2    10  1.22454  0.3344\nN               3    45 13.02273  &lt;.0001\nV:N             6    45  0.30282  0.9322\n\n\n\n\n\nNext, we can estimate marginal means for V, N, or their interaction (V*N) effect.\n\nlme4nlme\n\n\n\nemm1 &lt;- emmeans(model2_lmer, ~ V *N) \nemm1\n\n V           N      emmean   SE   df lower.CL upper.CL\n Golden.rain 0.0cwt   80.0 9.11 16.1     60.7     99.3\n Marvellous  0.0cwt   86.7 9.11 16.1     67.4    106.0\n Victory     0.0cwt   71.5 9.11 16.1     52.2     90.8\n Golden.rain 0.2cwt   98.5 9.11 16.1     79.2    117.8\n Marvellous  0.2cwt  108.5 9.11 16.1     89.2    127.8\n Victory     0.2cwt   89.7 9.11 16.1     70.4    109.0\n Golden.rain 0.4cwt  114.7 9.11 16.1     95.4    134.0\n Marvellous  0.4cwt  117.2 9.11 16.1     97.9    136.5\n Victory     0.4cwt  110.8 9.11 16.1     91.5    130.1\n Golden.rain 0.6cwt  124.8 9.11 16.1    105.5    144.1\n Marvellous  0.6cwt  126.8 9.11 16.1    107.5    146.1\n Victory     0.6cwt  118.5 9.11 16.1     99.2    137.8\n\nDegrees-of-freedom method: 
kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemm1 &lt;- emmeans(model2_lme, ~ V *N) \nemm1\n\n V           N      emmean   SE df lower.CL upper.CL\n Golden.rain 0.0cwt   80.0 9.11  5     56.6    103.4\n Marvellous  0.0cwt   86.7 9.11  5     63.3    110.1\n Victory     0.0cwt   71.5 9.11  5     48.1     94.9\n Golden.rain 0.2cwt   98.5 9.11  5     75.1    121.9\n Marvellous  0.2cwt  108.5 9.11  5     85.1    131.9\n Victory     0.2cwt   89.7 9.11  5     66.3    113.1\n Golden.rain 0.4cwt  114.7 9.11  5     91.3    138.1\n Marvellous  0.4cwt  117.2 9.11  5     93.8    140.6\n Victory     0.4cwt  110.8 9.11  5     87.4    134.2\n Golden.rain 0.6cwt  124.8 9.11  5    101.4    148.2\n Marvellous  0.6cwt  126.8 9.11  5    103.4    150.2\n Victory     0.6cwt  118.5 9.11  5     95.1    141.9\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\nIn the next chapter, we will continue with extension of split plot design called split-split plot design.",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>6</span>  <span class='chapter-title'>Split Plot Design</span>"
    ]
  },
  {
    "objectID": "chapters/split-split-plot.html",
    "href": "chapters/split-split-plot.html",
    "title": "7  Split-Split Plot Design",
    "section": "",
    "text": "7.1 Details for split-split plot designs\nThe split-split-plot design is an extension of the split-plot design to accommodate a third factor: one factor in the main plot, another in the subplot, and the third in the sub-subplot.\nThe statistical model structure for this design:\n\\[y_{ijkn} = \\mu + \\rho_j +  \\alpha_i + \\beta_k + (\\alpha_i\\beta_k) + \\tau_n + (\\alpha_i\\tau_n) + (\\tau_n\\beta_k) + (\\alpha_i\\beta_k\\tau_n) + \\epsilon_{ij} + \\delta_{ijk} + \\gamma_{ijkn}\\] Where:\n\\(\\mu\\) = overall experimental mean, \\(\\rho\\) = block effect (random), \\(\\alpha\\) = main effect of whole plot (fixed), \\(\\beta\\) = main effect of subplot (fixed), \\(\\tau\\) = main effect of sub-subplot (fixed), \\(\\epsilon_{ij}\\) = whole plot error, \\(\\delta_{ijk}\\) = subplot error, \\(\\gamma_{ijkn}\\) = sub-subplot error.\n\\[ \\epsilon \\sim N(0, \\sigma_\\epsilon)\\]\n\\[ \\delta \\sim N(0, \\sigma_\\delta)\\]\n\\[ \\gamma \\sim N(0, \\sigma_\\gamma)\\]\nThe model assumes that the block effects and each of the error terms are normally distributed with a mean of zero; the whole plot, subplot, and sub-subplot errors have standard deviations of \\(\\sigma_\\epsilon\\), \\(\\sigma_\\delta\\), and \\(\\sigma_\\gamma\\), respectively.",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>7</span>  <span class='chapter-title'>Split-Split Plot Design</span>"
    ]
  },
  {
    "objectID": "chapters/split-split-plot.html#example-analysis",
    "href": "chapters/split-split-plot.html#example-analysis",
    "title": "7  Split-Split Plot Design",
    "section": "7.2 Example Analysis",
    "text": "7.2 Example Analysis\n\nlme4nlme\n\n\n\nlibrary(dplyr)\nlibrary(lme4); library(lmerTest); library(broom.mixed)\nlibrary(emmeans); library(performance)\n\n\n\n\nlibrary(dplyr)\nlibrary(nlme); library(emmeans)\nlibrary(broom.mixed); library(performance)\n\n\n\n\nIn this example, we have rice yield data from the agricolae package. The experiment consists of 3 rice varieties grown under 3 management practices and 5 nitrogen levels in a split-split plot design.\n\nrice &lt;- read.csv(here::here(\"data\", \"rice_ssp.csv\"))\n\n\nTable of variables in the rice data set\n\n\n\n\n\n\nblock\nblocking unit\n\n\nnitrogen\ndifferent nitrogen fertilizer rates as main plot with 5 levels\n\n\nmanagement\nmanagement practices as subplot with 3 levels\n\n\nvariety\ncrop variety being a sub-subplot with 3 levels\n\n\nyield\nyield (bushels per acre)\n\n\n\n\n7.2.1 Data integrity checks\nBefore analyzing the data, let’s do some preliminary data quality checks. We will start by evaluating the structure of the data, where block, nitrogen, management and variety should be of class character or factor and yield should be numeric.\n\nstr(rice)\n\n'data.frame':   135 obs. 
of  6 variables:\n $ X         : int  1 2 3 4 5 6 7 8 9 10 ...\n $ block     : int  1 1 1 1 1 1 1 1 1 1 ...\n $ nitrogen  : int  0 0 0 50 50 50 80 80 80 110 ...\n $ management: chr  \"m1\" \"m2\" \"m3\" \"m1\" ...\n $ variety   : int  1 1 1 1 1 1 1 1 1 1 ...\n $ yield     : num  3.32 3.77 4.66 3.19 3.62 ...\n\n\nHere we need to convert block, nitrogen, variety, and management to characters.\n\nrice$block &lt;- as.character(rice$block)\nrice$nitrogen &lt;- as.character(rice$nitrogen)\nrice$management &lt;- as.character(rice$management)\nrice$variety &lt;- as.character(rice$variety)\n\nNext, run a cross tabulation to check the balance of observations across the independent variables:\n\ntable(rice$variety, rice$nitrogen, rice$management)\n\n, ,  = m1\n\n   \n    0 110 140 50 80\n  1 3   3   3  3  3\n  2 3   3   3  3  3\n  3 3   3   3  3  3\n\n, ,  = m2\n\n   \n    0 110 140 50 80\n  1 3   3   3  3  3\n  2 3   3   3  3  3\n  3 3   3   3  3  3\n\n, ,  = m3\n\n   \n    0 110 140 50 80\n  1 3   3   3  3  3\n  2 3   3   3  3  3\n  3 3   3   3  3  3\n\n\nIt looks perfectly balanced, with exactly 3 observations per treatment group.\nLast, check the distribution of the dependent variable by plotting a histogram of yield values using hist() in R.\n\nhist(rice$yield)\n\n\n\n\n\n\n\n\n\n\nFigure 7.1: Histogram of the dependent variable.\n\n\n\n\n\n\n7.2.2 Model Building\nThe variance analysis of a split-split plot design is divided into three parts: the main-plot, subplot and sub-subplot analysis. We can use the nesting notation in the random part because nitrogen and management are nested in blocks. 
We can do blocks as fixed or random.\n\nlme4nlme\n\n\n\nmodel_lmer &lt;- lmer(yield ~ nitrogen * management * variety +\n                     (1 | block / nitrogen / management),\n                   data = rice,\n                   na.action = na.exclude)\n\nboundary (singular) fit: see help('isSingular')\n\ntidy(model_lmer)\n\n# A tibble: 49 × 8\n   effect group term                 estimate std.error statistic    df  p.value\n   &lt;chr&gt;  &lt;chr&gt; &lt;chr&gt;                   &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;    &lt;dbl&gt;\n 1 fixed  &lt;NA&gt;  (Intercept)             3.90      0.386    10.1    89.7 1.79e-16\n 2 fixed  &lt;NA&gt;  nitrogen110             0.753     0.545     1.38   89.7 1.71e- 1\n 3 fixed  &lt;NA&gt;  nitrogen140             0.165     0.545     0.302  89.7 7.63e- 1\n 4 fixed  &lt;NA&gt;  nitrogen50              0.335     0.545     0.614  89.7 5.41e- 1\n 5 fixed  &lt;NA&gt;  nitrogen80              1.33      0.545     2.44   89.7 1.68e- 2\n 6 fixed  &lt;NA&gt;  managementm2            0.420     0.540     0.779  80.0 4.38e- 1\n 7 fixed  &lt;NA&gt;  managementm3            1.43      0.540     2.65   80.0 9.82e- 3\n 8 fixed  &lt;NA&gt;  variety2                1.45      0.540     2.68   80.0 8.83e- 3\n 9 fixed  &lt;NA&gt;  variety3                1.48      0.540     2.74   80.0 7.49e- 3\n10 fixed  &lt;NA&gt;  nitrogen110:managem…    0.377     0.763     0.493  80.0 6.23e- 1\n# ℹ 39 more rows\n\n\n\n\n\nmodel_lme &lt;- lme(yield ~ nitrogen*management*variety,\n                  random = ~ 1|block/nitrogen/management,\n                  data = rice, \n                  na.action = na.exclude)\ntidy(model_lme)\n\nWarning in tidy.lme(model_lme): ran_pars not yet implemented for multiple\nlevels of nesting\n\n\n# A tibble: 45 × 7\n   effect term                     estimate std.error    df statistic  p.value\n   &lt;chr&gt;  &lt;chr&gt;                       &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;    
&lt;dbl&gt;\n 1 fixed  (Intercept)                 3.90      0.386    60    10.1   1.43e-14\n 2 fixed  nitrogen110                 0.753     0.545     8     1.38  2.05e- 1\n 3 fixed  nitrogen140                 0.165     0.545     8     0.302 7.70e- 1\n 4 fixed  nitrogen50                  0.335     0.545     8     0.614 5.56e- 1\n 5 fixed  nitrogen80                  1.33      0.545     8     2.44  4.08e- 2\n 6 fixed  managementm2                0.420     0.540    20     0.779 4.45e- 1\n 7 fixed  managementm3                1.43      0.540    20     2.65  1.55e- 2\n 8 fixed  variety2                    1.45      0.540    60     2.68  9.38e- 3\n 9 fixed  variety3                    1.48      0.540    60     2.74  7.99e- 3\n10 fixed  nitrogen110:managementm2    0.377     0.763    20     0.493 6.27e- 1\n# ℹ 35 more rows\n\n\n\n\n\n\n\nboundary (singular) fit: We get a message that the fit is singular. What does this mean? Some components of the variance-covariance matrix of the random effects are either exactly zero or exactly one. OK what about in English? Basically it means that the algorithm that fits the model parameters doesn’t have enough data to get a good estimate. This often happens when we are trying to fit a model that is too complex for the amount of data we have, or when the random effects are very small and can’t be distinguished from zero. We still get some output but this message should make us take a close look at the random effects and their variances.\n\n\n7.2.3 Check Model Assumptions\nModel Diagnostics: we are looking for a constant variance and normality of residuals. Checking normality requiring first extracting the model residuals and then generating a qq-plot and qq-line. 
We can do all of this at once using a single function, check_model().\n\nlme4nlme\n\n\n\ncheck_model(model_lmer, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(model_lme, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\nConstant variance and normality of residuals look good. Here, we didn’t observe any anomalies in the model assumptions.\n\n\n7.2.4 Inference\nLet’s look at the analysis of variance for the fixed effects and their interactions on yield.\n\nlme4nlme\n\n\n\ncar::Anova(model_lmer, type = 'III', test.statistic=\"F\")\n\nAnalysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)\n\nResponse: yield\n                                   F Df Df.res  Pr(&gt;F)    \n(Intercept)                 102.1211  1 89.706 &lt; 2e-16 ***\nnitrogen                      1.9160  4 86.474 0.11496    \nmanagement                    3.6962  2 77.143 0.02932 *  \nvariety                       4.9129  2 60.000 0.01057 *  \nnitrogen:management           0.2118  8 77.143 0.98797    \nnitrogen:variety              2.6681  8 60.000 0.01413 *  \nmanagement:variety            2.2193  4 60.000 0.07754 .  \nnitrogen:management:variety   0.5289 16 60.000 0.92105    \n---\nSignif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n\n\n\n\nanova(model_lme, type = \"marginal\")\n\n                            numDF denDF   F-value p-value\n(Intercept)                     1    60 102.12108  &lt;.0001\nnitrogen                        4     8   1.91603  0.2012\nmanagement                      2    20   3.69617  0.0431\nvariety                         2    60   4.91295  0.0106\nnitrogen:management             8    20   0.21177  0.9850\nnitrogen:variety                8    60   2.66810  0.0141\nmanagement:variety              4    60   2.21929  0.0775\nnitrogen:management:variety    16    60   0.52893  0.9210\n\n\n\n\n\nHere, we observed significant effects of management, variety, and the nitrogen x variety interaction on rice yield. 
We can estimate the marginal means for each treatment factor (variety, nitrogen, management) which will averaged across other factors and their interaction.\n\nlme4nlme\n\n\n\nemmeans(model_lmer, ~ management)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n management emmean    SE   df lower.CL upper.CL\n m1           5.90 0.102 11.2     5.68     6.12\n m2           6.49 0.102 11.2     6.26     6.71\n m3           7.28 0.102 11.2     7.05     7.50\n\nResults are averaged over the levels of: nitrogen, variety \nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\nemmeans(model_lmer, ~ nitrogen*variety)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n nitrogen variety emmean    SE df lower.CL upper.CL\n 0        1         4.51 0.227 49     4.06     4.97\n 110      1         5.44 0.227 49     4.99     5.90\n 140      1         5.08 0.227 49     4.62     5.53\n 50       1         4.76 0.227 49     4.31     5.22\n 80       1         5.83 0.227 49     5.38     6.29\n 0        2         5.16 0.227 49     4.71     5.62\n 110      2         6.92 0.227 49     6.47     7.38\n 140      2         7.29 0.227 49     6.83     7.74\n 50       2         6.02 0.227 49     5.56     6.47\n 80       2         6.59 0.227 49     6.13     7.04\n 0        3         6.48 0.227 49     6.02     6.93\n 110      3         8.44 0.227 49     7.99     8.90\n 140      3         9.34 0.227 49     8.88     9.79\n 50       3         7.88 0.227 49     7.42     8.34\n 80       3         8.56 0.227 49     8.11     9.02\n\nResults are averaged over the levels of: management \nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemmeans(model_lme, ~ management)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n management emmean    SE df lower.CL upper.CL\n m1           5.90 0.102  2     5.46     6.34\n m2           6.49 0.102  2     6.05     6.92\n m3           7.28 0.102  2   
  6.84     7.71\n\nResults are averaged over the levels of: nitrogen, variety \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\nemmeans(model_lme, ~ nitrogen*variety)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n nitrogen variety emmean    SE df lower.CL upper.CL\n 0        1         4.51 0.227  2     3.54     5.49\n 110      1         5.44 0.227  2     4.47     6.42\n 140      1         5.08 0.227  2     4.10     6.05\n 50       1         4.76 0.227  2     3.79     5.74\n 80       1         5.83 0.227  2     4.86     6.81\n 0        2         5.16 0.227  2     4.19     6.14\n 110      2         6.92 0.227  2     5.95     7.90\n 140      2         7.29 0.227  2     6.31     8.27\n 50       2         6.02 0.227  2     5.04     6.99\n 80       2         6.59 0.227  2     5.61     7.57\n 0        3         6.48 0.227  2     5.50     7.46\n 110      3         8.44 0.227  2     7.47     9.42\n 140      3         9.34 0.227  2     8.36    10.31\n 50       3         7.88 0.227  2     6.90     8.86\n 80       3         8.56 0.227  2     7.59     9.54\n\nResults are averaged over the levels of: management \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\nNotice we get a message that the estimated means for ‘nitrogen x variety’ are averaged over the levels of ‘management’. So we need to be careful about how we interpret these estimates.\n\n\n\n\n\n\nNested random effects\n\n\n\nYou may have noticed the order of random effects in model statement:\nmodel_lme &lt;- lme(yield ~ nitrogen*management*variety,\n                  random = ~ 1|block/nitrogen/management,\n                  data = rice, \n                  na.action = na.exclude)\nThe random effects follow the order of ~1|block/main-plot/split-plot. 
When fitting the model for a split-split plot design, make sure you clearly understand which factors are the main plot, subplot, and sub-subplot factors to avoid specifying an erroneous model.",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>7</span>  <span class='chapter-title'>Split-Split Plot Design</span>"
    ]
  },
  {
    "objectID": "chapters/strip-plot.html",
    "href": "chapters/strip-plot.html",
    "title": "8  Strip Plot Design",
    "section": "",
    "text": "8.1 Background\nIn a strip plot design, each block or replication is divided into a number of vertical and horizontal strips depending on the levels of the respective factors.\nDivide the experimental area into ‘A’ horizontal strips and ‘B’ vertical strips. Each level of factor A is assigned to all the plots in one row, and each level of factor B is assigned to all the plots in one column.\nThe statistical model:\n\\[y_{ijk} = \\mu + \\alpha_j + \\beta_k + (\\alpha_j\\beta_k) + b_i + r_{ij} + c_{ik} + \\epsilon_{ijk}\\] Where:\n\\(\\mu\\) = overall experimental mean, \\(\\alpha\\) and \\(\\beta\\) are the main effects applied in the horizontal and vertical directions, and \\(\\alpha\\)\\(\\beta\\) represents the interaction between the main factors. The random effects in the above equation are \\(b_i\\), the random rep effect, \\(r_{ij}\\), the row within rep random effect, and \\(c_{ik}\\), the column within rep random effect; \\(\\epsilon_{ijk}\\) is the residual error.\n\\[ b_i \\sim N(0, \\sigma_1^2)\\]\n\\[ r_{ij}  \\sim N(0, \\sigma_2^2)\\]\n\\[ c_{ik} \\sim N(0, \\sigma_3^2)\\]\n\\[ \\epsilon_{ijk} \\sim N(0, \\sigma^2)\\]",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>8</span>  <span class='chapter-title'>Strip Plot Design</span>"
    ]
  },
  {
    "objectID": "chapters/strip-plot.html#background",
    "href": "chapters/strip-plot.html#background",
    "title": "8  Strip Plot Design",
    "section": "",
    "text": "Vertical strip plot for the first factor – vertical factor.\nHorizontal strip plot for the second factor – horizontal factor.",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>8</span>  <span class='chapter-title'>Strip Plot Design</span>"
    ]
  },
  {
    "objectID": "chapters/strip-plot.html#example-analysis",
    "href": "chapters/strip-plot.html#example-analysis",
    "title": "8  Strip Plot Design",
    "section": "8.2 Example Analysis",
    "text": "8.2 Example Analysis\nWe will start the analysis by loading the libraries required for the lmer and lme models, respectively.\n\nlme4nlme\n\n\n\nlibrary(lme4); library(lmerTest); library(emmeans)\nlibrary(dplyr); library(performance); library(desplot)\nlibrary(broom.mixed)\n\n\n\n\nlibrary(nlme); library(performance); library(emmeans)\nlibrary(dplyr); library(desplot); library(broom.mixed)\n\n\n\n\nFor this example, we will use rice strip-plot experiment data from the agridat package. This data set contains a strip-plot experiment with three reps, variety as the horizontal strip and nitrogen fertilizer as the vertical strip.\n\ndata1 &lt;- agridat::gomez.stripplot\n\n\nTable of variables in the data set\n\n\nrep\nreplication unit\n\n\nnitro\nnitrogen fertilizer in kg/ha\n\n\ngen\nrice variety\n\n\nrow\nrow (represents gen)\n\n\ncol\ncolumn (represents nitro)\n\n\nyield\ngrain yield in kg/ha\n\n\n\nFor the sake of analysis, the ‘row’ and ‘col’ variables are used to represent the ‘gen’ and ‘nitro’ factors, respectively. The plot below shows the application of treatments in the horizontal and vertical directions in a strip plot design.\n\n\n\n\n\n\n\n\n\n\n8.2.1 Data integrity checks\nThe first thing we need to verify is the data types of the variables in data1. The ‘rep’, ‘nitro’, and ‘gen’ variables need to be factor/character variables and ‘yield’ should be numeric.\n\nstr(data1)\n\n'data.frame':   54 obs. 
of  6 variables:\n $ yield: int  2373 4076 7254 4007 5630 7053 2620 4676 7666 2726 ...\n $ rep  : Factor w/ 3 levels \"R1\",\"R2\",\"R3\": 1 1 1 1 1 1 1 1 1 1 ...\n $ nitro: int  0 60 120 0 60 120 0 60 120 0 ...\n $ gen  : Factor w/ 6 levels \"G1\",\"G2\",\"G3\",..: 1 1 1 2 2 2 3 3 3 4 ...\n $ col  : int  1 3 2 1 3 2 1 3 2 1 ...\n $ row  : int  1 1 1 3 3 3 4 4 4 2 ...\n\n\nLet’s convert ‘nitro’ from numeric to factor.\n\ndata1$nitro &lt;- as.factor(data1$nitro)\n\nLet’s have a look at the balance of treatment factors by running a cross tabulation of the independent variables.\n\ntable(data1$gen, data1$nitro)\n\n    \n     0 60 120\n  G1 3  3   3\n  G2 3  3   3\n  G3 3  3   3\n  G4 3  3   3\n  G5 3  3   3\n  G6 3  3   3\n\n\nIt looks balanced, with 3 observations for each combination of variety and nitrogen level.\nThe next step is to identify whether there are any missing observations in the data set.\n\napply(data1, 2, function(x) sum(is.na(x)))\n\nyield   rep nitro   gen   col   row \n    0     0     0     0     0     0 \n\n\nWe don’t have any missing values in this data set.\nLastly, let’s check the distribution of the dependent variable by plotting a histogram.\n\nhist(data1$yield, main = \"\", xlab = \"yield\")\n\n\n\n\n\n\n\n\n\n\nFigure 8.1: Histogram of the dependent variable.\n\n\n\n\nNo extreme values or skewness are present in the yield values.\n\n\n8.2.2 Model Building\nThe impact of nitro, gen, and their interaction on rice yield was evaluated. Three random effects are used to account for rep, row, and column effects, with the last two random effects nested within rep but crossed with each other. In the model, these random effects are rep, gen nested in rep, and nitro nested in rep. 
All random effects are assumed to be independent of each other and of the within-group errors.\n\nlme4nlme\n\n\n\nmodel_lmer &lt;- lmer(yield ~  nitro*gen +  (1|rep) + \n                   (1|rep:gen) + (1|rep:nitro), \n                   data = data1)\ntidy(model_lmer)\n\n# A tibble: 22 × 8\n   effect group term           estimate std.error statistic    df     p.value\n   &lt;chr&gt;  &lt;chr&gt; &lt;chr&gt;             &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;       &lt;dbl&gt;\n 1 fixed  &lt;NA&gt;  (Intercept)       3572.      572.     6.24   17.8 0.00000732 \n 2 fixed  &lt;NA&gt;  nitro60           1560.      558.     2.80   22.4 0.0104     \n 3 fixed  &lt;NA&gt;  nitro120          3976.      558.     7.13   22.4 0.000000341\n 4 fixed  &lt;NA&gt;  genG2             1363.      717.     1.90   20.9 0.0714     \n 5 fixed  &lt;NA&gt;  genG3              678.      717.     0.945  20.9 0.355      \n 6 fixed  &lt;NA&gt;  genG4              487.      717.     0.679  20.9 0.504      \n 7 fixed  &lt;NA&gt;  genG5              530.      717.     0.739  20.9 0.468      \n 8 fixed  &lt;NA&gt;  genG6             -364.      717.    -0.508  20.9 0.617      \n 9 fixed  &lt;NA&gt;  nitro60:genG2      219.      741.     0.296  20.0 0.771      \n10 fixed  &lt;NA&gt;  nitro120:genG2   -1699.      741.    
-2.29   20.0 0.0328     \n# ℹ 12 more rows\n\n\n\n\n\nmodel_lme &lt;-lme(yield ~  nitro*gen,\n                random = list(one = pdBlocked(list(\n        pdIdent(~ 0 + rep), \n         pdIdent(~ 0 + rep:gen), \n        pdIdent(~ 0 + rep:nitro)))),\n        data = data1 %&gt;% mutate(one = factor(1)))\n\nsummary(model_lme)\n\nLinear mixed-effects model fit by REML\n  Data: data1 %&gt;% mutate(one = factor(1)) \n       AIC      BIC    logLik\n  651.4204 686.2578 -303.7102\n\nRandom effects:\n Composite Structure: Blocked\n\n Block 1: repR1, repR2, repR3\n Formula: ~0 + rep | one\n Structure: Multiple of an Identity\n           repR1    repR2    repR3\nStdDev: 393.4278 393.4278 393.4278\n\n Block 2: repR1:genG1, repR2:genG1, repR3:genG1, repR1:genG2, repR2:genG2, repR3:genG2, repR1:genG3, repR2:genG3, repR3:genG3, repR1:genG4, repR2:genG4, repR3:genG4, repR1:genG5, repR2:genG5, repR3:genG5, repR1:genG6, repR2:genG6, repR3:genG6\n Formula: ~0 + rep:gen | one\n Structure: Multiple of an Identity\n        repR1:genG1 repR2:genG1 repR3:genG1 repR1:genG2 repR2:genG2 repR3:genG2\nStdDev:    600.1711    600.1711    600.1711    600.1711    600.1711    600.1711\n        repR1:genG3 repR2:genG3 repR3:genG3 repR1:genG4 repR2:genG4 repR3:genG4\nStdDev:    600.1711    600.1711    600.1711    600.1711    600.1711    600.1711\n        repR1:genG5 repR2:genG5 repR3:genG5 repR1:genG6 repR2:genG6 repR3:genG6\nStdDev:    600.1711    600.1711    600.1711    600.1711    600.1711    600.1711\n\n Block 3: repR1:nitro0, repR2:nitro0, repR3:nitro0, repR1:nitro60, repR2:nitro60, repR3:nitro60, repR1:nitro120, repR2:nitro120, repR3:nitro120\n Formula: ~0 + rep:nitro | one\n Structure: Multiple of an Identity\n        repR1:nitro0 repR2:nitro0 repR3:nitro0 repR1:nitro60 repR2:nitro60\nStdDev:     235.2591     235.2591     235.2591      235.2591      235.2591\n        repR3:nitro60 repR1:nitro120 repR2:nitro120 repR3:nitro120 Residual\nStdDev:      235.2591       235.2591       235.2591       
235.2591 641.5963\n\nFixed effects:  yield ~ nitro * gen \n                   Value Std.Error DF   t-value p-value\n(Intercept)     3571.667  572.1257 36  6.242800  0.0000\nnitro60         1560.333  557.9682 36  2.796456  0.0082\nnitro120        3976.333  557.9682 36  7.126452  0.0000\ngenG2           1362.667  717.3336 36  1.899628  0.0655\ngenG3            678.000  717.3336 36  0.945167  0.3509\ngenG4            487.333  717.3336 36  0.679368  0.5012\ngenG5            530.000  717.3336 36  0.738847  0.4648\ngenG6           -364.333  717.3336 36 -0.507899  0.6146\nnitro60:genG2    219.000  740.8516 36  0.295606  0.7692\nnitro120:genG2 -1699.333  740.8516 36 -2.293757  0.0277\nnitro60:genG3    312.333  740.8516 36  0.421587  0.6758\nnitro120:genG3  -357.667  740.8516 36 -0.482778  0.6322\nnitro60:genG4    -65.667  740.8516 36 -0.088637  0.9299\nnitro120:genG4  -941.000  740.8516 36 -1.270160  0.2122\nnitro60:genG5    -28.667  740.8516 36 -0.038694  0.9693\nnitro120:genG5 -2066.000  740.8516 36 -2.788682  0.0084\nnitro60:genG6  -1053.333  740.8516 36 -1.421787  0.1637\nnitro120:genG6 -4691.667  740.8516 36 -6.332802  0.0000\n Correlation: \n               (Intr) nitr60 ntr120 genG2  genG3  genG4  genG5  genG6  n60:G2\nnitro60        -0.488                                                        \nnitro120       -0.488  0.500                                                 \ngenG2          -0.627  0.343  0.343                                          \ngenG3          -0.627  0.343  0.343  0.500                                   \ngenG4          -0.627  0.343  0.343  0.500  0.500                            \ngenG5          -0.627  0.343  0.343  0.500  0.500  0.500                     \ngenG6          -0.627  0.343  0.343  0.500  0.500  0.500  0.500              \nnitro60:genG2   0.324 -0.664 -0.332 -0.516 -0.258 -0.258 -0.258 -0.258       \nnitro120:genG2  0.324 -0.332 -0.664 -0.516 -0.258 -0.258 -0.258 -0.258  0.500\nnitro60:genG3   0.324 -0.664 -0.332 -0.258 -0.516 
-0.258 -0.258 -0.258  0.500\nnitro120:genG3  0.324 -0.332 -0.664 -0.258 -0.516 -0.258 -0.258 -0.258  0.250\nnitro60:genG4   0.324 -0.664 -0.332 -0.258 -0.258 -0.516 -0.258 -0.258  0.500\nnitro120:genG4  0.324 -0.332 -0.664 -0.258 -0.258 -0.516 -0.258 -0.258  0.250\nnitro60:genG5   0.324 -0.664 -0.332 -0.258 -0.258 -0.258 -0.516 -0.258  0.500\nnitro120:genG5  0.324 -0.332 -0.664 -0.258 -0.258 -0.258 -0.516 -0.258  0.250\nnitro60:genG6   0.324 -0.664 -0.332 -0.258 -0.258 -0.258 -0.258 -0.516  0.500\nnitro120:genG6  0.324 -0.332 -0.664 -0.258 -0.258 -0.258 -0.258 -0.516  0.250\n               n120:G2 n60:G3 n120:G3 n60:G4 n120:G4 n60:G5 n120:G5 n60:G6\nnitro60                                                                   \nnitro120                                                                  \ngenG2                                                                     \ngenG3                                                                     \ngenG4                                                                     \ngenG5                                                                     \ngenG6                                                                     \nnitro60:genG2                                                             \nnitro120:genG2                                                            \nnitro60:genG3   0.250                                                     \nnitro120:genG3  0.500   0.500                                             \nnitro60:genG4   0.250   0.500  0.250                                      \nnitro120:genG4  0.500   0.250  0.500   0.500                              \nnitro60:genG5   0.250   0.500  0.250   0.500  0.250                       \nnitro120:genG5  0.500   0.250  0.500   0.250  0.500   0.500               \nnitro60:genG6   0.250   0.500  0.250   0.500  0.250   0.500  0.250        \nnitro120:genG6  0.500   0.250  0.500   0.250  0.500   0.250  0.500   0.500\n\nStandardized Within-Group Residuals:\n        
Min          Q1         Med          Q3         Max \n-1.52993309 -0.52842524  0.05394367  0.51465584  1.46902934 \n\nNumber of Observations: 54\nNumber of Groups: 1 \n\n#tidy(model_lme)\n\n\n\n\n\n\n\n\n\n\nCrossed random effects\n\n\n\nThis type of variance-covariance structure in lme() is represented by a pdBlocked object with pdIdent elements.\n\n\n\n\n8.2.3 Check Model Assumptions\nLet’s evaluate the assumptions of linear mixed models by looking at the residuals and the normality of the error terms.\n\nlme4nlme\n\n\n\ncheck_model(model_lmer, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\nplot(model_lme, resid(., scaled=TRUE) ~ fitted(.), \n     xlab = \"fitted values\", ylab = \"studentized residuals\")\nqqnorm(residuals(model_lme))\nqqline(residuals(model_lme))\n\n\n\n\n\n\n\n\n\n\nThe residuals fit the assumptions of the model well.\n\n\n\n8.2.4 Inference\nWe can run an analysis of variance on the model to test the main and interaction effects.\n\nlme4nlme\n\n\n\ncar::Anova(model_lmer, type = \"III\", test.statistics = \"F\")\n\nAnalysis of Deviance Table (Type III Wald chisquare tests)\n\nResponse: yield\n              Chisq Df Pr(&gt;Chisq)    \n(Intercept) 38.9728  1  4.298e-10 ***\nnitro       51.5701  2  6.334e-12 ***\ngen          6.8343  5     0.2333    \nnitro:gen   58.0064 10  8.621e-09 ***\n---\nSignif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\n\n\n\n\nanova(model_lme, type = \"marginal\")\n\n            numDF denDF  F-value p-value\n(Intercept)     1    36 38.97256  &lt;.0001\nnitro           2    36 25.78512  &lt;.0001\ngen             5    36  1.36687  0.2597\nnitro:gen      10    36  5.80061  &lt;.0001\n\n\n\n\n\nAnalysis of variance showed a significant interaction impact of gen and nitro on rice grain yield.\nNext, We can estimate marginal means for nitro and gen interaction effects using the emmeans package.\n\nlme4nlme\n\n\n\nemm1 &lt;- emmeans(model_lmer, ~ nitro*gen) \nemm1\n\n nitro gen emmean  SE   df lower.CL upper.CL\n 0     G1    3572 572 17.8     2368     4775\n 60    G1    5132 572 17.8     3929     6335\n 120   G1    7548 572 17.8     6345     8751\n 0     G2    4934 572 17.8     3731     6138\n 60    G2    6714 572 17.8     5510     7917\n 120   G2    7211 572 17.8     6008     8415\n 0     G3    4250 572 17.8     3046     5453\n 60    G3    6122 572 17.8     4919     7326\n 120   G3    7868 572 17.8     6665     9072\n 0     G4    4059 572 17.8     2856     5262\n 60    G4    5554 572 17.8     4350     6757\n 120   G4    7094 572 17.8     5891     8298\n 0     G5    4102 572 17.8     2898     5305\n 60    G5    5633 572 17.8     4430     6837\n 120   G5    6012 572 17.8     4809     7215\n 0     G6    3207 572 17.8     2004     4411\n 60    G6    3714 572 17.8     2511     4918\n 120   G6    2492 572 17.8     1289     3695\n\nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemm1 &lt;- emmeans(model_lme, ~ nitro*gen)\n\nWarning in model.matrix.default(trms, m, contrasts.arg = contrasts): variable\n'rep' is absent, its contrast will be ignored\nWarning in model.matrix.default(trms, m, contrasts.arg = contrasts): variable\n'rep' is absent, its contrast will be ignored\n\nemm1\n\nWarning in qt((1 - level)/adiv, df): NaNs produced\n\n\n nitro gen emmean  SE df lower.CL upper.CL\n 0     G1    3572 572  0      NaN      NaN\n 60    G1    5132 572  
0      NaN      NaN\n 120   G1    7548 572  0      NaN      NaN\n 0     G2    4934 572  0      NaN      NaN\n 60    G2    6714 572  0      NaN      NaN\n 120   G2    7211 572  0      NaN      NaN\n 0     G3    4250 572  0      NaN      NaN\n 60    G3    6122 572  0      NaN      NaN\n 120   G3    7868 572  0      NaN      NaN\n 0     G4    4059 572  0      NaN      NaN\n 60    G4    5554 572  0      NaN      NaN\n 120   G4    7094 572  0      NaN      NaN\n 0     G5    4102 572  0      NaN      NaN\n 60    G5    5633 572  0      NaN      NaN\n 120   G5    6012 572  0      NaN      NaN\n 0     G6    3207 572  0      NaN      NaN\n 60    G6    3714 572  0      NaN      NaN\n 120   G6    2492 572  0      NaN      NaN\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\nNote that emmeans did not produce confidence intervals for the lme model: the containment method returned 0 degrees of freedom, so the interval limits are NaN.\n\n\n\n\n\n\nlme vs lmer\n\n\n\nFor a strip plot design, fitting nested and crossed random effects is more complicated in nlme. It is therefore more convenient to use lmer in this case, as both models yielded the same estimates in the example shown above.",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>8</span>  <span class='chapter-title'>Strip Plot Design</span>"
    ]
  },
  {
    "objectID": "chapters/incomplete-block-design.html",
    "href": "chapters/incomplete-block-design.html",
    "title": "9  Incomplete Block Design",
    "section": "",
    "text": "9.1 Background\nThe block design described in Chapter 4 was complete, meaning that each block contained each treatment level at least once. In practice, it may not be possible or advisable to include all treatments in each block, either due to limitations in treatment availability (e.g. limited seed stocks) or the block size becomes too large to serve its original goals of controlling for spatial variation.\nIn such cases, incomplete block designs (IBD) can be used. Incomplete block designs break the experiment into many smaller incomplete blocks that are nested within standard RCBD-style blocks and assigns a subset of the treatment levels to each incomplete block. There are several different approaches Patterson and Williams (1976) for how to assign treatment levels to incomplete blocks and these designs impact the final statistical analysis (and if all treatments included in the experimental design are estimable). An excellent description of incomplete block design is provided in ANOVA and Mixed Models by Lukas Meier.\nIncomplete block designs are grouped into two groups: (1) balanced lattice designs; and (2) partially balanced (also commonly called alpha-lattice) designs. Balanced IBD designs have been previously called “lattice designs” [need refs], but we are not using that term to avoid confusion with alpha-lattice designs, a term that is commonly used.\nIn alpha-lattice design, the blocks are grouped into complete replicates. These designs are also termed as “resolvable incomplete block designs” or “partially balanced incomplete block designs” (paterson?). This design has been more commonly used instead of balanced IBD because of it’s practicability, flexibility, and versatility.",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>9</span>  <span class='chapter-title'>Incomplete Block Design</span>"
    ]
  },
  {
    "objectID": "chapters/incomplete-block-design.html#background",
    "href": "chapters/incomplete-block-design.html#background",
    "title": "9  Incomplete Block Design",
    "section": "",
    "text": "9.1.1 Statistical Model\nThe statistical model for a balanced incomplete block design is:\n\\[y_{ij} = \\mu + \\alpha_i + \\beta_j + \\epsilon_{ij}\\]\nWhere:\n\\(\\mu\\) = overall experimental mean\n\\(\\alpha\\) = treatment effects (fixed)\n\\(\\beta\\) = block effects (random)\n\\(\\epsilon\\) = error terms\n\\[ \\epsilon \\sim N(0, \\sigma)\\]\n\\[ \\beta \\sim N(0, \\sigma_b)\\]\nThere are few key points that we need to keep in mind while designing incomplete block experiments:\n\nA drawback of this design is that block effect and treatment effects are confounded.\nTo remove the block effects, it is better compare treatments within a block.\nNo treatment should appear twice in any block as it contributes nothing to within block comparisons.\n\nThe balanced incomplete block designs are guided by strict principles and guidelines including: the number of treatments must be a perfect square (e.g. 25, 36, and so on), and number of replicates must be equal to number of blocks +1.\n\n\n\n\n\n\nNote on Sums of Squares\n\n\n\nBecause the blocks are incomplete, the Type I and Type III sums of squares will be different even when there is no missing data from a trail. That is because the missing treatments in each block represent missing observations (even though they are not missing ‘at random’).",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>9</span>  <span class='chapter-title'>Incomplete Block Design</span>"
    ]
  },
  {
    "objectID": "chapters/incomplete-block-design.html#examples-analyses",
    "href": "chapters/incomplete-block-design.html#examples-analyses",
    "title": "9  Incomplete Block Design",
    "section": "9.2 Examples Analyses",
    "text": "9.2 Examples Analyses\n\n9.2.1 Balanced Incomplete Block Design\nWe will demonstrate an example data set designed in a balanced incomplete block design. First, load the libraries required for analysis and estimation.\n\nlme4nlme\n\n\n\nlibrary(lme4); library(lmerTest); library(emmeans)\nlibrary(dplyr); library(broom.mixed); library(performance)\n\n\n\n\nlibrary(nlme); library(broom.mixed); library(emmeans)\nlibrary(dplyr); library(performance)\n\n\n\n\nThe data used for this example analysis was extracted from the agridat package. This example is comprised of soybean balanced incomplete block experiment.\n\ndat &lt;- agridat::weiss.incblock\n\n\nTable of variables in the data set\n\n\nblock\nblocking unit\n\n\ngen\ngenotype (variety) factor\n\n\nrow\nrow position for each plot\n\n\ncol\ncolumn position for each plot\n\n\nyield\ngrain yield in bu/ac\n\n\n\n\n\n\n\n\n\n\n\n\n\n9.2.1.1 Data integrity checks\nWe will start inspecting the data set firstly by looking at the class of each variable:\n\nstr(dat)\n\n'data.frame':   186 obs. of  5 variables:\n $ block: Factor w/ 31 levels \"B01\",\"B02\",\"B03\",..: 1 2 3 4 5 6 7 8 9 10 ...\n $ gen  : Factor w/ 31 levels \"G01\",\"G02\",\"G03\",..: 24 15 20 18 20 5 22 1 9 14 ...\n $ yield: num  29.8 24.2 30.5 20 35.2 25 23.6 23.6 29.3 25.5 ...\n $ row  : int  42 36 30 24 18 12 6 42 36 30 ...\n $ col  : int  1 1 1 1 1 1 1 2 2 2 ...\n\n\nThe variables we need for the model are block, genand yield. The block and gen are classified as factor variables and yield is numeric. Therefore, we do not need to change class of any of the required variables.\nNext, let’s check the independent variables. 
We can do this by counting the observations for each level of gen and of block.\n\nagg_tbl &lt;- dat %&gt;% group_by(gen) %&gt;% \n  summarise(total_count=n(),\n            .groups = 'drop')\nagg_tbl\n\n# A tibble: 31 × 2\n   gen   total_count\n   &lt;fct&gt;       &lt;int&gt;\n 1 G01             6\n 2 G02             6\n 3 G03             6\n 4 G04             6\n 5 G05             6\n 6 G06             6\n 7 G07             6\n 8 G08             6\n 9 G09             6\n10 G10             6\n# ℹ 21 more rows\n\n\n\nagg_df &lt;- aggregate(dat$gen, by=list(dat$block), FUN=length)\nagg_df\n\n   Group.1 x\n1      B01 6\n2      B02 6\n3      B03 6\n4      B04 6\n5      B05 6\n6      B06 6\n7      B07 6\n8      B08 6\n9      B09 6\n10     B10 6\n11     B11 6\n12     B12 6\n13     B13 6\n14     B14 6\n15     B15 6\n16     B16 6\n17     B17 6\n18     B18 6\n19     B19 6\n20     B20 6\n21     B21 6\n22     B22 6\n23     B23 6\n24     B24 6\n25     B25 6\n26     B26 6\n27     B27 6\n28     B28 6\n29     B29 6\n30     B30 6\n31     B31 6\n\n\nThere are 31 varieties (levels of gen) and 31 blocks, and the design is perfectly balanced: each variety is observed 6 times and each block contains 6 plots.\nWe can count the missing values in each variable of this data set:\n\napply(dat, 2, function(x) sum(is.na(x)))\n\nblock   gen yield   row   col \n    0     0     0     0     0 \n\n\nNo missing data!\nLast, let’s plot a histogram of the dependent variable. This is a quick check before analysis to see if there is any strong deviation in values.\n\n\n\n\n\n\n\n\n\nFigure 9.1: Histogram of the dependent variable.\n\n\n\n\n\nhist(dat$yield, main = \"\", xlab = \"yield\")\n\nThe response variable values fall within the expected range, with a few extreme values in the right tail. 
This data set is ready for analysis!\n\n\n9.2.1.2 Model Building\nWe will be evaluating the response of yield as affected by gen (fixed effect) and block (random effect).\n\n\nPlease note that incomplete block effect can be analyzed as a fixed (intra-block analysis) or a random (inter-block analysis) effect. When we consider block as a random effect, the mean values of a block also contain information about the treatment effects.\n\nlme4nlme\n\n\n\nmodel_icbd &lt;- lmer(yield ~ gen + (1|block),\n                   data = dat, \n                   na.action = na.exclude)\ntidy(model_icbd)\n\n# A tibble: 33 × 8\n   effect group term        estimate std.error statistic    df  p.value\n   &lt;chr&gt;  &lt;chr&gt; &lt;chr&gt;          &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;    &lt;dbl&gt;\n 1 fixed  &lt;NA&gt;  (Intercept)  24.6        0.922   26.7     153. 2.30e-59\n 2 fixed  &lt;NA&gt;  genG02        2.40       1.17     2.06    129. 4.17e- 2\n 3 fixed  &lt;NA&gt;  genG03        8.04       1.17     6.88    129. 2.31e-10\n 4 fixed  &lt;NA&gt;  genG04        2.37       1.17     2.03    129. 4.42e- 2\n 5 fixed  &lt;NA&gt;  genG05        1.60       1.17     1.37    129. 1.73e- 1\n 6 fixed  &lt;NA&gt;  genG06        7.39       1.17     6.32    129. 3.82e- 9\n 7 fixed  &lt;NA&gt;  genG07       -0.419      1.17    -0.359   129. 7.20e- 1\n 8 fixed  &lt;NA&gt;  genG08        3.04       1.17     2.60    129. 1.04e- 2\n 9 fixed  &lt;NA&gt;  genG09        4.84       1.17     4.14    129. 6.22e- 5\n10 fixed  &lt;NA&gt;  genG10       -0.0429     1.17    -0.0367  129. 
9.71e- 1\n# ℹ 23 more rows\n\n\n\n\n\nmodel_icbd1 &lt;- lme(yield ~ gen,\n                  random = ~ 1|block,\n                  data = dat, \n                  na.action = na.exclude)\ntidy(model_icbd1)\n\n# A tibble: 33 × 8\n   effect group term        estimate std.error    df statistic  p.value\n   &lt;chr&gt;  &lt;chr&gt; &lt;chr&gt;          &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt;\n 1 fixed  &lt;NA&gt;  (Intercept)  24.6        0.922   125   26.7    2.10e-53\n 2 fixed  &lt;NA&gt;  genG02        2.40       1.17    125    2.06   4.18e- 2\n 3 fixed  &lt;NA&gt;  genG03        8.04       1.17    125    6.88   2.54e-10\n 4 fixed  &lt;NA&gt;  genG04        2.37       1.17    125    2.03   4.43e- 2\n 5 fixed  &lt;NA&gt;  genG05        1.60       1.17    125    1.37   1.73e- 1\n 6 fixed  &lt;NA&gt;  genG06        7.39       1.17    125    6.32   4.11e- 9\n 7 fixed  &lt;NA&gt;  genG07       -0.419      1.17    125   -0.359  7.20e- 1\n 8 fixed  &lt;NA&gt;  genG08        3.04       1.17    125    2.60   1.04e- 2\n 9 fixed  &lt;NA&gt;  genG09        4.84       1.17    125    4.14   6.33e- 5\n10 fixed  &lt;NA&gt;  genG10       -0.0429     1.17    125   -0.0367 9.71e- 1\n# ℹ 23 more rows\n\n\n\n\n\n\n\n9.2.1.3 Check Model Assumptions\nLet’s verify the assumption of linear mixed models including normal distribution and constant variance of residuals.\n\nlme4nlme\n\n\n\ncheck_model(model_icbd, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(model_icbd1, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\n\n\nHere we observed a right skewness in residuals, this can be resolved by using data transformation e.g. log transformation of response variable. 
Please refer to chapter to read more about data transformation.\n\n\n9.2.1.4 Inference\nWe can extract information about ANOVA using anova().\n\nlme4nlme\n\n\n\nanova(model_icbd, type = \"1\")\n\nType I Analysis of Variance Table with Satterthwaite's method\n    Sum Sq Mean Sq NumDF  DenDF F value    Pr(&gt;F)    \ngen 1901.1  63.369    30 129.06  17.675 &lt; 2.2e-16 ***\n---\nSignif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n\n\n\n\nanova(model_icbd1, type = \"sequential\")\n\n            numDF denDF  F-value p-value\n(Intercept)     1   125 4042.016  &lt;.0001\ngen            30   125   17.675  &lt;.0001\n\n\n\n\n\nLet’s look at the estimated marginal means of yield for each variety (gen).\n\nlme4nlme\n\n\n\nemmeans(model_icbd, ~ gen)\n\n gen emmean    SE  df lower.CL upper.CL\n G01   24.6 0.923 153     22.7     26.4\n G02   27.0 0.923 153     25.2     28.8\n G03   32.6 0.923 153     30.8     34.4\n G04   26.9 0.923 153     25.1     28.8\n G05   26.2 0.923 153     24.4     28.0\n G06   32.0 0.923 153     30.1     33.8\n G07   24.2 0.923 153     22.3     26.0\n G08   27.6 0.923 153     25.8     29.4\n G09   29.4 0.923 153     27.6     31.2\n G10   24.5 0.923 153     22.7     26.4\n G11   27.1 0.923 153     25.2     28.9\n G12   29.3 0.923 153     27.4     31.1\n G13   29.9 0.923 153     28.1     31.8\n G14   24.2 0.923 153     22.4     26.1\n G15   26.1 0.923 153     24.3     27.9\n G16   25.9 0.923 153     24.1     27.8\n G17   19.7 0.923 153     17.9     21.5\n G18   25.7 0.923 153     23.9     27.5\n G19   29.0 0.923 153     27.2     30.9\n G20   33.2 0.923 153     31.3     35.0\n G21   31.1 0.923 153     29.3     32.9\n G22   25.2 0.923 153     23.3     27.0\n G23   29.8 0.923 153     28.0     31.6\n G24   33.6 0.923 153     31.8     35.5\n G25   27.0 0.923 153     25.2     28.8\n G26   27.1 0.923 153     25.3     29.0\n G27   23.8 0.923 153     22.0     25.6\n G28   26.5 0.923 153     24.6     28.3\n G29   24.8 0.923 153     22.9     26.6\n 
G30   36.2 0.923 153     34.4     38.0\n G31   27.1 0.923 153     25.3     28.9\n\nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemmeans(model_icbd1, ~ gen)\n\n gen emmean    SE df lower.CL upper.CL\n G01   24.6 0.922 30     22.7     26.5\n G02   27.0 0.922 30     25.1     28.9\n G03   32.6 0.922 30     30.7     34.5\n G04   26.9 0.922 30     25.1     28.8\n G05   26.2 0.922 30     24.3     28.1\n G06   32.0 0.922 30     30.1     33.8\n G07   24.2 0.922 30     22.3     26.0\n G08   27.6 0.922 30     25.7     29.5\n G09   29.4 0.922 30     27.5     31.3\n G10   24.5 0.922 30     22.6     26.4\n G11   27.1 0.922 30     25.2     28.9\n G12   29.3 0.922 30     27.4     31.1\n G13   29.9 0.922 30     28.1     31.8\n G14   24.2 0.922 30     22.4     26.1\n G15   26.1 0.922 30     24.2     28.0\n G16   25.9 0.922 30     24.0     27.8\n G17   19.7 0.922 30     17.8     21.6\n G18   25.7 0.922 30     23.8     27.6\n G19   29.0 0.922 30     27.2     30.9\n G20   33.2 0.922 30     31.3     35.0\n G21   31.1 0.922 30     29.2     33.0\n G22   25.2 0.922 30     23.3     27.1\n G23   29.8 0.922 30     27.9     31.7\n G24   33.6 0.922 30     31.8     35.5\n G25   27.0 0.922 30     25.1     28.9\n G26   27.1 0.922 30     25.3     29.0\n G27   23.8 0.922 30     21.9     25.7\n G28   26.5 0.922 30     24.6     28.4\n G29   24.8 0.922 30     22.9     26.6\n G30   36.2 0.922 30     34.3     38.1\n G31   27.1 0.922 30     25.2     29.0\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\n\n\n\n9.2.2 Partially Balanced IBD (Alpha Lattice Design)\nThe statistical model for partially balanced design includes:\n\\[y_{ij(l)} = \\mu + \\alpha_i + \\beta_{i(l)} + \\tau_j + \\epsilon_{ij(l)}\\]\nWhere:\n\\(\\mu\\) = overall experimental mean\n\\(\\alpha\\) = replicate effect (random)\n\\(\\beta\\) = incomplete block effect (random)\n\\(\\tau\\) = treatment effect (fixed)\n\\(\\epsilon_{ij(l)}\\) = intra-block residual\nThe 
data used in this example are published in Cyclic and Computer Generated Designs (John and Williams 1995). The trial was laid out in an alpha lattice design. It had 24 genotypes (“gen”) in 3 replicates, with each replicate divided into 6 incomplete blocks.\nLet’s start analyzing this example by loading the required libraries for linear mixed models:\n\nlme4nlme\n\n\n\nlibrary(lme4); library(lmerTest); library(emmeans)\nlibrary(dplyr); library(broom.mixed); library(performance)\n\n\n\n\nlibrary(nlme); library(broom.mixed); library(emmeans)\nlibrary(dplyr); library(performance)\n\n\n\n\n\ndata1 &lt;- agridat::john.alpha\n\n\nTable of variables in the data set\n\n\nblock\nincomplete blocking unit\n\n\ngen\ngenotype (variety) factor\n\n\nrow\nrow position for each plot\n\n\ncol\ncolumn position for each plot\n\n\nyield\ngrain yield in tonnes/ha\n\n\n\n\n\n\n\n\n\n\n\n\n\n9.2.2.1 Data integrity checks\nLet’s look at the structure of the data first to verify the class of the variables.\n\nstr(data1)\n\n'data.frame':   72 obs. of  7 variables:\n $ plot : int  1 2 3 4 5 6 7 8 9 10 ...\n $ rep  : Factor w/ 3 levels \"R1\",\"R2\",\"R3\": 1 1 1 1 1 1 1 1 1 1 ...\n $ block: Factor w/ 6 levels \"B1\",\"B2\",\"B3\",..: 1 1 1 1 2 2 2 2 3 3 ...\n $ gen  : Factor w/ 24 levels \"G01\",\"G02\",\"G03\",..: 11 4 5 22 21 10 20 2 23 14 ...\n $ yield: num  4.12 4.45 5.88 4.58 4.65 ...\n $ row  : int  1 2 3 4 5 6 7 8 9 10 ...\n $ col  : int  1 1 1 1 1 1 1 1 1 1 ...\n\n\nThe next step is to evaluate the independent variables. 
First, count the observations for each treatment (each treatment should appear 3 times, once per replicate).\n\nagg_tbl &lt;- data1 %&gt;% group_by(gen) %&gt;% \n  summarise(total_count=n(),\n            .groups = 'drop')\nagg_tbl\n\n# A tibble: 24 × 2\n   gen   total_count\n   &lt;fct&gt;       &lt;int&gt;\n 1 G01             3\n 2 G02             3\n 3 G03             3\n 4 G04             3\n 5 G05             3\n 6 G06             3\n 7 G07             3\n 8 G08             3\n 9 G09             3\n10 G10             3\n# ℹ 14 more rows\n\n\nThis looks balanced, as expected.\nAlso, let’s look at the number of observations for each block label.\n\nagg_blk &lt;- aggregate(data1$gen, by=list(data1$block), FUN=length)\nagg_blk\n\n  Group.1  x\n1      B1 12\n2      B2 12\n3      B3 12\n4      B4 12\n5      B5 12\n6      B6 12\n\n\nEach block label has 12 observations. The labels B1 to B6 are reused in each of the 3 replicates, so each incomplete block within a replicate contains 4 treatments. Every incomplete block has the same number of treatments.\nLastly, before fitting the model, it’s a good idea to look at the distribution of the dependent variable, yield.\n\n\n\n\n\n\n\n\n\nFigure 9.2: Histogram of the dependent variable.\n\n\n\n\n\nhist(data1$yield, main = \"\", xlab = \"yield\")\n\nThe response variable seems to follow a normal distribution, with fewer values at the extreme lower and upper ends.\n\n\n9.2.2.2 Model Building\n\nlme4nlme\n\n\n\nmod_alpha &lt;- lmer(yield ~ gen + (1|rep/block),\n                   data = data1, \n                   na.action = na.exclude)\ntidy(mod_alpha)\n\n# A tibble: 27 × 8\n   effect group term        estimate std.error statistic    df     p.value\n   &lt;chr&gt;  &lt;chr&gt; &lt;chr&gt;          &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;       &lt;dbl&gt;\n 1 fixed  &lt;NA&gt;  (Intercept)   5.11       0.276    18.5    6.19 0.00000118 \n 2 fixed  &lt;NA&gt;  genG02       -0.629      0.269    -2.34  38.2  0.0248     \n 3 fixed  &lt;NA&gt;  genG03       -1.61       0.268    -6.00  37.7  0.000000590\n 4 fixed  &lt;NA&gt;  
genG04       -0.618      0.268    -2.30  37.7  0.0269     \n 5 fixed  &lt;NA&gt;  genG05       -0.0705     0.258    -0.274 34.8  0.786      \n 6 fixed  &lt;NA&gt;  genG06       -0.571      0.268    -2.13  37.7  0.0398     \n 7 fixed  &lt;NA&gt;  genG07       -0.997      0.258    -3.87  34.8  0.000457   \n 8 fixed  &lt;NA&gt;  genG08       -0.580      0.268    -2.16  37.7  0.0370     \n 9 fixed  &lt;NA&gt;  genG09       -1.61       0.258    -6.21  35.3  0.000000390\n10 fixed  &lt;NA&gt;  genG10       -0.735      0.259    -2.83  35.9  0.00754    \n# ℹ 17 more rows\n\n\n\n\n\nmod_alpha1 &lt;- lme(yield ~ gen,\n                  random = ~ 1|rep/block,\n                  data = data1, \n                  na.action = na.exclude)\ntidy(mod_alpha1)\n\nWarning in tidy.lme(mod_alpha1): ran_pars not yet implemented for multiple\nlevels of nesting\n\n\n# A tibble: 24 × 7\n   effect term        estimate std.error    df statistic  p.value\n   &lt;chr&gt;  &lt;chr&gt;          &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt;\n 1 fixed  (Intercept)   5.11       0.276    31    18.5   2.63e-18\n 2 fixed  genG02       -0.629      0.269    31    -2.34  2.61e- 2\n 3 fixed  genG03       -1.61       0.268    31    -6.00  1.23e- 6\n 4 fixed  genG04       -0.618      0.268    31    -2.30  2.81e- 2\n 5 fixed  genG05       -0.0705     0.258    31    -0.274 7.86e- 1\n 6 fixed  genG06       -0.571      0.268    31    -2.13  4.12e- 2\n 7 fixed  genG07       -0.997      0.258    31    -3.87  5.23e- 4\n 8 fixed  genG08       -0.580      0.268    31    -2.16  3.84e- 2\n 9 fixed  genG09       -1.61       0.258    31    -6.21  6.71e- 7\n10 fixed  genG10       -0.735      0.259    31    -2.83  8.05e- 3\n# ℹ 14 more rows\n\n\n\n\n\n\n\n9.2.2.3 Check Model Assumptions\nLet’s verify the assumption of linear mixed models including normal distribution and constant variance of residuals.\n\nlme4nlme\n\n\n\ncheck_model(mod_alpha, check = c('normality', 
'linearity'))\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(mod_alpha1, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\nWe observed a few extreme residuals, and the normality plot showed right skewness.\n\n\n9.2.2.4 Inference\nLet’s generate the ANOVA table using anova() for the lmer and lme models, respectively.\n\nlme4nlme\n\n\n\nanova(mod_alpha, type = \"1\")\n\nType I Analysis of Variance Table with Satterthwaite's method\n    Sum Sq Mean Sq NumDF  DenDF F value    Pr(&gt;F)    \ngen 10.679 0.46429    23 34.902  5.4478 4.229e-06 ***\n---\nSignif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n\n\n\n\nanova(mod_alpha1, type = \"sequential\")\n\n            numDF denDF  F-value p-value\n(Intercept)     1    31 470.9507  &lt;.0001\ngen            23    31   5.4478  &lt;.0001\n\n\n\n\n\nLet’s look at the estimated marginal means of yield for each variety (gen).\n\nlme4nlme\n\n\n\nemmeans(mod_alpha, ~ gen)\n\n gen emmean    SE   df lower.CL upper.CL\n G01   5.11 0.279 6.20     4.43     5.78\n G02   4.48 0.279 6.20     3.80     5.15\n G03   3.50 0.279 6.20     2.82     4.18\n G04   4.49 0.279 6.20     3.81     5.17\n G05   5.04 0.278 6.19     4.36     5.71\n G06   4.54 0.278 6.19     3.86     5.21\n G07   4.11 0.279 6.20     3.43     4.79\n G08   4.53 0.279 6.20     3.85     5.20\n G09   3.50 0.278 6.19     2.83     4.18\n G10   4.37 0.279 6.20     3.70     5.05\n G11   4.28 0.279 6.20     3.61     4.96\n G12   4.76 0.279 6.20     4.08     5.43\n G13   4.76 0.278 6.19     4.08     5.43\n G14   4.78 0.278 6.19     4.10     5.45\n G15   4.97 0.278 6.19     4.29     5.65\n G16   4.73 0.279 6.20     4.05     5.41\n G17   4.60 0.278 6.19     3.93     5.28\n G18   4.36 0.279 6.20     3.69     5.04\n G19   4.84 0.278 6.19     4.16     5.52\n G20   4.04 0.278 6.19     3.36     4.72\n G21   4.80 0.278 6.19     4.12     5.47\n G22   4.53 0.278 6.19     3.85     5.20\n G23   4.25 0.278 6.19     3.58     4.93\n G24   4.15 0.279 6.20     3.48     4.83\n\nDegrees-of-freedom method: 
kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemmeans(mod_alpha1, ~ gen)\n\n gen emmean    SE df lower.CL upper.CL\n G01   5.11 0.276  2     3.92     6.30\n G02   4.48 0.276  2     3.29     5.67\n G03   3.50 0.276  2     2.31     4.69\n G04   4.49 0.276  2     3.30     5.68\n G05   5.04 0.276  2     3.85     6.22\n G06   4.54 0.276  2     3.35     5.72\n G07   4.11 0.276  2     2.92     5.30\n G08   4.53 0.276  2     3.34     5.72\n G09   3.50 0.276  2     2.31     4.69\n G10   4.37 0.276  2     3.19     5.56\n G11   4.28 0.276  2     3.10     5.47\n G12   4.76 0.276  2     3.57     5.94\n G13   4.76 0.276  2     3.57     5.95\n G14   4.78 0.276  2     3.59     5.96\n G15   4.97 0.276  2     3.78     6.16\n G16   4.73 0.276  2     3.54     5.92\n G17   4.60 0.276  2     3.42     5.79\n G18   4.36 0.276  2     3.17     5.55\n G19   4.84 0.276  2     3.65     6.03\n G20   4.04 0.276  2     2.85     5.23\n G21   4.80 0.276  2     3.61     5.98\n G22   4.53 0.276  2     3.34     5.72\n G23   4.25 0.276  2     3.06     5.44\n G24   4.15 0.276  2     2.97     5.34\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\n\n\n\n\nJohn, JA, and ER Williams. 1995. Cyclic and Computer Generated Designs. 2nd ed. New York: Chapman; Hall/CRC Press. https://doi.org/10.1201/b15075.\n\n\nPatterson, H. D., and E. R. Williams. 1976. “A New Class of Resolvable Incomplete Block Designs.” Biometrika 63 (1): 83–92. https://doi.org/10.2307/2335087.\n\n\nYates, F. 1936. “A New Method of Arranging Variety Trials Involving a Large Number of Varieties.” J Agric Sci 26: 424–55.",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>9</span>  <span class='chapter-title'>Incomplete Block Design</span>"
    ]
  },
  {
    "objectID": "chapters/latin-design.html",
    "href": "chapters/latin-design.html",
    "title": "10  Latin Square Design",
    "section": "",
    "text": "10.1 Background\nIn the Latin square design, two blocking factors are arranged across the rows and columns of the square. Blocking on two nuisance factors at once removes more of the experimental error than a single blocking factor can. The design requires that each of the t treatments appears exactly once in each row and each column, so the number of replications equals the number of treatments.\nAdvantages of the Latin square design are:\nDisadvantages:\nThe statistical model for a response in a Latin square design is:\n\\(Y_{ijk} = \\mu + \\alpha_i + \\beta_j +  \\gamma_k + \\epsilon_{ijk}\\)\nwhere \\(\\mu\\) is the experiment mean, the \\(\\alpha_i\\) represent treatment effects, and \\(\\beta_j\\) and \\(\\gamma_k\\) are the row- and column-specific effects.\nThe assumptions of this design include normally and independently distributed error terms (\\(\\epsilon_{ijk}\\)) and no interaction between the two blocking factors (rows and columns) and the treatments.",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>10</span>  <span class='chapter-title'>Latin Square Design</span>"
    ]
  },
  {
    "objectID": "chapters/latin-design.html#background",
    "href": "chapters/latin-design.html#background",
    "title": "10  Latin Square Design",
    "section": "",
    "text": "The design is particularly appropriate for comparing t treatment means in the presence of two sources of extraneous variation, each measured at t levels.\nThe analysis is quite simple.\n\n\n\nA Latin square can be constructed for any value of t; however, it is best suited for comparing t treatments when 5 ≤ t ≤ 10.\nAny additional extraneous sources of variability tend to inflate the error term, making it more difficult to detect differences among the treatment means.\nThe effect of each treatment on the response must be approximately the same across the rows and columns.",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>10</span>  <span class='chapter-title'>Latin Square Design</span>"
    ]
  },
  {
    "objectID": "chapters/latin-design.html#example-analysis",
    "href": "chapters/latin-design.html#example-analysis",
    "title": "10  Latin Square Design",
    "section": "10.2 Example Analysis",
    "text": "10.2 Example Analysis\nLet’s start the analysis by loading the required libraries:\n\nlme4nlme\n\n\n\nlibrary(lme4); library(lmerTest); library(emmeans); library(performance)\nlibrary(dplyr); library(broom.mixed); library(agridat); library(desplot)\n\n\n\n\nlibrary(nlme); library(broom.mixed); library(emmeans); library(performance)\nlibrary(dplyr); library(agridat); library(desplot)\n\n\n\n\nThe data used in this example come from the agridat package. In this experiment, 5 treatments (A = Dusted before rains. B = Dusted after rains. C = Dusted once each week. D = Drifting, once each week. E = Not dusted) were tested to control stem rust in wheat.\n\ndat &lt;- agridat::goulden.latin\n\n\nTable of variables in the data set\n\n\ntrt\ntreatment factor, 5 levels\n\n\nrow\nrow position for each plot\n\n\ncol\ncolumn position for each plot\n\n\nyield\nwheat yield\n\n\n\n\n10.2.1 Data integrity checks\nFirst, let’s verify the class of each variable in the data set using the str() function in base R.\n\nstr(dat)\n\n'data.frame':   25 obs. of  4 variables:\n $ trt  : Factor w/ 5 levels \"A\",\"B\",\"C\",\"D\",..: 2 3 4 5 1 4 1 3 2 5 ...\n $ yield: num  4.9 9.3 7.6 5.3 9.3 6.4 4 15.4 7.6 6.3 ...\n $ row  : int  5 4 3 2 1 5 4 3 2 1 ...\n $ col  : int  1 1 1 1 1 2 2 2 2 2 ...\n\n\nHere yield and trt are classified as numeric and factor variables, respectively, as needed. But we need to change ‘row’ and ‘col’ from integer to factor/character.\n\ndat1 &lt;- dat |&gt; \n        mutate(row = as.factor(row),\n               col = as.factor(col))\n\nNext, to verify that the data meet the assumptions of the Latin square design, let’s plot the field layout for this experiment.\n\n\n\n\n\n\n\n\n\nThis looks great! Here we can see that there are an equal number (5) of treatments, rows, and columns. 
Treatments were randomized in such a way that one treatment doesn’t appear more than once in each row and column.\nNext step is to check if there are any missing values in response variable.\n\napply(dat, 2, function(x) sum(is.na(x)))\n\n  trt yield   row   col \n    0     0     0     0 \n\n\nNo missing values detected in this data set.\nBefore fitting the model, let’s create a histogram of response variable to see if there are extreme values.\n\n\n\n\n\n\nHistogram of the dependent variable.\n\n\n\n\nhist(dat$yield, main = \"\", xlab = \"yield\")\n\n\n\n10.2.2 Model fitting\nHere we will fit a model to evaluate the impact of fungicide treatments on wheat yield with trt as a fixed effect and row & col as a random effect.\nVarCorr(m1_b)\n\nlme4nlme\n\n\n\nm1_a &lt;- lmer(yield ~ trt + (1|row) + (1|col),\n           data = dat1,\n           na.action = na.exclude)\nsummary(m1_a) \n\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: yield ~ trt + (1 | row) + (1 | col)\n   Data: dat1\n\nREML criterion at convergence: 89.8\n\nScaled residuals: \n    Min      1Q  Median      3Q     Max \n-1.3994 -0.5383 -0.1928  0.5220  1.8429 \n\nRandom effects:\n Groups   Name        Variance Std.Dev.\n row      (Intercept) 1.8660   1.3660  \n col      (Intercept) 0.2336   0.4833  \n Residual             2.3370   1.5287  \nNumber of obs: 25, groups:  row, 5; col, 5\n\nFixed effects:\n            Estimate Std. Error      df t value Pr(&gt;|t|)    \n(Intercept)   6.8400     0.9420 11.9446   7.261 1.03e-05 ***\ntrtB         -0.3800     0.9669 12.0000  -0.393   0.7012    \ntrtC          6.2800     0.9669 12.0000   6.495 2.96e-05 ***\ntrtD          1.1200     0.9669 12.0000   1.158   0.2692    \ntrtE         -1.9200     0.9669 12.0000  -1.986   0.0704 .  \n---\nSignif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n     (Intr) trtB   trtC   trtD  \ntrtB -0.513                     \ntrtC -0.513  0.500              \ntrtD -0.513  0.500  0.500       \ntrtE -0.513  0.500  0.500  0.500\n\n\n\n\n\nm1_b &lt;- lme(yield ~ trt,\n          random =list(~1|row, ~1|col),\n          data = dat, \n          na.action = na.exclude)\n\nsummary(m1_b)\n\nLinear mixed-effects model fit by REML\n  Data: dat \n       AIC      BIC    logLik\n  106.0974 114.0633 -45.04872\n\nRandom effects:\n Formula: ~1 | row\n        (Intercept)\nStdDev:    1.344469\n\n Formula: ~1 | col %in% row\n        (Intercept) Residual\nStdDev:    1.494696 0.628399\n\nFixed effects:  yield ~ trt \n            Value Std.Error DF   t-value p-value\n(Intercept)  6.84 0.9419764 16  7.261328  0.0000\ntrtB        -0.38 1.0254756 16 -0.370560  0.7158\ntrtC         6.28 1.0254756 16  6.123987  0.0000\ntrtD         1.12 1.0254756 16  1.092176  0.2909\ntrtE        -1.92 1.0254756 16 -1.872302  0.0796\n Correlation: \n     (Intr) trtB   trtC   trtD  \ntrtB -0.544                     \ntrtC -0.544  0.500              \ntrtD -0.544  0.500  0.500       \ntrtE -0.544  0.500  0.500  0.500\n\nStandardized Within-Group Residuals:\n       Min         Q1        Med         Q3        Max \n-0.5686726 -0.2469684 -0.1061146  0.2349101  0.7617205 \n\nNumber of Observations: 25\nNumber of Groups: \n         row col %in% row \n           5           25 \n\n\n\n\n\n\n\n10.2.3 Check Model Assumptions\nThis step involves inspecting the model residuals using the check_model() function from the “performance” package.\n\nlme4nlme\n\n\n\ncheck_model(m1_a, check = c(\"linearity\", \"normality\"))\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(m1_b, check = c(\"linearity\", \"normality\"))\n\n\n\n\n\n\n\n\n\n\n\nThese visuals suggest that the assumptions of the linear model have been met.\n\n\n10.2.4 Inference\nWe can now proceed to the variance partitioning. 
In this case, we will use anova() with type = \"1\" or type = \"sequential\" for lmer() and lme() models, respectively.\n\nlme4nlme\n\n\n\nanova(m1_a, type = \"1\")\n\nType I Analysis of Variance Table with Satterthwaite's method\n    Sum Sq Mean Sq NumDF DenDF F value    Pr(&gt;F)    \ntrt 196.61  49.152     4    12  21.032 2.366e-05 ***\n---\nSignif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n\n\n\n\nanova(m1_b, type = \"sequential\")\n\n            numDF denDF   F-value p-value\n(Intercept)     1    16 132.38123  &lt;.0001\ntrt             4    16  18.69608  &lt;.0001\n\n\n\n\n\nBoth models detected a significant treatment effect: fungicide treatment had a significant impact on crop yield. Let’s have a look at the estimated marginal means of wheat yield for each treatment using the emmeans() function.\n\nlme4nlme\n\n\n\nemmeans(m1_a, ~ trt)\n\n trt emmean    SE   df lower.CL upper.CL\n A     6.84 0.942 11.9     4.79     8.89\n B     6.46 0.942 11.9     4.41     8.51\n C    13.12 0.942 11.9    11.07    15.17\n D     7.96 0.942 11.9     5.91    10.01\n E     4.92 0.942 11.9     2.87     6.97\n\nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemmeans(m1_b, ~ trt)\n\n trt emmean    SE df lower.CL upper.CL\n A     6.84 0.942  4     4.22     9.46\n B     6.46 0.942  4     3.84     9.08\n C    13.12 0.942  4    10.50    15.74\n D     7.96 0.942  4     5.34    10.58\n E     4.92 0.942  4     2.30     7.54\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\nWe see that wheat yield was higher with fungicide treatment ‘C’ than with the other treatments applied in this study, which implies that treatment ‘C’ was more effective in controlling stem rust in wheat.",
    "crumbs": [
      "Experiment designs",
      "<span class='chapter-number'>10</span>  <span class='chapter-title'>Latin Square Design</span>"
    ]
  },
  {
    "objectID": "chapters/repeated-measures.html",
    "href": "chapters/repeated-measures.html",
    "title": "11  Repeated measures mixed models",
    "section": "",
    "text": "12 Example Analysis\nIn the previous chapters we have covered how to run linear mixed models for different experiment designs. All of the examples in those chapters were independent measure designs, where each subject was assigned to a different treatment. Now we will move on to experiments with repeated measures effects.\nStudies that involve repeated observations of the exact same experimental units (or subjects) require a repeated measures component in the analysis to properly model correlations across time within each subject. This is common in any study that is evaluated across different time periods. For example, if samples are collected from the same subject over different time periods, we have to model the repeated measures effect while analyzing the main effects.\nIn these models, the ‘iid’ assumption (independently and identically distributed) is often violated, so we need to introduce specialized covariance structures that can account for these correlations between error terms.\nThere are several types of covariance structures:\nThe repeated measures syntax in nlme follows this convention: corr = corAR1(value = (a value between -1 and 1), form = ~ t|g, fixed = (T or F)).\nOne can use different correlation structure classes such as corAR1(), corCompSymm(), and corSymm().\nThe form argument takes a one-sided formula, ~ t or ~ t|g, specifying a time covariate t and, optionally, a grouping factor g. When we use the ~ t|g form, the correlation structure is assumed to apply only to observations within the same grouping level.\nThe default starting value is zero, and if fixed = FALSE (the current nlme default), this value will be allowed to change during the model fitting process. The covariate for this correlation structure must be an integer value.\nThere are several other options in the nlme machinery (search “cor” for more options and details on the syntax).\nFitting models with correlated observations requires new libraries including mmrm and nlme. 
The lme4 package (lmer()) supports random effects only and cannot fit these residual correlation structures.\nIn this tutorial we will analyze repeated measures data from different experiment designs, including randomized complete block, split plot, and split-split plot designs.\nFor the examples in this chapter, we will fit models using the mmrm and nlme packages. So, let’s start by loading the required libraries for this analysis.\nWe will begin with the first example, a randomized complete block design with repeated measures.",
    "crumbs": [
      "<span class='chapter-number'>11</span>  <span class='chapter-title'>Repeated Measures</span>"
    ]
  },
  {
    "objectID": "chapters/repeated-measures.html#rcbd-repeated-measures",
    "href": "chapters/repeated-measures.html#rcbd-repeated-measures",
    "title": "11  Repeated measures mixed models",
    "section": "12.1 RCBD Repeated Measures",
    "text": "12.1 RCBD Repeated Measures\nThe example shown below contains data from a sorghum trial laid out as a randomized complete block design (5 blocks) with variety (4 varieties) treatment effect. The response variable ‘y’ is the leaf area index assessed in five consecutive weeks on each plot.\nWe need to have time as numeric and factor variable. In the model, to assess the week effect, week was used as a factor (factweek). For the correlation matrix, week needs to be numeric (week).\n\ndat &lt;- agriTutorial::sorghum %&gt;%   \n  mutate(week = as.numeric(factweek),\n         block = as.character(varblock)) \n\n\nTable of variables in the data set\n\n\nblock\nblocking unit\n\n\nReplicate\nreplication unit\n\n\nWeek\nTime points when data was collected\n\n\nvariety\ntreatment factor, 4 levels\n\n\ny\nyield (lbs)\n\n\n\n\n12.1.1 Data Integrity Checks\nLet’s do preliminary data check including evaluating data structure, distribution of treatments, number of missing values, and distribution of response variable.\n\nstr(dat)\n\n'data.frame':   100 obs. 
of  9 variables:\n $ y        : num  5 4.84 4.02 3.75 3.13 4.42 4.3 3.67 3.23 2.83 ...\n $ variety  : Factor w/ 4 levels \"1\",\"2\",\"3\",\"4\": 1 1 1 1 1 1 1 1 1 1 ...\n $ Replicate: Factor w/ 5 levels \"1\",\"2\",\"3\",\"4\",..: 1 1 1 1 1 2 2 2 2 2 ...\n $ factweek : Factor w/ 5 levels \"1\",\"2\",\"3\",\"4\",..: 1 2 3 4 5 1 2 3 4 5 ...\n $ factplot : Factor w/ 20 levels \"1\",\"2\",\"3\",\"4\",..: 1 1 1 1 1 2 2 2 2 2 ...\n $ varweek  : int  1 2 3 4 5 1 2 3 4 5 ...\n $ varblock : int  1 1 1 1 1 2 2 2 2 2 ...\n $ week     : num  1 2 3 4 5 1 2 3 4 5 ...\n $ block    : chr  \"1\" \"1\" \"1\" \"1\" ...\n\n\nIn this data, we have block, factplot, factweek as factor variables and y & week as numeric.\n\ntable(dat$variety, dat$block)\n\n   \n    1 2 3 4 5\n  1 5 5 5 5 5\n  2 5 5 5 5 5\n  3 5 5 5 5 5\n  4 5 5 5 5 5\n\n\nThe cross tabulation shows a equal number of variety treatments in each block.\n\nggplot(data = dat, aes(y = y, x = factweek, fill = variety)) +\n  geom_boxplot() +  \n  #scale_fill_brewer(palette=\"Dark2\") +\n  scale_fill_viridis_d(option = \"F\") +\n    theme_bw()\n\n\n\n\n\n\n\n\nLooks like variety ‘1’ has the lowest yield and showed drastic reduction in yield over weeks compared to other varieties. One last step before we fit model is to look at the distribution of response variable.\n\nhist(dat$y, main = \"\", xlab = \"yield\")\n\n\n\n\n\n\n\n\n\n\nFigure 12.1: Histogram of the dependent variable.\n\n\n\n\n\n\n12.1.2 Model Building\nLet’s fit the basic model first using lme() from the nlme package.\n\nlm1 &lt;- lme(y ~ variety + factweek + variety:factweek,\n           random = ~1|block/factplot,\n           data = dat,\n           na.action = na.exclude)\n\nThe model fitted above doesn’t account for the repeated measures effect. 
To account for the variation caused by repeated measurements, we can model the correlation among responses for a given subject, which is the plot (a factor variable) in this case.\nBy adding this correlation structure, we are accounting for variation caused by repeated measurements over weeks for each plot. The AR1 structure assumes that observations taken closer together in time are more strongly correlated, whereas the compound symmetry structure assumes that the correlation is equal for all time gaps. Here, we will fit the model with both correlation structures and compare the models to find the best-fitting one.\nIn this analysis, the time variable is week, and it must be numeric.\n\ncs1 &lt;- corAR1(form = ~ week|block/factplot,  value = 0.2, fixed = FALSE)\ncs2 &lt;- corCompSymm(form = ~ week|block/factplot,  value = 0.2, fixed = FALSE)\n\nIn the code chunk above, we defined two correlation structures: AR1 and compound symmetry. Next we will update the model lm1 with each of these structures. See the nlme help pages to learn more about the functions for different correlation structure classes.\n\nlm2 &lt;- update(lm1, corr = cs1)\nlm3 &lt;- update(lm1, corr= cs2)\n\nNow let’s compare how model fit differs among models with no correlation structure (lm1), with an AR1 correlation structure (lm2), and with a compound symmetry structure (lm3). 
We will compare these models by using anova() or by compare_performance() function from the ‘performance’ library.\n\nanovaperformance\n\n\n\nanova(lm1, lm2, lm3)\n\n    Model df       AIC      BIC   logLik   Test  L.Ratio p-value\nlm1     1 23 18.837478 73.62409 13.58126                        \nlm2     2 24 -2.347391 54.82125 25.17370 1 vs 2 23.18487  &lt;.0001\nlm3     3 24 20.837478 78.00612 13.58126                        \n\n\n\n\n\nresult &lt;- compare_performance(lm1, lm2, lm3)\n\nSome of the nested models seem to be identical and probably only vary in\n  their random effects.\n\nprint_md(result)\n\n\nComparison of Model Performance Indices\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nModel\nAIC (weights)\nAICc (weights)\nBIC (weights)\nR2 (cond.)\nR2 (marg.)\nICC\nRMSE\nSigma\n\n\n\n\nlm1\nlme\n-50.5 (&lt;.001)\n-36.0 (&lt;.001)\n9.4 (&lt;.001)\n0.99\n0.37\n0.98\n0.10\n0.13\n\n\nlm2\nlme\n-77.5 (&gt;.999)\n-61.5 (&gt;.999)\n-15.0 (&gt;.999)\n0.97\n0.41\n0.95\n0.15\n0.18\n\n\nlm3\nlme\n-48.5 (&lt;.001)\n-32.5 (&lt;.001)\n14.0 (&lt;.001)\n0.98\n0.37\n0.98\n0.11\n0.14\n\n\n\n\n\n\n\n\nWe prefer to chose model with lower AIC and BIC values. 
In this scenario, we will move forward with lm2 model containing AR1 structure.\nLet’s run a tidy() on lm2 model to look at the estimates for random and fixed effects.\n\ntidy(lm2)\n\nWarning in tidy.lme(lm2): ran_pars not yet implemented for multiple levels of\nnesting\n\n\n# A tibble: 20 × 7\n   effect term               estimate std.error    df statistic  p.value\n   &lt;chr&gt;  &lt;chr&gt;                 &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt;\n 1 fixed  (Intercept)          4.24      0.291     64    14.6   5.44e-22\n 2 fixed  variety2             0.906     0.114     12     7.94  4.05e- 6\n 3 fixed  variety3             0.646     0.114     12     5.66  1.05e- 4\n 4 fixed  variety4             0.912     0.114     12     8.00  3.78e- 6\n 5 fixed  factweek2           -0.196     0.0571    64    -3.44  1.04e- 3\n 6 fixed  factweek3           -0.836     0.0755    64   -11.1   1.60e-16\n 7 fixed  factweek4           -1.16      0.0867    64   -13.3   4.00e-20\n 8 fixed  factweek5           -1.54      0.0943    64   -16.3   1.57e-24\n 9 fixed  variety2:factweek2   0.0280    0.0807    64     0.347 7.30e- 1\n10 fixed  variety3:factweek2   0.382     0.0807    64     4.73  1.26e- 5\n11 fixed  variety4:factweek2  -0.0140    0.0807    64    -0.174 8.63e- 1\n12 fixed  variety2:factweek3   0.282     0.107     64     2.64  1.03e- 2\n13 fixed  variety3:factweek3   0.662     0.107     64     6.20  4.55e- 8\n14 fixed  variety4:factweek3   0.388     0.107     64     3.64  5.55e- 4\n15 fixed  variety2:factweek4   0.228     0.123     64     1.86  6.77e- 2\n16 fixed  variety3:factweek4   0.744     0.123     64     6.06  7.86e- 8\n17 fixed  variety4:factweek4   0.390     0.123     64     3.18  2.28e- 3\n18 fixed  variety2:factweek5   0.402     0.133     64     3.01  3.70e- 3\n19 fixed  variety3:factweek5   0.672     0.133     64     5.04  4.11e- 6\n20 fixed  variety4:factweek5   0.222     0.133     64     1.66  1.01e- 1\n\n\n\n\n12.1.3 Check Model 
Assumptions\n\ncheck_model(lm2, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n12.1.4 Inference\nThe ANOVA table suggests a significant effect of the variety, week, and variety x week interaction effect.\n\nanova(lm2, type = \"marginal\")\n\n                 numDF denDF   F-value p-value\n(Intercept)          1    64 212.10509  &lt;.0001\nvariety              3    12  28.28895  &lt;.0001\nfactweek             4    64  74.79758  &lt;.0001\nvariety:factweek    12    64   7.03546  &lt;.0001\n\n\nWe can estimate the marginal means for variety and week effect and their interaction using emmeans() function.\n\nmean_1 &lt;- emmeans(lm2, ~ variety)\n\nNOTE: Results may be misleading due to involvement in interactions\n\nmean_1\n\n variety emmean    SE df lower.CL upper.CL\n 1         3.50 0.288  4     2.70     4.29\n 2         4.59 0.288  4     3.79     5.39\n 3         4.63 0.288  4     3.84     5.43\n 4         4.61 0.288  4     3.81     5.40\n\nResults are averaged over the levels of: factweek \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\nmean_2 &lt;- emmeans(lm2, ~ variety*factweek)\nmean_2\n\n variety factweek emmean    SE df lower.CL upper.CL\n 1       1          4.24 0.291  4     3.43     5.05\n 2       1          5.15 0.291  4     4.34     5.96\n 3       1          4.89 0.291  4     4.08     5.70\n 4       1          5.15 0.291  4     4.35     5.96\n 1       2          4.05 0.291  4     3.24     4.85\n 2       2          4.98 0.291  4     4.17     5.79\n 3       2          5.07 0.291  4     4.27     5.88\n 4       2          4.94 0.291  4     4.14     5.75\n 1       3          3.41 0.291  4     2.60     4.21\n 2       3          4.59 0.291  4     3.79     5.40\n 3       3          4.71 0.291  4     3.91     5.52\n 4       3          4.71 0.291  4     3.90     5.51\n 1       4          3.09 0.291  4     2.28     3.89\n 2       4          4.22 0.291  4     3.41     5.03\n 3       4          4.48 0.291  4     3.67     5.28\n 4   
    4          4.39 0.291  4     3.58     5.20\n 1       5          2.70 0.291  4     1.89     3.51\n 2       5          4.01 0.291  4     3.20     4.82\n 3       5          4.02 0.291  4     3.21     4.83\n 4       5          3.83 0.291  4     3.03     4.64\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\n\n\n\nTime variable\n\n\n\nHere is a quick check to make sure you are fitting the model correctly: your data should have two time variables, one numeric (e.g. ‘day’ as a number) and the other a factor/character (e.g. ‘day_factor’). The numeric variable is used for fitting the correlation structure, and the factor/character variable is used in the model statement to evaluate the effect of time on the response variable.",
    "crumbs": [
      "<span class='chapter-number'>11</span>  <span class='chapter-title'>Repeated Measures</span>"
    ]
  },
  {
    "objectID": "chapters/repeated-measures.html#split-plot-repeated-measures",
    "href": "chapters/repeated-measures.html#split-plot-repeated-measures",
    "title": "11  Repeated measures mixed models",
    "section": "12.2 Split Plot Repeated Measures",
    "text": "12.2 Split Plot Repeated Measures\nRecall that we evaluated the split plot design in Chapter 5. In this example we will use the same methodology used in Chapter 5 and update it with a repeated measures component.\nNext, let’s load the “Yield” data. It is located here.\n\nYield &lt;- read.csv(here::here(\"data/Yield.csv\"))\n\nThis example contains yield data in a split-plot design. The yield data were collected repeatedly from the same Reps over 5 Sample_times. In this data set, we have:\n\nTable of variables in the data set\n\n\nRep\nreplication unit\n\n\nVariety\nMain plot, 2 levels\n\n\nFertilizer\nSplit plot, 3 levels\n\n\nYield\ncrop yield\n\n\nSample_time\ntime points for data collection\n\n\n\n\n12.2.1 Data Integrity Checks\nFirst, we need to look at the class of each variable in the data set.\n\nstr(Yield)\n\n'data.frame':   120 obs. of  6 variables:\n $ Sample_time: int  1 1 1 1 1 1 1 1 1 1 ...\n $ Variety    : chr  \"VAR1\" \"VAR1\" \"VAR1\" \"VAR1\" ...\n $ Fertilizer : int  1 1 1 1 2 2 2 2 3 3 ...\n $ Rep        : int  1 2 3 4 1 2 3 4 1 2 ...\n $ pH         : num  7.07 7.06 7.08 7.09 7.13 7.12 7.15 7.14 7.18 7.18 ...\n $ Yield      : num  0.604 0.595 3.145 3.091 2.415 ...\n\n\nWe will now convert Fertilizer and Rep into factors. In addition, we need to create a new factor variable (Sample_time1) to analyze the time effect.\n\n\nFor lme(), independent variables in character or factor form work fine. But for mmrm(), independent variables must be factors. Thus, for the sake of consistency, we will use independent variables of class factor.\n\nYield$Variety &lt;- factor(Yield$Variety) \nYield$Fertilizer &lt;- factor(Yield$Fertilizer) \nYield$Sample_time1 &lt;- factor(Yield$Sample_time) \nYield$Rep &lt;- factor(Yield$Rep)  \n\nTo fit the model, Variety, Fertilizer, and Sample_time must be factors, as converted above. In addition, we need to create a new variable named ‘plot’ with a unique value for each plot. 
Each plot is a subject in this analysis, so this variable identifies the unit on which the repeated measurements were taken. The plot variable is needed to model the variation in each plot over the sampling time. The plot will be used as a subject with repeated measures. The subject variable can be a factor or numeric, but the time variable (e.g. year or Sample_time) must be a factor for mmrm().\n\n##creating a plot variable \nYield$plot &lt;- factor(paste(Yield$Rep, Yield$Fertilizer, Yield$Variety, sep='-')) \nYield$Rep2 &lt;- factor(paste(Yield$Rep, Yield$Variety, sep='-')) \ntable(Yield$plot) \n\n\n1-1-VAR1 1-1-VAR2 1-2-VAR1 1-2-VAR2 1-3-VAR1 1-3-VAR2 2-1-VAR1 2-1-VAR2 \n       5        5        5        5        5        5        5        5 \n2-2-VAR1 2-2-VAR2 2-3-VAR1 2-3-VAR2 3-1-VAR1 3-1-VAR2 3-2-VAR1 3-2-VAR2 \n       5        5        5        5        5        5        5        5 \n3-3-VAR1 3-3-VAR2 4-1-VAR1 4-1-VAR2 4-2-VAR1 4-2-VAR2 4-3-VAR1 4-3-VAR2 \n       5        5        5        5        5        5        5        5 \n\n\n\ntable(Yield$Fertilizer, Yield$Variety) \n\n   \n    VAR1 VAR2\n  1   20   20\n  2   20   20\n  3   20   20\n\n\nThis looks like a well-balanced design with 2 variety treatments and 3 fertilizer treatments.\nBefore fitting a model, let’s check the distribution of the response variable.\n\n\n\n\n\n\n\n\n\nFigure 12.2: Histogram of the dependent variable.\n\n\n\n\n\nhist(Yield$Yield)\n\n\n\n12.2.2 Model fit\nThis data can be analyzed using either nlme or mmrm.\nLet’s say we want to fit a model using the AR1 structure as shown in the RCBD repeated measures example. Previously, we used lme() from the nlme package to fit the model. In this example, along with lme() we will also use the mmrm() function from the mmrm package. In addition, instead of the summary() function we will use the tidy() function from the ‘broom.mixed’ package to look at estimates of fixed and random effects. 
This will generate a tidy workflow in particular by providing standardized verbs that provide information on estimates, standard errors, confidence intervals, etc.\n\nnlmemmrm\n\n\n\ncorr_str1 = corAR1(form = ~ Sample_time|Rep/Variety/plot, value = 0.2, fixed = FALSE)\n\nfit1 &lt;- lme(Yield ~ Sample_time1*Variety*Fertilizer,\n                random = ~ 1|Rep/Variety/plot,\n                corr= corr_str1,\n                data = Yield, na.action= na.exclude)\ntidy(fit1)\n\n# A tibble: 30 × 7\n   effect term                      estimate std.error    df statistic   p.value\n   &lt;chr&gt;  &lt;chr&gt;                        &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;\n 1 fixed  (Intercept)                  1.86      0.708    72     2.63  0.0105   \n 2 fixed  Sample_time12                0.515     0.688    72     0.748 0.457    \n 3 fixed  Sample_time13                0.787     0.674    72     1.17  0.247    \n 4 fixed  Sample_time14                1.35      0.675    72     2.00  0.0496   \n 5 fixed  Sample_time15                2.84      0.675    72     4.21  0.0000731\n 6 fixed  VarietyVAR2                 -0.996     0.861     3    -1.16  0.331    \n 7 fixed  Fertilizer2                  1.27      0.861    12     1.47  0.167    \n 8 fixed  Fertilizer3                  2.07      0.861    12     2.40  0.0333   \n 9 fixed  Sample_time12:VarietyVAR2    0.739     0.974    72     0.759 0.451    \n10 fixed  Sample_time13:VarietyVAR2    0.269     0.954    72     0.282 0.779    \n# ℹ 20 more rows\n\n\n\n\n\nfit2 &lt;- mmrm(formula = Yield ~ Sample_time1*Variety*Fertilizer +  \n             ar1(Sample_time1|Rep/plot),\n             data = Yield)\n\ntidy(fit2)\n\n# A tibble: 30 × 6\n   term                      estimate std.error    df statistic   p.value\n   &lt;chr&gt;                        &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;\n 1 (Intercept)                  2.86      0.464 12.7      6.16  0.0000387\n 2 
Sample_time12                0.656     0.310  1.81     2.12  0.182    \n 3 Sample_time13                1.40      0.414  2.29     3.39  0.0636   \n 4 Sample_time14                1.46      0.484  2.87     3.01  0.0605   \n 5 Sample_time15                2.47      0.549  3.14     4.50  0.0186   \n 6 VarietyVAR2                 -1.07      0.656 12.7     -1.63  0.128    \n 7 Fertilizer2                  1.67      0.656 12.7      2.55  0.0245   \n 8 Fertilizer3                  0.595     0.656 12.7      0.908 0.381    \n 9 Sample_time12:VarietyVAR2   -0.591     0.438  1.81    -1.35  0.321    \n10 Sample_time13:VarietyVAR2   -0.412     0.586  2.29    -0.704 0.546    \n# ℹ 20 more rows\n\n\n\n12.2.3 Model diagnostics\nWe will use check_model() from the ‘performance’ package to evaluate the fit of the model fitted using nlme (fit1). However, the mmrm model class doesn’t work with the performance package, so we will evaluate the diagnostics for the mmrm model (fit2) by plotting the residuals using base R functions.\n\ncheck_model(fit1, check = c('normality', 'linearity'))\n\n\n\n# residuals against fitted values, then a normal Q-Q plot of the residuals\nplot(fitted(fit2), residuals(fit2), xlab = \"fitted values\", ylab = \"residuals\")\nqqnorm(residuals(fit2)); qqline(residuals(fit2))\n\n\n\nThese diagnostic plots look great! The linearity and homogeneity of variance plots show no trend. 
The normal Q-Q plots for the overall residuals and for the random effects fall on a straight line so we can be satisfied with that.\n\n\n12.2.4 Inference\n\nnlmemmrm\n\n\n\nanova(fit1, type = \"marginal\")\n\n                                numDF denDF  F-value p-value\n(Intercept)                         1    72 6.899272  0.0105\nSample_time1                        4    72 5.318690  0.0008\nVariety                             1     3 1.338879  0.3310\nFertilizer                          2    12 2.936073  0.0916\nSample_time1:Variety                4    72 0.998154  0.4143\nSample_time1:Fertilizer             8    72 8.158884  &lt;.0001\nVariety:Fertilizer                  2    12 0.237417  0.7923\nSample_time1:Variety:Fertilizer     8    72 0.731698  0.6631\n\n\n\n\n\n#car::Anova(fit2, type = \"III\")\n#Anova.mmrm(fit2, type = \"III\")\n\n\n\n\nThe ANOVA showed a significant effect of Sample_time and Sample_time x Fertilizer interaction effect.\nNext, we can estimate marginal means and confidence intervals for the independent variables using emmeans().\n\nnlmemmrm\n\n\n\nemmeans(fit1,~ Sample_time1)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n Sample_time1 emmean    SE df lower.CL upper.CL\n 1              2.65 0.438  3     1.25     4.04\n 2              4.40 0.438  3     3.01     5.79\n 3              5.53 0.438  3     4.13     6.92\n 4              7.26 0.438  3     5.87     8.66\n 5              8.82 0.438  3     7.42    10.21\n\nResults are averaged over the levels of: Variety, Fertilizer \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\nemmeans(fit1,~ Sample_time1|Fertilizer)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\nFertilizer = 1:\n Sample_time1 emmean    SE df lower.CL upper.CL\n 1              1.36 0.562  3   -0.427     3.15\n 2              2.25 0.562  3    0.458     4.03\n 3              2.28 0.562  3    0.495     4.07\n 4              2.65 0.562  3    0.861     
4.44\n 5              3.66 0.562  3    1.874     5.45\n\nFertilizer = 2:\n Sample_time1 emmean    SE df lower.CL upper.CL\n 1              3.04 0.562  3    1.248     4.82\n 2              5.17 0.562  3    3.383     6.96\n 3              6.46 0.562  3    4.668     8.24\n 4              8.72 0.562  3    6.935    10.51\n 5             10.09 0.562  3    8.304    11.88\n\nFertilizer = 3:\n Sample_time1 emmean    SE df lower.CL upper.CL\n 1              3.55 0.562  3    1.762     5.34\n 2              5.78 0.562  3    3.995     7.57\n 3              7.84 0.562  3    6.051     9.63\n 4             10.42 0.562  3    8.630    12.21\n 5             12.69 0.562  3   10.905    14.48\n\nResults are averaged over the levels of: Variety \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\nemmeans(fit2,~ Sample_time1)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n Sample_time1 emmean    SE   df lower.CL upper.CL\n 1              3.43 0.189 12.7     3.02     3.84\n 2              5.21 0.169 12.7     4.84     5.58\n 3              6.59 0.163 11.9     6.23     6.94\n 4              7.96 0.169 12.7     7.60     8.33\n 5              9.65 0.189 12.7     9.24    10.06\n\nResults are averaged over the levels of: Variety, Fertilizer, Rep \nConfidence level used: 0.95 \n\n emmeans(fit2,~ Sample_time1|Fertilizer)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\nFertilizer = 1:\n Sample_time1 emmean    SE   df lower.CL upper.CL\n 1              2.32 0.328 12.7     1.61     3.03\n 2              2.68 0.293 12.7     2.05     3.32\n 3              3.52 0.283 11.9     2.90     4.14\n 4              3.54 0.293 12.7     2.91     4.18\n 5              4.27 0.328 12.7     3.56     4.98\n\nFertilizer = 2:\n Sample_time1 emmean    SE   df lower.CL upper.CL\n 1              4.26 0.328 12.7     3.55     4.97\n 2              6.37 0.293 12.7     5.74     7.01\n 3              7.82 0.283 11.9     7.21     8.44\n 4         
     9.66 0.293 12.7     9.03    10.30\n 5             11.54 0.328 12.7    10.83    12.25\n\nFertilizer = 3:\n Sample_time1 emmean    SE   df lower.CL upper.CL\n 1              3.70 0.328 12.7     2.99     4.41\n 2              6.58 0.293 12.7     5.94     7.21\n 3              8.42 0.283 11.9     7.81     9.04\n 4             10.69 0.293 12.7    10.05    11.32\n 5             13.14 0.328 12.7    12.43    13.85\n\nResults are averaged over the levels of: Variety, Rep \nConfidence level used: 0.95 \n\n\n\n\n\n\n\nTo explore more about contrasts and emmeans please refer to Chapter 12.",
    "crumbs": [
      "<span class='chapter-number'>11</span>  <span class='chapter-title'>Repeated Measures</span>"
    ]
  },
  {
    "objectID": "chapters/repeated-measures.html#split-split-plot-repeated-measures",
    "href": "chapters/repeated-measures.html#split-split-plot-repeated-measures",
    "title": "11  Repeated measures mixed models",
    "section": "12.3 Split-split plot repeated measures",
    "text": "12.3 Split-split plot repeated measures\nRecall, we have evaluated the split-split experiment design in Chapter 5, where we had a one factor in main-plot, other in subplot and the third factor in sub-subplot. In this example we will be adding a repeated measures compoenet to the split-split plot design.\n\nphos &lt;- read.csv(here::here(\"data\", \"split_split_repeated.csv\"))\n\n\n\n\nplot\nexperimental unit\n\n\nblock\nreplication unit\n\n\nPtrt\nMain plot, 2 levels\n\n\nInoc\nSplit plot, 2 levels\n\n\nCv\nSplit-split plot, 5 levels\n\n\ntime\ntime points for data collection\n\n\nP_leaf\nleaf phosphorous content\n\n\n\n\n12.3.1 Data Integrity Checks\n\nstr(phos)\n\n'data.frame':   240 obs. of  7 variables:\n $ plot  : int  1 1 1 2 2 2 3 3 3 4 ...\n $ bloc  : int  1 1 1 1 1 1 1 1 1 1 ...\n $ Ptrt  : chr  \"high\" \"high\" \"high\" \"high\" ...\n $ Inoc  : chr  \"none\" \"none\" \"none\" \"none\" ...\n $ Cv    : chr  \"LOUISE\" \"LOUISE\" \"LOUISE\" \"BlancaG\" ...\n $ time  : chr  \"PT1\" \"PT2\" \"PT3\" \"PT1\" ...\n $ P_leaf: num  3154 2331 247 3016 2160 ...\n\n\n\nphos$time = as.factor(phos$time)\nphos1 &lt;- phos %&gt;%   \n  mutate(time1 = as.numeric(time),\n        rep = as.character(bloc),\n        plot = as.character(plot)) \n\n\ntable(phos1$Ptrt, phos1$Inoc, phos1$Cv) \n\n, ,  = ALPOWA\n\n      \n       myco none\n  high   12   12\n  low    12   12\n\n, ,  = BlancaG\n\n      \n       myco none\n  high   12   12\n  low    12   12\n\n, ,  = LOUISE\n\n      \n       myco none\n  high   12   12\n  low    12   12\n\n, ,  = OTIS\n\n      \n       myco none\n  high   12   12\n  low    12   12\n\n, ,  = WALWORTH\n\n      \n       myco none\n  high   12   12\n  low    12   12\n\n\nLooks like a well balanced design with 2 variety treatments and 3 fertilizer treatments.\nBefore fitting a model, let’s check the distribution of the response variable.\n\n\n\n\n\n\n\n\n\nFigure 12.3: Histogram of the dependent 
variable.\n\n\n\n\n\nhist(phos1$P_leaf)\n\n\n\n12.3.2 Model fit\n\ncorr_str1 = corAR1(form = ~ time1|rep/Ptrt/Inoc/plot, value = 0.2, fixed = FALSE)\n\nfit1 &lt;- lme(P_leaf ~ time*Ptrt*Inoc*Cv,\n                random = ~ 1|rep/Ptrt/Inoc/plot,\n                corr= corr_str1,\n                data = phos1, na.action= na.exclude)\ntidy(fit1)\n\n# A tibble: 60 × 7\n   effect term            estimate std.error    df statistic  p.value\n   &lt;chr&gt;  &lt;chr&gt;              &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt;\n 1 fixed  (Intercept)      3175.        82.6   120   38.4    2.63e-69\n 2 fixed  timePT2          -866.        91.6   120   -9.46   3.41e-16\n 3 fixed  timePT3         -3015.        96.9   120  -31.1    2.66e-59\n 4 fixed  Ptrtlow          -185.       101.      3   -1.84   1.64e- 1\n 5 fixed  Inocnone          129.        97.6     6    1.33   2.33e- 1\n 6 fixed  CvBlancaG          48.4       97.6    48    0.496  6.22e- 1\n 7 fixed  CvLOUISE          -23.2       97.6    48   -0.238  8.13e- 1\n 8 fixed  CvOTIS              2.49      97.6    48    0.0255 9.80e- 1\n 9 fixed  CvWALWORTH       -413.        97.6    48   -4.23   1.03e- 4\n10 fixed  timePT2:Ptrtlow   104.       129.    
120    0.800  4.25e- 1\n# ℹ 50 more rows\n\n\n\ncheck_model(fit1, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\nWe see a cluster of values in residuals which was due to large number of observations having low values.\n\nanova(fit1, type = \"marginal\")\n\n                  numDF denDF   F-value p-value\n(Intercept)           1   120 1477.6555  &lt;.0001\ntime                  2   120  518.4625  &lt;.0001\nPtrt                  1     3    3.3729  0.1636\nInoc                  1     6    1.7592  0.2330\nCv                    4    48    7.5577  0.0001\ntime:Ptrt             2   120    0.6765  0.5103\ntime:Inoc             2   120    2.2797  0.1067\nPtrt:Inoc             1     6    2.4771  0.1666\ntime:Cv               8   120    2.4426  0.0175\nPtrt:Cv               4    48    0.5051  0.7321\nInoc:Cv               4    48    2.1222  0.0925\ntime:Ptrt:Inoc        2   120    0.8339  0.4369\ntime:Ptrt:Cv          8   120    0.2320  0.9843\ntime:Inoc:Cv          8   120    1.0401  0.4100\nPtrt:Inoc:Cv          4    48    0.4733  0.7551\ntime:Ptrt:Inoc:Cv     8   120    0.4155  0.9098\n\n\n\nemmeans(fit1,~ time)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n time emmean   SE df lower.CL upper.CL\n PT1    3096 46.2  3   2948.7     3242\n PT2    2270 46.2  3   2122.7     2416\n PT3     198 46.2  3     50.8      345\n\nResults are averaged over the levels of: Ptrt, Inoc, Cv \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\nemmeans(fit1,~ time|Cv)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\nCv = ALPOWA:\n time emmean   SE df lower.CL upper.CL\n PT1    3201 55.5  3  3024.57     3378\n PT2    2225 55.5  3  2047.87     2401\n PT3     178 55.5  3     1.53      355\n\nCv = BlancaG:\n time emmean   SE df lower.CL upper.CL\n PT1    3183 55.5  3  3006.50     3360\n PT2    2334 55.5  3  2157.45     2511\n PT3     210 55.5  3    32.95      386\n\nCv = LOUISE:\n time emmean   SE df lower.CL 
upper.CL\n PT1    3121 55.5  3  2944.36     3298\n PT2    2366 55.5  3  2189.56     2543\n PT3     174 55.5  3    -2.43      351\n\nCv = OTIS:\n time emmean   SE df lower.CL upper.CL\n PT1    3228 55.5  3  3051.65     3405\n PT2    2253 55.5  3  2076.66     2430\n PT3     234 55.5  3    56.86      410\n\nCv = WALWORTH:\n time emmean   SE df lower.CL upper.CL\n PT1    2744 55.5  3  2567.30     2921\n PT2    2170 55.5  3  1992.90     2346\n PT3     193 55.5  3    15.88      369\n\nResults are averaged over the levels of: Ptrt, Inoc \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\nLeaf P content was very low at PT3 in all the cultivars.\nThe biggest advantage of mixed models is their incredible flexibility. They handle clustered individuals as well as repeated measures (even in the same model). They handle crossed random factors as well as nested ones.\nThe biggest disadvantage of mixed models, at least for someone new to them, is their incredible flexibility. It’s easy to mis-specify a mixed model, and this is a place where a little knowledge is definitely dangerous.",
    "crumbs": [
      "<span class='chapter-number'>11</span>  <span class='chapter-title'>Repeated Measures</span>"
    ]
  },
  {
    "objectID": "chapters/means-and-contrasts.html",
    "href": "chapters/means-and-contrasts.html",
    "title": "12  Marginal Means & Contrasts",
    "section": "",
    "text": "12.1 Background\nTo start off with, we need to define estimated marginal means (EMM). Estimated marginal means are defined as marginal means of a variable across all levels of other variables in a model, essentially giving a “population-level” average.\nThe emmeans package is one of the most commonly used package in R in determine EMMs. This package provides methods for obtaining EMMs (also known as least-squares means) for factor combinations in a variety of models. The emmeans package is one of several alternatives to facilitate post hoc methods application and contrast analysis. It is a relatively recent replacement for the lsmeans package that some R users may be familiar with. It is intended for use with a wide variety of ANOVA models, including repeated measures and nested designs (mixed models). This is a flexible package that comes with a set of detailed vignettes and works with a lot of different model objects.\nIn this chapter, we will demonstrate the extended use of the emmeans package to calculate estimated marginal means and contrasts.\nTo demonstrate the use of the emmeans package. We will pull the model from split plot lesson (Chapter 6), where we evaluated the effect of Nitrogen and Variety on Oat yield. This data contains 6 blocks, 3 main plots (Variety) and 4 subplots (Nitrogen). The primary outcome variable was oat yield. To read more about the experiment layout details please read RCBD split-plot section in Chapter 6.\nLet’s start the analysis by loading the required libraries for fitting linear mixed models using nlme package.",
    "crumbs": [
      "<span class='chapter-number'>12</span>  <span class='chapter-title'>Marginal Means and Contrasts</span>"
    ]
  },
  {
    "objectID": "chapters/means-and-contrasts.html#background",
    "href": "chapters/means-and-contrasts.html#background",
    "title": "12  Marginal Means & Contrasts",
    "section": "",
    "text": "Marginal means using lmer and nlme\n\n\n\nFor demonstration of the emmeans package, we are fitting model with nlme package. Please note that code below calculating marginal means works for both lmer and nlme models.",
    "crumbs": [
      "<span class='chapter-number'>12</span>  <span class='chapter-title'>Marginal Means and Contrasts</span>"
    ]
  },
  {
    "objectID": "chapters/means-and-contrasts.html#analysis-examples",
    "href": "chapters/means-and-contrasts.html#analysis-examples",
    "title": "12  Marginal Means & Contrasts",
    "section": "12.2 Analysis Examples",
    "text": "12.2 Analysis Examples\nWe will start with loading required R libraries for this analysis.\n\nlibrary(nlme); library(performance); library(emmeans)\nlibrary(dplyr); library(broom.mixed); library(multcompView)\nlibrary(multcomp); library(ggplot2)\n\n\n12.2.1 Import data\nLet’s import oats data from the MASS package.\n\ndata1 &lt;- MASS::oats\n\n\n\nTo read more about data and model fitting explanation please refer to Chapter 6.\n\n\n12.2.2 Model fitting\n\nmodel1 &lt;- lme(Y ~  V + N + V:N ,\n                  random = ~1|B/V,\n                  data = data1, \n                  na.action = na.exclude)\ntidy(model1)\n\nWarning in tidy.lme(model1): ran_pars not yet implemented for multiple levels\nof nesting\n\n\n# A tibble: 12 × 7\n   effect term                estimate std.error    df statistic  p.value\n   &lt;chr&gt;  &lt;chr&gt;                  &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt;\n 1 fixed  (Intercept)           80          9.11    45    8.78   2.56e-11\n 2 fixed  VMarvellous            6.67       9.72    10    0.686  5.08e- 1\n 3 fixed  VVictory              -8.50       9.72    10   -0.875  4.02e- 1\n 4 fixed  N0.2cwt               18.5        7.68    45    2.41   2.02e- 2\n 5 fixed  N0.4cwt               34.7        7.68    45    4.51   4.58e- 5\n 6 fixed  N0.6cwt               44.8        7.68    45    5.84   5.48e- 7\n 7 fixed  VMarvellous:N0.2cwt    3.33      10.9     45    0.307  7.60e- 1\n 8 fixed  VVictory:N0.2cwt      -0.333     10.9     45   -0.0307 9.76e- 1\n 9 fixed  VMarvellous:N0.4cwt   -4.17      10.9     45   -0.383  7.03e- 1\n10 fixed  VVictory:N0.4cwt       4.67      10.9     45    0.430  6.70e- 1\n11 fixed  VMarvellous:N0.6cwt   -4.67      10.9     45   -0.430  6.70e- 1\n12 fixed  VVictory:N0.6cwt       2.17      10.9     45    0.199  8.43e- 1\n\n\n\n\n12.2.3 Check Model Assumptions\n\ncheck_model(model1, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\nResiduals look good with a small hump 
in the middle, and the normality plot looks reasonable.\n\nModel Inference\n\nanova(model1, type = \"marginal\")\n\n            numDF denDF  F-value p-value\n(Intercept)     1    45 77.16729  &lt;.0001\nV               2    10  1.22454  0.3344\nN               3    45 13.02273  &lt;.0001\nV:N             6    45  0.30282  0.9322\n\n\nThe analysis of variance showed a significant N effect and no significant effect of V or V x N on oat yield.\n\n\n12.2.4 Estimated Marginal Means\nNow that we have fitted a linear mixed model (model1) and it meets the model assumptions, let’s use the emmeans() function to obtain estimated marginal means for main (variety and nitrogen) and interaction (variety x nitrogen) effects.\n\n12.2.4.1 Main effects\n\nm1 &lt;- emmeans(model1, ~V, level = 0.95)\n\nNOTE: Results may be misleading due to involvement in interactions\n\nm1\n\n V           emmean  SE df lower.CL upper.CL\n Golden.rain  104.5 7.8  5     84.5      125\n Marvellous   109.8 7.8  5     89.7      130\n Victory       97.6 7.8  5     77.6      118\n\nResults are averaged over the levels of: N \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\nm2 &lt;- emmeans(model1, ~N)\n\nNOTE: Results may be misleading due to involvement in interactions\n\nm2\n\n N      emmean   SE df lower.CL upper.CL\n 0.0cwt   79.4 7.17  5     60.9     97.8\n 0.2cwt   98.9 7.17  5     80.4    117.3\n 0.4cwt  114.2 7.17  5     95.8    132.7\n 0.6cwt  123.4 7.17  5    104.9    141.8\n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\nMake sure to read and interpret EMMs carefully. Here, when we calculated EMMs for the main effects of V and N, these were averaged over the levels of the other factor in the experiment. For example, the estimated mean for each variety was averaged over its N treatments.\n\n\n12.2.4.2 Interaction effects\nNow let’s evaluate the EMMs for the interaction effect of V and N. 
These can be calculated either using V*N or V|N.\n\nm3 &lt;- emmeans(model1, ~V*N)\nm3\n\n V           N      emmean   SE df lower.CL upper.CL\n Golden.rain 0.0cwt   80.0 9.11  5     56.6    103.4\n Marvellous  0.0cwt   86.7 9.11  5     63.3    110.1\n Victory     0.0cwt   71.5 9.11  5     48.1     94.9\n Golden.rain 0.2cwt   98.5 9.11  5     75.1    121.9\n Marvellous  0.2cwt  108.5 9.11  5     85.1    131.9\n Victory     0.2cwt   89.7 9.11  5     66.3    113.1\n Golden.rain 0.4cwt  114.7 9.11  5     91.3    138.1\n Marvellous  0.4cwt  117.2 9.11  5     93.8    140.6\n Victory     0.4cwt  110.8 9.11  5     87.4    134.2\n Golden.rain 0.6cwt  124.8 9.11  5    101.4    148.2\n Marvellous  0.6cwt  126.8 9.11  5    103.4    150.2\n Victory     0.6cwt  118.5 9.11  5     95.1    141.9\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\nm4 &lt;- emmeans(model1, ~V|N)\nm4\n\nN = 0.0cwt:\n V           emmean   SE df lower.CL upper.CL\n Golden.rain   80.0 9.11  5     56.6    103.4\n Marvellous    86.7 9.11  5     63.3    110.1\n Victory       71.5 9.11  5     48.1     94.9\n\nN = 0.2cwt:\n V           emmean   SE df lower.CL upper.CL\n Golden.rain   98.5 9.11  5     75.1    121.9\n Marvellous   108.5 9.11  5     85.1    131.9\n Victory       89.7 9.11  5     66.3    113.1\n\nN = 0.4cwt:\n V           emmean   SE df lower.CL upper.CL\n Golden.rain  114.7 9.11  5     91.3    138.1\n Marvellous   117.2 9.11  5     93.8    140.6\n Victory      110.8 9.11  5     87.4    134.2\n\nN = 0.6cwt:\n V           emmean   SE df lower.CL upper.CL\n Golden.rain  124.8 9.11  5    101.4    148.2\n Marvellous   126.8 9.11  5    103.4    150.2\n Victory      118.5 9.11  5     95.1    141.9\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\nThe two EMM objects (m3 and m4) give the same estimates, but the output of m4 is a little easier to read because it is grouped by N level.",
    "crumbs": [
      "<span class='chapter-number'>12</span>  <span class='chapter-title'>Marginal Means and Contrasts</span>"
    ]
  },
  {
    "objectID": "chapters/means-and-contrasts.html#contrasts-using-emmeans",
    "href": "chapters/means-and-contrasts.html#contrasts-using-emmeans",
    "title": "12  Marginal Means & Contrasts",
    "section": "12.3 Contrasts using emmeans",
    "text": "12.3 Contrasts using emmeans\nFirstly, the pairs() function from emmeans package can be used to evaluate the pairwise comparison among treatment objects. The emmean object (m1, m2) will be passed through pairs() function which will provide a p-value adjustment equivalent to the Tukey test.\n\npairs(m1, adjust = \"tukey\")\n\n contrast                 estimate   SE df t.ratio p.value\n Golden.rain - Marvellous    -5.29 7.08 10  -0.748  0.7419\n Golden.rain - Victory        6.88 7.08 10   0.971  0.6104\n Marvellous - Victory        12.17 7.08 10   1.719  0.2458\n\nResults are averaged over the levels of: N \nDegrees-of-freedom method: containment \nP value adjustment: tukey method for comparing a family of 3 estimates \n\n\n\npairs(m2)\n\n contrast        estimate   SE df t.ratio p.value\n 0.0cwt - 0.2cwt   -19.50 4.44 45  -4.396  0.0004\n 0.0cwt - 0.4cwt   -34.83 4.44 45  -7.853  &lt;.0001\n 0.0cwt - 0.6cwt   -44.00 4.44 45  -9.919  &lt;.0001\n 0.2cwt - 0.4cwt   -15.33 4.44 45  -3.457  0.0064\n 0.2cwt - 0.6cwt   -24.50 4.44 45  -5.523  &lt;.0001\n 0.4cwt - 0.6cwt    -9.17 4.44 45  -2.067  0.1797\n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \nP value adjustment: tukey method for comparing a family of 4 estimates \n\n\nHere if we look at the results from code chunk above, it’s easy to interpret results from pairs() function in case of variety comparison becuase there were only 3 groups. But it’s little confusing in case of Nitrogen treatments where we had 4 groups. 
We can further simplify it by using custom contrasts.\n\n\n\n\n\n\npairs()\n\n\n\nRemember!!\nThe pairs() function is most convenient for pairwise comparisons when there are 3 or fewer treatment groups.\n\n\n\n12.3.1 Custom contrasts\nFirst, print the emmeans object ‘m2’ for the nitrogen treatments.\n\nm2\n\n N      emmean   SE df lower.CL upper.CL\n 0.0cwt   79.4 7.17  5     60.9     97.8\n 0.2cwt   98.9 7.17  5     80.4    117.3\n 0.4cwt  114.2 7.17  5     95.8    132.7\n 0.6cwt  123.4 7.17  5    104.9    141.8\n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\nNow, let’s create a vector for each nitrogen treatment in the same order as presented in the output from m2.\n\nA1 = c(1, 0, 0, 0)\nA2 = c(0, 1, 0, 0)\nA3 = c(0, 0, 1, 0)\nA4 = c(0, 0, 0, 1)\n\nThese vectors (A1, A2, A3, A4) represent the Nitrogen treatments in the order presented in the m2 emmeans object: A1, A2, A3, and A4 represent the 0.0cwt, 0.2cwt, 0.4cwt, and 0.6cwt treatments, respectively.\nThe next step is to create custom contrasts comparing the ‘0.0cwt’ (A1) treatment to the ‘0.2cwt’ (A2), ‘0.4cwt’ (A3), and ‘0.6cwt’ (A4) treatments. 
This can be evaluated as shown below:\n\ncontrast(m2, method = list(A1 - A2) )\n\n contrast       estimate   SE df t.ratio p.value\n c(1, -1, 0, 0)    -19.5 4.44 45  -4.396  0.0001\n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \n\ncontrast(m2, method = list(A1 - A3) )\n\n contrast       estimate   SE df t.ratio p.value\n c(1, 0, -1, 0)    -34.8 4.44 45  -7.853  &lt;.0001\n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \n\ncontrast(m2, method = list(A1 - A4) )\n\n contrast       estimate   SE df t.ratio p.value\n c(1, 0, 0, -1)      -44 4.44 45  -9.919  &lt;.0001\n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \n\n\nHere the output shows the difference in mean yield between the control and each of the 3 N treatments. The estimates are negative (control minus treatment), so yield was significantly higher in the N treatments than in the control (0.0cwt), irrespective of the oat variety.\n\n\n\n\n\n\ncontrast() vs pairs()\n\n\n\nUsing custom contrast() is strongly recommended instead of pairs() when you are comparing multiple treatment groups (&gt;5).",
    "crumbs": [
      "<span class='chapter-number'>12</span>  <span class='chapter-title'>Marginal Means and Contrasts</span>"
    ]
  },
  {
    "objectID": "chapters/means-and-contrasts.html#compact-letter-displays",
    "href": "chapters/means-and-contrasts.html#compact-letter-displays",
    "title": "12  Marginal Means & Contrasts",
    "section": "12.4 Compact letter displays",
    "text": "12.4 Compact letter displays\nCompact letter displays (CLDs) are a popular way to display multiple comparisons when there are more than few group means to compare. However, they are problematic as they are more prone to misinterpretation. The R package multcompView (Graves et al., 2019) provides an implementation of CLDs creating a display where any two means associated with same symbol are not statistically different.\nThe cld() function from the multcomp package is used to implement CLDs in the form of symbols or letters. The emmeans package provides a emmGrid objects for cld() method.\nLet’s start evaluating CLDs for main effects. We will use emmean objects m1 (for variety) and m2 (for nitrogen) for this. In the output below, groups sharing a letter in the .group are not statistically different from each other.\n\ncld(m1, alpha=0.05, Letters=letters)\n\n V           emmean  SE df lower.CL upper.CL .group\n Victory       97.6 7.8  5     77.6      118  a    \n Golden.rain  104.5 7.8  5     84.5      125  a    \n Marvellous   109.8 7.8  5     89.7      130  a    \n\nResults are averaged over the levels of: N \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \nP value adjustment: tukey method for comparing a family of 3 estimates \nsignificance level used: alpha = 0.05 \nNOTE: If two or more means share the same grouping symbol,\n      then we cannot show them to be different.\n      But we also did not show them to be the same. 
\n\n\n\ncld(m2, alpha=0.05, Letters=letters)\n\n N      emmean   SE df lower.CL upper.CL .group\n 0.0cwt   79.4 7.17  5     60.9     97.8  a    \n 0.2cwt   98.9 7.17  5     80.4    117.3   b   \n 0.4cwt  114.2 7.17  5     95.8    132.7    c  \n 0.6cwt  123.4 7.17  5    104.9    141.8    c  \n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \nP value adjustment: tukey method for comparing a family of 4 estimates \nsignificance level used: alpha = 0.05 \nNOTE: If two or more means share the same grouping symbol,\n      then we cannot show them to be different.\n      But we also did not show them to be the same. \n\n\nLet’s have a look at the CLDs for the interaction effect:\n\ncld3 &lt;- cld(m3, alpha=0.05, Letters=letters)\ncld3\n\n V           N      emmean   SE df lower.CL upper.CL .group    \n Victory     0.0cwt   71.5 9.11  5     48.1     94.9  a        \n Golden.rain 0.0cwt   80.0 9.11  5     56.6    103.4  abcde    \n Marvellous  0.0cwt   86.7 9.11  5     63.3    110.1  abc  fg  \n Victory     0.2cwt   89.7 9.11  5     66.3    113.1  ab d f h \n Golden.rain 0.2cwt   98.5 9.11  5     75.1    121.9  abcdefghi\n Marvellous  0.2cwt  108.5 9.11  5     85.1    131.9  abcdefghi\n Victory     0.4cwt  110.8 9.11  5     87.4    134.2   bcdefghi\n Golden.rain 0.4cwt  114.7 9.11  5     91.3    138.1       fghi\n Marvellous  0.4cwt  117.2 9.11  5     93.8    140.6     de  hi\n Victory     0.6cwt  118.5 9.11  5     95.1    141.9    c e g i\n Golden.rain 0.6cwt  124.8 9.11  5    101.4    148.2       fghi\n Marvellous  0.6cwt  126.8 9.11  5    103.4    150.2         hi\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \nP value adjustment: tukey method for comparing a family of 12 estimates \nsignificance level used: alpha = 0.05 \nNOTE: If two or more means share the same grouping symbol,\n      then we cannot show them to be different.\n      But we also did not show them to be the 
same. \n\n\nInterpretation of these letters: two means differ significantly only if they share no letter. For the variety Victory, grain yield at 0.0cwt N differed significantly from yield at 0.4cwt and 0.6cwt, but not from yield at 0.2cwt. Similarly, grain yield for the Golden.rain variety was significantly lower with the 0.0cwt N treatment than with the 0.4cwt and 0.6cwt treatments, but not the 0.2cwt treatment.\nIn the data set we used for demonstration here, we had an equal number of observations in each group. However, this might not always be the case, as it is common to have missing values in a data set. In such cases, readers usually struggle to interpret significant differences among groups. For example, the estimated means of two groups may be substantially different but not statistically different. This normally happens when the SE of one group is large due to its small sample size, so it’s hard for it to be statistically different from other groups. In such cases, we can use alternatives to CLDs as shown below.",
    "crumbs": [
      "<span class='chapter-number'>12</span>  <span class='chapter-title'>Marginal Means and Contrasts</span>"
    ]
  },
  {
    "objectID": "chapters/means-and-contrasts.html#alternatives-to-cld",
    "href": "chapters/means-and-contrasts.html#alternatives-to-cld",
    "title": "12  Marginal Means & Contrasts",
    "section": "12.5 Alternatives to CLD",
    "text": "12.5 Alternatives to CLD\n\nEquivalence test\n\nLet’s assume based on subject matter considerations, if mean yield of two groups differ by less than 30 can be considered equivalent. Let’s try equivalence test on clds of nitrogen treatment emmeans (m2)\n\ncld(m2, delta = 30, adjust = \"none\")\n\n N      emmean   SE df lower.CL upper.CL .equiv.set\n 0.0cwt   79.4 7.17  5     60.9     97.8  1        \n 0.2cwt   98.9 7.17  5     80.4    117.3  12       \n 0.4cwt  114.2 7.17  5     95.8    132.7   23      \n 0.6cwt  123.4 7.17  5    104.9    141.8    3      \n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \nStatistics are tests of equivalence with a threshold of 30 \nP values are left-tailed \nsignificance level used: alpha = 0.05 \nEstimates sharing the same symbol test as equivalent \n\n\nHere, two treatment groups ‘0.0cwt’ and ‘0.2cwt’, ‘0.4cwt’ and ‘0.6cwt’ can be considered equivalent.\n\nSignificance Sets\n\nAnother alternative is to simply reverse all the boolean flags we used in constructing CLDs for m3 first time.\n\ncld(m2, signif = TRUE)\n\n N      emmean   SE df lower.CL upper.CL .signif.set\n 0.0cwt   79.4 7.17  5     60.9     97.8  12        \n 0.2cwt   98.9 7.17  5     80.4    117.3  12        \n 0.4cwt  114.2 7.17  5     95.8    132.7  1         \n 0.6cwt  123.4 7.17  5    104.9    141.8   2        \n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \nP value adjustment: tukey method for comparing a family of 4 estimates \nsignificance level used: alpha = 0.05 \nEstimates sharing the same symbol are significantly different \n\n\n\n\n\n\n\n\nCautionary Note about CLD\n\n\n\nIt’s important to note that we cannot conclude that treatment levels with the same letter are the same. 
We can only conclude that no significant difference between them was detected.\nThere is a separate branch of statistics, “equivalence testing”, for ascertaining whether things are sufficiently similar to conclude they are equivalent.\nSee Section 2.0.4 for additional warnings about problems with using compact letter displays.",
    "crumbs": [
      "<span class='chapter-number'>12</span>  <span class='chapter-title'>Marginal Means and Contrasts</span>"
    ]
  },
  {
    "objectID": "chapters/means-and-contrasts.html#export-emmeans-to-excel-sheet",
    "href": "chapters/means-and-contrasts.html#export-emmeans-to-excel-sheet",
    "title": "12  Marginal Means & Contrasts",
    "section": "12.6 Export emmeans to excel sheet",
    "text": "12.6 Export emmeans to excel sheet\nThe outputs from emmeans() or cld() objects can exported by firstly converting outputs to a data frame and then using writexlsx() function from the ‘writexl’ package to export the outputs.\n\nresult_n &lt;- as.data.frame(summary(m1))\n\n\nwritexl::write_xlsx(result_n)",
    "crumbs": [
      "<span class='chapter-number'>12</span>  <span class='chapter-title'>Marginal Means and Contrasts</span>"
    ]
  },
  {
    "objectID": "chapters/means-and-contrasts.html#graphical-display-of-emmeans",
    "href": "chapters/means-and-contrasts.html#graphical-display-of-emmeans",
    "title": "12  Marginal Means & Contrasts",
    "section": "12.7 Graphical display of emmeans",
    "text": "12.7 Graphical display of emmeans\nThe results of emmeans() object can be plotted in two different ways. First, we can use base plot() function in R.\n\nplot(m1)\n\n\n\n\n\n\n\nplot(m4)\n\n\n\n\n\n\n\n\nOr we can use ‘ggplot2’ library. We can plot cld3 object in ggplot, with Variety on x-axis and estimated means of yield on y-axis. Different N treatments are presented in groups of different colors.\n\nggplot(cld3) +\n  aes(x = V, y = emmean, color = N) +\n  geom_point(position = position_dodge(width = 0.9)) +\n  geom_errorbar(mapping = aes(ymin = lower.CL, ymax = upper.CL), \n                              position = position_dodge(width = 1),\n                width = 0.1) +\n  geom_text(mapping = aes(label = .group, y = upper.CL * 1.05), \n            position = position_dodge(width = 0.8), \n            show.legend = F)+\n  theme_bw()+\n  theme(axis.text= element_text(color = \"black\",\n                                size =12))\n\n\n\n\n\n\n\n\nRecall: groups that do not differ significantly from each other share the same letter.\nwe can also use emmip() built in emmeans package to look at the trend in interaction of variety and nitrogen factors.\n\nemmip(model1, N ~ V)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nMore details on emmeans\n\n\n\nIf you want to read more about emmeans, please refer to vignettes on this CRAN page.",
    "crumbs": [
      "<span class='chapter-number'>12</span>  <span class='chapter-title'>Marginal Means and Contrasts</span>"
    ]
  },
  {
    "objectID": "chapters/means-and-contrasts.html#conclusion",
    "href": "chapters/means-and-contrasts.html#conclusion",
    "title": "12  Marginal Means & Contrasts",
    "section": "12.8 Conclusion",
    "text": "12.8 Conclusion\nBe cautious with the terms “significant” and “nonsignificant”, and don’t ever interpret a “non-significant” result as saying that there is no effect. Follow good statistical practices such as getting the model right first, and using adjusted P values for appropriately chosen families of comparisons or contrasts.\n\n\n\n\n\n\nP values, “significance”, and recommendations\n\n\n\nP values are often misinterpreted, and the term “statistical significance” can be misleading. Please refer to this link to read more about basic principles outlined by the American Statistical Association when considering p-values.",
    "crumbs": [
      "<span class='chapter-number'>12</span>  <span class='chapter-title'>Marginal Means and Contrasts</span>"
    ]
  },
  {
    "objectID": "chapters/variance-components.html",
    "href": "chapters/variance-components.html",
    "title": "13  Variance & Variance Components",
    "section": "",
    "text": "13.1 Unequal Variance\nMixed models provide the advantage of being able to estimate the variance of random variables. Instead of looking at a variable as a collection of specific levels to estimate, random effects view variables as being a random drawn from a normal distribution with a standard deviation. The decision of how to designate a variable as random or fixed depends on",
    "crumbs": [
      "<span class='chapter-number'>13</span>  <span class='chapter-title'>Variance and Variance Components</span>"
    ]
  },
  {
    "objectID": "chapters/variance-components.html#unequal-variance",
    "href": "chapters/variance-components.html#unequal-variance",
    "title": "13  Variance & Variance Components",
    "section": "",
    "text": "13.1.1 Case 1: Unequal Variance Due to a Factor\n\nvar_ex1 &lt;- here::here(read.csv(\"data\", \"MET_trial_variance.csv\"))\n\n\nvar_ex1$block &lt;- as.character(var_ex1$block)\nhist(var_ex1$yield)\nboxplot(yield ~ site, data = var_ex1)\n\n\nm1_a &lt;- lme(yield ~ site:variety + variety, \n                random = ~ 1 |site/block, \n                na.action = na.exclude, \n                data = var_ex1)\n\n\nm1_b &lt;- update(m1_a, weights = varIdent(form = ~1|site))\n\n\n\nm1_b &lt;- update(m1_a, weights = varIdent(form = ~1|site))\n\nis equivalent to\n\nm1_b &lt;- lme(yield ~ site:variety + variety, \n                random = ~ 1 |site/block,\n                weights = varIdent(form = ~1|site), \n                na.action = na.exclude, \n                data = var_ex1)\n\n\n\n\n13.1.2 Case 2: Variance is related to the fitted values\n\nvar_ex2 &lt;- read.csv(here::here(\"data\", \"single_trial_variance.csv\"))\n\n\nvar_ex1$block &lt;- as.character(var_ex1$block)\nhist(var_ex2$yield)\n\n\nm2_a &lt;- lme(yield ~ variety, \n               random = ~ 1 |block, \n               na.action = na.exclude, \n               data = var_ex2)\n\n\nm2_b &lt;- update(m2_a, weights = varPower())",
    "crumbs": [
      "<span class='chapter-number'>13</span>  <span class='chapter-title'>Variance and Variance Components</span>"
    ]
  },
  {
    "objectID": "chapters/variance-components.html#coefficient-of-variation",
    "href": "chapters/variance-components.html#coefficient-of-variation",
    "title": "13  Variance & Variance Components",
    "section": "13.2 Coefficient of Variation",
    "text": "13.2 Coefficient of Variation\n\nm2_ave &lt;- fixef(m2_b)[1]\nnames(m2_b) &lt;- NULL\n\n\nm2_cv = sigma(m2_b)/m2_ave*100\nm2_cv\n\n\n13.2.1 Looking at Variance Components\n\nvar_comps &lt;- read.csv(here::here(\"data\", \"potato_tuber_size.csv\"))",
    "crumbs": [
      "<span class='chapter-number'>13</span>  <span class='chapter-title'>Variance and Variance Components</span>"
    ]
  },
  {
    "objectID": "chapters/troubleshooting.html",
    "href": "chapters/troubleshooting.html",
    "title": "14  Troubleshooting",
    "section": "",
    "text": "14.1 Common Errors we Encounter",
    "crumbs": [
      "<span class='chapter-number'>14</span>  <span class='chapter-title'>Troubleshooting</span>"
    ]
  },
  {
    "objectID": "chapters/troubleshooting.html#common-errors-we-encounter",
    "href": "chapters/troubleshooting.html#common-errors-we-encounter",
    "title": "14  Troubleshooting",
    "section": "",
    "text": "14.1.1 Convergence Issues\n[lme4 convergence warnings\nmore\n\n\n14.1.2 Other",
    "crumbs": [
      "<span class='chapter-number'>14</span>  <span class='chapter-title'>Troubleshooting</span>"
    ]
  },
  {
    "objectID": "chapters/additional-resources.html",
    "href": "chapters/additional-resources.html",
    "title": "15  Additional Resources",
    "section": "",
    "text": "15.1 Further Reading",
    "crumbs": [
      "<span class='chapter-number'>15</span>  <span class='chapter-title'>Additional Resources</span>"
    ]
  },
  {
    "objectID": "chapters/additional-resources.html#further-reading",
    "href": "chapters/additional-resources.html#further-reading",
    "title": "15  Additional Resources",
    "section": "",
    "text": "lme4 vignette for fitting linear mixed models\nMixed-Effects Models in S and S-PLUS thee book for nlme, by José C. Pinheiro and Douglas M. Bates. We used this book extensively for developing this guide. Sadly, it’s both out of print and we could not find a free copy online. However, there are affordable used copies available.\nMixed Effects Models and Extensions in Ecology with R by Alain F. Zuur, Elena N. Ieno, Neil Walker, Anatoly A. Saveliev, and Graham M. Smith.\nANOVA and Mixed Models by Lukas Meier",
    "crumbs": [
      "<span class='chapter-number'>15</span>  <span class='chapter-title'>Additional Resources</span>"
    ]
  },
  {
    "objectID": "chapters/additional-resources.html#other-resources",
    "href": "chapters/additional-resources.html#other-resources",
    "title": "15  Additional Resources",
    "section": "15.2 Other Resources",
    "text": "15.2 Other Resources\n\nEasy Stats a collection of R packages to assist in statistical modelling, with a big focus on linear models.\nMixed Model CRAN Task View a curated list of R packages relevant to mixed modelling. This is a great place to start\nR-SIG-mixed-models mailing list for help and discussion of mixed-model-related questions, course announcements, etc\nGrammar of Experimental Designs by Emi Tanaka. This has a great description of basic principles of experimental design.",
    "crumbs": [
      "<span class='chapter-number'>15</span>  <span class='chapter-title'>Additional Resources</span>"
    ]
  },
  {
    "objectID": "references.html",
    "href": "references.html",
    "title": "References",
    "section": "",
    "text": "Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015.\n“Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical\nSoftware 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.\n\n\nBolker, Ben, and David Robinson. 2024. Broom.mixed: Tidying Methods\nfor Mixed Models. https://CRAN.R-project.org/package=broom.mixed.\n\n\nHartig, Florian. 2022. DHARMa: Residual Diagnostics for Hierarchical\n(Multi-Level / Mixed) Regression Models. https://CRAN.R-project.org/package=DHARMa.\n\n\nJohn, JA, and ER Williams. 1995. Cyclic and Computer\nGenerated Designs. 2nd ed. New York:\nChapman; Hall/CRC Press. https://doi.org/10.1201/b15075.\n\n\nKuznetsova, Alexandra, Per B. Brockhoff, and Rune H. B. Christensen.\n2017. “lmerTest Package: Tests in\nLinear Mixed Effects Models.” Journal of Statistical\nSoftware 82 (13): 1–26. https://doi.org/10.18637/jss.v082.i13.\n\n\nLenth, Russell V. 2022. Emmeans: Estimated Marginal Means, Aka\nLeast-Squares Means. https://CRAN.R-project.org/package=emmeans.\n\n\nLüdecke, Daniel, Mattan S. Ben-Shachar, Indrajeet Patil, Philip\nWaggoner, and Dominique Makowski. 2021. “performance: An R Package for\nAssessment, Comparison and Testing of Statistical Models.”\nJournal of Open Source Software 6 (60): 3139. https://doi.org/10.21105/joss.03139.\n\n\nPinheiro, José C., and Douglas M. Bates. 2000. Mixed-Effects Models\nin s and s-PLUS. New York: Springer. https://doi.org/10.1007/b98882.\n\n\nPinheiro, José, Douglas Bates, and R Core Team. 2023. Nlme: Linear\nand Nonlinear Mixed Effects Models. https://CRAN.R-project.org/package=nlme.\n\n\nYates, F. 1936. “A New Method of Arranging Variety Trials\nInvolving a Large Number of Varieties.” J Agric Sci 26:\n424–55.",
    "crumbs": [
      "References"
    ]
  }
]