[
{
"objectID": "index.html",
"href": "index.html",
"title": "Field Guide to the R Mixed Model Wilderness",
"section": "",
"text": "Preface\n“Path in the Wilderness” by Erich Taeubel, Jr.\nRunning mixed models in R is no easy task. There are dozens of packages supporting these aims, each with varying functionality, syntax, and conventions. The linear mixed model ecosystem in R consists of over 80 libraries that either construct and solve mixed model equations or are helper packages that process the results from mixed model analysis. These libraries provide a patchwork of overlapping and unique functionality regarding the fundamental structure of mixed models: allowable distributions, nested and crossed random effects, heterogeneous error structures and other facets. No single library has all possible functionality enabled.\nThis patchwork of packages makes it very challenging for statisticians to conduct mixed model analysis and to teach others how to run mixed models in R. The purpose of this guide is to provide some recipes for handling common analytical scenarios that require mixed models. As a field guide, it is intended to be succinct, and to help researchers meet their analytic goals.\nIn general, the content from this website may not be copied or reproduced without attribution. However, the example code and required data sets to run the code are MIT licensed. These can be accessed on GitHub.",
"crumbs": [
"Preface"
]
},
{
"objectID": "index.html#what-this-does-not-cover",
"href": "index.html#what-this-does-not-cover",
"title": "Field Guide to the R Mixed Model Wilderness",
"section": "What This Does Not Cover",
"text": "What This Does Not Cover\n\nGeneralized linear models where the response variable does not follow a normal distribution. We do address cases of unequal variance, but if another distribution and/or a link function is required for the model, that is not addressed in this guide.\nBasic principles of experimental design. We assume you know this, but if you do not, please check out the Grammar of Experimental Design for guidance on these topics.\nInstructions in using R. We assume familiarity with R. If you need help in learning R, there are numerous guides, including our introductory R course.",
"crumbs": [
"Preface"
]
},
{
"objectID": "index.html#notice",
"href": "index.html#notice",
"title": "Field Guide to the R Mixed Model Wilderness",
"section": "Notice!",
"text": "Notice!\nThis is a work-in-progress and will be updated over time.",
"crumbs": [
"Preface"
]
},
{
"objectID": "chapters/intro.html",
"href": "chapters/intro.html",
"title": "1 Introduction",
"section": "",
"text": "1.1 Terms\nThis guide is focused on frequentist implementations of mixed models in R, covering different scenarios common in the agricultural and life sciences.\nThis is not intended to be a guide to the theory of mixed models; it is focused on implementations of models only.\nPlease read this section and refer back to it when you forget what these terms mean.",
"crumbs": [
"<span class='chapter-number'>1</span> <span class='chapter-title'>Introduction</span>"
]
},
{
"objectID": "chapters/intro.html#terms",
"href": "chapters/intro.html#terms",
"title": "1 Introduction",
"section": "",
"text": "Table 1.1: Terms definitions\n\n\n\n\n\n\n\n\n\nTerm\nDefinition\n\n\n\n\nRandom effect\nAn independent variable where the levels being estimated compose a random sample from a population whose variance will be estimated\n\n\nFixed effect\nAn independent variable with specific, predefined levels to estimate\n\n\nExperimental unit\nThe smallest unit being used for analysis. This could be an animal, a field plot, a person, a meat or muscle sample. The unit may be assessed multiple times or through multiple points in time. When the analysis is all said and done, the predictions occur at this level.",
"crumbs": [
"<span class='chapter-number'>1</span> <span class='chapter-title'>Introduction</span>"
]
},
{
"objectID": "chapters/intro.html#packages",
"href": "chapters/intro.html#packages",
"title": "1 Introduction",
"section": "1.2 Packages",
"text": "1.2 Packages\n\n1.2.1 Table of required packages for modelling\n\n\n\nTable 1.2: Table of required packages\n\n\n\n\n\nPackage\nPurpose\n\n\n\n\nlme4 (Bates et al. 2015)\nmain package for linear mixed models\n\n\nlmerTest (Kuznetsova, Brockhoff, and Christensen 2017)\nfor computing p-values when using lme4\n\n\nnlme (J. Pinheiro, Bates, and R Core Team 2023; J. C. Pinheiro and Bates 2000)\nmain package for linear mixed models and part of ‘base R’\n\n\nemmeans (Lenth 2022)\nfor estimating fixed effects, their confidence intervals and conducting contrasts\n\n\nbroom.mixed (Bolker and Robinson 2024)\npackage for presenting the model summary output into a tidy workflow.\n\n\nDHARMa (Hartig 2022)\nfor evaluating residuals (error terms) in generalized linear models\n\n\nperformance (Lüdecke et al. 2021)\nFor creating diagnostic plots or to compute fit measures\n\n\n\n\n\n\n\n\n1.2.2 Optional packages\n\n\n\nTable 1.3: Table of optional packages\n\n\n\n\n\nPackage Name\nFunction\n\n\nhere\nFor setting work directory\n\n\nggplot2\nplotting\n\n\ndesplot\nplotting\n\n\nagridat\nto download example dataset\n\n\nagricolae\nto download example dataset\n\n\n\n\n\n\nThis entire guide will use the here package for loading data. If you can load your data fine without this package, please carry on. ‘here’ is certainly not required for running mixed models.\n\n\n\n\nBates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.\n\n\nBolker, Ben, and David Robinson. 2024. Broom.mixed: Tidying Methods for Mixed Models. https://CRAN.R-project.org/package=broom.mixed.\n\n\nHartig, Florian. 2022. DHARMa: Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models. https://CRAN.R-project.org/package=DHARMa.\n\n\nKuznetsova, Alexandra, Per B. Brockhoff, and Rune H. B. Christensen. 2017. 
“lmerTest Package: Tests in Linear Mixed Effects Models.” Journal of Statistical Software 82 (13): 1–26. https://doi.org/10.18637/jss.v082.i13.\n\n\nLenth, Russell V. 2022. Emmeans: Estimated Marginal Means, Aka Least-Squares Means. https://CRAN.R-project.org/package=emmeans.\n\n\nLüdecke, Daniel, Mattan S. Ben-Shachar, Indrajeet Patil, Philip Waggoner, and Dominique Makowski. 2021. “performance: An R Package for Assessment, Comparison and Testing of Statistical Models.” Journal of Open Source Software 6 (60): 3139. https://doi.org/10.21105/joss.03139.\n\n\nPinheiro, José C., and Douglas M. Bates. 2000. Mixed-Effects Models in S and S-PLUS. New York: Springer. https://doi.org/10.1007/b98882.\n\n\nPinheiro, José, Douglas Bates, and R Core Team. 2023. Nlme: Linear and Nonlinear Mixed Effects Models. https://CRAN.R-project.org/package=nlme.",
"crumbs": [
"<span class='chapter-number'>1</span> <span class='chapter-title'>Introduction</span>"
]
},
{
"objectID": "chapters/analysis-tips.html",
"href": "chapters/analysis-tips.html",
"title": "2 Tips on Analysis",
"section": "",
"text": "Below are some things our office frequently says to researchers.\n\n2.0.1 Think About Your Analytical Goals\nThroughout this guide, we have tried to explicitly state the goals of each analysis. This helps inform how to approach the analysis of an experiment. It can be difficult, especially for new scientists-in-training (i.e. graduate students), to understand what it is they want to estimate. You may have been handed a data set you had no role in generating and told to “analyze this” with no additional context. Or perhaps you may have conducted a large study that has some overall goals that are lofty, yet vague. And now you must translate the vague aims into clear statistical questions.\nIt can be helpful to think about the exact results you are hoping to get. What does this look like exactly? Do you want to estimate the changes in plant diversity as the result of a herbicide spraying program? Do you want to find out if a fertilizer treatment changed protein content in a crop and by how much? Do you want to know about changes in human diet due to an intervention? What are the quantifiable differences that you and/or experts in your domain would find meaningful?\nConsider what the results would look like for (1) the best case scenario where your wildest research dreams come true, and (2) null results, when you find out that your treatment or intervention had no effect. It’s very helpful to understand and recognize exactly what both situations look like.\nBy “consider”, we mean: imagine the final plot or table, or summary sentence you want to present, either in a peer-reviewed manuscript, or some output for stakeholders. From this, you can work backwards to determine the analytical approach needed to arrive at that desired final output. 
Or you may determine that your data are unsuitable to generate the desired output, in which case, it’s best to determine that as soon as possible.\nBy “consider”, we also mean: imagine exactly what the spreadsheet of results would contain after a successful trial. What columns are present and what data are in those cells? If you are planning an experiment, this can help ensure you plan it properly to actually test whatever it is you want to evaluate. If the experiment is done, this enables you to evaluate if you have the information present to test your hypothesis.\nTaking the time to reflect on exactly what you want to analyze can save time and prevent you from doing unneeded analyses that don’t serve this final goal. There is rarely (never?) only one way to analyze an experiment or a data set, so use your limited time wisely and focus on what matters to you most.\n\n\n2.0.2 Know That Data Cleaning is Time Consuming\n\n\n\n\n\n\n\n\n\nFigure 2.1: How you will spend your time\n\n\n\n\nData cleaning has occupied, and will continue to occupy, the majority of researchers’ time when conducting an analysis. Truly, we are sorry for this. But, please know it is not you, it is the nature of data. Plan for and prepare yourself mentally to spend time cleaning and preparing your data for analysis.1 This will likely take way longer than the actual analysis! It is needed to ensure you can actually get correct results in an analysis, and hence data cleaning is worth the time it requires.\n1 For an excellent set of basic instructions on data preparation, please see: Broman, K. W., & Woo, K. H. (2018). Data Organization in Spreadsheets. 
The American Statistician, 72(1), 2–10.\n\n2.0.3 Interpret ANOVA and P-values with Caution\n\nInformally, a p-value is the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.\n---American Statistical Association\n\nThe great majority of researchers are deeply interested in p-values. This is not a bad thing per se, but sometimes the focus is so strong it comes at the expense of other valuable pieces of information, like treatment estimates! Russ Lenth, author of the emmeans package, refers to this particular practice as “star gazing”.\nIt is important to evaluate why you want to do ANOVA, what extra information it will bring and what you plan to do with those results. Sometimes, researchers want to conduct an ANOVA even though the original goals of analysis were reached without it. Running an ANOVA may increase or decrease confidence in your other results. That is not at all what ANOVA is intended to do, nor is this what p-values can tell us. ANOVA compares across group variation to within group variation. It cannot tell us if anything is the ‘same’ (there’s a separate branch of analysis, ‘equivalence testing’, for that), and it cannot tell us specifically what is different, unless you are fortunate enough to only have 2 levels in your treatment structure. P-values provide no guarantee that something is truly different or not; they only quantify the probability of observing results at least this extreme by chance alone.\nThe American Statistical Association recommends that “Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.”2 That article also explains what p-values are telling us and how to avoid committing analytical errors and/or misinterpreting p-values. 
If you have time to read the full article, it will benefit your research!\n2 Wasserstein, R. L., & Lazar, N. A. (2016). The ASA Statement on p-Values: Context, Process, and Purpose. The American Statistician, 70(2), 129–133.\nThe main problematic behavior I see is researchers using p-values as the sole criterion on whether to present results: “We wanted to test if x, y and z had an effect. We ran some model and found that only x had a significant effect, and those results indicate…” (while results with a p-value > 0.05 are ignored).\nA better option would be to discuss the results of the analysis and how they addressed the research questions: how did the dependent variable change (or not change) as a result of the treatments/interventions/independent variables? What are the parameters or treatment predictions and what do they tell us with regard to the research goals? And to bolster those estimates, what are the confidence intervals on those estimates? What are the p-values for the statistical tests? P-values can support the results and conclusions, but the main results desired by a researcher are usually the estimates themselves - so lead with that!\nTo learn more about common pitfalls in interpreting p-values, check out our blog post on the subject and/or this paper3 on the subject.\n3 Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, Altman DG. (2016) Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 31(4):337-50.\n\n2.0.4 Comments on Hypothesis Testing and Usage of Treatment Letters\nOften, I see researchers use compact letter display (e.g. “A”, “B”, “C”, ….) for indicating differences among treatments. This makes for concise presentation of results in tables and figures, but it can both kill statistical power and miss nuance in the results.\n\n\n\nImage from a paper published in 2024. 
Although this was a fully crossed factorial experiment, compact letter display was implemented across all treatment combinations, resulting in some nonsensical comparisons among some more informative contrasts. What a waste.\nImplementing compact letter display can kill statistical power (the probability of detecting true differences) because it requires that all pairwise comparisons be made. Doing this, especially when there are many treatment levels, has its perils. The biggest problem is that this creates a multiple testing problem. The RCBD example in this guide has 42 treatments, resulting in a total of 861 comparisons (\\(=42*(42-1)/2\\)), that are then adjusted for multiple tests. With that many tests, a severe adjustment is likely and hence things that are different are not detected. With so many tests, it could be that there is an overall effect due to treatment, but they all share the same letter!\nThe second problem is one of interpretation. Just because two treatments or varieties share a letter does not mean they are equivalent. It only means that they were not found to be different. A funny distinction, but alas. There is an entire branch of statistics, ‘equivalence testing’, devoted to just this topic - how to test if two things are actually the same. This involves the user declaring a maximum allowable numeric difference for a variable in order to determine if two items are statistically different or equivalent - something that these pairwise comparisons are not doing.\nAnother problem is that doing all pairwise comparisons may not align with experimental goals. In many circumstances, not every pairwise combination is of any interest or relevance to the study. Additionally, complex treatment structure may necessitate custom contrasts that highlight differences between the marginal estimate of multiple treatments versus another. For example, there may be 2 levels of ‘high’ nitrogen fertilizer treatment with two different sources (i.e. 
types of fertilizer). A researcher may want to contrast those two levels together against ‘low’ nitrogen treatment levels.\nOften, researchers have embedded additional structure in the treatments that is not fully reflected in the statistical model. For example, perhaps a study is looking at five different intercropping mixtures, two that incorporate a legume and 3 that do not. Conducting all pairwise comparisons will miss estimating the difference between including a legume in an intercropping mix and not incorporating one. Soil fertility and other agronomic studies often have complex treatment structure. When it is not practical or financially feasible to have a full factorial experiment, embedding different treatment combinations in the main factor of analysis can accomplish this. This is a good study design approach, but compact letter display is an inefficient way to report results. In such cases, custom contrasts are a better choice for hypothesis testing. The emmeans chapter covers how to do this.\n\n\n2.0.5 Final Thoughts\nGood statistical analysis requires a thoughtful, intentional approach. If you have gone to the trouble to conduct a well designed experiment or assemble a useful data set, take the time and effort to analyze it properly.\n\n\n\n\nBroman, Karl W., and Kara H. Woo. 2018. “Data Organization in Spreadsheets.” The American Statistician 72 (1): 2–10. https://doi.org/10.1080/00031305.2017.1375989.\n\n\nGreenland, Sander, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman. 2016. “Statistical Tests, P Values, Confidence Intervals, and Power: A Guide to Misinterpretations.” European Journal of Epidemiology 31 (4): 337–50. https://doi.org/10.1007/s10654-016-0149-3.\n\n\nWasserstein, Ronald L., and Nicole A. Lazar. 2016. “The ASA Statement on p-Values: Context, Process, and Purpose.” The American Statistician 70 (2): 129–33. https://doi.org/10.1080/00031305.2016.1154108.",
"crumbs": [
"<span class='chapter-number'>2</span> <span class='chapter-title'>Tao of Analysis</span>"
]
},
{
"objectID": "chapters/background.html",
"href": "chapters/background.html",
"title": "3 Mixed model theory and background",
"section": "",
"text": "3.1 Model\nMixed-effects models are called “mixed” because they simultaneously model fixed and random effects. Fixed effects (e.g. treatments) represent population-level (average) effects that should persist across experiments. Fixed effects are similar to the parameters found in “traditional” regression techniques like ordinary least squares. Random effects are discrete units sampled from some population (e.g. plots, participants), and thus they are inherently categorical.\nRecall simple linear regression with intercept (\\(\\beta_0\\)) and slope (\\(\\beta_1\\)) effect for subject \\(i\\). The slope and intercept are chosen in a way so that the residual sum of squares is minimized.\n\\[ Y = \\beta_0 + \\beta_1 X + \\epsilon \\]\nIf we consider this model in a mixed model framework, \\(\\beta_0\\) and \\(\\beta_1\\) are considered fixed effects (also known as the population-averaged values) and \\(b_i\\) is a random effect for subject \\(i\\). The random effect can be thought of as each subject’s deviation from the fixed intercept parameter. The key assumption about \\(b_i\\) is that it is independent, identically and normally distributed with a mean of zero and associated variance. Random effects are especially useful when we have (1) lots of levels (e.g., many species or blocks), (2) relatively little data on each level (although we need multiple samples from most of the levels), and (3) uneven sampling across levels.\nFor example, if we let the intercept be a random effect, it takes the form:\n\\[ Y = \\beta_0 + b_i + \\beta_1 X + \\epsilon \\]\nIn this model, predictions would vary depending on each subject’s random intercept term, but slopes would be the same.\nIn the second case, we can have a fixed intercept and a random slope. The model will be:\n\\[ Y = \\beta_0 + (\\beta_1 + b_i)(X) + \\epsilon\\]\nIn this model, \\(b_i\\) is a random effect for subject \\(i\\). 
Predictions would vary with the random slope term, but the intercept would be the same:\nThe third case is the mixed model with a random slope and intercept:\n\\[ Y = (\\beta_0 + a_i) + (\\beta_1 + b_i)(X) + \\epsilon\\]\nIn this model, \\(a_i\\) and \\(b_i\\) are random effects for subject \\(i\\) applied to the intercept and slope, respectively. Predictions would vary depending on each subject’s slope and intercept terms:",
"crumbs": [
"<span class='chapter-number'>3</span> <span class='chapter-title'>Mixed Model Background</span>"
]
},
{
"objectID": "chapters/background.html#model",
"href": "chapters/background.html#model",
"title": "3 Mixed model theory and background",
"section": "",
"text": "Example mixed model with random intercepts but identical slopes.\n\n\n\n\n\n\n\n\n\n\nMixed model with random slopes but identical intercepts.\n\n\n\n\n\n\n\n\n\n\nMixed Model with random intercept and slope",
"crumbs": [
"<span class='chapter-number'>3</span> <span class='chapter-title'>Mixed Model Background</span>"
]
},
{
"objectID": "chapters/background.html#formula-notation",
"href": "chapters/background.html#formula-notation",
"title": "3 Mixed model theory and background",
"section": "3.2 R Formula Syntax for Random and Fixed Effects",
"text": "3.2 R Formula Syntax for Random and Fixed Effects\nFormula notation is often used in the R syntax for linear models. It looks like this: \\(Y ~ X\\), where \\(Y\\) is the dependent variable (the response) and \\(X\\) is/are the independent variable(s), that is, the experimental treatments or interventions.\n\nmy_formula <- formula(Y ~ treatment1 + treatment2)\nclass(my_formula)\n\n[1] \"formula\"\n\n\nThe package ‘lme4’ has some additional conventions regarding the formula. Random effects are put in parentheses and a 1| is used to denote random intercepts (rather than random slopes). The table below provides several examples of random effects in mixed models. The names of grouping factors are denoted g, g1, and g2, and covariates as x.\n\n\n\n\n\n\n\n\nFormula\nAlternative\nMeaning\n\n\n\n\n(1|g)\n1 + (1|g)\nRandom intercept with a fixed mean\n\n\n(1|g1/g2)\n(1|g1) + (1|g1:g2)\nIntercept varying among g1 and g2 within g1\n\n\n(1|g1) + (1|g2)\n1 + (1|g1) + (1|g2)\nIntercept varying among g1 and g2\n\n\nx + (x|g)\n1 + x + (1 + x|g)\nCorrelated random intercept and slope\n\n\nx + (x||g)\n1 + x + (1|g) + (0 + x|g)\nUncorrelated random intercept and slope\n\n\n\nThe first example, (1|g), suffices for most models and is the only structure used in this guide.",
"crumbs": [
"<span class='chapter-number'>3</span> <span class='chapter-title'>Mixed Model Background</span>"
]
},
{
"objectID": "chapters/rcbd.html",
"href": "chapters/rcbd.html",
"title": "4 Randomized Complete Block Design",
"section": "",
"text": "4.1 Background\nThis is a simple model that can serve as a good entrance point to mixed models.\nRandomized complete block design (RCBD) is a very common design where experimental treatments are applied at random to experimental units within each block. The block can represent a spatial or temporal unit or even different technicians taking data. The blocks are intended to control for a nuisance source of variation, such as changes over time, spatial variance, changes in equipment or operators, or myriad other causes. They are a random effect where the actual blocks used in the study are a random sample of a distribution of other blocks.\nThe statistical model:\n\\[y_{ij} = \\mu + \\alpha_i + \\beta_j + \\epsilon_{ij}\\] Where:\n\\(\\mu\\) = overall experimental mean \\(\\alpha\\) = treatment effects (fixed) \\(\\beta\\) = block effects (random) \\(\\epsilon\\) = error terms\n\\[ \\epsilon \\sim N(0, \\sigma)\\]\n\\[ \\beta \\sim N(0, \\sigma_b)\\]\nBoth the overall error and the block effects are assumed to be normally distributed with a mean of zero and standard deviations of \\(\\sigma\\) and \\(\\sigma_b\\), respectively.",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>4</span> <span class='chapter-title'>Randomized Complete Block Design</span>"
]
},
{
"objectID": "chapters/rcbd.html#background",
"href": "chapters/rcbd.html#background",
"title": "4 Randomized Complete Block Design",
"section": "",
"text": "‘iid’ assumption for error terms\n\n\n\nIn this model, the error terms, \\(\\epsilon\\), are assumed to be “iid”, that is, independently and identically distributed. This means they have constant variance and each individual error term is independent of the others.\nThis guide will later address examples when this assumption is violated and how to handle it.",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>4</span> <span class='chapter-title'>Randomized Complete Block Design</span>"
]
},
{
"objectID": "chapters/rcbd.html#example-analysis",
"href": "chapters/rcbd.html#example-analysis",
"title": "4 Randomized Complete Block Design",
"section": "4.2 Example Analysis",
"text": "4.2 Example Analysis\nFirst, load the libraries for analysis and estimation:\n\nlme4nlme\n\n\n\nlibrary(lme4); library(lmerTest); library(emmeans)\nlibrary(dplyr); library(performance)\n\n\n\n\nlibrary(nlme); library(performance); library(emmeans)\nlibrary(dplyr)\n\n\n\n\nNext, let’s load some data. It is located here if you want to download it yourself (recommended).\nThis data set is for a single wheat variety trial conducted in Aberdeen, Idaho in 2015. The trial includes 4 blocks and 42 different treatments (wheat varieties in this case). This experiment consists of a series of plots (the experimental unit) laid out in a rectangular grid in a farm field. The goal of this analysis is to estimate the yield of each variety and to determine the rankings of each variety for the variable.\n\nvar_trial <- read.csv(here::here(\"data\", \"aberdeen2015.csv\"))\n\n\nTable of variables in the data set\n\n\n\n\n\n\nblock\nblocking unit\n\n\nrange\ncolumn position for each plot\n\n\nrow\nrow position for each plot\n\n\nvariety\ncrop variety (the treatment) being evaluated\n\n\nstand_pct\npercentage of the plot with actual plants growing in them\n\n\ndays_to_heading_julian\nJulian days (starting January 1st) until plot “headed” (first spike emerged)\n\n\nlodging\npercentage of plants in the plot that fell down and hence could not be harvested\n\n\nyield_bu_a\nyield (bushels per acre)\n\n\n\nThere are several variables present that are not useful for this analysis. The only variables we are concerned about are block, variety, yield_bu_a, and test_weight.\n\n4.2.1 Data integrity checks\nThe first thing is to make sure the data is what we expect. There are four steps:\n\nmake sure data are the expected data type\ncheck the extent of missing data\ninspect the independent variables and make sure the expected levels are present in the data\ninspect the dependent variable to ensure its distribution is following expectations\n\n\nstr(var_trial)\n\n'data.frame': 168 obs. 
of 10 variables:\n $ block : int 4 4 4 4 4 4 4 4 4 4 ...\n $ range : int 1 1 1 1 1 1 1 1 1 1 ...\n $ row : int 1 2 3 4 5 6 7 8 9 10 ...\n $ variety : chr \"DAS004\" \"Kaseberg\" \"Bruneau\" \"OR2090473\" ...\n $ stand_pct : int 100 98 96 100 98 100 100 100 99 100 ...\n $ days_to_heading_julian: int 149 146 149 146 146 151 145 145 146 146 ...\n $ height : int 39 35 33 31 33 44 30 36 36 29 ...\n $ lodging : int 0 0 0 0 0 0 0 0 0 0 ...\n $ yield_bu_a : num 128 130 119 115 141 ...\n $ test_weight : num 56.4 55 55.3 54.1 54.1 56.4 54.7 57.5 56.1 53.8 ...\n\n\nThese look okay except for block, which is currently coded as integer (numeric). We don’t want to run a regression on block as a number, where block 2 would have twice the effect of block 1, and so on. So, converting it to a character will fix that. It can also be converted to a factor, but character variables are a bit easier to work with and are ultimately equivalent to factor conversion.\n\nvar_trial$block <- as.character(var_trial$block)\n\nNext, check the independent variables. 
Running a cross tabulation is often sufficient to ascertain this.\n\ntable(var_trial$variety, var_trial$block)\n\n \n 1 2 3 4\n 06-03303B 1 1 1 1\n Bobtail 1 1 1 1\n Brundage 1 1 1 1\n Bruneau 1 1 1 1\n DAS003 1 1 1 1\n DAS004 1 1 1 1\n Eltan 1 1 1 1\n IDN-01-10704A 1 1 1 1\n IDN-02-29001A 1 1 1 1\n IDO1004 1 1 1 1\n IDO1005 1 1 1 1\n Jasper 1 1 1 1\n Kaseberg 1 1 1 1\n LCS Artdeco 1 1 1 1\n LCS Biancor 1 1 1 1\n LCS Drive 1 1 1 1\n LOR-833 1 1 1 1\n LOR-913 1 1 1 1\n LOR-978 1 1 1 1\n Madsen 1 1 1 1\n Madsen / Eltan (50/50) 1 1 1 1\n Mary 1 1 1 1\n Norwest Duet 1 1 1 1\n Norwest Tandem 1 1 1 1\n OR2080637 1 1 1 1\n OR2080641 1 1 1 1\n OR2090473 1 1 1 1\n OR2100940 1 1 1 1\n Rosalyn 1 1 1 1\n Stephens 1 1 1 1\n SY Ovation 1 1 1 1\n SY 107 1 1 1 1\n SY Assure 1 1 1 1\n UI Castle CLP 1 1 1 1\n UI Magic CLP 1 1 1 1\n UI Palouse 1 1 1 1\n UI Sparrow 1 1 1 1\n UI-WSU Huffman 1 1 1 1\n WB 456 1 1 1 1\n WB 528 1 1 1 1\n WB1376 CLP 1 1 1 1\n WB1529 1 1 1 1\n\n\nThere are 42 varieties and there appear to be no misspellings among them that might confuse R into thinking varieties are different when they are actually the same. R is sensitive to case and white space, which can make it easy to create near duplicate treatments, such as “eltan” and “Eltan” and “Eltan”. There is no evidence of that in this data set. Additionally, it is perfectly balanced, with exactly one observation per treatment per rep. Please note that this does not tell us anything about the extent of missing data.\n\n\n\n\n\n\nMissing Data\n\n\n\nHere is a quick check to count the number of missing values in each column. 
This is not needed for the data sets in this tutorial that have already been comprehensively examined, but it is helpful to check that the level of missingness displayed in an R session is what you expect.\n\napply(var_trial, 2, function(x) sum(is.na(x)))\n\n block range row \n 0 0 0 \n variety stand_pct days_to_heading_julian \n 0 0 0 \n height lodging yield_bu_a \n 0 0 0 \n test_weight \n 0 \n\n\nNo missing data!\n\n\nIf there were independent variables with a continuous distribution (a covariate), plot those data.\nLast, check the dependent variable. A histogram is often quite sufficient to accomplish this. This is designed to be a quick check, so no need to spend time making the plot look good.\n\n\n\n\n\n\n\n\n\nFigure 4.1: Histogram of the dependent variable.\n\n\n\n\n\nhist(var_trial$yield_bu_a, main = \"\", xlab = \"yield\")\n\nThe values fall roughly within the range we expect. We (the authors) know this from talking with the person who generated the data, not through our own intuition. There are no large spikes of points at a single value (which would indicate something odd), nor are there any extreme values (low or high) that might indicate problems.\nData are not expected to be normally distributed at this point, so don’t bother running any Shapiro-Wilk tests. This histogram is a check to ensure the data were entered correctly and appear valid. It requires a mixture of domain knowledge and statistical training to know this, but over time, if you look at these plots with regularity, you will gain a feel for what your data should look like at this stage.\nThese are not complicated checks. They are designed to be done quickly and should be done for every analysis if you have not already inspected the data in this way. We do this before every analysis and often discover surprising things! 
Best to discover these things early, since they are likely to impact the final analysis.\nThis data set is ready for analysis!\n\n\n4.2.2 Model Building\n\n\nRecall the model:\n\\[y_{ij} = \\mu + \\alpha_i + \\beta_j + \\epsilon_{ij}\\]\nFor this model, \\(\\alpha_i\\) is the variety effect (fixed) and \\(\\beta_j\\) is the block effect (random).\nHere is the R syntax for the RCBD statistical model:\n\nlme4nlme\n\n\n\nmodel_rcbd_lmer <- lmer(yield_bu_a ~ variety + (1|block),\n data = var_trial, \n na.action = na.exclude)\n\n\n\n\nmodel_rcbd_lme <- lme(yield_bu_a ~ variety,\n random = ~ 1|block,\n data = var_trial, \n na.action = na.exclude)\n\n\n\n\nThe parentheses are used to indicate that ‘block’ is a random effect, and this particular notation (1|block) indicates that a ‘random intercept’ model is being fit. This is the most common approach. It means there is one overall effect fit for each block.\nWe use the argument na.action = na.exclude as an instruction for how to handle missing data: conduct the analysis, adjusting as needed for the missing data, and when predictions or residuals are output, pad them in the appropriate places for missing data so they can be easily merged into the main data set if need be.\n\n\n4.2.3 Check Model Assumptions\n\n\nR syntax for checking model assumptions is the same for lme4 and nlme.\nRemember those iid assumptions? Let’s make sure we actually met them.\n\n4.2.3.1 Old Way\nThere are special plotting functions written for lme4 and nlme objects (i.e., plot(lmer_object)) for checking homoscedasticity (constant variance).\n\n\n\n\n\n\n\n\n\nFigure 4.2: Plot of residuals versus fitted values\n\n\n\n\n\nplot(model_rcbd_lmer, resid(., scaled=TRUE) ~ fitted(.), \n xlab = \"fitted values\", ylab = \"studentized residuals\")\n\nWe are looking for a random and uniform distribution of points. 
This looks good!\nChecking normality requires first extracting the model residuals with resid() and then generating a qq-plot and line.\n\n\n\n\n\n\n\n\n\nFigure 4.3: QQ-plot of residuals\n\n\n\n\n\nqqnorm(resid(model_rcbd_lmer), main = NULL); qqline(resid(model_rcbd_lmer))\n\nThis is reasonably good. Residuals often deviate a little from the line at the tails, so this is not concerning.\n\n\n4.2.3.2 New Way\nNowadays, we can take advantage of the performance package, which provides a comprehensive suite of diagnostic plots.\n\n\nPlease look up check_model() in the help tab to find what other checks you can perform using this function. If you would like to check all assumptions, you can use the argument check = \"all\".\n\ncheck_model(model_rcbd_lmer, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\n4.2.4 Inference\n\n\nR syntax for estimating model marginal means is the same for lme4 and nlme.\nEstimates for each treatment level can be obtained with the ‘emmeans’ package.\n\nrcbd_emm <- emmeans(model_rcbd_lmer, ~ variety)\nas.data.frame(rcbd_emm) %>% arrange(desc(emmean))\n\n variety emmean SE df lower.CL upper.CL\n Rosalyn 155.2703 7.212203 77.85 140.91149 169.6292\n IDO1005 153.5919 7.212203 77.85 139.23310 167.9508\n OR2080641 152.6942 7.212203 77.85 138.33536 167.0530\n Bobtail 151.6403 7.212203 77.85 137.28149 165.9992\n UI Sparrow 151.6013 7.212203 77.85 137.24245 165.9601\n Kaseberg 150.9768 7.212203 77.85 136.61794 165.3356\n IDN-01-10704A 148.9861 7.212203 77.85 134.62729 163.3450\n 06-03303B 148.8300 7.212203 77.85 134.47116 163.1888\n WB1529 148.2445 7.212203 77.85 133.88568 162.6034\n DAS003 145.2000 7.212203 77.85 130.84116 159.5588\n IDN-02-29001A 144.5755 7.212203 77.85 130.21665 158.9343\n Bruneau 143.9900 7.212203 77.85 129.63116 158.3488\n SY 107 143.6387 7.212203 77.85 129.27987 157.9975\n WB 528 142.9752 7.212203 77.85 128.61633 157.3340\n OR2080637 141.7652 7.212203 77.85 127.40633 156.1240\n Jasper 141.2968 7.212203 77.85 126.93794 155.6556\n 
UI Magic CLP 139.5403 7.212203 77.85 125.18149 153.8992\n Madsen 139.2671 7.212203 77.85 124.90826 153.6259\n LCS Biancor 139.1110 7.212203 77.85 124.75213 153.4698\n SY Ovation 138.6426 7.212203 77.85 124.28375 153.0014\n OR2090473 137.8229 7.212203 77.85 123.46407 152.1817\n Madsen / Eltan (50/50) 136.9642 7.212203 77.85 122.60536 151.3230\n UI-WSU Huffman 135.4810 7.212203 77.85 121.12213 149.8398\n Mary 134.8564 7.212203 77.85 120.49762 149.2153\n Norwest Tandem 134.3490 7.212203 77.85 119.99020 148.7079\n Brundage 134.0758 7.212203 77.85 119.71697 148.4346\n IDO1004 132.5145 7.212203 77.85 118.15568 146.8733\n DAS004 132.2413 7.212203 77.85 117.88245 146.6001\n Norwest Duet 132.0852 7.212203 77.85 117.72633 146.4440\n Eltan 131.4606 7.212203 77.85 117.10181 145.8195\n LCS Artdeco 130.8361 7.212203 77.85 116.47729 145.1950\n UI Palouse 130.4848 7.212203 77.85 116.12600 144.8437\n LOR-978 130.4458 7.212203 77.85 116.08697 144.8046\n LCS Drive 128.7674 7.212203 77.85 114.40858 143.1262\n Stephens 127.1671 7.212203 77.85 112.80826 141.5259\n OR2100940 126.1523 7.212203 77.85 111.79342 140.5111\n UI Castle CLP 125.5277 7.212203 77.85 111.16891 139.8866\n WB1376 CLP 123.6932 7.212203 77.85 109.33439 138.0521\n LOR-833 122.7565 7.212203 77.85 108.39762 137.1153\n LOR-913 118.7752 7.212203 77.85 104.41633 133.1340\n WB 456 118.4629 7.212203 77.85 104.10407 132.8217\n SY Assure 111.0468 7.212203 77.85 96.68794 125.4056\n\nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\n\nThis table indicates the estimated marginal means (“emmeans”, sometimes called “least squares means”), the standard error (“SE”) of those means, the degrees of freedom and the upper and lower bounds of the 95% confidence interval. As an additional step, the emmeans were sorted from largest to smallest.\nAt this point, the analysis goals have been met: we know the estimated means for each treatment and their rankings.\nIf you want to run ANOVA, it can be done quite easily. 
With lmerTest loaded, anova() on an lmer model uses the Satterthwaite approximation for the degrees of freedom by default, as the output below shows; the Kenward-Roger method can be requested with ddf = \"Kenward-Roger\".\n\n\nThe Type I method is sometimes referred to as the “sequential” sum of squares, because it involves a process of adding terms to the model one at a time. Type I sum of squares is the default hypothesis testing method used by the anova() function. This only matters when a data set is unbalanced across treatments, either due to design or missing data points.\n\nlme4nlme\n\n\n\nanova(model_rcbd_lmer, type = \"1\")\n\nType I Analysis of Variance Table with Satterthwaite's method\n Sum Sq Mean Sq NumDF DenDF F value Pr(>F) \nvariety 18354 447.65 41 123 2.4528 8.017e-05 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n\n\n\n\nanova(model_rcbd_lme, type = \"sequential\")\n\n numDF denDF F-value p-value\n(Intercept) 1 123 2514.1283 <.0001\nvariety 41 123 2.4528 1e-04\n\n\n\n\n\n\n\n\n\n\n\nna.action = na.exclude\n\n\n\nYou may have noticed the final argument for na.action in the model statement:\nmodel_rcbd_lmer <- lmer(yield_bu_a ~ variety + (1|block),\n data = var_trial, \n na.action = na.exclude)\nThe argument na.action = na.exclude provides instructions for how to handle missing data. na.exclude removes the missing data points before proceeding with the analysis. When any observation-level model output is generated (e.g. predictions, residuals), it is padded in the appropriate places to account for missing data. This is handy because it makes it easier to add those results to the original data set if so desired.\nSince there are no missing data, this step was not strictly necessary, but it’s a good habit to be in.",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>4</span> <span class='chapter-title'>Randomized Complete Block Design</span>"
]
},
{
"objectID": "chapters/factorial-design.html",
"href": "chapters/factorial-design.html",
"title": "5 RCBD Design with Several Crossed Factors",
"section": "",
"text": "5.1 Background\nFactorial design involves studying the impact of multiple factors simultaneously. Each factor can have multiple levels, and combinations of these levels form the experimental conditions. This design allows us to understand the main effects of individual factors and their interactions on the response variable. The statistical model for factorial design is: \\[y_{ij} = \\mu + \\tau_i+ \\beta_j + \\tau_i\\beta_j + \\epsilon_{ij}\\] Where: \\(\\mu\\) = experiment mean, \\(\\tau\\) = effect of factor A, \\(\\beta\\) = effect of factor B, and \\(\\tau\\beta\\) = interaction effect of factor A and B.\nAssumptions of this model includes: independent and identically distributed error terms with a constant variance.",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>5</span> <span class='chapter-title'>Factorial RCBD Design</span>"
]
},
{
"objectID": "chapters/factorial-design.html#example-analysis",
"href": "chapters/factorial-design.html#example-analysis",
"title": "5 RCBD Design with Several Crossed Factors",
"section": "5.2 Example Analysis",
"text": "5.2 Example Analysis\nFirst step is to load the libraries required for the analysis:\n\nlme4nlme\n\n\n\nlibrary(lme4); library(lmerTest); library(emmeans)\nlibrary(dplyr); library(broom.mixed); library(performance)\n\n\n\n\nlibrary(nlme); library(broom.mixed); library(emmeans)\nlibrary(dplyr); library(performance)\n\n\n\n\nNext, we will load the dataset named ‘cochran.factorial’ from the ‘agridat’ package. This data comprises a yield response of beans to different levels of manure (d), nitrogen (n), phosphorus. The goal of this analysis is the estimate the effect of d, n, p, k, and their interactions on bean yield.\nNote, while importing the data, d, n, p, and k were converted into factor variables using the mutate() function from dplyr package. This helps in reducing the extra steps of converting each single variable to factor manually.\n\nlibrary(agridat)\ndata1 <- agridat::cochran.factorial %>% \n mutate(d = as.factor(d),\n n = as.factor(n),\n p = as.factor(p),\n k = as.factor(k))\n\n\nTable of variables in the data set\n\n\nblock\nblocking unit\n\n\nrep\nreplication unit\n\n\ntrt\ntreatment factor, 16 levels\n\n\nd\ndung treatment, 2 levels\n\n\nn\nnitrogen treatment, 2 levels\n\n\np\nphosphorus treatment, 2 levels\n\n\nk\npotassium treatment, 2 levels\n\n\nyield\nyield (lbs)\n\n\n\nThe objective of this example is evaluate the individual and interactive effect of “d”, “n”, “p”, and “k” treatments on yield.\n\n5.2.1 Data Integrity Checks\nFirst step is to Verify the class of variables, where rep, block, d, n, p, and k are supposed to be a factor/character and yield should be numeric/integer.\n\nstr(data1)\n\n'data.frame': 32 obs. 
of 8 variables:\n $ rep : Factor w/ 2 levels \"R1\",\"R2\": 1 1 1 1 1 1 1 1 1 1 ...\n $ block: Factor w/ 2 levels \"B1\",\"B2\": 1 1 1 1 1 1 1 1 2 2 ...\n $ trt : Factor w/ 16 levels \"(1)\",\"d\",\"dk\",..: 15 10 2 14 5 6 9 11 8 12 ...\n $ yield: int 45 55 53 36 41 48 55 42 50 44 ...\n $ d : Factor w/ 2 levels \"0\",\"1\": 2 2 1 2 1 1 1 2 1 2 ...\n $ n : Factor w/ 2 levels \"0\",\"1\": 2 2 2 1 1 1 2 1 2 1 ...\n $ p : Factor w/ 2 levels \"0\",\"1\": 1 2 2 1 2 1 1 2 1 2 ...\n $ k : Factor w/ 2 levels \"0\",\"1\": 2 1 2 1 1 2 1 2 2 1 ...\n\n\nThis looks good.\nThe next step is to inspect the independent variables and make sure the expected levels are present in the data.\n\ntable(data1$d, data1$n, data1$p, data1$k)\n\n, , = 0, = 0\n\n \n 0 1\n 0 2 2\n 1 2 2\n\n, , = 1, = 0\n\n \n 0 1\n 0 2 2\n 1 2 2\n\n, , = 0, = 1\n\n \n 0 1\n 0 2 2\n 1 2 2\n\n, , = 1, = 1\n\n \n 0 1\n 0 2 2\n 1 2 2\n\n\nThe design looks well balanced.\nThe last step is to inspect the dependent variable and check that its distribution is roughly bell-shaped, without strong skewness.\n\n\n\n\n\n\n\n\n\nFigure 5.1: Histogram of the dependent variable.\n\n\n\n\n\nhist(data1$yield)\n\nNo extreme (low or high) yield values were observed in the data.\n\n\n5.2.2 Model fitting\nModel fitting with R is exactly the same as shown in previous chapters: we need to include all effects, as well as their interactions, which are represented using the colon operator ‘:’. 
Therefore, the full model syntax is:\nyield ~ d + n + p + k + d:n + d:p + d:k + n:p + n:k + p:k + d:n:p + d:n:k + d:p:k + n:p:k + d:n:p:k\nwhich can be abbreviated as:\nyield ~ d*n*p*k\n\nlme4nlme\n\n\n\nmodel1_lmer <- lmer(yield ~ d*n*p*k + (1|block),\n data = data1, \n na.action = na.exclude)\ntidy(model1_lmer)\n\n# A tibble: 18 × 8\n effect group term estimate std.error statistic df p.value\n <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed <NA> (Intercept) 49 3.70 13.2 16.0 4.91e-10\n 2 fixed <NA> d1 -9.5 5.24 -1.81 16.0 8.84e- 2\n 3 fixed <NA> n1 0.500 5.24 0.0955 16.0 9.25e- 1\n 4 fixed <NA> p1 -11.5 5.24 -2.20 16.0 4.31e- 2\n 5 fixed <NA> k1 1.00 5.24 0.191 16.0 8.51e- 1\n 6 fixed <NA> d1:n1 13.5 7.82 1.73 16.0 1.03e- 1\n 7 fixed <NA> d1:p1 15.5 7.82 1.98 16.0 6.49e- 2\n 8 fixed <NA> n1:p1 9.50 7.82 1.22 16.0 2.42e- 1\n 9 fixed <NA> d1:k1 4.00 7.82 0.512 16.0 6.16e- 1\n10 fixed <NA> n1:k1 0.500 7.82 0.0639 16.0 9.50e- 1\n11 fixed <NA> p1:k1 3.00 7.82 0.384 16.0 7.06e- 1\n12 fixed <NA> d1:n1:p1 -14.5 12.1 -1.19 16.0 2.50e- 1\n13 fixed <NA> d1:n1:k1 -17.0 12.1 -1.40 16.0 1.81e- 1\n14 fixed <NA> d1:p1:k1 -7.00 12.1 -0.576 16.0 5.72e- 1\n15 fixed <NA> n1:p1:k1 -4.50 12.1 -0.370 16.0 7.16e- 1\n16 fixed <NA> d1:n1:p1:k1 25.0 19.9 1.26 16.0 2.27e- 1\n17 ran_pars block sd__(Intercep… 1.26 NA NA NA NA \n18 ran_pars Residual sd__Observati… 4.92 NA NA NA NA \n\n\n\n\n\nmodel2_lme <- lme(yield ~ d*n*p*k,\n random = ~ 1|block,\n data = data1, \n na.action = na.exclude)\ntidy(model2_lme)\n\n# A tibble: 18 × 8\n effect group term estimate std.error df statistic p.value\n <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed <NA> (Intercept) 49 4.79 15 10.2 3.66e-8\n 2 fixed <NA> d1 -9.5 6.77 15 -1.40 1.81e-1\n 3 fixed <NA> n1 0.500 6.77 15 0.0739 9.42e-1\n 4 fixed <NA> p1 -11.5 6.77 15 -1.70 1.10e-1\n 5 fixed <NA> k1 1.00 6.77 15 0.148 8.85e-1\n 6 fixed <NA> d1:n1 13.5 11.6 15 1.16 2.63e-1\n 7 fixed <NA> d1:p1 15.5 11.6 15 1.34 2.02e-1\n 8 fixed <NA> n1:p1 9.50 11.6 15 0.818 4.26e-1\n 9 fixed <NA> 
d1:k1 4.00 11.6 15 0.345 7.35e-1\n10 fixed <NA> n1:k1 0.500 11.6 15 0.0431 9.66e-1\n11 fixed <NA> p1:k1 3.00 11.6 15 0.258 8.00e-1\n12 fixed <NA> d1:n1:p1 -14.5 21.0 15 -0.690 5.01e-1\n13 fixed <NA> d1:n1:k1 -17.0 21.0 15 -0.809 4.31e-1\n14 fixed <NA> d1:p1:k1 -7.00 21.0 15 -0.333 7.44e-1\n15 fixed <NA> n1:p1:k1 -4.50 21.0 15 -0.214 8.33e-1\n16 fixed <NA> d1:n1:p1:k1 25.0 39.7 15 0.630 5.38e-1\n17 ran_pars block sd_(Intercept) 3.28 NA NA NA NA \n18 ran_pars Residual sd_Observation 4.92 NA NA NA NA \n\n\n\n\n\n\n\n\n\n\n\n\n\nNote\n\n\n\nInstead of summary() function, we used tidy() function from the ‘broom.mixed’ package to get a short summary output of the model.\n\n\n\n\n5.2.3 Check Model Assumptions\n\nlme4nlme\n\n\n\ncheck_model(model1_lmer, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(model2_lme, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\nThe linearity and homogeneity of variance plots show no trend. The normal Q-Q plots for the overall residuals and for the random effects all fall nearly on a straight line so we can be satisfied with that.\n\n\n5.2.4 Inference\nWe can get an ANOVA table for the linear mixed model using the function anova(), which works for both lmer() and lme() models..\n\nlme4nlme\n\n\n\ncar::Anova(model1_lmer, type = 'III', test.statistic=\"F\")\n\nAnalysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)\n\nResponse: yield\n F Df Df.res Pr(>F) \n(Intercept) 175.2030 1 20.439 1.729e-11 ***\nd 3.2928 1 20.439 0.08429 . \nn 0.0091 1 20.439 0.92484 \np 4.8252 1 20.439 0.03974 * \nk 0.0365 1 20.439 0.85040 \nd:n 2.9812 1 25.421 0.09637 . \nd:p 3.9300 1 25.421 0.05834 . \nn:p 1.4763 1 25.421 0.23552 \nd:k 0.2617 1 25.421 0.61335 \nn:k 0.0041 1 25.421 0.94951 \np:k 0.1472 1 25.421 0.70440 \nd:n:p 1.4251 1 37.012 0.24016 \nd:n:k 1.9589 1 37.012 0.16996 \nd:p:k 0.3321 1 37.012 0.56789 \nn:p:k 0.1373 1 37.012 0.71313 \nd:n:p:k 1.5778 1 66.709 0.21346 \n---\nSignif. 
codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n\n\n\n\nanova(model2_lme, type = \"marginal\")\n\n numDF denDF F-value p-value\n(Intercept) 1 15 104.83445 <.0001\nd 1 15 1.97029 0.1808\nn 1 15 0.00546 0.9421\np 1 15 2.88720 0.1099\nk 1 15 0.02183 0.8845\nd:n 1 15 1.35278 0.2630\nd:p 1 15 1.78330 0.2017\nn:p 1 15 0.66990 0.4259\nd:k 1 15 0.11876 0.7352\nn:k 1 15 0.00186 0.9662\np:k 1 15 0.06680 0.7996\nd:n:p 1 15 0.47580 0.5009\nd:n:k 1 15 0.65401 0.4313\nd:p:k 1 15 0.11089 0.7437\nn:p:k 1 15 0.04583 0.8334\nd:n:p:k 1 15 0.39719 0.5380\n\n\n\n\n\nLet’s find estimates for some of the factors such as n, p, and n:k interaction effect. This will help us look at the combined effect of n & k on bean yield.\n\nlme4nlme\n\n\n\nemmeans(model1_lmer, specs = ~ n)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n n emmean SE df lower.CL upper.CL\n 0 43.8 1.52 37 40.7 46.8\n 1 50.1 1.52 37 47.0 53.2\n\nResults are averaged over the levels of: d, p, k \nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\nemmeans(model1_lmer, specs = ~ p)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n p emmean SE df lower.CL upper.CL\n 0 47.4 1.52 37 44.3 50.5\n 1 46.5 1.52 37 43.4 49.6\n\nResults are averaged over the levels of: d, n, k \nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\nemmeans(model1_lmer, specs = ~ n:k)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n n k emmean SE df lower.CL upper.CL\n 0 0 42.4 1.95 25.4 38.4 46.4\n 1 0 50.8 1.95 25.4 46.7 54.8\n 0 1 45.1 1.95 25.4 41.1 49.1\n 1 1 49.5 1.95 25.4 45.5 53.5\n\nResults are averaged over the levels of: d, p \nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemmeans(model2_lme, specs = ~ n)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n n emmean SE df lower.CL upper.CL\n 0 43.8 2.63 1 10.4 77.1\n 1 50.1 2.63 1 16.7 83.5\n\nResults 
are averaged over the levels of: d, p, k \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\nemmeans(model2_lme, specs = ~ p)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n p emmean SE df lower.CL upper.CL\n 0 47.4 2.63 1 14.0 80.8\n 1 46.5 2.63 1 13.1 79.9\n\nResults are averaged over the levels of: d, n, k \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\nemmeans(model2_lme, specs = ~ n:k)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n n k emmean SE df lower.CL upper.CL\n 0 0 42.4 2.9 1 5.50 79.2\n 1 0 50.8 2.9 1 13.88 87.6\n 0 1 45.1 2.9 1 8.25 82.0\n 1 1 49.5 2.9 1 12.63 86.4\n\nResults are averaged over the levels of: d, p \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\nIn summary, while working with factorial designs make sure to carefully interpret ANOVA and estimated marginal means for main and interaction effects.",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>5</span> <span class='chapter-title'>Factorial RCBD Design</span>"
]
},
{
"objectID": "chapters/split-plot-design.html",
"href": "chapters/split-plot-design.html",
"title": "6 Split Plot Design",
"section": "",
"text": "6.1 Details for Split Plot Designs\nSplit-plot design is frequently used for factorial experiments. Such design may incorporate one or more of the completely randomized (CRD), completely randomized block (RCBD). The main principle is that there are whole plots or whole units, to which the levels of one or more factors are applied. Thus each whole plot becomes a block for the subplot treatments.\nThe statistical model structure this design:\n\\[y_{ijk} = \\mu + \\alpha_i + \\beta_k + (\\alpha_j\\beta_k) + \\epsilon_{ij} + \\delta_{ijk} \\] Where:\n\\(\\mu\\)= overall experimental mean, \\(\\alpha\\) = main effect of whole plot (fixed), \\(\\beta\\) = main effect of subplot (fixed), \\(\\alpha\\)\\(\\tau\\) = interaction between factors A and B, \\(\\epsilon_{ij}\\) = whole plot error, \\(\\delta_{ijk}\\) = subplot error.\n\\[ \\epsilon \\sim N(0, \\sigma_\\epsilon)\\]\n\\[\\ \\delta \\sim N(0, \\sigma_\\delta)\\]\nBoth the error and the rep effects are assumed to be normally distributed with a mean of zero and standard deviations of \\(\\sigma_\\epsilon\\) and \\(\\sigma_\\delta\\), respectively.\nThis is also referred as “Split-Block RCB” design. The statistical model structure for split plot design:\n\\[y_{ijk} = \\mu + \\rho_j + \\alpha_i + \\beta_k + (\\alpha_i\\beta_k) + \\epsilon_{ij} + \\delta_{ijk}\\] Where:\n\\(\\mu\\) = overall experimental mean, \\(\\rho\\) = block effect (random), \\(\\alpha\\) = main effect of whole plot (fixed), \\(\\beta\\) = main effect of subplot (fixed), \\(\\alpha\\)\\(\\beta\\) = interaction between factors A and B, \\(\\epsilon_{ij}\\) = whole plot error, \\(\\delta_{ijk}\\) = subplot error.\n\\[ \\epsilon \\sim N(0, \\sigma_\\epsilon)\\]\n\\[\\ \\delta \\sim N(0, \\sigma_\\delta)\\]\nBoth the overall error and the rep effects are assumed to be normally distributed with a mean of zero and standard deviations of \\(\\sigma\\) and \\(\\sigma_\\delta\\), respectively.",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>6</span> <span class='chapter-title'>Split Plot Design</span>"
]
},
{
"objectID": "chapters/split-plot-design.html#details-for-split-plot-designs",
"href": "chapters/split-plot-design.html#details-for-split-plot-designs",
"title": "6 Split Plot Design",
"section": "",
"text": "Whole Plot Randomized as a completely randomized design\n\n\n\n\n\n\n\n\nWhole Plot Randomized as an RCBD\n\n\n\n\n\n\n\n\n\n\n\n\n\n‘iid’ assumption for error terms\n\n\n\nIn these model, the error terms, \\(\\epsilon\\) are assumed to be “iid”, that is, independently and identically distributed. This means they have constant variance and they each individual error term is independent from the others.",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>6</span> <span class='chapter-title'>Split Plot Design</span>"
]
},
{
"objectID": "chapters/split-plot-design.html#analysis-examples",
"href": "chapters/split-plot-design.html#analysis-examples",
"title": "6 Split Plot Design",
"section": "6.2 Analysis Examples",
"text": "6.2 Analysis Examples\nLoad required libraries\n\nlme4nlme\n\n\n\nlibrary(lme4); library(lmerTest); library(emmeans)\nlibrary(dplyr); library(performance); library(ggplot2)\nlibrary(broom.mixed)\n\n\n\n\nlibrary(nlme); library(performance); library(emmeans)\nlibrary(dplyr); library(ggplot2); library(broom.mixed)\n\n\n\n\n\n6.2.1 Example model for CRD Split Plot Designs\nLet’s import height data. It is located here if you want to download it yourself (recommended).\nThe data (Height data) for this example involves a CRD split plot designed experiment. Treatments are 4 Timings (times) and 8 managements (manage). The whole plots are times and management represents subplot and 3 replications.\n\nheight_data <- readxl::read_excel(here::here(\"data\", \"height_data.xlsx\"))\n\n\nTable of variables in the oat data set\n\n\nrep\nreplication unit\n\n\ntime\nMain plot with 4 levels\n\n\nManage\nSplit-plot with 8 levels\n\n\nsample\ntwo sampling units per each rep\n\n\nheight\nyield (lbs per acre)\n\n\n\n\n6.2.1.1 Data integrity checks\n\nRun a cross tabulation using table() to check the arrangement of whole-plots and sub-plots.\n\n\ntable(height_data$time, height_data$manage)\n\n \n M1 M2 M3 M4 M5 M6 M7 M8\n T1 6 6 6 6 6 6 6 6\n T2 6 6 6 6 6 6 6 6\n T3 6 6 6 6 6 6 6 6\n T4 6 6 6 6 6 6 6 6\n\n\nThe levels of whole plots and subplots are balanced.\n\nLook at structure of the data using str(), this will help in identifying class of the variable. 
In this data set, the class of the whole-plot, sub-plot, and block variables should be factor/character and the response variable (height) should be numeric.\n\n\nstr(height_data)\n\ntibble [192 × 5] (S3: tbl_df/tbl/data.frame)\n $ time : chr [1:192] \"T1\" \"T1\" \"T1\" \"T1\" ...\n $ manage: chr [1:192] \"M1\" \"M2\" \"M3\" \"M4\" ...\n $ rep : chr [1:192] \"R1\" \"R1\" \"R1\" \"R1\" ...\n $ sample: chr [1:192] \"S1\" \"S1\" \"S1\" \"S1\" ...\n $ height: num [1:192] 104.5 92.3 96.8 94.7 105.7 ...\n\n\nThe ‘time’, ‘manage’, and ‘rep’ variables are character and the variable ‘height’ is numeric. The data are structured as needed.\n\nCheck the number of missing values in each column.\n\n\napply(height_data, 2, function(x) sum(is.na(x)))\n\n time manage rep sample height \n 0 0 0 0 0 \n\n\n\nCreate an exploratory boxplot to look at the height observations at different times under the different managements.\n\n\nggplot(data = height_data, aes(y = height, x = time)) +\n geom_boxplot(aes(fill = manage), alpha = 0.6)\n\n\n\n\n\n\n\n\nLast, check the dependent variable by plotting a histogram of height data.\n\n\n\n\n\n\n\n\n\nFigure 6.1: Histogram of the dependent variable.\n\n\n\n\n\nhist(height_data$height, main = \"\", xlab = \"height\")\n\nThe distribution of height data looks close to normal.\n\n\n6.2.1.2 Model building\n\n\nRecall the model:\n\\[y_{ijk} = \\mu + \\gamma_i + \\alpha_j + \\beta_k + (\\alpha_j\\beta_k) + \\epsilon_{ijk}\\]\nFor this model, \\(\\gamma\\) = block/rep effect (random), \\(\\alpha\\) = main effect of whole plot (fixed), \\(\\beta\\) = main effect of subplot (fixed), \\(\\alpha\\)\\(\\beta\\) = interaction between factors A and B (fixed).\nIn order to test the main effects of the treatments as well as the interaction between the two factors, we can specify the model as: time + manage + time:manage or time*manage.\nWhen dealing with a split plot design across reps or blocks, the random effects need to be nested hierarchically, from largest unit to smallest. 
For example, in this example the random effects will be designated as (1 | rep/time). This implies that we use random intercept at each of the rep and time (whole-plot) level.\n\nlme4nlme\n\n\n\nmodel_lmer <- lmer(height ~ time*manage + (1|rep/time), data = height_data)\ntidy(model_lmer)\n\n# A tibble: 35 × 8\n effect group term estimate std.error statistic df p.value\n <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed <NA> (Intercept) 108. 3.19 33.9 4.38 0.00000181\n 2 fixed <NA> timeT2 3.18 2.63 1.21 104. 0.229 \n 3 fixed <NA> timeT3 -2.25 2.63 -0.855 104. 0.394 \n 4 fixed <NA> timeT4 1.28 2.63 0.488 104. 0.627 \n 5 fixed <NA> manageM2 -4.45 2.55 -1.74 152. 0.0832 \n 6 fixed <NA> manageM3 -5.30 2.55 -2.08 152. 0.0395 \n 7 fixed <NA> manageM4 -6.18 2.55 -2.42 152. 0.0166 \n 8 fixed <NA> manageM5 -5.02 2.55 -1.97 152. 0.0511 \n 9 fixed <NA> manageM6 -3.42 2.55 -1.34 152. 0.183 \n10 fixed <NA> manageM7 -9.75 2.55 -3.82 152. 0.000193 \n# ℹ 25 more rows\n\n\n\n\n\nmodel_lme <-lme(height ~ time*manage,\n random = ~ 1|rep/time, data = height_data)\n\ntidy(model_lme)\n\nWarning in tidy.lme(model_lme): ran_pars not yet implemented for multiple\nlevels of nesting\n\n\n# A tibble: 32 × 7\n effect term estimate std.error df statistic p.value\n <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed (Intercept) 108. 3.19 152 33.9 9.59e-73\n 2 fixed timeT2 3.18 2.63 6 1.21 2.72e- 1\n 3 fixed timeT3 -2.25 2.63 6 -0.855 4.25e- 1\n 4 fixed timeT4 1.28 2.63 6 0.488 6.43e- 1\n 5 fixed manageM2 -4.45 2.55 152 -1.74 8.32e- 2\n 6 fixed manageM3 -5.30 2.55 152 -2.08 3.95e- 2\n 7 fixed manageM4 -6.18 2.55 152 -2.42 1.66e- 2\n 8 fixed manageM5 -5.02 2.55 152 -1.97 5.11e- 2\n 9 fixed manageM6 -3.42 2.55 152 -1.34 1.83e- 1\n10 fixed manageM7 -9.75 2.55 152 -3.82 1.93e- 4\n# ℹ 22 more rows\n\n\n\n\n\n\n\n6.2.1.3 Check Model Assumptions\nBefore interpreting the model we should investigate the assumptions of the model to ensure any conclusions we draw are valid. 
The assumptions that we can check are: 1. homogeneity (equal variance), 2. normality of residuals, 3. values with high leverage.\nWe will use the check_model() function from the ‘performance’ package. The plots generated by this code give a visual check of various assumptions including normality of residuals, normality of random effects, heteroscedasticity, homogeneity of variance, and multicollinearity.\n\nlme4nlme\n\n\n\ncheck_model(model_lmer, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(model_lme, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\nIn this case the residuals fit the assumptions of the model well.\n\n\n6.2.1.4 Inference\nThe anova() function prints the rows of the analysis of variance table for the whole-plot and sub-plot factors and their interaction. We observed a significant effect of the manage factor only.\n\nlme4nlme\n\n\n\ncar::Anova(model_lmer, type = 'III', test.statistics = \"F\")\n\nAnalysis of Deviance Table (Type III Wald chisquare tests)\n\nResponse: height\n Chisq Df Pr(>Chisq) \n(Intercept) 1148.5658 1 < 2e-16 ***\ntime 4.5139 3 0.21105 \nmanage 15.9090 7 0.02596 * \ntime:manage 24.3349 21 0.27711 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\n\n\n\n\nanova(model_lme, type = \"marginal\")\n\n numDF denDF F-value p-value\n(Intercept) 1 152 1148.6202 <.0001\ntime 3 6 1.5046 0.3061\nmanage 7 152 2.2727 0.0315\ntime:manage 21 152 1.1588 0.2955\n\n\n\n\n\nWe can further compute estimated marginal means for each fixed effect and for the interaction effect using emmeans().\n\nlme4nlme\n\n\n\nm1 <- emmeans(model_lmer, ~ time)\n\nNOTE: Results may be misleading due to involvement in interactions\n\nm1\n\n time emmean SE df lower.CL upper.CL\n T1 103 2.7 2.27 92.8 114\n T2 106 2.7 2.27 95.5 116\n T3 100 2.7 2.27 89.8 111\n T4 104 2.7 2.27 94.0 115\n\nResults are averaged over the levels of: manage \nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nm2 <- emmeans(model_lme, ~ time)\n\nNOTE: Results may be misleading due to involvement in interactions\n\nm2\n\n time emmean SE df lower.CL upper.CL\n T1 103 2.7 2 91.6 115\n T2 106 2.7 2 94.2 118\n T3 100 2.7 2 88.6 112\n T4 104 2.7 2 92.8 116\n\nResults are averaged over the levels of: manage \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\nFurther, pairwise comparisons or contrasts can be analyzed using the estimated means. In this model, the ‘time’ factor has 4 levels. 
We can use the pairs() function to evaluate pairwise comparisons among the ‘time’ levels.\nHere’s an example using the pairs() function to compare differences in height among time points.\n\nlme4nlme\n\n\n\npairs(m1)\n\n contrast estimate SE df t.ratio p.value\n T1 - T2 -2.68 1.11 6 -2.426 0.1719\n T1 - T3 2.95 1.11 6 2.665 0.1287\n T1 - T4 -1.21 1.11 6 -1.091 0.7072\n T2 - T3 5.63 1.11 6 5.091 0.0089\n T2 - T4 1.48 1.11 6 1.334 0.5767\n T3 - T4 -4.15 1.11 6 -3.756 0.0358\n\nResults are averaged over the levels of: manage \nDegrees-of-freedom method: kenward-roger \nP value adjustment: tukey method for comparing a family of 4 estimates \n\n\n\n\n\npairs(m2)\n\n contrast estimate SE df t.ratio p.value\n T1 - T2 -2.68 1.11 6 -2.426 0.1719\n T1 - T3 2.95 1.11 6 2.665 0.1287\n T1 - T4 -1.21 1.11 6 -1.091 0.7072\n T2 - T3 5.63 1.11 6 5.091 0.0089\n T2 - T4 1.48 1.11 6 1.334 0.5767\n T3 - T4 -4.15 1.11 6 -3.756 0.0358\n\nResults are averaged over the levels of: manage \nDegrees-of-freedom method: containment \nP value adjustment: tukey method for comparing a family of 4 estimates \n\n\n\n\n\n\n\n\n\n\n\npairs()\n\n\n\nThe default p-value adjustment in the pairs() function is “tukey”; other options include “holm”, “hochberg”, “BH”, “BY”, and “none”. In addition, it’s okay to use this function when the independent variable has few levels (2-4). For variables with many levels, it’s better to use custom contrasts. For more information on custom contrasts, please visit Chapter 12.\n\n\n\n\n\n6.2.2 Example model for RCBD Split Plot Designs\nThe oats data used in this example is from the MASS package. The design is an RCBD split plot with 6 blocks, 3 main plots and 4 subplots. 
The primary outcome variable was oat yield.\n\nTable of variables in the oat data set\n\n\nblock\nblocking unit\n\n\nVariety (V)\nMain plot with 3 levels\n\n\nNitrogen (N)\nSplit-plot with 4 levels\n\n\nyield (Y)\nyield (lbs per acre)\n\n\n\nThe objective of this analysis is to study the impact of different varieties and nitrogen application rates on oat yields.\nTo fully examine the yield of oats due to varieties and nutrient levels in a split plot design, we will need to statistically analyse and compare the effects of varieties (main plot), nutrient levels (subplot), and their interaction.\n\nlibrary(MASS)\ndata(\"oats\")\nhead(oats,5)\n\n B V N Y\n1 I Victory 0.0cwt 111\n2 I Victory 0.2cwt 130\n3 I Victory 0.4cwt 157\n4 I Victory 0.6cwt 174\n5 I Golden.rain 0.0cwt 117\n\n\n\n6.2.2.1 Data integrity checks\nLet’s look at the structure of the data. “B”, “V”, and “N” need to be of class ‘factor’ and “Y” should be numeric.\n\nstr(oats)\n\n'data.frame': 72 obs. of 4 variables:\n $ B: Factor w/ 6 levels \"I\",\"II\",\"III\",..: 1 1 1 1 1 1 1 1 1 1 ...\n $ V: Factor w/ 3 levels \"Golden.rain\",..: 3 3 3 3 1 1 1 1 2 2 ...\n $ N: Factor w/ 4 levels \"0.0cwt\",\"0.2cwt\",..: 1 2 3 4 1 2 3 4 1 2 ...\n $ Y: int 111 130 157 174 117 114 161 141 105 140 ...\n\n\nNext, run the table() command to verify the levels of main-plots and sub-plots.\n\ntable(oats$V, oats$N)\n\n \n 0.0cwt 0.2cwt 0.4cwt 0.6cwt\n Golden.rain 6 6 6 6\n Marvellous 6 6 6 6\n Victory 6 6 6 6\n\n\n\nCheck the number of missing values in each column.\n\n\napply(oats, 2, function(x) sum(is.na(x)))\n\nB V N Y \n0 0 0 0 \n\n\nLast, check the dependent variable by plotting a histogram of yield data.\n\n\n\n\n\n\n\n\n\nFigure 6.2: Histogram of the dependent variable.\n\n\n\n\n\nhist(oats$Y, main = \"\", xlab = \"yield\")\n\n\n\n6.2.2.2 Model Building\nWe are evaluating the effect of V, N and their interaction on yield. 
The 1|B/V implies that random intercepts vary with block and V within each block.\n\n\nRecall the model:\n\\[y_{ijk} = \\mu + \\rho_j + \\alpha_i + \\beta_k + (\\alpha_i\\beta_k) + \\epsilon_{ij} + \\delta_{ijk}\\] Where:\n\\(\\mu\\) = overall experimental mean, \\(\\rho\\) = block effect (random), \\(\\alpha\\) = main effect of whole plot (fixed), \\(\\beta\\) = main effect of subplot (fixed), \\(\\alpha\\)\\(\\beta\\) = interaction between factors A and B, \\(\\epsilon_{ij}\\) = whole plot error, \\(\\delta_{ijk}\\) = subplot error.\n\nlme4nlme\n\n\n\nmodel2_lmer <- lmer(Y ~ V + N + V:N + (1|B/V), \n data = oats, \n na.action = na.exclude)\ntidy(model2_lmer)\n\n# A tibble: 15 × 8\n effect group term estimate std.error statistic df p.value\n <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed <NA> (Intercept) 80.0 9.11 8.78 16.1 1.55e-7\n 2 fixed <NA> VMarvellous 6.67 9.72 0.686 30.2 4.98e-1\n 3 fixed <NA> VVictory -8.50 9.72 -0.875 30.2 3.89e-1\n 4 fixed <NA> N0.2cwt 18.5 7.68 2.41 45.0 2.02e-2\n 5 fixed <NA> N0.4cwt 34.7 7.68 4.51 45.0 4.58e-5\n 6 fixed <NA> N0.6cwt 44.8 7.68 5.84 45.0 5.48e-7\n 7 fixed <NA> VMarvellous:N0… 3.33 10.9 0.307 45.0 7.60e-1\n 8 fixed <NA> VVictory:N0.2c… -0.333 10.9 -0.0307 45.0 9.76e-1\n 9 fixed <NA> VMarvellous:N0… -4.17 10.9 -0.383 45.0 7.03e-1\n10 fixed <NA> VVictory:N0.4c… 4.67 10.9 0.430 45.0 6.70e-1\n11 fixed <NA> VMarvellous:N0… -4.67 10.9 -0.430 45.0 6.70e-1\n12 fixed <NA> VVictory:N0.6c… 2.17 10.9 0.199 45.0 8.43e-1\n13 ran_pars V:B sd__(Intercept) 10.3 NA NA NA NA \n14 ran_pars B sd__(Intercept) 14.6 NA NA NA NA \n15 ran_pars Residual sd__Observation 13.3 NA NA NA NA \n\n\n\n\n\nmodel2_lme <- lme(Y ~ V + N + V:N ,\n random = ~1|B/V,\n data = oats, \n na.action = na.exclude)\ntidy(model2_lme)\n\nWarning in tidy.lme(model2_lme): ran_pars not yet implemented for multiple\nlevels of nesting\n\n\n# A tibble: 12 × 7\n effect term estimate std.error df statistic p.value\n <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed 
(Intercept) 80 9.11 45 8.78 2.56e-11\n 2 fixed VMarvellous 6.67 9.72 10 0.686 5.08e- 1\n 3 fixed VVictory -8.50 9.72 10 -0.875 4.02e- 1\n 4 fixed N0.2cwt 18.5 7.68 45 2.41 2.02e- 2\n 5 fixed N0.4cwt 34.7 7.68 45 4.51 4.58e- 5\n 6 fixed N0.6cwt 44.8 7.68 45 5.84 5.48e- 7\n 7 fixed VMarvellous:N0.2cwt 3.33 10.9 45 0.307 7.60e- 1\n 8 fixed VVictory:N0.2cwt -0.333 10.9 45 -0.0307 9.76e- 1\n 9 fixed VMarvellous:N0.4cwt -4.17 10.9 45 -0.383 7.03e- 1\n10 fixed VVictory:N0.4cwt 4.67 10.9 45 0.430 6.70e- 1\n11 fixed VMarvellous:N0.6cwt -4.67 10.9 45 -0.430 6.70e- 1\n12 fixed VVictory:N0.6cwt 2.17 10.9 45 0.199 8.43e- 1\n\n\n\n\n\n\n\n6.2.2.3 Check Model Assumptions\nAs shown in example 1, we need to verify the normality of residuals and homogeneity of variance. Here we are using the check_model() function from the performance package.\n\nlme4nlme\n\n\n\ncheck_model(model2_lmer, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(model2_lme, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\n\n\n6.2.2.4 Inference\nWe can run an analysis of variance on the model to test the effects of V, N, and their interaction.\n\nlme4nlme\n\n\n\ncar::Anova(model2_lmer, type = \"III\", test.statistics = \"F\")\n\nAnalysis of Deviance Table (Type III Wald chisquare tests)\n\nResponse: Y\n Chisq Df Pr(>Chisq) \n(Intercept) 77.1664 1 < 2.2e-16 ***\nV 2.4491 2 0.2939 \nN 39.0683 3 1.679e-08 ***\nV:N 1.8169 6 0.9357 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\n\n\n\n\nanova(model2_lme, type = \"marginal\")\n\n numDF denDF F-value p-value\n(Intercept) 1 45 77.16729 <.0001\nV 2 10 1.22454 0.3344\nN 3 45 13.02273 <.0001\nV:N 6 45 0.30282 0.9322\n\n\n\n\n\nNext, we can estimate marginal means for V, N, or their interaction (V*N) effect.\n\nlme4nlme\n\n\n\nemm1 <- emmeans(model2_lmer, ~ V *N) \nemm1\n\n V N emmean SE df lower.CL upper.CL\n Golden.rain 0.0cwt 80.0 9.11 16.1 60.7 99.3\n Marvellous 0.0cwt 86.7 9.11 16.1 67.4 106.0\n Victory 0.0cwt 71.5 9.11 16.1 52.2 90.8\n Golden.rain 0.2cwt 98.5 9.11 16.1 79.2 117.8\n Marvellous 0.2cwt 108.5 9.11 16.1 89.2 127.8\n Victory 0.2cwt 89.7 9.11 16.1 70.4 109.0\n Golden.rain 0.4cwt 114.7 9.11 16.1 95.4 134.0\n Marvellous 0.4cwt 117.2 9.11 16.1 97.9 136.5\n Victory 0.4cwt 110.8 9.11 16.1 91.5 130.1\n Golden.rain 0.6cwt 124.8 9.11 16.1 105.5 144.1\n Marvellous 0.6cwt 126.8 9.11 16.1 107.5 146.1\n Victory 0.6cwt 118.5 9.11 16.1 99.2 137.8\n\nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemm1 <- emmeans(model2_lme, ~ V *N) \nemm1\n\n V N emmean SE df lower.CL upper.CL\n Golden.rain 0.0cwt 80.0 9.11 5 56.6 103.4\n Marvellous 0.0cwt 86.7 9.11 5 63.3 110.1\n Victory 0.0cwt 71.5 9.11 5 48.1 94.9\n Golden.rain 0.2cwt 98.5 9.11 5 75.1 121.9\n Marvellous 0.2cwt 108.5 9.11 5 85.1 131.9\n Victory 0.2cwt 89.7 9.11 5 66.3 113.1\n Golden.rain 0.4cwt 114.7 9.11 5 91.3 138.1\n Marvellous 0.4cwt 117.2 9.11 5 93.8 140.6\n Victory 0.4cwt 110.8 9.11 5 87.4 134.2\n Golden.rain 0.6cwt 124.8 9.11 5 101.4 148.2\n Marvellous 0.6cwt 126.8 9.11 5 103.4 150.2\n Victory 0.6cwt 118.5 9.11 5 95.1 141.9\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\nIn the next chapter, we will continue with extension of split plot design called split-split plot design.",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>6</span> <span class='chapter-title'>Split Plot Design</span>"
]
},
{
"objectID": "chapters/split-split-plot.html",
"href": "chapters/split-split-plot.html",
"title": "7 Split-Split Plot Design",
"section": "",
"text": "7.1 Details for split-split plot designs\nThe split-split-plot design is an extension of the split-plot design to accommodate a third factor: one factor in main-plot, other in subplot and the third factor in sub-subplot\nThe statistical model structure this design:\n\\[y_{ijk} = \\mu + \\rho_j + \\alpha_i + \\beta_k + (\\alpha_i\\beta_k) + \\tau_n + (\\alpha_i\\tau_n) + (\\tau_n\\beta_k) + (\\alpha_i\\beta_k\\tau_n) + \\epsilon_{ijk} + \\delta_{ijkn}\\] Where:\n\\(\\mu\\)= overall experimental mean, \\(\\alpha\\) = main effect of whole plot (fixed), \\(\\beta\\) = main effect of subplot (fixed), \\(\\tau\\) = main effect of sub-subplot, \\(\\epsilon_{ij}\\) = whole plot error, \\(\\delta_{ijk}\\) = subplot error.\n\\[ \\epsilon \\sim N(0, \\sigma_\\epsilon)\\]\n\\[\\ \\delta \\sim N(0, \\sigma_\\delta)\\]\nThe assumptions of the model includes normal distribution of both the error and the rep effects with a mean of zero and standard deviations of \\(\\sigma_\\epsilon\\) and \\(\\sigma_\\delta\\), respectively.",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>7</span> <span class='chapter-title'>Split-Split Plot Design</span>"
]
},
{
"objectID": "chapters/split-split-plot.html#example-analysis",
"href": "chapters/split-split-plot.html#example-analysis",
"title": "7 Split-Split Plot Design",
"section": "7.2 Example Analysis",
"text": "7.2 Example Analysis\n\nlme4nlme\n\n\n\nlibrary(dplyr)\nlibrary(lme4); library(lmerTest); library(broom.mixed)\nlibrary(emmeans); library(performance)\n\n\n\n\nlibrary(dplyr)\nlibrary(nlme); library(emmeans)\nlibrary(broom.mixed); library(performance)\n\n\n\n\nIn this example, we have a rice yield data from the agricolae package. The experiment consists of 3 different rice varieties grown under 3 management practices and 5 Nitrogen levels in the split-split plot design.\n\nrice <- read.csv(here::here(\"data\", \"rice_ssp.csv\"))\n\n\nTable of variables in the rice data set\n\n\n\n\n\n\nblock\nblocking unit\n\n\nnitrogen\ndifferent nitrogen fertilizer rates as main plot with 5 levels\n\n\nmanagement\nmanagement practices as subplot with 3 levels\n\n\nvariety\ncrop variety being a sub-subplot with 3 levels\n\n\nyield\nyield (bushels per acre)\n\n\n\n\n7.2.1 Data integrity checks\nBefore analyzing the data let’s do some preliminary data quality checks. We will start with evaluation of the structure of the data where class of block, nitrogen, management and variety should be a character/factor and yield should be numeric.\n\nstr(rice)\n\n'data.frame': 135 obs. 
of 6 variables:\n $ X : int 1 2 3 4 5 6 7 8 9 10 ...\n $ block : int 1 1 1 1 1 1 1 1 1 1 ...\n $ nitrogen : int 0 0 0 50 50 50 80 80 80 110 ...\n $ management: chr \"m1\" \"m2\" \"m3\" \"m1\" ...\n $ variety : int 1 1 1 1 1 1 1 1 1 1 ...\n $ yield : num 3.32 3.77 4.66 3.19 3.62 ...\n\n\nHere we need to convert block, nitrogen, variety, and management to characters.\n\nrice$block <- as.character(rice$block)\nrice$nitrogen <- as.character(rice$nitrogen)\nrice$management <- as.character(rice$management)\nrice$variety <- as.character(rice$variety)\n\nNext, run a cross tabulation to check the balance of observations across the independent variables:\n\ntable(rice$variety, rice$nitrogen, rice$management)\n\n, , = m1\n\n \n 0 110 140 50 80\n 1 3 3 3 3 3\n 2 3 3 3 3 3\n 3 3 3 3 3 3\n\n, , = m2\n\n \n 0 110 140 50 80\n 1 3 3 3 3 3\n 2 3 3 3 3 3\n 3 3 3 3 3 3\n\n, , = m3\n\n \n 0 110 140 50 80\n 1 3 3 3 3 3\n 2 3 3 3 3 3\n 3 3 3 3 3 3\n\n\nIt looks perfectly balanced, with exactly 3 observations per treatment group.\nLast, check the distribution of the dependent variable by plotting a histogram of yield values using hist() in R.\n\nhist(rice$yield)\n\n\n\n\n\n\n\n\n\n\nFigure 7.1: Histogram of the dependent variable.\n\n\n\n\n\n\n7.2.2 Model Building\nThe variance analysis of a split-split plot design is divided into three parts: the main-plot, subplot and sub-subplot analysis. We can use the nesting notation in the random part because nitrogen and management are nested in blocks. 
We can do blocks as fixed or random.\n\nlme4nlme\n\n\n\nmodel_lmer <- lmer(yield ~ nitrogen * management * variety +\n (1 | block / nitrogen / management),\n data = rice,\n na.action = na.exclude)\n\nboundary (singular) fit: see help('isSingular')\n\ntidy(model_lmer)\n\n# A tibble: 49 × 8\n effect group term estimate std.error statistic df p.value\n <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed <NA> (Intercept) 3.90 0.386 10.1 89.7 1.79e-16\n 2 fixed <NA> nitrogen110 0.753 0.545 1.38 89.7 1.71e- 1\n 3 fixed <NA> nitrogen140 0.165 0.545 0.302 89.7 7.63e- 1\n 4 fixed <NA> nitrogen50 0.335 0.545 0.614 89.7 5.41e- 1\n 5 fixed <NA> nitrogen80 1.33 0.545 2.44 89.7 1.68e- 2\n 6 fixed <NA> managementm2 0.420 0.540 0.779 80.0 4.38e- 1\n 7 fixed <NA> managementm3 1.43 0.540 2.65 80.0 9.82e- 3\n 8 fixed <NA> variety2 1.45 0.540 2.68 80.0 8.83e- 3\n 9 fixed <NA> variety3 1.48 0.540 2.74 80.0 7.49e- 3\n10 fixed <NA> nitrogen110:managem… 0.377 0.763 0.493 80.0 6.23e- 1\n# ℹ 39 more rows\n\n\n\n\n\nmodel_lme <- lme(yield ~ nitrogen*management*variety,\n random = ~ 1|block/nitrogen/management,\n data = rice, \n na.action = na.exclude)\ntidy(model_lme)\n\nWarning in tidy.lme(model_lme): ran_pars not yet implemented for multiple\nlevels of nesting\n\n\n# A tibble: 45 × 7\n effect term estimate std.error df statistic p.value\n <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed (Intercept) 3.90 0.386 60 10.1 1.43e-14\n 2 fixed nitrogen110 0.753 0.545 8 1.38 2.05e- 1\n 3 fixed nitrogen140 0.165 0.545 8 0.302 7.70e- 1\n 4 fixed nitrogen50 0.335 0.545 8 0.614 5.56e- 1\n 5 fixed nitrogen80 1.33 0.545 8 2.44 4.08e- 2\n 6 fixed managementm2 0.420 0.540 20 0.779 4.45e- 1\n 7 fixed managementm3 1.43 0.540 20 2.65 1.55e- 2\n 8 fixed variety2 1.45 0.540 60 2.68 9.38e- 3\n 9 fixed variety3 1.48 0.540 60 2.74 7.99e- 3\n10 fixed nitrogen110:managementm2 0.377 0.763 20 0.493 6.27e- 1\n# ℹ 35 more rows\n\n\n\n\n\n\n\nboundary (singular) fit: We get a message that the fit is singular. 
What does this mean? Some components of the estimated variance-covariance matrix of the random effects lie on the boundary of the parameter space: a variance is estimated as exactly zero, or a correlation as exactly -1 or 1. OK, what about in English? Basically, it means that the algorithm that fits the model parameters doesn’t have enough data to get a good estimate. This often happens when we are trying to fit a model that is too complex for the amount of data we have, or when the random effects are very small and can’t be distinguished from zero. We still get some output, but this message should make us take a close look at the random effects and their variances.\n\n\n7.2.3 Check Model Assumptions\nModel diagnostics: we are looking for constant variance and normality of residuals. Checking normality would ordinarily require first extracting the model residuals and then generating a qq-plot and qq-line. We can do all of this at once using one function, check_model().\n\nlme4nlme\n\n\n\ncheck_model(model_lmer, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(model_lme, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\nConstant variance and normality of residuals look good. Here, we didn’t observe any anomalies in the model assumptions.\n\n\n7.2.4 Inference\nLet’s look at the analysis of variance for fixed effects and their interaction effect on yield.\n\nlme4nlme\n\n\n\ncar::Anova(model_lmer, type = 'III', test.statistic=\"F\")\n\nAnalysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)\n\nResponse: yield\n F Df Df.res Pr(>F) \n(Intercept) 102.1211 1 89.706 < 2e-16 ***\nnitrogen 1.9160 4 86.474 0.11496 \nmanagement 3.6962 2 77.143 0.02932 * \nvariety 4.9129 2 60.000 0.01057 * \nnitrogen:management 0.2118 8 77.143 0.98797 \nnitrogen:variety 2.6681 8 60.000 0.01413 * \nmanagement:variety 2.2193 4 60.000 0.07754 . \nnitrogen:management:variety 0.5289 16 60.000 0.92105 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\n\n\n\n\nanova(model_lme, type = \"marginal\")\n\n numDF denDF F-value p-value\n(Intercept) 1 60 102.12108 <.0001\nnitrogen 4 8 1.91603 0.2012\nmanagement 2 20 3.69617 0.0431\nvariety 2 60 4.91295 0.0106\nnitrogen:management 8 20 0.21177 0.9850\nnitrogen:variety 8 60 2.66810 0.0141\nmanagement:variety 4 60 2.21929 0.0775\nnitrogen:management:variety 16 60 0.52893 0.9210\n\n\n\n\n\nHere, we observed a significant effect of management, variety, and the nitrogen x variety interaction on rice yield. We can estimate the marginal means for each treatment factor (variety, nitrogen, management), which will be averaged across the other factors and their interactions.\n\nlme4nlme\n\n\n\nemmeans(model_lmer, ~ management)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n management emmean SE df lower.CL upper.CL\n m1 5.90 0.102 11.2 5.68 6.12\n m2 6.49 0.102 11.2 6.26 6.71\n m3 7.28 0.102 11.2 7.05 7.50\n\nResults are averaged over the levels of: nitrogen, variety \nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\nemmeans(model_lmer, ~ nitrogen*variety)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n nitrogen variety emmean SE df lower.CL upper.CL\n 0 1 4.51 0.227 49 4.06 4.97\n 110 1 5.44 0.227 49 4.99 5.90\n 140 1 5.08 0.227 49 4.62 5.53\n 50 1 4.76 0.227 49 4.31 5.22\n 80 1 5.83 0.227 49 5.38 6.29\n 0 2 5.16 0.227 49 4.71 5.62\n 110 2 6.92 0.227 49 6.47 7.38\n 140 2 7.29 0.227 49 6.83 7.74\n 50 2 6.02 0.227 49 5.56 6.47\n 80 2 6.59 0.227 49 6.13 7.04\n 0 3 6.48 0.227 49 6.02 6.93\n 110 3 8.44 0.227 49 7.99 8.90\n 140 3 9.34 0.227 49 8.88 9.79\n 50 3 7.88 0.227 49 7.42 8.34\n 80 3 8.56 0.227 49 8.11 9.02\n\nResults are averaged over the levels of: management \nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemmeans(model_lme, ~ management)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n management emmean SE df lower.CL 
upper.CL\n m1 5.90 0.102 2 5.46 6.34\n m2 6.49 0.102 2 6.05 6.92\n m3 7.28 0.102 2 6.84 7.71\n\nResults are averaged over the levels of: nitrogen, variety \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\nemmeans(model_lme, ~ nitrogen*variety)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n nitrogen variety emmean SE df lower.CL upper.CL\n 0 1 4.51 0.227 2 3.54 5.49\n 110 1 5.44 0.227 2 4.47 6.42\n 140 1 5.08 0.227 2 4.10 6.05\n 50 1 4.76 0.227 2 3.79 5.74\n 80 1 5.83 0.227 2 4.86 6.81\n 0 2 5.16 0.227 2 4.19 6.14\n 110 2 6.92 0.227 2 5.95 7.90\n 140 2 7.29 0.227 2 6.31 8.27\n 50 2 6.02 0.227 2 5.04 6.99\n 80 2 6.59 0.227 2 5.61 7.57\n 0 3 6.48 0.227 2 5.50 7.46\n 110 3 8.44 0.227 2 7.47 9.42\n 140 3 9.34 0.227 2 8.36 10.31\n 50 3 7.88 0.227 2 6.90 8.86\n 80 3 8.56 0.227 2 7.59 9.54\n\nResults are averaged over the levels of: management \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\nNotice we get a message that the estimated means for ‘nitrogen x variety’ are averaged over the levels of ‘management’. So we need to be careful about how we interpret these estimates.\n\n\n\n\n\n\nNested random effects\n\n\n\nYou may have noticed the order of random effects in model statement:\nmodel_lme <- lme(yield ~ nitrogen*management*variety,\n random = ~ 1|block/nitrogen/management,\n data = rice, \n na.action = na.exclude)\nThe random effects follow the order of ~1|block/main-plot/split-plot. While fitting the model for split-split plot design please make sure to have a clear understanding of the main plot, split-plot and split-split plot factors to avoid having an erroneous model.",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>7</span> <span class='chapter-title'>Split-Split Plot Design</span>"
]
},
{
"objectID": "chapters/strip-plot.html",
"href": "chapters/strip-plot.html",
"title": "8 Strip Plot Design",
"section": "",
"text": "8.1 Background\nIn strip plot design each block or replication is divided into number of vertical and horizontal strips depending on the levels of the respective factors.\nDivide the experimental area into ‘A’ horizontal strips and ‘B’ vertical strips. Each level of factor A is assigned to all the plots in one row, and each level of factor B is assigned to all the plots in one column.\nThe statistical model:\n\\[y_{ijk} = \\mu + \\alpha_j + \\beta_k + \\alpha_j\\beta_k + b_i + r_{ij} + c_{ik} + \\epsilon_{ijk}\\] Where:\n\\(\\mu\\)= overall experimental mean, \\(\\alpha\\) and \\(\\beta\\) are the main effects applied in a horizontal and vertical direction, and \\(\\alpha\\)\\(\\beta\\) represents the interaction between main factors. The random effects in above equation are \\(b_i\\), the random rep effect, \\(r_{ij}\\), the row within rep random effect, \\(c_{ik}\\), the column within rep random effect.\n\\[ b_i \\sim N(0, \\sigma_1^2)\\]\n\\[ r_{ij} \\sim N(0, \\sigma_2^2)\\]\n\\[ c_{ik} \\sim N(0, \\sigma_3^2)\\]\n\\[ \\epsilon_{ijk} \\sim N(0, \\sigma^2)\\]",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>8</span> <span class='chapter-title'>Strip Plot Design</span>"
]
},
{
"objectID": "chapters/strip-plot.html#background",
"href": "chapters/strip-plot.html#background",
"title": "8 Strip Plot Design",
"section": "",
"text": "Vertical strip plot for the first factor – vertical factor.\nHorizontal strip plot for the second factor – horizontal factor.",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>8</span> <span class='chapter-title'>Strip Plot Design</span>"
]
},
{
"objectID": "chapters/strip-plot.html#example-analysis",
"href": "chapters/strip-plot.html#example-analysis",
"title": "8 Strip Plot Design",
"section": "8.2 Example Analysis",
"text": "8.2 Example Analysis\nWe will start the analysis first by loading the required libraries for this analysis for lme and lmer models, respectively.\n\nlme4nlme\n\n\n\nlibrary(lme4); library(lmerTest); library(emmeans)\nlibrary(dplyr); library(performance); library(desplot)\nlibrary(broom.mixed)\n\n\n\n\nlibrary(nlme); library(performance); library(emmeans)\nlibrary(dplyr); library(desplot); library(broom.mixed)\n\n\n\n\nFor this example, we will use Rice strip-plot experiment data from theagridat package. This data contains a strip-plot experiment with three reps, variety as the horizontal strip and nitrogen fertilizer as the vertical strip.\n\ndata1 <- agridat::gomez.stripplot\n\n\nTable of variables in the data set\n\n\nrep\nreplication unit\n\n\nnitro\nnitrogen fertilizer in kg/ha\n\n\ngen\nrice variety\n\n\nrow\nrow (represents gen)\n\n\ncol\ncolumn (represents nitro)\n\n\nyield\ngrain yield in kg/ha\n\n\n\nFor the sake of analysis, ‘row’ and ‘col’ variables are used to represent ‘nitrogen’ and ‘Gen’ factors. The plot below shows the application of treatments in horizontal and vertical direction in a strip plot design.\n\n\n\n\n\n\n\n\n\n\n8.2.1 Data integrity checks\nFirst thing we need to verify is the data types of the variables in data1. The ‘rep’, ‘nitro’, and ‘gen’ needs to be a factor/character variables and ‘yield’ should be numeric.\n\nstr(data1)\n\n'data.frame': 54 obs. 
of 6 variables:\n $ yield: int 2373 4076 7254 4007 5630 7053 2620 4676 7666 2726 ...\n $ rep : Factor w/ 3 levels \"R1\",\"R2\",\"R3\": 1 1 1 1 1 1 1 1 1 1 ...\n $ nitro: int 0 60 120 0 60 120 0 60 120 0 ...\n $ gen : Factor w/ 6 levels \"G1\",\"G2\",\"G3\",..: 1 1 1 2 2 2 3 3 3 4 ...\n $ col : int 1 3 2 1 3 2 1 3 2 1 ...\n $ row : int 1 1 1 3 3 3 4 4 4 2 ...\n\n\nLet’s convert ‘nitro’ from numeric to factor.\n\ndata1$nitro <- as.factor(data1$nitro)\n\nLet’s have a look at the balance of treatment factors by running a cross tabulation of the independent variables.\n\ntable(data1$gen, data1$nitro)\n\n \n 0 60 120\n G1 3 3 3\n G2 3 3 3\n G3 3 3 3\n G4 3 3 3\n G5 3 3 3\n G6 3 3 3\n\n\nIt looks balanced, with 3 observations for each combination of variety and nitrogen level.\nThe next step is to identify if there are any missing observations in the data set.\n\napply(data1, 2, function(x) sum(is.na(x)))\n\nyield rep nitro gen col row \n 0 0 0 0 0 0 \n\n\nWe don’t have any missing values in this data set.\nLastly, let’s check the distribution of the dependent variable by plotting a histogram.\n\nhist(data1$yield, main = \"\", xlab = \"yield\")\n\n\n\n\n\n\n\n\n\n\nFigure 8.1: Histogram of the dependent variable.\n\n\n\n\nNo extreme values or skewness are apparent in the yield values.\n\n\n8.2.2 Model Building\nThe impact of nitro, gen, and their interaction was evaluated on rice yield. Three random effects are used to account for rep, row, and column effects, with the last two random effects nested within rep, but crossed with each other. Rep, gen nested in rep, and nitro nested in rep were the random effects in the model. 
All random effects are assumed to be independent of each other and independent of the within-group errors.\n\nlme4nlme\n\n\n\nmodel_lmer <- lmer(yield ~ nitro*gen + (1|rep) + \n (1|rep:gen) + (1|rep:nitro), \n data = data1)\ntidy(model_lmer)\n\n# A tibble: 22 × 8\n effect group term estimate std.error statistic df p.value\n <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed <NA> (Intercept) 3572. 572. 6.24 17.8 0.00000732 \n 2 fixed <NA> nitro60 1560. 558. 2.80 22.4 0.0104 \n 3 fixed <NA> nitro120 3976. 558. 7.13 22.4 0.000000341\n 4 fixed <NA> genG2 1363. 717. 1.90 20.9 0.0714 \n 5 fixed <NA> genG3 678. 717. 0.945 20.9 0.355 \n 6 fixed <NA> genG4 487. 717. 0.679 20.9 0.504 \n 7 fixed <NA> genG5 530. 717. 0.739 20.9 0.468 \n 8 fixed <NA> genG6 -364. 717. -0.508 20.9 0.617 \n 9 fixed <NA> nitro60:genG2 219. 741. 0.296 20.0 0.771 \n10 fixed <NA> nitro120:genG2 -1699. 741. -2.29 20.0 0.0328 \n# ℹ 12 more rows\n\n\n\n\n\nmodel_lme <-lme(yield ~ nitro*gen,\n random = list(one = pdBlocked(list(\n pdIdent(~ 0 + rep), \n pdIdent(~ 0 + rep:gen), \n pdIdent(~ 0 + rep:nitro)))),\n data = data1 %>% mutate(one = factor(1)))\n\nsummary(model_lme)\n\nLinear mixed-effects model fit by REML\n Data: data1 %>% mutate(one = factor(1)) \n AIC BIC logLik\n 651.4204 686.2578 -303.7102\n\nRandom effects:\n Composite Structure: Blocked\n\n Block 1: repR1, repR2, repR3\n Formula: ~0 + rep | one\n Structure: Multiple of an Identity\n repR1 repR2 repR3\nStdDev: 393.4278 393.4278 393.4278\n\n Block 2: repR1:genG1, repR2:genG1, repR3:genG1, repR1:genG2, repR2:genG2, repR3:genG2, repR1:genG3, repR2:genG3, repR3:genG3, repR1:genG4, repR2:genG4, repR3:genG4, repR1:genG5, repR2:genG5, repR3:genG5, repR1:genG6, repR2:genG6, repR3:genG6\n Formula: ~0 + rep:gen | one\n Structure: Multiple of an Identity\n repR1:genG1 repR2:genG1 repR3:genG1 repR1:genG2 repR2:genG2 repR3:genG2\nStdDev: 600.1711 600.1711 600.1711 600.1711 600.1711 600.1711\n repR1:genG3 repR2:genG3 repR3:genG3 repR1:genG4 repR2:genG4 
repR3:genG4\nStdDev: 600.1711 600.1711 600.1711 600.1711 600.1711 600.1711\n repR1:genG5 repR2:genG5 repR3:genG5 repR1:genG6 repR2:genG6 repR3:genG6\nStdDev: 600.1711 600.1711 600.1711 600.1711 600.1711 600.1711\n\n Block 3: repR1:nitro0, repR2:nitro0, repR3:nitro0, repR1:nitro60, repR2:nitro60, repR3:nitro60, repR1:nitro120, repR2:nitro120, repR3:nitro120\n Formula: ~0 + rep:nitro | one\n Structure: Multiple of an Identity\n repR1:nitro0 repR2:nitro0 repR3:nitro0 repR1:nitro60 repR2:nitro60\nStdDev: 235.2591 235.2591 235.2591 235.2591 235.2591\n repR3:nitro60 repR1:nitro120 repR2:nitro120 repR3:nitro120 Residual\nStdDev: 235.2591 235.2591 235.2591 235.2591 641.5963\n\nFixed effects: yield ~ nitro * gen \n Value Std.Error DF t-value p-value\n(Intercept) 3571.667 572.1257 36 6.242800 0.0000\nnitro60 1560.333 557.9682 36 2.796456 0.0082\nnitro120 3976.333 557.9682 36 7.126452 0.0000\ngenG2 1362.667 717.3336 36 1.899628 0.0655\ngenG3 678.000 717.3336 36 0.945167 0.3509\ngenG4 487.333 717.3336 36 0.679368 0.5012\ngenG5 530.000 717.3336 36 0.738847 0.4648\ngenG6 -364.333 717.3336 36 -0.507899 0.6146\nnitro60:genG2 219.000 740.8516 36 0.295606 0.7692\nnitro120:genG2 -1699.333 740.8516 36 -2.293757 0.0277\nnitro60:genG3 312.333 740.8516 36 0.421587 0.6758\nnitro120:genG3 -357.667 740.8516 36 -0.482778 0.6322\nnitro60:genG4 -65.667 740.8516 36 -0.088637 0.9299\nnitro120:genG4 -941.000 740.8516 36 -1.270160 0.2122\nnitro60:genG5 -28.667 740.8516 36 -0.038694 0.9693\nnitro120:genG5 -2066.000 740.8516 36 -2.788682 0.0084\nnitro60:genG6 -1053.333 740.8516 36 -1.421787 0.1637\nnitro120:genG6 -4691.667 740.8516 36 -6.332802 0.0000\n Correlation: \n (Intr) nitr60 ntr120 genG2 genG3 genG4 genG5 genG6 n60:G2\nnitro60 -0.488 \nnitro120 -0.488 0.500 \ngenG2 -0.627 0.343 0.343 \ngenG3 -0.627 0.343 0.343 0.500 \ngenG4 -0.627 0.343 0.343 0.500 0.500 \ngenG5 -0.627 0.343 0.343 0.500 0.500 0.500 \ngenG6 -0.627 0.343 0.343 0.500 0.500 0.500 0.500 \nnitro60:genG2 0.324 -0.664 -0.332 -0.516 
-0.258 -0.258 -0.258 -0.258 \nnitro120:genG2 0.324 -0.332 -0.664 -0.516 -0.258 -0.258 -0.258 -0.258 0.500\nnitro60:genG3 0.324 -0.664 -0.332 -0.258 -0.516 -0.258 -0.258 -0.258 0.500\nnitro120:genG3 0.324 -0.332 -0.664 -0.258 -0.516 -0.258 -0.258 -0.258 0.250\nnitro60:genG4 0.324 -0.664 -0.332 -0.258 -0.258 -0.516 -0.258 -0.258 0.500\nnitro120:genG4 0.324 -0.332 -0.664 -0.258 -0.258 -0.516 -0.258 -0.258 0.250\nnitro60:genG5 0.324 -0.664 -0.332 -0.258 -0.258 -0.258 -0.516 -0.258 0.500\nnitro120:genG5 0.324 -0.332 -0.664 -0.258 -0.258 -0.258 -0.516 -0.258 0.250\nnitro60:genG6 0.324 -0.664 -0.332 -0.258 -0.258 -0.258 -0.258 -0.516 0.500\nnitro120:genG6 0.324 -0.332 -0.664 -0.258 -0.258 -0.258 -0.258 -0.516 0.250\n n120:G2 n60:G3 n120:G3 n60:G4 n120:G4 n60:G5 n120:G5 n60:G6\nnitro60 \nnitro120 \ngenG2 \ngenG3 \ngenG4 \ngenG5 \ngenG6 \nnitro60:genG2 \nnitro120:genG2 \nnitro60:genG3 0.250 \nnitro120:genG3 0.500 0.500 \nnitro60:genG4 0.250 0.500 0.250 \nnitro120:genG4 0.500 0.250 0.500 0.500 \nnitro60:genG5 0.250 0.500 0.250 0.500 0.250 \nnitro120:genG5 0.500 0.250 0.500 0.250 0.500 0.500 \nnitro60:genG6 0.250 0.500 0.250 0.500 0.250 0.500 0.250 \nnitro120:genG6 0.500 0.250 0.500 0.250 0.500 0.250 0.500 0.500\n\nStandardized Within-Group Residuals:\n Min Q1 Med Q3 Max \n-1.52993309 -0.52842524 0.05394367 0.51465584 1.46902934 \n\nNumber of Observations: 54\nNumber of Groups: 1 \n\n#tidy(model_lme)\n\n\n\n\n\n\n\n\n\n\nCrossed random effects\n\n\n\nThis type of variance-covariance structure in lme() is represented by a pdBlocked object with pdIdent elements.\n\n\n\n\n8.2.3 Check Model Assumptions\nLet’s evaluate the assumptions of linear mixed models by looking at the residuals and normality of error terms. 
\n\nlme4nlme\n\n\n\ncheck_model(model_lmer, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\nplot(model_lme, resid(., scaled=TRUE) ~ fitted(.), \n xlab = \"fitted values\", ylab = \"studentized residuals\")\nqqnorm(residuals(model_lme))\nqqline(residuals(model_lme))\n\n\n\n\n\n\n\n\n\n\n\nThe residuals fit the assumptions of the model well.\n\n\n\n8.2.4 Inference\nWe can evaluate the model for the analysis of variance, for main and interaction effects.\n\nlme4nlme\n\n\n\ncar::Anova(model_lmer, type = \"III\", test.statistics = \"F\")\n\nAnalysis of Deviance Table (Type III Wald chisquare tests)\n\nResponse: yield\n Chisq Df Pr(>Chisq) \n(Intercept) 38.9728 1 4.298e-10 ***\nnitro 51.5701 2 6.334e-12 ***\ngen 6.8343 5 0.2333 \nnitro:gen 58.0064 10 8.621e-09 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n\n\n\n\nanova(model_lme, type = \"marginal\")\n\n numDF denDF F-value p-value\n(Intercept) 1 36 38.97256 <.0001\nnitro 2 36 25.78512 <.0001\ngen 5 36 1.36687 0.2597\nnitro:gen 10 36 5.80061 <.0001\n\n\n\n\n\nAnalysis of variance showed a significant interaction effect of gen and nitro on rice grain yield.\nNext, we can estimate marginal means for the nitro and gen interaction using the emmeans package.\n\nlme4nlme\n\n\n\nemm1 <- emmeans(model_lmer, ~ nitro*gen) \nemm1\n\n nitro gen emmean SE df lower.CL upper.CL\n 0 G1 3572 572 17.8 2368 4775\n 60 G1 5132 572 17.8 3929 6335\n 120 G1 7548 572 17.8 6345 8751\n 0 G2 4934 572 17.8 3731 6138\n 60 G2 6714 572 17.8 5510 7917\n 120 G2 7211 572 17.8 6008 8415\n 0 G3 4250 572 17.8 3046 5453\n 60 G3 6122 572 17.8 4919 7326\n 120 G3 7868 572 17.8 6665 9072\n 0 G4 4059 572 17.8 2856 5262\n 60 G4 5554 572 17.8 4350 6757\n 120 G4 7094 572 17.8 5891 8298\n 0 G5 4102 572 17.8 2898 5305\n 60 G5 5633 572 17.8 4430 6837\n 120 G5 6012 572 17.8 4809 7215\n 0 G6 3207 572 17.8 2004 4411\n 60 G6 3714 572 17.8 2511 4918\n 120 G6 2492 572 17.8 1289 3695\n\nDegrees-of-freedom 
method: kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemm1 <- emmeans(model_lme, ~ nitro*gen)\n\nWarning in model.matrix.default(trms, m, contrasts.arg = contrasts): variable\n'rep' is absent, its contrast will be ignored\nWarning in model.matrix.default(trms, m, contrasts.arg = contrasts): variable\n'rep' is absent, its contrast will be ignored\n\nemm1\n\nWarning in qt((1 - level)/adiv, df): NaNs produced\n\n\n nitro gen emmean SE df lower.CL upper.CL\n 0 G1 3572 572 0 NaN NaN\n 60 G1 5132 572 0 NaN NaN\n 120 G1 7548 572 0 NaN NaN\n 0 G2 4934 572 0 NaN NaN\n 60 G2 6714 572 0 NaN NaN\n 120 G2 7211 572 0 NaN NaN\n 0 G3 4250 572 0 NaN NaN\n 60 G3 6122 572 0 NaN NaN\n 120 G3 7868 572 0 NaN NaN\n 0 G4 4059 572 0 NaN NaN\n 60 G4 5554 572 0 NaN NaN\n 120 G4 7094 572 0 NaN NaN\n 0 G5 4102 572 0 NaN NaN\n 60 G5 5633 572 0 NaN NaN\n 120 G5 6012 572 0 NaN NaN\n 0 G6 3207 572 0 NaN NaN\n 60 G6 3714 572 0 NaN NaN\n 120 G6 2492 572 0 NaN NaN\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\nNote that emmeans could not estimate confidence intervals for the lme model: the containment method returned zero degrees of freedom, so the interval limits are NaN.\n\n\n\n\n\n\nlme vs lmer\n\n\n\nFor a strip plot design, fitting nested and crossed random effects is more complicated in nlme. It is therefore more convenient to use lmer() in this case, as both models yielded the same estimates in the example shown above.",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>8</span> <span class='chapter-title'>Strip Plot Design</span>"
]
},
{
"objectID": "chapters/incomplete-block-design.html",
"href": "chapters/incomplete-block-design.html",
"title": "9 Incomplete Block Design",
"section": "",
"text": "9.1 Background\nThe block design described in Chapter 4 was complete, meaning that each block contained each treatment level at least once. In practice, it may not be possible or advisable to include all treatments in each block, either due to limitations in treatment availability (e.g. limited seed stocks) or the block size becomes too large to serve its original goals of controlling for spatial variation.\nIn such cases, incomplete block designs (IBD) can be used. Incomplete block designs break the experiment into many smaller incomplete blocks that are nested within standard RCBD-style blocks and assigns a subset of the treatment levels to each incomplete block. There are several different approaches Patterson and Williams (1976) for how to assign treatment levels to incomplete blocks and these designs impact the final statistical analysis (and if all treatments included in the experimental design are estimable). An excellent description of incomplete block design is provided in ANOVA and Mixed Models by Lukas Meier.\nIncomplete block designs are grouped into two groups: (1) balanced lattice designs; and (2) partially balanced (also commonly called alpha-lattice) designs. Balanced IBD designs have been previously called “lattice designs” [need refs], but we are not using that term to avoid confusion with alpha-lattice designs, a term that is commonly used.\nIn alpha-lattice design, the blocks are grouped into complete replicates. These designs are also termed as “resolvable incomplete block designs” or “partially balanced incomplete block designs” (paterson?). This design has been more commonly used instead of balanced IBD because of it’s practicability, flexibility, and versatility.",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>9</span> <span class='chapter-title'>Incomplete Block Design</span>"
]
},
{
"objectID": "chapters/incomplete-block-design.html#background",
"href": "chapters/incomplete-block-design.html#background",
"title": "9 Incomplete Block Design",
"section": "",
"text": "9.1.1 Statistical Model\nThe statistical model for a balanced incomplete block design is:\n\\[y_{ij} = \\mu + \\alpha_i + \\beta_j + \\epsilon_{ij}\\]\nWhere:\n\\(\\mu\\) = overall experimental mean\n\\(\\alpha\\) = treatment effects (fixed)\n\\(\\beta\\) = block effects (random)\n\\(\\epsilon\\) = error terms\n\\[ \\epsilon \\sim N(0, \\sigma)\\]\n\\[ \\beta \\sim N(0, \\sigma_b)\\]\nThere are few key points that we need to keep in mind while designing incomplete block experiments:\n\nA drawback of this design is that block effect and treatment effects are confounded.\nTo remove the block effects, it is better compare treatments within a block.\nNo treatment should appear twice in any block as it contributes nothing to within block comparisons.\n\nThe balanced incomplete block designs are guided by strict principles and guidelines including: the number of treatments must be a perfect square (e.g. 25, 36, and so on), and number of replicates must be equal to number of blocks +1.\n\n\n\n\n\n\nNote on Sums of Squares\n\n\n\nBecause the blocks are incomplete, the Type I and Type III sums of squares will be different even when there is no missing data from a trail. That is because the missing treatments in each block represent missing observations (even though they are not missing ‘at random’).",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>9</span> <span class='chapter-title'>Incomplete Block Design</span>"
]
},
{
"objectID": "chapters/incomplete-block-design.html#examples-analyses",
"href": "chapters/incomplete-block-design.html#examples-analyses",
"title": "9 Incomplete Block Design",
"section": "9.2 Examples Analyses",
"text": "9.2 Examples Analyses\n\n9.2.1 Balanced Incomplete Block Design\nWe will demonstrate an example data set designed in a balanced incomplete block design. First, load the libraries required for analysis and estimation.\n\nlme4nlme\n\n\n\nlibrary(lme4); library(lmerTest); library(emmeans)\nlibrary(dplyr); library(broom.mixed); library(performance)\n\n\n\n\nlibrary(nlme); library(broom.mixed); library(emmeans)\nlibrary(dplyr); library(performance)\n\n\n\n\nThe data used for this example analysis was extracted from the agridat package. This example is comprised of soybean balanced incomplete block experiment.\n\ndat <- agridat::weiss.incblock\n\n\nTable of variables in the data set\n\n\nblock\nblocking unit\n\n\ngen\ngenotype (variety) factor\n\n\nrow\nrow position for each plot\n\n\ncol\ncolumn position for each plot\n\n\nyield\ngrain yield in bu/ac\n\n\n\n\n\n\n\n\n\n\n\n\n\n9.2.1.1 Data integrity checks\nWe will start inspecting the data set firstly by looking at the class of each variable:\n\nstr(dat)\n\n'data.frame': 186 obs. of 5 variables:\n $ block: Factor w/ 31 levels \"B01\",\"B02\",\"B03\",..: 1 2 3 4 5 6 7 8 9 10 ...\n $ gen : Factor w/ 31 levels \"G01\",\"G02\",\"G03\",..: 24 15 20 18 20 5 22 1 9 14 ...\n $ yield: num 29.8 24.2 30.5 20 35.2 25 23.6 23.6 29.3 25.5 ...\n $ row : int 42 36 30 24 18 12 6 42 36 30 ...\n $ col : int 1 1 1 1 1 1 1 2 2 2 ...\n\n\nThe variables we need for the model are block, genand yield. The block and gen are classified as factor variables and yield is numeric. Therefore, we do not need to change class of any of the required variables.\nNext, let’s check the independent variables. 
We can do this by counting the number of observations for each level of gen and of block.\n\nagg_tbl <- dat %>% group_by(gen) %>% \n summarise(total_count=n(),\n .groups = 'drop')\nagg_tbl\n\n# A tibble: 31 × 2\n gen total_count\n <fct> <int>\n 1 G01 6\n 2 G02 6\n 3 G03 6\n 4 G04 6\n 5 G05 6\n 6 G06 6\n 7 G07 6\n 8 G08 6\n 9 G09 6\n10 G10 6\n# ℹ 21 more rows\n\n\n\nagg_df <- aggregate(dat$gen, by=list(dat$block), FUN=length)\nagg_df\n\n Group.1 x\n1 B01 6\n2 B02 6\n3 B03 6\n4 B04 6\n5 B05 6\n6 B06 6\n7 B07 6\n8 B08 6\n9 B09 6\n10 B10 6\n11 B11 6\n12 B12 6\n13 B13 6\n14 B14 6\n15 B15 6\n16 B16 6\n17 B17 6\n18 B18 6\n19 B19 6\n20 B20 6\n21 B21 6\n22 B22 6\n23 B23 6\n24 B24 6\n25 B25 6\n26 B26 6\n27 B27 6\n28 B28 6\n29 B29 6\n30 B30 6\n31 B31 6\n\n\nThere are 31 varieties (levels of gen) and the design is balanced: each variety is observed 6 times, and each of the 31 blocks contains 6 plots.\nWe can count the missing values in each variable to evaluate the extent of missing data in this data set:\n\napply(dat, 2, function(x) sum(is.na(x)))\n\nblock gen yield row col \n 0 0 0 0 0 \n\n\nNo missing data!\nLast, let’s plot a histogram of the dependent variable. This is a quick check before analysis to see if there is any strong deviation in values.\n\n\n\n\n\n\n\n\n\nFigure 9.1: Histogram of the dependent variable.\n\n\n\n\n\nhist(dat$yield, main = \"\", xlab = \"yield\")\n\nResponse variable values fall within the expected range, with a few extreme values in the right tail. This data set is ready for analysis!\n\n\n9.2.1.2 Model Building\nWe will be evaluating the response of yield as affected by gen (fixed effect) and block (random effect).\n\n\nPlease note that the incomplete block effect can be analyzed as a fixed (intra-block analysis) or a random (inter-block analysis) effect. 
When we consider block as a random effect, the mean values of a block also contain information about the treatment effects.\n\nlme4nlme\n\n\n\nmodel_icbd <- lmer(yield ~ gen + (1|block),\n data = dat, \n na.action = na.exclude)\ntidy(model_icbd)\n\n# A tibble: 33 × 8\n effect group term estimate std.error statistic df p.value\n <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed <NA> (Intercept) 24.6 0.922 26.7 153. 2.30e-59\n 2 fixed <NA> genG02 2.40 1.17 2.06 129. 4.17e- 2\n 3 fixed <NA> genG03 8.04 1.17 6.88 129. 2.31e-10\n 4 fixed <NA> genG04 2.37 1.17 2.03 129. 4.42e- 2\n 5 fixed <NA> genG05 1.60 1.17 1.37 129. 1.73e- 1\n 6 fixed <NA> genG06 7.39 1.17 6.32 129. 3.82e- 9\n 7 fixed <NA> genG07 -0.419 1.17 -0.359 129. 7.20e- 1\n 8 fixed <NA> genG08 3.04 1.17 2.60 129. 1.04e- 2\n 9 fixed <NA> genG09 4.84 1.17 4.14 129. 6.22e- 5\n10 fixed <NA> genG10 -0.0429 1.17 -0.0367 129. 9.71e- 1\n# ℹ 23 more rows\n\n\n\n\n\nmodel_icbd1 <- lme(yield ~ gen,\n random = ~ 1|block,\n data = dat, \n na.action = na.exclude)\ntidy(model_icbd1)\n\n# A tibble: 33 × 8\n effect group term estimate std.error df statistic p.value\n <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed <NA> (Intercept) 24.6 0.922 125 26.7 2.10e-53\n 2 fixed <NA> genG02 2.40 1.17 125 2.06 4.18e- 2\n 3 fixed <NA> genG03 8.04 1.17 125 6.88 2.54e-10\n 4 fixed <NA> genG04 2.37 1.17 125 2.03 4.43e- 2\n 5 fixed <NA> genG05 1.60 1.17 125 1.37 1.73e- 1\n 6 fixed <NA> genG06 7.39 1.17 125 6.32 4.11e- 9\n 7 fixed <NA> genG07 -0.419 1.17 125 -0.359 7.20e- 1\n 8 fixed <NA> genG08 3.04 1.17 125 2.60 1.04e- 2\n 9 fixed <NA> genG09 4.84 1.17 125 4.14 6.33e- 5\n10 fixed <NA> genG10 -0.0429 1.17 125 -0.0367 9.71e- 1\n# ℹ 23 more rows\n\n\n\n\n\n\n\n9.2.1.3 Check Model Assumptions\nLet’s verify the assumption of linear mixed models including normal distribution and constant variance of residuals.\n\nlme4nlme\n\n\n\ncheck_model(model_icbd, check = c('normality', 
'linearity'))\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(model_icbd1, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\n\n\nHere we observed right skewness in the residuals. This can be addressed with a data transformation, e.g. a log transformation of the response variable. Please refer to chapter to read more about data transformation.\n\n\n9.2.1.4 Inference\nWe can obtain the ANOVA table using anova().\n\nlme4nlme\n\n\n\nanova(model_icbd, type = \"1\")\n\nType I Analysis of Variance Table with Satterthwaite's method\n Sum Sq Mean Sq NumDF DenDF F value Pr(>F) \ngen 1901.1 63.369 30 129.06 17.675 < 2.2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n\n\n\n\nanova(model_icbd1, type = \"sequential\")\n\n numDF denDF F-value p-value\n(Intercept) 1 125 4042.016 <.0001\ngen 30 125 17.675 <.0001\n\n\n\n\n\nLet’s look at the estimated marginal means of yield for each variety (gen).\n\nlme4nlme\n\n\n\nemmeans(model_icbd, ~ gen)\n\n gen emmean SE df lower.CL upper.CL\n G01 24.6 0.923 153 22.7 26.4\n G02 27.0 0.923 153 25.2 28.8\n G03 32.6 0.923 153 30.8 34.4\n G04 26.9 0.923 153 25.1 28.8\n G05 26.2 0.923 153 24.4 28.0\n G06 32.0 0.923 153 30.1 33.8\n G07 24.2 0.923 153 22.3 26.0\n G08 27.6 0.923 153 25.8 29.4\n G09 29.4 0.923 153 27.6 31.2\n G10 24.5 0.923 153 22.7 26.4\n G11 27.1 0.923 153 25.2 28.9\n G12 29.3 0.923 153 27.4 31.1\n G13 29.9 0.923 153 28.1 31.8\n G14 24.2 0.923 153 22.4 26.1\n G15 26.1 0.923 153 24.3 27.9\n G16 25.9 0.923 153 24.1 27.8\n G17 19.7 0.923 153 17.9 21.5\n G18 25.7 0.923 153 23.9 27.5\n G19 29.0 0.923 153 27.2 30.9\n G20 33.2 0.923 153 31.3 35.0\n G21 31.1 0.923 153 29.3 32.9\n G22 25.2 0.923 153 23.3 27.0\n G23 29.8 0.923 153 28.0 31.6\n G24 33.6 0.923 153 31.8 35.5\n G25 27.0 0.923 153 25.2 28.8\n G26 27.1 0.923 153 25.3 29.0\n G27 23.8 0.923 153 22.0 25.6\n G28 26.5 0.923 153 24.6 28.3\n G29 24.8 0.923 153 22.9 26.6\n G30 36.2 0.923 153 34.4 38.0\n G31 27.1 0.923 153 25.3 28.9\n\nDegrees-of-freedom method: 
kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemmeans(model_icbd1, ~ gen)\n\n gen emmean SE df lower.CL upper.CL\n G01 24.6 0.922 30 22.7 26.5\n G02 27.0 0.922 30 25.1 28.9\n G03 32.6 0.922 30 30.7 34.5\n G04 26.9 0.922 30 25.1 28.8\n G05 26.2 0.922 30 24.3 28.1\n G06 32.0 0.922 30 30.1 33.8\n G07 24.2 0.922 30 22.3 26.0\n G08 27.6 0.922 30 25.7 29.5\n G09 29.4 0.922 30 27.5 31.3\n G10 24.5 0.922 30 22.6 26.4\n G11 27.1 0.922 30 25.2 28.9\n G12 29.3 0.922 30 27.4 31.1\n G13 29.9 0.922 30 28.1 31.8\n G14 24.2 0.922 30 22.4 26.1\n G15 26.1 0.922 30 24.2 28.0\n G16 25.9 0.922 30 24.0 27.8\n G17 19.7 0.922 30 17.8 21.6\n G18 25.7 0.922 30 23.8 27.6\n G19 29.0 0.922 30 27.2 30.9\n G20 33.2 0.922 30 31.3 35.0\n G21 31.1 0.922 30 29.2 33.0\n G22 25.2 0.922 30 23.3 27.1\n G23 29.8 0.922 30 27.9 31.7\n G24 33.6 0.922 30 31.8 35.5\n G25 27.0 0.922 30 25.1 28.9\n G26 27.1 0.922 30 25.3 29.0\n G27 23.8 0.922 30 21.9 25.7\n G28 26.5 0.922 30 24.6 28.4\n G29 24.8 0.922 30 22.9 26.6\n G30 36.2 0.922 30 34.3 38.1\n G31 27.1 0.922 30 25.2 29.0\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\n\n\n\n9.2.2 Partially Balanced IBD (Alpha Lattice Design)\nThe statistical model for partially balanced design includes:\n\\[y_{ij(l)} = \\mu + \\alpha_i + \\beta_{i(l)} + \\tau_j + \\epsilon_{ij(l)}\\]\nWhere:\n\\(\\mu\\) = overall experimental mean\n\\(\\alpha\\) = replicate effect (random)\n\\(\\beta\\) = incomplete block effect (random)\n\\(\\tau\\) = treatment effect (fixed)\n\\(\\epsilon_{ij(l)}\\) = intra-block residual\nThe data used in this example is published in Cyclic and Computer Generated Designs (John and Williams 1995). The trial was laid out in an alpha lattice design. 
The trial had 24 genotypes (“gen”) grown in 3 replicates, with each replicate divided into 6 incomplete blocks of 4 plots.\nLet’s start analyzing this example by loading the required libraries for linear mixed models:\n\nlme4nlme\n\n\n\nlibrary(lme4); library(lmerTest); library(emmeans)\nlibrary(dplyr); library(broom.mixed); library(performance)\n\n\n\n\nlibrary(nlme); library(broom.mixed); library(emmeans)\nlibrary(dplyr); library(performance)\n\n\n\n\n\ndata1 <- agridat::john.alpha\n\n\nTable of variables in the data set\n\n\nblock\nincomplete blocking unit\n\n\ngen\ngenotype (variety) factor\n\n\nrow\nrow position for each plot\n\n\ncol\ncolumn position for each plot\n\n\nyield\ngrain yield in tonnes/ha\n\n\n\n\n\n\n\n\n\n\n\n\n\n9.2.2.1 Data integrity checks\nLet’s look into the structure of the data first to verify the class of the variables.\n\nstr(data1)\n\n'data.frame': 72 obs. of 7 variables:\n $ plot : int 1 2 3 4 5 6 7 8 9 10 ...\n $ rep : Factor w/ 3 levels \"R1\",\"R2\",\"R3\": 1 1 1 1 1 1 1 1 1 1 ...\n $ block: Factor w/ 6 levels \"B1\",\"B2\",\"B3\",..: 1 1 1 1 2 2 2 2 3 3 ...\n $ gen : Factor w/ 24 levels \"G01\",\"G02\",\"G03\",..: 11 4 5 22 21 10 20 2 23 14 ...\n $ yield: num 4.12 4.45 5.88 4.58 4.65 ...\n $ row : int 1 2 3 4 5 6 7 8 9 10 ...\n $ col : int 1 1 1 1 1 1 1 1 1 1 ...\n\n\nThe next step is to evaluate the independent variables. 
First, check the number of observations per treatment (each treatment should be replicated 3 times).\n\nagg_tbl <- data1 %>% group_by(gen) %>% \n summarise(total_count=n(),\n .groups = 'drop')\nagg_tbl\n\n# A tibble: 24 × 2\n gen total_count\n <fct> <int>\n 1 G01 3\n 2 G02 3\n 3 G03 3\n 4 G04 3\n 5 G05 3\n 6 G06 3\n 7 G07 3\n 8 G08 3\n 9 G09 3\n10 G10 3\n# ℹ 14 more rows\n\n\nThis looks balanced, as expected.\nAlso, let’s have a look at the number of observations per block label.\n\nagg_blk <- aggregate(data1$gen, by=list(data1$block), FUN=length)\nagg_blk\n\n Group.1 x\n1 B1 12\n2 B2 12\n3 B3 12\n4 B4 12\n5 B5 12\n6 B6 12\n\n\nEach block label has 12 observations, because the labels B1 to B6 are reused in each of the 3 replicates (4 plots per incomplete block per replicate). Each incomplete block has the same number of treatments.\nLastly, before fitting the model, it’s a good idea to look at the distribution of the dependent variable, yield.\n\n\n\n\n\n\n\n\n\nFigure 9.2: Histogram of the dependent variable.\n\n\n\n\n\nhist(data1$yield, main = \"\", xlab = \"yield\")\n\nThe response variable seems to follow an approximately normal distribution, with few values at the extreme low and high ends.\n\n\n9.2.2.2 Model Building\n\nlme4nlme\n\n\n\nmod_alpha <- lmer(yield ~ gen + (1|rep/block),\n data = data1, \n na.action = na.exclude)\ntidy(mod_alpha)\n\n# A tibble: 27 × 8\n effect group term estimate std.error statistic df p.value\n <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed <NA> (Intercept) 5.11 0.276 18.5 6.19 0.00000118 \n 2 fixed <NA> genG02 -0.629 0.269 -2.34 38.2 0.0248 \n 3 fixed <NA> genG03 -1.61 0.268 -6.00 37.7 0.000000590\n 4 fixed <NA> genG04 -0.618 0.268 -2.30 37.7 0.0269 \n 5 fixed <NA> genG05 -0.0705 0.258 -0.274 34.8 0.786 \n 6 fixed <NA> genG06 -0.571 0.268 -2.13 37.7 0.0398 \n 7 fixed <NA> genG07 -0.997 0.258 -3.87 34.8 0.000457 \n 8 fixed <NA> genG08 -0.580 0.268 -2.16 37.7 0.0370 \n 9 fixed <NA> genG09 -1.61 0.258 -6.21 35.3 0.000000390\n10 fixed <NA> genG10 -0.735 0.259 -2.83 35.9 0.00754 \n# ℹ 17 more rows\n\n\n\n\n\nmod_alpha1 <- 
lme(yield ~ gen,\n random = ~ 1|rep/block,\n data = data1, \n na.action = na.exclude)\ntidy(mod_alpha1)\n\nWarning in tidy.lme(mod_alpha1): ran_pars not yet implemented for multiple\nlevels of nesting\n\n\n# A tibble: 24 × 7\n effect term estimate std.error df statistic p.value\n <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed (Intercept) 5.11 0.276 31 18.5 2.63e-18\n 2 fixed genG02 -0.629 0.269 31 -2.34 2.61e- 2\n 3 fixed genG03 -1.61 0.268 31 -6.00 1.23e- 6\n 4 fixed genG04 -0.618 0.268 31 -2.30 2.81e- 2\n 5 fixed genG05 -0.0705 0.258 31 -0.274 7.86e- 1\n 6 fixed genG06 -0.571 0.268 31 -2.13 4.12e- 2\n 7 fixed genG07 -0.997 0.258 31 -3.87 5.23e- 4\n 8 fixed genG08 -0.580 0.268 31 -2.16 3.84e- 2\n 9 fixed genG09 -1.61 0.258 31 -6.21 6.71e- 7\n10 fixed genG10 -0.735 0.259 31 -2.83 8.05e- 3\n# ℹ 14 more rows\n\n\n\n\n\n\n\n9.2.2.3 Check Model Assumptions\nLet’s verify the assumptions of linear mixed models, including normal distribution and constant variance of residuals.\n\nlme4nlme\n\n\n\ncheck_model(mod_alpha, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(mod_alpha1, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\nWe observed a few extreme residuals, and the normality curve showed right skewness.\n\n\n9.2.2.4 Inference\nLet’s generate the ANOVA table using anova() for the lmer and lme models, respectively.\n\nlme4nlme\n\n\n\nanova(mod_alpha, type = \"1\")\n\nType I Analysis of Variance Table with Satterthwaite's method\n Sum Sq Mean Sq NumDF DenDF F value Pr(>F) \ngen 10.679 0.46429 23 34.902 5.4478 4.229e-06 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\n\n\n\n\nanova(mod_alpha1, type = \"sequential\")\n\n numDF denDF F-value p-value\n(Intercept) 1 31 470.9507 <.0001\ngen 23 31 5.4478 <.0001\n\n\n\n\n\nLet’s look at the estimated marginal means of yield for each variety (gen).\n\nlme4nlme\n\n\n\nemmeans(mod_alpha, ~ gen)\n\n gen emmean SE df lower.CL upper.CL\n G01 5.11 0.279 6.20 4.43 5.78\n G02 4.48 0.279 6.20 3.80 5.15\n G03 3.50 0.279 6.20 2.82 4.18\n G04 4.49 0.279 6.20 3.81 5.17\n G05 5.04 0.278 6.19 4.36 5.71\n G06 4.54 0.278 6.19 3.86 5.21\n G07 4.11 0.279 6.20 3.43 4.79\n G08 4.53 0.279 6.20 3.85 5.20\n G09 3.50 0.278 6.19 2.83 4.18\n G10 4.37 0.279 6.20 3.70 5.05\n G11 4.28 0.279 6.20 3.61 4.96\n G12 4.76 0.279 6.20 4.08 5.43\n G13 4.76 0.278 6.19 4.08 5.43\n G14 4.78 0.278 6.19 4.10 5.45\n G15 4.97 0.278 6.19 4.29 5.65\n G16 4.73 0.279 6.20 4.05 5.41\n G17 4.60 0.278 6.19 3.93 5.28\n G18 4.36 0.279 6.20 3.69 5.04\n G19 4.84 0.278 6.19 4.16 5.52\n G20 4.04 0.278 6.19 3.36 4.72\n G21 4.80 0.278 6.19 4.12 5.47\n G22 4.53 0.278 6.19 3.85 5.20\n G23 4.25 0.278 6.19 3.58 4.93\n G24 4.15 0.279 6.20 3.48 4.83\n\nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemmeans(mod_alpha1, ~ gen)\n\n gen emmean SE df lower.CL upper.CL\n G01 5.11 0.276 2 3.92 6.30\n G02 4.48 0.276 2 3.29 5.67\n G03 3.50 0.276 2 2.31 4.69\n G04 4.49 0.276 2 3.30 5.68\n G05 5.04 0.276 2 3.85 6.22\n G06 4.54 0.276 2 3.35 5.72\n G07 4.11 0.276 2 2.92 5.30\n G08 4.53 0.276 2 3.34 5.72\n G09 3.50 0.276 2 2.31 4.69\n G10 4.37 0.276 2 3.19 5.56\n G11 4.28 0.276 2 3.10 5.47\n G12 4.76 0.276 2 3.57 5.94\n G13 4.76 0.276 2 3.57 5.95\n G14 4.78 0.276 2 3.59 5.96\n G15 4.97 0.276 2 3.78 6.16\n G16 4.73 0.276 2 3.54 5.92\n G17 4.60 0.276 2 3.42 5.79\n G18 4.36 0.276 2 3.17 5.55\n G19 4.84 0.276 2 3.65 6.03\n G20 4.04 0.276 2 2.85 5.23\n G21 4.80 0.276 2 3.61 5.98\n G22 4.53 0.276 2 3.34 5.72\n G23 4.25 0.276 2 3.06 5.44\n G24 4.15 0.276 2 2.97 5.34\n\nDegrees-of-freedom method: containment \nConfidence 
level used: 0.95 \n\n\n\n\n\n\n\n\n\nJohn, JA, and ER Williams. 1995. Cyclic and Computer Generated Designs. 2nd ed. New York: Chapman; Hall/CRC Press. https://doi.org/10.1201/b15075.\n\n\nPatterson, H. D., and E. R. Williams. 1976. “A New Class of Resolvable Incomplete Block Designs.” Biometrika 63 (1): 83–92. https://doi.org/10.2307/2335087.\n\n\nYates, F. 1936. “A New Method of Arranging Variety Trials Involving a Large Number of Varieties.” J Agric Sci 26: 424–55.",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>9</span> <span class='chapter-title'>Incomplete Block Design</span>"
]
},
{
"objectID": "chapters/latin-design.html",
"href": "chapters/latin-design.html",
"title": "10 Latin Square Design",
"section": "",
"text": "10.1 Background\nIn the Latin Square design, two blocking factors are arranged across the row and the column of the square. This allows blocking of two nuisance factors across rows and columns to reduce even more experimental error. The requirement of Latin square design is that all t treatments appears only once in each row and column and number of replications is equal to number of treatments.\nAdvantages of Latin square design are:\nDisadvantages:\nStatistical model for a response in Latin square design is:\n\\(Y_{ijk} = \\mu + \\alpha_i + \\beta_j + \\gamma_k + \\epsilon_{ijk}\\)\nwhere, \\(\\mu\\) is the experiment mean, \\(\\alpha_i's\\) represents treatment effect, \\(\\beta\\) and \\(\\gamma\\) are the row- and column specific effects.\nAssumptions of this design includes normality and independent distribution of error (\\(\\epsilon_{ijk}\\)) terms. And there is no interaction between two blocking (rows & columns) factors and treatments.",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>10</span> <span class='chapter-title'>Latin Square Design</span>"
]
},
{
"objectID": "chapters/latin-design.html#background",
"href": "chapters/latin-design.html#background",
"title": "10 Latin Square Design",
"section": "",
"text": "The design is particularly appropriate for comparing t treatment means in the presence of two sources of extraneous variation, each measured at t levels.\nThe analysis is quite simple.\n\n\n\nA Latin square can be constructed for any value of t, however, it is best suited for comparing t treatments when 5≤ t≤ 10.\nAny additional extraneous sources of variability tend to inflate the error term, making it more difficult to detect differences among the treatment means.\nThe effect of each treatment on the response must be approximately same across the rows and columns.",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>10</span> <span class='chapter-title'>Latin Square Design</span>"
]
},
{
"objectID": "chapters/latin-design.html#example-analysis",
"href": "chapters/latin-design.html#example-analysis",
"title": "10 Latin Square Design",
"section": "10.2 Example Analysis",
"text": "10.2 Example Analysis\nLet’s start the analysis firstly by loading the required libraries:\n\nlme4nlme\n\n\n\nlibrary(lme4); library(lmerTest); library(emmeans); library(performance)\nlibrary(dplyr); library(broom.mixed); library(agridat); library(desplot)\n\n\n\n\nlibrary(nlme); library(broom.mixed); library(emmeans); library(performance)\nlibrary(dplyr); library(agridat); library(desplot)\n\n\n\n\nThe data used in this example is extracted from the agridat package. In this experiment, 5 treatments (A = Dusted before rains. B = Dusted after rains. C = Dusted once each week. D = Drifting, once each week. E = Not dusted) were tested to control stem rust in wheat.\n\ndat <- agridat::goulden.latin\n\n\nTable of variables in the data set\n\n\ntrt\ntreatment factor, 5 levels\n\n\nrow\nrow position for each plot\n\n\ncol\ncolumn position for each plot\n\n\nyield\nwheat yield\n\n\n\n\n10.2.1 Data integrity checks\nFirstly, let’s verify the class of variables in the dataset using str() function in base R\n\nstr(dat)\n\n'data.frame': 25 obs. of 4 variables:\n $ trt : Factor w/ 5 levels \"A\",\"B\",\"C\",\"D\",..: 2 3 4 5 1 4 1 3 2 5 ...\n $ yield: num 4.9 9.3 7.6 5.3 9.3 6.4 4 15.4 7.6 6.3 ...\n $ row : int 5 4 3 2 1 5 4 3 2 1 ...\n $ col : int 1 1 1 1 1 2 2 2 2 2 ...\n\n\nHere yield and trt are classified as numeric and factor variables, respectively, as needed. But we need to change ‘row’ and ‘col’ from integer t factor/character.\n\ndat1 <- dat |> \n mutate(row = as.factor(row),\n col = as.factor(col))\n\nNext, to verify if the data meets the assumption of the Latin square design let’s plot the field layout for this experiment.\n\n\n\n\n\n\n\n\n\nThis looks great! Here we can see that there are equal number (5) of treatments, rows, and columns. 
Treatments were randomized in such a way that no treatment appears more than once in any row or column.\nThe next step is to check if there are any missing values in the response variable.\n\napply(dat, 2, function(x) sum(is.na(x)))\n\n trt yield row col \n 0 0 0 0 \n\n\nNo missing values detected in this data set.\nBefore fitting the model, let’s create a histogram of the response variable to see if there are extreme values.\n\n\n\n\n\n\nHistogram of the dependent variable.\n\n\n\n\nhist(dat$yield, main = \"\", xlab = \"yield\")\n\n\n\n10.2.2 Model fitting\nHere we will fit a model to evaluate the impact of fungicide treatments on wheat yield, with trt as a fixed effect and row and col as random effects.\n\nlme4nlme\n\n\n\nm1_a <- lmer(yield ~ trt + (1|row) + (1|col),\n data = dat1,\n na.action = na.exclude)\nsummary(m1_a) \n\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: yield ~ trt + (1 | row) + (1 | col)\n Data: dat1\n\nREML criterion at convergence: 89.8\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-1.3994 -0.5383 -0.1928 0.5220 1.8429 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n row (Intercept) 1.8660 1.3660 \n col (Intercept) 0.2336 0.4833 \n Residual 2.3370 1.5287 \nNumber of obs: 25, groups: row, 5; col, 5\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 6.8400 0.9420 11.9446 7.261 1.03e-05 ***\ntrtB -0.3800 0.9669 12.0000 -0.393 0.7012 \ntrtC 6.2800 0.9669 12.0000 6.495 2.96e-05 ***\ntrtD 1.1200 0.9669 12.0000 1.158 0.2692 \ntrtE -1.9200 0.9669 12.0000 -1.986 0.0704 . \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) trtB trtC trtD \ntrtB -0.513 \ntrtC -0.513 0.500 \ntrtD -0.513 0.500 0.500 \ntrtE -0.513 0.500 0.500 0.500\n\n\n\n\n\nm1_b <- lme(yield ~ trt,\n random =list(~1|row, ~1|col),\n data = dat, \n na.action = na.exclude)\n\nsummary(m1_b)\n\nLinear mixed-effects model fit by REML\n Data: dat \n AIC BIC logLik\n 106.0974 114.0633 -45.04872\n\nRandom effects:\n Formula: ~1 | row\n (Intercept)\nStdDev: 1.344469\n\n Formula: ~1 | col %in% row\n (Intercept) Residual\nStdDev: 1.494696 0.628399\n\nFixed effects: yield ~ trt \n Value Std.Error DF t-value p-value\n(Intercept) 6.84 0.9419764 16 7.261328 0.0000\ntrtB -0.38 1.0254756 16 -0.370560 0.7158\ntrtC 6.28 1.0254756 16 6.123987 0.0000\ntrtD 1.12 1.0254756 16 1.092176 0.2909\ntrtE -1.92 1.0254756 16 -1.872302 0.0796\n Correlation: \n (Intr) trtB trtC trtD \ntrtB -0.544 \ntrtC -0.544 0.500 \ntrtD -0.544 0.500 0.500 \ntrtE -0.544 0.500 0.500 0.500\n\nStandardized Within-Group Residuals:\n Min Q1 Med Q3 Max \n-0.5686726 -0.2469684 -0.1061146 0.2349101 0.7617205 \n\nNumber of Observations: 25\nNumber of Groups: \n row col %in% row \n 5 25 \n\n\n\n\n\n\n\n10.2.3 Check Model Assumptions\nThis step involves inspecting the model residuals using the check_model() function from the “performance” package.\n\nlme4nlme\n\n\n\ncheck_model(m1_a, check = c(\"linearity\", \"normality\"))\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ncheck_model(m1_b, check = c(\"linearity\", \"normality\"))\n\n\n\n\n\n\n\n\n\n\n\nThese visuals imply that the assumptions of the linear model have been met.\n\n\n10.2.4 Inference\nWe can now proceed to the variance partitioning. In this case, we will use anova() with type = \"1\" or type = \"sequential\" for lmer() and lme() models, respectively.\n\nlme4nlme\n\n\n\nanova(m1_a, type = \"1\")\n\nType I Analysis of Variance Table with Satterthwaite's method\n Sum Sq Mean Sq NumDF DenDF F value Pr(>F) \ntrt 196.61 49.152 4 12 21.032 2.366e-05 ***\n---\nSignif. 
codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n\n\n\n\nanova(m1_b, type = \"sequential\")\n\n numDF denDF F-value p-value\n(Intercept) 1 16 132.38123 <.0001\ntrt 4 16 18.69608 <.0001\n\n\n\n\n\nBoth models detected a significant effect of fungicide treatment on wheat yield. Note that the lme() fit treats col as nested within row (see the ‘col %in% row’ grouping in its summary), so its F-value and denominator degrees of freedom differ from those of the crossed lmer() fit. Let’s have a look at the estimated marginal means of wheat yield for each treatment using the emmeans() function.\n\nlme4nlme\n\n\n\nemmeans(m1_a, ~ trt)\n\n trt emmean SE df lower.CL upper.CL\n A 6.84 0.942 11.9 4.79 8.89\n B 6.46 0.942 11.9 4.41 8.51\n C 13.12 0.942 11.9 11.07 15.17\n D 7.96 0.942 11.9 5.91 10.01\n E 4.92 0.942 11.9 2.87 6.97\n\nDegrees-of-freedom method: kenward-roger \nConfidence level used: 0.95 \n\n\n\n\n\nemmeans(m1_b, ~ trt)\n\n trt emmean SE df lower.CL upper.CL\n A 6.84 0.942 4 4.22 9.46\n B 6.46 0.942 4 3.84 9.08\n C 13.12 0.942 4 10.50 15.74\n D 7.96 0.942 4 5.34 10.58\n E 4.92 0.942 4 2.30 7.54\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\nWe see that wheat yield was highest with fungicide treatment ‘C’ (dusted once each week), which suggests that this treatment was the most effective at controlling stem rust in wheat.",
"crumbs": [
"Experiment designs",
"<span class='chapter-number'>10</span> <span class='chapter-title'>Latin Square Design</span>"
]
},
{
"objectID": "chapters/repeated-measures.html",
"href": "chapters/repeated-measures.html",
"title": "11 Repeated measures mixed models",
"section": "",
"text": "12 Example Analysis\nIn the previous chapters we have covered how to run linear mixed models for different experiment designs. All of the examples in those chapters were independent measure designs, where each subject was assigned to a different treatment. Now we will move on to experiment with repeated measures effects.\nStudies that involve repeated observations of the exact same experimental units (or subjects) requires a repeated measures component in analysis to properly model correlations across time of each subject. This is common in any studies that are evaluated across different time periods. For example, if samples are collected over the different time periods from same subject, we have to model the repeated measures effect while analyzing the main effects.\nIn these models, the ‘iid’ assumption (independently and identically distributed) is being violated often, so we need to introduce specialized covariance structures that can account for these correlations between error terms.\nThere are several types of covariance structures:\nThe repeated measures syntax in nlme follow this convention: corr = corAR1(value = (b/w -1 & 1), form = ~ t|g, fixed = (T or F)).\nOne can use differnt correlation structure classes such as CorAR1(), corCompSymm(), CorSymm().\nFor form(), ~ t or ~ t|g, specifying a time covariate t and, optionally a grouping factor g. When we use ~t|g form, the correlation structure is assumed to apply only to observations within the same grouping level.\nThe default starting value is zero, and if fixed = FALSE (the current nlme default), this value will be allowed to change during the model fitting process. A covariate for this correlation structure must be a integer value.\nThere are several other options in the nlme machinery (search “cor” for more options and details on the syntax).\nFitting models with correlated observations requires new libraries including mmrm and nlme. 
The lmer() function from the lme4 package supports random effects only; it cannot fit residual correlation structures.\nIn this tutorial we will analyze repeated measures data from different experiment designs, including randomized complete block, split plot, and split-split plot designs.\nFor the examples in this chapter we will fit models using the mmrm and nlme packages. So, let’s start by loading the required libraries for this analysis.\nWe will begin with an example from a randomized complete block design with repeated measures.",
"crumbs": [
"<span class='chapter-number'>11</span> <span class='chapter-title'>Repeated Measures</span>"
]
},
{
"objectID": "chapters/repeated-measures.html#rcbd-repeated-measures",
"href": "chapters/repeated-measures.html#rcbd-repeated-measures",
"title": "11 Repeated measures mixed models",
"section": "12.1 RCBD Repeated Measures",
"text": "12.1 RCBD Repeated Measures\nThe example shown below contains data from a sorghum trial laid out as a randomized complete block design (5 blocks) with variety (4 varieties) treatment effect. The response variable ‘y’ is the leaf area index assessed in five consecutive weeks on each plot.\nWe need to have time as numeric and factor variable. In the model, to assess the week effect, week was used as a factor (factweek). For the correlation matrix, week needs to be numeric (week).\n\ndat <- agriTutorial::sorghum %>% \n mutate(week = as.numeric(factweek),\n block = as.character(varblock)) \n\n\nTable of variables in the data set\n\n\nblock\nblocking unit\n\n\nReplicate\nreplication unit\n\n\nWeek\nTime points when data was collected\n\n\nvariety\ntreatment factor, 4 levels\n\n\ny\nyield (lbs)\n\n\n\n\n12.1.1 Data Integrity Checks\nLet’s do preliminary data check including evaluating data structure, distribution of treatments, number of missing values, and distribution of response variable.\n\nstr(dat)\n\n'data.frame': 100 obs. 
of 9 variables:\n $ y : num 5 4.84 4.02 3.75 3.13 4.42 4.3 3.67 3.23 2.83 ...\n $ variety : Factor w/ 4 levels \"1\",\"2\",\"3\",\"4\": 1 1 1 1 1 1 1 1 1 1 ...\n $ Replicate: Factor w/ 5 levels \"1\",\"2\",\"3\",\"4\",..: 1 1 1 1 1 2 2 2 2 2 ...\n $ factweek : Factor w/ 5 levels \"1\",\"2\",\"3\",\"4\",..: 1 2 3 4 5 1 2 3 4 5 ...\n $ factplot : Factor w/ 20 levels \"1\",\"2\",\"3\",\"4\",..: 1 1 1 1 1 2 2 2 2 2 ...\n $ varweek : int 1 2 3 4 5 1 2 3 4 5 ...\n $ varblock : int 1 1 1 1 1 2 2 2 2 2 ...\n $ week : num 1 2 3 4 5 1 2 3 4 5 ...\n $ block : chr \"1\" \"1\" \"1\" \"1\" ...\n\n\nIn this data, we have block, factplot, factweek as factor variables and y & week as numeric.\n\ntable(dat$variety, dat$block)\n\n \n 1 2 3 4 5\n 1 5 5 5 5 5\n 2 5 5 5 5 5\n 3 5 5 5 5 5\n 4 5 5 5 5 5\n\n\nThe cross tabulation shows a equal number of variety treatments in each block.\n\nggplot(data = dat, aes(y = y, x = factweek, fill = variety)) +\n geom_boxplot() + \n #scale_fill_brewer(palette=\"Dark2\") +\n scale_fill_viridis_d(option = \"F\") +\n theme_bw()\n\n\n\n\n\n\n\n\nLooks like variety ‘1’ has the lowest yield and showed drastic reduction in yield over weeks compared to other varieties. One last step before we fit model is to look at the distribution of response variable.\n\nhist(dat$y, main = \"\", xlab = \"yield\")\n\n\n\n\n\n\n\n\n\n\nFigure 12.1: Histogram of the dependent variable.\n\n\n\n\n\n\n12.1.2 Model Building\nLet’s fit the basic model first using lme() from the nlme package.\n\nlm1 <- lme(y ~ variety + factweek + variety:factweek,\n random = ~1|block/factplot,\n data = dat,\n na.action = na.exclude)\n\nThe model fitted above doesn’t account for the repeated measures effect. 
To account for the variation caused by repeated measurements, we can model the correlation among responses for a given subject, which is the plot (a factor variable) in this case.\nBy adding this correlation structure, we are accounting for variation caused by repeated measurements over weeks for each plot. The AR1 structure assumes that observations collected closer together in time are more strongly correlated, whereas the compound symmetry structure assumes that the correlation is equal for all time gaps. Here, we will fit the model with both correlation structures and compare the models to find the best-fitting one.\nIn this analysis, the time variable is week, and it must be numeric.\n\ncs1 <- corAR1(form = ~ week|block/factplot, value = 0.2, fixed = FALSE)\ncs2 <- corCompSymm(form = ~ week|block/factplot, value = 0.2, fixed = FALSE)\n\nIn the code chunk above, we defined two correlation structures: AR1 and compound symmetry. Next we will update the model lm1 with each of these structures. Please search the nlme help pages to learn more about the functions for different correlation structure classes.\n\nlm2 <- update(lm1, corr = cs1)\nlm3 <- update(lm1, corr = cs2)\n\nNow let’s compare how model fit differs among models with no correlation structure (lm1), with the AR1 correlation structure (lm2), and with the compound symmetry structure (lm3). 
We will compare these models by using anova() or by compare_performance() function from the ‘performance’ library.\n\nanovaperformance\n\n\n\nanova(lm1, lm2, lm3)\n\n Model df AIC BIC logLik Test L.Ratio p-value\nlm1 1 23 18.837478 73.62409 13.58126 \nlm2 2 24 -2.347391 54.82125 25.17370 1 vs 2 23.18487 <.0001\nlm3 3 24 20.837478 78.00612 13.58126 \n\n\n\n\n\nresult <- compare_performance(lm1, lm2, lm3)\n\nSome of the nested models seem to be identical and probably only vary in\n their random effects.\n\nprint_md(result)\n\n\nComparison of Model Performance Indices\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nModel\nAIC (weights)\nAICc (weights)\nBIC (weights)\nR2 (cond.)\nR2 (marg.)\nICC\nRMSE\nSigma\n\n\n\n\nlm1\nlme\n-50.5 (<.001)\n-36.0 (<.001)\n9.4 (<.001)\n0.99\n0.37\n0.98\n0.10\n0.13\n\n\nlm2\nlme\n-77.5 (>.999)\n-61.5 (>.999)\n-15.0 (>.999)\n0.97\n0.41\n0.95\n0.15\n0.18\n\n\nlm3\nlme\n-48.5 (<.001)\n-32.5 (<.001)\n14.0 (<.001)\n0.98\n0.37\n0.98\n0.11\n0.14\n\n\n\n\n\n\n\n\nWe prefer to chose model with lower AIC and BIC values. 
In this scenario, we will move forward with lm2 model containing AR1 structure.\nLet’s run a tidy() on lm2 model to look at the estimates for random and fixed effects.\n\ntidy(lm2)\n\nWarning in tidy.lme(lm2): ran_pars not yet implemented for multiple levels of\nnesting\n\n\n# A tibble: 20 × 7\n effect term estimate std.error df statistic p.value\n <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed (Intercept) 4.24 0.291 64 14.6 5.44e-22\n 2 fixed variety2 0.906 0.114 12 7.94 4.05e- 6\n 3 fixed variety3 0.646 0.114 12 5.66 1.05e- 4\n 4 fixed variety4 0.912 0.114 12 8.00 3.78e- 6\n 5 fixed factweek2 -0.196 0.0571 64 -3.44 1.04e- 3\n 6 fixed factweek3 -0.836 0.0755 64 -11.1 1.60e-16\n 7 fixed factweek4 -1.16 0.0867 64 -13.3 4.00e-20\n 8 fixed factweek5 -1.54 0.0943 64 -16.3 1.57e-24\n 9 fixed variety2:factweek2 0.0280 0.0807 64 0.347 7.30e- 1\n10 fixed variety3:factweek2 0.382 0.0807 64 4.73 1.26e- 5\n11 fixed variety4:factweek2 -0.0140 0.0807 64 -0.174 8.63e- 1\n12 fixed variety2:factweek3 0.282 0.107 64 2.64 1.03e- 2\n13 fixed variety3:factweek3 0.662 0.107 64 6.20 4.55e- 8\n14 fixed variety4:factweek3 0.388 0.107 64 3.64 5.55e- 4\n15 fixed variety2:factweek4 0.228 0.123 64 1.86 6.77e- 2\n16 fixed variety3:factweek4 0.744 0.123 64 6.06 7.86e- 8\n17 fixed variety4:factweek4 0.390 0.123 64 3.18 2.28e- 3\n18 fixed variety2:factweek5 0.402 0.133 64 3.01 3.70e- 3\n19 fixed variety3:factweek5 0.672 0.133 64 5.04 4.11e- 6\n20 fixed variety4:factweek5 0.222 0.133 64 1.66 1.01e- 1\n\n\n\n\n12.1.3 Check Model Assumptions\n\ncheck_model(lm2, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n12.1.4 Inference\nThe ANOVA table suggests a significant effect of the variety, week, and variety x week interaction effect.\n\nanova(lm2, type = \"marginal\")\n\n numDF denDF F-value p-value\n(Intercept) 1 64 212.10509 <.0001\nvariety 3 12 28.28895 <.0001\nfactweek 4 64 74.79758 <.0001\nvariety:factweek 12 64 7.03546 <.0001\n\n\nWe can estimate the marginal means for variety and 
week effect and their interaction using emmeans() function.\n\nmean_1 <- emmeans(lm2, ~ variety)\n\nNOTE: Results may be misleading due to involvement in interactions\n\nmean_1\n\n variety emmean SE df lower.CL upper.CL\n 1 3.50 0.288 4 2.70 4.29\n 2 4.59 0.288 4 3.79 5.39\n 3 4.63 0.288 4 3.84 5.43\n 4 4.61 0.288 4 3.81 5.40\n\nResults are averaged over the levels of: factweek \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\nmean_2 <- emmeans(lm2, ~ variety*factweek)\nmean_2\n\n variety factweek emmean SE df lower.CL upper.CL\n 1 1 4.24 0.291 4 3.43 5.05\n 2 1 5.15 0.291 4 4.34 5.96\n 3 1 4.89 0.291 4 4.08 5.70\n 4 1 5.15 0.291 4 4.35 5.96\n 1 2 4.05 0.291 4 3.24 4.85\n 2 2 4.98 0.291 4 4.17 5.79\n 3 2 5.07 0.291 4 4.27 5.88\n 4 2 4.94 0.291 4 4.14 5.75\n 1 3 3.41 0.291 4 2.60 4.21\n 2 3 4.59 0.291 4 3.79 5.40\n 3 3 4.71 0.291 4 3.91 5.52\n 4 3 4.71 0.291 4 3.90 5.51\n 1 4 3.09 0.291 4 2.28 3.89\n 2 4 4.22 0.291 4 3.41 5.03\n 3 4 4.48 0.291 4 3.67 5.28\n 4 4 4.39 0.291 4 3.58 5.20\n 1 5 2.70 0.291 4 1.89 3.51\n 2 5 4.01 0.291 4 3.20 4.82\n 3 5 4.02 0.291 4 3.21 4.83\n 4 5 3.83 0.291 4 3.03 4.64\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\n\n\n\nTime variable\n\n\n\nHere is a quick step to make sure your fitting model correctly: make sure to have two time variables in your data one being numeric (e.g. ‘day’ as number) and other being factor/character(e.g. ‘day_factor’ as a factor/character). Where, numeric variable is used for fitting correlation matrix and factor/character variable used in model statement to evaluate the time variable effect on response variable.",
"crumbs": [
"<span class='chapter-number'>11</span> <span class='chapter-title'>Repeated Measures</span>"
]
},
{
"objectID": "chapters/repeated-measures.html#split-plot-repeated-measures",
"href": "chapters/repeated-measures.html#split-plot-repeated-measures",
"title": "11 Repeated measures mixed models",
"section": "12.2 Split Plot Repeated Measures",
"text": "12.2 Split Plot Repeated Measures\nRecall, we have evaluated split plot design Chapter 5. In this example we will use the same methodology used in Chapter 5 and update it with repeated measures component.\nNext, let’s load “Yield” data. It is located here.\n\nYield <- read.csv(here::here(\"data/Yield.csv\"))\n\nThis example contains yield data in a split-plot design. The yield data was collected repeatedly from the same Reps over 5 Sample_times. In this data set, we have:\n\nTable of variables in the data set\n\n\nRep\nreplication unit\n\n\nVariety\nMain plot, 2 levels\n\n\nFertilizer\nSplit plot, 3 levels\n\n\nYield\ncrop yield\n\n\nSample_time\ntime points for data collection\n\n\n\n\n12.2.1 Data Integrity Checks\nFirstly, we need to look at the class of variables in the data set.\n\nstr(Yield)\n\n'data.frame': 120 obs. of 6 variables:\n $ Sample_time: int 1 1 1 1 1 1 1 1 1 1 ...\n $ Variety : chr \"VAR1\" \"VAR1\" \"VAR1\" \"VAR1\" ...\n $ Fertilizer : int 1 1 1 1 2 2 2 2 3 3 ...\n $ Rep : int 1 2 3 4 1 2 3 4 1 2 ...\n $ pH : num 7.07 7.06 7.08 7.09 7.13 7.12 7.15 7.14 7.18 7.18 ...\n $ Yield : num 0.604 0.595 3.145 3.091 2.415 ...\n\n\nWe will now convert the fertilizer and Rep into factor. In addition, we need to create a new factor variable (sample_time1) to analyze the time effect.\n\n\nFor lme(), independent variables in a character/factor form works fine. But, for mmrm() independent variables must be a factor. Thus, for sake of consistancy, we will be using independent variables in factor class.\n\nYield$Variety <- factor(Yield$Variety) \nYield$Fertilizer <- factor(Yield$Fertilizer) \nYield$Sample_time1 <- factor(Yield$Sample_time) \nYield$Rep <- factor(Yield$Rep) \n\nTo fit model, we first need to convert Variety, Fertilizer, and Sample_time as factors. In addition, we need to create a new variable named ‘plot’ with a unique value for each plot. 
This variable identifies each subject, which is the plot in this case. The plot variable is needed to model the variation in each plot over the sampling time. The plot will be used as the subject with repeated measures. The subject variable can be a factor or numeric, but the time variable (it could be year or sample_time) has to be a factor.\n\n##creating a plot variable \nYield$plot <- factor(paste(Yield$Rep, Yield$Fertilizer, Yield$Variety, sep='-')) \nYield$Rep2 <- factor(paste(Yield$Rep, Yield$Variety, sep='-')) \ntable(Yield$plot) \n\n\n1-1-VAR1 1-1-VAR2 1-2-VAR1 1-2-VAR2 1-3-VAR1 1-3-VAR2 2-1-VAR1 2-1-VAR2 \n 5 5 5 5 5 5 5 5 \n2-2-VAR1 2-2-VAR2 2-3-VAR1 2-3-VAR2 3-1-VAR1 3-1-VAR2 3-2-VAR1 3-2-VAR2 \n 5 5 5 5 5 5 5 5 \n3-3-VAR1 3-3-VAR2 4-1-VAR1 4-1-VAR2 4-2-VAR1 4-2-VAR2 4-3-VAR1 4-3-VAR2 \n 5 5 5 5 5 5 5 5 \n\n\n\ntable(Yield$Fertilizer, Yield$Variety) \n\n \n VAR1 VAR2\n 1 20 20\n 2 20 20\n 3 20 20\n\n\nLooks like a well balanced design with 2 variety treatments and 3 fertilizer treatments.\nBefore fitting a model, let’s check the distribution of the response variable.\n\n\n\n\n\n\n\n\n\nFigure 12.2: Histogram of the dependent variable.\n\n\n\n\n\nhist(Yield$Yield)\n\n\n\n12.2.2 Model fit\nThese data can be analyzed using either nlme or mmrm.\nLet’s say we want to fit a model using the AR1 structure as shown in the RCBD repeated measures example. Previously, we used lme() from the nlme package to fit the model. In this example, along with lme() we will also use the mmrm() function from the mmrm package. In addition, instead of the summary() function we will use the tidy() function from the ‘broom.mixed’ package to look at estimates of fixed and random effects. 
This will generate a tidy workflow in particular by providing standardized verbs that provide information on estimates, standard errors, confidence intervals, etc.\n\nnlmemmrm\n\n\n\ncorr_str1 = corAR1(form = ~ Sample_time|Rep/Variety/plot, value = 0.2, fixed = FALSE)\n\nfit1 <- lme(Yield ~ Sample_time1*Variety*Fertilizer,\n random = ~ 1|Rep/Variety/plot,\n corr= corr_str1,\n data = Yield, na.action= na.exclude)\ntidy(fit1)\n\n# A tibble: 30 × 7\n effect term estimate std.error df statistic p.value\n <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed (Intercept) 1.86 0.708 72 2.63 0.0105 \n 2 fixed Sample_time12 0.515 0.688 72 0.748 0.457 \n 3 fixed Sample_time13 0.787 0.674 72 1.17 0.247 \n 4 fixed Sample_time14 1.35 0.675 72 2.00 0.0496 \n 5 fixed Sample_time15 2.84 0.675 72 4.21 0.0000731\n 6 fixed VarietyVAR2 -0.996 0.861 3 -1.16 0.331 \n 7 fixed Fertilizer2 1.27 0.861 12 1.47 0.167 \n 8 fixed Fertilizer3 2.07 0.861 12 2.40 0.0333 \n 9 fixed Sample_time12:VarietyVAR2 0.739 0.974 72 0.759 0.451 \n10 fixed Sample_time13:VarietyVAR2 0.269 0.954 72 0.282 0.779 \n# ℹ 20 more rows\n\n\n\n\n\nfit2 <- mmrm(formula = Yield ~ Sample_time1*Variety*Fertilizer + \n ar1(Sample_time1|Rep/plot),\n data = Yield)\n\ntidy(fit2)\n\n# A tibble: 30 × 6\n term estimate std.error df statistic p.value\n <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 (Intercept) 2.86 0.464 12.7 6.16 0.0000387\n 2 Sample_time12 0.656 0.310 1.81 2.12 0.182 \n 3 Sample_time13 1.40 0.414 2.29 3.39 0.0636 \n 4 Sample_time14 1.46 0.484 2.87 3.01 0.0605 \n 5 Sample_time15 2.47 0.549 3.14 4.50 0.0186 \n 6 VarietyVAR2 -1.07 0.656 12.7 -1.63 0.128 \n 7 Fertilizer2 1.67 0.656 12.7 2.55 0.0245 \n 8 Fertilizer3 0.595 0.656 12.7 0.908 0.381 \n 9 Sample_time12:VarietyVAR2 -0.591 0.438 1.81 -1.35 0.321 \n10 Sample_time13:VarietyVAR2 -0.412 0.586 2.29 -0.704 0.546 \n# ℹ 20 more rows\n\n\n\n\n\n\n\n12.2.3 Model diagnostics\nWe will use check_model() from ‘performance’ package to evaluate the model fitness of model fitted using 
nlme (fit1). However, the mmrm model class doesn’t work with the performance package, so we will evaluate the model diagnostics by plotting the residuals using base R functions.\n\nnlmemmrm\n\n\n\ncheck_model(fit1, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\n\n\n\nplot(fitted(fit2), residuals(fit2), xlab = \"fitted values\", ylab = \"residuals\")\nqqnorm(residuals(fit2)); qqline(residuals(fit2))\n\n\n\n\nThese diagnostic plots look great! The linearity and homogeneity of variance plots show no trend. The normal Q-Q plots for the overall residuals and for the random effects fall on a straight line, so we can be satisfied with that.\n\n\n12.2.4 Inference\n\nnlmemmrm\n\n\n\nanova(fit1, type = \"marginal\")\n\n numDF denDF F-value p-value\n(Intercept) 1 72 6.899272 0.0105\nSample_time1 4 72 5.318690 0.0008\nVariety 1 3 1.338879 0.3310\nFertilizer 2 12 2.936073 0.0916\nSample_time1:Variety 4 72 0.998154 0.4143\nSample_time1:Fertilizer 8 72 8.158884 <.0001\nVariety:Fertilizer 2 12 0.237417 0.7923\nSample_time1:Variety:Fertilizer 8 72 0.731698 0.6631\n\n\n\n\n\n#car::Anova(fit2, type = \"III\")\n#Anova.mmrm(fit2, type = \"III\")\n\n\n\n\nThe ANOVA showed a significant effect of Sample_time and of the Sample_time x Fertilizer interaction.\nNext, we can estimate marginal means and confidence intervals for the independent variables using emmeans().\n\nnlmemmrm\n\n\n\nemmeans(fit1,~ Sample_time1)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n Sample_time1 emmean SE df lower.CL upper.CL\n 1 2.65 0.438 3 1.25 4.04\n 2 4.40 0.438 3 3.01 5.79\n 3 5.53 0.438 3 4.13 6.92\n 4 7.26 0.438 3 5.87 8.66\n 5 8.82 0.438 3 7.42 10.21\n\nResults are averaged over the levels of: Variety, Fertilizer \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\nemmeans(fit1,~ Sample_time1|Fertilizer)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\nFertilizer = 1:\n Sample_time1 emmean SE 
df lower.CL upper.CL\n 1 1.36 0.562 3 -0.427 3.15\n 2 2.25 0.562 3 0.458 4.03\n 3 2.28 0.562 3 0.495 4.07\n 4 2.65 0.562 3 0.861 4.44\n 5 3.66 0.562 3 1.874 5.45\n\nFertilizer = 2:\n Sample_time1 emmean SE df lower.CL upper.CL\n 1 3.04 0.562 3 1.248 4.82\n 2 5.17 0.562 3 3.383 6.96\n 3 6.46 0.562 3 4.668 8.24\n 4 8.72 0.562 3 6.935 10.51\n 5 10.09 0.562 3 8.304 11.88\n\nFertilizer = 3:\n Sample_time1 emmean SE df lower.CL upper.CL\n 1 3.55 0.562 3 1.762 5.34\n 2 5.78 0.562 3 3.995 7.57\n 3 7.84 0.562 3 6.051 9.63\n 4 10.42 0.562 3 8.630 12.21\n 5 12.69 0.562 3 10.905 14.48\n\nResults are averaged over the levels of: Variety \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\n\n\nemmeans(fit2,~ Sample_time1)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n Sample_time1 emmean SE df lower.CL upper.CL\n 1 3.43 0.189 12.7 3.02 3.84\n 2 5.21 0.169 12.7 4.84 5.58\n 3 6.59 0.163 11.9 6.23 6.94\n 4 7.96 0.169 12.7 7.60 8.33\n 5 9.65 0.189 12.7 9.24 10.06\n\nResults are averaged over the levels of: Variety, Fertilizer, Rep \nConfidence level used: 0.95 \n\n emmeans(fit2,~ Sample_time1|Fertilizer)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\nFertilizer = 1:\n Sample_time1 emmean SE df lower.CL upper.CL\n 1 2.32 0.328 12.7 1.61 3.03\n 2 2.68 0.293 12.7 2.05 3.32\n 3 3.52 0.283 11.9 2.90 4.14\n 4 3.54 0.293 12.7 2.91 4.18\n 5 4.27 0.328 12.7 3.56 4.98\n\nFertilizer = 2:\n Sample_time1 emmean SE df lower.CL upper.CL\n 1 4.26 0.328 12.7 3.55 4.97\n 2 6.37 0.293 12.7 5.74 7.01\n 3 7.82 0.283 11.9 7.21 8.44\n 4 9.66 0.293 12.7 9.03 10.30\n 5 11.54 0.328 12.7 10.83 12.25\n\nFertilizer = 3:\n Sample_time1 emmean SE df lower.CL upper.CL\n 1 3.70 0.328 12.7 2.99 4.41\n 2 6.58 0.293 12.7 5.94 7.21\n 3 8.42 0.283 11.9 7.81 9.04\n 4 10.69 0.293 12.7 10.05 11.32\n 5 13.14 0.328 12.7 12.43 13.85\n\nResults are averaged over the levels of: Variety, Rep \nConfidence level used: 0.95 \n\n\n\n\n\n\n\nTo explore 
more about contrasts and emmeans please refer to Chapter 12.",
"crumbs": [
"<span class='chapter-number'>11</span> <span class='chapter-title'>Repeated Measures</span>"
]
},
{
"objectID": "chapters/repeated-measures.html#split-split-plot-repeated-measures",
"href": "chapters/repeated-measures.html#split-split-plot-repeated-measures",
"title": "11 Repeated measures mixed models",
"section": "12.3 Split-split plot repeated measures",
"text": "12.3 Split-split plot repeated measures\nRecall, we have evaluated the split-split experiment design in Chapter 5, where we had a one factor in main-plot, other in subplot and the third factor in sub-subplot. In this example we will be adding a repeated measures compoenet to the split-split plot design.\n\nphos <- read.csv(here::here(\"data\", \"split_split_repeated.csv\"))\n\n\n\n\nplot\nexperimental unit\n\n\nblock\nreplication unit\n\n\nPtrt\nMain plot, 2 levels\n\n\nInoc\nSplit plot, 2 levels\n\n\nCv\nSplit-split plot, 5 levels\n\n\ntime\ntime points for data collection\n\n\nP_leaf\nleaf phosphorous content\n\n\n\n\n12.3.1 Data Integrity Checks\n\nstr(phos)\n\n'data.frame': 240 obs. of 7 variables:\n $ plot : int 1 1 1 2 2 2 3 3 3 4 ...\n $ bloc : int 1 1 1 1 1 1 1 1 1 1 ...\n $ Ptrt : chr \"high\" \"high\" \"high\" \"high\" ...\n $ Inoc : chr \"none\" \"none\" \"none\" \"none\" ...\n $ Cv : chr \"LOUISE\" \"LOUISE\" \"LOUISE\" \"BlancaG\" ...\n $ time : chr \"PT1\" \"PT2\" \"PT3\" \"PT1\" ...\n $ P_leaf: num 3154 2331 247 3016 2160 ...\n\n\n\nphos$time = as.factor(phos$time)\nphos1 <- phos %>% \n mutate(time1 = as.numeric(time),\n rep = as.character(bloc),\n plot = as.character(plot)) \n\n\ntable(phos1$Ptrt, phos1$Inoc, phos1$Cv) \n\n, , = ALPOWA\n\n \n myco none\n high 12 12\n low 12 12\n\n, , = BlancaG\n\n \n myco none\n high 12 12\n low 12 12\n\n, , = LOUISE\n\n \n myco none\n high 12 12\n low 12 12\n\n, , = OTIS\n\n \n myco none\n high 12 12\n low 12 12\n\n, , = WALWORTH\n\n \n myco none\n high 12 12\n low 12 12\n\n\nLooks like a well balanced design with 2 variety treatments and 3 fertilizer treatments.\nBefore fitting a model, let’s check the distribution of the response variable.\n\n\n\n\n\n\n\n\n\nFigure 12.3: Histogram of the dependent variable.\n\n\n\n\n\nhist(phos1$P_leaf)\n\n\n\n12.3.2 Model fit\n\ncorr_str1 = corAR1(form = ~ time1|rep/Ptrt/Inoc/plot, value = 0.2, fixed = FALSE)\n\nfit1 <- lme(P_leaf ~ time*Ptrt*Inoc*Cv,\n random = ~ 
1|rep/Ptrt/Inoc/plot,\n corr= corr_str1,\n data = phos1, na.action= na.exclude)\ntidy(fit1)\n\n# A tibble: 60 × 7\n effect term estimate std.error df statistic p.value\n <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed (Intercept) 3175. 82.6 120 38.4 2.63e-69\n 2 fixed timePT2 -866. 91.6 120 -9.46 3.41e-16\n 3 fixed timePT3 -3015. 96.9 120 -31.1 2.66e-59\n 4 fixed Ptrtlow -185. 101. 3 -1.84 1.64e- 1\n 5 fixed Inocnone 129. 97.6 6 1.33 2.33e- 1\n 6 fixed CvBlancaG 48.4 97.6 48 0.496 6.22e- 1\n 7 fixed CvLOUISE -23.2 97.6 48 -0.238 8.13e- 1\n 8 fixed CvOTIS 2.49 97.6 48 0.0255 9.80e- 1\n 9 fixed CvWALWORTH -413. 97.6 48 -4.23 1.03e- 4\n10 fixed timePT2:Ptrtlow 104. 129. 120 0.800 4.25e- 1\n# ℹ 50 more rows\n\n\n\ncheck_model(fit1, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\nWe see a cluster of values in residuals which was due to large number of observations having low values.\n\nanova(fit1, type = \"marginal\")\n\n numDF denDF F-value p-value\n(Intercept) 1 120 1477.6555 <.0001\ntime 2 120 518.4625 <.0001\nPtrt 1 3 3.3729 0.1636\nInoc 1 6 1.7592 0.2330\nCv 4 48 7.5577 0.0001\ntime:Ptrt 2 120 0.6765 0.5103\ntime:Inoc 2 120 2.2797 0.1067\nPtrt:Inoc 1 6 2.4771 0.1666\ntime:Cv 8 120 2.4426 0.0175\nPtrt:Cv 4 48 0.5051 0.7321\nInoc:Cv 4 48 2.1222 0.0925\ntime:Ptrt:Inoc 2 120 0.8339 0.4369\ntime:Ptrt:Cv 8 120 0.2320 0.9843\ntime:Inoc:Cv 8 120 1.0401 0.4100\nPtrt:Inoc:Cv 4 48 0.4733 0.7551\ntime:Ptrt:Inoc:Cv 8 120 0.4155 0.9098\n\n\n\nemmeans(fit1,~ time)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\n time emmean SE df lower.CL upper.CL\n PT1 3096 46.2 3 2948.7 3242\n PT2 2270 46.2 3 2122.7 2416\n PT3 198 46.2 3 50.8 345\n\nResults are averaged over the levels of: Ptrt, Inoc, Cv \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\nemmeans(fit1,~ time|Cv)\n\nNOTE: Results may be misleading due to involvement in interactions\n\n\nCv = ALPOWA:\n time emmean SE df lower.CL upper.CL\n PT1 3201 55.5 3 3024.57 3378\n 
PT2 2225 55.5 3 2047.87 2401\n PT3 178 55.5 3 1.53 355\n\nCv = BlancaG:\n time emmean SE df lower.CL upper.CL\n PT1 3183 55.5 3 3006.50 3360\n PT2 2334 55.5 3 2157.45 2511\n PT3 210 55.5 3 32.95 386\n\nCv = LOUISE:\n time emmean SE df lower.CL upper.CL\n PT1 3121 55.5 3 2944.36 3298\n PT2 2366 55.5 3 2189.56 2543\n PT3 174 55.5 3 -2.43 351\n\nCv = OTIS:\n time emmean SE df lower.CL upper.CL\n PT1 3228 55.5 3 3051.65 3405\n PT2 2253 55.5 3 2076.66 2430\n PT3 234 55.5 3 56.86 410\n\nCv = WALWORTH:\n time emmean SE df lower.CL upper.CL\n PT1 2744 55.5 3 2567.30 2921\n PT2 2170 55.5 3 1992.90 2346\n PT3 193 55.5 3 15.88 369\n\nResults are averaged over the levels of: Ptrt, Inoc \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\nReally low P leaf content at PT3 in all the cultivars.\nThe biggest advantage of mixed models is their incredible flexibility. They handle clustered individuals as well as repeated measures (even in the same model). They handle crossed random factors as well as nested\nThe biggest disadvantage of mixed models, at least for someone new to them, is their incredible flexibility. It’s easy to mis-specify a mixed model, and this is a place where a little knowledge is definitely dangerous.",
"crumbs": [
"<span class='chapter-number'>11</span> <span class='chapter-title'>Repeated Measures</span>"
]
},
{
"objectID": "chapters/means-and-contrasts.html",
"href": "chapters/means-and-contrasts.html",
"title": "12 Marginal Means & Contrasts",
"section": "",
"text": "12.1 Background\nTo start off with, we need to define estimated marginal means (EMM). Estimated marginal means are defined as marginal means of a variable across all levels of other variables in a model, essentially giving a “population-level” average.\nThe emmeans package is one of the most commonly used package in R in determine EMMs. This package provides methods for obtaining EMMs (also known as least-squares means) for factor combinations in a variety of models. The emmeans package is one of several alternatives to facilitate post hoc methods application and contrast analysis. It is a relatively recent replacement for the lsmeans package that some R users may be familiar with. It is intended for use with a wide variety of ANOVA models, including repeated measures and nested designs (mixed models). This is a flexible package that comes with a set of detailed vignettes and works with a lot of different model objects.\nIn this chapter, we will demonstrate the extended use of the emmeans package to calculate estimated marginal means and contrasts.\nTo demonstrate the use of the emmeans package. We will pull the model from split plot lesson (Chapter 6), where we evaluated the effect of Nitrogen and Variety on Oat yield. This data contains 6 blocks, 3 main plots (Variety) and 4 subplots (Nitrogen). The primary outcome variable was oat yield. To read more about the experiment layout details please read RCBD split-plot section in Chapter 6.\nLet’s start the analysis by loading the required libraries for fitting linear mixed models using nlme package.",
"crumbs": [
"<span class='chapter-number'>12</span> <span class='chapter-title'>Marginal Means and Contrasts</span>"
]
},
{
"objectID": "chapters/means-and-contrasts.html#background",
"href": "chapters/means-and-contrasts.html#background",
"title": "12 Marginal Means & Contrasts",
"section": "",
"text": "Marginal means using lmer and nlme\n\n\n\nFor demonstration of the emmeans package, we are fitting model with nlme package. Please note that code below calculating marginal means works for both lmer and nlme models.",
"crumbs": [
"<span class='chapter-number'>12</span> <span class='chapter-title'>Marginal Means and Contrasts</span>"
]
},
{
"objectID": "chapters/means-and-contrasts.html#analysis-examples",
"href": "chapters/means-and-contrasts.html#analysis-examples",
"title": "12 Marginal Means & Contrasts",
"section": "12.2 Analysis Examples",
"text": "12.2 Analysis Examples\nWe will start with loading required R libraries for this analysis.\n\nlibrary(nlme); library(performance); library(emmeans)\nlibrary(dplyr); library(broom.mixed); library(multcompView)\nlibrary(multcomp); library(ggplot2)\n\n\n12.2.1 Import data\nLet’s import oats data from the MASS package.\n\ndata1 <- MASS::oats\n\n\n\nTo read more about data and model fitting explanation please refer to Chapter 6.\n\n\n12.2.2 Model fitting\n\nmodel1 <- lme(Y ~ V + N + V:N ,\n random = ~1|B/V,\n data = data1, \n na.action = na.exclude)\ntidy(model1)\n\nWarning in tidy.lme(model1): ran_pars not yet implemented for multiple levels\nof nesting\n\n\n# A tibble: 12 × 7\n effect term estimate std.error df statistic p.value\n <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n 1 fixed (Intercept) 80 9.11 45 8.78 2.56e-11\n 2 fixed VMarvellous 6.67 9.72 10 0.686 5.08e- 1\n 3 fixed VVictory -8.50 9.72 10 -0.875 4.02e- 1\n 4 fixed N0.2cwt 18.5 7.68 45 2.41 2.02e- 2\n 5 fixed N0.4cwt 34.7 7.68 45 4.51 4.58e- 5\n 6 fixed N0.6cwt 44.8 7.68 45 5.84 5.48e- 7\n 7 fixed VMarvellous:N0.2cwt 3.33 10.9 45 0.307 7.60e- 1\n 8 fixed VVictory:N0.2cwt -0.333 10.9 45 -0.0307 9.76e- 1\n 9 fixed VMarvellous:N0.4cwt -4.17 10.9 45 -0.383 7.03e- 1\n10 fixed VVictory:N0.4cwt 4.67 10.9 45 0.430 6.70e- 1\n11 fixed VMarvellous:N0.6cwt -4.67 10.9 45 -0.430 6.70e- 1\n12 fixed VVictory:N0.6cwt 2.17 10.9 45 0.199 8.43e- 1\n\n\n\n\n12.2.3 Check Model Assumptions\n\ncheck_model(model1, check = c('normality', 'linearity'))\n\n\n\n\n\n\n\n\nResiduals look good with a small hump in middle and normality curve looks better. 
### Model Inference\n\nanova(model1, type = \"marginal\")\n\n numDF denDF F-value p-value\n(Intercept) 1 45 77.16729 <.0001\nV 2 10 1.22454 0.3344\nN 3 45 13.02273 <.0001\nV:N 6 45 0.30282 0.9322\n\n\nThe analysis of variance showed a significant N effect and no effect of V and VxN on oat yield.\n\n\n12.2.4 Estimated Marginal Means\nNow that we have fitted a linear mixed model (model1) and it meets the model assumption. Let’s use the emmeans() function to obtain estimated marginal means for main (variety and nitrogen) and interaction (variety x nitrogen) effects.\n\n12.2.4.1 Main effects\n\nm1 <- emmeans(model1, ~V, level = 0.95)\n\nNOTE: Results may be misleading due to involvement in interactions\n\nm1\n\n V emmean SE df lower.CL upper.CL\n Golden.rain 104.5 7.8 5 84.5 125\n Marvellous 109.8 7.8 5 89.7 130\n Victory 97.6 7.8 5 77.6 118\n\nResults are averaged over the levels of: N \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\nm2 <- emmeans(model1, ~N)\n\nNOTE: Results may be misleading due to involvement in interactions\n\nm2\n\n N emmean SE df lower.CL upper.CL\n 0.0cwt 79.4 7.17 5 60.9 97.8\n 0.2cwt 98.9 7.17 5 80.4 117.3\n 0.4cwt 114.2 7.17 5 95.8 132.7\n 0.6cwt 123.4 7.17 5 104.9 141.8\n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\nMake sure to read and interpret EMMs carefully. Here, when we calculated EMMs for main effects of V and N, these were averaged over the levels of other factor in experiment. For example, estimated means for each variety were averaged over it’s N treatments, respectively.\n\n\n12.2.4.2 Interaction effects\nNow let’s evaluate the EMMs for the interaction effect of V and N. 
These can be calculated either using V*N or V|N.\n\nm3 <- emmeans(model1, ~V*N)\nm3\n\n V N emmean SE df lower.CL upper.CL\n Golden.rain 0.0cwt 80.0 9.11 5 56.6 103.4\n Marvellous 0.0cwt 86.7 9.11 5 63.3 110.1\n Victory 0.0cwt 71.5 9.11 5 48.1 94.9\n Golden.rain 0.2cwt 98.5 9.11 5 75.1 121.9\n Marvellous 0.2cwt 108.5 9.11 5 85.1 131.9\n Victory 0.2cwt 89.7 9.11 5 66.3 113.1\n Golden.rain 0.4cwt 114.7 9.11 5 91.3 138.1\n Marvellous 0.4cwt 117.2 9.11 5 93.8 140.6\n Victory 0.4cwt 110.8 9.11 5 87.4 134.2\n Golden.rain 0.6cwt 124.8 9.11 5 101.4 148.2\n Marvellous 0.6cwt 126.8 9.11 5 103.4 150.2\n Victory 0.6cwt 118.5 9.11 5 95.1 141.9\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\n\nm4 <- emmeans(model1, ~V|N)\nm4\n\nN = 0.0cwt:\n V emmean SE df lower.CL upper.CL\n Golden.rain 80.0 9.11 5 56.6 103.4\n Marvellous 86.7 9.11 5 63.3 110.1\n Victory 71.5 9.11 5 48.1 94.9\n\nN = 0.2cwt:\n V emmean SE df lower.CL upper.CL\n Golden.rain 98.5 9.11 5 75.1 121.9\n Marvellous 108.5 9.11 5 85.1 131.9\n Victory 89.7 9.11 5 66.3 113.1\n\nN = 0.4cwt:\n V emmean SE df lower.CL upper.CL\n Golden.rain 114.7 9.11 5 91.3 138.1\n Marvellous 117.2 9.11 5 93.8 140.6\n Victory 110.8 9.11 5 87.4 134.2\n\nN = 0.6cwt:\n V emmean SE df lower.CL upper.CL\n Golden.rain 124.8 9.11 5 101.4 148.2\n Marvellous 126.8 9.11 5 103.4 150.2\n Victory 118.5 9.11 5 95.1 141.9\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\nThe EMMs (m3 and m4) gives the same results but the outcome style is litte more explanatory in m4.",
"crumbs": [
"<span class='chapter-number'>12</span> <span class='chapter-title'>Marginal Means and Contrasts</span>"
]
},
{
"objectID": "chapters/means-and-contrasts.html#contrasts-using-emmeans",
"href": "chapters/means-and-contrasts.html#contrasts-using-emmeans",
"title": "12 Marginal Means & Contrasts",
"section": "12.3 Contrasts using emmeans",
"text": "12.3 Contrasts using emmeans\nFirstly, the pairs() function from emmeans package can be used to evaluate the pairwise comparison among treatment objects. The emmean object (m1, m2) will be passed through pairs() function which will provide a p-value adjustment equivalent to the Tukey test.\n\npairs(m1, adjust = \"tukey\")\n\n contrast estimate SE df t.ratio p.value\n Golden.rain - Marvellous -5.29 7.08 10 -0.748 0.7419\n Golden.rain - Victory 6.88 7.08 10 0.971 0.6104\n Marvellous - Victory 12.17 7.08 10 1.719 0.2458\n\nResults are averaged over the levels of: N \nDegrees-of-freedom method: containment \nP value adjustment: tukey method for comparing a family of 3 estimates \n\n\n\npairs(m2)\n\n contrast estimate SE df t.ratio p.value\n 0.0cwt - 0.2cwt -19.50 4.44 45 -4.396 0.0004\n 0.0cwt - 0.4cwt -34.83 4.44 45 -7.853 <.0001\n 0.0cwt - 0.6cwt -44.00 4.44 45 -9.919 <.0001\n 0.2cwt - 0.4cwt -15.33 4.44 45 -3.457 0.0064\n 0.2cwt - 0.6cwt -24.50 4.44 45 -5.523 <.0001\n 0.4cwt - 0.6cwt -9.17 4.44 45 -2.067 0.1797\n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \nP value adjustment: tukey method for comparing a family of 4 estimates \n\n\nHere if we look at the results from code chunk above, it’s easy to interpret results from pairs() function in case of variety comparison becuase there were only 3 groups. But it’s little confusing in case of Nitrogen treatments where we had 4 groups. 
We can further simplify it by using custom contrasts.\n\n\n\n\n\n\npairs()\n\n\n\nRemember!!\nThe output of pairs() is easiest to interpret when there are 3 or fewer treatment groups.\n\n\n\n12.3.1 Custom contrasts\nFirst, print the emmeans object ‘m2’ for the nitrogen treatments.\n\nm2\n\n N emmean SE df lower.CL upper.CL\n 0.0cwt 79.4 7.17 5 60.9 97.8\n 0.2cwt 98.9 7.17 5 80.4 117.3\n 0.4cwt 114.2 7.17 5 95.8 132.7\n 0.6cwt 123.4 7.17 5 104.9 141.8\n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \n\n\nNow, let’s create a vector for each nitrogen treatment in the same order as presented in the output from m2.\n\nA1 = c(1, 0, 0, 0)\nA2 = c(0, 1, 0, 0)\nA3 = c(0, 0, 1, 0)\nA4 = c(0, 0, 0, 1)\n\nThese vectors (A1, A2, A3, A4) represent the nitrogen treatments in the order they appear in the m2 emmeans object: A1, A2, A3, and A4 represent the 0.0cwt, 0.2cwt, 0.4cwt, and 0.6cwt treatments, respectively.\nThe next step is to create custom contrasts comparing the ‘0.0cwt’ (A1) treatment to the ‘0.2cwt’ (A2), ‘0.4cwt’ (A3), and ‘0.6cwt’ (A4) treatments. This can be evaluated as shown below:\n\ncontrast(m2, method = list(A1 - A2) )\n\n contrast estimate SE df t.ratio p.value\n c(1, -1, 0, 0) -19.5 4.44 45 -4.396 0.0001\n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \n\ncontrast(m2, method = list(A1 - A3) )\n\n contrast estimate SE df t.ratio p.value\n c(1, 0, -1, 0) -34.8 4.44 45 -7.853 <.0001\n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \n\ncontrast(m2, method = list(A1 - A4) )\n\n contrast estimate SE df t.ratio p.value\n c(1, 0, 0, -1) -44 4.44 45 -9.919 <.0001\n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \n\n\nHere the output shows the difference in mean yield between the control and each of the 3 N treatments. 
The results show that yield was significantly higher in each N treatment than in the control (0.0cwt), averaged over the oat varieties.\n\n\n\n\n\n\ncontrast() vs pairs()\n\n\n\nUsing custom contrast() is strongly recommended instead of pairs() when you are comparing many treatment groups (>5).",
"crumbs": [
"<span class='chapter-number'>12</span> <span class='chapter-title'>Marginal Means and Contrasts</span>"
]
},
{
"objectID": "chapters/means-and-contrasts.html#compact-letter-displays",
"href": "chapters/means-and-contrasts.html#compact-letter-displays",
"title": "12 Marginal Means & Contrasts",
"section": "12.4 Compact letter displays",
"text": "12.4 Compact letter displays\nCompact letter displays (CLDs) are a popular way to display multiple comparisons when there are more than few group means to compare. However, they are problematic as they are more prone to misinterpretation. The R package multcompView (Graves et al., 2019) provides an implementation of CLDs creating a display where any two means associated with same symbol are not statistically different.\nThe cld() function from the multcomp package is used to implement CLDs in the form of symbols or letters. The emmeans package provides a emmGrid objects for cld() method.\nLet’s start evaluating CLDs for main effects. We will use emmean objects m1 (for variety) and m2 (for nitrogen) for this. In the output below, groups sharing a letter in the .group are not statistically different from each other.\n\ncld(m1, alpha=0.05, Letters=letters)\n\n V emmean SE df lower.CL upper.CL .group\n Victory 97.6 7.8 5 77.6 118 a \n Golden.rain 104.5 7.8 5 84.5 125 a \n Marvellous 109.8 7.8 5 89.7 130 a \n\nResults are averaged over the levels of: N \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \nP value adjustment: tukey method for comparing a family of 3 estimates \nsignificance level used: alpha = 0.05 \nNOTE: If two or more means share the same grouping symbol,\n then we cannot show them to be different.\n But we also did not show them to be the same. 
\n\n\n\ncld(m2, alpha=0.05, Letters=letters)\n\n N emmean SE df lower.CL upper.CL .group\n 0.0cwt 79.4 7.17 5 60.9 97.8 a \n 0.2cwt 98.9 7.17 5 80.4 117.3 b \n 0.4cwt 114.2 7.17 5 95.8 132.7 c \n 0.6cwt 123.4 7.17 5 104.9 141.8 c \n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \nP value adjustment: tukey method for comparing a family of 4 estimates \nsignificance level used: alpha = 0.05 \nNOTE: If two or more means share the same grouping symbol,\n then we cannot show them to be different.\n But we also did not show them to be the same. \n\n\nLet’s have a look at the CLDs for the interaction effect:\n\ncld3 <- cld(m3, alpha=0.05, Letters=letters)\ncld3\n\n V N emmean SE df lower.CL upper.CL .group \n Victory 0.0cwt 71.5 9.11 5 48.1 94.9 a \n Golden.rain 0.0cwt 80.0 9.11 5 56.6 103.4 abcde \n Marvellous 0.0cwt 86.7 9.11 5 63.3 110.1 abc fg \n Victory 0.2cwt 89.7 9.11 5 66.3 113.1 ab d f h \n Golden.rain 0.2cwt 98.5 9.11 5 75.1 121.9 abcdefghi\n Marvellous 0.2cwt 108.5 9.11 5 85.1 131.9 abcdefghi\n Victory 0.4cwt 110.8 9.11 5 87.4 134.2 bcdefghi\n Golden.rain 0.4cwt 114.7 9.11 5 91.3 138.1 fghi\n Marvellous 0.4cwt 117.2 9.11 5 93.8 140.6 de hi\n Victory 0.6cwt 118.5 9.11 5 95.1 141.9 c e g i\n Golden.rain 0.6cwt 124.8 9.11 5 101.4 148.2 fghi\n Marvellous 0.6cwt 126.8 9.11 5 103.4 150.2 hi\n\nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \nP value adjustment: tukey method for comparing a family of 12 estimates \nsignificance level used: alpha = 0.05 \nNOTE: If two or more means share the same grouping symbol,\n then we cannot show them to be different.\n But we also did not show them to be the same. \n\n\nThe letters can be interpreted as follows: for the variety “Victory”, there is a significant difference in grain yield among the N treatments of 0.0cwt, 0.2cwt, 0.4cwt, and 0.6cwt. 
Grain yield for the Golden.rain variety was significantly lower with the 0.0cwt N treatment compared to the 0.2cwt, 0.4cwt, and 0.6cwt treatments.\nIn the data set we used for demonstration here, we had an equal number of observations in each group. However, this will not always be the case, as it is common to have missing values in a data set. In such cases, readers often struggle to interpret significant differences among groups. For example, the estimated means of two groups may be substantially different and yet not statistically different. This normally happens when the SE of one group is large due to its small sample size, so it is hard to show that it is statistically different from other groups. In such cases, we can use alternatives to CLDs as shown below.",
"crumbs": [
"<span class='chapter-number'>12</span> <span class='chapter-title'>Marginal Means and Contrasts</span>"
]
},
{
"objectID": "chapters/means-and-contrasts.html#alternatives-to-cld",
"href": "chapters/means-and-contrasts.html#alternatives-to-cld",
"title": "12 Marginal Means & Contrasts",
"section": "12.5 Alternatives to CLD",
"text": "12.5 Alternatives to CLD\n\nEquivalence test\n\nLet’s assume based on subject matter considerations, if mean yield of two groups differ by less than 30 can be considered equivalent. Let’s try equivalence test on clds of nitrogen treatment emmeans (m2)\n\ncld(m2, delta = 30, adjust = \"none\")\n\n N emmean SE df lower.CL upper.CL .equiv.set\n 0.0cwt 79.4 7.17 5 60.9 97.8 1 \n 0.2cwt 98.9 7.17 5 80.4 117.3 12 \n 0.4cwt 114.2 7.17 5 95.8 132.7 23 \n 0.6cwt 123.4 7.17 5 104.9 141.8 3 \n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \nStatistics are tests of equivalence with a threshold of 30 \nP values are left-tailed \nsignificance level used: alpha = 0.05 \nEstimates sharing the same symbol test as equivalent \n\n\nHere, two treatment groups ‘0.0cwt’ and ‘0.2cwt’, ‘0.4cwt’ and ‘0.6cwt’ can be considered equivalent.\n\nSignificance Sets\n\nAnother alternative is to simply reverse all the boolean flags we used in constructing CLDs for m3 first time.\n\ncld(m2, signif = TRUE)\n\n N emmean SE df lower.CL upper.CL .signif.set\n 0.0cwt 79.4 7.17 5 60.9 97.8 12 \n 0.2cwt 98.9 7.17 5 80.4 117.3 12 \n 0.4cwt 114.2 7.17 5 95.8 132.7 1 \n 0.6cwt 123.4 7.17 5 104.9 141.8 2 \n\nResults are averaged over the levels of: V \nDegrees-of-freedom method: containment \nConfidence level used: 0.95 \nP value adjustment: tukey method for comparing a family of 4 estimates \nsignificance level used: alpha = 0.05 \nEstimates sharing the same symbol are significantly different \n\n\n\n\n\n\n\n\nCautionary Note about CLD\n\n\n\nIt’s important to note that we cannot conclude that treatment levels with the same letter are the same. 
We can only conclude that we could not show them to be different.\nThere is a separate branch of statistics, “equivalence testing”, for ascertaining whether things are sufficiently similar to conclude that they are equivalent.\nSee Section 2.0.4 for additional warnings about problems with using compact letter displays.",
"crumbs": [
"<span class='chapter-number'>12</span> <span class='chapter-title'>Marginal Means and Contrasts</span>"
]
},
{
"objectID": "chapters/means-and-contrasts.html#export-emmeans-to-excel-sheet",
"href": "chapters/means-and-contrasts.html#export-emmeans-to-excel-sheet",
"title": "12 Marginal Means & Contrasts",
"section": "12.6 Export emmeans to excel sheet",
"text": "12.6 Export emmeans to excel sheet\nThe outputs from emmeans() or cld() objects can exported by firstly converting outputs to a data frame and then using writexlsx() function from the ‘writexl’ package to export the outputs.\n\nresult_n <- as.data.frame(summary(m1))\n\n\nwritexl::write_xlsx(result_n)",
"crumbs": [
"<span class='chapter-number'>12</span> <span class='chapter-title'>Marginal Means and Contrasts</span>"
]
},
{
"objectID": "chapters/means-and-contrasts.html#graphical-display-of-emmeans",
"href": "chapters/means-and-contrasts.html#graphical-display-of-emmeans",
"title": "12 Marginal Means & Contrasts",
"section": "12.7 Graphical display of emmeans",
"text": "12.7 Graphical display of emmeans\nThe results of emmeans() object can be plotted in two different ways. First, we can use base plot() function in R.\n\nplot(m1)\n\n\n\n\n\n\n\nplot(m4)\n\n\n\n\n\n\n\n\nOr we can use ‘ggplot2’ library. We can plot cld3 object in ggplot, with Variety on x-axis and estimated means of yield on y-axis. Different N treatments are presented in groups of different colors.\n\nggplot(cld3) +\n aes(x = V, y = emmean, color = N) +\n geom_point(position = position_dodge(width = 0.9)) +\n geom_errorbar(mapping = aes(ymin = lower.CL, ymax = upper.CL), \n position = position_dodge(width = 1),\n width = 0.1) +\n geom_text(mapping = aes(label = .group, y = upper.CL * 1.05), \n position = position_dodge(width = 0.8), \n show.legend = F)+\n theme_bw()+\n theme(axis.text= element_text(color = \"black\",\n size =12))\n\n\n\n\n\n\n\n\nRecall: groups that do not differ significantly from each other share the same letter.\nwe can also use emmip() built in emmeans package to look at the trend in interaction of variety and nitrogen factors.\n\nemmip(model1, N ~ V)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nMore details on emmeans\n\n\n\nIf you want to read more about emmeans, please refer to vignettes on this CRAN page.",
"crumbs": [
"<span class='chapter-number'>12</span> <span class='chapter-title'>Marginal Means and Contrasts</span>"
]
},
{
"objectID": "chapters/means-and-contrasts.html#conclusion",
"href": "chapters/means-and-contrasts.html#conclusion",
"title": "12 Marginal Means & Contrasts",
"section": "12.8 Conclusion",
"text": "12.8 Conclusion\nBe cautious with the terms “significant” and “nonsignificant”, and don’t ever interpret a “non-significant” result as saying that there is no effect. Follow good statistical practices such as getting the model right first, and using adjusted P values for appropriately chosen families of comparisons or contrasts.\n\n\n\n\n\n\nP values, “significance”, and recommendations\n\n\n\nP values are often misinterpreted, and the term “statistical significance” can be misleading. Please refer to this link to read more about basic principles outlined by the American Statistical Association when considering p-values.",
"crumbs": [
"<span class='chapter-number'>12</span> <span class='chapter-title'>Marginal Means and Contrasts</span>"
]
},
{
"objectID": "chapters/variance-components.html",
"href": "chapters/variance-components.html",
"title": "13 Variance & Variance Components",
"section": "",
"text": "13.1 Unequal Variance\nMixed models provide the advantage of being able to estimate the variance of random variables. Instead of looking at a variable as a collection of specific levels to estimate, random effects view variables as being a random drawn from a normal distribution with a standard deviation. The decision of how to designate a variable as random or fixed depends on",
"crumbs": [
"<span class='chapter-number'>13</span> <span class='chapter-title'>Variance and Variance Components</span>"
]
},
{
"objectID": "chapters/variance-components.html#unequal-variance",
"href": "chapters/variance-components.html#unequal-variance",
"title": "13 Variance & Variance Components",
"section": "",
"text": "13.1.1 Case 1: Unequal Variance Due to a Factor\n\nvar_ex1 <- here::here(read.csv(\"data\", \"MET_trial_variance.csv\"))\n\n\nvar_ex1$block <- as.character(var_ex1$block)\nhist(var_ex1$yield)\nboxplot(yield ~ site, data = var_ex1)\n\n\nm1_a <- lme(yield ~ site:variety + variety, \n random = ~ 1 |site/block, \n na.action = na.exclude, \n data = var_ex1)\n\n\nm1_b <- update(m1_a, weights = varIdent(form = ~1|site))\n\n\n\nm1_b <- update(m1_a, weights = varIdent(form = ~1|site))\n\nis equivalent to\n\nm1_b <- lme(yield ~ site:variety + variety, \n random = ~ 1 |site/block,\n weights = varIdent(form = ~1|site), \n na.action = na.exclude, \n data = var_ex1)\n\n\n\n\n13.1.2 Case 2: Variance is related to the fitted values\n\nvar_ex2 <- read.csv(here::here(\"data\", \"single_trial_variance.csv\"))\n\n\nvar_ex1$block <- as.character(var_ex1$block)\nhist(var_ex2$yield)\n\n\nm2_a <- lme(yield ~ variety, \n random = ~ 1 |block, \n na.action = na.exclude, \n data = var_ex2)\n\n\nm2_b <- update(m2_a, weights = varPower())",
"crumbs": [
"<span class='chapter-number'>13</span> <span class='chapter-title'>Variance and Variance Components</span>"
]
},
{
"objectID": "chapters/variance-components.html#coefficient-of-variation",
"href": "chapters/variance-components.html#coefficient-of-variation",
"title": "13 Variance & Variance Components",
"section": "13.2 Coefficient of Variation",
"text": "13.2 Coefficient of Variation\n\nm2_ave <- fixef(m2_b)[1]\nnames(m2_b) <- NULL\n\n\nm2_cv = sigma(m2_b)/m2_ave*100\nm2_cv\n\n\n13.2.1 Looking at Variance Components\n\nvar_comps <- read.csv(here::here(\"data\", \"potato_tuber_size.csv\"))",
"crumbs": [
"<span class='chapter-number'>13</span> <span class='chapter-title'>Variance and Variance Components</span>"
]
},
{
"objectID": "chapters/troubleshooting.html",
"href": "chapters/troubleshooting.html",
"title": "14 Troubleshooting",
"section": "",
"text": "14.1 Common Errors we Encounter",
"crumbs": [
"<span class='chapter-number'>14</span> <span class='chapter-title'>Troubleshooting</span>"
]
},
{
"objectID": "chapters/troubleshooting.html#common-errors-we-encounter",
"href": "chapters/troubleshooting.html#common-errors-we-encounter",
"title": "14 Troubleshooting",
"section": "",
"text": "14.1.1 Convergence Issues\n[lme4 convergence warnings\nmore\n\n\n14.1.2 Other",
"crumbs": [
"<span class='chapter-number'>14</span> <span class='chapter-title'>Troubleshooting</span>"
]
},
{
"objectID": "chapters/additional-resources.html",
"href": "chapters/additional-resources.html",
"title": "15 Additional Resources",
"section": "",
"text": "15.1 Further Reading",
"crumbs": [
"<span class='chapter-number'>15</span> <span class='chapter-title'>Additional Resources</span>"
]
},
{
"objectID": "chapters/additional-resources.html#further-reading",
"href": "chapters/additional-resources.html#further-reading",
"title": "15 Additional Resources",
"section": "",
"text": "lme4 vignette for fitting linear mixed models\nMixed-Effects Models in S and S-PLUS thee book for nlme, by José C. Pinheiro and Douglas M. Bates. We used this book extensively for developing this guide. Sadly, it’s both out of print and we could not find a free copy online. However, there are affordable used copies available.\nMixed Effects Models and Extensions in Ecology with R by Alain F. Zuur, Elena N. Ieno, Neil Walker, Anatoly A. Saveliev, and Graham M. Smith.\nANOVA and Mixed Models by Lukas Meier",
"crumbs": [
"<span class='chapter-number'>15</span> <span class='chapter-title'>Additional Resources</span>"
]
},
{
"objectID": "chapters/additional-resources.html#other-resources",
"href": "chapters/additional-resources.html#other-resources",
"title": "15 Additional Resources",
"section": "15.2 Other Resources",
"text": "15.2 Other Resources\n\nEasy Stats a collection of R packages to assist in statistical modelling, with a big focus on linear models.\nMixed Model CRAN Task View a curated list of R packages relevant to mixed modelling. This is a great place to start\nR-SIG-mixed-models mailing list for help and discussion of mixed-model-related questions, course announcements, etc\nGrammar of Experimental Designs by Emi Tanaka. This has a great description of basic principles of experimental design.",
"crumbs": [
"<span class='chapter-number'>15</span> <span class='chapter-title'>Additional Resources</span>"
]
},
{
"objectID": "references.html",
"href": "references.html",
"title": "References",
"section": "",
"text": "Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015.\n“Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical\nSoftware 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.\n\n\nBolker, Ben, and David Robinson. 2024. Broom.mixed: Tidying Methods\nfor Mixed Models. https://CRAN.R-project.org/package=broom.mixed.\n\n\nHartig, Florian. 2022. DHARMa: Residual Diagnostics for Hierarchical\n(Multi-Level / Mixed) Regression Models. https://CRAN.R-project.org/package=DHARMa.\n\n\nJohn, JA, and ER Williams. 1995. Cyclic and Computer\nGenerated Designs. 2nd ed. New York:\nChapman; Hall/CRC Press. https://doi.org/10.1201/b15075.\n\n\nKuznetsova, Alexandra, Per B. Brockhoff, and Rune H. B. Christensen.\n2017. “lmerTest Package: Tests in\nLinear Mixed Effects Models.” Journal of Statistical\nSoftware 82 (13): 1–26. https://doi.org/10.18637/jss.v082.i13.\n\n\nLenth, Russell V. 2022. Emmeans: Estimated Marginal Means, Aka\nLeast-Squares Means. https://CRAN.R-project.org/package=emmeans.\n\n\nLüdecke, Daniel, Mattan S. Ben-Shachar, Indrajeet Patil, Philip\nWaggoner, and Dominique Makowski. 2021. “performance: An R Package for\nAssessment, Comparison and Testing of Statistical Models.”\nJournal of Open Source Software 6 (60): 3139. https://doi.org/10.21105/joss.03139.\n\n\nPinheiro, José C., and Douglas M. Bates. 2000. Mixed-Effects Models\nin s and s-PLUS. New York: Springer. https://doi.org/10.1007/b98882.\n\n\nPinheiro, José, Douglas Bates, and R Core Team. 2023. Nlme: Linear\nand Nonlinear Mixed Effects Models. https://CRAN.R-project.org/package=nlme.\n\n\nYates, F. 1936. “A New Method of Arranging Variety Trials\nInvolving a Large Number of Varieties.” J Agric Sci 26:\n424–55.",
"crumbs": [
"References"
]
}
]