Built site for gh-pages
Quarto GHA Workflow Runner committed Dec 20, 2024
1 parent 4d0c29b commit 91aa56a
Showing 8 changed files with 63 additions and 56 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
07cd4f79
e9660535
23 changes: 14 additions & 9 deletions chapters/analysis-tips.html
@@ -295,6 +295,7 @@ <h2 id="toc-title">Table of contents</h2>
<li><a href="#know-that-data-cleaning-is-time-consuming" id="toc-know-that-data-cleaning-is-time-consuming" class="nav-link" data-scroll-target="#know-that-data-cleaning-is-time-consuming"><span class="header-section-number">2.0.2</span> Know That Data Cleaning is Time Consuming</a></li>
<li><a href="#interpret-anova-and-p-values-with-caution" id="toc-interpret-anova-and-p-values-with-caution" class="nav-link" data-scroll-target="#interpret-anova-and-p-values-with-caution"><span class="header-section-number">2.0.3</span> Interpret ANOVA and P-values with Caution</a></li>
<li><a href="#sec-cld_warning" id="toc-sec-cld_warning" class="nav-link" data-scroll-target="#sec-cld_warning"><span class="header-section-number">2.0.4</span> Comments on Hypothesis Testing and Usage of Treatment Letters</a></li>
<li><a href="#final-thoughts" id="toc-final-thoughts" class="nav-link" data-scroll-target="#final-thoughts"><span class="header-section-number">2.0.5</span> Final Thoughts</a></li>
</ul>
</nav>
</div>
@@ -323,12 +324,12 @@ <h1 class="title"><span class="chapter-number">2</span>&nbsp; <span class="chapt
<p>Below are some things our office frequently says to researchers.</p>
<section id="think-about-your-analytical-goals" class="level3" data-number="2.0.1">
<h3 data-number="2.0.1" class="anchored" data-anchor-id="think-about-your-analytical-goals"><span class="header-section-number">2.0.1</span> Think About Your Analytical Goals</h3>
<p>Throughout this guide, we have tried to explicitly state the goals of each analysis. This helps inform how to approach the analysis of an experiment. It can be difficult, especially for new scientists-in-training (i.e.&nbsp;graduate students), to understand what it is they want to estimate. You may have been handed a data set you had no role in generating and told to “analyze this” with no additional context. Or perhaps you may have conducted a large study that has some overall goals that are lofty, yet vague.</p>
<p>Throughout this guide, we have tried to explicitly state the goals of each analysis. This helps inform how to approach the analysis of an experiment. It can be difficult, especially for new scientists-in-training (i.e.&nbsp;graduate students), to understand what it is they want to estimate. You may have been handed a data set you had no role in generating and told to “analyze this” with no additional context. Or perhaps you may have conducted a large study that has some overall goals that are lofty, yet vague. And now you must translate the vague aims into clear statistical questions.</p>
<p>It can be helpful to think about the exact results you are hoping to get. What does this look like exactly? Do you want to estimate the changes in plant diversity as the result of a herbicide spraying program? Do you want to find out if a fertilizer treatment changed protein content in a crop and by how much? Do you want to know about changes in human diet due to an intervention? What quantifiable differences would you and/or experts in your domain find meaningful?</p>
<p>Consider what the results would look like for (1) the best case scenario when your wildest dreams come true, and (2) null results, when you find out that your treatment or intervention had no effect. It’s very helpful to understand and recognize both situations.</p>
<p>By “consider”, we mean: imagine the final plot or table, or summary sentence you want to present, either in a peer-reviewed manuscript, or some output for stakeholders. From this, you work backwards to determine the analytical approach needed to arrive at that final output. Or you may determine that your data are unsuitable to generate the desired output, in which case, it’s best to determine that as soon as possible.</p>
<p>By “consider”, we also mean: imagine exactly what the spreadsheet of results would say - what columns are present and what data are in the cells. If you are planning an experiment, this can help ensure you plan it properly to actually test whatever it is you want to evaluate. If the experiment is done, this enables you to evaluate if you have the information present to test your hypothesis.</p>
<p>Taking the time to reflect on exactly what you want to analyze can save time and prevent you from doing unneeded analyses that don’t serve this final goal. There is rarely (never?) one way to analyze an experiment or a data set, so use your limited time wisely and focus on what matters.</p>
<p><strong><em>Consider</em></strong> what the results would look like for (1) the best case scenario where your wildest research dreams come true, and (2) null results, when you find out that your treatment or intervention had no effect. It’s very helpful to understand and recognize exactly what both situations look like.</p>
<p>By “consider”, we mean: imagine the final plot or table, or summary sentence you want to present, either in a peer-reviewed manuscript, or some output for stakeholders. From this, you can work backwards to determine the analytical approach needed to arrive at that desired final output. Or you may determine that your data are unsuitable to generate the desired output, in which case, it’s best to determine that as soon as possible.</p>
<p>By “consider”, we also mean: imagine exactly what the spreadsheet of results would contain after a successful trial. What columns are present, and what data are in those cells? If you are planning an experiment, this can help ensure you plan it properly to actually test whatever it is you want to evaluate. If the experiment is done, this enables you to evaluate whether you have the information needed to test your hypothesis.</p>
<p>Taking the time to reflect on exactly what you want to analyze can save time and prevent you from doing unneeded analyses that don’t serve this final goal. There is rarely (never?) one way to analyze an experiment or a data set, so use your limited time wisely and focus on what matters to you most.</p>
</section>
<section id="know-that-data-cleaning-is-time-consuming" class="level3 page-columns page-full" data-number="2.0.2">
<h3 data-number="2.0.2" class="anchored" data-anchor-id="know-that-data-cleaning-is-time-consuming"><span class="header-section-number">2.0.2</span> Know That Data Cleaning is Time Consuming</h3>
@@ -346,7 +347,7 @@ <h3 data-number="2.0.2" class="anchored" data-anchor-id="know-that-data-cleaning
</figure>
</div>
</div></div></div>
<p>Data cleaning has occupied, and will continue to occupy, the majority of researchers’ time when conducting an analysis. Truly, we are sorry for this. But please know it is not you; it is the nature of data. Please plan for and prepare yourself mentally to spend time cleaning and preparing your data for analysis.<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a> This will take way longer than the actual analysis! It is needed to ensure you can actually get correct results in an analysis, and hence data cleaning is worth the time it requires.</p>
<p>Data cleaning has occupied, and will continue to occupy, the majority of researchers’ time when conducting an analysis. Truly, we are sorry for this. But please know it is not you; it is the nature of data. Plan for and prepare yourself mentally to spend time cleaning and preparing your data for analysis.<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a> This will likely take way longer than the actual analysis! It is needed to ensure you can actually get correct results in an analysis, and hence data cleaning is worth the time it requires.</p>
<div class="no-row-height column-margin column-container"><div id="fn1"><p><sup>1</sup>&nbsp;For an excellent set of basic instructions on data preparation, please see: Broman, K. W., &amp; Woo, K. H. (2018). <a href="https://doi.org/10.1080/00031305.2017.1375989">Data Organization in Spreadsheets</a>. The American Statistician, 72(1), 2–10.</p></div></div></section>
<section id="interpret-anova-and-p-values-with-caution" class="level3 page-columns page-full" data-number="2.0.3">
<h3 data-number="2.0.3" class="anchored" data-anchor-id="interpret-anova-and-p-values-with-caution"><span class="header-section-number">2.0.3</span> Interpret ANOVA and P-values with Caution</h3>
@@ -367,11 +368,15 @@ <h3 data-number="2.0.4" class="anchored" data-anchor-id="sec-cld_warning"><span

<div class="no-row-height column-margin column-container"><div class="">
<p><img src="../img/cld.png" class="img-fluid"></p>
<p>Image from a paper published in 2024. Although this was a fully crossed factorial experiment, compact letter display was implemented across all treatment combinations, resulting in some nonsensical comparisons among some more informative contrasts.</p>
<p>Image from a paper published in 2024. Although this was a fully crossed factorial experiment, compact letter display was implemented across all treatment combinations, resulting in some nonsensical comparisons among some more informative contrasts. What a waste.</p>
</div></div><p>Implementing compact letter display can kill statistical power (the probability of detecting true differences) because it requires that all pairwise comparisons be made. Doing this, especially when there are many treatment levels, has its perils. The biggest problem is that this creates a multiple testing problem. The RCBD example in this guide has 42 treatments, resulting in a total of 861 comparisons (<span class="math inline">\(=42*(42-1)/2\)</span>), that are then adjusted for multiple tests. With that many tests, a severe adjustment is likely, and hence true differences may go undetected. With so many tests, it could be that there is an overall effect due to treatment, but all treatments share the same letter!</p>
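<p>The scale of the multiple testing problem can be checked with a little arithmetic. The following is a minimal Python sketch, not part of the guide's own analysis. It assumes a Bonferroni adjustment purely because that is the simplest one to compute by hand; the adjustment actually applied in a given analysis may differ. The function names are hypothetical.</p>

```python
from math import comb


def pairwise_comparisons(n_treatments: int) -> int:
    """Number of unique treatment pairs: n * (n - 1) / 2."""
    return comb(n_treatments, 2)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni adjustment (assumed here)."""
    return alpha / n_tests


n_tests = pairwise_comparisons(42)
per_test_alpha = bonferroni_threshold(0.05, n_tests)

print(n_tests)                  # 861
print(f"{per_test_alpha:.6f}")  # 0.000058
```

<p>Under this assumed adjustment, each of the 861 tests would need a p-value below roughly 0.00006 to be declared significant at an overall 0.05 level.</p>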
<p>The second problem is one of interpretation. Just because two treatments or varieties share a letter does not mean they are equivalent. It only means that they were not found to be different. A funny distinction, but alas. There is an entire branch of statistics, ‘equivalence testing’, devoted to just this topic - how to test if two things are actually the same. This involves the user declaring a maximum allowable numeric difference for a variable in order to determine if two items are statistically different or equivalent - something that these pairwise comparisons are not doing.</p>
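<p>To make the distinction concrete, here is a minimal Python sketch of an equivalence check based on a confidence interval. It assumes a normal approximation, and the estimated difference, standard error, and equivalence margins are hypothetical numbers that do not come from this guide. Two treatments are declared equivalent only when the whole 90% interval for their difference falls inside the declared margin, which corresponds to two one-sided tests at the 0.05 level.</p>

```python
from statistics import NormalDist


def difference_interval(diff, se, alpha=0.05):
    """(1 - 2*alpha) confidence interval for a difference (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha)
    return diff - z * se, diff + z * se


def is_equivalent(diff, se, margin, alpha=0.05):
    """True when the whole interval lies strictly inside (-margin, +margin)."""
    lower, upper = difference_interval(diff, se, alpha)
    return -margin < lower and upper < margin


def is_different(diff, se, alpha=0.05):
    """Ordinary two-sided z-test of 'no difference'."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(diff / se) > z_crit


# Hypothetical estimate: a difference of 0.1 with a standard error of 0.2.
diff, se = 0.1, 0.2

print(is_different(diff, se))               # False: not found to be different
print(is_equivalent(diff, se, margin=0.3))  # False: yet not shown to be equivalent
print(is_equivalent(diff, se, margin=0.5))  # True: equivalent only under a wider margin
```

<p>In this illustration the same pair of treatments would share a letter, yet whether they count as equivalent depends entirely on the margin the researcher declares.</p>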
<p>Another problem is that doing all pairwise comparisons may not align with experimental goals. In many circumstances, not every pairwise combination is of any interest or relevance to the study. Additionally, complex treatment structure may necessitate custom constrasts that highlight differences between the marginal estimate of multiple treatments versus another. For example, there may be two levels of ‘high’ nitrogen fertilizer treatment with two different sources (i.e.&nbsp;types of fertilizer). A researcher may want to contrast those two levels together against ‘low’ nitrogen treatment levels.</p>
<p>Often, researchers have embedded additional structure in the treatments that is not fully reflected in the statistical model. For example, perhaps a study is looking at five different intercropping mixtures, two that incorporate a legume and three that do not. Conducting all pairwise comparisons will miss estimating the difference between including a legume in an intercropping mix and not including one. Soil fertility and other agronomic studies often have complex treatment structure. When it is not practical or financially feasible to have a full factorial experiment, embedding different treatment combinations in the main factor of analysis can accomplish this. This is a good study design approach, and we have statistical tools to analyze it.</p>
<p>Another problem is that doing all pairwise comparisons may not align with experimental goals. In many circumstances, not every pairwise combination is of any interest or relevance to the study. Additionally, complex treatment structure may necessitate custom contrasts that highlight differences between the marginal estimate of multiple treatments versus another. For example, there may be two levels of ‘high’ nitrogen fertilizer treatment with two different sources (i.e.&nbsp;types of fertilizer). A researcher may want to contrast those two levels together against ‘low’ nitrogen treatment levels.</p>
<p>Often, researchers have embedded additional structure in the treatments that is not fully reflected in the statistical model. For example, perhaps a study is looking at five different intercropping mixtures, two that incorporate a legume and three that do not. Conducting all pairwise comparisons will miss estimating the difference between including a legume in an intercropping mix and not including one. Soil fertility and other agronomic studies often have complex treatment structure. When it is not practical or financially feasible to have a full factorial experiment, embedding different treatment combinations in the main factor of analysis can accomplish this. This is a good study design approach, but compact letter display is an inefficient way to report results. In such cases, custom contrasts are a better choice for hypothesis testing. The <a href="chapters/means-and-contrasts.qmd">emmeans chapter</a> covers how to do this.</p>
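<p>As a rough illustration of the intercropping example, the Python sketch below builds contrast weights that compare the average of the legume mixtures against the average of the non-legume mixtures. The treatment names and means are made up for illustration, and the helper function is hypothetical. The sketch produces only a point estimate; a standard error and test require the fitted model.</p>

```python
from math import comb

# Hypothetical treatment means for five intercropping mixtures (made-up numbers).
treatment_means = {
    "legume_mix_1": 5.2,
    "legume_mix_2": 4.8,
    "no_legume_mix_1": 4.1,
    "no_legume_mix_2": 3.9,
    "no_legume_mix_3": 4.0,
}
legume_mixes = {"legume_mix_1", "legume_mix_2"}


def contrast_weights(treatments, group):
    """Weights comparing the average of `group` against the average of the rest."""
    n_in = sum(1 for t in treatments if t in group)
    n_out = len(treatments) - n_in
    return {t: (1 / n_in if t in group else -1 / n_out) for t in treatments}


weights = contrast_weights(treatment_means, legume_mixes)

# The contrast estimate is the weighted sum of the treatment means.
legume_effect = sum(weights[t] * treatment_means[t] for t in treatment_means)

print(f"all pairwise comparisons: {comb(len(treatment_means), 2)}")
print(f"legume vs. no-legume contrast estimate: {legume_effect:.2f}")
```

<p>One targeted contrast answers the question about legumes directly, whereas none of the ten pairwise comparisons among the five mixtures does.</p>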
</section>
<section id="final-thoughts" class="level3" data-number="2.0.5">
<h3 data-number="2.0.5" class="anchored" data-anchor-id="final-thoughts"><span class="header-section-number">2.0.5</span> Final Thoughts</h3>
<p>Good statistical analysis requires a thoughtful, intentional approach. If you have gone to the trouble of conducting a well-designed experiment or assembling a useful data set, take the time and effort to analyze it properly.</p>


<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0" role="list" style="display: none">