The data-intensive nature of 21st-century biology makes it important for scientists to have a basic proficiency in statistics. Whether you are working with thousands of gene expression levels measured by RNA-Seq, millions of polymorphisms genotyped in a case-control study, or more general questions of how to properly design an experiment, you will constantly be confronted with how to collect, analyze and interpret data. This course provides the key statistical concepts and methods necessary for extracting biological insights from data.
A common misconception about "doing" statistics is that it is useful only for analyzing data after an experiment has been performed. In fact, statistical methods are an integral part of designing experiments as well. How small an effect do you want to be able to detect? What sample size will you need? What is the power of your experiment?
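As a rough illustration of how effect size, sample size and power depend on one another, the sketch below uses the standard normal approximation for a two-sided, two-sample comparison of means with equal group sizes. It is written in Python so that it runs in a Jupyter notebook. The function names `n_per_group` and `approx_power`, the significance level of 0.05 and the target power of 0.80 are assumptions chosen for illustration, not values taken from the course. The effect size here is the standardized mean difference (difference in means divided by the common standard deviation).

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group needed to detect a standardized
    mean difference `effect_size` with a two-sided test (normal approximation)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = STD_NORMAL.inv_cdf(power)           # quantile for the target power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def approx_power(effect_size, n, alpha=0.05):
    """Approximate power with `n` samples per group. The negligible chance of
    rejecting in the wrong direction is ignored."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(effect_size * sqrt(n / 2) - z_alpha)


if __name__ == "__main__":
    # Smaller effects require many more samples to reach the same power.
    for d in (1.0, 0.5, 0.2):
        print(f"effect size {d}: about {n_per_group(d)} samples per group")
    print(f"power with d = 0.5 and n = 63: {approx_power(0.5, 63):.3f}")
```

Under these assumptions, halving the effect size roughly quadruples the required sample size. An exact calculation based on the t distribution (for example, R's `power.t.test`) typically gives a slightly larger sample size than this approximation.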
In this ten-week course, we will not be able to cover every specific topic that might arise in the course of your research. Instead, we will focus on fundamental concepts that give you the tools to carry out routine statistical analyses and the foundation to learn about more specialized topics.
Throughout this course, we will make use of the freely available statistical software R (http://www.r-project.org), which has a well-established integrated development environment (RStudio) that is helpful for working with your data. Problem sets should be completed using R Markdown (or Jupyter notebooks), though you are free to use other tools if you prefer.
Matreyek, K.A., Starita, L.M., Stephany, J.J. et al. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat Genet 50, 874–882 (2018). https://doi.org/10.1038/s41588-018-0122-z
VAMPseq_TPMT.txt
VAMPseq_PTEN.txt
These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.