MysoreSparrow · Nov 4, 2020
diff --git a/.DS_Store b/.DS_Store
-2 KB
diff --git a/IntroToStat-DLC-20171022.pdf b/IntroToStat-DLC-20171022.pdf
-2.33 MB
diff --git a/IntroToStat-DLC-20171024.pdf b/IntroToStat-DLC-20171024.pdf
-5.94 MB
diff --git a/IntroToStat-DLC-20180212.pdf b/IntroToStat-DLC-20180212.pdf
-2.68 MB
diff --git a/IntroToStat-DLC-20181105.pdf b/IntroToStat-DLC-20181105.pdf
-1.2 MB
diff --git a/IntroToStat-DLC-20190212.pdf b/IntroToStat-DLC-20190212.pdf
-1.18 MB
diff --git a/IntroToStatSlides-AnnotatedSlides-DLC20181105.pdf b/IntroToStatSlides-AnnotatedSlides-DLC20181105.pdf
-6.82 MB
diff --git a/IntroToStat-DLC-20191126.pdf b/IntroToStatSlides-DLC20201105.pdf
1.18 MB
diff --git a/index.md b/index.md
+1 -1
diff --git a/index.md~ b/index.md~
+73
diff --git a/practical.Rmd~ b/practical.Rmd~
+20 -21
diff --git a/practical.tex b/practical.tex
+23 -75
@@ -40,7 +40,7 @@ After this course you should be able to:-
 
 ### Course Materials
 
-- [Lecture (pdf)](IntroToStatSlides-DLC20200211.pdf)
+- [Lecture (pdf)](IntroToStatSlides-DLC20201105.pdf)
 <!---
 Old link
 https://docs.google.com/forms/d/e/1FAIpQLScblQ_-ISfSCGp_EIVPPI_mnrJHttaKxln8vVoyjJFvS8BL1w/viewform)
 
@@ -0,0 +1,73 @@
+### Introduction to Statistical Analysis.
+
+This course provides a refresher on the foundations of statistical analysis. Practicals are conducted using the 'Shiny' package, which provides an accessible interface to the R statistical language.
+
+Note that this is not a course for learning about the R statistical language itself. If you wish to learn more about R, please see other courses at the University of Cambridge:
+
+- [An Introduction to Solving Biological Problems with R](http://cambiotraining.github.io/r-intro/)
+
+### Authors
+
+- Dominique-Laurent Couturier
+- Mark Fernandes
+- Matthew Eldridge
+
+(Acknowledgements: Mark Dunning, Robert Nicholls, Sarah Vowler, Deepak Parashar, Sarah Dawson, Elizabeth Merrell)
+
+### Aims
+
+During this course you will learn about:
+
+- Different types of data, distributions and structure within data
+- Summary statistics for continuous and discrete data
+- Formulating a null hypothesis
+- Assumptions of one-sample and two-sample t-tests
+- Interpreting the result of a statistical test
+- Statistical tests of categorical variables (Chi-squared and Fisher's exact tests)
+- Non-parametric versions of one- and two-sample tests (Wilcoxon tests)
+
+We will not cover ANOVA or linear regression here, but these are the topics of a [more advanced course](https://bioinformatics-core-shared-training.github.io/linear-models-r).
+
+### Learning Objectives
+
+After this course you should be able to:
+
+- State the assumptions required for one-sample and two-sample t-tests and interpret the results of such tests
+- Know when to apply a paired or independent two-sample t-test
+- Perform simple statistical calculations using the online app
+- Understand the limitations of the tests taught within the course
+- Know when more complex statistical methods are required
+
+### Course Materials
+
+- [Lecture (pdf)](IntroToStatSlides-DLC20200211.pdf)
+<!---
+Old link
+https://docs.google.com/forms/d/e/1FAIpQLScblQ_-ISfSCGp_EIVPPI_mnrJHttaKxln8vVoyjJFvS8BL1w/viewform)
+-->
+- [Online quiz](https://goo.gl/forms/QABUxPKA988HUVeO2)
+- [Practical](practical.html)
+- [Interactive document to record your answers for the group exercise](https://etherpad.wikimedia.org/p/Intro_stat_261119)
+- [Example data for the course](CourseData.zip)
+
+### Software Requirements
+
+You will need an internet connection in order to run the practicals and examples:
+
+- [Central limit theorem app](http://bioinformatics.cruk.cam.ac.uk/apps/stats/central-limit-theorem)
+- [One sample test app](http://bioinformatics.cruk.cam.ac.uk/apps/stats/OneSampleTest)
+- [Two sample test app](http://bioinformatics.cruk.cam.ac.uk/apps/stats/TwoSampleTest)
+- [Contingency table app](http://bioinformatics.cruk.cam.ac.uk/apps/stats/contingency-table)
+
+### Further Reading
+
+- A [Course Manual](manual.pdf)
+- Using R for Introductory stats [free eBook pdf](http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf)
+- Learning Statistics with R [free textbook pdf](http://health.adelaide.edu.au/psychology/ccs/teaching/lsr/)
+
+### Feedback
+
+- [Feedback form](https://www.surveymonkey.co.uk/r/STATFEB) for course run on 11th February 2020
+
+### Funding
+This course has received funding from the [CRUK Cambridge Centre](https://crukcambridgecentre.org.uk). If you are researching cancer in Cambridge, please consider becoming a member.
@@ -20,14 +20,14 @@ output:
     toc_depth: '3'
 ---
 
-<!--- rmarkdown::render("~/courses/cruk/IntroductionToStatisticalAnalysis/git_IntroToStat/practical.Rmd") --->
-<!--- setwd("~/courses/cruk/IntroductionToStatisticalAnalysis/git_IntroToStat/") --->
+<!--- rmarkdown::render("~/courses/cruk/IntroductionToStatisticalAnalysis/git_IntroductionToStats/practical.Rmd") --->
+<!--- setwd("~/courses/cruk/IntroductionToStatisticalAnalysis/git_IntroductionToStats/") --->
 <img src="stylesheets/logo.png" style="position:absolute;top:0px;right:0px;" width="300" />
 
 
 ---
 
-```{r eval=TRUE, echo=F, results="asis"}
+```{r eval=TRUE, echo=FALSE, warning=FALSE, results="asis"}
 #BiocStyle::markdown()
 library("knitr")
 opts_chunk$set(tidy=FALSE,dev="png",fig.show="as.is",
@@ -52,7 +52,7 @@ The tab **Estimated coverage of Student's CI** in the shiny app **central-limit-
 
 1. Assuming that the simulated data are normally distributed, what is the probability of the **true** mean belonging to a confidence interval?
 2. Let X denote a random variable that equals 1 if the **true mean belongs to the confidence interval** and 0 otherwise. What is the distribution of X?     
-3. What is the probability that 0 confidence intervals out of 50 contain the **true mean** if data are normally distributed?
+<!--- 3. What is the probability that 0 confidence intervals out of 50 contain the **true mean** if data are normally distributed? (too complex) --->
 
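Note on the commented-out question 3: it reduces to a binomial calculation. The R sketch below is illustrative only and is not part of the practical. It assumes a 95% confidence level (the level is not stated in this hunk) and independent simulated samples; the variable names are hypothetical.

```r
# Assumption: nominal coverage of 95% (not stated in the hunk above).
conf_level  <- 0.95
n_intervals <- 50

# Each interval either contains the true mean (X = 1) or not (X = 0),
# so X ~ Bernoulli(conf_level) and the number of covering intervals
# out of 50 is Binomial(50, conf_level).
p_none_cover <- dbinom(0, size = n_intervals, prob = conf_level)

# Equivalent closed form: every one of the 50 intervals misses the true mean.
p_closed_form <- (1 - conf_level)^n_intervals

# Expected number of intervals that contain the true mean.
expected_cover <- n_intervals * conf_level

print(c(p_none_cover = p_none_cover, expected_cover = expected_cover))
```

Under these assumptions the probability is vanishingly small, since it equals 0.05 raised to the power 50.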
 <span style="color:rgb(235, 7, 142)">**Question (ii):**</span>
 
@@ -247,7 +247,7 @@ cat("From this histogram it is difficult to tell whether the differences between
 # Two-Sample Tests
 
 Use our Shiny app [http://bioinformatics.cruk.cam.ac.uk/stats/TwoSampleTest](http://bioinformatics.cruk.cam.ac.uk/stats/TwoSampleTest)
-to perform tests of equality of means/medians. [http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table](http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table) to perform tests of equality of proportions.
+to perform tests of equality of means/medians. <!--- [http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table](http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table) to perform tests of equality of proportions.--->
 
 &nbsp;
 
@@ -425,6 +425,7 @@ cat("Both tests show that there is insufficient evidence to reject the null hypo
 
 &nbsp;
 
+<!---
 ## Disease association
 
 The following table gives the frequencies of wild-type and knock-out mice developing a disease thought to be associated with the absence of the knock-out gene. 
@@ -463,6 +464,7 @@ colnames(.Table) <- c('WT', 'KO')
 Enter the data into the [Shiny app](http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table/). Select the **Fisher's exact test** option to compare the proportion of mice in each group that developed the disease.
 
 <span style="color:rgb(235, 7, 142)">**Question:**</span>  What is your p-value? How do you interpret the result?
+---> 
 
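Note on the commented-out "Disease association" exercise: the same analysis can be reproduced outside the Shiny app. The R sketch below is illustrative and assumes the counts from the exercise table (Disease: WT 1, KO 7; No disease: WT 9, KO 3). The object names are hypothetical and the app's own implementation may differ.

```r
# Counts from the "Disease association" table; matrix() fills column by column.
mice <- matrix(c(1, 9, 7, 3), nrow = 2,
               dimnames = list(Outcome  = c("Disease", "No disease"),
                               Genotype = c("WT", "KO")))

# Expected frequencies under independence: row total * column total / n.
expected <- outer(rowSums(mice), colSums(mice)) / sum(mice)

# Two-sided Fisher's exact test of equal disease proportions in WT and KO mice.
result <- fisher.test(mice)

print(expected)
print(result$p.value)
```

The expected counts in the Disease row are 4, below the usual rule-of-thumb threshold of 5, which is one reason to prefer Fisher's exact test to the chi-squared test here. The two-sided p-value is about 0.02.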
 ------
 ```{r}
@@ -487,7 +489,11 @@ There is evidence of an association between mouse type and disease X.")
 
 # Small-Group Exercise: Choosing a test 
 
-In this section, we invite you to form small groups to select a dataset and discuss what methods/tests you would use to analyse those data.
+In this section, we invite you to form small groups. Each group will be assigned one of the exercises. 
+
+At the end of the time assigned for the exercise we will go through each of the problems in turn and invite a representative of each group to present the problem to the rest of the class along with the analysis (descriptive analysis, statistical tests) the group felt was most appropriate and any conclusions made.
+
+If time allows, it would be beneficial for groups to familiarize themselves with some of the other exercises so that they can contribute to the presentations made by other groups.
 
 You should use this [interactive document](https://public.etherpad-mozilla.org/p/2019-02-12-intro-to-stats) to record your observations.
 
@@ -503,7 +509,7 @@ library(Biobase)
 
 &nbsp;
 
-## Group 1: Plant Growth `data1.csv`
+## Group Exercise 1: Plant Growth `data1.csv`
 
 Darwin (1876) studied the growth of *pairs* of *Zea mays* (aka corn) seedlings, one produced by cross-fertilization and the other produced by self-fertilization, but otherwise grown under identical conditions. His goal was to demonstrate the greater vigour of the cross-fertilized plants. The data recorded are the final height (inches, to the nearest 1/8th) of the plants in each pair.
 
@@ -536,7 +542,7 @@ write.csv(td, file="mystery-data/data1.csv",quote=FALSE,row.names=FALSE)
 
 &nbsp;
 
-## Group 2: Florence Nightingale `data2.csv`
+## Group Exercise 2: Florence Nightingale `data2.csv`
 
 In the history of data visualization, Florence Nightingale is best remembered for her role as a social activist and her view that statistical data, presented in charts and diagrams, could be used as powerful arguments for medical reform.
 
@@ -590,7 +596,7 @@ Night.flt <- Night %>% filter(Cause=="Disease") %>% select(Regime,Deaths)
 
 &nbsp;
 
-## Group 3: Effect of bran on diet: `data3.csv`
+## Group Exercise 3: Effect of bran on diet: `data3.csv`
 
 The addition of bran to the diet has been reported to benefit patients with diverticulosis. Several different bran preparations are available, and a clinician wants to test the efficacy of two of them on patients, since favourable claims have been made for each. Among the consequences of administering bran that require testing is the transit time through the alimentary canal. By random allocation the clinician selects two groups of patients aged 40-64 with diverticulosis of comparable severity. Sample 1 contains 15 patients who are given treatment A, and sample 2 contains 12 patients who are given treatment B.
 
@@ -618,7 +624,7 @@ t.test(Time~Group,data,var.equal=TRUE)
 
 &nbsp;
 
-## Group 4: Effect of Autism drug  `data4.csv`
+## Group Exercise 4: Effect of Autism drug  `data4.csv`
 
 Consider a clinical investigation to assess the effectiveness of a new drug designed to reduce repetitive behaviors in children affected with autism. If the drug is effective, children will exhibit fewer repetitive behaviors on treatment as compared to when they are untreated. A total of 8 children with autism enroll in the study. Each child is observed by the study psychologist for a period of 3 hours both before treatment and then again after taking the new drug for 1 week. The time that each child is engaged in repetitive behavior during each 3 hour observation period is measured. Repetitive behavior is scored on a scale of 0 to 100 and scores represent the percent of the observation time in which the child is engaged in repetitive behavior. For example, a score of 0 indicates that during the entire observation period the child did not engage in repetitive behavior while a score of 100 indicates that the child was constantly engaged in repetitive behavior. 
 
@@ -637,16 +643,9 @@ write.csv(data, file="mystery-data/data4.csv",quote=FALSE,row.names=FALSE)
 
 ```
 
-```{r}
-###http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Nonparametric/BS704_Nonparametric5.html
-## Non-parametric
-## Non-matched
-## Sign-test
-```
-
 &nbsp;
 
-## Group 5: CD4 `data5.csv`
+## Group Exercise 5: CD4 `data5.csv`
 
 CD4 cells are carried in the blood as part of the human immune system. One of the effects of the HIV virus is that these cells die. The count of CD4 cells is used in determining the onset of full-blown AIDS in a patient. In this study of the effectiveness of a new anti-viral drug on HIV, 20 HIV-positive patients had their CD4 counts recorded and then were put on a course of treatment with this drug. After using the drug for one year, their CD4 counts were again recorded. 
 
@@ -670,7 +669,7 @@ t.test(data[,1],data[,2],paired=TRUE)
 
 &nbsp;
 
-## Group 6: Drink Driving `data6.csv`
+## Group Exercise 6: Drink Driving `data6.csv`
 
 Drunk driving is one of the main causes of car accidents. Interviews with drunk drivers who were involved in accidents and survived revealed that one of the main problems is that drivers do not realize that they are impaired, thinking “I only had 1-2 drinks … I am OK to drive.”
 
@@ -705,7 +704,7 @@ write.csv(data2,file="mystery-data/data6.csv",quote=FALSE,row.names=FALSE)
 
 &nbsp;
 
-## Group 7: Pollution in Trees `data7.csv`
+## Group Exercise 7: Pollution in Trees `data7.csv`
 
 Laureysens et al. (2004) measured metal content in the wood of 13 poplar clones growing in a polluted area, once in August and once in November. Concentrations of aluminum (in micrograms of Al per gram of wood) are shown below.
 
@@ -733,7 +732,7 @@ boxplot(data)
 
 &nbsp;
 
-## Group 8: Salaries for Professors  `data8.csv`
+## Group Exercise 8: Salaries for Professors  `data8.csv`
 
 The data are the 2008-09 nine-month academic salaries of Assistant Professors, Associate Professors and Professors at a college in the U.S. They were collected as part of the ongoing effort of the college's administration to monitor salary differences between male and female faculty members. (Salaries are given as nine-month salaries, in dollars.)
 
 
@@ -1,3 +1,6 @@
+\PassOptionsToPackage{unicode=true}{hyperref} % options for packages loaded elsewhere
+\PassOptionsToPackage{hyphens}{url}
+%
 \documentclass[]{article}
 \usepackage{lmodern}
 \usepackage{amssymb,amsmath}
@@ -6,30 +9,32 @@
 \ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
   \usepackage[T1]{fontenc}
   \usepackage[utf8]{inputenc}
+  \usepackage{textcomp} % provides euro and other symbols
 \else % if luatex or xelatex
-  \ifxetex
-    \usepackage{mathspec}
-  \else
-    \usepackage{fontspec}
-  \fi
+  \usepackage{unicode-math}
   \defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
 \fi
 % use upquote if available, for straight quotes in verbatim environments
 \IfFileExists{upquote.sty}{\usepackage{upquote}}{}
 % use microtype if available
 \IfFileExists{microtype.sty}{%
-\usepackage{microtype}
+\usepackage[]{microtype}
 \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
 }{}
-\usepackage[margin=1in]{geometry}
+\IfFileExists{parskip.sty}{%
+\usepackage{parskip}
+}{% else
+\setlength{\parindent}{0pt}
+\setlength{\parskip}{6pt plus 2pt minus 1pt}
+}
 \usepackage{hyperref}
-\hypersetup{unicode=true,
+\hypersetup{
             pdftitle={Introduction to Statistical Analysis},
-            pdfauthor={D.-L. Couturier and M. Eldridge (with contributions of M. Dunning and S. Vowler)},
+            pdfauthor={D.-L. Couturier and M. Fernandes (with contributions of M. Eldridge, M. Dunning and S. Vowler)},
             pdfborder={0 0 0},
             breaklinks=true}
 \urlstyle{same}  % don't use monospace font for urls
-\usepackage{longtable,booktabs}
+\usepackage[margin=1in]{geometry}
 \usepackage{graphicx,grffile}
 \makeatletter
 \def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
@@ -39,12 +44,6 @@
 % margins by default, and it is still possible to overwrite the defaults
 % using explicit options in \includegraphics[width, height, ...]{}
 \setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
-\IfFileExists{parskip.sty}{%
-\usepackage{parskip}
-}{% else
-\setlength{\parindent}{0pt}
-\setlength{\parskip}{6pt plus 2pt minus 1pt}
-}
 \setlength{\emergencystretch}{3em}  % prevent overfull lines
 \providecommand{\tightlist}{%
   \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
@@ -59,32 +58,16 @@
 \renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
 \fi
 
-%%% Use protect on footnotes to avoid problems with footnotes in titles
-\let\rmarkdownfootnote\footnote%
-\def\footnote{\protect\rmarkdownfootnote}
-
-%%% Change title format to be more compact
-\usepackage{titling}
-
-% Create subtitle command for use in maketitle
-\newcommand{\subtitle}[1]{
-  \posttitle{
-    \begin{center}\large#1\end{center}
-    }
-}
+% set default figure placement to htbp
+\makeatletter
+\def\fps@figure{htbp}
+\makeatother
 
-\setlength{\droptitle}{-2em}
 
-  \title{Introduction to Statistical Analysis}
-    \pretitle{\vspace{\droptitle}\centering\huge}
-  \posttitle{\par}
-    \author{D.-L. Couturier and M. Eldridge (with contributions of M. Dunning and S.
-Vowler)}
-    \preauthor{\centering\large\emph}
-  \postauthor{\par}
-    \date{}
-    \predate{}\postdate{}
-  
+\title{Introduction to Statistical Analysis}
+\author{D.-L. Couturier and M. Fernandes (with contributions of M. Eldridge, M.
+Dunning and S. Vowler)}
+\date{}
 
 \begin{document}
 \maketitle
@@ -469,40 +452,6 @@ \subsection{Birth-weight of twins}\label{birth-weight-of-twins}}
 
 ~
 
-\hypertarget{disease-association}{%
-\subsection{Disease association}\label{disease-association}}
-
-The following table gives the frequencies of wild-type and knock-out
-mice developing a disease thought to be associated to the absence of the
-knock-out gene.
-
-\begin{longtable}[]{@{}lrrr@{}}
-\toprule
-~ & WT & KO & Total\tabularnewline
-\midrule
-\endhead
-Disease & 1 & 7 & 8\tabularnewline
-No disease & 9 & 3 & 12\tabularnewline
-Total & 10 & 10 & 20\tabularnewline
-\bottomrule
-\end{longtable}
-
-{\textbf{Question:}} What are your null and alternative hypotheses?
-
-\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}
-
-{\textbf{Question:}} What are your expected frequencies?
-
-\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}
-
-Enter the data into the
-\href{http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table/}{Shiny
-app}. Select the \textbf{Fisher's exact test} option to compare the
-proportion of mice in each group that developed the disease.
-
-{\textbf{Question:}} What is your p-value? How do you interpret the
-result?
-
 \begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}
 
 ~
@@ -679,5 +628,4 @@ \subsection{\texorpdfstring{Group Exercise 8: Salaries for Professors
 {\emph{Is there evidence that Female professors are paid differently to
 their Male counterparts?}}
 
-
 \end{document}