Commit 10096a6

Dominique-Laurent Couturier authored and committed Nov 4, 2020
usual update
1 parent 4db696d commit 10096a6

12 files changed, +117 -97 lines changed


‎.DS_Store

-2 KB
Binary file not shown.

‎IntroToStat-DLC-20171022.pdf

-2.33 MB
Binary file not shown.

‎IntroToStat-DLC-20171024.pdf

-5.94 MB
Binary file not shown.

‎IntroToStat-DLC-20180212.pdf

-2.68 MB
Binary file not shown.

‎IntroToStat-DLC-20181105.pdf

-1.2 MB
Binary file not shown.

‎IntroToStat-DLC-20190212.pdf

-1.18 MB
Binary file not shown.
-6.82 MB
Binary file not shown.
Binary file not shown.

‎index.md

+1-1
@@ -40,7 +40,7 @@ After this course you should be able to:-
 
 ### Course Materials
 
-- [Lecture (pdf)](IntroToStatSlides-DLC20200211.pdf)
+- [Lecture (pdf)](IntroToStatSlides-DLC20201105.pdf)
 <!---
 Old link
 https://docs.google.com/forms/d/e/1FAIpQLScblQ_-ISfSCGp_EIVPPI_mnrJHttaKxln8vVoyjJFvS8BL1w/viewform)

‎index.md~

+73
@@ -0,0 +1,73 @@
+### Introduction to Statistical Analysis.
+
+This course provides a refresher on the foundations of statistical analysis. Practicals are conducted using the 'Shiny' package; which provides an accessible interface to the R statistical language.
+
+Note that this is not a course for learning about the R statistical language itself. If you wish to learn more about R, please see other courses at the University of Cambridge
+
+- [An Introduction to Solving Biological Problems with R](http://cambiotraining.github.io/r-intro/)
+
+### Authors
+
+- Dominique-Laurent Couturier
+- Mark Fernandes
+- Matthew Eldridge
+
+(Acknowledgements: Mark Dunning, Robert Nicholls, Sarah Vowler, Deepak Parashar, Sarah Dawson, Elizabeth Merrell)
+
+### Aims
+
+During this course you will learn about:
+
+- Different types of data, distributions and structure within data
+- Summary statistics for continuous and discrete data
+- Formulating a null hypothesis
+- Assumptions of one-sample and two-sample t-tests
+- Interpreting the result of a statistical test
+- Statistical tests of categorical variables (Chi-squared and Fisher's exact tests)
+- Non-parametric versions of one- and two-sample tests (Wilcoxon tests)
+
+We will not cover ANOVA or linear regression here but these are the topics of a [more advanced course](https://bioinformatics-core-shared-training.github.io/linear-models-r)
+
+### Learning Objectives
+
+After this course you should be able to:-
+
+- State the assumptions required for a one-sample and two-sample t-test and be able to interpret the results of such a test
+- Know when to apply a paired or independent two-sample t-test
+- To perform simple statistical calculations using the online app
+- Understand the limitations of the tests taught within the course
+- Know when more complex statistical methods are required
+
+### Course Materials
+
+- [Lecture (pdf)](IntroToStatSlides-DLC20200211.pdf)
+<!---
+Old link
+https://docs.google.com/forms/d/e/1FAIpQLScblQ_-ISfSCGp_EIVPPI_mnrJHttaKxln8vVoyjJFvS8BL1w/viewform)
+-->
+- [Online quiz](https://goo.gl/forms/QABUxPKA988HUVeO2)
+- [Practical](practical.html)
+- [Interactive document to record your answers for the group exercise](https://etherpad.wikimedia.org/p/Intro_stat_261119)
+- [Example data for the course](CourseData.zip)
+
+### Software Requirements
+
+You will need an internet connection in order to run the practicals and examples
+
+- [Central limit theorem app](http://bioinformatics.cruk.cam.ac.uk/apps/stats/central-limit-theorem)
+- [One sample test app](http://bioinformatics.cruk.cam.ac.uk/apps/stats/OneSampleTest)
+- [Two sample test app](http://bioinformatics.cruk.cam.ac.uk/apps/stats/TwoSampleTest)
+- [Contingency table app](http://bioinformatics.cruk.cam.ac.uk/apps/stats/contingency-table)
+
+### Further Reading
+
+- A [Course Manual](manual.pdf)
+- Using R for Introductory stats [free eBook pdf](http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf)
+- Learning Statistics with R [free textbook pdf](http://health.adelaide.edu.au/psychology/ccs/teaching/lsr/)
+
+### Feedback
+
+- [Feedback form](https://www.surveymonkey.co.uk/r/STATFEB) for course run on 11th February 2020
+
+### Funding
+This course has received funding from the [CRUK Cambridge Centre](https://crukcambridgecentre.org.uk). If you are researching Cancer in Cambridge please consider becoming a member.

‎practical.Rmd~

+20-21
@@ -20,14 +20,14 @@ output:
 toc_depth: '3'
 ---
 
-<!--- rmarkdown::render("~/courses/cruk/IntroductionToStatisticalAnalysis/git_IntroToStat/practical.Rmd") --->
-<!--- setwd("~/courses/cruk/IntroductionToStatisticalAnalysis/git_IntroToStat/") --->
+<!--- rmarkdown::render("~/courses/cruk/IntroductionToStatisticalAnalysis/git_IntroductionToStats/practical.Rmd") --->
+<!--- setwd("~/courses/cruk/IntroductionToStatisticalAnalysis/git_IntroductionToStats/") --->
 <img src="stylesheets/logo.png" style="position:absolute;top:0px;right:0px;" width="300" />
 
 
 ---
 
-```{r eval=TRUE, echo=F, results="asis"}
+```{r eval=TRUE, echo=FALSE, warning=FALSE, results="asis"}
 #BiocStyle::markdown()
 library("knitr")
 opts_chunk$set(tidy=FALSE,dev="png",fig.show="as.is",
@@ -52,7 +52,7 @@ The tab **Estimated coverage of Student's CI** in the shiny app **central-limit-
 
 1. Assuming that the simulated data are normally distributed, what is the probability of the **true** mean belonging to a confidence interval?
 2. Let X denote a random variable that equals 1 if the **true mean belongs to the confidence interval** and 0 otherwise. What is the distribution of X?
-3. What is the probability that 0 confidence intervals out of 50 contain the **true mean** if data are normally distributed?
+<!--- 3. What is the probability that 0 confidence intervals out of 50 contain the **true mean** if data are normally distributed? (too complex) --->
 
 <span style="color:rgb(235, 7, 142)">**Question (ii):**</span>
 
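Reviewer note on the commented-out question 3: if X is treated as a Bernoulli variable, the number of intervals containing the true mean is binomial, so the answer is short to compute. The following is a minimal base-R sketch, not part of the commit. It assumes a nominal 95% confidence level (the hunk does not state the level used in the app) and independent intervals; the names `coverage`, `n_intervals`, `p_none` and `p_none_closed` are illustrative.

```r
# Assumption: nominal 95% coverage; the level used by the app is not shown in this diff.
coverage <- 0.95
n_intervals <- 50

# The count of intervals containing the true mean is Binomial(n_intervals, coverage),
# so the probability that none of them does is the binomial mass at 0.
p_none <- dbinom(0, size = n_intervals, prob = coverage)

# Equivalent closed form: each interval misses with probability 1 - coverage.
p_none_closed <- (1 - coverage)^n_intervals

print(p_none)
```

Under these assumptions the probability is vanishingly small, which may be why the question was judged unhelpful for the practical.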
@@ -247,7 +247,7 @@ cat("From this histogram it is difficult to tell whether the differences between
 # Two-Sample Tests
 
 Use our Shiny app [http://bioinformatics.cruk.cam.ac.uk/stats/TwoSampleTest](http://bioinformatics.cruk.cam.ac.uk/stats/TwoSampleTest)
-to perform tests of equality of means/medians. [http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table](http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table) to perform tests of equality of proportions.
+to perform tests of equality of means/medians. <!--- [http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table](http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table) to perform tests of equality of proportions.--->
 
 &nbsp;
 
@@ -425,6 +425,7 @@ cat("Both tests show that there is insufficient evidence to reject the null hypo
 
 &nbsp;
 
+<!---
 ## Disease association
 
 The following table gives the frequencies of wild-type and knock-out mice developing a disease thought to be associated to the absence of the knock-out gene.
@@ -463,6 +464,7 @@ colnames(.Table) <- c('WT', 'KO')
 Enter the data into the [Shiny app](http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table/). Select the **Fisher's exact test** option to compare the proportion of mice in each group that developed the disease.
 
 <span style="color:rgb(235, 7, 142)">**Question:**</span> What is your p-value? How do you interpret the result?
+--->
 
 ------
 ```{r}
@@ -487,7 +489,11 @@ There is evidence of an association between mouse type and disease X.")
 
 # Small-Group Exercise: Choosing a test
 
-In this section, we invite you to form small groups to select a dataset and discuss what methods/tests you would use to analyse those data.
+In this section, we invite you to form small groups. Each group will be assigned one of the exercises.
+
+At the end of the time assigned for the exercise we will go through each of the problems in turn and invite a representative of each group to present the problem to the rest of the class along with the analysis (descriptive analysis, statistical tests) the group felt was most appropriate and any conclusions made.
+
+If time allows, it would be beneficial for groups to familiarize themselves with some of the other exercises so that they can contribute to the presentations made by other groups.
 
 You should use this [interactive document](https://public.etherpad-mozilla.org/p/2019-02-12-intro-to-stats) to record your observations.
 
@@ -503,7 +509,7 @@ library(Biobase)
 
 &nbsp;
 
-## Group 1: Plant Growth `data1.csv`
+## Group Exercise 1: Plant Growth `data1.csv`
 
 Darwin (1876) studied the growth of *pairs* of zea may (aka corn) seedlings, one produced by cross-fertilization and the other produced by self-fertilization, but otherwise grown under identical conditions. His goal was to demonstrate the greater vigour of the cross-fertilized plants. The data recorded are the final height (inches, to the nearest 1/8th) of the plants in each pair.
 
@@ -536,7 +542,7 @@ write.csv(td, file="mystery-data/data1.csv",quote=FALSE,row.names=FALSE)
 
 &nbsp;
 
-## Group 2: Florence Nightingale `data2.csv`
+## Group Exercise 2: Florence Nightingale `data2.csv`
 
 In the history of data visualization, Florence Nightingale is best remembered for her role as a social activist and her view that statistical data, presented in charts and diagrams, could be used as powerful arguments for medical reform.
 
@@ -590,7 +596,7 @@ Night.flt <- Night %>% filter(Cause=="Disease") %>% select(Regime,Deaths)
 
 &nbsp;
 
-## Group 3: Effect of bran on diet: `data3.csv`
+## Group Exercise 3: Effect of bran on diet: `data3.csv`
 
 The addition of bran to the diet has been reported to benefit patients with diverticulosis. Several different bran preparations are available, and a clinician wants to test the efficacy of two of them on patients, since favourable claims have been made for each. Among the consequences of administering bran that requires testing is the transit time through the alimentary canal. By random allocation the clinician selects two groups of patients aged 40-64 with diverticulosis of comparable severity. Sample 1 contains 15 patients who are given treatment A, and sample 2 contains 12 patients who are given treatment B.
 
@@ -618,7 +624,7 @@ t.test(Time~Group,data,var.equal=TRUE)
 
 &nbsp;
 
-## Group 4: Effect of Autism drug `data4.csv`
+## Group Exercise 4: Effect of Autism drug `data4.csv`
 
 Consider a clinical investigation to assess the effectiveness of a new drug designed to reduce repetitive behaviors in children affected with autism. If the drug is effective, children will exhibit fewer repetitive behaviors on treatment as compared to when they are untreated. A total of 8 children with autism enroll in the study. Each child is observed by the study psychologist for a period of 3 hours both before treatment and then again after taking the new drug for 1 week. The time that each child is engaged in repetitive behavior during each 3 hour observation period is measured. Repetitive behavior is scored on a scale of 0 to 100 and scores represent the percent of the observation time in which the child is engaged in repetitive behavior. For example, a score of 0 indicates that during the entire observation period the child did not engage in repetitive behavior while a score of 100 indicates that the child was constantly engaged in repetitive behavior.
 
@@ -637,16 +643,9 @@ write.csv(data, file="mystery-data/data4.csv",quote=FALSE,row.names=FALSE)
 
 ```
 
-```{r}
-###http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Nonparametric/BS704_Nonparametric5.html
-## Non-parametric
-## Non-matched
-## Sign-test
-```
-
 &nbsp;
 
-## Group 5: CD4 `data5.csv`
+## Group Exercise 5: CD4 `data5.csv`
 
 CD4 cells are carried in the blood as part of the human immune system. One of the effects of the HIV virus is that these cells die. The count of CD4 cells is used in determining the onset of full-blown AIDS in a patient. In this study of the effectiveness of a new anti-viral drug on HIV, 20 HIV-positive patients had their CD4 counts recorded and then were put on a course of treatment with this drug. After using the drug for one year, their CD4 counts were again recorded.
 
@@ -670,7 +669,7 @@ t.test(data[,1],data[,2],paired=TRUE)
 
 &nbsp;
 
-## Group 6: Drink Driving `data6.csv`
+## Group Exercise 6: Drink Driving `data6.csv`
 
 Drunk driving is one of the main causes of car accidents. Interviews with drunk drivers who were involved in accidents and survived revealed that one of the main problems is that drivers do not realize that they are impaired, thinking “I only had 1-2 drinks … I am OK to drive.”
 
@@ -705,7 +704,7 @@ write.csv(data2,file="mystery-data/data6.csv",quote=FALSE,row.names=FALSE)
 
 &nbsp;
 
-## Group 7: Pollution in Trees `data7.csv`
+## Group Exercise 7: Pollution in Trees `data7.csv`
 
 Laureysens et al. (2004) measured metal content in the wood of 13 poplar clones growing in a polluted area, once in August and once in November. Concentrations of aluminum (in micrograms of Al per gram of wood) are shown below.
 
@@ -733,7 +732,7 @@ boxplot(data)
 
 &nbsp;
 
-## Group 8: Salaries for Professors `data8.csv`
+## Group Exercise 8: Salaries for Professors `data8.csv`
 
 The 2008-09 nine-month academic salary for Assistant Professors, Associate Professors and Professors in a college in the U.S. The data were collected as part of the on-going effort of the college's administration to monitor salary differences between male and female faculty members. (salary given as nine-month salary, in dollars.)
 
‎practical.tex

+23-75
@@ -1,3 +1,6 @@
+\PassOptionsToPackage{unicode=true}{hyperref} % options for packages loaded elsewhere
+\PassOptionsToPackage{hyphens}{url}
+%
 \documentclass[]{article}
 \usepackage{lmodern}
 \usepackage{amssymb,amsmath}
@@ -6,30 +9,32 @@
 \ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
 \usepackage[T1]{fontenc}
 \usepackage[utf8]{inputenc}
+\usepackage{textcomp} % provides euro and other symbols
 \else % if luatex or xelatex
-\ifxetex
-\usepackage{mathspec}
-\else
-\usepackage{fontspec}
-\fi
+\usepackage{unicode-math}
 \defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
 \fi
 % use upquote if available, for straight quotes in verbatim environments
 \IfFileExists{upquote.sty}{\usepackage{upquote}}{}
 % use microtype if available
 \IfFileExists{microtype.sty}{%
-\usepackage{microtype}
+\usepackage[]{microtype}
 \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
 }{}
-\usepackage[margin=1in]{geometry}
+\IfFileExists{parskip.sty}{%
+\usepackage{parskip}
+}{% else
+\setlength{\parindent}{0pt}
+\setlength{\parskip}{6pt plus 2pt minus 1pt}
+}
 \usepackage{hyperref}
-\hypersetup{unicode=true,
+\hypersetup{
 pdftitle={Introduction to Statistical Analysis},
-pdfauthor={D.-L. Couturier and M. Eldridge (with contributions of M. Dunning and S. Vowler)},
+pdfauthor={D.-L. Couturier and M. Fernandes (with contributions of M. Eldridge, M. Dunning and S. Vowler)},
 pdfborder={0 0 0},
 breaklinks=true}
 \urlstyle{same} % don't use monospace font for urls
-\usepackage{longtable,booktabs}
+\usepackage[margin=1in]{geometry}
 \usepackage{graphicx,grffile}
 \makeatletter
 \def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
@@ -39,12 +44,6 @@
 % margins by default, and it is still possible to overwrite the defaults
 % using explicit options in \includegraphics[width, height, ...]{}
 \setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
-\IfFileExists{parskip.sty}{%
-\usepackage{parskip}
-}{% else
-\setlength{\parindent}{0pt}
-\setlength{\parskip}{6pt plus 2pt minus 1pt}
-}
 \setlength{\emergencystretch}{3em} % prevent overfull lines
 \providecommand{\tightlist}{%
 \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
@@ -59,32 +58,16 @@
 \renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
 \fi
 
-%%% Use protect on footnotes to avoid problems with footnotes in titles
-\let\rmarkdownfootnote\footnote%
-\def\footnote{\protect\rmarkdownfootnote}
-
-%%% Change title format to be more compact
-\usepackage{titling}
-
-% Create subtitle command for use in maketitle
-\newcommand{\subtitle}[1]{
-\posttitle{
-\begin{center}\large#1\end{center}
-}
-}
+% set default figure placement to htbp
+\makeatletter
+\def\fps@figure{htbp}
+\makeatother
 
-\setlength{\droptitle}{-2em}
 
-\title{Introduction to Statistical Analysis}
-\pretitle{\vspace{\droptitle}\centering\huge}
-\posttitle{\par}
-\author{D.-L. Couturier and M. Eldridge (with contributions of M. Dunning and S.
-Vowler)}
-\preauthor{\centering\large\emph}
-\postauthor{\par}
-\date{}
-\predate{}\postdate{}
-
+\title{Introduction to Statistical Analysis}
+\author{D.-L. Couturier and M. Fernandes (with contributions of M. Eldridge, M.
+Dunning and S. Vowler)}
+\date{}
 
 \begin{document}
 \maketitle
@@ -469,40 +452,6 @@ \subsection{Birth-weight of twins}\label{birth-weight-of-twins}}
 
 ~
 
-\hypertarget{disease-association}{%
-\subsection{Disease association}\label{disease-association}}
-
-The following table gives the frequencies of wild-type and knock-out
-mice developing a disease thought to be associated to the absence of the
-knock-out gene.
-
-\begin{longtable}[]{@{}lrrr@{}}
-\toprule
-~ & WT & KO & Total\tabularnewline
-\midrule
-\endhead
-Disease & 1 & 7 & 8\tabularnewline
-No disease & 9 & 3 & 12\tabularnewline
-Total & 10 & 10 & 20\tabularnewline
-\bottomrule
-\end{longtable}
-
-{\textbf{Question:}} What are your null and alternative hypotheses?
-
-\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}
-
-{\textbf{Question:}} What are your expected frequencies?
-
-\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}
-
-Enter the data into the
-\href{http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table/}{Shiny
-app}. Select the \textbf{Fisher's exact test} option to compare the
-proportion of mice in each group that developed the disease.
-
-{\textbf{Question:}} What is your p-value? How do you interpret the
-result?
-
 \begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}
 
 ~
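Reviewer note on the removed "Disease association" exercise: the expected frequencies and the Fisher's exact test can also be reproduced outside the Shiny app. The following is a minimal base-R sketch, not part of the commit. The counts are copied from the removed table; the object names `tab`, `expected` and `res` are illustrative, and the app's own computation is not shown in this diff, so agreement with it is an assumption.

```r
# Counts copied from the removed longtable: rows = outcome, columns = mouse type.
# matrix() fills column by column, so the first pair is the WT column.
tab <- matrix(c(1, 9,    # WT: disease, no disease
                7, 3),   # KO: disease, no disease
              nrow = 2,
              dimnames = list(Outcome = c("Disease", "No disease"),
                              Type = c("WT", "KO")))

# Expected frequencies under independence: row total * column total / grand total.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Two-sided Fisher's exact test from the stats package.
res <- fisher.test(tab)

print(expected)
print(res$p.value)
```

Fisher's exact test is the option the exercise asks for; with expected counts this small, the chi-squared approximation would be questionable.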
@@ -679,5 +628,4 @@ \subsection{\texorpdfstring{Group Exercise 8: Salaries for Professors
 {\emph{Is there evidence that Female professors are paid differently to
 their Male counterparts?}}
 
-
 \end{document}
