This course provides a refresher on the foundations of statistical analysis. Practicals are conducted using the 'Shiny' package, which provides an accessible interface to the R statistical language.
4
+
5
+
Note that this is not a course for learning about the R statistical language itself. If you wish to learn more about R, please see other courses at the University of Cambridge:
6
+
7
+
- [An Introduction to Solving Biological Problems with R](http://cambiotraining.github.io/r-intro/)
8
+
9
+
### Authors
10
+
11
+
- Dominique-Laurent Couturier
12
+
- Mark Fernandes
13
+
- Matthew Eldridge
14
+
15
+
(Acknowledgements: Mark Dunning, Robert Nicholls, Sarah Vowler, Deepak Parashar, Sarah Dawson, Elizabeth Merrell)
16
+
17
+
### Aims
18
+
19
+
During this course you will learn about:
20
+
21
+
- Different types of data, distributions and structure within data
22
+
- Summary statistics for continuous and discrete data
23
+
- Formulating a null hypothesis
24
+
- Assumptions of one-sample and two-sample t-tests
25
+
- Interpreting the result of a statistical test
26
+
- Statistical tests of categorical variables (Chi-squared and Fisher's exact tests)
27
+
- Non-parametric versions of one- and two-sample tests (Wilcoxon tests)
28
+
29
+
We will not cover ANOVA or linear regression here, but these are the topics of a [more advanced course](https://bioinformatics-core-shared-training.github.io/linear-models-r).
30
+
31
+
### Learning Objectives
32
+
33
+
After this course you should be able to:
34
+
35
+
- State the assumptions required for a one-sample and two-sample t-test and be able to interpret the results of such a test
36
+
- Know when to apply a paired or independent two-sample t-test
37
+
- Perform simple statistical calculations using the online app
38
+
- Understand the limitations of the tests taught within the course
39
+
- Know when more complex statistical methods are required
- Using R for Introductory stats [free eBook pdf](http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf)
66
+
- Learning Statistics with R [free textbook pdf](http://health.adelaide.edu.au/psychology/ccs/teaching/lsr/)
67
+
68
+
### Feedback
69
+
70
+
- [Feedback form](https://www.surveymonkey.co.uk/r/STATFEB) for course run on 11th February 2020
71
+
72
+
### Funding
73
+
This course has received funding from the [CRUK Cambridge Centre](https://crukcambridgecentre.org.uk). If you are researching cancer in Cambridge, please consider becoming a member.
@@ -52,7 +52,7 @@ The tab **Estimated coverage of Student's CI** in the shiny app **central-limit-
52
52
53
53
1. Assuming that the simulated data are normally distributed, what is the probability of the **true** mean belonging to a confidence interval?
54
54
2. Let X denote a random variable that equals 1 if the **true mean belongs to the confidence interval** and 0 otherwise. What is the distribution of X?
55
-
3. What is the probability that 0 confidence intervals out of 50 contain the **true mean** if data are normally distributed?
55
+
<!--- 3. What is the probability that 0 confidence intervals out of 50 contain the **true mean** if data are normally distributed? (too complex) --->
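The questions above can also be explored numerically. The following is a minimal sketch and is not part of the Shiny app: it assumes a 95% confidence level, samples of size 20 and standard normal data (all illustrative choices). It estimates the coverage of Student's confidence interval by simulation, then uses the binomial distribution for the number of intervals out of 50 that contain the true mean.

```r
# Illustrative assumptions (not taken from the app): 95% level, n = 20, N(0, 1) data
set.seed(42)
conf_level <- 0.95
n <- 20
true_mean <- 0
n_sim <- 2000

# X = TRUE if the interval contains the true mean, FALSE otherwise (a Bernoulli variable)
covered <- replicate(n_sim, {
  x <- rnorm(n, mean = true_mean, sd = 1)
  ci <- t.test(x, conf.level = conf_level)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})
coverage <- mean(covered)  # proportion of simulated intervals covering the true mean

# Number of intervals out of 50 that contain the true mean ~ Binomial(50, conf_level)
p_none <- dbinom(0, size = 50, prob = conf_level)  # same as (1 - conf_level)^50

print(coverage)
print(p_none)
```

If the data are normally distributed, the estimated coverage should be close to the chosen confidence level, and the probability that none of the 50 intervals contains the true mean is vanishingly small.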
@@ -247,7 +247,7 @@ cat("From this histogram it is difficult to tell whether the differences between
247
247
# Two-Sample Tests
248
248
249
249
Use our Shiny app [http://bioinformatics.cruk.cam.ac.uk/stats/TwoSampleTest](http://bioinformatics.cruk.cam.ac.uk/stats/TwoSampleTest)
250
-
to perform tests of equality of means/medians. [http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table](http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table) to perform tests of equality of proportions.
250
+
to perform tests of equality of means/medians. <!--- [http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table](http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table) to perform tests of equality of proportions.--->
251
251
252
252
253
253
@@ -425,6 +425,7 @@ cat("Both tests show that there is insufficient evidence to reject the null hypo
425
425
426
426
427
427
428
+
<!---
428
429
## Disease association
429
430
430
431
The following table gives the frequencies of wild-type and knock-out mice developing a disease thought to be associated with the absence of the knock-out gene.
Enter the data into the [Shiny app](http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table/). Select the **Fisher's exact test** option to compare the proportion of mice in each group that developed the disease.
464
465
465
466
<span style="color:rgb(235, 7, 142)">**Question:**</span> What is your p-value? How do you interpret the result?
467
+
--->
466
468
467
469
------
468
470
```{r}
@@ -487,7 +489,11 @@ There is evidence of an association between mouse type and disease X.")
487
489
488
490
# Small-Group Exercise: Choosing a test
489
491
490
-
In this section, we invite you to form small groups to select a dataset and discuss what methods/tests you would use to analyse those data.
492
+
In this section, we invite you to form small groups. Each group will be assigned one of the exercises.
493
+
494
+
At the end of the time assigned for the exercise we will go through each of the problems in turn and invite a representative of each group to present the problem to the rest of the class along with the analysis (descriptive analysis, statistical tests) the group felt was most appropriate and any conclusions made.
495
+
496
+
If time allows, it would be beneficial for groups to familiarize themselves with some of the other exercises so that they can contribute to the presentations made by other groups.
491
497
492
498
You should use this [interactive document](https://public.etherpad-mozilla.org/p/2019-02-12-intro-to-stats) to record your observations.
493
499
@@ -503,7 +509,7 @@ library(Biobase)
503
509
504
510
505
511
506
-
## Group 1: Plant Growth `data1.csv`
512
+
## Group Exercise 1: Plant Growth `data1.csv`
507
513
508
514
Darwin (1876) studied the growth of *pairs* of *Zea mays* (aka corn) seedlings, one produced by cross-fertilization and the other produced by self-fertilization, but otherwise grown under identical conditions. His goal was to demonstrate the greater vigour of the cross-fertilized plants. The data recorded are the final height (inches, to the nearest 1/8th) of the plants in each pair.
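Because the seedlings were grown in pairs, one possible starting point is to look at the within-pair differences. The sketch below uses simulated heights, not Darwin's measurements, and the vector names `cross` and `self` are hypothetical (the column names in `data1.csv` may differ). It illustrates that a paired t-test is the same as a one-sample t-test on the differences.

```r
# Simulated heights in inches (NOT Darwin's data); all names are hypothetical
set.seed(1)
n_pairs <- 12
self  <- rnorm(n_pairs, mean = 18, sd = 2)
cross <- self + rnorm(n_pairs, mean = 2, sd = 3)

# Inspect the within-pair differences before choosing a test
differences <- cross - self
print(summary(differences))

# Paired t-test: assumes the differences are approximately normally distributed
paired_result <- t.test(cross, self, paired = TRUE)

# Equivalent one-sample t-test of the differences against zero
one_sample_result <- t.test(differences, mu = 0)

print(paired_result$p.value)
print(one_sample_result$p.value)
```

Whether the normality assumption is reasonable for the real data is part of the group discussion; a histogram or boxplot of the differences is a sensible first check.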
## Group Exercise 2: Florence Nightingale `data2.csv`
540
546
541
547
In the history of data visualization, Florence Nightingale is best remembered for her role as a social activist and her view that statistical data, presented in charts and diagrams, could be used as powerful arguments for medical reform.
## Group Exercise 3: Effect of bran on diet `data3.csv`
594
600
595
601
The addition of bran to the diet has been reported to benefit patients with diverticulosis. Several different bran preparations are available, and a clinician wants to test the efficacy of two of them on patients, since favourable claims have been made for each. Among the consequences of administering bran that require testing is the transit time through the alimentary canal. By random allocation the clinician selects two groups of patients aged 40-64 with diverticulosis of comparable severity. Sample 1 contains 15 patients who are given treatment A, and sample 2 contains 12 patients who are given treatment B.
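Here the two samples contain different patients, so an independent (unpaired) two-sample comparison is one option to consider. The sketch below uses simulated transit times in arbitrary units, not the values in `data3.csv`, and the vector names are hypothetical. It contrasts Welch's t-test (the default in R's `t.test`) with the classical pooled-variance version.

```r
# Simulated transit times in arbitrary units (NOT the course data); names are hypothetical
set.seed(2)
treatment_a <- rnorm(15, mean = 70, sd = 15)  # 15 patients on treatment A
treatment_b <- rnorm(12, mean = 65, sd = 15)  # 12 patients on treatment B

# Welch's t-test: does not assume equal variances in the two groups
welch <- t.test(treatment_a, treatment_b)

# Classical two-sample t-test: assumes equal variances, df = 15 + 12 - 2 = 25
pooled <- t.test(treatment_a, treatment_b, var.equal = TRUE)

print(welch$parameter)   # Welch degrees of freedom (at most 25)
print(pooled$parameter)  # 25
```

Both versions assume approximately normal data within each group; if that looks doubtful, a Wilcoxon rank-sum test via `wilcox.test` is the non-parametric alternative.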
## Group Exercise 4: Effect of Autism drug `data4.csv`
622
628
623
629
Consider a clinical investigation to assess the effectiveness of a new drug designed to reduce repetitive behaviors in children with autism. If the drug is effective, children will exhibit fewer repetitive behaviors on treatment than when they are untreated. A total of 8 children with autism enroll in the study. Each child is observed by the study psychologist for a period of 3 hours both before treatment and again after taking the new drug for 1 week. The time that each child is engaged in repetitive behavior during each 3-hour observation period is measured. Repetitive behavior is scored on a scale of 0 to 100, and scores represent the percentage of the observation time in which the child is engaged in repetitive behavior. For example, a score of 0 indicates that the child did not engage in repetitive behavior at any point during the observation period, while a score of 100 indicates that the child was constantly engaged in repetitive behavior.
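With only 8 children and scores bounded between 0 and 100, a non-parametric paired test is one option worth discussing. The sketch below uses simulated scores, not the values in `data4.csv`, and is deliberately constructed so that every child improves. It shows the most extreme result a one-sided Wilcoxon signed-rank test can give with 8 pairs.

```r
# Simulated scores (NOT the course data), built so that every child improves
set.seed(3)
before <- runif(8, min = 40, max = 90)
after  <- before - runif(8, min = 1, max = 20)

# One-sided Wilcoxon signed-rank test: are scores lower after treatment?
result <- wilcox.test(before, after, paired = TRUE, alternative = "greater")

# All 8 differences are positive, so V = 1 + 2 + ... + 8 = 36
print(result$statistic)
# Smallest achievable exact one-sided p-value with 8 pairs: 1 / 2^8
print(result$p.value)
```

The real scores may contain ties or zero differences, in which case `wilcox.test` falls back to a normal approximation and issues a warning.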
CD4 cells are carried in the blood as part of the human immune system. One of the effects of the HIV virus is that these cells die. The count of CD4 cells is used in determining the onset of full-blown AIDS in a patient. In this study of the effectiveness of a new anti-viral drug on HIV, 20 HIV-positive patients had their CD4 counts recorded and then were put on a course of treatment with this drug. After using the drug for one year, their CD4 counts were again recorded.
Drunk driving is one of the main causes of car accidents. Interviews with drunk drivers who were involved in accidents and survived revealed that one of the main problems is that drivers do not realize that they are impaired, thinking “I only had 1-2 drinks … I am OK to drive.”
## Group Exercise 7: Pollution in Trees `data7.csv`
709
708
710
709
Laureysens et al. (2004) measured metal content in the wood of 13 poplar clones growing in a polluted area, once in August and once in November. Concentrations of aluminum (in micrograms of Al per gram of wood) are shown below.
711
710
@@ -733,7 +732,7 @@ boxplot(data)
733
732
734
733
735
734
736
-
## Group 8: Salaries for Professors `data8.csv`
735
+
## Group Exercise 8: Salaries for Professors `data8.csv`
737
736
738
737
These data give the 2008-09 nine-month academic salaries of Assistant Professors, Associate Professors and Professors at a college in the U.S. They were collected as part of the ongoing effort of the college's administration to monitor salary differences between male and female faculty members. (Salaries are nine-month salaries, in dollars.)