Commit f1576b0

Update t-test example to Wind Speed (as Temperature was not normally-distributed)
1 parent f0475fa commit f1576b0
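
The commit message's reasoning (Temperature not being normally distributed) can be checked directly. The following is a minimal sketch, not part of the commit. It assumes the course's `weather` data frame has the same `Wind` and `Temp` columns as R's built-in `airquality` dataset, which the commit itself does not state.

```r
# Assumption: `weather` mirrors R's built-in airquality data (columns Wind, Temp).
weather <- datasets::airquality

# Shapiro-Wilk test: a small p-value is evidence against normality.
p_temp <- shapiro.test(weather$Temp)$p.value
p_wind <- shapiro.test(weather$Wind)$p.value
print(c(Temp = p_temp, Wind = p_wind))

# Q-Q plots give a visual check: points close to the line suggest normality.
old_par <- par(mfrow = c(1, 2))
qqnorm(weather$Temp, main = "Temp"); qqline(weather$Temp)
qqnorm(weather$Wind, main = "Wind"); qqline(weather$Wind)
par(old_par)  # restore the previous plotting layout
```

A formal test and a Q-Q plot answer slightly different questions, so it is reasonable to look at both before choosing which variable to use in a t-test example.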

7 files changed: +620 -567 lines changed

Slides-day1.Rmd

+5-5
Original file line number | Diff line number | Diff line change
@@ -1350,7 +1350,7 @@ boxplot(patients$Weight ~ patients$Sex, horizontal = T)
13501350
1. Import these data into R
13511351
2. What data types are present? Try to think of ways to create the following plots from the data
13521352
+ Scatter plot of two variables. e.g. Solar Radiation against Ozone
1353-
+ A histogram. e.g. Temperature
1353+
+ A histogram. e.g. Wind Speed
13541354
+ Boxplot of a continuous variable against a categorical variable. e.g. Ozone level per month
13551355

13561356

@@ -1378,7 +1378,7 @@ plot(weather$Solar.R, weather$Ozone)
13781378
## Suggestions
13791379

13801380
```{r}
1381-
hist(weather$Temp)
1381+
hist(weather$Wind)
13821382
```
13831383

13841384

@@ -1555,7 +1555,7 @@ par(mfrow=c(2,2))
15551555
plot(weather$Solar.R,weather$Ozone, col="orange", pch=16,
15561556
ylab="Ozone level", xlab="Solar Radiation",
15571557
main="Relationship between ozone level and solar radiation")
1558-
hist(weather$Temp, col="purple", xlab="Temperature", main="Distribution of Temperature", breaks=50:100, freq=FALSE)
1558+
hist(weather$Wind, col="purple", xlab="Wind Speed", main="Distribution of Wind Speed", breaks=20, freq=FALSE)
15591559
boxplot(weather$Ozone~weather$Month,col=rainbow(5))
15601560
```
15611561

@@ -1572,8 +1572,8 @@ plot(weather$Solar.R, weather$Ozone, col="orange", pch=16,
15721572
## Solutions
15731573

15741574
```{r,fig.height=5}
1575-
hist(weather$Temp, col="purple", xlab="Temperature",
1576-
main="Distribution of Temperature", breaks = 50:100,
1575+
hist(weather$Wind, col="purple", xlab="Wind Speed",
1576+
main="Distribution of Wind Speed", breaks = 20,
15771577
freq=FALSE)
15781578
```
15791579

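The hunks above replace `breaks = 50:100` (explicit bin edges) with `breaks = 20` (a single number). In `hist()`, a single number is only a suggestion that R passes to `pretty()`, so the resulting number of bins may differ from 20. The sketch below, which again assumes `weather` matches the built-in `airquality` data, shows one way to force exactly 20 equal-width bins by supplying the edges explicitly; the names `edges`, `h_suggest` and `h_exact` are illustrative.

```r
# Assumption: `weather` mirrors R's built-in airquality data.
weather <- datasets::airquality

# A single number is treated as a suggestion for the number of bins.
h_suggest <- hist(weather$Wind, breaks = 20, plot = FALSE)

# 21 equally spaced edges spanning the data give exactly 20 bins.
edges <- seq(min(weather$Wind), max(weather$Wind), length.out = 21)
h_exact <- hist(weather$Wind, breaks = edges, plot = FALSE)

print(length(h_suggest$counts))  # number of bins R actually chose
print(length(h_exact$counts))    # 20 by construction
```

Whether the distinction matters depends on the exercise; for a quick look at the shape of the distribution, the suggested bin count is usually fine.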
Slides-day1.html

+294-265
Large diffs are not rendered by default.

Slides-day2.Rmd

+26-26
@@ -521,33 +521,33 @@ Recall our histogram of temperature from yesterday:
521521
- An assumption we rely on for various statistical tests
522522

523523
```{r fig.height=4}
524-
hist(weather$Temp, col="purple", xlab="Temperature",
525-
main="Distribution of Temperature",
526-
breaks = 50:100, freq=FALSE)
524+
hist(weather$Wind, col="purple", xlab="Wind Speed",
525+
main="Distribution of Wind Speed",
526+
breaks = 20, freq=FALSE)
527527
```
528528

529529
## Create a normal distribution curve
530530

531531
- If our data are normally-distributed, we can calculate the probability density at particular values.
532-
+ e.g. a temperature of 80
532+
+ e.g. a wind speed of 10
533533

534534
```{r eval=FALSE}
535-
tempMean <- mean(weather$Temp)
536-
tempSD <- sd(weather$Temp)
537-
dnorm(80, mean=tempMean, sd=tempSD)
535+
tempMean <- mean(weather$Wind)
536+
tempSD <- sd(weather$Wind)
537+
dnorm(10, mean=tempMean, sd=tempSD)
538538
```
539539

540540
```{r echo=FALSE}
541-
tempMean <- mean(weather$Temp)
542-
tempSD <- sd(weather$Temp)
541+
tempMean <- mean(weather$Wind)
542+
tempSD <- sd(weather$Wind)
543543
```
544544

545545
- We can overlay this on the histogram using `points` as we just saw:
546546
```{r eval=FALSE}
547-
hist(weather$Temp, col="purple", xlab="Temperature",
548-
main="Distribution of Temperature",
549-
breaks = 50:100, freq=FALSE)
550-
points(80, dnorm(80, mean=tempMean, sd=tempSD),
547+
hist(weather$Wind, col="purple", xlab="Wind Speed",
548+
main="Distribution of Wind Speed",
549+
breaks = 20, freq=FALSE)
550+
points(10, dnorm(10, mean=tempMean, sd=tempSD),
551551
col="red", pch=16)
552552
```
553553

@@ -558,15 +558,15 @@ points(80, dnorm(80, mean=tempMean, sd=tempSD),
558558
+ use `lines` in this case rather than `points`
559559

560560
```{r eval=FALSE}
561-
xs <- c(50,60,70,80,90,100)
561+
xs <- c(0,5,10,15,20)
562562
ys <- dnorm(xs, mean=tempMean, sd=tempSD)
563563
lines(xs, ys, col="red")
564564
```
565565

566566
```{r fig.height=4,echo=FALSE}
567-
hist(weather$Temp,col="purple",xlab="Temperature",
568-
main="Distribution of Temperature",breaks = 50:100,freq=FALSE)
569-
xs <- c(50,60,70,80,90,100)
567+
hist(weather$Wind,col="purple",xlab="Wind Speed",
568+
main="Distribution of Wind Speed",breaks = 20,freq=FALSE)
569+
xs <- c(0,5,10,15,20)
570570
ys <- dnorm(xs, mean=tempMean,sd=tempSD)
571571
lines(xs,ys,col="red")
572572
```
@@ -577,15 +577,15 @@ lines(xs,ys,col="red")
577577
+ We can generate x values using the `seq()` function
578578

579579
```{r eval=FALSE}
580-
xs <- seq(50,100, length.out = 10000)
580+
xs <- seq(0, 20, length.out = 10000)
581581
ys <- dnorm(xs, mean=tempMean, sd=tempSD)
582582
lines(xs, ys, col="red")
583583
```
584584

585585
```{r fig.height=4,echo=FALSE}
586-
hist(weather$Temp,col="purple",xlab="Temperature",
587-
main="Distribution of Temperature",breaks = 50:100,freq=FALSE)
588-
xs <- seq(50,100,length.out = 10000)
586+
hist(weather$Wind,col="purple",xlab="Wind Speed",
587+
main="Distribution of Wind Speed",breaks = 20,freq=FALSE)
588+
xs <- seq(0, 20, length.out = 10000)
589589
ys <- dnorm(xs, mean=tempMean,sd=tempSD)
590590
lines(xs,ys,col="red")
591591
```
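
The hunks above hard-code the x range of the curve (previously 50 to 100, now 0 to 20). As a hedged alternative, the grid can be derived from the data so it does not need editing when the variable changes. This sketch assumes `weather` matches the built-in `airquality` data and uses the illustrative names `windMean` and `windSD` instead of the slides' `tempMean` and `tempSD`.

```r
# Assumption: `weather` mirrors R's built-in airquality data.
weather <- datasets::airquality

windMean <- mean(weather$Wind)
windSD   <- sd(weather$Wind)

# Density-scaled histogram, so the bars and the curve share a y axis.
hist(weather$Wind, col = "purple", xlab = "Wind Speed",
     main = "Distribution of Wind Speed", breaks = 20, freq = FALSE)

# Build the x grid from the observed range rather than fixed numbers.
xs <- seq(min(weather$Wind), max(weather$Wind), length.out = 200)
ys <- dnorm(xs, mean = windMean, sd = windSD)  # normal density at each x
lines(xs, ys, col = "red")
```

Note that `dnorm()` returns a density, not a probability; the probability of a range of values would come from `pnorm()`.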
@@ -596,10 +596,10 @@ lines(xs,ys,col="red")
596596

597597
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
598598

599-
- Say a temperature of 50; which from the histogram seems to be unlikely
599+
- Say a wind speed of 2, which from the histogram seems unlikely
600600

601601
```{r}
602-
t <- (tempMean - 50) / (tempSD/sqrt(length(weather$Temp)))
602+
t <- (tempMean - 2) / (tempSD/sqrt(length(weather$Wind)))
603603
t
604604
```
605605

@@ -608,7 +608,7 @@ t
608608
- ...or use the **`t.test()`** function to compute the statistic and corresponding p-value
609609

610610
```{r}
611-
t.test(weather$Temp, mu=50)
611+
t.test(weather$Wind, mu=2)
612612
```
613613

614614

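The two hunks above compute the one-sample t statistic by hand and then with `t.test()`. The sketch below checks that the two routes agree. It assumes `weather` matches the built-in `airquality` data; the names `x`, `mu0`, `t_manual` and `p_manual` are illustrative and do not appear in the slides.

```r
# Assumption: `weather` mirrors R's built-in airquality data.
weather <- datasets::airquality

x   <- weather$Wind
mu0 <- 2  # hypothesised mean, as in the slides

# t = (sample mean - mu0) / (s / sqrt(n))
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual <- 2 * pt(-abs(t_manual), df = length(x) - 1)

# The built-in test should report the same statistic and p-value.
res <- t.test(x, mu = mu0)
print(c(manual = t_manual, t.test = unname(res$statistic)))
```

Using the same vector for the mean, the standard deviation and `length()` avoids mixing columns, which matters as soon as the variables differ in their number of observations.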
@@ -658,9 +658,9 @@ chisq.test(); fisher.test()
658658
+ whether the assumptions of the test are met, etc.
659659
+ Consult your local statistician if not sure
660660
+ An upcoming course that will help
661-
+ [Introduction to Statistical Analysis](http://training.csx.cam.ac.uk/bioinformatics/event/1809255)
661+
+ [Introduction to Statistical Analysis](http://bioinformatics-core-shared-training.github.io/IntroductionToStats/)
662662
+ Some good references:
663-
+ [Statistical Analysis Using R (Course from the Babaraham Bioinformatics Core)](http://www.bioinformatics.babraham.ac.uk/training.html#rstats)
663+
+ [Statistical Analysis Using R (Course from the Babraham Bioinformatics Core)](http://training.csx.cam.ac.uk/bioinformatics/event/1827771)
664664
+ [Quick-R guide to stats](http://www.statmethods.net/stats/index.html)
665665
+ [Simple R eBook](https://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf)
666666
+ [R wiki](https://en.wikibooks.org/wiki/R_Programming/Descriptive_Statistics)

Slides-day2.html

+294-270
Large diffs are not rendered by default.

exercise4b.Rmd

+1-1
@@ -18,7 +18,7 @@ output: pdf_document
1818
##YOUR CODE HERE
1919
```
2020

21-
# Histogram of temperature; density on y axis, coloured purple, broken into bins of size 1 unit
21+
# Histogram of Wind Speed; density on y axis, coloured purple, broken into 20 bins of equal size
2222

2323
```{r}
2424
##YOUR CODE HERE

images/r-virus.png

264 KB

solution-exercise4b.pdf

-944 Bytes
Binary file not shown.
