@@ -521,33 +521,33 @@ Recall our histogram of temperature from yesterday:
  - An assumption we rely on for various statistical tests

  ```{r fig.height=4}
- hist(weather$Temp, col="purple", xlab="Temperature",
- main="Distribution of Temperature",
- breaks = 50:100, freq=FALSE)
+ hist(weather$Wind, col="purple", xlab="Wind Speed",
+ main="Distribution of Wind Speed",
+ breaks = 20, freq=FALSE)
  ```

  ## Create a normal distribution curve

  - If our data are normally distributed, we can calculate the probability density at particular values.
- + e.g. a temperature of 80
+ + e.g. a wind speed of 10

  ```{r eval=FALSE}
- tempMean <- mean(weather$Temp)
- tempSD <- sd(weather$Temp)
- dnorm(80, mean=tempMean, sd=tempSD)
+ tempMean <- mean(weather$Wind)
+ tempSD <- sd(weather$Wind)
+ dnorm(10, mean=tempMean, sd=tempSD)
  ```

  ```{r echo=FALSE}
- tempMean <- mean(weather$Temp)
- tempSD <- sd(weather$Temp)
+ tempMean <- mean(weather$Wind)
+ tempSD <- sd(weather$Wind)
  ```

  - We can overlay this on the histogram using `points` as we just saw:
  ```{r eval=FALSE}
- hist(weather$Temp, col="purple", xlab="Temperature",
- main="Distribution of Temperature",
- breaks = 50:100, freq=FALSE)
- points(80, dnorm(80, mean=tempMean, sd=tempSD),
+ hist(weather$Wind, col="purple", xlab="Wind Speed",
+ main="Distribution of Wind Speed",
+ breaks = 20, freq=FALSE)
+ points(10, dnorm(10, mean=tempMean, sd=tempSD),
  col="red", pch=16)
  ```

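`dnorm()` returns the height of the fitted normal curve at a point, not a probability: for a continuous variable the probability of any single exact value is zero, and probabilities come from areas under the curve, which `pnorm()` (the cumulative distribution function) provides. The sketch below contrasts the two, assuming R's built-in `airquality` data set (which has a `Wind` column) as a stand-in for `weather`; the names `wind`, `windMean`, `windSD`, `densityAt10` and `probNear10` are illustrative only.

```r
# Assumption: R's built-in airquality data set stands in for `weather`;
# it has a numeric Wind column with no missing values.
wind <- airquality$Wind
windMean <- mean(wind)  # illustrative names; the course code uses tempMean/tempSD
windSD <- sd(wind)

# Height of the fitted normal density curve at x = 10 (not a probability)
densityAt10 <- dnorm(10, mean = windMean, sd = windSD)

# Probability of a value between 9.5 and 10.5: the area under the curve,
# obtained as a difference of two cumulative probabilities from pnorm()
probNear10 <- pnorm(10.5, mean = windMean, sd = windSD) -
  pnorm(9.5, mean = windMean, sd = windSD)

# Over a narrow interval the area is roughly density * width (width = 1 here),
# so the two printed numbers should be close but are different quantities
print(c(density = densityAt10, probability = probNear10))
```

Because a density is a height rather than an area, it can exceed 1 when the standard deviation is small; only differences of `pnorm()` values are guaranteed to lie between 0 and 1.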
@@ -558,15 +558,15 @@ points(80, dnorm(80, mean=tempMean, sd=tempSD),
  + use `lines` in this case rather than `points`

  ```{r eval=FALSE}
- xs <- c(50,60,70,80,90,100)
+ xs <- c(0,5,10,15,20)
  ys <- dnorm(xs, mean=tempMean, sd=tempSD)
  lines(xs, ys, col="red")
  ```

  ```{r fig.height=4,echo=FALSE}
- hist(weather$Temp,col="purple",xlab="Temperature",
- main="Distribution of Temperature",breaks = 50:100,freq=FALSE)
- xs <- c(50,60,70,80,90,100)
+ hist(weather$Wind,col="purple",xlab="Wind Speed",
+ main="Distribution of Wind Speed",breaks = 20,freq=FALSE)
+ xs <- c(0,5,10,15,20)
  ys <- dnorm(xs, mean=tempMean,sd=tempSD)
  lines(xs,ys,col="red")
  ```
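Joining only five points with `lines()` will give a visibly angular curve. Base R's `curve()` applies the same idea (evaluate an expression in `x` at `n` equally spaced values and join them with line segments) in a single call. A minimal sketch, again assuming `airquality$Wind` stands in for `weather$Wind`; `wind`, `windMean`, `windSD` and `fitted` are illustrative names.

```r
# Assumption: airquality$Wind stands in for weather$Wind
wind <- airquality$Wind
windMean <- mean(wind)
windSD <- sd(wind)

hist(wind, col = "purple", xlab = "Wind Speed",
     main = "Distribution of Wind Speed", breaks = 20, freq = FALSE)

# curve() evaluates the expression in `x` at n equally spaced points between
# `from` and `to` and joins them with line segments: the same recipe as
# building xs with seq(), computing ys with dnorm() and calling lines()
fitted <- curve(dnorm(x, mean = windMean, sd = windSD),
                from = 0, to = max(wind), n = 200,
                add = TRUE, col = "red")
```

`curve()` also returns the x and y values it drew (invisibly), so `fitted$x` and `fitted$y` can be inspected afterwards.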
@@ -577,15 +577,15 @@ lines(xs,ys,col="red")
  + We can generate x values using the `seq()` function

  ```{r eval=FALSE}
- xs <- seq(50,100, length.out = 10000)
+ xs <- seq(0,20, length.out = 10000)
  ys <- dnorm(xs, mean=tempMean, sd=tempSD)
  lines(xs, ys, col="red")
  ```

  ```{r fig.height=4,echo=FALSE}
- hist(weather$Temp,col="purple",xlab="Temperature",
- main="Distribution of Temperature",breaks = 50:100,freq=FALSE)
- xs <- seq(50,100, length.out = 10000)
+ hist(weather$Wind,col="purple",xlab="Wind Speed",
+ main="Distribution of Wind Speed",breaks = 20,freq=FALSE)
+ xs <- seq(0,20, length.out = 10000)
  ys <- dnorm(xs, mean=tempMean,sd=tempSD)
  lines(xs,ys,col="red")
  ```
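The endpoints 0 and 20 passed to `seq()` are hard-coded for this data set. One alternative, sketched below under the same `airquality` stand-in assumption, is to take them from the data with `min()` and `max()`. A few hundred points is usually enough for the line to look smooth at this figure size, so `length.out = 500` is used here rather than 10000; that number is an arbitrary choice.

```r
# Assumption: airquality$Wind stands in for weather$Wind
wind <- airquality$Wind
windMean <- mean(wind)
windSD <- sd(wind)

# Take the endpoints from the data instead of hard-coding 0 and 20;
# 500 points is an arbitrary but usually sufficient resolution
xs <- seq(min(wind), max(wind), length.out = 500)
ys <- dnorm(xs, mean = windMean, sd = windSD)

hist(wind, col = "purple", xlab = "Wind Speed",
     main = "Distribution of Wind Speed", breaks = 20, freq = FALSE)
lines(xs, ys, col = "red")
```

A normal curve never reaches exactly zero, so wherever the line is cut off is a presentation choice rather than a property of the fit.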
@@ -596,10 +596,10 @@ lines(xs,ys,col="red")

  $$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$

- - Say a temperature of 50; which from the histogram seems to be unlikely
+ - Say a wind speed of 2, which from the histogram seems unlikely

  ```{r}
- t <- (tempMean - 50) / (tempSD/sqrt(length(weather$Temp)))
+ t <- (tempMean - 2) / (tempSD/sqrt(length(weather$Wind)))
  t
  ```

@@ -608,7 +608,7 @@
  - ...or use the **`t.test()`** function to compute the statistic and corresponding p-value

  ```{r}
- t.test(weather$Temp, mu=50)
+ t.test(weather$Wind, mu=2)
  ```


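To check that the hand-computed statistic agrees with `t.test()`, the sketch below computes both, plus the two-sided p-value from the t distribution with n - 1 degrees of freedom using `pt()`. It assumes `airquality$Wind` as a stand-in for `weather$Wind`; that column has no missing values. With `NA`s present you would need to drop them first, since `t.test()` removes them but `mean()` and `sd()` return `NA` by default. The names `mu0`, `tManual`, `pManual` and `res` are illustrative.

```r
# Assumption: airquality$Wind stands in for weather$Wind (no missing values)
wind <- airquality$Wind
mu0 <- 2              # hypothesised mean, as in the example above
n <- length(wind)

# t = (x-bar - mu0) / (s / sqrt(n))
tManual <- (mean(wind) - mu0) / (sd(wind) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
pManual <- 2 * pt(-abs(tManual), df = n - 1)

res <- t.test(wind, mu = mu0)

# Both approaches should agree up to floating-point rounding
print(c(tManual = tManual, tTest = unname(res$statistic)))
print(c(pManual = pManual, pTest = res$p.value))
```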
@@ -658,9 +658,9 @@ chisq.test(); fisher.test()
  + whether the assumptions of the test are met, etc.
  + Consult your local statistician if not sure
  + An upcoming course that will help
- + [Introduction to Statistical Analysis](http://training.csx.cam.ac.uk/bioinformatics/event/1809255)
+ + [Introduction to Statistical Analysis](http://bioinformatics-core-shared-training.github.io/IntroductionToStats/)
  + Some good references:
- + [Statistical Analysis Using R (Course from the Babraham Bioinformatics Core)](http://www.bioinformatics.babraham.ac.uk/training.html#rstats)
+ + [Statistical Analysis Using R (Course from the Babraham Bioinformatics Core)](http://training.csx.cam.ac.uk/bioinformatics/event/1827771)
  + [Quick-R guide to stats](http://www.statmethods.net/stats/index.html)
  + [Simple R eBook](https://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf)
  + [R wiki](https://en.wikibooks.org/wiki/R_Programming/Descriptive_Statistics)