Merge branch 'master' of github.com:bioinformatics-core-shared-traini…
…ng/Bulk_RNAseq_Course_Base
Showing 17 changed files with 2,565 additions and 617 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
--- | ||
title: "Introduction to RNAseq analysis in R" | ||
date: "March 2023" | ||
date: "October 2024" | ||
output: | ||
ioslides_presentation: | ||
css: css/stylesheet.css | ||
|
@@ -15,353 +15,28 @@ output: | |
|
||
<div style="line-height: 50%;"><br></div> | ||
|
||
<img src="images/workflow_3Day.svg" class="centerimg" style="width: 80%; margin-top: 60px;"> | ||
|
||
|
||
<img src="images/01s_workflow_3Day.svg" class="centerimg" style="width: 80%; margin-top: 60px;"> | ||
|
||
## General idea behind RNAseq data analysis | ||
|
||
<img src="images/RNAseq_data_ana_gen_idea.png" class="centerimg" style="width: 90%"> | ||
|
||
|
||
|
||
## General idea behind any statistical test | ||
|
||
<img src="images/general_idea_statistical_test.png" class="centerimg" style="width: 90%"> | ||
|
||
|
||
## Sources of Noise (Variance) | ||
|
||
<img src="images/Noise.svg" style="width: 65%; margin-left: 23%"> | ||
|
||
|
||
## Normalisation | ||
|
||
* Counting estimates the *relative* counts for each gene | ||
|
||
* Does this **accurately** represent the original population of RNAs? | ||
|
||
* The relationship between counts and RNA expression is not the same for all | ||
genes across all samples | ||
|
||
|
||
<div style="width: 30%; | ||
float: left; | ||
border-style: solid; | ||
border-width: 1px; | ||
border-radius: 25px; | ||
padding: 20px; | ||
margin-right: 10%; | ||
margin-left: 10%;"> | ||
<span style="color: #2e3192">**Library Size**</span> | ||
|
||
Differing sequencing depth | ||
|
||
</div> | ||
|
||
<div style="width: 30%; | ||
float: left; | ||
border-style: solid; | ||
border-width: 1px; | ||
border-radius: 25px; | ||
padding: 20px;"> | ||
<span style="color: #2e3192">**Gene properties**</span> | ||
|
||
Length, GC content, sequence | ||
|
||
</div> | ||
|
||
<div style="width: 40%; | ||
float: left; | ||
border-style: solid; | ||
border-width: 1px; | ||
border-radius: 25px; | ||
padding: 20px; | ||
clear: both; | ||
margin-top: 20px; | ||
margin-left: 27%"> | ||
<span style="color: #2e3192;">**Library composition**</span> | ||
|
||
Quantification is relative - changes in | ||
relative abundance for one gene will affect the relative abundances of other genes | ||
|
||
"Composition Bias" | ||
|
||
</div> | ||
|
||
|
||
## General principle behind normalisation | ||
|
||
* Normalization has two steps | ||
* Scaling | ||
* First get size factors or normalization factors | ||
* Usually one size factor per sample | ||
* Scale the counts by dividing the raw counts of each sample by its sample-specific size factor | ||
* Transformation: Transform the data after scaling | ||
* Per million | ||
* log2 | ||
* square root transformation | ||
* Pearson residuals (eg. sctransform) | ||
|
||
* Normalization removes technical variance but not biological variance | ||
* Normalization helps in making two samples comparable | ||
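
The two steps above can be sketched in base R. This is a minimal illustration, not the method used by any particular package: the count matrix is invented, and using library size relative to the mean library size as the size factor is an assumption made only to keep the example short.

```r
# Toy count matrix: 4 genes (rows) x 2 samples (columns); values are made up.
# sampleB was "sequenced" twice as deeply as sampleA.
counts <- matrix(c(10, 20, 30, 40,
                   20, 40, 60, 80),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("sampleA", "sampleB")))

# Step 1 (scaling): one size factor per sample, here the library size
# relative to the mean library size (an assumption for illustration)
lib_size <- colSums(counts)
size_factors <- lib_size / mean(lib_size)

# Divide each sample's raw counts by its own size factor
scaled <- sweep(counts, 2, size_factors, FUN = "/")

# Step 2 (transformation): log2 with a pseudocount of 1 to avoid log2(0)
log_scaled <- log2(scaled + 1)
```

After scaling, the two columns of `scaled` are identical, because the only difference between the toy samples was sequencing depth.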
|
||
|
||
## Normalization toy example | ||
|
||
<img src="images/normalisation_toy_example.png" class="centerimg" style="width: 90%"> | ||
|
||
|
||
## DESeq2 analysis workflow | ||
|
||
|
||
<div style="line-height: 50%;"><br></div> | ||
|
||
<img src="images/DESeq2_workflow_00.png" class="centerimg" style="width: 25%"> | ||
|
||
## DESeq2 Normalisation | ||
|
||
|
||
<div class="smalltext" style="margin-left: 25px"> | ||
1. Geometric mean is calculated for each gene across all samples. | ||
2. The counts for a gene in each sample are then divided by this mean. | ||
3. The median of these ratios in a sample is the size factor (normalization factor) for that sample. | ||
4. DESeq2 normalization corrects for library size and RNA composition bias. | ||
5. Composition bias arises, for example, when only a small number of genes are very highly expressed in one sample but not in the other. | ||
</div> | ||
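
Steps 1 to 3 above can be written out by hand in base R. The sketch below uses an invented count matrix with no zero counts and is meant only to show the arithmetic; in practice the size factors would come from DESeq2 itself. The last gene is much more highly expressed in `sampleB`, which mimics composition bias.

```r
# Toy counts (made up): genes 1-3 differ two-fold between samples,
# gene4 is strongly up in sampleB and inflates its library size
counts <- matrix(c(100, 200, 300,  400,
                   200, 400, 600, 8000),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("sampleA", "sampleB")))

# 1. Geometric mean of each gene across samples (computed on the log scale)
log_geo_means <- rowMeans(log(counts))

# 2. Ratio of each count to its gene's geometric mean (still on the log scale)
log_ratios <- sweep(log(counts), 1, log_geo_means, FUN = "-")

# 3. The median ratio within each sample is that sample's size factor
size_factors <- exp(apply(log_ratios, 2, median))

# Normalised counts: raw counts divided by the sample's size factor
normalised <- sweep(counts, 2, size_factors, FUN = "/")

# For comparison: the ratio of raw library sizes, which gene4 dominates
lib_ratio <- unname(colSums(counts)["sampleB"] / colSums(counts)["sampleA"])
```

Here the size factors differ two-fold, matching the majority of genes, whereas the raw library sizes differ by a much larger factor because of the single highly expressed gene.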
|
||
<img src="images/DESeq2_workflow_01.png" style="width: 15%; float: left"> | ||
|
||
<img src="images/GeometricScaling.svg" style="margin-left: 15%; width: 60%"> | ||
|
||
|
||
|
||
## Differential Expression | ||
|
||
Simple difference in means | ||
|
||
<img src="images/DifferenceInMeans.png" class="centerimg" style="width: 60%;"> | ||
|
||
<div style="text-align: right"> | ||
Replication introduces variation | ||
</div> | ||
|
||
## Differential Expression - Modelling population distributions | ||
<img src="images/06s_RNAseq_data_ana_gen_idea.png" class="centerimg" style="width: 90%"> | ||
|
||
* Normal (Gaussian) Distribution - t-test | ||
## Day 2 in detail | ||
|
||
* Two parameters - $mean$ and $sd$ ($sd^2 = variance$) | ||
* Experimental Design | ||
* General Statistical Principles | ||
* Statistics specific to RNAseq Differential Expression | ||
* The 'under the hood' of the DESeq2 workflow | ||
* How to run basic DESeq2 in R | ||
* Different linear models and how to choose the best for your experiment | ||
* How to use different models in R | ||
|
||
* Suitable for microarray data but not for RNAseq data | ||
## The Results Table | ||
|
||
<div style="width: 60%; margin-left: 16%; padding-top: 5px"> | ||
<img src="images/06s_ResultsTab.png" class="centerimg" style="width: 90%"> | ||
|
||
```{r diffInMeans, echo=FALSE, fig.width=7, fig.height=4} | ||
library(shape) | ||
x1 <- seq(0, 6, length=100) | ||
hx1 <- dnorm(x1, mean = 3, sd = 1) | ||
x2 <- seq(2, 12, length=100) | ||
hx2 <- dnorm(x2, mean = 7, sd = 1.5) | ||
par(bg=NA, mar=c(5, 4, 0, 4) + 0.1) | ||
|
||
plot(x1, hx1, type="l", lty=1, | ||
xlab="x value", ylab="Density", | ||
col="tomato", ylim=c(0, 0.6), xlim=c(0, 13)) | ||
lines(x2, hx2, type="l", col="steelblue") | ||
abline(v=3, col="tomato3", lty=2) | ||
abline(v=7, col="steelblue3", lty=2) | ||
Arrows(3.3, 0.5, 6.7, 0.5, code = 3, arr.type = "curved") | ||
``` | ||
</div> | ||
|
||
## Differential Expression - Modelling population distributions | ||
|
||
* Count data - Poisson distribution | ||
|
||
* One parameter - $mean$ $(\lambda)$ | ||
|
||
* $variance$ = $mean$ | ||
|
||
<div style="width: 60%; margin-left: 16%; padding-top: 5px"> | ||
```{r poissonDistr, echo=FALSE, fig.width=7, fig.height=4} | ||
x1 <- seq(0, 20) | ||
hx1 <- dpois(x1, lambda = 1) | ||
hx2 <- dpois(x1, lambda = 4) | ||
hx3 <- dpois(x1, lambda = 10) | ||
par(bg=NA, mar=c(5, 4, 0, 4) + 0.1) | ||
plot(x1, hx1, type="l", lty=1, | ||
xlab="k", ylab="P(X=k)") | ||
lines(x1, hx2, type="l") | ||
lines(x1, hx3, type="l") | ||
cols <- c("coral2", "darkgoldenrod1", "deepskyblue3") | ||
points(x1, hx1, bg=cols[1], pch=21) | ||
points(x1, hx2, bg=cols[2], pch=21) | ||
points(x1, hx3, bg=cols[3], pch=21) | ||
leg <- c(expression(paste(lambda, " = ", 1)), | ||
expression(paste(lambda, " = ", 4)), | ||
expression(paste(lambda, " = ", 10))) | ||
legend("topright", legend = leg, pt.bg = cols, pch=21, bty="n") | ||
``` | ||
</div> | ||
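
The property that the Poisson variance equals its mean can be checked numerically. The following is a small sketch using the same three values of lambda as the plot; the cut-off of 200 for `k` is an arbitrary choice that is large enough for the remaining probability mass to be negligible.

```r
lambdas <- c(1, 4, 10)
k <- 0:200  # support truncated at 200 (arbitrary, but ample for these lambdas)

# Mean and variance computed directly from the probability mass function
moments <- sapply(lambdas, function(lambda) {
  p <- dpois(k, lambda)
  m <- sum(k * p)
  v <- sum((k - m)^2 * p)
  c(mean = m, variance = v)
})
colnames(moments) <- paste0("lambda=", lambdas)

round(moments, 4)
```

Each column shows the same value for the mean and the variance, which is why a single parameter is enough to describe the distribution.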
|
||
## Differential Expression - Modelling population distributions | ||
|
||
<img src="images/DESeq2_workflow_02.png" style="width: 16%; float: left; | ||
margin-top: 40px"> | ||
|
||
|
||
<div style="width: 45%; float: left; | ||
margin-right: 10px; | ||
margin-left: 30px; | ||
margin-top: 40px"> | ||
|
||
* Use the Negative Binomial distribution | ||
|
||
* In the NB distribution the $mean$ is not equal to the $variance$ | ||
|
||
* Two parameters - $mean$ and $dispersion$ | ||
|
||
* $dispersion$ describes how $variance$ changes with $mean$ | ||
|
||
</div> | ||
|
||
<img src="images/NegativeBinomialDistribution.png" style="width: 33%; | ||
margin-top: 40px"> | ||
|
||
<div style="text-align: right"> | ||
Anders, S. & Huber, W. (2010) Genome Biology | ||
</div> | ||
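
As a sketch of how the dispersion links variance to mean: in the parameterisation used by DESeq2 the variance is $mean + dispersion \times mean^2$. Base R's `dnbinom()` takes `mu` and `size`, where `size` corresponds to `1 / dispersion`. The values of `mu` and `dispersion` below are arbitrary illustrations.

```r
mu <- 10           # mean count (arbitrary)
dispersion <- 0.5  # dispersion (arbitrary)

k <- 0:5000  # truncated support; the tail beyond this is negligible here
p <- dnbinom(k, mu = mu, size = 1 / dispersion)

# Mean and variance computed directly from the probability mass function
nb_mean <- sum(k * p)
nb_var  <- sum((k - nb_mean)^2 * p)

# Variance implied by the mean-dispersion relationship
expected_var <- mu + dispersion * mu^2
```

With a dispersion of zero the second term vanishes and the variance falls back to the Poisson case, where it equals the mean.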
|
||
## Differential Expression - estimating dispersion | ||
|
||
|
||
<img src="images/DESeq2_workflow_03.png" style="width: 16%; float: left; | ||
margin-top: 40px"> | ||
|
||
<div style="width: 40%; float: left; | ||
margin-right: 10px; | ||
margin-left: 30px; | ||
margin-top: 40px"> | ||
|
||
* Estimating the dispersion parameter can be difficult with a small number of samples | ||
|
||
* DESeq2 models the variance as the sum of technical and biological variance | ||
|
||
* Estimate dispersion for each gene | ||
|
||
* ‘Share’ dispersion information between genes to obtain fitted estimate | ||
|
||
* Shrink gene-wise estimates towards the fitted estimates | ||
|
||
</div> | ||
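
The idea of shrinking gene-wise estimates towards fitted values can be illustrated with a deliberately simplified toy. This is not DESeq2's actual procedure: the dispersion values are invented, and the fixed `weight` is a hypothetical stand-in for the amount of shrinkage that DESeq2 works out from the data.

```r
# Hypothetical gene-wise dispersion estimates and the fitted (trend) values
# for the same genes; all numbers are invented for illustration
gene_wise <- c(geneA = 0.02, geneB = 0.10, geneC = 0.80)
fitted    <- c(geneA = 0.10, geneB = 0.10, geneC = 0.10)

# Arbitrary weight given to the gene-wise estimate (an assumption of this toy)
weight <- 0.5

# Shrink on the log scale: each estimate moves part-way towards its fitted value
shrunk <- exp(weight * log(gene_wise) + (1 - weight) * log(fitted))

round(rbind(gene_wise, fitted, shrunk), 3)
```

Estimates far from the trend (`geneA`, `geneC`) are pulled towards it, while an estimate already on the trend (`geneB`) is unchanged.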
|
||
<img src="images/dispersion.png" style="width: 38%; margin-top: 40px"> | ||
|
||
|
||
## Differential Expression - worrying dispersion plot examples | ||
|
||
<!-- | ||
A note about these dispersion plots: | ||
I wrote the Harvard team and got the response below. This is basically what Dom | ||
surmised. For the second plot, Dom thinks it is conceivable that there could be | ||
nothing wrong with the data as such and that this pattern could arise if you | ||
had a particularly unusual treatment, perhaps resulting in extreme | ||
downregulation of a large cohort of genes and extreme upregulation of another | ||
large cohort of genes. Either way, in both cases the thing to do is not to | ||
worry about trying to interpret the problem from the dispersion plot, but to go | ||
back to the raw data and figure out what is unusual. | ||
From: Piper, Mary <[email protected]> | ||
Sent: 01 July 2020 01:19 | ||
To: Ashley Sawle <[email protected]> | ||
Cc: HSPH-HBCTraining <[email protected]> | ||
Subject: Re: A question about your RNAseq course from a fellow trainer | ||
Hi Ash, | ||
Glad that our materials are useful to you - we have converted the DGE materials | ||
to an online course format too, which is available at: | ||
https://hbctraining.github.io/DGE_workshop_salmon_online/schedule/. I added | ||
these dispersion plots a while ago, and I believe that the first plot was from | ||
data that was highly contaminated with rRNA. I think the rRNA was | ||
computationally removed prior to the analysis from a low input RNA-seq library | ||
back 3-4 years ago, but there were still large differences in the complexity of | ||
the samples (the data was a real mess). The second plot was from a student who | ||
had taken our course; I know the data was really weird in that it had very few | ||
genes with higher mean counts (it also had weird MA plot and poor clustering by | ||
PCA). However, since I had not analyzed the data, I only offered suggestions | ||
for looking into the dataset - I don't know if they were able to rescue their | ||
dataset (b/c I believe they also did not have any/many DE genes). So, the bad | ||
dispersion plot is likely due to the strange nature of their data with few | ||
genes with higher mean counts (so the dispersion could not be estimated as | ||
accurately across genes with higher mean counts) and/or affected by the outlier | ||
sample/s. | ||
Note that in the online materials, I have an additional bad dispersion plot in | ||
an exercise. This plot was from a pseudobulk scRNA-seq analysis - the data | ||
reflect a single cell type that had huge variations in the number of cells | ||
collapsed together per sample to generate the sample-level counts. Some samples | ||
had only a handful of cells, while other samples had thousands. Therefore, you | ||
can imagine the variation being quite large between samples of the same sample | ||
group. | ||
Hope this helps, and please let me know if you have additional questions. | ||
Best wishes, | ||
Mary | ||
--> | ||
|
||
<div><br></div> | ||
|
||
<img src="images/bad_dispersion.png" class="centerimg" style="width: 100%"> | ||
|
||
<div style="text-align: right;"> | ||
Bad dispersion plots from: https://github.com/hbctraining/DGE_workshop | ||
</div> | ||
|
||
## Differential Expression - linear models | ||
|
||
* Calculate coefficients describing change in gene expression | ||
|
||
* Linear Model $\rightarrow$ Generalized Linear Model | ||
|
||
<img src="images/DESeq2_workflow_04.png" style="width: 16%; float: left; | ||
padding-top: 5px"> | ||
|
||
<div style="width: 30%; margin-left: 20%; padding-top: 5px"> | ||
```{r, echo=FALSE, message=FALSE, warning=FALSE, fig.width=7, fig.height=4} | ||
library(tidyverse) | ||
dat <- data.frame(C1=rnorm(6, 4, 1), | ||
C2=rnorm(6, 6, 1.3)) %>% | ||
gather("Cat", "Expression") %>% | ||
mutate(Group=as.numeric(factor(Cat))) | ||
ewe <- lm(dat$Expression~dat$Group) | ||
par(bg=NA, mar=c(5, 4, 0, 4) + 0.1) | ||
plot(dat$Group, dat$Expression, | ||
pch=21, | ||
bg=rep(c("tomato", "steelblue"), each=6), | ||
xlim=c(0, 3), | ||
ylim=c(0, 8), xaxt="n", xlab="Group", ylab = "Expression") | ||
axis(1, at = 1:2) | ||
abline(h=5, lty=2, col="grey") | ||
abline(ewe, col="red") | ||
``` | ||
</div> | ||
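
To make "coefficients describing change in gene expression" concrete, here is a minimal sketch with an ordinary linear model in base R. The expression values and group labels are invented, and `lm()` is used only to show what the coefficients mean; it is not the count-based model that DESeq2 fits.

```r
# Made-up expression values for one gene in two groups of three samples
expression <- c(3.8, 4.1, 4.3, 5.9, 6.2, 6.1)
group <- factor(rep(c("control", "treated"), each = 3))

# Fit expression as a function of group; "control" is the reference level
fit <- lm(expression ~ group)
coefs <- coef(fit)

# The intercept is the mean of the reference group, and the group coefficient
# is the difference between the two group means
group_means <- tapply(expression, group, mean)
coefs
```

The `grouptreated` coefficient is the estimated change in expression between the groups, which is the quantity a differential expression test asks about.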
|
||
## Towards biological meaning - hierarchical clustering {#less_space_after_title} | ||
|
||
<div style="line-height: 50%;"><br></div> | ||
|
||
<img src="images/BioMeaning.svg" class="centerimg" style="width: 100%; | ||
display: block;"> | ||
|
||
## | ||
|
||
<div style="text-align: center; margin-top: 30%"> | ||
<span style="color: #2e3192; font-size: 80px">**Thank you**</span> | ||
</div> |
312 changes: 33 additions & 279 deletions
Markdowns/06_Introduction_to_RNAseq_Analysis_in_R.html
Large diffs are not rendered by default.