diff --git a/Markdowns/05_Data_Exploration.Rmd b/Markdowns/05_Data_Exploration.Rmd
index 464a09d..05ec424 100644
--- a/Markdowns/05_Data_Exploration.Rmd
+++ b/Markdowns/05_Data_Exploration.Rmd
@@ -382,7 +382,7 @@ replicate. If the PCA plot showed separation of samples by time, it might be
 worthwhile including time in the downstream analysis to account for the
 time-based effect.
 
-## Hierarchical clustering
+# Hierarchical clustering
 
 Earlier, we used principal component analysis to assess sources of variation in
 the data set and the relationship between the samples. Another method for
diff --git a/Markdowns/05_Data_Exploration.html b/Markdowns/05_Data_Exploration.html
index f5e9e58..efecf08 100644
--- a/Markdowns/05_Data_Exploration.html
+++ b/Markdowns/05_Data_Exploration.html
@@ -529,7 +529,7 @@

Discussion: shape = "TimePoint", size = 5) + geom_text_repel(aes(x = PC1, y = PC2, label = SampleName), box.padding = 0.8) -

+

The mislabelled samples are SRR7657882, which is labelled as Infected but should be Uninfected, and SRR7657873, which is labelled as Uninfected but should be Infected. Let’s fix the sample sheet.

We’re going to use another dplyr command, mutate.

sampleinfo <- mutate(sampleinfo, Status = case_when(
@@ -549,8 +549,9 @@ 

Discussion:

Also, there appears to be a strong difference between days 11 and 33 post-infection for the infected group, but the day 11 and day 33 samples for the uninfected group are mixed together.

Clustering in the PCA plot can be used to motivate changes to the design matrix in light of potential batch effects. For example, imagine that the first replicate of each group was prepared at a separate time from the second replicate. If the PCA plot showed separation of samples by time, it might be worthwhile including time in the downstream analysis to account for the time-based effect.
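
As a minimal sketch of that idea, the following builds a design matrix that includes a preparation batch alongside infection status. The data frame `toy_samples`, its `PrepBatch` column, and the model formula are hypothetical illustrations, not part of the course's sample sheet or analysis.

```r
# Hypothetical sample sheet: two replicates per group, where the first
# replicate of each group was prepared in batch 1 and the second in batch 2.
toy_samples <- data.frame(
  SampleName = paste0("sample_", 1:4),
  Status     = factor(c("Infected", "Infected", "Uninfected", "Uninfected")),
  PrepBatch  = factor(c("1", "2", "1", "2"))
)

# Additive model: PrepBatch is included so that the batch effect is
# accounted for when estimating the effect of Status.
design <- model.matrix(~ PrepBatch + Status, data = toy_samples)
print(design)
```

Putting the batch term first is only a convention here; the point is that the batch variable becomes a column of the design matrix rather than being left as unexplained variation.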

-
-

Hierarchical clustering

+
+
+

Hierarchical clustering

Earlier, we used principal component analysis to assess sources of variation in the data set and the relationship between the samples. Another method for looking at the relationship between the samples is to run hierarchical clustering based on the Euclidean distance between the samples. Hierarchical clustering can often provide a clearer view of the clustering of the different sample groups than other methods such as PCA.

We will use the package ggdendro to plot the clustering results using the function ggdendrogram.

library(ggdendro)
@@ -567,7 +568,6 @@ 

Hierarchical clustering

We can see from this that the infected and uninfected samples cluster separately and that day 11 and day 33 samples cluster separately for infected samples, but not for uninfected samples.
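
The clustering approach described in this section can be sketched on toy data as follows. This is a rough illustration under stated assumptions, not the course's own code: `toy_counts` is a hypothetical matrix of simulated values standing in for transformed counts, and the choice of two clusters in `cutree` is made only for the example.

```r
# Hypothetical data: 100 "genes" in rows, 6 samples in columns.
set.seed(42)
toy_counts <- matrix(rnorm(600), nrow = 100, ncol = 6,
                     dimnames = list(NULL, paste0("sample_", 1:6)))
# Shift the last three samples so that two sample groups exist.
toy_counts[, 4:6] <- toy_counts[, 4:6] + 3

# dist() computes distances between rows, so transpose to compare samples.
sample_dist <- dist(t(toy_counts), method = "euclidean")

# Hierarchical clustering (hclust defaults to complete linkage).
hc <- hclust(sample_dist)

# Cut the tree into two groups to inspect the cluster membership.
groups <- cutree(hc, k = 2)
print(groups)

# Plot the dendrogram with ggdendro if the package is installed.
if (requireNamespace("ggdendro", quietly = TRUE)) {
  print(ggdendro::ggdendrogram(hc, rotate = TRUE))
}
```

Samples that join low in the dendrogram are close in Euclidean distance, which is how the grouping by infection status and time point is read off the plot.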


-

References

diff --git a/Markdowns/05_Data_Exploration.pdf b/Markdowns/05_Data_Exploration.pdf
index 47e1adc..5498bae 100644
Binary files a/Markdowns/05_Data_Exploration.pdf and b/Markdowns/05_Data_Exploration.pdf differ