add scClassify vignette

SydneyBioX · Nov 29, 2024 · d7b85a8
1 parent fe4693d
Showing 7 changed files with 2,089 additions and 190 deletions.
diff --git a/README.md b/README.md
@@ -3,6 +3,6 @@
 
 Data can be accessed from dropbox: https://www.dropbox.com/scl/fi/6icd5vix870uoffv9p3zb/data.zip?rlkey=hu1tvpbdg0msykrud05hbclj6&st=2qbsk235&dl=0
 
-Website at https://sydneybiox.github.io/HKU_SCDNEY_2024/
+Website at https://sydneybiox.github.io/HKUST_workshop/
 
 Slide at https://www.dropbox.com/scl/fi/3f9wxsd4rnq3a5mf44nwc/HKU_Workshop2024_v1_morning.pptx?rlkey=gq8xjeuktayokkie0q1ogeq8s&dl=0
diff --git a/vignettes/VisiumVersion3.Rmd b/vignettes/VisiumVersion3.Rmd
@@ -1,6 +1,7 @@
 ---
 title: "Unlocking single cell spatial omics analyses with scdney - Visium"
 author: Yue Cao,
+        Daniel Kim,
         Andy Tran,
         Dario Strbenac,
         Nicholas Robertson
@@ -10,7 +11,7 @@ affiliation:
   - School of Mathematics and Statistics, University of Sydney, Australia;     
   - Charles Perkins Centre, University of Sydney, Australia;   
   - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China.  
-date: 15 October, 2023
+date: 29 November, 2024
 
 params:
   evalc: TRUE   ## EDIT to TRUE when generating output, otherwise 'FALSE'

diff --git a/vignettes/VisiumVersion3.html b/vignettes/VisiumVersion3.html
diff --git a/vignettes/breastCancerIMC.Rmd b/vignettes/breastCancerIMC.Rmd
@@ -1,8 +1,8 @@
 ---
 title: "Unlocking single cell spatial omics analyses with scdney"
 author: Yue Cao,
-        Andy Tran,
         Daniel Kim,
+        Andy Tran,
         Dario Strbenac,
         Nicholas Robertson,
         Helen Fu,
@@ -12,7 +12,7 @@ affiliation:
   - School of Mathematics and Statistics, University of Sydney, Australia;
   - Faculty of Medicine and Health, University of Sydney, Australia;     
   - Charles Perkins Centre, University of Sydney, Australia;   
-date: 24 July, 2024
+date: 29 November, 2024
 params:
   evalc: TRUE   ## EDIT to TRUE when generating output, otherwise 'FALSE'
   show: 'hide'  ## EDIT to 'as.is' when generating Suggestions, otherwise 'hide'

diff --git a/vignettes/breastCancerIMC.html b/vignettes/breastCancerIMC.html
diff --git a/vignettes/scClassify.Rmd b/vignettes/scClassify.Rmd
@@ -0,0 +1,127 @@
+---
+title: "Cell type classification with scClassify"
+author: Yue Cao,
+        Daniel Kim,
+        Andy Tran,
+        Dario Strbenac,
+        Nicholas Robertson,
+        Helen Fu,
+        Jean Yang
+affiliation:
+  - Sydney Precision Data Science Centre, University of Sydney, Australia;    
+  - School of Mathematics and Statistics, University of Sydney, Australia;
+  - Faculty of Medicine and Health, University of Sydney, Australia;     
+  - Charles Perkins Centre, University of Sydney, Australia;   
+date: 29 November, 2024
+params:
+  evalc: TRUE   ## EDIT to TRUE when generating output, otherwise 'FALSE'
+  show: 'hide'  ## EDIT to 'as.is' when generating Suggestions, otherwise 'hide'
+output:
+  html_document:
+    css: https://use.fontawesome.com/releases/v5.0.6/css/all.css
+    code_folding: hide
+    fig_height: 12
+    fig_width: 12
+    toc: yes
+    number_sections: false
+    toc_depth: 3
+    toc_float: yes
+    self_contained: true
+editor_options: 
+  markdown: 
+    wrap: 72
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
+
+```
+
+
+
+```{r}
+library(scClassify)
+library(ggplot2)
+library(reshape2)
+
+```
+
+
+## Overview
+
+scClassify classifies cells in single-cell RNA-sequencing data using one or more labelled reference datasets. It takes as input a log-transformed, normalised expression matrix for the training (reference) data and one for the testing (query) data.
+
+For demonstration purposes, we use subsets of two independent single-cell pancreas datasets (Wang et al. and Xin et al.): the Xin et al. subset serves as the training data and the Wang et al. subset as the testing data.
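+
+scClassify expects log-transformed, normalised expression as input. If you start from raw counts instead, one common preprocessing choice (an assumption here, not a step prescribed by scClassify) is to scale each cell to counts per million and then apply `log2(x + 1)`. The sketch below uses a hypothetical helper, `log_normalise()`, and assumes a dense matrix with genes as rows and cells as columns.
+
+```{r}
+# Hypothetical helper (illustration only): library-size normalisation to
+# counts per million (CPM), followed by a log2(x + 1) transformation.
+# Assumes a dense matrix with genes as rows and cells as columns.
+log_normalise <- function(counts, scale_factor = 1e6) {
+  lib_size <- colSums(counts)                               # total counts per cell
+  scaled <- sweep(counts, 2, lib_size, "/") * scale_factor  # CPM per cell
+  log2(scaled + 1)
+}
+
+# Toy counts: 3 genes x 2 cells (made-up numbers, not the pancreas data)
+toy_counts <- matrix(c(10, 0, 90, 5, 5, 0), nrow = 3,
+                     dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2)))
+log_normalise(toy_counts)
+```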
+
+
+## Loading the data 
+
+```{r}
+ 
+data("scClassify_example")
+
+# training data
+training_celltype <- scClassify_example$xin_cellTypes
+training_data <- scClassify_example$exprsMat_xin_subset
+ 
+# testing data
+# the true cell types of the testing data are also loaded so that
+# the predicted labels can be compared against them
+testing_celltype <- scClassify_example$wang_cellTypes 
+testing_data <- scClassify_example$exprsMat_wang_subset
+ 
+```
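+
+Before training, it can help to confirm that the inputs line up: one label per training cell, and a reasonable number of genes shared between the training and testing matrices. The helper `check_inputs()` below is hypothetical (it is not part of scClassify) and assumes genes are stored as row names of genes x cells matrices.
+
+```{r}
+# Hypothetical helper (not part of scClassify): basic input checks.
+# Assumes expression matrices are genes x cells with gene names as row names.
+check_inputs <- function(train_mat, train_labels, test_mat) {
+  stopifnot(ncol(train_mat) == length(train_labels))  # one label per cell
+  shared <- intersect(rownames(train_mat), rownames(test_mat))
+  c(n_train_cells = ncol(train_mat),
+    n_test_cells = ncol(test_mat),
+    n_shared_genes = length(shared))
+}
+
+# Toy matrices for illustration; with the data above one could instead call
+# check_inputs(training_data, training_celltype, testing_data)
+toy_train <- matrix(1, nrow = 3, ncol = 2, dimnames = list(c("INS", "GCG", "SST"), NULL))
+toy_test  <- matrix(1, nrow = 3, ncol = 4, dimnames = list(c("INS", "GCG", "PPY"), NULL))
+check_inputs(toy_train, c("beta", "alpha"), toy_test)
+```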
+
+
+## Running scClassify 
+
+```{r fig.height=6, fig.width=6, warning=FALSE}
+scClassify_res <- scClassify(exprsMat_train = training_data,
+                             cellTypes_train = training_celltype,
+                             exprsMat_test = testing_data,
+                             cellTypes_test = testing_celltype, # optional: omit if the testing cell types are unknown
+                             tree = "HOPACH",
+                             algorithm = "WKNN",
+                             selectFeatures = c("limma"),
+                             similarity = c("pearson"),
+                             returnList = FALSE,
+                             verbose = FALSE)
+```
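+
+The call above builds the cell type tree with HOPACH (`tree = "HOPACH"`), selects informative genes with limma (`selectFeatures = "limma"`) and classifies the testing cells with a weighted k-nearest-neighbour classifier (`algorithm = "WKNN"`) using Pearson correlation as the similarity (`similarity = "pearson"`). For intuition about the last step only, here is a deliberately simplified, flat sketch of a correlation-weighted kNN vote for a single query cell. It is not scClassify's implementation, which classifies cells step by step down the cell type tree and can leave cells unassigned; `wknn_vote()`, `k = 3` and the toy data are assumptions for illustration.
+
+```{r}
+# Simplified, flat illustration of a correlation-weighted kNN vote.
+# NOT scClassify's implementation (which works down the cell type tree,
+# uses selected genes, and may return "unassigned").
+wknn_vote <- function(query, ref_mat, ref_labels, k = 3) {
+  # Pearson correlation between the query cell and every reference cell
+  # (ref_mat is genes x cells, matching the layout of the scClassify inputs)
+  sims <- apply(ref_mat, 2, function(ref_cell) cor(query, ref_cell, method = "pearson"))
+  nn <- order(sims, decreasing = TRUE)[seq_len(k)]  # k most similar cells
+  # Each neighbour votes for its label with weight equal to its correlation
+  votes <- tapply(sims[nn], ref_labels[nn], sum)
+  names(votes)[which.max(votes)]
+}
+
+# Toy reference: 5 genes x 6 cells; typeA is high in genes 1-2, typeB in genes 3-4
+set.seed(1)
+toy_ref <- cbind(matrix(c(5, 5, 0, 0, 1), nrow = 5, ncol = 3),
+                 matrix(c(0, 0, 5, 5, 1), nrow = 5, ncol = 3)) +
+  matrix(rnorm(30, sd = 0.1), nrow = 5)
+toy_labels <- rep(c("typeA", "typeB"), each = 3)
+wknn_vote(c(4.8, 5.1, 0.2, 0.1, 1), toy_ref, toy_labels, k = 3)
+```
+
+Weighting votes by similarity means that, among the k neighbours, closer reference cells count for more than distant ones, rather than every neighbour counting equally.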
+
+## Checking the results
+
+
+We can inspect the cell type tree that scClassify built from the training (reference) data:
+
+```{r fig.height=4, fig.width=4}
+plotCellTypeTree(cellTypeTree(scClassify_res$trainRes))
+```
+
+Next, we compare the predicted cell types (rows) with the true cell types of the testing data (columns) in a confusion matrix:
+
+```{r}
+
+confusion_matrix <- table(scClassify_res$testRes$test$pearson_WKNN_limma$predRes, testing_celltype)
+confusion_matrix
+
+```
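+
+The confusion matrix may contain predicted labels that are not among the true cell types, for example cells that scClassify leaves unassigned. A compact numeric summary is the overall accuracy together with the accuracy within each true cell type, treating any such label as a mismatch. The helper `summarise_predictions()` below is hypothetical and is shown on made-up toy labels.
+
+```{r}
+# Hypothetical helper: overall and per-cell-type accuracy.
+# Any predicted label that differs from the true label (including
+# "unassigned" or intermediate labels) counts as incorrect.
+summarise_predictions <- function(predicted, truth) {
+  predicted <- as.character(predicted)
+  truth <- as.character(truth)
+  stopifnot(length(predicted) == length(truth))
+  correct <- predicted == truth
+  list(overall = mean(correct),
+       per_cell_type = tapply(correct, truth, mean))
+}
+
+# Toy labels (made up for illustration)
+toy_pred  <- c("alpha", "alpha", "beta", "unassigned", "delta", "beta")
+toy_truth <- c("alpha", "beta",  "beta", "delta",      "delta", "beta")
+summarise_predictions(toy_pred, toy_truth)
+```
+
+With the objects above, the same summary could be obtained with `summarise_predictions(scClassify_res$testRes$test$pearson_WKNN_limma$predRes, testing_celltype)`.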
+
+
+We can also inspect the prediction results visually as a heatmap of the confusion matrix:
+
+
+
+```{r fig.height=4, fig.width=6}
+ 
+# Convert the table into a data frame for ggplot
+conf_matrix_df <- as.data.frame(confusion_matrix)
+colnames(conf_matrix_df) <- c("Predicted", "Actual", "Count")
+
+# Create the heatmap
+ggplot(conf_matrix_df, aes(x = Actual, y = Predicted, fill = Count)) +
+  geom_tile(color = "white") +
+  scale_fill_gradient(low = "white", high = "steelblue") +
+  geom_text(aes(label = Count), color = "black")
+ 
+```
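+
+Because the testing cell types differ in size, raw counts can make small cell types hard to read in the heatmap. One option is to plot, for each true cell type, the proportion of its cells assigned to each predicted label. Base R's `prop.table(..., margin = 2)` scales each column of a table to sum to 1; the toy table below is made up for illustration.
+
+```{r}
+# Column-wise proportions: each true (Actual) cell type sums to 1
+toy_cm <- table(Predicted = c("alpha", "alpha", "beta", "unassigned", "beta"),
+                Actual    = c("alpha", "beta",  "beta", "beta",       "beta"))
+prop_cm <- prop.table(toy_cm, margin = 2)
+round(prop_cm, 2)
+# The same idea applies to the confusion matrix above, e.g.
+# as.data.frame(prop.table(confusion_matrix, margin = 2)) as heatmap input
+```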
+
diff --git a/vignettes/scClassify.html b/vignettes/scClassify.html