Commit
add scClassify vignette
Showing 7 changed files with 2,089 additions and 190 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,127 @@ | ||
--- | ||
title: "Cell type classification with scClassify" | ||
author: Yue Cao, | ||
Daniel Kim, | ||
Andy Tran, | ||
Dario Strbenac, | ||
Nicholas Robertson, | ||
Helen Fu, | ||
Jean Yang | ||
affiliation: | ||
- Sydney Precision Data Science Centre, University of Sydney, Australia; | ||
- School of Mathematics and Statistics, University of Sydney, Australia; | ||
- Faculty of Medicine and Health, University of Sydney, Australia; | ||
- Charles Perkins Centre, University of Sydney, Australia; | ||
date: 29 November, 2024 | ||
params: | ||
evalc: TRUE ## EDIT to TRUE when generating output, otherwise 'FALSE' | ||
show: 'hide' ## EDIT to 'as.is' when generating Suggestions, otherwise 'hide' | ||
output: | ||
html_document: | ||
css: https://use.fontawesome.com/releases/v5.0.6/css/all.css | ||
code_folding: hide | ||
fig_height: 12 | ||
fig_width: 12 | ||
toc: yes | ||
number_sections: false | ||
toc_depth: 3 | ||
toc_float: yes | ||
self_contained: true | ||
editor_options: | ||
markdown: | ||
wrap: 72 | ||
--- | ||
|
||
```{r setup, include=FALSE} | ||
knitr::opts_chunk$set(echo = TRUE, message=FALSE, warning= FALSE) | ||
``` | ||
|
||
|
||
|
||
```{r} | ||
library(scClassify) | ||
library(ggplot2) | ||
library(reshape2) | ||
``` | ||
|
||
|
||
## Overview | ||
|
||
scClassify classifies cells in single-cell RNA-sequencing data using one or more reference datasets. It takes a normalised (i.e., log2-transformed) expression matrix for the reference (training) data, together with its cell type labels, and a normalised expression matrix for the query (testing) data. | ||
|
||
For demonstration purposes, we use subsets of two single-cell pancreas datasets from independent studies (Xin et al. and Wang et al.). The Xin et al. subset serves as the training data and the Wang et al. subset as the testing data. | ||
|
||
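
The Overview says that scClassify expects log2-transformed input. As a minimal sketch of what such a transformation could look like when starting from raw counts, the block below scales a small simulated count matrix to counts per million and applies `log2(x + 1)`. The matrix `toy_counts`, the counts-per-million scaling, and the pseudocount of 1 are assumptions made for illustration; they are not part of the vignette or of scClassify itself, and the example datasets used here are loaded as provided by the package.

```r
# Hypothetical toy count matrix (genes in rows, cells in columns),
# simulated for illustration only; this is not the vignette's data.
set.seed(1)
toy_counts <- matrix(
  rpois(20, lambda = 5),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)

# Scale each cell (column) by its library size to counts per million.
# The CPM scaling is an assumed choice, not something scClassify prescribes.
lib_size <- colSums(toy_counts)
toy_cpm <- sweep(toy_counts, 2, lib_size, "/") * 1e6

# log2 transform with a pseudocount of 1 so that zero counts stay at zero.
toy_logcounts <- log2(toy_cpm + 1)
```

The result keeps the genes-by-cells layout of the input, which is the same orientation as the example matrices `exprsMat_xin_subset` and `exprsMat_wang_subset` used in the chunks that follow.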
|
||
## Loading the data | ||
|
||
```{r} | ||
data("scClassify_example") | ||
# training data | ||
training_celltype <- scClassify_example$xin_cellTypes | ||
training_data <- scClassify_example$exprsMat_xin_subset | ||
# testing data | ||
# here we also load the cell types of the testing data | ||
# so that we can compare the predicted labels against the true labels | ||
testing_celltype <- scClassify_example$wang_cellTypes | ||
testing_data <- scClassify_example$exprsMat_wang_subset | ||
``` | ||
|
||
|
||
## Running scClassify | ||
|
||
```{r fig.height=6, fig.width=6, warning=FALSE} | ||
scClassify_res <- scClassify(exprsMat_train = training_data, | ||
                             cellTypes_train = training_celltype, | ||
                             exprsMat_test = testing_data, | ||
                             cellTypes_test = testing_celltype, # or leave out if testing cell type unknown | ||
                             tree = "HOPACH", | ||
                             algorithm = "WKNN", | ||
                             selectFeatures = c("limma"), | ||
                             similarity = c("pearson"), | ||
                             returnList = FALSE, | ||
                             verbose = FALSE) | ||
``` | ||
|
||
## Checking result | ||
|
||
|
||
We can check the cell type tree generated from the reference data: | ||
|
||
```{r fig.height=4, fig.width=4} | ||
plotCellTypeTree(cellTypeTree(scClassify_res$trainRes)) | ||
``` | ||
|
||
Check the prediction results by cross-tabulating the predicted labels against the true labels. | ||
|
||
```{r} | ||
confusion_matrix <- table(scClassify_res$testRes$test$pearson_WKNN_limma$predRes, testing_celltype) | ||
confusion_matrix | ||
``` | ||
|
||
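
A confusion matrix like the one above can be summarised as a single accuracy figure. The sketch below shows one way to do this in base R on a small made-up example; the vectors `predicted` and `actual` and the label names are hypothetical and are not taken from the scClassify output. Predicted labels that have no counterpart among the true labels (here an assumed "unassigned" category) are counted as incorrect.

```r
# Hypothetical toy labels, for illustration only (not the vignette's data).
predicted <- c("alpha", "alpha", "beta", "beta", "delta", "unassigned")
actual    <- c("alpha", "alpha", "beta", "alpha", "delta", "beta")

# Rows are predicted labels, columns are true labels, as in the table above.
toy_confusion <- table(Predicted = predicted, Actual = actual)

# Restrict to labels present in both rows and columns so that the diagonal
# lines up; the table need not be square when extra predicted labels exist.
shared <- intersect(rownames(toy_confusion), colnames(toy_confusion))
n_correct <- sum(diag(toy_confusion[shared, shared, drop = FALSE]))

# Overall accuracy: correctly labelled cells divided by all cells.
accuracy <- n_correct / sum(toy_confusion)
```

Restricting to the shared labels before taking the diagonal matters because `diag()` on a non-square table would otherwise pair up rows and columns by position rather than by label.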
|
||
Visually inspect the prediction results as a heatmap of the confusion matrix. | ||
|
||
|
||
|
||
```{r fig.height=4, fig.width=6} | ||
# Convert the table into a data frame for ggplot | ||
conf_matrix_df <- as.data.frame(confusion_matrix) | ||
colnames(conf_matrix_df) <- c("Predicted", "Actual", "Count") | ||
# Create the heatmap | ||
ggplot(conf_matrix_df, aes(x = Actual, y = Predicted, fill = Count)) + | ||
  geom_tile(color = "white") + | ||
  scale_fill_gradient(low = "white", high = "steelblue") + | ||
  geom_text(aes(label = Count), color = "black") | ||
``` | ||
|