generated from carpentries/workbench-template-rmd
-
-
Notifications
You must be signed in to change notification settings - Fork 2
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
1 changed file
with
145 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,145 @@ | ||
--- | ||
title: Introduction to Bioconductor and the SingleCellExperiment class | ||
teaching: 10 # Minutes of teaching in the lesson | ||
exercises: 2 # Minutes of exercises in the lesson | ||
--- | ||
|
||
:::::::::::::::::::::::::::::::::::::: questions | ||
|
||
- What is Bioconductor? | ||
- How is single-cell data store in the Bioconductor ecosystem? | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::: objectives | ||
|
||
- TODO | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
# Setup | ||
|
||
```{r setup, message = FALSE, warning=FALSE} | ||
library("SummarizedExperiment") | ||
library("SingleCellExperiment") | ||
library("MouseGastrulationData") | ||
``` | ||
|
||
# The `SingleCellExperiment` class | ||
|
||
One of the main strengths of the Bioconductor project lies in the use of a common data infrastructure that powers interoperability across packages. | ||
|
||
Users should be able to analyze their data using functions from different Bioconductor packages without the need to convert between formats. To this end, the `SingleCellExperiment` class (from the _SingleCellExperiment_ package) serves as the common currency for data exchange across 70+ single-cell-related Bioconductor packages. | ||
|
||
This class implements a data structure that stores all aspects of our single-cell data - gene-by-cell expression data, per-cell metadata and per-gene annotation - and manipulate them in a synchronized manner. | ||
|
||
```{r, echo=FALSE} | ||
knitr::include_graphics("http://bioconductor.org/books/3.17/OSCA.intro/images/SingleCellExperiment.png") | ||
``` | ||
|
||
Let's start with an example dataset. | ||
|
||
```{r, message = FALSE} | ||
sce <- WTChimeraData(samples=5) | ||
sce | ||
``` | ||
|
||
We can think of this (and other) class as a _container_, that contains several different pieces of data in so-called _slots_. | ||
|
||
The _getter_ methods are used to extract information from the slots and the _setter_ methods are used to add information into the slots. These are the only ways to interact with the objects (rather than directly accessing the slots). | ||
|
||
Depending on the object, slots can contain different types of data (e.g., numeric matrices, lists, etc.). We will here review the main slots of the SingleCellExperiment class as well as their getter/setter methods. | ||
|
||
## The `assays` | ||
|
||
This is arguably the most fundamental part of the object that contains the count matrix, and potentially other matrices with transformed data. We can access the _list_ of matrices with the `assays` function and individual matrices with the `assay` function. If one of these matrices is called "counts", we can use the special `counts` getter (and the analogous `logcounts`). | ||
|
||
```{r} | ||
identical(assay(sce), counts(sce)) | ||
counts(sce)[1:3, 1:3] | ||
``` | ||
|
||
You will notice that in this case we have a sparse matrix of class "dgTMatrix" inside the object. More generally, any "matrix-like" object can be used, e.g., dense matrices or HDF5-backed matrices (see "Working with large data"). | ||
|
||
## The `colData` and `rowData` | ||
|
||
Conceptually, these are two data frames that annotate the columns and the rows of your assay, respectively. | ||
|
||
One can interact with them as usual, e.g., by extracting columns or adding additional variables as columns. | ||
|
||
```{r} | ||
colData(sce) | ||
rowData(sce) | ||
``` | ||
|
||
Note the `$` short cut. | ||
|
||
```{r} | ||
identical(colData(sce)$sum, sce$sum) | ||
sce$my_sum <- colSums(counts(sce)) | ||
colData(sce) | ||
``` | ||
|
||
## The `reducedDims` | ||
|
||
Everything that we have described so far (except for the `counts` getter) is part of the `SummarizedExperiment` class that SingleCellExperiment extends. | ||
|
||
One of the peculiarity of SingleCellExperiment is its ability to store reduced dimension matrices within the object. These may include PCA, t-SNE, UMAP, etc. | ||
|
||
```{r} | ||
reducedDims(sce) | ||
``` | ||
|
||
As for the other slots, we have the usual setter/getter, but it is somewhat rare to interact directly with these functions. | ||
|
||
It is more common for other functions to _store_ this information in the object, e.g., the `runPCA` function from the `scater` package. | ||
|
||
Here, we use `scater`'s `plotReducedDim` function as an example of how to extract this information _indirectly_ from the objects. Note that one could obtain the same results (somewhat less efficiently) by extracting the corresponding `reducedDim` matrix and `ggplot`. | ||
|
||
```{r} | ||
library(scater) | ||
plotReducedDim(sce, "pca.corrected.E8.5", colour_by = "celltype.mapped") | ||
``` | ||
|
||
:::::::::::::::::::::::::::::::::: challenge | ||
|
||
#### Exercise 1 | ||
|
||
Create a `SingleCellExperiment` object: Try and create a SingleCellExperiment object "from scratch". Start from a matrix (either randomly generated or with some fake data in it) and add one or more columns as colData. | ||
|
||
:::::::::::::: hint | ||
|
||
The `SingleCellExperiment` function can be used to create a new SingleCellExperiment object | ||
|
||
::::::::::::::::::::::: | ||
|
||
:::::::::::::: solution | ||
|
||
TODO | ||
::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
:::::::::::::::::::::::::::::::::: challenge | ||
|
||
#### Exercise 2 | ||
|
||
Combining two objects: The `MouseGastrulationData` package contains several datasets. Download sample 6 of the chimera experiment by running `sce6 <- WTChimeraData(sample=6)`. Use the `cbind` function to combine the new data with the `sce` object created before. | ||
|
||
:::::::::::::: solution | ||
|
||
TODO | ||
::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
|
||
::::::::::::::::::::::::::::::::::::: keypoints | ||
|
||
- TODO | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
# Further Reading | ||
|
||
* OSCA book, [Introduction](https://bioconductor.org/books/release/OSCA.intro) |