Updated further reading and heading levels in all lessons
csmagnano committed May 3, 2024
1 parent ad7e228 commit f667320
Showing 5 changed files with 91 additions and 80 deletions.
33 changes: 19 additions & 14 deletions episodes/cell_type_annotation.Rmd
@@ -23,7 +23,7 @@ exercises: 15 # Minutes of exercises in the lesson
::::::::::::::::::::::::::::::::::::::::::::::::


# Setup
## Setup

```{r setup, message = FALSE}
library(BiocStyle)
@@ -35,7 +35,7 @@ library(scater)
library(scran)
```

# Data retrieval
## Data retrieval

```{r data, message = FALSE}
sce <- WTChimeraData(samples = 5, type = "processed")
@@ -50,14 +50,14 @@ ind <- sample(ncol(sce), 1000)
sce <- sce[,ind]
```

# Preprocessing
## Preprocessing

```{r preproc, warning = FALSE}
sce <- logNormCounts(sce)
sce <- runPCA(sce)
```

# Clustering
## Clustering

Clustering is an unsupervised learning procedure that is used to empirically
define groups of cells with similar expression profiles.
@@ -104,7 +104,7 @@ sce <- runUMAP(sce, dimred = "PCA")
plotReducedDim(sce, "UMAP", color_by = "label")
```

# Marker gene detection
## Marker gene detection

To interpret clustering results as obtained in the previous section, we identify
the genes that drive separation between clusters. These marker genes allow us to
@@ -156,7 +156,7 @@ top.markers <- head(rownames(markers[[1]]))
plotExpression(sce, features = top.markers, x = "label", color_by = "label")
```

# Cell type annotation
## Cell type annotation

The most challenging task in scRNA-seq data analysis is arguably the
interpretation of the results.
@@ -182,7 +182,7 @@ reference datasets where each sample or cell has already been annotated with its
putative biological state by domain experts.
Here, we will demonstrate both approaches on the wild-type chimera dataset.

## Assigning cell labels from reference data
### Assigning cell labels from reference data

A conceptually straightforward annotation approach is to compare the single-cell
expression profiles with previously annotated reference datasets.
@@ -303,7 +303,7 @@ tab <- table(res$pruned.labels, sce$celltype.mapped)
pheatmap(log2(tab + 10), color = colorRampPalette(c("white", "blue"))(101))
```

## Assigning cell labels from gene sets
### Assigning cell labels from gene sets

A related strategy is to explicitly identify sets of marker genes that are highly
expressed in each individual cell.
@@ -397,19 +397,15 @@ a fitted three-component mixture, and the grey curve represents a fitted normal
distribution. Vertical lines represent threshold estimates corresponding to each
estimate of the distribution.

# Session Info
## Session Info

```{r sessionInfo}
sessionInfo()
```

# Further Reading

* OSCA book, [Chapters 5-7](https://bioconductor.org/books/release/OSCA.basic/clustering.html)
* Assigning cell types with SingleR ([the book](https://bioconductor.org/books/release/SingleRBook/)).
* The [AUCell](https://bioconductor.org/packages/AUCell) package vignette.

# Exercises
## Exercises

:::::::::::::::::::::::::::::::::: challenge

@@ -484,6 +480,15 @@ TODO

:::::::::::::::::::::::::::::::::::::::::::::

:::::::::::::: checklist
## Further Reading

* OSCA book, [Chapters 5-7](https://bioconductor.org/books/release/OSCA.basic/clustering.html)
* Assigning cell types with SingleR ([the book](https://bioconductor.org/books/release/SingleRBook/)).
* The [AUCell](https://bioconductor.org/packages/AUCell) package vignette.

::::::::::::::

::::::::::::::::::::::::::::::::::::: keypoints

- TODO
42 changes: 21 additions & 21 deletions episodes/hca.Rmd
@@ -18,7 +18,7 @@ exercises: 10 # Minutes of exercises in the lesson

::::::::::::::::::::::::::::::::::::::::::::::::

# HCA Project
## HCA Project

The Human Cell Atlas (HCA) is a large project that aims to learn from and map
every cell type in the human body. The project extracts spatial and molecular
@@ -27,15 +27,15 @@ international collaborative that charts healthy cells in the human body at all
ages. There are about 37.2 trillion cells in the human body. To read more about
the project, head over to their website at https://www.humancellatlas.org.

# CELLxGENE
## CELLxGENE

CELLxGENE is a database and a suite of tools that help scientists to find,
download, explore, analyze, annotate, and publish single cell data. It includes
several analytic and visualization tools to help you to discover single cell
data patterns. To see the list of tools, browse to
https://cellxgene.cziscience.com/.

# CELLxGENE | Census
## CELLxGENE | Census

The Census provides efficient computational tooling to access, query, and
analyze all single-cell RNA data from CZ CELLxGENE Discover. Using a new access
@@ -44,7 +44,7 @@ through TileDB-SOMA, or get slices in AnnData or Seurat objects, thus
accelerating your research by significantly minimizing data harmonization at
https://chanzuckerberg.github.io/cellxgene-census/.

# The CuratedAtlasQueryR Project
## The CuratedAtlasQueryR Project

To systematically characterize the immune system across tissues, demographics
and multiple studies, single cell transcriptomics data was harmonized from the
@@ -71,7 +71,7 @@ accessing atlas-level datasets programmatically and reproducibly.

![](figures/curatedAtlasQuery.png)

Check warning on line 72 in episodes/hca.Rmd (GitHub Actions / Build markdown source files if valid): [image missing alt-text]: figures/curatedAtlasQuery.png

# Data Sources in R / Bioconductor
## Data Sources in R / Bioconductor

There are a few options to access single cell data with R / Bioconductor.

@@ -81,7 +81,7 @@ There are a few options to access single cell data with R / Bioconductor.
| [cellxgenedp](https://bioconductor.org/packages/cellxgenedp) | [CellxGene](https://cellxgene.cziscience.com/) | Human and mouse SC data including HCA |
| [CuratedAtlasQueryR](https://stemangiola.github.io/CuratedAtlasQueryR/) | [CellxGene](https://cellxgene.cziscience.com/) | fine-grained query capable CELLxGENE data including HCA |

# Installation
## Installation

```{r, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
@@ -90,14 +90,14 @@ if (!requireNamespace("BiocManager", quietly = TRUE))
BiocManager::install("CuratedAtlasQueryR")
```

# Package load
## Package load

```{r, include = TRUE, results = "hide", message = FALSE, warning = FALSE}
library(CuratedAtlasQueryR)
library(dplyr)
```

# HCA Metadata
## HCA Metadata

The metadata allows the user to get a lay of the land of what is available
via the package. In this example, we are using the sample database URL which
@@ -115,7 +115,7 @@ metadata |>
glimpse()
```

# A note on the piping operator
## A note on the piping operator

The vignette materials provided by `CuratedAtlasQueryR` show the use of the
'native' R pipe (introduced in R version `4.1.0`). For those not familiar
@@ -136,7 +136,7 @@ iris |>
aggregate(. ~ Species, data = _, mean)
```
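
As a rough illustration of what the pipe does (a sketch using only base R and the built-in `iris` data, not taken from the `CuratedAtlasQueryR` vignette): `x |> f(y)` is evaluated as `f(x, y)`, so a piped chain and the equivalent nested call should give the same result. The variable names `piped` and `nested` are made up for this example.

```r
# Requires R >= 4.1.0 for the native pipe `|>`.
# Piped form: the left-hand side becomes the first argument of the next call.
piped <- iris |>
  subset(Species == "setosa") |>
  nrow()

# Equivalent nested form, read from the inside out.
nested <- nrow(subset(iris, Species == "setosa"))

# Both forms count the same rows.
identical(piped, nested)
```

The `_` placeholder used in the `aggregate()` call above needs R 4.2.0 or later and has to be passed to a named argument (here `data = _`).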

# Summarizing the metadata
## Summarizing the metadata

For each distinct tissue and dataset combination, count the number of datasets
by tissue type.
@@ -147,36 +147,36 @@ metadata |>
count(tissue)
```
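
To make the "distinct, then count" pattern concrete, here is a small base R analogue on a toy data frame. It is only a sketch: `toy`, `pairs`, and `n_datasets` are hypothetical names, the values are invented, and the real metadata table is queried with `dplyr` verbs rather than with `unique()` and `table()`.

```r
# Toy per-cell metadata: several cells can share one tissue/dataset pair.
toy <- data.frame(
  tissue     = c("blood", "blood", "blood", "lung", "lung"),
  dataset_id = c("d1",    "d1",    "d2",    "d3",   "d3")
)

# Step 1: keep each tissue/dataset combination once
# (the role played by distinct() in the lesson code).
pairs <- unique(toy[, c("tissue", "dataset_id")])

# Step 2: count how many datasets remain per tissue
# (the role played by count(tissue)).
n_datasets <- table(pairs$tissue)
n_datasets
```

Deduplicating first matters: counting rows of `toy` directly would count cells per tissue, not datasets per tissue.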

# Columns available in the metadata
## Columns available in the metadata

```{r, message = FALSE}
head(names(metadata), 10)
```

# Available assays
## Available assays

```{r}
metadata |>
distinct(assay, dataset_id) |>
count(assay)
```

# Available organisms
## Available organisms

```{r}
metadata |>
distinct(organism, dataset_id) |>
count(organism)
```

## Download single-cell RNA sequencing counts
### Download single-cell RNA sequencing counts

The data can be provided as either "counts" or counts per million "cpm" as given
by the `assays` argument in the `get_single_cell_experiment()` function. By
default, the `SingleCellExperiment` provided will contain only the 'counts'
data.
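
For readers unfamiliar with counts per million, the following base R sketch shows the usual definition on an invented 3 x 2 count matrix. The names `counts`, `lib_size`, and `cpm` are placeholders for this example, and this is not necessarily how `CuratedAtlasQueryR` computes its "cpm" assay internally.

```r
# Toy count matrix: rows are genes, columns are cells (values are made up).
counts <- matrix(
  c(10, 0, 90,
     5, 5, 40),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2"))
)

# Library size: total counts observed in each cell.
lib_size <- colSums(counts)

# Divide every column by its library size, then scale to one million.
cpm <- sweep(counts, 2, lib_size, "/") * 1e6
cpm
```

After this scaling every cell sums to one million, which is what makes values for a single gene comparable across cells with different sequencing depths.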

### Query raw counts
#### Query raw counts

```{r, message = FALSE}
single_cell_counts <-
@@ -192,7 +192,7 @@ single_cell_counts <-
single_cell_counts
```

### Query counts scaled per million
#### Query counts scaled per million

This is helpful if just a few genes are of interest, as they can be compared
across samples.
@@ -208,7 +208,7 @@ metadata |>
get_single_cell_experiment(assays = "cpm")
```

### Extract only a subset of genes
#### Extract only a subset of genes

```{r, message = FALSE}
single_cell_counts <-
@@ -224,7 +224,7 @@ single_cell_counts <-
single_cell_counts
```

### Extracting counts as a Seurat object
#### Extracting counts as a Seurat object

If needed, the H5 `SingleCellExperiment` can be converted into a Seurat object.
Note that it may take a long time and use a lot of memory depending on how many
@@ -244,9 +244,9 @@ single_cell_counts <-
single_cell_counts
```

## Save your `SingleCellExperiment`
### Save your `SingleCellExperiment`

### Saving as HDF5
#### Saving as HDF5

The recommended way of saving these `SingleCellExperiment` objects, if
necessary, is to use `saveHDF5SummarizedExperiment` from the `HDF5Array`
@@ -256,7 +256,7 @@ package.
single_cell_counts |> saveHDF5SummarizedExperiment("single_cell_counts")
```

# Exercises
## Exercises

:::::::::::::::::::::::::::::::::: challenge

24 changes: 12 additions & 12 deletions episodes/intro-sce.Rmd
@@ -20,7 +20,7 @@ exercises: 10 # Minutes of exercises in the lesson

::::::::::::::::::::::::::::::::::::::::::::::::

# Setup
## Setup

```{r setup, message = FALSE, warning=FALSE}
library(SummarizedExperiment)
@@ -29,17 +29,17 @@ library(MouseGastrulationData)
library(BiocStyle)
```

# Bioconductor
## Bioconductor

## Overview
### Overview

Within the R ecosystem, the Bioconductor project provides tools for the analysis and comprehension of high-throughput genomics data.
The scope of the project covers microarray data, various forms of sequencing (RNA-seq, ChIP-seq, bisulfite, genotyping, etc.), proteomics, flow cytometry and more.
One of Bioconductor's main selling points is the use of common data structures to promote interoperability between packages,
allowing code written by different people (from different organizations, in different countries) to work together seamlessly in complex analyses.
By extending R to genomics, Bioconductor serves as a powerful addition to the computational biologist's toolkit.

## Installing Bioconductor Packages
### Installing Bioconductor Packages

The default repository for R packages is the [Comprehensive R Archive Network](https://cran.r-project.org/mirrors.html) (CRAN), which is home to over 13,000 different R packages.
We can easily install packages from CRAN - say, the popular `r CRANpkg("ggplot2")` package for data visualization - by opening up R and typing in:
@@ -78,7 +78,7 @@ BiocManager::install("scater")
Packages only need to be installed once, and then they are available for all subsequent uses of a particular R installation.
There is no need to repeat the installation every time we start R.

## Finding relevant packages
### Finding relevant packages

To find relevant Bioconductor packages, one useful resource is the [BiocViews](https://bioconductor.org/packages/release/BiocViews.html) page.
This provides a hierarchically organized view of annotations associated with each Bioconductor package.
@@ -87,7 +87,7 @@ This gives us a listing of all Bioconductor packages that might be useful for ou
CRAN uses the similar concept of ["Task views"](https://cran.r-project.org/web/views/), though this is understandably more general than genomics.
For example, the [Cluster task view page](https://cran.r-project.org/web/views/Cluster.html) lists an assortment of packages that are relevant to cluster analyses.

## Staying up to date
### Staying up to date

Updating all R/Bioconductor packages is as simple as running `BiocManager::install()` without any arguments.
This will check for more recent versions of each package (within a Bioconductor release) and prompt the user to update if any are available.
@@ -96,7 +96,7 @@ This will check for more recent versions of each package (within a Bioconductor
BiocManager::install()
```

# The `SingleCellExperiment` class
## The `SingleCellExperiment` class

One of the main strengths of the Bioconductor project lies in the use of a common data infrastructure that powers interoperability across packages.

@@ -110,7 +110,7 @@ knitr::include_graphics("http://bioconductor.org/books/3.17/OSCA.intro/images/Si

Let's start with an example dataset.

```{r, message = FALSE}
```{r, message = FALSE, warning=FALSE}
sce <- WTChimeraData(samples=5)
sce
```
@@ -121,7 +121,7 @@ The _getter_ methods are used to extract information from the slots and the _set

Depending on the object, slots can contain different types of data (e.g., numeric matrices, lists, etc.). Here we will review the main slots of the `SingleCellExperiment` class as well as their getter/setter methods.

## The `assays`
### The `assays`

This is arguably the most fundamental part of the object that contains the count matrix, and potentially other matrices with transformed data. We can access the _list_ of matrices with the `assays` function and individual matrices with the `assay` function. If one of these matrices is called "counts", we can use the special `counts` getter (and the analogous `logcounts`).

@@ -132,7 +132,7 @@ counts(sce)[1:3, 1:3]

You will notice that in this case we have a sparse matrix of class "dgTMatrix" inside the object. More generally, any "matrix-like" object can be used, e.g., dense matrices or HDF5-backed matrices (see "Working with large data").

## The `colData` and `rowData`
### The `colData` and `rowData`

Conceptually, these are two data frames that annotate the columns and the rows of your assay, respectively.

@@ -151,7 +151,7 @@ sce$my_sum <- colSums(counts(sce))
colData(sce)
```

## The `reducedDims`
### The `reducedDims`

Everything that we have described so far (except for the `counts` getter) is part of the `SummarizedExperiment` class that SingleCellExperiment extends. You can find a complete lesson on the `SummarizedExperiment` class [here](https://carpentries-incubator.github.io/bioc-intro/60-next-steps.html).

Check warning on line 156 in episodes/intro-sce.Rmd (GitHub Actions / Build markdown source files if valid): [uninformative link text]: [here](https://carpentries-incubator.github.io/bioc-intro/60-next-steps.html)

@@ -196,7 +196,7 @@ Combining two objects: The `MouseGastrulationData` package contains several data

:::::::::::::: checklist

# Further Reading
## Further Reading

* OSCA book, [Introduction](https://bioconductor.org/books/release/OSCA.intro)
