Updated tutorial vignette.
PratibhaPanwar authored Nov 2, 2024
1 parent ff9b1dc commit e07d631
Showing 1 changed file with 36 additions and 38 deletions.
74 changes: 36 additions & 38 deletions vignettes/clustSIGNAL_tutorial.Rmd
@@ -69,9 +69,9 @@ library(ggplot2)
library(patchwork)
```

## 3. How to run clustSIGNAL
# 3. How to run clustSIGNAL

### Example datasets
## Example datasets

For this tutorial, we will use sampled data from two publicly available
datasets - (i) a subsample from the SeqFISH mouse embryo dataset from [Lohoff et
@@ -89,7 +89,7 @@ for 181 samples, with 155 genes and a total of 1,027,080 cells. Here, we use a
subset data of 6000 cells randomly selected from only 3 samples - Animal 1
Bregma -0.09 (2080 cells), Animal 7 Bregma 0.16 (1936 cells), and Animal 7
Bregma -0.09 (1984 cells), excluding cells that had been annotated as
'ambiguous' and 20 genes that were assessed using a different technology.
'Ambiguous' and 20 genes that were assessed using a different technology.

These sampled datasets are available with clustSIGNAL package and can be
accessed as below:
@@ -106,9 +106,9 @@ data(mHypothal)
# logcounts and cell metadata, respectively, to your environment
```

### Creating SpatialExperiment objects
## Creating SpatialExperiment objects

clustSIGNAL requires a SpatialExperiment (spe) object as input, so first we need
ClustSIGNAL requires a SpatialExperiment (spe) object as input, so first we need
to create a spe object from the gene expression and cell metadata we have in our
environment.
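
As a minimal sketch of the inputs this step expects, the toy example below builds a spe object from a small simulated matrix. The object names (`toy_expr`, `toy_meta`, `toy_spe`), the `cell_type` column, and the coordinate column names `x` and `y` are illustrative assumptions and are not part of the tutorial data.

```{r toy_spe_example}
library(SpatialExperiment)

set.seed(1)
# toy log-expression matrix: 3 genes (rows) x 5 cells (columns)
toy_expr <- matrix(rexp(15), nrow = 3,
                   dimnames = list(paste0("gene", 1:3), paste0("cell", 1:5)))
# toy cell metadata; x and y are hypothetical coordinate column names
toy_meta <- S4Vectors::DataFrame(cell_type = c("A", "A", "B", "B", "C"),
                                 x = runif(5),
                                 y = runif(5),
                                 row.names = colnames(toy_expr))
# spatialCoordsNames names the colData columns that hold the coordinates
toy_spe <- SpatialExperiment(assays = list(logcounts = toy_expr),
                             colData = toy_meta,
                             spatialCoordsNames = c("x", "y"))
# the coordinates are now available through the spatialCoords accessor
spatialCoords(toy_spe)
```

The same pattern applies to real data: the expression matrix columns and the metadata rows must refer to the same cells in the same order.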

@@ -118,7 +118,7 @@ dataframe of cell characteristics, including the x-y coordinates of each cell.
Let us work with the MERFISH data first, by creating a spe object from the data.

```{r hypothal_data_prep}
# to create spe object we need gene expression, cell metadata, and cell location
# to create spe object we need gene expression, cell metadata, and cell locations
spe_mh <- SpatialExperiment(assays = list(logcounts = mh_expr),
colData = mh_data,
# spatialCoordsNames requires column names in
@@ -132,9 +132,9 @@ gene expression data are stored here - multiple assays can be stored in one spe
object), colData (the cell characteristics are stored here), rowData (the gene
characteristics can be stored here), spatialCoords (the x-y coordinates of each
cell are stored here), reducedDims (any low embeddings data can be stored here),
and imgData (any images from the data can be stored here).
and imgData (any images from the dataset can be stored here).

### Running clustSIGNAL
## Running clustSIGNAL

ClustSIGNAL comes with many parameters that can be explored. Most of these
parameters have default values and do not need to be specified when running the
@@ -195,18 +195,18 @@ res_mh |> names()

The resulting spe object contains the adaptively smoothed gene expression data
as an additional assay, initial clusters, entropy values, and clustSIGNAL
clusters. Essentiallu, the final spe object contains data from the input spe
clusters. Essentially, the final spe object contains data from the input spe
object plus the outputs from the clustSIGNAL run.

```{r hypothal_speFinal}
res_mh$spe_final
spe_mh <- res_mh$spe_final
```

## 4. Assessing relevance of clusters
# 4. Assessing relevance of clusters

In this section, we will analyse the results from clustSIGNAL through clustering
metrics and visualisation.
metrics and visualisations.

Clustering metrics such as ARI (adjusted rand index) and NMI (normalised mutual
information) allow us to compare clustering performed by two methods. Here, we
@@ -222,10 +222,7 @@ spe_mh |> colData() %>%
# group cells by samples and for cells in each sample
# calculate the following metrics
summarise(ARI = aricode::ARI(Cell_class, clustSIGNAL), # calculate ARI
NMI = aricode::NMI(Cell_class, clustSIGNAL), # calculate NMI
min_Entropy = min(entropy), # calculate minimum entropy
max_Entropy = max(entropy), # calculate minimum entropy
mean_Entropy = mean(entropy)) # calculate minimum entropy
NMI = aricode::NMI(Cell_class, clustSIGNAL)) # calculate NMI
```
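
To make the ARI more concrete, here is a hedged base R sketch that computes it from the contingency table of two labelings. The function name `ari_from_labels` and the toy vectors are illustrative assumptions. The tutorial itself relies on `aricode::ARI`, and this sketch does not handle the degenerate case where both labelings put every cell in a single cluster.

```{r ari_sketch}
# adjusted Rand index from two label vectors of equal length
ari_from_labels <- function(a, b) {
  tab <- table(a, b)                          # contingency table of the labelings
  n <- sum(tab)
  sum_cells <- sum(choose(tab, 2))            # pairs grouped together in both
  sum_rows <- sum(choose(rowSums(tab), 2))    # pairs grouped together in a
  sum_cols <- sum(choose(colSums(tab), 2))    # pairs grouped together in b
  expected <- sum_rows * sum_cols / choose(n, 2)
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

truth <- c("A", "A", "B", "B", "C", "C")
relabelled <- c(3, 3, 1, 1, 2, 2)   # same partition, different label names
mixed <- c(1, 2, 1, 2, 1, 2)        # partition unrelated to truth

ari_from_labels(truth, relabelled)  # identical partitions give an ARI of 1
ari_from_labels(truth, mixed)       # unrelated partitions give a lower value
```

Because the index depends only on which cells are grouped together, renaming the clusters does not change the score.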

The clustering output can also be visualised by plotting the spatial coordinates
@@ -240,19 +237,18 @@ object.
reducedDim(spe_mh, "spatial") <- spatialCoords(spe_mh)
```

To specify that we want to create spatial plots, we just specify the low
embedding name in dimred option, in this case it is "spatial".
To specify that we want to create spatial plots, we just mention the correct low
embedding name in the dimred option, in this case it is "spatial".

```{r hypothal_spatialPlots}
p1 <- scater::plotReducedDim(spe_mh,
# specify spatial low dimension
dimred = "spatial",
colour_by = "clustSIGNAL",
point_alpha = 1,
point_size = 1) +
scater::plotReducedDim(spe_mh,
# specify spatial low dimension
dimred = "spatial",
colour_by = "clustSIGNAL",
point_alpha = 1,
point_size = 1) +
# to separate out the 3 samples in the dataset
facet_wrap(vars(spe_mh[[smp_label]]), scales = "free")
p1
facet_wrap(vars(spe_mh[[smp_label]]), scales = "free")
```

Here, the x and y axes are the x-y coordinates of the cells. The dataset
@@ -269,8 +265,7 @@ table(spe_mh$Cell_class, spe_mh$clustSIGNAL)

In this table, the rows show published manual annotations and columns show
clustSIGNAL cluster labels. ClustSIGNAL is able to capture the distinct cell
types and also identify subgroups in some cases - inhibitory and excitatory
neurons.
types and also identify subgroups in some cases, e.g., the inhibitory neurons.

To assess how distinct these clusters are, we investigate the top marker genes
in each cluster using the FindAllMarkers function in Seurat R package.
@@ -289,7 +284,7 @@ markers, we will use default values, which includes using the data layer
containing logcounts.

```{r hypothal_clusterMarkers}
# to specify that the cluster number information is in clustSIGNAL column
# to specify that the cluster labels are in clustSIGNAL column
Idents(seu_mh) <- "clustSIGNAL"
# this will identify marker genes in each cluster using default values
markers_mh <- Seurat::FindAllMarkers(seu_mh)
@@ -314,21 +309,23 @@ Seurat::DoHeatmap(seu_mh, slot = "data", features = top10$gene) + NoLegend()

Here, the genes are shown along the y-axis and the cells, grouped by the cluster
they belong to, are displayed along the x-axis. The values in the heatmap are
logcounts of top 10 marker genes in each cluster.
logcounts of top 10 marker genes in each cluster. The heatmap shows that the
clusters associated with inhibitory neurons have different gene expression
patterns, which accounts for their separation by clustSIGNAL.

## 5. Exploring clustSIGNAL outputs
# 5. Exploring clustSIGNAL outputs

Other than the cluster labels, clustSIGNAL also generates smoothed gene
expression, initial cluster labels, and cell neighbourhood-specific entropy
values. Other outputs such as nearest neighbour matrix, and initial cluster
values. Other outputs such as nearest neighbour matrix and initial cluster
based neighbourhood compositions are also accessible by running clustSIGNAL
functions sequentially.

The *p1_clustering* function generates the initial cluster labels and adds them
to the spe object, the *neighbourDetect* function produces nearest neighbour
matrix and neighbourhood compositions, the *entropyMeasure* function adds
entropy values to the spe object, the *adaptiveSmoothing* function adds smoothed
gene expression to the spe object, and the *p2_clustering* function generates
The *p1_clustering()* function generates the initial cluster labels and adds them
to the spe object, the *neighbourDetect()* function produces nearest neighbour
matrix and neighbourhood compositions, the *entropyMeasure()* function adds
entropy values to the spe object, the *adaptiveSmoothing()* function adds smoothed
gene expression to the spe object, and the *p2_clustering()* function generates
the final clusters and adds them to the spe object.
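
As a rough illustration of what a neighbourhood-specific entropy value captures, the sketch below computes Shannon entropy (base 2) from the initial cluster labels of the cells in one neighbourhood. This is an assumption made for illustration: the function name `neighbourhood_entropy` is hypothetical, and the exact calculation inside *entropyMeasure()* may differ.

```{r entropy_sketch}
# Shannon entropy (in bits) of the cluster composition of one neighbourhood
neighbourhood_entropy <- function(labels) {
  p <- as.numeric(table(labels)) / length(labels)  # cluster proportions
  p <- p[p > 0]                                    # drop empty clusters
  -sum(p * log2(p))
}

homogeneous <- c("c1", "c1", "c1", "c1")    # all neighbours in one cluster
heterogeneous <- c("c1", "c2", "c3", "c4")  # neighbours spread over 4 clusters

neighbourhood_entropy(homogeneous)    # 0: no uncertainty in the composition
neighbourhood_entropy(heterogeneous)  # 2: four equally likely clusters
```

Under this assumption, low values indicate neighbourhoods dominated by one cluster and high values indicate mixed neighbourhoods.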

Of these additional outputs, the entropy values can be valuable in exploring
@@ -356,7 +353,7 @@ spe_me
```

To explore the dataset, we can estimate the spread and distribution of the
entropy values using histogram and spatial plots respectively.
entropy values using histogram and spatial plots, respectively.

```{r compare_hist}
# histogram plots to show entropy spread
@@ -394,7 +391,8 @@ s2 <- scater::plotReducedDim(spe_mh, # plotting hypothalamus data
# to separate out the 3 samples in the dataset
facet_wrap(vars(spe_mh[[smp_label]]), scales = "free")
(h1 + h2) / (s1 + s2) + patchwork::plot_layout(widths = c(1, 3))
(h1 + h2 + patchwork::plot_layout(widths = c(1, 3))) /
(s1 + s2 + patchwork::plot_layout(widths = c(1, 3)))
```

The entropy plots help us gauge the "domainness" in the samples. For example,