Merge pull request #43 from fhdsl/cansavvy/minor-updates
Add bit about 'wikipedia'
cansavvy authored Dec 7, 2023
2 parents f912652 + 33ef340 commit 2f7fa9e
Showing 6 changed files with 69 additions and 20 deletions.
3 changes: 3 additions & 0 deletions 01-intro.Rmd
@@ -9,6 +9,9 @@ ottrpal::set_knitr_image_path()
ottrpal::include_slide("https://docs.google.com/presentation/d/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY/edit#slide=id.gd422c5de97_0_0")
```

This is a *living* course, meaning it is constantly changing and being updated. The goal for this course is to be a "Wikipedia" of omics data.
If you'd like to contribute, [you can file a pull request on GitHub](https://github.com/fhdsl/Choosing_Genomics_Tools) if you are comfortable with that sort of thing or email `[email protected]` to ask how to get started.

## Target Audience

The course is intended for students in the biomedical sciences and researchers who have been given data and don't know what to do with it or would like an overview of the different genomic data types that are out there.
18 changes: 9 additions & 9 deletions 10c-spatial-transcriptomics.Rmd
@@ -21,7 +21,7 @@ Spatial transcriptomics (ST) technologies have been developed as a solution to t
1. **Describe tissue-specific cellular neighborhoods of cell types and cell type sub-populations:** Although scRNA-seq continues to be a powerful method to assign biological identities to a mixture of cells, integrated analysis of ST combined with scRNA-seq adds crucial information to cell phenotypes by describing the neighborhoods where cells occur [@longo2021integrating]. Many methods to phenotype ST data are available, with most of them relying on the availability of a curated (scRNA-seq) cell type reference. Once cell identities have been determined, clustering or spatial statistics can be applied to describe the composition of tissue niches or domains. The explosion of ST data has resulted in novel and comprehensive tissue- or disease-specific atlases, describing not only the cell types within organs but also the functional cell-cell relationships that result from spatial organization (e.g., @guilliams2022spatial; @wu2021single).
2. **Uncover spatially regulated biological processes:** With ST data, there comes the ability to detect genes or gene pathways that are expressed in specific areas within tissues (i.e., spatially-restricted expression). Detecting genes with spatially-restricted expression is key to achieving further understanding of specific biological processes, such as tissue gradients, cell differentiation, or signaling pathways. For example, cancer researchers are now able to study signaling pathways restricted to the tumor-stroma interface [@hunter2021spatially], which could lead to the discovery of mechanisms representing cancer vulnerabilities resulting from interactions between the tumor and stroma cells.
3. **Investigate cell-cell interactions:** From basic to applied tissue biology research, the study of cell-cell interactions is of high interest, especially the interactions that occur via ligand-receptor pairs. The construction of comprehensive databases of ligand-receptor interactions has been possible due to the large number of single-cell data sets produced by researchers. A major contribution of ST to the study of tissue biology is the addition of the spatial context to previously identified ligand-receptor interactions. Because single-cell RNA-seq requires physical separation of cells, current ligand-receptor databases represent hypotheses, which ST can help to test by using models of spatial co-localization, enabling in situ examination of cell-cell interactions and communication [@raredon2023comprehensive; @wang2023promising].
4. **Integrate imaging data:** Spatial transcriptomics data has enabled direct integration of gene expression measurements with digital images of the same (or adjacent) tissue. Improved molecular description and/or exploration of tissue niches or domains is now possible. One approach consists on differential expression of histopathology annotations done by an expert on tissue images (e.g., @ravi2022spatially). The oppoosite approach is possible, which uses unsupervised clustering of ST data assisted by color/intensity information derived from images. Machine learning for integration of ST and imaging data is an active area of development (e.g., @hu2021spagcn; @xu2022deepst; @tan2020spacell). Furthermore, ST data findings can be qualitatively validated by assessing the approximate location of regions such as immune-infiltrated areas or damaged tissue, often resulting from inspection of fluorescence microscopy.
4. **Integrate imaging data:** Spatial transcriptomics data has enabled direct integration of gene expression measurements with digital images of the same (or adjacent) tissue. Improved molecular description and/or exploration of tissue niches or domains is now possible. One approach consists of differential expression analysis across histopathology annotations made by an expert on tissue images (e.g., @ravi2022spatially). The opposite approach is also possible, in which unsupervised clustering of ST data is assisted by color/intensity information derived from images. Machine learning for integration of ST and imaging data is an active area of development (e.g., @hu2021spagcn; @xu2022deepst; @tan2020spacell). Furthermore, ST data findings can be qualitatively validated by assessing the approximate location of regions such as immune-infiltrated areas or damaged tissue, often identified through inspection of fluorescence microscopy images.
5. **Identify biomarkers and drug targets:** The use of ST allows the exploration of tissue niche-specific expression patterns and gene pathway analysis. This exploration can lead to generation of hypotheses about potential biomarkers for specific tissue functions or disease states. Furthermore, the molecular interactions predicted using scRNA-seq (e.g., ligand-receptor), can now be put in context of the larger tissue architecture using ST data. The spatial context of these interactions will likely boost the identification of novel drug targets, as well as improved understanding of current therapies [@lyubetskaya2022assessment; @zhang2022clinical].

## Overview of a spatial transcriptomics workflow
@@ -38,11 +38,11 @@ Some of the commonalities in the workflows are presented here:

3. **RNA quantification:** The method used to count the number of captured or hybridized RNA molecules varies greatly from technology to technology. Capture methods often involve release of the RNA molecules from the tissue or slide, followed by library preparation, amplification, next generation sequencing, and read mapping to a reference genome. In this case, libraries are spatially multiplexed, whereby barcodes indicate the spatial location from which the captured RNA molecules originated. In imaging-based methods, segmentation is required to delineate the cell borders. Then, coded fluorescent probes are counted within each segmented cell.

4. **Data quality control and pre-processing:** As with any omics technology, filtering and pre-processing is of paramount importance for downstream analysis. Spatial transcriuptomics data typically contain an exceess of zeroes and high gene dropout [@zhao2022modeling]. Removing genes expressed in very few spots or cells is often done. Similarly, it is advisable to remove spots with very few counts, however, care needs to exercided to not remove biological variation due to cellularity (i.e., areas with fewer cells tend to have less counts). Mitochondrial or ribosomal genes if available in the data, can be used to assess the level of tissue necrosis and filter accordingly [@ospina2023primer]. In imaging-based methods, the area of cells can be used to detect "doublets" generated during image segmentation. Once filtering has been performed, gene count normalization and transfromation is typically a part of pre-processing. Commonly used methods in scRNA-seq such as library-size normalization and log-transformation, are also commonplace in spatial transcriptomics studies. Methods that attempt technical effect correction such as SCTransform [@hafemeister2019normalization] can be also used.
4. **Data quality control and pre-processing:** As with any omics technology, filtering and pre-processing are of paramount importance for downstream analysis. Spatial transcriptomics data typically contain an excess of zeroes and high gene dropout [@zhao2022modeling]. Genes expressed in very few spots or cells are often removed. Similarly, it is advisable to remove spots with very few counts; however, care needs to be exercised not to remove biological variation due to cellularity (i.e., areas with fewer cells tend to have fewer counts). Mitochondrial or ribosomal genes, if available in the data, can be used to assess the level of tissue necrosis and filter accordingly [@ospina2023primer]. In imaging-based methods, the area of cells can be used to detect "doublets" generated during image segmentation. Once filtering has been performed, gene count normalization and transformation are typically part of pre-processing. Methods commonly used in scRNA-seq, such as library-size normalization and log-transformation, are also commonplace in spatial transcriptomics studies. Methods that attempt technical effect correction, such as SCTransform [@hafemeister2019normalization], can also be used.

5. **Visualization:** Similar to scRNA-seq data, dimension reduction methods such as the Uniform Manifold Approximation and Projection (UMAP) are key to visualize the heterogeneity of the data set. Nonetheless, given the additional modality provided by the spatial coordinates, spatial gene expression heatmaps can be generated, whcih can be compared against the imaging data (e.g., H&, IHC, mIF) to gain further insights into overall tissue architecture.
5. **Visualization:** Similar to scRNA-seq data, dimension reduction methods such as the Uniform Manifold Approximation and Projection (UMAP) are key to visualizing the heterogeneity of the data set. In addition, given the extra modality provided by the spatial coordinates, spatial gene expression heatmaps can be generated, which can be compared against the imaging data (e.g., H&E, IHC, mIF) to gain further insights into overall tissue architecture.

6. **Clustering and cell/tissue domain phenotyping:** There is a plethora of clustering approaches, ranging from employed in scRNA-seq analysis (e.g., Louvain) to novel neural network classification. Some methods take advantage of the spatial location information and/or tissue image to inform clustering. Compared to clustering, cell/domain phenotyping is an area of even more active development, withn the majority of methods relying on the use of a comprehensive single-cell, tissue specific atlas from which cell types (i.e., "labels") are obtained. Canonical marker-based phenotyping is still widely used, and in many cases unavoidable to identify specific cell populations. general, it is advisable to use the expert validation of a tissue biologist or pathologist to ascertain if clustering and phenotyping are capturing the tissue architecture adequately.
6. **Clustering and cell/tissue domain phenotyping:** There is a plethora of clustering approaches, ranging from those employed in scRNA-seq analysis (e.g., Louvain) to novel neural network classification. Some methods take advantage of the spatial location information and/or tissue image to inform clustering. Compared to clustering, cell/domain phenotyping is an area of even more active development, with the majority of methods relying on the use of a comprehensive single-cell, tissue-specific atlas from which cell types (i.e., "labels") are obtained. Canonical marker-based phenotyping is still widely used, and in many cases unavoidable to identify specific cell populations. In general, it is advisable to use the expert validation of a tissue biologist or pathologist to ascertain if clustering and phenotyping are capturing the tissue architecture adequately.
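
The filtering and normalization described in step 4 can be illustrated with a minimal sketch in plain Python. It does not use the API of any package named in this chapter. The function name `qc_and_normalize`, the thresholds, the scale factor of 10,000, and the toy count matrix are all hypothetical choices made for illustration, and real analyses would tune the thresholds to the tissue at hand.

```python
import math


def qc_and_normalize(counts, min_spots_per_gene=2, min_counts_per_spot=5, scale=10_000):
    """Filter a spots-by-genes count matrix, then normalize and log-transform it.

    All names and default thresholds are hypothetical choices for illustration.
    """
    n_genes = len(counts[0]) if counts else 0

    # Gene filter: keep genes detected (count > 0) in enough spots.
    kept_genes = [
        g for g in range(n_genes)
        if sum(1 for spot in counts if spot[g] > 0) >= min_spots_per_gene
    ]

    # Spot filter: keep spots with enough total counts over the kept genes.
    totals = [sum(spot[g] for g in kept_genes) for spot in counts]
    kept_spots = [s for s, total in enumerate(totals) if total >= min_counts_per_spot]

    # Library-size normalization followed by log(1 + x).
    normalized = []
    for s in kept_spots:
        total = totals[s]
        row = [
            math.log1p(counts[s][g] / total * scale) if total > 0 else 0.0
            for g in kept_genes
        ]
        normalized.append(row)
    return kept_genes, kept_spots, normalized


# Toy matrix: 4 spots (rows) by 4 genes (columns), raw counts.
counts = [
    [10, 0, 5, 0],
    [8, 0, 0, 1],
    [0, 0, 1, 0],
    [12, 3, 7, 0],
]
kept_genes, kept_spots, normalized = qc_and_normalize(counts)
print(kept_genes)  # indices of genes that passed the filter
print(kept_spots)  # indices of spots that passed the filter
```

In this toy matrix, genes 1 and 3 are each detected in a single spot and are dropped, and spot 2 is dropped because it retains only one count. As the text cautions, a low-count spot may reflect low cellularity rather than poor quality, so a fixed spot threshold like this one should be checked against the tissue image before it is applied.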

## Spatial transcriptomic data **strengths**:

@@ -67,7 +67,7 @@ Some of the commonalities in the workflows are presented here:

#### [Space Ranger](https://www.10xgenomics.com/support/software/space-ranger/downloads)

- **Pros:** Space Ranger is a software package developed by 10x Genomics specifically for processing and analyzing spatial transcriptomics raw data generated by their platform (Visium). It provides a streamlined workflow for processing raw data, including image registration, assignement of read counts to spots, and counting transcripts. Outputs from Space Ranger are commonly the input of many other ST analytical software.
- **Pros:** Space Ranger is a software package developed by 10x Genomics specifically for processing and analyzing spatial transcriptomics raw data generated by their platform (Visium). It provides a streamlined workflow for processing raw data, including image registration, assignment of read counts to spots, and counting transcripts. Outputs from Space Ranger commonly serve as the input to many other ST analysis tools.
- **Cons:** Space Ranger has been designed to process only 10x Genomics data. The software does not provide methods to extract insights, which is accomplished by integration with other analytical suites. Requires knowledge of command line use.

#### [GeomxTools](https://www.bioconductor.org/packages/release/bioc/html/GeomxTools.html)
@@ -89,17 +89,17 @@ Some of the commonalities in the workflows are presented here:

#### [Giotto](https://giottosuite.readthedocs.io/en/latest/index.html)

- **Pros:** The analytical suite Giotto in a collection of methods to study spatial gene expression, agnostic to the platform used to generate the data. It allows users to perform data pre-processing, clustering, visualization, detection of spatially variable genes, and expression co-localization analysis. Computatiionally intensiuve analysis can be condcicted in the cloud via integration with Terra.bio or locally using a Docker container. Some of the statistical methods in Giotto implicitly make use of the spatial coordinates to detect patterns.
- **Pros:** The analytical suite Giotto is a collection of methods to study spatial gene expression, agnostic to the platform used to generate the data. It allows users to perform data pre-processing, clustering, visualization, detection of spatially variable genes, and expression co-localization analysis. Computationally intensive analysis can be conducted in the cloud via integration with Terra.bio or locally using a Docker container. Some of the statistical methods in Giotto implicitly make use of the spatial coordinates to detect patterns.
- **Cons:** Requires some familiarity with R, as well as bioinformatics and spatial statistics concepts. Installation requires setting up Python, as some modules use that language.

#### [spatialGE](https://fridleylab.github.io/spatialGE/) and [spatialGE-web](https://spatialge.moffitt.org/)

- **Pros:** The spatialGE analysis suite allows users to study STdata form multiple platfoms, including methods for pre-processing, clustering/domain detection, spatially variable genes, and functional analysis via detection of gene expression gradients and/or gene set enrichment spatial patterns. All the functionality of the R package has been implemented on a point-and-click web application requiring no coding experience and email notifications when analyses are completed. Statistcial methods in spatialGE implicitly take into account the spatial coordinates during calculations.
- **Pros:** The spatialGE analysis suite allows users to study ST data from multiple platforms, including methods for pre-processing, clustering/domain detection, spatially variable genes, and functional analysis via detection of gene expression gradients and/or gene set enrichment spatial patterns. All the functionality of the R package has been implemented in a point-and-click web application that requires no coding experience and sends email notifications when analyses are completed. Statistical methods in spatialGE implicitly take into account the spatial coordinates during calculations.
- **Cons:** Use of the spatialGE R package requires familiarity with the language. The spatialGE web application bypasses the need for R coding; however, computationally intensive methods can take time to complete.

#### [Loupe](https://support.10xgenomics.com/spatial-gene-expression/software/visualization/latest/what-is-loupe-browser)

- **Pros:** The Loupe browser is a point-and-click tool for exploration of both non-spatial scRNA-seq and ST. Loupe takes Visium outputs and allows visualization of gene expression, clustering, and detection of diferentially expressed genes. The tool also allows for easy registration and comparative analysis of Visium imaging and expression data.
- **Pros:** The Loupe browser is a point-and-click tool for exploration of both non-spatial scRNA-seq and ST. Loupe takes Visium outputs and allows visualization of gene expression, clustering, and detection of differentially expressed genes. The tool also allows for easy registration and comparative analysis of Visium imaging and expression data.
- **Cons:** Loupe allows basic exploration of the data. To perform functional-level analysis of ST data, the use of additional tools might be required.

#### [ST Pipeline](https://pypi.org/project/stpipeline/)
@@ -162,7 +162,7 @@ Some of the commonalities in the workflows are presented here:

#### [CellChat](<https://htmlpreview.github.io/?https://github.com/sqjin/CellChat/blob/master/tutorial/CellChat_analysis_of_spatial_imaging_data.html>)

- **Pros:** CellChat is an algorithm to infer cell communications via ligand-receptor interactions. CellChat was designed for non-spatial scRNA data, however, a recent implementation has been included to account for distances between cells in ST experiments. The package includes a comprehensive ligand-receptor data base which is queried after quantification of probability of interaction betwee two given cell types.
- **Pros:** CellChat is an algorithm to infer cell communications via ligand-receptor interactions. CellChat was designed for non-spatial scRNA-seq data; however, a recent implementation accounts for distances between cells in ST experiments. The package includes a comprehensive ligand-receptor database, which is queried after quantification of the probability of interaction between two given cell types.
- **Cons:** Requires familiarity with R programming. The spatial implementation of CellChat has been tested on Visium data.

## More tools and tutorials regarding spatial transcriptomics
13 changes: 3 additions & 10 deletions About.Rmd
@@ -11,24 +11,16 @@ These credits are based on our [course contributors table guidelines](https://gi
|**Pedagogy**||
|Lead Content Instructor(s)|[Candace Savonen]|
|Lecturer(s)|[Candace Savonen]|
|Content Contributor(s)| [Cailin Jordan] - sc-ATAC-Seq, [Carrie Wright], [Claire Mills] - Whole Genome Sequencing, [Jacob Greene] - ChIP-seq |
|Content Contributor(s)| [Cailin Jordan] - sc-ATAC-Seq <br> [Carrie Wright] <br> [Claire Mills] - Whole Genome Sequencing<br> [Jacob Greene] - ChIP-seq <br> [Oscar Ospina] - Spatial transcriptomics|
|Content Directors|[Jeff Leek]|
|Content Consultants|[Carrie Wright], [Cliff Meyer] - ATAC-seq, [Frederick Tan]|
|Content Consultants|[Carrie Wright]<br> [Cliff Meyer] - ATAC-seq <br> [Frederick Tan]<br>|
|Acknowledgments||
|**Production**||
|Content Publisher|[Ira Gooding]|
|Content Publishing Reviewers|[Ira Gooding]|
|**Technical**||
|Course Publishing Engineer|[Candace Savonen]|
|Template Publishing Engineers|[Candace Savonen], [Carrie Wright]|
|Publishing Maintenance Engineer|[Candace Savonen]|
|Technical Publishing Stylists|[Carrie Wright], [Candace Savonen]|
|Package Developers ([ottrpal])|[Candace Savonen], [John Muschelli], [Carrie Wright]|
|**Art and Design**||
|Illustrator| [Candace Savonen]|
|Figure Artist| [Candace Savonen] and [Claire Mills]|
|Videographer|[Candace Savonen]|
|Videography Editor| [Candace Savonen]|
|**Funding**||
|Funder|[National Cancer Institute (NCI)](https://www.cancer.gov/) UE5 CA254170|
|Funding Staff| [Sandy Ormbrek], [Shasta Nicholson] |
@@ -50,6 +42,7 @@ devtools::session_info()
[Frederick Tan]: https://bse.carnegiescience.edu/dr-frederick-tan
[Jacob Greene]: https://www.linkedin.com/in/jacob-greene-890aa318a/
[Jeff Leek]: https://jtleek.com/
[Oscar Ospina]: https://linkedin.com/in/oscareospina/
[John Muschelli]: https://johnmuschelli.com/
[Sandy Ormbrek]: https://hutchdatascience.org/ourteam/
[Shasta Nicholson]: https://hutchdatascience.org/ourteam/