Project 5
Single-cell data provide an in-depth estimate of the cell composition of tumor tissues. However, the complexity and cost of generating such data prevent their analysis at large scale. Bulk RNA-seq, on the other hand, is relatively cheap, and data are available for thousands of tumors, including rare tumors.
Deconvolution methods leverage the few single-cell datasets available to infer the cell-type composition of bulk RNA-seq data. The project consists of setting up a pipeline to systematically perform deconvolution on bulk sequencing data using the omnideconv R package, which makes it easy to run multiple deconvolution methods. You will run the pipeline on malignant pleural mesothelioma samples from the MESOMICS project (see Mangiante et al. Nat Genet 2023 and its short research briefing). Malignant pleural mesothelioma is a rare and deadly disease arising in the linings of the lungs. The MESOMICS project revealed that mesothelioma can be decomposed into three archetypes that seem to experience vastly different microenvironments. However, these analyses relied on first-generation deconvolution methods based on sorted cells that are not specific to mesothelioma; using the next-generation deconvolution methods available through omnideconv should allow a much more precise assessment of the cellular composition of mesothelioma.
Fig. 1 | Multi-omic analyses reveal three molecular archetypes of malignant pleural mesothelioma with different tumor microenvironments
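The deconvolution idea described above can be illustrated with a toy example: if a signature matrix holds the mean expression of each gene in each cell type, a bulk profile is approximately the signature multiplied by the cell-type fractions, and the fractions can be recovered by non-negative least squares. The sketch below is plain Python with invented numbers and hypothetical names (`estimate_fractions`, `signature`, `bulk`); it is not one of the methods wrapped by omnideconv, which are considerably more sophisticated, and is only meant to show the principle.

```python
# Toy reference-based deconvolution: bulk ≈ signature × fractions.
# All values are invented for illustration; this is not an omnideconv method.

def estimate_fractions(signature, bulk, n_iter=2000):
    """Estimate non-negative cell-type fractions by projected gradient descent.

    signature: list of rows, one per gene; each row holds the mean expression
               of that gene in every cell type.
    bulk:      list holding the bulk expression of each gene.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Conservative step size: 1 / trace(S^T S) never exceeds 1 / largest eigenvalue.
    step = 1.0 / sum(v * v for row in signature for v in row)
    fractions = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        # Residual of the current fit, one value per gene.
        residual = [
            sum(s * f for s, f in zip(row, fractions)) - b
            for row, b in zip(signature, bulk)
        ]
        # Gradient of 0.5 * ||S f - b||^2 with respect to each fraction.
        gradient = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto non-negative values.
        fractions = [max(0.0, f - step * d) for f, d in zip(fractions, gradient)]
    total = sum(fractions)
    # Rescale so that the fractions sum to 1 (skipped if everything is zero).
    return [f / total for f in fractions] if total > 0 else fractions


# Four genes (rows) measured in three hypothetical cell types (columns).
signature = [
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 1.0],
    [1.0, 0.0, 9.0],
    [5.0, 5.0, 5.0],
]
true_fractions = [0.5, 0.3, 0.2]

# Simulate a noise-free bulk sample as a mixture of the three cell types.
bulk = [sum(s * f for s, f in zip(row, true_fractions)) for row in signature]

estimated = estimate_fractions(signature, bulk)
print([round(f, 3) for f in estimated])
```

With noise-free simulated data the estimate should match the fractions used to build the mixture; real bulk samples add noise, platform differences and cell types missing from the reference, which is why comparing several methods is useful.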
- Bulk sequencing data from 200+ malignant pleural mesotheliomas (public data; the MESOMICS gene count matrix was copied from https://github.com/IARCbioinfo/MESOMICS_data/ into /data/Training-MG/files/data/Project5_deconvolution_mesothelioma/MESOMICS_bulk_gene_count_matrix_1pass.csv). The corresponding sample information, MangianteEtAl2023_TableS2-3_SamplesOverview.xlsx, is available in the same folder (copied from the supplementary data of Mangiante et al. Nat Genet 2023). The R object MESOMICS.gene.RData also contains the expression data as a SummarizedExperiment Bioconductor object, in different formats (read counts, TPMs).
- Annotated single-cell reference data from several mesotheliomas (protected data for 2 samples, stored as R SingleCellExperiment objects at /data/Training-MG/files/data/Project5_deconvolution_mesothelioma/*_sce_object.rds). These 2 objects come from preprocessed, clustered and annotated samples. The analyses were done in Python following the "Single-cell best practices" guide (https://www.sc-best-practices.org/cellular_structure/annotation.html).
- R programming
- Install omnideconv and all its dependencies following the instructions at https://github.com/omnideconv/omnideconv
- Load all data
- Create the different inputs
- Perform the deconvolution
- Visualise and analyse results
- Clean the script to make it easily reusable with more samples
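As a rough illustration of the "create the different inputs" step above, reference-based deconvolution generally needs a bulk matrix and a single-cell count matrix restricted to the same genes, plus one cell-type label and one batch (sample) label per cell. The sketch below shows that logic in plain Python on invented toy data. The function `prepare_inputs` and the dictionary layout are hypothetical; in the actual project the same information would be extracted in R from the SummarizedExperiment and SingleCellExperiment objects.

```python
# Toy stand-ins for the real objects; every name and count here is invented.

def prepare_inputs(bulk, sc_counts, cell_types, batch_ids):
    """Align bulk and single-cell data on shared genes and collect annotations.

    bulk:       dict gene -> dict sample -> count
    sc_counts:  dict gene -> dict cell -> count
    cell_types: dict cell -> cell-type label
    batch_ids:  dict cell -> sample (batch) of origin
    """
    # Keep only genes measured in both data sets, in a stable order.
    shared_genes = sorted(set(bulk) & set(sc_counts))
    if not shared_genes:
        raise ValueError("bulk and single-cell data share no gene identifiers")
    cells = sorted(cell_types)
    # Every annotated cell needs a batch label and a count for each shared gene.
    missing = [
        c for c in cells
        if c not in batch_ids or any(c not in sc_counts[g] for g in shared_genes)
    ]
    if missing:
        raise ValueError(f"cells with incomplete data: {missing}")
    return {
        "genes": shared_genes,
        "bulk": {g: bulk[g] for g in shared_genes},
        "sc_counts": {g: {c: sc_counts[g][c] for c in cells} for g in shared_genes},
        # Annotation vectors are ordered like the cells in sc_counts.
        "cell_types": [cell_types[c] for c in cells],
        "batch_ids": [batch_ids[c] for c in cells],
    }


bulk = {
    "MSLN": {"T1": 120, "T2": 30},
    "CD3E": {"T1": 5, "T2": 40},
    "ACTB": {"T1": 900, "T2": 850},
}
sc_counts = {
    "MSLN": {"c1": 9, "c2": 0, "c3": 7},
    "CD3E": {"c1": 0, "c2": 6, "c3": 0},
    "PTPRC": {"c1": 0, "c2": 4, "c3": 0},
}
cell_types = {"c1": "malignant", "c2": "T cell", "c3": "malignant"}
batch_ids = {"c1": "S1", "c2": "S1", "c3": "S2"}

inputs = prepare_inputs(bulk, sc_counts, cell_types, batch_ids)
print(inputs["genes"], inputs["cell_types"], inputs["batch_ids"])
```

Wrapping this step in a single function that fails loudly on mismatched identifiers is one way to keep the script reusable when more samples are added.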
- omnideconv documentation https://omnideconv.org/
- MESOMICS project https://rarecancersgenomics.com/mesomics/
[email protected] (Nicolas Alcala)