Single-cell genomics has emerged as an important technology in modern biology. It aims to profile the genome, transcriptome, or epigenome of many individual cells in parallel. The massive amount of data this generates calls for machine learning to extract biological insight. The paper "Machine learning for single-cell genomics data analysis" (Raimundo et al., 2021) provides a solid introduction to the relevant machine learning techniques and principles, along with a survey of recent literature. The single-cell genomics course is another good resource for getting a feel for the field, the kind of data generated, the analysis techniques, and the pitfalls to look out for. Finally, this GitHub repo links reading materials for getting started with single-cell genomics.
In this mini-project, you will use single-cell genomics data provided through Kaggle. The data are the results of single-cell RNA sequencing: rows correspond to cells and columns to genes, and each value in the matrix indicates how strongly the corresponding gene is expressed in the corresponding cell. The corresponding notebook provides an exploration of the data that you are expected to analyse. You are expected to:
- Perform exploratory analysis of the data provided and decide which questions you can address through machine learning
- Decide how you will deal with the challenges arising from the sparsity, heterogeneity, and scale of single-cell genomics data
- Build machine learning models to explore the questions identified from exploratory analysis
- Explore how you can visualise the results
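As a starting point for the exploratory steps above, here is a minimal sketch of how one might quantify sparsity, normalise a cells × genes count matrix, and reduce its dimensionality. It runs on a synthetic Poisson matrix standing in for the Kaggle data, and it uses only NumPy. The function names (`sparsity`, `normalize_log1p`, `top_variable_genes`, `pca`) and the parameter values are illustrative assumptions, not part of the provided notebook, and library-size normalisation followed by a log transform is only one common choice among several.

```python
import numpy as np


def sparsity(counts):
    """Fraction of zero entries in a cells x genes count matrix."""
    return float(np.mean(counts == 0))


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to the same total count, then apply log(1 + x)."""
    totals = counts.sum(axis=1, keepdims=True).astype(float)
    totals[totals == 0] = 1.0  # avoid dividing by zero for empty cells
    return np.log1p(counts / totals * target_sum)


def top_variable_genes(x, n_genes):
    """Indices of the n_genes columns with the largest variance across cells."""
    return np.argsort(x.var(axis=0))[::-1][:n_genes]


def pca(x, n_components):
    """Project cells onto the leading principal components via SVD."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Synthetic stand-in for the real data (assumption): 200 cells x 500 genes.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=0.3, size=(200, 500))

print(f"sparsity: {sparsity(counts):.2f}")
logged = normalize_log1p(counts)
hvg = top_variable_genes(logged, n_genes=100)
embedding = pca(logged[:, hvg], n_components=10)
print(embedding.shape)  # (200, 10): one 10-dimensional point per cell
```

The resulting low-dimensional embedding could then feed a clustering or visualisation step. For a matrix at the scale of real single-cell data, a sparse representation and a truncated decomposition would likely be needed instead of the dense SVD used here.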
In addition to the resources linked above, the following papers will introduce you to single-cell genomics.
- Hie, B., Peters, J., Nyquist, S. K., Shalek, A. K., Berger, B., & Bryson, B. D. (2020). Computational Methods for Single-Cell RNA Sequencing. Annual Review of Biomedical Data Science, 3(1), 339–364. https://doi.org/10.1146/annurev-biodatasci-012220-100601
- Lähnemann, D., Köster, J., Szczurek, E., McCarthy, D. J., Hicks, S. C., Robinson, M. D., Vallejos, C. A., Campbell, K. R., Beerenwinkel, N., Mahfouz, A., Pinello, L., Skums, P., Stamatakis, A., Attolini, C. S. O., Aparicio, S., Baaijens, J., Balvert, M., Barbanson, B. de, Cappuccio, A., … Schönhuth, A. (2020). Eleven grand challenges in single-cell data science. Genome Biology, 21(1). https://doi.org/10.1186/s13059-020-1926-6
- Raimundo, F., Meng-Papaxanthos, L., Vallot, C., & Vert, J. P. (2021). Machine learning for single-cell genomics data analysis. Current Opinion in Systems Biology, 26, 64–71. https://doi.org/10.1016/j.coisb.2021.04.006
- Miller, H. E., Gorthi, A., Bassani, N., Lawrence, L. A., Iskra, B. S., & Bishop, A. J. R. (2020). Reconstruction of Ewing sarcoma developmental context from mass-scale transcriptomics reveals characteristics of EWSR1-FLI1 permissibility. Cancers, 12(4). https://doi.org/10.3390/cancers12040948
- Andrews, T. S., Kiselev, V. Y., McCarthy, D., & Hemberg, M. (2021). Tutorial: guidelines for the computational analysis of single-cell RNA sequencing data. Nature Protocols, 16(1). https://doi.org/10.1038/s41596-020-00409-w