COVID-19 has affected all of our lives, and its effects are still being felt two years later. Although vaccines have been rolled out worldwide, new variants continue to threaten intervention efforts. Whole-genome sequencing of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been key to understanding the virus's pathogenicity, its transmission dynamics, and the emergence of new variants worldwide, and genomics has been critical in formulating interventions. While the pandemic continues, it is essential that SARS-CoV-2 viruses are sequenced regularly to identify mutations and genomic modifications in different geographical locations. In Kenya, the KEMRI-Wellcome Trust (KEMRI-WT) team was the first to generate whole-genome sequences, and the resulting paper was recently published, contributing to worldwide efforts.
The focus of this mini-project is to reproduce the analysis done in this paper and then replicate the study using the data generated from the Kenyan samples available from Githinji 2021. In reproducing the analysis, you are expected to develop a reproducible pipeline, which you will then test on datasets generated from other locations. Formulate questions to explore using these and other publicly available datasets, especially those from Africa.
You are required to explore different approaches for deriving value from the available data.