Snakemake Pipeline for Creating an Alternate Reference Genome: A Case Study on an Asexual Triploid Arctic Daphnia pulicaria Population
This workflow uses conda with Snakemake to create an alternate reference genome for a population of triploid Daphnia pulicaria in West Greenland. For read mapping, the Daphnia pulex genome assembly ASM2113471v1 was used as the reference.
A future study aims to examine methylation patterns across centuries in asexual triploid Arctic Daphnia pulicaria strains. Its current proposal suggests using a Daphnia pulex reference genome for mapping. This is possible because those strains belong to the larger Daphnia pulex species complex, and because Daphnia pulex and Daphnia pulicaria are known to be closely related and to form hybrids. Note, however, that this project's Arctic Daphnia pulicaria differ from regular Daphnia pulicaria: based on the mitochondrial ND5 gene, they are classified in a separate clade of their own, the Polar Daphnia pulicaria clade (Colbourne et al., 1998, Frisch).
The main motivation of this project is to create an alternate reference genome specific to the Arctic Daphnia pulicaria population of West Greenland. A second motivation is to establish a methodology for creating an alternate reference genome for a triploid organism, since most available tools are designed for diploids. The hypothesis is that using this newly created reference genome will improve mapping and downstream analysis in the future study, but this has yet to be demonstrated.
To that end, a customized Snakemake workflow was built around Illumina 1.9 short-read whole-genome sequencing (WGS) data from 10 D. pulicaria samples, for which individual eggs underwent whole-genome amplification (WGA) with the TruePrime Single Cell WGA Kit version 2.0. The workflow consists of five main steps: quality control, mapping, processing and cleaning of BAM files, variant calling and filtering, and lastly the creation of the alternate reference genome.
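To make the five steps more concrete, the sketch below prints one plausible command per step for a single sample instead of running anything. This README does not name the tools behind each step, so every tool choice here (fastqc, bwa, samtools, freebayes, bcftools), the `_1`/`_2` read-file naming, the quality thresholds, and the function name `print_plan` are assumptions for illustration only; the actual rules in the repository may differ. The one point it is meant to highlight is that the variant caller has to be told the ploidy explicitly (here `-p 3` for freebayes).

```shell
#!/usr/bin/env bash
# Hypothetical sketch: prints (does not run) one possible command per workflow
# step. Tools, thresholds and file names are assumptions, not repository code.

print_plan() {
    local sample="$1" ref="$2"
    cat <<EOF
# 1. Quality control
fastqc reads/${sample}_1.fastq.gz reads/${sample}_2.fastq.gz
# 2. Mapping (assumes ${ref} was indexed with 'bwa index')
bwa mem ${ref} reads/${sample}_1.fastq.gz reads/${sample}_2.fastq.gz | samtools sort -o ${sample}.bam -
# 3. BAM processing and cleaning (drop low-MAPQ alignments, then index)
samtools view -b -q 20 -o ${sample}.clean.bam ${sample}.bam
samtools index ${sample}.clean.bam
# 4. Variant calling with explicit triploid ploidy, then filtering
freebayes -f ${ref} -p 3 ${sample}.clean.bam > ${sample}.vcf
bcftools view -i 'QUAL>=30' -Oz -o ${sample}.filtered.vcf.gz ${sample}.vcf
bcftools index ${sample}.filtered.vcf.gz
# 5. Alternate reference genome
bcftools consensus -f ${ref} -o alt_ref.fa ${sample}.filtered.vcf.gz
EOF
}
```

Calling `print_plan SAMPLE reference.fa` writes the plan to standard output. How the consensus step should treat triploid genotypes is a design decision this sketch does not address.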
- Clone this repository using
git clone https://github.com/wassimsalam01/snakemake-triploid-alt-ref-genome-pipeline.git
- Install miniforge in the home directory following its installation guide.
- Install Snakemake following its installation guide.
- Adjust the values of --mail-user, --partition and --qos in slurm/config.yaml and in snakemake.sh.
- Place PE reads in a directory named reads
- Create Slurm output and error directories (run from the repository root)
mkdir -p slurm/output slurm/error
- Test the workflow with a dry run
mamba activate snakemake
snakemake --profile slurm/ -n > dryrun.txt
- Launch the workflow with
sbatch snakemake.sh
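The setup steps above can be checked before submitting the job. The following is a minimal sketch of such a pre-flight check; the paths (reads, slurm/config.yaml, snakemake.sh, slurm/output, slurm/error) come from the steps above, while the function name `preflight` and its messages are hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the setup steps; not part of the repository.

preflight() {
    local root="${1:-.}" status=0 f

    # Paired-end reads are expected in a directory named "reads".
    if [ ! -d "$root/reads" ] || [ -z "$(ls -A "$root/reads" 2>/dev/null)" ]; then
        echo "missing or empty: $root/reads" >&2
        status=1
    fi

    # Files in which --mail-user, --partition and --qos have to be adjusted.
    for f in slurm/config.yaml snakemake.sh; do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $root/$f" >&2
            status=1
        fi
    done

    # Slurm output and error directories; -p makes this safe to repeat.
    mkdir -p "$root/slurm/output" "$root/slurm/error" || status=1

    return "$status"
}
```

From the repository root, `preflight . && sbatch snakemake.sh` would submit the job only if the checks pass. The check only confirms that the files exist; it does not validate the Slurm values themselves.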
J. K. Colbourne, T. J. Crease, L. J. Weider, P. D. N. Hebert, F. Dufresne, A. Hobæk, Phylogenetics and evolution of a circumarctic species complex (Cladocera: Daphnia pulex), Biological Journal of the Linnean Society, Volume 65, Issue 3, November 1998, Pages 347–365, https://doi.org/10.1111/j.1095-8312.1998.tb01146.x