The European Genome-phenome Archive (EGA) currently stores nearly 3 million BAM and CRAM files, and this number continues to grow thanks to the contributions of the scientific community.
To improve the quality reports we generate for each of these files, we have developed a set of pipelines that automate the use of multiple bioinformatics tools for comprehensive quality assessment.
If you'd like to use these pipelines, please follow the Start Guide. For further details on how the scripts work, refer to the Documentation.
To illustrate the pipeline, we ran it on a small BAM file from the 1000 Genomes Project. Note that this file contains alignments exclusively from chromosome 11.
You can download the input file here, and you can review the resulting output folder.
The output folder matches the structure and content you should expect if you follow the steps in the guide.
To learn more about the 1000 Genomes Project, visit their official website.
We know the test file is relatively small, so we also evaluated the pipeline on:
- A 176 GB WGS BAM file
- A 5.8 GB RNA-seq BAM file
You can check the runtime performance and resource usage in the test/performance_logs folder.