Apollo Mapping

Reference-based mapping analysis of fungal genomes

Pipeline information

Author(s): Boas van der Putten, Roxanne Wolthuis
Organization: Rijksinstituut voor Volksgezondheid en Milieu (RIVM)
Department: Infektieziekteonderzoek, Diagnostiek en Laboratorium Surveillance (IDS), Bacteriologie (BPD)
Start date: 07 - 04 - 2023
Commissioned by: Thijs Bosch

About this project

Apollo-mapping is the first pipeline created in the Apollo pipeline series. The goal of these pipelines is to set up routine surveillance for fungi (A. fumigatus, Candida). The apollo-mapping pipeline is built with the juno-template and juno-library.

The input of the pipeline is raw Illumina paired-end data in the form of two fastq files per sample (with extension .fastq, .fastq.gz, .fq or .fq.gz), containing the forward and the reverse reads ('R1' and 'R2' must be part of the file name, respectively).
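The naming rule above can be checked before starting a run. The following is a minimal sketch, not the pipeline's own parser: the function name `pair_fastq_files`, the regular expression, and the rule that the sample name is everything before the R1/R2 token are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Extensions listed in this README.
FASTQ_EXTENSIONS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")

# Assumption: R1/R2 is preceded by '.', '_' or '-' and is followed by the
# end of the name or another separator (e.g. sample_R1_001).
READ_TOKEN = re.compile(r"^(?P<sample>.+?)[._-]R(?P<read>[12])(?:[._-].*)?$")


def pair_fastq_files(filenames):
    """Return {sample: {"R1": file, "R2": file}} for complete pairs only."""
    samples = defaultdict(dict)
    for name in filenames:
        ext = next((e for e in FASTQ_EXTENSIONS if name.endswith(e)), None)
        if ext is None:
            continue  # not a fastq file
        match = READ_TOKEN.match(name[: -len(ext)])
        if match is None:
            continue  # no R1/R2 token in the file name
        samples[match["sample"]]["R" + match["read"]] = name
    # Drop samples that lack either the forward or the reverse file.
    return {s: r for s, r in samples.items() if set(r) == {"R1", "R2"}}
```

Passing the names from `os.listdir()` on the input directory gives a quick overview of which samples have both read files; samples with only one file are left out of the result.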

The pipeline uses the following tools (list not complete):

FastQC (Andrews, 2010) is used to assess the quality of the raw Illumina reads.
FastP (Chen, Zhou, Chen and Gu, 2018) is used to remove poor quality data and adapter sequences.
Picard is used to determine the library fragment lengths.
MultiQC (Ewels, Magnusson, Lundin, & Käller, 2016) is used to summarize analysis results and quality assessments in a single report for dynamic visualization.
Kraken2 and Bracken are used to identify fungal species.

Prerequisites

Linux environment
(mini)conda
Python 3.11

Installation

Clone the repository.

git clone https://github.com/RIVM-bioinformatics/apollo-mapping.git

Go to the pipeline directory.

cd apollo-mapping

Create & activate mamba environment.

conda env update -f envs/mamba.yaml

conda activate mamba

Create & activate apollo environment.

mamba env update -f envs/apollo_mapping.yaml

conda activate apollo_mapping

Example run:

python3 apollo_mapping.py -i [input] -o [output] -s [species]

Parameters & Usage

Command for help

-h, --help Shows the help of the pipeline

Required parameters

-i, --input Relative or absolute path to the input directory. It must contain the raw read (fastq) files for all samples to be processed (not in subfolders).
-s, --species Species to use, choose from: ['candida_auris', 'aspergillus_fumigatus']

Optional parameters

-o, --output Relative or absolute path to the output directory. If none is given, an 'output' directory will be created in the current directory.
-w, --workdir Relative or absolute path to the working directory. If none is given, the current directory is used.
-ex, --exclusionfile Path to the file that contains samplenames to be excluded.
-p, --prefix Conda or singularity prefix. Basically a path to the place where you want to store the conda environments or the singularity images.
-l, --local If this flag is present, the pipeline is run locally and no jobs are sent to an HPC cluster. The default is to assume that you are working on a cluster. Note that currently only LSF clusters are supported.
-tl, --time-limit Time limit per job in minutes (passed as -W argument to bsub). Jobs will be killed if not finished in this time.
-u, --unlock Unlock output directory (passed to snakemake).
-n, --dryrun Dry run printing steps to be taken in the pipeline without actually running it (passed to snakemake).
-q, --queue Name of the queue that the job will be submitted to if working on a cluster.
-mpt, --mean-quality-treshold Phred score to be used as threshold for cleaning (filtering) fastq files.
-ws, --window-size Window size to use for cleaning (filtering) fastq files.
-ml, --minimum-lenth Minimum length for fastq reads to be kept after trimming.
--no-containers Use conda environments instead of containers.
--snakemake-args Extra arguments to be passed to snakemake API (https://snakemake.readthedocs.io/en/stable/api_reference/snakemake.html).
--reference Reference genome to use. The default is chosen based on the species argument; the defaults per species can be found in: /mnt/db/apollo/mapping/[species]
--db-dir Kraken2 database directory. The database should include fungi.

The base command to run this program.

python3 apollo_mapping.py -i [dir/to/fastq_files] -s [species]

An example on how to run the pipeline.

python3 apollo_mapping.py -i [dir/to/fastq_files] -o [/path/to/output/location] -s aspergillus_fumigatus
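When the pipeline is started from another script, the command can be assembled as an argument list. The sketch below only builds the list and does not run anything. It uses flags documented in this README (-i, -s, -o, --local, --dryrun); the helper name `build_command` is hypothetical, and the script name `apollo_mapping.py` follows the first run example in this README.

```python
SUPPORTED_SPECIES = ("candida_auris", "aspergillus_fumigatus")


def build_command(input_dir, species, output_dir=None, local=False, dryrun=False):
    """Build the argument list for an apollo-mapping run (hypothetical helper)."""
    if species not in SUPPORTED_SPECIES:
        raise ValueError(
            f"species must be one of {SUPPORTED_SPECIES}, got {species!r}"
        )
    cmd = ["python3", "apollo_mapping.py", "-i", str(input_dir), "-s", species]
    if output_dir is not None:
        cmd += ["-o", str(output_dir)]
    if local:
        cmd.append("--local")  # run locally instead of submitting to the cluster
    if dryrun:
        cmd.append("--dryrun")  # print the planned steps without running them
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from inside the pipeline directory. Starting with `dryrun=True` is a cheap way to see the planned steps before a real run.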

Detailed information about the pipeline can be found in the [documentation](link to other docs). This documentation is only suitable for users who have access to the RIVM Linux environment.

Explanation of the output

audit_trail: Logs of conda, git and the pipeline, a sample sheet, the used parameters and a snakemake report.
clean_fastq: Cleaned fastq files.
identify_species: Output of Kraken2 and Bracken for species identification.
log: Log with output and error file from the cluster for each Snakemake rule/step that is performed.
mapped_reads: Mapping output.
multiqc: MultiQC output and MultiQC HTML report.
qc_clean_fastq: Quality control of clean fastq reads.
qc_mapping: Quality control of mapping.
reference: Reference genome used.
variant: Variant calling results.
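After a run it can be useful to confirm that all of the subdirectories listed above were created. This is a minimal sketch that assumes each entry in the list is a directory directly under the output directory; the function name `missing_output_dirs` is hypothetical.

```python
from pathlib import Path

# Subdirectory names as listed in this README.
EXPECTED_OUTPUT_DIRS = (
    "audit_trail",
    "clean_fastq",
    "identify_species",
    "log",
    "mapped_reads",
    "multiqc",
    "qc_clean_fastq",
    "qc_mapping",
    "reference",
    "variant",
)


def missing_output_dirs(output_dir):
    """Return the expected subdirectories that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_OUTPUT_DIRS if not (root / name).is_dir()]
```

An empty list means every expected subdirectory is present. It says nothing about whether the files inside are complete.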

Issues

This pipeline only works on the RIVM cluster.

Future ideas for this pipeline

Make this pipeline available and user friendly for users outside RIVM.

License

This pipeline is licensed under an AGPL3 license. Detailed information can be found in the 'LICENSE' file in this repository.

Contact

Contact person: IDS-Bioinformatics
Email: [email protected]

Acknowledgements

Contribution guidelines

Apollo pipelines use a feature branch workflow. To work on a feature, create a branch from the main branch and make your changes there. This branch can be merged into the main branch via a pull request. Hotfixes for bugs can be committed directly to the main branch.

Please adhere to the Conventional Commits specification for commit messages. These commit messages can be picked up by release-please to create meaningful release notes.
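A commit message header can be checked against the basic Conventional Commits shape, `type(scope)!: description`, before pushing. The sketch below is a simplified check written for illustration, not the tooling used by this repository: it accepts any lowercase type and inspects only the first line of the message.

```python
import re

# Simplified header pattern: type, optional (scope), optional '!', then ': '
# followed by a non-empty description.
HEADER = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional(message):
    """Return True if the first line of the message matches the header shape."""
    lines = message.splitlines()
    return bool(lines) and HEADER.match(lines[0]) is not None
```

For example, `feat: add bracken report` and `fix(mapping)!: handle empty fastq` pass this check, while `Updated readme` does not.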
