CT-PHL Bacterial Identification and Resistance Detection
C-BIRD is a pipeline that performs de novo assembly of Illumina paired-end reads and uses k-mer-based approaches where available. It runs on the Terra.bio platform as well as on any Linux machine with the Cromwell or miniwdl workflow engine installed. As its name indicates, C-BIRD is designed solely for rapid bacterial identification and antimicrobial resistance detection.
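For local runs, both engines take the workflow file plus a JSON file of inputs. The sketch below only builds the command line and the inputs JSON. The workflow file name `cbird.wdl` and the input keys (`cbird.read1`, `cbird.read2`) are hypothetical placeholders, so check the actual WDL for the real names. It relies on `miniwdl run` accepting an inputs file via `-i` and on Cromwell's single-workflow `run` mode accepting `--inputs`.

```python
import json


def inputs_json_text(inputs):
    """Serialize WDL inputs; keys follow the '<workflow>.<input>' convention."""
    return json.dumps(inputs, indent=2, sort_keys=True)


def build_command(engine, wdl_path, inputs_path, cromwell_jar="cromwell.jar"):
    """Return the argument list for a local run with miniwdl or Cromwell."""
    if engine == "miniwdl":
        # miniwdl reads the inputs JSON file given with -i
        return ["miniwdl", "run", str(wdl_path), "-i", str(inputs_path)]
    if engine == "cromwell":
        # Cromwell's "run" mode executes a single workflow; inputs via --inputs
        return ["java", "-jar", cromwell_jar, "run", str(wdl_path),
                "--inputs", str(inputs_path)]
    raise ValueError(f"unsupported engine: {engine!r}")
```

A typical use is to write `inputs_json_text({"cbird.read1": "sample_R1.fastq.gz", "cbird.read2": "sample_R2.fastq.gz"})` to `inputs.json` and pass `build_command("miniwdl", "cbird.wdl", "inputs.json")` to `subprocess.run(..., check=True)`.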
The main goal of this project is to create a small, fast, and accurate workflow that runs in a cloud environment with high reproducibility and parallelization. To achieve this, C-BIRD uses a minimal Docker container for each pipeline step. C-BIRD will be validated for a selected set of bacteria.
C-BIRD follows a minimalistic design. Producing clinically meaningful results and generating individual reports for each sample are within this project's scope. Any typing (except MLST) or further analysis is out of scope. However, extra tools and programs may be added for validation purposes.
Terra users can add C-BIRD to an existing workspace via Dockstore.
C-BIRD deliberately avoids automatic updates of the required databases to keep strict control over validation. The following databases and files must be installed or uploaded manually. Please check the wiki for detailed instructions.
File | Comments |
---|---|
Kraken2/Bracken database | Standard-8 (required) |
Mash sketch | Custom Mash sketch (required) |
Adapters fasta | List of your sequencing adapters as a FASTA file (optional) |
Target genes fasta | Additional set of target genes/proteins as a FASTA file of protein sequences (optional) |
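Because these files are installed by hand rather than auto-updated, it can help to record a checksum of each one so a validated run can later be tied to the exact database files it used. The sketch below assumes each resource is a single file (for example, a database tarball uploaded to Terra); the labels and paths passed to `check_resources` are hypothetical and not part of C-BIRD.

```python
import hashlib
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_resources(resources):
    """Build a checksum manifest for manually supplied resources.

    `resources` maps a label to (path, required). A missing required file
    raises FileNotFoundError; a missing optional file is recorded as None.
    """
    manifest = {}
    for label, (path, required) in resources.items():
        file_path = Path(path)
        if not file_path.is_file():
            if required:
                raise FileNotFoundError(f"required resource missing: {label} ({file_path})")
            manifest[label] = None  # optional resource not provided
            continue
        manifest[label] = sha256sum(file_path)
    return manifest
```

For example, `check_resources({"kraken2_db": ("k2_standard_08gb.tar.gz", True), "adapters": ("adapters.fasta", False)})` (hypothetical file names) returns a dictionary that can be saved alongside the validation records.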
C-BIRD uses Kraken2 and Bracken for taxonomic profiling of reads, which serves as a contamination check. In general, reads from a pure isolate are expected to yield a high estimated abundance for a single species. However, there are exceptions due to database limitations, the nature of k-mer-based approaches, and highly similar organisms. Results should be interpreted with these factors in mind.
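One way to read a Bracken species-level output file as a contamination check is to look at the fraction of reads assigned to the top species. The sketch below parses Bracken's tab-separated output (columns such as `name` and `fraction_total_reads`) and flags the sample when the top species falls below a threshold. The 0.9 cutoff is a hypothetical value, not a validated C-BIRD criterion, and the sample numbers are invented to illustrate the E. coli/Shigella case, where highly similar organisms split the reads.

```python
import csv
import io


def top_species(bracken_tsv_text):
    """Return (name, fraction) of the most abundant taxon in a Bracken report."""
    reader = csv.DictReader(io.StringIO(bracken_tsv_text), delimiter="\t")
    rows = [(row["name"], float(row["fraction_total_reads"])) for row in reader]
    if not rows:
        raise ValueError("empty Bracken report")
    return max(rows, key=lambda r: r[1])


def contamination_flag(bracken_tsv_text, min_fraction=0.9):
    """Flag the sample if its top species is below a (hypothetical) threshold."""
    name, fraction = top_species(bracken_tsv_text)
    return {"top_species": name, "fraction": fraction, "flag": fraction < min_fraction}


# Invented example values for illustration only
SAMPLE_BRACKEN = (
    "name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\tadded_reads\tnew_est_reads\tfraction_total_reads\n"
    "Escherichia coli\t562\tS\t900000\t50000\t950000\t0.95000\n"
    "Shigella flexneri\t623\tS\t40000\t10000\t50000\t0.05000\n"
)
```

A flag from a check like this is a prompt for review rather than a verdict, for the database and similarity reasons given above.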
Mash is used with a custom Mash sketch to identify bacteria from selected genera (Acinetobacter, Citrobacter, Enterobacter, Escherichia, Klebsiella, Kluyvera, Morganella, Proteus, Providencia, Pseudomonas, Raoultella, Salmonella, Serratia).
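Identification against a sketch of reference genomes amounts to picking the reference with the smallest Mash distance. The sketch below parses the tab-separated output of `mash dist` (reference ID, query ID, distance, p-value, shared hashes) and returns the closest hit, together with 1 minus the distance as a rough ANI estimate. The reference names and values are hypothetical, and how C-BIRD itself selects and reports hits is not shown here.

```python
def best_mash_hit(dist_text):
    """Return the closest reference in `mash dist` output (lowest Mash distance)."""
    best = None
    for line in dist_text.strip().splitlines():
        # Columns: reference-ID, query-ID, Mash distance, p-value, shared hashes
        ref_id, query_id, distance, p_value, shared = line.split("\t")
        hit = {
            "reference": ref_id,
            "query": query_id,
            "distance": float(distance),
            "p_value": float(p_value),
            "shared_hashes": shared,
        }
        if best is None or hit["distance"] < best["distance"]:
            best = hit
    if best is None:
        raise ValueError("no Mash hits")
    # Mash distance approximates 1 - ANI for closely related genomes
    best["approx_ani"] = 1.0 - best["distance"]
    return best


# Hypothetical references and values for illustration only
SAMPLE_MASH = (
    "ref_Escherichia_coli.fna\tsample_contigs.fasta\t0.0123\t0\t780/1000\n"
    "ref_Klebsiella_pneumoniae.fna\tsample_contigs.fasta\t0.2100\t1e-30\t3/1000\n"
)
```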
AMR gene detection relies on NCBI's AMRFinderPlus program and its database.
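AMRFinderPlus writes a tab-separated table with one row per detected element. The sketch below groups AMR genes by drug class, assuming the column names `Element type`, `Class`, and `Gene symbol` used in the 3.12 series (later releases may rename columns). The sample keeps only a subset of the real columns, and its rows are invented; the `STRESS` row shows the kind of non-AMR element that appears when `--plus` is used and is skipped here.

```python
import csv
import io
from collections import defaultdict


def genes_by_class(amrfinder_tsv_text):
    """Group AMR gene symbols by drug class, ignoring non-AMR elements."""
    reader = csv.DictReader(io.StringIO(amrfinder_tsv_text), delimiter="\t")
    groups = defaultdict(set)
    for row in reader:
        if row["Element type"] != "AMR":
            continue  # skip STRESS/VIRULENCE rows reported with --plus
        groups[row["Class"]].add(row["Gene symbol"])
    return {cls: sorted(genes) for cls, genes in sorted(groups.items())}


# Invented rows; only a subset of AMRFinderPlus columns is included
SAMPLE_AMRFINDER = (
    "Contig id\tGene symbol\tElement type\tElement subtype\tClass\tSubclass\n"
    "contig_1\tblaKPC-2\tAMR\tAMR\tBETA-LACTAM\tCARBAPENEM\n"
    "contig_2\tblaTEM-1\tAMR\tAMR\tBETA-LACTAM\tBETA-LACTAM\n"
    "contig_3\tfosA\tAMR\tAMR\tFOSFOMYCIN\tFOSFOMYCIN\n"
    "contig_4\tmerA\tSTRESS\tMETAL\tMERCURY\tMERCURY\n"
)
```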
The following programs and tools are used in the C-BIRD pipeline.
Tools | Version | Comments |
---|---|---|
fastp | 0.23.4 | QC, adapter removal, quality filtering and trimming |
BBTools | 39.06 | phiX removal & optional normalization |
Kraken2 | 2.1.3 | Taxonomic profiling & contamination check |
Bracken | 2.9 | Abundance estimation |
SPAdes | 4.0.0 | De novo assembly |
Mash | 2.3 | Bacterial identification |
QUAST | 5.2.0 | Genome assembly evaluation |
BUSCO | 5.7.1 | Genomic data quality assessment |
mlst | 2.23.0 | MLST typing |
AMRFinderPlus | 3.12.8 | AMR gene identification |
BLAST+ | 2.15.0 | Target gene search |
PlasmidFinder | 2.1.6 | Plasmid detection |
Cbird-Util | 1.2 | Individual summary report generation |
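Among the assembly metrics that QUAST reports, N50 is one of the most commonly cited. The sketch below computes it from a FASTA of contigs: N50 is the length L such that contigs of length L or longer cover at least half of the total assembly length. QUAST ignores contigs shorter than its minimum contig length (500 bp by default), so passing `min_length=500` roughly mimics that; this is an illustration, not QUAST's implementation.

```python
def fasta_lengths(fasta_text):
    """Return the sequence lengths of all records in FASTA text."""
    lengths, current = [], None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(contig_lengths, min_length=0):
    """Return N50 after discarding contigs shorter than min_length."""
    lengths = sorted((n for n in contig_lengths if n >= max(min_length, 1)), reverse=True)
    if not lengths:
        raise ValueError("no contigs at or above min_length")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
```

For contig lengths 100, 200, 300, 400, and 500 bp (1,500 bp total), the two longest contigs reach 900 bp, passing the 750 bp halfway point, so N50 is 400 bp.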
In addition to the outputs that each program produces at every step, C-BIRD creates three summary reports in HTML for each sample: a basic report, an advanced report, and a QC report.
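The layout of the Cbird-Util reports is not described here. As a purely hypothetical sketch of the general idea, a per-sample summary can be rendered as a small standalone HTML page, escaping every value so sample names or tool output cannot break the markup. The function name and fields are illustrative and do not reflect Cbird-Util's actual code.

```python
from html import escape


def render_summary(sample_id, fields):
    """Render a minimal standalone HTML summary table for one sample."""
    rows = "\n".join(
        f"    <tr><th>{escape(str(key))}</th><td>{escape(str(value))}</td></tr>"
        for key, value in fields.items()
    )
    return (
        "<!DOCTYPE html>\n<html>\n"
        f"<head><meta charset=\"utf-8\"><title>{escape(sample_id)} summary</title></head>\n"
        f"<body>\n  <h1>{escape(sample_id)}</h1>\n  <table>\n{rows}\n  </table>\n"
        "</body>\n</html>\n"
    )
```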
SPAdes may fail if an authorization domain is defined for the workspace on Terra.
C-BIRD includes modified and unmodified code from Theiagen's Public Health Bacterial Genomics workflows. If you need a more sophisticated pipeline, please see Theiagen's TheiaProk workflow.