Hello! My colleagues and I have been working on enhancing the nf-core/bacass workflow to address lab-specific challenges in bacterial genome assembly. We would be happy to contribute these improvements to the main nf-core/bacass repository if you are interested.
1. Kmerfinder Subworkflow:
Added a local KmerFinder module for read quality control (QC) and purity assessment.
Developed a local module to compile KmerFinder results from all samples into a comprehensive CSV summary file.
Implemented a method to group input samples (*.fastq, *.fasta, and other files...) based on the reference genome estimated with KmerFinder.
Created a local module to identify the reference genome estimated with KmerFinder in the NCBI database and download this genome. This reference genome is then utilized to retrieve relevant metrics from QUAST, such as the percentage of genome fraction. This functionality is particularly valuable when input samples belong to different species, requiring more than one reference for a comprehensive by_reference_genome report.
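A minimal sketch of the grouping step described above, assuming the KmerFinder summary has one row per sample with its best hit. The column names (`sample`, `best_hit_species`, `best_hit_accession`) and the accession values are placeholders for illustration, not the actual header or data produced by the fork:

```python
import csv
import io
from collections import defaultdict

# Hypothetical summary layout: one row per sample with its best KmerFinder hit.
# Column names and accessions are illustrative placeholders only.
SUMMARY_CSV = """sample,best_hit_species,best_hit_accession
S1,Escherichia coli,NZ_EXAMPLE01.1
S2,Klebsiella pneumoniae,NZ_EXAMPLE02.1
S3,Escherichia coli,NZ_EXAMPLE01.1
"""


def group_by_reference(summary_text):
    """Map each reference accession to the list of samples assigned to it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(summary_text)):
        groups[row["best_hit_accession"]].append(row["sample"])
    return dict(groups)


groups = group_by_reference(SUMMARY_CSV)
for accession, samples in groups.items():
    print(accession, samples)
```

Each resulting group corresponds to one reference genome to download and one by-reference report, which is why samples from different species end up in separate groups.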
2. Quast Assembly QC by Grouping Samples:
Modified Quast execution when KmerFinder is invoked. Now, Quast runs twice:
Initial 'general' Quast without reference genome files (*.fna, *.gff).
Subsequent 'by reference genome' Quast, which produces reports that aggregate samples with their reference genome (estimated with KmerFinder).
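The two Quast passes can be sketched as a pair of command builders. This is only an illustration under stated assumptions: the `groups` layout, file names, and output directory names are hypothetical, and the `-r` / `--features` flags should be checked against the installed QUAST version rather than taken from this sketch:

```python
def quast_commands(groups, outdir="quast"):
    """Build one 'general' command plus one command per reference group.

    groups: {ref_id: {"fna": path, "gff": path, "assemblies": [paths]}}
    (hypothetical structure, not the fork's actual channel layout).
    """
    all_assemblies = sorted(
        asm for group in groups.values() for asm in group["assemblies"]
    )
    # Pass 1: all assemblies together, no reference files.
    commands = [["quast.py", "-o", f"{outdir}/global", *all_assemblies]]
    # Pass 2: one run per reference genome, with its FASTA and annotation.
    for ref_id, group in sorted(groups.items()):
        commands.append([
            "quast.py", "-o", f"{outdir}/{ref_id}",
            "-r", group["fna"],
            "--features", group["gff"],
            *group["assemblies"],
        ])
    return commands


example_groups = {
    "refA": {"fna": "refA.fna", "gff": "refA.gff",
             "assemblies": ["S1.fa", "S3.fa"]},
    "refB": {"fna": "refB.fna", "gff": "refB.gff",
             "assemblies": ["S2.fa"]},
}
commands = quast_commands(example_groups)
for cmd in commands:
    print(" ".join(cmd))
```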
3. Custom MultiQC Reports:
Incorporated a custom MultiQC module into the workflow.
Added multiqc_config.yml files for short, long and hybrid assembly modes (they take effect when KmerFinder is invoked; otherwise the standard MultiQC report is generated).
Upon invoking KmerFinder, a custom MultiQC HTML report is generated using the MULTIQC() module. This report consolidates metrics from KmerFinder, Quast, and other relevant sources, presenting them together in an overview table located in the first section of the report. See image:
Foot note
If you think these improvements could be implemented in nf-core/bacass, let me know so I can work on the test data and test profile.
Daniel-VM changed the title from "Request for forked repo enhancements: KmerFinder, Estimation of Reference Genome, Custom Quast, and Custom MultiQC Reports" to "Improvements in forked repository: KmerFinder, Estimation of Reference Genome, Custom Quast, and Custom MultiQC Reports" on Jan 4, 2024.
Looks actually very interesting to me. But I do not have experience with those tools and did not run the branch. I trust your judgment for now that this indeed is good practice.
I browsed over it and there were some minor points, e.g. a param with skip_* (such as skip_kmerfinder) shouldn't be true by default, I think. I am not very much into MultiQC, so I am not sure why a custom module is needed. I thought MultiQC could be configured to do almost anything, but I assume you have your reasons.
So yes, I think that would be nice to add into the nf-core repo.
Great, I still have to fix some minor points in the code and run additional tests. Once it's ready, I will let you know.
Regarding the MultiQC custom table at the beginning of the report, it will play a similar role to the default "General Stats" table. However, in the custom table you can add metrics from both supported and non-supported MultiQC tools (in fact, any data you wish). This custom table contains most of the key metrics that my group needs to check in bacterial genome assembly analysis. Since not all the tools we use have a supported module in MultiQC, we gather the metrics from all of them and consolidate them in a single table. The pipeline also exports this table in CSV format.
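To illustrate the idea of consolidating metrics from several tools into one table, here is a minimal sketch. It is not the code used in the fork: the input structure, the tool and metric names, and the `NA` placeholder for missing values are all assumptions made for the example.

```python
import csv
import io


def build_summary(per_tool):
    """Merge {tool: {sample: {metric: value}}} into (columns, rows)."""
    samples = sorted({s for metrics in per_tool.values() for s in metrics})
    columns = ["sample"]
    for tool, metrics in per_tool.items():
        names = sorted({m for sample_metrics in metrics.values()
                        for m in sample_metrics})
        # Prefix each metric with its tool so column names stay unique.
        columns += [f"{tool}_{name}" for name in names]
    rows = []
    for sample in samples:
        row = {"sample": sample}
        for tool, metrics in per_tool.items():
            for name, value in metrics.get(sample, {}).items():
                row[f"{tool}_{name}"] = value
        rows.append(row)
    return columns, rows


def to_csv(columns, rows):
    """Render the merged table as CSV text, writing NA for missing metrics."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, restval="NA",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative values only, not real results.
per_tool = {
    "kmerfinder": {"S1": {"species": "Escherichia coli"},
                   "S2": {"species": "Klebsiella pneumoniae"}},
    "quast": {"S1": {"genome_fraction": 98.7, "n50": 210000}},
}
columns, rows = build_summary(per_tool)
summary_csv = to_csv(columns, rows)
print(summary_csv)
```

The same merged rows could feed both the overview table in the report and the exported CSV, so a tool without a MultiQC module only needs a small parser that yields its per-sample metrics.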
Description of feature
Overview
Currently, these enhancements have been implemented in my local fork of nf-core/bacass on the buisciii-develop branch.
Breaking down implementations:
1. Kmerfinder Subworkflow: