Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added QIIME2 custom reference database support. #667

Merged
merged 65 commits into from
Dec 19, 2023

Conversation

MatthewJM96
Copy link

@MatthewJM96 MatthewJM96 commented Nov 28, 2023

Added support for using custom reference databases in QIIME2 taxonomic classification via the --qiime_ref_tax_custom flag. This brings QIIME2 taxonomic classification into alignment with Kraken and Dada which allow the same.

Testing should probably be added, I could do with some advice on how to make this possible with some reduced database that matches the requirement on what can be passed to the flag (must be a directory or tarball as in the Kraken implementation).

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If necessary, also make a PR on the nf-core/ampliseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • CHANGELOG.md is updated.

Copy link

github-actions bot commented Nov 28, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 6b71e4d

+| ✅ 154 tests passed       |+
#| ❔   3 tests were ignored |#
!| ❗   2 tests had warnings |!

❗ Test warnings:

  • readme - README did not have a Nextflow minimum version badge.
  • schema_lint - Parameter input is not defined in the correct subschema (input_output_options)

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 2.10
  • Run at 2023-12-19 09:22:05

Copy link
Collaborator

@d4straub d4straub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that PR!
Looks good to me, see comments below.

I think a proper test file could be created with greengenes85 with files

file = [ "https://data.qiime2.org/2023.7/tutorials/training-feature-classifiers/85_otus.fasta", "https://data.qiime2.org/2023.7/tutorials/training-feature-classifiers/85_otu_taxonomy.txt" ]
and uploaded to https://github.com/nf-core/test-datasets/tree/ampliseq/testdata (maybe into a new folder such as "DB") if small enough and activated in e.g. https://github.com/nf-core/ampliseq/blob/dev/conf/test_reftaxcustom.config which would require an update of https://github.com/nf-core/ampliseq/blob/dev/tests/pipeline/reftaxcustom.nf.test and
"{BARRNAP={barrnap=0.9}, CUSTOM_DUMPSOFTWAREVERSIONS={python=3.11.4, yaml=6.0}, CUTADAPT_BASIC={cutadapt=3.4}, DADA2_DENOISING={R=4.3.1, dada2=1.28.0}, DADA2_FILTNTRIM={R=4.3.1, dada2=1.28.0}, DADA2_QUALITY1={R=4.3.1, ShortRead=1.58.0, dada2=1.28.0}, DADA2_TAXONOMY={R=4.3.1, dada2=1.28.0}, FASTQC={fastqc=0.12.1}, KRAKEN2_KRAKEN2={kraken2=2.1.2, pigz=2.6}, PHYLOSEQ={R=4.3.0, phyloseq=1.44.0}, RENAME_RAW_DATA_FILES={sed=4.7}, TRUNCLEN={pandas=1.1.5, python=3.9.1}, Workflow={nf-core/ampliseq=2.8.0dev}}"

CHANGELOG.md Outdated Show resolved Hide resolved
lib/WorkflowAmpliseq.groovy Outdated Show resolved Hide resolved
nextflow_schema.json Show resolved Hide resolved
subworkflows/local/qiime2_preptax.nf Show resolved Hide resolved
workflows/ampliseq.nf Outdated Show resolved Hide resolved
@MatthewJM96
Copy link
Author

I've begun putting some testing in

Yes, separate tests are fine if its not really possible to fit into existing tests.

I've put a test with tarball into the existing reftaxcustom case, and added a qiimecustom that tests with a file pair. I think that's a reasonable balance to test different input patterns.

Does it therefore make sense to separate out the logic that sets run_qiime2 for differentiating between the downstream analysis in qiime and the taxonomic alignment in qiime?

run_qiime2 is used for taxonomic classification here and for downstream analysis here. I guess it could make sense to separate those (maybe here) into run_qiime2_downstreamanaylsis and run_qiime2_taxonomy and potentially add another one for blast consensus (or keep blast consensus & scikit learn in "taxonomy"). Just do it as intuitive as possible and as easy to maintain as possible (keep checks to a minimum).

The idea would be that some users might want QIIME's taxonomy but not all the rest? If so, why not keep --skip_qiime but allow it to be combined with --qiime_taxonomy instead of the quite long params you suggest?

I've added a --skip_qiime_downstream flag and separated out calculation of a run_qiime2 that applies to downstream and a run_qiime2_taxonomy that applies just to the taxonomy stage, using this is the test cases.

Copy link
Collaborator

@d4straub d4straub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Matthew,

looks great. Did you run your newly added tests and test itself and made sure files look fine? If not I'll have a look before I give my ok here.

nextflow_schema.json Outdated Show resolved Hide resolved
@MatthewJM96
Copy link
Author

MatthewJM96 commented Dec 8, 2023

Hi Matthew,

looks great. Did you run your newly added tests and test itself and made sure files look fine? If not I'll have a look before I give my ok here.

I ran them manually but also added the new test to the CI pipeline and it looks like it passed. I couldn't figure a good way to snapshot the QIIME taxonomic classification as I guess the algorithm isn't deterministic, so I just checked that the classifier and taxonomy tsv report are produced.

Edit: caught one more assertion that was bad, one of the failing tests (doubleprimers) in this round looks spurious, and the test is succeeding on my own end so hopefully succeeds in this rerun.

@d4straub
Copy link
Collaborator

Thanks, I have the feeling I should also run a few tests just to make sure, I scheduled some time tomorrow, so I expect to approve the PR then.

@d4straub
Copy link
Collaborator

d4straub commented Dec 12, 2023

I tested, and found:
(1) there is something wrong with phyloseq, I attempted to fix it in #676
(2) when running nextflow run MatthewJM96/ampliseq -r qiime2_custom_db -profile test_qiimecustom,singularity --outdir result_test_qiimecustom_qiime2_custom_db_23-12-12 I found that result_test_qiimecustom_qiime2_custom_db_23-12-12/summary_report/summary_report.html didnt contain the section about the taxonomy, #673 should fix that.

I think the ideal sequence should be:
(1) After #676 is merged
(2) integrate dev into that PR
(3) revert 4464c38
(4) all should be fine (check if summary_report.html is fine with -profile test_qiimecustom) and we merge.

Sorry that you run into that phyloseq bug, I hope it works now.

edit: some points are solved above
edit2: summary_report.html with -profile test_qiimecustom does not contain taxonomy section yet. Not sure what preventing it...

@d4straub
Copy link
Collaborator

I found the problem and fixed the report. When all tests passed I'll merge it if you do not have any objections.

Copy link
Collaborator

@d4straub d4straub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@d4straub d4straub merged commit a86f9c7 into nf-core:dev Dec 19, 2023
18 checks passed
@d4straub d4straub mentioned this pull request Jan 12, 2024
10 tasks
@MatthewJM96
Copy link
Author

I found the problem and fixed the report. When all tests passed I'll merge it if you do not have any objections.

Sorry for no reply, was on leave! Thanks for looking at those last things and the advice along the way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants