Added QIIME2 custom reference database support. #667

MatthewJM96 · 2023-11-28T22:19:07Z

Added support for using custom reference databases in QIIME2 taxonomic classification via the --qiime_ref_tax_custom flag. This brings QIIME2 taxonomic classification into alignment with Kraken2 and DADA2, which allow the same.

Testing should probably be added. I could do with some advice on how to make this possible with a reduced database that satisfies the requirements on what can be passed to the flag (it must be a directory or tarball, as in the Kraken implementation).
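One possible way to produce such a reduced tarball for testing is sketched below in Python. This is only a rough illustration, not part of the pipeline: the function name `pack_reference_db`, the file names in the usage note, and the flat archive layout are all assumptions, and whether the pipeline expects exactly this layout would need checking against the Kraken-style unpacking logic.

```python
import tarfile
from pathlib import Path


def pack_reference_db(fasta: Path, taxonomy: Path, tarball: Path) -> list[str]:
    """Pack a sequence file and a taxonomy file into a gzipped tarball.

    Returns the member names stored in the archive, in the order added.
    """
    with tarfile.open(tarball, "w:gz") as tar:
        # Store both files flat (no parent directories). This layout is an
        # assumption, not something taken from the pipeline.
        tar.add(fasta, arcname=fasta.name)
        tar.add(taxonomy, arcname=taxonomy.name)
    # Re-open the archive to confirm what was actually written.
    with tarfile.open(tarball, "r:gz") as tar:
        return tar.getnames()
```

For example, `pack_reference_db(Path("ref_seqs.fasta"), Path("ref_taxonomy.txt"), Path("db.tar.gz"))` would write `db.tar.gz` containing the two (hypothetically named) files.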

PR checklist

This comment contains a description of changes (with reason).
If you've fixed a bug or added code that should be tested, add tests!
If necessary, also make a PR on the nf-core/ampliseq branch on the nf-core/test-datasets repository.
Make sure your code lints (nf-core lint).
Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
CHANGELOG.md is updated.


github-actions · 2023-11-28T22:20:56Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 6b71e4d

✅ 154 tests passed
❔ 3 tests were ignored
❗ 2 tests had warnings

❗ Test warnings:

readme - README did not have a Nextflow minimum version badge.
schema_lint - Parameter input is not defined in the correct subschema (input_output_options)

❔ Tests ignored:

files_exist - File is ignored: conf/igenomes.config
files_unchanged - File ignored due to lint config: .gitattributes
actions_ci - actions_ci

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-ampliseq_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-ampliseq_logo_light.png
files_exist - File found: docs/images/nf-core-ampliseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: lib/nfcore_external_java_deps.jar
files_exist - File found: lib/NfcoreTemplate.groovy
files_exist - File found: lib/Utils.groovy
files_exist - File found: lib/WorkflowMain.groovy
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: lib/WorkflowAmpliseq.groovy
files_exist - File found: modules.json
files_exist - File found: pyproject.toml
files_exist - File not found check: Singularity
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: docs/images/nf-core-ampliseq_logo.png
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 2.8.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-ampliseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-ampliseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-ampliseq_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - lib/nfcore_external_java_deps.jar matches the template
files_unchanged - lib/NfcoreTemplate.groovy matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
files_unchanged - pyproject.toml matches the template
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Zenodo placeholder was replaced with DOI.
pipeline_todos - No TODO strings found
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (266 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: release-announcments.yml
actions_schema_validation - Workflow validation passed: awstest.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - 'assets/multiqc_config.yml' follows the ordering scheme of the minimally required plugins.
multiqc_config - 'assets/multiqc_config.yml' contains a matching 'report_comment'.
multiqc_config - 'assets/multiqc_config.yml' contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

Run details

nf-core/tools version 2.10
Run at 2023-12-19 09:22:05

d4straub

Thanks for that PR!
Looks good to me, see comments below.

I think a proper test file could be created with greengenes85 with files

ampliseq/conf/ref_databases.config

Line 305 in 1067c7c

    
           file = [ "https://data.qiime2.org/2023.7/tutorials/training-feature-classifiers/85_otus.fasta", "https://data.qiime2.org/2023.7/tutorials/training-feature-classifiers/85_otu_taxonomy.txt" ]

and uploaded to https://github.com/nf-core/test-datasets/tree/ampliseq/testdata (maybe into a new folder such as "DB") if small enough, and activated in e.g. https://github.com/nf-core/ampliseq/blob/dev/conf/test_reftaxcustom.config, which would require an update of https://github.com/nf-core/ampliseq/blob/dev/tests/pipeline/reftaxcustom.nf.test and

ampliseq/tests/pipeline/reftaxcustom.nf.test.snap

Line 16 in 1067c7c

    
           "{BARRNAP={barrnap=0.9}, CUSTOM_DUMPSOFTWAREVERSIONS={python=3.11.4, yaml=6.0}, CUTADAPT_BASIC={cutadapt=3.4}, DADA2_DENOISING={R=4.3.1, dada2=1.28.0}, DADA2_FILTNTRIM={R=4.3.1, dada2=1.28.0}, DADA2_QUALITY1={R=4.3.1, ShortRead=1.58.0, dada2=1.28.0}, DADA2_TAXONOMY={R=4.3.1, dada2=1.28.0}, FASTQC={fastqc=0.12.1}, KRAKEN2_KRAKEN2={kraken2=2.1.2, pigz=2.6}, PHYLOSEQ={R=4.3.0, phyloseq=1.44.0}, RENAME_RAW_DATA_FILES={sed=4.7}, TRUNCLEN={pandas=1.1.5, python=3.9.1}, Workflow={nf-core/ampliseq=2.8.0dev}}"


MatthewJM96 · 2023-12-07T22:15:31Z

I've begun putting some testing in

Yes, separate tests are fine if it's not really possible to fit into existing tests.

I've put a test with tarball into the existing reftaxcustom case, and added a qiimecustom that tests with a file pair. I think that's a reasonable balance to test different input patterns.
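To make the input patterns being tested concrete, here is a small Python sketch of how a value for the flag could be sorted into the three shapes discussed in this PR (directory, tarball, or a comma-separated file pair with the sequence file first and the taxonomy file second). It is not the Nextflow logic from the PR: the function name `classify_ref_db` and the list of tarball suffixes are assumptions made for illustration.

```python
# Suffixes treated as tarballs; an assumption, not taken from the pipeline.
TARBALL_SUFFIXES = (".tar", ".tar.gz", ".tgz")


def classify_ref_db(value: str) -> tuple[str, list[str]]:
    """Return (kind, paths) for a --qiime_ref_tax_custom style value."""
    parts = [p.strip() for p in value.split(",")]
    if len(parts) == 2 and all(parts):
        # File pair: sequence file first, taxonomy file second.
        return "file_pair", parts
    if len(parts) != 1 or not parts[0]:
        raise ValueError(
            f"expected one path or two comma-separated files, got {value!r}"
        )
    path = parts[0]
    if path.endswith(TARBALL_SUFFIXES):
        return "tarball", [path]
    # Anything else is treated as a directory in this sketch.
    return "directory", [path]
```

With this sketch, a tarball path maps to the `reftaxcustom` style of input and a two-file value maps to the `qiimecustom` style; anything with more than two comma-separated entries is rejected.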

Does it therefore make sense to separate out the logic that sets run_qiime2 for differentiating between the downstream analysis in qiime and the taxonomic alignment in qiime?

run_qiime2 is used both for taxonomic classification and for downstream analysis. I guess it could make sense to separate those into run_qiime2_downstreamanaylsis and run_qiime2_taxonomy, and potentially add another one for blast consensus (or keep blast consensus and scikit-learn in "taxonomy"). Just make it as intuitive and as easy to maintain as possible (keep checks to a minimum).

The idea would be that some users might want QIIME's taxonomy but not all the rest? If so, why not keep --skip_qiime but allow it to be combined with --qiime_taxonomy instead of the quite long params you suggest?

I've added a --skip_qiime_downstream flag and separated out the calculation of a run_qiime2 that applies to downstream analysis and a run_qiime2_taxonomy that applies just to the taxonomy stage, and I'm using this in the test cases.
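The split described above might look roughly like the following Python sketch. It is a guess at the shape of the logic, not the code in workflows/ampliseq.nf: the `Params` class and the `has_qiime_reference` field are hypothetical, and the real pipeline very likely checks more conditions than these.

```python
from dataclasses import dataclass


@dataclass
class Params:
    skip_qiime: bool = False
    skip_qiime_downstream: bool = False
    # Hypothetical field: True if any QIIME2 reference or classifier was given.
    has_qiime_reference: bool = False


def qiime_stages(p: Params) -> dict[str, bool]:
    """Decide separately whether taxonomy and downstream stages should run."""
    # Taxonomy needs QIIME2 enabled and something to classify against.
    run_taxonomy = not p.skip_qiime and p.has_qiime_reference
    # Downstream analysis can be switched off on its own.
    run_downstream = not p.skip_qiime and not p.skip_qiime_downstream
    return {"run_qiime2": run_downstream, "run_qiime2_taxonomy": run_taxonomy}
```

Under these assumptions, setting only the downstream skip flag keeps taxonomic classification running while disabling the rest, which is the combination the new test cases rely on.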


d4straub

Hi Matthew,

looks great. Did you run your newly added tests and test itself, and make sure the files look fine? If not, I'll have a look before I give my OK here.


MatthewJM96 · 2023-12-08T14:27:43Z

> Hi Matthew,
>
> looks great. Did you run your newly added tests and test itself, and make sure the files look fine? If not, I'll have a look before I give my OK here.

I ran them manually but also added the new test to the CI pipeline, and it looks like it passed. I couldn't figure out a good way to snapshot the QIIME taxonomic classification, as I guess the algorithm isn't deterministic, so I just checked that the classifier and the taxonomy TSV report are produced.

Edit: caught one more bad assertion. One of the failing tests (doubleprimers) in this round looks spurious, and the test is succeeding on my end, so hopefully it succeeds in this rerun.
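For reference, the "check the files exist rather than snapshot their content" idea can be sketched in Python as below. The real assertions live in the nf-test files; this is only an illustration, and the function name `missing_or_empty` and any paths passed to it are hypothetical.

```python
from pathlib import Path


def missing_or_empty(outdir: Path, expected: list[str]) -> list[str]:
    """Return the expected relative paths that are absent or zero-length."""
    problems = []
    for rel in expected:
        f = outdir / rel
        # Content is not compared, since the classification output may
        # differ between runs; only presence and non-zero size are checked.
        if not f.is_file() or f.stat().st_size == 0:
            problems.append(rel)
    return problems
```

A test would then assert that the returned list is empty for the classifier and the taxonomy TSV, which tolerates run-to-run differences in the classification itself.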


d4straub · 2023-12-11T08:21:05Z

Thanks. I have the feeling I should also run a few tests just to make sure. I've scheduled some time tomorrow, so I expect to approve the PR then.

d4straub · 2023-12-12T11:52:32Z

I tested, and found:
(1) there is something wrong with phyloseq; I attempted to fix it in #676
(2) when running `nextflow run MatthewJM96/ampliseq -r qiime2_custom_db -profile test_qiimecustom,singularity --outdir result_test_qiimecustom_qiime2_custom_db_23-12-12`, I found that `result_test_qiimecustom_qiime2_custom_db_23-12-12/summary_report/summary_report.html` didn't contain the section about the taxonomy; #673 should fix that.

I think the ideal sequence should be:
~~(1) After #676 is merged~~
~~(2) integrate dev into that PR~~
~~(3) revert 4464c38~~
(4) all should be fine (check if summary_report.html is fine with -profile test_qiimecustom) and we merge.

Sorry that you ran into that phyloseq bug; I hope it works now.

edit: some points are solved above
edit2: summary_report.html with -profile test_qiimecustom does not contain the taxonomy section yet. Not sure what is preventing it...

d4straub · 2023-12-19T09:33:44Z

I found the problem and fixed the report. Once all tests have passed, I'll merge it if you do not have any objections.

d4straub

LGTM!

MatthewJM96 · 2024-01-15T10:33:48Z

> I found the problem and fixed the report. Once all tests have passed, I'll merge it if you do not have any objections.

Sorry for no reply, was on leave! Thanks for looking at those last things and the advice along the way.

d4straub and others added 16 commits June 26, 2023 19:28

Merge pull request nf-core#601 from nf-core/dev

9ac22ba

Release 2.6.0

Merge pull request nf-core#604 from nf-core/dev

3b252d2

Release 2.6.1

Merge pull request nf-core#648 from nf-core/dev

4e48b71

Release 2.7.0

Merge pull request nf-core#660 from nf-core/dev

113e90b

Release 2.7.1

Add params.qiime_ref_tax_custom in preparation of allowing custom qiime database.

d86c569

Implementation of logic to handle a custom qiime2 reference database stored in either a directory or a tarball.

439097c

Some params checking logic.

14c89b9

Loose . lying around.

d214ec0

Only perform collect if going to FORMAT_TAXONOMY_QIIME.

9346d7a

Set into new channel when branching on ch_qiime_ref_taxonomy.

ef053b1

Try to unpack the database dir into component files using a module.

a48a09f

Remove map wrapping the combine.

a9971b6

Remove unpack in favour of map and filter.

aac51bd

Glob results in list in all circumstances, check length instead.

1b2825e

Merge remote-tracking branch 'upstream/dev' into qiime2_custom_db

a903fbe

Update CHANGELOG.md.

a4219a0

d4straub reviewed Nov 29, 2023

CHANGELOG.md

lib/WorkflowAmpliseq.groovy

nextflow_schema.json

subworkflows/local/qiime2_preptax.nf

workflows/ampliseq.nf

Matthew Marshall and others added 12 commits November 29, 2023 10:12

Update error message when passing both one of --qiime_ref_taxonomy or --qiime_ref_tax_custom and --classifier.

0ccf6e6

Update CHANGELOG.md with pull request number.

590f415

Co-authored-by: Daniel Straub <[email protected]>

Add support for specifying two (possibly gzipped) files as --qiime_ref_tax_custom.

f5d80f5

Only support providing two files separated by a comma.

7016682

Fix split returns a String[] and we actually need an ArrayList.

79cbfe8

Merge branch 'qiime2_custom_db' of https://gitlab.stfc.ac.uk/omics/open-source-background-assets/ampliseq into qiime2_custom_db

Conflicts: subworkflows/local/qiime2_preptax.nf workflows/ampliseq.nf

03881b2

Move ch_ref_database set into correct scope.

6d767bc

Merge branch 'qiime2_custom_db' of https://gitlab.stfc.ac.uk/omics/open-source-background-assets/ampliseq into qiime2_custom_db

af1674b

Try using map to work through list of files.

f76b49b

Merge branch 'qiime2_custom_db' of https://gitlab.stfc.ac.uk/omics/open-source-background-assets/ampliseq into qiime2_custom_db

31a310d

Can't call processes from inside maps.

0890a0e

Merge branch 'qiime2_custom_db' of https://gitlab.stfc.ac.uk/omics/open-source-background-assets/ampliseq into qiime2_custom_db

b6d0c2a

Matthew Marshall added 10 commits December 7, 2023 21:21

Fix path for testing tarball passed to --qiime_ref_tax_custom.

549c166

Add snapshot of files coming from qiime2 taxonomy.

8516534

Work towards a qiime_ref_tax_custom specific test.

745cab7

Skip dada tax.

a1dfb5b

Sequence then taxonomy file for file pair to --qiime_ref_tax_custom.

51dc97e

Clarify in help text of --qiime_ref_tax_custom the ordering of a file pair.

a33f17f

Update snapshots to include qiime2 in both correctly and add assertions for qiime2.

8f57fae

Make ordering of sequence and taxonomy files deterministic in case of file pair.

74e05b2

Fix filtering in file pair case.

b65df44

Merge branch 'qiime2_custom_db' of github.com:MatthewJM96/ampliseq into qiime2_custom_db

93e174e

Matthew Marshall added 4 commits December 7, 2023 22:31

Fix version mixing in --qiime_ref_taxonomy case.

45bee71

Update software version expectations for tests that no longer run QIIME_PREPTAX.

3c9eaf1

Remove assertions on dada2 tax and phyloseq files existing in test_qiimecustom.

07f4407

Looks like qiime2 tax alignment is non-deterministic, just verify the files it emits are emitted.

1c129e5

d4straub reviewed Dec 8, 2023

nextflow_schema.json

Make --skip_qiime_downstream help text clearer.

2ace595

Co-authored-by: Daniel Straub <[email protected]>

Matthew Marshall added 2 commits December 8, 2023 14:48

Remove assertion on qiime phyloseq file no longer produced.

4464c38

Merge branch 'qiime2_custom_db' of github.com:MatthewJM96/ampliseq into qiime2_custom_db

1fb5089

d4straub and others added 2 commits December 12, 2023 13:50

Merge branch 'dev' into qiime2_custom_db

0287ba9

Fix reporting

6b71e4d

d4straub approved these changes Dec 19, 2023


d4straub merged commit a86f9c7 into nf-core:dev Dec 19, 2023
18 checks passed

d4straub mentioned this pull request Dec 19, 2023

Add custom qiime reference database support to Ampliseq. #665

Closed

d4straub mentioned this pull request Jan 12, 2024

Release 2.8.0 #690

Merged

10 tasks
