Greengenes2 2022.10 Support #666

Merged
merged 10 commits into from
Jan 12, 2024

@MatthewJM96 MatthewJM96 commented Nov 28, 2023

Implemented support for the Greengenes2 database, version 2022.10, for QIIME taxonomic classification via the --qiime_ref_taxonomy flag. We have used this database in our own work, and it is a low-cost addition.

Addresses #658

PR checklist

  • This comment contains a description of changes (with reason).
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • CHANGELOG.md is updated.

github-actions bot commented Nov 28, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 70b2e01

+| ✅ 158 tests passed       |+
#| ❔   3 tests were ignored |#
!| ❗   2 tests had warnings |!

❗ Test warnings:

  • readme - README did not have a Nextflow minimum version badge.
  • schema_lint - Parameter input is not defined in the correct subschema (input_output_options)

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 2.11.1
  • Run at 2024-01-11 15:22:33

Collaborator

@d4straub d4straub left a comment

Hi, thanks for the PR! I do have a few comments.
Did you test a run with the Greengenes2 database set up this way?
I assume it's far too large to run in a test config?

Review threads (outdated, resolved):
  • conf/ref_databases.config
  • nextflow_schema.json
  • bin/taxref_reformat_qiime_greengenes2022.sh

d4straub commented Dec 20, 2023

Do you intend to finish this PR? I am asking because a release may be due again soon, and it would be nice to have this feature in, I think.
Edit: I am on it...


d4straub commented Dec 21, 2023

I am running the pipeline now with
nextflow run MatthewJM96/ampliseq -r greengenes_2022 -profile test_full,cfc --qiime_ref_taxonomy greengenes2 --skip_dada_taxonomy --outdir results

Update: 19 hours in, QIIME2_EXTRACT has failed twice with exit status 140. The process is automatically retried with more computational resources, so I'll wait.

Update2: The pipeline fails with the standard settings described above, even after 3 retries. I am now increasing resources and resuming. The resource monitor shows that the step indeed uses only 1 CPU and almost no RAM. We will see how that goes, with a 3-day time limit for now...

Update4: QIIME2_EXTRACT has now been running for 24h and is still in progress. It seems to require almost no RAM and uses only 1 CPU. All in all very inefficient?
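
For anyone wanting to reproduce the "increase resources and resume" step, a minimal sketch of a custom Nextflow config follows. The file name custom.config and the 72.h value are assumptions chosen to mirror the 3-day limit mentioned above; they are not the settings actually used in this PR.

```groovy
// custom.config (hypothetical file name)
// Raises the time limit for the slow extraction step only.
// The value is illustrative and should be adapted to your own cluster.
process {
    withName: 'QIIME2_EXTRACT' {
        time = 72.h   // assumed: 3-day limit, as mentioned in the comment above
    }
}
```

Such a file could be passed by appending -c custom.config -resume to the nextflow run command above, so that already completed tasks are reused from the cache. Raising cpus in the same block would only help if the process actually passes the allocated CPUs on to the underlying tool.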


d4straub commented Jan 11, 2024

@MatthewJM96 I tested the pipeline, and QIIME2_EXTRACT took 2d 22h on our system, while the remaining tasks required at most 21 minutes. I think that is too long. I investigated, and it seems that step now supports multithreading, so I'll activate that and see how much it improves.

@d4straub

With the updated settings, the command nextflow run MatthewJM96/ampliseq -r greengenes_2022 -profile test_full,cfc --qiime_ref_taxonomy greengenes2 --skip_dada_taxonomy --outdir results requires a walltime of 9h 43m on our HPC, of which QIIME2_EXTRACT accounts for 8h 58m. That is still long but acceptable, I think.


@d4straub d4straub left a comment

LGTM

@d4straub d4straub merged commit 1798c4a into nf-core:dev Jan 12, 2024
18 checks passed