-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow input of recalibration tables #45
Conversation
Note: the paired sample tests are still in progress |
Update: the paired test runs have completed successfully, with the runtime reflecting the time saved by providing the recalibration tables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Have you tested samples with multiple RGs as BQSR runs per RG? If I recall, the BQSR table is generated per sample and can contain multiple RGs within it. Is that correct?
Right, the table is generated per sample and would contain all of the read groups within it; I haven't explicitly tested with multiple read groups since the intent here was to use the table that would've been generated and used anyways in the pipeline if there wasn't any failure. Given that the pipeline works with multiple read groups as is, I wouldn't expect any difference as long as the right table is given |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the table is generated per sample and would contain all of the read groups within it; I haven't explicitly tested with multiple read groups since the intent here was to use the table that would've been generated and used anyways in the pipeline if there wasn't any failure. Given that the pipeline works with multiple read groups as is, I wouldn't expect any difference as long as the right table is given
Got it. The change looks good to me! Anything else to add? @Faizal-Eeman
Description
Adding the option to allow users to input BQSR recalibration tables if they already have them. These are especially useful when the pipeline fails after the
BaseRecalibrator
step and needs to be re-run.BaseRecalibrator
is the longest running process in the pipeline and a bottleneck so existing tables can save significant compute time.Also, updating to use the modularized
set_env
function as the current iteration in the pipeline causes issues when submitted throughnftest
Testing Results
NFTest
/hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/yashpatel-use-recal-table/log-nftest-20231219T214754Z.log
Explicit recalibration table tests (Use of recalibration table can be confirmed from the trace file where the
BaseRecalibrator
process runs very briefly if the table already exists):/hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/yashpatel-use-recal-table/single_with_recal
/hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/yashpatel-use-recal-table/paired_one_recal
/hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/yashpatel-use-recal-table/paired_two_recal
Checklist
I have read the code review guidelines and the code review best practice on GitHub check-list.
I have reviewed the Nextflow pipeline standards.
The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
I have set up or verified the branch protection rule following the github standards before opening this pull request.
I have added my name to the contributors listings in the
manifest
block in thenextflow.config
as part of this pull request, am listedalready, or do not wish to be listed. (This acknowledgement is optional.)
I have added the changes included in this pull request to the
CHANGELOG.md
under the next release version or unreleased, and updated the date.I have updated the version number in the
metadata.yaml
andmanifest
block of thenextflow.config
file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)I have tested the pipeline using NFTest, or I have justified why I did not need to run NFTest above.