Allow input of recalibration tables #45

yashpatel6 · 2023-12-19T23:02:17Z

Description

Adding the option to allow users to input BQSR recalibration tables if they already have them. These are especially useful when the pipeline fails after the BaseRecalibrator step and needs to be re-run. BaseRecalibrator is the longest running process in the pipeline and a bottleneck so existing tables can save significant compute time.

Also, updating to use the modularized set_env function as the current iteration in the pipeline causes issues when submitted through nftest

Testing Results

NFTest
- log: /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/yashpatel-use-recal-table/log-nftest-20231219T214754Z.log
- cases: default set
Explicit recalibration table tests (Use of recalibration table can be confirmed from the trace file where the BaseRecalibrator process runs very briefly if the table already exists):
- single sample: /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/yashpatel-use-recal-table/single_with_recal
- paired samples with one table: /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/yashpatel-use-recal-table/paired_one_recal
- paired samples with two tables: /hot/software/pipeline/pipeline-recalibrate-BAM/Nextflow/development/unreleased/yashpatel-use-recal-table/paired_two_recal

Checklist

I have read the code review guidelines and the code review best practice on GitHub check-list.
I have reviewed the Nextflow pipeline standards.
The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
I have set up or verified the branch protection rule following the github standards before opening this pull request.
I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request, am listed
already, or do not wish to be listed. (This acknowledgement is optional.)
I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.
I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)
I have tested the pipeline using NFTest, or I have justified why I did not need to run NFTest above.

…tion

…sample

yashpatel6 · 2023-12-19T23:02:29Z

Note: the paired sample tests are still in progress

yashpatel6 · 2023-12-20T01:22:53Z

Update: the paired test runs have completed successfully, with the runtime reflecting the time saved by providing the recalibration tables

tyamaguchi-ucla

Looks good. Have you tested samples with multiple RGs as BQSR runs per RG? If I recall, the BQSR table is generated per sample and can contain multiple RGs within it. Is that correct?

README.md

yashpatel6 · 2023-12-20T16:37:42Z

Looks good. Have you tested samples with multiple RGs as BQSR runs per RG? If I recall, the BQSR table is generated per sample and can contain multiple RGs within it. Is that correct?

Right, the table is generated per sample and would contain all of the read groups within it; I haven't explicitly tested with multiple read groups since the intent here was to use the table that would've been generated and used anyways in the pipeline if there wasn't any failure. Given that the pipeline works with multiple read groups as is, I wouldn't expect any difference as long as the right table is given

tyamaguchi-ucla

Right, the table is generated per sample and would contain all of the read groups within it; I haven't explicitly tested with multiple read groups since the intent here was to use the table that would've been generated and used anyways in the pipeline if there wasn't any failure. Given that the pipeline works with multiple read groups as is, I wouldn't expect any difference as long as the right table is given

Got it. The change looks good to me! Anything else to add? @Faizal-Eeman

yashpatel6 added 8 commits December 19, 2023 14:32

Add example YAMLs for using recalibration tables

3f1e61b

Add recalibration tables input to schema

8f5da04

Add type for list of recalibration tables

8a551e6

Add handler for recalibration tables and use modularized set_env func…

bcd3ceb

…tion

Add recal tables to log info

b184cfb

Add check to skip recalibration table generation if table exists for …

95b148d

…sample

Update CHANGELOG

4a1ee89

Add recalibration table input option to README

b83b022

yashpatel6 requested review from nwiltsie and rhughwhite December 19, 2023 23:02

yashpatel6 requested review from tyamaguchi-ucla and Faizal-Eeman as code owners December 19, 2023 23:02

tyamaguchi-ucla reviewed Dec 20, 2023

View reviewed changes

README.md Outdated Show resolved Hide resolved

Use singular form for recalibration_table

a772dae

tyamaguchi-ucla approved these changes Dec 20, 2023

View reviewed changes

yashpatel6 merged commit 7f10450 into main Dec 21, 2023
1 check passed

yashpatel6 deleted the yashpatel-use-recal-table branch December 21, 2023 18:43

yashpatel6 mentioned this pull request Dec 21, 2023

Remove local set_env and check_limits methods #46

Merged

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow input of recalibration tables #45

Allow input of recalibration tables #45

yashpatel6 commented Dec 19, 2023

yashpatel6 commented Dec 19, 2023

yashpatel6 commented Dec 20, 2023

tyamaguchi-ucla left a comment

yashpatel6 commented Dec 20, 2023

tyamaguchi-ucla left a comment

Allow input of recalibration tables #45

Allow input of recalibration tables #45

Conversation

yashpatel6 commented Dec 19, 2023

Description

Testing Results

Checklist

yashpatel6 commented Dec 19, 2023

yashpatel6 commented Dec 20, 2023

tyamaguchi-ucla left a comment

Choose a reason for hiding this comment

yashpatel6 commented Dec 20, 2023

tyamaguchi-ucla left a comment

Choose a reason for hiding this comment