Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TheiaCoV] iVar Consensus Pipefail fix #629

Merged
merged 11 commits into from
Sep 26, 2024
Merged

Conversation

Michal-Babins
Copy link
Contributor

@Michal-Babins Michal-Babins commented Sep 20, 2024

This PR closes #598

🗑️ This dev branch should be deleted after merging to main.

🧠 Summary

The Ivar Consensus workflow was facing a silent error due to a pipefail on input files that were very large.

The changes in this PR do the following:

  1. Add pipefail parameters set -euo pipefail to both the ivar consensus task and the ivar variant calling task.
  2. Expose cpu, memory, disk_size, and docker parameters for the following tasks in the ivar consensus wf:
  • bwa alignment
  • primer trimming
  • stats and coverage for primer trimming
  • variant calling
  • consensus
  • stats and coverage for the consensus wf

3.Improve error handling so that tasks will now fail correctly instead of producing silent errors.

⚡ Impacted Workflows/Tasks

This PR may lead to different results in pre-existing outputs: No

This PR uses an element that could cause duplicate runs to have different results: No

🛠️ Changes

⚙️ Algorithm

set -euo pipefail has been added to the task_ivar_consensus.wdl and to task_ivar_variant_call.wdl

➡️ Inputs

The following optional inputs have been added to the ivar consensus workflow:

Int?        ivar_bwa_cpu
Int?        ivar_bwa_memory
Int?        ivar_bwa_disk_size
String?     ivar_bwa_docker
Int?        ivar_trim_primers_cpu
Int?        ivar_trim_primers_memory
Int?        ivar_trim_primers_disk_size
String?     ivar_trim_primers_docker
Int?        ivar_variant_cpu
Int?        ivar_variant_memory
Int?        ivar_variant_disk_size
String?     ivar_variant_docker
Int?        ivar_consensus_cpu
Int?        ivar_consensus_memory
Int?        ivar_consensus_disk_size
String?     ivar_consensus_docker
Int?        stats_n_coverage_cpu
Int?        stats_n_coverage_memory
Int?        stats_n_coverage_disk_size
String?     stats_n_coverage_docker

⬅️ Outputs

🧪 Testing

This PR was tested on the input data that originally brought up this bug by Marc Perry from UCSC. The following set of rsv fastq inputs: RSV00100_A & RSV00122_A, these inputs can be exported from the ucsc-rsv data table found here.

This data was first tested with the workflow defaults, which lead to the correct failures being flagged on terra.

Then the data was tested with the following parameters adjusted for the ivar consensus task on the TheiaCov Illumina PE wf:
ivar_consensus_memory: 16
ivar_variant_memory: 16

To further confirm the updates did not cause any breaking changes, the workflow was tested with defaults against WNV and MPXV validation data from the theiagen validation illumina pe data set.

Suggested Scenarios for Reviewer to Test

Please test with simple inputs with defualt and one with large files >=1.5GB per fastq with the following adjustments:
ivar_consensus_memory: 16
ivar_variant_memory: 16

I can provide the source to the large datasets.

🔬 Final Developer Checklist

  • The workflow/task has been tested and results, including file contents, are as anticipated
  • The CI/CD has been adjusted and tests are passing (Theiagen developers)
  • Code changes follow the style guide
  • Documentation and/or workflow diagrams have been updated if applicable (Theiagen developers only)

🎯 Reviewer Checklist

  • All changed results have been confirmed
  • You have tested the PR appropriately (see the testing guide for more information)
  • All code adheres to the style guide
  • MD5 sums have been updated
  • The PR author has addressed all comments
  • The documentation has been updated

Added set -euo pipefail to variants task.
Switch o and u for consistency
Added optional inputs for compute variables for stats and coverage for primer trimming
Corrected disk size naming variable for stats n coverage prim trim
@Michal-Babins Michal-Babins requested a review from a team as a code owner September 20, 2024 13:21
@Michal-Babins Michal-Babins changed the title Mb ivar pipe fail dev Theiacov Ivar Consensus Pipefail fix Sep 20, 2024
@Michal-Babins Michal-Babins changed the title Theiacov Ivar Consensus Pipefail fix [TheiaCov] Ivar Consensus Pipefail fix Sep 20, 2024
@Michal-Babins Michal-Babins changed the title [TheiaCov] Ivar Consensus Pipefail fix [TheiaCoV] Ivar Consensus Pipefail fix Sep 20, 2024
@Michal-Babins Michal-Babins changed the title [TheiaCoV] Ivar Consensus Pipefail fix [TheiaCoV] iVar Consensus Pipefail fix Sep 20, 2024
Update md5 sum for variant call and consensus for ivar consensus tests.
Update md5 check sum for variant call and consensus
Remove buffer space
Updated docs for ivar consensus optional inputs exposing compute params.
Copy link
Member

@sage-wright sage-wright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing on regularly-sized samples here. Examined your tests here and feel confident with results.

@sage-wright sage-wright merged commit 2a8562c into main Sep 26, 2024
9 checks passed
@sage-wright sage-wright deleted the mb-ivar-pipe-fail-dev branch September 26, 2024 18:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ivar consensus fails silently from a broken pipe
2 participants