ivar consensus fails silently from a broken pipe #598
Comments
Great find @mdperry!
Yikes -- silent failures are scary. @capsakcj is there a way to make ivar consensus create a different output file, and then rename that to the final output filename after successful completion?
Better yet, this!
@mdperry thank you for sharing your data with us and for suggesting the pipefail setting. While reproducing your error we found the same error in the ivar variant calling task. We have added the pipefail setting to both tasks and exposed optional compute parameters that will be helpful with larger file sizes. With the memory set to 16 GB for both the ivar consensus task and the ivar variant calling task, the tasks ran successfully. The tasks will now also fail correctly instead of failing silently. In the next patch release of PHB, the optional compute parameters will be exposed for the following ivar consensus tasks:
Here are the compute parameters that will be exposed for input:
🐛
📝 Describe the Issue
I had analyzed some RSV fastq files to generate consensus.fasta output (similar to SARS-CoV-2) using local resources and a roll-your-own pipeline of commands. I wanted to test the TheiaCoV-PHB-PE workflow running on Terra as a comparison (in particular I was interested in all of the additional QC and filtering steps). I had 3 RSV-A samples and 3 RSV-B samples, and I aligned all 6 against RSV-A and separately against RSV-B.
I discovered that the consensus.fasta files for 2 of the 3 RSV-A samples were incomplete, or "too short": each contained only ~53.5% of the RSV-A genome (the back half was missing entirely, not replaced by N's). I had selected these samples because my previous characterization showed that they had extremely high genome coverage with a minimal number of N's.
Further examination of the Terra stderr files for both samples revealed identical errors; in both cases a broken pipe had killed the execution:
I realized that since partial consensus.fasta files had been written to disk, the task (and workflow) proceeded to completion.
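The safeguard raised in the comments (write to a different file, then rename it only after successful completion) could be sketched roughly as below. This is only an illustration under assumptions: `write_atomically` is a hypothetical helper, the `printf` and `sh -c` commands are stand-ins for a real producer, and it assumes the producer writes to stdout. The actual ivar consensus task would need its own adaptation.

```shell
# Hypothetical helper (not part of PHB): send a command's stdout to a temporary
# file and move it to the final name only if the command exits successfully.
write_atomically() {
  final="$1"; shift
  tmp="${final}.tmp"
  if "$@" > "$tmp"; then
    mv "$tmp" "$final"
  else
    status=$?          # exit status of the failed command
    rm -f "$tmp"       # discard the partial output
    return "$status"
  fi
}

workdir=$(mktemp -d)

# A producer that succeeds: the final file appears.
write_atomically "$workdir/ok.fasta" printf '>sample\nACGT\n'

# A producer that writes some output and then fails: no final file is left behind.
write_atomically "$workdir/bad.fasta" sh -c 'printf ">sample\nAC"; exit 1' \
  || echo "producer failed; no output file written"
```

The rename only helps if the failure is actually visible in the exit status, so for a pipeline it would still need to be combined with pipefail.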
I tried to reproduce this error on a local machine with the bam file from Terra, running the same command directly on the command line. I ran two tests with the same input: one used the code from my pipeline, and one used the command from TheiaCoV-PHB-PE (they were very slightly different, and I did not try to control for the versions of `samtools` or `ivar`).
I monitored the progress (the run took a good 20 minutes or more) and noticed that the output files were at first basically empty, then suddenly contained about 8191 bytes, and stayed that size until the end, when they were both just over 15000 bytes. I mention this because the two truncated consensus.fasta files were exactly this size, 8191 bytes, and contained about the same number of contiguous bases. I inferred that `ivar consensus` processes data in blocks, or chunks, and writes the cache out to disk in a stepwise fashion.
Ordinarily, I add this line to blocks of bash code (whether in WDL workflows or executable files):
set -euo pipefail;
I believe that in the context of the Terra.bio workspace this addition would have forced the ivar consensus task to throw an exception, and alerted me to the fact that I needed to investigate and possibly re-run the affected samples.
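As a minimal illustration of why that setting matters, the sketch below compares the exit status of a pipeline with and without pipefail. The `false | cat` pipeline is only a stand-in for a pipeline whose first stage fails; it is not the command used in the PHB task.

```shell
# Each case runs in its own bash process so the option does not leak between them.
# `false | cat` stands in for a pipeline whose first stage fails.

# Default behaviour: a pipeline's status is that of its last command,
# so the upstream failure is hidden.
status_default=$(bash -c 'false | cat; echo $?')

# With pipefail: the pipeline returns the status of the rightmost command
# that exited non-zero.
status_pipefail=$(bash -c 'set -o pipefail; false | cat; echo $?')

echo "without pipefail: ${status_default}"
echo "with pipefail:    ${status_pipefail}"
```

Combined with `-e`, a non-zero pipeline status stops the script, so the task would be reported as failed rather than completing with a truncated file.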
🔁 How to Reproduce
I just set the required parameters and launched the workflow after importing it from Dockstore. I suspect (based on almost nothing) that a broken pipe is probably a random, or stochastic, type of error. I did not try re-running those two samples on Terra (yet). It would be potentially interesting if the same two consistently died with the same error, but I have not had time to dig any deeper.
Feel free to answer the following questions to help us understand:
YES, on GCP
`miniwdl` or `cromwell`? NO
💻 Version Information
PHB v2.1.0