I started a sarek run (v3.4.2) with a custom reference supplied as a bgzipped FASTA. The run from FASTQs started normally and hit no errors until the MarkDuplicates step. I had forgotten to copy the index file (*.fasta.gz.gzi) into the same folder as the FASTA, which caused the step to fail just before finishing 🤦. See the error message below.
[Thu Nov 28 20:40:47 GMT 2024] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 134.77 minutes.
Runtime.totalMemory()=285212672
[E::bgzf_index_load] Error opening GRCh38_GIABv3_no_alt_analysis_set_maskedGRC_decoys_MAP2K3_KMT2C_KCNJ18.fasta.gz.gzi : No such file or directory
[E::bgzf_open_ref] Unable to load .gzi index 'GRCh38_GIABv3_no_alt_analysis_set_maskedGRC_decoys_MAP2K3_KMT2C_KCNJ18.fasta.gz.gzi'
[E::refs_load_fai] Failed to open reference file 'GRCh38_GIABv3_no_alt_analysis_set_maskedGRC_decoys_MAP2K3_KMT2C_KCNJ18.fasta.gz'
[E::hts_open_format] Failed to open file "OPM2.md.cram" : Invalid argument
samtools view: failed to open "OPM2.md.cram" for writing: Invalid argument
It would be great to have a parameter check at startup: when a bgzipped FASTA (*.fasta.gz) is provided, the corresponding index (*.fasta.gz.gzi) should also be present.
Some other considerations if this is hard to implement:
The .gzi file could be generated automatically as part of the build process if it is missing.
The MarkDuplicates step could check, before it starts, that an index file is present whenever a bgzipped FASTA is provided, so that the run fails earlier.
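To make the requested check concrete, here is a minimal sketch in Python. It is not sarek code (sarek is written in Nextflow), and the function name `missing_gzi_index` is hypothetical. It assumes the index sits next to the FASTA under the same name plus `.gzi`, which is where the error message above looks for it.

```python
from pathlib import Path
from typing import Optional


def missing_gzi_index(fasta: str) -> Optional[Path]:
    """Return the expected .gzi path if it is missing, otherwise None.

    Hypothetical helper: only file names ending in .gz are treated as
    bgzipped references; uncompressed FASTAs need no .gzi index.
    """
    fasta_path = Path(fasta)
    if fasta_path.suffix != ".gz":
        return None
    # The index is expected next to the FASTA: <name>.fasta.gz.gzi
    gzi = fasta_path.with_name(fasta_path.name + ".gzi")
    return None if gzi.is_file() else gzi


if __name__ == "__main__":
    import sys

    # Fail before any long-running step if the index is absent.
    missing = missing_gzi_index(sys.argv[1])
    if missing is not None:
        sys.exit(f"Missing bgzip index for reference: {missing}")
```

Running a check like this at startup would surface the problem within seconds rather than after the 134 minutes of MarkDuplicates shown in the log.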
Edit: forgot to add info about sarek version
Edit2: gzip --> bgzip
An update on this: the .gzi is now in the folder with the bgzipped FASTA reference, but I still run into this error. It seems that samtools specifically requires this .gzi file when converting the output to CRAM; see the related issue samtools/samtools#804.
Looking at the relevant code (see below), it seems that the .gzi index is not included in the work folder, which causes the issue.
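As an illustration of the staging problem only (this is not the sarek code referred to above), the sketch below shows in Python what "including the index in the work folder" would mean: the FASTA and its sidecar index files are linked into the same directory, so a tool that looks for `<fasta>.gzi` next to the FASTA can find it. The function name `stage_reference` and the set of suffixes are assumptions made for the example.

```python
import os
from pathlib import Path
from typing import List


def stage_reference(fasta: str, work_dir: str) -> List[Path]:
    """Symlink a FASTA and any existing sidecar indexes into work_dir.

    Hypothetical helper: the sidecars are assumed to be named
    <fasta>.fai and <fasta>.gzi and to sit next to the FASTA.
    """
    src = Path(os.path.abspath(fasta))
    dest_dir = Path(work_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    staged = []
    for suffix in ("", ".fai", ".gzi"):
        sidecar = src.with_name(src.name + suffix)
        if not sidecar.is_file():
            continue  # skip indexes that do not exist
        link = dest_dir / sidecar.name
        if not (link.is_symlink() or link.exists()):
            os.symlink(sidecar, link)
        staged.append(link)
    return staged
```

If only the FASTA itself were linked, the situation would match the log above: the reference opens, but the lookup for the `.gzi` beside it fails.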