You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm currently working with a contig named "160+,38-,88-,106-,51+,23+,92-,34+,158-,167+,41+,131+,81+,58+,102+,192-(circular)". I was able to align reads to this contig and perform r-mutation calling in it; however, when I get to the part of strainFlye call r-mutation that attempts to convert the resulting VCF file to a BCF file, I get the following error:
...
call r-mutation @ 1,989.53s: Converting the VCF file to a compressed BCF file...
[E::bcf_hdr_parse_line] Could not parse the header line: "##contig=<ID=160+,38-,88-,106-,51+,23+,92-,34+,158-,167+,41+,131+,81+,58+,102+,192-(circular),length=388097>"
...
I'm pretty sure the commas and parentheses are the kicker here. The VCF v4.3 spec confirms this:
Contig names follow the same rules as the SAM format’s reference sequence names: they may contain any printable
ASCII characters in the range [!-~] apart from ‘\ , "‘’ () [] {} <>’ and may not start with ‘*’ or ‘=’. Thus they
match the following regular expression:
In particular, excluding commas facilitates parsing ##contig lines, and excluding the characters ‘<>[]’ and initial ‘*’
avoids clashes with symbolic alleles. The contig names must not use a reserved symbolic allele name.
The obvious solution here is thus updating strainFlye's FASTA-reading function (?) to reject badly-named contigs like this. We could maybe make this check only required for strainFlye commands that will use bcftools or something, but it's less of a headache (and probably good practice) to just apply this check to every time we read a FASTA file.
This is a similar idea to #57 -- however, addressing that issue in its entirety would require some extra work. (For example: The regex given by the VCF spec apparently allows contigs with / characters in their names, but I have to imagine that if your contigs actually have / in their names then a swarm of locusts will devour your terminal or something when you start trying to use these names in files.)
The text was updated successfully, but these errors were encountered:
I'm currently working with a contig named "
160+,38-,88-,106-,51+,23+,92-,34+,158-,167+,41+,131+,81+,58+,102+,192-(circular)
". I was able to align reads to this contig and perform r-mutation calling in it; however, when I get to the part ofstrainFlye call r-mutation
that attempts to convert the resulting VCF file to a BCF file, I get the following error:I'm pretty sure the commas and parentheses are the kicker here. The VCF v4.3 spec confirms this:
The obvious solution here is thus updating strainFlye's FASTA-reading function (?) to reject badly-named contigs like this. We could maybe make this check only required for strainFlye commands that will use bcftools or something, but it's less of a headache (and probably good practice) to just apply this check to every time we read a FASTA file.
This is a similar idea to #57 -- however, addressing that issue in its entirety would require some extra work. (For example: The regex given by the VCF spec apparently allows contigs with
/
characters in their names, but I have to imagine that if your contigs actually have/
in their names then a swarm of locusts will devour your terminal or something when you start trying to use these names in files.)The text was updated successfully, but these errors were encountered: