At least some of our VCFs are invalid according to the VCF spec. #265
It seems FileDate wasn't meant to be a user-defined field; it should be part of "Source". Because Source is missing its double quotes, the date accidentally becomes an extra field. We should certainly put double quotes around all string values, which allows spaces etc. A comma inside a quoted string may be OK, but a semicolon would also work well as the separator between the file name and the date inside Source.
What
So it looks like there was a change on the 7th of June 2018 that removed the
Running If there is a way to be more thorough in our VCF validation (perhaps by using your
Sorry, not a golang command - I'm using the github.com/brentp/vcfgo package (like a class but not OO) in my own code and it was borking. It's been working fine on the toy VCFs I use for testing, but it didn't like the full Colo-829 VCF I pulled down for testing at proper scale.
Ollie - any idea why we removed the
I can't remember, JP. It could just have been a mistake (I had merged a long-running qsnp branch that commented out the
The bug
golang VCF libraries will not read our VCFs.
According to a strict reading of the last few versions of the VCF spec (https://samtools.github.io/hts-specs/VCFv4.2.pdf, https://samtools.github.io/hts-specs/VCFv4.3.pdf), the ##INFO header lines have a prescribed format, which we depart from in a number of ways:
1. The Source element is not enclosed in double quotes.
2. The Version element is not enclosed in double quotes.
3. FileDate is included as an extra element that the spec does not define for INFO lines.
To Reproduce
I am using the VCF fbe3b136-dc8b-4c8d-bde3-a6390c91b521.vcf from the COLO-829 analysis analysis_fbe3b136-dc8b-4c8d-bde3-a6390c91b521 for testing my code. The following two lines demonstrate the problems shown above: the first line has all 3 problems and the second line has problems 1 and 2:
If I cut out the first 1000 lines from this VCF and rectify all 3 problems, then the VCF will parse.
Expected behavior
The golang library appears to apply the VCF spec strictly and does not allow user-defined fields in INFO lines. The spec itself does not explicitly allow user-defined fields in INFO lines, so I think we should stop using them.
I'm guessing qannotate may be adding these, but wherever they come from, I'd like the quoting fixed. For FileDate, we could make Source a composite field that also contains the file date, for example:
If we go down the composite field path, I would suggest that we use a semicolon as the separator, because comma is the separator between the key=value elements of an ##INFO header line.