VCF mods #93

Open
iqbal-lab opened this issue Aug 11, 2022 · 0 comments

iqbal-lab commented Aug 11, 2022

Right, a very quick GitHub issue before I fly to freedom.

The current VCF looks like this (sorry, it looks horrid in GitHub markdown):

##fileformat=VCFv4.2
##contig=<ID=MN908947,length=29903>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
MN908947 210 1 G T . PASS primer_calls_ignored=0/0;total_primer_bases=0/0;unfiltered_depth=34;total=33/1;amplicon_overlap=1;amplicon_totals=33/1;amplicon_names=SARSCoV_1200_1 GT 1/1
MN908947 28247 28 AGATTTC A . low_frs primer_calls_ignored=0/0;total_primer_bases=0/0;unfiltered_depth=633;total=601/32;amplicon_overlap=1;amplicon_totals=601/32;amplicon_names=SARSCoV_1200_28 GT 1/1

Things that are specific to the sample under analysis, and not to the variant, need to be in the final (sample) column, so I think that first record needs to become:

MN908947 210 1 G T . PASS . GT:PRIMER_CALLS_IGNORED:TOTAL_PRIMER_BASES:UNFILTERED_DEPTH:UNFILTERED_DEPTH_BY_ALLELE:AMPLICON_OVERLAP:AMPLICON_TOTALS:AMPLICON_NAMES 1/1:0/0:0/0:34:33/1:1:33/1:SARSCoV_1200_1

(The genotype has to be the first entry in the FORMAT and sample columns. I've left your other fields unchanged, except that their names need to be upper case.)
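The move from INFO to the sample column can be sketched in a few lines of Python. This is only an illustration, not the project's code: the function name `info_to_format` and the `RENAMES` map are made up here, and it assumes a single-sample, tab-separated record. It writes `.` into INFO afterwards, since the VCF format requires that column to be present.

```python
# Hypothetical rename map: the issue only renames `total`.
RENAMES = {"total": "UNFILTERED_DEPTH_BY_ALLELE"}


def info_to_format(record: str) -> str:
    """Move INFO key=value pairs of a single-sample VCF record into FORMAT/sample."""
    fields = record.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError("expected a single-sample record with 10 tab-separated columns")

    keys = fields[8].split(":")    # existing FORMAT keys, GT stays first
    values = fields[9].split(":")  # existing sample values
    kept_info = []                 # INFO entries that cannot be moved

    if fields[7] != ".":
        for item in fields[7].split(";"):
            key, sep, value = item.partition("=")
            if not sep:
                # Flag-style INFO entries have no value, so leave them in INFO.
                kept_info.append(item)
                continue
            keys.append(RENAMES.get(key, key.upper()))
            values.append(value)

    fields[7] = ";".join(kept_info) or "."
    fields[8] = ":".join(keys)
    fields[9] = ":".join(values)
    return "\t".join(fields)
```

Applied to the first record above (with tabs between columns), this gives a FORMAT column starting with GT and eight colon-separated values in the sample column, including the unfiltered depth of 34.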

So I think the changes are:

  • Add definitions of PRIMER_CALLS_IGNORED etc. to the header, and move them all into colon-separated entries in the final column. I also renamed "total" to UNFILTERED_DEPTH_BY_ALLELE, so it's a bit clearer.

  • Ideally, add depth on the alleles we have in the 4th and 5th columns (REF and ALT). Say the ref allele is A and the alt is C: it would be good to have the good coverage on those two alleles, like "1,20", where 1 is the ref depth and 20 is the alt depth. Note I really mean ref here, not consensus.
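A rough sketch of what the two changes might look like in Python follows. Everything in it is an assumption for illustration: the helper names, the `Number` and `Type` values, and the placeholder descriptions would all need to be decided by the maintainers. The key `AD` with `Number=R` (one value per allele, ref first) is a commonly used convention for per-allele depth, but the issue does not name a key.

```python
def format_header_line(field_id: str, number: str, vcf_type: str, description: str) -> str:
    """Build one ##FORMAT header definition line."""
    return f'##FORMAT=<ID={field_id},Number={number},Type={vcf_type},Description="{description}">'


# Assumed Number/Type values and placeholder descriptions, not taken from the project.
NEW_FORMAT_FIELDS = [
    ("PRIMER_CALLS_IGNORED", "1", "String", "TODO"),
    ("TOTAL_PRIMER_BASES", "1", "String", "TODO"),
    ("UNFILTERED_DEPTH", "1", "Integer", "TODO"),
    ("UNFILTERED_DEPTH_BY_ALLELE", "1", "String", "TODO"),
    ("AMPLICON_OVERLAP", "1", "String", "TODO"),
    ("AMPLICON_TOTALS", "1", "String", "TODO"),
    ("AMPLICON_NAMES", "1", "String", "TODO"),
    # Hypothetical key for the requested ref/alt depth, e.g. "1,20".
    ("AD", "R", "Integer", "TODO"),
]


def allele_depth_value(ref_depth: int, *alt_depths: int) -> str:
    """Format per-allele depth as 'ref,alt1[,alt2...]', with the REF allele first."""
    return ",".join(str(depth) for depth in (ref_depth, *alt_depths))


header_lines = [format_header_line(*field) for field in NEW_FORMAT_FIELDS]
```

With a ref depth of 1 and an alt depth of 20, `allele_depth_value(1, 20)` returns the string `1,20`, matching the example in the second bullet.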
