END less than start in gnomad SV VCF? #515

Closed
cmdcolin opened this issue Oct 29, 2019 · 3 comments

@cmdcolin

Hi there,
I work on a genome browser where we'd like to display SVs helpfully, but the code doesn't handle features where END<START.

In the gnomAD SV VCF linked below, there is a record where the INFO "END" field (1386324) is less than POS (1386372):

https://storage.googleapis.com/gnomad-public/papers/2019-sv/gnomad_v2_sv.sites.vcf.gz

1	1386372	gnomAD_v2_INS_1_51	N	<INS>	372	PASS	END=1386324;SVTYPE=INS;CHR2=1;SVLEN=78;ALGORITHMS=delly,manta;EVIDENCE=SR;SOURCE=INS_1:1451663-1451741;PROTEIN_CODING__INTRONIC=ATAD3C;AN=20020;AC=143;AF=0.007143;N_BI_GENOS=10010;N_HOMREF=9867;N_HET=143;N_HOMALT=0;FREQ_HOMREF=0.985714;FREQ_HET=0.0142857;FREQ_HOMALT=0;AFR_AN=8062;AFR_AC=138;AFR_AF=0.017117;AFR_N_BI_GENOS=4031;AFR_N_HOMREF=3893;AFR_N_HET=138;AFR_N_HOMALT=0;AFR_FREQ_HOMREF=0.965765;AFR_FREQ_HET=0.0342347;AFR_FREQ_HOMALT=0;AMR_AN=1766;AMR_AC=3;AMR_AF=0.001699;AMR_N_BI_GENOS=883;AMR_N_HOMREF=880;AMR_N_HET=3;AMR_N_HOMALT=0;AMR_FREQ_HOMREF=0.996602;AMR_FREQ_HET=0.00339751;AMR_FREQ_HOMALT=0;EAS_AN=2222;EAS_AC=0;EAS_AF=0;EAS_N_BI_GENOS=1111;EAS_N_HOMREF=1111;EAS_N_HET=0;EAS_N_HOMALT=0;EAS_FREQ_HOMREF=1;EAS_FREQ_HET=0;EAS_FREQ_HOMALT=0;EUR_AN=7586;EUR_AC=1;EUR_AF=0.000132;EUR_N_BI_GENOS=3793;EUR_N_HOMREF=3792;EUR_N_HET=1;EUR_N_HOMALT=0;EUR_FREQ_HOMREF=0.999736;EUR_FREQ_HET=0.000263644;EUR_FREQ_HOMALT=0;OTH_AN=384;OTH_AC=1;OTH_AF=0.002604;OTH_N_BI_GENOS=192;OTH_N_HOMREF=191;OTH_N_HET=1;OTH_N_HOMALT=0;OTH_FREQ_HOMREF=0.994792;OTH_FREQ_HET=0.00520833;OTH_FREQ_HOMALT=0;POPMAX_AF=0.017117

Does END<POS carry some meaning here, or is it a liftover or data bug? Here is the variant in the gnomAD browser: https://gnomad.broadinstitute.org/variant/INS_1_35?dataset=gnomad_sv_r2_1
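
For anyone who wants to check a file for such records, here is a minimal sketch in Python. It assumes a plain-text, tab-delimited VCF stream and looks only at the POS column and the INFO END key. The function name `find_end_before_pos` is made up for illustration and is not part of any gnomAD tooling, and the example record is an abbreviated copy of the one quoted above.

```python
import io


def find_end_before_pos(lines):
    """Yield (chrom, pos, variant_id, end) for records whose INFO END is less than POS."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record (INFO is the 8th column)
        chrom, pos, variant_id, info = fields[0], int(fields[1]), fields[2], fields[7]
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            if key == "END" and value.isdigit():
                end = int(value)
                if end < pos:
                    yield chrom, pos, variant_id, end
                break  # only the END key matters here


# Abbreviated copy of the record above (most INFO keys dropped for brevity).
example = io.StringIO(
    "1\t1386372\tgnomAD_v2_INS_1_51\tN\t<INS>\t372\tPASS\t"
    "END=1386324;SVTYPE=INS;SVLEN=78\n"
)
bad_records = list(find_end_before_pos(example))
print(bad_records)  # [('1', 1386372, 'gnomAD_v2_INS_1_51', 1386324)]
```

For a compressed file, the same function could be fed from `gzip.open(path, "rt")`, since it only needs an iterable of text lines.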

@nawatts
Contributor

nawatts commented Oct 29, 2019

Hi @cmdcolin,

I believe this is related to talkowski-lab/gnomad-sv-pipeline#3 (@RCollins13 would have to confirm that).

The browser shows a newer SV dataset than the VCF linked in the issue description; it comes from https://storage.googleapis.com/gnomad-public/papers/2019-sv/gnomad_v2.1_sv.sites.vcf.gz. In the 2.1 VCF, END (1386373) is greater than POS (1386372) for the variant "gnomAD-SV_v2.1_INS_1_35":

1	1386372	gnomAD-SV_v2.1_INS_1_35	N	<INS>	692	UNSTABLE_AF_PCRMINUS	END=1386373;SVTYPE=INS;SVLEN=78;CHR2=1;POS2=1386324;END2=1386325;ALGORITHMS=delly,manta;EVIDENCE=SR;HIGH_SR_BACKGROUND;PROTEIN_CODING__INTRONIC=ATAD3C;SOURCE=INS_1:1451663-1451741;AN=18980;AC=29;AF=0.001528;N_BI_GENOS=9490;N_HOMREF=9461;N_HET=29;N_HOMALT=0;FREQ_HOMREF=0.996944;FREQ_HET=0.00305585;FREQ_HOMALT=0;AFR_AN=7500;AFR_AC=29;AFR_AF=0.003867;AFR_N_BI_GENOS=3750;AFR_N_HOMREF=3721;AFR_N_HET=29;AFR_N_HOMALT=0;AFR_FREQ_HOMREF=0.992267;AFR_FREQ_HET=0.00773333;AFR_FREQ_HOMALT=0;AMR_AN=1444;AMR_AC=0;AMR_AF=0;AMR_N_BI_GENOS=722;AMR_N_HOMREF=722;AMR_N_HET=0;AMR_N_HOMALT=0;AMR_FREQ_HOMREF=1;AMR_FREQ_HET=0;AMR_FREQ_HOMALT=0;EAS_AN=2410;EAS_AC=0;EAS_AF=0;EAS_N_BI_GENOS=1205;EAS_N_HOMREF=1205;EAS_N_HET=0;EAS_N_HOMALT=0;EAS_FREQ_HOMREF=1;EAS_FREQ_HET=0;EAS_FREQ_HOMALT=0;EUR_AN=7440;EUR_AC=0;EUR_AF=0;EUR_N_BI_GENOS=3720;EUR_N_HOMREF=3720;EUR_N_HET=0;EUR_N_HOMALT=0;EUR_FREQ_HOMREF=1;EUR_FREQ_HET=0;EUR_FREQ_HOMALT=0;OTH_AN=186;OTH_AC=0;OTH_AF=0;OTH_N_BI_GENOS=93;OTH_N_HOMREF=93;OTH_N_HET=0;OTH_N_HOMALT=0;OTH_FREQ_HOMREF=1;OTH_FREQ_HET=0;OTH_FREQ_HOMALT=0;POPMAX_AF=0.003867
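
To compare the two records field by field, the INFO column can be split into a dictionary. The sketch below is illustrative only: `parse_info` is a hypothetical helper, not part of any gnomAD tooling, and the INFO strings are abbreviated copies of the v2 and v2.1 records quoted in this thread. Keys without a value, such as HIGH_SR_BACKGROUND, are treated as flags.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; keys without '=' are stored as True (flags)."""
    parsed = {}
    for entry in info.split(";"):
        if not entry:
            continue  # tolerate empty entries such as a trailing ';'
        key, sep, value = entry.partition("=")
        parsed[key] = value if sep else True
    return parsed


# Abbreviated INFO columns from the v2 and v2.1 records quoted in this thread.
v2_pos, v2_info = 1386372, parse_info("END=1386324;SVTYPE=INS;CHR2=1;SVLEN=78")
v21_pos, v21_info = 1386372, parse_info(
    "END=1386373;SVTYPE=INS;SVLEN=78;CHR2=1;POS2=1386324;END2=1386325;HIGH_SR_BACKGROUND"
)

print(int(v2_info["END"]) < v2_pos)        # True: END precedes POS in v2
print(int(v21_info["END"]) > v21_pos)      # True: END follows POS in v2.1
print(v21_info["POS2"] == v2_info["END"])  # True: the v2 END value appears as POS2
```

In these two records, the value that v2 stored in END (1386324) shows up as POS2 in v2.1. Whether that holds for every record is not something this thread confirms.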

@cmdcolin
Author

Ah, I didn't realize this was a subset of that problem. I saw the same thing in a similar context in a Sniffles VCF too. Thanks for clarifying.

@RCollins13
Contributor

Hi @cmdcolin,

This is an issue with several SV algorithms, and one we're actively working to resolve for gnomAD.

The v2.1 SV release (available here: https://gnomad.broadinstitute.org/downloads) should have these issues corrected, and future gnomAD releases will also not include records where END<POS.

Thanks,
Ryan

nawatts closed this as completed Nov 14, 2019