Skip to content
This repository has been archived by the owner on Nov 2, 2020. It is now read-only.

Structural variants shouldn't use END as the position of a translocation. #140

Closed
lindenb opened this issue Jun 25, 2019 · 3 comments
Closed

Comments

@lindenb
Copy link

lindenb commented Jun 25, 2019

Cross posted at https://www.biostars.org/p/386345/

What you did:

I downloaded the SV from gnomad:


wget -O gnomad_v2_sv.sites.vcf.gz "https://storage.googleapis.com/gnomad-public/papers/2019-sv/gnomad_v2_sv.sites.vcf.gz"

wget -O gnomad_v2_sv.sites.vcf.gz.tbi "https://storage.googleapis.com/gnomad-public/papers/2019-sv/gnomad_v2_sv.sites.vcf.gz.tbi"

there is a translocation at: 12 :60718971


$ bcftools view gnomad_v2_sv.sites.vcf.gz  | grep gnomAD_v2_CTX_12_13 -m1

12  60718971    gnomAD_v2_CTX_12_13 N   <CTX>   999 PASS    END=57020218;SVTYPE=CTX;CHR2=13;SVLEN=-1;ALGORITHMS=manta;EVIDENCE=PE;CPX_TYPE=CTX_PP/QQ;PROTEIN_CODING__NEAREST_TSS=SLC16A7;PROTEIN_CODING__INTERGENIC;AN=21476;AC=1;AF=4.7e-05;N_BI_GENOS=10738;N_HOMREF=10737;N_HET=1;N_HOMALT=0;FREQ_HOMREF=0.999907;FREQ_HET=9.31272e-05;FREQ_HOMALT=0;AFR_AN=9480;AFR_AC=0;AFR_AF=0;AFR_N_BI_GENOS=4740;AFR_N_HOMREF=4740;AFR_N_HET=0;AFR_N_HOMALT=0;AFR_FREQ_HOMREF=1;AFR_FREQ_HET=0;AFR_FREQ_HOMALT=0;AMR_AN=1784;AMR_AC=1;AMR_AF=0.000561;AMR_N_BI_GENOS=892;AMR_N_HOMREF=891;AMR_N_HET=1;AMR_N_HOMALT=0;AMR_FREQ_HOMREF=0.998879;AMR_FREQ_HET=0.00112108;AMR_FREQ_HOMALT=0;EAS_AN=2226;EAS_AC=0;EAS_AF=0;EAS_N_BI_GENOS=1113;EAS_N_HOMREF=1113;EAS_N_HET=0;EAS_N_HOMALT=0;EAS_FREQ_HOMREF=1;EAS_FREQ_HET=0;EAS_FREQ_HOMALT=0;EUR_AN=7598;EUR_AC=0;EUR_AF=0;EUR_N_BI_GENOS=3799;EUR_N_HOMREF=3799;EUR_N_HET=0;EUR_N_HOMALT=0;EUR_FREQ_HOMREF=1;EUR_FREQ_HET=0;EUR_FREQ_HOMALT=0;OTH_AN=388;OTH_AC=0;OTH_AF=0;OTH_N_BI_GENOS=194;OTH_N_HOMREF=194;OTH_N_HET=0;OTH_N_HOMALT=0;OTH_FREQ_HOMREF=1;OTH_FREQ_HET=0;OTH_FREQ_HOMALT=0;POPMAX_AF=0.000561

but you're using END as the position of the translocation for CHR2 citing John Marshall 's answer: https://www.biostars.org/p/386345/#386354 , this usage of END breaks the integrity of the tabix index , the variant is not found anymore with bcftools or tabix:

$ bcftools view gnomad_v2_sv.sites.vcf.gz "12:60718970-60718972" | grep gnomAD_v2_CTX_12_13
$ 
$ tabix gnomad_v2_sv.sites.vcf.gz "12:60718970-60718972" | grep gnomAD_v2_CTX_12_13
$
@jmarshall
Copy link

jmarshall commented Jun 25, 2019

Dropping the grep from this last query highlights the unfortunate state that this corrupt index is in:

$ bcftools view gnomad_v2_sv.sites.vcf.gz "12:60718970-60718972"
[424 records on chr. 12 at positions between 62403 and 60412822.]

@RCollins13
Copy link

Hi @lindenb and @jmarshall,

Thanks for raising this issue, and pointing out the implications for the VCF index for reciprocal translocations.

We are aware of some deviations that the current gnomAD-SV release has taken from the VCF spec.

We are working on a conversion from the current gnomAD-SV VCF to a fully spec-compliant VCF, but currently don't have a timetable for its availability.

Will update once these issues have been corrected.

Thanks,
Ryan

@nawatts
Copy link
Contributor

nawatts commented Jun 27, 2019

Moved to talkowski-lab/gnomad-sv-pipeline#3.

@nawatts nawatts closed this as completed Jun 27, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants