You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The bgzipped VCF files can be found at /SAN/vyplab/UCLex/mainset_May2019/VQSR/both/mainset_May2019_chr*_filtered.vcf.gz
There are around 5500 samples. Sample ids are internal ones (many will overlap with what we already have in the database, in which case they can be ignored).
The format is like this (omit the header starting with ##)
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 09G06954_S4_L001_R1_001.fastq.gz 10G00594_S5_L001_R1_001.fastq.gz 10G06637_S2_L001_R1_001.fastq.gz
1 13302 rs180734498 C T 579047.33 PASS AC=1844;AF=0.277;AN=6646;BaseQRankSum=0.480;ClippingRankSum=0.055;DB;DP=284259;ExcessHet=0.0004;FS=5.505;InbreedingCoeff=0.0555;MLEAC=1845;MLEAF=0.278;MQ=20.14;MQ0=0;MQRankSum=-3.805e+00;QD=6.17;ReadPosRankSum=-1.485e+00;SOR=1.127;VQSLOD=0.496;culprit=QD GT:AD:DP:GQ:PGT:PID:PL ./.:0,0:0 ./.:0,0:0 ./.:0,0:0
1 13327 rs144762171 G A,C 6985.43 PASS AC=24,10;AF=3.209e-03,1.337e-03;AN=7502;BaseQRankSum=-7.270e-01;ClippingRankSum=0.620;DB;DP=255285;ExcessHet=0.0000;FS=4.881;InbreedingCoeff=0.1615;MLEAC=14,6;MLEAF=1.866e-03,7.998e-04;MQ=2.52;MQ0=0;MQRankSum=-7.360e-01;QD=14.64;ReadPosRankSum=0.743;SOR=4.424;VQSLOD=2.57;culprit=InbreedingCoeff GT:AD:DP:GQ:PGT:PID:PL 0/0:24,0,0:24 1/2:1,12,12:25 2/2:1,0,20:21
...
First column: chromosome
2: Position
3: ID (ignore)
4: REF
5: ALT
6: quality (ignore)
7: Filter
8: INFO (ignore)
9: Format (this should be the header for sample genotypes on the same row, delimited by :. However the genotypes don't seem to always follow what the header depicts)
10-: sample genotypes
For each row, it might host multiple variants, in which case we call it multi-allelic site. You can parse the row with something like this:
The bgzipped VCF files can be found at
/SAN/vyplab/UCLex/mainset_May2019/VQSR/both/mainset_May2019_chr*_filtered.vcf.gz
There are around 5500 samples. Sample ids are internal ones (many will overlap with what we already have in the database, in which case they can be ignored).
The format is like this (omit the header starting with
##
)First column: chromosome
2: Position
3: ID (ignore)
4: REF
5: ALT
6: quality (ignore)
7: Filter
8: INFO (ignore)
9: Format (this should be the header for sample genotypes on the same row, delimited by
:
. However the genotypes don't seem to always follow what the header depicts)10-: sample genotypes
For each row, it might host multiple variants, in which case we call it
multi-allelic site
. You can parse the row with something like this:The text was updated successfully, but these errors were encountered: