SampleAncestry documentation

The ancestry estimation is based on correlating the sample variants with population-specific SNPs.
For each population (AFR, EUR, SAS, EAS) the 1000 most informative exonic SNPs were selected for this purpose.
A benchmark on the 1000 Genomes variant data assigned 99.81% of the samples to the correct population (2153 of 2157).
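
The correlation idea described above can be illustrated with a minimal sketch. This is not the SampleAncestry implementation: it assumes that sample genotypes are encoded as alternate-allele counts (0, 1 or 2), that each population is described by per-SNP allele frequencies, and that a plain Pearson correlation is used. The function names `pearson` and `ancestry_scores` and the SNP identifiers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation, correlation is undefined; report 0
    return cov / sqrt(var_x * var_y)


def ancestry_scores(genotypes, population_afs):
    """Correlate sample genotypes with each population's allele frequencies.

    genotypes:      dict SNP id -> alternate-allele count (0, 1, 2)   [assumed encoding]
    population_afs: dict population -> dict SNP id -> allele frequency [assumed input]
    """
    scores = {}
    for population, afs in population_afs.items():
        shared = sorted(set(genotypes) & set(afs))  # SNPs present in both
        sample_af = [genotypes[snp] / 2 for snp in shared]  # scale counts to 0..1
        reference_af = [afs[snp] for snp in shared]
        scores[population] = pearson(sample_af, reference_af)
    return scores


# Toy data with hypothetical SNP ids and two hypothetical populations.
toy_genotypes = {"snp1": 2, "snp2": 0, "snp3": 1, "snp4": 2}
toy_afs = {
    "POP_A": {"snp1": 0.9, "snp2": 0.1, "snp3": 0.5, "snp4": 0.8},
    "POP_B": {"snp1": 0.1, "snp2": 0.9, "snp3": 0.5, "snp4": 0.2},
}
toy_scores = ancestry_scores(toy_genotypes, toy_afs)
print(max(toy_scores, key=toy_scores.get))
```

In this toy example the sample carries the alleles that are common in `POP_A`, so its score for `POP_A` is the highest.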

Because the populations differ in how similar they are to one another, the expected scores depend on the ancestry of the sample of interest.
This plot shows the score distribution on the 1000 Genomes data:

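This table shows the score median and median absolute deviation (MAD) determined from the 1000 Genomes data and used internally to assign a population. Each row is the population of the samples, and each column gives their score against the named population: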

population	AFR median / mad	EUR median / mad	SAS median / mad	EAS median / mad
AFR	0.5002 / 0.0291	0.0553 / 0.0280	0.1061 / 0.0267	0.0895 / 0.0274
EUR	0.0727 / 0.0271	0.3251 / 0.0252	0.1922 / 0.0249	0.0603 / 0.0264
SAS	0.0698 / 0.0264	0.1574 / 0.0295	0.3395 / 0.0291	0.1693 / 0.0288
EAS	0.08415 / 0.0275	0.06725 / 0.0269	0.21495 / 0.0228	0.47035 / 0.0242
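
One way such reference values could be used is sketched below. The document does not state the decision rule that SampleAncestry applies, so the rule here is an assumption: each score is converted to a robust z-score, (score - median) / MAD, against every row of the table, and the row with the smallest sum of squared z-scores is chosen. The names `REFERENCE` and `closest_population` are hypothetical; the numbers are copied from the table above.

```python
# Reference profile per sample population: score column -> (median, MAD).
REFERENCE = {
    "AFR": {"AFR": (0.5002, 0.0291), "EUR": (0.0553, 0.0280),
            "SAS": (0.1061, 0.0267), "EAS": (0.0895, 0.0274)},
    "EUR": {"AFR": (0.0727, 0.0271), "EUR": (0.3251, 0.0252),
            "SAS": (0.1922, 0.0249), "EAS": (0.0603, 0.0264)},
    "SAS": {"AFR": (0.0698, 0.0264), "EUR": (0.1574, 0.0295),
            "SAS": (0.3395, 0.0291), "EAS": (0.1693, 0.0288)},
    "EAS": {"AFR": (0.08415, 0.0275), "EUR": (0.06725, 0.0269),
            "SAS": (0.21495, 0.0228), "EAS": (0.47035, 0.0242)},
}


def closest_population(scores):
    """Return the population whose reference profile is nearest to `scores`.

    scores: dict with one score per population (AFR, EUR, SAS, EAS).
    Distance is the sum of squared robust z-scores (an assumed rule,
    not taken from the SampleAncestry source).
    """
    distances = {}
    for population, profile in REFERENCE.items():
        distances[population] = sum(
            ((scores[column] - median) / mad) ** 2
            for column, (median, mad) in profile.items()
        )
    return min(distances, key=distances.get)


# Hypothetical sample scores, close to the AFR row of the table.
example_scores = {"AFR": 0.49, "EUR": 0.06, "SAS": 0.11, "EAS": 0.09}
print(closest_population(example_scores))
```

Scaling by the MAD means that a deviation counts for more in columns where the reference samples vary little.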

Help and ChangeLog

The SampleAncestry command-line help and changelog can be found here.

back to ngs-bits
