Basic Usage and Options

whao89 edited this page Nov 16, 2016 · 7 revisions

Command line options

Primary options:

  • -file <name>: location-by-individual matrix of SNP values (0, 1, 2)
  • -n <N>: number of individuals
  • -l <L>: number of locations
  • -k <K>: number of populations
  • -rfreq <val>: checks for convergence and logs output every <val> iterations
  • -nthreads <val>: use <val> threads
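
As an illustration, the primary options above could be assembled into a command line as follows. This is only a sketch: the executable name `./binary`, the input file name, and all numeric values are hypothetical placeholders, since this page does not state them. The sketch passes every primary option except -nthreads unconditionally; whether the program requires all of them is not stated here.

```python
import shlex


def build_command(binary, file, n, l, k, rfreq, nthreads=None):
    """Assemble the primary options listed above into an argument list.

    `binary` is a placeholder for the path to the executable, which this
    page does not name.
    """
    args = [
        binary,
        "-file", str(file),    # location-by-individual SNP matrix
        "-n", str(n),          # number of individuals
        "-l", str(l),          # number of locations
        "-k", str(k),          # number of populations
        "-rfreq", str(rfreq),  # convergence check / logging interval
    ]
    if nthreads is not None:
        args += ["-nthreads", str(nthreads)]
    return args


# Hypothetical values, for illustration only.
cmd = build_command("./binary", "genotypes.bed",
                    n=1000, l=50000, k=6, rfreq=1000, nthreads=4)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the placeholder values are replaced with real ones.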

Other options:

  • -help: usage
  • -label: descriptive tag for the output directory
  • -file-suffix: save files with the corresponding iteration as suffix
  • -force: overwrite existing output directory
  • -seed <val>: value is a real number (read as "double") and sets the seed for GSL
  • -compute-beta: computes beta values, given a completed theta fit
  • -locations-file: text file of markers for -compute-beta (if not all markers). File is formatted as one SNP index per line, starting from 0.
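
A locations file in the format described above (one 0-based SNP index per line) could be produced with a small helper like the one below. This is a sketch: the function name and the example indices are hypothetical, and the validation it performs is a design choice rather than something the program is known to require.

```python
def format_locations(snp_indices):
    """Return the text of a locations file: one 0-based SNP index per line."""
    lines = []
    for idx in snp_indices:
        if not isinstance(idx, int) or idx < 0:
            raise ValueError(f"SNP indices must be non-negative integers, got {idx!r}")
        lines.append(str(idx))
    # Terminate the last line with a newline, as is conventional for text files.
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical subset of markers: the 1st, 4th, and 11th SNPs (0-based 0, 3, 10).
text = format_locations([0, 3, 10])
print(text, end="")
```

The returned string can then be written to disk, for example with `open("locations.txt", "w").write(text)`, and that path passed to -locations-file.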

File formats supported

We support the PLINK binary format (.bed).

We also support a text format (*.012), where SNPs are coded as 0, 1, and 2 with no delimiter between values.
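
To make the text format concrete, here is a minimal parsing sketch. It assumes one location (SNP) per line and one character per individual, which follows from the "location by individuals" description of -file but is not spelled out on this page. The function name and sample data are hypothetical.

```python
def parse_012(text):
    """Parse undelimited 0/1/2 genotype text into a list of rows.

    Assumption: each line is one location and each character is one
    individual's genotype at that location.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        bad = set(line) - set("012")
        if bad:
            raise ValueError(f"line {line_no}: unexpected characters {sorted(bad)}")
        rows.append([int(ch) for ch in line])
    if len({len(row) for row in rows}) > 1:
        raise ValueError("rows have differing numbers of individuals")
    return rows


# Made-up data: 3 locations by 4 individuals.
sample = "0120\n2101\n0002\n"
matrix = parse_012(sample)
print(len(matrix), "locations x", len(matrix[0]), "individuals")
```

Under the same assumption, the row count and row length are the values to pass as -l and -n respectively.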

Output

  • theta.txt: admixture proportions
  • validation.txt: validation likelihoods for the fit. The columns are: iteration number, time elapsed (in seconds), validation likelihood, number of SNPs sampled, and perplexity.
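
The five columns of validation.txt could be read into named records as sketched below. The whitespace delimiter is an assumption, since the delimiter is not documented here, and the sample numbers are invented for illustration rather than taken from a real run.

```python
from typing import NamedTuple


class ValidationRecord(NamedTuple):
    iteration: int       # iteration number
    seconds: float       # time elapsed, in seconds
    likelihood: float    # validation likelihood
    snps_sampled: int    # number of SNPs sampled
    perplexity: float


def parse_validation(text):
    """Parse validation.txt content, assuming whitespace-separated columns."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 5:
            raise ValueError(f"line {line_no}: expected 5 columns, got {len(fields)}")
        it, secs, ll, nsnps, perp = fields
        records.append(ValidationRecord(int(it), float(secs), float(ll),
                                        int(nsnps), float(perp)))
    return records


# Invented sample rows, for illustration only.
sample = "1000\t12\t-1.25\t5000\t3.49\n2000\t25\t-1.10\t5000\t3.00\n"
records = parse_validation(sample)
best = max(records, key=lambda r: r.likelihood)
print("highest validation likelihood at iteration", best.iteration)
```

Keeping the columns as named fields makes it straightforward to plot likelihood against iteration or elapsed time when monitoring a run.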