
Running cte

martinghunt edited this page Mar 4, 2022 · 3 revisions

cte can be run in two modes: evaluate a single consensus sequence, or evaluate a batch of consensus sequences.

Evaluate a single consensus sequence

To evaluate one SARS-CoV-2 consensus sequence, you will need:

  1. A VCF file of the "truth" calls (here called truth.vcf), in the format described on the Truth VCF file page.
  2. The consensus sequence to evaluate, in a FASTA file (here called cons.fa).
  3. The primer scheme: either the name of a built-in scheme (currently COVID-ARTIC-V3, COVID-ARTIC-V4 or COVID-MIDNIGHT-1200), or your own TSV file of primers in Viridian Workflow format.

Example, assuming primer scheme COVID-ARTIC-V4:

cte eval_one_run \
  --outdir OUT \
  --truth_vcf truth.vcf \
  --fasta_to_eval cons.fa \
  --primers COVID-ARTIC-V4

Please read the output files page for a description of the output. Briefly, the output files are:

  1. results.tsv - most likely the file you want. A tab-delimited file of counts of the truth bases versus what was called in the consensus. The same information is also written to a JSON file, results.json.
  2. per_position.tsv - a more detailed TSV file, with information at each position in the genome.

Evaluate a batch of sequences

In batch mode, each consensus sequence is evaluated independently, producing the same files as if it were run on its own. In addition, a "grand total" results.tsv file is made, containing the sum of all the values from the individual results files.

To run a batch you need the same information as for a single consensus. This information goes in a tab-delimited manifest file, with one row per consensus giving a name for the consensus, the truth VCF filename, the consensus FASTA filename, and the primer scheme name or primers filename. The file must have the columns name, truth_vcf, eval_fasta and primers. The column order does not matter, and any other columns are ignored. Example:

name         truth_vcf    eval_fasta      primers
consensus1   truth1.vcf   cons1.fasta     COVID-ARTIC-V3
consensus1.2 truth1.vcf   cons1.2.fasta   COVID-ARTIC-V3
consensus2   truth2.vcf   cons2.fasta     COVID-ARTIC-V3
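The manifest must be tab-delimited, and copying the table above from a web page can turn the tabs into spaces. One way to write the example manifest with real tab characters is a single printf call, which reuses its format string for every group of four arguments:

```shell
# Write the example manifest above, one row per consensus,
# with a real tab character between the four columns.
printf '%s\t%s\t%s\t%s\n' \
  name         truth_vcf    eval_fasta     primers \
  consensus1   truth1.vcf   cons1.fasta    COVID-ARTIC-V3 \
  consensus1.2 truth1.vcf   cons1.2.fasta  COVID-ARTIC-V3 \
  consensus2   truth2.vcf   cons2.fasta    COVID-ARTIC-V3 \
  > manifest.tsv
```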

Each name must be unique within the file and also "filesystem friendly", because the output files for each consensus are put in a directory named after its entry in the name column.
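Before starting a long batch, it can help to check the name column yourself. The function below, check_manifest, is a hypothetical helper and not part of cte. It assumes "filesystem friendly" means only letters, digits, ".", "_" and "-" (and not "." or ".." on their own); cte's own rules may be different. Because the column order is not fixed, it finds the name column from the header row:

```shell
# Hypothetical helper, not part of cte. Assumes a "filesystem friendly"
# name uses only letters, digits, ".", "_" and "-".
# Prints each problem found and returns non-zero if there were any.
check_manifest() {
  awk -F'\t' '
    NF == 0 { next }   # ignore blank lines
    NR == 1 {
      # Column order is not fixed, so locate "name" from the header.
      for (i = 1; i <= NF; i++) if ($i == "name") col = i
      if (!col) { print "no name column in header"; bad = 1; exit }
      next
    }
    {
      n = $col
      if (n !~ /^[A-Za-z0-9._-]+$/ || n == "." || n == "..")
        { print "not filesystem friendly: " n; bad = 1 }
      if (seen[n]++) { print "duplicate name: " n; bad = 1 }
    }
    END { exit bad }
  ' "$1"
}
```

For example, `check_manifest manifest.tsv && cte eval_runs --outdir out manifest.tsv` only starts the batch if the check passes.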

Run the batch with this command:

cte eval_runs --outdir out manifest.tsv

The output directory out contains the following files and directories:

  • Processing/ - one directory per consensus, each named using the name column from manifest.tsv. Each directory contains the result of running cte on that consensus, i.e. the same output as running cte eval_one_run.
  • results.tsv - the results summed across all of the input consensus sequences, in the same format as a per-consensus results.tsv file.
  • results_per_run.tsv - the results of each individual run, in the same format as the per-consensus results.tsv but with an extra name column. The same information is also written to results_per_run.json.
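To pull out the results for one consensus from results_per_run.tsv, you can filter on the name column. The function below, show_run, is a hypothetical helper and not part of cte; it assumes the extra column's header is literally "name" and prints the header row followed by the matching row:

```shell
# Hypothetical helper, not part of cte. Assumes the extra column in
# results_per_run.tsv has the header "name".
# Usage: show_run NAME [FILE]   (FILE defaults to out/results_per_run.tsv)
# Returns non-zero if no row has that name.
show_run() {
  awk -F'\t' -v want="$1" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "name") col = i
      if (!col) { print "no name column in header"; exit 1 }
      print   # keep the header so the columns stay labelled
      next
    }
    $col == want { print; found = 1 }
    END { if (!found) exit 1 }
  ' "${2:-out/results_per_run.tsv}"
}
```

For example, `show_run consensus1.2` would show the row for the second consensus in the example manifest.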