bhodges edited this page Apr 24, 2011 · 11 revisions

Reference package specification

v1.0 Draft

Motivation

To standardize the construction of reference sets for pplacer and other similar tools.

Structure

A reference package is a directory whose name ends in .refpkg. It contains a CONTENTS.json file that describes the package contents in JSON format. CONTENTS.json has three sections: files, metadata, and md5.

Every field specified below must appear in its section, although the value of any field may be left undefined.
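As an illustration of the layout described above, here is a minimal Python sketch that opens a reference package and checks for the three sections. It assumes that CONTENTS.json holds a single JSON object whose top-level keys are the section names; the spec does not spell this out. The function name load_contents is hypothetical and not part of taxtastic or pplacer.

```python
import json
from pathlib import Path

# Section names taken from the spec.
SECTIONS = ("files", "metadata", "md5")


def load_contents(refpkg_dir):
    """Load CONTENTS.json from a reference package and check its sections.

    Assumption: the sections are top-level keys of one JSON object.
    """
    refpkg = Path(refpkg_dir)
    if refpkg.suffix != ".refpkg" or not refpkg.is_dir():
        raise ValueError(f"{refpkg} is not a directory with a .refpkg suffix")

    with open(refpkg / "CONTENTS.json") as handle:
        contents = json.load(handle)

    missing = [name for name in SECTIONS if name not in contents]
    if missing:
        raise ValueError(f"CONTENTS.json is missing sections: {missing}")
    return contents
```

A caller would pass the package directory, for example load_contents("example.refpkg"), and receive the parsed contents as a dictionary.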

Here we describe the sections in order.

files

profile
The alignment profile for the reference alignment.
taxonomy
The taxonomic map made by taxtastic.
aln_fasta
The reference alignment in fasta format.
tree_file
The reference tree in Newick format.
seq_info
Mapping from sequence name in tree to taxon id.
aln_sto
The reference alignment in Stockholm format.
phylo_model_file
A file describing statistics about the phylogenetic model, in JSON format (generated by taxtastic if a supported tree_stats file is specified).
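The field list above, together with the rule that every field must appear but may be undefined, could be checked along these lines. This is a sketch under one assumption: an undefined field is written as JSON null, which Python reads as None. The spec does not say how "undefined" is encoded, and check_files_section is a hypothetical helper name.

```python
# Field names of the files section, in the order given by the spec.
FILE_FIELDS = (
    "profile",
    "taxonomy",
    "aln_fasta",
    "tree_file",
    "seq_info",
    "aln_sto",
    "phylo_model_file",
)


def check_files_section(files):
    """Return (missing, undefined) for a parsed files section.

    missing:   fields that do not appear at all (a spec violation).
    undefined: fields that appear but are None (assumed encoding of
               "undefined"; allowed by the spec).
    """
    missing = [name for name in FILE_FIELDS if name not in files]
    undefined = [
        name for name in FILE_FIELDS if name in files and files[name] is None
    ]
    return missing, undefined
```

Keeping the two lists separate lets a caller reject a package with absent fields while still accepting one whose fields are present but empty.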

metadata

locus
The locus described by the reference package (required).
description
An arbitrary description field (optional).
author
Person who created the reference package (optional).
create_date
The date of creation (generated by taxtastic).
format_version
Version of the package structure (generated by taxtastic).
package_version
The reference package release version (optional).
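For illustration, a metadata section with the fields above might be assembled as follows. In the spec, create_date and format_version are generated by taxtastic; this sketch only stands in for that step. The date format (ISO 8601) and the default format_version value of "1.0" (taken from the draft version of this page) are assumptions, as is the function name make_metadata.

```python
from datetime import date


def make_metadata(locus, description=None, author=None,
                  package_version=None, format_version="1.0"):
    """Build a metadata section as a dictionary.

    locus is required; the optional fields default to None so that
    every specified field still appears in the result.
    """
    if not locus:
        raise ValueError("locus is required")
    return {
        "locus": locus,
        "description": description,
        "author": author,
        # Assumed date format; the spec does not define one.
        "create_date": date.today().isoformat(),
        # Assumed value, based on the "v1.0 Draft" label of this spec.
        "format_version": format_version,
        "package_version": package_version,
    }
```

The optional fields are kept in the dictionary even when unset, which matches the rule that every specified field must appear.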

md5

MD5 checksums for each of the files, used to verify their integrity.
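An integrity check based on this section could look like the sketch below. It rests on several assumptions that the spec does not state: the md5 section is keyed by the same field names as the files section, file entries are paths relative to the package directory, checksums are stored as hexadecimal strings, and undefined entries are JSON null. The names file_md5 and verify_md5 are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_md5(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(refpkg_dir):
    """Return the field names whose file does not match its checksum.

    Assumes md5 is keyed by the same field names as files, and that
    file entries are paths relative to the package directory.
    """
    refpkg = Path(refpkg_dir)
    contents = json.loads((refpkg / "CONTENTS.json").read_text())
    mismatched = []
    for field, filename in contents["files"].items():
        if filename is None:
            continue  # undefined entries have no file to check
        if file_md5(refpkg / filename) != contents["md5"].get(field):
            mismatched.append(field)
    return mismatched
```

An empty list means every defined file matches its recorded checksum. A field with no recorded checksum is reported as mismatched rather than silently accepted.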