Explain the resulting profile

Alessio Milanese edited this page Mar 16, 2019 · 6 revisions

The result that you obtain from motus profile or motus calc_motu is a profile with three header lines that start with #. After these three lines come the taxon names/IDs and the corresponding abundance values.

When you profile at the mOTU level, you get two tab-separated columns: the first is NCBI_consensus_name [mOTUs_id] and the second is the relative abundance. For example, the result of motus profile -s test1_single.fastq -n test1 is:

# git tag version 2.0.0 |  motus version 2.0.0 | map_tax 2.0.0 | gene database: nr2.0.0 | calc_mgc 2.0.0 -y insert.scaled_counts -l 75 | calc_motu 2.0.0 -k mOTU -g 3 | taxonomy: ref_mOTU_2.0.0 meta_mOTU_2.0.0
# call: python mOTUs_v2/motus profile -s test1_single.fastq -n test1
#consensus_taxonomy	test1
Kandleria vitulina [ref_mOTU_v2_0001]	0.0688211617
Methyloversatilis universalis [ref_mOTU_v2_0002]	0.0000000000
Megasphaera genomosp. [ref_mOTU_v2_0003]	0.0234955832
...
Thermoproteus uzoniensis [ref_mOTU_v2_5304]	0.0000000000
Paenibacillus sp. [ref_mOTU_v2_5305]	0.0030541740
unknown Bdellovibrio [meta_mOTU_v2_5307]	0.0000000000
unknown Alphaproteobacteria [meta_mOTU_v2_5308]	0.0000031719
...
unknown Clostridiales [meta_mOTU_v2_7800]	0.0000000000
-1	0.2307163722

There are 5232 ref_mOTUs (species with a reference genome) and 2494 meta_mOTUs (species without a reference genome). The -1 row at the end gives the fraction of unmapped reads, that is, species that we cannot measure.
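
The layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the mOTUs tool: the function names `parse_profile` and `summarize` are made up for illustration, and it assumes a single-sample, mOTU-level profile whose rows look like the example output (name, bracketed ID, tab, value).

```python
import re

# "consensus name [mOTU_id]" as it appears in the example output above.
ROW_PATTERN = re.compile(r"^(?P<name>.*) \[(?P<motu_id>[^\]]+)\]$")


def parse_profile(lines):
    """Return (sample_names, rows); each row is (name, motu_id, value)."""
    sample_names = []
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # The third header line holds the tab-separated column names.
            if line.startswith("#consensus_taxonomy"):
                sample_names = line.split("\t")[1:]
            continue
        label, value = line.split("\t")
        match = ROW_PATTERN.match(label)
        if match:
            rows.append((match["name"], match["motu_id"], float(value)))
        else:
            # The final "-1" row has no bracketed ID.
            rows.append((label, None, float(value)))
    return sample_names, rows


def summarize(rows):
    """Sum the values per group: ref_mOTU, meta_mOTU and the -1 row."""
    totals = {"ref_mOTU": 0.0, "meta_mOTU": 0.0, "unmapped": 0.0}
    for _name, motu_id, value in rows:
        if motu_id is None:
            totals["unmapped"] += value
        elif motu_id.startswith("ref_mOTU"):
            totals["ref_mOTU"] += value
        elif motu_id.startswith("meta_mOTU"):
            totals["meta_mOTU"] += value
    return totals


# A few rows copied from the example profile above.
example = [
    "# call: python mOTUs_v2/motus profile -s test1_single.fastq -n test1",
    "#consensus_taxonomy\ttest1",
    "Kandleria vitulina [ref_mOTU_v2_0001]\t0.0688211617",
    "unknown Alphaproteobacteria [meta_mOTU_v2_5308]\t0.0000031719",
    "-1\t0.2307163722",
]
samples, rows = parse_profile(example)
print(samples)
print(summarize(rows))
```

To use it on a real file, pass an open file handle instead of the `example` list, since the function only iterates over lines.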

You can remove the first two header lines, keeping the line with the column names, with:

tail -n+3 taxonomic_profiling.txt
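
If you would rather do the same inside a script, the sketch below skips the first two lines and reads the remainder as a tab-separated table using only the Python standard library. The helper name `read_profile_table` is hypothetical, and the first header line in the sample text is abbreviated for brevity.

```python
import csv
import io


def read_profile_table(handle):
    """Skip the first two header lines, then read the tab-separated table."""
    for _ in range(2):
        next(handle)
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    # Drop the leading "#" from the column-name line.
    header[0] = header[0].lstrip("#")
    body = [(row[0], float(row[1])) for row in reader if row]
    return header, body


# Sample text modelled on the example profile (first line abbreviated).
text = (
    "# (version header line, abbreviated here)\n"
    "# call: python mOTUs_v2/motus profile -s test1_single.fastq -n test1\n"
    "#consensus_taxonomy\ttest1\n"
    "Kandleria vitulina [ref_mOTU_v2_0001]\t0.0688211617\n"
    "Methyloversatilis universalis [ref_mOTU_v2_0002]\t0.0000000000\n"
    "-1\t0.2307163722\n"
)
header, body = read_profile_table(io.StringIO(text))

# Keep only the rows with a non-zero value.
detected = [label for label, value in body if value > 0]
print(header)
print(detected)
```

With a file on disk, replace `io.StringIO(text)` with `open("taxonomic_profiling.txt", newline="")`.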