KrakMeOpen

A Kraken 2 downstream analysis toolkit that calculates a series of quality metrics for Kraken 2 classifications.

To apply a confidence score threshold of your choice to your Kraken 2 classifications, try out StringMeUp.

Installation

KrakMeOpen can be installed through conda with the following command:

conda install -c conda-forge -c bioconda krakmeopen

Usage

A good start is to run krakmeopen --help.

To calculate quality metrics for a Kraken 2 classification, run:

krakmeopen --input classifications.kraken2 --output metrics_out.tsv --names names.dmp --nodes nodes.dmp [--tax_id INT | --tax_id_file tax_ids.txt] --output_kmer_tally kmer_tally_out.tsv

Where:

input is the read-by-read classification file output by Kraken 2 (or StringMeUp). Required.
output is the file to write the metrics to. Required.
names and nodes are the same NCBI-style taxonomy files that were used to build the database that produced the original Kraken 2 classifications. Required.
tax_id takes a single taxonomic ID (taxID), while tax_id_file takes a file containing multiple taxIDs (one per line) that you wish to get quality metrics for. Exactly one of the two is required.
output_kmer_tally is a file to write the complete, human-readable kmer tally for each taxID to. Optional.
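
As a small illustration of the options above, the following Python sketch writes a taxID file (one taxID per line, as described for tax_id_file) and assembles the matching command line. The helper names write_tax_id_file and build_command are hypothetical and not part of KrakMeOpen; only the flags shown in the usage line above are used, and the command is built but not executed.

```python
from pathlib import Path


def write_tax_id_file(tax_ids, path):
    """Write one taxID per line, the layout described for --tax_id_file."""
    Path(path).write_text("".join(f"{tax_id}\n" for tax_id in tax_ids))


def build_command(classifications, output, names, nodes, tax_id_file):
    """Assemble the krakmeopen argument list (hypothetical helper)."""
    return [
        "krakmeopen",
        "--input", classifications,
        "--output", output,
        "--names", names,
        "--nodes", nodes,
        "--tax_id_file", tax_id_file,
    ]


# Example file names are taken from the usage line above.
command = build_command(
    "classifications.kraken2", "metrics_out.tsv",
    "names.dmp", "nodes.dmp", "tax_ids.txt",
)
print(" ".join(command))
```

Calling write_tax_id_file([9606, 562], "tax_ids.txt") beforehand would create the taxID file that the command refers to; the list could then be passed to subprocess.run to launch the tool.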

KrakMeOpen works in two steps: (1) tally kmers, and (2) calculate metrics. It is possible to run only the first step and save the result in a pickle for later use. This is useful, for example, when you want to calculate metrics from multiple classifications. The kmer tallies from multiple pickles are added together, and the metrics are calculated on that sum of kmers.

The example below calculates combined metrics for the clade rooted at human (9606) from two Kraken 2 classifications (sample_1, sample_2). First, tally the kmers from sample_1 and sample_2 separately by specifying --output_pickle. Then, pass a file listing the resulting pickles with --input_file_list to calculate the metrics and write the final result to combined_metrics.tsv. Note that the tax_id, names.dmp, and nodes.dmp must be the same in all calls.

krakmeopen --input sample_1.kraken2 --output_pickle sample_1_metrics.pickle --names names.dmp --nodes nodes.dmp --tax_id 9606

krakmeopen --input sample_2.kraken2 --output_pickle sample_2_metrics.pickle --names names.dmp --nodes nodes.dmp --tax_id 9606

krakmeopen --input_file_list pickle_file_list.txt --output combined_metrics.tsv --names names.dmp --nodes nodes.dmp --tax_id 9606
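
The idea of adding kmer tallies from several pickles together can be sketched in Python as follows. This is a minimal illustration, not KrakMeOpen's actual code: it assumes each pickle holds a plain mapping from taxID to kmer count, which may differ from the real pickle layout, and combine_tallies is a hypothetical helper.

```python
import pickle
from collections import Counter


def combine_tallies(pickled_tallies):
    """Sum per-taxID kmer counts over several pickled tallies."""
    combined = Counter()
    for blob in pickled_tallies:
        # Counter.update adds counts instead of replacing them.
        combined.update(pickle.loads(blob))
    return combined


# Two made-up per-sample tallies (taxID -> number of kmers).
sample_1 = pickle.dumps(Counter({9606: 80, 0: 20}))
sample_2 = pickle.dumps(Counter({9606: 50, 9605: 10, 0: 5}))

combined = combine_tallies([sample_1, sample_2])
print(dict(combined))
```

Because the counts are simply summed per taxID, the samples can be tallied in any order; the metrics are then calculated once, on the combined tally.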

Quality metrics

The metrics are calculated at the clade level. For each supplied tax ID, all kmers from all reads that are classified to any of the nodes in the clade rooted at that tax ID are aggregated, and the metrics are calculated on that aggregate. Input is Kraken 2 read-by-read classification files (can be gzipped). Output is a tab-separated file containing the metrics.
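
The aggregation step can be sketched in Python as below. The sketch relies on the standard Kraken 2 read-by-read output: five tab-separated columns, where the third is the assigned tax ID and the fifth is a space-separated list of taxID:count pairs ("A" marks ambiguous kmers, "|:|" separates the mates of a read pair). The function names are hypothetical, the set of clade tax IDs is assumed to be known already (KrakMeOpen derives it from names.dmp and nodes.dmp), and this is not the tool's actual implementation.

```python
from collections import Counter


def tally_read_kmers(kraken2_line):
    """Count kmers per taxID in the fifth column of one Kraken 2 output line."""
    columns = kraken2_line.rstrip("\n").split("\t")
    tally = Counter()
    for token in columns[4].split():
        if token == "|:|":  # separator between the mates of a read pair
            continue
        tax_id, count = token.split(":")
        tally[tax_id] += int(count)
    return tally


def tally_clade_kmers(lines, clade_tax_ids):
    """Aggregate kmers from all reads classified to a tax ID in the clade."""
    combined = Counter()
    for line in lines:
        read_tax_id = line.split("\t")[2]
        if read_tax_id in clade_tax_ids:
            combined.update(tally_read_kmers(line))
    return combined


# Made-up example reads; read_2 is classified outside the clade and is skipped.
lines = [
    "C\tread_1\t9606\t150\t9606:80 0:20 A:6 9605:10",
    "C\tread_2\t562\t150\t562:100 0:16",
    "C\tread_3\t9606\t100|100\t9606:60 0:6 |:| 9605:40 9606:26",
]
clade_tally = tally_clade_kmers(lines, {"9606"})
print(dict(clade_tally))
```

Tax IDs are kept as strings here so that the "A" marker for ambiguous kmers fits in the same tally.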

The following metrics are calculated:

Metric name	Description
nkmers_total	Total number of kmers
nkmers_classified	Total number of classified kmers
nkmers_unclassified	Total number of unclassified kmers
nkmers_clade	Total number of kmers classified to any tax ID within the clade
nkmers_lineage	Total number of kmers classified to any tax ID directly above the clade root tax ID
confidence_original	The confidence score for the clade, calculated as described by Kraken 2
confidence_classified	An alternative confidence score where the unclassified kmers are removed from the denominator
other_kmers_lineage_ratio	Ratio of nkmers_lineage / (nkmers_total - nkmers_clade)
other_kmers_root_ratio	Ratio of "kmers classified to root" / (nkmers_total - nkmers_clade)
other_kmers_classified_ratio	Ratio of (nkmers_total - nkmers_clade - nkmers_unclassified) / (nkmers_total - nkmers_clade)
other_kmers_distance	Average distance between the clade root tax ID and the tax IDs which kmers are classified to
other_kmers_distance_lineage_excluded	Like other_kmers_distance but kmers classified to tax IDs above the clade are excluded
other_kmers_intra_distance	Average distance between kmers classified outside the clade
other_kmers_intra_distance_lineage_excluded	Like other_kmers_intra_distance but kmers classified above the clade are excluded
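
To make the count and ratio metrics concrete, here is a hedged Python sketch that derives them from an aggregated kmer tally. Several things are assumptions rather than confirmed behaviour: ambiguous kmers are taken to be excluded from the tally already, confidence_original is read as nkmers_clade / nkmers_total and confidence_classified as nkmers_clade / nkmers_classified, and the root (tax ID 1) is counted as part of the lineage. The function names are hypothetical, and the distance metrics are not covered because they require the taxonomy tree.

```python
from collections import Counter

ROOT_TAX_ID = 1          # root of the NCBI taxonomy
UNCLASSIFIED_TAX_ID = 0  # Kraken 2 reports unclassified kmers as tax ID 0


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def clade_metrics(tally, clade_tax_ids, lineage_tax_ids):
    """Compute the count and ratio metrics from a taxID -> kmer count tally."""
    total = sum(tally.values())
    unclassified = tally.get(UNCLASSIFIED_TAX_ID, 0)
    classified = total - unclassified
    clade = sum(n for tax_id, n in tally.items() if tax_id in clade_tax_ids)
    lineage = sum(n for tax_id, n in tally.items() if tax_id in lineage_tax_ids)
    other = total - clade  # kmers not classified within the clade
    return {
        "nkmers_total": total,
        "nkmers_classified": classified,
        "nkmers_unclassified": unclassified,
        "nkmers_clade": clade,
        "nkmers_lineage": lineage,
        "confidence_original": ratio(clade, total),
        "confidence_classified": ratio(clade, classified),
        "other_kmers_lineage_ratio": ratio(lineage, other),
        "other_kmers_root_ratio": ratio(tally.get(ROOT_TAX_ID, 0), other),
        "other_kmers_classified_ratio": ratio(other - unclassified, other),
    }


# Made-up tally for a clade containing only 9606, with lineage {9605, 1}.
example_tally = Counter({9606: 60, 9605: 10, 1: 5, 0: 20, 562: 5})
metrics = clade_metrics(example_tally, {9606}, {9605, 1})
for name, value in metrics.items():
    print(f"{name}\t{value}")
```

In this example, 60 of 100 kmers fall within the clade, so the two confidence scores differ only in whether the 20 unclassified kmers stay in the denominator.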
