Commit

multiple gzip members
dagou committed Jan 29, 2025
1 parent 13b40b8 commit 7508b81
Showing 1 changed file with 1 addition and 1 deletion: README.md (2 changes).
@@ -1,6 +1,6 @@
# Kun-peng <img src="./docs/KunPeng.png" alt="Kun-peng Logo" align="right" width="50"/>

-[![](https://img.shields.io/badge/doi-waiting-yellow.svg)]() [![](https://img.shields.io/badge/release%20version-0.7.2-green.svg)](https://github.com/eric9n/Kun-peng/releases)
+[![](https://img.shields.io/badge/doi-waiting-yellow.svg)]() [![](https://img.shields.io/badge/release%20version-0.7.5-green.svg)](https://github.com/eric9n/Kun-peng/releases)

Comprehensive metagenomic sequence classification of diverse environmental samples faces significant memory challenges because genome databases are expanding exponentially. Here, we present Kun-peng, which features a unique ordered 4GB block database design for ultra-efficient resource management, faster processing, and higher accuracy. When benchmarked on mock communities (Amos HiLo, Mixed, and NIST) against Kraken2, Centrifuge, and Sylph, Kun-peng matched Sylph in achieving the highest precision and lowest false-positive rates, while demonstrating superior time and memory efficiency among all tested tools. Furthermore, Kun-peng's efficient database architecture makes it practical to use large-scale reference databases that were previously computationally prohibitive. In comprehensive testing across 586 air, water, soil, and human metagenomic samples, Kun-peng processed each sample in 0.2-11.2 minutes using only 4.0-35.4GB peak memory with an expansive pan-domain database (204,477 genomes, 4.3TB). Kun-peng classified 69.78-94.29% of reads, achieving 38-43% higher classification rates than Kraken2 with the standard database. Remarkably, Kun-peng's processing times were comparable to those of Kraken2 using the standard database (81GB), roughly 5% of the size of the pan-domain database. Memory-wise, Kun-peng required at most 35.4GB peak memory, compared with 1.85TB for Kraken2, a reduction of up to 473-fold. Unexpectedly, Sylph failed to classify any reads in air samples and left more than 99.85% of reads unclassified in water and soil samples using the expansive pan-domain database. Kun-peng also processed samples up to 46.3 times faster and used up to 20.6 times less memory than Sylph. Overall, Kun-peng offers an ultra-memory-efficient, fast, and accurate solution for pan-domain metagenomic classification.
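The abstract names an ordered block database design but does not describe its internals. The following is a minimal Python sketch of one way such a design can bound memory. It assumes, hypothetically, that k-mer hashes are split into contiguous hash ranges so that each block can be loaded and queried on its own. It fixes the number of blocks rather than their byte size, and every name (`kmer_hash`, `block_of`, `build_blocks`, `classify`), the k-mer length of 5, and the BLAKE2b hash are illustrative choices, not Kun-peng's actual code.

```python
import hashlib
from collections import defaultdict

HASH_BITS = 32  # hypothetical: k-mers are reduced to 32-bit hashes
K = 5           # hypothetical: tiny k-mer length so the toy example stays readable


def kmer_hash(kmer: str) -> int:
    """Deterministic 32-bit hash of a k-mer (BLAKE2b chosen only for illustration)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=4).digest(), "big")


def block_of(h: int, num_blocks: int) -> int:
    """Ordered partition: each block owns one contiguous slice of the hash range."""
    return (h * num_blocks) >> HASH_BITS


def kmers(seq: str, k: int = K) -> list[str]:
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_blocks(references: dict[str, int], num_blocks: int) -> list[dict[int, int]]:
    """Split a hash -> taxon table into ordered blocks (last reference wins on conflicts)."""
    blocks: list[dict[int, int]] = [{} for _ in range(num_blocks)]
    for seq, taxid in references.items():
        for km in kmers(seq):
            h = kmer_hash(km)
            blocks[block_of(h, num_blocks)][h] = taxid
    return blocks


def classify(reads: list[str], blocks: list[dict[int, int]]) -> tuple[list[dict[int, int]], int]:
    """Count taxon hits per read while keeping only one block resident at a time."""
    num_blocks = len(blocks)
    # Pass 1: bucket every (read index, k-mer hash) pair by the block that owns it.
    pending = defaultdict(list)
    for read_id, read in enumerate(reads):
        for km in kmers(read):
            h = kmer_hash(km)
            pending[block_of(h, num_blocks)].append((read_id, h))
    # Pass 2: visit blocks in order; `resident` stands in for loading one block from disk.
    hits = [defaultdict(int) for _ in reads]
    peak_resident = 0
    for b in range(num_blocks):
        resident = blocks[b]
        peak_resident = max(peak_resident, len(resident))
        for read_id, h in pending[b]:
            taxid = resident.get(h)
            if taxid is not None:
                hits[read_id][taxid] += 1
    return [dict(counts) for counts in hits], peak_resident


# Toy usage with made-up sequences and taxon IDs.
refs = {"ACGTACGTGGA": 1, "TTTTGCAGCAT": 2}
blocks = build_blocks(refs, num_blocks=4)
hits, peak = classify(["ACGTACGTG", "TTGCAGCA"], blocks)
print(hits)
print(f"{peak} of {sum(len(b) for b in blocks)} entries resident at peak")
```

Because query k-mers are bucketed by block before any lookup, each block is visited exactly once, so peak resident size tracks the largest block rather than the whole table. A real classifier would read blocks from disk and resolve k-mers shared by several taxa (for example by lowest common ancestor) instead of letting the last reference win.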
