Huffman compression given a probability distribution over arbitrary symbols.
This project has limited real-world utility. It may be useful to experiment with or learn about Huffman coding (for example, when working on bespoke chess game compression for lichess.org), but there are better entropy coders (both in terms of compression ratio and performance) and better implementations.
See constriction
for composable
entropy coders, models and streams.
See arcode
for a standalone implementation
of arithmetic coding.
use std::iter::FromIterator;
use std::collections::HashMap;
use bit_vec::BitVec;
use huffman_compress::{CodeBuilder, Book, Tree};
let mut weights = HashMap::new();
weights.insert("CG", 293);
weights.insert("AG", 34);
weights.insert("AT", 4);
weights.insert("CT", 4);
weights.insert("TG", 1);
// Construct a Huffman code based on the weights (e.g. counts or relative
// frequencies).
let (book, tree) = CodeBuilder::from_iter(weights).finish();
// More frequent symbols will be encoded with fewer bits.
assert!(book.get("CG").map_or(0, |cg| cg.len()) <
book.get("AG").map_or(0, |ag| ag.len()));
// Encode some symbols using the book.
let mut buffer = BitVec::new();
let example = vec!["AT", "CG", "AT", "TG", "AG", "CT", "CT", "AG", "CG"];
for symbol in &example {
book.encode(&mut buffer, symbol);
}
// Decode the symbols using the tree.
let decoded: Vec<&str> = tree.decoder(&buffer).collect();
assert_eq!(decoded, example);
- 0.6.1
- Fix deprecation warning and remove
#[deny(warnings)]
(a future compatibility hazard in libraries).
- Fix deprecation warning and remove
- 0.6.0
- Update to
bit-vec
0.6.
- Update to
- 0.5.0
- Update to
bit-vec
0.5.
- Update to
- 0.4.0
- Renamed
Tree::decoder()
toTree::unbounded_decoder()
to avoid surprises. A newTree::decoder()
takes the maximum number of symbols to decode. - No longer reexporting
Saturating
from num-traits.
- Renamed
- 0.3.2
- Preallocate arena space for Huffman tree.
- 0.3.1
- Update num-traits to 0.2 (semver compatible).
- 0.3.0
- Introduce
CodeBuilder
. - Changes tie breaking order.
- Introduce
- 0.2.0
- Allow initialization from iterators without creating a
HashMap
. Thanks @mernen. - Require
K: Ord
instead ofK: Hash + Eq
for symbols and switchBook
internals fromHashMap
toBTreeMap
. - Specify stability guarantees.
- Allow initialization from iterators without creating a
- 0.1.1
- Expose more methods on
Book
.
- Expose more methods on
- 0.1.0
- Initial release.
huffman-compress is dual licensed under the Apache 2.0 and MIT license, at your option.