This repository has been archived by the owner on Mar 25, 2024. It is now read-only.
Parallel primitives
It is now possible to use llamapun while fully utilizing the available CPU cores (configurable as in rayon).
Most of the examples have been refactored to use the parallel primitives, and can see a 20x speedup on high-end chips with 16+ cores. A pass over arXMLiv 08.2018 now takes between 2 and 3 hours for a lightweight task (frequency reports, token models, etc.) on such hardware.
The library also uses the parallel-friendly RoNode libxml struct, which allows for additional gains when iterating over the DOM.
Example from corpus_mathml_stats:
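use llamapun::parallel_data::Corpus;
use rayon::prelude::*; // brings into_par_iter() into scope
use std::collections::HashMap;
// ...
let corpus = Corpus::new(corpus_path);
// Walk the documents in parallel; each document yields one frequency map.
let catalog = corpus.catalog_with_parallel_walk(|document| {
  document
    .get_math_nodes()
    .into_par_iter()
    // Map: build a separate frequency map for each math node.
    .map(|math| {
      let mut catalog = HashMap::new();
      dfs_record(math, &open_ended, &mut catalog);
      catalog
    })
    // Reduce: merge the per-node maps by summing the counts for each key.
    .reduce(HashMap::new, |mut map1, map2| {
      for (k, v) in map2 {
        let entry = map1.entry(k).or_insert(0);
        *entry += v;
      }
      map1
    })
});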
For details, consult #29
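The reduce step in the example merges per-node frequency maps by summing the counts for each key. The following is a minimal sketch of that map-then-merge pattern using only the standard library (`std::thread::scope` in place of rayon), so it can be run without llamapun or a corpus. The names `count_tokens`, `merge` and `parallel_count`, as well as the sample tokens, are hypothetical and are not part of the llamapun API.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical stand-in for the per-node work: count the tokens in one chunk.
fn count_tokens(chunk: &[&str]) -> HashMap<String, u64> {
    let mut catalog = HashMap::new();
    for token in chunk {
        *catalog.entry(token.to_string()).or_insert(0) += 1;
    }
    catalog
}

/// Merge `map2` into `map1` by summing counts, mirroring the reduce closure.
fn merge(mut map1: HashMap<String, u64>, map2: HashMap<String, u64>) -> HashMap<String, u64> {
    for (k, v) in map2 {
        *map1.entry(k).or_insert(0) += v;
    }
    map1
}

/// Count each chunk on its own scoped thread, then fold the partial maps into one.
fn parallel_count(chunks: &[Vec<&str>]) -> HashMap<String, u64> {
    thread::scope(|scope| {
        // Spawn all workers first so that they run concurrently.
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| scope.spawn(move || count_tokens(chunk)))
            .collect();
        // Join the workers and merge their results one by one.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .fold(HashMap::new(), merge)
    })
}

fn main() {
    // Assumed sample data: three "documents" of MathML element names.
    let chunks = vec![vec!["mi", "mo", "mi"], vec!["mn", "mi"], vec!["mo"]];
    let catalog = parallel_count(&chunks);
    // Sort for a stable printout, since HashMap iteration order is unspecified.
    let mut entries: Vec<_> = catalog.into_iter().collect();
    entries.sort();
    println!("{:?}", entries); // prints [("mi", 3), ("mn", 1), ("mo", 2)]
}
```

Because the merge is associative and commutative, the partial maps can be combined in any order, which is what allows a work-stealing scheduler such as rayon's to reduce them as workers finish. This sketch spawns one thread per chunk for simplicity, whereas rayon reuses a fixed pool of worker threads.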