Scalability of large backbone grafting #101

rvosa · 2024-05-02T10:41:58Z

In the final step of the pipeline, all (family-level) subtrees are grafted onto a rarified backbone topology. This is currently done in memory, in Python. Whether this approach holds up when an analysis is run across the entire data set has not been tested. If it does not, another implementation needs to be attempted. There are at least two viable options:

1. The splits in the backbone on which subtrees are to be grafted are tagged with unique IDs corresponding with the respective subtrees. The subtrees are then inserted in lieu of these IDs using string replacement in the Newick syntax.
2. The operation is done in a database, e.g. using the DBTree schema.
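
A minimal sketch of the string-replacement option, to make the idea concrete. It is not the pipeline's implementation. The function name `graft_subtrees`, the `FAM_*` tags and the example trees are hypothetical. It assumes the tags appear as tip labels in the backbone, that labels are unquoted and contain no whitespace, and that a subtree's own root branch length can be discarded in favour of the branch length already on the tagged tip.

```python
import re

# Any unquoted label that sits where a tip label can sit: directly after
# '(' or ',' and directly before ':', ',' or ')'. Branch lengths (after ':')
# and internal node labels (after ')') are never matched.
TIP_LABEL = re.compile(r"(?<=[(,])[^(),:;]+(?=[:,)])")


def clade_body(newick: str) -> str:
    """Strip the terminating ';' and any root branch length from a subtree."""
    body = newick.strip().rstrip(";")
    return re.sub(r":[0-9.eE+-]+$", "", body)


def graft_subtrees(backbone: str, subtrees: dict[str, str]) -> str:
    """Replace tagged tips in the backbone with the matching subtrees.

    The backbone is scanned once; each tip label is looked up in a dict,
    so the cost does not grow with the number of subtrees per lookup, and
    a tag such as FAM_1 cannot accidentally match inside FAM_10.
    """
    bodies = {tag: clade_body(nwk) for tag, nwk in subtrees.items()}
    # Untagged tips (e.g. an outgroup) are returned unchanged.
    return TIP_LABEL.sub(lambda m: bodies.get(m.group(0), m.group(0)), backbone)


# Hypothetical example data
backbone = "((FAM_1:0.5,FAM_10:0.7):0.2,outgroup:1.0);"
subtrees = {
    "FAM_1": "(a:0.1,b:0.2);",
    "FAM_10": "((c:0.1,d:0.1):0.3,e:0.4):0.0;",
}
grafted = graft_subtrees(backbone, subtrees)
print(grafted)
```

Because the replacement is a single pass over a string, memory use is roughly the size of the backbone plus the subtrees. For a very large backbone the same lookup could be applied while streaming the file in chunks, which this sketch does not attempt.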

This issue is considered 'done' when the BOLD 10M data set is processed without the grafting step running into scalability problems.

rvosa added this to the Roadmap NLeSC/Naturalis collaboration milestone May 2, 2024
