Scalability of large backbone grafting #101

Open
rvosa opened this issue May 2, 2024 · 0 comments

In the final step of the pipeline, all (family-level) subtrees are grafted onto a rarified backbone topology. This is handled in memory, in a Python implementation. Whether this approach holds up when an analysis is run across the entire data set is untested and unknown. If there are issues, other implementations need to be attempted. There are at least two viable options:

  1. The splits in the backbone on which subtrees are to be grafted are tagged with unique IDs corresponding to the respective subtrees. The subtrees are then inserted in place of these IDs using string replacement in the Newick syntax.
  2. The operation is done in a database, e.g. using the DBTree schema.
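
Option 1 could look roughly like the sketch below. This is only an illustration, not the pipeline's code: the `GRAFT_<n>` tag format, the function name `graft_by_string_replacement` and the example trees are all made up here. It assumes each tag appears in the backbone as a tip label, and that the subtree Newick strings carry no branch length on their root (otherwise the result would contain two lengths in a row).

```python
import re

# Hypothetical tag format for graft points; the real pipeline may use another.
PLACEHOLDER = re.compile(r"\bGRAFT_\d+\b")


def graft_by_string_replacement(backbone_newick, subtrees):
    """Replace each placeholder tip in the backbone with its subtree.

    backbone_newick: Newick string in which graft points are tips named
        with a placeholder tag (assumed format: GRAFT_<n>).
    subtrees: dict mapping placeholder tag -> subtree Newick string.
    Raises KeyError if the backbone holds a tag with no matching subtree.
    """
    def substitute(match):
        tag = match.group(0)
        # Drop the terminating semicolon so that the subtree nests cleanly;
        # the backbone's branch length for the tip then applies to its root.
        return subtrees[tag].strip().rstrip(";")

    # A single regex pass touches each tag once, instead of rescanning the
    # whole backbone string for every subtree.
    return PLACEHOLDER.sub(substitute, backbone_newick)


# Toy data, for illustration only
backbone = "((GRAFT_1:0.5,GRAFT_2:0.7):0.2,outgroup:1.0);"
subtrees = {
    "GRAFT_1": "(a:0.1,b:0.1);",
    "GRAFT_2": "((c:0.2,d:0.2):0.1,e:0.3);",
}
grafted = graft_by_string_replacement(backbone, subtrees)
print(grafted)
```

Since this never builds tree objects, memory use should be dominated by the strings themselves, but whether that is enough at the scale of the full data set would still need to be measured.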

This issue is considered 'done' when the BOLD 10M data set is processed without this issue presenting.
