running branchwater on large assemblies #14
Comments
hi @jmattock5, I don't know offhand what the timeout is before the branchwater backend gives up, but I can give you some insight into why it is taking so long: the runtime for branchwater-web scales with the size of the query. So a 10 MB genome will take twice as long to query as a 5 MB genome. (Interestingly, the database functionality underlying this scaling is what makes branchwater possible; handling large queries against a large database is much harder!)
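To put rough numbers on that scaling, here is a back-of-the-envelope sketch. It assumes the query cost is dominated by the number of hashes in the query sketch, and that a FracMinHash sketch keeps about one hash per `scaled` distinct k-mers; `expected_sketch_size` is a hypothetical helper for illustration, not part of sourmash or branchwater.

```python
def expected_sketch_size(distinct_kmers: int, scaled: int) -> int:
    """Approximate number of hashes kept when about 1 in `scaled` k-mers is retained."""
    if scaled < 1:
        raise ValueError("scaled must be >= 1")
    return distinct_kmers // scaled


# Assumption: genome length is used as a rough stand-in for distinct k-mers.
small = expected_sketch_size(5_000_000, scaled=1000)
large = expected_sketch_size(10_000_000, scaled=1000)
print(small, large, large / small)  # 5000 10000 2.0
```

By the same estimate, a 1 Gbp assembly gives on the order of a million hashes at scaled=1000, which fits with the large queries being so slow.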
Ok, thanks for the explanation. I'll leave it running for longer and see if anything happens. Would it be possible to run this locally instead? I have sourmash_plugin_branchwater installed; is the database that branchwater uses shareable? Thanks,
IIRC, the on-disk index for branchwater is 1-2 TB. The raw data is in the 8-10 TB range (and that's something that we can search using the plugin). So, umm, probably a bit too large for download :). @luizirber would the branchwater web site work faster if a signature with a higher scaled value were used? e.g. 100,000 rather than 1000? Does that even work? cc @bluegenes
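For anyone wondering what a higher scaled value does to the query, here is a minimal sketch of the downsampling idea in plain Python. It assumes 64-bit hashes and a cutoff of `2**64 // scaled`; the function names are hypothetical and this is not sourmash's actual implementation, just the principle behind it.

```python
import random

HASH_SPACE = 2**64  # assumption: hashes are 64-bit unsigned integers


def max_hash_for_scaled(scaled: int) -> int:
    """Cutoff below which a hash is kept for a given scaled value."""
    if scaled < 1:
        raise ValueError("scaled must be >= 1")
    return HASH_SPACE // scaled


def downsample(hashes: set[int], old_scaled: int, new_scaled: int) -> set[int]:
    """Keep only the hashes that would also survive at the higher scaled value."""
    if new_scaled < old_scaled:
        raise ValueError("cannot downsample to a lower scaled value")
    cutoff = max_hash_for_scaled(new_scaled)
    return {h for h in hashes if h < cutoff}


# Simulate a scaled=1000 sketch with random hashes below its cutoff.
rng = random.Random(42)
sketch_1k = {rng.randrange(max_hash_for_scaled(1000)) for _ in range(100_000)}
sketch_100k = downsample(sketch_1k, old_scaled=1000, new_scaled=100_000)
print(len(sketch_1k), len(sketch_100k))  # roughly a 100-fold reduction
```

The downsampled sketch is a strict subset of the original, so containment can still be estimated from it, only with fewer hashes and therefore more noise.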
Couple of notes on this:
In fact, I tried this crime from the last item with a rumen metagenome I had around (98M originally, 285k after downsampling to a scaled value of 100,000), and got 171,599 results back, 285 of those above 20% containment. So yeah, this definitely works. We can change
More crimes: I got a question about what these matches are, and going through SRA IDs manually is boring. But the search server only returns SRA IDs and containment, so how can I get the same data the web frontend returns? Like this =] Prepare a request and send it to the web frontend:
(this is wrapping
Parsing JSON is also boring, so here is a long one-liner to read
So yeah, definitely rumen metagenomes. But not only cow; I also got sheep and Sika deer.
Finally, without filtering (all matches, even those that are only 0.3%):
Hello,
Thanks for developing such a great tool. I've been trying to run branchwater on some whole metagenome assemblies that are quite large (0.3G-1G). Even when I upload and submit the smaller ones, I don't get any output. I've tried leaving a couple running with the tab open for ~12 hours, to no avail.
If I leave them for long enough will they eventually complete?
Thanks!
Jenny