add kaiju #690
Comments
@pmenzel as mentioned on Slack, first there needs to be a Galaxy wrapper for |
@pmenzel we can help you get this tool into Galaxy. Do you know if the tool and its database will be maintained when they shut down their server? Maybe they are interested in helping you as well? Here are a few links and some useful literature for Galaxy tool and workflow development.
A complete (4h) tutorial with everything you need is available in the planemo docs. It will save you a lot of time later, so please do this tutorial: https://planemo.readthedocs.io/en/latest/writing.html If you want to read a publication about planemo, have a look here: https://genome.cshlp.org/content/33/2/261.long Last but not least, there is this short blog post from David showing the 3 major steps you need to take if you want to get your tool onto the European Galaxy server: https://usegalaxy-eu.github.io/posts/2020/08/22/three-steps-to-galaxify-your-tool/ |
@pvanheus Yes, exactly. Given the high resource requirements, I would need to know whether it is possible at all before starting to make a wrapper. @bgruening and other admins, what do you think? @bgruening I am the only maintainer and I plan to keep updating the downloadable reference databases once per year as before. |
Ah I see! :) Thanks, let us know if you need any help. |
The memory requirement comes from loading the reference index and does not depend on the size of the input fastq/fasta files, so it is easy to predict. :) It currently ranges from 49GB to 204GB depending on the reference database (numbers from June 2023). However, as sequence databases grow, these numbers will continue to increase and might well reach 230GB this year (I will make the databases for 2024 in the summer again). From the CPU perspective, 10 parallel threads are enough for the program to chug along. |
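Since the memory footprint is determined by the reference index alone, a capacity check can be done up front. The following is only a minimal sketch of such a check: the function names and the 10% headroom are assumptions made for illustration and are not part of kaiju or Galaxy. The sizes used are the figures quoted in this thread.

```python
# Hypothetical capacity check; function names and the default headroom
# are illustrative assumptions, not part of kaiju or Galaxy.

def required_memory_gb(index_gb: float, headroom: float = 0.10) -> float:
    """Memory to reserve for a job: index size plus a fractional headroom."""
    if index_gb <= 0:
        raise ValueError("index_gb must be positive")
    if headroom < 0:
        raise ValueError("headroom must be non-negative")
    return index_gb * (1.0 + headroom)


def fits_on_node(index_gb: float, node_gb: float, headroom: float = 0.10) -> bool:
    """True if a node with node_gb of RAM can hold the index plus headroom."""
    return required_memory_gb(index_gb, headroom) <= node_gb


# Index sizes quoted in this thread (June 2023 range and the 2024 estimate).
QUOTED_INDEX_SIZES_GB = {
    "smallest database (2023)": 49,
    "largest database (2023)": 204,
    "estimate for 2024": 230,
}

if __name__ == "__main__":
    node_gb = 256  # assumed node size, for illustration only
    for label, size_gb in QUOTED_INDEX_SIZES_GB.items():
        verdict = "fits" if fits_on_node(size_gb, node_gb) else "does not fit"
        print(f"{label}: {size_gb} GB index {verdict} on a {node_gb} GB node")
```

Because the input size does not enter the calculation, the same check holds for every job that uses a given database.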
That all looks ok to me! |
Great! For many users, it's not possible to run kaiju on their own hardware due to the large RAM requirement, so it would be really nice to have it available as a web service on usegalaxy.eu! |
@pmenzel do you need any help? |
@bgruening I didn't get around to delving into this issue, unfortunately. If the open issue is bothering you, we can also close it for now and I will comment again once I have started working on it. |
It's not bothering us, just keep us updated :) Thanks a lot! |
I would be VERY interested to see the KAIJU program on the Galaxy EU site. I find that it can classify species that aren't identified by nucleotide mapping, so if I question results, I can run them on KAIJU and compare. If I find a species with KAIJU, I can usually find it in my sequences. I have had only two issues regarding the databases used by KAIJU, and they are the same issues as with Kraken2, etc.: three species in the samples I'm running aren't represented. Mycobacterium (Mycobacterium 1100029.7): it's not recognized by Kraken2, as it's not in the database. KAIJU catches it as Tuberculosis, as I believe one of its genes is the same as in Tuberculosis. Plasmodium (Plasmodium Ovale Wallikeri and Plasmodium Ovale Curtisi): BOTH of these are human pathogens, yet almost none of the available databases include them. The NCBI has them, so they show up in the KAIJU results, but they are not in the Kraken2 databases, so Kraken2 doesn't even recognize them as Plasmodium. However, new references were recently published: POW222 (Plasmodium Ovale Wallikeri) and POC221 (Plasmodium Ovale Curtisi). I was using KAIJU through KBase, but its use on Galaxy would be much better. Thanks!! |
Hi!
I was wondering if it is possible to make kaiju available for public usage on usegalaxy.eu.
The current public web server for kaiju is always overloaded and will likely be out of service soon, so it would be very nice to make the program available at usegalaxy.eu. Similar tools like kraken2 are already there.
The main computational cost is the high memory requirement, which depends on the reference database used and might be prohibitive.
See the kaiju download page for the memory requirements: they range from 49GB for the smallest to 204GB for the most comprehensive reference database (as of June 2023).
If it is possible to add kaiju to the service, @pvanheus suggested adding it to the list of tools at https://github.com/galaxyproject/tools-iuc, which I could try to do, though I would probably need some help.
Thanks!