Questions about running time #4

Closed
RiErm7 opened this issue Feb 27, 2024 · 7 comments

@RiErm7

RiErm7 commented Feb 27, 2024

Hello, I have a question. How long does it take to identify NLR genes in rice? It has been running for more than 4 hours and I still have no results. The log file is as follows:
`Tue, 27 Feb 2024 17:03:38 +0800
/public1/home/stu_wangyuhao/micromamba/envs/resistify/bin/resistify /public1/home/stu_wangyuhao/geno_data/MSU7/Osativa_323_v7.0.protein_primaryTranscriptOnly.fa ./msu7
2024-02-27 17:03:39,179 - 😊 Output directory created at ./msu7
2024-02-27 17:03:39,989 - 😊 Running hmmsearch...
2024-02-27 17:03:41,679 - 😊 hmmsearch completed successfully...
2024-02-27 17:03:41,823 - 😊 473 sequences classified as potential NLRs!
2024-02-27 17:03:41,825 - 😊 Running jackhmmer...
`

Thanks !

@SwiftSeal
Owner

Hi @RiErm7 - thanks for getting in touch!
The NLRexpress module is the slowest part of Resistify because it depends heavily on disk I/O speed. On our system, ~500 NLRs would typically take 10 or so hours, although the I/O on it was quite slow. Unfortunately, there's no easy way to speed this up, as it relies on jackhmmer!

Did it run successfully in the end?

@RiErm7
Author

RiErm7 commented Mar 11, 2024

Hi @SwiftSeal, the software is very nice.
I also observed that the jackhmmer step in NLRexpress was very slow. In nlrexpress.py, I found that the default number of CPU cores can be changed, so I set it to 10 (I'm not sure whether it helped). In the end, it took three and a half hours to complete.
Tue, 27 Feb 2024 22:22:09 +0800
2024-02-27 22:22:10,681 - 😊 Output directory created at ./MSU7_chr_uniq.pep
2024-02-27 22:22:11,654 - 😊 Running hmmsearch...
2024-02-27 22:22:12,959 - 😊 hmmsearch completed successfully...
2024-02-27 22:22:13,076 - 😊 464 sequences classified as potential NLRs!
2024-02-27 22:22:13,077 - 😊 Running jackhmmer...
2024-02-28 01:54:24,301 - 😊 jackhmmer completed successfully...
2024-02-28 01:54:31,815 - 😊 Generating matrix for extEDVID...
2024-02-28 01:55:10,462 - 😊 Predicting extEDVID motifs...
2024-02-28 01:55:14,802 - 😊 Generating matrix for VG...
2024-02-28 01:55:42,662 - 😊 Predicting VG motifs...
2024-02-28 01:55:46,323 - 😊 Generating matrix for P-loop...
2024-02-28 01:56:22,462 - 😊 Predicting P-loop motifs...
2024-02-28 01:56:26,314 - 😊 Generating matrix for RNSB-A...
2024-02-28 01:57:01,987 - 😊 Predicting RNSB-A motifs...
2024-02-28 01:57:05,955 - 😊 Generating matrix for RNSB-B...
2024-02-28 01:57:39,213 - 😊 Predicting RNSB-B motifs...
2024-02-28 01:57:42,965 - 😊 Generating matrix for RNSB-C...
2024-02-28 01:58:18,492 - 😊 Predicting RNSB-C motifs...
2024-02-28 01:58:22,427 - 😊 Generating matrix for RNSB-D...
2024-02-28 01:58:58,922 - 😊 Predicting RNSB-D motifs...
2024-02-28 01:59:02,789 - 😊 Generating matrix for Walker-B...
2024-02-28 01:59:34,983 - 😊 Predicting Walker-B motifs...
2024-02-28 01:59:38,813 - 😊 Generating matrix for GLPL...
2024-02-28 02:00:08,810 - 😊 Predicting GLPL motifs...
2024-02-28 02:00:12,455 - 😊 Generating matrix for MHD...
2024-02-28 02:00:36,077 - 😊 Predicting MHD motifs...
2024-02-28 02:00:39,624 - 😊 Generating matrix for LxxLxL...
2024-02-28 02:01:10,439 - 😊 Predicting LxxLxL motifs...
2024-02-28 02:01:14,098 - 😊 Generating matrix for aA...
2024-02-28 02:01:44,700 - 😊 Predicting aA motifs...
2024-02-28 02:01:48,415 - 😊 Generating matrix for aC...
2024-02-28 02:02:19,377 - 😊 Predicting aC motifs...
2024-02-28 02:02:23,034 - 😊 Generating matrix for aD3...
2024-02-28 02:03:03,189 - 😊 Predicting aD3 motifs...
2024-02-28 02:03:07,306 - 😊 Generating matrix for bA...
2024-02-28 02:03:49,208 - 😊 Predicting bA motifs...
2024-02-28 02:03:53,849 - 😊 Generating matrix for bC...
2024-02-28 02:04:28,130 - 😊 Predicting bC motifs...
2024-02-28 02:04:32,066 - 😊 Generating matrix for bDaD1...
2024-02-28 02:05:18,020 - 😊 Predicting bDaD1 motifs...
2024-02-28 02:05:22,322 - 😊 Saving results to ./MSU7_chr_uniq.pep...
2024-02-28 02:05:22,349 - 😊 Thank you for using Resistify!

@SwiftSeal
Owner

Hi @RiErm7,

That's good news that it completed successfully 🎉

During development I noticed that jackhmmer would use a maximum of ~2 threads because of the I/O bottleneck, even when supplied with additional threads as you outlined. If you experienced a significant speed-up with your modification, I'll add an argument to allow the user to specify additional threads. I did experiment with splitting the input and running multiple jackhmmer subprocesses in parallel; I might revisit this if I get the chance 🤔

Cheers!
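
The idea of splitting the input and running several jackhmmer subprocesses side by side can be sketched roughly as follows. This is only an illustration under stated assumptions, not Resistify's or NLRexpress's actual code: the function names, chunk file names, database path, and the chosen values for `--cpu` and `-N` are all hypothetical, and only standard jackhmmer options (`--cpu`, `-N`, `--noali`, `-o`, `--tblout`) are used.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, parts = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def split_records(records, n_chunks):
    """Deal records round-robin into at most n_chunks non-empty groups."""
    groups = [records[i::n_chunks] for i in range(n_chunks)]
    return [group for group in groups if group]


def build_command(query, database, tblout, cpu=2, iterations=2):
    """Build one jackhmmer call; cpu and iterations are assumed values."""
    return [
        "jackhmmer", "--cpu", str(cpu), "-N", str(iterations),
        "--noali", "-o", os.devnull, "--tblout", str(tblout),
        str(query), str(database),
    ]


def run_parallel(fasta_text, database, workdir, n_jobs=4, runner=subprocess.run):
    """Write one FASTA chunk per job and run jackhmmer on each concurrently."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    commands = []
    for i, group in enumerate(split_records(read_fasta(fasta_text), n_jobs)):
        chunk = workdir / f"chunk_{i}.fa"
        chunk.write_text("".join(f">{h}\n{s}\n" for h, s in group))
        commands.append(build_command(chunk, database, workdir / f"chunk_{i}.tbl"))
    # Threads are enough here: each one just waits on an external process.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        list(pool.map(lambda cmd: runner(cmd, check=True), commands))
    return commands
```

Whether this helps in practice depends on the I/O bottleneck described above: several processes reading the same database from a slow disk may compete with each other rather than scale linearly.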

@RiErm7
Author

RiErm7 commented Mar 11, 2024

Hi @SwiftSeal,
Thank you for your kind words, and thank you for providing the software. I am not sure whether providing extra CPU cores improves the running speed. I ran over 2,700 NLR genes from other species using 10 cores, and the timings are as follows (I hope this helps).
Wishing you success 🎉.
2024-02-27 22:23:19,946 - 😊 Running hmmsearch...
2024-02-27 22:23:27,896 - 😊 hmmsearch completed successfully...
2024-02-27 22:23:28,641 - 😊 2704 sequences classified as potential NLRs!
2024-02-27 22:23:28,649 - 😊 Running jackhmmer...
2024-02-28 14:39:42,292 - 😊 jackhmmer completed successfully...
2024-02-28 14:40:22,624 - 😊 Generating matrix for extEDVID...
2024-02-28 14:45:48,057 - 😊 Predicting extEDVID motifs...
2024-02-28 14:46:32,601 - 😊 Generating matrix for VG...
2024-02-28 14:49:27,920 - 😊 Predicting VG motifs...
2024-02-28 14:50:09,935 - 😊 Generating matrix for P-loop...
2024-02-28 14:55:09,267 - 😊 Predicting P-loop motifs...
2024-02-28 14:55:43,646 - 😊 Generating matrix for RNSB-A...
2024-02-28 15:00:52,867 - 😊 Predicting RNSB-A motifs...
2024-02-28 15:01:36,959 - 😊 Generating matrix for RNSB-B...
2024-02-28 15:06:24,492 - 😊 Predicting RNSB-B motifs...
2024-02-28 15:07:14,821 - 😊 Generating matrix for RNSB-C...
2024-02-28 15:11:50,959 - 😊 Predicting RNSB-C motifs...
2024-02-28 15:12:23,978 - 😊 Generating matrix for RNSB-D...
2024-02-28 15:17:38,991 - 😊 Predicting RNSB-D motifs...
2024-02-28 15:18:18,745 - 😊 Generating matrix for Walker-B...
2024-02-28 15:22:14,856 - 😊 Predicting Walker-B motifs...
2024-02-28 15:22:58,825 - 😊 Generating matrix for GLPL...
2024-02-28 15:26:39,448 - 😊 Predicting GLPL motifs...
2024-02-28 15:27:21,393 - 😊 Generating matrix for MHD...
2024-02-28 15:30:16,031 - 😊 Predicting MHD motifs...
2024-02-28 15:30:57,195 - 😊 Generating matrix for LxxLxL...
2024-02-28 15:34:36,680 - 😊 Predicting LxxLxL motifs...
2024-02-28 15:35:02,702 - 😊 Generating matrix for aA...
2024-02-28 15:39:14,569 - 😊 Predicting aA motifs...
2024-02-28 15:39:52,617 - 😊 Generating matrix for aC...
2024-02-28 15:44:03,572 - 😊 Predicting aC motifs...
2024-02-28 15:44:41,618 - 😊 Generating matrix for aD3...
2024-02-28 15:49:46,455 - 😊 Predicting aD3 motifs...
2024-02-28 15:50:18,185 - 😊 Generating matrix for bA...
2024-02-28 15:55:35,419 - 😊 Predicting bA motifs...
2024-02-28 15:56:06,881 - 😊 Generating matrix for bC...
2024-02-28 16:00:20,058 - 😊 Predicting bC motifs...
2024-02-28 16:00:50,086 - 😊 Generating matrix for bDaD1...
2024-02-28 16:06:21,785 - 😊 Predicting bDaD1 motifs...
2024-02-28 16:06:54,807 - 😊 Saving results to ./ZG_primary.pep...
2024-02-28 16:06:55,009 - 😊 Thank you for using Resistify!
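
For a rough comparison of the two runs, the jackhmmer step can be timed per candidate sequence directly from the log timestamps. The sketch below uses only the start and end lines and the sequence counts from the logs in this thread; the function names are made up for illustration. It works out to roughly 27 seconds per sequence for the 464-sequence run and roughly 22 seconds per sequence for the 2704-sequence run, which by itself does not show whether the extra cores made a difference.

```python
from datetime import datetime

# (number of candidate NLRs, jackhmmer start/end log lines) for each run.
RUNS = {
    "464 candidates": (464, [
        "2024-02-27 22:22:13,077 - 😊 Running jackhmmer...",
        "2024-02-28 01:54:24,301 - 😊 jackhmmer completed successfully...",
    ]),
    "2704 candidates": (2704, [
        "2024-02-27 22:23:28,649 - 😊 Running jackhmmer...",
        "2024-02-28 14:39:42,292 - 😊 jackhmmer completed successfully...",
    ]),
}


def parse_stamp(line):
    """Read the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a log line."""
    stamp = line.split(" - ", 1)[0].strip()
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def jackhmmer_seconds_per_sequence(log_lines, n_sequences):
    """Average wall-clock seconds jackhmmer spent per candidate sequence."""
    start = end = None
    for line in log_lines:
        if "Running jackhmmer" in line:
            start = parse_stamp(line)
        elif "jackhmmer completed" in line:
            end = parse_stamp(line)
    if start is None or end is None:
        raise ValueError("log does not contain both jackhmmer lines")
    return (end - start).total_seconds() / n_sequences


for name, (count, lines) in RUNS.items():
    rate = jackhmmer_seconds_per_sequence(lines, count)
    print(f"{name}: {rate:.1f} s per sequence")
```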

@SwiftSeal
Owner

Wow, that is a big inventory of NLRs! Thank you for sending this over; it's really useful to get an idea of the runtimes other users are experiencing. I'll add a note on runtimes to the README.

I will close this issue for now, thanks again for getting in touch and please drop an issue if you'd like further support.

@SwiftSeal
Owner

Hi @RiErm7,

Just a quick note that I have released a new version which parallelises jackhmmer, which should be much quicker for larger datasets.

@RiErm7
Author

RiErm7 commented Apr 26, 2024

> Hi @RiErm7,
>
> Just a quick note that I have released a new version which parallelises jackhmmer, which should be much quicker for larger datasets.

Congratulations. Thanks!
