How to reduce MS-GF+ search time #108
Comments
Search times are dependent on three things:
I suspect you are performing a partially tryptic search on a large FASTA (200 MB or larger) and using several dynamic mods. I suggest you change your search to be a fully tryptic search (ntt = 2) and run a test search on one of your 70 .raw files that already finished. Compare the results: did the partially tryptic search reveal more than ~3% additional identifications? I am, of course, just guessing here. You'll need to tell us:
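A minimal sketch of the comparison suggested above, assuming you have already counted identifications (for example, PSMs passing your chosen score threshold) in each result. The function name `pct_gain` and the counts in the example call are hypothetical, not values from this thread.

```shell
# Percent additional identifications from the partially tryptic search,
# relative to the fully tryptic search.
# $1 = count from the fully tryptic search (ntt = 2)
# $2 = count from the partially tryptic search
pct_gain() {
    awk -v f="$1" -v p="$2" 'BEGIN { printf "%.1f\n", (p - f) / f * 100 }'
}

# Hypothetical counts: 1000 identifications fully tryptic, 1030 partially tryptic.
pct_gain 1000 1030   # prints 3.0
```

If the printed value is at or below roughly 3, the extra search time of the partially tryptic search is probably not paying for itself.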
Hi,

    mods=/scratch/oknjav001/bal_mzML_raw_files/databaseComparisonProject/msgfplus/searchEngine/MSGFPlus_Mods1.txt
    fastadb=/scratch/oknjav001/bal_mzML_raw_files/humanDatabase/fullmicribiome.fasta

    #============baseline==================================
    for mzml in *.mzML
    do
        java -Xmx16G -jar $msgfplus -s $mzml -d $fastadb -mod $mods -inst 3 -maxMissedCleavages 1 -t 20ppm -ti -1,2 -ntt 2 -tda 1
    done;

    #============================bcg=================================
    for mzml in *.mzML
    do
        java -Xmx16G -jar $msgfplus -s $mzml -d $fastadb -mod $mods -inst 3 -maxMissedCleavages 1 -t 20ppm -ti -1,2 -ntt 2 -tda 1
    done;
You are using The big problem is that 1.5 GB FASTA file. I'm not sure that 16 GB is enough for it; hopefully it is.

Provided Java does not report an out-of-memory exception, there really isn't much that can be done to speed up the search time: a 1.5 GB FASTA file is very large and will take time to search.

The only option would be to remove any dynamic mods in MSGFPlus_Mods1.txt (which is why I'm curious what it has).

Splitting the 1.5 GB FASTA file into smaller chunks (using https://github.com/PNNL-Comp-Mass-Spec/Fasta-File-Splitter ) is an option, but that won't speed up the overall search time; it's really only useful if either Java is running out of memory, or if you're able to run multiple copies of MS-GF+ simultaneously, ideally on different systems.
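One way to check whether Java actually ran out of memory is to capture the console output of each search in a log file and look for the JVM's out-of-memory error. This is only a sketch: the helper name `has_oom` and the log file name are hypothetical, and it assumes stdout and stderr were redirected to that file.

```shell
# Succeeds (exit status 0) if the captured log mentions a Java out-of-memory error.
# $1 = log file written by a command such as:
#   java -Xmx16G -jar $msgfplus -s $mzml -d $fastadb ... > search.log 2>&1
has_oom() {
    grep -qF 'java.lang.OutOfMemoryError' "$1"
}

# Example use (search.log is a hypothetical file name):
#   if has_oom search.log; then echo "raise -Xmx or split the FASTA file"; fi
```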
Ah, I just noticed in #10 that the software is, in fact, crashing, and you need a copy of the Here you go:

Note that the Fasta-File-Splitter is a VB.NET program (while most of our software is C#). Thus, you need a new enough version of Mono that supports VB.NET (it's had support for 6+ years, but package managers for older Linux distros might have an old version of Mono). See https://www.mono-project.com/download/stable/

You will split the FASTA file (probably into 10 parts), then run MS-GF+ 10 times for each .mzML file. Once you have the .mzid files from all of the searches, you will need to re-combine them and re-compute EValues. For that, use the MzIdMerger:
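The "run MS-GF+ once per FASTA part for each .mzML file" step could be scripted roughly as follows. This is a sketch under assumptions: the function name `msgf_split_cmds`, the `*.fasta` naming of the split parts, and the output file naming are hypothetical, and paths are assumed to contain no spaces. The search options are the ones from the command posted above, plus `-o` to give each search its own .mzid file. The function only prints the commands; the merge step is not shown because its command line does not appear in this thread.

```shell
# Print one MS-GF+ command per (spectrum file, FASTA part) pair.
# $1 = path to MSGFPlus.jar, $2 = mods file, $3 = directory with the split
# FASTA parts; remaining arguments = .mzML files. Nothing is executed here.
msgf_split_cmds() {
    local jar=$1 mods=$2 parts_dir=$3
    local mzml part base tag
    shift 3
    for mzml in "$@"; do
        for part in "$parts_dir"/*.fasta; do
            [ -e "$part" ] || continue          # skip if the glob matched nothing
            base=$(basename "$mzml" .mzML)
            tag=$(basename "$part" .fasta)
            echo "java -Xmx16G -jar $jar -s $mzml -d $part -mod $mods -inst 3 -maxMissedCleavages 1 -t 20ppm -ti -1,2 -ntt 2 -tda 1 -o ${base}_${tag}.mzid"
        done
    done
}

# Example use (directory name is hypothetical):
#   msgf_split_cmds "$msgfplus" "$mods" split_fasta *.mzML > jobs.txt
```

Writing the commands to a file first makes it easy to review them, or to hand separate lines to different nodes if you can run several copies of MS-GF+ at once.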
@alchemistmatt this is the information in my modification file.

    C2H3N1O1,C,fix,any,Carbamidomethyl # Fixed Carbamidomethyl C
    Variable Modifications (default: none)
    O1,M,opt,any,Oxidation # Oxidation M
    #15.994915,M,opt,any,Oxidation # Oxidation M (mass is used instead of CompositionStr)
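To test how much the dynamic mods cost, one could run a timing search against a copy of the mods file with the optional entries removed. A minimal sketch, assuming the comma-separated layout shown above with `fix` or `opt` in the third field; the function name `strip_dynamic_mods` and the output file name are hypothetical.

```shell
# Print a mods file with the optional (dynamic) entries removed.
# Comment lines (starting with #) and fixed mods pass through unchanged.
# $1 = path to the original mods file
strip_dynamic_mods() {
    awk -F',' '/^[[:space:]]*#/ { print; next } $3 != "opt" { print }' "$1"
}

# Example use (output file name is hypothetical):
#   strip_dynamic_mods MSGFPlus_Mods1.txt > MSGFPlus_Mods_fixed_only.txt
```

For the file above this would drop only the Oxidation M entry, so the test search would miss oxidized peptides; it is meant for gauging search time, not as a replacement for the real search.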
Comet runs fast after indexing the database, and its indexed database includes those modifications. I think MS-GF+ could be much faster if the modifications were included, and sorted properly, in the indexing step, I guess...
@ATPs Implementing such an idea would be a significant amount of work.
I am running MS-GF+ and it takes days to finish. How can I shorten the sequence database search time? I have increased the number of threads to 32 and the RAM to 32 GB, but the search time has not dropped as I expected. Could you kindly help me figure out how to speed this up? I have 70 raw files, which have taken five days to run on an HPC.