
Add support or warnings (and tests) for edge cases (extension case, single files) #431

Open
NikoHensley opened this issue Oct 18, 2024 · 43 comments
Labels
bug Something isn't working

Comments

@NikoHensley

Description of the bug

Hi all, I keep encountering the following error on diverse datasets that I am trying to use with quantms. I have tried multiple times on the same HPC cluster, with different downloads from nf-core and/or bigbio, different memory allocations on the cluster, and different datasets, including data files that I know should work because I have run them through a different proteomics pipeline on the same cluster. I was able to run the example file just fine (test_lfq), but as soon as I use my own data, it fails with this same error.

ERROR ~ Error executing process > 'NFCORE_QUANTMS:QUANTMS:LFQ:PROTEOMICSLFQ (test_ants_exp_setup.sdrf_openms_design)'

Caused by:
  Process `NFCORE_QUANTMS:QUANTMS:LFQ:PROTEOMICSLFQ (test_ants_exp_setup.sdrf_openms_design)` terminated with an error exit status (6)


Command executed:

  ProteomicsLFQ \
      -threads 12 \
      -in PD225_Block1-4.mzML \
      -ids PD225_Block1-4_consensus_fdr_filter.idXML \
      -design test_ants_exp_setup.sdrf_openms_design.tsv \
      -fasta GCF_000001405.40_protein_decoy.fasta \
      -protein_inference bayesian \
      -quantification_method feature_intensity \
      -targeted_only false \
      -feature_with_id_min_score 0 \
      -feature_without_id_min_score 0 \
      -mass_recalibration false \
      -Seeding:intThreshold 100 \
      -protein_quantification unique_peptides \
      -alignment_order star \
       \
      -psmFDR 0.1 \
      -proteinFDR 0.05 \
      -picked_proteinFDR false \
      -out_cxml test_ants_exp_setup.sdrf_openms_design_openms.consensusXML \
      -out test_ants_exp_setup.sdrf_openms_design_openms.mzTab \
      -out_msstats test_ants_exp_setup.sdrf_openms_design_msstats_in.csv \
       \
      -debug 1000 \
      2>&1 | tee proteomicslfq.log
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:LFQ:PROTEOMICSLFQ":
      ProteomicsLFQ: $(ProteomicsLFQ 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g' | cut -d ' ' -f 1)
  END_VERSIONS

Command exit status:
  6

Command output:
  The OpenMS team is collecting usage statistics for quality control and funding purposes.
  We will never give out your personal data, but you may disable this functionality by 
  setting the environmental variable OPENMS_DISABLE_UPDATE_CHECK to ON.
  Connecting to REST server failed. Skipping update check.
  Error: Host unreachable
  TOPPBase.cpp(1588): Value of string option 'no_progress': 0
  TOPPBase.cpp(1588): Value of string option 'in': PD225_Block1-4.mzML
  TOPPBase.cpp(1588): Checking input file 'PD225_Block1-4.mzML'
  TOPPBase.cpp(1588): Value of string option 'out': test_ants_exp_setup.sdrf_openms_design_openms.mzTab
  TOPPBase.cpp(1588): Checking output file 'test_ants_exp_setup.sdrf_openms_design_openms.mzTab'
  TOPPBase.cpp(1588): Value of string option 'out_msstats': test_ants_exp_setup.sdrf_openms_design_msstats_in.csv
  TOPPBase.cpp(1588): Checking output file 'test_ants_exp_setup.sdrf_openms_design_msstats_in.csv'
  TOPPBase.cpp(1588): Value of string option 'out_triqler': 
  TOPPBase.cpp(1588): Value of string option 'ids': PD225_Block1-4_consensus_fdr_filter.idXML
  TOPPBase.cpp(1588): Checking input file 'PD225_Block1-4_consensus_fdr_filter.idXML'
  TOPPBase.cpp(1588): Value of string option 'design': test_ants_exp_setup.sdrf_openms_design.tsv
  TOPPBase.cpp(1588): Checking input file 'test_ants_exp_setup.sdrf_openms_design.tsv'
  TOPPBase.cpp(1588): Value of string option 'fasta': GCF_000001405.40_protein_decoy.fasta
  TOPPBase.cpp(1588): Checking input file 'GCF_000001405.40_protein_decoy.fasta'
  TOPPBase.cpp(1588): Value of string option 'quantification_method': feature_intensity
  Invalid parameter: Spectra file basenames provided as input need to match a subset the experimental design file basenames.
  TOPPBase.cpp(1588): Error occurred in line 1629 of file /opt/conda/conda-bld/openms-meta_1716538752609/work/src/topp/ProteomicsLFQ.cpp (in function: virtual OpenMS::TOPPBase::ExitCodes ProteomicsLFQ::main_(int, const char**)) !

Command error:
  The OpenMS team is collecting usage statistics for quality control and funding purposes.
  We will never give out your personal data, but you may disable this functionality by 
  setting the environmental variable OPENMS_DISABLE_UPDATE_CHECK to ON.
  Connecting to REST server failed. Skipping update check.
  Error: Host unreachable
  TOPPBase.cpp(1588): Value of string option 'no_progress': 0
  TOPPBase.cpp(1588): Value of string option 'in': PD225_Block1-4.mzML
  TOPPBase.cpp(1588): Checking input file 'PD225_Block1-4.mzML'
  TOPPBase.cpp(1588): Value of string option 'out': test_ants_exp_setup.sdrf_openms_design_openms.mzTab
  TOPPBase.cpp(1588): Checking output file 'test_ants_exp_setup.sdrf_openms_design_openms.mzTab'
  TOPPBase.cpp(1588): Value of string option 'out_msstats': test_ants_exp_setup.sdrf_openms_design_msstats_in.csv
  TOPPBase.cpp(1588): Checking output file 'test_ants_exp_setup.sdrf_openms_design_msstats_in.csv'
  TOPPBase.cpp(1588): Value of string option 'out_triqler': 
  TOPPBase.cpp(1588): Value of string option 'ids': PD225_Block1-4_consensus_fdr_filter.idXML
  TOPPBase.cpp(1588): Checking input file 'PD225_Block1-4_consensus_fdr_filter.idXML'
  TOPPBase.cpp(1588): Value of string option 'design': test_ants_exp_setup.sdrf_openms_design.tsv
  TOPPBase.cpp(1588): Checking input file 'test_ants_exp_setup.sdrf_openms_design.tsv'
  TOPPBase.cpp(1588): Value of string option 'fasta': GCF_000001405.40_protein_decoy.fasta
  TOPPBase.cpp(1588): Checking input file 'GCF_000001405.40_protein_decoy.fasta'
  TOPPBase.cpp(1588): Value of string option 'quantification_method': feature_intensity
  Invalid parameter: Spectra file basenames provided as input need to match a subset the experimental design file basenames.
  TOPPBase.cpp(1588): Error occurred in line 1629 of file /opt/conda/conda-bld/openms-meta_1716538752609/work/src/topp/ProteomicsLFQ.cpp (in function: virtual OpenMS::TOPPBase::ExitCodes ProteomicsLFQ::main_(int, const char**)) !

Command used and terminal output

The command I use is:


nextflow run main.nf -c /home/nicholaih/29apr2024_upperlip_expression/nf_quantms/quantms/test_ants.config \
--input /home/nicholaih/29apr2024_upperlip_expression/dn_maxquant/test_run/test_ants_exp_setup.sdrf.tsv \
--outdir /home/nicholaih/29apr2024_upperlip_expression/quantms_test_ants \
--email [email protected] \
--multiqc_title test_ants \
--database /home/nicholaih/29apr2024_upperlip_expression/dn_maxquant/test_run/GCF_000001405.40_protein.fasta \
-profile mamba

The overall output produces this file:

task_id	hash	native_id	name	status	exit	submit	duration	realtime	%cpu	peak_rss	peak_vmem	rchar	wchar
1	66/5d8458	64595	NFCORE_QUANTMS:QUANTMS:INPUT_CHECK:SAMPLESHEET_CHECK (test_ants_exp_setup.sdrf.tsv)	COMPLETED	0	2024-10-17 07:35:12.926	2m 2s	1m 51s	33.1%	500.5 MB	7.3 GB	72.2 MB	36.1 KB
2	c8/ad3d93	2552	NFCORE_QUANTMS:QUANTMS:CREATE_INPUT_CHANNEL:SDRFPARSING (test_ants_exp_setup.sdrf.tsv)	COMPLETED	0	2024-10-17 07:37:15.102	19.3s	13.7s	141.5%	181 MB	6.5 GB	28.2 MB	3.6 KB
4	74/106a48	4044	NFCORE_QUANTMS:QUANTMS:FILE_PREPARATION:THERMORAWFILEPARSER (PD225_Block1-4)	FAILED	1	2024-10-17 07:37:34.734	5.8s	5.7s	-	-	-	-	-
3	2c/a0a6a5	4063	NFCORE_QUANTMS:QUANTMS:DECOYDATABASE (1)	COMPLETED	0	2024-10-17 07:37:35.071	2m 3s	1m 44s	101.5%	135.7 MB	5 GB	190.9 MB	187.5 MB
5	4b/240662	4138	NFCORE_QUANTMS:QUANTMS:FILE_PREPARATION:THERMORAWFILEPARSER (PD225_Block1-4)	COMPLETED	0	2024-10-17 07:37:40.860	7m 45s	7m 39s	166.1%	1.4 GB	4.2 GB	574.2 MB	382.1 MB
6	36/6ea94d	19045	NFCORE_QUANTMS:QUANTMS:FILE_PREPARATION:MZMLSTATISTICS (PD225_Block1-4)	COMPLETED	0	2024-10-17 07:45:27.462	2m 14s	1m 49s	254.7%	813.6 MB	10.1 GB	456.6 MB	1.6 MB
8	32/a13204	19065	NFCORE_QUANTMS:QUANTMS:LFQ:ID:DATABASESEARCHENGINES:SEARCHENGINESAGE ([PD225_Block1-4])	COMPLETED	0	2024-10-17 07:45:27.879	13m 50s	13m 17s	343.8%	22.1 GB	29.2 GB	1.4 GB	159.5 MB
9	4d/632ae5	52473	NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMRESCORING:IDPEP (PD225_Block1-4)	COMPLETED	0	2024-10-17 07:59:18.599	42.2s	28.3s	117.2%	164.6 MB	5 GB	68.6 MB	68.1 MB
7	f5/8a760e	19069	NFCORE_QUANTMS:QUANTMS:LFQ:ID:DATABASESEARCHENGINES:SEARCHENGINECOMET (PD225_Block1-4)	COMPLETED	0	2024-10-17 07:45:27.949	1h 9m 50s	1h 9m 17s	773.2%	4.4 GB	9.9 GB	5.7 GB	153.9 MB
10	77/511ccd	14424	NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMRESCORING:IDPEP (PD225_Block1-4)	COMPLETED	0	2024-10-17 08:55:17.970	52.7s	39.2s	117.4%	179.5 MB	4.6 GB	98.2 MB	98.2 MB
11	68/842048	16291	NFCORE_QUANTMS:QUANTMS:LFQ:ID:CONSENSUSID (PD225_Block1-4)	COMPLETED	0	2024-10-17 08:56:11.244	1m 15s	1m 2s	107.3%	370.7 MB	5.2 GB	169.9 MB	69.9 MB
12	b6/5b4eeb	18638	NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMFDRCONTROL:FDRCONSENSUSID (PD225_Block1-4)	COMPLETED	0	2024-10-17 08:57:26.622	43.4s	30.3s	119.4%	209.3 MB	5 GB	73.5 MB	50.1 MB
13	de/879e39	20397	NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMFDRCONTROL:IDFILTER (PD225_Block1-4)	COMPLETED	0	2024-10-17 08:58:10.278	29.6s	16.6s	136.8%	164.8 MB	5 GB	57.1 MB	22.5 KB
14	0c/13b61a	21850	NFCORE_QUANTMS:QUANTMS:LFQ:PROTEOMICSLFQ (test_ants_exp_setup.sdrf_openms_design)	FAILED	6	2024-10-17 08:58:40.676	14.9s	14.9s	-	-	-	-	-


### Relevant files

[quantms.zip](https://github.com/user-attachments/files/17432158/quantms.zip)

I'm attaching the output log, the config file, and the nextflow log in a zip file. Any help in diagnosing what is wrong would be really great, as I'm keen to get this program working.

### System information

I'm running quantms on a Linux HPC cluster with a SLURM job scheduler; the node has the following configuration (1 TB of memory, 32 cores):

CPUAlloc=24 CPUTot=32 CPULoad=3.08
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=node91 NodeHostName=node91 Version=20.11.2
OS=Linux 3.10.0-1160.11.1.el7.x86_64 #1 SMP Fri Dec 18 16:34:56 UTC 2020
RealMemory=1 AllocMem=0 FreeMem=964132 Sockets=4 Boards=1
State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=largemem
BootTime=2023-02-01T08:46:44 SlurmdStartTime=2023-02-01T08:50:53
CfgTRES=cpu=32,mem=1M,billing=32
AllocTRES=cpu=24
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Comment=(null)

@NikoHensley NikoHensley added the bug Something isn't working label Oct 18, 2024
@jpfeuffer
Collaborator

jpfeuffer commented Oct 18, 2024 via email

@ypriverol
Member

@NikoHensley can you share your SDRF?

@NikoHensley
Author

Hi all! I've attached my SDRF (as a .txt, although I run it as a .tsv) and the parsing log from the quantms output. test_ants_exp_setup.sdrf_parsing.log
test_ants_exp_setup.sdrf.txt

@jpfeuffer
Collaborator

I think both using capitalized .RAW and using single files currently have undefined/untested behaviour.
Please try to circumvent this for now.

@NikoHensley
Author

So to clarify: I should change all the file names to end in ".raw"? And I cannot run analyses that have only one technical or biological replicate with your pipeline?

@jpfeuffer
Collaborator

jpfeuffer commented Oct 18, 2024

Yes, to be on the safe side, you should change the extension to .raw. This is probably the main issue.
I have never tried running it with a single file in an experiment. It might work, but I can definitely say that we are not specifically handling this corner case yet.

@jpfeuffer
Collaborator

Those should all require only minor changes to handle in the pipeline. It is just not something that we ever needed.

@jpfeuffer jpfeuffer changed the title Another LFQ error (exit status 6, not 8 or 11) Add support or warnings (and tests) for edge cases (extension case, single files) Oct 20, 2024
@NikoHensley
Author

Hi, I've made the changes you suggested and tried re-running the pipeline, but there are still errors at the ProteomicsLFQ step. It does get further than before, so renaming the files to ".raw" helped. I tried using just one input file, but that gave an exit status of 8. So now I'm using three appropriately named sample files, and that produces an exit status of 139. It seems like it cannot find peaks in these files, or at least cannot match them between samples, although I have run these same three files just fine in MaxQuant. I'm attaching the output and error logs. Maybe I can discern something specific, but any help is appreciated! It looks like there's progress toward getting the pipeline to work.

pipeline_report.txt
nextflow_proteomicslfq_debug.log

@NikoHensley
Author

As a follow-up to this, I've tried increasing the max_memory for the whole run as well as the memory available to the specific ProteomicsLFQ process (up to 350 GB), and that does not help. In the output file, with exit code 6, ProteomicsLFQ terminates with:

Invalid parameter: Spectra file basenames provided as input need to match a subset the experimental design file basenames.
TOPPBase.cpp(1588): Error occurred in line 1629 of file /opt/conda/conda-bld/openms-meta_1716538752609/work/src/topp/ProteomicsLFQ.cpp (in function: virtual OpenMS::TOPPBase::ExitCodes ProteomicsLFQ::main_(int, const char**)) !

Which has been the problem from before.
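As a sanity check outside the pipeline, one can verify that every mzML basename shows up in the OpenMS design file before launching ProteomicsLFQ. A minimal sketch (`check_design` is a made-up helper, and the real ProteomicsLFQ check compares specific design-file columns, so adjust to your TSV layout):

```shell
# check_design: verify that each given mzML basename appears somewhere
# in the experimental design TSV. Hypothetical helper; the actual
# ProteomicsLFQ validation is stricter (column-based subset match).
check_design() {
    design=$1; shift
    status=0
    for f in "$@"; do
        base=$(basename "$f" .mzML)
        if ! grep -q -- "$base" "$design"; then
            echo "missing from design: $base" >&2
            status=1
        fi
    done
    return $status
}
```

For example, `check_design test_ants_exp_setup.sdrf_openms_design.tsv PD225_Block1-4.mzML` returns non-zero (and prints the offending basename) when the design file only lists `.RAW`-spelled names.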

@jpfeuffer
Collaborator

jpfeuffer commented Oct 25, 2024

The first error you had is due to a bug in a new version of ThermoRawFileParser that leads to infinite m/z values and therefore infinite memory allocation in ProteomicsLFQ (see the linked pull request). You will probably need to use the dev version until there is a bugfix release. If you already cloned dev, you will need to pull the latest changes.
ProteomicsLFQ should not need 300 GB for 4 files.

The second error is strange if it got past the experimental design validation stage before that (with the same SDRF file).
What does your SDRF look like now, by the way?

@NikoHensley
Author

My SDRF is here (as a .log file but only for uploading):
test_ants_exp_setup.sdrf.log

I will redownload quantms and also use the dev version in my next run to see how it performs, as you suggest.

@ypriverol
Member

Sorry for this @NikoHensley, we are working with TRFP (compomics/ThermoRawFileParser#187) to solve this issue.

@jpfeuffer
Collaborator

Hi, can you also use .raw in the SDRF, please? I think this is actually more important than renaming the file. That is where we do the case-sensitive replacement.

@NikoHensley
Author

Sorry, I may be misunderstanding the goal here. You want me to run it with ".raw" instead of ".RAW" even though you suggested that would make it fail?

@jpfeuffer
Collaborator

No, I think .RAW will fail, while .raw should work.
Initially I only suggested changing the actual filenames, but I realized the names as they are written in the SDRF are even more important.

The culprits are here:
https://github.com/search?q=repo%3Abigbio%2Fquantms%20.raw&type=code

@ypriverol
Member

@NikoHensley if you want, rather than requiring users to rename their files, we can change the code to tackle this issue; from my point of view that is better, and we can do a PR to dev.

@jpfeuffer
Collaborator

@ypriverol Ideally we should do it anyway. I have seen .RAW being used sometimes. We should also check whether those raw files are actually Thermo RAW files. I think other vendors might also use .raw, and TRFP will fail on them. We should throw a nice error in that case.

@daichengxin
Collaborator

I reproduced the error. The problem code is here:

--extension_convert raw:mzML$extensionconversions \\

for SDRF input. We will fix it. Thanks a lot.

@jpfeuffer
Collaborator

Can you check that the branching for file types etc works as expected too?

raw: hasExtension(it[1], '.raw')

sed 's/.raw\\t/.mzML\\t/I' ${design} > ${design.baseName}_openms_design.tsv

@daichengxin
Collaborator

Yes, it works:

def hasExtension(file, extension) {
return file.toString().toLowerCase().endsWith(extension.toLowerCase())
}
, and sed ignores case via the /I flag.
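The case-insensitive sed replacement quoted above can be checked in isolation. A quick demo (note the /I substitution flag is a GNU sed extension, and the dot is escaped here for precision):

```shell
# GNU sed: the I flag on s/// makes the match case-insensitive, so both
# ".RAW" and ".raw" (followed by a tab) become ".mzML", mirroring the
# pipeline's design-file rewrite.
printf 'PD225_Block1-4.RAW\tA\nPD279_Block_2-15.raw\tB\n' \
    | sed 's/\.raw\t/.mzML\t/I'
```

Both lines come out with a `.mzML` suffix, regardless of the original case.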

@jpfeuffer
Collaborator

Awesome, I totally overlooked those details.

@NikoHensley
Author

Ok, so I've tried running bigbio/quantms (using the mamba profile), with ".raw" endings. The program got to the following point before throwing an error (one I have not seen before). Ignore the ThermoRawFileParser failures, as the spectra files are large and it retries them with more memory (and succeeds):

[29/e6727e] NFC…t_ants_exp_setup.sdrf.tsv) | 1 of 1 ✔
[f8/d4b01b] NFC…t_ants_exp_setup.sdrf.tsv) | 1 of 1 ✔
[- ] NFC…ILE_PREPARATION:DECOMPRESS -
[- ] NFC…E_PREPARATION:MZMLINDEXING -
[81/b5e183] NFC…EPARSER (PD279_Block_2-15) | 6 of 6, failed: 3, retries: 3 ✔
[55/6e7e00] NFC…TISTICS (PD279_Block_2-15) | 3 of 3 ✔
[0e/b60e6d] NFC…:QUANTMS:DECOYDATABASE (1) | 1 of 1 ✔
[- ] NFC…HENGINES:SEARCHENGINECOMET -
[- ] NFC…SCORING:EXTRACTPSMFEATURES -
[- ] NFC…ID:PSMRESCORING:PERCOLATOR -
[- ] NFC…FDRCONTROL:IDSCORESWITCHER -
[- ] NFC…:ID:PSMFDRCONTROL:IDFILTER -
[- ] NFC…UREMAPPER:ISOBARICANALYZER -
[- ] NFC…TMT:FEATUREMAPPER:IDMAPPER -
[- ] NFC…NTMS:QUANTMS:TMT:FILEMERGE -
[- ] NFC…NFERENCE:PROTEININFERENCER -
[- ] NFC…:PROTEININFERENCE:IDFILTER -
[- ] NFC…INQUANT:IDCONFLICTRESOLVER -
[- ] NFC…EINQUANT:PROTEINQUANTIFIER -
[- ] NFC…TEINQUANT:MSSTATSCONVERTER -
[a4/51d8ea] NFC…NECOMET (PD279_Block_2-15) | 3 of 3 ✔
[da/918f7e] NFC…EATURES (PD279_Block_2-15) | 3 of 3 ✔
[ae/165a28] NFC…COLATOR (PD279_Block_2-15) | 2 of 3
[d2/acd4cc] NFC…ESWITCHER (PD225_Block1-4) | 1 of 2
[e8/5ddcde] NFC…:IDFILTER (PD225_Block1-4) | 0 of 1
Plus 9 more processes waiting for tasks…
ERROR ~ Error executing process > 'NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMFDRCONTROL:IDFILTER (PD225_Block1-4)'

Caused by:
Process NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMFDRCONTROL:IDFILTER (PD225_Block1-4) terminated with an error exit status (6)

Command executed:

  IDFilter \
      -in PD225_Block1-4_comet_feat_perc_pep.idXML \
      -out PD225_Block1-4_comet_feat_perc_pep_filter.idXML \
      -threads 2 \
      -score:psm "0.10" \
      2>&1 | tee PD225_Block1-4_comet_feat_perc_pep_idfilter.log

cat <<-END_VERSIONS > versions.yml
"NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMFDRCONTROL:IDFILTER":
IDFilter: $(IDFilter 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g' | cut -d ' ' -f 1)
END_VERSIONS

Command exit status:
6

@jpfeuffer
Collaborator

Exit status 6 is usually "wrong parameter" in OpenMS tools. Do you have the actual log of this step?

I.e. what was printed to stdout, or equivalently what is in PD225_Block1-4_comet_feat_perc_pep_idfilter.log?

@NikoHensley
Author

The first few lines of the requested file read as follows:

Unknown option(s) '[-score:psm]' given. Aborting!
stty: standard input: Inappropriate ioctl for device

IDFilter -- Filters results from protein or peptide identification engines based on different criteria.
Full documentation: http://www.openms.de/doxygen/nightly/html/TOPP_IDFilter.html
Version: 3.1.0-pre-exported-20240524 May 24 2024, 08:25:56
To cite OpenMS:

  • Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

I am not sure if it's because I only had one search engine selected (comet), but I am re-running the pipeline with comet and sage to see if that helps. I am also having it print more of the debugging info for the ID steps.

@jpfeuffer
Collaborator

jpfeuffer commented Oct 27, 2024

I suspected this. This parameter changed between OpenMS 3.1 and 3.2.
There seems to be a mismatch between the quantms version and the OpenMS version you are using.

On dev we are definitely using the compatible one:

conda "bioconda::openms-thirdparty=3.2.0"

Not sure how 3.1 ended up on your workers. Are they sharing some conda env? Do you have a preexisting env?

In general, we strongly recommend the container profiles.

@ypriverol
Member

ypriverol commented Oct 27, 2024

I found the bug. Thanks, @NikoHensley for reporting it. I will do a PR about it.

@ypriverol
Member

@jpfeuffer It is interesting; I ran version 3.2.0 of IDFilter and that parameter is there:

IDFilter -- Filters results from protein or peptide identification engines based on different criteria.
Full documentation: http://www.openms.de/doxygen/nightly/html/TOPP_IDFilter.html
Version: 3.2.0-pre-exported-20241011 Oct 11 2024, 12:58:22
To cite OpenMS:
 + Pfeuffer, J., Bielow, C., Wein, S. et al.. OpenMS 3 enables reproducible analysis of large-scale mass spectrometry data. Nat Methods (2024). doi:10.1038/s41592-024-02197-7.

Usage:
  IDFilter <options>

Options (mandatory options marked with '*'):
  -in <file>*                                               Input file  (valid formats: 'idXML', 'consensusXML')
  -out <file>*                                              Output file  (valid formats: 'idXML', 'consensusXML')

Filtering by precursor attributes (RT, m/z, charge, length):
  -precursor:rt [min]:[max]                                 Retention time range to extract. (default: ':')
  -precursor:mz [min]:[max]                                 Mass-to-charge range to extract. (default: ':')
  -precursor:length [min]:[max]                             Keep only peptide hits with a sequence length in this range. (default: ':')
  -precursor:charge [min]:[max]                             Keep only peptide hits with charge states in this range. (default: ':')

Filtering by peptide/protein score.:
  -score:psm <score>                                        The score which should be reached by a peptide hit to be kept. (use 'NAN' to disable this filter) (default: 'nan')
  -score:protein <score>                                    The score which should be reached by a protein hit to be kept. All proteins are filtered based on their singleton scores irrespective of grouping. 
                                                            Use in combination with 'delete_unreferenced_peptide_hits' to remove affected peptides. (use 'NAN' to disable this filter) (default: 'nan')
  -score:proteingroup <score>                               The score which should be reached by a protein group to be kept. Performs group level score filtering (including groups of single proteins). Use in 
                                                            combination with 'delete_unreferenced_peptide_hits' to remove affected peptides. (use 'NAN' to disable this filter) (default: 'nan')

@jpfeuffer
Collaborator

I know. But @NikoHensley is somehow using 3.1.0, which is not compatible with the current version of quantms.
As I said, at first glance everything looks correct in our version definitions, so it might be a corrupted user environment.

@NikoHensley
Author

NikoHensley commented Oct 30, 2024

I updated my conda and nextflow and re-added bioconda to my channels to check whether the wrong instance of OpenMS was being grabbed when the pipeline is initiated, but that did not fix it: when I try to run quantms using nextflow, it still automatically tries to grab OpenMS 3.1. Here's the command and the important part of the output:

nextflow run bigbio/quantms -r dev -profile test,mamba

And then it says (but always has)
NOTE: Your local project version looks outdated - a different revision is available in the remote repository [c70bc99]
Launching https://github.com/bigbio/quantms [reverent_goldstine] DSL2 - revision: acf5318 [dev]
Another Nextflow instance is creating the conda environment bioconda::openms-thirdparty=3.1.0 -- please wait till it completes

As well as the log:
nextflow.log

@ypriverol
Member

This is strange, because I have done a full search of the code for 3.1.0 and found nothing. Thanks a lot for helping us track down this bug. I will try to reproduce it here.

@fiuzatayna

I'm also facing the same "Invalid parameter: Spectra file basenames provided as input need to match a subset the experimental design file basenames." error. Apparently the SDRF is fine. I tried with the pre-converted .mzML files and also went back to the original .raw files. Running -profile test,docker is fine, but -profile docker gives me the mentioned error. I will try other configurations, but I'm commenting here since it is a fresh topic.

@ypriverol
Member

This looks like a different error @fiuzatayna

@NikoHensley
Author

I tried remaking the conda environment for nextflow and updating packages to see if it was pulling an old version of OpenMS from somewhere else, but that did not work either. I got the same error of it trying to use openms=3.1.0.

@ypriverol
Member

I'm also trying to reproduce the error locally. Can you see in the .nextflow.log at which step the conda version is pulled?

@NikoHensley
Author

I just updated my conda/mamba hoping that would also solve the issue and have now ruined my conda install, as nextflow now throws a different error at that step (exit 127 instead of 6). However, my nextflow.log from a few posts back shows where it found openms=3.1.0, which I've copied below.

Oct-30 08:31:27.045 [Actor Thread 10] DEBUG nextflow.conda.CondaCache - mamba found local env for environment=bioconda::openms-thirdparty=3.1.0; path=/home/nicholaih/29apr2024_upperlip_expression/nf_quantms/quantms/work/conda/env-9a0636a369edb4e4410fb1491ffbb8cd

I am not sure where this local environment came from. I have a local, up-to-date version of quantms, and I am also trying to use this dev version with a conda/mamba profile.

@ypriverol
Member

Did you delete your conda work folder? I have had multiple issues with conda (the main reason I now mostly use singularity), and it may help to delete the work folder. As you can see, it is using the CondaCache with mamba; probably it doesn't update to or use the new version because of that.
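Clearing the cached environments can be sketched as follows (`clear_conda_cache` is a made-up helper; the path mirrors the `work/conda` directory seen in the log above, and Nextflow's conda cache location can also be moved with the NXF_CONDA_CACHEDIR environment variable):

```shell
# clear_conda_cache: remove Nextflow's cached conda envs so the next run
# rebuilds them from the current pipeline definitions. Hypothetical helper;
# point it at <your work dir>/conda, or at NXF_CONDA_CACHEDIR if set.
clear_conda_cache() {
    cache_dir=$1
    if [ -d "$cache_dir" ]; then
        rm -rf "$cache_dir"
        echo "removed $cache_dir"
    fi
}
```

For example, `clear_conda_cache work/conda` before relaunching forces a fresh `openms-thirdparty` environment instead of the stale cached one.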

@jpfeuffer
Collaborator

And then it says (but always has)
NOTE: Your local project version looks outdated - a different revision is available in the remote repository [c70bc99]

This is most likely the culprit. You need to pass -latest for nextflow to always pull the latest changes of the pipeline.

@NikoHensley
Author

I deleted my work directory and my old install of quantms entirely to see if that fixed the issue of using openms=3.1.0. It did not. I've also tried using -latest, and it throws the error below:

(env_nf) bash-4.2$ nextflow run bigbio/quantms -r dev -profile test,mamba -latest

N E X T F L O W ~ version 24.10.0

Pulling bigbio/quantms ...
bigbio/quantms contains uncommitted changes -- cannot pull from repository

As opposed to when I use the normal command, which recapitulates the error of defaulting to openms=3.1.0:

(env_nf) bash-4.2$ nextflow run bigbio/quantms -r dev -profile test,mamba

N E X T F L O W ~ version 24.10.0

NOTE: Your local project version looks outdated - a different revision is available in the remote repository [c70bc99]
Launching https://github.com/bigbio/quantms [lethal_brazil] DSL2 - revision: acf5318 [dev]

Maybe I'm just not familiar enough with nextflow, or I'm doing something wrong in how I'm calling different versions of quantms. I have to use the conda/mamba profiles because docker/singularity do not play nicely on the HPC I am using, as I have no root access. I have just cloned the quantms dev repository and will try using main.nf (the local version) instead to see if that uses openms=3.2.

@jpfeuffer
Collaborator

Did you maybe accidentally pull changes into the cached clone of quantms that nextflow manages for you?

You should always either:

  • have your local clone and use nextflow run main.nf
  • OR let nextflow handle this with nextflow run reponame

@NikoHensley
Author

I usually just let nextflow pull the repo, using nextflow run bigbio/quantms, but I have tried every permutation to get it to run, including having a local version. Maybe that confused it. I will wipe it all and start fresh.

@NikoHensley
Author

I have restored my conda, deleted the local version of quantms, and gotten nextflow to run quantms properly with the following test:

nextflow run -latest bigbio/quantms -r dev -profile test,mamba

This has been quite a journey, and I am happy that the run worked on the example data with the proper openms=3.2.0. However, I cannot get it to work on my sample data; it returns an error about not being able to parse the pulled config file:

 N E X T F L O W   ~  version 24.10.0

ERROR ~ Unable to parse config file: '/home/nicholaih/.nextflow/assets/bigbio/quantms/nextflow.config'

  Cannot read config file include: https://raw.githubusercontent.com/nf-core/configs/master/nfcore_custom.config

despite having just run successfully on the example. I am trying to use this on an HPC with a SLURM submission process. However, when I try to run it on a head node (with limited memory), the nextflow process starts but then fails at the SDRF checking step, despite my SDRF being the same as previously reported, with 3 samples. I will explore these issues more, as I believe it is user error, but it is confusing why it fails in different ways during the HPC submission versus on a head node. Thanks for all your help thus far!

@jpfeuffer
Collaborator

jpfeuffer commented Nov 3, 2024

The config error could have been a hiccup when connecting to or downloading from GitHub.
Or does this happen reproducibly? You could always set the NXF_OFFLINE variable (I'm not sure which other things that disables), or download the configs and set the config_base_url parameter (or however it is called) to a local path.

Regarding SLURM: have you created a configuration for your cluster yet? The minimum you will need is to set process.executor = "slurm" (which you can do from the command line). Once you need more parameters, you should use a config file.
https://www.nextflow.io/docs/latest/executor.html#slurm
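A minimal cluster config along those lines might look like this (a sketch; the partition name is taken from the node info earlier in the thread, and the queue size is a placeholder for your site's limits):

```groovy
// cluster.config — pass with: nextflow run ... -c cluster.config
process {
    executor = 'slurm'
    queue    = 'largemem'   // partition name from your `scontrol show node` output
}

executor {
    queueSize = 20          // cap on concurrently submitted jobs (placeholder)
}
```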
