Merge pull request #484 from zargham-ahmad/issue480
Disabled key conversion in matchms tools
hechth authored Feb 5, 2024
2 parents c6bc00f + d54b869 commit da19386
Showing 30 changed files with 524 additions and 403 deletions.
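To make the commit title concrete: key conversion here means rewriting metadata keys read from a file into a harmonized spelling. The sketch below is plain Python and does not call matchms. The `convert_keys` helper and the `EXAMPLE_REPLACEMENTS` table are hypothetical, chosen to mirror key pairs visible in this commit's test data (NAME / COMPOUND_NAME, RETENTIONTIME / RETENTION_TIME), and keys are lowercased only to keep the lookup simple. It illustrates why an empty replacement table, as passed to `matchms.Metadata.set_key_replacements({})` in the diffs of this commit, would leave keys as they were written.

```python
# Hypothetical stand-in for a key-replacement table; NOT the table matchms ships.
EXAMPLE_REPLACEMENTS = {
    "name": "compound_name",
    "retentiontime": "retention_time",
    "retentionindex": "retention_index",
}


def convert_keys(metadata: dict, replacements: dict) -> dict:
    """Return a copy of metadata with keys renamed according to replacements."""
    converted = {}
    for key, value in metadata.items():
        lowered = key.lower()
        # Keys without an entry in the table are kept unchanged.
        converted[replacements.get(lowered, lowered)] = value
    return converted


record = {"NAME": "C001", "RETENTIONTIME": "38.74", "IONMODE": "Negative"}

with_conversion = convert_keys(record, EXAMPLE_REPLACEMENTS)
without_conversion = convert_keys(record, {})  # empty table: no renaming

print(with_conversion)
print(without_conversion)
```

With the empty table, `name` and `retentiontime` survive as written, which matches the direction of the test-data changes in this commit.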
25 changes: 22 additions & 3 deletions tools/matchms/matchms_add_key.xml
@@ -1,4 +1,4 @@
<tool id="matchms_add_key" name="matchms add key" version="@TOOL_VERSION@+galaxy0" profile="21.09">
<tool id="matchms_add_key" name="matchms add key" version="@TOOL_VERSION@+galaxy1" profile="21.09">
<description>Set metadata key in MSP to static value</description>

<macros>
@@ -25,8 +25,13 @@
<configfile name="matchms_python_cli">
@init_logger@

import matchms
from matchms.importing import load_from_msp
from matchms.exporting import save_as_msp


matchms.Metadata.set_key_replacements({})

spectra = list(load_from_msp("${spectral_library}", metadata_harmonization = "False"))
new_spectra = []
for spectrum in spectra:
@@ -41,7 +46,15 @@ save_as_msp(new_spectra, "${output}")
help="Mass spectral library file to add key." />

<param label="Attribute Name" name="key" type="text" value="" help="Name of the attribute which will be assigned to all spectra records in the MSP." />
<param label="Value" name="value" type="text" value="" help="Value of the attribute which will be assigned to all spectra records in the MSP." />
<param label="Value" name="value" type="text" value="" help="Value of the attribute which will be assigned to all spectra records in the MSP." >
<sanitizer>
<valid initial="default">
<add value="{}" />
<add value="[]" />
<add value="\" />
</valid>
</sanitizer>
</param>
</inputs>

<outputs>
@@ -54,7 +67,13 @@ save_as_msp(new_spectra, "${output}")
<param name="spectral_library" value="filtering/input.msp" ftype="msp"/>
<param name="key" value="tool_used"/>
<param name="value" value="matchms"/>
<output name="output" file="out_matchms_add_key.msp" ftype="msp"/>
<output name="output" file="add_key/out_matchms_add_key.msp" ftype="msp"/>
</test>
<test>
<param name="spectral_library" value="add_key/add_key_test2.msp" ftype="msp"/>
<param name="key" value="adduct"/>
<param name="value" value="[M]+"/>
<output name="output" file="add_key/add_key_test2_out.msp" ftype="msp"/>
</test>
</tests>

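The `<sanitizer>` block added to the `value` parameter extends the set of characters allowed through, which is what lets the new test pass `[M]+` unchanged. The sketch below is a simplified whitelist model in plain Python, not Galaxy's implementation: the `BASE_ALLOWED` set, the `sanitize` helper, and the `X` placeholder are all assumptions made for illustration.

```python
import string

# Illustrative default whitelist; the real default set is an assumption here.
BASE_ALLOWED = set(string.ascii_letters + string.digits + " -=_.()/+*^,:?!")


def sanitize(value: str, allowed: set, placeholder: str = "X") -> str:
    """Replace every character outside the allowed set with a placeholder."""
    return "".join(ch if ch in allowed else placeholder for ch in value)


# Mirror the characters added in the diff: braces, brackets, and backslash.
EXTENDED_ALLOWED = BASE_ALLOWED | set("{}[]\\")

default_result = sanitize("[M]+", BASE_ALLOWED)
extended_result = sanitize("[M]+", EXTENDED_ALLOWED)

print(default_result)
print(extended_result)
```

Under this model the brackets are lost with the base set and preserved with the extended one, so an adduct string such as `[M]+` reaches the script intact.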
4 changes: 2 additions & 2 deletions tools/matchms/matchms_fingerprint_similarity.xml
@@ -86,11 +86,11 @@ scores.to_json("$scores_out")
Similarity between molecular fingerprints can serve as a proxy for structural similarity and can therefore be used to compare molecules.

.. rubric:: **Footnotes**
.. [1] SQL join types explained on LearnSQL_.
.. [1] SQL join types explained on W3School_.
.. [2] Fingerprint - the `daylight fingerprint`_ is used to compute chemical similarity.
Fingerprints are derived from SMILES or InChI structure notations present in the spectrum metadata.

.. _LearnSQL: https://learnsql.com/blog/sql-joins-types-explained/
.. _W3School: https://www.w3schools.com/sql/sql_join.asp
.. _daylight fingerprint: https://www.daylight.com/dayhtml/doc/theory/theory.finger.html

@HELP_matchms@
9 changes: 8 additions & 1 deletion tools/matchms/matchms_metadata_export.xml
@@ -1,4 +1,4 @@
<tool id="matchms_metadata_export" name="matchms metadata export" version="@TOOL_VERSION@+galaxy0" profile="21.09">
<tool id="matchms_metadata_export" name="matchms metadata export" version="@TOOL_VERSION@+galaxy1" profile="21.09">
<description>extract all metadata from mass spectra file to tabular format</description>
<macros>
<import>macros.xml</import>
@@ -17,9 +17,14 @@

<configfiles>
<configfile name="matchms_python_cli">
import matchms
from matchms.importing import load_from_msp, load_from_mgf
from matchms.exporting.metadata_export import export_metadata_as_csv


if "$harmonize_metadata" == "False":
matchms.Metadata.set_key_replacements({})

spectra_list = list(load_from_${input_file.ext}("${input_file}", $harmonize_metadata))

export_metadata_as_csv(spectra_list, "${output_file}")
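The guard in the hunk above compares against the string "False" because the parameter value is substituted into the script as text before Python runs it. The sketch below uses `string.Template` from the standard library as a stand-in for the tool's templating step (an assumption for illustration), and `disable_key_conversion` is a hypothetical placeholder for the `matchms.Metadata.set_key_replacements({})` call. It also assumes the `harmonize_metadata` parameter renders as the literal text `True` or `False`, as the test values in this diff suggest.

```python
from string import Template

# Stand-in template for the guarded part of the config script.
SCRIPT = Template(
    'if "$harmonize_metadata" == "False":\n'
    "    disable_key_conversion()\n"
)


def render(value: str) -> str:
    """Substitute the parameter value into the script text."""
    return SCRIPT.substitute(harmonize_metadata=value)


def guard_fires(value: str) -> bool:
    """Run the rendered script and report whether the guarded call happened."""
    calls = []
    namespace = {"disable_key_conversion": lambda: calls.append(1)}
    exec(render(value), namespace)
    return len(calls) == 1


print(render("False"))
print(guard_fires("False"), guard_fires("True"))
```

After substitution the condition is an ordinary string comparison, so the replacement table is cleared only when harmonization is switched off.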
@@ -39,10 +44,12 @@ export_metadata_as_csv(spectra_list, "${output_file}")
<tests>
<test>
<param name="input_file" value="convert/mgf_out.mgf" ftype="mgf"/>
<param name="harmonize_metadata" value="True"/>
<output name="output_file" file="convert/metadata.csv" ftype="csv" compare="sim_size" delta="0"/>
</test>
<test>
<param name="input_file" value="similarity/RECETOX_Exposome_pesticides_HR_MS_20220323.msp" ftype="msp"/>
<param name="harmonize_metadata" value="True"/>
<output name="output_file" file="convert/metadata.csv" ftype="csv" compare="sim_size" delta="0"/>
</test>
</tests>
4 changes: 2 additions & 2 deletions tools/matchms/matchms_metadata_match.xml
@@ -131,9 +131,9 @@ scores.to_json("$scores_out")
The output will be __TRUE__ if the value is within the tolerance or an exact match and __FALSE__ otherwise.

.. rubric:: **Footnotes**
.. [1] SQL join types explained on LearnSQL_.
.. [1] SQL join types explained on W3School_.

.. _LearnSQL: https://learnsql.com/blog/sql-joins-types-explained/
.. _W3School: https://www.w3schools.com/sql/sql_join.asp

@HELP_matchms@
</help>
4 changes: 2 additions & 2 deletions tools/matchms/matchms_spectral_similarity.xml
@@ -94,9 +94,9 @@ scores.to_json("$similarity_scores")
For more details see this `galaxy training`_.

.. rubric:: **Footnotes**
.. [1] SQL join types explained on LearnSQL_.
.. [1] SQL join types explained on W3School_.

.. _LearnSQL: https://learnsql.com/blog/sql-joins-types-explained/
.. _W3School: https://www.w3schools.com/sql/sql_join.asp
.. _galaxy training: https://training.galaxyproject.org/training-material/topics/metabolomics/tutorials/gc_ms_with_xcms/tutorial.html

@HELP_matchms@
32 changes: 4 additions & 28 deletions tools/matchms/matchms_split.py
@@ -1,22 +1,13 @@
import argparse
import itertools
import os
from typing import List

import matchms
from matchms.exporting import save_as_msp
from matchms.importing import load_from_msp


def get_spectra_names(spectra: list) -> List[str]:
"""Read the keyword 'compound_name' from a spectra.
Args:
spectra (list): List of individual spectra.
Returns:
List[str]: List with 'compoud_name' of individual spectra.
"""
return [x.get("compound_name") for x in spectra]
matchms.Metadata.set_key_replacements({})


def make_outdir(outdir: str):
@@ -35,23 +26,8 @@ def write_spectra(spectra, outdir):
spectra (List[Spectrum]): Spectra to write to file
outdir (str): Path to destination directory.
"""
names = get_spectra_names(spectra)
for i in range(len(spectra)):
outpath = assemble_outpath(names[i], outdir)
save_as_msp(spectra[i], outpath)


def assemble_outpath(name, outdir):
"""Filter special chracteres from name.
Args:
name (str): Name to be filetered.
outdir (str): Path to destination directory.
"""
filename = ''.join(filter(str.isalnum, name))
outfile = str(filename) + ".msp"
outpath = os.path.join(outdir, outfile)
return outpath
save_as_msp(spectra[i], os.path.join(outdir, f"{i}.msp"))


def split_round_robin(iterable, num_chunks):
@@ -76,7 +52,7 @@ def split_round_robin(iterable, num_chunks):


if __name__ == "__main__":
spectra = load_from_msp(filename, metadata_harmonization=True)
spectra = load_from_msp(filename, metadata_harmonization=False)
make_outdir(outdir)

if method == "one-per-file":
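One consequence of the change to `write_spectra` in this file: the removed `assemble_outpath` built file names by keeping only the alphanumeric characters of the compound name, while the new code names each file after the spectrum's position in the list. The sketch below is standalone Python, does not import matchms, and uses made-up compound names. It only shows that alphanumeric filtering can map distinct names to the same file name, whereas index-based names cannot collide. Whether collisions motivated the change is not stated in the commit.

```python
import os


def name_based_path(name: str, outdir: str) -> str:
    """Mirror the removed approach: keep alphanumeric characters only."""
    filename = "".join(filter(str.isalnum, name))
    return os.path.join(outdir, filename + ".msp")


def index_based_path(index: int, outdir: str) -> str:
    """Mirror the new approach: name the file after the spectrum's position."""
    return os.path.join(outdir, f"{index}.msp")


# Made-up names; the first two differ only in punctuation.
names = ["2,4-Dichlorophenol", "24-Dichlorophenol", "Perylene"]

name_paths = [name_based_path(n, "out") for n in names]
index_paths = [index_based_path(i, "out") for i in range(len(names))]

print(len(set(name_paths)), "distinct name-based paths")
print(len(set(index_paths)), "distinct index-based paths")
```

Index-based names also no longer depend on the `compound_name` key being present, which fits loading with `metadata_harmonization=False`.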
22 changes: 11 additions & 11 deletions tools/matchms/matchms_split.xml
@@ -1,4 +1,4 @@
<tool id="matchms_split" name="matchms split library" version="@TOOL_VERSION@+galaxy0" profile="21.09">
<tool id="matchms_split" name="matchms split library" version="@TOOL_VERSION@+galaxy1" profile="21.09">
<description>split a large library into subsets</description>
<macros>
<import>macros.xml</import>
@@ -53,16 +53,16 @@
<param name="msp_input" value="split/sample_input.msp" />
<param name="split_type" value="one-per-file" />
<output_collection name="sample" type="list">
<element name="1NITROPYRENE" file="split/one-per-file/1NITROPYRENE.msp" ftype="msp" compare="diff"/>
<element name="23DICHLOROPHENOL" file="split/one-per-file/23DICHLOROPHENOL.msp" ftype="msp" compare="diff"/>
<element name="245TRICHLOROPHENOL" file="split/one-per-file/245TRICHLOROPHENOL.msp" ftype="msp" compare="diff"/>
<element name="246TRICHLOROPHENOL" file="split/one-per-file/246TRICHLOROPHENOL.msp" ftype="msp" compare="diff"/>
<element name="24DICHLOROPHENOL" file="split/one-per-file/24DICHLOROPHENOL.msp" ftype="msp" compare="diff"/>
<element name="24DINITROPHENOL" file="split/one-per-file/24DINITROPHENOL.msp" ftype="msp" compare="diff"/>
<element name="25DICHLOROPHENOL" file="split/one-per-file/25DICHLOROPHENOL.msp" ftype="msp" compare="diff"/>
<element name="26DICHLOROPHENOL" file="split/one-per-file/26DICHLOROPHENOL.msp" ftype="msp" compare="diff"/>
<element name="34DICHLOROPHENOL" file="split/one-per-file/34DICHLOROPHENOL.msp" ftype="msp" compare="diff"/>
<element name="35DICHLOROPHENOL" file="split/one-per-file/35DICHLOROPHENOL.msp" ftype="msp" compare="diff"/>
<element name="0" file="split/one-per-file/0.msp" ftype="msp" compare="diff"/>
<element name="1" file="split/one-per-file/1.msp" ftype="msp" compare="diff"/>
<element name="2" file="split/one-per-file/2.msp" ftype="msp" compare="diff"/>
<element name="3" file="split/one-per-file/3.msp" ftype="msp" compare="diff"/>
<element name="4" file="split/one-per-file/4.msp" ftype="msp" compare="diff"/>
<element name="5" file="split/one-per-file/5.msp" ftype="msp" compare="diff"/>
<element name="6" file="split/one-per-file/6.msp" ftype="msp" compare="diff"/>
<element name="7" file="split/one-per-file/7.msp" ftype="msp" compare="diff"/>
<element name="8" file="split/one-per-file/8.msp" ftype="msp" compare="diff"/>
<element name="9" file="split/one-per-file/9.msp" ftype="msp" compare="diff"/>
</output_collection>
</test>
<test>
6 changes: 3 additions & 3 deletions tools/matchms/matchms_subsetting.xml
@@ -1,4 +1,4 @@
<tool id="matchms_subsetting" name="matchms subsetting" version="@TOOL_VERSION@+galaxy4" profile="21.09">
<tool id="matchms_subsetting" name="matchms subsetting" version="@TOOL_VERSION@+galaxy5" profile="21.09">
<description>Extract spectra from a library given unique metadata identifier</description>

<macros>
@@ -87,13 +87,13 @@ matchms.exporting.save_as_msp(filtered_spectra.tolist(), '${output}')

<tests>
<test>
<param name="spectral_library" value="out_matchms_add_key.msp" ftype="msp"/>
<param name="spectral_library" value="filtering/input.msp" ftype="msp"/>
<param name="mode" value="include"/>
<param name="list_of_identifiers" value="subsetting/identifier.csv" ftype="csv"/>
<output name="output" file="subsetting/subsetting_output.msp" ftype="msp"/>
</test>
<test>
<param name="spectral_library" value="out_matchms_add_key.msp" ftype="msp"/>
<param name="spectral_library" value="filtering/input.msp" ftype="msp"/>
<param name="mode" value="exclude"/>
<param name="list_of_identifiers" value="subsetting/identifier.csv" ftype="csv"/>
<output name="output" file="subsetting/subsetting_output2.msp" ftype="msp"/>
46 changes: 46 additions & 0 deletions tools/matchms/test-data/add_key/add_key_test2.msp
@@ -0,0 +1,46 @@
SCANNUMBER: -1
IONMODE: positive
SPECTRUMTYPE: Centroid
FORMULA: C20H12
INCHIKEY: CSHWQDPOILHKBI-UHFFFAOYSA-N
SMILES: C1=CC2=C3C(=C1)C1=CC=CC4=C1C(=CC=C4)C3=CC=C2
AUTHORS: Price et al., RECETOX, Masaryk University (CZ)
INSTRUMENT: Q Exactive GC Orbitrap GC-MS/MS
IONIZATION: EI+
LICENSE: CC BY-NC
COMPOUND_NAME: Perylene
RETENTION_TIME: None
RETENTION_INDEX: 2886.9
COLLISION_ENERGY: 70eV
INSTRUMENT_TYPE: GC-EI-Orbitrap
CHARGE: 1
PARENT_MASS: 251.08595400000002
NUM PEAKS: 3
250.07765 0.3282529462971431
252.09323 1.0
253.09656 0.20573802940517583

SCANNUMBER: -1
IONMODE: positive
SPECTRUMTYPE: Centroid
FORMULA: C14H10
INCHIKEY: YNPNZTXNASCQKK-UHFFFAOYSA-N
SMILES: C1=CC2=C(C=C1)C1=C(C=CC=C1)C=C2
AUTHORS: Price et al., RECETOX, Masaryk University (CZ)
INSTRUMENT: Q Exactive GC Orbitrap GC-MS/MS
IONIZATION: EI+
LICENSE: CC BY-NC
COMPOUND_NAME: Phenanthrene
RETENTION_TIME: None
RETENTION_INDEX: 1832.9
COLLISION_ENERGY: 70eV
INSTRUMENT_TYPE: GC-EI-Orbitrap
CHARGE: 1
PARENT_MASS: 177.070224
NUM PEAKS: 5
152.0619 0.1657993569424221
176.062 0.24558560966311757
177.06982 0.12764433529926775
178.0775 1.0
179.08078 0.16394988149600653

48 changes: 48 additions & 0 deletions tools/matchms/test-data/add_key/add_key_test2_out.msp
@@ -0,0 +1,48 @@
SCANNUMBER: -1
IONMODE: positive
SPECTRUMTYPE: Centroid
FORMULA: C20H12
INCHIKEY: CSHWQDPOILHKBI-UHFFFAOYSA-N
SMILES: C1=CC2=C3C(=C1)C1=CC=CC4=C1C(=CC=C4)C3=CC=C2
AUTHORS: Price et al., RECETOX, Masaryk University (CZ)
INSTRUMENT: Q Exactive GC Orbitrap GC-MS/MS
IONIZATION: EI+
LICENSE: CC BY-NC
COMPOUND_NAME: Perylene
RETENTION_TIME: None
RETENTION_INDEX: 2886.9
COLLISION_ENERGY: 70eV
INSTRUMENT_TYPE: GC-EI-Orbitrap
CHARGE: 1
PARENT_MASS: 251.08595400000002
ADDUCT: [M]+
NUM PEAKS: 3
250.07765 0.3282529462971431
252.09323 1.0
253.09656 0.20573802940517583

SCANNUMBER: -1
IONMODE: positive
SPECTRUMTYPE: Centroid
FORMULA: C14H10
INCHIKEY: YNPNZTXNASCQKK-UHFFFAOYSA-N
SMILES: C1=CC2=C(C=C1)C1=C(C=CC=C1)C=C2
AUTHORS: Price et al., RECETOX, Masaryk University (CZ)
INSTRUMENT: Q Exactive GC Orbitrap GC-MS/MS
IONIZATION: EI+
LICENSE: CC BY-NC
COMPOUND_NAME: Phenanthrene
RETENTION_TIME: None
RETENTION_INDEX: 1832.9
COLLISION_ENERGY: 70eV
INSTRUMENT_TYPE: GC-EI-Orbitrap
CHARGE: 1
PARENT_MASS: 177.070224
ADDUCT: [M]+
NUM PEAKS: 5
152.0619 0.1657993569424221
176.062 0.24558560966311757
177.06982 0.12764433529926775
178.0775 1.0
179.08078 0.16394988149600653

@@ -1,8 +1,8 @@
NAME: C001
IONMODE: Negative
RETENTIONTIME: 38.74
RETENTIONINDEX: -1
SPECTRUMTYPE: Centroid
COMPOUND_NAME: C001
RETENTION_TIME: 38.74
RETENTION_INDEX: -1
TOOL_USED: matchms
NUM PEAKS: 57
138.9121 10186226.0
@@ -63,11 +63,11 @@ NUM PEAKS: 57
676.6436 1982714.0
800.4451 2792137.0

NAME: C002
IONMODE: Negative
RETENTIONTIME: 520.25
RETENTIONINDEX: 1234.5
SPECTRUMTYPE: Centroid
COMPOUND_NAME: C002
RETENTION_TIME: 520.25
RETENTION_INDEX: 1234.5
TOOL_USED: matchms
NUM PEAKS: 35
131.1733 1971789.0
@@ -106,10 +106,10 @@ NUM PEAKS: 35
1216.8041 4439324.0
1217.807 3565334.0

NAME: C003
IONMODE: Negative
RETENTIONTIME: 483.67
SPECTRUMTYPE: Centroid
COMPOUND_NAME: C003
RETENTION_TIME: 483.67
TOOL_USED: matchms
NUM PEAKS: 26
265.2529 11366224.0
@@ -139,10 +139,10 @@ NUM PEAKS: 26
1071.1639 15461047.0
1072.1671 5096642.0

NAME: C004
IONMODE: Negative
RETENTIONTIME: 473.48
SPECTRUMTYPE: Centroid
COMPOUND_NAME: C004
RETENTION_TIME: 473.48
TOOL_USED: matchms
NUM PEAKS: 24
124.1405 6517662.0
@@ -170,10 +170,10 @@ NUM PEAKS: 24
1019.6555 57647644.0
1020.6591 12469103.0

NAME: C005
IONMODE: Negative
RETENTIONTIME: 41.72
SPECTRUMTYPE: Centroid
COMPOUND_NAME: C005
RETENTION_TIME: 41.72
TOOL_USED: matchms
NUM PEAKS: 20
218.1386 14009249.0