Adding simulation stuff #41

adamoyoung · 2024-10-29T14:34:05Z

Some of the dataset inheritance stuff needs to be reworked, but it seems like you guys are still actively developing it.

To run the model (FP model, mass-based candidate set for retrieval, wandb logging disabled), I use:

python scripts/run_pl_model_fit.py -c config/simulation_retrieval/fp_formula -w disabled -s <directory_for_saving_checkpoints>
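For scripted or notebook-driven runs, the same invocation can be assembled programmatically. The sketch below is only an illustration and is not part of the PR: the helper name `build_fit_command` and the `checkpoints/` directory are hypothetical, and only the script path, the config path, and the `-c`, `-w`, `-s` flags are taken from the command above. The review further down asks for this script to be renamed, so the path may differ in the merged code.

```python
import shlex
import sys


def build_fit_command(config, save_dir, wandb_mode="disabled"):
    """Assemble the argument list for the training script (hypothetical helper)."""
    return [
        sys.executable,  # run with the current Python interpreter
        "scripts/run_pl_model_fit.py",
        "-c", config,
        "-w", wandb_mode,
        "-s", save_dir,
    ]


cmd = build_fit_command(
    config="config/simulation_retrieval/fp_formula",
    save_dir="checkpoints/",  # placeholder for <directory_for_saving_checkpoints>
)

# Print the shell-quoted arguments for inspection before launching anything.
print(shlex.join(cmd[1:]))
# To launch from the repository root: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems if the checkpoint directory contains spaces.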

Let me know how you would like to proceed.

roman-bushuiev

It's looking great!! I just left some minor comments.

roman-bushuiev · 2024-12-27T12:33:11Z

massspecgym/data/transforms.py

+        # spec.peaks.intensities = spec.peaks.intensities * 1000.
+        return spec
+
+    def matchms_to_torch(self, spec: matchms.Spectrum) -> dict:


@adamoyoung Could you please update the docstring? It seems like this one is from SpecTokenizer.

roman-bushuiev · 2024-12-27T13:07:57Z

massspecgym/runner.py

@@ -0,0 +1,244 @@
+import torch


Adamo, could you move this file to, for example, massspecgym/scripts/runner_simulation.py? We will try to merge it with massspecgym/scripts/run.py (the equivalent file for the de novo and retrieval challenges) in the future (probably next month). If it stays at massspecgym/runner.py, it may confuse people, because at the moment it is designed only for the simulation challenge.

adamoyoung · 2025-01-06

Sure, although I would note that the functionality of massspecgym/scripts/run.py for the simulation stuff is spread across two files (massspecgym/runner.py and scripts/run_pl_model_fit.py).

roman-bushuiev · 2024-12-27T13:12:04Z

scripts/run_pl_model_fit.py

Could you please include simulation in the file name here as well?

roman-bushuiev · 2024-12-27T13:24:14Z

notebooks/mgf_to_csv_final_cleaning.ipynb

I think this notebook is an outdated version of the one in the main branch. Maybe we can remove it?

roman-bushuiev · 2024-12-27T13:26:31Z

@adamoyoung, what do you mean by

> Some of the dataset inheritance stuff needs to be reworked, but it seems like you guys are still actively developing it.

It seems like you have already implemented all the inheritance consistently with the other challenges.

adamoyoung · 2025-01-12T15:51:56Z

@roman-bushuiev I addressed the changes above, as well as the Jensen-Shannon Similarity bug we discussed offline. I verified that the simulation models can be retrained with this new code and achieve metrics comparable to those reported in the paper. However, I haven't tested whether the other experiments still work as expected. Should I do that? If so, what commands do I need to run?
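The thread does not spell out what the Jensen-Shannon Similarity bug was; a later commit message in this PR mentions redefining JSS so that it lies between 0 and 1. As a rough illustration only, one common bounded definition is 1 minus the Jensen-Shannon divergence computed with base-2 logarithms. The function below is a sketch under that assumption and may differ from the repository's implementation; it also assumes both spectra are already binned onto the same m/z grid, with non-negative intensities and a positive sum.

```python
import math


def jensen_shannon_similarity(p, q):
    """Return 1 - JSD(p, q) with base-2 logs, so the result lies in [0, 1]."""
    # Normalize intensities so each spectrum sums to 1.
    p_sum, q_sum = sum(p), sum(q)
    p = [x / p_sum for x in p]
    q = [x / q_sum for x in q]
    # Mixture distribution used by the Jensen-Shannon divergence.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # KL divergence in bits; terms with a_i == 0 contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return 1.0 - jsd


identical = jensen_shannon_similarity([1.0, 2.0, 1.0], [1.0, 2.0, 1.0])
disjoint = jensen_shannon_similarity([1.0, 0.0], [0.0, 1.0])
partial = jensen_shannon_similarity([3.0, 1.0], [1.0, 3.0])
```

With base-2 logarithms the divergence is bounded by 1, so identical spectra score 1 and spectra with no shared peaks score 0. Using natural logarithms instead would cap the divergence at ln 2, which is one way a similarity can end up outside the intended range.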

adamoyoung · 2025-01-12T16:41:58Z

Actually there are some additional minor changes, so I will re-run the simulation experiments to confirm everything is good. Will let you know when it's ready.

roman-bushuiev · 2025-01-12T18:06:52Z

Hi Adamo! Sounds great!

To check the other experiments, I believe it is sufficient to confirm that the demo.ipynb notebook runs properly, for example the "De novo SMILES transformer" section.

adamoyoung · 2025-01-14T01:10:48Z

In the notebook, I'm getting an error in the "MIST on the fingerprint retrieval task" cell:

FileNotFoundError: [Errno 2] No such file or directory: 'fp_preds_MassSpecGym_df.pkl'

I don't see a file like that in the directory, is it supposed to be created by something in the notebook? It doesn't seem like it.

roman-bushuiev · 2025-01-14T16:30:02Z

> In the notebook, I'm getting an error in the "MIST on the fingerprint retrieval task" cell:
>
> FileNotFoundError: [Errno 2] No such file or directory: 'fp_preds_MassSpecGym_df.pkl'
>
> I don't see a file like that in the directory, is it supposed to be created by something in the notebook? It doesn't seem like it.

It's fine, we just need to remove the MIST part from the demo because MIST is not implemented within our codebase. Does the "De novo SMILES transformer" section work?
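The fix proposed here is simply to remove the MIST section. For notebooks that do keep cells depending on precomputed files, one alternative is to guard the load so that a missing file skips the section instead of raising. The sketch below is a generic illustration, not code from the repository; the helper name `load_optional_pickle` is hypothetical, and only the file name comes from the error message above.

```python
import pickle
from pathlib import Path


def load_optional_pickle(path):
    """Return the unpickled object, or None if the file does not exist."""
    path = Path(path)
    if not path.is_file():
        print(f"Skipping: {path} not found")
        return None
    with path.open("rb") as f:
        return pickle.load(f)


fp_preds = load_optional_pickle("fp_preds_MassSpecGym_df.pkl")
if fp_preds is None:
    # Downstream cells that need the precomputed predictions can check
    # for None and skip their work instead of failing.
    pass
```

Only load pickle files from trusted sources, since unpickling can execute arbitrary code.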

adamoyoung · 2025-01-17T05:08:03Z

The notebook runs, but the output is different from before. It seems OK. Can you take a look at the notebook?

roman-bushuiev · 2025-01-18T11:05:32Z

It looks good to me and I think we can merge the PR. What do you think?

So far, it seems that the biggest interest is in the simulation challenge. Maybe you could add a cell to the demo notebook or a brief tutorial on how to run the training scripts?

adamoyoung · 2025-01-18T21:35:53Z

OK, sure, I can add that.

adamoyoung added 30 commits May 27, 2024 00:12

adding preprocessing for the spectrum simulation subset

dd80123

initial dataset/transform changes, still WIP

dc757e9

adding dataset stuff for simulation models

532272e

fixing minor bugs

8c6514c

temporary commit, WIP

0b8d346

attempting merge

5921920

updating dataset stuff to work with new data

71ad27c

adding model stuff, WIP

da0bcfe

more model stuff, still WIP

4fe1d32

more pl model stuff, still WIP

002bb15

first model runs, still need to debug training

f846ac3

minor dataset rework, eval metrics still not functional

d473f57

NEIMS model now works in notebook, does not perform well on validation data

cbb1ad0

adding runner.py script, proper epoch metric accumulation

8006a2c

adding precursor only baseline, fixing some bugs

5d081e0

refactoring into runner/config format, fixing intensity bug, other small changes

52666d5

integrating newer dataset (v4), misc small changes

a6dfbea

implemented GNN models

e3eb328

changing old dataset filters to checks

18751cd

final commit before original submission

5cf673d

finished merge

4153a2f

major updates

1123bad

removing cache_feats

950245a

merging changes from main

368978f

fixing import bugs, inheritance bugs

0f5adf1

fixing metric calculation and logging functions to comply with parent class

3f13624

reworking models to not use save_hyperparameters

3304b64

reworking SimulationDataset

163c37b

initial simulation retrieval implementation, support for dataset subsampling

8c1cc68

more retrieval changes, fixing some bugs

0ccea3f

reworking data stuff

988d343

roman-bushuiev reviewed Dec 27, 2024

roman-bushuiev mentioned this pull request Dec 27, 2024

baselines for spectrum simulation #45

Closed

adamoyoung added 6 commits January 6, 2025 16:36

merging with more recent changes

1daf6f0

removing old notebook

ee31e86

fixing jss metric

73c3adc

reorganizing run files

b70d544

minor updates to SpecToMzsInts transform

d77b965

Merge branch 'main' into adamo5

73328f4

redefining jss such that it is between 0 and 1

443fb07

adamoyoung added 2 commits January 17, 2025 00:04

minor fix to datasets

9fdb5c8

adding updated version of demo notebook

7ccbc91

adamoyoung and others added 8 commits February 3, 2025 23:34

reworking configs, removing old notebooks

2dc5bf3

adding script for fixing JSS from original version

aaafca6

removing user-specific info from template config

059d0fe

updating demo stuff

4b7589f

updating boostrap notebook

7a03118

fixing some notebook stuff

5a68754

removing debug configs

01508d2

Merge branch 'main' into adamo5

1c5fdd2

roman-bushuiev merged commit 00f5c03 into pluskal-lab:main Feb 10, 2025
1 check passed
