Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding simulation stuff #41

Merged
merged 73 commits into from
Feb 10, 2025
Merged

Adding simulation stuff #41

merged 73 commits into from
Feb 10, 2025

Conversation

adamoyoung
Copy link
Collaborator

@adamoyoung adamoyoung commented Oct 29, 2024

Some of the dataset inheritance stuff needs to be reworked, but it seems like you guys are still actively developing it.

To run the model (FP model, mass-based candidate set for retrieval, wandb logging disabled), I use python scripts/run_pl_model_fit.py -c config/simulation_retrieval/fp_formula -w disabled -s <directory_for_saving_checkpoints>.

Let me know how you would like to proceed.

adamoyoung added 30 commits May 27, 2024 00:12
Copy link
Contributor

@roman-bushuiev roman-bushuiev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking great!! I just left some minor comments.

# spec.peaks.intensities = spec.peaks.intensities * 1000.
return spec

def matchms_to_torch(self, spec: matchms.Spectrum) -> dict:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamoyoung Could you please update the docstring? It seems like this one is from SpecTokenizer.

@@ -0,0 +1,244 @@
import torch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adamo, could you move this file for example to massspecgym/scripts/runnner_simulation.py? We will try to merge it with massspecgym/scripts/run.py (equivalent file for the de novo and retrieval challenges) in the future (probably next month). If it is located in massspecgym/runner.py it may confuse people because it is designed for the simulation challenge at the moment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, although I would note that the the functionality of massspecgym/scripts/run.py for the simulation stuff is spread across two files (massspecgym/runner.py and scripts/run_pl_model_fit.py)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please include simulation in the file name here as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this notebook is an outdated version of this notebook in the main branch. Maybe we can remove it?

@roman-bushuiev
Copy link
Contributor

@adamoyoung, what do you mean by

Some of the dataset inheritance stuff needs to be reworked, but it seems like you guys are still actively developing it.

It seems like you have already implemented all the inheritance consistently with the other challenges.

@adamoyoung
Copy link
Collaborator Author

@roman-bushuiev I addressed the changes above, and the Jensen-Shannon Similarity bug we discussed offline. I verified that the simulation models can be retrained with this new code and achieve comparable metrics to those reported in the paper. However, I haven't tested that the other experiments still work as expected, should I do that? What commands do I need to run to do it?

@adamoyoung
Copy link
Collaborator Author

Actually there are some additional minor changes, so I will re-run the simulation experiments to confirm everything is good. Will let you know when it's ready.

@roman-bushuiev
Copy link
Contributor

Hi Adamo! Sounds great!

To check the other experiments I believe it should be sufficient just to check that the demo.ipynb notebook works properly. For example, the "De novo SMILES transformer" section.

@adamoyoung
Copy link
Collaborator Author

In the notebook, I'm getting an error in the "MIST on the fingerprint retrieval task" cell:

FileNotFoundError: [Errno 2] No such file or directory: 'fp_preds_MassSpecGym_df.pkl'

I don't see a file like that in the directory, is it supposed to be created by something in the notebook? It doesn't seem like it.

@roman-bushuiev
Copy link
Contributor

In the notebook, I'm getting an error in the "MIST on the fingerprint retrieval task" cell:

FileNotFoundError: [Errno 2] No such file or directory: 'fp_preds_MassSpecGym_df.pkl'

I don't see a file like that in the directory, is it supposed to be created by something in the notebook? It doesn't seem like it.

It's fine, we just need to remove the MIST part from the demo because MIST is not implemented within our codebase. Does the "De novo SMILES transformer" section work?

@adamoyoung
Copy link
Collaborator Author

I ran the notebook and it runs, but the output is different from before. It seems ok, can you take a look? notebook

@roman-bushuiev
Copy link
Contributor

It looks good to me and I think we can merge the PR. What do you think?

So far, it seems that the biggest interest is in the simulation challenge. Maybe you could add a cell to the demo notebook or a brief tutorial on how to run the training scripts?

@adamoyoung
Copy link
Collaborator Author

OK sure I can add that

@roman-bushuiev roman-bushuiev merged commit 00f5c03 into pluskal-lab:main Feb 10, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants