-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding simulation stuff #41
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's looking great!! I just left some minor comments.
# spec.peaks.intensities = spec.peaks.intensities * 1000. | ||
return spec | ||
|
||
def matchms_to_torch(self, spec: matchms.Spectrum) -> dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamoyoung Could you please update the docstring? It seems like this one is from SpecTokenizer.
massspecgym/runner.py
Outdated
@@ -0,0 +1,244 @@ | |||
import torch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adamo, could you move this file for example to massspecgym/scripts/runnner_simulation.py
? We will try to merge it with massspecgym/scripts/run.py
(equivalent file for the de novo and retrieval challenges) in the future (probably next month). If it is located in massspecgym/runner.py
it may confuse people because it is designed for the simulation challenge at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, although I would note that the the functionality of massspecgym/scripts/run.py
for the simulation stuff is spread across two files (massspecgym/runner.py
and scripts/run_pl_model_fit.py
)
scripts/run_pl_model_fit.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please include simulation
in the file name here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this notebook is an outdated version of this notebook in the main branch. Maybe we can remove it?
@adamoyoung, what do you mean by
It seems like you have already implemented all the inheritance consistently with the other challenges. |
@roman-bushuiev I addressed the changes above, and the Jensen-Shannon Similarity bug we discussed offline. I verified that the simulation models can be retrained with this new code and achieve comparable metrics to those reported in the paper. However, I haven't tested that the other experiments still work as expected, should I do that? What commands do I need to run to do it? |
Actually there are some additional minor changes, so I will re-run the simulation experiments to confirm everything is good. Will let you know when it's ready. |
Hi Adamo! Sounds great! To check the other experiments I believe it should be sufficient just to check that the |
In the notebook, I'm getting an error in the "MIST on the fingerprint retrieval task" cell:
I don't see a file like that in the directory, is it supposed to be created by something in the notebook? It doesn't seem like it. |
It's fine, we just need to remove the MIST part from the demo because MIST is not implemented within our codebase. Does the "De novo SMILES transformer" section work? |
I ran the notebook and it runs, but the output is different from before. It seems ok, can you take a look? notebook |
It looks good to me and I think we can merge the PR. What do you think? So far, it seems that the biggest interest is in the simulation challenge. Maybe you could add a cell to the demo notebook or a brief tutorial on how to run the training scripts? |
OK sure I can add that |
Some of the dataset inheritance stuff needs to be reworked, but it seems like you guys are still actively developing it.
To run the model (FP model, mass-based candidate set for retrieval, wandb logging disabled), I use
python scripts/run_pl_model_fit.py -c config/simulation_retrieval/fp_formula -w disabled -s <directory_for_saving_checkpoints>
.Let me know how you would like to proceed.