How to randomly generate molecules? #28
Hi there. If you look at one of the available Jupyter notebooks, for example the LogP one, you will notice that before biasing the generator, it generates an unbiased distribution of molecules, which is later used for comparison. I believe this unbiased distribution is exactly what you want.
@gmseabra
@zhouhao-learning: this is exactly what @gmseabra means! If you just use the generator, it will produce new, unbiased molecules without any property optimization.
@isayev
@zhouhao-learning Could you elaborate on exactly what you mean? By "don't want to use any rules", do you mean all the rules of chemistry, or a specific property optimization? If the former, those would be random SMILES strings, which do not correspond to valid chemical structures. Otherwise, just use the pretrained baseline generator; it will do exactly what you are asking for.
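As an illustration of what "just use the generator" means, here is a minimal sketch that is not ReLeaSE code: a toy character-level bigram model stands in for the pretrained generator. Unconditional generation amounts to drawing one character at a time from a start token until an end token appears, with no property predictor involved. The `<`/`>` tokens, the training strings, and the functions `train_bigram` and `sample` are all hypothetical.

```python
import random
from collections import Counter, defaultdict

START, END = "<", ">"  # assumed start/end tokens for this toy model

def train_bigram(smiles_list):
    """Count character-to-character transitions, padding each string with START/END."""
    counts = defaultdict(Counter)
    for s in smiles_list:
        padded = START + s + END
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    return counts

def sample(counts, rng, max_len=100):
    """Sample one string unconditionally: start at START, draw characters until END."""
    out, current = [], START
    for _ in range(max_len):
        choices = counts[current]
        # Pick the next character in proportion to how often it followed `current`.
        current = rng.choices(list(choices), weights=list(choices.values()), k=1)[0]
        if current == END:
            break
        out.append(current)
    return "".join(out)

# Hypothetical tiny training set standing in for a large corpus such as ChEMBL.
training = ["CCO", "CC(=O)O", "c1ccccc1", "CCN", "CCOC"]
model = train_bigram(training)

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [sample(model, rng) for _ in range(5)]
print(samples)
```

Nothing in `sample` consults a property predictor or reward, which is what "unbiased" means here. Because a bigram model only sees pairs of characters, it cannot keep parentheses or ring-closure digits balanced, so many of its samples will not be valid SMILES; a real generator needs far more capacity and training data to learn those rules.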
@isayev
Sorry, I misunderstood you. But to me, "use my SMILES data" does not mean "randomly generate molecules". However, this could be done quite easily.
@isayev
Yes, option 2 above: load your SMILES in cell 9 and train your own generator.
@isayev
No. If you continually feed the model your own molecules (assuming they share some common characteristic, that is, they are not just random molecules), the model will eventually learn to generate molecules with structural characteristics similar to those you are feeding it. In other words, it will eventually capture the structural characteristics of your training data. Let's look again at the

Now, if you want, you can train the generator on the molecules in your database. But pay attention to the following: the generator needs a large dataset to learn the "rules of chemistry" by itself and generate valid molecules. Currently it learns these rules from the ChEMBL data, which is very large (>1.5 million molecules). Your database is very unlikely to be larger than ChEMBL, so the new generator will probably not work as well: it will tend to generate molecules that look like the ones in your dataset without learning the chemistry rules as thoroughly. But it can be done if that's what you want. To train the generator on your SMILES, just change the path in cell 9 to point to your SMILES file, then uncomment cell 16 to train the generator.

Finally, I just want to point out that another (probably better) option is to retrain the generator with your molecules by just creating a new
That should bias the generator towards molecules that (structurally) look more like the ones in your dataset but, most importantly, without forgetting the chemical rules it already learned from ChEMBL. From what I understand, that seems closer to what you want.
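To make the trade-off above concrete, here is a minimal sketch that uses a toy count-based (bigram) model in place of the neural generator; it is an analogy, not ReLeaSE code. For a count-based model, continuing training from a pretrained state just means adding the new counts on top of the old ones, so transitions learned from the large corpus survive while your data shifts the probabilities. Training from scratch on a small set, by contrast, only knows the transitions that set contains. The corpora, the `weight` parameter, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

START, END = "<", ">"  # assumed start/end tokens for the toy model

def count_transitions(smiles_list):
    """Count character-to-character transitions, padding each string with START/END."""
    counts = defaultdict(Counter)
    for s in smiles_list:
        padded = START + s + END
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    return counts

def continue_training(base, new_data, weight=1):
    """Return a copy of `base` with `weight` times the counts from `new_data` added."""
    updated = defaultdict(Counter)
    for a, nxt in base.items():
        updated[a].update(nxt)  # keep everything learned from the large corpus
    for a, nxt in count_transitions(new_data).items():
        for b, n in nxt.items():
            updated[a][b] += weight * n  # shift probabilities towards the new data
    return updated

def prob(counts, a, b):
    """Estimated probability of character b following character a."""
    total = sum(counts[a].values())
    return counts[a][b] / total if total else 0.0

# Hypothetical corpora: a stand-in for a large pretraining set and a small in-house set.
base_corpus = ["CCO", "CCN", "CC(=O)O", "c1ccccc1O", "CCCO"]
my_data = ["CCS", "CCSC"]

base = count_transitions(base_corpus)
scratch = count_transitions(my_data)                # train from scratch on the small set
tuned = continue_training(base, my_data, weight=2)  # continue from the "pretrained" counts

print("P(C->O):", prob(scratch, "C", "O"), prob(tuned, "C", "O"))
print("P(C->S):", prob(base, "C", "S"), prob(tuned, "C", "S"))
```

Here `scratch` assigns zero probability to `C` followed by `O`, a transition it never saw, while `tuned` keeps it and also gives `C` followed by `S` (present only in `my_data`) a sizeable share. The `weight` argument loosely plays the role of how long you fine-tune: larger values pull the model further towards your data.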
@gmseabra
Glad to help. Enjoy!
@gmseabra
Hello, I am very happy to see your research. I want to use the trained generative model to generate molecules randomly, without setting any conditions, but I don't know how to do it. Could you give me more guidance on how to proceed? Is it possible to generate molecules randomly instead of conditionally?
Best!