"prob" parameter in dataset source #13

Open
albertma-evotec opened this issue Oct 9, 2019 · 1 comment
albertma-evotec commented Oct 9, 2019

(In the pretrain notebook; screenshot not preserved)

What does this parameter control? I can see that it is stored in the self.source_probs variable, but I cannot really understand what it does in the __getitem__ function. And why is s updated with s += self.source_probs[i] at the end of each iteration of the for loop?

(In dataloader.py; screenshot not preserved)

Can anyone explain this to me, please?
Many thanks in advance

@Bibyutatsu

Hi, the 'prob' parameter controls how often samples are drawn from each source in the dataset.
For example, say you have two sources, A.csv and B.csv, with probabilities 0.8 and 0.2 respectively (the values should sum to 1), listed in the same MolecularDataset:

import gentrl

# One dataset with two sources; 'prob' sets each source's share of the samples
dataset = gentrl.MolecularDataset(sources=[
    {'path': 'A.csv',
     'smiles': 'SMILES',
     'prob': 0.8,
     'plogP': 'plogP'},
    {'path': 'B.csv',
     'smiles': 'SMILES',
     'prob': 0.2,
     'plogP': 'plogP'},
], props=['plogP'])

When you train on this dataset, about 80% of the training samples will come from A.csv and about 20% from B.csv.

The pretrain notebook has only one source, so 'prob' is set to 1 and 100% of the training data comes from train_plogp_plogpm.csv.
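
As for the `s += self.source_probs[i]` line asked about above: a loop like that is the usual way to pick one of several bins from a single random number. Below is a minimal standalone sketch of that idea, assuming the loop in `__getitem__` works the way the question describes. It is not the actual GENTRL code; the helper name `pick_source` and its arguments are made up for illustration.

```python
import random
from collections import Counter


def pick_source(source_probs, trial):
    """Return the index of the source whose probability bin contains `trial`.

    Hypothetical helper: `source_probs` are per-source probabilities that
    sum to 1, and `trial` is a random number in [0, 1).
    """
    s = 0.0  # left edge of the current bin
    for i, p in enumerate(source_probs):
        # Source i owns the interval [s, s + p]
        if s <= trial <= s + p:
            return i
        s += p  # shift the left edge to the start of the next bin
    # Guard against floating-point round-off on the last bin
    return len(source_probs) - 1


source_probs = [0.8, 0.2]

# Fixed trials make the bins visible: [0, 0.8] is source 0, the rest is source 1
assert pick_source(source_probs, 0.35) == 0
assert pick_source(source_probs, 0.93) == 1

# Over many random draws, source 0 is picked roughly 80% of the time
rng = random.Random(0)
counts = Counter(pick_source(source_probs, rng.random()) for _ in range(10_000))
print(counts)
```

So `s` is just a running total of the probabilities seen so far. Each source gets a slice of the interval from 0 to 1 whose width equals its 'prob', and the random number lands in exactly one slice.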
