Implementing PAS on independent dataset and own algorithm (K-FOLD CV, HPO integration) #5

Open
shrutiOx opened this issue Apr 24, 2024 · 6 comments

Comments

@shrutiOx shrutiOx changed the title Hello, Implementing PAS on independent dataset and own algorithm (K-FOLD CV, HPO integration) Apr 24, 2024
@shrutiOx
Author

Hello,

Thank you for this great work. I want to use PAS on a custom dataset in my own pipeline (K-fold CV, HPO integration). Could you please advise on how to do that?

Thanks!

@shrutiOx
Author

shrutiOx commented Apr 24, 2024

Hi,

I downloaded your code and tried to integrate it into my pipeline, but a few things are unclear. Where is the 'DARTS training' invoked? Is it in the model_search.py module? If so, why is 'SANE' listed under the 'model' parameter in the args list? It is also unclear how to use a dataset that was not part of your experiments (e.g. the ENZYMES dataset): if we write 'ENZYMES' under the 'data' parameter in the args list, will that dataset actually be loaded? And how should a fully custom dataset be handled? To summarize:

1) How do we run your model (search space and DARTS algorithm) on the ENZYMES dataset?
2) How do we do the same for a custom dataset?
3) Why is 'SANE' given in the args list when the algorithm being run is DARTS?

Thanks a lot

@wei-ln
Collaborator

wei-ln commented May 6, 2024

Thank you for your attention.

  1. The 'DARTS training' is implemented in train_search.py, and the mixed operations are defined in model_search.py. To search for an architecture within the supernet, follow Step 1 of the instructions in the README. The 'SANE' value listed under 'model' is a bug: it is not used in the code, and we will fix it.
  2. New datasets can be added by updating the load_data function in dataset.py accordingly. The ENZYMES dataset can be used by passing the argument '--data ENZYMES', since it can be loaded with the torch_geometric.datasets.TUDataset class.
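
As a rough illustration of the second point, the sketch below shows one way a name-to-loader lookup could be organized. It is not the actual load_data function from dataset.py: the names load_data_sketch, LOADERS, and MY_GRAPHS are hypothetical, and only the TUDataset(root=..., name=...) call is taken from PyG itself.

```python
from typing import Callable, Dict


def _load_tu(name: str) -> Callable[[str], object]:
    """Build a loader for a dataset from the TU collection (e.g. ENZYMES)."""

    def loader(root: str):
        # Imported lazily so the sketch can be imported without PyG installed.
        from torch_geometric.datasets import TUDataset

        # Downloads into `root` on first use, then reads the cached copy.
        return TUDataset(root=root, name=name)

    return loader


def _load_my_graphs(root: str):
    """Placeholder for a custom dataset; replace with your own loading code."""
    raise NotImplementedError("build and return your own PyG dataset here")


# Hypothetical registry keyed by the value passed to --data.
LOADERS: Dict[str, Callable[[str], object]] = {
    "ENZYMES": _load_tu("ENZYMES"),
    "MY_GRAPHS": _load_my_graphs,
}


def load_data_sketch(name: str, root: str = "data"):
    """Return the dataset registered under `name`, or fail with a clear error."""
    if name not in LOADERS:
        known = ", ".join(sorted(LOADERS))
        raise ValueError(f"unknown dataset {name!r}; known datasets: {known}")
    return LOADERS[name](root)
```

Calling load_data_sketch("ENZYMES") would download the dataset on first use, so it needs network access and an installed torch_geometric. An unknown name fails early with a ValueError instead of silently falling back to another dataset.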

@shrutiOx
Author

shrutiOx commented May 6, 2024

Hi,

Thanks a lot for your kind reply. I will try to implement PAS as per your instructions and get back to you if I have any questions.
Just to confirm:
1. Can we apply a custom processed dataset (not a standard torch_geometric dataset), i.e. independent train and test sets, to your current PAS implementation (the code that you have shared)? In that case, do we need to change dataset.py?

@wei-ln
Collaborator

wei-ln commented May 6, 2024

New datasets (https://github.com/LARS-research/PAS/blob/main/dataset.py#L55) and splits (https://github.com/LARS-research/PAS/blob/main/dataset.py#L91) can be used in PAS by modifying the code at those locations. The code is built on PyG, so non-standard processed data can be converted to PyG's data format and then used in PAS, following the instructions at https://pytorch-geometric.readthedocs.io/en/latest/get_started/introduction.html#data-handling-of-graphs
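
A minimal sketch of what that conversion and a custom split might involve, assuming each custom graph is available as an edge list, a node-feature matrix, and one integer label. The helper names (edges_to_coo, kfold_indices, to_pyg_data) are hypothetical and do not come from the PAS code. Only to_pyg_data needs torch and torch_geometric; the other two helpers are plain Python.

```python
import random
from typing import List, Sequence, Tuple


def edges_to_coo(edges: Sequence[Tuple[int, int]],
                 undirected: bool = True) -> List[List[int]]:
    """Convert an edge list into COO form: [[sources...], [targets...]]."""
    src: List[int] = []
    dst: List[int] = []
    for u, v in edges:
        src.append(u)
        dst.append(v)
        if undirected and u != v:
            # PyG represents an undirected edge as two directed edges.
            src.append(v)
            dst.append(u)
    return [src, dst]


def kfold_indices(n_graphs: int, k: int,
                  seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle graph indices once and return k (train, validation) pairs."""
    if not 2 <= k <= n_graphs:
        raise ValueError("k must satisfy 2 <= k <= n_graphs")
    order = list(range(n_graphs))
    random.Random(seed).shuffle(order)
    # Fold i takes every k-th index of the shuffled order, starting at i.
    folds = [order[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return splits


def to_pyg_data(node_features, edges, label):
    """Wrap one graph as a PyG Data object (needs torch and torch_geometric)."""
    import torch
    from torch_geometric.data import Data

    edge_index = torch.tensor(edges_to_coo(edges), dtype=torch.long)
    x = torch.tensor(node_features, dtype=torch.float)
    y = torch.tensor([label], dtype=torch.long)
    return Data(x=x, edge_index=edge_index, y=y)
```

The index pairs returned by kfold_indices can be used to select graphs from a list of Data objects for each fold. An independent test set would simply be kept out of the list that is split.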

@shrutiOx
Copy link
Author

shrutiOx commented May 6, 2024

Thanks so much for your prompt reply. I would like to understand how to reproduce a model of yours that has been trained on, say, a custom processed dataset, since reproducing DARTS models is normally not straightforward. I need to train on a custom dataset and then apply that model to an independent test dataset in a different process, so this is like transfer learning. Could you please let me know whether your current code allows this? How would the model produced by PAS be transferred to an independent dataset? Suppose we save the model yielded by K-fold CV and later, in an independent process/code, try to reproduce it on new data, with no K-fold CV or training, only testing.
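
The thread does not say how PAS itself stores a searched model, so the following is only a generic sketch of the hand-off described in the question. It assumes the searched architecture can be written down as a plain dictionary and that the trained weights are saved to a separate file. The function names and the record layout are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict

REQUIRED_FIELDS = {"architecture", "hyperparams", "weights_file"}


def save_search_result(path: str, architecture: Dict[str, Any],
                       hyperparams: Dict[str, Any], weights_file: str) -> None:
    """Write everything needed to rebuild the model into one JSON record."""
    record = {
        "architecture": architecture,   # hypothetical description of the chosen ops
        "hyperparams": hyperparams,     # e.g. values selected by HPO
        "weights_file": weights_file,   # where the trained weights were saved
    }
    Path(path).write_text(json.dumps(record, indent=2, sort_keys=True))


def load_search_result(path: str) -> Dict[str, Any]:
    """Read the record back in a separate, test-only process."""
    record = json.loads(Path(path).read_text())
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise KeyError(f"record is missing fields: {sorted(missing)}")
    return record
```

In the test-only process, the model would be rebuilt from record["architecture"]. With PyTorch, the weights can then be restored with model.load_state_dict(torch.load(record["weights_file"])) and the model switched to evaluation mode with model.eval(). How a PAS model is constructed from an architecture description is not covered here.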
