Improve `Trainer` and `DeeprankDataset` for production testing #510

gcroci2 · 2023-10-06T15:37:39Z

There are some issues when using the package for testing a pre-trained model on newly generated data:

- The `GraphDataset` class requires `dataset_train` as input even when it is only used for testing (i.e., whenever `train` is `False`). We should be able to use a test dataset without the original model's training dataset, and instead inherit the needed attributes from the information stored in the pre-trained model (see `_check_inherited_params` in `dataset.py`).
- In the `Trainer` class's `__init__`, the target is checked before the parameters and the pre-trained model are loaded, although with a pre-trained model the target may not be present at all.
- The `Trainer` class expects the attribute `epoch_saved_model`, which should be saved within the state of the pre-trained model.
- If the test dataset has no labels, the output exporter fails with `ValueError("All arrays must be of the same length")`.
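The behaviour requested in the first, third and fourth points can be sketched in plain Python. This is a minimal illustration under stated assumptions, not deeprank2's actual API: `resolve_test_params`, `build_export_rows` and the attribute names in `INHERITED_KEYS` are hypothetical. Shared attributes are taken from `dataset_train` when it is given and otherwise from the saved model state, `epoch_saved_model` is read with a default, and missing labels are padded with `None` so every exported column has the same length.

```python
from types import SimpleNamespace
from typing import Any, Optional, Sequence

# Hypothetical attribute names: a test set must share these with the data the
# model was trained on. deeprank2's real list may differ.
INHERITED_KEYS = ("features", "target", "task")


def resolve_test_params(dataset_train: Optional[Any] = None,
                        pretrained_state: Optional[dict] = None) -> dict:
    """Copy shared attributes from dataset_train if given, else from the saved model state."""
    if dataset_train is not None:
        return {key: getattr(dataset_train, key) for key in INHERITED_KEYS}
    if pretrained_state is not None:
        missing = [key for key in INHERITED_KEYS if key not in pretrained_state]
        if missing:
            raise ValueError(f"pre-trained model state lacks {missing}")
        return {key: pretrained_state[key] for key in INHERITED_KEYS}
    raise ValueError("either dataset_train or a pre-trained model state is required")


def build_export_rows(entry_names: Sequence[str], outputs: Sequence[float],
                      targets: Optional[Sequence[Any]] = None) -> list:
    """Pair model outputs with labels, padding with None when the test set is unlabeled."""
    if targets is None or len(targets) == 0:
        targets = [None] * len(entry_names)
    if not (len(entry_names) == len(outputs) == len(targets)):
        raise ValueError("All arrays must be of the same length")
    return [{"entry": e, "output": o, "target": t}
            for e, o, t in zip(entry_names, outputs, targets)]


# Usage with a mock checkpoint (hypothetical keys). "epoch_saved_model" is read
# with .get() so an older checkpoint lacking the key yields None instead of crashing.
state = {"features": ["res_type"], "target": "binary", "task": "classif",
         "epoch_saved_model": 12}
params = resolve_test_params(pretrained_state=state)
epoch = state.get("epoch_saved_model")
rows = build_export_rows(["entry_a", "entry_b"], [0.9, 0.1])  # unlabeled test set

# The same attributes can come from a training-dataset-like object instead.
train_like = SimpleNamespace(features=["res_type"], target="binary", task="classif")
same_params = resolve_test_params(dataset_train=train_like)
```

Padding with `None` (rather than dropping the column) keeps the exported table's shape identical for labeled and unlabeled test sets, so downstream readers need no special case.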

In order to make reasonable changes, I think we need to take into account all the possible scenarios using a mock example:

- No pre-trained model; train, valid, and test. (should be good)
- No pre-trained model; train and valid, no test. (should be good)
- No pre-trained model; train only. (should be good)
- Pre-trained model; test only, with labels. (the case to improve the code for)
- Pre-trained model; test only, with no labels. (the case to improve the code for)
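The five scenarios can be written down as a small mock test matrix. The sketch below is one assumed way to tabulate the decisions, not the logic merged in #515; `Scenario`, `plan` and the returned keys are hypothetical names. For each case it records where the dataset attributes come from, whether a target must be defined, and whether metrics can be computed (only when labels exist). Combinations outside the five listed ones, such as fine-tuning a pre-trained model, are simply rejected here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    pretrained: bool
    train: bool
    valid: bool
    test: bool
    test_labeled: bool = True  # only meaningful when test is True


def plan(s: Scenario) -> dict:
    """Decide where dataset attributes come from and which checks apply."""
    if s.pretrained:
        # This sketch only covers the test-only use of a pre-trained model.
        if s.train or s.valid or not s.test:
            raise ValueError("pre-trained scenarios are test-only in this sketch")
        return {"params_from": "pretrained_model",
                "require_target": False,  # unlabeled data is valid for inference
                "compute_metrics": s.test_labeled}
    if not s.train:
        raise ValueError("training from scratch requires a training set")
    return {"params_from": "dataset_train",
            "require_target": True,  # a loss cannot be computed without labels
            "compute_metrics": True}


# The five scenarios listed in the issue, in the same order.
SCENARIOS = [
    Scenario(pretrained=False, train=True, valid=True, test=True),
    Scenario(pretrained=False, train=True, valid=True, test=False),
    Scenario(pretrained=False, train=True, valid=False, test=False),
    Scenario(pretrained=True, train=False, valid=False, test=True, test_labeled=True),
    Scenario(pretrained=True, train=False, valid=False, test=True, test_labeled=False),
]

for scenario in SCENARIOS:
    print(scenario, plan(scenario))
```

Keeping the matrix as data makes it straightforward to turn each row into a parametrized unit test later.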


github-actions · 2023-11-24T03:21:43Z

This issue is stale because it has been open for 30 days with no activity.

gcroci2 changed the title ~~Create Tester class~~ Improve Trainer and DeeprankDataset for production testing Oct 6, 2023

gcroci2 added the priority Solve this first label Oct 9, 2023

gcroci2 self-assigned this Oct 19, 2023

gcroci2 linked a pull request Oct 19, 2023 that will close this issue

feat: improve Trainer and DeeprankDataset logic for production testing #515

Merged

github-actions bot added the stale issue not touched from too much time label Nov 24, 2023

gcroci2 closed this as completed Jan 3, 2024

gcroci2 removed the priority Solve this first label Mar 19, 2024

gcroci2 added this to Development Jul 12, 2024

gcroci2 moved this to Done in Development Jul 12, 2024
