Doubt about evaluation and calibration #29

Open
JossTG-UPM opened this issue Feb 11, 2022 · 5 comments

Comments

@JossTG-UPM

In both the retain_train.py script and the retain_evaluation.py script, the same dataset is used for testing and for evaluation. Is this correct? What I mean is that after each epoch the data from the test set is used to measure the model's performance, and then the same data is used for the evaluation and to produce the analytical graphs. Shouldn't the evaluation be done on a different dataset?

Another question I had is about model calibration: during model evaluation a calibration graph is calculated and drawn, but during training the model is not calibrated. Is there a reason this is not done?

Sorry if I missed something that might be obvious, and thanks for your help and time.

@jstremme
Contributor

Hey @Yandrak, absolutely you should create a separate data split for a final, unbiased evaluation. The code in this repo uses the same validation data for early stopping and for final evaluation for the sake of convenience, but a production system or research paper that uses this model should split the data into train, val, and test.
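
In case it helps future readers, here is a minimal sketch of the train/val/test split described above, using scikit-learn's train_test_split. The `x` and `y` arrays are placeholders for the features and labels fed to the RETAIN scripts, not variables from this repo.

```python
# A minimal sketch of a three-way split with scikit-learn; `x` and `y` are
# placeholders, not variables from this repo.
from sklearn.model_selection import train_test_split

# Hold out 20% of the data as a test set that is never touched during training.
x_trainval, x_test, y_trainval, y_test = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=42
)

# Split the remainder into train and validation sets; the validation set drives
# early stopping, and the held-out test set is used once for the final,
# unbiased evaluation.
x_train, x_val, y_train, y_val = train_test_split(
    x_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=42
)
```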

@jstremme
Contributor

Can you elaborate on the calibration question? I believe the evaluation script is looking at how well-calibrated the predictions from the model are, e.g., if the model assigns a risk score of 0.1, did 1/10 patients in the 0.1 score bucket actually have the outcome? We use sklearn's calibration plots to understand if the risk scores from a model can be interpreted as rough probabilities or if they're way off. As I understand it, models with poor calibration can be calibrated, but we don't do anything to calibrate our model in the code provided. I've found that the calibration plot is useful just to get an understanding of what the risk scores represent, even if the calibration of the model isn't spot on.
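
For reference, the kind of reliability check described above can be reproduced with scikit-learn's calibration_curve. The `y_true` and `y_score` arrays below stand in for held-out labels and the model's predicted risk scores; they are not variables from this repo.

```python
# A rough sketch of a calibration (reliability) plot with scikit-learn.
# `y_true` and `y_score` are placeholders for held-out labels and predicted
# risk scores.
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

# Bin the predictions and compare the mean predicted score in each bin with
# the observed fraction of positive outcomes.
frac_positive, mean_predicted = calibration_curve(y_true, y_score, n_bins=10)

plt.plot(mean_predicted, frac_positive, marker="o", label="model")
plt.plot([0, 1], [0, 1], linestyle="--", label="perfectly calibrated")
plt.xlabel("Mean predicted risk score")
plt.ylabel("Observed fraction with outcome")
plt.legend()
plt.show()
```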

@JossTG-UPM
Author

JossTG-UPM commented Feb 16, 2022

Hi @jstremme, of course, I understand the purpose of the calibration graph that is generated. I was referring precisely to the second point you make, about calibrating a model whose calibration is poor or could be improved. In my case I am getting good AUC and accuracy values, but the calibration graphs are not as good as I would like, which is why I was interested in calibrating the model during training. Obviously I am not an expert on the subject, which is why I was asking whether there is an affordable way to do this. If you have any suggestions I would be very grateful. In any case, thank you very much for always answering and being so helpful.

@jstremme
Contributor

jstremme commented Feb 17, 2022

For sure. Have you looked at CalibratedClassifierCV? Unfortunately I don't have any code for calibrating RETAIN, but maybe this sklearn approach could work or be modified to work with RETAIN.
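
For anyone landing here, this is roughly how CalibratedClassifierCV is used with a plain scikit-learn estimator and a held-out calibration set. Note that it expects a scikit-learn estimator rather than a Keras model; the logistic-regression base model and the data arrays below are placeholders for illustration, not code from this repo.

```python
# A minimal sketch of post-hoc calibration with CalibratedClassifierCV.
# The base estimator and data arrays are placeholders.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

# Train any scikit-learn classifier on the training split.
base_clf = LogisticRegression(max_iter=1000).fit(x_train, y_train)

# cv="prefit" tells sklearn the base estimator is already trained, so only the
# calibration mapping (a sigmoid here, i.e. Platt scaling) is fit on the
# held-out validation split.
calibrated = CalibratedClassifierCV(base_clf, method="sigmoid", cv="prefit")
calibrated.fit(x_val, y_val)

calibrated_scores = calibrated.predict_proba(x_test)[:, 1]
```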

@JossTG-UPM
Author

Yes, I tried to use the CalibratedClassifierCV class, as it was the fastest option I could find, but I don't understand why, when I call it and pass it the trained model, I get the following message: "sklearn.exceptions.NotFittedError: This Functional instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator". From this message I understand that the function is not able to detect that the model has already been trained, and I don't quite understand why. I am trying to do this inside the on_epoch_end function, which receives the model trained so far in order to evaluate its performance on the evaluation set after each epoch. I was doing it at this point with the intention of calibrating the model after each epoch. Do you see something wrong with that?
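
The error likely comes from CalibratedClassifierCV expecting a scikit-learn estimator: its fitted-model check looks for scikit-learn-style fitted attributes that a trained Keras Functional model does not expose, so the wrapper cannot tell that the network is already trained. One possible workaround, sketched below under that assumption (not code from this repo, and the `model`, `x_val`, and `y_val` names are placeholders), is to skip the wrapper and fit the calibration mapping directly on the model's validation-set scores, for example with isotonic regression.

```python
# A possible workaround (an assumption, not code from this repo): calibrate the
# Keras model's scores directly instead of wrapping the model itself.
from sklearn.isotonic import IsotonicRegression

# `model` is the trained Keras model available in on_epoch_end; `x_val` and
# `y_val` are placeholders for the validation inputs and labels.
raw_scores = model.predict(x_val).ravel()

# Learn a monotonic mapping from raw risk scores to calibrated probabilities.
calibrator = IsotonicRegression(out_of_bounds="clip")
calibrator.fit(raw_scores, y_val)

calibrated_scores = calibrator.predict(raw_scores)
```

Ideally the calibrator would be fit on one held-out split and applied to another, in line with the earlier point about keeping separate train, val, and test sets.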
