Doubt about evaluation and calibration #29
Hey @Yandrak, absolutely you should create a separate data split for a final, unbiased evaluation. The code in this repo uses the same validation data for early stopping and for final evaluation for the sake of convenience, but a production system or research paper that uses this model should split the data into train, val, and test.
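Something like this would work for the split (just a sketch; `x` and `y` stand for the full features and labels, not variables from this repo):

```python
# Hypothetical three-way split: x and y are placeholders for the full
# feature and label arrays, not names taken from retain_train.py.
from sklearn.model_selection import train_test_split

# First carve out a held-out test set, then split the remainder into train and val.
x_trainval, x_test, y_trainval, y_test = train_test_split(
    x, y, test_size=0.15, random_state=42, stratify=y
)
x_train, x_val, y_train, y_val = train_test_split(
    x_trainval, y_trainval, test_size=0.15, random_state=42, stratify=y_trainval
)
```

Early stopping would then watch the val split, and the test split would only be touched once for the final numbers.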
Can you elaborate on the calibration question? I believe the evaluation script is looking at how well-calibrated the predictions from the model are, e.g., if the model assigns a risk score of 0.1, did 1/10 patients in the 0.1 score bucket actually have the outcome? We use sklearn's calibration plots to understand whether the risk scores from a model can be interpreted as rough probabilities or whether they're way off. As I understand it, models with poor calibration can be recalibrated, but we don't do anything to calibrate our model in the code provided. I've found that the calibration plot is useful just to get an understanding of what the risk scores represent, even if the calibration of the model isn't spot on.
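For reference, the reliability check amounts to something like this (a sketch only; `y_true` and `y_score` stand for the binary outcomes and the model's risk scores, they aren't names from the evaluation script):

```python
# Sketch of a reliability (calibration) plot; y_true and y_score are placeholders.
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

# Fraction of positives per bin vs. mean predicted score per bin.
frac_positive, mean_predicted = calibration_curve(y_true, y_score, n_bins=10)

plt.plot(mean_predicted, frac_positive, marker="o", label="model")
plt.plot([0, 1], [0, 1], linestyle="--", label="perfectly calibrated")
plt.xlabel("Mean predicted risk score")
plt.ylabel("Fraction of patients with the outcome")
plt.legend()
plt.show()
```

If the model curve hugs the diagonal, the risk scores can be read as rough probabilities.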
Hi @jstremme, of course, I understand the purpose of the calibration graph that is generated. I was referring precisely to the second point you make about calibrating a model whose calibration is poor or could be improved. In my case I am getting good AUC and accuracy values, but the calibration graphs are not as good as I would like, which is why I was interested in calibrating the model during training. I am obviously not an expert on the subject, so I was asking whether there is a reasonably simple way to do this. If you have any suggestions I would be very grateful. In any case, thank you very much for always answering and being so helpful.
For sure. Have you looked at sklearn's CalibratedClassifierCV?
Yes, I tried to use sklearn's CalibratedClassifierCV, as it was the fastest option I could find, but I don't understand why, when I call it and pass it the trained model, I get the following message: "sklearn.exceptions.NotFittedError: This Functional instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator". From this message I understand that the function is not able to detect that the model has already been trained, and I don't quite understand why. I am trying to do this inside the on_epoch_end callback, which receives the model trained so far so that its performance can be measured on the evaluation set after each epoch. I was doing it at this point with the intention of calibrating the model after each epoch; do you see anything wrong with that?
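Would an alternative along these lines make sense, i.e., skipping CalibratedClassifierCV and fitting a separate calibrator on the model's scores? Just a rough sketch: `model`, `x_cal`, `y_cal`, and `predict_calibrated` are placeholder names rather than code from the repo, and I am assuming the error comes from scikit-learn not recognizing the Keras Functional model as a fitted estimator.

```python
# Rough sketch of post-hoc Platt scaling. `model` is the trained Keras model and
# (x_cal, y_cal) is a held-out calibration split; all names here are placeholders.
from sklearn.linear_model import LogisticRegression

raw_scores = model.predict(x_cal).ravel()         # uncalibrated risk scores
calibrator = LogisticRegression()                 # Platt scaling: logistic fit on the scores
calibrator.fit(raw_scores.reshape(-1, 1), y_cal)

def predict_calibrated(x):
    """Map raw model scores for new inputs to calibrated probabilities."""
    scores = model.predict(x).ravel().reshape(-1, 1)
    return calibrator.predict_proba(scores)[:, 1]
```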
In both the retain_train.py script and the retain_evaluation.py script, the same dataset is used for testing and evaluation. Is this correct? What I mean is that after each epoch the data from the test set is used to measure the model's performance, and then the same data is used to run the evaluation and produce the analytical graphs. Shouldn't the evaluation be done on a different dataset?
Another question I had is about model calibration: during model evaluation a calibration graph is computed and drawn, but during training the model is not calibrated. Is there a reason why this is not done?
Sorry if I missed something that might be obvious, and thanks for your help and time.