feat: For long ML tasks, make intermediate saves #1024
Comments
Example regarding the caching: https://joblib.readthedocs.io/en/stable/auto_examples/memory_basic_usage.html#sphx-glr-auto-examples-memory-basic-usage-py. In #997, there is an in-memory caching mechanism. Persisting it on disk would be useful to avoid recomputing some of the results.
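A minimal sketch of how the linked joblib example could apply here; `fit_one_split` is a hypothetical helper, not an existing skore function:

```python
# Sketch: disk-backed caching with joblib.Memory, following the linked example.
# `fit_one_split` is a made-up helper; calling it twice with the same arguments
# reloads the fitted estimator from ./skore_cache instead of refitting.
from joblib import Memory
from sklearn.base import clone

memory = Memory("./skore_cache", verbose=0)

@memory.cache
def fit_one_split(estimator, X, y, train_idx):
    return clone(estimator).fit(X[train_idx], y[train_idx])
```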
For this to work, we need to move the fitting logic out of the constructor: report = CrossValidationReport(estimator, ...) immediately fits the estimators, which can take a long time, and if the fitting is interrupted, the whole computation is lost. Instead, the fitting could be done as soon as some metric/plot method is called.
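A rough illustration of what deferring the fit could look like; `LazyCrossValidationReport` and its methods are made-up names, not skore's actual API:

```python
# Illustration only: nothing is fitted in __init__; the long-running work is
# triggered by the first metric call instead.
from sklearn.base import clone
from sklearn.model_selection import KFold

class LazyCrossValidationReport:
    def __init__(self, estimator, X, y, n_splits=5):
        self.estimator, self.X, self.y, self.n_splits = estimator, X, y, n_splits
        self._fitted = None  # nothing computed at construction time

    def _ensure_fitted(self):
        if self._fitted is None:  # the expensive part, run at most once
            cv = KFold(n_splits=self.n_splits)
            self._fitted = [
                clone(self.estimator).fit(self.X[train], self.y[train])
                for train, _ in cv.split(self.X)
            ]
        return self._fitted

    def scores(self):
        # Calling a metric is what triggers the fitting, not the constructor.
        cv = KFold(n_splits=self.n_splits)
        return [
            est.score(self.X[test], self.y[test])
            for est, (_, test) in zip(self._ensure_fitted(), cv.split(self.X))
        ]
```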
I don't think so. The fitting is using a … The more challenging and subsequent feature is being able to "resume" the execution from what has already been computed, so as not to recompute previously cached information. Maybe in this case, we should have:

```python
report = CrossValidationReport(...)
# report is interrupted or crashes for some reason
report.resume()
# eventually, to restart from scratch
report.restart()
```

(disclaimer: I'm not sure about the naming, because it might not be explicit what we are restarting or resuming).
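Purely as an illustration of how resume()/restart() could behave on top of per-split checkpoints; none of this is existing skore code, and `fit_split` is a placeholder for whatever fits one split:

```python
# Illustration: resume() skips splits that are already on disk, restart() wipes
# the checkpoint directory so everything is recomputed from scratch.
import shutil
from pathlib import Path
import joblib

class ResumableRun:
    def __init__(self, fit_split, n_splits, checkpoint_dir="./checkpoints"):
        self.fit_split = fit_split      # callable: split index -> fitted estimator
        self.n_splits = n_splits
        self.dir = Path(checkpoint_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def resume(self):
        results = []
        for i in range(self.n_splits):
            path = self.dir / f"split_{i}.joblib"
            if path.exists():           # computed before the interruption
                results.append(joblib.load(path))
            else:
                est = self.fit_split(i)
                joblib.dump(est, path)  # checkpoint as soon as the split finishes
                results.append(est)
        return results

    def restart(self):
        shutil.rmtree(self.dir)         # drop all previous checkpoints
        self.dir.mkdir()
        return self.resume()
```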
I can confirm that with the "call …" approach. Doing this brought up the same question as @glemaitre: what happens if …? Even if it's theoretically possible to do everything in the … With that said, I'll now look at catching exceptions directly in …
I'm interested to see what … is about, just to be sure in which "user setting" you are.
My thought would be to catch it in the loop consuming the generator (but I need to see your other code to understand exactly the use case).
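A rough sketch of what catching the exception in the loop consuming the generator could look like (the generator and names here are illustrative, not skore's internals):

```python
# Illustration: fitting is exposed as a generator of per-split estimators, and
# the consuming loop catches a failure so the earlier splits are kept.
from sklearn.base import clone

def fit_splits(estimator, splits, X, y):
    for train, _ in splits:
        yield clone(estimator).fit(X[train], y[train])

def consume(generator):
    fitted, error = [], None
    it = iter(generator)
    while True:
        try:
            fitted.append(next(it))
        except StopIteration:
            break
        except Exception as exc:   # one split failed: keep what we already have
            error = exc
            break
    return fitted, error
```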
I took some time to try to write an automated test, and I'm not getting anywhere. The …
Try mocking clone?
Thanks, worked perfectly :)
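For reference, the kind of test this suggests; the patch target and names below are placeholders, since where `clone` must be patched depends on how it is imported in the code under test:

```python
# Skeleton of a test that mocks clone so the 2nd split raises, simulating a
# crash in the middle of cross-validation. Placeholder names throughout.
from unittest.mock import patch
from sklearn.base import clone as real_clone

def test_first_split_survives_a_failing_second_split(tmp_path):
    calls = {"n": 0}

    def failing_clone(estimator, **kwargs):
        calls["n"] += 1
        if calls["n"] == 2:
            raise RuntimeError("simulated crash on the 2nd split")
        return real_clone(estimator, **kwargs)

    # Patch where clone is looked up by the code under test, not necessarily
    # sklearn.base; "sklearn.base.clone" is only a placeholder target here.
    with patch("sklearn.base.clone", side_effect=failing_clone):
        ...  # run the report / fitting that is expected to checkpoint per split
    # then assert that the 1st split's results were persisted under tmp_path
```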
Is your feature request related to a problem? Please describe.
As a data scientist, I might launch some long ML tasks on an unreliable server, and I might lose all my results if the server crashes.
Got this issue from 2 user interviews.
Describe the solution you'd like
Save some intermediate results. For example, in a cross-validation with 5 splits, we could store at least the results of the 1st split before everything has finished running, so that the 1st split is still available if the run crashes in the middle of the 2nd split.
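A minimal sketch of the requested behaviour, assuming each split result is dumped to disk as soon as it finishes (file names and layout are made up):

```python
# Sketch: dump every split as soon as it completes, so a crash during the 2nd
# split still leaves the 1st split on disk to be reloaded afterwards.
from pathlib import Path
import joblib

def run_with_checkpoints(fit_split, n_splits, out_dir="cv_results"):
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for i in range(n_splits):
        result = fit_split(i)                           # potentially hours of work
        joblib.dump(result, out / f"split_{i}.joblib")  # saved before moving on

def recover(out_dir="cv_results"):
    # After a crash, load whatever splits did complete.
    return {p.stem: joblib.load(p) for p in sorted(Path(out_dir).glob("split_*.joblib"))}
```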
Related to #989
Edit: neptune.ai does continued tracking (but for foundation models)