Convert single task study calls to a task call #303
This avoids performing a call to retrieve the study and all of its dataset metadata (2 calls total).
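As a rough illustration of the idea (not the code in this PR), the conversion can be sketched as below. It assumes the caller already knows the OpenML task ID of the single task being run. The helper name `single_task_benchmark`, the regular expression, and the `task_ids` parameter are all hypothetical, and the ID 61 is borrowed from the discussion purely as a placeholder.

```python
import re
from typing import Sequence

# Hypothetical pattern for a study-based benchmark reference such as "openml/s/264".
_STUDY_REF = re.compile(r"^openml/s/\d+$")


def single_task_benchmark(benchmark: str, task_ids: Sequence[int]) -> str:
    """Return an ``openml/t/<id>`` reference when a study-based benchmark
    is restricted to exactly one task; otherwise return the input unchanged.

    Assumption: the caller has already resolved the task name to its ID,
    so no request for the study is needed here.
    """
    if _STUDY_REF.match(benchmark) and len(task_ids) == 1:
        return f"openml/t/{task_ids[0]}"
    return benchmark


if __name__ == "__main__":
    print(single_task_benchmark("openml/s/264", [61]))
```

With more than one task, or with a benchmark that is not a study reference, the string is passed through untouched, so only the single-task case changes which requests are made.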
Is this a fix that will reduce the frequency of openml server errors? Does this impact runs done with pre-existing benchmark files instead of studies (such as |
It reduces requests only when the benchmark was specified with the |
    _task_names = []
else:
    _task_names = task_names
You don't need to change the task_names variable; it will work either way.
Did you try it?
I want to be sure that the folder structure generated on s3 is the same as before and that this change is not making it more difficult to retrieve results from s3 a posteriori.
Currently, s3 is the long-term storage for results and those are organized by sessions, and inside the session, each folder contains the original benchmark name and the task name, which makes it relatively easy to download only a specific result.
I think it should be fine though, as aws mode runs benchmarks using --session= (which removes the session folder on the ec2 instance to avoid an additional subfolder), and this should prevent the modified params from appearing anywhere.
Testing with python runbenchmark.py constantpredictor openml/s/264 -m aws -f 0, the structure on the bucket seems the same, but the local result directory is actually different. Both have the same aws.openml_s_264.test.all_tasks.0.constantpredictor subdirectory with the data from that run, but the main directory of this branch does not feature logs and logs.zip.
You seem to be correct that task names don't need to be modified, though I find the openml/t/61 -t iris notation a bit odd to explicitly support.
This avoids performing a call to retrieve the study and all of its dataset metadata (2 calls total) for each job.