Convert single task study calls to a task call #303

Open: 1 commit to be merged into base branch master

PGijsbers
Collaborator

This avoids the calls that retrieve the study and all of its dataset metadata (2 calls total) for each job.
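
As a rough illustration of the idea (not the code in this PR), the change can be thought of as mapping a suite reference plus a single task onto a direct task reference. The helper name `to_task_reference` and the assumption that the OpenML task ID is already known when the job is created are hypothetical; only the `openml/s/N` and `openml/t/N` formats come from this thread.

```python
import re
from typing import Optional

# Matches benchmark references of the form "openml/s/<suite id>".
_SUITE_REF = re.compile(r"^openml/s/(\d+)$")


def to_task_reference(benchmark: str, task_id: Optional[int]) -> str:
    """Return "openml/t/<task_id>" when a single task of a suite is requested.

    Hypothetical helper: assumes the caller already knows the OpenML task ID,
    so the suite (and its dataset metadata) does not have to be fetched again.
    Any other input is returned unchanged.
    """
    if task_id is not None and _SUITE_REF.match(benchmark):
        return f"openml/t/{task_id}"
    return benchmark
```

For example, `to_task_reference("openml/s/264", 61)` yields `"openml/t/61"`, while a reference that is not in the `openml/s/N` format is passed through untouched.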

@PGijsbers PGijsbers requested a review from sebhrusen May 14, 2021 10:23
@Innixma
Collaborator

Innixma commented May 18, 2021

Is this a fix that will reduce the frequency of OpenML server errors? Does this impact runs done with pre-existing benchmark files instead of studies (such as `resources/benchmarks/medium.yaml`)?

@PGijsbers
Collaborator Author

It reduces requests only when the benchmark is specified with the `openml/s/N` format. There should be an update to the retry policy for openml-python later this week, which will hopefully alleviate the server issues more generally.

Comment on lines +342 to +344
    _task_names = []
else:
    _task_names = task_names
Collaborator

You don't need to change the `task_names` variable; it will work either way.

Did you try it?
I want to be sure that the folder structure generated on S3 is the same as before, and that this change does not make it harder to retrieve results from S3 afterwards.
Currently, S3 is the long-term storage for results. Results are organized by session, and inside each session every folder is named with the original benchmark name and the task name, which makes it relatively easy to download only a specific result.
I think it should be fine, though: aws mode runs benchmarks using `--session=` (which removes the session folder on the EC2 instance to avoid an additional subfolder), and this should prevent the modified params from appearing anywhere.
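
To make the retrieval concern concrete, here is a minimal sketch of how a specific result might be picked out of a list of bucket keys. The key layout `<session>/<folder>/...`, the function name `select_result_keys`, and the substring matching are all assumptions for illustration; the actual layout in the bucket may differ.

```python
from typing import Iterable, List


def select_result_keys(
    keys: Iterable[str], session: str, benchmark: str, task: str
) -> List[str]:
    """Keep keys under `session/` whose folder name mentions benchmark and task.

    Assumed (hypothetical) layout: "<session>/<folder>/...", where <folder>
    contains the original benchmark name and the task name.
    """
    selected = []
    for key in keys:
        parts = key.split("/")
        # Skip keys that are not inside a folder of the requested session.
        if len(parts) < 2 or parts[0] != session:
            continue
        folder = parts[1]
        # Crude substring match; good enough to illustrate the lookup.
        if benchmark in folder and task in folder:
            selected.append(key)
    return selected
```

A lookup like this only keeps working if the folder names still carry the original benchmark name, which is why a change that rewrites the benchmark reference is worth checking against the bucket.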

Collaborator Author

Testing with `python runbenchmark.py constantpredictor openml/s/264 -m aws -f 0`, the structure on the bucket seems the same, but the local result directory is actually different. Both have the same `aws.openml_s_264.test.all_tasks.0.constantpredictor` subdirectory with the data from that run, but the main directory on this branch does not contain `logs` and `logs.zip`.
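
A comparison like this can be scripted with the standard library. The sketch below uses `filecmp.dircmp` to list top-level entries present in only one of two result directories; the function name and the directory paths passed to it are placeholders, not part of the benchmark tooling.

```python
import filecmp
from typing import List, Tuple


def top_level_differences(dir_a: str, dir_b: str) -> Tuple[List[str], List[str]]:
    """Return (entries only in dir_a, entries only in dir_b), top level only."""
    comparison = filecmp.dircmp(dir_a, dir_b)
    return sorted(comparison.left_only), sorted(comparison.right_only)
```

Called with the result directory from master as `dir_a` and the one from this branch as `dir_b`, the first list would be expected to contain the entries missing on the branch.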

You seem to be correct that the task names don't need to be modified, though I find the `openml/t/61 -t iris` notation a bit odd to explicitly support.

@PGijsbers PGijsbers added this to the 2.1 milestone Mar 3, 2023
@PGijsbers PGijsbers modified the milestones: 2.1, 2.2 May 30, 2023
3 participants