Convert single task study calls to a task call #303

Open
wants to merge 1 commit into
base: master
15 changes: 12 additions & 3 deletions amlb/runners/aws.py
@@ -333,14 +333,23 @@ def _setup(_self):
         def _run(_self):
             try:
                 resources_root = "/custom" if rconfig().aws.use_docker else "/s3bucket/user"
+                benchmark = (self._forward_params['benchmark_name']if self.benchmark_path is None or self.benchmark_path.startswith(rconfig().root_dir)
+                             else "{}/{}".format(resources_root, self._rel_path(self.benchmark_path)))
+
+                if benchmark.startswith('openml/s/') and len(task_names) == 1:
+                    task_id = next(task['openml_task_id'] for task in self.benchmark_def if task['name'] == task_names[0])
+                    benchmark = f'openml/t/{task_id}'
+                    _task_names = []
+                else:
+                    _task_names = task_names
Comment on lines +342 to +344
Collaborator

You don't need to change the task_names variable; it will work either way.

Did you try it?
I want to be sure that the folder structure generated on S3 is the same as before and that this change does not make it more difficult to retrieve results from S3 a posteriori.
Currently, S3 is the long-term storage for results, which are organized by session; inside each session, every folder name contains the original benchmark name and the task name, which makes it relatively easy to download only a specific result.
I think it should be fine, though, since AWS mode runs benchmarks using --session= (which removes the session folder on the EC2 instance to avoid an additional subfolder), and this should prevent the modified params from appearing anywhere.

Collaborator Author

Testing with python runbenchmark.py constantpredictor openml/s/264 -m aws -f 0, the structure on the bucket seems the same, but the local result directory is actually different. Both have the same aws.openml_s_264.test.all_tasks.0.constantpredictor subdirectory with the data from that run, but the main directory of this branch does not contain logs or logs.zip.

You seem to be correct that task names don't need to be modified, though I find the openml/t/61 -t iris notation a bit odd to explicitly support.
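
For reference, the variant discussed in this thread (rewrite only the benchmark and leave task_names untouched) could look roughly like the standalone sketch below. It is not the code in this branch: resolve_benchmark and build_task_param are hypothetical helper names, the benchmark definition is made-up illustrative data using plain dicts, and the fallback to the original benchmark when no task name matches is an assumption of the sketch (the diff would raise StopIteration there).

```python
from typing import Dict, List


def resolve_benchmark(benchmark: str, task_names: List[str], benchmark_def: List[Dict]) -> str:
    """Map a one-task study call to a task call; otherwise return the input unchanged."""
    if benchmark.startswith("openml/s/") and len(task_names) == 1:
        # Look up the OpenML task id by name. Unlike the diff, fall back to the
        # original benchmark when no task matches (an assumption of this sketch).
        task_id = next(
            (t["openml_task_id"] for t in benchmark_def if t["name"] == task_names[0]),
            None,
        )
        if task_id is not None:
            return f"openml/t/{task_id}"
    return benchmark


def build_task_param(task_names: List[str]) -> str:
    """Same expression as in the diff: empty string, or '-t name1 name2 ...'."""
    return "" if len(task_names) == 0 else " ".join(["-t"] + task_names)


# Hypothetical benchmark definition; the name/id pairs are illustrative only.
benchmark_def = [
    {"name": "iris", "openml_task_id": 61},
    {"name": "wine", "openml_task_id": 99},
]
task_names = ["iris"]

benchmark = resolve_benchmark("openml/s/264", task_names, benchmark_def)
script_params = "{benchmark} {task_param}".format(
    benchmark=benchmark,
    task_param=build_task_param(task_names),  # task_names is passed through unmodified
)
print(script_params)  # openml/t/61 -t iris
```

With task_names left as-is, the forwarded call takes the openml/t/61 -t iris form mentioned above; a call with zero or several task names keeps the original openml/s/... benchmark.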


                 _self.ext.instance_id = self._start_instance(
                     instance_def,
                     script_params="{framework} {benchmark} {constraint} {task_param} {folds_param} -Xseed={seed}".format(
                         framework=self._forward_params['framework_name'],
-                        benchmark=(self._forward_params['benchmark_name']if self.benchmark_path is None or self.benchmark_path.startswith(rconfig().root_dir)
-                                   else "{}/{}".format(resources_root, self._rel_path(self.benchmark_path))),
+                        benchmark=benchmark,
                         constraint=self._forward_params['constraint_name'],
-                        task_param='' if len(task_names) == 0 else ' '.join(['-t']+task_names),
+                        task_param='' if len(_task_names) == 0 else ' '.join(['-t']+_task_names),
                         folds_param='' if len(folds) == 0 else ' '.join(['-f']+folds),
                         seed=seed,
                     ),