Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable or Disable cache for the Compiler #11209

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

diegolovison
Copy link
Contributor

@diegolovison diegolovison commented Sep 13, 2024

Description of your changes:

Support for enabling or disabling caching for the entire pipeline is being introduced. This functionality mirrors the one provided by the client::run method (see https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/client/client.py#L707). The aim is to offer a similar feature using the same parameter names to prevent confusion.

Note that this PR does not address changing the default value for structures.TaskSpec::enable_caching. That functionality will be introduced in a separate PR and is unrelated to this one.

Checklist:

Copy link

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign chensun for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -133,12 +134,19 @@ def parse_parameters(parameters: Optional[str]) -> Dict:
is_flag=True,
default=False,
help='Whether to disable type checking.')
@click.option(
'--enable-caching/--disable-caching',
type=bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to use is_flag functionality here, to maintain precedence (just like type_check for instance).

@@ -133,12 +134,19 @@ def parse_parameters(parameters: Optional[str]) -> Dict:
is_flag=True,
default=False,
help='Whether to disable type checking.')
@click.option(
'--enable-caching/--disable-caching',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this to --disable-default-caching to maintain precedence with other flags in this file. (like the type_check one)

@@ -149,7 +157,8 @@ def compile_(
pipeline_func=pipeline_func,
pipeline_parameters=parsed_parameters,
package_path=package_path,
type_check=not disable_type_check)
type_check=not disable_type_check,
enable_caching=enable_caching)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once you update as per https://github.com/kubeflow/pipelines/pull/11209/files#r1759241709,
you would need to change this as well to something like:
enable_caching=not disable_caching_default

run. If not set, defaults to the compile time settings, which
is ``True`` for all tasks by default, while users may specify
different caching options for individual tasks. If set, the
setting applies to all tasks in the pipeline (overrides the
Copy link
Contributor

@DharmitD DharmitD Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we want to override compile time settings, we just want to have an option to change the caching default. When we add the flag, this needs to happen for instance:

Assume that we set our flag such that default caching is disabled.
When we are done with this feature, this is expected:
@dsl.pipeline(name='iris-training-pipeline')
def my_pipeline():
   task_1 = create_dataset()
   task_2 = create_dataset()
   task_3 = create_dataset()
   task_3.set_caching_options(True)
   # tasks 1 and 2 don’t try to use the cache. Task 3 does try to use the cache.

However, with your update of overriding in the client directly, task 3's caching might also get set to False. (Not entirely sure, something you would have to test and find out @diegolovison )

@gregsheremeta could elaborate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants