Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sdk): Allow disabling default caching via a CLI flag and env var #11222

Merged

Conversation

DharmitD
Copy link
Contributor

@DharmitD DharmitD commented Sep 17, 2024

Description of your changes:
Allow setting a default of execution caching disabled via a compiler CLI flag and env var.
Resolves #11092

Key Changes:

CLI Update (kfp dsl compile):
A new flag --disable-execution-caching-by-default is added. When set, this flag disables caching for all pipeline tasks by default. Behavior specified in pipeline code still takes precedence.

A corresponding Environment Variable KFP_DISABLE_EXECUTION_CACHING_BY_DEFAULT is added as well.

Details:

By default, tasks default to caching enabled if the pipeline code doesn't specify a desired behavior via set_caching_options. For example:

@dsl.pipeline(name='my-pipeline')
def my_pipeline():
    task_1 = create_dataset()
    task_2 = create_dataset()
    task_1.set_caching_options(False)

In this example, task 1 won't try to use the cache, but task 2 will, even though the author didn't specify anything about task 2. The resulting pipeline spec will look like this:

root:
  dag:
    tasks:
      task-1:
        cachingOptions: {}
      task-2:
        cachingOptions:
          enableCache: true

This PR enhances the DSL compiler to allow the user to override that default behavior via a command line flag. This change is done:

  • in an additive way, using an optional feature flag

  • in a way that doesn't alter the DSL or pipeline spec interfaces

  • In a way that doesn't change the default behavior for those who prefer the default of caching enabled

After this change, this command will continue to result in the default behavior:

kfp dsl compile --py my_pipeline.py --output my_pipeline.yaml

Whereas this command will cause tasks to default to caching disabled if no desired caching behavior is specified in code:

kfp dsl compile --py my_pipeline.py --output my_pipeline.yaml --disable-execution-caching-by-default

For example, this pipeline compiled with --disable-execution-caching-by-default

@dsl.pipeline(name='my-pipeline')
def my_pipeline():
    task_1 = create_dataset()
    task_2 = create_dataset()
    task_1.set_caching_options(False)

would then result in both tasks having caching disabled:

root:
  dag:
    tasks:
      task-1:
        cachingOptions: {}
      task-2:
        cachingOptions: {}

even though nothing was specified about task_2 in the code.

Usage

CLI flag for dsl compile.

A new flag --disable-execution-caching-by-default is added. When set, this flag disables caching for all pipeline tasks by default.

Example:

kfp dsl compile --py my_pipeline.py --output my_pipeline.yaml --disable-execution-caching-by-default

Environment Variable KFP_DISABLE_EXECUTION_CACHING_BY_DEFAULT

You can also set the default caching behavior by using the environment variable. When set to true, 1, or other truthy values, it will disable execution caching by default for all pipelines. When set to false or when absent, the default of caching enabled remains.
Example:

KFP_DISABLE_EXECUTION_CACHING_BY_DEFAULT=true \
kfp dsl compile --py my_pipeline.py --output my_pipeline.yaml

This environment variable also works for Compiler().compile().

Given the following pipeline file my_pipeline.py:

ffrom kfp import dsl
from kfp.compiler import Compiler

@dsl.component
def create_dataset():
    pass

@dsl.pipeline(name='my-pipeline')
def my_pipeline():
    task_1 = create_dataset()
    task_2 = create_dataset()
    task_1.set_caching_options(False)

Compiler().compile(
    pipeline_func=my_pipeline,
    package_path='my_pipeline.yaml',
)

Executing this

KFP_DISABLE_EXECUTION_CACHING_BY_DEFAULT=true \
python my_pipeline.py

will result in task_2 having caching disabled.

This PR is made in collaboration with @gregsheremeta. Thank you, Greg for your contributions!

Checklist:

@DharmitD DharmitD marked this pull request as draft September 17, 2024 22:43
@DharmitD DharmitD changed the title WIP: feat(sdk): Allow setting a default of execution caching disabled via … WIP: feat(sdk): Allow disabling default caching via a CLI flag and env var Sep 17, 2024
@google-oss-prow google-oss-prow bot added size/L and removed size/S labels Sep 18, 2024
@DharmitD DharmitD force-pushed the disable-default-caching-flag-var branch from c305fa5 to b4b38ae Compare September 18, 2024 22:05
…a compiler CLI flag and env var

Co-authored-by: Greg Sheremeta <[email protected]>
Signed-off-by: ddalvi <[email protected]>
@DharmitD DharmitD force-pushed the disable-default-caching-flag-var branch 9 times, most recently from fa7a432 to 626e1c1 Compare September 20, 2024 00:11
@DharmitD DharmitD marked this pull request as ready for review September 20, 2024 02:10
@DharmitD DharmitD changed the title WIP: feat(sdk): Allow disabling default caching via a CLI flag and env var feat(sdk): Allow disabling default caching via a CLI flag and env var Sep 20, 2024
@diegolovison
Copy link
Contributor

/hold

Copy link
Contributor

@diegolovison diegolovison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrote my thoughts on this PR as a general feedback without explicit approval.

sdk/python/kfp/dsl/pipeline_context.py Show resolved Hide resolved
@DharmitD DharmitD force-pushed the disable-default-caching-flag-var branch 2 times, most recently from d85943b to ad89b3e Compare September 24, 2024 20:59
@DharmitD DharmitD force-pushed the disable-default-caching-flag-var branch from ad89b3e to 69f35d8 Compare September 24, 2024 21:07
@diegolovison
Copy link
Contributor

diegolovison commented Sep 25, 2024

/lgtm

Despite having a different opinion about how this can be implemented, I support this PR

@DharmitD
Copy link
Contributor Author

/unhold

Copy link
Member

@chensun chensun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chensun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 3f49522 into kubeflow:master Sep 25, 2024
22 checks passed
VaniHaripriya pushed a commit to VaniHaripriya/data-science-pipelines that referenced this pull request Oct 1, 2024
…kubeflow#11222)

* feat(sdk): Allow setting a default of execution caching disabled via a compiler CLI flag and env var

Co-authored-by: Greg Sheremeta <[email protected]>
Signed-off-by: ddalvi <[email protected]>

* Add tests for disabling default caching var and flag

Signed-off-by: ddalvi <[email protected]>

---------

Signed-off-by: ddalvi <[email protected]>
Co-authored-by: Greg Sheremeta <[email protected]>
@HumairAK HumairAK added this to the KFP SDK 2.10 milestone Oct 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature] allow setting a default of execution caching disabled via a compiler CLI flag and env var
7 participants