[feature] Remove artifact namespace constraint #11505

Open
tferi opened this issue Jan 9, 2025 · 1 comment

tferi commented Jan 9, 2025

Feature Area

/area sdk

What feature would you like to see?

Context:

def validate_schema_title(schema_title: str) -> None:

Python DSL users should be able to use custom namespaces for their artifacts. The currently enforced system and google namespaces seem arbitrary, and I see no reason to constrain them like that. For instance, TFX runs perfectly happily on Kubeflow with artifacts from the tfx namespace, which it achieves by using its own PipelineSpec compiler rather than the KFP one.
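To make the request concrete, here is a minimal sketch of what a relaxed check could look like. It is not the KFP implementation: the `allowed_namespaces` parameter and the `DEFAULT_NAMESPACES` constant are hypothetical names introduced only for illustration, and the real function may validate more than the namespace.

```python
from typing import Iterable

# Namespaces this issue describes as currently enforced by the SDK.
DEFAULT_NAMESPACES = frozenset({"system", "google"})


def validate_schema_title(
    schema_title: str,
    allowed_namespaces: Iterable[str] = DEFAULT_NAMESPACES,
) -> None:
    """Raise TypeError unless schema_title is '<namespace>.<Name>' with an allowed namespace.

    Hypothetical variant: `allowed_namespaces` is not a parameter of the KFP function.
    """
    namespace, sep, name = schema_title.partition(".")
    if not sep or not namespace or not name:
        raise TypeError(f"Expected '<namespace>.<Name>', got: {schema_title!r}")
    if namespace not in set(allowed_namespaces):
        raise TypeError(f"Namespace {namespace!r} is not allowed for {schema_title!r}")


# Default behaviour is unchanged; callers opt in to extra namespaces.
validate_schema_title("system.Dataset")
validate_schema_title("tfx.Examples", allowed_namespaces={"system", "google", "tfx"})
```

Keeping the default set unchanged would mean existing pipelines behave as before, while users such as TFX could opt in to their own namespace.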

What is the use case or pain point?

We're migrating a TFX Pipeline to Kubeflow, but there are systems that integrate with this pipeline execution, and expect certain components to have certain artifacts with certain types (tfx.Whatever). The native Kubeflow compiler does not allow components to have artifacts that aren't from google or system.

Also, companies other than Google may have perfectly legitimate use cases to introduce their own namespace.

Is there a workaround currently?

Monkey patch the KFP codebase at runtime.

# Replace the SDK's namespace check with a no-op.
type_utils.validate_schema_title = lambda schema_title: None


Love this idea? Give it a 👍.

chensun (Member) commented Jan 10, 2025

If the request is to allow tfx.* schema titles, I think that's okay and likely a simple change. But if it's for a more generic case, that would be more complicated.

When we send the pipeline spec to the API, each artifact's schema_title is validated by the backend service, and the title must be preregistered in the system. While custom schema registration is possible via the Vertex Metadata API (https://cloud.google.com/vertex-ai/docs/ml-metadata/custom-schemas), it's not implemented in open source Kubeflow Pipelines, and the registration is a separate step that cannot be done during pipeline submission.
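The preregistration constraint can be pictured as a set-membership check that has to pass before submission. The sketch below is only an illustration of that idea, not the backend's actual logic: `REGISTERED_TITLES` and `unregistered_titles` are hypothetical names, and the set contents are example values.

```python
from typing import Iterable, List, Set

# Hypothetical set of schema titles the backend already knows about.
REGISTERED_TITLES: Set[str] = {"system.Artifact", "system.Dataset", "system.Model"}


def unregistered_titles(spec_titles: Iterable[str], registered: Set[str]) -> List[str]:
    """Return the schema titles used by a pipeline spec that are not preregistered."""
    return sorted(set(spec_titles) - registered)


# Titles collected from a pipeline spec (example values).
missing = unregistered_titles(["system.Dataset", "tfx.Examples"], REGISTERED_TITLES)
if missing:
    print(f"Register these schema titles before submitting: {missing}")
```

Because registration is a separate step, a check like this could only report the missing titles; it could not register them as part of the submission itself.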

I personally think the metadata schema support is a half-baked solution, and its added value is debatable.
