
Add Argo WF conformance class #386

Open · wants to merge 7 commits into master

Conversation

christophenoel (Contributor):

Over the years, our team has been gradually transitioning our implementation (including an operational PDGS) to the Argo Workflow Language. This decision was based on the Argo Workflow Language's superior suitability for container-based workflows and modules, particularly when interacting with Kubernetes-native environments. Additionally, the specification aligns well with the OpenAPI/JSON schemas that form the foundation of OGC API - Processes.

To facilitate this transition, we have prepared a pull request that incorporates the essential requirements and recommendations for integrating the newly adopted conformance class into the existing spec. We sincerely request your consideration and integration of this profile.

(see email)

@fmigneault (Contributor) left a comment:


Really nice to see more alternatives being implemented!


part:: If a process can be described for the intended use as a <<rc_argo,Argo graph>>, implementations should consider supporting the <<rc_argo,Argo>> encoding for describing the replacement process.

part:: The media type `application/argo` shall be used to indicate that the request body contains a process description encoded as <<rc_ogcapppkg,Argo>>.

Is `application/argo` an official media type? If not, the generic https://www.iana.org/assignments/media-types/application/vnd.oai.workflows+yaml with a `contentSchema` pointing at the Argo Workflow schema URL might be more appropriate.

An alternative would be to push Argo maintainers to publish a media-type like CWL did:

https://www.iana.org/assignments/media-types/application/cwl

https://www.iana.org/assignments/media-types/application/cwl+json
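For illustration, a hypothetical by-reference fragment following that suggestion; the `executionUnit` wrapper, the `/processes` endpoint, and the schema URL are assumptions for this sketch, not text from the PR:

```http
POST /processes HTTP/1.1
Content-Type: application/json

{
  "executionUnit": {
    "type": "application/vnd.oai.workflows+yaml",
    "contentSchema": "https://example.org/schemas/argo-workflow.schema.json",
    "href": "https://example.org/workflows/echo-workflow.yaml"
  }
}
```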

* `type` and `href` if passed by reference
* `value` and `mediaType` if passed by value

part:: The value of the `type` property shall be `application/argo`, whereas for `mediaType` it should be `application/argo+json`.
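A minimal sketch of the two forms the quoted requirements distinguish; the `/processes` endpoint and the `executionUnit` wrapper are assumed for illustration. Passed by reference:

```http
POST /processes HTTP/1.1
Content-Type: application/json

{
  "executionUnit": {
    "type": "application/argo",
    "href": "https://example.org/workflows/echo-workflow.yaml"
  }
}
```

Passed by value:

```http
POST /processes HTTP/1.1
Content-Type: application/json

{
  "executionUnit": {
    "mediaType": "application/argo+json",
    "value": {
      "apiVersion": "argoproj.io/v1alpha1",
      "kind": "Workflow",
      "metadata": { "generateName": "echo-" }
    }
  }
}
```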

Why use distinct type values?


part:: The value of the `type` property shall be `application/argo`, whereas for `mediaType` it should be `application/argo+json`.

part:: The value of the `href` property shall be a reference to the Argo encoded file. The value of the `value` property shall be the Argo encoded in json format.

"json" should be uppercase here


part:: The value of the `href` property shall be a reference to the Argo encoded file. The value of the `value` property shall be the Argo encoded in json format.

part:: If the Argo contains more than a single workflow identifier, an additional `w` query parameter may be used to target a specific workflow id to be deployed.

Might be relevant to refer to a common parameter that can be reused across Workflow languages regardless of their specific implementation.


part:: If the Argo contains more than a single workflow identifier, an additional `w` query parameter may be used to target a specific workflow id to be deployed.

part:: The server should validate the Argo at request time. In case the server cannot find the `w` identifier within the workflow from the Argo provided, a 400 status code is expected with the type "argo-worflow-not-exist".

This seems to contradict the previous point, which is worded in a way that `w` is optional, while it is required here.
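For reference, a hypothetical exchange for the quoted `w` behaviour; the endpoint, workflow identifier, document layout, and exception body below are assumptions for this sketch (the exception type string also differs between the two quoted files; "workflow-not-found" is used here for illustration):

```http
POST /processes?w=resample HTTP/1.1
Content-Type: application/argo

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: resample
# ... additional workflow definitions in the same Argo document ...
```

If the targeted identifier cannot be found in the submitted document, the quoted requirements expect a 400 response along the lines of:

```http
HTTP/1.1 400 Bad Request
Content-Type: application/json

{
  "type": "workflow-not-found",
  "status": 400,
  "detail": "No workflow with identifier 'resample' was found in the submitted Argo document."
}
```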

Comment on lines +7 to +13
part:: If a process can be represented for the intended use as a <<rc_argo,Argo Application>>, implementations should consider supporting the <<rc_argo,Argo>> encoding for describing the process to be deployed to the API.

part:: The media type `application/argo` shall be used to indicate that the request body contains a process description encoded as a <<rc_argo,Argo Application>>.

part:: If the Argo contains more than one workflow, an additional `w` query parameter may be used to reference the workflow id to be deployed.

part:: The server should validate the Argo at request time. In case the server cannot find the `w` identifier within the workflow from the Argo provided, a 400 status code is expected with the type "worflow-not-found".

Similar comments to other file.

gfenoy added a commit to GeoLabs/ogcapi-processes that referenced this pull request Jan 5, 2024
Update recommendations and add Requirements in the corresponding Requirements class

Make CWL depend on the OGC Application Package so as not to have to add another conformance class

Define the Requirement for the w param in DRU directly to make it easier to extend, cf. opengeospatial#386

Move workflow-not-found exception Requirement to DRU Requirements class
@gfenoy mentioned this pull request Jan 6, 2024
@bpross-52n (Contributor):

SWG telecon from 8th January 2024: We would like to see this tested e.g. in a testbed before adding this to the standard.

@bpross-52n (Contributor):

SWG meeting from January 22nd: Move this to Part 3 project.

@fmigneault (Contributor):

@bpross-52n

SWG meeting from January 22nd: Move this to Part 3 project.

Sorry, I could not attend today's meeting due to a conflict.
Is it possible to get more details about this? The way Argo is being described here, each of its applications is a distinct process. This fits more into Part 2 than Part 3. Part 3 could chain the resulting processes without any knowledge of Argo (or CWL, for that matter).

@jerstlouis (Member) commented Jan 22, 2024:

@fmigneault Part 3 includes a "Deployable workflows" requirement class that allows deploying a workflow as a process using Part 2, as well as dedicated requirement classes for specific workflow definition languages.

Part 2 is about the generic idea that you can POST a process application package, regardless of what it contains.

But if the content of that package is a workflow, this is more about Part 3 (working in conjunction with Part 2).

We could also apply this to CWL, but due to the long-standing association with previous Part 2 efforts, Part 2 includes a CWL requirement class which is focused on the ability to use CWL for process description, rather than its ability to define workflows.

There is still a CWL workflow requirement class in Part 3 about defining a workflow using CWL.

@fmigneault (Contributor):

@jerstlouis
I see.
I believe Argo would need a situation similar to CWL, where it spans both Part 2 and Part 3 simultaneously, since both can represent either a workflow graph or a single application on their own.

@jerstlouis (Member):

@fmigneault Sure, but that applies to all workflow definition languages, and I don't think that requires a req. class in Part 2 for that.

(I think that was even the case for CWL, but there were strong arguments in favor of including it)

@fmigneault (Contributor):

From what I see in the changed files, everything is relevant to Part 2, i.e. a process description represented in the Argo format, and how to distinguish it from other workflow encodings in order to deploy/replace/undeploy it and later execute it after deployment.

I looked quickly at Part 3 Deployable Workflows and my impression is that it attempts to duplicate what Part 2 does, but with fewer details about the deployment itself (which makes sense since Part 3 focuses more on Execution). Because the execution endpoint is used, it creates issues with conflicting {processId} locations that need to reserve CWL, openEO (and now Argo as well, and any future workflow encoding...) due to definitions like:

IMO, it would make more sense for "Deployable Workflows" to be considered just another "workflow process graph" representation POSTed to /processes. Therefore, one could deploy a CWL graph, an Argo graph, an openEO graph, an "OGC workflow definition defined as an execution request" (as described in deployable-workflows), etc.

The strength of Part 3 is about chaining multiple processes input/output/collections "on the fly" at execution time. If one intends to deploy the workflow rather than executing it directly, going through a Part 3 approach seems to over-complicate the Part 3 definition. Delegating "Deployable Workflows" to Part 2 with a specific "OGC Execution Workflow" would simplify how the two parts collaborate.

@jerstlouis (Member) commented Jan 22, 2024:

I looked quickly at Part 3 Deployable Workflows and my impression is that it attempts to duplicate what Part 2 does,

The intent is not to duplicate anything, but to reference it normatively, i.e., a workflow defined with Part 3 can be deployed using Part 2, for implementations declaring support for this requirement class (which has a dependency on Part 2).

But you are right that currently Deployable Workflow is more about the "OGC workflow definitions defined as an execution request". But it could be broadened to be about workflow in any process graph definition language (CWL, openEO, Argo...).

The question really is just about where the definition of the payload that gets POSTed for these definition languages belongs.

Because they define workflows, I think the consensus was that it belongs to Part 3.

But of course the POST operation and the behavior are defined by Part 2.

In the end, it doesn't really matter in which document the req. classes are defined, as long as they can work together.

@fmigneault (Contributor):

I think that because they define a workflow (which, after deployment, can be queried as described) and can then be reused with other inputs without changing the process graph, it makes more sense to have them in Part 2. All of the CWL, openEO and Argo graphs work under the assumption that the workflow steps are defined first, and the submitted inputs are then chained through them.

The OGC Part 3 Workflow could be implemented using any of those representations, but its real power comes from bridging data/process sources into an execution pipeline that does not need deployment, at the cost of being provided inline each time in the execution request. This is what makes it distinct from Part 2. If a Part 3 workflow was deployed, it could then be called like any other atomic process, regardless of the workflow engine under it. The workflow definition would be abstracted away.

I am having discussions with other working groups, and the issue of handling multiple workflow formats and platform APIs often arises. I think it would be more useful for users if custom workflow encodings were deployed using Part 2 (as currently), while Part 3 limited itself to chaining standardized OGC API components. This way, Part 3 Workflows offer a truly interoperable way to call processes between servers. Otherwise, we somehow need to port OGC-native concepts such as collection I/O through CWL, OpenEO, etc. to use them with Part 3, and still remain stuck with platforms that cannot exchange those custom definitions.

@jerstlouis (Member):

All the CWL, OpenEO and Argo graphs work under the assumption that the workflow steps are defined first, and then chains the submitted inputs.

The same is also true for the "Nested Processes" workflow defined in Part 3 as an extension of Part 1 execution requests; they all work on existing OGC API - Processes, either pre-existing in the implementation or deployed using Part 2.
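As a rough, non-normative illustration of that extension (the process identifiers, server URL, and input names below are made up), a Part 1 execution request with a nested process might look like:

```http
POST /processes/render-map/execution HTTP/1.1
Content-Type: application/json

{
  "inputs": {
    "layer": {
      "process": "https://other-server.example.org/processes/extract-roads",
      "inputs": {
        "area": { "bbox": [ -71.3, 46.7, -71.1, 46.9 ] }
      }
    }
  }
}
```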

If a Part 3 workflow was deployed, it could then be called like any other atomic process, regardless of the workflow engine under it. The workflow definition would be abstracted away.

That is what the "Deployable Workflow" requirement class of Part 3 is about, leveraging part 2, f we make it agnostic of the workflow definition language (extended execution request, CWL, OpenEO, Argo...).

I think it would be more useful for users if custom workflow encodings were deployed using Part 2 (as currently), while Part 3 limited itself to chaining standardized OGC API components.

Whether things are defined in the Part 2 document or the Part 3 document should have zero impact on users. The functionality is exactly the same.

Otherwise, we somehow need to port OGC-native concepts such as collection I/O through CWL, OpenEO, etc. to use them with Part 3, and still remain stuck with platforms that cannot exchange those custom definitions.

Part 3 defines several things, which may be contributing to confusion.

"Collection Input" and "Collection Output" are really powerful concepts that bridges the data access OGC APIs as mechanisms, and is particularly relevant to the GeoDataCube API work. However, this "collection" functionality is fully orthogonal to the definition of process graphs in any particular workflow definition language, with the one exception that when using extended-Part 1 execution request, a "collection" property is used to specify a collection input.

What I mean here is that even if you used CWL or Argo for your workflow definition, there could be a specific mechanism for how one can accept an OGC API - Coverages collection as an input to the workflow definition (using Coverages as an example, but it could be Features, Tiles, DGGS, Maps, EDR...). And similarly, you could support creating a virtual collection as per Part 3 "Collection Output", and trigger execution of the workflow for an area/time/resolution of interest as a result of an OGC API - Coverages request ("Collection Output").
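A sketch of the "collection" input mentioned above, with made-up process and collection URLs; with Collection Output, the workflow's result would conversely be exposed as a virtual collection whose execution is triggered by, e.g., Coverages or Tiles requests for the area/time/resolution of interest:

```http
POST /processes/elevation-contours/execution HTTP/1.1
Content-Type: application/json

{
  "inputs": {
    "dem": { "collection": "https://data.example.org/collections/srtm-dem" }
  }
}
```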

but its real power comes from bridging data/process sources into an execution pipeline that does not need deployment, at the cost of being provided inline each time in the execution request.

This cost is mitigated by either deploying the workflow using Part 2 ("Deployable Workflow"), or by setting up a virtual collection ("Collection Output", with the possibility to set up a persistent public-facing collection that can optionally expose its internal workflow).

I think that because they define a workflow (which can be queried as described after deployment), and then be reused with other inputs without changing the process graph, it makes more sense to have them in Part 2

Currently, I believe the SWG is working under the assumption that anything to do with "workflow" belongs to Part 3.

Of course Part 2 can be used to deploy both new processes that can be used within those workflows, and the workflows themselves as new processes (Part 3 "Deployable Workflows"). The SWG could review whether more should be included in Part 2, but I believe there is a preference to refrain from making too many changes to Part 2 so as to avoid delaying its completion.

@fmigneault (Contributor):

That is what the "Deployable Workflow" requirement class of Part 3 is about, leveraging part 2, f we make it agnostic of the workflow definition language (extended execution request, CWL, OpenEO, Argo...).

Exactly my point; therefore, there is no need for CWL, openEO, or Argo requirement classes in Part 3. It is redundant to have them there, as they should already be handled by Part 2.

Whether things are defined in the Part 2 document or the Part 3 document should have zero impact on users. The functionality is exactly the same.

Since they are not POSTed on the same endpoint, do not expect the same payload, and the result is not the same (whether the workflow is simply deployed or is executed immediately), it matters a lot.

I agree with all points regarding how powerful Part 3 concepts could be, but at the same time, they lack an explicit specification of how OGC concepts can be bridged with CWL, Argo, openEO, etc. There are already many long issue discussions (not just by me) illustrating that those assumptions are non-trivial and do not just magically work together, because each workflow technology has its own structure. Like you mentioned, Part 3 includes a lot of things. Adoption of these capabilities only becomes harder if we include Part 2 concepts in there as well. Since Part 3 already assumes that the processes it calls are Part 1 or Part 2 references, it makes more sense to reuse this abstraction.

Currently, I believe the SWG is working under the assumption that anything to do with "workflow" belongs to Part 3.

I think this is only a side effect of Part 3 being called "Workflows" when it defines much more than that. Workflow concepts have been present since at least the OGC Best Practice for Earth Observation Application Package, which participants in the following initiatives decided to ignore for whatever reason...
