Workflow examples and use cases ... #279
Using CWL, the Workflow process encapsulates 2 chained processes. All 3 processes embed the CWL definition in their
Execution uses the following payload: on submitted execution, the workflow runs the process chain, first "generating an image" from the input string, and then the second process does a simple pass-through of the file contents. The chaining logic is all defined by CWL, because OGC API - Processes itself has no actual knowledge of single Process vs Workflow chaining. The implementer can decide to parse the CWL and execute it as they see fit. From the external user's point of view, atomic and workflow processes are indistinguishable in terms of inputs/outputs. If need be, intermediate processes can be executed by themselves as well.
Side Notes
Examples and use cases for OGC API - Processes - Part 3: Workflows & Chaining (apologies for a complete failure at trying to make it not-too-long)
Scenario 1: Land Cover Classification (collection input / remote collections / collection output)
Say we have a server providing a vast collection of sentinel-2 data to which new scenes captured by the satellites get added continuously. That data is hypothetically available from a hypothetical OGC API implementation deployed at
Say we have another server providing MODIS data at
Research center A has developed and trained a Machine Learning model able to classify land cover from MODIS and sentinel-2 data, and published it as a Process in an OGC API - Processes implementation with support for Part 3 at
Research center B wants to experiment with land cover classification. They use their favorite OGC API client and first discover the existence of the Land Cover classification process by searching for "land cover" keywords in a central OGC catalog of trusted OGC API deployments of certified implementations. The client fetches the process description, and from it can see what types of inputs are expected. Inputs are qualified with a geodata class, which allows the client to easily discover implementations able to supply the data it needs. From the same central OGC catalog, it discovers the MODIS and sentinel-2 data sources as perfect fits, and automatically generates a workflow execution request that looks like this (despite its simplicity, the user still does not need to see it): {
"process" : "https://research-alpha.org/ogcapi/processes/landcover",
"inputs" : {
"modis_data" : { "collection" : "https://usgs.gov/ogcapi/collections/modis" },
"sentinel2_data" : { "collection" : "https://esa.int/ogcapi/collections/sentinel2:level2A" }
}
} Happy to first try the defaults, the researcher clicks OK. By default the process generates a land cover classification for one specific year. When receiving the request, the first thing the Workflows implementation on research-alpha.org will do is to validate those collection URLs as safe and retrieve their collection descriptions to verify that they are proper inputs. This includes parsing information about the spatiotemporal extent of the collections as well as the data access mechanisms (e.g. Coverages, Tiles, DGGS...) and supported media types. The server recognizes the inputs as valid (e.g. it sees that their geodata class is a match) and plans on using OGC API - Tiles to retrieve data from the server, since both data inputs advertise support for coverage tiles.
Confident that it can accommodate the workflow being registered, the server responds to the request by generating a collection description document where the spatiotemporal extent spans the intersection of both inputs (e.g. 2016..last year for the whole Earth). The document also declares that the results can be requested either via OGC API - Coverages (with discrete categories), as OGC API - Features, or as OGC API - Tiles (either as coverage tiles or vector tiles).
The client works best with vector tiles (as it uses Vulkan or WebGL to render them client-side), and supports Mapbox Vector Tiles, which is one of the media types declared as supported in the response. The response included a link to tilesets of the results of the workflow execution request as Mapbox Vector Tiles. The client selects a tileset using the GNOSISGlobalGrid TileMatrixSet, which is suitable for EPSG:4326 / CRS:84 for the whole world (including polar regions). That tileset includes a templated link to trigger processing of a particular resolution and area and request the result for a specific tile:
The client now requests tiles for the current visualization scale and extent currently displayed on its virtual globe, by replacing the parameter variables with tile matrices, rows and columns. Since the collection also advertised a temporal extent with a yearly resolution and support for the OGC API - Tiles datetime conformance class, the client also specified that it is interested in last year with an additional
The research-alpha.org server receives the requests and starts distributing the work. First it needs to acquire the data from the source collections. It sends requests to retrieve MODIS and sentinel-2 data tiles. The sentinel-2 server supports a "filter" query parameter that allows filtering data by cloud cover at both the scene metadata and the cell data values level, to create a cloud-free mosaic of multiple scenes, e.g. "filter=scene.cloud_cover < 50 AND cell.cloud_cover < 15". It also supports returning a flattened GeoTIFF when requesting a temporal interval, and a "sortby" parameter to order the scenes so that the cells with the least cloud cover are preserved (on top): "sortby=cell.cloud_cover(desc)". The trained model requires imagery from different times during the year, so it uses the
For the MODIS data, the server supports requesting tiles for a whole month with daily values (preserving the temporal dimension). The internal
As soon as all the necessary input data is available to process one tile, the prediction for that tile is executed (using the model, which persists in shared memory as long as it has been used recently). As soon as the prediction is complete for a tile, the result is returned.
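(For reference, a rough sketch of the kind of collection description document mentioned above that the server could return for the workflow results; all identifiers, extents and links below are purely illustrative assumptions, not output from an actual server:)
{
  "id" : "landcover_result",
  "title" : "Land cover classification (workflow result)",
  "extent" : {
    "spatial" : { "bbox" : [ [ -180, -90, 180, 90 ] ] },
    "temporal" : { "interval" : [ [ "2016-01-01T00:00:00Z", "2022-12-31T23:59:59Z" ] ] }
  },
  "links" : [
    { "rel" : "http://www.opengis.net/def/rel/ogc/1.0/tilesets-vector", "type" : "application/json", "title" : "Results as vector tilesets", "href" : "https://research-alpha.org/ogcapi/collections/landcover_result/tiles" },
    { "rel" : "http://www.opengis.net/def/rel/ogc/1.0/coverage", "type" : "image/tiff; application=geotiff", "title" : "Results as a coverage", "href" : "https://research-alpha.org/ogcapi/collections/landcover_result/coverage" }
  ]
}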
Due to the parallel nature of the requests/processing, the small pieces of data being requested and processed, the use of GPU acceleration, and the use of efficient and well optimized technology, the client starts receiving the result tiles within 1 or 2 seconds. The client immediately starts displaying the results with a default style sheet and caches the resulting tiles. Now the user starts zooming in on an area of interest. The lower resolution tiles are still displayed on the globe while waiting for more refined results to come in (requested for a more detailed zoom level / a smaller scale denominator). Soon those show up on the client display and the user starts seeing interesting classification results. If the user zooms back out, the lower-resolution / larger area results are still cached, so the user does not see a black screen.
The user notices that a classification looks off for a particular land cover category. The user goes back in the execution request / workflow editor and tweaks an input parameter that should correct the situation. The client POSTs a new execution request as a result, which results in a new collection response and a new link to generate tiles of the results. The client invalidates the currently cached tiles, which no longer reflect this updated workflow. The server validates the workflow immediately because it still has active connections to the input collections used and does not need to validate them again. The new response comes back quickly and the client can display the result again, which looks good. The landcover process server had cached responses from the previous MODIS and sentinel-2 requests, so it does not need to go back to make those requests again. It simply needs to re-run the prediction model with the new parameters. The user explores areas of interest at different resolutions and results keep coming in quickly.
The user is satisfied with the results and now selects a large area to export at a detailed scale. A lot of the results required for this operation have already been cached during the exploration phase by the client and / or the landcover server. The "batch process" finishes quickly. The user is very happy with OGC API - Processes workflows after having succeeded in producing a land cover map in 15 minutes from discovery to the resulting map. We demonstrated a similar scenario in the MOAW project using sentinel-2 data from EuroDataCube / SentinelHub. See JSON process description.
Scenario 2: Custom map rendering (remote process / nested process)
As a slight twist to Scenario 1, the user wishes to render a map server-side using their own server (but it could just as easily be on any server implementing a maps rendering process) instead of rendering it client-side. The server has a RenderMap process that takes in a list of layers as input. The result of the process is available either using OGC API - Maps or as map tiles using OGC API - Tiles, in a variety of CRSes and TileMatrixSets. The discovery process and selection of processes and inputs is very similar to Scenario 1, except this time the RenderMap process will be the one to which the client will be POSTing the execution request. The landcover process will become a nested process, its output being an input into the RenderMap process, and could be rendered on top of a sentinel-2 mosaic: {
"process" : "https://research-beta.org/ogcapi/processes/RenderMap",
"inputs" : {
"layers" : [
{
"collection" : "https://esa.int/ogcapi/collections/sentinel2:level2A",
"ogcapiParameters" : {
"filter" : "scene.cloud_cover < 50 and cell.cloud_cover < 15",
"sortby": "cell.cloud_cover(desc)"
}
},
{
"process" : "https://research-alpha.org/ogcapi/processes/landcover",
"inputs" : {
"modis_data" : { "collection" : "https://usgs.gov/ogcapi/collections/modis" },
"sentinel2_data" : { "collection" : "https://esa.int/ogcapi/collections/sentinel2:level2A" }
}
}
]
}
}
The RenderMap process may also take in other input parameters, e.g. a style definition. In a similar manner to Scenario 1, the client will receive a collection description document, this time with links to map tilesets and to a map available for the results. The client decides to trigger the processing and request results using OGC API - Maps, and builds a request specifying a World Mercator (EPSG:3395) CRS, a bounding box, a date & time, and a width for the result (height is automatically calculated from the normal aspect ratio):
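(Purely as an illustration, with a hypothetical result-collection URL and example parameter values loosely following the OGC API - Maps draft, such a request might look roughly like the following:)
GET https://research-beta.org/ogcapi/collections/renderedMap_result/map?crs=EPSG:3395&bbox=-80,40,-70,50&datetime=2021-07-01T12:00:00Z&width=1024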
Although the client is requesting a WorldMercator map, the RenderMap process implementation might still leverage vector tiles using the GNOSISGlobalGrid tile matrix set, and thus submit multiple requests to the landcover process server, acting in the same way as the client-side renderer in Scenario 1. See JSON process description for our implementation of such a process.
Scenario 3: Publishing the results of a workflow (virtual collections)
The researcher may now want to publish the map as a dedicated and persistent OGC API collection. The server can execute the processing based on requests received for that collection, but would also cache results to optimize processing, bandwidth, memory and disk resources. The collection description may also link to the workflow source, making it easy to reproduce and adapt to similar and derived uses. As new data gets added to the source collections, caches expire and the virtual collection is always up to date. Rather than the providers having to continuously run batch processes, using up a lot of resources for areas / resolutions of interest that will be mostly out of date before any client is interested in the data, they can instead prioritize resources on the latest requests and on the most important ones (e.g. disaster response). The server can also prioritize resources for pre-emptive requests that follow the current request patterns when it has free cycles. Such pre-emption could offset the latency in workflows with a larger number of hops. This can also be done in the backend without users of the API being aware, but offering these explicit capabilities facilitates reproducibility and re-use.
Scenario 4: Backend workflow and EVI expression (nested process / deploy workflow)
For this scenario, let's assume the landcover process is itself a workflow that leverages other processes. In addition to the raw sentinel-2 bands, the classification algorithm might for example utilize a pre-computed vegetation index, and specify the filtering logic discussed earlier.
landcover process workflow: {
"process" : "https://research-alpha.org/ogcapi/processes/randomForestPredict",
"inputs" : {
"trainedModel" : "https://research-alpha.org/ogcapi/models/sentinel2ModisLandCover",
"data" :
[
{ "$ref" : "#/components/monthlyInput", "{month}" : 1 },
{ "$ref" : "#/components/monthlyInput", "{month}" : 2 },
{ "$ref" : "#/components/monthlyInput", "{month}" : 3 },
{ "$ref" : "#/components/monthlyInput", "{month}" : 4 },
{ "$ref" : "#/components/monthlyInput", "{month}" : 5 },
{ "$ref" : "#/components/monthlyInput", "{month}" : 6 },
{ "$ref" : "#/components/monthlyInput", "{month}" : 7 },
{ "$ref" : "#/components/monthlyInput", "{month}" : 8 },
{ "$ref" : "#/components/monthlyInput", "{month}" : 9 },
{ "$ref" : "#/components/monthlyInput", "{month}" : 10 },
{ "$ref" : "#/components/monthlyInput", "{month}" : 11 },
{ "$ref" : "#/components/monthlyInput", "{month}" : 12 }
]
},
"components" :
{
"modis":
{
"input" : "modis_data",
"format": { "mediaType": "application/netcdf" },
"ogcapiParameters" : {
"datetime" : { "year" : { "{datetime}.year" }, "month" : "{month}" }
}
},
"sentinel2":
{
"input" : "sentinel2_data",
"format": { "mediaType": "image/tiff; application=geotiff" },
"ogcapiParameters" : {
"filter" : "scene.cloud_cover < 50 and cell.cloud_cover < 15",
"sortby": "cell.cloud_cover(desc)",
"datetime" : { "year" : { "{datetime}.year" }, "month" : "{month}" }
}
},
"monthlyInput":
{
{ "$ref" : "#/components/modis" },
{ "$ref" : "#/components/sentinel2" },
{
"process" : "https://research-alpha.org/ogcapi/processes/coverage_processor",
"inputs" : {
"data" : { "$ref" : "#/components/sentinel2" },
"fields" : { "evi" : "2.5 * (B08 - B04) / (1 + B08 + 6 * B04 + -7.5 * B02)" }
}
}
}
}
} Our implementation of the RFClassify process works in a similar way, but up until now it has been implemented as a single process integrating Python and Scikit-learn. This example introduces new capabilities that would make it easier to implement this as a workflow:
Scenario 5: Point cloud gridifier (landing page output)
In this scenario, a collection supporting point cloud requests (e.g. as .las using OGC API - Tiles) is provided as an input, and the process generates two outputs for it by gridifying the point cloud: ortho-rectified imagery and a DSM. In order to have access to both outputs, the client uses a {
"process" : "https://example.com/ogcapi/processes/PCGridify",
"inputs" : {
"data" : { "collection" : "https://example.com/ogcapi/collections/bigPointCloud" },
"fillDistance" : 100,
"classes" : [ "ground", "highVegetation" ]
}
} The response is an OGC API landing page, with two collections available (one for the ortho imagery and one for the DSM). {
"process" : "https://example.com/ogcapi/processes/RoutingEngine",
"inputs" : {
"dataset" : { "collection" : "https://example.com/ogcapi/collections/osm:roads" },
"elevationModel" :
{
"process" : "https://example.com/ogcapi/processes/PCGridify",
"inputs" : {
"data" : { "collection" : "https://example.com/ogcapi/collections/bigPointCloud" },
"fillDistance" : 100,
"classes" : [ "roads" ]
},
"outputs" : { "dsm" : { } }
},
"preference" : "shortest",
"mode" : "pedestrian",
"waypoints" : { "value" : {
"type" : "MultiPoint",
"coordinates" : [
[ -71.20290940, 46.81266578 ],
[ -71.20735275, 46.80701663 ]
]
}
}
}
} An interesting extension of this use case is to generate the point cloud from a photogrammetry process using a collection of oblique imagery at one end, and to use the process in another workflow doing classification / segmentation and conversion into a mesh, which a client can trigger by requesting 3D content using OGC API - GeoVolumes. See JSON process descriptions for the Point cloud gridifier and the Routing engine in our implementation of such processes.
Scenario 6: Fewer round-trips (immediate access)
As a way to reduce the number of round-trips, the ability to submit workflows to other end-points has been considered. For a live example of this capability, try POSTing the following execution request to the following end-points:
{
"process": "https://maps.ecere.com/ogcapi/processes/RenderMap",
"inputs": {
"layers": [
{ "collection": "https://maps.ecere.com/ogcapi/collections/SRTM_ViewFinderPanorama" }
]
}
} More examples (in Annex B) and additional details can be found in the draft MOAW discussion paper, currently at https://maps.ecere.com/moaw/DiscussionPaper-Draft3.pdf.
(Sorry, I forgot to work on examples and was just reminded once Peter opened the issue. As such my contribution is rather short and a bit incomplete for now.) It seems there are multiple different base "use cases":
All this may also include: A. Publishing results
In openEO, the focus is on 3, while it seems the previous posts here are focusing more on the parts. This is all not mutually exclusive though. So how can you achieve the use cases above in openEO?
Use Case 1 (data retrieval)
You need to send a load_collection + save_result to the back-end and store the data in a format you wish to get. Depending on the execution mode you may get different results: A. You can publish the data using web services, e.g. WMTS, using openEO's "secondary web service" API.
Use Case 2 (low-level processing instructions)
That's the main goal of openEO and where it probably shines most. A substantial amount of work has led to a list of pre-defined processes that can be used for data cube operations, math, etc. See https://processes.openeo.org for a list of processes. These can easily be chained (in a process graph) to a "high-level" process; we call them user-defined processes. The EVI example mentioned above looks like this in "visual mode" (child process graphs not shown): (Please note the code below is auto-generated from the Editor that is used for the visual mode above. As such the code may not be exactly what an experienced user would write.) This is the corresponding code in Python:
# Loading the data; The order of the specified bands is important for the following reduce operation.
dc = connection.load_collection(collection_id = "COPERNICUS/S2", spatial_extent = {"west": 16.06, "south": 48.06, "east": 16.65, "north": 48.35}, temporal_extent = ["2018-01-01T00:00:00Z", "2018-01-31T23:59:59Z"], bands = ["B8", "B4", "B2"])
# Compute the EVI
B02 = dc.band("B02")
B04 = dc.band("B04")
B08 = dc.band("B08")
evi = (2.5 * (B8 - B4)) / ((B8 + 6.0 * B4 - 7.5 * B2) + 1.0)
# Compute a minimum time composite by reducing the temporal dimension
mintime = evi.reduce_dimension(reducer = "min", dimension = "t")
def fn1(x, context = None):
datacube2 = process("linear_scale_range", x = x, inputMin = -1, inputMax = 1, outputMax = 255)
return datacube2
# Stretch range from -1 / 1 to 0 / 255 for PNG visualization.
datacube1 = mintime.apply(process = fn1)
save = datacube1.save_result(format = "GTIFF")
# The process can be executed synchronously (see below), as batch job or as web service now
result = connection.execute(save)
This is the corresponding code in R:
p = processes()
# Loading the data; The order of the specified bands is important for the following reduce operation.
dc = p$load_collection(id = "COPERNICUS/S2", spatial_extent = list("west" = 16.06, "south" = 48.06, "east" = 16.65, "north" = 48.35), temporal_extent = list("2018-01-01T00:00:00Z", "2018-01-31T23:59:59Z"), bands = list("B8", "B4", "B2"))
# Compute the EVI
evi_ <- function(x, context) {
b8 <- x[1]
b4 <- x[2]
b2 <- x[3]
return((2.5 * (b8 - b4)) / ((b8 + 6 * b4 - 7.5 * b2) + 1))
}
# reduce_dimension bands with the defined formula
evi <- p$reduce_dimension(data = dc, reducer = evi_, dimension = "bands")
mintime = function(data, context = NULL) {
return(p$min(data = data))
}
# Compute a minimum time composite by reducing the temporal dimension
mintime = p$reduce_dimension(data = evi, reducer = mintime, dimension = "t")
fn1 = function(x, context = NULL) {
datacube2 = p$linear_scale_range(x = x, inputMin = -1, inputMax = 1, outputMax = 255)
return(datacube2)
}
# Stretch range from -1 / 1 to 0 / 255 for PNG visualization.
datacube1 = p$apply(data = mintime, process = fn1)
save = p$save_result(data = datacube1, format = "GTIFF")
# The process can be executed synchronously (see below), as batch job or as web service now
result = compute_result(graph = save)
This is the corresponding code in JS:
let builder = await connection.buildProcess();
// Loading the data; The order of the specified bands is important for the following reduce operation.
let dc = builder.load_collection("COPERNICUS/S2", {"west": 16.06, "south": 48.06, "east": 16.65, "north": 48.35}, ["2018-01-01T00:00:00Z", "2018-01-31T23:59:59Z"], ["B8", "B4", "B2"]);
// Compute the EVI.
let evi = builder.reduce_dimension(dc, new Formula("2.5*(($B8-$B4)/(1+$B8+6*$B4+(-7.5)*$B2))"), "bands");
let minReducer = function(data, context = null) {
let min = this.min(data);
return min;
}
// Compute a minimum time composite by reducing the temporal dimension
let mintime = builder.reduce_dimension(evi, minReducer, "t");
// Stretch range from -1 / 1 to 0 / 255 for PNG visualization.
let datacube1 = builder.apply(mintime, new Formula("linear_scale_range(x, -1, 1, 0, 255)"));
let save = builder.save_result(datacube1, "GTIFF");
// The process can be executed synchronously (see below), as batch job or as web service now
let result = await connection.computeResult(save);
And this is how it looks in JSON as a process (graph): {
"process_graph": {
"1": {
"process_id": "apply",
"arguments": {
"data": {
"from_node": "mintime"
},
"process": {
"process_graph": {
"2": {
"process_id": "linear_scale_range",
"arguments": {
"x": {
"from_parameter": "x"
},
"inputMin": -1,
"inputMax": 1,
"outputMax": 255
},
"result": true
}
}
}
},
"description": "Stretch range from -1 / 1 to 0 / 255 for PNG visualization."
},
"dc": {
"process_id": "load_collection",
"arguments": {
"id": "COPERNICUS/S2",
"spatial_extent": {
"west": 16.06,
"south": 48.06,
"east": 16.65,
"north": 48.35
},
"temporal_extent": [
"2018-01-01T00:00:00Z",
"2018-01-31T23:59:59Z"
],
"bands": [
"B8",
"B4",
"B2"
]
},
"description": "Loading the data; The order of the specified bands is important for the following reduce operation."
},
"evi": {
"process_id": "reduce_dimension",
"arguments": {
"data": {
"from_node": "dc"
},
"reducer": {
"process_graph": {
"nir": {
"process_id": "array_element",
"arguments": {
"data": {
"from_parameter": "data"
},
"index": 0
}
},
"sub": {
"process_id": "subtract",
"arguments": {
"x": {
"from_node": "nir"
},
"y": {
"from_node": "red"
}
}
},
"div": {
"process_id": "divide",
"arguments": {
"x": {
"from_node": "sub"
},
"y": {
"from_node": "sum"
}
}
},
"p3": {
"process_id": "multiply",
"arguments": {
"x": 2.5,
"y": {
"from_node": "div"
}
},
"result": true
},
"sum": {
"process_id": "sum",
"arguments": {
"data": [
1,
{
"from_node": "nir"
},
{
"from_node": "p1"
},
{
"from_node": "p2"
}
]
}
},
"red": {
"process_id": "array_element",
"arguments": {
"data": {
"from_parameter": "data"
},
"index": 1
}
},
"p1": {
"process_id": "multiply",
"arguments": {
"x": 6,
"y": {
"from_node": "red"
}
}
},
"blue": {
"process_id": "array_element",
"arguments": {
"data": {
"from_parameter": "data"
},
"index": 2
}
},
"p2": {
"process_id": "multiply",
"arguments": {
"x": -7.5,
"y": {
"from_node": "blue"
}
}
}
}
},
"dimension": "bands"
},
"description": "Compute the EVI. Formula: 2.5 * (NIR - RED) / (1 + NIR + 6*RED + -7.5*BLUE)"
},
"mintime": {
"process_id": "reduce_dimension",
"arguments": {
"data": {
"from_node": "evi"
},
"reducer": {
"process_graph": {
"min": {
"process_id": "min",
"arguments": {
"data": {
"from_parameter": "data"
}
},
"result": true
}
}
},
"dimension": "t"
},
"description": "Compute a minimum time composite by reducing the temporal dimension"
},
"save": {
"process_id": "save_result",
"arguments": {
"data": {
"from_node": "1"
},
"format": "GTIFF"
},
"result": true
}
}
} For details about our data cubes and related processes: https://openeo.org/documentation/1.0/datacubes.html
Use Case 3 (high-level processing instructions)
Any process that you define you can also store as a high-level process that others can execute and re-use. So the EVI process above could simply be stored and then be executed with a single process call. Then your process is as simple as: Which in the three programming languages looks as such:
# Python
datacube = connection.datacube_from_process("evi")
result = connection.execute(datacube)
# R
p = processes()
result = compute_result(graph = p$evi())
// JavaScript
let builder = await connection.buildProcess();
let result = await connection.computeResult(builder.evi());
and in JSON: {
"id": "evi",
"process_graph": {
"1": {
"process_id": "evi",
"arguments": {},
"result": true
}
}
} This is simplified though; you'd probably want to define parameters (e.g. collection id or extents) and pass them later.
Use Case 4 (processing environments)
We only cater partially for this. Right now, back-ends can provide certain pre-configured environments to run user code (so-called UDFs). This is currently implemented for Python and R, and the environments usually differ by the software and libraries installed. Then you would send your code using
We could extend the openEO API relatively easily in a way that users could push their own environments to the servers, but ultimately this was never the goal of openEO and as such could be covered by another standard.
What I haven't captured yet
Sorry, I had the meeting in my calendar for 16:00 CET for whatever reason and thus only heard the last few minutes of the call. Did you conclude on something? Otherwise, happy to join the next telco again.
@m-mohr nope ... no conclusions yet. @jerstlouis and @fmigneault presented their examples, so it would be good if at the next meeting you could present your examples. One outcome of today's meeting was that @fmigneault will try to cast one of @jerstlouis's examples in CWL. There will also be a recording of today's meeting available if you want to listen to it. @bpross-52n can you post the recording somewhere when it is available?
Following is the conversion exercise for the Scenario 5 : RoutingEngine example provided by @jerstlouis. The first process is {
"processDescription": {
"id": "PCGridify",
"version": "0.0.1",
"inputs": {
"data": {
"title": "Feature Collection of Point Cloud to gridify",
"schema": {
"type": "object",
"properties": {
"collection": {
"type": "string",
"format": "url"
}
}
}
},
"fillDistance": {
"schema": {
"type": "integer"
}
},
"classes": {
"schema": {
"type": "array",
"items": "string"
}
}
},
"outputs": {
"dsm": {
"schema": {
"type": "object",
"additionalProperties": {}
}
}
}
},
"executionUnit": [
{
"unit": {
"cwlVersion": "v1.0",
"class": "CommandLineTool",
"baseCommand": ["PCGridify"],
"arguments": ["-t", "$(runtime.outdir)"],
"requirements": {
"DockerRequirement": {
"dockerPull": "example/PCGridify"
}
},
"inputs": {
"data": {
"type": "File",
"format": "iana:application/json",
"inputBinding": {
"position": 1
}
},
"fillDistance": {
"type": "float",
"inputBinding": {
"position": 2
}
},
"fillDistance": {
"type": "array",
"items": "string",
"inputBinding": {
"position": 3
}
}
},
"outputs": {
"dsm": {
"type": "File",
"outputBinding": {
"glob": "*.dsm"
}
}
},
"$namespaces": {
"iana": "https://www.iana.org/assignments/media-types/"
}
}
}
],
"deploymentProfileName": "http://www.opengis.net/profiles/eoc/dockerizedApplication"
} The second process is {
"processDescription": {
"id": "RouteProcessor",
"version": "0.0.1",
"inputs": {
"dataset": {
"title": "Collection of osm:roads"
"schema": {
"type": "object",
"properties": {
"collection": {
"type": "string",
"format": "url"
}
}
}
},
"elevationModel": {
"title": "DSM file reference",
"schema": {
"type": "string",
"format": "url"
}
},
"preference": {
"schema": {
"type": "string"
}
},
"mode": {
"schema": {
"type": "string"
}
},
"waypoints": {
"schema": {
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": [
"type": {
"type": "string"
},
"coordinates": {
"type": "array",
"items": {
"type": "array",
"items": "float"
}
}
}
}
}
},
"outputs": {
"route": {
"format": {
"mediaType": "text/plain"
},
"schema": {
"type": "string",
"format": "url"
}
}
}
},
"executionUnit": [
{
"unit": {
"cwlVersion": "v1.0",
"class": "CommandLineTool",
"baseCommand": ["RoutingEngine"],
"arguments": ["-t", "$(runtime.outdir)"],
"requirements": {
"DockerRequirement": {
"dockerPull": "example/RoutingEngine"
}
},
"inputs": {
"dataset": {
"type": "File",
"format": "iana:application/json",
"inputBinding": {
"position": 1
}
},
"elevationModel": {
"type": "File",
"inputBinding": {
"position": 2
}
},
"waypoints": {
"doc": "Feature Collection",
"type": "File",
"format": "iana:application/json",
"inputBinding": {
"position": 3
}
},
"preference": {
"type": "string",
"inputBinding": {
"prefix": "-P"
}
},
"mode": {
"type": "string",
"inputBinding": {
"prefix": "-M"
}
}
},
"outputs": {
"route": {
"type": "File",
"format": "iana:text/plain",
"outputBinding": {
"glob": "*.txt"
}
}
},
"$namespaces": {
"iana": "https://www.iana.org/assignments/media-types/"
}
}
}
],
"deploymentProfileName": "http://www.opengis.net/profiles/eoc/dockerizedApplication"
} Finally, the In the {
"processDescription": {
"id": "RoutingEngine",
"version": "0.0.1",
"inputs": {
"point_cloud": {
"title": "Feature Collection of Point Cloud to gridify",
"schema": {
"type": "object",
"properties": {
"collection": {
"type": "string",
"format": "url"
}
}
}
},
"roads_data": {
"tite": "Collection of osm:roads",
"schema": {
"type": "object",
"properties": {
"collection": {
"type": "string",
"format": "url"
}
}
}
},
"routing_mode": {
"schema": {
"type": "string"
}
}
},
"outputs": {
"estimated_route": {
"format": {
"mediaType": "text/plain"
},
"schema": {
"type": "string",
"format": "url"
}
}
}
},
"executionUnit": [
{
"unit": {
"cwlVersion": "v1.0",
"class": "Workflow",
"inputs": {
"point_cloud": {
"doc": "Point cloud that will be gridified",
"type": "File"
},
"roads_data": {
"doc": "Feature collection of osm:roads",
"type": "File"
},
"routing_mode": {
"schema": {
"type": "string",
"enum": [
"pedestrian",
"car"
]
}
}
},
"outputs": {
"estimated_route": {
"type": "File",
"outputSource": "routing/route"
}
},
"steps": {
"gridify": {
"run": "PCGridify",
"in": {
"data": "point_cloud",
"classes": { "default": [ "roads" ] },
"fillDistance": { "default": 100 }
},
"out": [
"dsm"
]
},
"routing": {
"run": "RouteProcessor",
"in": {
"dataset": "roads_data",
"elevationModel": "gridify/dsm",
"preference": { "default": "shortest"},
"mode": "routing_mode"
},
"out": [
"route"
]
}
}
}
}
],
"deploymentProfileName": "http://www.opengis.net/profiles/eoc/workflow"
} I would like to add that these examples are extremely verbose on purpose, only to demonstrate the complete chaining capabilities. There is no ambiguity on how to chain elements whatsoever, no matter the number of processes and I/O involved in the complete workflow. At least half of all those definitions could be automatically generated, as we can see that there is a lot of repetition between CWL's
As mentioned during the telco, here is an example of the fan-out application design pattern using the CWL ScatterFeatureRequirement requirement.
Scatter Crop Application Example
Example of a CWL that scatters the processing from an array of input values.
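(Purely as a hypothetical sketch of the scatter pattern, not the actual Application Package from OGC 20-089: a minimal CWL Workflow in JSON form could be structured like this, where crop.cwl and the band/crop identifiers are assumptions:)
{
  "cwlVersion": "v1.0",
  "class": "Workflow",
  "requirements": [
    { "class": "ScatterFeatureRequirement" }
  ],
  "inputs": {
    "bands": "string[]"
  },
  "outputs": {
    "cropped": { "type": "File[]", "outputSource": "crop/cropped" }
  },
  "steps": {
    "crop": {
      "run": "crop.cwl",
      "scatter": "band",
      "in": { "band": "bands" },
      "out": [ "cropped" ]
    }
  }
}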
Composite two-step Workflow Example
This section extends the previous example with an Application Package that is a two-step workflow that crops (using scatter over the bands) and creates a composite image.
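(Again only a hypothetical sketch rather than the actual package: a two-step CWL Workflow chaining the scattered crop step into a composite step might be structured roughly as follows, with crop.cwl, composite.cwl and the port names being assumptions:)
{
  "cwlVersion": "v1.0",
  "class": "Workflow",
  "requirements": [
    { "class": "ScatterFeatureRequirement" }
  ],
  "inputs": {
    "bands": "string[]"
  },
  "outputs": {
    "composite": { "type": "File", "outputSource": "composite/composite" }
  },
  "steps": {
    "crop": {
      "run": "crop.cwl",
      "scatter": "band",
      "in": { "band": "bands" },
      "out": [ "cropped" ]
    },
    "composite": {
      "run": "composite.cwl",
      "in": { "images": "crop/cropped" },
      "out": [ "composite" ]
    }
  }
}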
Please check OGC 20-089 section 8.5 Application Pattern and 8.6 Extended Workflows for more information about these examples.
A 2008 paper listing some workflow languages used in e-Science is at https://www.dcc.ac.uk/guidance/briefing-papers/standards-watch-papers/workflow-standards-e-science Note that BPMN is also an ISO standard. Some related engineering reports:
I'm not suggesting that OGC API - Processes - Part 3 should use BPMN instead of CWL. I am pointing out that there is a case for supporting multiple workflow languages, if possible.
This is related to what I was suggesting in Scenario 3 above:
I think it could be possible when creating a process from a workflow (through Part 2: Deploy, Replace, Undeploy) to infer the process description, and then potentially add additional metadata (e.g. a title, input descriptions that cannot be inferred, etc.). In this case, the media type of the payload would be a media type specific to the workflow language... (e.g. CWL, OpenEO, execution request extended with the capabilities I initially proposed for Part 3, i.e. OGC API collections and nested process execution request, and those identified by @ghobona ). This could be done e.g. by separating out the processDescription from the executionUnit and (similar to OGC API - Styles where we first POST a style, and then add metadata to it, but the content-type of the stylesheet is exactly e.g. SLD/SE or MapboxGL Style) -- in this case the media type could be CWL directly. Another example of an execution unit media type could be a Jupyter notebook. |
@jerstlouis I think it would be a major pain point against OGC API - Processes interoperability, because there would basically be no way to replicate executions since we are not even sure which process description gets executed. Each implementation could parse the contents in a completely different manner and generate different process descriptions. This is working against the purpose of the standard being developed, in my opinion. The advantage of deployment, although it needs extra steps, is that at the very least we obtain some kind of standard description prior to execution that allows us to validate whether the process to run was parsed correctly. Can you please elaborate more on the following part? I'm not sure I understand what you propose.
Do you mean that there would be 1 "Workflow Engine" process without any specific inputs/outputs, and that each
The original idea for these ad-hoc workflows in Part 3 is to allow clients to discover data and processes and immediately make use of these, without requiring special authentication privileges on any of the servers. In that context, I imagined that this would involve lower level processes already made available (and described) using OGC API - Processes (with support for Part 3, or support only Core and using an adapter like the one we developed). I am not sure how well this capability could extend to CWL or OpenEO as well, but was just throwing it out as a possibility because I think you had mentioned before that this could make sense. The idea is not to replace deployment either... Processes or virtual persistent collections could still be created with those workflows, but the ad-hoc mechanism can provide a way to test and tweak the workflow before publishing it and making it widely available.
In the current draft of Part 3, there is always a top-level process (the one closest to the client in the chain), and the execution request is POSTed to that process. The "process" property is only required for the nested processes, and actually this has resulted in confusion when specifying one (optional) top-level process but POSTing the workflow to the wrong process execution end-point. There could be a "workflow engine" process as you suggest that requires the "process" key even for the top-level process, avoiding that potential confusion. This might also make more sense with CWL if there is not always a top-level OGC API - Process involved at the top of the workflow.
Sorry, I might have been adding confusion by mixing up two separate things:
In those "Styles" examples and providing For b) ad-hoc workflows, in the context of Part 3 as originally proposed, it mainly means re-using processes and data collections already deployed, in a more complex high-level workflow. I imagined the same could potentially be done with CWL. Process descriptions are not involved here (except for any processes used internally, whose descriptions are useful to put together the workflow). |
Just a gentle reminder, Part 2 is now called "OGC API - Processes - Part 2: Deploy, Replace, Undeploy" (i.e. DRU) ... It is no longer called "Transactions".
@fmigneault Thank you for adapting those examples in so much detail! To attempt to get the cross-walk going, some first comments on those 3 JSON snippets:
I see. Yes, I was confused about the additional
I agree with you, the examples I provide converting to CWL are the processes that would be dynamically generated if one wants to represent something POSTed as Part 3 on
Indeed, the
Regarding your 3rd point (#279 (comment)), looking back at your examples, I understand a bit more the various use cases presented, and I believe it is possible to consider all the newly proposed functionalities separately to better illustrate concerns.
1. Collection data type
e.g.: Inputs that have this kind of definition:
"layers": [
{ "collection": "https://maps.ecere.com/ogcapi/collections/SRTM_ViewFinderPanorama" }
]
In my opinion, this should be a new type in itself, similar to bounding boxes. I don't think this should be part of Part 3 per se (or at least it should be considered as a separate feature). I believe more details need to be provided because there are some use cases where some "magic" happens, such as when
2. Nested processes (the main Part 3 feature)
e.g.: A definition as follows: {
"process" : "https://example.com/ogcapi/processes/RoutingEngine", <---------- optional (POST `/processes/RoutingEngine/execution`)
"inputs" : {
"dataset" : { "collection" : "https://example.com/ogcapi/collections/osm:roads" },
"elevationModel" :
{
"process" : "https://example.com/ogcapi/processes/PCGridify", <--- required, dispatch to this process
"inputs" : {
"data" : { "collection" : "https://example.com/ogcapi/collections/bigPointCloud" },
"fillDistance" : 100,
"classes" : [ "roads" ]
},
"outputs" : { "dsm" : { } } <--- (*) this is what chains everything together
},
(*) I also think there would be no need for custom-workflow/separate POSTing of
3. Components schema to reuse definitions
This refers to the provided Scenario 4. I find the reuse of definitions with #/components/ a very nice concept / feature. The
The other two items from Scenario 4 (
4. Expressions
I think the expressions
Firstly,
Second,
For example, in CWL, it is possible to do similar substitutions, but to process them, a full inline parsing using
I don't think many developers would adopt Part 3 Workflow/Chaining capabilities (in themselves relatively simple) when such a big implementation requirement must be supported as well. I think OGC API - Processes should keep it simple. I could see the
5. New operations
This mostly revolves around Scenario 1 and Scenario 6.
In my opinion, this also unreasonably increases the scope of Part 3, which should focus only on Workflows (i.e.: nesting/chaining processes). This is also the portion for which I find no way to easily convert to a CWL equivalent dynamically, simply because there is no detail about it.
Thanks @fmigneault for the additional feedback. I will try to address everything you commented on, please let me know if I missed something. First, I think what you are pointing out is that there is a range of functionality covered in those scenarios which makes sense to organize into different conformance classes. Which of these conformance classes make it into Part 3 remains to be agreed upon, and I think as @pvretano pointed out is one of the main point of this exercise (although perhaps that was specifically referring to conformance classes for different workflow languages). Note that even if these conformance classes are regrouped in one Processes - Part 3 specification, an implementation could decide to implement any number of the conformance classes, and potentially none of them would be required. Therefore I suggest we focus first on the definition of these conformance classes, and worry later about how to regroup those conformance classes in one or more specification / document. In my scenarios 1-6 above, the names in parentheses are the conformance classes that I had suggested previously. Envisioned conformance classes:
In the conformance classes suggested for Part 3, this refers specifically to NestedProcess and RemoteProcess. With RemoteProcess, there would be no need to first deploy the process, whereas with NestedProcess, a process would need to be deployed first in order to use it in a workflow.
This is specifically the CollectionInput conformance class. I agree that this bit alone is very useful by itself, but it is also what greatly simplifies the chaining, because it works hand in hand with the CollectionOutput conformance class. CollectionOutput allows accessing the output of a process as an OGC API collection. Any process that accepts a collection input is automatically able to use a nested process (whether local or remote) that can generate a collection output. I fully agree that CollectionInput is useful by itself, in fact there was a perfect example in Testbed 17 - GeoDataCube where the 52 North team implemented support for a LANDSAT-8 Collection input in their Machine Learning classification process / pygeoapi deployment. Whether this conformance class is added to OGC API - Processes - Part 1: Core 2.0 or OGC API - Processes - Part 3: Workflows and Chaining however does not really matter.
In full agreement here, as these are details that need to be worked out with more experimentation. Using OGC API collections leaves a lot of flexibility, some of which might be useful to leave up to the hop end-points to negotiate between themselves, but a filter that further qualifies the collection is a good example of wanting to restrict the content of that collection directly within the workflow. The datetime parameter use case in this scenario where daily datasets are used to generate a yearly dataset, but for which the process needs to first generate monthly coverages is another good example where the end-user query
I am a bit confused by that comment.
The process description for coverage_processor in this case would define a single output (the resulting coverage), therefore it is not necessary to specify it (as in the published Processes - Part 1: Core).
I think the confusion here is caused by the use of the Potentially, those inputs could also be supplied as embedded data to the process created by the workflow, and in that case the ogcapiParameters and format would not be meaningful/used -- the filtering and proper format would have had to be done prior to submitting the data as input to the landcover process defined by the Scenario 4 workflow.
I agree this is more complicated and I just came up with those while trying to put this Scenario 4 example together. Some of this capability to reference how the OGC API collection data request were made I think would make sense to include as part of the CollectionOutput conformance class (e.g. The capability to use e.g.
You are right this is a specific capability here to be able to specify a monthly request using only the year that was provided by the OGC API request triggering the processing, but specifying a different month. What I wished for while writing this was functions to build an ISO8601 string as the OGC API This Scenario 4 example is testing new grounds in terms of the capabilities of execution request-based workflow definitions as explored so far, but despite a few things to iron out I feel it manages to very concisely and clearly express slightly more complex / practical workflows.
I would welcome suggestions on how to better express it. I actually considered whether I needed to define new processes, but this was the best balance I could manage Sunday night in terms of clarity / conciseness / least-hackishness / ease of implementation. These aspects are definitely still Work in Progress :) The idea here is that
I have to clarify here that Scenario 6 is completely different from Scenario 1 in this regard.
If we are talking about CollectionOutput, I think it does fit well within Workflows and chaining because it provides an easy way to connect the output of any process as an input to any other process, and it enables the use of Tiles and DGGS zones as enablers for parallelism, distributedness, and real-time "just what you need right now" with hot workflows working on small pieces at a time, rather than batched processing ("wait a long time / use up a lot of resources, and what you get in the end might actually not be what you wanted, might never end up being used, or might be outdated by the time it is used"). If we are talking about ImmediateAccess which is covered by Scenario 6, it is a much less essential capability, but as I explained it is quite useful for demonstration purposes (e.g. to demonstrate a PNG response of a workflow directly in SwaggerUI as a single operation), and to some extent to provide fewer server round-trips (e.g. submitting a workflow and getting a templated Tiles URI in a single step).
Seems like there is still some confusion about POSTing workflows and media types, so I will try to clear this out :)
Leaving Scenario 6 aside, and focusing on the CollectionOutput capability (e.g. Scenario 1) whereas making an OGC API data request triggers process execution to generate the data for that response, would there be something equivalent? I don't think there are many details missing other than those in respective OGC API specifications (e.g. Tiles, Maps, Coverages...). The data access OGC APIs specify how to request data from an OGC API collection, and an implementation of Part 3: One other nice thing about CollectionOutput is that it makes support for visualizing the output from workflows in visualization clients much easier (than e.g. Processes - Part 1: Core) with doing very little specifically to implement process / workflow execution. This capability is e.g. implemented in the GDAL OGC API driver (and thus available in QGIS as well). It was also easily implemented in clients by participants in Testbed 17 / GeoDataCube. Thanks! |
@bpross-52n @pvretano Please add a
FYI: A modern list (that is continually being updated) with over 300 workflow systems/languages/frameworks known to be used for data analysis: https://s.apache.org/existing-workflow-systems There is another list at https://workflows.community/systems that just started. This younger list aims to be a better classified subset of the big list: only the systems that are still being maintained. |
Nice. I missed the presentation about those.
We must be careful not to overlap with Part 2 here. This is the same method/endpoint to deploy the complete process.
This made me think that we must consider some parameter in the payload that will tell the nested process to return the output this way. Maybe for example
I agree that could be allowed if the default was to return CollectionOutput, but since processes are not expected to do so by default (from Core), I think the proposed
I agree. By "multiple outputs", I specifically refer to the variable
I think that a process (let's call it
For this (Collection[Inputs|Outputs] working in hand with Workflows), I totally agree.
I think this is possible to map to CWL definitions dynamically if only
I think it is important to keep extensions separate for their relevant capabilities, although they can work together afterwards.
The POST operation to
Well the idea here is that the end-points of any particular hop of that workflow would be the ones deciding whether CollectionOutput is used or not, based on conformance support. It is not required that they do so, e.g. if the Processes server does not support CollectionOutput, Processes - Core could be used and requests could be made using sync or async execution mode -- there is no assumption that one or the other is used.
Not necessary as I just pointed out, and raw vs. document is gone with #272 (2.0?).
To make things super clear: CollectionOutput allows to request a ImmediateAccess allows to both POST the workflow and request a tile at the same time, or to POST a workflow and get a tileset right away, as in Scenario 6. e.g. POST workflow to https://maps.ecere.com/ogcapi/processes/RenderMap/map , ImmediateAccess is a nice to have for demonstration and skipping HTTP roundtrips. I don't mind if it doesn't end up in Part 3.
Well one of the important ideas with the CollectionInput / CollectionOutput conformance classes is to leave flexibility to make workflows as generic and re-usable as possible with different OGC API implementations. For example, one might re-use the exact same workflow with different servers or data sources but in practice some will end-up exchanging data using DGGS, others with Tiles, others with Coverages; or another will negotiate netCDF, while another will negotiate Zarr, or GRIB. And the workflow does not need to change at all to accommodate all of these. It also leaves the workflow itself really reflecting exactly what the user is trying to do: apply this process to these data sources, feed its input to this other process, and all the exchange and communication details are left out of the workflow definition for negotiation by the hops. Of course any implemention of this is free to convert this in the back-end to smaller tasks and sub-processes invocations internally.
I think there are different opinions about this throughout OGC. With the building blocks approach, I believe that the fundamental granularity that matters for implementation is the conformance classes, whereas the parts are just a necessary organization of the conformance clases into specification documents for publication and other practical reasons. Taking OGC API - Tiles - Part 1: Core as an example, there is definitely no expectation that any implementation will implement all of its conformance classes. So I disagree that handpicking conformance classes to implement is a bad thing, just like handpicking which OGC API / parts one implements in an OGC API implementation is not a bad thing. More importantly, I think the modularity of OGC API building blocks makes it easy to start by implementing one or more conformance class, and gradually add support for additional ones based on practical needs and resources available. |
I was thinking that we could define a WellKnownProcess that allows executing command line tools with an execution request workflow, similar to the approach used in your example using CWL to define base processes @fmigneault :
Scenario 7
This would be POSTed to {
"process" : "http://example.com/ogcapi/processes/ExecuteCommand",
"inputs" : {
"command" : "PCGridify",
"requirements" : {
"docker" : { "pull": "example/PCGridify" }
},
"stdin" : { "input" : "data", "format": { "mediaType": "application/vnd.las" } },
"arguments" : [
"-fillDistance",
{ "input" : "fillDistance", "schema" : { "type" : "number" } },
"-classes",
{ "input" : "classes", "schema" : { "type" : "array", "items" : { "type" : "string" } } },
"-orthoOutput",
"outFile1"
]
},
"outputs" :
{
"stdout" : {
"output" : "dsm",
"format": { "mediaType": "image/tiff; application=geotiff" }
},
"outFile1" : {
"output" : "ortho",
"format": { "mediaType": "image/tiff; application=geotiff" }
}
}
} Realizing that we also probably need this
@jerstlouis |
@fmigneault This is to allow the Part 3 execution request approach / DeployWorkflow to work as a Content-Type option for deploying workflows as a process with Part 2, with a WellKnown process that can execute a command line tool. It is not a process description, but an execution request as currently defined in OGC API - Processes - Part 1: Core, using the extensions defined in the Part 3 DeployWorkflow conformance class ("input" and "output"). The process description for the resulting PCGridify process could be inferred from its inputs and outputs and generated automatically. There is nothing CWL in there except for the inspiration from your example and the docker pull requirements :). One could still POST CWL instead of this execution request workflow of course to deploy a process, or an application package that bundles a process description + CWL in the executionUnit, as different supported Content-Types to deploy processes.
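(To illustrate that last point, here is a rough, hypothetical sketch of the kind of process description that might be inferred automatically from the Scenario 7 deployment above; the structure loosely follows the Part 1 process description model, and all titles, media types and the version number are assumptions:)
{
  "id" : "PCGridify",
  "version" : "1.0.0",
  "inputs" : {
    "data" : {
      "title" : "Point cloud to gridify",
      "schema" : { "type" : "string", "contentMediaType" : "application/vnd.las" }
    },
    "fillDistance" : { "schema" : { "type" : "number" } },
    "classes" : { "schema" : { "type" : "array", "items" : { "type" : "string" } } }
  },
  "outputs" : {
    "dsm" : { "schema" : { "type" : "string", "contentMediaType" : "image/tiff; application=geotiff" } },
    "ortho" : { "schema" : { "type" : "string", "contentMediaType" : "image/tiff; application=geotiff" } }
  }
}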
@jerstlouis something seems wonky here! There should be no need for a "DeployWorkflow". Whether the execution unit of a process is a Docker container, or a Python script or a CWL workflow, that should not matter. All processes should be deployed the same way (i.e. POST to the /processes endpoint as described in Part 2). I am confused. |
In full agreement with that. We might need different media types or JSON profiles for OGC API - Processes execution requests, for CWL, and for application packages, for this. What I call the Part 3 - DeployWorkflow conformance class is:
So it is those new properties to define inputs & outputs, plus a particular Content-Type for a Part 2 Deploy operation. Will Part 2 have different conformance classes for different Content-Types? (e.g. like the different Tiles encodings conformance classes). There is already one for OGC Application Package. If not this DeployWorkflow conformance class, which conformance class could define the capability to define "input" and "output" of the workflow itself for using the workflow as a process, rather than a ready-to-execute execution request? It could be potentially be an Execution Request Deployment Package in Part 2 instead. NOTE: Different media types for CWL, execution request than for application package are in the context of NOT using the OGC Application Package conformance class defined in Part 2. It is also a possibility to include the execution request-style workflow (just like CWL) in the execution unit of an application package. Personally I find that the process description is something that the server should generate, not be provided as part of a Part 2 deployment, because it includes information about how the processes implementation is able to execute things (e.g. sync/async execution mode), and it may be able to accept more e.g. formats than the executionUnit being provided, and because most of the process description can often be inferred from the executionUnit alone. Therefore I don't like the current application package approach very much, and would prefer directly providing the executionUnit as the payload to the POST. |
@jerstlouis |
The ambiguity seems to be between the definition vs. description of a process, as in that statement you made right there. Using my understanding of those terms, when deploying a process using an OGC application package, a description is provided in the processDescription field, whereas the definition is provided in the executionUnit field. When retrieving the process description (GET Optionally being able to retrieve the definition of a process as well makes sense if you want to allow users to re-use and adapt a particular workflow, but that would be a different operation (e.g. GET I had initially suggested this capability for a workflow deployed as a persistent virtual collection, but it applies to a workflow deployed as a process as well):
About:
First I want to point out that the README gets it perfectly right:
and so does the HTTP PUT description:
Right below is where it gets muddied:
The OGC Application Package includes BOTH a description and a definition (called executionUnit). My argument is that the executionUnit is the most important piece and as a whole the package should be considered a definition, as in the README and the PUT description. That is because you could often infer most or all of the description from the executionUnit. Also in the ASCII sequence diagram below:
and the other one. Note that as we discussed previously, a per-process OpenAPI description of a process would make a lot of sense for Part 1 (e.g. GET
I agree with you on this. For our implementation, we actually use I think the
Instead of even saying process definition for that sentence, I suggest explicitly using execution unit definition to avoid the possible description/definition confusion altogether. It is only the execution unit (CWL, etc.) that can be anything.
I've clarified above that the ambiguity is between description and definition.
Unfortunately here we currently have a clear mismatch between the POST/PUT and the GET.
The GET does describe the definition of what was POSTed, but it is a description, which is fundamentally different from the definition.
I would like to avoid using the word description to refer to this and call that a definition, to avoid confusion with the process description returned by the GET operation on the process resource.
I don't understand why you say that the Part 3 execution workflow does not have its workflow chain defined yet? The way I understand it, the Part 3 execution request workflow is the workflow chain. Some detailed aspects of it are resolved dynamically as part of the ad-hoc execution or deployment of the workflow (e.g. format & API negotiation for data exchange at a particular hop), but the overall chain is already defined.
I would point out that this also applies to the CWL included in the application package's execution unit. I will try to find time to address the other points you touched on in the message, but it's a busy week ;) |
@jerstlouis yup, you are right. I'll clean up the wording a bit. I think the correct statement is that you POST a "description" of a process (i.e. an application package that includes the process's definition) to the deployment end-point. |
@jerstlouis GET gets the definition of a process. There is no way to GET the description of the process where description is the definition PLUS other information an OAProc endpoint needs to be able to actually deploy a process. The description of the process is what we call the application package. Is everyone in agreement with this terminology? |
@pvretano It's the other way around :) You POST a definition at the deployment end-point. We don't yet have a GET operation for retrieving the definition, but I had suggested adding one. How about a separate definition resource for the process? |
Fine with me.
At the moment the request is submitted with details on how to chain I/O, the Workflow is not yet defined from the point of view of the API. After the contents are parsed, some workflow definition can be dumped to a file or a database, or held in memory by the runner that will execute it; only then does the workflow exist. I'm just pointing out that with a deployed workflow, the API doesn't even need to parse the payload; it is already aware of the full workflow definition. Because these different workflow interpretations happen at different times, it is important to properly identify them, to avoid the same kind of confusion as with the process description/definition. @pvretano |
Yikes. Stop! @jerstlouis @fmigneault Please chime in with ONE WORD answers. I don't want an essay! ;) |
GET /processes/{processId}? A description -- that's what it is called in Part 1. |
What do we call what you get from GET /processes/{processId}? description |
So, we GET a description and we POST a definition. I will update the terminology in Part 2 accordingly! OK? |
Excellent! Progress ... :) |
@fmigneault About:
The MOAW workflows (Part 3 execution request-based workflow definitions) can either be used to define deployable workflows and deployed with Part 2, or executed in an ad-hoc manner by POSTing them to an execution end-point -- both options are possible (separate capabilities: a server could support either or both). Both ad-hoc execution and deployed workflows could also make sense with CWL and OpenEO process graphs.
Part 3 defines the "ad-hoc workflow execution" capability as a way to allow using pre-deployed (local and/or remote) processes (i.e. NestedProcess/RemoteProcess) and (local and/or remote) collections (i.e. CollectionInput/RemoteCollection), which does not require the client to have access to deploy new processes. With the CollectionOutput capability, even an "ad-hoc workflow execution" can be POSTed only once, and data can be retrieved from it for many different regions / resolutions without having to POST the workflow for each process-triggering data request.
The selection of "outputs" is a capability already in the Core execution request. Nested processes is really the only extension for ad-hoc execution.
The DeployableWorkflows are what need the wiring of the inputs/outputs of the overall process being deployed to the inputs/outputs of the internal processes, so that is another extension specific to that capability. Still, in both cases it's the exact same execution request schema with very specific extensions. |
@jerstlouis |
Correct, plus Part 3 introduces the CollectionOutput and LandingPageOutput execution modes returning a collection description and landing page respectively (with client then triggering processing via data access requests, e.g. Coverages or Tiles).
I think we are lost in terminology again, because what I mean by "ad-hoc workflow execution" (POSTing directly to an execution end-point) is the polar opposite of "deployed workflow". However, in the case of CollectionOutput and LandingPageOutput, you could include a link to the "workflow definition" in the response. I imagine this link could also be included in the case of a job status / results document response.
The "ad-hoc workflow execution" is to avoid having to deploy it as a process. (e.g. there are fewer safety issue with executing already deployed processes vs. deploying new ones; or an EMS may only execute processes but not have ADES capabilities).
In the case of CollectionOutput and LandingPageOutput, the client just makes different OGC API data requests from the links in the response. In Sync / Async mode, the user cannot do that; they need to submit another ad-hoc execution (that's why it's an ad-hoc execution: no need to deploy first). Now in contrast to the "ad-hoc workflow execution", the "deployable workflow" is what you can deploy as a process, using Part 2. That can be done with CWL, or OpenEO, or a MOAW workflow (extended from the Processes - Part 1: Core execution request + nested processes + input/output wiring of the overall process to internal processes) in the execution unit. That execution unit can be included in an "OGC JSON Application Package", or be POSTed directly, with its own Content-Type, to the deployment end-point. Does that make things clearer? |
@jerstlouis So if I follow correctly, the deployment of this ad-hoc workflow could be defined and referenced by a link for description provided in LandingPageOutput, but there is no methodology or schema provided by Part 3 to indicate how this deployment would be done, nor even what a MOAW workflow definition would look like? (note: I don't consider the payload in the execution body a definition itself because it employs values, which cannot be deployed as-is to create a process description with I/O types. It's more like making use of the definition, but it would be wrong to have specific execution values in the process description). If that is the case, I don't think it is fair to say "The MOAW workflows [...] can either be used to define deployable workflows" if an example of a workflow definition inferred from the execution chain is not provided as an example. It seems to contradict "they need to submit another ad-hoc execution". What would a MOAW workflow even look like then when calling GET on it? |
It would look like the Part 1 execute request, with the following two extensions: nested "process" objects to chain processes, and the "input" / "output" properties to wire the inputs and outputs of the overall process to those of the internal processes.
I am not sure I understand your view on this... If you consider a workflow with a single hop, it is identical to a Processes - Part 1: Core execution request. If you have 1 nested process, the top-level process receiving the workflow acts as a Processes - Part 1: Core client for that nested process. So since it works for one hop, why wouldn't it work with any number of hops?
I don't understand what you mean by this... It seems like you might possibly be mixing up the execution request invoking the blackbox process vs. the execution request defining the workflow that invokes processes internally (not the blackbox process). Could that be the case? |
It is not really about the number of hops. There is no issue with the quantity of nested processes or how they connect to each other. When the ad-hoc workflow is submitted for execution, the values are embedded in the body (this is fine in itself, no problem).
{
"process": "url-top-most",
"inputs": {
"input-1": {
"process": "url-nested",
"inputs": {
"some-input": "<some-real-data-here raw|href|collection|...>"
},
"outputs": { "that-one": {} }
}
}
}
The problem happens when trying to explain the behaviour between Part 2 and Part 3. The above payload is not a direct definition. Let's say there was a way for the user to indicate that they want that exact chain to become process mychain.
If this |
If that example workflow is intended to be a DeployableWorkflow, and "some-input" is an input parameter left open to be specified when executing mychain, then it should use the "input" substitution for it, e.g.:
{
"process": "url-top-most",
"inputs": {
"input-1": {
"process": "url-nested",
"inputs": {
"some-input": { "input" : "myChainInput1" }
},
"outputs": { "that-one": { "output": "myChainOutput1" } }
}
}
}
That wires the "myChainInput1" input of the myChain blackbox to the "some-input" of the "url-nested" internal process (and the same for the output). A process description for myChain can be fully inferred from this, at least in terms of inputs / outputs (though things like title and description cannot be inferred without providing additional details). This is a DeployableWorkflow, so nothing to do with the "ad-hoc workflow execution" (which does not leave any input/output open, but provides values for all inputs). And to clarify again, ad-hoc workflow stands in opposition to deployed workflow: an ad-hoc workflow is POSTed directly to an execution end-point and executed without creating a new process, whereas a deployed workflow is deployed as a new process using Part 2 and can then be executed like any other process.
Does that help? |
Yes, that helped a lot. My next question is not about whether the process description can or cannot be inferred (it definitely can), but rather which approach between (1) and (2) in #279 (comment) must be undertaken? Is it safe to say that if mychain was deployed using the above payload, the inferred process description would look like this?
{
"id": "myChain",
"inputs": {
"myChainInput1": { "schema" : { "type": "string (a guess from 'some-input')" } }
},
"outputs": {
"myChainOutput1": { "schema": { "type": "string (a guess from 'that-one')" } }
}
}
But if the values were provided directly instead of "input" references (as in the ad-hoc example), the deployed process would then have no inputs at all? Also, to make sure, would the output of url-top-most itself be mapped to anything, given that only "that-one" from url-nested is wired to myChainOutput1? I think DeployableWorkflow and "ad-hoc workflow execution" could be considered as a whole, because I could take advantage of the similar structure to do both deploy+execute using this:
"inputs": {
"some-input": { "input" : "myChainInput1 (for deploy)", "value": "<some-data> (for execute)" }
}
Mapping from/to MOAW/CWL would then be very much possible. |
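As a very rough sketch of that mapping (hypothetical only: the url-nested.cwl / url-top-most.cwl references, the string/File types and the "result" output id are guesses, not anything defined by Part 2 or Part 3), the deployable myChain example might translate to a CWL Workflow along these lines:
{
   "cwlVersion": "v1.2",
   "class": "Workflow",
   "inputs": {
      "myChainInput1": "string"
   },
   "outputs": {
      "myChainOutput1": {
         "type": "File",
         "outputSource": "url-nested/that-one"
      }
   },
   "steps": {
      "url-nested": {
         "run": "url-nested.cwl",
         "in": { "some-input": "myChainInput1" },
         "out": [ "that-one" ]
      },
      "url-top-most": {
         "run": "url-top-most.cwl",
         "in": { "input-1": "url-nested/that-one" },
         "out": [ "result" ]
      }
   }
}
Here url-top-most's own output is not exposed as a workflow output, mirroring the wiring of the MOAW example above.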
Correct, but then the workflow is not really intended to be deployed as a process, as it does not accept any input. It would make more sense as an ad-hoc workflow execution, or POSTed as a persistent virtual collection to the collections end-point.
My thinking on this is relatively recent, since I only realized we were missing this output mapping capability. You are right that the top-level process would be pointless in this case, so for the example to make sense we should also specify another "output" from url-top-most, which would be a second output from mychain.
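For instance (a sketch only; the "result" output identifier of url-top-most is a hypothetical guess), the deployable workflow could expose a second output of mychain like this:
{
   "process": "url-top-most",
   "inputs": {
      "input-1": {
         "process": "url-nested",
         "inputs": {
            "some-input": { "input": "myChainInput1" }
         },
         "outputs": { "that-one": { "output": "myChainOutput1" } }
      }
   },
   "outputs": { "result": { "output": "myChainOutput2" } }
}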
Well yes the MOAW syntax is the same in both cases, and much the same as Part 1 as well -- re-usability was definitely the goal.
Awesome :)
"inputs": {
"some-input": { "input" : "myChainInput1 (for deploy)", "value": "<some-data> (for execute)" }
}
Would that ever happen in the same workflow though? With CollectionInput, |
@pvretano I've just seen the MulAdd example in the tiger team recordings. I think it would be a good first step to translate that into openEO to see how it compares. Can you point me to the example? I can't really read the URL in the video. Then I could do a quick crosswalk... |
@m-mohr In the meantime, for Mul and Add processes taking two operand inputs value1 and value2, it would look something like:
{
"process": "https://example.com/ogcapi/processes/Mul",
"inputs": {
"value1": 10.2,
"value2": {
"process": "https://example.com/ogcapi/processes/Add",
"inputs": {
"value1": 3.14,
"value2": 5.7
}
}
}
} |
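For a very rough comparison (a sketch only, not the fuller crosswalk discussed below; the node names are arbitrary and this assumes the standard openEO add and multiply processes), the same chain expressed as an openEO process graph might look something like:
{
   "process_graph": {
      "add1": {
         "process_id": "add",
         "arguments": { "x": 3.14, "y": 5.7 }
      },
      "mul1": {
         "process_id": "multiply",
         "arguments": { "x": 10.2, "y": { "from_node": "add1" } },
         "result": true
      }
   }
}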
Thanks, @jerstlouis, but I was looking at another example from @pvretano which had a lot more metadata included. The full example from him would be better to crosswalk as it shows more details. |
Following on from issue #278, the purpose of this issue is to capture examples of workflows from the various approaches (OpenEO, OAPIP Part 3, etc.), compare them and see where there is commonality and where there are differences. The goal is to converge on some conformance classes for Part 3.
Be specific with the examples, provide code if you can and try to make them not too long! ;)