Coverage Processing Language #146

Open
ghobona opened this issue Jul 22, 2021 · 28 comments
Labels
2021-07 Sprint Extension Will be addressed by a future extension

Comments

@ghobona
Contributor

ghobona commented Jul 22, 2021

The discussion today suggested the need for a simple OGC Coverage Processing Language.

This could be based on the simple expression languages of QGIS, GRASS.

@jerstlouis
Member

Related to #108

Defining a simple expression language that can express the values for desired bands, with some capabilities for aggregation, would be very useful.

It was also discussed that Workflows allow defining any number of "coverage processors" that could, e.g., support exactly the processing languages of QGIS, GRASS, etc. These would profile OGC API - Processes by defining exactly the inputs and behavior of a process that implements a specific type of coverage processor.

This could also be one way to integrate WCPS, e.g. as we did with:

http://maps.ecere.com/ogcapi/processes/WCPSAdapter

WCPS can also be integrated directly within an OGC API as discussed in #97 .

In addition to full processing capabilities, we identified that filtering at both the scene/image level and the individual cell level would be useful as a simple type of processing, e.g., to filter out cloudy scenes (discussed as part of #105) or cells (#103).

These could be separate types of filters (e.g. scene-filter= and filter=) that could both use e.g. CQL (and potentially support a filter-lang parameter as well for alternate languages to express them).
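As a rough sketch of how the two filters might appear in a request, the following builds a query string. The server URL, collection name and property names (`cloud_cover`, `B4`) are hypothetical, and `scene-filter`, `filter` and `filter-lang` are only the parameter names proposed above, not an adopted standard.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint and collection; not a real service.
BASE = "https://example.com/ogcapi/collections/sentinel2/coverage"

def build_filtered_request(scene_filter, cell_filter, lang="cql2-text"):
    """Return a coverage request URL carrying both proposed filters."""
    params = {
        "scene-filter": scene_filter,  # evaluated once per scene/image
        "filter": cell_filter,         # evaluated for every cell
        "filter-lang": lang,           # language used by the expressions
    }
    return BASE + "?" + urlencode(params)

url = build_filtered_request("cloud_cover < 20", "B4 > 0")
print(url)

# Decode the query again to check that the expressions survive the round trip.
print(parse_qs(urlsplit(url).query))
```

Keeping the two expressions in separate parameters lets a server discard whole scenes before it evaluates anything per cell.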

@ghobona ghobona transferred this issue from opengeospatial/ogcapi-code-sprint-2021-07 Oct 5, 2021
@jerstlouis jerstlouis added the Extension Will be addressed by a future extension label Mar 30, 2022
@jerstlouis
Member

Most of these capabilities could actually be handled by query parameters as described in #164.

@m-mohr

m-mohr commented Jul 3, 2023

I'd try to avoid a new language, but instead use and potentially extend what is already available:

  • WCPS (Adoption? Simplicity?)
  • openEO (Different terminology?)
  • CQL2 (boolean expressions only?)
  • ...?

Maybe this can also somehow be solved with Processes - Part 3 and its various proposed encodings?

@jerstlouis
Member

jerstlouis commented Jul 3, 2023

@m-mohr As per the comment above referencing #164 , at this point I feel like starting from CQL2 (and extending as needed) is really the simplest solution, though there is still some work to clarify and define aggregation capabilities.
In Processes - Part 3, there is also an "Input / Output Fields modifiers" requirements class where CQL2 can be used to define filter, derived fields or sortby to reshape the inputs or outputs of a process.

OpenEO process graphs could also be used as an alternative per the Processes - Part 3 / OpenEO requirements class (relying on the existence of those basic operation processes), but this results in a much more verbose "expression" than CQL2.

Although the first version of CQL2 is described as limited to boolean expressions for use with OGC API - Features - Part 3, to avoid further delaying publication, other uses may extend it beyond that, and a future version of CQL2 itself may remove that restriction (discussed in opengeospatial/ogcapi-features#723 ).

See the note to this effect in Section 1 - Scope.

We are already relying heavily on that capability for OGC Styles & Symbology 2.0 for both the CartoSym-JSON and CartoSym-CSS encodings, and we are planning to use it in extensions to other OGC API standards including DGGS and Maps ( opengeospatial/ogcapi-maps#110 ). We also already implemented CQL2 expressions for derived fields in our implementation of OGC API - Coverages, and filters (for Maps and Coverages in addition to Features).

@pebau
Contributor

pebau commented Jul 3, 2023

@m-mohr yes, a good question indeed:

I'd try to avoid a new language, but instead use and potentially extend what is already available:

* WCPS (Adoption? Simplicity?)

WCPS is OGC's geo datacube analytics language, so preferred. Also EU INSPIRE and recently ISO 19123-3 adopt it.
See this intro: https://earthserver.eu/wcs

* openEO (Different terminology?)

less powerful than WCPS, and harder to work with: you need either the visual representation (not machine readable) or the JSON representation (not what humans enjoy; WCPS is way more compact and easier to read).
BTW, openEO can be mapped to WCPS, that is implemented by Eurac.

* CQL2 (boolean expressions only?)

too restricted. If you want to do serious things you will go and extend it, and then we are outside of any standard.

* ...?

non-standard ;-)

Maybe this can also somehow be solved with Processes - Part 3 and its various proposed encodings?

I am not aware of any coverage-related functionality there.

Bottom line: use WCPS. It was foreseen for OAPI-Coverages right from the beginning BTW.

@m-mohr

m-mohr commented Jul 3, 2023

@pebau

WCPS is OGC's geo datacube analytics language, so preferred. Also EU INSPIRE and recently ISO 19123-3 adopt it. See this intro: https://earthserver.eu/wcs

To repeat the question from this morning (via e-mail):
Could you point me to one or two WCPS implementations (ideally open source) apart from rasdaman? I'm especially looking for parsers in Python and JavaScript so that we can run some experiments. Adoption is software and usage, not standards pointing at other standards.

* openEO (Different terminology?)

less powerful than WCPS, and harder to work with: you need either the visual representation (not machine readable) or the JSON representation (not what humans enjoy; WCPS is way more compact and easier to read).

I can't agree with this.

  1. It is not less powerful as it's extensible. Everything you can do in WCPS can be expressed and implemented in openEO.
  2. Why is it harder to work with if there's easy client software in Python, R, JS and a web UI while for WCPS you need to learn a new programming language? To me it seems the opposite is true, WCPS looks more difficult to work with. In openEO you just write code in a language that you are used to or use a block-building web UI. I haven't seen any of this for WCPS. Can you point me to something like that?
* ...?

non-standard ;-)

This was asking for other standards and specifications to be added. Disregarding everything else out there seems not very solution oriented.

@jerstlouis
Member

jerstlouis commented Jul 3, 2023

@pebau

Maybe this can also somehow be solved with Processes - Part 3 and its various proposed encodings?
I am not aware of any coverage-related functionality there.

Processes - Part 3 defines a number of things, all applicable to coverages:

  • The ability to use a collection (e.g., a coverage) as an input to any process (executable with Processes - Part 1 / possibly deployed with Processes - Part 2), or to a workflow defined with Processes - Part 3
  • The ability of the output of a process to be the input to another process (that output can of course be a coverage)
  • The ability to request the output of a process using OGC API - Coverages, triggering a whole chain of processes and accessing input collections (possibly also coverages, but also to integrate with e.g., Feature Collections) for the Area/Time/Resolution of interest of the Coverages request (i.e., taking subset=, scale-factor= etc. into consideration)
  • The ability to derive fields or filter by value the output or input of a process (which can be a coverage of course) using an expression such as one defined with CQL2

[re: CQL2] too restricted. If you want to do serious things you will go and extend it, and then we are outside of any standard.

For most of what WCPS defines, there is very little missing from CQL2 as it stands, other than removing the artificial restriction that it needs to return a boolean value, and defining aggregation functions. CQL2 already supports custom functions, so it is more about standardizing those functions than the language itself. We already use CQL2 in Styles & Symbology 2.0 to style coverages.

Bottom line: use WCPS.

Strong disagreement there.

It was foreseen for OAPI-Coverages right from the beginning BTW.

It can be supported alongside OGC API - Coverages as per #97 .
But because it breaks fundamental Web API Guidelines 9 and 10 in particular (and probably others like 1 and 2), I do not think it should be the recommended approach to coverage processing for the OGC API family of standards.

BTW, openEO can be mapped to WCPS, that is implemented by Eurac.

I believe it is WCPS that was mapped to OpenEO and not the other way around.
As far as I understand, OpenEO is more flexible.

not what humans enjoy; WCPS is way more compact and easier to read.

I (think I) am a human, and I do not enjoy reading WCPS, and I find it absolutely impossible to read.

@m-mohr

m-mohr commented Jul 3, 2023

@jerstlouis The primary issue I have with CQL2 and OGC API - Processes is, as you also acknowledge, the lack of specific defined processes for data cube processing. Only openEO and WCPS really define a list of processes. Some parts surely can be done via query parameters etc., but that's really just for the basics. Defining such processes is a major task (as I can tell after defining many for openEO) and shouldn't be underestimated. As such, it would be great if we could try to not have two of a kind here.

Without specific processes people need to go into too many technical details as Processes and CQL are generic and not datacube specific. They need to make sure datacubes get in and out and how to handle them.

I believe it is WCPS that was mapped to OpenEO and not the other way around.

Regarding openEO <-> WCPS mapping: There's an implementation that accepts openEO process graphs and translates to WCPS. I'm not sure how complete the implementation is. I have not seen any implementations that convert WCPS to openEO or anything else. That's why I've asked for any (parser) implementations above, e.g. in Python or JS. The only one I could find is the C++ implementation from rasdaman, but I'd assume that an adopted standard has to have more than one implementation, right? If that's not the case then there's a flaw in the standardization process, in my opinion.

I (think I) am a human, and I do not enjoy reading WCPS, and I find it absolutely impossible to read.

Not impossible to read, but I also find it rather difficult and don't enjoy it.

@jerstlouis
Member

jerstlouis commented Jul 3, 2023

Defining such processes is a major task (as I can tell after defining many for openEO) and shouldn't be underestimated

Is there a list of those things that we could go through?
More complex processing operations, and processing that is not local to a particular cell, could be standardized in a Well Known Processes registry from where they could be identified by URI.

In the Coverages / CQL2 / Part 3 approach, some operations for which openEO may need a process do not require one:

  • Domain Subsetting (using Coverages subset=)
  • Range Subsetting (using Coverages properties=)
  • Down/Super-sampling (using Coverages scale-factor=)
  • Deriving fields using a CQL2 expression (including arithmetic, logic, comparisons, function calls)
  • Accessing a "collection" as an input
  • Negotiating a particular output format (done through HTTP Accept header)
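To make the list above concrete, here is a minimal sketch that combines these building blocks in a single request. The server, collection, axis names, band names and media type are assumptions for illustration; only the `subset`, `properties` and `scale-factor` parameter names and the use of the `Accept` header come from the discussion.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical server and collection, used only for illustration.
COVERAGE_URL = "https://example.com/ogcapi/collections/landsat8/coverage"

def coverage_request(subset, properties, scale_factor, media_type):
    """Build (but do not send) a coverage request."""
    query = urlencode({
        "subset": subset,              # domain subsetting
        "properties": properties,      # range subsetting or a derived field
        "scale-factor": scale_factor,  # resampling relative to native resolution
    })
    # The output format is negotiated through the HTTP Accept header.
    return Request(COVERAGE_URL + "?" + query, headers={"Accept": media_type})

req = coverage_request(
    subset="Lat(40:41),Lon(-75:-74)",
    properties="(B5-B4)/(B5+B4)",      # derived NDVI-style field
    scale_factor=2,
    media_type="image/tiff; application=geotiff",
)
print(req.full_url)
print(req.get_header("Accept"))
```

None of these steps needs a named process. Everything is carried by query parameters and one header on an ordinary GET request.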

Aggregation needs some work (some ideas already in #164) for the more complex cases (e.g., aggregating differently on different dimensions).

Is there something else supported by OpenEO and/or WCPS that would be a fundamental capability and could not easily be defined as a process that takes a coverage as an input (and potentially additional parameters), and returns a coverage as an output?

An important ability of Processes is also the ability to integrate Coverages and Feature Collections (not restricting inputs only to coverage type data).

There's an implementation that accepts openEO process graphs and translates to WCPS. I'm not sure how complete the implementation is.

If I recall correctly from talking to Alex, some of the newer / more advanced openEO capability cannot be translated to WCPS.

Not impossible to read

OK, that was hyperbole ;) I did manage to read and understand some simpler WCPS, but I really struggle with more complex ones.

@pebau
Contributor

pebau commented Jul 3, 2023

Not impossible to read, but I also find it rather difficult and don't enjoy it.

We all have our opinions, and that's just fair. I, for example, do not at all enjoy JSON and how it forces me to do straightforward things in highly convoluted ways...

@m-mohr

m-mohr commented Jul 3, 2023

@pebau Is that the full answer to the questions above? I'm still looking for implementations of WCPS.

Without any additional implementation, I have to assume that there is only rasdaman as an implementation for WCPS, which in my opinion pretty much nukes out WCPS as a serious alternative for any geodatacube standard going forward.

Having in mind that OGC requires adoption of a standard (i.e. multiple implementations), I'm wondering how WCPS has passed this requirement. Seems like something in the standardization process is not working correctly.

@pebau
Contributor

pebau commented Jul 3, 2023

I'd assume that an adopted standard has to have more than one implementation

https://www.ogc.org/resources/uncertified-products/?&specid=347

I am trying to follow implementations, but the OGC site seems down currently:
http://external.opengeospatial.org/twiki_public/CoveragesDWG/WebHome

@m-mohr

m-mohr commented Jul 3, 2023

Thanks for linking to the implementations. It seems they are all closed-source except for the C++ rasdaman code. This makes it pretty difficult to actually experiment with it in the Testbed.

@chris-little

@m-mohr @jerstlouis @pebau As an aside, the API EDR SWG is considering the issue of using queries to provide limited manipulation of data, which is often a coverage. Simple summary statistics such as mean, max, etc. are straightforward, and we think OGC cross-API standardisation, or at least consistency, is achievable. Mitigating the threat of DDoS also seems feasible.
For further 'processing', for which we have concrete use cases and unsatisfied demand, such as data at a specific resolution or more demanding derived statistics, we are thinking of how to use CQL. Again, a cross-API approach seems realistic.

@jerstlouis
Member

jerstlouis commented Jul 4, 2023

@chris-little That is music to my ears ;)

For specific resolution, I believe this is typically orthogonal functionality to CQL queries for filtering or computing new values, which typically specify a "per cell/pixel" calculation.

We already have a number of mechanisms for super/down-sampling in OGC API - Coverages and Maps, so I suggest sticking to these existing parameter building blocks for that purpose:

  • scale-factor, scale-size, scale-axes (from Coverages)
  • width, height, scale-denominator (from Maps)

There may be more advanced scenarios where the down-sampling is actually done as part of the aggregation/condensing. It would be interesting to research this in more detail, including how this currently works in WCPS and openEO.

The whole question on how to / whether we can use CQL2 functions for these aggregation / condensing purposes is really something of interest to discuss.

@chris-little

chris-little commented Jul 5, 2023

@jerstlouis I have a problem with

scale-factor, scale-size, scale-axes (from Coverages)
width, height, scale-denominator (from Maps)

As they assume the existence of map artefacts. EDR is about data. There are no assumptions about scale, width or height, etc. The data is requested and returned at specific (direct) positions in a CRS.

For our use cases and specific queries (e.g. area, cube, corridor), data may be returned at positions not explicitly stated in the query.

The Maps/Coverages terminology is just confusing.

I will try to attend at 14:00 UTC (10:00 EDT) to discuss.

@jerstlouis
Member

EDR is about data

Definitely, so is Coverages and Features ( see opengeospatial/ogcapi-features#654 ).
scale-denominator and zoom-level are proposed for Features.

It is true that the relationship between the scale-denominator and the distance between two cells is based on a display mmPerPixel in Maps (which traditionally was always 0.28 mm/pixel but can now also be specified by a parameter).
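The relationship just described can be worked through numerically. This is a minimal sketch assuming a CRS whose units are metres; the function and constant names are illustrative, not taken from any specification.

```python
# Traditional display pixel size mentioned above, in millimetres.
STANDARD_MM_PER_PIXEL = 0.28

def cell_size_metres(scale_denominator, mm_per_pixel=STANDARD_MM_PER_PIXEL):
    """Ground distance between adjacent cells implied by a map scale.

    One display pixel of `mm_per_pixel` millimetres covers
    `scale_denominator` times that distance on the ground.
    """
    return scale_denominator * mm_per_pixel / 1000.0

# A 1:100,000 scale corresponds to roughly 28 m between cells.
print(round(cell_size_metres(100_000), 6))

# The same scale with a finer, explicitly specified display pixel size.
print(round(cell_size_metres(100_000, mm_per_pixel=0.14), 6))
```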

In OGC API - Tiles we defined cellSize in the description of the data.
This might be another option to consider.

The parameters from Coverages do not rely on any concept of display resolution or maps concept:

  • scale-factor is relative to the "true" resolution of the data (1 is native resolution, 2 is half the number of values across all axes)
  • scale-size allows specifying the number of desired cells across each axis (mostly equivalent to width and height in Maps if considering a 2D spatial coverage)
  • scale-axes is like scale-factor except it allows specifying a different factor for each axis.

These 3 parameters and how they behave are inherited from WCS, so they are all about data.
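The behaviour of the three parameters can be sketched as plain arithmetic on cell counts. The axis names and native sizes below are made up, and rounding up when a size does not divide evenly is an assumption of this sketch rather than something stated in the thread.

```python
import math

def apply_scale_factor(native_sizes, factor):
    """scale-factor: one factor for every axis (2 halves the cell count)."""
    return {axis: math.ceil(n / factor) for axis, n in native_sizes.items()}

def apply_scale_axes(native_sizes, factors):
    """scale-axes: a factor per axis; unlisted axes keep native resolution."""
    return {axis: math.ceil(n / factors.get(axis, 1))
            for axis, n in native_sizes.items()}

def apply_scale_size(native_sizes, sizes):
    """scale-size: requested cell count per axis; unlisted axes unchanged."""
    return {axis: sizes.get(axis, n) for axis, n in native_sizes.items()}

# Hypothetical native dimensions of a spatio-temporal coverage.
native = {"Lat": 1000, "Lon": 2000, "time": 12}

print(apply_scale_factor(native, 2))
print(apply_scale_axes(native, {"Lat": 2, "Lon": 4}))
print(apply_scale_size(native, {"Lat": 256, "Lon": 256}))
```

None of the three functions refers to a display resolution. They only relate the requested number of cells to the native number of cells.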

@pebau
Contributor

pebau commented Jul 5, 2023

Thanks for linking to the implementations. It seems they are all closed-source except for the C++ rasdaman code. This makes it pretty difficult to actually experiment with it in the Testbed.

depends on what you want to experiment with: change the syntax? or just run it? The latter is well possible without diving into the source code. BTW, WCPS is implemented in Java - HTH.

@m-mohr

m-mohr commented Jul 5, 2023

depends on what you want to experiment with: change the syntax? or just run it? The latter is well possible without diving into the source code. BTW, WCPS is implemented in Java - HTH.

Parse WCPS and translate/run in any other environment. Translate to openEO, run in geopyspark, run in xarray/dask, etc. That's what WCPS is meant to do, right? Implementation-independent execution of coverage/datacube workflows. So I'd basically like to hook into the execution steps.

@cnreediii
Contributor

@chris-little This is quite the thread to follow :-) I would like to underscore something Chris said: the generation of statistics and related functional capability is a cross-cutting standards issue, and hopefully it is addressed as such. We do not need non-interoperable stovepipes :-) I am wondering whether any consideration has been given to using Dana Tomlin's map algebra as an abstract starting point? QGIS, Esri products and many others use the map algebra concepts in implemented, deployed products. This may or may not be germane to datacube processing requirements. Thanks for listening.

@jerstlouis
Member

@cnreediii Thank you for pointing that out.
I think in general this is exactly what using a CQL2 expression for the properties= is all about (and a lot of WCPS as well), with the ability to use logical and arithmetic operators.

The four types of operators described at https://en.wikipedia.org/wiki/Map_algebra are insightful:

  • Local Operators (working on a single cell, the default)
  • Focal operators (neighborhood of each cell e.g., for derivatives like calculating a slope)
  • Zonal operators working on all regions of identical values
  • Global operators summarizing the whole input coverage
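The four operator types can be illustrated on a tiny grid. This is a plain-Python sketch with made-up values and zone identifiers. The zonal operator here groups cells by a separate zone grid, which is one common reading of "regions of identical values".

```python
# A 3x3 grid of values and a zone grid of the same shape (both invented).
values = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
zones = [["a", "a", "b"],
         ["a", "b", "b"],
         ["c", "c", "b"]]

def local_op(grid, fn):
    """Local: apply fn to each cell independently."""
    return [[fn(v) for v in row] for row in grid]

def focal_mean(grid):
    """Focal: mean of each cell's 3x3 neighbourhood, clipped at the edges."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out

def zonal_sum(grid, zone_grid):
    """Zonal: one sum per zone identifier."""
    totals = {}
    for row, zone_row in zip(grid, zone_grid):
        for v, z in zip(row, zone_row):
            totals[z] = totals.get(z, 0) + v
    return totals

def global_max(grid):
    """Global: a single value summarizing the whole grid."""
    return max(v for row in grid for v in row)

print(local_op(values, lambda v: v * 2))
print(focal_mean(values))
print(zonal_sum(values, zones))
print(global_max(values))
```

Only the local operator maps directly onto a per-cell expression. The other three need some notion of neighbourhood, zone or aggregation.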

@chris-little

I will raise a couple of issues in the API EDR repo:

  1. let's agree a list of 'summary stats'
  2. we'll review WCS terminology of scale-factor, scale-size, scale-axes in our resolution choice/interpolation work strand.

@jerstlouis
Member

Thanks @chris-little
For aggregation, I think what would be really interesting is describing the complex aggregation scenarios that would be useful and are already supported in WCPS and/or openEO.

For example,

  • Compute the NDVI at the local cell level, then
  • Compute the maximum NDVI value within a month, reducing the temporal dimension to monthly values.

See this comment #164 (comment)

This is also related to the DAPA aggregate query parameter.

I am also trying to understand how the aggregators / condensers in WCPS work and which use cases they support.

The syntax I had suggested so far (to extend CQL2) was:

  • aggregation functions (Min, Max, Sum, Avg, StdDev...),
  • with a potentially optional parameter to specify one or more dimensions over which to aggregate, and
  • possibly also a resolution for each of these dimensions to reduce to (as opposed to a single value).

e.g.,

Sum( Min((B5-B4)/(B5+B4), time, month), space)

would aggregate the minimum NDVI on a monthly basis, then return a sum of all values aggregated to a single value over the spatial dimensions.
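One possible reading of that expression, sketched in plain Python on a toy datacube. The dates and band values are invented, and the functions only mimic the proposed `Min(..., time, month)` and `Sum(..., space)` semantics as described in the sentence above. They do not implement CQL2.

```python
from collections import defaultdict
from datetime import date

# Toy datacube: one (B4, B5) grid per acquisition date, 1 row x 2 columns.
observations = {
    date(2023, 6, 3):  {"B4": [[0.2, 0.1]], "B5": [[0.6, 0.3]]},
    date(2023, 6, 18): {"B4": [[0.3, 0.1]], "B5": [[0.5, 0.5]]},
    date(2023, 7, 8):  {"B4": [[0.1, 0.2]], "B5": [[0.7, 0.2]]},
}

def ndvi(b4, b5):
    """Local, per-cell expression: (B5-B4)/(B5+B4)."""
    return (b5 - b4) / (b5 + b4)

def monthly_min_ndvi(obs):
    """Min(<ndvi>, time, month): cell-wise minimum within each month."""
    by_month = defaultdict(list)
    for day, bands in obs.items():
        grid = [[ndvi(b4, b5) for b4, b5 in zip(row4, row5)]
                for row4, row5 in zip(bands["B4"], bands["B5"])]
        by_month[(day.year, day.month)].append(grid)
    # zip(*grids) pairs up rows across dates; zip(*rows) pairs up cells.
    return {month: [[min(cells) for cells in zip(*rows)]
                    for rows in zip(*grids)]
            for month, grids in by_month.items()}

def spatial_sum(grids_by_month):
    """Sum(..., space): collapse the spatial axes to one value per month."""
    return {month: sum(v for row in grid for v in row)
            for month, grid in grids_by_month.items()}

result = spatial_sum(monthly_min_ndvi(observations))
for month, total in sorted(result.items()):
    print(month, round(total, 3))
```

The inner aggregation keeps the spatial axes and coarsens time. The outer one then removes the spatial axes, leaving one value per month.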

@m-mohr

m-mohr commented Jul 5, 2023

@jerstlouis openEO can do various aggregations.

  • local (single cell, but that's not really an aggregation)
  • focal (nD neighborhoods, windows, intervals, ...)
  • zonal (for geometries / bboxes)
  • along a dimension (e.g. temporal, spectral, ...)
  • ...

The aggregation functions include:

  • all / any
  • array functions
  • count
  • first / last
  • min / max
  • median
  • mean
  • product
  • sum
  • sd (standard deviation)
  • variance
  • ...

See https://processes.openeo.org and https://openeo.org/documentation/1.0/datacubes.html for details.
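For comparison with the expression-style syntax discussed earlier, here is a rough sketch of how an aggregation along a dimension looks as an openEO process graph, written as a Python dictionary and serialized to JSON. The collection id, extents, band names and the dimension name `t` are placeholders that vary by back-end. The processes `load_collection`, `reduce_dimension` and `max` are taken from the openEO process listing linked above.

```python
import json

# Collection id, extents and band names are hypothetical placeholders.
process_graph = {
    "load": {
        "process_id": "load_collection",
        "arguments": {
            "id": "SENTINEL2_L2A",
            "spatial_extent": {"west": 16.1, "south": 48.1,
                               "east": 16.6, "north": 48.6},
            "temporal_extent": ["2023-06-01", "2023-07-01"],
            "bands": ["B04", "B08"],
        },
    },
    "reduce_time": {
        "process_id": "reduce_dimension",
        "arguments": {
            "data": {"from_node": "load"},  # output of the node above
            "dimension": "t",               # assumed name of the time axis
            "reducer": {
                # Child graph applied to the values along the dimension.
                "process_graph": {
                    "max": {
                        "process_id": "max",
                        "arguments": {"data": {"from_parameter": "data"}},
                        "result": True,
                    }
                }
            },
        },
        "result": True,  # marks the node whose output is returned
    },
}

print(json.dumps(process_graph, indent=2))
```

The aggregation function is itself a small nested graph. That makes the approach extensible, but also more verbose than a one-line expression.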

@jerstlouis
Member

jerstlouis commented Jul 5, 2023

Thanks @m-mohr , this is exactly the kind of capability for which it would be useful to do a WCPS / openEO side-by-side comparison, and identify the common and/or essential capabilities, and figure out whether there is a way to express this as CQL2 expressions invoking aggregation functions.

@pebau
Contributor

pebau commented Jul 6, 2023

@cnreediii indeed, as has been pointed out already, WCPS supports all Tomlin operations, plus aggregation, in all combinations. Moreover, the semantics are defined formally so that all implementations return the same result.

@pebau
Contributor

pebau commented Jul 6, 2023

@m-mohr well, the rasdaman WCPS implementation is fully open source, so nothing hinders you from "playing with it".

@pebau
Contributor

pebau commented Sep 25, 2023

As we are just discussing this in the GDC.SWG again: suggesting to have a WCPS conformance class for coverage processing, in addition to and at the same level as openEO and CQL2 - to make it easy for the users to pick their preference.
