Advertising data completeness #359

tpowellmeto · 2022-03-25T12:37:42Z

In many circumstances data can be incomplete. Users’ definition of completeness is based upon their specific use case and is therefore unlikely to be common.

EDR uses an extent object to describe the available axes from which a user must formulate their subset (cube) request. In scenarios where there are multiple axes described by interval arrays, e.g. vertical & temporal, it is impossible to determine where data is available and where it isn’t.

For example, consider how to accurately represent the following data availability whilst constrained to using only two lists:

            temporal_1  temporal_2  temporal_3
vertical_1  1           1           1
vertical_2  1           0           0
vertical_3  1           0           1

Given the above, there is a burden on data publishers to ensure data is ‘complete’ prior to publishing ‘interval’ labels. This forces data publishers to impose their own view of completeness on users, thus limiting timely access to data. In the case of fault/error there is a risk that a ‘complete’ state may never be achieved.
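To make the limitation concrete, here is a minimal Python sketch of the table above. The names `vertical`, `temporal` and `grid` are illustrative only and are not part of the EDR spec. It assumes a client reads two independent lists as advertising every combination of their values.

```python
from itertools import product

# Axis labels and availability grid copied from the table above.
# Rows are vertical levels, columns are temporal steps (1 = present, 0 = absent).
vertical = ["vertical_1", "vertical_2", "vertical_3"]
temporal = ["temporal_1", "temporal_2", "temporal_3"]
grid = [
    [1, 1, 1],
    [1, 0, 0],
    [1, 0, 1],
]

# Assumption: two independent lists imply the full cross product is available.
advertised = set(product(vertical, temporal))

# What is actually present according to the grid.
actual = {
    (v, t)
    for i, v in enumerate(vertical)
    for j, t in enumerate(temporal)
    if grid[i][j] == 1
}

# Combinations the two lists advertise but the publisher cannot serve.
over_advertised = sorted(advertised - actual)
print(over_advertised)
```

Under that assumption the two lists advertise nine combinations while only six exist. Dropping values from either list to avoid this would hide data that is present.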

Suggested Resolution
Extend the EDR spec to introduce a data_mask object.
The object itself is an optional property belonging to the parameter object.
The object has two required sub-properties:
• order - the order of the axes as they appear in the mask
• mask - a multidimensional array describing where data is available (1) and where it is not (0).
A convention could be established whereby if no data_mask is provided then it is assumed all data is available (as is the case now).
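As a rough sketch of how the suggested object might look and be read, the Python below encodes the example table as a `data_mask`. Only `data_mask`, `order` and `mask` come from the suggestion above. The axis names, the `axes` lookup, the parameter name and the `is_available` helper are hypothetical and used purely for illustration.

```python
# Hypothetical axis values, standing in for what the extent object advertises.
axes = {
    "vertical": ["vertical_1", "vertical_2", "vertical_3"],
    "temporal": ["temporal_1", "temporal_2", "temporal_3"],
}

# A parameter object carrying the optional data_mask property.
parameter = {
    "id": "example_parameter",  # hypothetical name
    "data_mask": {
        "order": ["vertical", "temporal"],
        "mask": [
            [1, 1, 1],
            [1, 0, 0],
            [1, 0, 1],
        ],
    },
}


def is_available(parameter, axes, **coords):
    """Return True if data is advertised as present at the given coordinates."""
    data_mask = parameter.get("data_mask")
    if data_mask is None:
        # Suggested convention: no mask means all data is available.
        return True
    cell = data_mask["mask"]
    # Walk the nested array one axis at a time, in the declared order.
    for axis in data_mask["order"]:
        cell = cell[axes[axis].index(coords[axis])]
    return cell == 1


print(is_available(parameter, axes, vertical="vertical_2", temporal="temporal_2"))
print(is_available(parameter, axes, vertical="vertical_3", temporal="temporal_3"))
```

Because `order` names the axes explicitly, the same lookup works for any number of axes, and a parameter without a `data_mask` behaves as it does today.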

I'd be happy to draft a proposal for this if this is the correct next step. :)

m-burgoyne · 2022-04-01T08:08:55Z

A data mask might also be useful when presenting information about data archives. It is possible that there would be time intervals where a subset of the parameters in a collection is intentionally missing or was unavailable, and a data mask would provide publishers with a mechanism to advertise the completeness of the archive.

chris-little · 2022-04-14T21:00:30Z

@tpowellmeto @m-burgoyne Do not forget that the presumption in the API-EDR is that the use case is for data that are generally dense, not sparse, and a query is most likely to return data rather than an HTTP error code.
I question whether it is a sensible choice by the data service provider to expose a collection where the data is often absent. Why not use an async mechanism to say "wait a bit then try again"?

tpowellmeto · 2022-04-19T11:24:46Z

@chris-little

"I question whether it is a sensible choice by the data service provider to expose a collection where the data is often absent."
We should be empowering service providers to group data in meaningful ways, not constraining them with arbitrary rules on data density.

Whilst data may be dense when 'complete', in many situations it arrives piecemeal. Being able to accurately describe what is present, and when, is a mechanism for allowing users to access just the data they need in a timely way, whilst telling other users to "wait a bit then try again" without responding with an HTTP error code.

m-burgoyne · 2022-04-25T08:51:05Z

@chris-little As long as the data_mask object is optional and only published at the /collections/{collection_id} level, it could reduce the overheads on a data publishing server, as it would reduce the number of requests clients make to the server. Ideally a server would only return error messages when the user is making an invalid request, and in the example that @tpowellmeto gives the user would not be making an invalid request based on the information provided by the server.
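A minimal sketch of the client-side saving described here, in Python. It assumes the client has already fetched a `data_mask` from the collection metadata. The `plan_requests` helper, the axis names and the `axes` lookup are hypothetical and not part of any EDR specification.

```python
def plan_requests(data_mask, axes, wanted):
    """Split wanted coordinates into (send_now, defer) using the mask."""
    send_now, defer = [], []
    for coords in wanted:
        cell = data_mask["mask"]
        # Index the nested mask one axis at a time, in the declared order.
        for axis in data_mask["order"]:
            cell = cell[axes[axis].index(coords[axis])]
        (send_now if cell == 1 else defer).append(coords)
    return send_now, defer


# Hypothetical axis values and mask, matching the table in the first comment.
axes = {
    "vertical": ["vertical_1", "vertical_2", "vertical_3"],
    "temporal": ["temporal_1", "temporal_2", "temporal_3"],
}
data_mask = {
    "order": ["vertical", "temporal"],
    "mask": [[1, 1, 1], [1, 0, 0], [1, 0, 1]],
}

# A client wants every time step at vertical_3.
wanted = [{"vertical": "vertical_3", "temporal": t} for t in axes["temporal"]]
send_now, defer = plan_requests(data_mask, axes, wanted)
print([c["temporal"] for c in send_now])
print([c["temporal"] for c in defer])
```

Only the coordinates in `send_now` would be requested from the server. Those in `defer` can be retried after the collection metadata is next refreshed, so no request has to fail in order for the client to learn that data is absent.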

chris-little · 2022-04-28T13:00:42Z

@m-burgoyne @tpowellmeto Well, let's try it to see how effective it is for the described use cases. I worry that we are encouraging over-complicated systems and carrying forward undesirable legacies of WMO GRIB and BUFR and creating unnecessary future technical debt. If it works well in practice, we can then standardise it. Meanwhile, it is a good candidate to add to the Best Practice for API-EDR for Meteorology.

chris-little · 2022-07-28T14:28:57Z

EDR API SWG 81 encourages implementors to build a proof-of-concept. Provisionally tagged for V1.2.

iandruska-ibl · 2022-08-17T09:53:51Z

If we decide to add data masks, I believe we should remove the extent property at the parameter level. With data masks it becomes unnecessary and would only cause confusion about whether the data mask applies to the extent at the collection level or the one at the parameter level.

chris-little · 2022-08-17T10:33:32Z

This may cause an incompatibility with API Coverages as they have put a lot of effort into extents of various kinds.
@jerstlouis could you comment please?

jerstlouis · 2022-08-17T11:12:18Z

@chris-little I don't think this use case of different domain / extent / envelope per field is covered in Coverages / CIS, but it is somewhat related to a suggestion to be able to return a different domain (different resolution and/or envelope) when requesting specific fields (range subsetting).

As a general point, I still hope that we can eventually harmonize parameter names / Features properties schemas / CIS range type; and collection extents / CIS DomainSet :)

chris-little · 2023-11-23T15:56:12Z

Discussed at EDR API SWG 2023-11-23 that this needs wider review from implementers and users.

chris-little · 2024-06-19T20:33:34Z

Current thinking is that this should be addressed by API-EDR Part 2: PubSub. Anything finer grained, such as this suggestion, is an unnecessary complication. There may be a problem with downstream legacy systems.

chris-little · 2024-10-31T18:51:58Z

After the EDR API SWG 123 meeting on 31 Oct 2024, the publication of OGC API-EDR Part 2: Publish-Subscribe Workflow on 2024-09-23, and pending any practical implementations and experience of data masks and granularity of data resources, I propose to close this issue and associated PR #470.

m-burgoyne added the enhancement New feature or request label Mar 30, 2022

chris-little added the API EDR V1.2 Non-breaking change for Version 1.2 label Jul 28, 2022

m-burgoyne linked a pull request Mar 21, 2024 that will close this issue

Add support for data masks #470

Closed

chris-little added API-EDR V1.3 and removed API EDR V1.2 Non-breaking change for Version 1.2 labels Jun 19, 2024

chris-little closed this as completed Oct 31, 2024
