Advertising data completeness #359
Comments
A data mask might also be useful when presenting information about data archives. There may be time intervals in which a subset of the parameters in a collection is intentionally missing or was unavailable, and a data mask would give publishers a mechanism to advertise the completeness of the archive. |
@tpowellmeto @m-burgoyne Do not forget that the presumption in the API-EDR is that the use case is for data that are generally dense, not sparse, and a query is most likely to return data rather than an HTTP error code. |
"I question whether it is a sensible choice by the data service provider to expose a collection where the data is often absent." Whilst data may be dense when 'complete', in many situations it arrives piecemeal. Being able to accurately describe what is present, and when, is a mechanism for allowing users to access just the data they need in a timely way, whilst telling other users to "wait a bit then try again" without responding with an HTTP error code. |
@chris-little As long as the |
@m-burgoyne @tpowellmeto Well, let's try it to see how effective it is for the described use cases. I worry that we are encouraging over-complicated systems and carrying forward undesirable legacies of WMO GRIB and BUFR and creating unnecessary future technical debt. If it works well in practice, we can then standardise it. Meanwhile, it is a good candidate to add to the Best Practice for API-EDR for Meteorology. |
EDR API SWG 81 encourages implementors to build a proof-of-concept. Provisionally tagged for V1.2. |
If we decide to add the data masks, I believe we should remove the |
This may cause an incompatibility with API Coverages as they have put a lot of effort into extents of various kinds. |
@chris-little I don't think this use case of different domain / extent / envelope per field is covered in Coverages / CIS, but it is somewhat related to a suggestion to be able to return a different domain (different resolution and/or envelope) when requesting specific fields (range subsetting). As a general point, I still hope that we can eventually harmonize parameter names / Features properties schemas / CIS range type; and collection extents / CIS DomainSet :) |
Discussed at EDR API SWG 2023-11-23 that this needs wider review from implementers and users. |
Current thinking is that this should be addressed by API-EDR Part 2: PubSub. Anything finer grained, such as this suggestion, is an unnecessary complication. There may be a problem with downstream legacy systems. |
After the EDR API SWG 123 meeting on 31 Oct 2024, the publication of OGC API-EDR Part 2: Publish-Subscribe Workflow on 2024-09-23, and pending any practical implementations and experience of data masks and granularity of data resources, I propose to close this issue and associated PR #470. |
In many circumstances data can be incomplete. Users’ definition of completeness is based upon their specific use case and is therefore unlikely to be common.
EDR uses an `extent` object to describe the available axes from which a user must formulate their subset (cube) request. In scenarios where there are multiple axes described by `interval` arrays, e.g. vertical & temporal, it is impossible to determine where data is available and where it isn’t.

For example, consider how to accurately represent the following data availability whilst constrained to using only two lists:
Given the above there is a burden on data publishers to ensure data is ‘complete’ prior to publishing ‘interval’ labels. This forces data publishers to impose their own view of completeness on users, thus limiting timely access to data. In the case of fault/error there is a risk that a ‘complete’ state may never be achieved.
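To illustrate the ambiguity described here, the following is a minimal sketch, not taken from the EDR specification. The time and level labels are hypothetical. It shows that once availability is reduced to one list per axis, a client can only assume the full cross product of the two lists, which overstates what is actually present.

```python
from itertools import product

# Hypothetical availability: (time, level) pairs for which data actually exists.
available = {
    ("T00", "1000hPa"),
    ("T00", "850hPa"),
    ("T06", "1000hPa"),  # the 850hPa level has not arrived yet for T06
}

# What two independent axis lists can express: the distinct values on each axis.
times = sorted({t for t, _ in available})
levels = sorted({z for _, z in available})

# A client that reads only the two lists has to assume every combination exists.
implied = set(product(times, levels))

# Combinations the lists appear to advertise but which are not actually present.
missing = implied - available
print(missing)
```

Here the two lists are an accurate summary of each axis on its own, yet they still advertise one combination that would return no data.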
Suggested Resolution
Extend the EDR spec to introduce a `data_mask` object.

The object itself is an optional property belonging to the `parameter` object.

The object has two required sub properties:
• order - the order of the axes as they appear in the mask
• mask - a multidimensional array describing where data is available (1) and where it's not (0).
A convention could be established whereby if no `data_mask` is provided then it is assumed all data is available (as is the case now).

I'd be happy to draft a proposal for this if this is the correct next step. :)
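As a rough sketch of how a client might read such an object: the structure below is only one possible reading of the suggestion, not an agreed schema. The axis names `t` and `z`, the parameter `id`, and the helper `is_available` are all hypothetical, and the sketch assumes that mask positions correspond to positions along each axis in the order given by `order`.

```python
# Hypothetical parameter metadata carrying the suggested data_mask object.
parameter = {
    "id": "air_temperature",  # illustrative name only
    "data_mask": {
        "order": ["t", "z"],  # assumed axis names: time first, then vertical
        "mask": [
            [1, 1],  # first time step: both levels available
            [1, 0],  # second time step: second level not available
        ],
    },
}


def is_available(parameter: dict, index: dict) -> bool:
    """Return True if the cell at the given per-axis positions is marked available."""
    data_mask = parameter.get("data_mask")
    if data_mask is None:
        # Suggested convention: no data_mask means all data is assumed available.
        return True
    cell = data_mask["mask"]
    # Walk the nested array one axis at a time, following the declared order.
    for axis in data_mask["order"]:
        cell = cell[index[axis]]
    return cell == 1


print(is_available(parameter, {"t": 0, "z": 1}))
print(is_available(parameter, {"t": 1, "z": 1}))
```

How mask positions map onto actual axis values is not defined in the suggestion above, so any real implementation would need that to be specified.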