Hierarchical Collections #298
@aaime Part of the discussion was about the potential impact on namespace prefixes of using a colon as a separator. Since GeoServer supports the use of namespace-qualified names, perhaps you could comment on the proposal? |
@arnevogt I wonder if T17-API-D165 could be easily configured by editing backend_configuration.json to demonstrate the Hierarchical Collections concept? Cc: @lieberjosh |
@jerstlouis I like the colon-delimited hierarchy for collections, and +1 for having a server declared delimiter. I wonder whether this would be a conformance class and then an added property to a given |
@tomkralidis Yes, something like Hierarchical Collections would be a conformance class. Meaning two things: using a hierarchy separator, and adding listing of children collections to parent Any chance we could eventually see support in PygeoAPI? :) |
DGGS could of course also use this hierarchy notation for its hierarchy of ZoneClasses, ie the levels. |
@ghobona in GeoServer we indeed use ":" for namespacing, its usage is opaque to clients, it's just part of the identifier. Workspaces are non hierarchical, unordered containers, originally designed to allow setting a common namespace URI for all feature types in the workspace (for ease of WFS setup). In time workspaces have become a lightweight filtering mechanism too, and a way to get rid of the prefix.
However, to support WMS hierarchical capability document, we also have another concept: a layer group. It's a WMS specific concept, mind, does not exist anywhere else in GeoServer.
Say that in GeoServer we have a layer group contained in a workspace (sf), that contains another layer group, and we are using global services (so, prefixes are still there). Of course we cannot use : as the separator, let's imagine we use ) as the separator, then we could be looking at a URL as follows:
while if we access a workspace specific service, we'd use:
Seems it would work... however, it really kills me to see special characters being used to represent a hierarchy, when the URL structure itself is hierarchical.
An approach that I have not seen in use would be to just have a "collections" resource under the collection, representing the nested collections. The path would become:
Does not look as weird as the above path, but it's longer. Even just reserving "c" as path element, would make it use 3 chars instead of one, e.g:
Another consideration is indeed... length. Whatever proposal we are looking at, the structure ends up represented in the path, whose length is limited, and already has other hungry competitors for it (e.g., a filter CQL expression, a polygon geometry used for spatial filtering in some services). |
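To make the length trade-off in the comment above concrete, here is a minimal sketch comparing the three path styles it describes. The IDs other than sf, the function names and the exact path layouts are assumptions for illustration only; the URLs from the original comment are not reproduced here.

```python
def separator_path(ids, sep=")"):
    """Hierarchy encoded inside a single collection ID using a separator."""
    return "/collections/" + sep.join(ids)


def nested_path(ids, segment="collections"):
    """Hierarchy encoded with a nested resource under each collection."""
    first, *rest = ids
    path = f"/collections/{first}"
    for child in rest:
        path += f"/{segment}/{child}"
    return path


# Hypothetical IDs: a workspace, a layer group and a nested layer group.
ids = ["sf", "group", "nested"]

for path in (
    separator_path(ids),    # /collections/sf)group)nested
    nested_path(ids),       # /collections/sf/collections/group/collections/nested
    nested_path(ids, "c"),  # /collections/sf/c/group/c/nested
):
    print(len(path), path)
```

Each nesting level costs one character with a separator, three with a reserved "c" segment, and thirteen with a full "collections" segment.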
"Dataset" is a key context In the W3C Data on the Web Best Practices when sharing data on the web. The examples that I have seen seem to mainly share multiple datasets via a single API. Any extension for hierarchical collections that allows this should clarify which resources are a considered a dataset by the data publisher and which are, for example, subsets. This could be through another member in the Collection resource that clarifies the type of collection. Personally, I think it is clearer and cleaner to share multiple datasets via separate APIs (which can then also evolve and be versioned separately) and have a kind of super landing page on top of them. That said, I also see cases where it can be intuitive to users to present the data of a single dataset in a collection hierarchy with a depth > 1. I do not see any need for a special character requirement, even if we flatten all the collections in the API (ie. only have |
@cportele Agreed, it would be nice to have something like "isDataSet" : "true" to indicate a dataset.
Well, it could be a convention + explicit relationships like a "parent" property. But the separator approach had the benefit of being considerably lighter, e.g. "parent" : "NaturalEarth:physical:bathymetry" for every child of bathymetry, which would always repeat the same information already contained within the convention (and a use case for this is thousands of collections, so that is a considerable advantage). I also think it would be confusing for the user (in web browser especially) if not all servers use a delimiter in collection IDs, and the hierarchy isn't made obvious in the ID. However I would prefer to standardize something rather than nothing, so something like a "parent" property + collections listing in parent collections as well would be a great step forward. |
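As a minimal sketch of why an explicit "parent" property repeats what the separator convention already encodes, the parent ID can be derived from the collection ID alone. The function name and the default separator are assumptions for illustration; the IDs are the ones used in this thread.

```python
def parent_id(collection_id, sep=":"):
    """Return the parent ID implied by the separator convention, or None at top level."""
    head, found, _tail = collection_id.rpartition(sep)
    return head if found else None


print(parent_id("NaturalEarth:physical:bathymetry:ne_10m_bathymetry_J_1000"))
# NaturalEarth:physical:bathymetry
print(parent_id("NaturalEarth"))
# None
```

This only works when every server follows the same convention, which is the point of contention above; with an explicit property the client needs no knowledge of the separator.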
If using a property to identify the relationship, the following options are relevant:
|
To re-iterate my latest proposal, revised to address @cportele 's and others' concerns of using a particular delimiter like
|
Before we invent new collection properties we should check, if we can leverage existing conventions, in particular link relation types. As Gobe has pointed out, we could use
And we could use
Since the collections are hierarchical, I assume the following statements are all true, if C is a hierarchical collection with sub-collections C1 and C2:
Correct? |
@cportele Many thanks for engaging on this, I still hold this topic dear :)
Conceptually, yes, I think this is correct. However, I think implementation should be allowed to support different access mechanisms (i.e., different OGC API specs) at different levels of the hierarchies. e.g., whether to provide This would allow collections that are only organizing the leafs, or only providing multi-layer vector tiles in the top-collections, etc. That would simply be done by including or not certain links in the collection object.
This approach might be fine for Repeating the title of the parent in this case (which would already be in the parent in the same array, for the list of collections) seems overkill too. When retrieving the list of collections, the client already has those multiple objects in memory (within the array) from the collections list resource, so I think whether links should be used or not to establish hierarchical relationships within those objects of the array is debatable. Particularly from a client's perspective (and perhaps a less "webby" client perspective), it's much more complicated to look through links and look for a particular relation type, and parse a URL, than to simply include a property that directly uses the collection ID (rather than URL, which might be relative).
vs.
That being said, I would much prefer agreeing to a best web practice that enables hierarchical collections than not agreeing on how to define hierarchical collections.
That particular approach also seems a bit complicated to me from a client implementer perspective (instead of simply having an |
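The difference in client effort between the two approaches discussed in this comment can be sketched as follows. This is an illustration under assumptions: the relation type "up" (a registered IANA link relation) and the "parent" member name are placeholders, not what the draft specifies, and example.org is a hypothetical server.

```python
import posixpath
from urllib.parse import unquote, urlparse


def parent_from_links(collection, rel="up"):
    """Find the parent by scanning links for a relation type and parsing the URL."""
    for link in collection.get("links", []):
        if link.get("rel") == rel:
            path = urlparse(link["href"]).path
            # The last path segment is taken to be the parent collection ID.
            return unquote(posixpath.basename(path.rstrip("/")))
    return None


def parent_from_property(collection):
    """Find the parent by reading a property that holds the collection ID directly."""
    return collection.get("parent")


# Hypothetical collection object carrying both representations.
collection = {
    "id": "NaturalEarth:physical:bathymetry",
    "parent": "NaturalEarth:physical",
    "links": [
        {"rel": "self", "href": "https://example.org/ogcapi/collections/NaturalEarth:physical:bathymetry"},
        {"rel": "up", "href": "https://example.org/ogcapi/collections/NaturalEarth:physical"},
    ],
}

print(parent_from_links(collection))
print(parent_from_property(collection))
```

The link-based variant above assumes an absolute href; a relative href would first have to be resolved against the request URL, which adds another step on the client side.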
OK, so that would need to be made clear in the spec for this. I do not have an issue with using different API building blocks for different collections in a hierarchy. But if an API supports, e.g., features or vector tiles for all (sub-)collections, then the collections would have to meet the constraints.
Yes, I see that point. Maybe it would be good to collect implementation feedback and test it in a few code sprints. (If we end up with OGC-specific conventions we can still support an option in our implementation to represent the links in API deployments that prefer to leverage Web linking.) |
Thanks @cportele .
If what you mean is that both the parent collection and its sub-collections e.g., all support Features, then yes I agree.
We did some initial testing with the pygeoapi server implementation in past code sprints, but perhaps we could now test this updated approach? @tomkralidis will you be participating in the Tiles / Coverages / DGGS / EDR "Space Partitioning" Code Sprint next week? |
@jerstlouis yes I will be participating with a lens on OACov and EDR. |
Great to hear @tomkralidis . If you're interested we could discuss Hierarchical Collections and do some TIEs with our client in the context of OGC API - Coverages to evaluate the approach(es) described above and provide feedback. |
Are we still proposing to use ":" or some other non-slash character as the collection separator? To me, this is not a hierarchy: The trick is to figure out what the path elements between My thinking goes something like this:
I really dislike the colon notation that is being proposed because it means that clients would need to parse the collection id which always makes my "Spidey sense" tingle! |
I agreed with you and @cportele that this tingles the Spidey sense and moved away from relying on a particular separator. This allows to easily and unambiguously establish hierarchical relations between collections when requesting all collections at Including a A server could use whichever convention for hierarchy separators, or no particular separator. In the past, when we used If it is decided that this is done with a The The inclusion of The equivalent for listing the sub-collections in this new proposal would be |
@jerstlouis thanks ... let's discuss at the code sprint next week. Looks like we have lots of source material to consider which is a good thing. |
@pvretano off-topic, but I also hope we can discuss Common building blocks related to the Features Search extension that we proposed for Coverages and DGGS ( opengeospatial/ogcapi-coverages#164 ). Glad to hear you will be participating in this Code Sprint! :) This is what we will be focusing on. |
At the OGC API - Common session of the 127th Members Meeting in Singapore we briefly discussed this topic and there was no outspoken objection to drafting and reviewing an optional "Hierarchical collections" requirements class for Part 2 which would:
This would also replace capabilities that were specifically included in the 3D GeoVolumes spec ( opengeospatial/ogcapi-3d-geovolumes#5 and opengeospatial/ogcapi-3d-geovolumes#12 ). |
Has this any implementations yet or other standards using it? If yes, which and where? If not, it feels wrong to define something in "Common" that is not common yet. :-) |
@m-mohr this is the on-going Common discussion. There is a plan to use it at least together with OGC API - 3D GeoVolumes (see opengeospatial/ogcapi-3d-geovolumes#5), OGC API - Coverages, OGC API - Maps. But the fact is that it is something that deals with resource paths ( At least in this case, it feels wrong to me to define it anywhere else than in Common. The results of the discussion today were quite positive, and we have a simple way forward addressing the use cases:
This can automatically be used (or not) together with any OGC API standards using Common - Part 2 collections. It also plays nicely with OGC API - Records and related Common requirement classes (Searchable collections and Filtering collections with CQL2). We plan on updating our implementation to what we agree to today on the call, which should be reflected in the draft hopefully by tomorrow for everyone to review. |
- Describes latest proposal agreed to in the SWG on 2024-05-16 in ( fixes opengeospatial#11 and opengeospatial#298 ) - TODO: Still need to add OpenAPI definitions - 11-sorting, 12-filtering, 14-schemas: set up requirement class tables
As per #11 (comment) , we could also consider defining an optional I would suggest to allow a dataset being inside of another dataset. |
How could I get all top-level parents so that I can show a hierarchy in the client? Is that the default? But if a client doesn't support this parameter, how would it then get all collections? Would the parent parameter include only the collection with that specific parent id or recursively everything underneath? PS: The email that you sent at 16:58 CEST for the meeting at 17:00 CEST was delivered to me by the OGC mail servers at around 19:xx CEST. Otherwise, I'd have joined, but sometimes the OGC mail servers seem to have quite a delay. |
Yes, I notice that. Sorry for the late notice. Common meets every week at 11:00 AM Eastern on Thursdays until we finalize Part 2. We will try harder to send a reminder the day before with the topic of the week. Next week we will probably review Hierarchical Collections again, and populate the other new req. classes (Schemas taken from Features Part 5, Filtering collections by CQL2 and Sorting based on Records). If you read the newly generated draft at https://docs.ogc.org/DRAFTS/20-024.html#rc-hierarchical-collections , and if I did an okay job, the answers to these 4 questions you're asking should be crystal clear:
You request
It is not the default, but there is a permission for it to be the default specifically in an HTML representation, which should not break existing programmatic clients. (Permission 6)
A client that doesn't understand / care about Hierarchical Collections would work just as usual, because except for the HTML permission, all collections would be returned by default.
When specifying The purpose of the |
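A minimal sketch of how a server might apply such a parent parameter when listing collections. It assumes each collection object carries a "parent" member, that parent=none selects the top-level collections, and that a specific parent ID selects immediate children only; the function name and the sample data are hypothetical, and the draft remains the authority on the actual semantics.

```python
def filter_by_parent(collections, parent=None):
    """Filter a flat list of collection objects by an optional parent parameter."""
    if parent is None:
        # Parameter not supplied: all collections are returned, as for existing clients.
        return list(collections)
    if parent == "none":
        # Assumed meaning: only collections that have no parent.
        return [c for c in collections if c.get("parent") is None]
    # Assumed meaning: only immediate children of the given collection.
    return [c for c in collections if c.get("parent") == parent]


# Hypothetical flat listing of three collections.
collections = [
    {"id": "NaturalEarth"},
    {"id": "NaturalEarth:physical", "parent": "NaturalEarth"},
    {"id": "NaturalEarth:physical:bathymetry", "parent": "NaturalEarth:physical"},
]

print([c["id"] for c in filter_by_parent(collections, "none")])
print([c["id"] for c in filter_by_parent(collections, "NaturalEarth")])
```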
That sounds reasonable, I just think the parent=none is not ideal. I'd like to propose a slightly different alternative (names: tbd):
This way you are more flexible, avoid conflicts and for me it's more consistent with the behaviour that happens without this conformance class:
Thoughts? |
Thanks for the feedback @m-mohr .
Probably should clarify that they are mutually exclusive and the server SHALL return a 400 error. It makes no sense to provide both.
The way I initially read that I thought you meant that Using something like
Curious what @pvretano and @kalxas think of this alternate |
Yeah, default would return all collections. The name change to parent-depth makes sense to me. Not sure how much complexity it adds to count the levels? I feel it's not much more difficult compared to getting all collections recursively (which is already quite a task). For me personally empty string feels more intuitive than none - there could also be a collection "NONE", people come up with all kinds of acronyms. |
Specifically, it means keeping track of the current depth and comparing that. It's an extra parameter if using recursion. As I said, it's a small amount, but it is additional complexity ;)
Specifically prohibiting this in Requirement 26C which would apply if you conform to Hierarchical Collections.
With the
This would imply a default value of |
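The extra bookkeeping mentioned above (tracking the current depth during recursion) can be sketched like this. The function and parameter names and the shape of the children mapping are assumptions for illustration, not part of the draft.

```python
def descendants(children, parent, max_depth=None, depth=1):
    """List descendants of parent, optionally stopping after max_depth levels.

    children maps a collection ID to the list of its immediate child IDs.
    """
    result = []
    for child in children.get(parent, []):
        result.append(child)
        # The depth comparison is the only addition over plain recursion.
        if max_depth is None or depth < max_depth:
            result.extend(descendants(children, child, max_depth, depth + 1))
    return result


# Hypothetical hierarchy: A has two children, one of which has a child of its own.
children = {
    "A": ["A:b", "A:c"],
    "A:b": ["A:b:x"],
}

print(descendants(children, "A"))               # all levels
print(descendants(children, "A", max_depth=1))  # immediate children only
```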
Yeah, but many people have existing IDs and don't start from scratch. Renaming a collection and breaking users' workflows because of such a requirement seems like a bad idea to me.
Yeah, that's what I meant above but probably explained in a confusing way. |
… is data of parent - Suggested by opengeospatial#298 (comment)
We concluded in the session today: Parameters:
Examples:
|
- NOTE: Using descendants=immediate rather than 'children' as this has clearer meaning
The agreed upon changes have been applied in faca4aa . At @joanma747 's suggestion, we used |
In our OGC API server and client, we have implemented support for hierarchical collections to facilitate organizing a large number of collections and discovery by drilling down to the collection of interest.
We would welcome TIEs with other clients or servers to validate this as a potential conformance class for an extension to Common / Geospatial Data aka Collections.
The requirements are two-fold:
- Collection IDs use a hierarchy separator (: at the moment, but it could be made flexible and declared by the server). This allows for an intuitive way to drill down collections, e.g. in a Web browser. Since all collections are still listed at /collections, the client can deduce the hierarchical relationship from the collection IDs alone without additional information. (An alternative could be to include additional metadata to describe those relationships, but if the IDs do not also follow such a pattern, it would not have the intuitiveness factor of drilling down collections in a web browser.)
- /collections is re-used at /collections/{collectionId} to list children collections.
A permission is also granted for the HTML representation of /collections to list only the top-level collections.
A great use case for hierarchical collections is to offer access mechanisms (e.g. Features or TileSets) both for individual FeatureTypes, as well as for collections made up of multiple FeatureTypes (or multi-layer TileSets). For example, we have multi-layer tilesets at https://maps.ecere.com/ogcapi/collections/Daraa/tiles and single-layer tilesets at https://maps.ecere.com/ogcapi/collections/Daraa:AgricultureSrf/tiles . This would also apply for Features (but multi-feature types collections are not yet supported on our server), especially with JSON-FG which allows declaring feature types.
Another example for maps:
https://maps.ecere.com/ogcapi/collections/NaturalEarth:physical:bathymetry/map
https://maps.ecere.com/ogcapi/collections/NaturalEarth:physical:bathymetry:ne_10m_bathymetry_J_1000/map
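As a minimal sketch of how a client could work out the hierarchy from a flat /collections listing using only the colon-delimited IDs shown in the examples above: the function names are hypothetical, and the intermediate IDs (NaturalEarth, NaturalEarth:physical) are assumed to exist as collections on the server.

```python
def top_level(ids, sep=":"):
    """IDs without a separator are taken to be top-level collections."""
    return [i for i in ids if sep not in i]


def children_of(ids, parent, sep=":"):
    """IDs that extend parent by exactly one separator-delimited component."""
    prefix = parent + sep
    return [i for i in ids if i.startswith(prefix) and sep not in i[len(prefix):]]


# Flat listing, as it might be returned by /collections.
ids = [
    "NaturalEarth",
    "NaturalEarth:physical",
    "NaturalEarth:physical:bathymetry",
    "NaturalEarth:physical:bathymetry:ne_10m_bathymetry_J_1000",
    "Daraa",
    "Daraa:AgricultureSrf",
]

print(top_level(ids))
print(children_of(ids, "NaturalEarth:physical:bathymetry"))
```

The same children_of result is what /collections/{collectionId} would list for a parent collection under this proposal.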
Original discussion of this topic is in #11 .