Connect Zenodo as a node or system-system bridge #400

pbuttigieg · 2024-02-15T13:44:14Z

Need to define how to create a marine / ocean filtering query to generate the sitemap on demand

pbuttigieg · 2024-03-15T10:34:31Z

@slint - following on from #407 - perhaps we can fold some of that work into this issue. The Zenodo link will be broader, naturally, but will also heavily feature the DigitalDocument, Dataset, and the higher-level CreativeWork types

slint · 2024-08-23T07:03:08Z

Reviving this issue after our discussions at Disentis (cc @lnielsen)

To recap on the Zenodo side, here are the points we need to clear up in order to move forward:

- What subset of Zenodo records would we expose? Ideally we can create a Zenodo search query to match:
  - communities (e.g. IODP)
  - keywords/subjects. Unfortunately, we don't yet support any ocean-science-specific controlled vocabularies, so for now free-text keywords will have to do.
  - types of records (Datasets, Reports, etc.)
- Review the current JSON-LD/schema.org rendered metadata of Zenodo records
- Use of OAI-PMH instead of sitemaps
  - Traditionally, we expose metadata harvest feeds using our OAI-PMH API, which could also e.g. expose some flavor of RDF XML. OAI-PMH conceptually supports custom search queries out of the box, so this would be an alternative to implementing sitemaps for such queries on our side.
  - Given that OAI-PMH is a standard that many research data repositories already implement, and that it supports incremental timestamp-based harvesting, multiple formats, custom search querysets, etc., it might be an interesting option for you to support.
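As a rough illustration of the first point, a search query combining communities, free-text keywords, and record types could be assembled as below. This is only a sketch: the endpoint and the `communities`, `type`, and `size` parameter names are assumptions to be checked against the Zenodo REST API documentation, and the `iodp` community identifier is a placeholder.

```python
from urllib.parse import urlencode

# Assumption: Zenodo's public records search endpoint.
ZENODO_RECORDS_API = "https://zenodo.org/api/records"


def build_marine_query(keywords, communities=(), record_type=None, size=100):
    """Return a search URL matching any keyword, optionally limited by community and type."""
    # Quote each term so that multi-word phrases are matched as phrases.
    terms = " OR ".join(f'"{k}"' for k in keywords)
    params = {"q": f"({terms})", "size": size}
    if communities:
        # Assumed parameter name; the community identifiers are placeholders.
        params["communities"] = ",".join(communities)
    if record_type:
        # Assumed parameter name for the record type filter.
        params["type"] = record_type
    return f"{ZENODO_RECORDS_API}?{urlencode(params)}"


url = build_marine_query(["ocean", "marine"], communities=["iodp"], record_type="dataset")
print(url)
```

The function only builds the URL and makes no network call, so the same query could back an on-demand sitemap or an OAI-PMH set definition.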

pbuttigieg · 2024-09-17T11:55:02Z

@slint a similar path that may be useful: #460

Zenodo will have more @type diversity

pbuttigieg · 2024-09-17T14:03:09Z

> To recap on the Zenodo side, here are the points we need to clear up in order to move forward:
>
> What subset of Zenodo records would we expose? Ideally we can create a Zenodo search query to match:

For first order linkage, we can scan keywords and titles for generic terms like "ocean", "sea", etc. We can boost that with an ontology like ENVO, querying around classes like marine water body and marine bed. SPARQLing for subclasses of those and their uses in the ontology would get a good set of keywords. One could also use a gazetteer like marineregions.org (see their webservices) to get the place names of many marine regions.
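A minimal sketch of the first-order scan described above, assuming records are available as dictionaries with hypothetical `title` and `keywords` fields. The seed terms are illustrative only; in practice the list would be extended with labels drawn from ENVO or a gazetteer.

```python
import re

# Illustrative seed terms; a real list would be extended from ENVO or a gazetteer.
MARINE_TERMS = ["ocean", "sea", "marine"]


def compile_matcher(terms):
    """Build a case-insensitive regex that matches any term as a whole word."""
    pattern = "|".join(re.escape(t) for t in terms)
    return re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)


def is_marine(record, matcher):
    """Return True if the record's title or any keyword contains a marine term."""
    # "title" and "keywords" are assumed field names, not a confirmed schema.
    text = " ".join([record.get("title", ""), *record.get("keywords", [])])
    return matcher.search(text) is not None


matcher = compile_matcher(MARINE_TERMS)
print(is_marine({"title": "Deep sea sediment cores", "keywords": []}, matcher))
print(is_marine({"title": "Research on seasonal crops", "keywords": ["agriculture"]}, matcher))
```

Whole-word matching avoids false hits such as "research" or "seasonal", at the cost of missing plurals like "oceans" unless they are added to the term list.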

> communities (e.g. IODP)

We could use OceanExpert to get lists of institutions (see here). OceanExpert is an ODIS node too, so that's available as JSON-LD/schema.org.

> keywords/subjects. Unfortunately, we don't yet support any ocean-science-specific controlled vocabularies, so for now free-text keywords will have to do.

As above, we can likely figure something out with ontologies like ENVO, or thesauri.
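One way to pull candidate keywords out of an ontology is a SPARQL 1.1 property-path query over `rdfs:subClassOf`. The sketch below only builds the query text; it assumes ENVO is loaded into some SPARQL endpoint and that classes carry `rdfs:label` values such as "marine water body". No endpoint URL or class identifier is implied.

```python
def subclass_label_query(root_label):
    """Return a SPARQL query listing labels of a class and all of its subclasses."""
    # Escape backslashes and double quotes so the label is a valid SPARQL string literal.
    escaped = root_label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?label WHERE {{
  ?root rdfs:label ?rootLabel .
  FILTER(STR(?rootLabel) = "{escaped}")
  ?cls rdfs:subClassOf* ?root ;
       rdfs:label ?label .
}}"""


query = subclass_label_query("marine water body")
print(query)
```

The returned labels could then be added to the free-text keyword list used for matching records.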

> types of records (Datasets, Reports, etc.)

That's more up to you - ODIS is interested in all holdings.

> Review the current JSON-LD/schema.org rendered metadata of Zenodo records

Happy to, you can add an example here.

> Use of OAI-PMH instead of sitemaps
>
> Traditionally, we expose metadata harvest feeds using our OAI-PMH API, which could also e.g. expose some flavor of RDF XML. OAI-PMH conceptually supports custom search queries out of the box, so this would be an alternative to implementing sitemaps for such queries on our side.
>
> Given that OAI-PMH is a standard that many research data repositories already implement, and that it supports incremental timestamp-based harvesting, multiple formats, custom search querysets, etc., it might be an interesting option for you to support.

We have several partners that use this, so it's in discussion. We tend to opt for the static approach to avoid API calls, and for compliance with web architectural patterns.
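For comparison, the incremental, set-based OAI-PMH harvest mentioned above comes down to a `ListRecords` request with `from` and `set` arguments, followed by `resumptionToken` requests for further pages. The sketch below only builds the request URLs; the endpoint URL and the `user-iodp` set name are assumptions, while the argument names come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode

# Assumption: the repository's OAI-PMH base URL.
OAI_ENDPOINT = "https://zenodo.org/oai2d"


def list_records_url(metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build the first ListRecords request for an incremental, set-based harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Only records created or changed since this datestamp are returned.
        params["from"] = from_date
    if set_spec:
        # A set selects a subset of the repository, e.g. a community.
        params["set"] = set_spec
    return f"{OAI_ENDPOINT}?{urlencode(params)}"


def resume_url(token):
    """Build a follow-up request; resumptionToken is sent with the verb only."""
    return f"{OAI_ENDPOINT}?{urlencode({'verb': 'ListRecords', 'resumptionToken': token})}"


first = list_records_url(from_date="2024-09-01", set_spec="user-iodp")
print(first)
```

A harvester would store the datestamp of its last successful run and pass it as `from_date` on the next one.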

@fils are there plans to support OAI-PMH in gleaner?

@slint - if you're unable to produce static JSON-LD/schema.org records via a sitemap, one could use URL-based API calls as the values of the sitemap, so that JSON-LD/schema.org is served back. This may cause random noise as crawlers hit the map, however.

pbuttigieg · 2024-09-17T14:08:29Z

@slint - if you're unable to produce static JSON-LD/schema.org records via a sitemap, one could use URL-based API calls as the values of the sitemap, so that JSON-LD/schema.org is served back. This may cause random noise as crawlers hit the map, however.

An alternative is that we treat Zenodo more as a system-system bridge case (like WMO), where we can write calls and stage your JSON-LD/schema.org output on our side for import. We prefer to avoid this if possible, as it adds dependencies that create overhead on both sides.
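The sitemap option above, with API URLs as the `<loc>` values, could be sketched roughly as follows. The record URLs are placeholders on example.org, not real Zenodo export routes; the actual URL that returns JSON-LD/schema.org for a record would need to be confirmed.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Serialize (url, lastmod) pairs as a sitemap XML string."""
    # Use the sitemap namespace as the default namespace in the output.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            # lastmod lets a harvester skip records that have not changed.
            ET.SubElement(url_el, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical API URLs that would return JSON-LD for one record each.
sitemap = build_sitemap([
    ("https://example.org/api/records/1/jsonld", "2024-09-17"),
    ("https://example.org/api/records/2/jsonld", None),
])
print(sitemap)
```

Because every `<loc>` is a live API call, any crawler that reads the sitemap will trigger those calls, which is the noise concern raised above.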

pbuttigieg mentioned this issue Mar 15, 2024

Explore TreatmentBank link #407

Open
