Connect Zenodo as a node or system-system bridge #400

Open
pbuttigieg opened this issue Feb 15, 2024 · 5 comments

@pbuttigieg
Collaborator

We need to define how to create a marine/ocean filtering query to generate the sitemap on demand.

@slint

@pbuttigieg
Collaborator Author

@slint - following on from #407 - perhaps we can fold some of that work into this issue. The Zenodo link will be broader, naturally, but will also heavily feature the DigitalDocument, Dataset, and the higher-level CreativeWork types

@slint

slint commented Aug 23, 2024

Reviving this issue after our discussions at Disentis (cc @lnielsen)

To recap on the Zenodo side, here are the points we need to clear up in order to move forward:

  • What would be the subset of Zenodo records that we expose? Ideally we can create a Zenodo search query to match:
    • communities (e.g. IODP)
    • keywords/subjects. Unfortunately, we don't (yet) support any ocean-science-specific controlled vocabularies, so for now free-text keywords will have to do.
    • types of records (Datasets, Reports, etc.)
  • Review the current JSON-LD/schema.org rendered metadata of Zenodo records
  • Use of OAI-PMH instead of sitemaps
    • Traditionally, we expose metadata harvest feeds using our OAI-PMH API, which could also expose, e.g., some flavor of RDF/XML. That would be an alternative to implementing sitemaps for custom search queries on our side, something OAI-PMH conceptually supports out of the box.
    • Given that OAI-PMH is a standard that many research data repositories already implement, and that it supports incremental timestamp-based harvesting, multiple formats, custom search querysets, etc., it might be an interesting option for you to support.
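
For readers less familiar with OAI-PMH, here is a minimal sketch of what a selective, incremental harvest request looks like. It only builds request URLs and makes no network calls. The base URL and the set name `marine-subset` are placeholders rather than real Zenodo values; the argument names (`verb`, `metadataPrefix`, `set`, `from`, `until`, `resumptionToken`) come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint, not a real repository URL.
OAI_BASE_URL = "https://repository.example/oai"


def list_records_url(base_url, metadata_prefix, set_spec=None,
                     from_date=None, until_date=None):
    """Build an OAI-PMH ListRecords URL for a selective, incremental harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # restrict the harvest to one set (subset of records)
    if from_date:
        params["from"] = from_date    # only records changed on or after this date
    if until_date:
        params["until"] = until_date  # only records changed on or before this date
    return f"{base_url}?{urlencode(params)}"


def resume_url(base_url, token):
    """Build the follow-up request for the next page of a ListRecords response."""
    # resumptionToken is an exclusive argument: it is sent with the verb only.
    return f"{base_url}?{urlencode({'verb': 'ListRecords', 'resumptionToken': token})}"


# Hypothetical set name; "oai_dc" is the metadata format every OAI-PMH server must offer.
first_page = list_records_url(OAI_BASE_URL, "oai_dc",
                              set_spec="marine-subset", from_date="2024-08-01")
print(first_page)
```

A harvester would store the date of its last successful run and pass it as `from_date` on the next run, so only new or changed records are transferred.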

@pbuttigieg
Collaborator Author

@slint a similar path that may be useful
#460

Zenodo will have more @type diversity

@pbuttigieg
Collaborator Author

To recap on the Zenodo side, here are the points we need to clear up in order to move forward:

  • What would be the subset of Zenodo records that we expose? Ideally we can create a Zenodo search query to match:

For first-order linkage, we can scan keywords and titles for generic terms like "ocean", "sea", etc. We can boost that with an ontology like ENVO, querying around classes like marine water body and marine bed. SPARQLing for subclasses of those and their uses in the ontology would yield a good set of keywords. One could also use a gazetteer like marineregions.org (see their webservices) to get the place names of many marine regions.
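
A rough sketch of that first-order filter: it turns a list of seed terms, plus any terms pulled from an ontology or gazetteer, into a single Lucene-style query string. The field names `title` and `keywords` and the extra terms are assumptions for illustration; the actual Zenodo search index may use different field names.

```python
# Assumption: seed terms and field names are illustrative only.
SEED_TERMS = ["ocean", "sea", "marine"]


def quote(term):
    """Wrap a term in double quotes, escaping embedded backslashes and quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def marine_query(terms, fields=("title", "keywords")):
    """Return a Lucene-style query matching any term in any of the given fields."""
    # De-duplicate case-insensitively while keeping first-seen order and spelling.
    seen = {}
    for term in terms:
        cleaned = term.strip()
        if cleaned:
            seen.setdefault(cleaned.lower(), cleaned)
    if not seen:
        raise ValueError("at least one non-empty term is required")
    any_term = " OR ".join(quote(t) for t in seen.values())
    return " OR ".join(f"{field}:({any_term})" for field in fields)


# Hypothetical output of an ontology or gazetteer lookup, appended to the seeds.
extra_terms = ["North Sea", "Ocean"]
print(marine_query(SEED_TERMS + extra_terms))
```

Keeping the term list as plain data means it can be regenerated from ENVO or marineregions.org without touching the query-building code.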

  • communities (e.g. IODP)

We could use OceanExpert to get lists of institutions (see here). OceanExpert is an ODIS node too, so that's available as JSON-LD/schema.org.

  • keywords/subjects. Unfortunately, we don't (yet) support any ocean-science-specific controlled vocabularies, so for now free-text keywords will have to do.

As above, we can likely figure something out with ontologies like ENVO, or thesauri.

  • types of records (Datasets, Reports, etc.)

That's more up to you - ODIS is interested in all holdings.

  • Review the current JSON-LD/schema.org rendered metadata of Zenodo records

Happy to, you can add an example here.

  • Use of OAI-PMH instead of sitemaps

    • Traditionally, we expose metadata harvest feeds using our OAI-PMH API, which could also expose, e.g., some flavor of RDF/XML. That would be an alternative to implementing sitemaps for custom search queries on our side, something OAI-PMH conceptually supports out of the box.
    • Given that OAI-PMH is a standard that many research data repositories already implement, and that it supports incremental timestamp-based harvesting, multiple formats, custom search querysets, etc., it might be an interesting option for you to support.

We have several partners that use this, so it's in discussion. We tend to opt for the static approach to avoid API calls, and for compliance with web architectural patterns.

@fils are there plans to support OAI-PMH in gleaner?

@slint - if you're unable to produce static JSON-LD/schema.org records via a sitemap, one could use URL-based API calls as the values of the sitemap, so that JSON-LD/schema.org is served back. This may cause random noise as crawlers hit the map, however.
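
To make the sitemap-of-API-calls idea concrete, here is a minimal sketch that renders a sitemap whose `<loc>` values point at per-record URLs returning JSON-LD. The URL pattern, record identifiers, and dates are hypothetical placeholders, not confirmed Zenodo endpoints; only the sitemap namespace and element names come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Assumption: hypothetical URL pattern for an endpoint that serves JSON-LD.
EXPORT_URL_TEMPLATE = "https://repository.example/records/{record_id}/export/json-ld"


def build_sitemap(records):
    """Render (record_id, last_modified) pairs as a sitemap XML string."""
    ET.register_namespace("", SITEMAP_NS)  # emit the sitemap namespace as the default
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for record_id, last_modified in records:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url, f"{{{SITEMAP_NS}}}loc")
        loc.text = EXPORT_URL_TEMPLATE.format(record_id=record_id)
        lastmod = ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod")
        lastmod.text = last_modified  # lets a harvester skip unchanged records
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical record identifiers and modification dates.
sitemap_xml = build_sitemap([("1001", "2024-08-20"), ("1002", "2024-08-23")])
print(sitemap_xml)
```

Including `lastmod` would let a harvester re-fetch only records that changed, which may reduce some of the load on the API endpoints.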

@pbuttigieg
Collaborator Author

@slint - if you're unable to produce static JSON-LD/schema.org records via a sitemap, one could use URL-based API calls as the values of the sitemap, so that JSON-LD/schema.org is served back. This may cause random noise as crawlers hit the map, however.

An alternative is that we treat Zenodo more as a system-system bridge case (like WMO), where we write the calls and stage your JSON-LD/schema.org output on our side for import. We prefer to avoid this if possible, as it adds dependencies that create overhead on both sides.
