Can we just use rest_oai_pmh ? #1192

Closed
dannylamb opened this issue Jun 26, 2019 · 34 comments

@dannylamb
Contributor

Seriously, https://www.drupal.org/project/rest_oai_pmh looks like it's made for Islandora 8. Is there anything preventing us from using it? Has anyone tried it?

@joecorall? You wrote it. Any thoughts about this being a generic oai solution for Islandora 8?

@seth-shaw-unlv
Contributor

I've tested it and it seems to work well. I'm working on an islandora_defaults submodule that uses it right now.

@seth-shaw-unlv
Contributor

It won't be much, just configs for now; but it is a good jumping off point to include some of the twig-based metadata serializations you mentioned a while ago.

@dannylamb
Contributor Author

Yep, that part definitely caught my eye. I'd love to be able to render out things like MODS on the fly.

@seth-shaw-unlv
Contributor

It only took a few minutes. If you check out the branch I linked earlier and enable the new islandora_oaipmh submodule, you will get an OAI-PMH endpoint at /oai/request with a set called oai_pmh:all_repository_items containing all of your repository items (sans Collections, because no one expects those as objects in their OAI-PMH feed). Documentation and PR to come.

Note: the rest_oai_pmh module needs to build a set index before it returns results, so hit the 'Rebuild OAI-PMH' button at http://localhost:8000/admin/config/services/rest/oai-pmh/queue when you load it up. Cron will refresh it over time.
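For anyone trying the endpoint, a harvest against it is an ordinary OAI-PMH `ListRecords` request. The sketch below only builds that request URL; the helper name `oai_list_records_url()` is hypothetical, while the base URL and set name are the ones from this comment.

```php
<?php
/**
 * Hypothetical helper: builds an OAI-PMH ListRecords URL for a base endpoint.
 * verb and metadataPrefix are required protocol arguments; set is optional.
 */
function oai_list_records_url(string $baseUrl, string $metadataPrefix, ?string $set = null): string
{
    $params = ['verb' => 'ListRecords', 'metadataPrefix' => $metadataPrefix];
    if ($set !== null) {
        $params['set'] = $set;
    }
    // http_build_query() URL-encodes values, so the ':' in the set spec becomes %3A.
    return $baseUrl . '?' . http_build_query($params);
}

echo oai_list_records_url('http://localhost:8000/oai/request', 'oai_dc', 'oai_pmh:all_repository_items'), PHP_EOL;
```

Until the set index has been rebuilt, a request like this is expected to come back empty, which is why the rebuild step matters.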

@jonathangreen
Contributor

@dbernstein and I were talking this week about him getting rest_oai_pmh setup in the Islandora playbook so it ships with Islandora. Seems like a good way to satisfy the OAI need in Islandora.

It sounds like you're already most of the way there, @seth-shaw-unlv. Do you want or need any help pulling things into the playbook, or should we leave this one to you?

@seth-shaw-unlv
Contributor

At its most basic, adding rest_oai_pmh support includes two configs, a dependency on rest_oai_pmh (of course), plus some README documentation. So, how do we want to go about including them?

  1. Adding the configs straight into islandora_defaults/config/install adds another dependency that someone might not want and could consider bloat. (Although I think this sentiment is unlikely, given the target community.)
  2. Make this a sub-module. That way there is an obvious place for users to enable this module and the rest_oai_pmh one. They would need to install rest_oai_pmh as a separate step (or include it in the composer variable in their playbook) but we can rely on the Drupal module install/enable system to flag the dependency and turn it on or off. However, some might consider this module bloat.

I'm planning on the latter option, but I can switch to the former if y'all (@dannylamb, @jonathangreen, and @dbernstein) feel strongly about it.

@seth-shaw-unlv
Contributor

@jonathangreen, to your point on the playbook, if we do make this enabled by default, I think adding this to the playbook only requires updating two variables. Not really any trouble at all.

@seth-shaw-unlv
Contributor

seth-shaw-unlv commented Jun 27, 2019

Okay, another wrinkle. Because we use a Linked Agent field and a hook alter to modify the JSON-LD, we took out the Dublin Core mapping for that field. This means, by default, we have no agents appearing in the OAI-PMH metadata. The first step is to simply add the mapping back in as contributor, but then we have all our creators listed simply as contributors, which isn't good enough.

Two possible solutions:

  1. We simply add another Linked Agent field and delineate one as types of creators and the other as types of contributors and add RDF mappings accordingly. True, the OAI-PMH metadata will toss out all of our fine-grained relators (but you're going to Dublin Core, you knew that was going to happen). It also means metadata creators need to decide which field an agent belongs in.
  2. We skip the RDF mapping altogether and dive straight into a Twig-based metadata profile. We have pretty much unlimited flexibility here, and I may need to do it eventually for our own local purposes anyway, but I don't know how long it will take to do it (having never done it).

Thoughts? Especially yours @rosiel.

@seth-shaw-unlv seth-shaw-unlv self-assigned this Jun 27, 2019
@mjordan
Contributor

mjordan commented Jun 27, 2019

@seth-shaw-unlv my $0.02 is to go with Twig templates, but ideally we'd provide a UI that lets the site admin select a default template. Also... it would be so awesome to allow Context Reactions to determine which template is used for a request. I can come up with use cases if you want. Not sure where that feature would fit into what you're doing; maybe it's best as part of the main REST OAI module.

@seth-shaw-unlv
Contributor

@mjordan, currently you only have three options: metatag, RDF mapping, or custom Twig. There aren't any hooks for Contexts unless we want to hook them in ourselves. That would probably be better done as part of the rest_oai_module though because part of the OAI-PMH spec is being able to reply with what metadata schemas are available.

@mjordan
Contributor

mjordan commented Jun 27, 2019

Yes, I think rest_oai_module is the place for Contexts. I'll follow up.

@seth-shaw-unlv
Contributor

Correction, I meant to type rest_oai_pmh module... sometimes my fingers can't keep up with me.

@rosiel
Member

rosiel commented Jun 27, 2019

I don't really know what I think re "fields -> twig -> DC", vs. "fields -> RDF -> something -> DC". On one hand, the idea of a "creator" linked agent field maybe makes sense, in a case where 'photographer' could mean a creator or a contributor depending on the context (photograph vs. book containing photographs).

Know that there is a 'relator' called 'creator [cre]' so that is one way to identify creators in DC.

If you're looking at using RDF mappings because at the moment they contain only Dublin Core Terms, know that they won't always. The mapping is going to (by necessity) expand beyond DC. Is that ok for OAI-PMH? I'm sorry, I should know whether OAI requires DC Elements or DC Terms, but if I spend the 10 minutes to find out y'all are going to have added 20 more comments...

But it would be ... "nice"... to have an, um, generic DC endpoint, in the same way that we did in 7.x, so that (if configured right) you could go to example.com/islandora/object/pid/datastream/DC/view to get "the DC". It seems this is something that is maybe going to be (automatically) available through OAI?

@seth-shaw-unlv
Contributor

Although, I'm still not sure Contexts is the way to go here. Contexts implies that we are able to make a decision based on some pre-set condition of the record. With OAI-PMH, the metadata profile used is either a default one OR one specified by a URL parameter. That is something probably better done with a config file and module code.
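In protocol terms, the per-request choice is the `metadataPrefix` argument, which OAI-PMH 2.0 requires for `GetRecord`, `ListIdentifiers` and `ListRecords`. A minimal sketch of the "config file and module code" approach, assuming a hypothetical prefix-to-template registry: the etdms template name matches the file mentioned later in this thread, while the oai_dc template name is only a guess. The two error codes are the ones the OAI-PMH spec defines for these cases.

```php
<?php
// Hypothetical registry: metadataPrefix => Twig template that renders it.
// The oai_dc template name is an assumption made for this sketch.
const SUPPORTED_FORMATS = [
    'oai_dc' => 'rest-oai-pmh-record.html.twig',
    'etdms'  => 'rest-oai-pmh-record--etdms.html.twig',
];

/**
 * Returns ['template' => ...] on success, or ['error' => <OAI-PMH error code>].
 */
function resolve_metadata_format(array $query, array $formats = SUPPORTED_FORMATS): array
{
    $prefix = $query['metadataPrefix'] ?? null;
    if ($prefix === null || $prefix === '') {
        // A missing required argument is reported as badArgument.
        return ['error' => 'badArgument'];
    }
    if (!array_key_exists($prefix, $formats)) {
        // The repository does not offer this format.
        return ['error' => 'cannotDisseminateFormat'];
    }
    return ['template' => $formats[$prefix]];
}
```

Because the registry is plain data, the same list could also answer `ListMetadataFormats`, which is the part of the spec that makes this belong in the module rather than in Context.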

@seth-shaw-unlv
Contributor

@rosiel, the rest_oai_pmh RDF mapping option uses the Dublin Core predicates and filters out the others, so even if we add other predicates, as long as we leave the Dublin Core ones alongside (RDF mappings can have multiple predicates for a single field), it still works. That means that fields would be duplicated in our RDF as it would have our more specific predicates and the generic Dublin Core. I could live with that, but I don't know if everyone will.
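A minimal sketch of that filtering, assuming a field's RDF mapping is available as a plain list of CURIEs. It illustrates the idea only and is not rest_oai_pmh's own code; `dc_elements_for_field()` is a hypothetical name. Since oai_dc carries only the 15 DC Elements, a `dcterms:` predicate survives only when it has a DC Elements counterpart.

```php
<?php
// The 15 elements of the Dublin Core Element Set, the only ones oai_dc allows.
const DC_ELEMENTS = [
    'contributor', 'coverage', 'creator', 'date', 'description',
    'format', 'identifier', 'language', 'publisher', 'relation',
    'rights', 'source', 'subject', 'title', 'type',
];

/**
 * Hypothetical filter: given every predicate one field is mapped to,
 * keep only those that have a DC Elements counterpart.
 */
function dc_elements_for_field(array $predicates): array
{
    $elements = [];
    foreach ($predicates as $predicate) {
        if (preg_match('/^(?:dc|dcterms):([A-Za-z]+)$/', $predicate, $m)
            && in_array($m[1], DC_ELEMENTS, true)) {
            $elements[] = 'dc:' . $m[1];
        }
    }
    // A field mapped to both dc:creator and dcterms:creator emits one element.
    return array_values(array_unique($elements));
}
```

For example, a field mapped to `schema:author` and `dcterms:creator` yields only `dc:creator`, which is the "duplicated but still works" situation described above.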

I'm inclined to break up the linked agent field into creators and contributors. However, this could complicate migrations: can we tell whether a photographer in the Islandora 7 MODS should be a creator or a contributor?

In any case, the current RDF mapping path gives us a default Dublin Core endpoint, but only that. I think we do need to code up a PR to the rest_oai_pmh module that allows multiple Twig-based metadata profile options that the OAI-PMH harvester can choose from.

@joecorall
Member

joecorall commented Jun 27, 2019

I'm glad to hear you all are considering adding rest_oai_pmh to Islandora 8. Though there is one major issue that I'm working to resolve before moving the project from "alpha" to "beta". The issue is essentially to ensure the OAI endpoint is always up to date with the records assigned to be exposed to OAI via Views.

FWIW for the Linked Agent field, printing that field to OAI could be done like this:

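```php
/**
 * Implements hook_preprocess_HOOK() for rest_oai_pmh_record.
 */
function islandora_oaipmh_preprocess_rest_oai_pmh_record(&$variables) {
  $entity = $variables['entity'];
  if ($entity->hasField('field_linked_agent')) {
    foreach ($entity->get('field_linked_agent') as $linked_agent) {
      // Skip references whose target entity no longer exists.
      if (!$linked_agent->entity) {
        continue;
      }
      // Only the "creator" relator maps to dc:creator; all others become dc:contributor.
      $dc_field = $linked_agent->rel_type === 'relators:cre' ? 'dc:creator' : 'dc:contributor';
      $variables['elements'][$dc_field][] = $linked_agent->entity->label();
    }
  }
}
```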

I'm open to adding other ways of mapping Drupal fields to oai_dc in the rest_oai_pmh module. I can investigate integrating with Context if you all think that's a good idea, or any other solutions that would provide the most utility and ease of use for site admins.

Thanks!
Joe

@DiegoPino
Contributor

> @seth-shaw-unlv my $0.02 is to go with Twig templates, but ideally we'd provide a UI that lets the site admin select a default template. Also... it would be so awesome to allow Context Reactions to determine which template is used for a request. I can come up with use cases if you want. Not sure where that feature would fit into what you're doing; maybe it's best as part of the main REST OAI module.

Interesting to read that you are planning on integrating the same or a similar Archipelago architectural feature, Dynamic Twig-based Metadata Shaping, into Islandora 8. I guess some of you are already aware of this, but in our implementation the templates work on every aspect of the output chain, including Display, Download and API (e.g. IIIF). We followed the success we had with our Islandora Multi Importer shaping of MODS via Twig templates, all this so many years ago. Of course, details will vary since our source data is always JSON and not a list of field/values.

For those not following the Google group, maybe there are some other features you could find interesting too here in our condensed roadmap: esmero/archipelago-aws-demo#6. A lot of care, research and development has been put into this. Feel free to take a look; it relies very little on other modules, since we wrote most of this ourselves.

I wonder if this departs from the whole philosophy of Islandora 8 regarding Drupal RDF mapping and JSON-LD serialization. Would that not require double effort from an implementation team? Mapping on one side and then having to figure out again the DC part for OAI?

Our team is much smaller so we do not move as fast as you, but since twig templates as metadata casting services is already core for us we are just evolving this approach to cover more bases. We would love to see how this evolves in your architecture and what are the commonalities. We already have some interesting plans for the second iteration since this has served us well. Thanks a lot

@mjordan
Contributor

mjordan commented Jun 27, 2019

@seth-shaw-unlv I agree with

> Contexts implies that we are able to make a decision based on some pre-set condition of the record. With OAI-PMH, the metadata profile used is either a default one OR one specified by a URL parameter.

but some OAI consumers want a specific flavor of DC/MODS, etc. Context will let us do things like "when the request is from SFU's Alma library platform for oai_dc, use this template to generate the DC". The request would still be for oai_dc, but the exact flavor of DC returned would be specific to the Context.

@rosiel
Member

rosiel commented Jun 27, 2019

Considering all the work that went into making "the RDF representation" cohesive, I don't think it makes good sense to throw DCE into that as well. In addition, DCE do not have ranges, so a "creator" could be a URI (hypothetically - and in our case it's a locally minted one) but I assume (@mjordan please correct me if I'm wrong) OAI-PMH wants a literal for all its DCE?

@mjordan
Contributor

mjordan commented Jun 27, 2019

OAI-PMH just wants DC, so I assume the values are expected to be literal strings (I assume by 'DCE' you mean DC elements).

@seth-shaw-unlv
Contributor

seth-shaw-unlv commented Jun 27, 2019

> I wonder if this departs from the whole philosophy of Islandora 8 regarding Drupal RDF mapping and JSON-LD serialization. Would that not require double effort from an implementation team? Mapping on one side and then having to figure out again the DC part for OAI?

I don't think so. (Granted, I did come to the game after the Drupal->JSON-LD->Fedora chain was established.) Once we do the initial Drupal to RDF mapping, most of that stack is simply working with serializations of RDF.

However, not everyone wants RDF and many other metadata schemas tightly bind their fields to their structure, which means that in many cases a map to another schema is going to require additional mapping somewhere. Sure, serializing to Dublin Core XML is easy if you are using DC predicates. MODS, MARCXML, EAD, and others will require a map of some type.

@seth-shaw-unlv
Contributor

> Considering all the work that went into making "the RDF representation" cohesive, I don't think it makes good sense to throw DCE into that as well.

It sounds like we may want to keep our RDF "pure" (not adding in close-enough DC predicates to field mappings) while also providing other harvesters what they want. In that case we should make an oai_dc specific mapping separate from the RDF mapping.

We would still need to somehow distinguish which relators are creators and which are contributors (in case a specific relator, such as photographer, could be either). Or we could use the condition @joecorall used in his example, where only relators:cre are creators and all others are contributors.

I'll note here that the rest_oai_pmh RDF mapping option does not serialize the linked entity's URIs, just the target entity's label.

@seth-shaw-unlv
Contributor

> MODS, MARCXML, EAD, and others will require a map of some type.

Yeah, add Google Scholar-specific metatags to that list....

@seth-shaw-unlv
Contributor

@DiegoPino, I was just looking over the Archipelago code you linked. I see you are storing Twig templates in the database as part of a custom entity. Are you still able to add template preprocessing? Or do we need to keep them as module or theme templates for that?

@rosiel
Member

rosiel commented Jun 27, 2019

> OAI-PMH just wants DC, so I assume the values are expected to be literal strings (I assume by 'DCE' you mean DC elements).

@mjordan I'm using "DCE" to refer to the Dublin Core Element Set, yes. It seems commonplace nowadays to use "DC" to mean the DCMI Metadata Terms, a much more nuanced collection of terms and the successor to the now defunct "Qualified Dublin Core". But while anyone using Dublin Core Elements is most likely just providing literals as objects, those elements are not officially defined with a range, so anything is technically acceptable. You can see this difference when comparing creator in the DC Terms namespace with creator in the DC Elements namespace. dcterms:creator has range Agent so providing a literal name would be wrong.

@mjordan
Contributor

mjordan commented Jun 27, 2019

@rosiel right, that DCE. Been so long since I used it, it fell out of my instantly accessible vocabulary. (See, there is a use for URIs in normal human dialogue!)

I understand what you're saying, but I think that most OAI-PMH consumers would be expecting literals. But, being able to include anything else (like a URI) would support the idea of using Context to optionally allow for returning consumer-specific values in DC.

@seth-shaw-unlv
Contributor

So, strictly speaking, if OAI-PMH feeds only express literals, we need to convert DC Terms whose range is not a literal (such as dcterms:creator) into their DC Elements counterparts. I hadn't noticed that those terms weren't also ranged for literals. Good to know.
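A minimal sketch of that conversion for one statement. `to_dc_element_literal()` is a hypothetical name, and the `$labels` lookup stands in for whatever actually resolves a URI to a display string (in Drupal, likely the referenced entity's label()); both are assumptions of the sketch. It also does not check the result against the 15 DC Elements, so it would sit behind a filter like the one for RDF mappings.

```php
<?php
/**
 * Hypothetical helper: turns one DC Terms statement into a DC Elements literal.
 * $labels maps URIs to display strings (an assumption of this sketch).
 * Returns [element, literal], or null when no literal can be produced.
 */
function to_dc_element_literal(string $predicate, string $value, array $labels): ?array
{
    $prefix = 'dcterms:';
    if (strpos($predicate, $prefix) !== 0) {
        return null;
    }
    $element = 'dc:' . substr($predicate, strlen($prefix));
    // dcterms:creator has range dcterms:Agent, so its value may be a URI;
    // OAI-PMH consumers of DC Elements generally expect a plain string.
    if (filter_var($value, FILTER_VALIDATE_URL) !== false) {
        if (!array_key_exists($value, $labels)) {
            return null; // Drop it rather than emit a bare URI.
        }
        $value = $labels[$value];
    }
    return [$element, $value];
}
```

Dropping an unresolvable URI is a design choice here; emitting the URI itself is also legal for DC Elements (no range), but less useful to most harvesters.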

@seth-shaw-unlv
Contributor

Attempt to summarize some of this conversation:

In the short term, using the RDF mapping is reasonable because all of our existing mappings are RDF. The one hitch is the linked agent field, but that can be handled by implementing hook_preprocess_rest_oai_pmh_record (as @joecorall indicated). However, we should still make a list of the relators most likely to be creators (e.g. Author, Photographer, Creator), while the others can default to contributors. I can make an initial pass, but I'll want feedback on it. Also, we need to use our DC namespaces carefully to make sure literals are permissible.
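A first pass at that list could look like the sketch below. The three MARC relator codes (`aut`, `cre`, `pht`) correspond to the Author, Creator and Photographer examples; the constant and function names are hypothetical, and the list is only a starting point for feedback, not a vetted mapping. Inside a preprocess hook like @joecorall's, `dc_agent_element()` would replace the single `relators:cre` comparison.

```php
<?php
// Hypothetical starting list of relators treated as creators; everything else
// defaults to contributor.
const CREATOR_RELATORS = ['relators:aut', 'relators:cre', 'relators:pht'];

function dc_agent_element(string $relType, array $creatorRelators = CREATOR_RELATORS): string
{
    return in_array($relType, $creatorRelators, true) ? 'dc:creator' : 'dc:contributor';
}

/**
 * Groups [rel_type, label] pairs under dc:creator / dc:contributor,
 * mirroring the $variables['elements'] shape used in the hook above.
 */
function dc_agent_elements(array $agents): array
{
    $elements = [];
    foreach ($agents as [$relType, $label]) {
        $elements[dc_agent_element($relType)][] = $label;
    }
    return $elements;
}
```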

Would that be acceptable as an initial PR? If so, it is nearly ready.

In the long term we want to enable multiple metadata schemas, such as MODS, (most likely as customizable Twig templates) and enable Contexts to switch them as desired. Side note, I'll eventually want EAD for my archivesspace-drupal integration.

@mjordan
Contributor

mjordan commented Jun 27, 2019

@seth-shaw-unlv thanks that's useful. Just to clarify the bit about Contexts, I'm saying that they could determine which template to use within a particular schema, not that they should replace the metadataPrefix request parameter. I can provide some specific use cases later.
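To make that distinction concrete: the harvester still picks the schema with `metadataPrefix`, and a rule only swaps the template within that schema. The sketch matches rules on the requesting host, which is an assumption (a real Context condition could key on anything); the rule format, function name, template names and hostnames are all hypothetical, and this is not the Context module's API.

```php
<?php
/**
 * Hypothetical Context-style selection. Each rule is
 * [metadataPrefix, requester host pattern (fnmatch syntax), template].
 * The first matching rule wins; otherwise the default template is used.
 */
function select_template(string $prefix, string $requesterHost, array $rules, string $default): string
{
    foreach ($rules as [$rulePrefix, $hostPattern, $template]) {
        if ($rulePrefix === $prefix && fnmatch($hostPattern, $requesterHost)) {
            return $template;
        }
    }
    return $default;
}

$rules = [['oai_dc', '*.alma.example.com', 'oai-dc--alma.html.twig']];
echo select_template('oai_dc', 'na01.alma.example.com', $rules, 'oai-dc.html.twig'), PHP_EOL;
```

A request for `etdms` from the same host falls through to its own default, since the rule only applies within oai_dc.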

@DiegoPino
Contributor

@seth-shaw-unlv sorry for the delayed response: you can add template preprocessing since it's part of the rendering chain, but we have no use for that right now because we pass the JSON data plus the whole Node to the template engine. I think I pointed you to our first public commit, but the code has of course evolved since then. Twig/D8 best practices encourage passing only render arrays, but we handle metadata differently, and per-field preprocessing for rendering makes little sense to us. Look at the format_strawberryfield module if you are curious about our approach. Since we are approaching beta, this week we will commit some extra goodies that have been in the works for the last few weeks.

@mjordan
Contributor

mjordan commented Jul 12, 2019

As a test of how this module can respond with non-DC metadata, I'd like to nominate ETDMS as the second format.

@joecorall
Member

joecorall commented Jul 12, 2019

I created a feature branch on rest_oai_pmh for a submodule that provides the ETDMS format. It's only a very rough start, since I am not familiar with the ETDMS metadata format. It basically acts identically to oai_dc, but instead of wrapping the record metadata in <oai_dc:dc> tags, it wraps it in a <thesis> tag. You can take a look at the submodule here >>> https://github.com/kent-state-university-libraries/rest_oai_pmh/tree/3061577-etdms/modules/rest_oai_pmh_etdms <<< pull requests welcome.

Additional metadata elements for a node/entity can be added to a record in rest_oai_pmh_etdms_preprocess_rest_oai_pmh_record__etdms, and they will print in OAI-PMH. If the additional metadata elements don't print correctly, you can alter the template file at rest-oai-pmh-record--etdms.html.twig.
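The "same elements, different wrapper" idea can be sketched outside Drupal with PHP's DOM extension. This is not the rest_oai_pmh_etdms template: `wrap_record()` is a hypothetical name, the oai_dc and DC Elements namespace URIs are the published ones, and the ETD-MS namespace URI and the choice to put ETD-MS elements unprefixed under `<thesis>` reflect our understanding of ETD-MS 1.0, so check both against the standard.

```php
<?php
const NS_OAI_DC = 'http://www.openarchives.org/OAI/2.0/oai_dc/';
const NS_DC = 'http://purl.org/dc/elements/1.1/';
// Assumed ETD-MS 1.0 namespace; verify against the published standard.
const NS_ETDMS = 'http://www.ndltd.org/standards/metadata/etdms/1.0/';

/**
 * Hypothetical helper: wraps the same element => value(s) map as oai_dc or ETD-MS.
 */
function wrap_record(string $prefix, array $elements): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    if ($prefix === 'etdms') {
        $root = $doc->createElementNS(NS_ETDMS, 'thesis');
        $childNs = NS_ETDMS;
        $childPrefix = ''; // ETD-MS children share the default namespace.
    } else {
        $root = $doc->createElementNS(NS_OAI_DC, 'oai_dc:dc');
        $childNs = NS_DC;
        $childPrefix = 'dc:';
    }
    $doc->appendChild($root);
    foreach ($elements as $name => $values) {
        foreach ((array) $values as $value) {
            $child = $doc->createElementNS($childNs, $childPrefix . $name);
            // createTextNode() leaves escaping of &, < and > to DOM.
            $child->appendChild($doc->createTextNode((string) $value));
            $root->appendChild($child);
        }
    }
    return $doc->saveXML($root);
}

echo wrap_record('etdms', ['title' => 'An example thesis', 'creator' => ['Example Author']]), PHP_EOL;
```

ETD-MS also defines thesis-specific elements (such as degree information) that oai_dc has no slot for; those are exactly the "additional metadata elements" the preprocess hook would add.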

@mjordan
Contributor

mjordan commented Jul 12, 2019

@joecorall sounds good. I like the hook approach! I know a lot of Islandora sites will be very interested in early ETDMS support.

@mjordan
Contributor

mjordan commented Sep 9, 2019

@dannylamb Islandora/islandora_defaults#4 has been merged. OK to close?


7 participants