Can we just use rest_oai_pmh? #1192
I've tested it and it seems to work well. I'm working on an islandora_defaults submodule using it right now.
It won't be much, just configs for now, but it is a good jumping-off point for including some of the Twig-based metadata serializations you mentioned a while ago.
Yep, that part definitely caught my eye. I'd love to be able to render out things like MODS on the fly.
It only took a few minutes: if you check out the branch I linked earlier and enable the new islandora_oaipmh submodule, it will give you an OAI-PMH endpoint at /oai/request with a set called oai_pmh:all_repository_items containing all of your repository items (sans Collections, because no one expects those as objects in their OAI-PMH feed). Documentation and a PR to come. Note: the rest_oai_pmh module needs to build a set index before it returns results, so hit the 'Rebuild OAI-PMH' button at http://localhost:8000/admin/config/services/rest/oai-pmh/queue when you load it up. Cron will refresh it over time.
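To try the endpoint, a harvester (or a browser) would send a standard OAI-PMH ListRecords request to it. A small sketch of building that request URL, assuming the local development base URL and set name mentioned above and the default oai_dc prefix; `verb`, `metadataPrefix` and `set` are standard OAI-PMH request arguments, while the helper name `oai_list_records_url` is hypothetical:

```php
<?php
// Builds a standard OAI-PMH ListRecords request URL.
// Assumptions: the /oai/request path and the set name come from the comment
// above; the helper name is hypothetical.
function oai_list_records_url(string $base, string $set, string $prefix = 'oai_dc'): string
{
    $query = http_build_query([
        'verb' => 'ListRecords',
        'metadataPrefix' => $prefix,
        'set' => $set,
    ], '', '&', PHP_QUERY_RFC3986); // RFC 3986 percent-encoding (':' becomes %3A)

    return rtrim($base, '/') . '/oai/request?' . $query;
}

echo oai_list_records_url('http://localhost:8000', 'oai_pmh:all_repository_items'), PHP_EOL;
```

Keep in mind that the set index has to be rebuilt before this request returns any records.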
@dbernstein and I were talking this week about him getting ... It sounds like you're already most of the way there, @seth-shaw-unlv. Do you want or need any help pulling things into the playbook, or should we leave this one to you?
At its most basic, adding rest_oai_pmh support includes two configs, a dependency on rest_oai_pmh (of course), plus some README documentation. So, how do we want to go about including them?
I'm planning on the latter option, but I can switch to the former if y'all (@dannylamb, @jonathangreen, and @dbernstein) feel strongly about it.
@jonathangreen, to your point on the playbook: if we do make this enabled by default, I think adding it to the playbook only requires updating two variables. Not really any trouble at all.
Okay, another wrinkle. Because we use a Linked Agent field and a hook alter to modify the JSON-LD, we took out the Dublin Core mapping for that field. This means, by default, we have no agents appearing in the OAI-PMH metadata. The first step is to simply add the mapping back in as contributor, but then we have all our creators listed simply as contributors, which isn't good enough. Two possible solutions:
Thoughts? Especially yours, @rosiel.
@seth-shaw-unlv my $0.02 is to go with Twig templates, but ideally we'd provide a UI that lets the site admin select a default template. Also... it would be so awesome to allow Context Reactions to make a given template be used for a request. I can come up with use cases if you want. Not sure where that feature would fit into what you're doing; maybe it's best as part of the main REST OAI module.
@mjordan, currently you only have three options: metatag, RDF mapping, or custom Twig. There aren't any hooks for Contexts unless we want to hook them in ourselves. That would probably be better done as part of the rest_oai_module, though, because part of the OAI-PMH spec is being able to reply with what metadata schemas are available.
Yes, I think rest_oai_module is the place for Contexts. I'll follow up.
Correction: I meant to type rest_oai_pmh module... sometimes my fingers can't keep up with me.
I don't really know what I think about "fields -> Twig -> DC" vs. "fields -> RDF -> something -> DC". On one hand, the idea of a "creator" linked agent field maybe makes sense, in a case where 'photographer' could mean a creator or a contributor depending on the context (a photograph vs. a book containing photographs). Note that there is a relator called 'creator [cre]', so that is one way to identify creators in DC. If you're looking at using RDF mappings because at the moment they contain only Dublin Core Terms, know that they won't always. The mapping is going to (by necessity) expand beyond DC. Is that OK for OAI-PMH? I'm sorry, I should know whether OAI requires DC Elements or DC Terms, but if I spend the 10 minutes to find out, y'all are going to have added 20 more comments... But it would be... "nice"... to have an, um, generic DC endpoint, in the same way that we did in 7.x, so that (if configured right) you could go to example.com/islandora/object/pid/datastream/DC/view to get "the DC". It seems this is something that may be (automatically) available through OAI?
Although, I'm still not sure Contexts is the way to go here. Contexts implies that we are able to make a decision based on some pre-set condition of the record. With OAI-PMH, the metadata profile used is either a default one OR one specified by a URL parameter. That is something probably better done with a config file and module code. |
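For reference, the OAI-PMH 2.0 specification makes `metadataPrefix` a required argument to ListRecords and GetRecord (outside of resumptionToken paging) and defines the error codes `badArgument` and `cannotDisseminateFormat` for a missing or unsupported prefix. A minimal sketch of the "config file and module code" approach, assuming a hypothetical prefix-to-template registry; the template file names and the function name are illustrative, not part of rest_oai_pmh, and resumptionToken handling is ignored:

```php
<?php
// Hypothetical registry: metadataPrefix => Twig template used to render it.
const METADATA_TEMPLATES = [
    'oai_dc' => 'rest-oai-pmh-record.html.twig',
    'mods'   => 'rest-oai-pmh-record--mods.html.twig',
];

/**
 * Resolves the template for a ListRecords/GetRecord request.
 *
 * Returns ['template' => ...] on success or ['error' => <OAI-PMH error code>].
 */
function resolve_metadata_template(array $query, array $templates = METADATA_TEMPLATES): array
{
    if (!isset($query['metadataPrefix']) || $query['metadataPrefix'] === '') {
        // metadataPrefix is a required argument for these verbs.
        return ['error' => 'badArgument'];
    }
    $prefix = $query['metadataPrefix'];
    if (!isset($templates[$prefix])) {
        // The repository does not offer this format.
        return ['error' => 'cannotDisseminateFormat'];
    }
    return ['template' => $templates[$prefix]];
}

// The registry keys double as the answer to a ListMetadataFormats request.
print_r(array_keys(METADATA_TEMPLATES));
```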
@rosiel, the rest_oai_pmh RDF mapping option uses the Dublin Core predicates and filters out the others, so even if we add other predicates, as long as we leave the Dublin Core ones alongside them (RDF mappings can have multiple predicates for a single field), it still works. That means fields would be duplicated in our RDF, since it would have both our more specific predicates and the generic Dublin Core ones. I could live with that, but I don't know if everyone will. I'm inclined to break up the linked agent field into creators and contributors. However, this could complicate migrations: can we tell whether a photographer in the Islandora 7 MODS should be a creator or a contributor? In any case, the current RDF mapping path gives us a Dublin Core default endpoint, but only that. I think we do need to code up a PR to the rest_oai_pmh module that allows multiple metadata profile options, based on Twig templates, that the OAI-PMH harvester can choose from.
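A rough illustration of the "keep only the Dublin Core predicates" behaviour described above, assuming a hypothetical mapping array in which a field can carry several predicates. Treating both `dc:` and `dcterms:` prefixes as Dublin Core is an assumption of this sketch, not necessarily what rest_oai_pmh does internally:

```php
<?php
// Hypothetical RDF mapping: field name => predicates (as CURIEs).
// A field may carry a specific predicate alongside a generic Dublin Core one.
$mapping = [
    'title'              => ['dcterms:title'],
    'field_linked_agent' => ['relators:pht', 'dcterms:contributor'],
    'field_note'         => ['skos:note'],
];

/**
 * Keeps only Dublin Core predicates, reduced to their local names.
 * Treating both dc: and dcterms: as Dublin Core is an assumption here.
 */
function dc_elements_for(array $mapping): array
{
    $elements = [];
    foreach ($mapping as $field => $predicates) {
        foreach ($predicates as $predicate) {
            [$prefix, $local] = array_pad(explode(':', $predicate, 2), 2, '');
            if (in_array($prefix, ['dc', 'dcterms'], true)) {
                $elements[$field][] = $local;
            }
        }
    }
    return $elements;
}

// field_note drops out; the agent field keeps only 'contributor'.
print_r(dc_elements_for($mapping));
```

This only works while the generic DC predicate sits next to the specific one in the mapping, which is exactly the duplication trade-off noted above.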
I'm glad to hear you all are considering adding rest_oai_pmh to Islandora 8. There is one major issue, though, that I'm working to resolve before moving the project from "alpha" to "beta": essentially, ensuring the OAI endpoint is always up to date with the records assigned to be exposed to OAI via Views. FWIW, for the Linked Agent field, printing that field to OAI could be done like this:

```php
/**
 * Implements hook_preprocess_HOOK() for rest_oai_pmh_record.
 */
function islandora_oaipmh_preprocess_rest_oai_pmh_record(&$variables) {
  $entity = $variables['entity'];
  if ($entity->hasField('field_linked_agent')) {
    foreach ($entity->get('field_linked_agent') as $linked_agent) {
      // Skip references whose target agent no longer exists.
      if (!$linked_agent->entity) {
        continue;
      }
      // Only the "creator" relator maps to dc:creator; everything else is a contributor.
      $dc_field = $linked_agent->rel_type == 'relators:cre' ? 'dc:creator' : 'dc:contributor';
      $variables['elements'][$dc_field][] = $linked_agent->entity->label();
    }
  }
}
```

I'm open to adding other ways of mapping Drupal fields to oai_dc in the rest_oai_pmh module. I can investigate integrating with Context if you all think that's a good idea, or any other solutions that would provide the most utility and ease of use for site admins. Thanks!
Interesting to read that you are planning on integrating the same (or a similar) Archipelago architectural feature, Dynamic Twig-based Metadata Shaping, into Islandora 8. I guess some of you are already aware of this, but in our implementation it works on every aspect of the output chain, including Display, Download, and API (e.g. IIIF). We followed the success we had with our Islandora Multi Importer shaping of MODS via Twig templates, all those years ago. Of course details will vary, since our source data is always JSON and not a list of field/values. For those not following the Google group, there may be some other features you'd find interesting in our condensed roadmap: esmero/archipelago-aws-demo#6. A lot of care, research, and development has been put into this. Feel free to take a look; there is little reliance on other modules, since we wrote most of this ourselves. I wonder if this departs from the whole philosophy of Islandora 8 regarding Drupal RDF mapping and JSON-LD serialization. Would it not require double effort from an implementation team: mapping on one side, and then having to figure out the DC part again for OAI? Our team is much smaller, so we do not move as fast as you, but since Twig templates as metadata casting services are already core for us, we are just evolving this approach to cover more bases. We would love to see how this evolves in your architecture and what the commonalities are. We already have some interesting plans for the second iteration, since this has served us well. Thanks a lot
@seth-shaw-unlv I agree with
but some OAI consumers want a specific flavor of DC, MODS, etc. Context would let us do things like "when the request is from SFU's Alma library platform for oai_dc, use this template to generate the DC". The request would still be for oai_dc, but the exact flavor of DC returned would be specific to the Context.
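One way to picture this: the harvester still asks for oai_dc, and a Context-like condition only chooses which template renders it. A sketch under stated assumptions: how a consumer is identified is left open, and the consumer key `sfu-alma`, the template names, and `pick_flavor` are all hypothetical:

```php
<?php
// Hypothetical per-consumer template choice within one metadata format.
$flavors = [
    'oai_dc' => [
        'default'  => 'oai-dc--default.html.twig',
        'sfu-alma' => 'oai-dc--alma.html.twig',
    ],
];

function pick_flavor(string $prefix, ?string $consumer, array $flavors): string
{
    if (!isset($flavors[$prefix]['default'])) {
        throw new InvalidArgumentException("No templates configured for $prefix");
    }
    // Unknown or anonymous consumers get the format's default template.
    return $flavors[$prefix][$consumer ?? 'default'] ?? $flavors[$prefix]['default'];
}

echo pick_flavor('oai_dc', 'sfu-alma', $flavors), PHP_EOL;
```

The metadataPrefix stays the contract with the harvester; only the template behind it varies.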
Considering all the work that went into making "the RDF representation" cohesive, I don't think it makes good sense to throw DCE into that as well. In addition, DCE do not have ranges, so a "creator" could be a URI (hypothetically; in our case it's a locally minted one), but I assume (@mjordan, please correct me if I'm wrong) OAI-PMH wants a literal for all its DCE?
OAI-PMH just wants DC, so I assume the values are expected to be literal strings (I assume by 'DCE' you mean DC elements).
I don't think so. (Granted, I did come to the game after the Drupal->JSON-LD->Fedora chain was established.) Once we do the initial Drupal-to-RDF mapping, most of that stack is simply working with serializations of RDF. However, not everyone wants RDF, and many other metadata schemas tightly bind their fields to their structure, which means that in many cases a map to another schema is going to require additional mapping somewhere. Sure, serializing to Dublin Core XML is easy if you are using DC predicates. MODS, MARCXML, EAD, and others will require a map of some type.
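To illustrate the "easy" case, here is a minimal oai_dc serializer using PHP's DOM extension. The oai_dc and dc namespace URIs are the ones used by the OAI-PMH oai_dc schema; the input shape (element name => list of literal values) and the function name are assumptions, and the `xsi:schemaLocation` attribute is omitted for brevity:

```php
<?php
// Minimal oai_dc serializer: element name => list of literal values.
// Namespace URIs are those of the OAI-PMH oai_dc schema; the input shape
// is an assumption, and xsi:schemaLocation is omitted.
function to_oai_dc(array $elements): string
{
    $oaiDcNs = 'http://www.openarchives.org/OAI/2.0/oai_dc/';
    $dcNs = 'http://purl.org/dc/elements/1.1/';

    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $root = $doc->createElementNS($oaiDcNs, 'oai_dc:dc');
    // Declare the dc prefix once on the root element.
    $root->setAttributeNS('http://www.w3.org/2000/xmlns/', 'xmlns:dc', $dcNs);
    $doc->appendChild($root);

    foreach ($elements as $name => $values) {
        foreach ((array) $values as $value) {
            $child = $doc->createElementNS($dcNs, 'dc:' . $name);
            // Text nodes are escaped on output (& becomes &amp;).
            $child->appendChild($doc->createTextNode((string) $value));
            $root->appendChild($child);
        }
    }
    return $doc->saveXML();
}

echo to_oai_dc(['title' => ['Example title'], 'creator' => ['Doe, Jane']]);
```

MODS or EAD would not fit this flat element/value shape, which is where a per-schema map (or template) comes in.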
It sounds like we may want to keep our RDF "pure" (not adding close-enough DC predicates to field mappings) while still providing other harvesters what they want. In that case we should make an oai_dc-specific mapping separate from the RDF mapping. We would still need to somehow distinguish which relators are creators and which are contributors (in case a specific relator, such as photographer, could be either). Or we could use the condition @joecorall used in his example, where only relators:cre are creators and all others are contributors. I'll note here that the rest_oai_pmh RDF mapping option does not serialize the linked entity's URI, just the target entity's label.
Yeah, add Google Scholar-specific metatags to that list...
@DiegoPino, I was just looking over the Archipelago code you linked. I see you are storing Twig templates in the database as part of a custom entity. Are you still able to add template preprocessing? Or do we need to keep them as module or theme templates for that?
@mjordan I'm using "DCE" to refer to the Dublin Core Element Set, yes. It seems commonplace nowadays to use "DC" to mean the DCMI Metadata Terms, a much more nuanced collection of terms and the successor to the now-defunct "Qualified Dublin Core". But while anyone using Dublin Core Elements is most likely just providing literals as objects, those elements are not officially defined with a range, so anything is technically acceptable. You can see this difference when comparing creator in the DC Terms namespace with creator in the DC Elements namespace: dcterms:creator has range Agent, so providing a literal name would be wrong.
@rosiel right, that DCE. It's been so long since I used it that it fell out of my instantly accessible vocabulary. (See, there is a use for URIs in normal human dialogue!) I understand what you're saying, but I think most OAI-PMH consumers would be expecting literals. That said, being able to include anything else (like a URI) would support the idea of using Context to optionally return consumer-specific values in DC.
So, strictly speaking, if OAI-PMH feeds only express literals, we need to convert DC Terms whose range isn't a literal into their DC Elements counterparts. I don't think I had noticed that they weren't also ranged for literals. Good to know.
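A sketch of that conversion step for agent values, assuming values arrive either as literals or as (locally minted) URIs and that some label lookup exists; the `$labels` array and the `to_dce_literals` name are hypothetical stand-ins:

```php
<?php
// Sketch: DCE values in an oai_dc feed are expected to be literals, so
// IRI values (e.g. locally minted agent URIs) are swapped for labels.
// $labels is a hypothetical stand-in for the site's real lookup.
function to_dce_literals(array $values, array $labels): array
{
    $literals = [];
    foreach ($values as $value) {
        if (filter_var($value, FILTER_VALIDATE_URL) === false) {
            $literals[] = $value; // already a literal
        } else {
            // Unknown URI: keep it as-is (DCE defines no range, so it is still permissible).
            $literals[] = $labels[$value] ?? $value;
        }
    }
    return $literals;
}

$labels = ['https://example.org/agent/12' => 'Smith, Ann'];
print_r(to_dce_literals(['https://example.org/agent/12', 'Doe, Jane'], $labels));
```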
Attempt to summarize some of this conversation: In the short term, using the RDF mapping is reasonable because all of our existing mappings are RDF. The one hitch is the linked agent, but that can be accounted for by implementing hook_preprocess_rest_oai_pmh_record (as @joecorall indicated). However, we should still make a list of relators most likely to be creators (e.g. Author, Photographer, Creator), while the others can default to contributors. I can make an initial pass, but I'll want feedback on it. Also, we need to use our DC namespaces carefully to make sure literals are permissible. Would that be acceptable as an initial PR? If so, it is nearly ready. In the long term, we want to enable multiple metadata schemas, such as MODS (most likely as customizable Twig templates), and enable Contexts to switch them as desired. Side note: I'll eventually want EAD for my archivesspace-drupal integration.
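The proposed relator list could start as something like the following. The three MARC relator codes (aut = Author, cre = Creator, pht = Photographer) match the examples above, but the list itself and the function names are only a hypothetical first pass; any other relator falls through to dc:contributor, extending the rule in @joecorall's preprocess example:

```php
<?php
// Hypothetical first-pass list of relators rendered as dc:creator.
// MARC relator codes: aut = Author, cre = Creator, pht = Photographer.
const CREATOR_RELATORS = ['relators:aut', 'relators:cre', 'relators:pht'];

function dc_element_for_relator(string $relType): string
{
    // Anything not on the list defaults to dc:contributor.
    return in_array($relType, CREATOR_RELATORS, true) ? 'dc:creator' : 'dc:contributor';
}

// Groups [relator, label] pairs the way the preprocess hook fills
// $variables['elements'].
function group_agents(array $agents): array
{
    $elements = [];
    foreach ($agents as [$relType, $label]) {
        $elements[dc_element_for_relator($relType)][] = $label;
    }
    return $elements;
}

print_r(group_agents([
    ['relators:aut', 'Doe, Jane'],
    ['relators:edt', 'Roe, Rick'], // edt = Editor
    ['relators:pht', 'Poe, Pat'],
]));
```

Inside the Drupal hook, the ternary on relators:cre would then become a call to a helper like dc_element_for_relator().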
@seth-shaw-unlv thanks, that's useful. Just to clarify the bit about Contexts: I'm saying that they could determine which template to use within a particular schema, not that they should replace the
@seth-shaw-unlv sorry for the delayed response: you can add template preprocessing, since it's part of the rendering chain, but we have no use for that right now because we pass the JSON data plus the whole Node to the template engine. I think I pointed you to our first public commit, but the code has of course evolved in the meantime. Twig D8 best practices encourage passing render arrays only, but we are dealing with metadata in a different way, where field preprocessing (plural) for rendering makes little sense to us. Look at the format_strawberryfield module if you are curious about our approach. Since we are approaching beta, we will commit some extra goodies this week that have been in the works for the last few weeks.
As a test of how this module can respond with non-DC metadata, I'd like to nominate ETDMS as the second format.
I created a feature branch on rest_oai_pmh for a submodule that provides the ETDMS format. It's only a very rough start, since I am not familiar with the ETDMS metadata format. It basically acts identically to oai_dc, but instead of wrapping the record metadata in Additional metadata elements for a node/entity can be added to a record in rest_oai_pmh_etdms_preprocess_rest_oai_pmh_record__etdms, and they will print in OAI-PMH. If the additional metadata elements don't print correctly, you can alter the template file rest-oai-pmh-record--etdms.html.twig.
@joecorall sounds good. I like the hook approach! I know a lot of Islandora sites will be very interested in early ETDMS support.
@dannylamb Islandora/islandora_defaults#4 has been merged. OK to close?
Seriously, https://www.drupal.org/project/rest_oai_pmh looks like it's made for Islandora 8. Is there anything preventing us from using it? Has anyone tried it?
@joecorall? You wrote it. Any thoughts about this being a generic oai solution for Islandora 8?