
20181129 Ontology Improvement Call

marijane white edited this page Nov 29, 2018 · 1 revision

2018.11.29

Attendees: Marijane White, Brian Lowe, Michael Conlon, Tatiana Walther, Christian Hauschke, Ralph O'Flinn

Agenda: https://wiki.duraspace.org/display/VIVO/2018-11-29+Ontology+Improvement+Call

Interest Group or Task Force?: recap for Mike. Everyone who was on the last call, and possibly also the VIVO Steering Group, felt it would be good to transition the TF to an IG. Mike clarifies that the SG is mostly interested in IGs being active, because a couple of current IGs are not.

Documentation for ontology updates: Mike did not understand this item from a previous meeting and asks for clarification. Christian explains that we should make it clear how people can request ontology changes. Mike volunteers to write this up.

Ontology Licensing: Marijane has started a document to review the landscape of ontology licenses at https://docs.google.com/spreadsheets/d/1nDDOevoqzTUJcTp0uP6QXteHlF09cmDD0DUB9liV8pA/edit#gid=0

Right now the ontology is in a GitHub repository that says everything is licensed under Apache 2.0, which is fine for software but not for an ontology.

The goal here is to choose and suggest a license for the ontology and to figure out what we need to do about the various licenses for the ontologies we reuse and put together an attribution statement. We definitely need the ability for commercial reuse, which makes the terms for ontologies like the FAO Event ontology concerning. Ralph has shared a few in the Slack channel that are compatible with the Apache license and our needs.

Ralph believes that licenses have to be on a per-file basis. We are going to follow whatever we discover to be ontology best practice.

Brian points out that we need to keep in mind the original goal here, which is sharing data with Wikidata, and having different parts with different licenses would be too much trouble for anyone to figure out whether they can reuse the ontology. Mike agrees.

Marijane is using the list of namespaces that Javed put together to figure out what we need to track down. Everyone is welcome to contribute to the spreadsheet.

Identifiers: Need to deal with the inconsistent way we deal with identifiers. Mike recapped the proposal. Christian notes that Stefan Wolff has raised some important issues, and wonders if there's a use case beyond what Wikidata does. Mike says he put this together before he knew what Wikidata does, and that it was mostly about how we treat ORCiDs as owl:Things, and the inconsistency with respect to all the other identifiers in the ontology, which are implemented as datatype properties, which doesn't give us the ability to say more about them.

There is also the question of handle URIs: we can't say things about those either. This might be an example of the use case that Stefan has asked about. There are also lots of examples in Wikidata where they are doing something similar. Another use case is being able to note the issuing authority of a DOI, say CrossRef. Yet another use case: ISSNs have properties, and the ISSN people went into lots of detail about ISSN. We might care about this someday, for example, the fact that you can't get a list of ISSNs because they're licensed property.
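The difference between the two identifier styles discussed above can be sketched with plain tuples standing in for RDF triples. This is only an illustration: every `ex:` name, the DOI value, and the `ex:issuedBy` property are hypothetical and are not taken from the VIVO ontology or from the proposal document.

```python
# Triples are plain (subject, predicate, object) tuples.
# All "ex:" names and the DOI value below are hypothetical placeholders.
DOI = "10.1234/example"

# Style 1: the identifier is a literal value of a datatype property.
literal_style = [
    ("ex:article1", "ex:doi", DOI),
]

# Style 2: the identifier is an entity in its own right, so further
# statements (such as an issuing authority) can be attached to it.
entity_style = [
    ("ex:article1", "ex:hasIdentifier", "ex:doi1"),
    ("ex:doi1", "rdf:type", "ex:DOI"),
    ("ex:doi1", "ex:value", DOI),
    ("ex:doi1", "ex:issuedBy", "ex:CrossRef"),
]


def statements_about(triples, node):
    """Return the triples whose subject is `node`."""
    return [t for t in triples if t[0] == node]


# A literal never appears as a subject, so nothing can be said about it.
about_literal = statements_about(literal_style, DOI)

# The identifier entity carries its type, value, and issuing authority.
about_entity = statements_about(entity_style, "ex:doi1")
```

In the literal style there is no node to hang an issuing authority on, which is the limitation Mike describes for the datatype-property identifiers.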

Christian and Marijane asked Mike to respond to Stefan's comment in the document proposal and add a section enumerating use cases.

Brian asks if we know how many of our use cases are covered by properties that Wikidata has created. Mike says no. That gets into complex issues with Wikidata that might be a separate discussion. Brian thinks we need to be clear about why we might or might not use Wikidata's properties, and whether it might help get buy-in on the proposal. Mike says that tangles the issue of using Wikidata properties and letting them reuse our data with the question of good modeling. Brian agrees, but notes that if we adopt our own proper modeling and then later decide to reuse Wikidata, we don't want to end up proposing changes to the same thing twice.

Marijane notes that Wikidata's data model is not actually documented (Bob DuCharme has been blogging about this recently, pulling models out with SPARQL queries), and that might mean we don't want to reuse their model. But we should definitely see what they are doing in the cases that are relevant to us and consider following their example. Tatiana notes that new properties are always being added, and Christian notes there is a process for this that is discussed on their mailing list. Mike draws an analogy with the early days of Wikipedia, where interesting niche subjects were very thorough. Tatiana was at a Wikidata workshop in the spring and didn't get a definitive answer to her questions. Mike notes that Daniel Mietchen from Wikidata is going to be presenting at an upcoming Developer IG call.

Mike and Marijane are both cautious about Wikidata, but Mike notes that we could be better off replacing the FAO Geopolitical ontology with Wikidata's model. An investigation of Wikidata seems like a good future project. Christian did a project finding towns with more than 100,000 residents in GeoNames but thinks getting closer to Wikidata is a good idea. Christian also notes that TIB has a student working on something that could be a Wikidata lookup service for VIVO.

Back to Identifiers: Marijane wonders what the next steps are. Mike says we will of course need to socialize this with the steering group and the developer IG, and Marijane notes that the change document we originally put together might need to be reworked. Mike says we do have one crude mechanism: we can create a pull request. There are some related issues that Mike might pull in with identifiers in a pull request to create a coherent piece of work.

December Sprint: An issue has been raised to replace parent classes for things that have a parent of skos:Concept. There are five of them. Mike suggests we don't change any of the modeling (related by, etc.); we just change the subclass assertion to say quality. Tatiana also believes they should be qualities. The main goal here is to get them to stop showing up as concepts in the application. Some of these things are outputs of processes that we haven't modeled, but those are complicated conversations for the future and are not necessary in order to fix the parent class issue. Mike asks for comments on the issue, or on the pull request when it comes to that. We will of course confirm that the classes still work in the application.
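As a rough sketch of the change described above, the following swaps only the subclass assertion and leaves every other triple alone. The class names `ex:ClassA` and `ex:ClassB`, the property `ex:relatedBy`, and the target `ex:Quality` are hypothetical stand-ins; the minutes do not name the five classes or the exact quality class.

```python
SUBCLASS_OF = "rdfs:subClassOf"


def reparent(triples, old_parent, new_parent):
    """Return a copy of `triples` in which subclass assertions pointing at
    `old_parent` point at `new_parent` instead; other triples are unchanged."""
    return [
        (s, p, new_parent) if p == SUBCLASS_OF and o == old_parent else (s, p, o)
        for (s, p, o) in triples
    ]


# Hypothetical fragment: two classes currently under skos:Concept, plus
# a label and a relationship that must survive the change untouched.
ontology = [
    ("ex:ClassA", SUBCLASS_OF, "skos:Concept"),
    ("ex:ClassA", "rdfs:label", "Class A"),
    ("ex:ClassB", SUBCLASS_OF, "skos:Concept"),
    ("ex:someThing", "ex:relatedBy", "ex:ClassA"),
]

updated = reparent(ontology, "skos:Concept", "ex:Quality")
```

Because only the parent assertion is rewritten, the existing relationships stay in place, which matches the intent of not changing any of the modeling.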

The VIVO-ISF ontology is an information standard for representing scholarly work.
