
20180403 Ontology Change Improvement Call

marijane white edited this page Apr 3, 2018 · 1 revision

Date: 20180403

Attendees: Marijane White, Mike Conlon, Christian Hauschke, Muhammad Javed, Anna Kasprzik, Damaris Murry

Agenda: see https://wiki.duraspace.org/display/VIVO/2018-04-03+Ontology+Improvement+Call

Mike is switching back to his UFL email because the relationship between Duraspace and VIVO is changing. Mike is moving his volunteer role outside of Duraspace and there is a Memorandum of Understanding outlining responsibilities. This is all good for the future of VIVO.

Mike: Concern that the project needed to make more progress led to changes in technical and project leadership, and to a strategy meeting on March 1st at Duke. Five areas were identified for action planning: Vision, Resources, Product Evolution, Community Development, and Governance and Structure. Each group has a chair, and each will give updates every week. There was lots of pent-up interest: people who did not know how to participate, and people with ideas who didn't know where to go. Now we have a sprint planned, and Javed is leading it.

Sprint Update from Javed: We have 10 people committed to work!

Sprint Tasks: https://wiki.duraspace.org/display/VIVO/Sprint+1+-+Proposed+Tasks Three tasks related to ontology work. Two of them are marked "homework", which means they will involve analysis, but no actual changes this sprint.

Reviewing the first task, preparing modules to remove ontologies/entities from vivo.owl. Identifying entities that are in the VIVO ontology that shouldn't be there, such as biomedical ontologies and OCRE. The idea is that for our homework, we will generate a vivo.owl that doesn't contain these modules, and the removed entities will go into their own module which end users can load optionally. Task will have three deliverables: updated vivo.owl and ontologies.owl, and new module files.
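A minimal sketch of the split described above, assuming the ontology is available as a list of (subject, predicate, object) IRI strings. The prefixes in `OPTIONAL_PREFIXES` are placeholders, not the real namespaces; deciding which namespaces actually move is the homework itself.

```python
# Hypothetical namespace prefixes for illustration only; the real list of
# namespaces to move into optional modules is what the task would determine.
OPTIONAL_PREFIXES = (
    "http://example.org/biomed#",  # placeholder for a biomedical ontology
    "http://example.org/ocre#",    # placeholder for OCRE
)


def split_triples(triples, optional_prefixes=OPTIONAL_PREFIXES):
    """Partition triples into (core, module) by the subject's namespace."""
    core, module = [], []
    for s, p, o in triples:
        # str.startswith accepts a tuple of prefixes
        if s.startswith(optional_prefixes):
            module.append((s, p, o))
        else:
            core.append((s, p, o))
    return core, module


triples = [
    ("http://vivoweb.org/ontology/core#Grant", "rdf:type", "owl:Class"),
    ("http://example.org/ocre#Study", "rdf:type", "owl:Class"),
]
core, module = split_triples(triples)
```

Here `core` would become the slimmed-down vivo.owl and `module` the optional file. Splitting on the subject alone is a simplification; a real extraction would also need to handle axioms that link the two sets.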

Damaris: How does this relate to VIVO core? Is it changing or replacing any of it?

Javed: None of this is part of the VIVO core namespace.

Mike: First, vivo.owl is VIVO core as it exists today, and this task is not removing anything. But the end goal is to eventually pull things out into modules.

Marijane: vivo.owl contains a lot of things from other namespaces that are not relevant to the core VIVO functionality.

Javed: Can you clarify what you mean by VIVO core?

Damaris: Like artistic works, treated like publications and grants. I was hoping that would be treated as core.

Javed: Are they core or upstream?

Damaris: Right now they are local extensions.

Javed: This ticket will not affect those.

Mike: There is a ticket to add humanities concepts to the ontologies, and this group could choose to work on it on a future sprint.

Javed: Anyway, I am leading this task, and Mike is reviewing it.

Next task is the VCARD ontology update. We have some discrepancies with the latest version of VCARD that need to be resolved. Tatiana is leading, with Mike and me reviewing. The deliverable will be a file that contains a list of VCARD entities used in the VIVO software with an analysis of usage, a file with a VCARD graph of entities to be removed from vivo.owl, and a file with a VCARD graph that needs to be added to VCARD.

Christian: Anna, can we analyze this with VoCol?

Anna: Sounds more like ROBOT.

Mike: Agreed, sounds like a ROBOT task.

Marijane: Should there be a task with a SPARQL CONSTRUCT query to translate triples?

Mike: That sounds like a very good idea. Also need to look for references in the code, which should also be included in the analysis, to create a list of software modules that contain VCARD references.
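The code search Mike mentions could start as something like the following sketch. It assumes references appear as the full vCard namespace IRI; code that uses a prefix or a constant instead would need additional search strings. The function name `files_referencing` is made up for this example.

```python
import os

# The W3C vCard ontology namespace
VCARD_NS = "http://www.w3.org/2006/vcard/ns#"


def files_referencing(root, needle=VCARD_NS):
    """Return sorted paths (relative to root) of files that contain needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    if needle in fh.read():
                        hits.append(os.path.relpath(path, root))
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
    return sorted(hits)
```

Running `files_referencing` on a checkout of the source tree would give a first draft of the list of software modules that contain VCARD references.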

Javed: The final task is the vivo.owl and ontologies.owl update. This means testing whether vivo.owl contains all entities that exist in the current software, and testing whether ontologies.owl is complete, i.e. that all the ontologies are mentioned in the siteAdmin page. So task one is to test with the software, and task two is to make sure the file is correct.

Mike: To clarify, there is a goal to create a release, with vivo.owl and ontologies.owl as part of it.

Javed: And most of the work has already been done; this is just to re-confirm we haven't broken anything.

There is a task to re-evaluate the 1.10 branch task, which will be discussed in more depth in the Dev IG meeting. They will be testing the vivo-ontology-lab pull request.

I'm thinking each sprint we should have tasks for both groups, the ontology engineers and the developer group.

Mike: Having vivo.owl in the project means we can make small pull requests for low-impact changes between sprints.

Javed: Sprints can be reserved for major tasks.

Mike: Like developing a humanities ontology.

Next, MIRO guidelines for documenting ontologies. https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-017-0172-7 UFL looked at this, and it has some good things in it, section A in particular. It's an enumeration of what must be in an ontology for others to understand it. This is written by people who really know their stuff with respect to ontologies.

VIVO has some of the things in this document, but not all of it. Raising this topic to give people a chance to read the paper, and perhaps we can discuss it next time we meet, create a JIRA ticket, etc.

Anna: Have they considered practical considerations with this? How do you get a community to do all of this?

Mike: Well, I think we should focus on Section A, that much seems doable.

Next agenda item, developer friendly ontology. There was some discussion about this on the developer list, and it's always been a topic of discussion, that the ontology is difficult for developers and that we might make it more friendly. One issue is the use of numeric predicates. My feeling is that the use of numeric predicates is required. Tools could perhaps make this easier. We could perhaps use some sameAs assertions to make it easier for people writing queries.

Javed: How will sameAs help here, because we're not using inferencing?

Mike: Right, we're not doing that at the moment, but we could imagine a world where we define sameAs relationships that would make queries easier.

Javed: I'm not sure the SPARQL queries can handle this?

Christian: If the reasoning has taken place it could.
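The point about inferencing can be illustrated with a small sketch that uses plain tuples in place of a triple store. The alias `ex:partOf` is a made-up friendly name; `obo:BFO_0000050` is BFO's "part of". Note that for properties the usual OWL construct would be `owl:equivalentProperty` rather than `owl:sameAs`; the sketch follows the wording of the discussion and handles only predicate aliases in a single pass, so it is not a reasoner.

```python
SAME_AS = "owl:sameAs"


def materialize_same_as(triples):
    """Copy each statement onto predicates declared sameAs its predicate."""
    aliases = {}
    for s, p, o in triples:
        if p == SAME_AS:
            # treat the alias declaration as symmetric
            aliases.setdefault(s, set()).add(o)
            aliases.setdefault(o, set()).add(s)
    inferred = set(triples)
    for s, p, o in triples:
        for alias in aliases.get(p, ()):
            inferred.add((s, alias, o))
    return inferred


def match(triples, predicate):
    """Stand-in for a query on a single predicate: returns (subject, object) pairs."""
    return sorted((s, o) for s, p, o in triples if p == predicate)


data = [
    ("ex:partOf", SAME_AS, "obo:BFO_0000050"),     # hypothetical friendly alias
    ("ex:dept", "obo:BFO_0000050", "ex:college"),  # asserted with the numeric predicate
]

without_reasoning = match(data, "ex:partOf")
with_reasoning = match(materialize_same_as(data), "ex:partOf")
```

Querying the raw data by the friendly name finds nothing, because the alias is only a statement about two IRIs; the match appears only after the equivalent triples have been materialized, which is the step a reasoner would perform.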

Mike: Before we engineer a solution, let's consider whether this is a problem that needs to be solved.

Christian: As a simple librarian, I hate the OBO properties. Non-numeric predicates are much easier to work with.

Mike: There are other examples of developer friendly properties -- relates and relatedBy, which then require faux properties. This does seem like a problem to me. We might be able to deal with this using subproperties, with defined domain and range. Seems like it could get rid of faux properties while we're at it.

The third example is complex patterns. Say, instead of saying Mike is a member of something, we have to say Mike has a membership role in something else -- we reify things which makes the pattern more complex. Reification is used when you want to say things about the reified things. I'm not sure if this is something we could do.

Marijane: This seems like it could be another use case for a CONSTRUCT query. Like for translating to schema.org, which does not have a lot of reification.

Mike: I also think this could be inferenced. So we should be thinking about this.

Javed: A couple years ago I was thinking about this, where you have say a complex relationship like Authorship, but also a direct relationship between the entities. If you want to know more about the authorship, then you look at the reified node.

Marijane: So there would be multiple representations in the data. And I think the only concern is making sure it's well documented so new users aren't confused about why there are two different ways to say things.

Javed, Mike: Yes.
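A rough sketch of keeping both representations: the reified Authorship node stays in the data, and a direct link is derived from it. This assumes the Authorship node points at both the person and the work via `vivo:relates`; the shortcut predicate `ex:hasAuthor` is hypothetical, and the data is plain tuples rather than a real triple store.

```python
RELATES = "vivo:relates"
TYPE = "rdf:type"


def direct_author_links(triples, person_class="foaf:Person",
                        authorship_class="vivo:Authorship",
                        shortcut="ex:hasAuthor"):
    """Derive (work, shortcut, person) triples from reified Authorship nodes."""
    types = {}
    for s, p, o in triples:
        if p == TYPE:
            types.setdefault(s, set()).add(o)
    links = set()
    for node, classes in types.items():
        if authorship_class not in classes:
            continue
        related = [o for s, p, o in triples if s == node and p == RELATES]
        # anything related that is not typed as a person is treated as the work
        people = [r for r in related if person_class in types.get(r, ())]
        works = [r for r in related if person_class not in types.get(r, ())]
        for work in works:
            for person in people:
                links.add((work, shortcut, person))
    return sorted(links)


data = [
    ("ex:authorship1", TYPE, "vivo:Authorship"),
    ("ex:authorship1", RELATES, "ex:person1"),
    ("ex:authorship1", RELATES, "ex:paper1"),
    ("ex:person1", TYPE, "foaf:Person"),
    ("ex:paper1", TYPE, "bibo:Article"),
]
links = direct_author_links(data)
```

A consumer that only needs to know who wrote what can use the derived link, and one that needs details about the authorship can still follow the reified node. The same derivation could be written as a SPARQL CONSTRUCT query or as an inference rule.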

Mike: We are very dependent on our upper ontologies, BFO and OBO. And when you look at these upper ontologies, things like relates and relatedBy become red flags. In BFO these things are modeled as processes. VIVO does have this for, say, Educational Process, but it is missing other things.

Marijane: And I complained about this a couple years ago, that the reified Relationship is a processual entity, and it's in the wrong part of the ontology.

Mike: And this raises issues with our Publication model. There's the one where no person is involved, and the institutional view, which is a series of processes that go into the production of the paper, none of which are in the model, because we're using a simple repository type model.

This leads directly to the next topic, VIVO-ISF and PROV. RDA is starting a work group on provenance patterns, which I attended at the meeting in Berlin. Led by Nicholas Carr, and Dave Dubin gave a talk about provenance contrasting VIVO-ISF and PROV, very thoughtful analysis. Don't have his slides yet. Talked about the BFO/OBO nature of the ISF, and the concept of a role in BFO, and how it completely conflicts with the PROV notion of a role, because PROV reinvented a bunch of concepts without an upper ontology. Dubin did a very thoughtful analysis of role, and how BFO is doing it correctly. Left conclusion unsaid, which seemed to be that VIVO doesn't need PROV. We already have all of it, we just need to arrange it in provenance patterns. This is another topic for future discussion.

The VIVO-ISF ontology is an information standard for representing scholarly work.
