
20170522 Ontology Change Improvement Call

marijane white edited this page May 22, 2017 · 2 revisions

Date: May 22, 2017

Attendees: Mike Conlon, Tatiana Walther, Brian Lowe, Damaris Murry, Graham Triggs, Muhammad Javed, Violeta Ilik, Juliane Schneider, DJ Lee

Agenda:

  • Recap and updates
  • Ontology domain definition
  • Ontology change process at Duke – Damaris
  • Discussion regarding VIVO

Mike: Apologies for missing several meetings, several issues kept me away from things.

One of the things I did was attend the Protege Short Course at Stanford, where I heard from them about ontology design, construction, etc. Very intense -- 150 exercises in three days, all while being lectured by very smart postdocs. Had interesting conversations over lunch about ontology change management. They don't have a lot of stuff wrapped into software and a community like we do. Mostly vocabularies.

Also trying to push concepts of property chains and inference in VIVO. Initially conservative but more could be done here.

OK, Ontology Domain Definition https://docs.google.com/document/d/1T92B9H7c7R7zrbyZ0DubAc49CxVzt__FNKk3K6bYpys/edit

Document was readable by the whole internet, so we got some feedback from people not so close to the community.
Christian noted absence of "scholarly output", so added a competency question to this point. And deliberately added two new questions that VIVO doesn't really do yet, strong hints for future work.

  • What people are available to work on activity X? for things like finding people for PhD committees. Credentials? Willing? Choice of "available" very deliberate to encompass many contexts.

  • Provenance of the evidence regarding expertise? Again, another hint for future work. Some have done work in this area, not much in VIVO. There were early discussions about this, but they were kicked down the road.

Violeta: The document is great as is, do we have to run this by anyone, or do you have the authority to decide what's final?

Mike: Wanted to get everyone's feedback, it's the work of the Task Force if I don't hear any objections.

Damaris, can you share some thoughts about your ontology change process at Duke? And then

Damaris: We actually make a lot of changes to our ontology because we are constantly getting feedback from faculty members about what's missing. Have added a lot about artistic works. Always trying to communicate changes. Recently, mostly additions. We've discussed high-impact vs low-impact, most of our communication lately has been promotion of the changes, which are mostly low-impact, letting faculty know what they can now add to their profile. Don't delete things often, but that is when we start contacting our data consumers. Two main ways we give data to people: data feeds, which are JSON widgets, and then some embedded stuff. This isolates people from changes because we can just update our SPARQL queries, mostly just need to tell them what's new. Try to give a big heads-up when things are changing. For the people who are using SPARQL queries, we keep track of them and know exactly who they are, and we email them individually with at least a month's warning before making changes. Really try to avoid breaking their websites, and try to communicate as much as possible. We have a data dictionary, and we make PDFs of all our communications. Overall approach is to protect people from ontology changes, and help people out on an individual basis.

Mike: How about more substantive changes, or have you made any?

Damaris: The upgrade was the most significant. About 5 years ago when I started, we'd talk about updates on the ontology call. This unfortunately did not help a lot, so we made a lot of local extensions. Have a lot that we hope we can bring in and share with everyone else. Still making a lot of changes in local extensions, wondering how to incorporate them into the main ontology.

Mike: Yes, we need to talk about how to do that. Javed asked if we could talk about this at the conference, currently thinking about a Thursday Birds of a Feather lunch about ontology, encourage people to bring their extensions.

Violeta: Damaris, is your ontology development file on GitHub? What file format?

Damaris: RDF.

Javed: Also, you have made a number of changes. Can I ask, what entities you've added in the last year, and which entities have been updated in the last year or two?

Damaris: Most of our additions have been adding things from CVs that are not in there. Like News stories.

Javed: Do you have a changelog for your ontology changes?

Damaris: Not officially, besides what is on GitHub and in communications.

Javed: I would like to present two slides if possible. My PhD is in ontology change management, so I've been thinking about this. Conduct a change analysis, identify changed and impacted entities, and then you propagate it to VIVO software, and then VIVO instances. Focusing on the last two. Let's say we have Ontology V1 and Application V5. Update the ontology to V2, we should have a Machine Readable Change log with details about the changes and understand how the ontology changed over time. Update the Application for the new ontology changes, and then propagate to instances. How do we do this? What do we provide to them? I believe it should be a new application, and perhaps also a script that updates data in the triplestore automatically. This is my high-level idea about ontology change, and propagation to the application and to instances.
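As a rough illustration of the idea Javed describes (a machine-readable change log driving an automatic data update), here is a minimal sketch. It is not an existing VIVO tool: the `Change` record, the `migrate` function, the three operation names, and the `ex:` identifiers are all hypothetical, and the triplestore is modeled as a plain list of tuples.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), all as strings


@dataclass(frozen=True)
class Change:
    """One entry in a hypothetical machine-readable ontology change log."""
    op: str                           # assumed vocabulary: "add", "rename", "remove"
    entity: str                       # identifier of the affected class or property
    new_entity: Optional[str] = None  # only used when op == "rename"


def migrate(triples: Iterable[Triple], changelog: List[Change]) -> List[Triple]:
    """Apply renames and removals from the change log to instance data.

    Additions need no data migration, so "add" entries are ignored here.
    """
    renames = {c.entity: c.new_entity for c in changelog if c.op == "rename"}
    removed = {c.entity for c in changelog if c.op == "remove"}

    migrated: List[Triple] = []
    for triple in triples:
        # Drop any triple that mentions a removed entity.
        if any(term in removed for term in triple):
            continue
        # Rewrite renamed entities wherever they appear in the triple.
        s, p, o = (renames.get(term, term) for term in triple)
        migrated.append((s, p, o))
    return migrated


if __name__ == "__main__":
    # Hypothetical V1 -> V2 change log and instance data.
    log = [
        Change("add", "ex:NewsStory"),
        Change("rename", "ex:oldProp", "ex:newProp"),
        Change("remove", "ex:Obsolete"),
    ]
    data = [
        ("ex:a", "ex:oldProp", "ex:b"),
        ("ex:a", "rdf:type", "ex:Obsolete"),
        ("ex:a", "rdf:type", "ex:Person"),
    ]
    for t in migrate(data, log):
        print(t)
```

In this sketch additions pass through untouched, which matches the observation in the meeting that additions are low-impact while removals are the changes that affect data consumers.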

Mike: This is very good, and very clear. A couple comments: with semantic versioning we might be able to signal lesser changes that don't require a new application. As Damaris was pointing out, they change the ontology but not necessarily the software. So there can be an in-between step where an ontology update is still useful in the application and the software doesn't need to be updated. Might be a dot release. Semantic Versioning can signal software compatibility. But yes, this is very much like the process we have in mind.
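Mike's point about semantic versioning can be sketched in a few lines. This is only an illustration under an assumed convention (a major version bump means the application must be updated; minor and patch bumps, i.e. dot releases, do not); the group did not settle on this rule, and the function names and version numbers are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_update(old: str, new: str) -> str:
    """Return which part of the version changed first: major, minor, patch, or none."""
    o, n = parse_version(old), parse_version(new)
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    if n[2] != o[2]:
        return "patch"
    return "none"


def application_update_required(old: str, new: str) -> bool:
    """Assumed rule: only a major ontology version change needs a new application."""
    return classify_update(old, new) == "major"


if __name__ == "__main__":
    # Hypothetical ontology version numbers.
    for old, new in [("1.2.0", "1.2.1"), ("1.2.0", "1.3.0"), ("1.9.3", "2.0.0")]:
        print(old, "->", new, classify_update(old, new),
              application_update_required(old, new))
```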

Violeta: The ontology should be separate from the application, I thought we were trying to avoid this.

Mike: That is a very long term goal, for now everything is very tightly coupled.

Tatiana: A significant problem is with the hard coding and the templates.

Javed: It depends on the context of the changes.

Mike: Go back to previous slide -- this is about the change analysis. We have to figure out if a change is going to change the templates, or if the change goes in without the templates, so it would go into the ontology but never be seen by the users. Some changes can be introduced that are automatically surfaced by the application, like adding new identifiers. VIVO will add it to screens and users will be able to see it and put in new data. So it works well in some areas, where we can update the ontology without updating the software, but it doesn't in others.

Javed: My question is what is the plan, are we going to assign some people for those tasks?

Violeta: Yes, definitely, someone needs to be responsible for that.

Javed: Perhaps have 2 or 3 people who review these things.

Mike: A practical issue: our current practices use Jira and GitHub, for which I don't know how to implement a workflow. We need a workflow to identify when things need to be done, like change analysis.

Violeta: Can I suggest something? Since I design workflows at work, can we use GitHub, can the person who needs to do the change analysis, can they be tagged?

Marijane: I agree with Violeta, we can do this by tagging or assigning people to issues in GitHub.

Violeta: We can identify primary person, backup person. It works well at Northwestern, perhaps because they work with me. (laughter)

Graham: That does work when you know people are available, but it might not work in this distributed situation.

Violeta: I know, that was just a suggestion.

Mike: I agree with the approach on Javed's slides, that there's a request with an analysis. The idea of using GitHub is fine, there might be more repositories than we need, so I have a question about that. And I have a question about which repository and which issue trackers.

Marijane: We talked about this while you were unable to join us.

Javed: The issues should be created in the same project where the changes were made.

Marijane: Agreed.

Mike: I have some principles of workflow. If we create an ontology change workflow, what are we trying to accomplish? We're trying to make changes that benefit the VIVO community. We also understand that at some level, the changes must be implementable. The workflow must be feasible, it has to work, it has to be timely, efficient, understood, etc. We could imagine, in the change analysis, closing a ticket without making a change, because the change suggested is contrary to the way VIVO works.

Damaris: I agree that we should use GitHub, but we need to keep the wider community informed about what's going on.

Mike: I agree, when I am thinking about workflow, I am thinking about all the communication that needs to happen. To execute properly, communication must be part of the process. Also thinking about how this integrates with the application software, at which point there has to be a JIRA ticket because that is what we use for software changes. So Ontology tickets might kick off JIRA tickets. Those things become linked.

Javed: Can the changes be requested in GitHub and the change analysis in JIRA?

Mike: Currently all ontology tickets are duplicated in JIRA. Would like to expand beyond that, not as open as it should be.

Violeta: I think we should document everything in the GitHub wiki. Accessible by everyone with or without a GitHub account.

Marijane: I think the software documentation should stay at the Duraspace wiki, but the ontology documentation should mostly be organized at GitHub.

Violeta: I agree.

Mike: If we have two spaces for documentation, there will be two spaces.

Should we do anything between now and the next meeting?

Marijane: We could invite the ROBOT tool people to the next meeting. I need to follow up with them.

Javed: Note that the OWL file doesn't match the source code.

Mike: I posted the OWL file so it will resolve. I believe from the repository.

Marijane: That is why there is still some mystery around the OWL file, it doesn't match what is in the repositories.

Violeta: Can we extract the ontology from the repository?

Mike: So is there anything we can do specifically about the ontology change workflow between now and the next meeting? Diagrams? etc?

Damaris: One way to be able to talk about the work done here, using a relevant example, do we have a concrete change that we're working through? Pick one example?

Javed: I've identified a number of changes we could make.

Violeta: Can we make a Google folder for this group and start working there?

Mike: So Javed is going to start a VUE diagram, I will create a Google folder and add everyone to it.

The VIVO-ISF ontology is an information standard for representing scholarly work.
