20170313 Ontology Change Improvement Call

Javed edited this page Mar 14, 2017 · 4 revisions

Date: Monday, March 13, 2017

Attendees: Marijane White, Mike Conlon, Graham Triggs, Brian Lowe, Juliane Schneider, Violeta Ilik, Damaris Murry, Tenille Johnson, Dong Joon (DJ) Lee, Muhammad Javed

Mike: Last time we talked about what it might take to change the VIVO ontology, how to do it, etc. As a Task Force we want this to be a time-limited activity. What would we do to say we did something? Based on conversations, assembled the following goals:

Duraspace Wiki Notes: https://wiki.duraspace.org/display/VIVO/2017-03-13+Ontology+Improvement+Meeting+notes

Goals:

  • Create a domain definition for the VIVO ontology
  • Make an ontology change that is expected to have no or low impact, e.g. a spelling correction or an additional class, and that doesn't involve a change in the VIVO software
  • Make an ontology change where there is a software change (preferably a small change)

Define what VIVO should talk about and what it should not talk about.

Violeta: Before we define a domain, shouldn't we do what eagle-i is doing?

Mike: That sounds like step two. First we define what the domain is, and then we implement it. We might come to one of several different conclusions. (create extraction, create module, create separate ontology.) And we've seen some examples that make us nervous, things that VIVO doesn't talk about. Right now there is confusion about what should happen. But we have to have a definition before we do any of these things.

Javed: Agree, domain definition needs to come first. Then we can see what can be extracted and what should not. And what local changes should be defined in the main ontology, and how to do that.

Mike: Is it even correct to say "VIVO ontology"?

Javed: But there is a vivo core

Mike: That's the ontology that's available to the software. It ends up there as a result of the installation process.

Javed: Yes, that file only contains stuff in vivo core. But there are other ontologies like foaf, etc.

Juliane: This is kind of confusing. Can we define terms, what the boundaries of all these things are and how they fit together?

Mike: Some of us have done this work, and we should probably just present it. The ontology is a collection of ontologies. But your question is around "what is it?" and we can put that on the agenda for next time. We do need to have a shared understanding amongst this group about what all is in VIVO.

Juliane: OK, so I'm not just confused, there is a VIVO software and a VIVO ontology.

Mike: Yes.

Marijane: Plus there's the fact that there are two VIVO ontologies

Mike: That touches on the real core of the problem, so let's do some table setting. In the VIVO software, we distribute a set of ontology files that contain bits and pieces of other ontologies. Sometimes we distribute the whole ontology, sometimes just fragments, and all of it is about representing scholarship, that is, scholarly activities, in the VIVO software. In order to do this, we need to represent things like publications, concepts, and contact information, and when the VIVO software devs looked at this, they decided to borrow representations from other ontologies. 25 different ontologies are referenced in the VIVO software. Again, they may or may not be complete; some are partial, and there is a debate about those partial extractions. In the case of VCard, we distribute all of it, no extraction. So VIVO has a relationship to many other ontologies, some extracted and some not. All are useful for representing scholarship as VIVO thinks about it, but we don't have a domain definition.

And then there's the special case, VIVO-ISF, which reconciles a bunch of things, makes sure they are consistent with each other, as many as 45 aligned/reconciled ontologies. All the VIVO concepts became part of VIVO-ISF, and the VIVO core was extracted out of that. In a way, the VIVO ontology lost its identity.

Violeta: That's a complaint we get, that the VIVO software doesn't have its own identity. We have to think strategically, give VIVO a real home. Have a real presence, have users, like eagle-i has.

Mike: For those who are not familiar, eagle-i was developed at the same time as the VIVO ontology, under the same grant program RFA. Eagle-i is an ontologically driven application, and VIVO is too. They are quite different in how they relate to their ontologies, but both were reconciled into VIVO-ISF. Eagle-i has its own identity, presence, and PURLs, and is part of the OBO Foundry, whereas VIVO no longer has an ontological identity, which creates problems.

At this point the VIVO ontology is really a small part of the VIVO-ISF. ISF has a lot more classes; VIVO core creates data instead, for example universities. Classes in VIVO-ISF include concepts like cell types, etc. In contrast, VIVO data has organizations, 65,000 GRID organizations. So there are some different design principles underlying the two ontologies.

Brian: Just want to comment on the differences in philosophy. There are still unresolved issues in that vein, but hopefully as of the 2013 release of the ISF, there should be more similarity now. Sure, there are lots of cells and instances of cells out there, but there's only one University of Florida. Can we create sets of instances to distribute? Don't think these two things are incompatible.

Mike: That is a good point. The domains of cellular biology and representation of scholarship may be the thing driving the difference. Which raises the point, why isn't there a microbiology domain, why are they part of the VIVO-ISF? And reiterate point about distributing instance data. For example VIVO distributes countries. One country class, but country instances are data. We could easily be doing that for other sets of data that are common in the VIVO domain. Actually did this in the VIVO project -- journals, dates, etc. So why wouldn't we just create data and distribute it with VIVO? It's something that VIVO aspires to, highlights distinction between model work and data work.

Back to the goals -- if we're interested in ontological improvement, that would mean we'd actually improve the ontology. To do that, we would need a method or process, which we don't really have at this point. There are some pieces that some people think might work, but really nothing since 2013. As we talked about last time, we would discover the method and process as we attempted to implement something; questions will come up and we'll have to resolve them in order to create ontological change. So, of the two change goals, the first is a "small change" that has no impact on running VIVO systems. Things like correcting spelling errors, providing attribution for creation of ontological elements, additional documentation. These kinds of changes would become part of the ontology and improve things like automatic display of the ontology, but they would not impact running VIVO systems, so the need for change management in the VIVO community would be low. So one goal for this TF would be to execute on a few of these changes and deal with standard change issues and how the work gets done. We would also come to know how we decided a proposed improvement was going to be made, and how we would know that it would not impact the software. By executing on a small change, the TF will figure out how to do it.
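One way to approach the question of how we would know a change does not impact the software can be sketched in code. This is a minimal illustration, not a process the Task Force adopted: it assumes two versions of an ontology are available as sets of (subject, predicate, object) triples, and it treats a change as "annotation only" when every added or removed triple uses a documentation-style predicate. The choice of predicates and the function names are assumptions made for this sketch; software that displays labels could still be affected by such a change.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"

# Assumption for this sketch: these predicates only document the ontology.
ANNOTATION_PREDICATES: FrozenSet[str] = frozenset(
    {RDFS_LABEL, RDFS_COMMENT, DCTERMS_CREATOR}
)


def changed_triples(
    before: Iterable[Triple], after: Iterable[Triple]
) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (removed, added) triples between two ontology versions."""
    before_set, after_set = set(before), set(after)
    return before_set - after_set, after_set - before_set


def is_annotation_only(
    before: Iterable[Triple],
    after: Iterable[Triple],
    annotation_predicates: FrozenSet[str] = ANNOTATION_PREDICATES,
) -> bool:
    """True if every added or removed triple uses an annotation predicate."""
    removed, added = changed_triples(before, after)
    return all(p in annotation_predicates for _, p, _ in removed | added)
```

A spelling correction in a label would pass this check, while adding or removing a class or property would not, which flags the change for closer review.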

Javed: That sounds like a good idea.

Mike: I found an error in one of the ontologies we get from somewhere else. In a collection of statements from the FAO, one of the URLs was not a URL. This raises a question: you claim to be using an ontology you got from someone else, but you fixed it, so are you really using it?

Javed: I can prepare a list of things I noticed in my work, and others can too, things that are not correct and should be changed, and then we can choose the lowest impact changes.

Brian: Like the idea of these practical, focused tasks that we want to accomplish, but is that going far enough or in the right direction to start with? Because if we solve these issues without something else, we might be back at the same spot, perhaps with different people, trying to figure this out again. Need the human infrastructure to maintain this effort in the long term, one that can respond to suggestions, etc., and hopefully pass the knowledge on to others. So while I know we hate to talk about process, etc., it is what we really need to address.

Mike: Those are precisely the kind of issues that will be surfaced as we figure out how to achieve goals 2 and 3. It's not going to be simple, we'll run into things that might make us decide that something larger needs to happen. It's not as simple as write down a process and execute it. Can only say that, in terms of people, VIVO is an open source project, which means people have to volunteer effort, which means we're always looking for people who can volunteer their time to improve the software or the ontology. Perhaps we could get a grant, or some underwritten work, but in general the open source project operates on volunteerism.

Brian: OK, that sounds good, as long as we don't end up checking off a small list of issues and calling it good.

Mike: It's like we have a house but we haven't decided what a house is; on the other end, it kind of sounds like we're figuring out how to use a hammer.

Juliane: So we have these users of the software who haven't upgraded. Are we looking to solve those problems as we work through the smaller issues?

Mike: Yes, that's another thing. VIVO has a collection of sites that never made the transition to VIVO ontology 1.6 or VIVO software 1.6, which were co-numbered. There was a big jump and we left a bunch of sites behind. And we have not made ontology changes since; the software continues to improve but the ontology does not. People didn't understand that the software and the ontology could have separate versions. Some sites are still on 1.4, and some are producing VIVO data but not with the current ontology.

Marijane: Isn't that also true of some of the non-VIVO software that uses the VIVO ontology? Like Profiles.

Mike: And who knows what Elsevier is using in Pure. But to Juliane's question, that's not actually in scope for the TF. What I would like to do is create a process for sites that are current, ontology 1.6 and software 1.6+, to make sure that group is not damaged by any further ontological change.

Tenille: Sounds like the architectures of VIVO and eagle-i are similar. We also have sites on previous versions of the software and the ontology, but the releases of both are linked, so it would not be easy to grab a newer version of the software but keep the old ontology. Is that also true of VIVO?

Mike: Yes, we distribute the ontology with the software. That hasn't stopped some sites from trying to stick old data into new software.

Marijane: From observing questions on vivo-tech, it seems some users don't realize the ontology changed so much.

Mike: Sites that might try to transition now would have a hard time, the last change was 3.5 years ago and the shared knowledge is fading. One site has gotten professional help to make that change. Would be a good exercise (not in scope for the TF) to identify active VIVO sites that are running pre-1.6 software. Not many -- some went out of business, some eventually transitioned. Understanding this would be useful.

So we want to make a change that has no impact on the community but that is useful for improving function or documentation. And then we do want to make a change that would have a community impact, to understand how to lower the impact on the community. Something small, but real, that would require a software change made in concert with the ontological change, and require changing the data of the sites using VIVO. We would provide an update mechanism to change the existing representation to the new one. This was quite common until 2013. We haven't done that recently and want to get back into it. Something small, something where there is not a lot of data. Deliver the ontology change, software change, and update scripts, but with as little impact as possible, so we can demonstrate that the change can be made and it won't break stuff. Trying to meet goal 3 with the lowest possible burden on the community.

Javed: 1. ontology change, 2. software change, and 3. data change. We need to do all of these things.

Mike: That's right. If you change something for which there are instances, you have to change the instances. Will need communication process, timelines, etc. People need to know that things will change. Need to exercise our governance so that we know how it will work.
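The point about changing instances can be illustrated with a small sketch of an update script. This is a minimal example under stated assumptions, not the update mechanism VIVO actually ships: it models site data as (subject, predicate, object) triples and rewrites every use of a renamed property. The URIs and the function name are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical URIs, used only for illustration.
OLD_PROPERTY = "http://example.org/ontology#oldProperty"
NEW_PROPERTY = "http://example.org/ontology#newProperty"


def migrate_property(
    triples: Iterable[Triple], old: str, new: str
) -> List[Triple]:
    """Return a copy of the data with predicate `old` replaced by `new`."""
    return [(s, new if p == old else p, o) for s, p, o in triples]


site_data = [
    ("http://example.org/person/1", OLD_PROPERTY, "value A"),
    ("http://example.org/person/1", "http://example.org/ontology#other", "value B"),
]
migrated = migrate_property(site_data, OLD_PROPERTY, NEW_PROPERTY)
```

The function leaves the input untouched and returns new data, so a site could compare the two before committing to the change.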

Tenille: Communication needs to include the VIVO-ISF/openrif community, not just the software community.

Mike: Right, and we know there are other projects using the VIVO ontology. How this happens is up for grabs because we've never really done it.

Tenille: And this could go different ways: a change might not impact eagle-i, or it impacts eagle-i but eagle-i is OK, or it doesn't work and then eagle-i needs to figure out a workaround.

Mike: Damaris, any thoughts or questions?

Damaris: No, this all sounds good. Haven't thought about it this deeply before.

Mike: DJ?

DJ: Yes, I like the plan.

Mike: So the plan is to meet again in two weeks. We have several things to look at. Javed is going to look at some minimal changes, and I will also look at the issue trackers to see if any issues are trivial. Will also look to see which VIVO sites have been left behind, which may help us think through the process, and figure out what led to sites not upgrading. And maybe in the meantime we can think more about the domain definition.

Marijane: Do we want to continue discussion on openrif-dev? https://groups.google.com/forum/#!forum/openrif-dev

The VIVO-ISF ontology is an information standard for representing scholarly work.
