
20170814 Ontology Change Improvement Call

Marijane White edited this page Aug 14, 2017 · 1 revision

Date: August 14, 2017

Attendees: Christian Hauschke, Tatiana Walther, Marijane White, DJ Lee, Brian Lowe, Mike Conlon, Muhammed Javed

Agenda:

  1. Recap of discussion at Birds of a Feather meeting at conference.
  2. Next steps for deliverable 2 – a wiki page?
  3. Update on action items (see below)
  4. Process for making changes – see Ontology Change Process (draft) https://goo.gl/3ZMSwC

Mike: Some of us were at the BoF meeting and some of us weren't. We don't have notes from that meeting, so I thought we'd recap it here so that we have notes. Jim Hendler joined us, which was wonderful. Damaris from Duke was wondering why we'd make changes that don't have a functional impact on the application. An example is the situation where author lists can contain VCards, but VCards aren't authors. This is not ontologically correct, and fixing it would be a big change, but it would not change the way VIVO runs. I thought Jim Hendler had an interesting perspective on that, which was (paraphrasing) basically, "don't try to control that," or, accept what you've done, or the multiple things you may have done. I think in previous comments Marijane was trying to suggest something similar. This feels like a big idea; while Jim was talking, I was trying to understand both what he was saying and its consequences. Afterwards, he almost apologized: he's not responsible for the project, and he gets to come in and say provocative things. So there was a lot of fishing around before we got to that point, and then he provided that input. Those of you who were there, do you agree?

Javed: I think so. There was some discussion about the linkage to external ontologies. For example, if we decided to remove the BIBO ontology, Jim said not to, because we should keep the links that make data shareable.

I think since we've decided we're not going to do large changes, we should start working on the small changes. The first thing we should have is a master ontology file that is used everywhere.

Mike: Absolutely. We need a master file before we can do anything. We've been working on a change process, but moving forward with that process requires a master. Creating a master is why I did my investigation of ROBOT and the various sources of ontological assertions; the idea of a master led me to ROBOT because it can find the differences between things. There is, however, an ongoing conversation about what it considers a difference, so others should try ROBOT and see for themselves. I am confident that when ROBOT does an ontological diff, it's correct about, say, which properties are in one file versus the other. I think it can discover the things we need to address, and it has already reminded us of some of them.

So, if we can take a look at the agenda, I would like to talk about finishing some things and continuing to move forward. One of the things this task force was charged with was coming up with a process for reporting ontological needs. I think we have that now: we've triaged our issues, and we just need to document the process. I think we should do this on the OpenRIF wiki.

Marijane: That seems like a good idea.

Mike: I would also like to propose that we communicate between meetings via the openrif-dev listserv.

Marijane: I think that it is also a good idea.

Mike: So if you're not on the list, you should be. As work gets done, I will post updates.

Christian: There should be a note in the VIVO wiki that everything happens at OpenRIF now.

Javed: We should also have a link at vivoweb.org, since people are most familiar with that.

Mike: I will review all the links and make sure people can find their way in.

Christian: One problem I have with this happening in another place is that we have a search function that searches everything related to VIVO development, and if we move some things over to OpenRIF, we will lose that.

Mike: There is some truth to that. I need to finish thinking about it. Doing the ontology development in the open on GitHub, using the wiki and other GitHub tools, seems attractive to me; we just have to make sure there are references in the VIVO wiki.

Marijane: I'll point out that the eagle-i documentation is on Harvard's website, although it was Shahim's intention to move it over to the OpenRIF website, which is managed out of the GitHub organization. I don't know; we have this weird ontology that's used by two projects.

Mike: The VIVO wiki will always have details.

Christian: Maybe I'm making it too complicated, maybe we can agree on the process of doing the work in GitHub and documenting the results at the VIVO wiki.

Mike: That seems workable to me.

OK, so we have actually completed a number of items; I'd like to point out that we have gotten some things done. We've created a group of committers and reviewers, and the README explains how that works. The analysis has been completed. We've triaged our issues.

I did try tracking changes in WebProtege and it was pretty primitive. I think we should use Javed's ontology for tracking changes, and maybe he can present about it at the next meeting?

Javed: Ok.

Mike: You can send your slides out to openrif-dev in advance.

I did present the various analyses I did with ROBOT, but then there was some conversation about how it does things. I don't think it's doing anything wrong; we just need to decide what to do with the results. I am really just using it as a diffing tool. It's a computer program, so it's very accurate: for example, when labels differ, it finds those differences. In my investigation, I was comparing what VIVO calls the filegraph. This is a collection of files, containing assertions in various formats, that is distributed with the VIVO software and loaded to constitute the ontological assertions VIVO is going to use. That source is the most authoritative one we have, since it's distributed with the software and loaded into instances.
I would be happy to go through it with someone else, share screens and so on, so you can see what I did.

Javed: I would like to work with you.

Mike: OK.

Marijane: I would like to observe, I don't know that I can commit to work outside of that, I don't have a lot of bandwidth right now.

Mike: I really just want a second set of eyes. If anyone else wants to join in, we'll send out an invite on openrif-dev.

Javed: I also wrote some Java code to check whether ROBOT is giving us the right results; I would like to compare my output with ROBOT's.

Mike: That's exactly what we should do. For anyone who wasn't on the last call, here are the slides with my analysis: https://goo.gl/Wztgon (slide three is the most interesting).

OK, good. Can we take a look at the change process? https://goo.gl/3ZMSwC

We describe how we're going to create issues. We understand that there needs to be analysis. At the outset, the impact depends on whether it's a change to existing software, existing data, both, or neither. Javed points out that some changes affect neither, and we can use those as test cases. Community discussion needs to be fleshed out. Issues need to be discussed by the community, and I don't think that should be done passively; it's not good enough to expect people to pay attention to GitHub. We need to let everyone know what we're working on, ask people to participate, and ask them to specifically follow the work.

Javed: I can give you an example of what we do on LD4L. Whenever there is a need for an ontological change, we send an email to the Google group saying: this is the topic, this is what we're planning to change (or at least discuss), and this is when we will discuss it; if you are interested, join the call. If a decision is made in that meeting, anybody can comment on it. So anyone interested in an ontological change can attend the meeting. Afterwards we send out an email about what happened, and people who have concerns about a decision are asked to send feedback by a given date.

Mike: So there's proposing a change to the ontology, there's making a recommendation for the change, and there's deciding whether to accept the recommendation. Which of these is the call for? At what point in the process is the community invited to join?

Javed: explains again

Mike: OK, so the call is to discuss the recommendation. Ontological work has been done, a recommendation has been made, and the community comes together to discuss that recommendation.

How would this group like to proceed with this document? Does it need more work? Put it on the wiki?

Javed: It seems detailed. I would put some bullet points at the beginning to summarize the process for people who want to quickly understand what it's about.

In terms of this process, I think there are responsibilities to assign. For example, if someone posts about a change on the list, they should be responsible for creating the issue on GitHub. We should spell out who is responsible for what.

Mike: I think it was Anna who said we need to understand roles and responsibilities.

Marijane: Sounds like a job for a RACI chart.

Mike: I'm familiar with these.

Christian: I think the process is pretty clear and well described, maybe we should add an example after we go through it the first time, so others have something to work from. Other than that I'm pretty satisfied with the process.

Mike: I think having a worked example is a great idea.

Well, that's pretty much what I wanted to discuss today. We have the follow-ups, in particular the follow-up to go through the VIVO sources and check my work. I will be sure to schedule that as soon as possible because, as was pointed out, we need a master file before we can execute any process.

Other comments, suggestions, concerns, questions?

Marijane: I would like to revisit Anna's email to openrif-dev. I am not sure I understand where the differences she observed came from.

Mike: Perhaps if Anna would like to join us, she can walk us through it. I suspect ROBOT detects any textual difference, and I am not sure whether she was finding ontological differences or textual ones. For example, when I compared the filegraph to the source, ROBOT found 350 annotation differences. They were all improvements in wording made after the filegraph had been snapshotted from the source. So people had been improving the comments on ontological entities, and ROBOT found every single one of them. Only 15 of the thousand or so differences were classes.
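To make the distinction between wording changes and logical changes concrete, here is a minimal Python sketch. It is not how ROBOT works internally; ROBOT compares OWL axioms. The sketch assumes a simplified model in which each ontology is a set of (subject, predicate, object) string triples. It treats a hypothetical, hand-picked set of annotation predicates as "wording only". The triples and comment text are invented for illustration.

```python
from collections import Counter

# Simplified model (an assumption): an ontology is a set of
# (subject, predicate, object) triples written as prefixed-name strings.
Triple = tuple[str, str, str]

# Hypothetical list of predicates treated as annotations (wording only).
# A real comparison would need to cover every annotation property in use.
ANNOTATION_PREDICATES = {"rdfs:label", "rdfs:comment", "skos:definition"}


def diff_triples(left: set[Triple], right: set[Triple]) -> dict[str, set[Triple]]:
    """Return the triples that appear on only one side of the comparison."""
    return {"only_left": left - right, "only_right": right - left}


def classify(diff: dict[str, set[Triple]]) -> Counter:
    """Count each differing triple as 'annotation' or 'logical'."""
    counts: Counter = Counter()
    for triples in diff.values():
        for _subject, predicate, _object in triples:
            kind = "annotation" if predicate in ANNOTATION_PREDICATES else "logical"
            counts[kind] += 1
    return counts


# Invented example: the source has a reworded comment that the
# filegraph snapshot does not; the subclass assertion is unchanged.
filegraph = {
    ("vivo:Position", "rdfs:subClassOf", "vivo:Relationship"),
    ("vivo:Position", "rdfs:comment", "A position."),
}
source = {
    ("vivo:Position", "rdfs:subClassOf", "vivo:Relationship"),
    ("vivo:Position", "rdfs:comment", "A position held by a person in an organization."),
}

counts = classify(diff_triples(filegraph, source))
print(counts["annotation"], counts["logical"])  # prints: 2 0
```

In this model, one reworded comment shows up as two differing triples: one removed and one added. That is one reason raw difference counts can look larger than the number of actual edits.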

Javed: If there's nothing else to discuss, I have one question. We use these upstream ontologies, and vivo-core is linked to one or more of them. So you have, say, Position as a subclass of Relationship, but there are other superclasses above it.

Mike: The BFO classes.

Javed: I don't know if anyone is using those superclasses. The inference engine asserts those superclasses, and the inference graph gets larger and larger, and it's hard to tell if someone is using them.

Mike: I am familiar with this problem. When we added BFO, we went from 3 inferred upper classes to 11. If you go to VIVO TPF, you get 4.5 million. If you ask how many are <> 2.8. The question is who is using all those upper level type assertions?

Javed: We don't know whether anyone is using these upper-level classes in their queries or anywhere else.

Mike: I agree. Here's a specific example. A VCard link, if you have a URL that appears in a VCard, it has seven type assertions.

Marijane: I remember the first time I downloaded RDF from VIVO, I was surprised to see all the inferences in the download.

Mike: The tricky part is that VIVO uses some of the inferences, and knowing which are used and which are not is difficult.

Javed: It makes removing triples very slow.

Mike: You're right. I think there is a specific problem with the VCard implementation. It is sort of messy, and I don't know whether we implemented it properly. VCards have a lot of types, none of which are hierarchically interesting; I don't need to know all the different things it says it is. That may be a problem specific to VCards. There is also a problem with the upper-level ontologies: for example, a date, a vivo:dateTimeValue, is asserted to be a bunch of BFO types.

Javed: My proposal at Cornell was to just have one link and forget about the rest of the hierarchy.

Mike: VIVO did this automatically: we ran an inferencer, and it took things to their logical conclusion.
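The growth in type assertions discussed here follows from materializing every superclass as an rdf:type triple. The following minimal Python sketch uses an invented four-level hierarchy. The ex: and upper: names are hypothetical placeholders, not actual VIVO-ISF or BFO classes. The sketch shows how one asserted type becomes several stored triples when inferences are materialized (static inferencing). It also shows how the same answer could instead be computed only when a query asks for it (dynamic inferencing).

```python
# Hypothetical hierarchy: class -> set of direct superclasses.
# Names are placeholders, not real VIVO-ISF or BFO terms.
Hierarchy = dict[str, set[str]]

HIERARCHY: Hierarchy = {
    "ex:Position": {"ex:Relationship"},
    "ex:Relationship": {"upper:A"},
    "upper:A": {"upper:B"},
    "upper:B": {"upper:C"},
}


def all_superclasses(cls: str, hierarchy: Hierarchy) -> set[str]:
    """Return every ancestor of cls by following subclass edges transitively."""
    seen: set[str] = set()
    stack = list(hierarchy.get(cls, set()))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # also guards against cycles
            seen.add(parent)
            stack.extend(hierarchy.get(parent, set()))
    return seen


def materialize_types(
    asserted: list[tuple[str, str]], hierarchy: Hierarchy
) -> set[tuple[str, str, str]]:
    """Static inferencing: store the asserted type plus every inferred one."""
    triples: set[tuple[str, str, str]] = set()
    for individual, cls in asserted:
        for t in {cls} | all_superclasses(cls, hierarchy):
            triples.add((individual, "rdf:type", t))
    return triples


def types_on_demand(
    individual: str, asserted: list[tuple[str, str]], hierarchy: Hierarchy
) -> set[str]:
    """Dynamic inferencing: compute an individual's types only when asked."""
    types: set[str] = set()
    for ind, cls in asserted:
        if ind == individual:
            types |= {cls} | all_superclasses(cls, hierarchy)
    return types


asserted = [("ex:position1", "ex:Position")]
stored = materialize_types(asserted, HIERARCHY)
print(len(asserted), len(stored))  # prints: 1 5
```

Both approaches return the same set of types. Under these assumptions, the trade-off is the one raised in the discussion. Materialized triples keep queries simple, but they enlarge the store and must be cleaned up when data is removed. On-demand inference keeps the store small at the cost of extra work at query time.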

Marijane: I should add that part of the reason I found the RDF strange was because before I joined the VIVO community, I had worked with a system that used dynamic inferencing, so you only got inferences if you asked for them. At the time I assumed that having all the inferences was just how it works in a static inferencing context, which I might be wrong about. It seems like it should be possible to separate the two, but that feels like it's getting more into what the software does with the ontology rather than the ontology itself.

Mike: I think that's right.

Well, if nobody has anything else, this is a good time to wrap up.

