20170828 Ontology Change Improvement Call

Date: 28 August 2017

Attendees: Anna Kasprzik, Christian Hauschke, Graham Triggs, Muhammad Javed, Marijane White, Mike Conlon, Violeta Ilik

Agenda:

Recap of Friday meeting to assess the state of source.owl
Muhammad Javed will discuss ontology change management

Mike: We met on Friday (Marijane, Javed, Violeta, and myself) and went over the use of ROBOT and the files we thought could be assembled to create a VIVO ontology, because there are definitely gaps in the ontology at OpenRIF. We compared the VIVO filegraph with OpenRIF and discussed the assembly process and what we found; there are notes about that at https://goo.gl/u5xwW5.

To make the file (which is available in the task force Google drive), we took the 42 filegraph files distributed by VIVO, processed them to make sure they're valid and in RDF/XML format, assembled them into a single filegraph.owl file, and then took a look at what was in that file. A small number of things were corrected along the way. The notes also list things that still need to be examined. The reconstruction of the file needs to be redone because of prefix issues and some XML entities that ROBOT might not have been prepared for.

There is a bit of a tangle between VIVO and Vitro. The VIVO ontology probably shouldn't contain Vitro assertions: Vitro is for defining how an element is to be handled, but that's not part of the domain knowledge, so those should be separated. We are checking whether they are already separated, meaning they're also in the filegraph for Vitro and therefore don't need to be in the VIVO filegraph; we still need to confirm this.

Then there was an ontology issue regarding Academic Degree; the line number in the notes is the line number in filegraph.owl. This class couldn't be loaded into Protege because of the way the restriction was defined. I reimplemented it so it's valid in Protege and in RDF/XML, but that needs to be checked to make sure we're making the cardinality restrictions that we intend. That is the summary of what we did, reviewed, and observed regarding the files we ended up with.
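The assembly step Mike describes (check each filegraph file, then merge everything into filegraph.owl) could be scripted roughly as follows. This is a minimal sketch, not the procedure used on Friday: it assumes ROBOT is installed as `robot`, relies on ROBOT's `merge` command accepting repeated `--input` options and one `--output`, and only checks that a file is well-formed XML with an `rdf:RDF` root, which is much weaker than full RDF/XML validation. The function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from pathlib import Path

RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def looks_like_rdfxml(text: str | bytes) -> bool:
    """Return True if text is well-formed XML whose root element is rdf:RDF.

    This is a cheap sanity check only, not full RDF/XML validation.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    return root.tag == RDF_ROOT


def robot_merge_command(inputs: list[Path], output: Path) -> list[str]:
    """Build (but do not run) a `robot merge` command line for the given files."""
    cmd = ["robot", "merge"]
    for path in sorted(inputs):
        cmd += ["--input", str(path)]
    cmd += ["--output", str(output)]
    return cmd
```

The returned list can be passed to `subprocess.run` once the inputs have been filtered with `looks_like_rdfxml`; building the command separately makes it easy to inspect before anything is executed.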

Comments and questions about that? We have a full recording of the meeting if you'd like to view it.

brief recap for Violeta after she joined

We are making progress on making a complete OWL file with assertions about VIVO and we should be able to add them to the source.owl file at OpenRIF and get a working file. Still have some work to do, and procedures to add things back into source.owl.

Javed, Marijane, any comments about that?

Javed: I still need to look at that.

Violeta: I had to reinstall Protege, haven't looked at it yet.

Marijane: I don't have the bandwidth to look at things right now, no comments.

Mike: again, the file is in the Google drive for anyone who wants to look at it. With that, we can turn it over to Javed.

Javed: We are going to talk about Ontology Change Management. Say you have a person request a change, and then you have some people analyze that change, what is it about, is it valid, etc. The next step, I think, is applying the change and how we will log it, and where it will be recorded. My presentation today is about this.

Showing a diagram from Stojanovic (https://www.cs.ox.ac.uk/boris.motik/pubs/smms02userdriven.pdf). Starting on the left we have a request for a change. Next, what are the semantics of the change? Next, between the second and third box, you see "Required and derived changes"; this is important. If we say we are going to change some part of the ontology, it may have some impact on other parts of the ontology. And then you implement the change, and then propagate it to the software, etc.

The process looks like a cycle.

Right now my work is on capturing and representing the change; this was my PhD work. My colleague is working on analyzing the semantic impact of the change.

Ontology changes are operationalized
Ontology changes are logged (change representation)
Ontology changes are analyzed (transform change into a graph, log in an ontology change log graph)
Mining - finding composite changes, understand what has changed, from the change log.

Today we are talking about the second phase, change logging.

What is the purpose of this? Why do we want to do this? Keep a record of different versions? Make it possible to transform between versions? To analyze to understand how entities evolved over time? To be able to query and understand who changed what, when it was changed, what entities were changed, and what was the impact of the change? Not just analysis, but algorithms, so we understand the whole history of the change. These are all the different purposes we can think about, from general to very specific purposes.

For example, for the first purpose, we could use GitHub. For the second, we could have transformation logs. For the third, we could have entity evolution logs, for example keeping different versions of the Faculty Member entity. For the last, we would record every single change.

GitHub records each version, and we can tag versions. It records each commit, not each ontology change: a commit can contain multiple changes, and there is not necessarily one commit per class. Commits can be analyzed manually; I am not aware of a way to do this automatically.

Transformation logs can perhaps be used for this purpose, the list of commits can be used as a transformation log. This is good for operational purposes, moving from version to version.

Change logs record each atomic change (every change is separate). Recorded in a structured format, and record the metadata of the change.

In my work, I chose the last strategy because I wanted to analyze the changes.
I believe every change is either an addition or a deletion. Update/Edit is a composite change, as is renaming, and adding a new class may be composite if it has a subclass relationship with a parent class.

There are different ways we can operationalize this. We can have atomic changes, like adding a class, adding a subclass, making classes disjoint, adding new instances, etc. Composite changes such as splitting classes, moving classes, merging classes, etc.
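The idea that composite changes decompose into additions and deletions can be sketched as below. This is an illustration only, assuming a simplified representation in which an axiom is a plain tuple and an atomic change is an operation plus an axiom; the names `AtomicChange`, `add_class`, and `rename_class` are hypothetical and are not taken from Javed's framework.

```python
from typing import NamedTuple


class AtomicChange(NamedTuple):
    operation: str   # "add" or "delete"
    axiom: tuple     # e.g. ("Class", name) or ("SubClassOf", child, parent)


def add_class(name, parent=None):
    """Adding a class is atomic alone, composite once a parent is asserted."""
    changes = [AtomicChange("add", ("Class", name))]
    if parent is not None:
        changes.append(AtomicChange("add", ("SubClassOf", name, parent)))
    return changes


def rename_class(old, new, axioms):
    """Rename as a composite: delete each axiom mentioning `old`, add it back with `new`."""
    changes = []
    for axiom in axioms:
        if old in axiom[1:]:
            renamed = (axiom[0],) + tuple(new if term == old else term for term in axiom[1:])
            changes.append(AtomicChange("delete", axiom))
            changes.append(AtomicChange("add", renamed))
    return changes
```

Under this representation, splitting, moving, or merging classes would likewise be functions returning a list of `AtomicChange` values.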

examples of composite changes in protege screenshots

domain specific example in VIVO

Ontology change logging: in my PhD I proposed an ontology change framework, focused on the atomic change log and operations. So how do we represent an atomic change? The figure at the bottom of this slide is a representation of an atomic change. There is a session id, a timestamp, a change id, and then an operation (like add), the element (in this case an axiom, but it can be other kinds of elements), and then the parameters. So we have the metadata and what was applied.
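A record with the fields listed above might look like the following sketch. The slide itself is not reproduced in these notes, so the field types, the log-line layout, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    # Metadata about the change
    session_id: str
    timestamp: datetime
    change_id: str
    # What was applied
    operation: str      # "add" or "delete"
    element: str        # kind of element changed, e.g. a "SubClassOf" axiom
    parameters: tuple   # the entities the operation was applied to

    def __post_init__(self):
        if self.operation not in ("add", "delete"):
            raise ValueError(f"atomic changes are add/delete only, got {self.operation!r}")

    def as_line(self) -> str:
        """Render the record as one log line: metadata first, then the change."""
        params = ", ".join(self.parameters)
        return (f"{self.session_id} {self.timestamp.isoformat()} {self.change_id} "
                f"{self.operation} {self.element}({params})")


# Sample values are made up for the example.
record = ChangeRecord(
    session_id="s1",
    timestamp=datetime(2017, 8, 28, 15, 0, tzinfo=timezone.utc),
    change_id="c1",
    operation="add",
    element="SubClassOf",
    parameters=("TenureTrackPosition", "FacultyPosition"),
)
print(record.as_line())
```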

So I built a model for recording this change. In the middle we have a change element, and it's marked up with its metadata. Again, atomic changes are add/delete only. Then we have the entities, and the parameters, and the element.

Based on this model, I built an ontology, called Change Metadata Model, built in OWL. The benefit of writing it in OWL is that I could record it in my triplestore, the change log in my triplestore, together with my ontology, each in different graphs, then you can use SPARQL to query any of it.
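To show what a structured change log makes possible, the sketch below answers two such questions in plain Python over an in-memory list. It stands in for a SPARQL query against the change log graph and is not the Change Metadata Model itself; the tuple layout, user names, and entries are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical change log entries: (timestamp, user, operation, element, entity)
CHANGE_LOG = [
    (datetime(2015, 3, 1), "alice", "add", "Class", "FacultyMember"),
    (datetime(2017, 2, 10), "bob", "add", "Class", "TenureTrackPosition"),
    (datetime(2017, 5, 4), "alice", "add", "ObjectProperty", "hasPosition"),
    (datetime(2017, 6, 20), "bob", "delete", "Class", "OldPosition"),
]


def classes_added_since(log, since):
    """Return (class, user) pairs for every class addition at or after `since`."""
    return [(entity, user)
            for ts, user, op, element, entity in log
            if op == "add" and element == "Class" and ts >= since]


def history_of(log, entity):
    """Return all logged changes touching one entity, oldest first."""
    return sorted(entry for entry in log if entry[4] == entity)


one_year_ago = datetime(2017, 8, 28) - timedelta(days=365)
print(classes_added_since(CHANGE_LOG, one_year_ago))
```

Because each entry is an atomic change with its own metadata, questions about who, when, and what reduce to simple filters; a list of commits would not support this without manual inspection.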

We can query for things like how a class evolved. What classes were added in the last year? Who added them? What properties were added in the last version? Which were deleted? We can query whatever we want. This is my core purpose -- if you want to have this information five years from now, we need to record it in a very structured format. If you just want to log versions, and not query to that depth, then perhaps this structure is not needed, but I wanted to go to this level and have this specialized changelog.

When I joined Cornell a couple of years ago, my first focus was to go on that path. I started to figure out how we could have this changelog in the VIVO application. I was thinking about why we can't use Vitro as a standalone ontology editor. I started by updating the pages for changing the ontology. This was presented a long time ago on a VIVO call as well. I hired a student here and we started updating the ontology change pages in Vitro. Here you see the page for updating a class. We wanted to keep it very close to the ontology, so we have the class hierarchy on the left, etc. Similarly for object properties. We have all this in our GitHub, but it hasn't been merged because, as Graham asked, it does not replicate all the Vitro functionality.

Violeta: This is just your instance?

Javed: No, it's not in our instance. When we moved to Maven I believe we merged it in, not sure of the words.

Violeta: Because I don't have this version

Mike: No one has this version. We have a call in October with Cornell to discuss this.

Violeta: Because this is much better, more understandable, because of the hierarchy. It looks like Protege, it's perfect. This will make things much easier for people.

Javed: So my idea was to update pages for ontology editing, and then let's talk about someone using these pages to change the ontology, and let's log the changes in the triplestore.

So that's it. I asked Mike, what is our purpose in recording ontology changes?

Violeta: perfect, great. Question, you said the changes would be logged with all the others?

Javed: When someone changes something, it will go into the change graph.

Mike: I may be asking the same question, but just to be clear, using Vitro to edit the ontology this way, you're automatically generating these changelog triples?

Javed: Yes

Mike: And then this new Vitro mechanism provides a very compelling way to edit ontologies and record what changes were made, but that ontology still needs to end up in Github for distribution purposes?

Javed: Correct. I believe right now when you extracted the filegraphs from VIVO, that should be the core file?

Mike: Yes.

Javed: Once we know this extraction process is working as expected.

Mike: Yes, once we have the ontology files. That's something I didn't mention in the summary of Friday's call -- we created the file to make it easy to analyze, but we want to be able to put things into source.owl so we can extract the ontology, but we would also like namespace subsets.

Javed: We could have a separate functionality here, asking if you want to update the data, and then updates the data based on this ontology change. If we keep everything in one place that may help. Still not sure how that will happen, changing the ontology and propagating it, but I know people have done that work, propagating changes to the instance level.

Anna: If you change something fundamental and you don't change the data, don't you get problems?

Javed: Yes. The question is, is there some automated process for updating the data, or does the person making the change apply it manually? Do we create some sort of strategy that allows people to update in an automated fashion?

Mike: again, to clarify, the changelog should provide the information necessary to automate that change?

Javed: Not all the time. For example, say you have a class like Faculty Position, and you have a number of instances of it. Then you decide to create two subclasses of it: tenure track and non-tenure track, for example. How would you apply this to the data? Manual intervention may be required. A human is required to figure out which instances to apply the new subclasses to. Despite needing manual intervention, this will still really help the user. In the VIVO case, I believe most of the changes will be coming from the software when new versions are released. In that case we should provide a process to convert the data.

We will need to provide a utility.
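Such a utility might, for the Faculty Position example, look like the sketch below. It assumes a human supplies the instance-to-subclass decisions and the tool only separates what can be applied from what still needs review; the function name, the triple layout, and the identifiers are hypothetical.

```python
def plan_subclass_migration(instances, assignments, new_subclasses):
    """Split instances into retypings that can be applied and cases needing review.

    instances      -- identifiers currently typed as the parent class
    assignments    -- human-supplied mapping: instance -> chosen subclass
    new_subclasses -- the subclasses introduced by the ontology change
    """
    triples_to_add = []
    needs_review = []
    for instance in instances:
        subclass = assignments.get(instance)
        if subclass in new_subclasses:
            triples_to_add.append((instance, "rdf:type", subclass))
        else:
            # No decision yet, or a decision naming an unknown class.
            needs_review.append(instance)
    return triples_to_add, needs_review


# Made-up data: three positions, only two of which a curator has classified.
positions = ["pos1", "pos2", "pos3"]
decisions = {"pos1": "TenureTrackPosition", "pos2": "NonTenureTrackPosition"}
subclasses = {"TenureTrackPosition", "NonTenureTrackPosition"}
to_add, pending = plan_subclass_migration(positions, decisions, subclasses)
print(to_add)
print(pending)
```

Keeping the undecided instances in a separate list reflects the point made above: the change log can drive part of the update, but some cases need a person.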

Violeta: Some kind of migration plan.

Javed, can you post the link to the GitHub repository for this?

Javed: Yes. I will ask Jim Blake where it resides right now, I will send the link.

Violeta: Is this also being used by LD4L?

Javed: No, they are a little bit behind us, they are still working on their first version of their ontology.

Violeta: And they're using Vitro?

Javed: Yes, but they're not using it to build the ontology, they are using it to build the data. Catalogers use it to record the data. I believe in the next version they are talking about using the ontology change management work.

I wanted to complete this project, but due to resources, we had to push Scholars @ Cornell, but I think it would be really great if we looked at Vitro as a standalone ontology editor.

Violeta: This

Marijane: I agree this is really great

Javed: The question is, is Protege better for TBOX editing? The benefit of Vitro is that you have the data right there, so you can update your ontology AND your data. That is the benefit here. Everything is in one place.

Christian: I didn't see the ability to edit labels?

Javed: points out where to edit labels

Christian: Can you edit in multiple languages?

Javed: Yes, you should be able to add labels in any language.

Christian: Nice. Thank you, well done.

Javed: If there are no more questions I will stop here.

So our next step would be to look into that filegraph.owl file, and try to build one single VIVO ontology file, and we'll discuss that at our next meeting.
