
20171009 Ontology Change Improvement Call

annakasprzik edited this page Oct 10, 2017 · 5 revisions

Date: 2017-10-09

Attendees: Javed, Anna, Mike, Graham, Damaris

Note: These notes were created by Mike after the fact. Not as good as Marijane's notes. Sorry.

Agenda:

  1. Construction of filegraph.owl
  2. Data and Schema. Which is which, where does each go?
  3. What do we do next?

Notes:

  1. The group discussed the concepts of "data" and "schema." Schema refers to class definitions, object properties, data properties, and annotation properties. Data refers to individuals defined by the schema. Ontologies may have both. The Bibo ontology used by VIVO defines document status as one of several individuals. The VIVO ontology defines DateTimeValuePrecision as one of four individuals. Ontologists may place these individuals in the schema files or as separate data files. After discussion the group decided to store individuals as data files in the "abox" areas of VIVO. Schema will be stored in tbox areas. Labels for individuals should be in data files with the individuals. Labels for schema entities should be in the files which define the entities.
  2. Javed reviewed work to create a filegraph.owl file containing schema assertions for VIVO. Files were found that contain labels, individuals, and schema together, along with other anomalies that will need to be reconciled to complete the work. Mike's file and Javed's file are now very similar. The work recommended below will result in a filegraph.owl file that can represent the schema used in VIVO. Separate files will contain schema for the two Vitro ontologies. Not discussed: Mike suggests that, once an appropriate filegraph.owl file is constructed, we name it vivo.owl and put it in tbox/filegraph, replacing the component files currently found there. [Anna: Not knowing much about the file structure, just a question -- will this still allow individual ontology management to be as modular as we would like? For example, in addition to our German VIVO ontology, we are also developing an ontology based on the recommendations of a central institution that can be used if you want to comply with their regulations for statistical output, and you should be able to switch it on or off -- are there such components in the VIVO ontology?]
  3. Entities in VIVO are often the subjects of a third type of assertion (beyond schema and data) -- application assertions indicating interface properties or other application-specific assertions which are neither data nor schema. These belong in areas of the VIVO filesystem distinct from data and schema. Use N3 file format for application assertions.
  4. The group discussed file types. Mike suggested that schema be RDF/XML with a file type of OWL and data be RDF/XML with a file type of RDF. This makes the purpose of files immediately obvious and reduces the need for conversion.
  5. The group discussed the need for firsttime and filegraph subdirectories in the abox and tbox directories. It was unclear what relevance this has for ontology and data files. Graham expressed concern that the manual interface might introduce changes that would not be preserved on restarts. More investigation is needed. The group will work with the file structure as is, although significant simplification could be considered -- abox, tbox and application, with or without substructure for firsttime, everytime, and filegraph.
  6. Suggested specific improvements were discussed. It is unclear how these might affect the application or how the changes might be tested. The group decided to develop a list of specific improvements to the files that will be shared with the community for comment. This anticipates the ontology change process. The improvements suggested here are organizational -- all assertions previously made will continue to be made, nothing will be gained or lost. The result should be easier to explain and work with.
  7. Damaris asked about the relationship of this discussion to existing issues in GitHub. Most of today's discussion involved preparing VIVO and Vitro for ontology work -- getting the right material into the right files in the right places. Issues should be opened to do the work. We believe all the work described here is "low impact" -- all assertions will end up in places where they can be used by VIVO and the result will be invisible to end users. Data managers will see a significant improvement -- RDF files in abox directories will be data, as expected. Ontologists will see a significant improvement -- schema assertions will be in tbox directories as expected. The production of a vivo.owl file (see below) will allow the ontology improvement process to move forward. System admins will see a minor improvement -- some application metadata files will be moved to more natural locations. All viewers of the rdf file folders will see improved README.md files (see below).
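
The data/schema distinction from note 1 can be sketched as a small classifier. This is only an illustration under stated assumptions, not VIVO code: triples are plain Python tuples, the function name `split_triples` and the `http://example.org/` namespace are hypothetical, and a subject is assumed to be schema when it is typed as an OWL class or property. Application assertions (note 3) are not handled here.

```python
# Well-known RDF/RDFS/OWL URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL = "http://www.w3.org/2002/07/owl#"

# Types that mark a subject as a schema entity (note 1).
SCHEMA_TYPES = {
    OWL + "Class",
    OWL + "ObjectProperty",
    OWL + "DatatypeProperty",
    OWL + "AnnotationProperty",
}


def split_triples(triples):
    """Partition (subject, predicate, object) triples into (schema, data).

    Every triple about a schema entity, including its label, goes to the
    schema list; everything else is treated as data about individuals.
    """
    triples = list(triples)
    schema_subjects = {
        s for s, p, o in triples if p == RDF_TYPE and o in SCHEMA_TYPES
    }
    schema = [t for t in triples if t[0] in schema_subjects]
    data = [t for t in triples if t[0] not in schema_subjects]
    return schema, data


# Hypothetical namespace and local names, for illustration only.
EX = "http://example.org/ontology#"
example = [
    (EX + "DateTimeValuePrecision", RDF_TYPE, OWL + "Class"),
    (EX + "DateTimeValuePrecision", RDFS_LABEL, "Date/Time Value Precision"),
    (EX + "yearPrecision", RDF_TYPE, EX + "DateTimeValuePrecision"),
    (EX + "yearPrecision", RDFS_LABEL, "Year"),
]

schema, data = split_triples(example)
print(len(schema), "schema triples (tbox)")
print(len(data), "data triples (abox)")
```

Under this split, the label of each entity travels with the entity, which matches the decision that labels for individuals live in data files and labels for schema entities live in the files that define them.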

Specific improvements to be recommended to the community for discussion and adoption are shown below. All changes will follow the ontology improvement process and include issues, documentation, and testing.

  1. Discover the purpose of validation.n3 -- it has a single triple with fixed URI pertaining to ORCiD validation. Can it be removed? If so, remove it, otherwise rename it to clarify its purpose, make it RDF/XML, and include in the README.
  2. Move vocabularySource.n3 from abox/filegraph to display/firsttime. The file contains assertions for the vocabulary services -- this is an application configuration file.
  3. Change the documentStatus file from OWL to RDF.
  4. Change dateTimeValuePrecision.owl to dateTimeValuePrecision.rdf.
  5. Provide the continents file as data (RDF file, labels). Fix the typo in the URI of the North America continent: it is currently northern_america and should be North_America.
  6. Add lang="en" as needed in data files.
  7. Reorganize the assertions regarding the geopolitical entities. Schema should be in tbox/filegraph, data in abox/filegraph, and application assertions in display/firsttime.
  8. Each folder in home/rdf needs a README.md describing the kinds of files in the folder, along with conventions for file types and names. There are 18 folders and 3 README.md files currently.
  9. Determine whether the assertions in displayTbox are used. It appears they are not. If this can be confirmed, displayTbox should be removed. Otherwise, the assertions can be moved to display/firsttime.
  10. Data and application configuration assertions should be removed from tbox/filegraph, a directory for schema. The VIVO ontology assertions will be combined into a single file, vivo.owl, to accompany vitroPublic.owl and vitro-0.7.owl, which are schema files. dateTimeValuePrecision.owl will be retained as data in abox/filegraph. documentStatus will be retained in abox/filegraph. linkSuppression.n3 will be moved to applicationMetadata/firsttime. The other 40 files will be combined into vivo.owl.
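
Item 8 (a README.md in every folder) could be checked mechanically. The following is a minimal sketch using only the Python standard library; the function name `folders_missing_readme` is hypothetical, and the demo builds a throwaway directory tree whose folder names merely imitate those mentioned in these notes rather than reading a real VIVO home directory.

```python
import os
import tempfile


def folders_missing_readme(root, readme_name="README.md"):
    """Return sorted paths (relative to root) of folders lacking a README.

    The root folder itself is included in the check and is reported as ".".
    """
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if readme_name not in filenames:
            missing.append(os.path.relpath(dirpath, root))
    return sorted(missing)


# Demo on a temporary tree (illustrative folder names only).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "abox", "filegraph"))
    os.makedirs(os.path.join(root, "tbox", "filegraph"))
    # Give the root and abox a README; leave the other folders without one.
    open(os.path.join(root, "README.md"), "w").close()
    open(os.path.join(root, "abox", "README.md"), "w").close()
    missing = folders_missing_readme(root)

for folder in missing:
    print("missing README.md:", folder)
```

Pointing the function at the actual rdf folder would give the list of folders that still need a README, which could be attached to the corresponding issue.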

The VIVO-ISF ontology is an information standard for representing scholarly work.
