Expand docs re: OWL templating (#286)
caufieldjh authored Dec 5, 2023
2 parents 918a057 + 3cd0765 commit 11c1cff
Showing 3 changed files with 122 additions and 17 deletions.
96 changes: 93 additions & 3 deletions docs/custom.md
@@ -281,7 +281,7 @@ Ex.:
ontogpt extract -t mendelian_disease.MendelianDisease -i marfan-wikipedia.txt
```

OntoGPT-specific extensions are specified as _annotations_.
OntoGPT-specific extensions are specified as *annotations*.

You can specify a set of annotators for a field using the `annotators` annotation.

@@ -353,7 +353,7 @@ Ex. the `gocam` schema has an attribute:
range: GeneMolecularActivityRelationship
```

The range `GeneMolecularActivityRelationship` has been specified _inline_, so it will nest.
The range `GeneMolecularActivityRelationship` has been specified *inline*, so it will nest.

The generated prompt is:

@@ -367,14 +367,104 @@ The output of this is then passed through further SPIRES iterations.

LLMs have context sizes limiting the combined length of their inputs and outputs. The `gpt-3.5-turbo` model, for example, has a 4,096 token limit (prompt + completion), while the `gpt-3.5-turbo-16k` model has a larger context of 16,384 tokens.
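
As a rough illustration of the arithmetic involved, the sketch below computes how many tokens remain for the completion once the prompt is counted. The `completion_budget` helper and the prompt size of 3,000 tokens are hypothetical; only the two context limits come from the paragraph above.

```python
# Context limits quoted above (prompt + completion, in tokens).
CONTEXT_LIMITS = {
    "gpt-3.5-turbo": 4096,
    "gpt-3.5-turbo-16k": 16384,
}


def completion_budget(model: str, prompt_tokens: int) -> int:
    """Hypothetical helper: tokens left for the completion after the prompt."""
    limit = CONTEXT_LIMITS[model]
    # A prompt that already exceeds the limit leaves no room at all.
    return max(limit - prompt_tokens, 0)


print(completion_budget("gpt-3.5-turbo", 3000))      # 1096
print(completion_budget("gpt-3.5-turbo-16k", 3000))  # 13384
```

A long input text combined with a deeply nested schema can exhaust the smaller budget quickly, which is one reason to prefer the larger-context model for such extractions.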

## Advanced functionality with linkml-owl

A LinkML schema used in OntoGPT may include annotations describing how each component relates to OWL syntax.

This level of detail may be necessary if your data model includes complex logic beyond simple hierarchical relationships.

For example, if you are extracting details of chemical reactions, it may be necessary to keep track of details like stoichiometry or charge. [See a relevant example here](https://linkml.io/linkml-owl/templates/).

Incorporating OWL annotations into the custom schema also makes it possible to import the results into an ontology editor like Protégé, where they can be reasoned over. Remember to export using the `-O owl` option.

The `recipe` template in OntoGPT incorporates several OWL annotations:

```yaml
classes:
  Recipe:
    tree_root: true
    close_mappings:
      - FOODON:00004081
    attributes:
      url:
        identifier: true
        range: uriorcurie
        slot_uri: rdf:Resource
        annotations:
          prompt.skip: true
      label:
        description: the name of the recipe
        slot_uri: rdfs:label
        annotations:
          owl: AnnotationProperty, AnnotationAssertion
      description:
        description: a brief textual description of the recipe
        slot_uri: dcterms:description
        annotations:
          owl: AnnotationProperty, AnnotationAssertion
      categories:
        description: a semicolon separated list of the categories to which this recipe belongs
        range: RecipeCategory
        multivalued: true
        slot_uri: dcterms:subject
        annotations:
          owl: AnnotationAssertion
      ingredients:
        description: a semicolon separated list of the ingredients plus quantities of the recipe
        multivalued: true
        range: Ingredient
        slot_uri: FOODON:00002420
        annotations:
          owl: ObjectProperty, ObjectSomeValuesFrom
      steps:
        description: a semicolon separated list of the individual steps involved in this recipe
        multivalued: true
        range: Step
        annotations:
          owl: ObjectProperty, ObjectSomeValuesFrom
    annotations:
      owl: Class
      owl.template: |
        EquivalentClasses(
          {{url}}
          ObjectIntersectionOf(
            recipe:Recipe
            {% for step in steps %}
            ObjectSomeValuesFrom(
              recipe:steps
              {{tr(step)}}
            )
            {% endfor %}
            {% for ingredient in ingredients %}
            ObjectSomeValuesFrom(
              FOODON:00002420
              {{tr(ingredient)}}
            )
            {% endfor %}
          )
        )
...
```

Several of the slots above, like `close_mappings` and `slot_uri`, aren't exclusive to OWL but define the parts of this data model in terms of existing vocabularies, so the schema and any extracted results will be more compatible with other models and methods. Here, `close_mappings` is used to show that the `Recipe` class is close but not necessarily identical to `FOODON:00004081`, or "food recipe".

The `owl` slot under `annotations` for each attribute names one or more corresponding OWL interpretations (entity, axiom, or class expression types). Because the `label` attribute is annotated with `AnnotationProperty, AnnotationAssertion` ([see the OWL2 syntax on Annotations](https://www.w3.org/TR/owl2-syntax/#Annotations)), we know it can be applied as a property of something else, like an axiom or a specific entity with an IRI. This is how the `rdfs:label` annotation property usually works, so this isn't surprising.

The `owl.template` slot defines template logic relating a Recipe to its component objects: steps and ingredients. It begins with an `EquivalentClasses` axiom that declares the identifier of the recipe, which we assume to be a URL, to be equivalent to the class expression in the subsequent lines. Specifically, that expression states that the recipe is a `Recipe` with each of its steps (e.g., "fry", "chop") and each of its ingredients. The ingredients relate to the recipe through the property `FOODON:00002420`, or "has ingredient".

Note that everything in curly brackets is part of a template. The example uses the Jinja template system, as in lines like `{% for step in steps %}`, which start a loop. Slot values like `{{url}}` may be accessed directly by name. So what makes `{{tr(step)}}` different from `{{step}}`? The `tr()` function translates its input into an OWL entity so it can be used to generate valid OWL axioms.
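
To make the expansion concrete, here is a plain-Python sketch of the text the template's two loops would produce for a small recipe. It does not use Jinja or linkml-owl: `render_recipe_axiom` is a hypothetical helper, the `tr()` below is only a stand-in that mints a CURIE in the `recipe:` prefix (the real function's output may differ), and the recipe values are invented for illustration.

```python
from typing import List


def tr(value: str) -> str:
    """Stand-in for the template's tr(): NOT the linkml-owl implementation.

    Assumption for illustration: turn a plain string into a CURIE.
    """
    return "recipe:" + value.strip().replace(" ", "_")


def render_recipe_axiom(url: str, steps: List[str], ingredients: List[str]) -> str:
    """Build the EquivalentClasses axiom the template describes, line by line."""
    lines = [
        "EquivalentClasses(",
        f"  {url}",
        "  ObjectIntersectionOf(",
        "    recipe:Recipe",
    ]
    # Mirrors {% for step in steps %} ... {% endfor %}
    for step in steps:
        lines += [
            "    ObjectSomeValuesFrom(",
            "      recipe:steps",
            f"      {tr(step)}",
            "    )",
        ]
    # Mirrors {% for ingredient in ingredients %} ... {% endfor %}
    for ingredient in ingredients:
        lines += [
            "    ObjectSomeValuesFrom(",
            "      FOODON:00002420",
            f"      {tr(ingredient)}",
            "    )",
        ]
    lines += ["  )", ")"]
    return "\n".join(lines)


axiom = render_recipe_axiom("recipe:pancakes", ["mix", "fry"], ["flour", "egg"])
print(axiom)
```

Each step and ingredient contributes one `ObjectSomeValuesFrom` restriction, so a recipe with two steps and two ingredients yields four restrictions inside the intersection.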

See also: the [documentation page on OWL exports](owl_exports.md) and the [linkml-owl documentation](https://linkml.io/linkml-owl/).

## Install a custom schema

If you have installed OntoGPT directly from its GitHub repository, then you may install a custom schema like this:

1. Move the schema file to the `src/ontogpt/templates` directory.
2. Run `make` from the root of the repository to generate Pydantic versions of the schema.

If you have installed OntoGPT from `pip`, _or_ if you can't use the `make` command, the process is similar, though it will depend on where the package is installed.
If you have installed OntoGPT from `pip`, *or* if you can't use the `make` command, the process is similar, though it will depend on where the package is installed.

1. Use the LinkML `gen-pydantic` tool to generate Pydantic classes. If your schema is named `alfred.yaml`, then run the following:

33 changes: 22 additions & 11 deletions src/ontogpt/templates/recipe.py
@@ -2,7 +2,7 @@
from datetime import datetime, date
from enum import Enum
from typing import List, Dict, Optional, Any, Union
from pydantic import BaseModel as BaseModel, Field
from pydantic import BaseModel as BaseModel, ConfigDict, Field
import sys
if sys.version_info >= (3, 8):
    from typing import Literal
@@ -13,13 +13,13 @@
metamodel_version = "None"
version = "None"

class ConfiguredBaseModel(BaseModel,
                validate_assignment = True,
                validate_default = True,
                extra = 'forbid',
                arbitrary_types_allowed = True,
                use_enum_values = True):
    pass
class ConfiguredBaseModel(BaseModel):
    model_config = ConfigDict(
        validate_assignment=True,
        validate_default=True,
        extra='forbid',
        arbitrary_types_allowed=True,
        use_enum_values = True)


class NullDataOptions(str, Enum):
@@ -35,7 +35,7 @@ class NullDataOptions(str, Enum):

class Recipe(ConfiguredBaseModel):

    url: Optional[str] = Field(None)
    url: str = Field(...)
    label: Optional[str] = Field(None, description="""the name of the recipe""")
    description: Optional[str] = Field(None, description="""a brief textual description of the recipe""")
    categories: Optional[List[str]] = Field(default_factory=list, description="""a semicolon separated list of the categories to which this recipe belongs""")
@@ -136,11 +136,21 @@ class Triple(CompoundExpression):


class TextWithTriples(ConfiguredBaseModel):

    """
    A text containing one or more relations of the Triple type.
    """
    publication: Optional[Publication] = Field(None)
    triples: Optional[List[Triple]] = Field(default_factory=list)


class TextWithEntity(ConfiguredBaseModel):
    """
    A text containing one or more instances of a single type of entity.
    """
    publication: Optional[Publication] = Field(None)
    entities: Optional[List[str]] = Field(default_factory=list)


class RelationshipType(NamedEntity):

    id: str = Field(..., description="""A unique identifier for the named entity""")
@@ -181,7 +191,8 @@ class AnnotatorResult(ConfiguredBaseModel):
FoodItem.model_rebuild()
Triple.model_rebuild()
TextWithTriples.model_rebuild()
TextWithEntity.model_rebuild()
RelationshipType.model_rebuild()
Publication.model_rebuild()
AnnotatorResult.model_rebuild()

10 changes: 7 additions & 3 deletions src/ontogpt/templates/recipe.yaml
@@ -17,6 +17,10 @@ prefixes:
  qudt: http://qudt.org/schema/qudt/
  dbpediaont: http://dbpedia.org/ontology/

# This template incorporates syntax from
# linkml-owl to define OWL interpretations
# and enable advanced functionality.
# https://linkml.io/linkml-owl/templates/

default_prefix: recipe
default_range: string
@@ -29,7 +33,7 @@ classes:
  Recipe:
    tree_root: true
    close_mappings:
      - FOODON:00004081
      - FOODON:00004081 # food recipe
    attributes:
      url:
        identifier: true
@@ -49,7 +53,7 @@ classes:
          owl: AnnotationProperty, AnnotationAssertion
      categories:
        description: a semicolon separated list of the categories to which this recipe belongs
        range: RecipeCategory
        range: RecipeCategory
        multivalued: true
        slot_uri: dcterms:subject
        annotations:
@@ -168,7 +172,7 @@ classes:
      state:
        description: the state of the food item (e.g. chopped, diced)
        annotations:
          owl: DataProperty, DataHasValue
          owl: DataProperty, DataHasValue

  FoodType:
    is_a: NamedEntity
