Model Registry
To unify model definitions, simplify providing information to the Napari plugin, and allow for community contribution we have a model registry repository that defines the models available within AI OnDemand (AIoD).
This section covers how to contribute to the registry, whether the contribution is a completely new model, a new model version, or a model for a completely new task (e.g. a new organelle). Depending on what is contributed, additional work may be needed (e.g. updating or adding a Python script in the Segment-Flow pipeline). If so, this is outlined with relevant links in the sections below.
To add a new model, a new manifest needs to be added to the manifest directory in the repo. Note that if this includes a new task (e.g. a new thing to segment), you'll also need to add a new task.
The schema section below outlines the bare minimum needed, but you're encouraged to look at previously created manifests and to check that your manifest passes validation locally with the Pydantic model before opening a pull request. Either way, validation is run on every pull request, and the pull request will not be merged until it passes.
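As a rough illustration of what a local check can look like, the sketch below validates a manifest against a small Pydantic model. The classes `Manifest`, `VersionEntry`, and `TaskEntry`, the field layout, and the assumption that a manifest is JSON are all hypothetical stand-ins; only `short_name` and `config_path` are names taken from this page, and the registry's own Pydantic model remains the authority.

```python
import json
from typing import Dict, List, Optional

from pydantic import BaseModel, ValidationError


# Hypothetical stand-in models; the registry's own Pydantic model is authoritative.
class TaskEntry(BaseModel):
    location: str                      # filepath or URL of the model
    config_path: Optional[str] = None  # optional path/URL to a config file


class VersionEntry(BaseModel):
    tasks: Dict[str, TaskEntry]        # keyed by task short-hand name


class Manifest(BaseModel):
    name: str
    short_name: str
    versions: Dict[str, VersionEntry]  # keyed by version name


def validation_errors(raw_json: str) -> List[str]:
    """Return dotted paths of the fields that failed validation (empty if valid)."""
    try:
        Manifest(**json.loads(raw_json))
    except ValidationError as err:
        return [".".join(str(part) for part in e["loc"]) for e in err.errors()]
    return []


GOOD = json.dumps({
    "name": "Example Model",
    "short_name": "example",
    "versions": {"v1": {"tasks": {"mito": {"location": "https://example.org/model.pt"}}}},
})
BAD = json.dumps({"name": "Example Model", "versions": {}})  # short_name is missing

print(validation_errors(GOOD))
print(validation_errors(BAD))
```

Running a check of this kind before opening a pull request surfaces missing or mistyped fields early, instead of waiting for the validation that runs on the pull request.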
Note
Adding a new (base) model will also need an accompanying Python script and process in the Segment-Flow pipeline. For further details, see the relevant page.
To add a new model version, simply add the version to the appropriate existing manifest, then open a pull request, where the updated manifest will be validated (you can also run the validation locally first to make sure it passes).
See the schema section below for further guidelines on what is needed to define a model version.
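Conceptually, adding a version is a small, additive change to an existing manifest. The sketch below shows the idea on an in-memory dictionary; the `versions`/`tasks`/`location` layout and the helper `add_version` are hypothetical, so check an existing manifest for the real structure.

```python
import copy


def add_version(manifest: dict, version: str, entry: dict) -> dict:
    """Return a copy of `manifest` with `version` added; refuse to overwrite."""
    if version in manifest["versions"]:
        raise ValueError(f"version {version!r} already exists")
    updated = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    updated["versions"][version] = entry
    return updated


# Hypothetical existing manifest with a single version.
existing = {
    "short_name": "example",
    "versions": {"v1": {"tasks": {"mito": {"location": "/models/v1_mito.pt"}}}},
}

updated = add_version(
    existing,
    "v2",
    {"tasks": {"mito": {"location": "/models/v2_mito.pt"}}},
)
print(sorted(updated["versions"]))
```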
To contribute a model or model version with a new task, the list of available tasks will need to be updated, as these are used to constrain model schemas and define what is available in the Napari plugin.
In the model schema, `TASK_NAMES` is a dictionary that defines the short-hand name (key) and the display name (value) for each task. Simply add the new key-value pair, and open a pull request to add the new task.
The Napari UI will automatically update to include this new task, and the model schema will be updated to include this new task as an option for any new models or model versions such that they can pass validation.
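A minimal sketch of the idea follows. The entries shown are made-up examples rather than the registry's actual contents, and `unknown_tasks` is a hypothetical helper that mimics how the dictionary constrains which tasks a manifest may reference.

```python
# Short-hand name (key) -> display name (value). Example entries only.
TASK_NAMES = {
    "mito": "Mitochondria",
    "er": "Endoplasmic Reticulum",
}

# Contributing a new task amounts to adding one key-value pair.
TASK_NAMES["ne"] = "Nuclear Envelope"


def unknown_tasks(tasks):
    """Return the task short-hand names that are not registered in TASK_NAMES."""
    return sorted(t for t in tasks if t not in TASK_NAMES)


print(unknown_tasks(["mito", "ne"]))
print(unknown_tasks(["golgi", "mito"]))
```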
Schemas are not always the most readable for humans, so a few different views are available:
- The Pydantic model used for parsing and validation (here)
- The generated JSON schema from the Pydantic model (here)
- Existing schema, all in the manifests directory, which should help clarify what is needed!
Overall, models are specified hierarchically, from a base model to a model version to a task-specific version. The following information is required for a schema:
- A model name (the `short_name` is used as the name for the Python script and conda environments in the Nextflow pipeline)
- Model versions
  - For each version, its name and each of the tasks that model is trained for (normally one), and the model location (either a filepath or a URL)
  - Optionally, a path/URL to a config file can also be provided in case any additional parameters are needed that are not defined by users
- Relevant metadata
  - While a DOI is not required, some basic information about the model is needed, and will be reviewed upon a PR. See the relevant contribution section.
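To make the hierarchy concrete, here is an illustrative manifest written as a Python dictionary, with a helper that walks from base model to version to task. Apart from `short_name` and `config_path`, every key and value (`versions`, `tasks`, `location`, `metadata`, the URLs and paths) is an assumption for illustration; existing manifests show the real layout.

```python
from typing import Iterator, Tuple

# Illustrative only: base model -> versions -> task-specific entries.
MANIFEST = {
    "name": "Example Model",
    "short_name": "example",  # names the Python script and conda environment
    "metadata": {"description": "Hypothetical segmentation model"},
    "versions": {
        "base": {
            "tasks": {"mito": {"location": "https://example.org/base_mito.pt"}},
        },
        "large": {
            "tasks": {
                "mito": {
                    "location": "/models/large_mito.pt",
                    "config_path": "/models/large_mito.yaml",  # optional
                },
            },
        },
    },
}


def iter_task_versions(manifest: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (version, task, location) for every task-specific version."""
    for version, version_data in manifest["versions"].items():
        for task, task_data in version_data["tasks"].items():
            yield version, task, task_data["location"]


for row in iter_task_versions(MANIFEST):
    print(row)
```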
Each model version represents a variant of a base model. Versions may differ in their input data (e.g. for a different task), in checkpoints or hyperparameters for the same model, or even in architecture (e.g. varying sizes). Ultimately, as long as the underlying Python script that runs the model handles everything needed for that version (if anything extra is needed), that's enough.
If the version is a big enough departure that a different environment is needed, it may be better to create a new base model, but this is up to the contributor and will be reviewed upon a PR.
Note
Any parameters given at the root input level apply to all model versions. However, a `config_path` or list of `params` can be given to specific model-task-versions if they differ from the root.
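One way to picture this lookup is sketched below. It assumes that a value set on a model-task-version is used when present and that the root value is the fallback otherwise; whether the registry replaces or merges `params` is not stated here, so treat that rule, the `resolve` helper, and the manifest layout as assumptions.

```python
from typing import Any

# Hypothetical manifest: root-level params plus one task-specific override.
MANIFEST = {
    "short_name": "example",
    "params": [{"name": "threshold", "value": 0.5}],
    "versions": {
        "v1": {"tasks": {"mito": {"location": "/models/v1_mito.pt"}}},
        "v2": {
            "tasks": {
                "mito": {
                    "location": "/models/v2_mito.pt",
                    "params": [{"name": "threshold", "value": 0.7}],
                },
            },
        },
    },
}


def resolve(manifest: dict, version: str, task: str, key: str) -> Any:
    """Prefer the model-task-version value for `key`, else fall back to the root."""
    entry = manifest["versions"][version]["tasks"][task]
    if key in entry:
        return entry[key]
    return manifest.get(key)  # None when the key is not defined at the root either


print(resolve(MANIFEST, "v1", "mito", "params"))
print(resolve(MANIFEST, "v2", "mito", "params"))
```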