From 15b293cf90076f782242b33d5f1ce6cd94e96957 Mon Sep 17 00:00:00 2001 From: Ryan Avery Date: Wed, 30 Oct 2024 14:35:14 -0700 Subject: [PATCH 01/11] wip adding tensorflow artifact types, remove torch.compile --- README.md | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index ad10344..423139f 100644 --- a/README.md +++ b/README.md @@ -657,10 +657,12 @@ the users understand the source explicitly, although this is not strictly requir | Artifact Type | Description | |--------------------|--------------------------------------------------------------------------------------| -| `torch.save` | A model artifact obtained by [Serialized Pickle Object][pytorch-save] (i.e.: `.pt`). | -| `torch.jit.script` | A model artifact obtained by [`TorchScript`][pytorch-jit-script]. | -| `torch.export` | A model artifact obtained by [`torch.export`][pytorch-export] (i.e.: `.pt2`). | -| `torch.compile` | A model artifact obtained by [`torch.compile`][pytorch-compile]. | +| `torch.save` | A [serialized python pickle object][pytorch-save] (i.e.: `.pt`) which can represent a model or state_dict. | +| `torch.jit.save` | A [`TorchScript`][pytorch-jit-script] model artifact obtained with one or more of the graph export options Torchscript Tracing and Torchscript Scripting. | +| `torch.export.save` | A model artifact storing an ExportedProgram obtained by [`torch.export.export`][pytorch-export] (i.e.: `.pt2`). | +| `TFSavedModel` | A [SavedModel][tf-save] from Tensorflow or Keras. | +| `Keras_v3` | Keras v3 is the [recommended format][keras-recommended] by the Tensorflow team. See this example to [save and load models][keras-example] and the update to date docs [disambiguating different save methods][keras-methods] in TF and Keras.. | +| `h5` | [Keras and tf.keras model][h5] weights format, which uses HDF5. 
| [pytorch-compile]: https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html [pytorch-export]: https://pytorch.org/docs/main/export.html @@ -668,6 +670,11 @@ the users understand the source explicitly, although this is not strictly requir [pytorch-aot-inductor]: https://pytorch.org/docs/main/torch.compiler_aot_inductor.html [pytorch-jit-script]: https://pytorch.org/docs/stable/jit.html [pytorch-save]: https://pytorch.org/tutorials/beginner/saving_loading_models.html +[tf-save]: https://www.tensorflow.org/guide/saved_model +[keras-example]: https://keras.io/guides/serialization_and_saving/ +[keras-recommended]: https://www.tensorflow.org/guide/saved_model#creating_a_savedmodel_from_keras +[h5]: https://keras.io/api/models/model_saving_apis/weights_saving_and_loading/ +[keras-methods]: https://keras.io/2.16/api/models/model_saving_apis/model_saving_and_loading/ ### Source Code Asset From b4885f6c07c4efd346b5502680865f24161973ec Mon Sep 17 00:00:00 2001 From: Ryan Avery Date: Thu, 12 Dec 2024 15:58:36 -0800 Subject: [PATCH 02/11] update TF/Keras formats using the saving method as the artifact type --- README.md | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index 423139f..0707203 100644 --- a/README.md +++ b/README.md @@ -655,26 +655,29 @@ Following are some proposed *Artifact Type* values for corresponding approaches, permitted as well. Note that the names are selected using the framework-specific definitions to help the users understand the source explicitly, although this is not strictly required either. -| Artifact Type | Description | -|--------------------|--------------------------------------------------------------------------------------| -| `torch.save` | A [serialized python pickle object][pytorch-save] (i.e.: `.pt`) which can represent a model or state_dict. 
| -| `torch.jit.save` | A [`TorchScript`][pytorch-jit-script] model artifact obtained with one or more of the graph export options Torchscript Tracing and Torchscript Scripting. | -| `torch.export.save` | A model artifact storing an ExportedProgram obtained by [`torch.export.export`][pytorch-export] (i.e.: `.pt2`). | -| `TFSavedModel` | A [SavedModel][tf-save] from Tensorflow or Keras. | -| `Keras_v3` | Keras v3 is the [recommended format][keras-recommended] by the Tensorflow team. See this example to [save and load models][keras-example] and the update to date docs [disambiguating different save methods][keras-methods] in TF and Keras.. | -| `h5` | [Keras and tf.keras model][h5] weights format, which uses HDF5. | +| Artifact Type | Description | +|--------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `torch.save` | A [serialized python pickle object][pytorch-save] (i.e.: `.pt`) which can represent a model or state_dict. | +| `torch.jit.save` | A [`TorchScript`][pytorch-jit-script] model artifact obtained with one or more of the graph export options Torchscript Tracing and Torchscript Scripting. | +| `torch.export.save` | A model artifact storing an [ExportedProgram][exported-program] obtained by [`torch.export.export`][pytorch-export] (i.e.: `.pt2`). | +| `tf.keras.Model.save` | Saves a [.keras model file][keras-model], a unified zip archive format containing the architecture, weights, optimizer, losses, and metrics. | +| `tf.keras.Model.save_weights` | A [.weights.h5][keras-save-weights] file containing only model weights for use by Tensorflow or Keras. 
| +| `tf.keras.Model.export(format='tf_saved_model')` | TF Saved Model is the [recommended format][tf-keras-recommended] by the Tensorflow team for whole model saving/loading for inference. See this example to [save and load models][keras-example] and the docs for [different save methods][keras-methods] in TF and Keras. Also available from `keras.Model.export(format='tf_saved_model')` | + + +[exported-program]: https://pytorch.org/docs/main/export.html#serialization [pytorch-compile]: https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html [pytorch-export]: https://pytorch.org/docs/main/export.html [pytorch-frameworks]: https://pytorch.org/docs/main/export.html#existing-frameworks [pytorch-aot-inductor]: https://pytorch.org/docs/main/torch.compiler_aot_inductor.html [pytorch-jit-script]: https://pytorch.org/docs/stable/jit.html [pytorch-save]: https://pytorch.org/tutorials/beginner/saving_loading_models.html -[tf-save]: https://www.tensorflow.org/guide/saved_model +[keras-save-weights]: https://keras.io/api/models/model_saving_apis/weights_saving_and_loading/#save_weights-method [keras-example]: https://keras.io/guides/serialization_and_saving/ -[keras-recommended]: https://www.tensorflow.org/guide/saved_model#creating_a_savedmodel_from_keras -[h5]: https://keras.io/api/models/model_saving_apis/weights_saving_and_loading/ -[keras-methods]: https://keras.io/2.16/api/models/model_saving_apis/model_saving_and_loading/ +[tf-keras-recommended]: https://www.tensorflow.org/guide/saved_model#creating_a_savedmodel_from_keras +[keras-methods]: https://keras.io/2.16/api/models/model_saving_apis/ +[keras-model]: https://keras.io/api/models/model_saving_apis/model_saving_and_loading/ ### Source Code Asset From 2c63d5e8ba31119cdae72066c9ba72dfd240aecd Mon Sep 17 00:00:00 2001 From: Ryan Avery Date: Thu, 12 Dec 2024 16:36:17 -0800 Subject: [PATCH 03/11] move framework specific fields to best practices --- README.md | 43 ++++++------------------------------------- 
best-practices.md | 28 ++++++++++++++++++++++++++++ 2 files changed, 34 insertions(+), 37 deletions(-) diff --git a/README.md b/README.md index b1840a8..4b83870 100644 --- a/README.md +++ b/README.md @@ -74,7 +74,7 @@ However, fields that relate to supervised ML are optional and users can use the See [Best Practices](./best-practices.md) for guidance on what other STAC extensions you should use in conjunction -with this extension. +with this extension as well as suggested values for specific ML framework. The Machine Learning Model Extension purposely omits and delegates some definitions to other STAC extensions to favor reusability and avoid metadata duplication whenever possible. A properly defined MLM STAC Item/Collection should almost @@ -668,6 +668,7 @@ In order to provide more context, the following roles are also recommended were | type | string | The media type of the artifact (see [Model Artifact Media-Type](#model-artifact-media-type). | | roles | \[string] | **REQUIRED** Specify `mlm:model`. Can include `["mlm:weights", "mlm:checkpoint"]` as applicable. | | mlm:artifact_type | [Artifact Type](#artifact-type) | Specifies the kind of model artifact. Typically related to a particular ML framework. | +| mlm:compile_method | string | Describes the method used to compile the ML model at either save time or runtime prior to inference. These options are mutually exclusive `["aot", "jit"]`. | Recommended Asset `roles` include `mlm:weights` or `mlm:checkpoint` for model weights that need to be loaded by a model definition and `mlm:compiled` for models that can be loaded directly without an intermediate model definition. @@ -703,42 +704,10 @@ official. In order to validate the specific framework and artifact type employed [iana-media-type]: https://www.iana.org/assignments/media-types/media-types.xhtml -#### Artifact Type - -This value can be used to provide additional details about the specific model artifact being described. 
-For example, PyTorch offers [various strategies][pytorch-frameworks] for providing model definitions, -such as Pickle (`.pt`), [TorchScript][pytorch-jit-script], -or [PyTorch Ahead-of-Time Compilation][pytorch-aot-inductor] (`.pt2`) approach. -Since they all refer to the same ML framework, the [Model Artifact Media-Type](#model-artifact-media-type) -can be insufficient in this case to detect which strategy should be used to employ the model definition. - -Following are some proposed *Artifact Type* values for corresponding approaches, but other names are -permitted as well. Note that the names are selected using the framework-specific definitions to help -the users understand the source explicitly, although this is not strictly required either. - -| Artifact Type | Description | -|--------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `torch.save` | A [serialized python pickle object][pytorch-save] (i.e.: `.pt`) which can represent a model or state_dict. | -| `torch.jit.save` | A [`TorchScript`][pytorch-jit-script] model artifact obtained with one or more of the graph export options Torchscript Tracing and Torchscript Scripting. | -| `torch.export.save` | A model artifact storing an [ExportedProgram][exported-program] obtained by [`torch.export.export`][pytorch-export] (i.e.: `.pt2`). | -| `tf.keras.Model.save` | Saves a [.keras model file][keras-model], a unified zip archive format containing the architecture, weights, optimizer, losses, and metrics. | -| `tf.keras.Model.save_weights` | A [.weights.h5][keras-save-weights] file containing only model weights for use by Tensorflow or Keras. 
| -| `tf.keras.Model.export(format='tf_saved_model')` | TF Saved Model is the [recommended format][tf-keras-recommended] by the Tensorflow team for whole model saving/loading for inference. See this example to [save and load models][keras-example] and the docs for [different save methods][keras-methods] in TF and Keras. Also available from `keras.Model.export(format='tf_saved_model')` | - - - -[exported-program]: https://pytorch.org/docs/main/export.html#serialization -[pytorch-compile]: https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html -[pytorch-export]: https://pytorch.org/docs/main/export.html -[pytorch-frameworks]: https://pytorch.org/docs/main/export.html#existing-frameworks -[pytorch-aot-inductor]: https://pytorch.org/docs/main/torch.compiler_aot_inductor.html -[pytorch-jit-script]: https://pytorch.org/docs/stable/jit.html -[pytorch-save]: https://pytorch.org/tutorials/beginner/saving_loading_models.html -[keras-save-weights]: https://keras.io/api/models/model_saving_apis/weights_saving_and_loading/#save_weights-method -[keras-example]: https://keras.io/guides/serialization_and_saving/ -[tf-keras-recommended]: https://www.tensorflow.org/guide/saved_model#creating_a_savedmodel_from_keras -[keras-methods]: https://keras.io/2.16/api/models/model_saving_apis/ -[keras-model]: https://keras.io/api/models/model_saving_apis/model_saving_and_loading/ +#### Framework Specific Artifact Types + +The `mlm:artifact_type` field can be used to clarify how the model was saved which can help users understand how to load it or in what runtime contexts it should be used. For example, PyTorch offers [various strategies][pytorch-frameworks] for providing model definitions, such as Pickle (`.pt`), [TorchScript][pytorch-jit-script], or [PyTorch Ahead-of-Time Compilation][pytorch-aot-inductor] (`.pt2`) approach. 
Since they all refer to the same ML framework, the [Model Artifact Media-Type](#model-artifact-media-type) can be insufficient in this case to detect which strategy should be used to employ the model definition. See the [the best practices document](./best-practices#framework-specific-artifact-types) on suggested fields for framework specific artifact types. + ### Source Code Asset diff --git a/best-practices.md b/best-practices.md index d7e1675..4236db9 100644 --- a/best-practices.md +++ b/best-practices.md @@ -282,3 +282,31 @@ training process to find the "best model". This field could also be used to indi educational purposes only. [stac-ext-version]: https://github.com/stac-extensions/version + +## Framework Specific Artifact Types + +The following are some proposed *Artifact Type* values for the Model Asset's [`mlm:artifact_type` field](./README.md#model-asset). Other names are +permitted, as these values are not validated by the schema. Note that the names are selected using the framework-specific definitions to help the users understand how the model artifact was created, although these exact names are not strictly required either. + +| Artifact Type | Description | +|--------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `torch.save` | A [serialized python pickle object][pytorch-save] (i.e.: `.pt`) which can represent a model or state_dict. | +| `torch.jit.save` | A [`TorchScript`][pytorch-jit-script] model artifact obtained with one or more of the graph export options Torchscript Tracing and Torchscript Scripting. 
| +| `torch.export.save` | A model artifact storing an [ExportedProgram][exported-program] obtained by [`torch.export.export`][pytorch-export] (i.e.: `.pt2`). | +| `tf.keras.Model.save` | Saves a [.keras model file][keras-model], a unified zip archive format containing the architecture, weights, optimizer, losses, and metrics. | +| `tf.keras.Model.save_weights` | A [.weights.h5][keras-save-weights] file containing only model weights for use by Tensorflow or Keras. | +| `tf.keras.Model.export(format='tf_saved_model')` | TF Saved Model is the [recommended format][tf-keras-recommended] by the Tensorflow team for whole model saving/loading for inference. See this example to [save and load models][keras-example] and the docs for [different save methods][keras-methods] in TF and Keras. Also available from `keras.Model.export(format='tf_saved_model')` | + + +[exported-program]: https://pytorch.org/docs/main/export.html#serialization +[pytorch-compile]: https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html +[pytorch-export]: https://pytorch.org/docs/main/export.html +[pytorch-frameworks]: https://pytorch.org/docs/main/export.html#existing-frameworks +[pytorch-aot-inductor]: https://pytorch.org/docs/main/torch.compiler_aot_inductor.html +[pytorch-jit-script]: https://pytorch.org/docs/stable/jit.html +[pytorch-save]: https://pytorch.org/tutorials/beginner/saving_loading_models.html +[keras-save-weights]: https://keras.io/api/models/model_saving_apis/weights_saving_and_loading/#save_weights-method +[keras-example]: https://keras.io/guides/serialization_and_saving/ +[tf-keras-recommended]: https://www.tensorflow.org/guide/saved_model#creating_a_savedmodel_from_keras +[keras-methods]: https://keras.io/2.16/api/models/model_saving_apis/ +[keras-model]: https://keras.io/api/models/model_saving_apis/model_saving_and_loading/ From 3cb4e0558d228140ca142715e5a323b63f9c431f Mon Sep 17 00:00:00 2001 From: Ryan Avery Date: Thu, 12 Dec 2024 16:36:41 -0800 Subject: [PATCH 
04/11] edit line length --- README.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 4b83870..c5e2ebb 100644 --- a/README.md +++ b/README.md @@ -706,7 +706,12 @@ official. In order to validate the specific framework and artifact type employed #### Framework Specific Artifact Types -The `mlm:artifact_type` field can be used to clarify how the model was saved which can help users understand how to load it or in what runtime contexts it should be used. For example, PyTorch offers [various strategies][pytorch-frameworks] for providing model definitions, such as Pickle (`.pt`), [TorchScript][pytorch-jit-script], or [PyTorch Ahead-of-Time Compilation][pytorch-aot-inductor] (`.pt2`) approach. Since they all refer to the same ML framework, the [Model Artifact Media-Type](#model-artifact-media-type) can be insufficient in this case to detect which strategy should be used to employ the model definition. See the [the best practices document](./best-practices#framework-specific-artifact-types) on suggested fields for framework specific artifact types. +The `mlm:artifact_type` field can be used to clarify how the model was saved which can help users understand how to load it or in +what runtime contexts it should be used. For example, PyTorch offers [various strategies][pytorch-frameworks] for providing model +definitions, such as Pickle (`.pt`), [TorchScript][pytorch-jit-script], or [PyTorch Ahead-of-Time Compilation][pytorch-aot-inductor] +(`.pt2`) approach. Since they all refer to the same ML framework, the [Model Artifact Media-Type](#model-artifact-media-type) can be +insufficient in this case to detect which strategy should be used to employ the model definition. See the [the best practices +document](./best-practices#framework-specific-artifact-types) on suggested fields for framework specific artifact types. 
### Source Code Asset From 4c1c63a8e3e6e117664266c4a62abed4f20a6d0d Mon Sep 17 00:00:00 2001 From: Ryan Avery Date: Thu, 12 Dec 2024 16:53:16 -0800 Subject: [PATCH 05/11] update json schema, note when the mlm:artifact type is required by mlm:model role --- README.md | 9 +++++---- json-schema/schema.json | 22 +++++++++++++++++++--- 2 files changed, 24 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index c5e2ebb..2d16b3d 100644 --- a/README.md +++ b/README.md @@ -667,8 +667,8 @@ In order to provide more context, the following roles are also recommended were | href | string | URI to the model artifact. | | type | string | The media type of the artifact (see [Model Artifact Media-Type](#model-artifact-media-type). | | roles | \[string] | **REQUIRED** Specify `mlm:model`. Can include `["mlm:weights", "mlm:checkpoint"]` as applicable. | -| mlm:artifact_type | [Artifact Type](#artifact-type) | Specifies the kind of model artifact. Typically related to a particular ML framework. | -| mlm:compile_method | string | Describes the method used to compile the ML model at either save time or runtime prior to inference. These options are mutually exclusive `["aot", "jit"]`. | +| mlm:artifact_type | [Artifact Type](#artifact-type) | Specifies the kind of model artifact. Typically related to a particular ML framework. This is **REQUIRED** if the `mlm:model` role is specified. | +| mlm:compile_method | string | Describes the method used to compile the ML model at either save time or runtime prior to inference. These options are mutually exclusive `["aot", "jit", null]`. | Recommended Asset `roles` include `mlm:weights` or `mlm:checkpoint` for model weights that need to be loaded by a model definition and `mlm:compiled` for models that can be loaded directly without an intermediate model definition. @@ -710,8 +710,9 @@ The `mlm:artifact_type` field can be used to clarify how the model was saved whi what runtime contexts it should be used. 
For example, PyTorch offers [various strategies][pytorch-frameworks] for providing model definitions, such as Pickle (`.pt`), [TorchScript][pytorch-jit-script], or [PyTorch Ahead-of-Time Compilation][pytorch-aot-inductor] (`.pt2`) approach. Since they all refer to the same ML framework, the [Model Artifact Media-Type](#model-artifact-media-type) can be -insufficient in this case to detect which strategy should be used to employ the model definition. See the [the best practices -document](./best-practices#framework-specific-artifact-types) on suggested fields for framework specific artifact types. +insufficient in this case to detect which strategy should be used to employ the model definition. +See the [the best practices document](./best-practices#framework-specific-artifact-types) on suggested + fields for framework specific artifact types. ### Source Code Asset diff --git a/json-schema/schema.json b/json-schema/schema.json index b87ad45..0dbf2ad 100644 --- a/json-schema/schema.json +++ b/json-schema/schema.json @@ -302,6 +302,9 @@ }, "mlm:artifact_type": { "$ref": "#/$defs/mlm:artifact_type" + }, + "mlm:compile_method": { + "$ref": "#/$defs/mlm:compile_method" } }, "$comment": "Allow properties not defined by MLM prefix to work with other extensions and attributes, but disallow undefined MLM fields.", @@ -324,7 +327,8 @@ "required": [ "mlm:input", "mlm:output", - "mlm:artifact_type" + "mlm:artifact_type", + "mlm:compile_method" ] } }, @@ -354,7 +358,8 @@ "anyOf": [ { "required": [ - "mlm:artifact_type" + "mlm:artifact_type", + "mlm:compile_method" ] } ] @@ -460,7 +465,18 @@ "examples": [ "torch.save", "torch.jit.save", - "torch.export.save" + "torch.export.save", + "tf.keras.Model.save", + "tf.keras.Model.save_weights", + "tf.keras.Model.export(format='tf_saved_model')" + ] + }, + "mlm:compile_method": { + "type": "string", + "minLength": 1, + "examples": [ + "aot", + "jit" ] }, "mlm:tasks": { From 357ad61f5b3e03f64276076c8dbd518e2041820a Mon Sep 17 00:00:00
2001 From: Ryan Avery Date: Thu, 12 Dec 2024 17:14:00 -0800 Subject: [PATCH 06/11] lint fixes --- README.md | 16 ++++------------ best-practices.md | 19 ++++++++++++++----- 2 files changed, 18 insertions(+), 17 deletions(-) diff --git a/README.md b/README.md index 2d16b3d..768a021 100644 --- a/README.md +++ b/README.md @@ -667,7 +667,7 @@ In order to provide more context, the following roles are also recommended were | href | string | URI to the model artifact. | | type | string | The media type of the artifact (see [Model Artifact Media-Type](#model-artifact-media-type). | | roles | \[string] | **REQUIRED** Specify `mlm:model`. Can include `["mlm:weights", "mlm:checkpoint"]` as applicable. | -| mlm:artifact_type | [Artifact Type](#artifact-type) | Specifies the kind of model artifact. Typically related to a particular ML framework. This is **REQUIRED** if the `mlm:model` role is specified. | +| mlm:artifact_type | [Artifact Type](./best-practices.md#framework-specific-artifact-types) | Specifies the kind of model artifact. Typically related to a particular ML framework. This is **REQUIRED** if the `mlm:model` role is specified. | | mlm:compile_method | string | Describes the method used to compile the ML model at either save time or runtime prior to inference. These options are mutually exclusive `["aot", "jit", null]`. | Recommended Asset `roles` include `mlm:weights` or `mlm:checkpoint` for model weights that need to be loaded by a @@ -701,19 +701,11 @@ is used for the artifact described by the media-type. However, users need to rem official. In order to validate the specific framework and artifact type employed by the model, the MLM properties `mlm:framework` (see [MLM Fields](#item-properties-and-collection-fields)) and `mlm:artifact_type` (see [Model Asset](#model-asset)) should be employed instead to perform this validation if needed. 
+See the [best practices document](./best-practices.md#framework-specific-artifact-types) for suggested +framework-specific artifact type values. [iana-media-type]: https://www.iana.org/assignments/media-types/media-types.xhtml - -#### Framework Specific Artifact Types - -The `mlm:artifact_type` field can be used to clarify how the model was saved which can help users understand how to load it or in -what runtime contexts it should be used. For example, PyTorch offers [various strategies][pytorch-frameworks] for providing model -definitions, such as Pickle (`.pt`), [TorchScript][pytorch-jit-script], or [PyTorch Ahead-of-Time Compilation][pytorch-aot-inductor] -(`.pt2`) approach. Since they all refer to the same ML framework, the [Model Artifact Media-Type](#model-artifact-media-type) can be -insufficient in this case to detect which strategy should be used to employ the model definition. -See the [the best practices document](./best-practices#framework-specific-artifact-types) on suggested - fields for framework specific artifact types. - +[pytorch-aot-inductor]: https://pytorch.org/docs/main/torch.compiler_aot_inductor.html ### Source Code Asset diff --git a/best-practices.md b/best-practices.md index 4236db9..d778909 100644 --- a/best-practices.md +++ b/best-practices.md @@ -285,8 +285,19 @@ educational purposes only. ## Framework Specific Artifact Types -The following are some proposed *Artifact Type* values for the Model Asset's [`mlm:artifact_type` field](./README.md#model-asset). Other names are -permitted, as these values are not validated by the schema. Note that the names are selected using the framework-specific definitions to help the users understand how the model artifact was created, although these exact names are not strictly required either. +The `mlm:artifact_type` field can be used to clarify how the model was saved, which +can help users understand how to load it or in what runtime contexts it should be used.
For example, PyTorch offers +[various strategies][pytorch-frameworks] for providing model definitions, such as Pickle (`.pt`), + [TorchScript][pytorch-jit-script], or [PyTorch Ahead-of-Time Compilation][pytorch-aot-inductor] +(`.pt2`) approach. Since they all refer to the same ML framework, the +[Model Artifact Media-Type](./README.md#model-artifact-media-type) can be insufficient in this case to detect which +strategy should be used to employ the model definition. + +The following are some proposed *Artifact Type* values for the Model Asset's +[`mlm:artifact_type` field](./README.md#model-asset). Other names are +permitted, as these values are not validated by the schema. Note that the names are selected using the +framework-specific definitions to help the users understand how the model artifact was created, although these exact +names are not strictly required either. | Artifact Type | Description | |--------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| @@ -297,12 +308,10 @@ permitted, as these values are not validated by the schema. Note that the names | `tf.keras.Model.save_weights` | A [.weights.h5][keras-save-weights] file containing only model weights for use by Tensorflow or Keras. | | `tf.keras.Model.export(format='tf_saved_model')` | TF Saved Model is the [recommended format][tf-keras-recommended] by the Tensorflow team for whole model saving/loading for inference. See this example to [save and load models][keras-example] and the docs for [different save methods][keras-methods] in TF and Keras. 
Also available from `keras.Model.export(format='tf_saved_model')` | - [exported-program]: https://pytorch.org/docs/main/export.html#serialization -[pytorch-compile]: https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html +[pytorch-aot-inductor]: https://pytorch.org/docs/main/torch.compiler_aot_inductor.html [pytorch-export]: https://pytorch.org/docs/main/export.html [pytorch-frameworks]: https://pytorch.org/docs/main/export.html#existing-frameworks -[pytorch-aot-inductor]: https://pytorch.org/docs/main/torch.compiler_aot_inductor.html [pytorch-jit-script]: https://pytorch.org/docs/stable/jit.html [pytorch-save]: https://pytorch.org/tutorials/beginner/saving_loading_models.html [keras-save-weights]: https://keras.io/api/models/model_saving_apis/weights_saving_and_loading/#save_weights-method From b54dd11694f39796024d7bc38dbfcef743fd9e9d Mon Sep 17 00:00:00 2001 From: Ryan Avery Date: Thu, 12 Dec 2024 17:48:15 -0800 Subject: [PATCH 07/11] CHANGELOG updates --- CHANGELOG.md | 41 ++++++++++------------------------------- 1 file changed, 10 insertions(+), 31 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 66b5335..fc558f9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,44 +5,23 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## [Unreleased](https://github.com/stac-extensions/mlm/tree/main) +## [v1.4.0](https://github.com/stac-extensions/mlm/tree/v1.4.0) ### Added -- Add better descriptions about required and recommended *MLM Asset Roles* and their implications - (fixes [#54](https://github.com/stac-extensions/mlm/issues/54)). -- Add explicit check of `value_scaling` sub-fields `minimum`, `maximum`, `mean`, `stddev`, etc. for - corresponding `type` values `min-max` and `z-score` that depend on it. 
-- Allow different `value_scaling` operations per band/channel/dimension as needed by the model. -- Allow a `processing:expression` for a band/channel/dimension-specific `value_scaling` operation, - granting more flexibility in the definition of input preparation in contrast to having it applied - for the entire input (but still possible). +- Add `mlm:compile_method` field with options `aot` for Ahead-of-Time compilation and `jit` for Just-in-Time compilation. ### Changed -- Explicitly disallow `mlm:name`, `mlm:input`, `mlm:output` and `mlm:hyperparameters` at the Asset level. - These fields describe the model as a whole and should therefore be defined in Item properties. -- Moved `norm_type` to `value_scaling` object to better reflect the expected operation, which could be another - operation than what is typically known as "normalization" or "standardization" techniques in machine learning. -- Moved `statistics` to `value_scaling` object to better reflect their mutual `type` and additional - properties dependencies. +- Moved framework-specific `mlm:artifact_type` value descriptions to the best practices document. +- Expanded suggested `mlm:artifact_type` values to include TensorFlow/Keras. ### Deprecated - n/a ### Removed -- Removed `norm_type` enum values that were ambiguous regarding their expected result. - Instead, a `processing:expression` should be employed to explicitly define the calculation they represent. -- Removed `norm_clip` property. It is now represented under `value_scaling` objects with a - corresponding `type` definition. -- Removed `norm_by_channel` from `mlm:input` objects. If rescaling (previously normalization in the documentation) - is a single value, broadcasting to the relevant bands should be performed implicitly. - Otherwise, the amount of `value_scaling` objects should match the number of bands or channels involved in the input. +- n/a ### Fixed -- Fix missing `mlm:artifact_type` property check for a Model Asset definition - (fixes ).
- The `mlm:artifact_type` is now mutually and exclusively required by the corresponding Asset with `mlm:model` role. -- Fix check of disallowed unknown/undefined `mlm:`-prefixed fields - (fixes [#41](https://github.com/stac-extensions/mlm/issues/41)). +- n/a ## [v1.3.0](https://github.com/stac-extensions/mlm/tree/v1.3.0) @@ -73,7 +52,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 when a `mlm:input` references names in `bands` are now properly validated. - Fix the examples using `raster:bands` incorrectly defined in STAC Item properties. The correct use is for them to be defined under the STAC Asset using the `mlm:model` role. -- Fix the [EuroSAT ResNet pydantic example](stac_model/examples.py) that incorrectly referenced some `bands` +- Fix the [EuroSAT ResNet pydantic example](./stac_model/examples.py) that incorrectly referenced some `bands` in its `mlm:input` definition without providing any definition of those bands. The `eo:bands` properties have been added to the corresponding `model` Asset using the [`pystac.extensions.eo`](https://github.com/stac-utils/pystac/blob/main/pystac/extensions/eo.py) utilities. 
@@ -134,7 +113,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - more [Task Enum](README.md#task-enum) tasks - [Model Output Object](README.md#model-output-object) - batch_size and hardware summary -- [`mlm:accelerator`, `mlm:accelerator_constrained`, `mlm:accelerator_summary`](README.md#accelerator-type-enum) +- [`mlm:accelerator`, `mlm:accelerator_constrained`, `mlm:accelerator_summary`](./README.md#accelerator-type-enum) to specify hardware requirements for the model - Use common metadata [Asset Object](https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md#asset-object) @@ -149,7 +128,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 STAC Item properties (top-level, not nested) to allow better search support by STAC API. - reorganized `dlm:architecture` nested fields to exist at the top level of properties as `mlm:name`, `mlm:summary` and so on to provide STAC API search capabilities. -- replaced `normalization:mean`, etc. with [statistics](README.md#bands-and-statistics) from STAC 1.1 common metadata +- replaced `normalization:mean`, etc. 
with [statistics](./README.md#bands-and-statistics) from STAC 1.1 common metadata - added `pydantic` models for internal schema objects in `stac_model` package and published to PYPI - specified [rel_type](README.md#relation-types) to be `derived_from` and specify how model item or collection json should be named @@ -165,7 +144,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - any `dlm`-prefixed field or property ### Removed -- Data Object, replaced with [Model Input Object](README.md#model-input-object) that uses the `name` field from +- Data Object, replaced with [Model Input Object](./README.md#model-input-object) that uses the `name` field from the [common metadata band object][stac-bands] which also records `data_type` and `nodata` type ### Fixed From 2c3c44ec5524dcfc803e559eb21ec45a304e2678 Mon Sep 17 00:00:00 2001 From: Ryan Avery Date: Fri, 13 Dec 2024 15:02:40 -0800 Subject: [PATCH 08/11] correct the deleted unreleased items in the CHANGELOG, woops --- CHANGELOG.md | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index fc558f9..02e0ea2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,9 +8,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [v1.4.0](https://github.com/stac-extensions/mlm/tree/v1.4.0) ### Added +- Add better descriptions about required and recommended *MLM Asset Roles* and their implications + (fixes [#54](https://github.com/stac-extensions/mlm/issues/54)). +- Add explicit check of `value_scaling` sub-fields `minimum`, `maximum`, `mean`, `stddev`, etc. for + corresponding `type` values `min-max` and `z-score` that depend on it. +- Allow different `value_scaling` operations per band/channel/dimension as needed by the model. 
+- Allow a `processing:expression` for a band/channel/dimension-specific `value_scaling` operation, + granting more flexibility in the definition of input preparation in contrast to having it applied + for the entire input (but still possible). - mlm:compile_method with options 'aot' for Ahead of Time Compilation, 'jit' for Just-In Time Compilation ### Changed +- Explicitly disallow `mlm:name`, `mlm:input`, `mlm:output` and `mlm:hyperparameters` at the Asset level. + These fields describe the model as a whole and should therefore be defined in Item properties. +- Moved `norm_type` to `value_scaling` object to better reflect the expected operation, which could be another + operation than what is typically known as "normalization" or "standardization" techniques in machine learning. +- Moved `statistics` to `value_scaling` object to better reflect their mutual `type` and additional + properties dependencies. - moved mlm:artifact_type field value descriptions that are framework specific to best-practices section. - expanded suggested mlm:artifact_type values to include Tensorflow/Keras @@ -18,10 +32,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - n/a ### Removed -- n/a +- Removed `norm_type` enum values that were ambiguous regarding their expected result. + Instead, a `processing:expression` should be employed to explicitly define the calculation they represent. +- Removed `norm_clip` property. It is now represented under `value_scaling` objects with a + corresponding `type` definition. +- Removed `norm_by_channel` from `mlm:input` objects. If rescaling (previously normalization in the documentation) + is a single value, broadcasting to the relevant bands should be performed implicitly. + Otherwise, the amount of `value_scaling` objects should match the number of bands or channels involved in the input. ### Fixed -- n/a +- Fix missing `mlm:artifact_type` property check for a Model Asset definition + (fixes ). 
+ The `mlm:artifact_type` is now mutually and exclusively required by the corresponding Asset with `mlm:model` role. +- Fix check of disallowed unknown/undefined `mlm:`-prefixed fields + (fixes [#41](https://github.com/stac-extensions/mlm/issues/41)). ## [v1.3.0](https://github.com/stac-extensions/mlm/tree/v1.3.0) From 66ca180a212f894917dbac1e780d34edae6363dd Mon Sep 17 00:00:00 2001 From: Ryan Avery Date: Fri, 13 Dec 2024 15:45:01 -0800 Subject: [PATCH 09/11] correct best practices links, be more specific in table descriptions --- README.md | 27 +++++++++++++-------- best-practices.md | 54 +++++++++++++++++++++-------------------- json-schema/schema.json | 8 ++---- 3 files changed, 47 insertions(+), 42 deletions(-) diff --git a/README.md b/README.md index 768a021..5bf4e99 100644 --- a/README.md +++ b/README.md @@ -101,7 +101,7 @@ connectors, please refer to the [STAC Model](./README_STAC_MODEL.md) document. - [Collection example](examples/collection.json): Shows the basic usage of the extension in a STAC Collection - [JSON Schema](https://stac-extensions.github.io/mlm/) - [Changelog](./CHANGELOG.md) -- [Open access paper](https://dl.acm.org/doi/10.1145/3681769.3698586) describing version 1.3.0 of the extension +- [Open access paper](https://dl.acm.org/doi/10.1145/3681769.3698586) describing version 1.3.0 of the extension - [SigSpatial 2024 GeoSearch Workshop presentation](/docs/static/sigspatial_2024_mlm.pdf) ## Item Properties and Collection Fields @@ -340,13 +340,13 @@ defined at the "Band Object" level, but at the [Model Input](#model-input-object This is because, in machine learning, it is common to need overall statistics for the dataset used to train the model to normalize all bands, rather than normalizing the values over a single product. Furthermore, statistics could be applied differently for distinct [Model Input](#model-input-object) definitions, in order to adjust for intrinsic -properties of the model. +properties of the model. 
Another distinction is that, depending on the model, statistics could apply to some inputs that have no reference to any `bands` definition. In such case, defining statistics under `bands` would not be possible, or would intrude ambiguous definitions. -Finally, contrary to the "`statistics`" property name employed by [Band Statistics][stac-1.1-stats], MLM employs the +Finally, contrary to the "`statistics`" property name employed by [Band Statistics][stac-1.1-stats], MLM employs the distinct name `value_scaling`, although similar `minimum`, `maximum`, etc. sub-fields are employed. This is done explicitly to disambiguate "informative" band statistics from "applied scaling operations" required by the model inputs. This highlights the fact that `value_scaling` are not *necessarily* equal @@ -449,7 +449,7 @@ Select one option from: | `scale` | `value` | $data / value$ | | `processing` | [Processing Expression](#processing-expression) | *according to `processing:expression`* | -When a scaling `type` approach is specified, it is expected that the parameters necessary +When a scaling `type` approach is specified, it is expected that the parameters necessary to perform their calculation are provided for the corresponding input dimension data. If none of the above values applies for a given dimension, `type: null` (literal `null`, not string) should be @@ -463,7 +463,7 @@ dimensions. In such case, implicit broadcasting of the unique [Value Scaling Obj performed for all applicable dimensions when running inference with the model. If a custom scaling operation, or a combination of more complex operations (with or without [Resize](#resize-enum)), -must be defined instead, a [Processing Expression](#processing-expression) reference can be specified in place of +must be defined instead, a [Processing Expression](#processing-expression) reference can be specified in place of the [Value Scaling Object](#value-scaling-object) of the respective input dimension, as shown below. 
```json @@ -478,7 +478,7 @@ the [Value Scaling Object](#value-scaling-object) of the respective input dimens For operations such as L1 or L2 normalization, [Processing Expression](#processing-expression) should also be employed. This is because, depending on the [Model Input](#model-input-object) dimensions and reference data, there is an -ambiguity regarding "how" and "where" such normalization functions must be applied against the input data. +ambiguity regarding "how" and "where" such normalization functions must be applied against the input data. A custom mathematical expression should provide explicitly the data manipulation and normalization strategy. In situations of very complex `value_scaling` operations, which cannot be represented by any of the previous definition, @@ -667,8 +667,8 @@ In order to provide more context, the following roles are also recommended were | href | string | URI to the model artifact. | | type | string | The media type of the artifact (see [Model Artifact Media-Type](#model-artifact-media-type). | | roles | \[string] | **REQUIRED** Specify `mlm:model`. Can include `["mlm:weights", "mlm:checkpoint"]` as applicable. | -| mlm:artifact_type | [Artifact Type](./best-practices.md#framework-specific-artifact-types) | Specifies the kind of model artifact. Typically related to a particular ML framework. This is **REQUIRED** if the `mlm:model` role is specified. | -| mlm:compile_method | string | Describes the method used to compile the ML model at either save time or runtime prior to inference. These options are mutually exclusive `["aot", "jit", null]`. | +| mlm:artifact_type | [Artifact Type](./best-practices.md#framework-specific-artifact-types) | Specifies the kind of model artifact, any string is allowed. Typically related to a particular ML framework, see [Best Practices - Framework Specific Artifact Types](./best-practices.md#framework-specific-artifact-types) for **RECOMMENDED** values. 
This field is **REQUIRED** if the `mlm:model` role is specified. | +| mlm:compile_method | [Compile Method](#compile-method) | null | Describes the method used to compile the ML model either when the model is saved or at model runtime prior to inference. | Recommended Asset `roles` include `mlm:weights` or `mlm:checkpoint` for model weights that need to be loaded by a model definition and `mlm:compiled` for models that can be loaded directly without an intermediate model definition. @@ -701,12 +701,19 @@ is used for the artifact described by the media-type. However, users need to rem official. In order to validate the specific framework and artifact type employed by the model, the MLM properties `mlm:framework` (see [MLM Fields](#item-properties-and-collection-fields)) and `mlm:artifact_type` (see [Model Asset](#model-asset)) should be employed instead to perform this validation if needed. -See the [the best practices document](./best-practices.md#framework-specific-artifact-types) on suggested -fields for framework specific artifact types. +See the [Best Practices - Framework Specific Artifact Types](./best-practices.md#framework-specific-artifact-types) on + suggested fields for framework specific artifact types. [iana-media-type]: https://www.iana.org/assignments/media-types/media-types.xhtml [pytorch-aot-inductor]: https://pytorch.org/docs/main/torch.compiler_aot_inductor.html +#### Compile Method + +| Compile Method | Description | +|-:-:------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| aot | [Ahead-of-Time Compilation](https://en.wikipedia.org/wiki/Ahead-of-time_compilation). 
Converts a higher level code description of a model and a model's learned weights to a lower level representation prior to executing the model. This compiled model may be more portable by having fewer runtime dependencies and optimized for specific hardware. | +| jit | [Just-in-Time Compilation](https://en.wikipedia.org/wiki/Just-in-time_compilation). Converts a higher level code description of a model and a model's learned weights to a lower level representation while executing the model. JIT provides more flexibility in the optimization approaches that can be applied to a model compared to AOT, but sacrifices portability and performance. | + ### Source Code Asset | Field Name | Type | Description | diff --git a/best-practices.md b/best-practices.md index d778909..2036523 100644 --- a/best-practices.md +++ b/best-practices.md @@ -3,18 +3,20 @@ This document makes a number of recommendations for creating real world ML Model Extensions. None of them are required to meet the core specification, but following these practices will improve the documentation of your model and make life easier for client tooling and users. They come about from practical experience of -implementors and introduce a bit more 'constraint' for those who are creating STAC objects representing their +implementors and introduce a bit more 'constraint' for those who are creating STAC objects representing their models or creating tools to work with STAC. 
-- [Using STAC Common Metadata Fields for the ML Model Extension](#using-stac-common-metadata-fields-for-the-ml-model-extension) -- [Recommended Extensions to Compose with the ML Model Extension](#recommended-extensions-to-compose-with-the-ml-model-extension) - - [Processing Extension](#processing-extension) - - [ML-AOI and Label Extensions](#ml-aoi-and-label-extensions) - - [Classification Extension](#classification-extension) - - [Scientific Extension](#scientific-extension) - - [File Extension](#file-extension) - - [Example Extension](#example-extension) - - [Version Extension](#version-extension) +- [ML Model Extension Best Practices](#ml-model-extension-best-practices) + - [Using STAC Common Metadata Fields for the ML Model Extension](#using-stac-common-metadata-fields-for-the-ml-model-extension) + - [Recommended Extensions to Compose with the ML Model Extension](#recommended-extensions-to-compose-with-the-ml-model-extension) + - [Processing Extension](#processing-extension) + - [ML-AOI and Label Extensions](#ml-aoi-and-label-extensions) + - [Classification Extension](#classification-extension) + - [Scientific Extension](#scientific-extension) + - [File Extension](#file-extension) + - [Example Extension](#example-extension) + - [Version Extension](#version-extension) + - [Framework Specific Artifact Types](#framework-specific-artifact-types) ## Using STAC Common Metadata Fields for the ML Model Extension @@ -68,8 +70,8 @@ information regarding these references, see the [ML-AOI and Label Extensions](#m ### Processing Extension -It is recommended to use at least the `processing:lineage` and `processing:level` fields from -the [Processing Extension](https://github.com/stac-extensions/processing) to make it clear +It is recommended to use at least the `processing:lineage` and `processing:level` fields from +the [Processing Extension](https://github.com/stac-extensions/processing) to make it clear how [Model Input Objects](./README.md#model-input-object) are 
processed by the data provider prior to an inference preprocessing pipeline. This can help users locate the correct version of the dataset used during model inference or help them reproduce the data processing pipeline. @@ -99,7 +101,7 @@ Furthermore, the [`processing:expression`](https://github.com/stac-extensions/pr should be specified with a reference to the STAC Item employing the MLM extension to provide full context of the source of the derived product. -A potential representation of a STAC Asset could be as follows: +A potential representation of a STAC Asset could be as follows: ```json { "model-output": { @@ -186,7 +188,7 @@ leading to a new MLM STAC Item definition (see also [STAC Version Extension](#ve ### Classification Extension -Since it is expected that a model will provide some kind of classification values as output, the +Since it is expected that a model will provide some kind of classification values as output, the [Classification Extension](https://github.com/stac-extensions/classification) can be leveraged inside MLM definition to indicate which class values can be contained in the resulting output from the model prediction. @@ -201,7 +203,7 @@ For more details, see the [Model Output Object](README.md#model-output-object) d ### Scientific Extension -Provided that most models derive from previous scientific work, it is strongly recommended to employ the +Provided that most models derive from previous scientific work, it is strongly recommended to employ the [Scientific Extension][stac-ext-sci] to provide references corresponding to the original source of the model (`sci:doi`, `sci:citation`). This can help users find more information about the model, its underlying architecture, or ways to improve it by piecing together the related work (`sci:publications`) that @@ -285,17 +287,17 @@ educational purposes only. 
 ## Framework Specific Artifact Types -The `mlm:artifact_type` field can be used to clarify how the model was saved which -can help users understand how to load it or in what runtime contexts it should be used. For example, PyTorch offers -[various strategies][pytorch-frameworks] for providing model definitions, such as Pickle (`.pt`), - [TorchScript][pytorch-jit-script], or [PyTorch Ahead-of-Time Compilation][pytorch-aot-inductor] -(`.pt2`) approach. Since they all refer to the same ML framework, the -[Model Artifact Media-Type](./README.md#model-artifact-media-type) can be insufficient in this case to detect which -strategy should be used to employ the model definition. +The `mlm:artifact_type` field can be used to clarify how the model was saved, which can help users understand how to +load it or in which runtime contexts it should be used. Specifying an artifact type should explicitly restrict the +artifact's use to a specific runtime. For example, PyTorch offers [various strategies][pytorch-frameworks] for +exporting models, such as Pickle (`.pt`), [TorchScript][pytorch-jit-script], and +[PyTorch Ahead-of-Time Compilation][pytorch-aot-inductor] (`.pt2`). Since each approach is associated with the same +ML framework, the [Model Artifact Media-Type](./README.md#model-artifact-media-type) can be insufficient in this case +to detect which strategy should be used to deploy the model artifact. -The following are some proposed *Artifact Type* values for the Model Asset's +The following are some proposed *Artifact Type* values for the Model Asset's [`mlm:artifact_type` field](./README.md#model-asset). Other names are -permitted, as these values are not validated by the schema. 
Note that the names are selected using the framework-specific definitions to help the users understand how the model artifact was created, although these exact names are not strictly required either. @@ -306,7 +308,7 @@ names are not strictly required either. | `torch.export.save` | A model artifact storing an [ExportedProgram][exported-program] obtained by [`torch.export.export`][pytorch-export] (i.e.: `.pt2`). | | `tf.keras.Model.save` | Saves a [.keras model file][keras-model], a unified zip archive format containing the architecture, weights, optimizer, losses, and metrics. | | `tf.keras.Model.save_weights` | A [.weights.h5][keras-save-weights] file containing only model weights for use by Tensorflow or Keras. | -| `tf.keras.Model.export(format='tf_saved_model')` | TF Saved Model is the [recommended format][tf-keras-recommended] by the Tensorflow team for whole model saving/loading for inference. See this example to [save and load models][keras-example] and the docs for [different save methods][keras-methods] in TF and Keras. Also available from `keras.Model.export(format='tf_saved_model')` | +| `tf.keras.Model.export` | [TF Saved Model][tf-saved-model] is the [recommended format][tf-keras-recommended] by the Tensorflow team for whole model saving/loading for inference. See the docs for [different save methods][keras-methods] in TF and Keras. | [exported-program]: https://pytorch.org/docs/main/export.html#serialization [pytorch-aot-inductor]: https://pytorch.org/docs/main/torch.compiler_aot_inductor.html @@ -315,7 +317,7 @@ names are not strictly required either. 
[pytorch-jit-script]: https://pytorch.org/docs/stable/jit.html [pytorch-save]: https://pytorch.org/tutorials/beginner/saving_loading_models.html [keras-save-weights]: https://keras.io/api/models/model_saving_apis/weights_saving_and_loading/#save_weights-method -[keras-example]: https://keras.io/guides/serialization_and_saving/ +[tf-saved-model]: https://keras.io/api/models/model_saving_apis/export/ [tf-keras-recommended]: https://www.tensorflow.org/guide/saved_model#creating_a_savedmodel_from_keras [keras-methods]: https://keras.io/2.16/api/models/model_saving_apis/ [keras-model]: https://keras.io/api/models/model_saving_apis/model_saving_and_loading/ diff --git a/json-schema/schema.json b/json-schema/schema.json index 0dbf2ad..813bb17 100644 --- a/json-schema/schema.json +++ b/json-schema/schema.json @@ -356,12 +356,8 @@ "$comment": "Particularity of the 'not/required' approach: they must be tested one by one. Otherwise, it validates that they are all (simultaneously) not present.", "not": { "anyOf": [ - { - "required": [ - "mlm:artifact_type", - "mlm:compile_method" - ] - } + {"required": ["mlm:artifact_type"]}, + {"required": ["mlm:compile_method"]} ] } }, From d770629cf2dbc2528cbaa5b6ec0129c9646c871f Mon Sep 17 00:00:00 2001 From: Ryan Avery Date: Mon, 16 Dec 2024 11:23:01 -0800 Subject: [PATCH 10/11] code ticks --- CHANGELOG.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 02e0ea2..a9d542b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -16,7 +16,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Allow a `processing:expression` for a band/channel/dimension-specific `value_scaling` operation, granting more flexibility in the definition of input preparation in contrast to having it applied for the entire input (but still possible). 
-- mlm:compile_method with options 'aot' for Ahead of Time Compilation, 'jit' for Just-In Time Compilation +- Add optional `mlm:compile_method` field at the Asset level with options `aot` for Ahead of Time Compilation, `jit` for Just-In Time Compilation. ### Changed - Explicitly disallow `mlm:name`, `mlm:input`, `mlm:output` and `mlm:hyperparameters` at the Asset level. @@ -25,8 +25,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 operation than what is typically known as "normalization" or "standardization" techniques in machine learning. - Moved `statistics` to `value_scaling` object to better reflect their mutual `type` and additional properties dependencies. -- moved mlm:artifact_type field value descriptions that are framework specific to best-practices section. -- expanded suggested mlm:artifact_type values to include Tensorflow/Keras +- moved `mlm:artifact_type` field value descriptions that are framework specific to best-practices section. +- expanded suggested `mlm:artifact_type` values to include Tensorflow/Keras. ### Deprecated - n/a From 3a3d2620a5bec2363e8763e7220a1433987e259a Mon Sep 17 00:00:00 2001 From: Ryan Avery Date: Fri, 3 Jan 2025 12:52:15 -0800 Subject: [PATCH 11/11] fix line length --- CHANGELOG.md | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index a9d542b..02567e1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,15 +8,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [v1.4.0](https://github.com/stac-extensions/mlm/tree/v1.4.0) ### Added -- Add better descriptions about required and recommended *MLM Asset Roles* and their implications - (fixes [#54](https://github.com/stac-extensions/mlm/issues/54)). +- Add better descriptions about required and recommended *MLM Asset Roles* and + their implications (fixes + [#54](https://github.com/stac-extensions/mlm/issues/54)). 
 - Add explicit check of `value_scaling` sub-fields `minimum`, `maximum`, `mean`, `stddev`, etc. for corresponding `type` values `min-max` and `z-score` that depend on it. - Allow different `value_scaling` operations per band/channel/dimension as needed by the model. - Allow a `processing:expression` for a band/channel/dimension-specific `value_scaling` operation, granting more flexibility in the definition of input preparation in contrast to having it applied for the entire input (but still possible). -- Add optional `mlm:compile_method` field at the Asset level with options `aot` for Ahead of Time Compilation, `jit` for Just-In Time Compilation. +- Add optional `mlm:compile_method` field at the Asset level with options `aot` + for Ahead-of-Time Compilation, `jit` for Just-in-Time Compilation. ### Changed - Explicitly disallow `mlm:name`, `mlm:input`, `mlm:output` and `mlm:hyperparameters` at the Asset level. @@ -143,7 +145,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 [Asset Object](https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md#asset-object) to refer to model asset and source code. - use `classification:classes` in Model Output -- add `scene-classification` to the Enum Tasks to allow disambiguation between pixel-wise and patch-based classification +- add `scene-classification` to the Enum Tasks to allow disambiguation between + pixel-wise and patch-based classification ### Changed - `disk_size` replaced by `file:size` (see [Best Practices - File Extension](best-practices.md#file-extension))
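
Review note: the Asset-level rules this series documents (`mlm:artifact_type` is required when the `mlm:model` role is present; `mlm:compile_method` is one of `aot`, `jit` or `null`) can be illustrated with a small check. This is only a sketch under stated assumptions: `check_model_asset` and the sample asset are hypothetical names invented for illustration, they are not part of the `stac_model` package, and the check does not replace validation against `json-schema/schema.json`.

```python
from typing import Any

# Values documented for `mlm:compile_method` in the README table of this series.
ALLOWED_COMPILE_METHODS = {"aot", "jit", None}


def check_model_asset(asset: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems found in a STAC Asset dict."""
    problems: list[str] = []
    roles = asset.get("roles", [])

    # `mlm:artifact_type` is REQUIRED when the `mlm:model` role is specified.
    # Any string is allowed; the best-practices names are only recommendations.
    if "mlm:model" in roles and not isinstance(asset.get("mlm:artifact_type"), str):
        problems.append("mlm:artifact_type must be a string when roles include mlm:model")

    # `mlm:compile_method` is optional; when present it must be aot, jit or null.
    if asset.get("mlm:compile_method") not in ALLOWED_COMPILE_METHODS:
        problems.append("mlm:compile_method must be 'aot', 'jit' or null")

    return problems


# Illustrative asset only; the href is a placeholder.
example_asset = {
    "href": "https://example.com/model.pt2",
    "roles": ["mlm:model", "mlm:compiled"],
    "mlm:artifact_type": "torch.export.save",
    "mlm:compile_method": "aot",
}
print(check_model_asset(example_asset))
```

The helper returns a list of messages rather than raising, so a caller can report every problem of an asset in one pass.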