feature: Add SageMaker Experiment (aws#3536)
* feature: Add experiment plus Run class (aws#691)

* feature: Add Experiment helper classes (aws#646)

* feature: Add Experiment helper classes

feature: Add helper class _RunEnvironment

* change: Change sleep retry to backoff retry for get TC

* minor fixes in backoff retry

Co-authored-by: Dewen Qi <[email protected]>

* feature: Add helper classes and methods for Run class (aws#660)

* feature: Add helper classes and methods for Run class

* Add Parent class to address comment

* fix docstyle check

* Add arg docstrings in _helper

Co-authored-by: Dewen Qi <[email protected]>

* feature: Add Experiment Run class (aws#651)

Co-authored-by: Dewen Qi <[email protected]>

* change: Add integ tests for Run (aws#673)

Co-authored-by: Dewen Qi <[email protected]>

* Update run log metric to use MetricsManager (aws#678)

* Update run.log_metric to use _MetricsManager

* fix several metrics issues

* Add doc strings to metrics.py

Co-authored-by: Dana Benson <[email protected]>
Co-authored-by: Dana Benson <[email protected]>
Co-authored-by: Dewen Qi <[email protected]>

Co-authored-by: Dewen Qi <[email protected]>
Co-authored-by: Dana Benson <[email protected]>
Co-authored-by: Dana Benson <[email protected]>

* change: Simplify exp plus integ test configuration (aws#694)

Co-authored-by: Dewen Qi <[email protected]>

* feature: add RunName to experiment_config (aws#696)

* change: Update Run init and add Run load and _RunContext (aws#707)

* change: Update Run init and add Run load

Add exp name and run group name to load and address comments

* Address nit comments

Co-authored-by: Dewen Qi <[email protected]>

* fix: Fix run name uniqueness issue (aws#730)

Co-authored-by: Dewen Qi <[email protected]>

* change: Update integ tests for Exp Plus M1 changes (aws#741)

Co-authored-by: Dewen Qi <[email protected]>

* add metrics client to session object (aws#745)

Co-authored-by: Dewen Qi <[email protected]>
Co-authored-by: Dana Benson <[email protected]>
Co-authored-by: Dana Benson <[email protected]>
Co-authored-by: qidewenwhen <[email protected]>

* change: Add integ test for using Run in Transform Job (aws#749)

Co-authored-by: Dewen Qi <[email protected]>

* Add async metrics sink (aws#739)

Co-authored-by: Dewen Qi <[email protected]>
Co-authored-by: Dana Benson <[email protected]>
Co-authored-by: Dana Benson <[email protected]>
Co-authored-by: qidewenwhen <[email protected]>

* use metrics client provided by session (aws#754)

* fix flaky metrics test (aws#753)

* change: Change Run.init and Run.load to constructor and module method respectively (aws#752)

Co-authored-by: Dewen Qi <[email protected]>

* feature: Add latest metric service model (aws#757)

Co-authored-by: Dewen Qi <[email protected]>
Co-authored-by: qidewenwhen <[email protected]>

* fix: lowercase run name (aws#767)

* Change: Minimize use of lower case tc name (aws#769)

* change: Clean up test resources to remove model files (aws#756)

* change: Clean up test resources to remove model files

* fix: Change experiment enums to upper case

* change: Upgrade boto3 and update test to validate mixed case name

* fix: Update as per latest botocore release and backend change

Co-authored-by: Dewen Qi <[email protected]>

* lowercase trial component name (aws#776)

* change: Expose sagemaker experiment doc strings

* fix: Fix exp name mixed case in issue

Co-authored-by: Dewen Qi <[email protected]>
Co-authored-by: Dana Benson <[email protected]>
Co-authored-by: Dana Benson <[email protected]>
Co-authored-by: Yifei Zhu <[email protected]>
5 people authored and mufiAmazon committed Dec 19, 2022
1 parent 975c32d commit 46fcc16
Showing 82 changed files with 7,894 additions and 263 deletions.
5 changes: 3 additions & 2 deletions .gitignore
@@ -30,5 +30,6 @@ env/
 .vscode/
 **/tmp
 .python-version
-**/_repack_model.py
-**/_repack_script_launcher.sh
+**/_repack_script_launcher.sh
+tests/data/**/_repack_model.py
+tests/data/experiment/sagemaker-dev-1.0.tar.gz
10 changes: 10 additions & 0 deletions doc/experiments/index.rst
@@ -0,0 +1,10 @@
+############################
+Amazon SageMaker Experiments
+############################
+
+The SageMaker Python SDK supports tracking and organizing your machine learning workflows, whether they run as SageMaker jobs (such as Processing, Training, and Transform) or locally.
+
+.. toctree::
+    :maxdepth: 2
+
+    sagemaker.experiments
20 changes: 20 additions & 0 deletions doc/experiments/sagemaker.experiments.rst
@@ -0,0 +1,20 @@
+Experiments
+============
+
+Run
+-------------
+
+.. autoclass:: sagemaker.experiments.Run
+    :members:
+
+.. automethod:: sagemaker.experiments.load_run
+
+.. automethod:: sagemaker.experiments.list_runs
+
+.. autoclass:: sagemaker.experiments.SortByType
+    :members:
+    :undoc-members:
+
+.. autoclass:: sagemaker.experiments.SortOrderType
+    :members:
+    :undoc-members:
10 changes: 10 additions & 0 deletions doc/index.rst
@@ -60,6 +60,16 @@ Orchestrate your SageMaker training and inference workflows with Airflow and Kub
     workflows/index


+****************************
+Amazon SageMaker Experiments
+****************************
+You can use Amazon SageMaker Experiments to track machine learning experiments.
+
+.. toctree::
+    :maxdepth: 2
+
+    experiments/index
+
 *************************
 Amazon SageMaker Debugger
 *************************
1 change: 1 addition & 0 deletions requirements/extras/test_requirements.txt
@@ -20,3 +20,4 @@ requests==2.27.1
 sagemaker-experiments==0.1.35
 Jinja2==3.0.3
 pandas>=1.3.5,<1.5
+scikit-learn==1.0.2
2 changes: 1 addition & 1 deletion setup.py
@@ -48,7 +48,7 @@ def read_requirements(filename):
 # Declare minimal set for installation
 required_packages = [
     "attrs>=20.3.0,<23",
-    "boto3>=1.26.20,<2.0",
+    "boto3>=1.26.28,<2.0",
     "google-pasta",
     "numpy>=1.9.0,<2.0",
     "protobuf>=3.1,<4.0",
7 changes: 4 additions & 3 deletions src/sagemaker/amazon/amazon_estimator.py
@@ -27,7 +27,7 @@
 from sagemaker.deprecations import renamed_warning
 from sagemaker.estimator import EstimatorBase, _TrainingJob
 from sagemaker.inputs import FileSystemInput, TrainingInput
-from sagemaker.utils import sagemaker_timestamp
+from sagemaker.utils import sagemaker_timestamp, check_and_get_run_experiment_config
 from sagemaker.workflow.entities import PipelineVariable
 from sagemaker.workflow.pipeline_context import runnable_by_pipeline
 from sagemaker.workflow import is_pipeline_variable
@@ -242,8 +242,8 @@ def fit(
                 generates a default job name, based on the training image name
                 and current timestamp.
             experiment_config (dict[str, str]): Experiment management configuration.
-                Optionally, the dict can contain three keys:
-                'ExperimentName', 'TrialName', and 'TrialComponentDisplayName'.
+                Optionally, the dict can contain four keys:
+                'ExperimentName', 'TrialName', 'TrialComponentDisplayName' and 'RunName'.
                 The behavior of setting these keys is as follows:
                 * If `ExperimentName` is supplied but `TrialName` is not a Trial will be
                 automatically created and the job's Trial Component associated with the Trial.
@@ -255,6 +255,7 @@ def fit(
         """
         self._prepare_for_training(records, job_name=job_name, mini_batch_size=mini_batch_size)

+        experiment_config = check_and_get_run_experiment_config(experiment_config)
         self.latest_training_job = _TrainingJob.start_new(
             self, records, experiment_config=experiment_config
         )
6 changes: 4 additions & 2 deletions src/sagemaker/apiutils/_base_types.py
@@ -173,8 +173,10 @@ def _search(
                 search_items = search_method_response.get("Results", [])
                 next_token = search_method_response.get(boto_next_token_name)
                 for item in search_items:
-                    if cls.__name__ in item:
-                        yield search_item_factory(item[cls.__name__])
+                    # _TrialComponent class in experiments module is not public currently
+                    class_name = cls.__name__.lstrip("_")
+                    if class_name in item:
+                        yield search_item_factory(item[class_name])
                 if not next_token:
                     break
         except StopIteration:
4 changes: 3 additions & 1 deletion src/sagemaker/apiutils/_boto_functions.py
@@ -68,7 +68,9 @@ def from_boto(boto_dict, boto_name_to_member_name, member_name_to_type):
             api_type, is_collection = member_name_to_type[member_name]
             if is_collection:
                 if isinstance(boto_value, dict):
-                    member_value = api_type.from_boto(boto_value)
+                    member_value = {
+                        key: api_type.from_boto(value) for key, value in boto_value.items()
+                    }
                 else:
                     member_value = [api_type.from_boto(item) for item in boto_value]
             else:
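
The effect of this change is that a map-shaped collection is now converted value by value, with its keys preserved, instead of being passed whole to `from_boto`. A minimal sketch of that behavior follows, assuming a hypothetical `Metric` type and a hypothetical helper `convert_collection`; neither name is part of the SDK.

```python
class Metric:
    """Hypothetical API type with a from_boto constructor."""

    def __init__(self, value):
        self.value = value

    @classmethod
    def from_boto(cls, boto_dict):
        return cls(boto_dict["Value"])


def convert_collection(api_type, boto_value):
    """Convert a boto collection while preserving its container shape."""
    if isinstance(boto_value, dict):
        # Map-shaped collection: convert each value and keep the keys.
        return {key: api_type.from_boto(value) for key, value in boto_value.items()}
    # List-shaped collection: convert each element in order.
    return [api_type.from_boto(item) for item in boto_value]


as_map = convert_collection(Metric, {"loss": {"Value": 0.25}})
as_list = convert_collection(Metric, [{"Value": 1}, {"Value": 2}])
```

Here `as_map` is a dict of `Metric` objects keyed by the original keys, and `as_list` is a list of `Metric` objects in the original order.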
6 changes: 4 additions & 2 deletions src/sagemaker/dataset_definition/inputs.py
@@ -124,8 +124,10 @@ class DatasetDefinition(ApiObject):
     """DatasetDefinition input."""

     _custom_boto_types = {
-        "redshift_dataset_definition": (RedshiftDatasetDefinition, True),
-        "athena_dataset_definition": (AthenaDatasetDefinition, True),
+        # RedshiftDatasetDefinition and AthenaDatasetDefinition are not collections.
+        # Instead they are singleton objects. Thus, set the is_collection flag to False.
+        "redshift_dataset_definition": (RedshiftDatasetDefinition, False),
+        "athena_dataset_definition": (AthenaDatasetDefinition, False),
     }

     def __init__(
16 changes: 10 additions & 6 deletions src/sagemaker/estimator.py
@@ -79,6 +79,7 @@
     get_config_value,
     name_from_base,
     to_string,
+    check_and_get_run_experiment_config,
 )
 from sagemaker.workflow import is_pipeline_variable
 from sagemaker.workflow.entities import PipelineVariable
@@ -1103,8 +1104,8 @@ def fit(
             job_name (str): Training job name. If not specified, the estimator generates
                 a default job name based on the training image name and current timestamp.
             experiment_config (dict[str, str]): Experiment management configuration.
-                Optionally, the dict can contain three keys:
-                'ExperimentName', 'TrialName', and 'TrialComponentDisplayName'.
+                Optionally, the dict can contain four keys:
+                'ExperimentName', 'TrialName', 'TrialComponentDisplayName' and 'RunName'.
                 The behavior of setting these keys is as follows:
                 * If `ExperimentName` is supplied but `TrialName` is not a Trial will be
                 automatically created and the job's Trial Component associated with the Trial.
@@ -1122,6 +1123,7 @@ def fit(
         """
         self._prepare_for_training(job_name=job_name)

+        experiment_config = check_and_get_run_experiment_config(experiment_config)
         self.latest_training_job = _TrainingJob.start_new(self, inputs, experiment_config)
         self.jobs.append(self.latest_training_job)
         if wait:
@@ -2023,8 +2025,8 @@ def start_new(cls, estimator, inputs, experiment_config):
             inputs (str): Parameters used when called
                 :meth:`~sagemaker.estimator.EstimatorBase.fit`.
             experiment_config (dict[str, str]): Experiment management configuration.
-                Optionally, the dict can contain three keys:
-                'ExperimentName', 'TrialName', and 'TrialComponentDisplayName'.
+                Optionally, the dict can contain four keys:
+                'ExperimentName', 'TrialName', 'TrialComponentDisplayName' and 'RunName'.
                 The behavior of setting these keys is as follows:
                 * If `ExperimentName` is supplied but `TrialName` is not a Trial will be
                 automatically created and the job's Trial Component associated with the Trial.
@@ -2033,6 +2035,7 @@ def start_new(cls, estimator, inputs, experiment_config):
                 * If both `ExperimentName` and `TrialName` are not supplied the trial component
                 will be unassociated.
                 * `TrialComponentDisplayName` is used for display in Studio.
+                * `RunName` is used to record an experiment run.
         Returns:
             sagemaker.estimator._TrainingJob: Constructed object that captures
             all information about the started training job.
@@ -2053,8 +2056,8 @@ def _get_train_args(cls, estimator, inputs, experiment_config):
             inputs (str): Parameters used when called
                 :meth:`~sagemaker.estimator.EstimatorBase.fit`.
             experiment_config (dict[str, str]): Experiment management configuration.
-                Optionally, the dict can contain three keys:
-                'ExperimentName', 'TrialName', and 'TrialComponentDisplayName'.
+                Optionally, the dict can contain four keys:
+                'ExperimentName', 'TrialName', 'TrialComponentDisplayName' and 'RunName'.
                 The behavior of setting these keys is as follows:
                 * If `ExperimentName` is supplied but `TrialName` is not a Trial will be
                 automatically created and the job's Trial Component associated with the Trial.
@@ -2063,6 +2066,7 @@ def _get_train_args(cls, estimator, inputs, experiment_config):
                 * If both `ExperimentName` and `TrialName` are not supplied the trial component
                 will be unassociated.
                 * `TrialComponentDisplayName` is used for display in Studio.
+                * `RunName` is used to record an experiment run.
         Returns:
             Dict: dict for `sagemaker.session.Session.train` method
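
For reference, the `experiment_config` dict described in these docstrings might look like the sketch below. Only the four key names come from the docstrings; every value is a placeholder, and the `estimator` and `inputs` objects in the commented-out call are assumed to exist already.

```python
# Only the four key names are taken from the docstring; all values are placeholders.
experiment_config = {
    "ExperimentName": "my-experiment",
    "TrialName": "my-trial",
    "TrialComponentDisplayName": "training",
    "RunName": "my-run",
}

# With an already-configured estimator (hypothetical here), the dict is passed to fit:
# estimator.fit(inputs, experiment_config=experiment_config)
```

All keys are optional per the docstring, so a caller can supply any subset of them.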
20 changes: 20 additions & 0 deletions src/sagemaker/experiments/__init__.py
@@ -0,0 +1,20 @@
+# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License"). You
+# may not use this file except in compliance with the License. A copy of
+# the License is located at
+#
+# http://aws.amazon.com/apache2.0/
+#
+# or in the "license" file accompanying this file. This file is
+# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
+# ANY KIND, either express or implied. See the License for the specific
+# language governing permissions and limitations under the License.
+"""Sagemaker Experiment Module"""
+from __future__ import absolute_import
+
+from sagemaker.experiments.run import Run  # noqa: F401
+from sagemaker.experiments.run import load_run  # noqa: F401
+from sagemaker.experiments.run import list_runs  # noqa: F401
+from sagemaker.experiments.run import SortOrderType  # noqa: F401
+from sagemaker.experiments.run import SortByType  # noqa: F401