Commit
remove alpha from pipeline parameters (#169)
sbaidachni authored Feb 1, 2020
1 parent 8ae6701 commit 312dfdf
Showing 5 changed files with 31 additions and 26 deletions.
12 changes: 1 addition & 11 deletions .pipelines/diabetes_regression-ci-build-train.yml
@@ -62,30 +62,20 @@ stages:
echo "##vso[task.setvariable variable=AMLPIPELINEID;isOutput=true]$AMLPIPELINEID"
name: 'getpipelineid'
displayName: 'Get Pipeline ID'
- bash: |
# Generate a hyperparameter value as a random number between 0 and 1.
# A random value is used here to make the Azure ML dashboards "interesting" when testing
# the solution sample.
alpha=$(printf "0.%03d\n" $((($RANDOM*1000)/32767)))
echo "Alpha: $alpha"
echo "##vso[task.setvariable variable=ALPHA;isOutput=true]$alpha"
name: 'getalpha'
displayName: 'Generate random value for hyperparameter alpha'
- job: "Run_ML_Pipeline"
dependsOn: "Get_Pipeline_ID"
displayName: "Trigger ML Training Pipeline"
pool: server
variables:
AMLPIPELINE_ID: $[ dependencies.Get_Pipeline_ID.outputs['getpipelineid.AMLPIPELINEID'] ]
ALPHA: $[ dependencies.Get_Pipeline_ID.outputs['getalpha.ALPHA'] ]
steps:
- task: ms-air-aiagility.vss-services-azureml.azureml-restApi-task.MLPublishedPipelineRestAPITask@0
displayName: 'Invoke ML pipeline'
inputs:
azureSubscription: '$(WORKSPACE_SVC_CONNECTION)'
PipelineId: '$(AMLPIPELINE_ID)'
ExperimentName: '$(EXPERIMENT_NAME)'
PipelineParameters: '"ParameterAssignments": {"model_name": "$(MODEL_NAME)", "hyperparameter_alpha": "$(ALPHA)"}'
PipelineParameters: '"ParameterAssignments": {"model_name": "$(MODEL_NAME)"}'
- job: "Training_Run_Report"
dependsOn: "Run_ML_Pipeline"
condition: always()
14 changes: 14 additions & 0 deletions diabetes_regression/config.json
@@ -0,0 +1,14 @@
{
"training":
{
"alpha": 0.4
},
"evaluation":
{

},
"scoring":
{

}
}
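The training script (see the train.py diff below) consumes this file at run time, falling back to a default when the key is absent. A minimal standalone sketch of that pattern, with the file path and fallback value chosen here only for illustration:

```python
import json


def get_training_alpha(config_path="config.json", default=0.5):
    """Read the ridge regression alpha from config.json, falling back to a
    default when the file or the training section is missing."""
    try:
        with open(config_path) as f:
            config = json.load(f)
        return config["training"]["alpha"]
    except (FileNotFoundError, KeyError):
        return default


if __name__ == "__main__":
    print("alpha = %s" % get_training_alpha())
```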
21 changes: 12 additions & 9 deletions diabetes_regression/training/train.py
@@ -32,6 +32,7 @@
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.externals import joblib
import json


def train_model(run, data, alpha):
@@ -62,13 +63,6 @@ def main():
help="Name of the Model",
default="sklearn_regression_model.pkl",
)
parser.add_argument(
"--alpha",
type=float,
default=0.5,
help=("Ridge regression regularization strength hyperparameter; "
"must be a positive float.")
)

parser.add_argument(
"--dataset_name",
@@ -79,14 +73,23 @@

print("Argument [build_id]: %s" % args.build_id)
print("Argument [model_name]: %s" % args.model_name)
print("Argument [alpha]: %s" % args.alpha)
print("Argument [dataset_name]: %s" % args.dataset_name)

model_name = args.model_name
build_id = args.build_id
alpha = args.alpha
dataset_name = args.dataset_name

print("Getting training parameters")

with open("config.json") as f:
pars = json.load(f)
try:
alpha = pars["training"]["alpha"]
except KeyError:
alpha = 0.5

print("Parameter alpha: %s" % alpha)

run = Run.get_context()
ws = run.experiment.workspace

7 changes: 4 additions & 3 deletions docs/getting_started.md
@@ -86,6 +86,8 @@ For instructions on how to set up a local development environment, refer to the

For using Azure DevOps Pipelines all other variables are stored in the file `.pipelines/diabetes_regression-variables.yml`. Using the default values as a starting point, adjust the variables to suit your requirements.

**Note:** The `diabetes_regression` folder contains a `config.json` file that we recommend using to provide parameters to the training, evaluation, and scoring scripts. An example of such a parameter is a hyperparameter of the training algorithm: in our case, the ridge regression [*alpha* hyperparameter](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html). We don't provide any special serializers for this config file, so the layout you keep in it is up to you.

Up until now you should have:

* Forked (or cloned) the repo
@@ -120,7 +122,7 @@ Check out the newly created resources in the [Azure Portal](portal.azure.com):
(Optional) To remove the resources created for this project you can use the [/environment_setup/iac-remove-environment.yml](../environment_setup/iac-remove-environment.yml) definition or you can just delete the resource group in the [Azure Portal](portal.azure.com).

**Note:** The training ML pipeline uses a [sample diabetes dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) as training data. If you want to use your own dataset, you need to [create and register a datastore](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data#azure-machine-learning-studio) in your ML workspace and upload the datafile (e.g. [diabetes.csv](./data/diabetes.csv)) to the corresponding blob container. You can also define a datastore in the ML Workspace with [az cli](https://docs.microsoft.com/en-us/cli/azure/ext/azure-cli-ml/ml/datastore?view=azure-cli-latest#ext-azure-cli-ml-az-ml-datastore-attach-blob).
You'll also need to configure DATASTORE_NAME and DATAFILE_NAME variables in ***devopsforai-aml-vg*** variable group.
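The datastore setup described in this note can also be scripted; a sketch using the v1 `azureml-core` Python SDK, with placeholder storage account details (none of this is part of this commit):

```python
from azureml.core import Datastore, Workspace

# Placeholder values -- substitute your workspace config, storage account and
# container. The datastore name here is what DATASTORE_NAME should be set to.
ws = Workspace.from_config()

datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name="diabetes_datastore",
    container_name="training-data",
    account_name="<storage-account-name>",
    account_key="<storage-account-key>",
)

# Upload the data file referenced by DATAFILE_NAME to the blob container.
datastore.upload_files(
    files=["./data/diabetes.csv"],
    target_path="",
    overwrite=True,
)
```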


## Create an Azure DevOps Azure ML Workspace Service Connection
@@ -187,7 +189,7 @@ specified).
**Note:** If the model evaluation determines that the new model does not perform better than the previous one then the new model will not be registered and the pipeline will be cancelled.

* The third stage of the pipeline, **Deploy to ACI**, deploys the model to the QA environment in [Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/). It then runs a *smoke test* to validate the deployment, i.e. sends a sample query to the scoring web service and verifies that it returns a response in the expected format.

Wait until the pipeline finishes and verify that there is a new model in the **ML Workspace**:

![trained model](./images/trained-model.png)
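The smoke test mentioned above amounts to posting a sample payload to the ACI scoring endpoint and checking the response. A sketch of that check, with a hypothetical endpoint URI and an assumed `{"data": [[...]]}` payload shape:

```python
import json

import requests

# Hypothetical values -- replace with the scoring URI printed by the ACI
# deployment and one row of the diabetes feature columns.
scoring_uri = "http://<aci-service>.azurecontainer.io/score"
sample = {"data": [[0.04, 0.05, 0.06, 0.02, -0.04, -0.03, -0.04, 0.0, 0.02, -0.02]]}

response = requests.post(
    scoring_uri,
    data=json.dumps(sample),
    headers={"Content-Type": "application/json"},
    timeout=30,
)
response.raise_for_status()
print("Scoring response:", response.json())
```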
@@ -247,7 +249,6 @@ Make sure your webapp has the credentials to pull the image from the Azure Conta

* The provided pipeline definition YAML file is a sample starting point, which you should tailor to your processes and environment.
* You should edit the pipeline definition to remove unused stages. For example, if you are deploying to ACI and AKS, you should delete the unused `Deploy_Webapp` stage.
* The sample pipeline generates a random value for a model hyperparameter (ridge regression [*alpha*](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html)) to generate 'interesting' charts when testing the sample. In a real application you should use fixed hyperparameter values. You can [tune hyperparameter values using Azure ML](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters), and manage their values in Azure DevOps Variable Groups.
* You may wish to enable [manual approvals](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/approvals) before the deployment stages.
* You can install additional Conda or pip packages by modifying the YAML environment configurations under the `diabetes_regression` directory. Make sure to use fixed version numbers for all packages to ensure reproducibility, and use the same versions across environments.
* You can explore aspects of model observability in the solution, such as:
@@ -44,8 +44,6 @@ def main():
name="model_name", default_value=e.model_name)
build_id_param = PipelineParameter(
name="build_id", default_value=e.build_id)
hyperparameter_alpha_param = PipelineParameter(
name="hyperparameter_alpha", default_value=0.5)

dataset_name = ""
if (e.datastore_name is not None and e.datafile_name is not None):
@@ -66,7 +64,6 @@ def main():
arguments=[
"--build_id", build_id_param,
"--model_name", model_name_param,
"--alpha", hyperparameter_alpha_param,
"--dataset_name", dataset_name,
],
runconfig=run_config,
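With `hyperparameter_alpha` gone, the published training pipeline exposes only the parameters still visible in this diff (`model_name`, `build_id`). As an alternative to the REST task in the CI pipeline, it could be triggered from the v1 SDK; a sketch with placeholder IDs:

```python
from azureml.core import Workspace
from azureml.pipeline.core import PublishedPipeline

ws = Workspace.from_config()

# Placeholder ID -- in the CI pipeline this comes from the Get_Pipeline_ID job.
published_pipeline = PublishedPipeline.get(ws, id="<published-pipeline-id>")

# Only model_name is assigned, mirroring the updated PipelineParameters above.
run = published_pipeline.submit(
    ws,
    experiment_name="<experiment-name>",
    pipeline_parameters={"model_name": "sklearn_regression_model.pkl"},
)
run.wait_for_completion(show_output=True)
```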
