Commit de3e37a

iamrk04 (Rahul Kumar) and Rahul Kumar authored

Revert "Revert Update v1 Many Models and HTS Notebook" (Azure#1823)

* Revert "Revert "Update v1 Many Models and HTS Notebook" (Azure#1763)". This reverts commit 13f44d0.
* fix PR comments
* added new line at end
* fix black issue

Co-authored-by: Rahul Kumar <[email protected]>

1 parent de7447a commit de3e37a

3 files changed: +100 -74 lines changed

v1/python-sdk/tutorials/automl-with-azureml/forecasting-backtest-many-models/auto-ml-forecasting-backtest-many-models.ipynb (+12 -5)

@@ -387,6 +387,7 @@
 "| **node_count** | The number of compute nodes to be used for running the user script. We recommend to start with 3 and increase the node_count if the training time is taking too long. |\n",
 "| **process_count_per_node** | Process count per node, we recommend 2:1 ratio for number of cores: number of processes per node. eg. If node has 16 cores then configure 8 or less process count per node or optimal performance. |\n",
 "| **train_pipeline_parameters** | The set of configuration parameters defined in the previous section. |\n",
+"| **run_invocation_timeout** | Maximum amount of time in seconds that the ``ParallelRunStep`` class is allowed. This is optional but provides customers with greater control on exit criteria. This must be greater than ``experiment_timeout_hours`` by at least 300 seconds. |\n",
 "\n",
 "Calling this method will create a new aggregated dataset which is generated dynamically on pipeline execution."
 ]
@@ -529,6 +530,8 @@
 " target_column_name=TARGET_COLNAME,\n",
 ")\n",
 "\n",
+"output_file_name = \"parallel_run_step.csv\"\n",
+"\n",
 "inference_steps = AutoMLPipelineBuilder.get_many_models_batch_inference_steps(\n",
 " experiment=experiment,\n",
 " inference_data=test_data,\n",
@@ -540,6 +543,7 @@
 " train_run_id=training_run.id,\n",
 " train_experiment_name=training_run.experiment.name,\n",
 " inference_pipeline_parameters=mm_parameters,\n",
+" append_row_file_name=output_file_name,\n",
 ")"
 ]
 },
@@ -587,18 +591,21 @@
 "source": [
 "from azureml.contrib.automl.pipeline.steps.utilities import get_output_from_mm_pipeline\n",
 "\n",
+"PREDICTION_COLNAME = \"Predictions\"\n",
 "forecasting_results_name = \"forecasting_results\"\n",
 "forecasting_output_name = \"many_models_inference_output\"\n",
 "forecast_file = get_output_from_mm_pipeline(\n",
-" inference_run, forecasting_results_name, forecasting_output_name\n",
+" inference_run, forecasting_results_name, forecasting_output_name, output_file_name\n",
 ")\n",
-"df = pd.read_csv(forecast_file, delimiter=\" \", header=None, parse_dates=[0])\n",
-"df.columns = list(X_train.columns) + [\"predicted_level\"]\n",
+"df = pd.read_csv(forecast_file, parse_dates=[0])\n",
 "print(\n",
 " \"Prediction has \", df.shape[0], \" rows. Here the first 10 rows are being displayed.\"\n",
 ")\n",
-"# Save the scv file with header to read it in the next step.\n",
-"df.rename(columns={TARGET_COLNAME: \"actual_level\"}, inplace=True)\n",
+"# Save the csv file to read it in the next step.\n",
+"df.rename(\n",
+" columns={TARGET_COLNAME: \"actual_level\", PREDICTION_COLNAME: \"predicted_level\"},\n",
+" inplace=True,\n",
+")\n",
 "df.to_csv(os.path.join(forecasting_results_name, \"forecast.csv\"), index=False)\n",
 "df.head(10)"
 ]
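The hunks above switch the inference output from a headerless, space-delimited file to a headered CSV named by ``append_row_file_name``, so the read step no longer needs ``delimiter``/``header`` overrides or a manual column assignment. A minimal sketch of the new read-and-rename logic, using an in-memory stand-in for ``parallel_run_step.csv`` (the ``date``/``store`` columns are illustrative; ``quantity`` and ``Predictions`` come from the diff):

```python
import io

import pandas as pd

TARGET_COLNAME = "quantity"
PREDICTION_COLNAME = "Predictions"

# Toy stand-in for the pipeline's parallel_run_step.csv, which now has a header row.
forecast_file = io.StringIO(
    "date,store,quantity,Predictions\n"
    "2023-01-01,A,10,11.5\n"
    "2023-01-02,A,12,11.9\n"
)

# Header is read from the file; only the date column needs parsing.
df = pd.read_csv(forecast_file, parse_dates=[0])

# Rename both the actuals and the predictions columns, as in the updated cell.
df.rename(
    columns={TARGET_COLNAME: "actual_level", PREDICTION_COLNAME: "predicted_level"},
    inplace=True,
)
print(list(df.columns))  # → ['date', 'store', 'actual_level', 'predicted_level']
```

Because the file now carries its own header, the fragile dependency on ``X_train.columns`` matching the output column order disappears.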

v1/python-sdk/tutorials/automl-with-azureml/forecasting-hierarchical-timeseries/auto-ml-forecasting-hierarchical-timeseries.ipynb (+45 -33)

@@ -251,8 +251,16 @@
 "source": [
 "### Set up training parameters\n",
 "\n",
-"This dictionary defines the AutoML and hierarchy settings. For this forecasting task we need to define several settings inncluding the name of the time column, the maximum forecast horizon, the hierarchy definition, and the level of the hierarchy at which to train.\n",
+"We need to provide ``ForecastingParameters``, ``AutoMLConfig`` and ``HTSTrainParameters`` objects. For the forecasting task we need to define several settings including the name of the time column, the maximum forecast horizon, the hierarchy definition, and the level of the hierarchy at which to train.\n",
 "\n",
+"#### ``ForecastingParameters`` arguments\n",
+"| Property | Description|\n",
+"| :--------------- | :------------------- |\n",
+"| **forecast_horizon** | The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly). Periods are inferred from your data. |\n",
+"| **time_column_name** | The name of your time column. |\n",
+"| **time_series_id_column_names** | The column names used to uniquely identify timeseries in data that has multiple rows with the same timestamp. |\n",
+"\n",
+"#### ``AutoMLConfig`` arguments\n",
 "| Property | Description|\n",
 "| :--------------- | :------------------- |\n",
 "| **task** | forecasting |\n",
@@ -262,18 +270,21 @@
 "| **iterations** | Number of models to train. This is optional but provides customers with greater control on exit criteria. |\n",
 "| **experiment_timeout_hours** | Maximum amount of time in hours that the experiment can take before it terminates. This is optional but provides customers with greater control on exit criteria. |\n",
 "| **label_column_name** | The name of the label column. |\n",
-"| **forecast_horizon** | The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly). Periods are inferred from your data. |\n",
-"|**n_cross_validations**|Number of cross-validation folds to use for model/pipeline selection. The default value is \"auto\", in which case AutoMl determines the number of cross-validations automatically, if a validation set is not provided. Or users could specify an integer value.\n",
-"|**cv_step_size**|Number of periods between two consecutive cross-validation folds. The default value is \"auto\", in which case AutoMl determines the cross-validation step size automatically, if a validation set is not provided. Or users could specify an integer value.\n",
+"| **n_cross_validations** | Number of cross-validation folds to use for model/pipeline selection. The default value is \\\"auto\\\", in which case AutoMl determines the number of cross-validations automatically, if a validation set is not provided. Or users could specify an integer value. |\n",
+"| **cv_step_size** | Number of periods between two consecutive cross-validation folds. The default value is \\\"auto\\\", in which case AutoMl determines the cross-validation step size automatically, if a validation set is not provided. Or users could specify an integer value. |\n",
 "| **enable_early_stopping** | Flag to enable early termination if the score is not improving in the short term. |\n",
-"| **time_column_name** | The name of your time column. |\n",
-"| **hierarchy_column_names** | The names of columns that define the hierarchical structure of the data from highest level to most granular. |\n",
-"| **training_level** | The level of the hierarchy to be used for training models. |\n",
 "| **enable_engineered_explanations** | Engineered feature explanations will be downloaded if enable_engineered_explanations flag is set to True. By default it is set to False to save storage space. |\n",
-"| **time_series_id_column_name** | The column names used to uniquely identify timeseries in data that has multiple rows with the same timestamp. |\n",
 "| **track_child_runs** | Flag to disable tracking of child runs. Only best run is tracked if the flag is set to False (this includes the model and metrics of the run). |\n",
 "| **pipeline_fetch_max_batch_size** | Determines how many pipelines (training algorithms) to fetch at a time for training, this helps reduce throttling when training at large scale. |\n",
-"| **model_explainability** | Flag to disable explaining the best automated ML model at the end of all training iterations. The default is True and will block non-explainable models which may impact the forecast accuracy. For more information, see [Interpretability: model explanations in automated machine learning](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability-automl). |"
+"| **model_explainability** | Flag to disable explaining the best automated ML model at the end of all training iterations. The default is True and will block non-explainable models which may impact the forecast accuracy. For more information, see [Interpretability: model explanations in automated machine learning](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability-automl). |\n",
+"\n",
+"#### ``HTSTrainParameters`` arguments\n",
+"| Property | Description|\n",
+"| :--------------- | :------------------- |\n",
+"| **automl_settings** | ``AutoMLConfig`` object.\n",
+"| **hierarchy_column_names** | The names of columns that define the hierarchical structure of the data from highest level to most granular. |\n",
+"| **training_level** | The level of the hierarchy to be used for training models. |\n",
+"| **enable_engineered_explanations** | The switch controls engineered explanations. |"
 ]
 },
 {
@@ -287,6 +298,9 @@
 "outputs": [],
 "source": [
 "from azureml.train.automl.runtime._hts.hts_parameters import HTSTrainParameters\n",
+"from azureml.automl.core.forecasting_parameters import ForecastingParameters\n",
+"from azureml.train.automl.automlconfig import AutoMLConfig\n",
+"\n",
 "\n",
 "model_explainability = True\n",
 "\n",
@@ -300,24 +314,26 @@
 "label_column_name = \"quantity\"\n",
 "forecast_horizon = 7\n",
 "\n",
+"forecasting_parameters = ForecastingParameters(\n",
+" time_column_name=time_column_name,\n",
+" forecast_horizon=forecast_horizon,\n",
+")\n",
 "\n",
-"automl_settings = {\n",
-" \"task\": \"forecasting\",\n",
-" \"primary_metric\": \"normalized_root_mean_squared_error\",\n",
-" \"label_column_name\": label_column_name,\n",
-" \"time_column_name\": time_column_name,\n",
-" \"forecast_horizon\": forecast_horizon,\n",
-" \"hierarchy_column_names\": hierarchy,\n",
-" \"hierarchy_training_level\": training_level,\n",
-" \"track_child_runs\": False,\n",
-" \"pipeline_fetch_max_batch_size\": 15,\n",
-" \"model_explainability\": model_explainability,\n",
-" \"n_cross_validations\": \"auto\",  # Feel free to set to a small integer (>=2) if runtime is an issue.\n",
-" \"cv_step_size\": \"auto\",\n",
+"automl_settings = AutoMLConfig(\n",
+" task=\"forecasting\",\n",
+" primary_metric=\"normalized_root_mean_squared_error\",\n",
+" experiment_timeout_hours=1,\n",
+" label_column_name=label_column_name,\n",
+" track_child_runs=False,\n",
+" forecasting_parameters=forecasting_parameters,\n",
+" pipeline_fetch_max_batch_size=15,\n",
+" model_explainability=model_explainability,\n",
+" n_cross_validations=\"auto\",  # Feel free to set to a small integer (>=2) if runtime is an issue.\n",
+" cv_step_size=\"auto\",\n",
 " # The following settings are specific to this sample and should be adjusted according to your own needs.\n",
-" \"iteration_timeout_minutes\": 10,\n",
-" \"iterations\": 10,\n",
-"}\n",
+" iteration_timeout_minutes=10,\n",
+" iterations=15,\n",
+")\n",
 "\n",
 "hts_parameters = HTSTrainParameters(\n",
 " automl_settings=automl_settings,\n",
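The hunks above replace the flat ``automl_settings`` dictionary with typed configuration objects, with ``ForecastingParameters`` nested inside ``AutoMLConfig`` and that in turn passed to ``HTSTrainParameters``. Pulled out of the diff into one consecutive snippet, the wiring looks roughly like this (a sketch assuming ``azureml-train-automl`` is installed; ``"date"``, the hierarchy column names, and the training level are illustrative placeholders, not values from the diff):

```python
from azureml.automl.core.forecasting_parameters import ForecastingParameters
from azureml.train.automl.automlconfig import AutoMLConfig
from azureml.train.automl.runtime._hts.hts_parameters import HTSTrainParameters

# Time-series settings move into their own object...
forecasting_parameters = ForecastingParameters(
    time_column_name="date",  # placeholder; the notebook defines time_column_name earlier
    forecast_horizon=7,
)

# ...which AutoMLConfig now consumes via forecasting_parameters.
automl_settings = AutoMLConfig(
    task="forecasting",
    primary_metric="normalized_root_mean_squared_error",
    experiment_timeout_hours=1,
    label_column_name="quantity",
    track_child_runs=False,
    forecasting_parameters=forecasting_parameters,
    pipeline_fetch_max_batch_size=15,
    model_explainability=True,
    n_cross_validations="auto",
    cv_step_size="auto",
    iteration_timeout_minutes=10,
    iterations=15,
)

# Hierarchy settings stay on HTSTrainParameters, which wraps the AutoMLConfig.
hts_parameters = HTSTrainParameters(
    automl_settings=automl_settings,
    hierarchy_column_names=["state", "store_id"],  # placeholder hierarchy
    training_level="store_id",                     # placeholder level
    enable_engineered_explanations=False,
)
```

The split mirrors the three argument tables in the updated markdown cell: time-series options on ``ForecastingParameters``, experiment options on ``AutoMLConfig``, hierarchy options on ``HTSTrainParameters``.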
@@ -345,6 +361,7 @@
 "* **node_count:** The number of compute nodes to be used for running the user script. We recommend to start with 3 and increase the node_count if the training time is taking too long.\n",
 "* **process_count_per_node:** Process count per node, we recommend 2:1 ratio for number of cores: number of processes per node. eg. If node has 16 cores then configure 8 or less process count per node or optimal performance.\n",
 "* **train_pipeline_parameters:** The set of configuration parameters defined in the previous section. \n",
+"* **run_invocation_timeout:** Maximum amount of time in seconds that the ``ParallelRunStep`` class is allowed. This is optional but provides customers with greater control on exit criteria. This must be greater than ``experiment_timeout_hours`` by at least 300 seconds.\n",
 "\n",
 "Calling this method will create a new aggregated dataset which is generated dynamically on pipeline execution."
 ]
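The new ``run_invocation_timeout`` documentation states a concrete constraint: the value is in seconds and must exceed ``experiment_timeout_hours`` (an hours value) by at least 300 seconds. A small sketch of that rule as a pre-flight check (the helper name is hypothetical, not part of the SDK):

```python
def validate_run_invocation_timeout(run_invocation_timeout: int,
                                    experiment_timeout_hours: float) -> None:
    """Hypothetical helper enforcing the documented timeout relationship.

    run_invocation_timeout is in seconds; experiment_timeout_hours is in hours.
    Per the docs above, the per-invocation timeout must exceed the experiment
    timeout by at least 300 seconds.
    """
    experiment_timeout_seconds = experiment_timeout_hours * 3600
    if run_invocation_timeout < experiment_timeout_seconds + 300:
        raise ValueError(
            f"run_invocation_timeout ({run_invocation_timeout}s) must exceed "
            f"experiment_timeout_hours ({experiment_timeout_seconds:.0f}s) "
            f"by at least 300s"
        )

# 1-hour experiment: 3600s + 300s margin, so 3900s is the minimum valid value.
validate_run_invocation_timeout(3900, 1)
```

Running such a check before submitting the pipeline surfaces a misconfiguration immediately instead of after a long-running step fails.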
@@ -621,9 +638,9 @@
 "automated-machine-learning"
 ],
 "kernelspec": {
-"display_name": "Python 3.7.13 ('dev')",
+"display_name": "Python 3.8 - AzureML",
 "language": "python",
-"name": "python3"
+"name": "python38-azureml"
 },
 "language_info": {
 "codemirror_mode": {
@@ -635,12 +652,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.13"
-},
-"vscode": {
-"interpreter": {
-"hash": "6db9c8d9f0cce2d9127e384e15560d42c3b661994c9f717d0553d1d8985ab1ea"
-}
+"version": "3.6.8"
 }
 },
 "nbformat": 4,

0 commit comments

Comments
 (0)