The reproduction of paper "Adaptive Prediction Models for Data Center Resources Utilization Estimation"

LeonCai1/adaptive_prediction_models

Adaptive Prediction Models

The reproduction of the paper Adaptive Prediction Models for Data Center Resources Utilization Estimation

Requirements

  • python >= 3.9

Install the remaining packages listed in requirements.txt.

Dataset

Alibaba

  1. Download cluster-trace-v2017 from alibaba/clusterdata.

  2. Put server_usage.csv in data/trace_201708/.

    $ mkdir -p data/trace_201708
    $ mv <your_downloaded_folder>/server_usage.csv data/trace_201708
    

Kaggle Demand

  1. Download train_0irEZ2H.csv from kaggle/demand-forecasting.

  2. Rename train_0irEZ2H.csv to train.csv.

  3. Put train.csv in data/kaggle_demand.

    $ mkdir -p data/kaggle_demand
    $ mv <your_downloaded_folder>/train.csv data/kaggle_demand
    

Data Preprocess

Extract and filter features from time series using TSFRESH and features-selector

Execute this command:

$ python data_helper.py

After the command finishes, the data/ folder will have the following structure:

data/
|-- kaggle_demand
|   |-- df_rolled_12.csv
|   |-- extracted_features_12_106.csv
|   |-- labels_12_106.csv
|   |-- train.csv
|   `-- uniform_data.csv
`-- trace_201708
    |-- df_rolled_12.csv
    |-- extracted_features_12_106.csv
    |-- labels_12_106.csv
    `-- server_usage.csv
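The rolling step that produces the df_rolled files can be pictured with a minimal sketch. It assumes each window holds a fixed number of consecutive values and that the label is the value immediately after the window; `roll_series` and `window_size` are hypothetical names for illustration, and data_helper.py may build its windows and labels differently.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def roll_series(values, window_size):
    """Return (windows, labels) for a 1-D series.

    Each window holds `window_size` consecutive values; its label is the
    value that immediately follows the window (an assumption made for
    this sketch, not taken from data_helper.py).
    """
    values = np.asarray(values, dtype=float)
    if values.size <= window_size:
        raise ValueError("series must be longer than window_size")
    # Drop the last value so every window has a following value to predict.
    windows = sliding_window_view(values[:-1], window_size)
    labels = values[window_size:]
    return windows, labels


series = [10, 12, 11, 15, 14, 18]
windows, labels = roll_series(series, window_size=3)
print(windows)
print(labels)
```

Feature extraction would then run once per window, which is why the number of windows drives the preprocessing time.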

Alibaba

For the Alibaba dataset, if the window size is 60, the files df_rolled_12.csv, extracted_features_12_106.csv, and labels_12_106.csv are generated in data/trace_201708/.

This process is slow. Alternatively, you can download these files from here and put them in the data/ folder.

Kaggle Demand

The preprocessed files for kaggle_demand can likewise be downloaded from here and put in the data/ folder.

Note that we transform the value of units_sold using np.log1p(). To get the real value of units_sold back, use np.expm1() to invert the transform.
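As a quick check of the transform, the NumPy inverse of np.log1p() is np.expm1(). The sample values below are made up for illustration and are not taken from train.csv.

```python
import numpy as np

# Illustrative units_sold values (not from the dataset).
units_sold = np.array([0.0, 1.0, 19.0, 250.0])

transformed = np.log1p(units_sold)  # log(1 + x), well defined at x = 0
restored = np.expm1(transformed)    # exp(y) - 1, the inverse of log1p

print(np.allclose(restored, units_sold))
```

Errors reported on the transformed scale, such as the RMSE and MAE values for this dataset, are therefore in log units rather than in units sold.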

Run

To train the model from scratch, call ams.run() in main.py.

Otherwise, ams.run(test=True) will load the dumped model from files.

After preprocessing the data, edit main.py for the specific task and run this command to get the result:

$ python main.py

The dumped models can be downloaded from alibaba-dumped and kaggle-demand-dumped. Put them in the out/ folder.

The out/ folder will then have the following structure:

out/
|-- alibaba
|   |-- ams.pickle
|   |-- gb_model.pickle
|   |-- gpr_model.pickle
|   |-- lr_model.pickle
|   |-- svm_model.pickle
|   |-- test_method.pickle
|   `-- train_method.pickle
`-- kaggle_demand
    |-- ams.pickle
    |-- gb_model.pickle
    |-- gpr_model.pickle
    |-- lr_model.pickle
    |-- svm_model.pickle
    |-- test_method.pickle
    `-- train_method.pickle
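The train-or-load behaviour described above can be sketched with the standard pickle module. `load_or_train` and `train_fn` are hypothetical names used only for this illustration; the actual logic lives in ams.run() and may differ.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_train(path, train_fn, test=False):
    """Load a pickled object when test=True; otherwise train and dump it."""
    path = Path(path)
    if test:
        # Only unpickle files from a source you trust.
        with path.open("rb") as f:
            return pickle.load(f)
    model = train_fn()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)
    return model


# Demo in a temporary directory, using a plain dict as a stand-in model.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "out" / "alibaba" / "lr_model.pickle"
    trained = load_or_train(target, lambda: {"coef": [0.5, 1.5]})
    loaded = load_or_train(target, train_fn=None, test=True)
    print(loaded == trained)
```

Pickle files are generally tied to the library versions that produced them, so loading the downloaded models is most reliable with the package versions pinned in requirements.txt.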

Report

Alibaba

  1. AMS Evaluation Results using RDF

| Classifier | TPR | FPR | TNR | FNR | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|---|---|---|---|
| RDF | 0.6585 | 0.1138 | 0.8862 | 0.3415 | 0.6585 | 0.6585 | 0.6585 | 0.6585 |

  2. RMSE and MAE for Different Methods

| Method | RMSE | MAE |
|---|---|---|
| Linear Regression | 4.2173 | 3.2020 |
| SVM | 6.0078 | 4.9288 |
| GB (Gradient Boosting) | 3.8648 | 2.9206 |
| GPR (Gaussian Process Regressor) | 4.2752 | 3.2570 |
| Proposed | 2.9524 | 1.8990 |

After using tsfel and no overlap between windows:

| Method | MSE | MAE | Time |
|---|---|---|---|
| Linear Regression | 2.9321 | 2.2226 | 0.04s |
| SVM | 2.5042 | 1.6472 | 4.47s |
| Gradient Boosting | 2.3409 | 1.7571 | 1.81s |
| Gaussian Process Regressor | 2.9559 | 2.2814 | 0.99s |
| LightGBM | 1.9441 | 1.3883 | 0.22s |
| AMS | 1.4993 | 0.9339 | 0.96s |

  3. Time of some processes

| Process | Time |
|---|---|
| Roll time series | 9.00s |
| Extract features using TSFRESH | 446s (7m 26s) |
| Filter features using FeatureSelector | About 3h ~ 4h |
| Train Linear Regression Model | 0.40s |
| Train SVM Model | 287.44s |
| Train GB (Gradient Boosting) Model | 35.76s |
| Train GPR (Gaussian Process Regressor) Model | 2.35s |
| Train AMS using RDF | 89.75s |
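For reference, the two regression metrics reported for each method can be computed as in the sketch below. The helper names and the sample arrays are illustrative assumptions; the repository may compute the metrics through a library instead.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))


# Made-up values, not taken from either dataset.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred))
```

RMSE penalises large residuals more heavily than MAE, so for any set of predictions RMSE is at least as large as MAE.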

Kaggle Demand

  1. AMS Evaluation Results using RDF

| Classifier | TPR | FPR | TNR | FNR | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|---|---|---|---|
| RDF | 0.3891 | 0.2036 | 0.7964 | 0.6109 | 0.3891 | 0.3891 | 0.3891 | 0.3891 |

  2. RMSE and MAE for Different Methods

| Method | RMSE | MAE |
|---|---|---|
| Linear Regression | 0.5503 | 0.3900 |
| SVM | 0.5598 | 0.4117 |
| GB (Gradient Boosting) | 0.5440 | 0.3833 |
| GPR (Gaussian Process Regressor) | 0.5560 | 0.3948 |
| Proposed | 0.4431 | 0.2819 |
| Proposed with tsfel | 0.4177 | 0.27 |
| Proposed with tsfel + lgbm base | 0.4 | 0.26 |
| Proposed with tsfel + lgbm - tuned | 0.3841 | 0.2472 |

  3. Time of some processes

| Process | Time |
|---|---|
| Roll time series | 15.55s |
| Extract features using TSFRESH | 1007.06s |
| Filter features using FeatureSelector | 730.74s |
| Train Linear Regression Model | 0.14s |
| Train SVM Model | 236.93s |
| Train GB (Gradient Boosting) Model | 18.45s |
| Train GPR (Gaussian Process Regressor) Model | 1.77s |
| Train AMS using RDF | 74.47s |
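In both RDF evaluation results, TPR, Precision, Recall, F1-score and Accuracy are identical, and FPR equals (1 - Accuracy) / 3, for example (1 - 0.6585) / 3 ≈ 0.1138 and (1 - 0.3891) / 3 ≈ 0.2036. That pattern is what micro-averaged one-vs-rest rates over four classes produce, which would match a classifier choosing among four base models. This is an inference from the numbers, not something the repository states. The sketch below shows the computation on a made-up confusion matrix; `micro_rates` is a hypothetical helper.

```python
import numpy as np


def micro_rates(confusion):
    """Micro-averaged one-vs-rest rates from a square confusion matrix
    (rows: true class, columns: predicted class)."""
    cm = np.asarray(confusion, dtype=float)
    n_classes = cm.shape[0]
    total = cm.sum()
    tp = np.trace(cm)                 # correct predictions, summed over classes
    fp = total - tp                   # every error is a false positive for one class
    fn = total - tp                   # and a false negative for another class
    tn = (n_classes - 1) * total - fp # remaining one-vs-rest cells
    return {
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "TNR": tn / (fp + tn),
        "FNR": fn / (tp + fn),
        "Precision": tp / (tp + fp),
        "Accuracy": tp / total,
    }


# Made-up 4-class confusion matrix, not taken from the experiments.
cm = [[6, 1, 1, 0],
      [1, 5, 1, 1],
      [0, 2, 7, 1],
      [1, 0, 1, 4]]
rates = micro_rates(cm)
print(rates)
```

Under micro-averaging the summed false positives equal the summed false negatives, which is why precision, recall, and F1 all collapse to the accuracy.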

Acknowledgments
