A reproduction of the paper *Adaptive Prediction Models for Data Center Resources Utilization Estimation*.
- Python >= 3.9
- Install the other packages listed in `requirements.txt`.
- Download `cluster-trace-v2017` from alibaba/clusterdata.
- Put `server_usage.csv` in `data/trace_201708/`:

$ mkdir -p data/trace_201708
$ mv <your_downloaded_folder>/server_usage.csv data/trace_201708
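For a first look at the Alibaba data, here is a minimal loader for one machine's CPU series that uses only the standard library. It assumes `server_usage.csv` has no header row and uses the column order in `SERVER_USAGE_COLUMNS`. That order reflects our reading of the trace's published schema, so verify it against the clusterdata documentation. `load_cpu_series` is a hypothetical helper and is not part of this repository.

```python
import csv
from pathlib import Path

# ASSUMPTION: column order of the headerless server_usage.csv in
# cluster-trace-v2017; verify against the trace's schema before use.
SERVER_USAGE_COLUMNS = [
    "timestamp", "machine_id", "cpu_util", "mem_util",
    "disk_util", "load_1", "load_5", "load_15",
]


def load_cpu_series(path, machine_id):
    """Return (timestamp, cpu_util) pairs for one machine, sorted by time.

    Hypothetical helper: rows with an empty CPU value are skipped.
    """
    series = []
    with Path(path).open(newline="") as f:
        for row in csv.reader(f):
            record = dict(zip(SERVER_USAGE_COLUMNS, (v.strip() for v in row)))
            if record.get("machine_id") != str(machine_id):
                continue  # another machine (or a blank line)
            if not record.get("cpu_util"):
                continue  # missing measurement
            timestamp = int(float(record["timestamp"]))
            series.append((timestamp, float(record["cpu_util"])))
    series.sort()
    return series
```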
- Download `train_0irEZ2H.csv` from kaggle/demand-forecasting.
- Rename `train_0irEZ2H.csv` to `train.csv`.
- Put `train.csv` in `data/kaggle_demand/`:

$ mkdir -p data/kaggle_demand
$ mv <your_downloaded_folder>/train.csv data/kaggle_demand
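You can also perform the shell steps from Python, which helps on systems without `mkdir -p` and `mv`. This is a sketch using `pathlib` and `shutil`. `place_dataset` is a hypothetical helper name, and the download folder stays a placeholder, as in the shell commands above.

```python
import shutil
from pathlib import Path


def place_dataset(downloaded_file, target_dir, new_name=None):
    """Move a downloaded file into target_dir, optionally renaming it.

    Equivalent to `mkdir -p target_dir` followed by
    `mv downloaded_file target_dir/new_name`.
    """
    src = Path(downloaded_file)
    dest_dir = Path(target_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    dest = dest_dir / (new_name or src.name)
    shutil.move(str(src), str(dest))
    return dest


# Usage (replace <your_downloaded_folder> with the real path):
# place_dataset("<your_downloaded_folder>/server_usage.csv", "data/trace_201708")
# place_dataset("<your_downloaded_folder>/train_0irEZ2H.csv",
#               "data/kaggle_demand", "train.csv")
```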
Extract and filter features from the time series using TSFRESH and FeatureSelector by running:

$ python data_helper.py
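`data_helper.py` does the rolling and feature extraction with TSFRESH, which provides `roll_time_series` and `extract_features`. This README does not document the exact window and label definitions used there. The following minimal numpy sketch shows the rolling idea and assumes each window is labelled with the value that immediately follows it. `roll_windows` is a hypothetical name.

```python
import numpy as np


def roll_windows(series, window):
    """Split a 1-D series into overlapping windows of length `window`.

    Window i covers series[i : i + window] and (by assumption) is labelled
    with the next value, series[i + window], so a series of length n
    yields n - window samples.
    """
    values = np.asarray(series, dtype=float)
    n_samples = len(values) - window
    if n_samples <= 0:
        raise ValueError("series must be longer than the window")
    X = np.stack([values[i:i + window] for i in range(n_samples)])
    y = values[window:]  # label of window i is values[i + window]
    return X, y
```

Features such as those TSFRESH computes would then be extracted per row of `X`, with `y` as the regression target.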
The `data/` folder will then look like this:
data/
|-- kaggle_demand
| |-- df_rolled_12.csv
| |-- extracted_features_12_106.csv
| |-- labels_12_106.csv
| |-- train.csv
| `-- uniform_data.csv
`-- trace_201708
|-- df_rolled_12.csv
|-- extracted_features_12_106.csv
|-- labels_12_106.csv
`-- server_usage.csv
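The following sketch runs a quick check that the files in the tree above are in place before you run `main.py`. The helper `missing_files` is hypothetical, and the file names are copied from the tree, so adjust them if your preprocessing settings produce different names.

```python
from pathlib import Path

# File names copied from the data/ tree above.
EXPECTED_DATA_FILES = {
    "kaggle_demand": ["df_rolled_12.csv", "extracted_features_12_106.csv",
                      "labels_12_106.csv", "train.csv", "uniform_data.csv"],
    "trace_201708": ["df_rolled_12.csv", "extracted_features_12_106.csv",
                     "labels_12_106.csv", "server_usage.csv"],
}


def missing_files(root="data", expected=EXPECTED_DATA_FILES):
    """Return the expected file paths under `root` that do not exist yet."""
    base = Path(root)
    return [base / sub / name
            for sub, names in expected.items()
            for name in names
            if not (base / sub / name).is_file()]


if __name__ == "__main__":
    for path in missing_files():
        print("missing:", path)
```

The same helper also works for the `out/` tree if you pass a different `expected` mapping.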
For the Alibaba dataset, if the window size is 60, `df_rolled_12.csv`, `extracted_features_12_106.csv` and `labels_12_106.csv` are generated in `data/trace_201708/`.

This step is slow, so you can instead download these files from here and put them in the `data/` folder. The corresponding files for `kaggle_demand` can also be downloaded from here and put in the `data/` folder.
Note that we transform the values of `units_sold` with `np.log1p()`. To recover the actual values of `units_sold`, apply the inverse transform `np.expm1()`.
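`np.log1p(x)` computes log(1 + x), which stays defined when a count is 0. `np.expm1(y)` computes exp(y) - 1, so it exactly undoes the forward transform. Here is a short round-trip check on made-up counts:

```python
import numpy as np

units_sold = np.array([0, 9, 99, 1234])  # illustrative counts, not real data

# Forward transform applied to units_sold: log(1 + x).
log_units = np.log1p(units_sold)

# Inverse transform: exp(y) - 1 recovers the original counts.
restored = np.expm1(log_units)

assert np.allclose(restored, units_sold)
```

Predictions made on the log scale need the same `np.expm1()` step before you can compare them with real unit counts.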
To train the models from scratch, call `ams.run()` in `main.py`. Otherwise, `ams.run(test=True)` loads the dumped models from files.
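This README does not show the internals of `ams.run()`. The train-or-load switch it describes usually follows the pattern below. This is a hypothetical stand-in that uses `pickle`, matching the `.pickle` files in `out/`, and it is not the repository's actual code.

```python
import pickle
from pathlib import Path


def run(train_fn, model_path, test=False):
    """Train and dump a model, or load a previously dumped one.

    Hypothetical sketch of a train/test switch like ams.run(test=...):
    test=False trains via train_fn() and pickles the result;
    test=True skips training and unpickles model_path.
    """
    model_path = Path(model_path)
    if test:
        with model_path.open("rb") as f:
            return pickle.load(f)
    model = train_fn()
    model_path.parent.mkdir(parents=True, exist_ok=True)
    with model_path.open("wb") as f:
        pickle.dump(model, f)
    return model
```

Unpickling can execute arbitrary code, so only load dumped models from sources you trust.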
After preprocessing the data, adapt the code in `main.py` to your task and run:

$ python main.py

The dumped models can be downloaded from alibaba-dumped and kaggle-demand-dumped. Put them into the `out/` folder, which will then look like this:
out/
|-- alibaba
| |-- ams.pickle
| |-- gb_model.pickle
| |-- gpr_model.pickle
| |-- lr_model.pickle
| |-- svm_model.pickle
| |-- test_method.pickle
| `-- train_method.pickle
`-- kaggle_demand
|-- ams.pickle
|-- gb_model.pickle
|-- gpr_model.pickle
|-- lr_model.pickle
|-- svm_model.pickle
|-- test_method.pickle
`-- train_method.pickle
- AMS Evaluation Results using RDF
Classifier | TPR | FPR | TNR | FNR | Precision | Recall | F1-score | Accuracy |
---|---|---|---|---|---|---|---|---|
RDF | 0.6585 | 0.1138 | 0.8862 | 0.3415 | 0.6585 | 0.6585 | 0.6585 | 0.6585 |
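In this table, TPR, Precision, Recall, F1-score and Accuracy are identical, and FPR equals FNR / 3. Micro-averaged one-vs-rest counts produce exactly that pattern for a classifier that chooses among four classes, such as the four base regressors listed below. The README does not say how the metrics were computed, though, so the following is a sketch under that assumption. `micro_rates` is a hypothetical helper.

```python
def micro_rates(y_true, y_pred, classes):
    """Micro-averaged one-vs-rest rates for a multi-class classifier.

    Each class is treated as the positive class in turn, and the
    TP/FP/TN/FN counts are summed over classes before taking ratios.
    """
    tp = fp = tn = fn = 0
    for c in classes:
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1
            elif p == c:
                fp += 1
            elif t == c:
                fn += 1
            else:
                tn += 1
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "TPR": recall, "FPR": fp / (fp + tn), "TNR": tn / (tn + fp),
        "FNR": fn / (fn + tp), "Precision": precision, "Recall": recall,
        "F1-score": f1, "Accuracy": accuracy,
    }
```

Each wrong prediction adds one FP and one FN. With K classes, the counts therefore give FPR = (1 - Accuracy) / (K - 1). For K = 4, this matches both RDF tables, for example 0.3415 / 3 ≈ 0.1138.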
- RMSE and MAE for Different Methods
Method | RMSE | MAE |
---|---|---|
Linear Regression | 4.2173 | 3.2020 |
SVM | 6.0078 | 4.9288 |
GB (Gradient Boosting) | 3.8648 | 2.9206 |
GPR (Gaussian Process Regressor) | 4.2752 | 3.2570 |
Proposed | 2.9524 | 1.8990 |
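RMSE squares each error before averaging, so it penalizes large errors more heavily than MAE. It is also always at least as large as MAE, which holds for every row above. Here are the two metrics in plain Python. scikit-learn's `mean_squared_error` (followed by a square root) and `mean_absolute_error` compute the same quantities, though the repository's own evaluation code may differ in details.

```python
import math


def _check_lengths(y_true, y_pred):
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat) ** 2))."""
    _check_lengths(y_true, y_pred)
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error: mean(|y - y_hat|)."""
    _check_lengths(y_true, y_pred)
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)
```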
Results with TSFEL features and no overlap between windows:
Method | MSE | MAE | Time |
---|---|---|---|
Linear Regression | 2.9321 | 2.2226 | 0.04s |
SVM | 2.5042 | 1.6472 | 4.47s |
Gradient Boosting | 2.3409 | 1.7571 | 1.81s |
Gaussian Process Regressor | 2.9559 | 2.2814 | 0.99s |
LightGBM | 1.9441 | 1.3883 | 0.22s |
AMS | 1.4993 | 0.9339 | 0.96s |
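With a stride equal to the window length, consecutive windows share no time steps. A series of length n then yields about n / window samples instead of about n. One common motivation is that heavily overlapping windows put near-duplicate samples on both sides of a train/test split. This is a numpy sketch that assumes each window is labelled with the value that follows it; the README does not spell out the labelling, and `non_overlapping_windows` is a hypothetical name.

```python
import numpy as np


def non_overlapping_windows(series, window):
    """Cut a 1-D series into disjoint windows, each labelled with the next value.

    Window i covers series[i*window : (i+1)*window] and its label is
    series[(i+1)*window]; trailing values that cannot form a full
    window plus label are dropped.
    """
    values = np.asarray(series, dtype=float)
    n_windows = (len(values) - 1) // window
    if n_windows <= 0:
        raise ValueError("series must be longer than the window")
    X = values[: n_windows * window].reshape(n_windows, window)
    y = values[window : n_windows * window + 1 : window]
    return X, y
```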
- Time of some processes
Process | Time |
---|---|
Roll time series | 9.00s |
Extract features using TSFRESH | 446s (7m 26s) |
Filter features using FeatureSelector | About 3 to 4 h |
Train Linear Regression Model | 0.40s |
Train SVM Model | 287.44s |
Train GB (Gradient Boosting) Model | 35.76s |
Train GPR (Gaussian Process Regressor) Model | 2.35s |
Train AMS using RDF | 89.75s |
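Absolute times like these depend on hardware and library versions, so they are most useful relative to each other. The following is a hypothetical helper for collecting similar per-step wall-clock timings with `time.perf_counter`. The timed body is a placeholder for a real step such as rolling or model training.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent inside the `with` block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("Roll time series", timings):
    sum(range(100_000))  # placeholder for the real processing step
print(f"Roll time series | {timings['Roll time series']:.2f}s |")
```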
- AMS Evaluation Results using RDF
Classifier | TPR | FPR | TNR | FNR | Precision | Recall | F1-score | Accuracy |
---|---|---|---|---|---|---|---|---|
RDF | 0.3891 | 0.2036 | 0.7964 | 0.6109 | 0.3891 | 0.3891 | 0.3891 | 0.3891 |
- RMSE and MAE for Different Methods
Method | RMSE | MAE |
---|---|---|
Linear Regression | 0.5503 | 0.3900 |
SVM | 0.5598 | 0.4117 |
GB (Gradient Boosting) | 0.5440 | 0.3833 |
GPR (Gaussian Process Regressor) | 0.5560 | 0.3948 |
Proposed | 0.4431 | 0.2819 |
Proposed with TSFEL | 0.4177 | 0.27 |
Proposed with TSFEL + LightGBM base | 0.4 | 0.26 |
Proposed with TSFEL + LightGBM (tuned) | 0.3841 | 0.2472 |
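The "Proposed" rows refer to AMS, which the tables above evaluate as a classifier (RDF). One plausible reading, assumed here and not confirmed by this README, works in three steps. First, each sample is labelled with the base model that predicts it best. Next, a classifier learns to predict that label from the extracted features. Finally, the forecast comes from the chosen model. The numpy sketch below covers only the labelling and selection steps and omits the classifier itself. Both function names are hypothetical.

```python
import numpy as np


def best_model_labels(y_true, base_preds):
    """Index of the base model with the smallest absolute error per sample.

    base_preds has shape (n_models, n_samples); ties go to the lower index.
    """
    preds = np.asarray(base_preds, dtype=float)
    errors = np.abs(preds - np.asarray(y_true, dtype=float))
    return errors.argmin(axis=0)


def select_predictions(base_preds, chosen):
    """For each sample, take the prediction of the model index in `chosen`."""
    preds = np.asarray(base_preds, dtype=float)
    return preds[np.asarray(chosen), np.arange(preds.shape[1])]
```

At inference time, `chosen` would come from the trained classifier rather than from `best_model_labels`, since the true values are not yet known.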
- Time of some processes
Process | Time |
---|---|
Roll time series | 15.55s |
Extract features using TSFRESH | 1007.06s |
Filter features using FeatureSelector | 730.74s |
Train Linear Regression Model | 0.14s |
Train SVM Model | 236.93s |
Train GB (Gradient Boosting) Model | 18.45s |
Train GPR (Gaussian Process Regressor) Model | 1.77s |
Train AMS using RDF | 74.47s |