Adaptive Prediction Models

A reproduction of the paper "Adaptive Prediction Models for Data Center Resources Utilization Estimation".

Requirements

python >= 3.9

Install the other required packages listed in requirements.txt:

$ pip install -r requirements.txt

Dataset

Alibaba

Download cluster-trace-v2017 from alibaba/clusterdata.

Put server_usage.csv in data/trace_201708/.

$ mkdir -p data/trace_201708
$ mv <your_downloaded_folder>/server_usage.csv data/trace_201708

Kaggle Demand

Download train_0irEZ2H.csv from kaggle/demand-forecasting.
Rename train_0irEZ2H.csv to train.csv.

Put train.csv in data/kaggle_demand.

$ mkdir -p data/kaggle_demand
$ mv <your_downloaded_folder>/train.csv data/kaggle_demand
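Before preprocessing, it can help to confirm that both raw files sit where the steps above put them. A small standard-library sketch; the helper name missing_raw_files is hypothetical, and the two paths are the ones used in the commands above.

```python
from pathlib import Path

# Raw input files, placed by the download steps above.
EXPECTED_RAW_FILES = [
    Path("data/trace_201708/server_usage.csv"),
    Path("data/kaggle_demand/train.csv"),
]


def missing_raw_files(root="."):
    """Return the expected raw data files that are not present under `root`."""
    base = Path(root)
    return [str(p) for p in EXPECTED_RAW_FILES if not (base / p).is_file()]


if __name__ == "__main__":
    missing = missing_raw_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All raw data files are in place.")
```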

Data Preprocess

Extract features from the time series with TSFRESH, then filter them with features-selector.

Execute this command:

$ python data_helper.py

The data/ folder will then have the following structure:

data/
|-- kaggle_demand
|   |-- df_rolled_12.csv
|   |-- extracted_features_12_106.csv
|   |-- labels_12_106.csv
|   |-- train.csv
|   `-- uniform_data.csv
`-- trace_201708
    |-- df_rolled_12.csv
    |-- extracted_features_12_106.csv
    |-- labels_12_106.csv
    `-- server_usage.csv
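Rolling turns one long series into many fixed-length windows, each paired with a value to predict. The numpy sketch below only illustrates that idea: it is not the code in data_helper.py (which relies on TSFRESH), the function name make_windows is hypothetical, window=12 merely echoes the _12 file suffix, and using the next value as the label is an assumption. With stride=1 consecutive windows overlap; with stride equal to the window length they do not, which is the "no overlap between windows" setting mentioned in the Report.

```python
import numpy as np


def make_windows(series, window=12, stride=1):
    """Split a 1-D series into (window, next value) pairs.

    stride=1 yields overlapping windows (one per time step);
    stride=window yields non-overlapping windows.
    """
    x = np.asarray(series, dtype=float)
    starts = range(0, len(x) - window, stride)  # leave room for the label
    X = np.array([x[s:s + window] for s in starts])
    y = np.array([x[s + window] for s in starts])
    return X, y


series = np.arange(30)
X, y = make_windows(series, window=12)              # overlapping windows
X2, y2 = make_windows(series, window=12, stride=12)  # non-overlapping windows
print(X.shape, X2.shape)
```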

Alibaba

For the Alibaba dataset, with a window size of 60, df_rolled_12.csv, extracted_features_12_106.csv and labels_12_106.csv are generated in data/trace_201708/.

This step is slow, so you can instead download these files from here and put them in the data/ folder.

Kaggle Demand

Likewise, the files for kaggle_demand can be downloaded from here; put them in the data/ folder.

Note that we transform the values of units_sold with np.log1p(). To recover the real values of units_sold, apply np.expm1(), the inverse of np.log1p().
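np.log1p computes log(1 + x) and np.expm1 computes exp(x) - 1, so each undoes the other; both are also more accurate than np.log(1 + x) and np.exp(x) - 1 for values near zero. A minimal round-trip check on made-up values, including mapping hypothetical log-scale predictions back to unit counts:

```python
import numpy as np

units_sold = np.array([0, 1, 9, 120])   # hypothetical raw sales counts
transformed = np.log1p(units_sold)      # log(1 + x), as applied to units_sold
restored = np.expm1(transformed)        # exp(x) - 1, the inverse of log1p

# A model trained on the transformed target predicts on the log scale;
# map its outputs back the same way before reading them as unit counts.
predicted_log = np.array([0.1, 0.7, 2.3, 4.8])   # hypothetical model outputs
predicted_units = np.expm1(predicted_log)
print(restored, predicted_units)
```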

Run

To train the models from scratch, call ams.run() in main.py.

Otherwise, ams.run(test=True) loads the dumped models from their pickle files.

After preprocessing the data, adapt the code in main.py to the specific task and run the following command to get the results:

$ python main.py

The dumped models can be downloaded from alibaba-dumped and kaggle-demand-dumped. Put them in the out/ folder.

The out/ folder should then look like this:

out/
|-- alibaba
|   |-- ams.pickle
|   |-- gb_model.pickle
|   |-- gpr_model.pickle
|   |-- lr_model.pickle
|   |-- svm_model.pickle
|   |-- test_method.pickle
|   `-- train_method.pickle
`-- kaggle_demand
    |-- ams.pickle
    |-- gb_model.pickle
    |-- gpr_model.pickle
    |-- lr_model.pickle
    |-- svm_model.pickle
    |-- test_method.pickle
    `-- train_method.pickle
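Judging from the dumped files, AMS appears to combine four base regressors (LR, SVM, GB, GPR) with a classifier that picks one of them per sample. A minimal numpy sketch of that idea, under the assumption that the classifier's training label is the index of the base model with the lowest absolute error on each sample; the repository's actual labelling rule is not documented here, the function names are hypothetical, and the classifier itself (RDF in the Report) is left out. At test time the classifier's predicted indices would take the place of the oracle labels passed to select_predictions.

```python
import numpy as np


def best_model_labels(preds, y_true):
    """Label each sample with the index of the base model whose prediction
    has the smallest absolute error.

    preds: shape (n_models, n_samples); y_true: shape (n_samples,).
    """
    errors = np.abs(np.asarray(preds, dtype=float) - np.asarray(y_true, dtype=float))
    return errors.argmin(axis=0)


def select_predictions(preds, chosen):
    """For every sample, take the prediction of the model chosen for it."""
    preds = np.asarray(preds, dtype=float)
    chosen = np.asarray(chosen)
    return preds[chosen, np.arange(preds.shape[1])]


# Hypothetical predictions from four base models (rows) on three samples.
preds = np.array([
    [1.0, 2.0, 3.0],
    [1.5, 2.2, 2.0],
    [0.8, 1.9, 3.4],
    [1.2, 2.6, 2.7],
])
y_true = np.array([1.05, 2.15, 2.95])

labels = best_model_labels(preds, y_true)     # training target for the classifier
combined = select_predictions(preds, labels)  # oracle (best possible) selection
print(labels, combined)
```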

Report

Alibaba

AMS Evaluation Results using RDF

Classifier	TPR	FPR	TNR	FNR	Precision	Recall	F1-score	Accuracy
RDF	0.6585	0.1138	0.8862	0.3415	0.6585	0.6585	0.6585	0.6585
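In this table TPR, precision, recall, F1-score and accuracy coincide, TPR + FNR = 1 and FPR + TNR = 1. That pattern is consistent with micro-averaged one-vs-rest metrics for a single-label, multi-class classifier: summed over classes, false positives and false negatives both equal the number of misclassified samples, so precision = recall = F1 = accuracy, and FPR = (1 - accuracy) / (K - 1) for K classes. With K = 4 (for instance one class per base model) this gives 0.3415 / 3 ≈ 0.1138, matching the reported FPR. The sketch below is not the repository's evaluation code; the function name and the confusion matrix are hypothetical.

```python
import numpy as np


def micro_averaged_rates(cm):
    """Micro-averaged one-vs-rest rates from a K x K confusion matrix.

    cm[i, j] counts samples whose true class is i and predicted class is j.
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    tp = np.diag(cm)             # per-class true positives
    fp = cm.sum(axis=0) - tp     # predicted as class k but not k
    fn = cm.sum(axis=1) - tp     # truly class k but predicted otherwise
    tn = n - tp - fp - fn        # everything else, per class
    TP, FP, FN, TN = tp.sum(), fp.sum(), fn.sum(), tn.sum()
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    return {
        "TPR": recall,
        "FPR": FP / (FP + TN),
        "TNR": TN / (TN + FP),
        "FNR": FN / (FN + TP),
        "Precision": precision,
        "Recall": recall,
        "F1-score": 2 * precision * recall / (precision + recall),
        "Accuracy": TP / n,
    }


# Hypothetical 4-class confusion matrix (160 samples, 110 correct).
cm = np.array([
    [30, 5, 3, 2],
    [4, 25, 6, 5],
    [2, 3, 28, 7],
    [6, 4, 3, 27],
])
rates = micro_averaged_rates(cm)
for name, value in rates.items():
    print(f"{name}\t{value:.4f}")
```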

RMSE and MAE for Different Methods

Method	RMSE	MAE
Linear Regression	4.2173	3.2020
SVM	6.0078	4.9288
GB (Gradient Boosting)	3.8648	2.9206
GPR (Gaussian Process Regressor)	4.2752	3.2570
Proposed	2.9524	1.8990
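For reference, the two error metrics under their standard definitions, evaluated on made-up values. On the same errors RMSE is never smaller than MAE (equivalently, MSE is never smaller than MAE squared), which is a quick sanity check when reading these tables.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))


y_true = [3.0, 5.0, 2.0, 7.0]   # hypothetical targets
y_pred = [2.5, 5.0, 4.0, 6.0]   # hypothetical predictions
print(rmse(y_true, y_pred), mae(y_true, y_pred))
```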

With TSFEL features and non-overlapping windows:

Method	RMSE	MAE	Time
Linear Regression	2.9321	2.2226	0.04s
SVM	2.5042	1.6472	4.47s
Gradient Boosting	2.3409	1.7571	1.81s
Gaussian Process Regressor	2.9559	2.2814	0.99s
LightGBM	1.9441	1.3883	0.22s
AMS	1.4993	0.9339	0.96s

Running time of each process

Process	Time
Roll time series	9.00s
Extract features using TSFRESH	446s (7m 26s)
Filter features using FeatureSelector	About 3 to 4 h
Train Linear Regression Model	0.40s
Train SVM Model	287.44s
Train GB (Gradient Boosting) Model	35.76s
Train GPR (Gaussian Process Regressor) Model	2.35s
Train AMS using RDF	89.75s
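The README does not say how these times were measured. One hypothetical way to collect per-step wall-clock times in the same Process/Time format, using time.perf_counter from the standard library; the context manager name timed and the placeholder workload are illustrative only.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("Roll time series", timings):
    sum(range(1_000_000))  # placeholder for the real preprocessing step

for process, seconds in timings.items():
    print(f"{process}\t{seconds:.2f}s")
```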

Kaggle Demand

AMS Evaluation Results using RDF

Classifier	TPR	FPR	TNR	FNR	Precision	Recall	F1-score	Accuracy
RDF	0.3891	0.2036	0.7964	0.6109	0.3891	0.3891	0.3891	0.3891

RMSE and MAE for Different Methods

Method	RMSE	MAE
Linear Regression	0.5503	0.3900
SVM	0.5598	0.4117
GB (Gradient Boosting)	0.5440	0.3833
GPR (Gaussian Process Regressor)	0.5560	0.3948
Proposed	0.4431	0.2819
Proposed with TSFEL	0.4177	0.27
Proposed with TSFEL + LightGBM base	0.4	0.26
Proposed with TSFEL + LightGBM base (tuned)	0.3841	0.2472

Running time of each process

Process	Time
Roll time series	15.55s
Extract features using TSFRESH	1007.06s
Filter features using FeatureSelector	730.74s
Train Linear Regression Model	0.14s
Train SVM Model	236.93s
Train GB (Gradient Boosting) Model	18.45s
Train GPR (Gaussian Process Regressor) Model	1.77s
Train AMS using RDF	74.47s

Acknowledgments

WillKoehrsen/features-selector

Repository: LeonCai1/adaptive_prediction_models
