Oscar Rojas, Veronica Gil-Costa and Mauricio Marín
A series of machine learning experiments and data mining tests for evaluating query execution time predictors in web search engines.
The experiments compare techniques for predicting the processing time of document ranking algorithms. Accurate predictions make it possible to allocate resources efficiently in web search engines.
- Open ./test.m and run it in MATLAB.
To get a local copy up and running, follow these simple steps.
Prerequisite: MATLAB 2014 or later
- Clone the repo
git clone https://github.com/neurovisionhub/dft-running-time-prediction.git
- Change into the repository directory
cd ./dft-running-time-prediction
In the root folder, edit "test.m" to select one of the predefined experiments ('simple', 'parallel', 'all', 'cross-fold', 'visual-map', 'block'). The script 'configdefault.m' loads the experimental data.
clear
addpath(genpath('./'))
configdefault;
cd train-test
% Configuration examples
% Simple test: Baseline vs. DFT-Based
% Data selection: data-web
[data_simple,labelsD,labelsA]= main('simple');
data = [labelsD;labelsA;data_simple];
data = [["","", "Baseline","DFT-Based"]',data];
T = table(data)
% runtime-parallel prediction
%[data_parallel,~]= main('parallel');
.
.
main.m is a routine that takes a label describing the experiment as input and returns the results as an array or as tables.
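Because the experiment is chosen by a text label, a typo is easy to make. The following is a minimal sketch, not part of the repository, that checks a label against the list given in this README before calling main. The variable names `validLabels`, `label` and `isValid` are illustrative assumptions, and the check is deliberately exact and case-sensitive; whether main itself is case-sensitive is not stated here.

```matlab
% Experiment labels as listed in this README.
validLabels = {'simple', 'parallel', 'all', 'cross-fold', 'visual-map', 'block'};

label = 'cross-fold';                        % experiment to run

% Exact, case-sensitive comparison against the known labels.
isValid = any(strcmp(label, validLabels));
if ~isValid
    error('Unknown experiment label: %s', label);
end

% With the repository on the MATLAB path, the call would then be:
% result = main(label);
```

The call to main is left commented out so the snippet runs without the repository on the path.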
Matrix Data
data_simple =
0.9636 0.0219 0.8906 0.0356 0.9015 0.0490 0.9309 0.0330
0.9688 0.0203 0.9279 0.0290 0.9235 0.0432 0.9324 0.0325
Table
data
___________________________________________________________________________
""             "W.CW.BM25"    "W.CW.BM25"    "BMW.CW.BM25"    "BMW.CW.BM25"
""             "PCr"          "RMSE"         "PCr"            "RMSE"
"Baseline"     "0.96363"      "0.021945"     "0.90145"        "0.049025"
"DFT-Based"    "0.96875"      "0.020327"     "0.92353"        "0.043161"
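The table reports two metrics per configuration, PCr and RMSE. As a rough illustration of how such metrics can be computed, the sketch below assumes that PCr denotes the Pearson correlation between predicted and measured query times; that reading, the variable names, and the sample values are assumptions made for this example and are not taken from the repository or its data.

```matlab
% Hypothetical sample values, not taken from data-web.
actual    = [0.12; 0.35; 0.50; 0.81; 0.93];   % measured query times
predicted = [0.10; 0.40; 0.45; 0.85; 0.90];   % predictor outputs

% Pearson correlation: off-diagonal entry of the 2x2 correlation matrix.
R   = corrcoef(actual, predicted);
pcr = R(1, 2);

% Root mean squared error between predictions and measurements.
rmse = sqrt(mean((predicted - actual).^2));

fprintf('PCr = %.4f, RMSE = %.4f\n', pcr, rmse);
```

A higher correlation and a lower RMSE both indicate a better predictor, which is the direction in which the DFT-Based row differs from the Baseline row in the table.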
The training and test runs are configured in ./train-test/main.m. General options include:
% ls = {0:local,1:shared}
% t = {0,1,2,3} = {1 thread, 2 threads, 4 threads, 8 threads}
% k = {10,100,1000,10000}
% P = {0:baseline, 1:DFT-Based}
% p_test ={ %test }
% numbers of hidden-layer neurons to sweep over in multiple tests
vector_NHNeurons = [1,5,10,25,50];
% number of hidden neurons
NHNeurons = 5;
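The option codes are compact, so it can help to decode them into a readable description before a run. The sketch below is an illustration only: it assumes the options are held in variables named `ls`, `t`, `k` and `P` as in the comments above, and the lookup tables `lsNames` and `pNames` are hypothetical helpers that do not come from main.m. The thread count follows from the listed mapping, where t = 0..3 corresponds to 1, 2, 4 and 8 threads.

```matlab
% Example option values, using the codes listed in the comments above.
ls = 1;      % 0: local, 1: shared
t  = 3;      % 0..3 -> 1, 2, 4, 8 threads
k  = 1000;   % one of 10, 100, 1000, 10000
P  = 1;      % 0: baseline, 1: DFT-Based

% Hypothetical lookup tables for readable names (MATLAB indices start at 1).
lsNames = {'local', 'shared'};
pNames  = {'baseline', 'DFT-Based'};

nThreads = 2^t;   % thread count implied by the code t

desc = sprintf('%s, %d threads, k = %d, %s', ...
               lsNames{ls + 1}, nThreads, k, pNames{P + 1});
disp(desc);   % prints: shared, 8 threads, k = 1000, DFT-Based

% Sweep over the hidden-layer sizes used for multiple tests.
vector_NHNeurons = [1, 5, 10, 25, 50];
for n = vector_NHNeurons
    fprintf('hidden neurons: %d\n', n);
end
```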
More information is available in train-test.
Directory data-web contains:
- Trec09 queries processed for ClueWeb and Gov2
- Term descriptors computed with the baseline technique (for ClueWeb/Gov2 with BM25/TFIDF) and the 42 query descriptors
- Query descriptors computed with the baseline technique for k = 10, 100, 1000, 10000 for ClueWeb with BM25
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
Distributed under the MIT License. See LICENSE for more information.
O. Rojas - orojas.cl
Project Link: https://github.com/neurovisionhub/dft-running-time-prediction