% Paul Gasper, NREL, September 2020
% Test the use of the Sisso Regressor. Data and generated features have
% been copied from the 'Compressed Sensing' example published online:
% https://analytics-toolkit.nomad-coe.eu/hub/user-redirect/notebooks/tutorials/compressed_sensing.ipynb
% This Jupyter notebook demonstrates findings published in the paper:
% L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, M. Scheffler:
% Big Data of Materials Science: Critical Role of the Descriptor, Phys. Rev. Lett. 114, 105503 (2015)
% This script checks that the SissoRegressor class (translated to MATLAB from
% the Python code in the same Jupyter notebook) works as expected.
clear; clc; close all;
%% Test on 'small' feature set (115 features, 82 data points):
% Load files:
% featureList: cell array of character vectors
% x: array of input data, each column is a feature
% y: column vector of response data
load('sissoTestData.mat')
fprintf("Fitting 'small' data: %d data points, %d features.\n",length(y), size(x,2))
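% Sanity check (an addition, not in the original script): confirm that the
% loaded variables have the shapes described above. That featureList holds
% exactly one name per column of x is an assumption based on those comments.
assert(iscolumn(y) && size(x,1) == numel(y), 'x must have one row per entry of y.');
assert(numel(featureList) == size(x,2), 'featureList must have one name per column of x.');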
% Expected result for this data (copied from the Jupyter notebook):
% RMSE Model
% 1D: 0.296668 - 0.484 (r_p(A)+r_d(B)) + 1.944
% 2D: 0.218070 - 3.483 (r_p(A)+r_d(B)) + 0.392 (r_p(A)+r_d(B))^2 + 7.495
% 3D: 0.193928 - 3.528 (r_p(A)+r_d(B)) + 0.405 (r_p(A)+r_d(B))^2 + 0.293 |r_s(A)-r_d(B)| + 7.280
% Execution time on Jupyter: 406 ms
% Test
nNonzeroCoefs = 3; nFeaturesPerSisIter = 10;
fprintf("Searching for models up to %d dimensions, considering %d new features per iteration.\n",nNonzeroCoefs, nFeaturesPerSisIter)
sisso = SissoRegressor(nNonzeroCoefs, nFeaturesPerSisIter);
sisso = fitSisso(sisso, x, y);
printModels(sisso, featureList)
% Matlab solution:
% (formatted slightly differently, intercept first, because I prefer it)
% RMSE Model
% 1D: 0.296696 1.922 - 0.478 (r_p(A)+r_d(B))
% 2D: 0.218070 7.495 - 3.483 (r_p(A)+r_d(B)) + 0.392 (r_p(A)+r_d(B))^2
% 3D: 0.193928 7.280 - 3.528 (r_p(A)+r_d(B)) + 0.405 (r_p(A)+r_d(B))^2 + 0.293 |r_s(A)-r_d(B)|
% Execution time on Matlab: 116 ms (~3.5x faster)
% The 1D case does not reproduce the notebook's coefficient and intercept
% exactly, but the RMSE agrees to four decimal places, so perhaps the linear
% solution in that case is simply underdetermined.
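% Sketch (an addition, not part of the original test): one way to probe the 1D
% discrepancy noted above is to solve the ordinary least-squares problem for
% the selected feature directly and inspect the rank of the design matrix.
% Assumes featureList stores the feature under the exact name
% '(r_p(A)+r_d(B))' as printed in the model tables; adjust if it differs.
idx1D = find(strcmp(featureList, '(r_p(A)+r_d(B))'), 1);
if ~isempty(idx1D)
    A1D = [ones(size(x,1),1), x(:,idx1D)];    % intercept column plus the single feature
    beta1D = A1D \ y;                          % least-squares [intercept; slope]
    rmse1D = sqrt(mean((y - A1D*beta1D).^2));  % RMSE of the direct fit
    fprintf("Direct 1D least squares: intercept %.3f, slope %.3f, RMSE %.6f, rank %d of %d.\n", ...
        beta1D(1), beta1D(2), rmse1D, rank(A1D), size(A1D,2))
end
% A full-rank design matrix here would suggest the difference comes from the
% fitting procedure rather than from an underdetermined system.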
%% Test on 'large' feature set (3391 features, 82 data points):
% Load files:
% featureList: cell array of character vectors
% x: array of input data, each column is a feature
% y: column vector of response data
load('sissoTestDataBig.mat')
fprintf("Fitting 'big' data: %d data points, %d features.\n",length(y), size(x,2))
% Expected result for this data (copied from Jupyter notebook):
% RMSE Model
% 1D: 0.137212 - 0.055 (IP(A)+IP(B))/r_p(A)^2 - 0.332
% 2D: 0.100216 + 0.114 |IP(B)-EA(B)|/r_p(A)^2 - 1.482 |r_s(A)-r_p(B)|/exp(r_s(A)) - 0.145
% 3D: 0.076428 + 0.109 |IP(B)-EA(B)|/r_p(A)^2 - 1.766 |r_s(A)-r_p(B)|/exp(r_s(A)) - 6.032 |r_s(B)-r_p(B)|/(r_p(B)+r_d(A))^2 - 0.005
% Execution time on Jupyter: 5.16 s
% Test (note that more features are considered per iteration, and the feature set itself is larger):
nNonzeroCoefs = 3; nFeaturesPerSisIter = 26;
fprintf("Searching for models up to %d dimensions, considering %d new features per iteration.\n",nNonzeroCoefs, nFeaturesPerSisIter)
sissoBig = SissoRegressor(nNonzeroCoefs, nFeaturesPerSisIter);
sissoBig = fitSisso(sissoBig, x, y);
printModels(sissoBig, featureList)
% Matlab solutions:
% RMSE Model
% 1D: 0.137310 -0.327 - 0.055 (IP(A)+IP(B))/r_p(A)^2
% 2D: 0.100216 -0.145 + 0.114 |IP(B)-EA(B)|/r_p(A)^2 - 1.482 |r_s(A)-r_p(B)|/exp(r_s(A))
% 3D: 0.076428 -0.005 + 0.109 |IP(B)-EA(B)|/r_p(A)^2 - 1.766 |r_s(A)-r_p(B)|/exp(r_s(A)) - 6.032 |r_s(B)-r_p(B)|/(r_p(B)+r_d(A))^2
% Execution time on Matlab: 0.593 s (~9x faster)
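% Timing sketch (an addition, not in the original script): the execution times
% quoted above can be re-measured by repeating the fit under tic/toc. This
% refits a fresh regressor on the 'big' data currently in the workspace; the
% variable names tStart and sissoTimed are introduced here for illustration.
tStart = tic;
sissoTimed = fitSisso(SissoRegressor(nNonzeroCoefs, nFeaturesPerSisIter), x, y);
fprintf("Fit time on 'big' data: %.3f s.\n", toc(tStart))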