Explore a Gaussian Process Regression #3

SarahAlidoost · 2024-11-25T15:58:02Z

We want to explore Gaussian Process regression models on hybrid labs data. For that, we need a Python module that can be called to fit models and make predictions. The module should:

be similar to PyCaret in how it compares different models, see examples
have a notebook in the docs folder that shows a simple example
have tests
have an easy interface and run fast (to be discussed)
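A minimal sketch of what such a comparison interface might look like, assuming scikit-learn as the backend. The function name `compare_models`, the kernel choices, and the synthetic data are illustrative assumptions, not an agreed design:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, WhiteKernel
from sklearn.model_selection import cross_val_score


def compare_models(models, X, y, cv=3):
    """Return {name: mean cross-validated R^2}, best model first.

    Hypothetical helper: `models` maps a label to any scikit-learn regressor.
    """
    scores = {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="r2").mean())
        for name, model in models.items()
    }
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


# Synthetic stand-in data; the real exp699 data is not public.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(60, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)

# Two Gaussian Process regressors that differ only in their kernel.
models = {
    "gp_rbf": GaussianProcessRegressor(
        kernel=RBF() + WhiteKernel(), normalize_y=True, random_state=0
    ),
    "gp_matern": GaussianProcessRegressor(
        kernel=Matern(nu=2.5) + WhiteKernel(), normalize_y=True, random_state=0
    ),
}

results = compare_models(models, X, y)
print(results)
```

Any scikit-learn regressor could be added to `models`, which keeps the comparison close to the PyCaret style mentioned above.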

Data:
The data folder exp699_032024_TUDelft is available on Atlas SharePoint at Documents > HybridLabs > Example_data. NOTE: This data folder is not public and cannot be shared with others. A model trained on this data cannot be shared either. For now, there is no need to store the trained model.

The data folder exp699_032024_TUDelft includes:

readme.md: it contains the experiment details and ML input/output
channel.csv: it contains the variable names and units
exp699.mat: it contains the data in MATLAB format

To read the data in Python, see this notebook.

Literature:

Some literature is available on Atlas SharePoint at Documents > HybridLabs > Literature. The two most related to FOWT are:


SarahAlidoost · 2024-12-19T09:28:52Z

see #14

SarahAlidoost · 2025-01-28T10:04:45Z

As found in #17, Gaussian process regression is computationally expensive on large datasets (many samples). Techniques like PCA can help a bit by reducing the number of features, but they are not enough on their own. There are other approaches to tackle the issue, e.g. sparse Gaussian processes, but scikit-learn does not support them natively. I suggest checking out other techniques and packages such as GPyTorch; see the GPyTorch Regression Tutorial.
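For illustration only, one crude workaround that stays within scikit-learn is to fit the Gaussian process on a random subset of the observations. This is a "subset of data" approximation, not a sparse Gaussian process, and it discards information. The helper name `fit_gp_on_subset`, the kernel, and the synthetic data are assumptions made for this sketch:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def fit_gp_on_subset(X, y, max_samples=200, seed=0):
    """Fit a GP on at most `max_samples` randomly chosen observations.

    Hypothetical helper: returns the fitted model and the chosen row indices.
    """
    rng = np.random.default_rng(seed)
    n_observations = X.shape[0]
    idx = rng.choice(
        n_observations, size=min(max_samples, n_observations), replace=False
    )
    gp = GaussianProcessRegressor(
        kernel=RBF() + WhiteKernel(), normalize_y=True, random_state=seed
    )
    gp.fit(X[idx], y[idx])
    return gp, idx


# Synthetic stand-in data with many observations.
rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(5000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=5000)

gp, idx = fit_gp_on_subset(X, y, max_samples=200)

# Predictive mean and standard deviation for a few inputs.
mean, std = gp.predict(X[:5], return_std=True)
print(mean.shape, std.shape)
```

Whether 200 points are enough depends on the data; a sparse or variational method (for example in GPyTorch) would use all observations more efficiently.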

vanlankveldthijs · 2025-01-29T13:34:19Z

Yeah. In fact it seems that PCA has no effect whatsoever on this particular issue, because the issue isn't caused by the size of the data (observations x features), but only by the number of observations. PCA does not reduce the number of observations.
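A small NumPy sketch of this point, using synthetic data and hypothetical helper names (`rbf_kernel_matrix`, `pca_reduce`): the kernel matrix that an exact Gaussian process has to factorize is observations x observations, so reducing the features with PCA leaves its size unchanged.

```python
import numpy as np


def rbf_kernel_matrix(X, length_scale=1.0):
    """Squared-exponential kernel matrix between all pairs of rows of X."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def pca_reduce(X, n_components):
    """Project X onto its leading principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:n_components].T


rng = np.random.default_rng(0)
n_observations, n_features = 500, 20
X = rng.normal(size=(n_observations, n_features))
X_reduced = pca_reduce(X, n_components=3)

K_full = rbf_kernel_matrix(X)
K_reduced = rbf_kernel_matrix(X_reduced)

print(X.shape, X_reduced.shape)        # (500, 20) (500, 3)
print(K_full.shape, K_reduced.shape)   # (500, 500) (500, 500)
```

Factorizing that matrix costs on the order of the cube of the number of observations, which is why only fewer observations (or an approximation) reduces the fitting cost.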

I did not explore any implementations outside of mlflow, which uses sklearn, because that was the objective for this sprint. I agree that other packages may provide out-of-the-box solutions for this issue.

SarahAlidoost added the data-driven models label Nov 27, 2024

SarahAlidoost moved this to Backlog in FOWT-ML - Sprint 1 Nov 27, 2024

SarahAlidoost added this to FOWT-ML - Sprint 1 Nov 27, 2024

SarahAlidoost moved this from Backlog to Ready in FOWT-ML - Sprint 1 Nov 27, 2024

vanlankveldthijs self-assigned this Jan 20, 2025

vanlankveldthijs moved this from Ready to In progress in FOWT-ML - Sprint 1 Jan 24, 2025

SarahAlidoost moved this from In progress to In review in FOWT-ML - Sprint 1 Jan 29, 2025
