Surrogate modeling is an umbrella term for approximations of building energy models using machine learning (ML) or algorithmic approaches, as compared to (slower) simulations. This repository contains utilities and models to replicate the ResStock dataset using surrogate modeling techniques.
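As a rough illustration of the idea (not this repository's model, which is a neural network trained on ResStock data), the sketch below fits a cheap linear surrogate to a made-up "simulation" function. The names `slow_simulation`, `fit_linear_surrogate`, and `surrogate`, and all numbers in them, are hypothetical and chosen only for this example.

```python
def slow_simulation(outdoor_temp_c):
    """Hypothetical stand-in for an expensive building energy simulation.

    Returns a made-up daily heating load (kWh) for an outdoor temperature.
    """
    return max(0.0, 18.0 - outdoor_temp_c) * 2.5 + 4.0


def fit_linear_surrogate(xs, ys):
    """Ordinary least squares fit of y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Run the "simulation" at a handful of sample points, then fit the surrogate.
train_temps = [-10.0, -5.0, 0.0, 5.0, 10.0]
train_loads = [slow_simulation(t) for t in train_temps]
slope, intercept = fit_linear_surrogate(train_temps, train_loads)


def surrogate(outdoor_temp_c):
    """Cheap approximation that replaces calls to slow_simulation."""
    return slope * outdoor_temp_c + intercept
```

Once fitted, `surrogate(2.0)` can be called in place of `slow_simulation(2.0)`. The approximation is only trustworthy inside the range of inputs it was trained on.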
This code was developed in and depends on Databricks.
The steps to run the training pipeline can be found here, and the details on versioning and model artifacts can be found here.
More technical documentation is available in the following locations:
There are two deprecated versions of the model stored in `deprecated/` that are no longer maintained.
This repository is designed to be run on Databricks and follows the conventions of the dml-sample-transmission-line repository. Please review its README for details on setup and usage patterns.
We currently run this project on clusters with the Databricks Runtime 14.3 LTS (Python 3.10).
To configure the cluster:
- Upload `install-db-requirements.sh` to Advanced Options > Init Script in your cluster settings.
- Restart the cluster for changes to take effect.
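After the cluster restarts, a quick sanity check from a notebook cell can confirm that the environment matches the versions stated above. The following is a minimal sketch, not part of this repository: the helper `check_environment` is hypothetical, and it assumes Databricks exposes the runtime version in the `DATABRICKS_RUNTIME_VERSION` environment variable with a value that starts with `14.3`.

```python
import os
import sys

EXPECTED_PYTHON = (3, 10)          # Python version stated in this README
EXPECTED_RUNTIME_PREFIX = "14.3"   # Databricks Runtime stated in this README


def check_environment(python_version=None, env=None):
    """Return a list of mismatch messages (empty if everything matches).

    Both arguments default to the live interpreter and process environment;
    they are parameters only so the check can be exercised off-cluster.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    if env is None:
        env = os.environ

    problems = []
    if tuple(python_version[:2]) != EXPECTED_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} found, "
            f"expected {EXPECTED_PYTHON[0]}.{EXPECTED_PYTHON[1]}"
        )

    # Assumption: this variable is set on Databricks clusters.
    runtime = env.get("DATABRICKS_RUNTIME_VERSION")
    if runtime is None:
        problems.append("DATABRICKS_RUNTIME_VERSION is not set")
    elif not runtime.startswith(EXPECTED_RUNTIME_PREFIX):
        problems.append(
            f"Databricks Runtime {runtime} found, "
            f"expected {EXPECTED_RUNTIME_PREFIX}.x"
        )
    return problems
```

Calling `check_environment()` with no arguments inspects the current session. An empty list means the Python and runtime versions look as expected; it does not verify that the init script installed the requirements.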
Whenever you add a requirement to `pyproject.toml`, follow these steps:
- Run `poetry update`.
- Generate requirements files with `dml-gen-requirements` as described in the dml-sample-transmission-line README.
├── LICENSE
├── README.md
├── deprecated/ # Old, unmaintained models
├── docs/ # Documentation
│ ├── Building_towards_an_MVP.pdf # Model iteration notes pre-v1.0.0 (this information now lives in the release notes)
│ ├── architecture.md
│ └── features_upgrades.md
├── images/ # Architecture diagrams and visuals
├── install-db-requirements.sh # Cluster init script, used to install `requirements-db-14.3.txt` on Databricks
├── model_artifacts/ # Stored model artifacts, including data params and evaluation results
├── notebooks/ # Jupyter notebooks for analysis
├── poetry.lock # Poetry lock file
├── pyproject.toml # Poetry project configuration and dependencies
├── scripts/ # Data extraction, training, and evaluation scripts
│ ├── megastock/ # Megastock-specific scripts (See scripts/megastock/README.md)
│ └── deprecated/ # Old scripts, no longer used
├── src/ # Source code for the surrogate model
│ ├── utils/ # General utility functions
│ ├── globals.py # Global variables
│ ├── surrogate_model.py # Main NN model implementation
│ ├── datagen.py # Generates training data to feed into NN
│ ├── feature_utils.py # Feature transformation utilities, used by main training pipeline and megastock
│ └── versioning.py # Version control utilities
├── tests/ # Unit tests
└── requirements-*.txt # Dependencies
This project is licensed under the terms specified in `LICENSE`.