
Improve database structure #579


Draft · wants to merge 13 commits into base: main
Conversation

@nick-harder (Member) commented May 5, 2025

Pull Request

Description

This PR addresses two key pain points:

  • Slow startup: Deleting millions of rows between learning runs can take ~5 minutes before a new simulation even begins.
  • Grafana errors: Missing tables or columns on first startup cause dashboard panels to raise errors.

We solve these by:

  1. Partitioning all time-series tables by simulation so dropping a partition is an O(1) metadata operation, instantly clearing old data.
  2. Bootstrapping the full schema at container init (using REAL types and enabling timescaledb/postgis) so Grafana never sees missing objects.
  3. Bulk writes via a single COPY … FROM STDIN per table inside one transaction, replacing thousands of INSERTs for multi× speedup.
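The partition lifecycle in point 1 can be sketched as follows. This is a minimal illustration, not the PR's actual code: the table and column names follow the PR (`PARTITION BY LIST (simulation)`), but the helper functions are hypothetical.

```python
# Hypothetical helpers generating the DDL for per-simulation partitions.
# In PostgreSQL, dropping a LIST partition is a metadata-only operation,
# so clearing a run's data does not scan or delete individual rows.

def create_partition_sql(table: str, simulation: str) -> str:
    """DDL to create a per-run partition of a LIST-partitioned table."""
    part = f"{table}_{simulation}"
    return (
        f'CREATE TABLE IF NOT EXISTS "{part}" '
        f"PARTITION OF \"{table}\" FOR VALUES IN ('{simulation}');"
    )

def drop_partition_sql(table: str, simulation: str) -> str:
    """Dropping the whole partition replaces a slow DELETE of millions of rows."""
    return f'DROP TABLE IF EXISTS "{table}_{simulation}";'

print(create_partition_sql("market_meta", "run_42"))
print(drop_partition_sql("market_meta", "run_42"))
```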

These changes bring a speed-up of around 25 to 30% to large non-learning simulations by increasing write speed, and they let new learning simulations start much faster by removing the long wait before each run. Note that these gains apply only when using TimescaleDB, not the local database.

Performance: running the 'base_case_2019' example for a full year, the runtime drops from 14:22 to 10:22.

I also fixed a small issue where the unit operator still submitted tensors when not in learning mode, which could cause problems later, since the behavior of tensors inside an optimization algorithm is unpredictable.

Changes Proposed

  • Docker-init schema (docker_configs/db-init/assume_schema.sql): pre-creates every table with REAL precision and required extensions, eliminating Grafana errors on first use.
  • LIST-partitioned tables: market_meta, market_dispatch, unit_dispatch, rl_params, grid_flows, kpis now use PARTITION BY LIST (simulation), and per-run partitions are created/dropped for instant deletes.
  • Bulk COPY writes: Refactored store_dfs to stream DataFrames via COPY … FROM STDIN in a single transaction, cutting write overhead by an order of magnitude.
  • Dynamic table creation: Catch NoSuchTableError to auto-bootstrap new output tables with df.head(0).to_sql(), so no manual schema changes are ever needed.
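The bulk-write idea can be sketched with the standard library alone: serialize all rows for a table into one in-memory CSV buffer and ship it with a single COPY … FROM STDIN instead of thousands of INSERTs. The helper name below is hypothetical; the real implementation lives in `store_dfs` and streams pandas DataFrames.

```python
# Minimal sketch of building the CSV payload that COPY ... FROM STDIN expects.
import csv
import io

def rows_to_copy_buffer(rows: list) -> io.StringIO:
    """Serialize rows into a single in-memory CSV payload for COPY."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # csv.writer terminates lines with \r\n by default
    writer.writerows(rows)
    buf.seek(0)
    return buf

rows = [("sim_1", "2019-01-01 00:00", 42.0), ("sim_1", "2019-01-01 01:00", 43.5)]
buf = rows_to_copy_buffer(rows)

# With psycopg2, the buffer would then be streamed inside one transaction,
# along the lines of (not executed here, requires a live connection):
#   with conn.cursor() as cur:
#       cur.copy_expert("COPY market_meta FROM STDIN WITH (FORMAT csv)", buf)
#   conn.commit()
print(buf.getvalue(), end="")
```

One COPY per table per transaction is what turns thousands of round-trips into a single streamed write.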

Testing

Tested with TimescaleDB and the local database; everything works as expected. Grafana is also operational.

Checklist

Please check all applicable items:

  • Code changes are sufficiently documented (docstrings, inline comments, doc folder updates)
  • New unit tests added for new features or bug fixes
  • Existing tests pass with the changes
  • Reinforcement learning examples are operational (for DRL-related changes)
  • Code tested with both local and Docker databases
  • Code follows project style guidelines and best practices
  • Changes are backwards compatible, or deprecation notices added
  • New dependencies added to pyproject.toml
  • A note for the release notes doc/release_notes.rst of the upcoming release is included
  • Consent to release this PR's code under the GNU Affero General Public License v3.0

@nick-harder nick-harder requested a review from maurerle May 5, 2025 14:51

codecov bot commented May 5, 2025

Codecov Report

Attention: Patch coverage is 41.07143% with 132 lines in your changes missing coverage. Please review.

Project coverage is 79.30%. Comparing base (53e9fb3) to head (e86a04f).

Files with missing lines Patch % Lines
assume/common/outputs.py 38.02% 132 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #579      +/-   ##
==========================================
- Coverage   79.67%   79.30%   -0.38%     
==========================================
  Files          52       52              
  Lines        7416     7528     +112     
==========================================
+ Hits         5909     5970      +61     
- Misses       1507     1558      +51     
Flag Coverage Δ
pytest 79.30% <41.07%> (-0.38%) ⬇️


@maurerle (Member) commented May 6, 2025

Very cool that you are looking to improve the performance.

We are currently not using some crucial features of TimescaleDB, namely hypertables, which are basically what you are trying to achieve here, just partitioned by time. Furthermore, I would rather investigate having an index on the simulation_id, so I don't think this PR is the best long-term solution.

I am currently quite short on time, but will try to file a PR with a proper solution in the next weeks.

@nick-harder (Member, Author) commented May 6, 2025

@maurerle hey, thanks! I considered hypertables, but since they use time-based chunks I decided against them. The main goal here is to introduce a proper database schema with proper keys and rules, so we can detect when something is broken or logged twice. Now all entries are unique, and the database actively rejects duplicate values, which should not occur. I have also introduced indexes on several tables, which should improve dashboard speed. Please take a look at the schema when you have time. The speed improvements just compensate for the increased complexity of the database structure, so we end up with a more robust database at the same or even better performance.
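The uniqueness and indexing idea from this reply can be sketched as DDL. This is an illustrative guess at the schema shape, not the PR's actual `assume_schema.sql`: the column names are hypothetical, but note that in PostgreSQL a primary key on a partitioned table must include the partition key (`simulation`).

```python
# Illustrative DDL: a composite primary key makes duplicate logging a hard
# database error, and an index on `simulation` speeds up dashboard filters.
DDL = """
CREATE TABLE IF NOT EXISTS unit_dispatch (
    simulation TEXT NOT NULL,
    unit       TEXT NOT NULL,
    time       TIMESTAMP NOT NULL,
    power      REAL,
    PRIMARY KEY (simulation, unit, time)  -- duplicate rows are rejected by the DB
) PARTITION BY LIST (simulation);

CREATE INDEX IF NOT EXISTS idx_unit_dispatch_simulation
    ON unit_dispatch (simulation);
"""
print(DDL)
```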
