Improve database structure #579
…ormance - use partitions - use schema during initialization - fix conversions of tensors
- improve flow and add checks for tables
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files
@@ Coverage Diff @@
## main #579 +/- ##
==========================================
- Coverage 79.67% 79.30% -0.38%
==========================================
Files 52 52
Lines 7416 7528 +112
==========================================
+ Hits 5909 5970 +61
- Misses 1507 1558 +51
Very cool that you are looking to improve the performance. We are currently not using some crucial TimescaleDB features, namely hypertables, which do basically what you are trying to achieve but partition by time. I am quite short on time at the moment, but I will try to file a PR with a proper solution in the next few weeks.
- fix table locking
- update schema - introduce order for unit_dispatch
@maurerle hey, thanks! I considered hypertables, but because they partition by time chunks I decided against that idea. The main goal here is to introduce a proper database schema with proper keys and rules, so we can see if something is broken or being logged twice. All entries are now unique, and the database itself actively checks whether duplicate values are added, which should never be the case. I have also introduced indexes for different tables, which should improve the speed of the dashboards. Please take a look at the schema when you have time. The speed improvements just compensate for the increased complexity of the database structure, so we get a more robust database with the same or even better performance.
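The point about the database rejecting duplicates can be illustrated with a small sketch. It uses the standard-library SQLite driver as a stand-in for PostgreSQL/TimescaleDB so that it runs anywhere; the column names and the key definition are assumptions for illustration, not the schema from this PR.

```python
import sqlite3

# SQLite stands in for PostgreSQL/TimescaleDB so the sketch is self-contained.
# Column names and the composite key are hypothetical, not the PR's schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE unit_dispatch (
        simulation TEXT NOT NULL,
        unit       TEXT NOT NULL,
        time       TEXT NOT NULL,
        power      REAL,
        PRIMARY KEY (simulation, unit, time)
    )
    """
)
# A secondary index of the kind that could speed up dashboard queries.
conn.execute(
    "CREATE INDEX idx_unit_dispatch_time ON unit_dispatch (simulation, time)"
)

row = ("base_case_2019", "unit_01", "2019-01-01T00:00:00", 42.0)
conn.execute("INSERT INTO unit_dispatch VALUES (?, ?, ?, ?)", row)

# Writing the same logical entry again violates the primary key.
try:
    conn.execute("INSERT INTO unit_dispatch VALUES (?, ?, ?, ?)", row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM unit_dispatch").fetchone()[0]
print(duplicate_rejected, row_count)
```

With a key like this, a value that is accidentally logged twice surfaces as an error at write time instead of silently duplicating rows.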
Pull Request
Description
This PR addresses two key pain points:
We solve these by:
- … `simulation` so dropping a partition is an O(1) metadata operation, instantly clearing old data.
- … `REAL` types and enabling `timescaledb`/`postgis`) so Grafana never sees missing objects.
- … `COPY … FROM STDIN` per table inside one transaction, replacing thousands of INSERTs for a multi× speedup.
These changes bring a speed-up of around 25 to 30% for large non-learning simulations by increasing the write speed. They also allow new learning simulations to start much faster, without the wait before the simulation. This holds only when using TimescaleDB, not the local DB.
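As a rough sketch of the per-simulation partitioning described above, the helpers below only build the PostgreSQL statements as strings. The helper names and the partition naming scheme are hypothetical and not taken from the PR; the parent table is assumed to have been declared with `PARTITION BY LIST (simulation)`.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a PostgreSQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def partition_name(table: str, simulation: str) -> str:
    """Hypothetical naming scheme: <table>_<sanitized simulation id>."""
    safe = "".join(c if c.isalnum() else "_" for c in simulation.lower())
    return f"{table}_{safe}"


def create_partition_sql(table: str, simulation: str) -> str:
    """Statement that attaches one list partition for a single simulation."""
    part = partition_name(table, simulation)
    return (
        f"CREATE TABLE IF NOT EXISTS {quote_ident(part)} "
        f"PARTITION OF {quote_ident(table)} "
        f"FOR VALUES IN ({quote_literal(simulation)});"
    )


def drop_partition_sql(table: str, simulation: str) -> str:
    """Statement that removes all rows of one simulation by dropping its partition."""
    return f"DROP TABLE IF EXISTS {quote_ident(partition_name(table, simulation))};"


print(create_partition_sql("market_meta", "base_case_2019"))
print(drop_partition_sql("market_meta", "base_case_2019"))
```

Dropping the partition avoids a row-by-row `DELETE` over the parent table, which is where the fast clean-up of an old run comes from. Note that PostgreSQL truncates identifiers longer than 63 bytes, so long simulation names would need a shorter scheme.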
Performance: example 'base_case_2019' for a full year: current time 14:22, new time 10:22.
I also fixed a small issue where the unit operator still submitted tensors when not in learning mode, which could cause problems in the future because the behavior of tensors within an optimization algorithm is unpredictable.
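The bulk-write path mentioned in the description (one `COPY … FROM STDIN` per table instead of many INSERTs) can be sketched roughly as follows. This is not the PR's `store_dfs` implementation: the helper names are hypothetical, plain tuples stand in for DataFrame rows, and the database call itself appears only as a comment that assumes a psycopg2 cursor.

```python
import csv
import io


def rows_to_csv_buffer(rows):
    """Serialize rows into an in-memory CSV buffer suitable for COPY.

    csv.writer emits None as an empty field, which COPY in CSV format
    reads as NULL by default.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    buf.seek(0)
    return buf


def copy_statement(table, columns):
    """Build the COPY statement for one table (names are double-quoted)."""
    cols = ", ".join(f'"{c}"' for c in columns)
    return f'COPY "{table}" ({cols}) FROM STDIN WITH (FORMAT csv)'


rows = [("base_case_2019", 1.5, None), ("a,b", 2, "x")]
buf = rows_to_csv_buffer(rows)
sql = copy_statement("kpis", ["simulation", "value", "note"])
print(sql)
print(buf.getvalue())

# With a psycopg2 connection (assumption), all tables could be written in
# one transaction along these lines:
#     with conn, conn.cursor() as cur:
#         cur.copy_expert(sql, buf)
```

Streaming a whole table through one statement saves the per-statement round trip and parsing cost that individual INSERTs incur, which is the usual reason `COPY` is faster for bulk loads.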
Changes Proposed
- … (`docker_configs/db-init/assume_schema.sql`): pre-creates every table with `REAL` precision and required extensions, eliminating Grafana errors on first use.
- … `market_meta`, `market_dispatch`, `unit_dispatch`, `rl_params`, `grid_flows`, `kpis` now use `PARTITION BY LIST (simulation)`, and per-run partitions are created/dropped for instant deletes.
- … `store_dfs` to stream DataFrames via `COPY … FROM STDIN` in a single transaction, cutting write overhead by an order of magnitude.
- … `NoSuchTableError` to auto-bootstrap new output tables with `df.head(0).to_sql()`, so no manual schema changes are ever needed.
Testing
Tested using TimescaleDB and the local DB; all code works fine. Grafana is also operational.
Checklist
Please check all applicable items:
- … (`doc` folder updates)
- … `pyproject.toml`
- … `doc/release_notes.rst` of the upcoming release is included