Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duckdb with sqlalchemy #380

Draft
wants to merge 3 commits into
base: new-data-input
Choose a base branch
from
Draft

Duckdb with sqlalchemy #380

wants to merge 3 commits into from

Conversation

cc-a
Copy link
Collaborator

@cc-a cc-a commented Jun 28, 2024

Description

This PR builds on #379 prototypes a partial adoption of SQLAlchemy in combination with Duckdb. This attempts to get the best of both worlds by using SQLAlchemy to define the database schema whilst still leveraging the capabilities of Duckdb to work with CSV files and easily spit out numpy arrays.

The main point in using SQLAlchemy here is to cut down on the amount of SQL that needs to be written and to do schema definition via the Python declarative interface. By having a generic CSV reader function you can then cut down to two simple lines of SQL. The advantage of this may be more apparent in a more mature implementation that will require more complex SQL in the duckdb only approach.

One very nice side benefit of having the schema in SQLAlchemy would be on the input creation side. At the moment the only real approach is to hack a bunch of csv files and deal with errors as you try to read them in. Instead you could use the SQLAlchemy classes to populate a database then dump it into csv files which could be quite powerful in the case of large or complex input datasets.

Type of change

Please add a line in the relevant section of
CHANGELOG.md to
document the change (include PR #) - note reverse order of PR #s.

  • New feature (non-breaking change which adds functionality)
  • Optimization (non-breaking, back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (whatever its nature)

Key checklist

  • All tests pass: $ python -m pytest
  • The documentation builds and looks OK: $ python -m sphinx -b html docs docs/build

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@cc-a cc-a requested review from alexdewar and tsmbland June 28, 2024 13:08
Copy link

codecov bot commented Jun 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.41%. Comparing base (935b3e2) to head (606dd66).
Report is 3 commits behind head on new-data-input.

Additional details and impacted files
@@                Coverage Diff                 @@
##           new-data-input     #380      +/-   ##
==================================================
+ Coverage           71.35%   71.41%   +0.06%     
==================================================
  Files                  44       44              
  Lines                5892     5892              
  Branches             1163     1163              
==================================================
+ Hits                 4204     4208       +4     
+ Misses               1368     1364       -4     
  Partials              320      320              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tsmbland
Copy link
Collaborator

Funnily enough I actually find the SQL table schemas more readable than the SQLAlchemy typed classes, but I get your point about more complex tables.

RE input creation - that reminds me: One thing that we will want to do is the reverse process of populating the database using data from the xarrays, and then creating csv files from the database. This is so we can essentially convert input files from the old format to the new format (i.e. old format files -> xarrays -> database -> new format files). We should have a go at doing this before scaling up the database as this may influence our approach.

@alexdewar alexdewar removed their request for review January 30, 2025 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

2 participants