Description
This PR builds on #379 and prototypes a partial adoption of SQLAlchemy in combination with DuckDB. The aim is to get the best of both worlds: SQLAlchemy defines the database schema, while DuckDB is still used to read CSV files and to return NumPy arrays easily.
The main point of using SQLAlchemy here is to reduce the amount of hand-written SQL and to define the schema through the Python declarative interface. With a generic CSV reader function, the SQL needed comes down to two simple lines. The advantage will likely be more apparent in a more mature implementation, where a DuckDB-only approach would require more complex SQL.
A very nice side benefit of having the schema in SQLAlchemy is on the input creation side. At the moment the only real approach is to hand-edit a set of CSV files and deal with errors as you try to read them in. Instead, you could use the SQLAlchemy classes to populate a database and then dump it to CSV files, which could be quite powerful for large or complex input datasets.
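The input creation workflow described above could look roughly like the following sketch. The `Region` model and the `dump_tables_to_csv` helper are hypothetical and not part of this PR. It assumes SQLAlchemy 1.4 or later and uses an in-memory SQLite database purely as a stand-in for whatever database the inputs would actually be staged in.

```python
import csv
from pathlib import Path

from sqlalchemy import Column, Float, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Region(Base):
    """Hypothetical example table, used only for illustration."""

    __tablename__ = "regions"

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    demand = Column(Float)


def dump_tables_to_csv(engine, metadata, out_dir):
    """Write one `<table name>.csv` file per table and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with engine.connect() as conn:
        # sorted_tables orders tables so that foreign-key targets come first.
        for table in metadata.sorted_tables:
            path = out_dir / f"{table.name}.csv"
            with path.open("w", newline="") as handle:
                writer = csv.writer(handle)
                writer.writerow([column.name for column in table.columns])
                writer.writerows(conn.execute(select(table)))
            written.append(path)
    return written


# Populate the database through the declarative classes, so type and
# constraint errors surface here rather than when the CSV files are read.
engine = create_engine("sqlite://")  # in-memory stand-in database
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add_all(
        [
            Region(id=1, name="north", demand=2.5),
            Region(id=2, name="south", demand=4.0),
        ]
    )
    session.commit()
```

Calling `dump_tables_to_csv(engine, Base.metadata, "inputs")` would then write `inputs/regions.csv` with a header row followed by one line per `Region`.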
Type of change
Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #) - note reverse order of PR #s.
Key checklist
$ python -m pytest
$ python -m sphinx -b html docs docs/build
Further checks