Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove all dependencies from core ehrql functionality #2225

Closed
wants to merge 3 commits into from

Conversation

bloodearnest
Copy link
Member

@bloodearnest bloodearnest commented Nov 18, 2024

This means we can run the sandbox w/o pyarrow, sqlalchemy, or the various graphviz dependencies installed.

Introduces a new fixture that installs ehrql in an isolated venv to test
the dependency-not-installed code paths work.

Copy link

cloudflare-workers-and-pages bot commented Nov 18, 2024

Deploying databuilder-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: 1adc3cc
Status: ✅  Deploy successful!
Preview URL: https://53c280d8.databuilder.pages.dev
Branch Preview URL: https://minimal-dependencies.databuilder.pages.dev

View logs

@bloodearnest
Copy link
Member Author

This currently failing coverage because of this bug: pytest-dev/pytest-cov#667

We need to either remove --dist=loadgroup from the default, or always run with a -n param too, I think

@bloodearnest bloodearnest marked this pull request as ready for review December 2, 2024 09:46
Copy link
Contributor

@rebkwok rebkwok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, although I guess you'll need to do one of the suggested coverage fixes in order to merge it. Is it definitely a pytest bug, and not something to do with the non-standard way we add xdist groups?

@bloodearnest
Copy link
Member Author

? Is it definitely a pytest bug, and not something to do with the non-standard way we add xdist groups?

I think so. I don't think it should use the controller it does unless you are running with -n. Basically, its hardcoded detection of xdist being used is wrong.

Copy link
Contributor

@evansd evansd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly hesitant about merging this to be honest, because at things stand it adds complexity without giving us any direct benefit. It's good to know how we'd do this kind of optional import if we needed to, but we don't need to right now.

I'm also a bit surprised that pyarrow is the only troublesome dependency here. I thought we end up importing the SQLite engine inside ehrql.main which ends up dragging in sqlean? But if you've tested that this does all work in Pyodide with just pyarrow removed then I'm prepared to believe it.

@bloodearnest
Copy link
Member Author

I'm slightly hesitant about merging this to be honest, because at things stand it adds complexity without giving us any direct benefit. It's good to know how we'd do this kind of optional import if we needed to, but we don't need to right now.

Understood. When this work was done, we were still exploring running it in pyodide, but that has been punted in favour of vscode extension, AIUI.

I'm also a bit surprised that pyarrow is the only troublesome dependency here. I thought we end up importing the SQLite engine inside ehrql.main which ends up dragging in sqlean? But if you've tested that this does all work in Pyodide with just pyarrow removed then I'm prepared to believe it.

It's not, I have follow on branch that defers imports for some commands (sqlalchemy et al for dump_dataset_sql and graphviz et all for graph_dataset), which use the standalone venv fixture added here to test.

The mechanisms for making imports optional are quite different though - here we use the pre-existing machinery detect valid outputs, whereas we just more clumsily defer imports for the others, so I though best as separate PRs.

@evansd
Copy link
Contributor

evansd commented Dec 4, 2024

I have follow on branch that defers imports for some commands

Ah I see, great. I feel like it would be worth including those commits in this PR. In terms of the aims it's all of a piece, and it's not like this PR is so huge that it needs splitting up for reviewability.

That would mean all the work's in one place. And there's a separate question as to whether to leave it as a proof-of-viability for further down the line, or whether to actually merge – though we'd still need to decide how to resolve the pytest issue.

This means we can run the sandbox w/o pyarrow installed.

Introduces a new fixture that installs ehrql in an isolated venv to test
the pyarrow-not-installed code paths, which will also be used in
subsequent commints testing other optional dependencies.
This defers some sqlalchemy imports and some graphviz imports in
ehrql/main.py, in order not to block running the sandbox if they are
missing.

The primary motiviation was to support running the sandbox in pyodide,
but is also just a useful property for ehrql core not to require any
dependencies.

Have not added tests that error handling works as expected, as we may
not move foward with this.
readline is not available in pyodide, it has its own readline
implementation which is enable by default in its REPL implementation.

Note: cannot test this easily, as linux python comes with readline
pre-installed.
@bloodearnest
Copy link
Member Author

Have tidied up the other commits and pushed.

We may want a bit more testing polish if we want to land it.

@bloodearnest bloodearnest changed the title Make pyarrow support optional. Remove all dependencies from core ehrql functionality Dec 4, 2024
@bloodearnest
Copy link
Member Author

WE've got rid of the sandbox, so the motivation is gone, so closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants