recipe_run
SQLModel -> FastAPI -> CRUD Client & CLI
#6
Conversation
An update on my current thinking here. I've now got a reasonable handle on how SQLModel works. Before building out our […]: this multiple-model approach is clearly a robust style which is more DRY than the alternatives. The one maintainability issue I foresee, however, is that as we scale it beyond […]. To express the relationship of a derived class to its base, instead of

```python
class HeroBase(SQLModel):
    name: str
    secret_name: str
    age: Optional[int] = None

class HeroCreate(HeroBase):
    pass
```

we can do

```python
def make_subclass(base: SQLModel, rename_base_to: str, attrs: Optional[dict] = None):
    cls_name = base.__name__.replace("Base", rename_base_to)
    return type(cls_name, (base,), attrs or {})

HeroCreate = make_subclass(HeroBase, "Create")
```

The real conciseness gains will be in using a similar approach for the other derived classes, but this is a bit more involved, as it requires changing attributes, defaults, and/or annotations.
Perhaps it seems like premature optimization to be concerned about this before implementing our own minimal database, however I'm quite concerned a real implementation will quickly become cluttered with hundreds of lines of boilerplate that will make it difficult to understand the core of what we are trying to achieve.
If we're able to find a reasonable way to implement them, it seems like some of these ideas may be worth proposing upstream.
891372e (without diff here) reflects a bit more work on this. We're now able to define 3 of the 5 model types for the multiple-models-with-inheritance style using factory functions. The models defined by these factories pass all of the API tests described in the SQLModel docs (which are copied in this PR's […]). I'm still unclear as to whether or not the remaining two model types will be factory-able, given that both of them require (non-default) fields, and I'm not sure if there's a reliable way to dynamically specify that via […].
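For reference, here is a minimal sketch of one way required, non-default fields might be attached dynamically. This uses plain Python classes rather than SQLModel, and all class and field names here are hypothetical; whether pydantic/SQLModel metaclasses pick up annotations set this way is exactly the open question above, so treat this as an assumption, not a confirmed technique.

```python
from typing import Optional


def make_subclass_with_fields(base: type, rename_base_to: str, required_fields: dict):
    """Create a subclass of `base`, adding required (non-default) fields.

    `required_fields` maps field names to type annotations. Because the fields
    have no defaults, we only populate `__annotations__`; pydantic-style
    metaclasses read annotations at class-creation time, so this *may* also
    work for SQLModel bases (untested assumption).
    """
    cls_name = base.__name__.replace("Base", rename_base_to)
    # `type()` treats the dict argument as the class namespace, so setting
    # `__annotations__` here behaves like declaring the fields in a class body.
    return type(cls_name, (base,), {"__annotations__": dict(required_fields)})


class HeroBase:
    name: str
    secret_name: str
    age: Optional[int] = None


# Hypothetical read model requiring an `id` field with no default.
HeroRead = make_subclass_with_fields(HeroBase, "Read", {"id": int})

print(HeroRead.__name__)               # HeroRead
print(HeroRead.__annotations__)        # {'id': <class 'int'>}
print(issubclass(HeroRead, HeroBase))  # True
```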
Today's work was focused on adding and testing […]. Taken together, these objects allow us to abstract the process of adding arbitrary numbers of tables to our SQL database while greatly reducing boilerplate, and thereby places for bugs to creep in. I've got a bit more work remaining to add the full complement of CRUD commands to the […]. Then it's off to the races with our actual database implementation (presumably by later this week).
It's great to see all the progress here. I can sense you have gone down a bit of a rabbit hole with SQLModel, but my naive impression is that what you have come up with is pretty reasonable. My one recommendation before committing to this path (factory functions, automatic CRUD test generation) would be to try to socialize this approach with someone who really groks SQLModel / Pydantic, etc. I don't see any red flags, but I am not that type of Python developer.
Great suggestion.
Working up the reproducer was itself a helpful exercise, prompting further refinement of the approach. Hopefully we'll be able to get some good feedback on this via the Issue and/or other avenues, now that we have a distilled reference for it.
Looks like great progress happening here. One minor suggestion: rename […].
I believe I've now filled all of the missing docstrings you found, with the exception of […]. That fields rewrite is beyond the scope of what we'll be able to finish before the holiday break, so in the interest of merging […].
Totally agree, and as you suggest here, this is probably most fruitful as its own PR(s) once we've had experience running this package in the wild.
There's a lot of new organization now, some of which I've highlighted in responses to your inline comments, but all of which I hope will be self-explanatory by just taking a fresh browse through the tests directory. Your feedback on this point was very helpful in pushing us towards a more readable and maintainable organization for the test suite.
Ok, we are getting really close with this. Thanks for all of your hard work on refactoring the test suite, which I agree is much improved.
I have left a few requests for clarifications, but those are all pretty minor.
My only major-ish issue is that I still find the structure of the test suite to be pretty byzantine. The big win in this version is that you have eliminated lots of the repetition: a class like `TestUpdate` is now pretty compact, with the core test logic (e.g. evaluate data, `test_update`) fairly straightforward and readable.

The cost of this is several layers of inherited classes, spanning `conftest.py`, `interfaces.py`, and `test_database.py`. This code is hard to follow; it was hard for me to figure out which arguments were fixtures and where those fixtures are defined. And it uses some sketchy anti-patterns, such as encoding key information inside function names (c.f. `get_interface`).
If you are not totally sick of this yet, I'd like to propose one more iteration, inspired by what is here, but hopefully with more simplicity and fewer hacks. The key is to use class mixins and inheritance smartly.
Imagine the following code in `test_database.py`:

```python
class CreateLogic:
    # contains only the "business logic" of creation
    def test_create(self, model_to_create):
        models, request, blank_opts = model_to_create
        data = self.create(models)  # defined in a mixin
        # now do specific tests of correctness, e.g.
        assert data == request


# by combining a Logic class with a CRUD class, we get a complete test
class TestCreateDatabase(CreateLogic, DatabaseCRUD):
    pass


# this is so easy and simple we don't even need parameterization
class TestCreateCLI(CreateLogic, CLICRUD):
    pass


...


class TestUpdateDatabase(UpdateLogic, DatabaseCRUD):
    pass

# etc
```
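To make the composition idea above concrete, here is a minimal, self-contained sketch of how a Logic mixin combines with interchangeable CRUD mixins. All names and behaviors here are illustrative stand-ins (an in-memory list in place of a real database, a plain dict copy in place of a CLI invocation), not the actual fixtures or backends in this PR:

```python
class CreateLogic:
    """Business logic only: defines what 'correct creation' means."""

    def test_create(self):
        request = {"name": "Deadpond"}
        data = self.create(request)  # `create` is supplied by a CRUD mixin
        assert data == request
        return data


class DatabaseCRUD:
    """One way of performing CRUD: against an (in-memory) database."""

    def create(self, request):
        self.db = [dict(request)]  # stand-in for a real INSERT
        return self.db[0]


class CLICRUD:
    """Another way of performing CRUD: via a CLI (simulated here)."""

    def create(self, request):
        return dict(request)  # stand-in for invoking the CLI and parsing output


# Combining a Logic class with a CRUD class yields a complete test class;
# under pytest, `test_create` would be collected and run on each of these.
class TestCreateDatabase(CreateLogic, DatabaseCRUD):
    pass


class TestCreateCLI(CreateLogic, CLICRUD):
    pass


print(TestCreateDatabase().test_create())  # {'name': 'Deadpond'}
print(TestCreateCLI().test_create())       # {'name': 'Deadpond'}
```

The design win is that the assertion logic is written once per operation, while the mechanism under test (database vs. CLI) is swapped by inheritance rather than by parameterization or fixtures.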
```python
feedstock_id: int  # TODO: Foreign key
commit: str
version: str
status: str  # TODO: Enum or categorical
```
So are we going to punt on this suggestion for now?
To be more explicit, here are all the fields from github check runs API:
Name | Type | In | Description |
---|---|---|---|
accept | string | header | Setting to application/vnd.github.v3+json is recommended. |
owner | string | path | |
repo | string | path | |
name | string | body | Required. The name of the check. For example, "code-coverage". |
head_sha | string | body | Required. The SHA of the commit. |
details_url | string | body | The URL of the integrator's site that has the full details of the check. If the integrator does not provide this, then the homepage of the GitHub app is used. |
external_id | string | body | A reference for the run on the integrator's system. |
status | string | body | The current status. Can be one of queued, in_progress, or completed.Default: queued |
started_at | string | body | The time that the check run began. This is a timestamp in ISO 8601 format: YYYY-MM-DDTHH:MM:SSZ. |
conclusion | string | body | Required if you provide completed_at or a status of completed. The final conclusion of the check. Can be one of action_required, cancelled, failure, neutral, success, skipped, stale, or timed_out. When the conclusion is action_required, additional details should be provided on the site specified by details_url.Note: Providing conclusion will automatically set the status parameter to completed. You cannot change a check run conclusion to stale, only GitHub can set this. |
completed_at | string | body | The time the check completed. This is a timestamp in ISO 8601 format: YYYY-MM-DDTHH:MM:SSZ. |
output | object | body | Check runs can accept a variety of data in the output object, including a title and summary and can optionally provide descriptive details about the run. See the output object description. |
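Regarding the `status` TODO above, a hedged sketch of how the check-run `status` and `conclusion` vocabularies could be made categorical. This uses plain dataclasses and `enum` rather than SQLModel, and `RecipeRun`'s field types are illustrative, not the PR's actual model; the allowed values are taken from the API table above:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CheckStatus(str, Enum):
    # allowed `status` values per the GitHub check runs API table above
    queued = "queued"
    in_progress = "in_progress"
    completed = "completed"


class CheckConclusion(str, Enum):
    # allowed `conclusion` values per the same table
    action_required = "action_required"
    cancelled = "cancelled"
    failure = "failure"
    neutral = "neutral"
    success = "success"
    skipped = "skipped"
    stale = "stale"
    timed_out = "timed_out"


@dataclass
class RecipeRun:
    # field names echo the model sketch in this PR; types are illustrative
    feedstock_id: int  # TODO: Foreign key
    commit: str
    version: str
    status: CheckStatus = CheckStatus.queued  # default per the API table
    conclusion: Optional[CheckConclusion] = None


run = RecipeRun(feedstock_id=1, commit="abc1234", version="1.0")
print(run.status.value)  # queued
```

Subclassing `str` alongside `Enum` keeps the values JSON-serializable, which tends to play well with pydantic-style validation if these were later folded into an SQLModel class.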
Noting an idea from our discussion today: in order to eliminate all of the calls to […].
@rabernat, I believe I've covered everything we discussed today in the last few commits.
3e0f540 shows what was needed to do this. It was slightly more involved than we'd initially guessed, but ultimately quite straightforward. And it eliminated the need to call […]. It's certainly possible there's something I've overlooked, so definitely take a look, but AFAICT we're ready to merge.
Yesterday @rabernat and I discussed the fact that #1 contains a lot of cobbled together workarounds precipitated by the fact that we have not yet implemented pangeo-forge/roadmap#31. Rather than building upon the un-scalable, unsteady footing of CSV logs (as #1 suggests), we are going to take a step back and implement an actual database + API for Pangeo Forge.
The first table in our database will be for `recipe_run` logs: records of which recipes were run, when, deposited where, etc. These are the first priority because, unlike the Bakery database or recipe metadata, they don't currently exist anywhere else.

This PR currently just mirrors the SQLModel documentation example of a "Heroes" sqlite database plus FastAPI layer, with the addition of sketches for `server` and `cli` modules. Opening this as a draft now, and will mark as Ready for review once that's the case. Unlike #1, going to aim to keep this one truly minimal. 😄