This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

recipe_run SQLModel -> FastAPI -> CRUD Client & CLI #6

Merged
merged 49 commits into from
Jan 7, 2022

Conversation

cisaacstern
Member

Yesterday @rabernat and I discussed the fact that #1 contains a lot of cobbled together workarounds precipitated by the fact that we have not yet implemented pangeo-forge/roadmap#31. Rather than building upon the un-scalable, unsteady footing of CSV logs (as #1 suggests), we are going to take a step back and implement an actual database + API for Pangeo Forge.

The first table in our database will be for recipe_run logs: records of which recipes were run, when, deposited where, etc. These are the first priority because unlike the Bakery database or recipe metadata, they don't currently exist anywhere else.
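For a rough sense of shape (the field names here are illustrative, borrowing from the fields sketched later in this PR, not a final schema), a recipe_run record might carry something like:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RecipeRunRecord:
    """Illustrative shape of a recipe_run log entry (plain dataclass stand-in)."""

    feedstock_id: int  # which feedstock the recipe came from (eventually a foreign key)
    commit: str        # commit SHA the recipe was run at
    version: str       # feedstock/recipe version
    status: str        # e.g. "queued", "in_progress", "completed"
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    dataset_url: Optional[str] = None  # where the resulting dataset was deposited


run = RecipeRunRecord(feedstock_id=1, commit="abc123", version="1.0", status="queued")
```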

This PR currently just mirrors the SQLModel documentation example of a "Heroes" sqlite database plus FastAPI layer, with the addition of sketches for server and cli modules. Opening this as a draft now; I'll mark it Ready for review once it actually is ready. Unlike #1, I'm going to aim to keep this one truly minimal. 😄

@cisaacstern
Member Author

cisaacstern commented Nov 19, 2021

An update on my current thinking here. I've now got a reasonable handle on how SQLModel works. Before building out our recipe_run SQL table, I'm taking a moment to consider if there's an approachable way to write factory functions to generate derived classes for the multiple models with inheritance approach.

This multiple model approach is clearly a robust style which is more DRY than the alternatives. The one maintainability issue I foresee, however, is that as we scale it beyond recipe_run to numerous interrelated tables, a lot of boilerplate classes will be defined (each of which is then referenced multiple times in the tests, API functions, etc.). So if there were a way to automate generation of derived models with factories, our implementation would be a lot more readable and maintainable.

The relationship of the Base and Create classes is the simplest example, as they are effectively the same class under different names. In this case, instead of

```python
from typing import Optional

from sqlmodel import SQLModel


class HeroBase(SQLModel):
    name: str
    secret_name: str
    age: Optional[int] = None


class HeroCreate(HeroBase):
    pass
```

we can do

```python
from typing import Optional, Type

from sqlmodel import SQLModel


def make_subclass(base: Type[SQLModel], rename_base_to: str, attrs: Optional[dict] = None):
    """Derive a renamed subclass, e.g. HeroBase -> HeroCreate."""
    # NB: a mutable default (attrs: dict = {}) would be shared across calls
    cls_name = base.__name__.replace("Base", rename_base_to)
    return type(cls_name, (base,), attrs or {})


HeroCreate = make_subclass(HeroBase, "Create")
```

The real conciseness gains will come from using a similar approach for the other derived classes, but this is a bit more involved, as it requires changing attributes, defaults, and/or annotations.
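For example (plain Python only; the `annotations` parameter is a hypothetical extension of `make_subclass`, and whether SQLModel's metaclass would accept classes built this way is exactly the open question):

```python
from typing import Optional


class HeroBase:
    name: str
    secret_name: str
    age: Optional[int] = None


def make_subclass(base, rename_base_to, attrs=None, annotations=None):
    """Hypothetical factory: derive a renamed subclass, optionally
    overriding attribute defaults and/or type annotations."""
    attrs = dict(attrs or {})
    if annotations:
        # merge the base annotations with the overrides
        attrs["__annotations__"] = {
            **getattr(base, "__annotations__", {}),
            **annotations,
        }
    cls_name = base.__name__.replace("Base", rename_base_to)
    return type(cls_name, (base,), attrs)


# e.g. a "Read" model adds an `id` field, while an "Update" model
# makes every field optional with a None default
HeroRead = make_subclass(HeroBase, "Read", annotations={"id": int})
HeroUpdate = make_subclass(
    HeroBase,
    "Update",
    attrs={"name": None, "secret_name": None},
    annotations={"name": Optional[str], "secret_name": Optional[str]},
)
```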

The next step would be, for a given set of dynamically generated Models, to also dynamically generate the associated API functions and tests. This seems doable, given that for a given CRUD API function (e.g. create), the basic syntax is quite standardized, with the only real differences being the arguments and/or type hints passed to the function and its decorator.
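As a rough stdlib-only sketch of the idea (no FastAPI here; the route-key format and the `make_crud_routes` helper are hypothetical, standing in for decorator-wrapped endpoint functions):

```python
def make_crud_routes(table_name, store):
    """Hypothetical sketch: generate the four standardized CRUD handlers
    for one table. `store` (a dict of id -> record) stands in for the
    database session; in the real thing these functions would be wrapped
    in FastAPI decorators, e.g. app.post(f"/{table_name}/")(create)."""
    counter = {"next_id": 1}

    def create(record):
        record = {**record, "id": counter["next_id"]}
        store[record["id"]] = record
        counter["next_id"] += 1
        return record

    def read(id):
        return store[id]

    def update(id, changes):
        store[id] = {**store[id], **changes}
        return store[id]

    def delete(id):
        del store[id]

    return {
        f"POST /{table_name}/": create,
        f"GET /{table_name}/{{id}}": read,
        f"PATCH /{table_name}/{{id}}": update,
        f"DELETE /{table_name}/{{id}}": delete,
    }


db = {}
routes = make_crud_routes("recipe_runs", db)
run = routes["POST /recipe_runs/"]({"commit": "abc123"})
```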

Perhaps it seems like premature optimization to be concerned about this before implementing our own minimal database; however, I'm quite concerned a real implementation will quickly become cluttered with hundreds of lines of boilerplate that will make it difficult to understand the core of what we are trying to achieve.

Following the multiple models approach proposed in the SQLModel documentation, each SQL table requires at least: five model definitions + four API functions + four client functions + at least eight tests (if you include error condition tests) + (optionally) four CLI functions = 25 objects, almost all of which conform to a known (class or function) template of some sort.

If we're able to find a reasonable way to implement them, it seems like some of these ideas may be worth proposing upstream. SQLModel is a relatively recently released project, the core of which is very elegant (unsurprisingly, given its author). I'm cautiously optimistic we may be able to come up with some convenience methods that make it even easier to use than it currently is.

@cisaacstern
Member Author

cisaacstern commented Nov 21, 2021

> the real conciseness gains will be in using a similar approach for the other derived classes, but this is a bit more involved, as it requires changing attributes, defaults, and/or annotations.

891372e (without diff here) reflects a bit more work on this. We're now able to define 3 of the 5 model types for the multiple models with inheritance style using factory functions. The models defined by these factories pass all of the API tests described in the SQLModel docs (which are copied in this PR's tests/test_api.py).

I'm still unclear as to whether or not the remaining two model types will be factory-able, given that both of them require (non-default) fields, and I'm not sure if there's a reliable way to dynamically specify that via type() or types.new_class(). (Feels like it might require overwriting __init__ or __init_subclass__ or some such thing, but with all the metaclass wizardry going on in SQLModel, that seems inadvisable.) Even if the proposed factory production of 3 out of 5 models is where we land, though, this feels like a meaningful boilerplate reduction + readability boost to me. Not sure if there are any reliability factors I'm overlooking in this style of dynamic class definition, but it seems pretty straightforward to me.

@cisaacstern
Member Author

MultipleModels is a new container for holding associated models generated by the factory functions described above.

Today's work was focused on adding and testing GenerateEndpoints, which streamlines creation of the FastAPI endpoints for a MultipleModels collection.

Taken together, these objects let us abstract the process of adding arbitrary numbers of tables to our SQL database while greatly reducing boilerplate, and with it the number of places for bugs to creep in.

I've got a bit more work remaining to add the full complement of CRUD commands to the client and cli interfaces, and then verify that all this works with relationship attributes (i.e., foreign keys).

Then it's off to the races with our actual database implementation (presumably by later this week).

@rabernat
Contributor

It's great to see all the progress here.

I can sense you have gone down a bit of a rabbit hole with SQLModel, but my naive impression is that what you have come up with is pretty reasonable. My one recommendation before committing to this path (factory functions, automatic CRUD test generation) would be to try to socialize this approach with someone who really groks SQLModel / Pydantic, etc. I don't see any red flags, but I am not that type of python developer.

@cisaacstern
Member Author

> try to socialize this approach with someone who really groks SQLModel / Pydantic, etc.

Great suggestion.

Working up the reproducer was itself a helpful exercise, prompting further refinement of the approach. Hopefully we'll be able to get some good feedback on this via the Issue and/or other avenues, now that we have a distilled reference for it.

@cisaacstern cisaacstern mentioned this pull request Dec 6, 2021
@cisaacstern cisaacstern mentioned this pull request Dec 7, 2021
@rabernat
Contributor

Looks like great progress happening here.

One minor suggestion: rename entrypoints to interfaces. "Entry points" has a very specific meaning in python.

@cisaacstern cisaacstern marked this pull request as ready for review December 16, 2021 20:39
@cisaacstern
Member Author

cisaacstern commented Dec 16, 2021

@rabernat:

My comments are basically in three categories:

  • Let's make sure everything is documented with docstrings

I believe I've now filled all of the missing docstrings you found, with the exception of .models.RecipeRunBase. I've left this unfinished to avoid redoing the work once we change the fields to match GitHub check runs, as you propose in #6 (comment).

That fields rewrite is beyond the scope of what we'll be able to finish before the holiday break, so in the interest of merging this week (as a bit of a pre-holiday morale boost 🚀 ), I suggest we do the check runs work and associated docstring as its own PR.

  • I'm a bit on-the-fence about just passing the urls + json directly to the CLI. At that point, do we even need the CLI? We could accomplish almost the same thing with curl. However, this is minor and I think we can leave things as they are for now. Eventually, we should try to make the CLI more verbose about the options supported. We will learn a lot from using the CLI in our infrastructure.

Totally agree, and as you suggest here, this is probably most fruitful as its own PR(s) once we've had experience running this package in the wild.

  • The test suite is very comprehensive but has some ugliness. Let's see if we can find a more elegant and also more readable solution. I think it will involve defining custom classes for use in the test suite.

There's a lot of new organization now, some of which I've highlighted in responses to your inline comments, but all of which I hope will be self-explanatory from a fresh browse through the tests directory. Your feedback on this point was very helpful in pushing us toward a more readable and maintainable organization for the test suite.

Contributor

@rabernat left a comment


Ok, we are getting really close with this. Thanks for all of your hard work on refactoring the test suite, which I agree is much improved.

I have left a few requests for clarifications, but those are all pretty minor.

My only major-ish issue is that I still find the structure of the test suite to be pretty byzantine. The big win in this version is that you have eliminated lots of the repetition: a class like TestUpdate is now pretty compact, with the core test logic (e.g. evaluate data, test_update) fairly straightforward and readable.

The cost of this is several layers of inherited classes, spanning conftest.py, interfaces.py, and test_database.py. This code is hard to follow; it was hard for me to figure out which arguments were fixtures and where those fixtures were defined. And it uses some sketchy anti-patterns, such as encoding key information inside function names (cf. get_interface).

If you are not totally sick of this yet, I'd like to propose one more iteration, inspired by what is here, but hopefully with more simplicity and fewer hacks. The key is to use class mixins and inheritance smartly.

Imagine the following code in test_database.py.

```python
class CreateLogic:
    # contains only the "business logic" of creation
    def test_create(self, model_to_create):
        models, request, blank_opts = model_to_create
        data = self.create(models)  # `create` is defined in a mixin
        # now do specific tests of correctness, e.g.
        assert data == request


# by combining a Logic class with a CRUD class, we get a complete test
class TestCreateDatabase(CreateLogic, DatabaseCRUD):
    pass


# this is so easy and simple we don't even need parameterization
class TestCreateCLI(CreateLogic, CLICRUD):
    pass


...


class TestUpdateDatabase(UpdateLogic, DatabaseCRUD):
    pass

# etc
```
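To make the proposed composition concrete, here is a runnable toy version of the same pattern (the CRUD mixin bodies below are hypothetical stand-ins, not the real database/CLI interfaces):

```python
class CreateLogic:
    """Business logic of the create test, shared across backends."""

    def check_create(self, request):
        data = self.create(request)  # `create` is supplied by a CRUD mixin
        assert data == request
        return data


class DatabaseCRUD:
    """Stand-in: issues CRUD calls directly against a fake database."""

    def __init__(self):
        self.db = []

    def create(self, request):
        self.db.append(request)
        return request


class CLICRUD:
    """Stand-in: issues the same CRUD calls through a fake CLI layer."""

    def __init__(self):
        self.invocations = []

    def create(self, request):
        self.invocations.append(("post", request))
        return request


# combining a Logic class with a CRUD mixin yields a complete test class
class TestCreateDatabase(CreateLogic, DatabaseCRUD):
    pass


class TestCreateCLI(CreateLogic, CLICRUD):
    pass
```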

```python
feedstock_id: int  # TODO: Foreign key
commit: str
version: str
status: str  # TODO: Enum or categorical
```
Contributor


So are we going to punt on this suggestion for now?

To be more explicit, here are all the fields from the GitHub check runs API:

| Name | Type | In | Description |
| --- | --- | --- | --- |
| accept | string | header | Setting to `application/vnd.github.v3+json` is recommended. |
| owner | string | path | |
| repo | string | path | |
| name | string | body | **Required.** The name of the check. For example, "code-coverage". |
| head_sha | string | body | **Required.** The SHA of the commit. |
| details_url | string | body | The URL of the integrator's site that has the full details of the check. If the integrator does not provide this, then the homepage of the GitHub app is used. |
| external_id | string | body | A reference for the run on the integrator's system. |
| status | string | body | The current status. Can be one of `queued`, `in_progress`, or `completed`. Default: `queued` |
| started_at | string | body | The time that the check run began. This is a timestamp in ISO 8601 format: `YYYY-MM-DDTHH:MM:SSZ`. |
| conclusion | string | body | **Required** if you provide `completed_at` or a `status` of `completed`. The final conclusion of the check. Can be one of `action_required`, `cancelled`, `failure`, `neutral`, `success`, `skipped`, `stale`, or `timed_out`. When the conclusion is `action_required`, additional details should be provided on the site specified by `details_url`. **Note:** Providing `conclusion` will automatically set the `status` parameter to `completed`. You cannot change a check run conclusion to `stale`; only GitHub can set this. |
| completed_at | string | body | The time the check completed. This is a timestamp in ISO 8601 format: `YYYY-MM-DDTHH:MM:SSZ`. |
| output | object | body | Check runs can accept a variety of data in the `output` object, including a title and summary, and can optionally provide descriptive details about the run. See the output object description. |

@rabernat
Contributor

rabernat commented Jan 7, 2022

Noting an idea from our discussion today.

In order to eliminate all of the calls to clear_table, we could create a fixture that does this for us. To do this, we would chain together two fixtures:

  • The current database / API session-scoped fixtures
  • Function-scoped fixture that takes those as inputs but automatically clears the database tables before each invocation
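The same chaining pattern, sketched with stdlib context managers instead of pytest fixtures (names here are illustrative, not the real fixture names):

```python
from contextlib import contextmanager


@contextmanager
def session_db():
    # stand-in for the session-scoped database/API fixture:
    # the resource is created once and reused across tests
    db = {"recipe_runs": []}
    yield db


@contextmanager
def clean_db(db):
    # stand-in for the function-scoped fixture: it receives the
    # session-scoped resource and clears the tables before yielding
    db["recipe_runs"].clear()
    yield db


with session_db() as db:
    db["recipe_runs"].append("stale row from a previous test")
    with clean_db(db) as fresh:
        assert fresh["recipe_runs"] == []  # cleared automatically
```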

@cisaacstern
Member Author

@rabernat, I believe I've covered everything we discussed today in the last few commits.

> to eliminate all of the calls to clear_table, we could create a fixture that does this for us

3e0f540 shows what was needed to do this. It was slightly more involved than we'd initially guessed, but ultimately quite straightforward. And it eliminated the need to call clear_table at the test level, a definite win for test simplicity.

It's certainly possible there's something I've overlooked, so definitely take a look, but AFAICT we're ready to merge.

@rabernat
Contributor

rabernat commented Jan 7, 2022

Just wanted to note that I followed the development guide and everything worked perfectly. The auto-generated API docs are a thing of beauty 🤩

[screenshot: auto-generated API docs]
