Convert the UWS library to use the Wobbly backend
Rather than storing UWS jobs directly in a database, which requires
every UWS-based application to manage its own separate database, use
the Wobbly service to manage all job storage. This service uses a
delegated token to determine the user and service, so considerably
less tracking of the user is required.

UWS applications now store the serialized parameter model in the
database rather than a list of key/value pairs, and rely on methods
on the parameters model to convert to the XML format for the current
IVOA UWS protocol.

Add a mock for Wobbly that can be used to test UWS applications
without having the Wobbly API available.

Drop the `ErrorCode` enum, since its values were specific to SODA,
and instead take the error code as a string. Drop some related
exceptions that are not used directly in Safir and are specific to
SODA.
rra committed Dec 11, 2024
1 parent 8662035 commit 6aae663
Showing 44 changed files with 1,852 additions and 2,282 deletions.
7 changes: 7 additions & 0 deletions changelog.d/20241209_145305_rra_DM_47986.md
@@ -0,0 +1,7 @@
### Backwards-incompatible changes

- Rewrite the Safir UWS support to use Pydantic models for job parameters. Services built on the Safir UWS library will need to change all job creation dependencies to return Pydantic models.
- Use the Wobbly service rather than a direct database connection to store UWS job information. Services built on the Safir UWS library must now configure a Wobbly URL and will switch to Wobbly's storage instead of their own when updated to this release of Safir.
- Support an execution duration of 0 in the Safir UWS library, mapping it to no limit on the execution duration. Note that this will not be allowed by the default configuration and must be explicitly allowed by an execution duration validation hook.
- Convert all models returned by the Safir UWS library to Pydantic. Services built on the Safir UWS library will have to change the types of validator functions for destruction time and execution duration.
- Safir no longer provides the `safir.uws.ErrorCode` enum or the exception `safir.uws.MultiValuedParameterError`. These values were specific to a SODA service, and different IVOA UWS services use different error codes. The Safir UWS library now takes error code as a string, and each application should define its own set of error codes in accordance with the IVOA standard it is implementing.
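Since error codes are now plain strings, an application can keep its own set in one place. The following is only a sketch of one way to do that: the class name ``CutoutErrorCode``, the helper ``describe_error``, and the specific codes listed are illustrative assumptions, not part of Safir, and each service should use the codes defined by the IVOA standard it implements.

```python
from enum import Enum


class CutoutErrorCode(str, Enum):
    """Hypothetical error codes for an example cutout service.

    These names are illustrative only; use the codes required by the
    IVOA standard your service implements.
    """

    USAGE_ERROR = "UsageError"
    MULTIVALUED_PARAM_NOT_SUPPORTED = "MultiValuedParamNotSupported"
    SERVICE_UNAVAILABLE = "ServiceUnavailable"


def describe_error(code: str, message: str) -> str:
    """Format an error code and message as a single line.

    Takes the code as a plain string, mirroring the idea that the
    library no longer requires a specific enum type.
    """
    return f"{code}: {message}"


# Pass the string value wherever an error code string is expected.
line = describe_error(CutoutErrorCode.USAGE_ERROR.value, "circle is malformed")
```

Keeping the codes in an application-level enum preserves typo protection inside the service while still handing a plain string to anything that expects one.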
2 changes: 2 additions & 0 deletions docs/_rst_epilog.rst
@@ -16,6 +16,7 @@
.. _pre-commit: https://pre-commit.com
.. _Pydantic: https://docs.pydantic.dev/latest/
.. _Pydantic BaseSettings: https://docs.pydantic.dev/latest/concepts/pydantic_settings
.. _pydantic-xml: https://pydantic-xml.readthedocs.io/en/latest/
.. _PyPI: https://pypi.org/project/safir/
.. _pytest: https://docs.pytest.org/en/latest/
.. _redis-py: https://redis.readthedocs.io/en/stable/
@@ -33,3 +34,4 @@
.. _Uvicorn: https://www.uvicorn.org/
.. _virtualenvwrapper: https://virtualenvwrapper.readthedocs.io/en/stable/
.. _vo-models: https://vo-models.readthedocs.io/latest/
.. _Wobbly: https://github.com/lsst-sqre/wobbly/
4 changes: 4 additions & 0 deletions docs/conf.py
@@ -1,3 +1,7 @@
from documenteer.conf.guide import *

# Disable JSON schema because it doesn't seem that useful and apparently can't
# deal with generics, so it produces warnings for the UWS Job model.
autodoc_pydantic_model_show_json = False

html_sidebars["api"] = [] # no sidebar on the API page
2 changes: 0 additions & 2 deletions docs/user-guide/database/schema.rst
@@ -7,7 +7,6 @@ Safir provides some additional supporting functions to make using Alembic more s

These instructions assume that you have already defined your schema with SQLAlchemy's ORM model.
If you have not already done that, do that first.
For UWS applications that only have the UWS database, the declarative base of the schema is `safir.uws.UWSSchemaBase`.

Set up Alembic
==============
@@ -85,7 +84,6 @@ Replace :file:`alembic/env.py` with the following:
)
Replace ``example`` with the module name and application name of your application as appropriate.
For applications that only use the UWS database, replace ``example.schema.Base`` in the above with `safir.uws.UWSSchemaBase`.

Add Alembic to the Docker image
-------------------------------
104 changes: 10 additions & 94 deletions docs/user-guide/uws/create-a-service.rst
@@ -10,8 +10,8 @@ Select the ``UWS`` flavor.

Then, flesh out the application by following these steps:

#. :doc:`Define the API parameters <define-inputs>`
#. :doc:`Define the parameter models <define-models>`
#. :doc:`Define the API parameters <define-inputs>`
#. :doc:`Write the backend worker <write-backend>`
#. :doc:`Write the test suite <testing>`

@@ -40,7 +40,10 @@ This will add standard configuration options most services will need and provide
Second, add a property to ``Config`` that returns the UWS configuration.
For some of these settings, you won't know the values yet.
You will be able to fill in the value of ``parameters_type`` after reading :doc:`define-models`, the values of ``async_post_route`` and optionally ``sync_get_route`` and ``sync_post_route`` after reading :doc:`define-inputs`, and the value of ``worker`` after reading :doc:`write-backend`.

You will be able to fill in the values of ``job_summary_type`` and ``parameters_type`` after reading :doc:`define-models`.
You will be able to fill in the values of ``async_post_route`` and optionally ``sync_get_route`` and ``sync_post_route`` after reading :doc:`define-inputs`.
You will be able to fill in the value of ``worker`` after reading :doc:`write-backend`.
For now, you can just insert placeholder values.

.. code-block:: python
@@ -88,11 +91,12 @@ Set up the FastAPI application
The Safir UWS library must be initialized when the application starts, and requires some additional FastAPI middleware and error handlers.
These need to be added to :file:`main.py`.

First, initialize the UWS application in the ``lifespan`` function:
First, initialize and shut down the UWS application in the ``lifespan`` function:

.. code-block:: python
:caption: main.py
:emphasize-lines: 1,6,8
from safir.dependencies.http_client import http_client_dependency
from .config import uws
@@ -104,7 +108,7 @@ First, initialize the UWS application in the ``lifespan`` function:
await uws.shutdown_fastapi()
await http_client_dependency.aclose()
Second, install the UWS routes into the external router before including it in the application:
Second, install the UWS routes into the external router **before** including it in the application:

.. code-block:: python
:caption: main.py
@@ -128,94 +132,6 @@ Third, install the UWS middleware and error handlers.
# Install error handlers.
uws.install_error_handlers(app)
Add a command-line interface
============================

The UWS implementation uses a PostgreSQL database to store job status.
Your application will need a mechanism to initialize that database with the desired schema.
The simplest way to do this is to add a command-line interface for your application with an ``init`` command that initializes the database.

.. note::

This approach has inherent race conditions and cannot handle database schema upgrades.
It will be replaced with a more sophisticated approach using Alembic_ once that support is ready.

First, create a new :file:`cli.py` file in your application with the following contents:

.. code-block:: python
:caption: cli.py
import click
import structlog
from safir.asyncio import run_with_asyncio
from safir.click import display_help
from .config import uws
@click.group(context_settings={"help_option_names": ["-h", "--help"]})
@click.version_option(message="%(version)s")
def main() -> None:
"""Administrative command-line interface for example."""
@main.command()
@click.argument("topic", default=None, required=False, nargs=1)
@click.pass_context
def help(ctx: click.Context, topic: str | None) -> None:
"""Show help for any command."""
display_help(main, ctx, topic)
@main.command()
@click.option(
"--reset", is_flag=True, help="Delete all existing database data."
)
@run_with_asyncio
async def init(*, reset: bool) -> None:
"""Initialize the database storage."""
logger = structlog.get_logger("example")
await uws.initialize_uws_database(logger, reset=reset)
Look for the instances of ``example`` and replace them with the name of your application.

Second, register this interface with Python in :file:`pyproject.toml`:

.. code-block:: toml
:caption: pyproject.toml
[project.scripts]
example = "example.cli:main"
Again, replace ``example`` with the name of your application.

Third, change the :file:`Dockerfile` for your application to run a startup script rather than run :command:`uvicorn` directly:

.. code-block:: docker
:caption: Dockerfile
# Copy the startup script
COPY scripts/start-frontend.sh /start-frontend.sh
# Run the application.
CMD ["/start-frontend.sh"]
Finally, create the :file:`scripts/start-frontend.sh` file:

.. code-block:: bash
:caption: scripts/start-frontend.sh
#!/bin/bash
#
# Create the database and then start the server.
set -eu
example init
uvicorn example.main:app --host 0.0.0.0 --port 8080
Again, replace ``example`` with the name of your application.

Create the arq worker for database updates
==========================================

@@ -248,6 +164,6 @@ Next steps

Now that you have set up the basic structure of your application, you can move on to the substantive parts.

- Define the API parameters: :doc:`define-inputs`
- Define the parameter models: :doc:`define-models`
- Define the API parameters: :doc:`define-inputs`
- Write the backend worker: :doc:`write-backend`
69 changes: 24 additions & 45 deletions docs/user-guide/uws/define-inputs.rst
@@ -6,25 +6,8 @@ Defining service inputs

Your UWS service will take one or more input parameters.
The UWS library cannot know what those parameters are, so you will need to define them and pass that configuration into the UWS library configuration.
This is done by writing a FastAPI dependency that returns a list of input parameters as key/value pairs.

What parameters look like
=========================

UWS input parameters for a job are a list of key/value pairs.
The value is always a string.
Other data types are not directly supported.
If your service needs a different data type as a parameter value, you will need to accept it as a string and then parse it into a more complex structure.
See :doc:`define-models` for how to do that.

All FastAPI dependencies provided by your application must return a list of `UWSJobParameter` objects.
The ``parameter_id`` attribute is the key and the ``value`` attribute is the value.

The key (the ``parameter_id``) is case-insensitive in the input, but it will be lowercased by middleware installed by Safir.
You will therefore always see lowercase query and form parameters in your dependency and do not have to handle other case possibilities.

UWS allows the same ``parameter_id`` to occur multiple times with different values.
For example, multiple ``id`` parameters may specify multiple input objects for a bulk operation that processes all of the input objects at the same time.
This is done by writing a FastAPI dependency that returns a Pydantic model for your job parameters.
See :doc:`define-models` for details on how to define that model.

Ways to create jobs
===================
@@ -43,12 +26,12 @@ Sync jobs are not supported by default, but can be easily enabled.
Sync jobs can be created via either ``POST`` or ``GET``.
You can pick whether your application will support sync ``POST``, sync ``GET``, both, or neither.
Supporting ``GET`` makes it easier for people to assemble ad hoc jobs by writing the URL directly in their web browser.
However, due to unfixable web security reasons, ``GET`` jobs can be created by any malicious site on the Internet, and therefore should not be supported if the operation of your service is destructive, expensive, or dangerous if performed by unauthorized people.
However, due to unfixable web security limitations in the HTTP protocol, ``GET`` jobs can be created by any malicious site on the Internet, and therefore should not be supported if the operation of your service is destructive, expensive, or dangerous if performed by unauthorized people.

For each supported way to create a job, your application must provide a FastAPI dependency that reads input parameters via that method and returns a list of `UWSJobParameter` objects.
For each supported way to create a job, your application must provide a FastAPI dependency that reads input parameters via that method and returns the Pydantic model for parameters that you defined in :doc:`define-models`.

Async POST dependency
---------------------
=====================

Supporting async ``POST`` is required.
First, write a FastAPI dependency that accepts the input parameters for your job as `form parameters <https://fastapi.tiangolo.com/tutorial/request-forms/>`__.
@@ -88,22 +71,19 @@ Here is an example for a SODA service that performs circular cutouts:
),
),
] = None,
) -> list[UWSJobParameter]:
"""Parse POST parameters into job parameters for a cutout."""
params = []
for i in id:
params.append(UWSJobParameter(paramater_id="id", value=i))
for c in circle:
params.append(UWSJobParameter(parameter_id="circle", value=c))
return params
) -> CutoutParameters:
return CutoutParameters(
ids=id,
stencils=[CircleStencil.from_string(c) for c in circle],
)
This first declares the input parameters, with full documentation, as FastAPI ``Form`` parameters.

Note that the type is ``list[str]``, which allows the parameter to be specified multiple times.
If the parameters for your service cannot be repeated, change this to `str` (or another appropriate basic type, such as `int`).

You do not need to do any input validation of the parameter values here.
This will be done later as part of converting the input parameters to your parameter model, as defined in :doc:`define-models`.
Then, it converts the form parameters into the Pydantic model for your job parameters.
Here, most of the work is done by the ``from_string`` static method on ``CircleStencil``, defined in :ref:`uws-model-parameters`.
This conversion should also perform any necessary input validation.
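To make the parsing and validation step concrete, here is a minimal sketch of what a ``from_string`` method could look like. It is not the definition from :doc:`define-models`: it uses a standard-library dataclass as a stand-in for the Pydantic model so that it runs on its own, the field names ``ra``, ``dec``, and ``radius`` are assumptions, and it assumes the circle is given as three whitespace-separated numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CircleStencil:
    """Stand-in for the real stencil model (field names are assumed)."""

    ra: float
    dec: float
    radius: float

    @staticmethod
    def from_string(value: str) -> "CircleStencil":
        """Parse a circle given as three whitespace-separated numbers.

        Raises ValueError if the input is malformed, so that bad input
        is rejected before a job is created.
        """
        parts = value.split()
        if len(parts) != 3:
            raise ValueError(f"Expected 3 values in circle, got {len(parts)}")
        try:
            ra, dec, radius = (float(p) for p in parts)
        except ValueError as e:
            raise ValueError(f"Non-numeric value in circle {value!r}") from e
        if radius <= 0:
            raise ValueError("Circle radius must be positive")
        return CircleStencil(ra=ra, dec=dec, radius=radius)


stencil = CircleStencil.from_string("12.5 -45 0.1")
```

In a real service, the ``ValueError`` would normally be translated into whatever usage error the service reports to the client.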

Async POST configuration
------------------------
Expand Down Expand Up @@ -134,9 +114,9 @@ The ``summary`` and ``description`` attributes are only used to generate the API
They contain a brief summary and a longer description of the async ``POST`` route and will be copied into the generated OpenAPI specification for the service.

Sync POST
---------
=========

Supporting sync ``POST`` is very similar: define a FastAPI dependency that accepts ``POST`` parameters and returns a list of `UWSJobParameter` objects, and then define a `UWSRoute` object including that dependency and pass it as the ``sync_post_route`` argument to `UWSAppSettings.build_uws_config`.
Supporting sync ``POST`` is very similar: define a FastAPI dependency that accepts ``POST`` parameters and returns your Pydantic parameter model, and then define a `UWSRoute` object including that dependency and pass it as the ``sync_post_route`` argument to `UWSAppSettings.build_uws_config`.
By default, sync ``POST`` is not supported.

Normally, the input parameters for sync ``POST`` will be the same as the input parameters for async ``POST``, so you can reuse the same FastAPI dependency.
@@ -165,7 +145,7 @@ Here is an example for the same cutout service:
This would then be passed as the ``sync_post_route`` argument.

Sync GET
--------
========

Supporting sync ``GET`` follows the same pattern, but here you will need to define a separate dependency that takes query parameters rather than form parameters.
Here is an example dependency for a cutout service:
@@ -202,14 +182,14 @@ Here is an example dependency for a cutout service:
),
),
],
request: Request,
) -> list[UWSJobParameter]:
"""Parse GET parameters into job parameters for a cutout."""
return [
UWSJobParameter(parameter_id=k, value=v)
for k, v in request.query_params.items()
if k in {"id", "circle"}
]
) -> CutoutParameters:
return CutoutParameters(
ids=id,
stencils=[CircleStencil.from_string(c) for c in circle],
)
The body here is identical to the body of the dependency for ``POST``.
The difference is in how the parameters are defined (``Query`` vs. ``Form``).

As in the other cases, you will then need to pass a `UWSRoute` object as the ``sync_get_route`` argument to `UWSAppSettings.build_uws_config`.
Here is an example:
@@ -238,5 +218,4 @@ This would then be passed as the ``sync_post_route`` argument.
Next steps
==========

- Define the parameter models: :doc:`define-models`
- Write the backend worker: :doc:`write-backend`