Backend documentation

Core technologies

CZGE's Backend application is built with the following stack:

  • FastAPI is an async Python API framework
    • pydantic is a data validation library used to de/serialize and validate API requests and responses (sketched briefly below)
  • SQLAlchemy is an ORM
  • Alembic is a tool for managing database migrations
  • pytest is a testing framework

In addition, it depends on some core external tools for data storage:

  • PostgreSQL is an RDBMS that stores most of our metadata
  • Amazon S3 is a blob storage service that stores most of our job output
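As a quick illustration of the validation pattern pydantic provides, here is a minimal sketch; the model and field names are hypothetical, not models from this repo:

# Hypothetical pydantic (v1-style) model illustrating de/serialization and validation.
from pydantic import BaseModel, ValidationError


class SampleIn(BaseModel):
    private_identifier: str
    collection_date: str


# Valid input is parsed into a typed object...
sample = SampleIn.parse_obj({"private_identifier": "abc", "collection_date": "2021-01-01"})

# ...and invalid input raises a ValidationError (FastAPI converts request validation
# failures like this into 422 responses automatically).
try:
    SampleIn.parse_obj({"private_identifier": "abc"})
except ValidationError as err:
    print(err)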

App structure

The root of our backend application is in src/backend. Noteworthy subdirectories are annotated here:

  • aspen - Most of our backend application lives here
    • api - entrypoint for our backend API service
      • schemas - API input/output validation models live here
      • utils - Code used by multiple endpoints (this could be better organized)
      • views - API endpoint code lives here
    • workflows - code related to our compute jobs
    • database - code related to establishing DB connections / sessions
      • models - SQLAlchemy models used across our backend application
  • database_migrations - alembic migrations
  • etc - some basic setup configuration for our backend container

Discoverability

Our backend API is self-documenting via OpenAPI and JSON-Schema; with the backend running, FastAPI serves interactive API docs (by default at /docs, with the raw schema at /openapi.json).

Creating a new API endpoint

Some endpoints will only need the last few steps, but this is the maximal case:

  1. Create a new module in src/backend/aspen/api/views and add a router:
# src/backend/aspen/api/views/new_module.py
from fastapi import APIRouter
router = APIRouter()
  2. Import it into src/backend/aspen/api/main.py and register its router:
# src/backend/aspen/api/main.py
from aspen.api.views import new_module
...
    _app.include_router(
        new_module.router,
        prefix="/v2/new_module",
        dependencies=[Depends(get_auth_user)],  # If all endpoints require authentication
    )
  3. Add any new pydantic validation schemas as necessary:
# src/backend/aspen/api/schemas/new_module.py
from typing import List

from aspen.api.schemas.base import BaseRequest, BaseResponse


class NewRequest(BaseRequest):
    foo: str

class NewResponseItem(BaseResponse):
    id: int
    bar: str

class NewResponseList(BaseResponse):
    rows: List[NewResponseItem]
  4. Add endpoints to your view module.
# src/backend/aspen/api/views/new_module.py
import sqlalchemy as sa
from fastapi import APIRouter, Depends
from sqlalchemy.ext.asyncio import AsyncSession

from aspen.api.auth import get_auth_user
from aspen.api.deps import get_db, get_settings
from aspen.api.schemas.new_module import NewRequest, NewResponseList
from aspen.api.settings import Settings
from aspen.database.models import User, SomeModel

router = APIRouter()


# Specify the HTTP method, sub-path (appending to the path added to main.py) and response model type.
# The response model type here populates our API documentation.
@router.get("/", response_model=NewResponseList) # All endpoints that return a list must end with a trailing slash
async def list_items(
    request: NewRequest, # GET requests often don't need input model validation, but this is here for completeness.
    db: AsyncSession = Depends(get_db),
    settings: Settings = Depends(get_settings),
    # If this endpoint requires authentication, we need to make sure to depend on get_auth_user
    # or get_admin_user here to validate the user's credentials and return 401/403 responses
    # if their credentials are invalid. For most endpoints, get_auth_user is added at the root
    # router in aspen/api/main.py, and it only needs to be included again here if we want to
    # *use* the user object in our endpoint.
    user: User = Depends(get_auth_user),
) -> NewResponseList:

    rows = (await db.execute(sa.select(SomeModel).filter(SomeModel.somefield == request.foo))).scalars().all()
    return NewResponseList.parse_obj({"rows": rows})
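Because main.py mounts this router under the /v2/new_module prefix and the handler above is registered at "/", the endpoint is served at GET /v2/new_module/ and automatically appears in the generated OpenAPI documentation.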

Database concerns

Interacting with the local database in SQL

You can connect to a psql console against your local database by running:

aspen% make local-pgconsole
psql (13.1)
Type "help" for help.

aspen_db=> select * from aspen.users;
 id | name | email | auth0_user_id | group_admin | system_admin | group_id
----+------+-------+---------------+-------------+--------------+----------
(0 rows)

aspen_db=>

Interacting with the local database in Python

You can also interact with the local database from an IPython shell:

aspen% make local-dbconsole
docker-compose exec backend aspen-cli db --local interact
Python 3.9.1 (default, Feb  9 2021, 07:55:26)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.21.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: session = engine.make_session()

In [2]: session.query(User).all()
Out[2]: []

In [3]:

Profiling SQLAlchemy queries

Interactive profiling

aspen-cli db interact has a --profile option that prints out every query that's executed and how long each takes:

aspen% make local-dbconsole-profile
docker-compose exec backend aspen-cli db --local interact --profile
Python 3.9.1 (default, Feb  9 2021, 07:55:26)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.21.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: session = engine.make_session()

In [2]: results = session.query(Sample).all()
DEBUG:sqltime:Start Query: SELECT aspen.samples.id AS aspen_samples_id, aspen.samples.submitting_group_id AS aspen_samples_submitting_group_id, aspen.samples.private_identifier AS aspen_samples_private_identifier, aspen.samples.original_submission AS aspen_samples_original_submission, aspen.samples.public_identifier AS aspen_samples_public_identifier, aspen.samples.sample_collected_by AS aspen_samples_sample_collected_by, aspen.samples.sample_collector_contact_email AS aspen_samples_sample_collector_contact_email, aspen.samples.sample_collector_contact_address AS aspen_samples_sample_collector_contact_address, aspen.samples.authors AS aspen_samples_authors, aspen.samples.collection_date AS aspen_samples_collection_date, aspen.samples.location AS aspen_samples_location, aspen.samples.division AS aspen_samples_division, aspen.samples.country AS aspen_samples_country, aspen.samples.organism AS aspen_samples_organism, aspen.samples.host AS aspen_samples_host, aspen.samples.purpose_of_sampling AS aspen_samples_purpose_of_sampling, aspen.samples.specimen_processing AS aspen_samples_specimen_processing
FROM aspen.samples
DEBUG:sqltime:Query Complete!
DEBUG:sqltime:Total Time: 0.012416

In [3]:

Profiling in code

We previously had nicer in-code DB profiling, but it appears to have broken at some point. In the meantime, if you only need very basic profiling, you can add an echo=True argument to whatever runs the create_engine() call (a sketch follows) and remove it again when you're done. If you need something more complete, you'll probably want to fix the old profiling helpers described in the FIXME below. Also check out these docs on SQLAlchemy for more guidance/context.
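A minimal sketch of that temporary tweak, assuming a standard create_engine() call; the connection URL below is a placeholder, so apply the flag to whichever call actually builds the engine in this repo:

# Temporary, very basic query logging: echo=True makes SQLAlchemy log every emitted statement.
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://user_rw:password_rw@localhost:5432/aspen_db",  # placeholder URL
    echo=True,  # remove again once you're done profiling
)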

FIXME (Vince): As of Nov 14, 2022, the following does not appear to work. The code is present, but invoking enable_profiling, etc. has no effect right now. We should fix that. --- OLD README: The module aspen.database.connection contains a number of methods to manage the capture of queries issued by SQLAlchemy. enable_profiling()/disable_profiling() can be used to enable and disable profiling, and enable_profiling_ctx() is a context manager that wraps a block of code that requires profiling.

Autogeneration of schema migration

  • after modifying/adding any database table/schema code, run:
    • make backend-alembic-autogenerate MESSAGE="descriptive message"
  • this will create a migration file under src/backend/database_migrations (a typical autogenerated file is sketched below)
    • make sure you look this file over and verify that alembic made the appropriate changes
  • run make backend-alembic-upgrade-head
    • this updates your local running database; use make local-pgconsole or make local-dbconsole to check that the changes were applied appropriately!
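For orientation, an autogenerated migration file looks roughly like this (the revision IDs and the added column are hypothetical placeholders, not an actual migration from this repo):

"""add notes column to samples

Revision ID: 20221114abcdef
Revises: 20221001fedcba
Create Date: 2022-11-14 00:00:00.000000
"""
import sqlalchemy as sa
from alembic import op

# revision identifiers, used by Alembic.
revision = "20221114abcdef"
down_revision = "20221001fedcba"
branch_labels = None
depends_on = None


def upgrade():
    # Generated by diffing the SQLAlchemy models against the current database schema.
    op.add_column(
        "samples", sa.Column("notes", sa.String(), nullable=True), schema="aspen"
    )


def downgrade():
    op.drop_column("samples", "notes", schema="aspen")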

Updating Python dependencies

To update Python dependencies, update the Pipfile and run make update-deps. This will update Pipfile.lock and requirements.txt (used by setup.py). You will also need to rebuild the local running docker containers by running make local-rebuild.

If you add a third-party library (directly or indirectly) that does not support Python typing, you may need to add an entry to mypy.ini to let mypy know not to expect type hints for that library.
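For example, an entry of this shape tells mypy not to expect type hints from a hypothetical untyped library called somelib:

# mypy.ini (hypothetical entry)
[mypy-somelib.*]
ignore_missing_imports = True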

Adding new users to the staging/prod db

Create a new user in the Auth0 covidtracker tenant and take note of the Auth0 user ID.

  • connect to the staging or prod aspen_db

    • make remote-pgconsole ENV=<staging|prod> DB=aspen_db
  • execute the insert SQL:

    • aspen_db=> INSERT INTO users (name, email, auth0_user_id, group_admin, system_admin, group_id) VALUES ('<name>', '<email>', '<auth0 user ID>', 'f', 't', <group ID>);
      • to see all possible group ids:
        • select * from groups;

How to use aspencli

The CLI is useful for calling API endpoints from the terminal. To start using it, you must be logged into rdev, staging, or prod with your aspen system admin account.

Example endpoint call to update public IDs based on a private-to-public ID mapping CSV file (column headers must be named private_identifier,public_identifier, with no line numbering; a minimal example file is shown after the command):

  • python src/cli/aspencli.py --env <local|staging|prod|rdev> samples update_public_ids --group-id 1 --private-to-public-id-mapping ~/Downloads/test_rename_public_identifiers.csv

    • if using rdev, also specify the stack name with the --stack <stack-name> flag
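A minimal example of the mapping CSV (the identifier values here are hypothetical):

private_identifier,public_identifier
my-private-id-001,my-public-id-001
my-private-id-002,my-public-id-002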