Add Schema migration process documentation and update alembic script notes
Vebop committed Nov 25, 2024
1 parent 64b1cc5 commit f4b66f1
Showing 11 changed files with 378 additions and 40 deletions.
22 changes: 11 additions & 11 deletions alembic-autogenerate.py
@@ -2,26 +2,26 @@

#
# How to use this script:
# 1. Load the LSST environment and setup sdm_schemas and felis.
# source loadLSST.bash
# setup felis
# setup -r /path/to/sdm_schemas
# 1. Install Felis and sdm_schemas, set environment variables:
# pip install lsst-felis testing.postgresql alembic sqlalchemy pyyaml
# git clone https://github.com/lsst/sdm_schemas
# cd sdm_schemas
# export SDM_SCHEMAS_DIR=`pwd`
# 2. From the root of the consdb git repo, invoke the script. Supply a
# revision message as the command line argument:
# python alembic-autogenerate.py this is my revision message "\n" \
# the message can span multiple lines "\n" \
# if desired
# python alembic-autogenerate.py DM-12345
# 3. Heed the message at the end to revise your auto-generated code as needed.
# 4. Remove the autogenerated creation of sql views (visit1, ccdvisit1).
#

import os
import sys

from alembic.config import Config
from alembic import command
from felis.tests.postgresql import setup_postgres_test_db
from sqlalchemy.sql import text

from felis.tests.postgresql import setup_postgres_test_db
from alembic import command
from alembic.config import Config

if len(sys.argv) <= 1:
print(
@@ -68,7 +68,7 @@
Don't forget to edit your migration
files! You'll need to remove the visit1
and ccdvisit1 tables, and you might need
to shuffle data around to accomodate the
to shuffle data around to accommodate the
new schema!
==========================================
"""
2 changes: 2 additions & 0 deletions doc/.gitignore
@@ -8,3 +8,5 @@ doxygen.conf
# Sphinx products
_build
py-api

*.DS_Store
2 changes: 1 addition & 1 deletion doc/developer-guide/consdbclient-summit-utils.rst
@@ -2,4 +2,4 @@
ConsDbClient in summit_utils
############################

How to write and test code in summit_utils for ConsDbClient
How to write and test code in summit_utils for ConsDbClient
42 changes: 40 additions & 2 deletions doc/operator-guide/deployment.rst
@@ -2,5 +2,43 @@
Deployment
###########

* Database
* REST API Server
Database
========

- Deployments of the Consolidated Database are currently located at

- Summit
- USDF (production and dev share the same underlying database, which is a replica of the Summit database)
- Base Test Stand (BTS)
- Tucson Test Stand (TTS)

- Updates to these deployments may be needed when the schema changes for any of the ``cdb_*`` tables defined in <link to> sdm_schemas.

Tools:
------

- Argo-CD
- LOVE
- Felis

Repositories:
-------------

- ``phalanx`` (https://github.com/lsst-sqre/phalanx)
- ``sdm_schemas`` (https://github.com/lsst/sdm_schemas)
- ``consdb`` (https://github.com/lsst-dm/consdb)

Access needed:
--------------

- NOIRLab VPN
- Summit VPN
- USDF

Process:
--------



REST API Server
===============
28 changes: 26 additions & 2 deletions doc/operator-guide/monitoring.rst
@@ -2,5 +2,29 @@
Monitoring
###########

* Database
* REST API Server
Reporting channels
==================

- Users of ConsDB and ConsDbClient (``pqserver``) usually report issues in #consolidated-database on rubin-obs.slack.com.
- ConsDB operators should monitor this channel, along with #ops-usdf and #ops-usdf-alerts, for reported issues and outages, and escalate verified database issues.

Database
========

- The ConsDB team is responsible for verifying whether the database is up when issues are reported.
- They can reproduce the problem with the method the users reported, check with ``psql``/``pgcli``, and look in the #ops-usdf Slack channel for currently reported issues.

- Once the ConsDB team has confirmed there is an issue with the database, they should notify the #ops-usdf Slack channel; the USDF DBAs are responsible for fixing or restarting it.
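
As a first triage step before opening ``psql`` or ``pgcli``, a reachability check can be sketched as below. This is only an illustration: the host name and port in the usage example are placeholders, not real ConsDB endpoints, and an open port does not prove the database is healthy (a real check would still run a query).

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical endpoint; substitute the real database host and port.
    host, port = "consdb-db.example.org", 5432
    state = "reachable" if is_port_open(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

If the port is unreachable from several networks, that points toward infrastructure rather than ConsDB itself, which is useful context when notifying #ops-usdf.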

REST API Server
===============

- If we suspect the API server has died, the ConsDB team is responsible for checking and restarting it.
- Use the appropriate Argo-CD deployment graph to check deployment logs and, if needed, restart the service.


Other issues
------------

- If the K8s infrastructure has died, the ConsDB team can verify that this is the problem, but wider issues are likely to be visible.
- USDF or Summit K8s/IT support is responsible for fixing it.
113 changes: 112 additions & 1 deletion doc/operator-guide/runbook.rst
@@ -2,4 +2,115 @@
RunBook
########

Maybe from ConsDb Usage Confluence page?
https://rubinobs.atlassian.net/wiki/spaces/LSSTOps/pages/45665320/Consolidated+Database+ConsDB+Runbook+draft+incomplete

Overview
========

This application does ...

Its design and architecture are documented at ...

Usage
=====

Most users
----------

Administration
--------------

Architecture
============

Kubernetes vclusters used

Relevant policies

S3DF Dependencies
-----------------

Kubernetes
Weka storage for Kubernetes
...

Systems
-------

Components, Kubernetes namespaces, deployments

Backups
-------

Associated Systems
------------------

IAM
===

Requesting Access
-----------------

Key Roles
---------

Service Accounts
----------------

Network
=======

External endpoints, IP and port, encryption, authentication, clients, API

SLAC-internal endpoints, IP and port, encryption, authentication, clients, API

Configuration
=============

GitHub repos with deployments

Monitoring
==========

Grafana or other links

Maintenance
===========

Periodic tasks

Documentation and Training
==========================

- Links to documentation and training resources

- Here: (https://consdb.lsst.io)

Support
=======

#consolidated-database

Overall complaints:
-------------------

Kian-Tat Lim

``consdb`` services (hinfo, pqserver):
--------------------------------------

Brian Brondel, Valerie Becker

Transformed EFD component:
--------------------------

Rodrigo Boufleur, Glauber Costa Vila Verde

``consdb`` component in Jira.


Known Issues
============

Standard Procedures
===================
120 changes: 115 additions & 5 deletions doc/operator-guide/schema-migration-process.rst
@@ -2,8 +2,118 @@
Schema Migration Process
########################

* Add columns to sdm_schemas
* Create alembic migration
* Test migration and code to populate the new columns/tables at TTS/BTS if Summit schema is changing
* Deploy migration in synchrony at Summit (if necessary), USDF, and Prompt Release (if necessary)
* Deploy code to populate at Summit and/or USDF
Add columns to sdm_schemas
==========================

- Record the requested database additions, their justification, and where the values are generated in our `Confluence entry table <https://rubinobs.atlassian.net/wiki/spaces/DM/pages/246644760/Consolidated+Database+Non-EFD+Entries>`_.
- Create a ticket and edit the repository at https://github.com/lsst/sdm_schemas to apply your schema changes to any of the ``cdb_*.yml`` schemas.
- If your sdm_schemas PR has issues, check that the schema conforms to Felis's data model and that valid SQL tables can be created, using `felis validate/create <https://felis.lsst.io/user-guide/cli.html#felis-validate>`_.
- Alembic migrations should be automatically created by a git workflow after your sdm_schemas pull request completes.

Create an Alembic Migration (manually)
======================================

- Alembic (https://alembic.sqlalchemy.org/en/latest/front.html) tracks schema versions through autogenerated migrations, which keep the test stand and Summit databases in sync.
- Versioning our database schema changes allows us to apply edits and move the database’s state forward or backward as needed.

1. Create an Alembic migration on your ConsDB ticket branch.
2. Use the script ``consdb/alembic-autogenerate.py`` to generate Alembic migrations.
3. Follow the directions in the header of the script, then run ``python alembic-autogenerate.py`` to create version files in respective database-named directories in ``consdb/alembic/``.
4. Manually edit the generated files in ``consdb/alembic/<database-name>/`` to:

- Remove the ``visit1`` and ``ccdvisit1`` views.
- Ensure constraints and renamed columns are correct.
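
The manual review in step 4 can be supported by a small check that flags lines mentioning the two views. The sketch below is an illustration only, not part of ``alembic-autogenerate.py``: the helper name ``find_view_references`` and the sample migration text are hypothetical, and it assumes the generated code refers to the views by quoted name.

```python
import re

# Names of the SQL views that must not be created or dropped as tables.
VIEW_NAMES = ("visit1", "ccdvisit1")

# Match a view name only when it is a complete quoted string,
# e.g. "visit1" or 'ccdvisit1', so "visit1_extra" is not flagged.
_VIEW_PATTERN = re.compile(r"""["'](%s)["']""" % "|".join(VIEW_NAMES))


def find_view_references(migration_source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) pairs that mention a view name."""
    hits = []
    for number, line in enumerate(migration_source.splitlines(), start=1):
        if _VIEW_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical excerpt of an autogenerated migration file.
    example = (
        "def upgrade() -> None:\n"
        '    op.add_column("exposure", sa.Column("new_col", sa.Float()))\n'
        '    op.create_table("visit1", sa.Column("visit_id", sa.BigInteger()))\n'
        '    op.create_table("ccdvisit1", sa.Column("ccdvisit_id", sa.BigInteger()))\n'
    )
    for number, line in find_view_references(example):
        print(f"line {number}: {line}")
```

A check like this only locates candidates; the surrounding multi-line operation still has to be removed by hand, and constraints and renamed columns still need a human review.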

Test alembic migration
======================
- Test the migration against the TTS database using your ConsDB branch before merging or applying the migration at the Summit.
- This testing should include testing any code that populates the new columns/tables at TTS/BTS if Summit schema is changing.

1. Update the deployment on the test stand:
-------------------------------------------

- Choose the appropriate test stand (TTS, BTS)
- Create a branch in ``phalanx`` and edit the corresponding test stand environment file ``phalanx/applications/consdb/values-<test stand>.yaml`` to point to your branch's built docker image (tickets-DM-###).
- Coordinate and announce in the appropriate slack channel that you will begin testing your migrations.
- Update the ConsDB deployment in ``<url.to.teststand>/argo-cd`` to use your ``phalanx`` branch in the ``Target Revision``. Refresh and check pod logs.
- Verify with ``psql`` that the tables you will be upgrading exist.
- From the ``consdb/`` directory (where the ``alembic.ini`` file is), use Alembic to upgrade the existing database tables: ``alembic -n <database name> upgrade head``
- Deploy new ConsDB software (``hinfo``, ``pqserver``) and check the initial logs.
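
When several databases need the same upgrade, the Alembic invocation can be wrapped in a small helper. This is a sketch under assumptions: the database name ``latiss`` is a placeholder for a section name in ``alembic.ini``, and the helper names are hypothetical. Alembic documents ``-n``/``--name`` as a global option, so the sketch places it before the ``upgrade`` subcommand.

```python
import subprocess


def build_upgrade_command(database_name: str, revision: str = "head") -> list[str]:
    """Build the argv that upgrades one named Alembic environment."""
    if not database_name:
        raise ValueError("database_name must be a non-empty string")
    # -n selects the alembic.ini section; it precedes the subcommand.
    return ["alembic", "-n", database_name, "upgrade", revision]


def run_upgrade(database_name: str, dry_run: bool = True) -> list[str]:
    """Return the command, and execute it unless dry_run is set."""
    command = build_upgrade_command(database_name)
    if not dry_run:
        # check=True raises CalledProcessError if alembic exits non-zero.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Dry run: only show what would be executed from the consdb/ directory.
    print(" ".join(run_upgrade("latiss")))
```

Keeping ``dry_run=True`` as the default means the command is printed for review first, which suits a procedure that is announced and coordinated before it is applied.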

2. Test with LATISS imaging in ATQueue:
---------------------------------------

- Access LOVE via ``<url.to.teststand>/love`` and use the 1Password admin information to sign in, or your SLAC username and password.
- Navigate to the ATQueue or Auxiliary Telescope (AuxTel) Script Queue.
- See the `TTS Start Guide <https://rubinobs.atlassian.net/wiki/spaces/LSSTCOM/pages/53739987/Tucson+Test+Stand+Start+Guide>`_ for guidelines on using the test stands.
- Before editing these scripts, note their starting configurations, as we will return the configuration to that when we are done.
- Take a test/simulated picture with LATISS through the ATQueue using these three scripts:

1. ``set_summary_state.py``: Change the configuration to set ATHeaderService and ATCamera to ENABLED.
2. ``enable_latiss.py``: Remove any existing configuration.
3. ``take_image_latiss.py``: Update the configuration to remove anything that is not 'nimages' (1) and 'image_type' (BIAS or DARK or FLAT).

- Once you have put these three scripts in the queue, click ``run``.
- Watch for errors in the Script Queue and the Argo-CD ConsDB pod logs and ``hinfo-latiss`` deployment.
- Address any errors and retest.
- Check the database with ``psql``: use ``\dt`` to list the table names, and a query such as ``SELECT * FROM cdb_latiss.exposure WHERE day_obs = <YYYYMMDD>;`` to view the most recent data.

- Run ``set_summary_state.py`` to set ATHeaderService and ATCamera back to STANDBY, and return LATISS to STANDBY.
- Then return these three scripts to their original configurations.

- If you have encountered errors in this process, do not proceed to the summit, but address those errors and retest them with your ``phalanx`` branch pointing to your ConsDB branch with the updates that fix the errors.


- If tests are successful, create a pull request for the Alembic migration in ConsDB. Tag the release according to ``standards-practices`` guidelines.
- Update your existing phalanx branch to point the environment based deployments to this ConsDB tag.
- You can retest on the test stand at this point. If your ConsDB pull request did not change during review, this step is trivial.
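
The database check described in the test procedure above (a ``SELECT`` on ``cdb_latiss.exposure`` filtered by ``day_obs``) can also be scripted. The sketch below is a stand-in, not the ConsDB client: it uses an in-memory SQLite database attached under the name ``cdb_latiss`` so that it runs anywhere, the ``exposure_id`` column and all sample values are assumptions, and the ``?`` placeholder style is SQLite's (a PostgreSQL driver would use its own).

```python
import sqlite3


def rows_for_day(connection: sqlite3.Connection, day_obs: int) -> list[tuple]:
    """Return exposure rows for one observing day, newest first."""
    cursor = connection.execute(
        "SELECT exposure_id, day_obs FROM cdb_latiss.exposure "
        "WHERE day_obs = ? ORDER BY exposure_id DESC",
        (day_obs,),  # bound parameter instead of string formatting
    )
    return cursor.fetchall()


def make_demo_connection() -> sqlite3.Connection:
    """Build an in-memory stand-in for the cdb_latiss schema."""
    connection = sqlite3.connect(":memory:")
    # Attaching a second database lets "cdb_latiss.exposure" resolve.
    connection.execute("ATTACH DATABASE ':memory:' AS cdb_latiss")
    connection.execute(
        "CREATE TABLE cdb_latiss.exposure "
        "(exposure_id INTEGER PRIMARY KEY, day_obs INTEGER)"
    )
    connection.executemany(
        "INSERT INTO cdb_latiss.exposure VALUES (?, ?)",
        [
            (2024112500001, 20241125),
            (2024112500002, 20241125),
            (2024112400001, 20241124),
        ],
    )
    return connection


if __name__ == "__main__":
    demo = make_demo_connection()
    for row in rows_for_day(demo, 20241125):
        print(row)
```

Note the single ``=`` in the ``WHERE`` clause: standard SQL, and PostgreSQL in particular, uses ``=`` for equality comparison.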


Deploy migration in synchrony at Summit (if necessary), USDF, and Prompt Release (if necessary)
-----------------------------------------------------------------------------------------------

- What is prompt release?


Deploy code to populate db at Summit and/or USDF
------------------------------------------------

- Follow the steps above for testing the Alembic migration and code at TTS/BTS before you consider deploying at the summit.

- The steps to deploy at the summit mirror the steps to test on a test stand with coordination and permission from the observers and site teams.
- Access to argo-cd deployments is available via the Summit OpenVPN.
- To coordinate your deployment update on the summit, you must attend the Coordination Activities Planning (CAP) meeting on Tuesday mornings and announce your request.

- Add it to the agenda here: https://rubinobs.atlassian.net/wiki/spaces/LSSTCOM/pages/53765933/Agenda+Items+for+Future+CAP+Meetings

- The CAP members may tell you a time frame that is acceptable for you to perform these changes.

- They may also tell you specific people to coordinate with to help you take images to test LATISS and LSSTCOMCAMSIM tables. There will be more tables to test eventually.
- Channels to note: #rubinobs-test-planning; #summit-announce; #summit-auxtel, https://obs-ops.lsst.io/Communications/slack-channel-usage.html.

- When you get your final approval and designated time to perform the changes to ConsDB, announce on #summit-announce, and follow similar steps as test stand procedure above.

USDF Deployment Steps
---------------------

- Must happen in synchrony with Summit deployment

1. Disable (pause) SUBSCRIPTION at USDF. (in what table)
2. Perform the migration at the summit with the steps below before step 3.
3. Connect to the USDF database via psql and perform the alembic migration.
4. Check or test as agreed upon with the ConsDB team.
5. Enable and Refresh Subscription at USDF.
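
The ordering of these steps can be written down as a checklist generator. This sketch rests on an assumption the text does not confirm: that "SUBSCRIPTION" refers to a PostgreSQL logical replication subscription managed with ``ALTER SUBSCRIPTION``. The subscription name in the usage example is a placeholder, and the function only produces SQL text for review; it does not connect to any database.

```python
def subscription_migration_plan(subscription: str) -> list[str]:
    """Return the ordered SQL statements and reminders for a USDF migration."""
    # Reject anything that is not a plain identifier, since the name is
    # interpolated into SQL text.
    if not subscription.isidentifier():
        raise ValueError(f"not a plain identifier: {subscription!r}")
    return [
        # Step 1: pause replication from the Summit.
        f"ALTER SUBSCRIPTION {subscription} DISABLE;",
        # Steps 2-4: done outside SQL, kept here as a reminder comment.
        "-- migrate at the Summit, run the alembic migration at USDF, then verify",
        # Step 5: resume replication; PostgreSQL requires the subscription
        # to be enabled before REFRESH PUBLICATION.
        f"ALTER SUBSCRIPTION {subscription} ENABLE;",
        f"ALTER SUBSCRIPTION {subscription} REFRESH PUBLICATION;",
    ]


if __name__ == "__main__":
    # Hypothetical subscription name; use the one configured at USDF.
    for statement in subscription_migration_plan("consdb_usdf_sub"):
        print(statement)
```

Printing the plan rather than executing it keeps the operator in control of timing, which matters because the USDF migration has to happen in synchrony with the Summit.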

If there is no impact or coordination with Summit needed: Run alembic migration at USDF, and test as appropriate.

Summit Deployment Steps
-----------------------

1. Use a branch in ``phalanx`` to point to the ConsDB tag for deployment.
2. Set the target revision of the Argo-CD application ``consdb`` to your ``phalanx`` branch.
3. Refresh the ConsDB application and review pod logs.
4. Connect to the summit database via psql and perform the alembic migration.
5. Have an image taken with the observing team, then verify database entries with a SQL query or Jupyter notebook.
6. Check your new entries in the database using a Jupyter notebook or SQL query in RSP, showing that your new image has been inserted into the database as expected.

- Once deployment succeeds, set the ``Target Revision`` in Argo-CD back to ``main`` and complete the ``phalanx`` PR for the tested ConsDB tag.
2 changes: 1 addition & 1 deletion doc/user-guide/consdb-client-library-in-summit-utils.rst
@@ -2,4 +2,4 @@
ConsDB Client Library in summit_utils
######################################

Querying using ConsDbClient
Querying using ConsDbClient