Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider setting up Alembic for Data Migrations #584

Open
maxachis opened this issue Jan 20, 2025 · 4 comments
Open

Consider setting up Alembic for Data Migrations #584

maxachis opened this issue Jan 20, 2025 · 4 comments

Comments

@maxachis
Copy link
Contributor

maxachis commented Jan 20, 2025

Having learned a number of lessons from the initial database migration setup, I think it's worth converting to Alembic.

Rationale is as follows:

  1. Alembic is a popular database migration tool with a lot of community support.
  2. Alembic automatically enables breaking up revisions into separate revision files, which can be alternately upgraded or downgraded as necessary. This would make it easier for us to reverse changes.
  3. Migrations can be individually tested to ensure expected behavior (i.e., that certain data changes form as expected).

Obstacles/Required Steps

  1. Alembic is tightly coupled with SQLAlchemy, and our current SQLAlchemy models do not perfectly mirror the database -- there might be some information (such as certain enum types and constraints, etc.) that are currently hidden in the bowels of the database and not reflected in the app models. These would need to be accounted for and adjusted.
  2. Logic such as the creation of test database users would need modified.
  3. Logic such as our existing Jenkins migration jobs would also need modfied.
@maxachis
Copy link
Contributor Author

maxachis commented Jan 22, 2025

Alembic has now been set up in the dev version of the database.

The next step would be to set up the in-app startup logic in v2 production (and stage as well). Because this has already been done in the Source Collector App, we do have a framework for which to do this.

The plan for this would be as follows:

  • If we don't yet have the Data Sources App containerized, we probably will want to look into doing that to make these steps easier (Done in dev)
    Following steps would probably require v2 Prod to be the new source of truth.
  • Modify the existing container startup jobs for Prod and Stage to incorporate Alembic. Again, using Source Collector App as a model for this (Done for Stage, not yet for v2 Prod)
  • Likely will need to set up a GitHub action to run alembic migrations using a user with elevated permissions. 1
  • Any additional database changes will have to be done via Alembic rather than through the Prod-To-Dev migration repository. The Prod-To-Dev repository would thus become deprecated, although the backup management functionality would still remain in place.

Edit: How we run the migrations has changed, see comments below.

Footnotes

  1. This is in contrast with the Source Collector app, where migrations were run on app startup in digital ocean. We probably want a separate line of logic with separate connection, because otherwise we'd have our API running with elevated permissions, which is probably not kosher, security-wise.

@josh-chamberlain
Copy link
Contributor

@maxachis obstacle number 1 seems like a hidden blessing to me, anyway!

@maxachis
Copy link
Contributor Author

The Data Source API has been containerized, and successfully set up on data-sources-v2-dev.

The current alembic migration logic will likely need modifying, but how it's modified will differ depending on if we're doing stage or production:

Stage

Because the stage database will regularly refresh from production, we need to be able to apply that alembic migration after the refresh has occurred. Thus, the Jenkins Prod-To-Stage automation job will need to implement the alembic migration, or else trigger a separate job which performs that migration.

Production

Because production will be the source of truth, we want alembic migrations to occur in a way that is synchronized with the app updates. That will likely require a GitHub action (perhaps one which triggers a Jenkins job, just to keep things consistent) that activates following a push into main.

@maxachis
Copy link
Contributor Author

The alembic migrations have been tested successfully on Stage! I added a small alembic migration, and the combination of the automation jobs and app logic caused it to show up in the stage database. Woohoo 🎉

At this point, all that remains is prod. I've set up an automation job for Prod Alembic migration, which would run right after a v1 to v2 migration, but I can't run it yet because the latest version of v2 main doesn't have the alembic changes. So here are the steps remaining:

  • Merge alembic changes into v2 main
  • Enable the Prod Alembic Migration job and test that it works.
  • Cut the v1 cord and have v2 Prod as our source of truth,
  • Adjust Prod Alembic Migration so that it runs when something is pushed to the main branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants