Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrations along with first batch of data should be performed in a single transaction #92

Open
mrsarm opened this issue May 20, 2021 · 1 comment
Labels

Comments

@mrsarm
Copy link
Contributor

mrsarm commented May 20, 2021

Each script and each batch of data is synchronized in a different transaction, so if we introduce a bug like the unique index added recently into the telemetry view, the process is aborted while synchronizing the first batch of data but the changes are not rolled back, so in that case the bugs were actually 2: the index definition was wrong and the process should roll back the last migrations and set of data.

How it should work

  1. A connection and a transaction with PG is created and shared across all the "postgrator" objects that execute the migrations
  2. Then the same transaction object is passed to couch2pg object to perform the first batch of synchronization.
  3. If the are an error either in the step 1 or 2 the transaction is aborted. If not, the transaction is committed.
  4. At some point medic-couch2pg will try to synchronize data again, a new transaction need to be created, because we cannot persist infinitely the transaction created before. Also the transaction should be shared among all the couch2pg objects that sync data, because we are synchronizing the same PG database against 3 different Couch databases (medic, medic-sentinel and medic-users-meta), and each synchronization is performed by different couch2pg objects.

Considerations

  • This may slow down the sync process, we should perform some stress tests. By the other hand it shouldn't be a big problem if sync is performed in a time where the data is little used.
  • May also require more disc space in the PG instances, specially when materialized views with big chunk of data are recreated.
  • Not 100% sure that materialized views are kept in transactions, at a first glance looks like they are supported within transactions.

CC @garethbowen @kennsippell

@mrsarm mrsarm added the bug label May 20, 2021
@mrsarm
Copy link
Contributor Author

mrsarm commented May 20, 2021

Also another important point to move to transactions: if we implement it, and we do not perform any more CASCADE migrations, and instead we DROP first the elements one by one that we control in the migrations, we will avoid the problem of cascading undesired drops, eg. a partner add a view that depends of one view maintained here, the process will fail because the partner view depends on the dropped view, the transaction will be reverted back and no undesired deletions will happen.

However this won't avoid other undesired consequences, eg. the partner edited one view controlled here and when medic-couch2pg recreates it with new changes, nothing will prevent the change to be lost, that kind of issues need to be addressed handling outside changes in an accountable manner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant