(re)Organize sync vs async propagation: actual delay vs requirements analysis #1181

Open
smadbe opened this issue Oct 1, 2024 · 2 comments
Labels
Definition in Progress Issue definition not completed

Comments

@smadbe
Contributor

smadbe commented Oct 1, 2024

Preamble

  • I will not quote all the discussions that we have had over the last few months, but of course, there is a lot of context for this discussion
  • "propagation", used in a general sense, covers group-ancestor, item-ancestor, permission and result propagation
  • the list of propagations applied by service

Need for immediate change vs duration

  • some propagations may be quite long (let's say >1 sec), some may not be long at all (e.g., just updating one's own score does not cause thousands of entries to be updated)
  • some propagations have an effect on the current user, some do not: the user is more willing to wait for the update in the first case

=> hopefully these two are not independent: typically, those affecting other users are more likely to be long, but for those it is not a big deal if they are applied 20 sec later
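
The trade-off above can be written down as a small decision rule. This is only a sketch: the `Change` type, its fields and the 1000 ms budget are assumptions made for illustration (the budget echoes the ">1sec" figure), and none of these names exist in the backend.

```go
package main

// Mode says when a propagation runs relative to the service response.
type Mode int

const (
	Sync  Mode = iota // run before the service responds
	Async             // schedule it and respond immediately
)

// Change describes a propagation-triggering change. Both fields are
// hypothetical, introduced only for this sketch.
type Change struct {
	AffectsCurrentUser bool // the caller expects to see the effect right away
	ExpectedMillis     int  // rough estimate of the propagation duration
}

// syncBudgetMillis is an assumed threshold, not a measured value.
const syncBudgetMillis = 1000

// ChooseMode keeps a propagation synchronous when the caller needs its
// outcome or when it is cheap anyway, and defers it otherwise.
func ChooseMode(c Change) Mode {
	if c.AffectsCurrentUser || c.ExpectedMillis <= syncBudgetMillis {
		return Sync
	}
	return Async
}
```

A long propagation that also affects the current user comes out as `Sync` here, which is too coarse for the item-structure services discussed further down; those would need the work split in two.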

Not all services have the same impact

Some of the services are called a lot (>100x/sec), while some are called <10x per day by a moderator, or even <1x per month by a super administrator.

group and item ancestor propagation

Requirements:

  • Internally: inconsistency in the ancestors may severely affect the consistency of the data
  • UX: better if applied immediately, as a delay would probably create inconsistencies in what the services return

Actual speed:

  • They should always be fast (??? correct ? or did we have slow item propagation?)

perm and result propagation

Permissions
UX requirements:

  • changes affecting other users may always (?) be postponed by a second without problems
  • perm change on oneself on the item we are working on (for instance when creating a new item) should be done immediately

Results
Actual delay:

  • some changes (such as those affecting the item hierarchy) may affect a large number of users and a large number of items... so a huge number of results
  • updating one's own score on a single item should have a limited impact, so it should not take too long

UX requirements:

  • updating one's own score on a single item may unlock content... ideally the UI should know about that immediately
  • update on other users can always (?) wait

Per services

Services with low/medium frequency and limited effect on propagation (so the propagation caused should be very short)

Those affecting the group membership of 1 user. Result propagation is run as it may make a new item subtree visible to the user... and so require computing the results for this new subtree.

  • groupInvitationAccept
  • groupJoinRequestCreate
  • groupsJoinByCode
  • groupJoinRequestsAccept
  • groupLeaveRequestsAccept
  • groupInvitationsCreate
  • groupLeave
  • userDataRefresh
  • accessTokenCreate
  • itemEnter

Those affecting the result of 1 item and 1 participant, so that may require propagation to "ancestor results" but no unlock:

  • itemGetAnswerToken
  • itemGetHintToken
  • attemptCreate

Those affecting the result of 1 item and 1 participant but that may cause unlock:

  • saveGrade
    (note that we will probably soon need the response to tell immediately whether it has caused an unlock)
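
To make the saveGrade note concrete, here is one possible shape for a response that reports unlocks. Everything in it is hypothetical: `Dependency`, `SaveGradeResult` and `NewlyUnlocked` are names invented for this sketch, and it assumes an unlock is a score threshold on a single item.

```go
package main

// Dependency is a hypothetical unlock rule: reaching MinScore on ItemID
// unlocks UnlockedItemID.
type Dependency struct {
	ItemID         int64
	UnlockedItemID int64
	MinScore       float64
}

// SaveGradeResult is an assumed response shape, not the current API.
type SaveGradeResult struct {
	Score         float64
	UnlockedItems []int64
}

// NewlyUnlocked lists the items whose threshold is crossed when the score
// on itemID goes from oldScore to newScore, in the order of deps.
func NewlyUnlocked(deps []Dependency, itemID int64, oldScore, newScore float64) []int64 {
	var unlocked []int64
	for _, d := range deps {
		if d.ItemID == itemID && oldScore < d.MinScore && newScore >= d.MinScore {
			unlocked = append(unlocked, d.UnlockedItemID)
		}
	}
	return unlocked
}
```

Returning the list only works if the unlock detection runs before the service responds, which is one more argument for keeping this propagation sync.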

Those which remove groups from users... which should not change anything for results / perm:

  • groupMembersRemove
  • groupRemoveChild
  • groupDelete
  • groupUpdate

All these should probably do their propagation SYNC (provided they are not applying propagation from other services, of course)

Service with (very) high call frequency and limited impact

itemTaskTokenGenerate: affects 1 participant on 1 item and does not trigger unlock

resultStart/resultStartPath: affects 1 participant on 1 item and does not trigger unlock

These should probably do their propagation SYNC (provided they are not applying propagation from other services, of course). But in addition they might need a "customized" (lightweight) propagation algorithm that knows their needs (for instance, we know they cannot trigger unlock and only make simple changes to result ancestors)
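
As an illustration of what "lightweight" could mean, the sketch below assumes the only thing to propagate is a "started" flag for one participant, and that an item is never marked without its ancestors being marked too. `MarkStarted` and both maps are hypothetical, in-memory stand-ins for what would be SQL in the real service.

```go
package main

// MarkStarted marks item and all of its ancestors as started for one
// participant and returns how many entries changed. parents maps an item
// ID to its parent item IDs. There is no unlock check at all.
func MarkStarted(parents map[int64][]int64, started map[int64]bool, item int64) int {
	changed := 0
	stack := []int64{item}
	for len(stack) > 0 {
		current := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if started[current] {
			// Assumed invariant: a marked item already has marked ancestors,
			// so the walk can stop here.
			continue
		}
		started[current] = true
		changed++
		stack = append(stack, parents[current]...)
	}
	return changed
}
```

The early stop is what keeps the cost low for a service called >100x/sec: after the first call in a chapter, later calls for sibling items touch a single entry.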

Service with a (possibly) larger propagation radius

Those adding multiple users to a group

  • contestSetAdditionalTime
  • groupAddChild

For these 2 services:

  • The result propagation is needed because adding a participant to a group may give them visibility of a new subtree of items, which may require computing results on those items. The permission propagation would come from a possible unlock caused by that result propagation, which is very unlikely. So even if it probably will not be, the propagation may be long enough to cause a service timeout.
  • Typically the current user is giving perm to other users. In such a case, seeing the outcome of the change immediately does not matter.

=> so the result/permission propagation should probably be async here

Those affecting permissions

  • updatePermissions
  • itemDependencyApply

Affecting permissions means many permissions may need to be recomputed. Per se, that probably cannot take very long, but it may retrigger a result propagation (for the same reason as for the previous services about groups), which may take time. So it probably requires async propagation.

Those affecting item structure

  • itemCreate
  • itemDelete
  • itemUpdate

These services affect the results of many (possibly all) users over a possibly deep item hierarchy. This may cause unlocking, which may trigger permission propagation as well. These services are the main (only?) source of problems. So they need to run their propagation async.

But... these also affect the current user:

  • When creating an item, the user gets owner permission on the item (which requires propagation to be effective), and typically the first thing the user wants to do is to edit the item. Currently, they may not be able to see the title of the element they have just created because the propagation hasn't run yet. This is a problem we will need to fix very soon
  • The score of the parent chapter may be immediately impacted, so it is probably clearer for the user if that score is updated immediately, in sync. (Lower priority)

=> so the propagation impacting the current user probably has to be applied in sync, and the rest asynchronously.
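
A minimal sketch of that split, assuming the propagation can be cut into per-user units of work. `Task`, `SplitAndRun` and the channel used as a queue are all hypothetical; the real propagation works on whole tables, so this only shows the control flow.

```go
package main

// Task is one unit of propagation work; UserID is the participant whose
// permissions or results it updates. Both names are hypothetical.
type Task struct {
	UserID int64
	Run    func()
}

// SplitAndRun runs the caller's own tasks inline and hands the others to
// the queue. It returns how many tasks were deferred.
func SplitAndRun(tasks []Task, currentUserID int64, queue chan<- Task) int {
	deferred := 0
	for _, t := range tasks {
		if t.UserID == currentUserID {
			t.Run() // sync part: visible to the caller in the response
			continue
		}
		queue <- t // async part: picked up later by a worker
		deferred++
	}
	return deferred
}
```

The send blocks when the queue is full, so the channel needs a buffer or a running consumer; otherwise the service would wait on the async part after all.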

Conclusions

  • most services could probably go back to full sync propagation
  • a few services (3, mainly 2 actually) may require a specific optimized sync result propagation, as they are called very often and only need a small subset of the propagation
  • there is still async result propagation needed for some services
  • the services running their propagation synchronously must have a way to run only their own part of the propagation
  • a few services need a mix of sync+async result propagation
  • saveGrade will probably need to return what has been unlocked as an effect of its propagation
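
These conclusions could be captured as a per-service table. The sketch below lists only a subset of the services named above; the `Strategy` type, its constants and `StrategyFor` are hypothetical, and the fallback to full sync simply reflects the first conclusion.

```go
package main

// Strategy is a hypothetical label for how a service runs its propagation.
type Strategy int

const (
	FullSync  Strategy = iota // regular propagation, before responding
	LightSync                 // customized lightweight propagation, no unlock
	AsyncOnly                 // propagation scheduled after responding
	Mixed                     // caller's part sync, the rest async
)

// strategies maps service names from this issue to the proposed strategy.
var strategies = map[string]Strategy{
	"groupLeave":               FullSync,
	"saveGrade":                FullSync,
	"itemTaskTokenGenerate":    LightSync,
	"resultStart":              LightSync,
	"contestSetAdditionalTime": AsyncOnly,
	"groupAddChild":            AsyncOnly,
	"updatePermissions":        AsyncOnly,
	"itemDependencyApply":      AsyncOnly,
	"itemCreate":               Mixed,
	"itemDelete":               Mixed,
	"itemUpdate":               Mixed,
}

// StrategyFor returns the proposed strategy, defaulting to full sync for
// services that are not listed.
func StrategyFor(service string) Strategy {
	if s, ok := strategies[service]; ok {
		return s
	}
	return FullSync
}
```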
@smadbe smadbe added the Definition in Progress Issue definition not completed label Oct 1, 2024
@smadbe smadbe changed the title (re)Organize sync vs async propagation (re)Organize sync vs async propagation: requirement-impact analysis Oct 1, 2024
@smadbe smadbe changed the title (re)Organize sync vs async propagation: requirement-impact analysis (re)Organize sync vs async propagation: requirements-impact analysis Oct 1, 2024
@smadbe smadbe changed the title (re)Organize sync vs async propagation: requirements-impact analysis (re)Organize sync vs async propagation: actual delay vs requirements analysis Oct 1, 2024
@zenovich
Collaborator

zenovich commented Oct 14, 2024

They should always be fast (??? correct ? or did we have slow item propagation?)

It's fast. I've tried this test on the anonymized DB:

  1. delete from items_ancestors;
    Query OK, 26898 rows affected (2.38 sec)
  2. insert into items_propagate select id, 'todo' from items on duplicate key update ancestors_computation_state='todo';
    Query OK, 8808 rows affected (0.41 sec)
    Records: 4404 Duplicates: 4404 Warnings: 0
  3. Comment out all the propagations from the db-recompute command except for the one related to items ancestors and build the app.
  4. Run time ./bin/AlgoreaBackend db-recompute <env_name>
    The result is 4.371s.

Note that the test simulates an unrealistic situation where we need to insert all the items_ancestors from scratch. It's the absolute worst case, impossible in prod. Still, it takes only 4 seconds (including loading the environment and running two transactions, the first of which is a dummy). So we can be sure it's always possible to run the items ancestors recalculation synchronously and inside the main transaction, as we used to do and as we do for the groups ancestors (the same algorithm and even the same method in the code).

After moving it back into a single transaction and adding some optimizations it takes only 2 seconds:

time ./bin/AlgoreaBackend db-recompute full
Loading environment: full
Running ItemItemStore.CreateNewAncestors()
DONE
./bin/AlgoreaBackend db-recompute full  0.01s user 0.01s system 1% cpu 2.132 total

@smadbe
Contributor Author

smadbe commented Oct 16, 2024

They should always be fast

It's fast. I've tried this test on the anonymized DB:

From our discussion on Slack: this answer only applies to the item case of the question. Group ancestor propagation is not necessarily fast.
