Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Reset] Track child workflows initiated after reset #7210

Merged
merged 13 commits into from
Feb 5, 2025

Conversation

gow
Copy link
Contributor

@gow gow commented Jan 31, 2025

What changed?

In this PR I'm recording a set of children that were initialized after the reset point. The children are identified by "workflow_type:workflow_id" key.

Why?

When the parent starts to make progress after reset, it needs to know all the children that were already started after the reset point. In the current iteration of the feature we will use this information to terminate any running child before starting another instance.
We need this information mainly for 2 reasons.

  1. If we don't attempt to terminate the parent may not make progress due to ID conflict policy
  2. On the other hand, always terminating all children before starting is also wrong since we may,
    • Terminate any children that may be started in the future that were not present at the time of reset.
    • Terminate children belonging to a different parent.
      So, we need to record an exact set of children that were observed (at the time of reset) to have been initialized after the reset point.

How did you test it?

Potential risks

Documentation

Is hotfix candidate?

@gow gow requested a review from a team as a code owner January 31, 2025 22:48
@gow gow marked this pull request as draft January 31, 2025 22:48
@gow gow force-pushed the cg/reset_before_child_init_1 branch from 57f6bb3 to 9d6e3b3 Compare February 4, 2025 01:02
@gow gow force-pushed the cg/reset_before_child_init_2 branch from f8ec2ee to 8a05b96 Compare February 4, 2025 02:00
@gow gow requested a review from yycptt February 4, 2025 02:13
@gow gow force-pushed the cg/reset_before_child_init_1 branch from 9d6e3b3 to 39094ab Compare February 4, 2025 02:18
@gow gow force-pushed the cg/reset_before_child_init_2 branch from 29ba43a to f10c2c4 Compare February 4, 2025 02:18
@gow gow force-pushed the cg/reset_before_child_init_1 branch from 39094ab to 6015fef Compare February 4, 2025 18:55
@gow gow force-pushed the cg/reset_before_child_init_2 branch from f10c2c4 to 5f71593 Compare February 4, 2025 20:11
@gow gow changed the base branch from cg/reset_before_child_init_1 to main February 4, 2025 20:12
@gow gow changed the base branch from main to cg/reset_before_child_init_1 February 4, 2025 20:15
@gow gow force-pushed the cg/reset_before_child_init_2 branch from 5f71593 to 2f2f197 Compare February 4, 2025 20:17
@gow gow marked this pull request as ready for review February 4, 2025 20:20
@gow gow force-pushed the cg/reset_before_child_init_1 branch from 2814a46 to 695190a Compare February 4, 2025 22:39
@gow gow force-pushed the cg/reset_before_child_init_2 branch from 2f2f197 to 9e4907e Compare February 4, 2025 22:43
)

var (
errWorkflowResetterMaxChildren = serviceerror.NewInternal(fmt.Sprintf("WorkflowResetter encountered max allowed children [%d] while resetting.", maxChildrenInResetMutableState))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid argument?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by invalid arg? I want to include the maxChildrenInResetMutableState value in the error message (in case we change these limits).
Or did you mean something else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the error type should be invalid argument, otherwise it counts against SLA.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Changed it to InvalidArgument

service/history/ndc/workflow_resetter.go Outdated Show resolved Hide resolved
service/history/ndc/workflow_resetter.go Outdated Show resolved Hide resolved
Base automatically changed from cg/reset_before_child_init_1 to main February 5, 2025 00:07
)

var (
errWorkflowResetterMaxChildren = serviceerror.NewInternal(fmt.Sprintf("WorkflowResetter encountered max allowed children [%d] while resetting.", maxChildrenInResetMutableState))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the error type should be invalid argument, otherwise it counts against SLA.

service/history/ndc/workflow_resetter.go Show resolved Hide resolved
@gow gow enabled auto-merge (squash) February 5, 2025 07:13
@gow gow merged commit 1b11ed2 into main Feb 5, 2025
50 checks passed
@gow gow deleted the cg/reset_before_child_init_2 branch February 5, 2025 07:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants