Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CAPPL-536] Workflow update not picked up by Syncer #1029

Merged
merged 6 commits into from
Feb 19, 2025

Conversation

agparadiso
Copy link
Contributor

@agparadiso agparadiso commented Feb 13, 2025

Description

we were seeing these errors:

{"level":"error","ts":"2025-02-12T18:04:12.571Z","logger":"WorkflowRegistrySyncer","caller":"syncer/workflow_registry.go:335","msg":"failed to handle event","version":"2.20.0@6ca1531","err":"failed to close workflow engine: rpc error: code = Unknown desc = error unregistering trigger: triggerId wf_005bd99697e7dfe1acd89a6a5a9a1907f53654355a671d0bd658514066e4ae54_trigger_0 not found","type":"WorkflowUpdatedV1","stacktrace":"github.com/smartcontractkit/chainlink/v2/core/services/workflows/syncer.(*workflowRegistry).readRegistryEvents\n\tgithub.com/smartcontractkit/chainlink/v2/core/services/workflows/syncer/workflow_registry.go:335\ngithub.com/smartcontractkit/chainlink/v2/core/services/workflows/syncer.(*workflowRegistry).Start.func1.1\n\tgithub.com/smartcontractkit/chainlink/v2/core/services/workflows/syncer/workflow_registry.go:234"}

In order to reproduce the issue:

  • Register a workflow with a cron schedule that executes more often than every 30s (eg: every 15s)
  • Observe that we see errors registering the trigger in the engine like this one:
 [ERROR] trigger event was an error rpc error: code = Unknown desc = error registering trigger: maximum fastest cron schedule is 30s; not executing workflows/engine.go:744          logger=WorkflowRegistrySyncer.WorkflowEngine stacktrace=github.com/smartcontractkit/chainlink/v2/core/services/workflows.(*Engine).worker
  • Update the workflow
  • Observe that we fail to update the workflow

Why is this happening?

communication between client and server is not synchronous, if the trigger failed to be registered (eg: when registering a workflow with a cron schedule lower than 30s) this error was being dragged to the forwardTriggerResponsesToChannel, the trigger was never registered and we were trying to Unregister a non existing trigger .

The fix

In order to avoid this Im sending a first ACK/ERROR message that will block until this is received. if there is an error registering the trigger in the first place this will be handled properly

How was this tested?

Added a unit test reproducing the error + tested on a local DON

CAPPL-536

@agparadiso agparadiso force-pushed the CAPPL-536_update_not_picked_up_by_syncer branch from d80a1aa to b5783e6 Compare February 13, 2025 17:45
@agparadiso agparadiso force-pushed the CAPPL-536_update_not_picked_up_by_syncer branch from b5783e6 to 975bdb1 Compare February 14, 2025 10:45
@agparadiso agparadiso force-pushed the CAPPL-536_update_not_picked_up_by_syncer branch from 975bdb1 to 1e81679 Compare February 14, 2025 10:51
@agparadiso agparadiso force-pushed the CAPPL-536_update_not_picked_up_by_syncer branch from e394a58 to 46388e4 Compare February 18, 2025 15:09
@agparadiso agparadiso marked this pull request as ready for review February 18, 2025 15:52
@agparadiso agparadiso requested a review from a team as a code owner February 18, 2025 15:52
@agparadiso agparadiso requested a review from ilija42 February 18, 2025 15:52

if ackMsg.GetResponse().GetError() != "" {
return nil, errors.New(fmt.Sprintf("failed registering trigger: %s", ackMsg.GetResponse().GetError()))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agparadiso This code is a bit ambiguous I think: what happens if the server sends a response without an ack? We should clearly error in that case as that would mean the protocol has been violated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to send: TriggerResponseMessage_Ack and fail if ackMsg.GetAck() == nil much cleaner, thanks🙇🏼

}
if err = server.Send(msg); err != nil {
return fmt.Errorf("failed sending error response for trigger registration %s: %w", request, err)
}
return fmt.Errorf("error registering trigger: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to return an error in this case, you already returned it back to the caller successfully, just return nil.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better yet, I think return server.Send(msg) is enough

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, done 👍🏼

@ilija42 ilija42 requested a review from a team February 19, 2025 11:07
@agparadiso agparadiso merged commit 6817814 into main Feb 19, 2025
15 of 16 checks passed
@agparadiso agparadiso deleted the CAPPL-536_update_not_picked_up_by_syncer branch February 19, 2025 17:26
agparadiso added a commit that referenced this pull request Feb 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants