Occasional ErrorDBTransactionConflict if the agent is left running for a while #353

Open
aryanjassal opened this issue Jan 13, 2025 · 4 comments

@aryanjassal
Member

Describe the bug

A database transaction conflict means that two concurrent transactions attempted to modify the same value in the database. Committing both could leave that value in an inconsistent state, so one of them fails with this error.

The log shows that the warning was raised in Discovery.checkForRediscoveryHandler. This means an operation in that handler has multiple transactions modifying the same resource. This is most likely a logic error and must be resolved.

Note that letting the agent continue running has not reproduced the error. It has been over three days since the last message, and nothing further has appeared. The rediscovery handler should have run again after three days, but my laptop has been asleep for a while, which could be why the rediscovery time was skipped.

To Reproduce

  1. Run Polykey agent
  2. Wait for a few hours (in the log below, the conflict appeared about five hours after startup)
  3. Encounter bug

Expected behavior

The DB conflict shouldn't happen

Logs

Jan 06 08:26:28 matrix-dell-3480-007 polykey[4598]: pid               4598
Jan 06 08:26:28 matrix-dell-3480-007 polykey[4598]: nodeId            vet1dhoduhkmn4749r8usiopivklr8i4qoh9kjnhtgeg5rie4nvh0
Jan 06 08:26:28 matrix-dell-3480-007 polykey[4598]: clientHost        ::1
Jan 06 08:26:28 matrix-dell-3480-007 polykey[4598]: clientPort        34113
Jan 06 08:26:28 matrix-dell-3480-007 polykey[4598]: agentHost         ::
Jan 06 08:26:28 matrix-dell-3480-007 polykey[4598]: agentPort         48008
Jan 06 09:26:32 matrix-dell-3480-007 polykey[4598]: WARN:polykey.PolykeyAgent.Discovery:Reverting to legacy identity claim discovery logic for github.com:brynblack
Jan 06 10:26:36 matrix-dell-3480-007 polykey[4598]: WARN:polykey.PolykeyAgent.Discovery:Reverting to legacy identity claim discovery logic for github.com:brynblack
Jan 06 10:26:38 matrix-dell-3480-007 polykey[4598]: WARN:polykey.PolykeyAgent.task v0prr2klogdo017asfppfdeamkc:Failed - Reason: AbortError: The user aborted a request., Handler: Discovery.discoverVertexHandler
Jan 06 11:26:40 matrix-dell-3480-007 polykey[4598]: WARN:polykey.PolykeyAgent.Discovery:Reverting to legacy identity claim discovery logic for github.com:brynblack
Jan 06 11:37:27 matrix-dell-3480-007 polykey[4598]: WARN:polykey.PolykeyAgent.NodeManager:Duplicate refreshBucket task was found for bucket 255, cancelling
Jan 06 11:41:27 matrix-dell-3480-007 polykey[4598]: WARN:polykey.PolykeyAgent.NodeManager:Duplicate refreshBucket task was found for bucket 255, cancelling
Jan 06 12:26:44 matrix-dell-3480-007 polykey[4598]: WARN:polykey.PolykeyAgent.Discovery:Reverting to legacy identity claim discovery logic for github.com:brynblack
Jan 06 13:26:48 matrix-dell-3480-007 polykey[4598]: WARN:polykey.PolykeyAgent.task v0prr65a39to018nm1416gds1d8:Failed - Reason: ErrorDBTransactionConflict, Handler: Discovery.checkForRediscoveryHandler

Additional context

Notify maintainers

@tegefaulkes

@aryanjassal aryanjassal added the bug Something isn't working label Jan 13, 2025
@aryanjassal aryanjassal self-assigned this Jan 13, 2025
@CMCDragonkai
Member

The exception is not actually an error. It means the "user" must resubmit the transaction. Here "user" is an abstract concept: it is whatever has the scope of authority to manage the transaction's concurrency, which is sometimes the real user. Deciding how to handle it requires understanding the UI workflow, or knowing whether this is an autonomous operation.
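For an autonomous operation, one way to read "resubmit the transaction" is a bounded retry loop around the transactional work. The following is only a sketch under assumptions: `ErrorDBTransactionConflict` here is a local stand-in class rather than the one Polykey uses, and `withConflictRetry` and the `attempt` callback are hypothetical names that do not exist in the codebase.

```typescript
// Hypothetical stand-in for the conflict error seen in the log.
class ErrorDBTransactionConflict extends Error {}

// Hypothetical helper: re-run `attempt` when it fails with a conflict,
// giving up after `maxAttempts` tries. Any other error is rethrown at once.
async function withConflictRetry<T>(
  attempt: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  for (let i = 1; ; i++) {
    try {
      return await attempt();
    } catch (e) {
      const isConflict = e instanceof ErrorDBTransactionConflict;
      if (!isConflict || i >= maxAttempts) throw e;
      // Otherwise fall through and resubmit the transaction.
    }
  }
}

// Usage: a simulated transaction that conflicts twice, then commits.
let calls = 0;
const result = withConflictRetry(async () => {
  calls += 1;
  if (calls < 3) throw new ErrorDBTransactionConflict('conflict');
  return 'committed';
});
```

Whether a retry like this is appropriate depends on who owns the transaction. For a background task it may be reasonable; for a user-driven workflow the conflict may need to be surfaced instead.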

@tegefaulkes
Contributor

It's happening in a background handler. We shouldn't be seeing conflicts, because we should be using transaction locking to prevent them.
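To illustrate the idea of locking by key, here is a minimal sketch of a keyed async lock. It is not the locking mechanism Polykey uses; `KeyedLock`, `withLock`, and the `'counter'` key are hypothetical names chosen for the example. The point it shows is that two tasks are only serialised when they derive the same lock key.

```typescript
// Hypothetical keyed lock: callers using the same key run one at a time.
class KeyedLock {
  // For each key, a promise that settles once the latest holder releases.
  private tails = new Map<string, Promise<void>>();

  async withLock<T>(key: string, f: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(key) ?? Promise.resolve();
    let release!: () => void;
    const next = new Promise<void>((resolve) => {
      release = resolve;
    });
    const tail = prev.then(() => next);
    this.tails.set(key, tail);
    await prev; // Wait for every earlier holder of this key.
    try {
      return await f();
    } finally {
      release();
      // Drop the entry if nobody queued up behind us.
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}

// Usage: two read-modify-write tasks on a shared value.
const lock = new KeyedLock();
let counter = 0;

async function increment(): Promise<void> {
  await lock.withLock('counter', async () => {
    const read = counter;
    await Promise.resolve(); // Yield, so the other task could interleave here.
    counter = read + 1;
  });
}

const finalCount = Promise.all([increment(), increment()]).then(() => counter);
```

With the shared key, the second increment waits for the first and no update is lost. If each task built a different key for the same resource, both would proceed at once and one write would overwrite the other.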

@aryanjassal
Member Author

const gestaltIds: Array<[GestaltIdEncoded, number]> = [];
// ...
// We want to lock all the ids at once before moving ahead
const locks = gestaltIds.map((gestaltIdEncoded) => {
  return [
    this.constructor.name,
    this.discoverVertexHandlerId,
    gestaltIdEncoded,
  ].join('');
});

Here, the parameter named gestaltIdEncoded in the locks map callback is actually the whole tuple [GestaltIdEncoded, number] rather than just the encoded id, so the number also ends up in the joined lock string. I have changed the callback from (gestaltIdEncoded) => {} to ([gestaltIdEncoded]) => {} so that it destructures only the encoded id.
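The effect of the change can be sketched in isolation. The sample ids, the numbers, and the `prefix` array below are made up for illustration, and `GestaltIdEncoded` is aliased to `string` as a stand-in for the real type.

```typescript
// Stand-in for the real type; assumed here to behave like a string.
type GestaltIdEncoded = string;

// Hypothetical sample data in the same [id, number] shape as the snippet above.
const gestaltIds: Array<[GestaltIdEncoded, number]> = [
  ['node-abc', 100],
  ['node-xyz', 200],
];

// Hypothetical stand-ins for this.constructor.name and this.discoverVertexHandlerId.
const prefix = ['Discovery', 'discoverVertexHandler'];

// Before: the parameter receives the whole tuple, which join('')
// stringifies as "id,number".
const locksBefore = gestaltIds.map((gestaltIdEncoded) => {
  return [...prefix, gestaltIdEncoded].join('');
});

// After: destructuring picks out only the encoded id.
const locksAfter = gestaltIds.map(([gestaltIdEncoded]) => {
  return [...prefix, gestaltIdEncoded].join('');
});

console.log(locksBefore[0]); // DiscoverydiscoverVertexHandlernode-abc,100
console.log(locksAfter[0]); // DiscoverydiscoverVertexHandlernode-abc
```

In the first form the lock string changes whenever the number changes, even for the same id; in the second it depends on the id alone.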

I will run some tests locally, and then push up a fix for this if they all pass.
