
Batch dispatcher stuck in infinite retry loop when encountering conflict error #1594

Status: Closed · Author: @shorsher

Description


I've noticed that if an error (timeout, disconnect, or connection loss) occurs during a BatchPin submission to the blockchain connector (EVMConnect in my case), EVMConnect can successfully process the transaction while the FireFly operation is marked Failed. The batch processor then resubmits the batch, but EVMConnect rejects it with a 409 Conflict because it already has a transaction with that ID. The processor retries indefinitely, which prevents any new batches from being dispatched.

Example error message:

FF10458: Conflict from blockchain connector: FF21065: ID 'default:f5296ba7-1b23-4c7f-8612-620dafc0e40a' is not unique d=pinned_broadcast ns=default opcache=1UGYM3mn p=did:firefly:org/org_5452a6| pid=61421 role=batchmgr

Currently, the only way to recover is to restart FireFly.

From my understanding, this case should be handled by FireFly's idempotent retry logic, which treats a conflict as meaning the connector already has the submission:

// We are now pending - we know the connector has the action we're attempting to submit

I think the bug is that RunOperation still returns an error in this case, so the batch processor's retry loop keeps retrying forever:

return bp.retry.Do(ctx, "batch dispatch", func(attempt int) (retry bool, err error) {
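To illustrate why a returned conflict error stalls the dispatcher, here is a simplified sketch that mirrors only the callback shape `func(attempt int) (retry bool, err error)` from the line above. `retryDo`, `dispatch` and `ErrConflict` are hypothetical; unlike the real loop, `retryDo` is capped at `maxAttempts` so the example terminates. The "buggy" callback keeps returning the conflict and exhausts every attempt; the "fixed" one treats the conflict as success and stops after one. This sketch does not address the open question of the operation remaining Failed.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict is a hypothetical stand-in for the connector's 409 response.
var ErrConflict = errors.New("conflict from blockchain connector")

// retryDo calls fn until it succeeds or asks not to be retried.
// Hypothetical: capped at maxAttempts here, whereas the dispatcher's
// loop keeps going, which is what blocks new batches.
func retryDo(maxAttempts int, fn func(attempt int) (retry bool, err error)) (int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		retry, err := fn(attempt)
		if err == nil || !retry {
			return attempt, err
		}
	}
	return maxAttempts, fmt.Errorf("gave up after %d attempts", maxAttempts)
}

// dispatch simulates resubmitting a batch the connector already has.
func dispatch(treatConflictAsSuccess bool) func(int) (bool, error) {
	return func(_ int) (bool, error) {
		err := fmt.Errorf("FF10458: %w", ErrConflict)
		if treatConflictAsSuccess && errors.Is(err, ErrConflict) {
			return false, nil // idempotent: the submission already landed
		}
		return true, err // conflict bubbles up and is retried again
	}
}

func main() {
	n, err := retryDo(5, dispatch(false))
	fmt.Println("buggy:", n, err) // buggy: 5 gave up after 5 attempts
	n, err = retryDo(5, dispatch(true))
	fmt.Println("fixed:", n, err) // fixed: 1 <nil>
}
```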

I'll link my naive fix, which unblocks the batch processor, but I'm not sure it's fully correct, because the operation still gets marked as Failed.


Labels: bug (Something isn't working)
