Description
I've noticed that if an error (timeout, disconnect, or connection loss) occurs during a BatchPin submission to the blockchain connector (EVMConnect in my case), EVMConnect can successfully process the transaction while the FireFly operation is marked Failed. The batch processor then resubmits the batch, but EVMConnect returns a 409 Conflict. This causes the processor to retry indefinitely and blocks any new batches from being dispatched.
Example error message:
FF10458: Conflict from blockchain connector: FF21065: ID 'default:f5296ba7-1b23-4c7f-8612-620dafc0e40a' is not unique d=pinned_broadcast ns=default opcache=1UGYM3mn p=did:firefly:org/org_5452a6| pid=61421 role=batchmgr
Currently, the only way to recover is to restart FireFly.
From my understanding, this should be handled by FireFly's idempotent retry logic (firefly/internal/operations/manager.go, line 178 at commit 1939b67).
I think the bug is that RunOperation still returns an error in this case, which causes the batch processor to retry forever (firefly/internal/batch/batch_processor.go, line 622 at commit 1939b67).
I'll link my naive fix, which unblocks the batch processor, but I'm not sure it's fully correct because the operation still gets marked as Failed.