Hi,
We have had some issues with `NotLeaderException` and `StatusRuntimeException` (UNAVAILABLE) resulting in a permanent disconnect that is only solved by a restart of the disconnected client.

This happens consistently whenever a leader election changes which node in our cluster acts as the leader.
From what I can see, the issue seems to be isolated to appending to streams, as persistent subscriptions (as far as I am currently aware) seem to reconnect. This might not be the case, but I have not looked into that part further at this point.
I have managed to reproduce this with debug logs, as well as locally on my machine, by connecting to a cluster and then killing the leader and attempting to write to a stream.
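A rough sketch of that reproduction (the connection string, stream name, and payload below are placeholders, and this is just an illustration of the steps, not verified code):

```java
import com.eventstore.dbclient.EventData;
import com.eventstore.dbclient.EventStoreDBClient;
import com.eventstore.dbclient.EventStoreDBClientSettings;
import com.eventstore.dbclient.EventStoreDBConnectionString;

import java.util.Map;

public class LeaderKillRepro {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string pointing at a three-node cluster.
        EventStoreDBClientSettings settings = EventStoreDBConnectionString.parseOrThrow(
                "esdb://node1:2113,node2:2113,node3:2113?tls=false&nodePreference=leader");
        EventStoreDBClient client = EventStoreDBClient.create(settings);

        // First append succeeds against the current leader.
        client.appendToStream("repro-stream",
                EventData.builderAsJson("test-event", Map.of("n", 1)).build()).get();

        // Kill the leader node here and wait for the election to finish, then append again.
        // With the behaviour described above, this call fails and the client never recovers.
        client.appendToStream("repro-stream",
                EventData.builderAsJson("test-event", Map.of("n", 2)).build()).get();
    }
}
```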
Whilst debugging the code I think I have identified the issue, which is also apparent if you look at this debug log:
My guess is that the exception produced by `com.eventstore.dbclient.ClientTelemetry#traceAppend` is not handled in `com.eventstore.dbclient.GrpcClient#runWithArgs`, because `GrpcClient` expects a `NotLeaderException` or a `StatusRuntimeException` but receives a `CompletionException`.

So if I am correct, this should be solvable either by not wrapping the exception in `ClientTelemetry` or by unwrapping `CompletionException`s in `GrpcClient`.
Something as simple as:
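This is only an illustration of the unwrapping idea (the helper name and placement are mine, not the actual `GrpcClient` code):

```java
import java.util.concurrent.CompletionException;

// Illustrative sketch only: peel off CompletionException layers so the caller can
// match on the real cause (NotLeaderException, StatusRuntimeException, ...).
final class Exceptions {
    static Throwable unwrap(Throwable error) {
        Throwable current = error;
        while (current instanceof CompletionException && current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }
}
```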
Should theoretically solve this, at least for this scenario.
Another potential solution would be to not wrap the exception at all, but it would require more intrusive changes to the Future chain. For example:
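Roughly along these lines (again illustrative only; the class, method, and variable names here are placeholders rather than the real `ClientTelemetry` code):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Illustrative sketch only: complete the downstream future with the original cause
// instead of a wrapping CompletionException, so GrpcClient still sees
// NotLeaderException / StatusRuntimeException directly.
final class TracingPropagation {
    static <T> CompletableFuture<T> propagateUnwrapped(CompletableFuture<T> traced) {
        CompletableFuture<T> downstream = new CompletableFuture<>();
        traced.whenComplete((value, error) -> {
            if (error == null) {
                downstream.complete(value);
            } else if (error instanceof CompletionException && error.getCause() != null) {
                downstream.completeExceptionally(error.getCause());
            } else {
                downstream.completeExceptionally(error);
            }
        });
        return downstream;
    }
}
```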
I could provide a PR for this if you agree with me that it should be changed.
If it is of interest, I can provide a small (but dirty) unit test that roughly reproduces this, but as I mentioned above, killing the leader node and then attempting an `appendToStream` call should throw this error.