Error from long-running event handler #33
(Disclaimer: I'm not sure if this error is coming from the Event Store client, from Event Store itself, or from something completely different that has not yet been identified.)

We have an event handler that throws the following error when it runs for a sufficiently long time: […]

If I restart the service running this event handler, it picks up where it died just fine and continues processing events.

Something that might be relevant: the event handler that appears to be the source of this error is subscribed to one stream. When it picks up a certain class of event, it opens a different stream to read detailed events relating to the subscribed stream.
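For concreteness, here is a minimal sketch of that setup in Python-style pseudocode; the client module, connection API, event types, and stream names are all assumptions for illustration, not the actual code:

```python
# Hypothetical client API for illustration; these names do not come from
# the actual client library in question.
from eventstore_client import connect  # assumed module name

conn = connect("tcp://localhost:1113")

def handle_detail(detail):
    ...  # application-specific processing

def handle_event(event):
    # For a certain class of event, open a second read against a related
    # stream of detailed events.
    if event.type == "BatchCompleted":  # illustrative event type
        details = conn.read_stream_events_forward(
            stream=f"details-{event.data['batch_id']}",  # illustrative name
            start=0,
            count=500,
        )
        for detail in details.events:
            handle_detail(detail)

# Long-running subscription to a single stream.
conn.subscribe_to_stream("primary-stream", on_event=handle_event)
```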
My suspicion is that Event Store is closing the connection due to a timeout. If you need to maintain a long-running connection to Event Store, it's necessary to send data occasionally for the server to consider the connection to be active. The easiest way to do this is to call the ping method on a regular basis. See #24 for another issue describing this scenario.
Is there an idiomatic way of checking the current state of the connection, or would I need to track that myself in the connection's connected/disconnected event handlers?

You would need to track that yourself from those events.
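A sketch of doing that tracking, assuming the connection exposes connected/disconnected callbacks (hypothetical registration API; the real client's hooks will differ):

```python
import threading

class ConnectionStateTracker:
    """Remembers the last-known connection state reported by the client."""

    def __init__(self):
        self._lock = threading.Lock()
        self._connected = False

    def on_connected(self, _details=None):
        with self._lock:
            self._connected = True

    def on_disconnected(self, _details=None):
        with self._lock:
            self._connected = False

    @property
    def is_connected(self):
        with self._lock:
            return self._connected

tracker = ConnectionStateTracker()
# Hypothetical registration API; substitute the client's actual hooks.
# `conn` is the connection object from the earlier sketch.
conn.on("connected", tracker.on_connected)
conn.on("disconnected", tracker.on_disconnected)
```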
As near as we've been able to determine, this problem stems from the automated heartbeat messages that Event Store sends (adding the ping logic suggested above didn't resolve it). For whatever reason, it seems that sometimes the connection can be closed between receiving a heartbeat request and sending the response. Is this something that we should be accounting for in our application?
And for any future visitors, this is roughly the ping logic we're using:
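A minimal sketch of a loop like that, assuming a hypothetical conn.ping() method; the interval just needs to be comfortably shorter than the server's heartbeat timeout:

```python
import threading

PING_INTERVAL_SECONDS = 15  # illustrative; keep it below the server's timeout

def keep_alive(conn, stop_event):
    # Send a ping on a fixed interval so the server sees traffic and does
    # not treat the connection as idle.
    while not stop_event.is_set():
        try:
            conn.ping()  # hypothetical method; the actual API varies by client
        except Exception:
            # A failing ping usually means the connection is already gone;
            # reconnection logic would go here.
            break
        stop_event.wait(PING_INTERVAL_SECONDS)

stop = threading.Event()
threading.Thread(target=keep_alive, args=(conn, stop), daemon=True).start()
```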
Ok, if you're suspecting the heartbeat responses are a problem, then we'll need some more information to diagnose it. I would recommend looking at the Event Store logs to see what it has to say about the connection (that will tell you whether it has closed the connection, or it thinks everything is okay). The other thing that would help would be a packet capture from Wireshark, so that we can see if it's a particular sequence of events that is causing the problem, or if there's something wrong with the heartbeat response packets themselves.
Here's the relevant bit from the Event Store log: […]

The exact timestamp of the error from the container was 2018-03-27T15:28:10.225626957Z. 10.10.0.32 is the IP of the consuming service. Still working on getting a packet capture.
Sorry I lost the thread on this, had to deal with a couple other issues that popped up. What would you want from a packet capture? This is running on Ubuntu server, so it looks like I'd be using tshark or tcpdump, but if I'm being completely honest, I'm in way over my head with this. I'm far from a TCP expert, so I'm not even sure what I'm looking for or how to go about getting it.
So the Event Store logs are suggesting that we didn't reply to the Heartbeat request with a Heartbeat response. A packet capture should be able to tell us if that is the case, and if not, might identify what the confusion is. All we need to capture is the TCP data on port 1113 when this issue is happening. I'm no expert on tcpdump (I usually use Wireshark myself), but it looks like something along the lines of tcpdump -i any -w es-capture.pcap tcp port 1113 should do it.
After a certain degree of bumbling and false starts, I have successfully obtained a packet capture... which is chock full of a bunch of proprietary data that I don't think I can share. :/ Is there an easy way to extract the heartbeat packets?
The heartbeat packets by themselves probably aren't that useful; it's more useful to see the full stream of packets, in case there's something wrong with the order in which they're being sent. Three possible approaches from here: […]
Okay, I managed to randomize/obfuscate the sensitive data with relative ease. Let me know if this is what you need.
So the interesting thing to note there is that it does respond to the heartbeat request in packet 81343, but then fails to respond later to the second request in 81567. That would suggest that the receive logic is not handling incoming messages after the ReadStreamEventsForwardCompleted is received. You could try enabling the debug flag on the connection to get log output from the client, to see if it recognizes the second HeartbeatRequest.
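To make the suspected failure mode concrete, here is a sketch of the kind of receive loop a TCP client runs; the structure and helper functions are hypothetical, and only the message names come from the Event Store TCP protocol. The bug being suggested would amount to this loop exiting, or blocking inside a handler, after ReadStreamEventsForwardCompleted, so that the next HeartbeatRequest is never serviced:

```python
def heartbeat_response(correlation_id):
    ...  # build a HeartbeatResponse frame for this correlation id (omitted)

def complete_pending_read(message):
    ...  # hand results to whatever is waiting on the read operation

def dispatch(message):
    ...  # all other protocol messages

def receive_loop(conn):
    # Simplified single-threaded dispatch; conn.is_open/read_message/send
    # are hypothetical helpers standing in for the client's framing layer.
    while conn.is_open():
        message = conn.read_message()
        if message.type == "HeartbeatRequest":
            # Must always be answered promptly, whatever else is in flight,
            # or the server will close the connection.
            conn.send(heartbeat_response(message.correlation_id))
        elif message.type == "ReadStreamEventsForwardCompleted":
            # If this branch blocks, or the loop returns here, any later
            # HeartbeatRequest is never seen -- which would match the
            # behaviour in the capture above.
            complete_pending_read(message)
        else:
            dispatch(message)
```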