Non-deterministic :bad_match and :function_clause errors (read_timeout) #176
Comments
Unfortunately, the packet header doesn't contain the PacketSize for long packets (> 8k). The recv() loop is executed until the timeout; the driver can't wait endlessly for the last part of a packet. PacketSize is here
Add erlang:display and you'll see where the break occurs.
Please test the stage branch.
I don't quite understand; gen_tcp:recv is always called with length=0 as far as I can see. So as long as it is only called when at least one byte is definitely still missing, I don't see the problem. You don't have to "detect" the end of a message via the timeout, do you?
Sure. It should wait for the overall/total timeout chosen by the user minus the already elapsed time. (unless the user gives a timeout of |
0 means that all available bytes are returned. All bytes of one packet.
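The distinction discussed here can be illustrated outside Erlang. A read on a stream socket returns whatever bytes have arrived so far, which may be less than a full message, so a reader that knows the size up front has to loop until it has that many bytes. The following is a minimal Python sketch, not the driver's code; `recv_exact` is a hypothetical helper, and it assumes the caller already knows the message length, which per the comment above is not the case for long packets.

```python
import socket


def recv_exact(sock, nbytes):
    """Read exactly nbytes from a stream socket (hypothetical helper)."""
    buf = bytearray()
    while len(buf) < nbytes:
        # recv returns at most the requested count, possibly fewer bytes.
        chunk = sock.recv(nbytes - len(buf))
        if not chunk:
            # An empty read means the peer closed the connection.
            raise ConnectionError("peer closed before the message was complete")
        buf += chunk
    return bytes(buf)
```

With a known length, the end of a message is detected by counting bytes, so no timeout is needed to decide that a message is complete.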
Sure. But I still don't understand why it's necessary to try to decode a packet after a timeout occurred (assuming the timeout would correspond to the query timeout specified by the user, instead of being a fixed global value).
I've seen it with a TTI_RXH token, if that helps.
Getting timeouts |
It seems dataflags are non-empty only with 23c :(
That kind of helped. But now it actually waits for the full read_timeout in normal operations! I.e. it gets reaaaally slow with a read_timeout of 500ms; even the connect fails with a read_timeout of 3000. So this made things much worse, I dare to say.
stage23c is slow with 19c. Try adding the socket_options parameter with recbuf. Looks better.
Why? I have removed the Overall I just wanted to make a suggestion for improvement. I don't see why the read_timeout parameter is necessary at all. It doesn't correspond to any external/high level behaviour of running queries (*), nor does it affect the server, as far as I understand it. (* something similar might make sense for |
Hi, even with my changes (not trying to decode after a socket timeout), I get sporadic :bad_match errors in production.
{:badmatch, <<0, 0, 1, 10, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 61, 0, 1, 1, 0, 0, 0, 0, 25, 79, 82, 65 ... >>}
{:badmatch, <<1, 10, 1, 10, 0, 6, 34, 1, 7, 0, 1, 100, 0, 0, 0, 7, 2, 193, 50, 3, 194, 17, 33, 3, 194, 9, 17, ... >>}
I tried to provoke it by running many queries, but without luck. It happens every few hours, so it's hard to collect any more information. Thanks!
I also see this sporadically.
Hi there,
occasionally I get :function_clause and :bad_match errors (with the latter exposing data from the db!).
It can usually be remedied by increasing the :read_timeout parameter, indicating the Oracle server is just sometimes a little faster, sometimes slower.
But isn't that a bug? It seems to me that the Erlang code tries to decode a partial message after the socket recv times out, and the decoding then fails. The second case here seems suspicious to me: returning :ok on timeout:
The actual error location is usually somewhere in the 'decoder', but it is hidden by the try-catch in jamdb_oracle.ex/sql_query
Maybe there is a reason for that, but shouldn't it return an error? Just removing the second case maybe?
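To make the suggestion concrete, here is a minimal sketch in Python (not the driver's Erlang code) of a read loop that reports a timeout as an error instead of handing a partial buffer to the decoder. The names `ReadTimeout`, `read_until_complete`, and the `is_complete` predicate are hypothetical; `is_complete` stands in for whatever the protocol uses to recognize the end of a message.

```python
import socket


class ReadTimeout(Exception):
    """Raised when a complete message did not arrive in time."""


def read_until_complete(sock, is_complete, timeout):
    """Collect bytes until is_complete(buf) holds; never return partial data."""
    sock.settimeout(timeout)
    buf = bytearray()
    while not is_complete(buf):
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            # Surface the timeout to the caller; the partial buffer is dropped.
            raise ReadTimeout(f"timed out after {len(buf)} bytes") from None
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)
```

With this shape, the decoder only ever sees complete messages, and a slow server shows up as a timeout error rather than as a :bad_match or :function_clause somewhere in the decoder.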
Why is there a separate read_timeout in addition to the overall timeout anyway? If multiple socket reads are needed (because of the fetch size, I guess), why not calculate an absolute 'end time' from the overall query timeout at the start, and then give each socket recv the remaining relative time, giving up if that is <= 0? That way, I can say for each query how long I want to wait, regardless of the fetch size or the way the communication with the server is split. I might want to use different timeouts for different queries, so setting a global read_timeout that serves all of them is not very practical.
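The 'end time' idea can be sketched as follows, in Python rather than the driver's Erlang. `Deadline` and `recv_with_deadline` are hypothetical names, and the sketch assumes a monotonic clock is available; it is an illustration of the proposal, not a description of how the driver works.

```python
import socket
import time


class Deadline:
    """Absolute end time derived once from the per-query timeout (seconds)."""

    def __init__(self, total_timeout):
        self._end = time.monotonic() + total_timeout

    def remaining(self):
        return self._end - time.monotonic()


def recv_with_deadline(sock, deadline, bufsize=4096):
    """One socket read that may wait only for the time left on the deadline."""
    remaining = deadline.remaining()
    if remaining <= 0:
        raise TimeoutError("query timeout exceeded")
    sock.settimeout(remaining)
    try:
        return sock.recv(bufsize)
    except socket.timeout:
        raise TimeoutError("query timeout exceeded") from None
```

A caller would create one `Deadline` per query and pass it to every read for that query, so the total wait is bounded by the query timeout however many reads the response is split into.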