Description
Hi there,
Occasionally I get :function_clause and :bad_match errors (with the latter even exposing raw data from the db!).
It can usually be remedied by increasing the :read_timeout parameter, which suggests the Oracle server is simply sometimes a little faster and sometimes a little slower.
But isn't that a bug? It seems to me that the Erlang code tries to decode a partial message after the socket recv times out, and the decode then fails. The second case here looks suspicious to me: returning :ok on timeout:
```erlang
recv(read_timeout, Socket, Length, {_Tout, ReadTout} = Touts, Acc, Data) ->
    case sock_recv(Socket, 0, ReadTout) of
        {ok, NetworkData} ->
            recv(Socket, Length, Touts, <<Acc/bits, NetworkData/bits>>, Data);
        {error, timeout} ->
            {ok, ?TNS_DATA, Data};
        {error, Reason} ->
            {error, socket, Reason}
    end.
```
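To illustrate the failure mode I suspect: a decoder that pattern-matches on a complete message crashes with exactly these errors when handed a truncated binary. This is a made-up toy decoder, not jamdb_oracle's actual code:

```erlang
%% Toy decoder: expects a 4-byte length prefix followed by that many payload bytes.
decode(<<Len:32, Payload:Len/binary>>) -> {ok, Payload}.

%% decode(<<0,0,0,10, "abc">>) raises a function_clause error, because the
%% 10-byte payload promised by the header is only partially there -- the same
%% kind of crash a partial TNS message would cause in the real decoder.
```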
The actual error location is usually somewhere in the 'decoder', but it is hidden by the try/catch around sql_query in jamdb_oracle.ex.
-
Maybe there is a reason for this behavior, but shouldn't it return an error instead? Perhaps simply removing the second case would do?
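A minimal sketch of what I mean, keeping everything from the quoted clause except the timeout case (the exact error term is my guess, not the library's):

```erlang
recv(read_timeout, Socket, Length, {_Tout, ReadTout} = Touts, Acc, Data) ->
    case sock_recv(Socket, 0, ReadTout) of
        {ok, NetworkData} ->
            recv(Socket, Length, Touts, <<Acc/bits, NetworkData/bits>>, Data);
        {error, timeout} ->
            %% Propagate the timeout instead of pretending the accumulated
            %% data is a complete TNS message.
            {error, socket, timeout};
        {error, Reason} ->
            {error, socket, Reason}
    end.
```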
-
Why is there a separate read_timeout in addition to the overall timeout at all? If multiple socket reads are needed (because of the fetch size, I guess), why not compute an absolute 'end time' from the overall query timeout at the start, and then give each socket recv the remaining relative time, giving up once that is <= 0? That way I could say per query how long I want to wait, regardless of the fetch size or of how the communication with the server is split up. I might want to use different timeouts for different queries, so a single global read_timeout that has to serve them all is not very practical. A sketch of the deadline idea follows below.
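Roughly like this, as a minimal sketch under my own assumptions: the function and variable names are mine, not the library's, and message_complete/1 is only a placeholder for the driver's real TNS completeness check (here I assume a 2-byte big-endian length prefix):

```erlang
%% Hypothetical deadline-based receive loop (illustrative names only).
recv_with_deadline(Socket, QueryTimeoutMs) ->
    %% Compute one absolute deadline per query, up front.
    Deadline = erlang:monotonic_time(millisecond) + QueryTimeoutMs,
    recv_loop(Socket, Deadline, <<>>).

recv_loop(Socket, Deadline, Acc) ->
    %% Give each recv only the time remaining until the deadline.
    Remaining = Deadline - erlang:monotonic_time(millisecond),
    if
        Remaining =< 0 ->
            {error, socket, timeout};
        true ->
            case gen_tcp:recv(Socket, 0, Remaining) of
                {ok, Packet} ->
                    NewAcc = <<Acc/bits, Packet/bits>>,
                    case message_complete(NewAcc) of
                        true  -> {ok, NewAcc};
                        false -> recv_loop(Socket, Deadline, NewAcc)
                    end;
                {error, timeout} ->
                    {error, socket, timeout};
                {error, Reason} ->
                    {error, socket, Reason}
            end
    end.

%% Placeholder completeness check (assumption: the message carries its total
%% length in the first two bytes, big-endian).
message_complete(<<Len:16, _/bits>> = Acc) -> byte_size(Acc) >= Len;
message_complete(_) -> false.
```

With a single absolute deadline, the per-query timeout bounds the whole exchange, no matter how many round trips the fetch size causes.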