Disconnect instantly while reconnecting #197
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
We noticed that trying to shut down clients in reconnecting-state may take several minutes before the
on_disconnect
callback gets called. The reason appears to be that when theconnection
has nochannel
when it has no network, the disconnection request will not be handled until the next scheduled reconnect-task is run. This may take up to 128 seconds, which is a bit long for e.g. demanding people to wait until they can reboot their computer. In addition to this, if the client is running as a Windows service and the computer is in S0 low-power idle state, the service process gets only a fraction of normal CPU time, and it may take over 30 minutes before the reconnect task gets run.It looks like the disconnection task is heavily tied with a
channel
of theconnection
, which is not available when there's no network, so currently we wait until a reconnect-task tries to create a newchannel
. However once the disconnect-request has been made, the newchannel
will be only used for callings_mqtt_client_shutdown
, which is to be called after thechannel
has completed its shutdown, so having achannel
should not be needed?This patch skips the wait for the reconnect-task, and instead lets the disconnect-task to call
s_mqtt_client_shutdown
directly in case there's no channel. This shutdown function was moved from client_channel_handler.c to client.c to get access tostatic void s_mqtt_client_shutdown()
defined within client.c. It also feels like it fits there almost better than the channel-file, as the only link to channels appears to be thats_mqtt_disconnect_task
should ensure that the channel has been shut down befores_mqtt_client_shutdown
gets called.Testing this did not show any obviuous side-effects even for the case where reconnect task gets run after disconnect, as the call
aws_atomic_store_ptr(&connection->reconnect_task->connection_ptr, NULL)
froms_mqtt_disconnect_task
appears to cause the reconnect tasks_attempt_reconnect
to bail out due to connection pointer being NULL.One worry pointed out by @kankri about this patch is the small but possible chance that the
s_mqtt_disconnect_task
ands_attempt_reconnect
tasks would get run simultaneously by two separate aws event loop threads, and thens_attempt_reconnect
acquires a pointer toconnection
whose content might get freed and/or invalidated bys_mqtt_disconnect_task
befores_attempt_reconnect
is done. But I guess this is already a potential problem, because the comment "If there is an outstanding reconnect task, cancel it" insides_mqtt_disconnect_task
already assumes that there might be a reconnect task scheduled for the short term future? We could make a separate issue out of that.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.