Client stays stuck after IoT Hub failover using private link #1197

Closed
gabrielSoudry opened this issue Oct 14, 2024 · 3 comments

@gabrielSoudry

Context

  • OS and version used: Debian Buster Armhf
  • Python version: 3.7
  • list of installed packages: azure-iot-device 2.13.1

Description of the issue

We are using Azure IoT Hub with a failover setup between two regions, France Central and France South, both configured with Private Links. When we initiate a failover from France Central to France South, the IoT Hub successfully fails over, and DNS resolves to the new IP as expected. However, the Azure SDK client does not reconnect automatically after the failover, even though the DNS resolves correctly.

We would expect the SDK to handle the reconnection automatically when the DNS updates, but this does not happen.

Restarting the Python service works, but this defeats the purpose of failover resilience.

Steps to Reproduce:

  • Set up two IoT Hub instances with Private Links in France Central and France South.
  • Use Azure SDK to establish a connection to the IoT Hub in France Central.
  • Failover the IoT Hub from France Central to France South.
  • Ping the IoT Hub address to confirm that DNS resolves to the correct IP.
  • Observe that the SDK client does not reconnect automatically.
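
The DNS check in the steps above can also be done from Python instead of `ping`, which makes it easy to compare the address the device resolves before and after the failover. This is a minimal sketch using only the standard library; the function names `resolve_ipv4` and `dns_changed` are hypothetical helpers, and port 8883 is taken from the MQTT connection shown in the log.

```python
import socket


def resolve_ipv4(hostname, port=8883):
    """Return the sorted, de-duplicated IPv4 addresses the OS resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


def dns_changed(before, hostname, port=8883):
    """Compare a previously recorded address list with a fresh lookup.

    Returns (changed, current_addresses).
    """
    after = resolve_ipv4(hostname, port)
    return after != sorted(before), after
```

Record `resolve_ipv4(...)` for the hub hostname before triggering the failover, then call `dns_changed(...)` afterwards. Note that this only shows what the operating system resolver returns at that moment; it does not show which address the already running SDK process is using.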

Expected Behavior:
The Azure SDK client should automatically reconnect to the IoT Hub after failover, when DNS has updated to the new region's IP address.

Actual Behavior:
The Azure SDK client fails to reconnect to the IoT Hub after the failover, even though DNS resolves to the correct IP address.
After a few failed connection attempts (see the log below), it stops retrying and remains stuck.

Oct 14 11:43:03 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.aio.async_clients: Sending message to Hub...
Oct 14 11:43:03 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: publishing on devices/unipi-m203-741/messages/events/
Oct 14 11:43:04 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: payload published for 6
Oct 14 11:43:04 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.aio.async_clients: Successfully sent message to Hub
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: disconnected with result code: 7

==============FAILOVER==================
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: Forcing paho disconnect to prevent it from automatically reconnecting
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: _on_mqtt_disconnect called: The connection was lost.
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: Unexpected disconnect (no pending connection op)
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.abstract_clients: Connection State - Disconnected
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.abstract_clients: Cleared all pending method requests due to disconnect
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: Exception caught in background thread.  Unable to handle.
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: ['azure.iot.device.common.transport_exceptions.ConnectionDroppedError: Unexpected disconnection\n']
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: Connect using port 8883 (TCP)
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage(ConnectOperation): Connection watchdog expired.  Cancelling op
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: disconnecting MQTT client
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: disconnected with result code: 0
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: Exception caught in background thread.  Unable to handle.
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: ['azure.iot.device.common.pipeline.pipeline_exceptions.OperationTimeout: Transport timeout on connection operation\n']
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: _on_mqtt_disconnect called
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.pipeline.pipeline_stages_base: ConnectionStateStage: DisconnectEvent received while in unexpected state - ConnectionState.DISCONNECTED
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: Unexpected disconnect (no pending connection op)
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.abstract_clients: Connection State - Disconnected
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.abstract_clients: Cleared all pending method requests due to disconnect
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: Exception caught in background thread.  Unable to handle.
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: ['azure.iot.device.common.transport_exceptions.ConnectionDroppedError: Unexpected disconnection\n']
Oct 14 11:53:28 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: Connect using port 8883 (TCP)
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage(ConnectOperation): Connection watchdog expired.  Cancelling op
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: disconnecting MQTT client
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: disconnected with result code: 0
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: Exception caught in background thread.  Unable to handle.
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: ['azure.iot.device.common.pipeline.pipeline_exceptions.OperationTimeout: Transport timeout on connection operation\n']
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: _on_mqtt_disconnect called
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.pipeline.pipeline_stages_base: ConnectionStateStage: DisconnectEvent received while in unexpected state - ConnectionState.DISCONNECTED
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: Unexpected disconnect (no pending connection op)
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.abstract_clients: Connection State - Disconnected
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.abstract_clients: Cleared all pending method requests due to disconnect
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: Exception caught in background thread.  Unable to handle.
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: ['azure.iot.device.common.transport_exceptions.ConnectionDroppedError: Unexpected disconnection\n']
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: Connect using port 8883 (TCP)
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: connected with result code: 5
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: disconnected with result code: 5
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: _on_mqtt_connection_failure called: Connection Refused: not authorised.
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: Forcing paho disconnect to prevent it from automatically reconnecting
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: Exception caught in background thread.  Unable to handle.
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: _on_mqtt_disconnect called: The connection was refused.
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: ['azure.iot.device.common.transport_exceptions.UnauthorizedError: Connection Refused: not authorised.\n']
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.pipeline.pipeline_stages_base: ConnectionStateStage: DisconnectEvent received while in unexpected state - ConnectionState.DISCONNECTED
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: Unexpected disconnect (no pending connection op)
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.abstract_clients: Connection State - Disconnected
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.abstract_clients: Cleared all pending method requests due to disconnect
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: Exception caught in background thread.  Unable to handle.
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: ['azure.iot.device.common.transport_exceptions.ConnectionDroppedError: Unexpected disconnection\n']

=> STUCK

@gabrielSoudry changed the title from "Client Stays Stuck After IoT Hub Failover Using Private Link" to "Client stays stuck after IoT Hub failover using private link" on Oct 14, 2024
@olivakar
Collaborator

The creation of the device client depends on the connection string, which belongs to a specific hub.
Unless the device client is created again with a different connection string, reconnecting to a different hub on the fly is not possible.

This would be a significant change or addition to functionality, and we are not delivering new features at this time; we are focusing on security and stability.

Since this is a very specific failover scenario, one approach could be to create the device client again, using a different connection string, in application-level code.
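
As an illustration of the application-level approach suggested above, here is a minimal sketch that builds a brand new client for every connection attempt instead of reusing the one that got stuck. It is not part of the SDK: `connect_with_recreate` and `make_client` are hypothetical names, and the only assumption about the client is that it exposes awaitable `connect()` and `shutdown()` methods, as the SDK's async `IoTHubDeviceClient` does to my knowledge. Whether this resolves the behaviour in this issue has not been verified.

```python
import asyncio


async def connect_with_recreate(make_client, attempts=5, connect_timeout=60.0, delay=10.0):
    """Return a connected client, building a fresh one for every attempt.

    make_client is a hypothetical zero-argument factory supplied by the
    application; it must return an object with awaitable connect() and
    shutdown() methods.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        client = make_client()  # new client, so nothing from the old session is reused
        try:
            await asyncio.wait_for(client.connect(), timeout=connect_timeout)
            return client
        except asyncio.CancelledError:
            raise  # never swallow task cancellation (an Exception subclass on Python 3.7)
        except Exception as exc:  # broad on purpose: any failure leads to a rebuild
            last_error = exc
            try:
                await client.shutdown()  # release the failed client's resources
            except Exception:
                pass  # best-effort cleanup only
            if attempt < attempts:
                await asyncio.sleep(delay)
    raise ConnectionError("could not connect after %d attempts" % attempts) from last_error
```

With `azure-iot-device`, the factory would typically wrap a call such as `IoTHubDeviceClient.create_from_connection_string(...)` or `IoTHubDeviceClient.create_from_x509_certificate(...)`; the application would call `connect_with_recreate` again whenever it detects that the current client is no longer connected.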

@gabrielSoudry
Author

Thanks for your response. While I understand that the creation of the device client is tied to a specific connection string, in the case of a failover to an IoT Hub in another region, the connection using the X.509 certificate remains valid. Since it is the same IoT Hub (just a failover instance in a different region), the certificate is still applicable.


@gabrielSoudry
Author

gabrielSoudry commented Oct 21, 2024

@olivakar any news? Can you reopen the issue, please?
