[Bug] Closing transport with multiple bridges/subscribers connected #314
Comments
An update on this issue: I tested previous versions to identify the release in which this error first appears, and it seems to be 1.0.0-beta.4. Version 1.0.0-beta.3 does not show the specific error I described above.
As a follow-up: the issue seems to be related to the limited resources allocated to the Zenoh router, which runs as a container on Kubernetes. What are the minimum memory and CPU resources required? That would be helpful to know.
Hard to say, as it really depends on the number of routed DDS entities and on the amount of traffic to route. How did you find out that the resource allocation in Kubernetes was the cause?
We didn't do a detailed analysis either; it was more of a logical intuition. We visualize the data in Rviz and/or Foxglove in containers connected to the same Zenoh router, all part of the same Kubernetes cluster. We noticed that the error occurred as soon as multiple Rviz/Foxglove instances were in use, which made us think that the router was not able to handle all the traffic. I therefore increased the CPU and RAM resources for the router (note: not for the bridges), and that solved the issue. To make sure it works, I use very high values for the router container: 8 CPUs and 25 GiB of RAM. The container with Foxglove and the bridge now gets 2 CPUs and 5 GiB of RAM, but our impression is that the key point is the resources for the Zenoh router. Obviously, reducing the number of topics and the publishing frequency would help as well, but I was wondering whether there are reference values we can use to be sure the router can handle all the traffic.
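For readers who want to reproduce the allocation described in the comment above, here is a minimal sketch that builds the corresponding Kubernetes `resources` stanzas as JSON. The container names `zenoh-router` and `foxglove-bridge` are hypothetical, and setting requests equal to limits is an assumption; the thread only gives the CPU and memory figures.

```python
import json


def container_resources(cpu: str, memory: str) -> dict:
    """Build a Kubernetes `resources` stanza.

    Assumption: requests are set equal to limits. The thread does not say
    which of the two was changed.
    """
    quantities = {"cpu": cpu, "memory": memory}
    return {"requests": dict(quantities), "limits": dict(quantities)}


# Figures quoted in the thread: 8 CPUs / 25 GiB for the router,
# 2 CPUs / 5 GiB for the Foxglove + bridge container.
router = container_resources(cpu="8", memory="25Gi")
bridge = container_resources(cpu="2", memory="5Gi")

# Hypothetical container names, used only as labels in the output.
print(json.dumps({"zenoh-router": router, "foxglove-bridge": bridge}, indent=2))
```

Each printed stanza can be placed under a container's `resources` key in a pod spec. These values are the deliberately generous ones from the comment, not a recommended minimum.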
Describe the bug
I have a robot running the latest release of the zenoh-bridge, connected as a client to a Zenoh router in a Kubernetes cluster. On the same k8s cluster I have a container with a zenoh-bridge that connects to the same Zenoh router as a client; it can see the robot's topics and, for instance, use Rviz with nav2 to visualize and move the robot.
When I start another zenoh-bridge in another container, connect it to the same router, and visualize the topics, I get the following error and the zenoh-bridge on the robot stops working.
ERROR ThreadId(19) zenoh_transport::unicast::universal::tx: Unable to push non droppable network message to acac40b9496508dc4cf792ca876954fc. Closing transport!
OBS: I also observed this behavior without starting a second container, when a second subscriber to a topic was added (for instance with ros2 topic echo while Rviz was already running). In this case, however, it occurs inconsistently: sometimes it works, sometimes it does not.
I also noticed two warning messages that seem to be related:
WARN net-0 ThreadId(10) zenoh::net::runtime::orchestrator: Unable to connect to tcp/ip:port! Received a close message (reason MAX_LINKS) in response to an OpenSyn on: TransportLinkUnicast { link: Link { src: tcp/ip:port, dst: tcp/ip:port, mtu: 64995, is_reliable: true, is_streamed: true }, config: TransportLinkUnicastConfig { direction: Outbound, batch: BatchConfig { mtu: 64995, is_streamed: true, is_compression: false }, priorities: None, reliability: None } } at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-transport-1.0.0/src/unicast/establishment/open.rs:472.
WARN net-0 ThreadId(09) zenoh_plugin_ros2dds::route_service_cli: Route Service Client (ROS:/summit/lifecycle_manager_navigation/is_active <-> Zenoh:bot1/summit/lifecycle_manager_navigation/is_active): received error as reply for (2c2adf3057843613,26): ReplyError { payload: ZBytes(ZBuf { slices: [[54, 69, 6d, 65, 6f, 75, 74]] }), encoding: Encoding(Encoding { id: 0, schema: None }) }
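The `ReplyError` payload in the second warning is printed as raw hex bytes. A small sketch for decoding it, assuming the values in `slices` are hexadecimal (they include `6d` and `6f`, which can only be hex) and that the payload is UTF-8 text:

```python
# Hex byte values copied from the `slices` field of the ReplyError log line.
hex_bytes = ["54", "69", "6d", "65", "6f", "75", "74"]

# Convert each hex string to its byte value, then decode the result as text.
payload = bytes(int(b, 16) for b in hex_bytes)
text = payload.decode("utf-8")

print(text)  # prints "Timeout"
```

So the service client received a timeout as the reply, which is consistent with a router or transport that cannot keep up with the traffic.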
To reproduce
System info
Robot with ROS2 humble container and ros2ddsbridge stable latest version 1.0.0
Container on Kubernetes cluster with ROS2 humble and ros2ddsbridge stable latest version 1.0.0
Zenoh router stable latest version 1.0.0