-
Notifications
You must be signed in to change notification settings - Fork 13.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Cyphal support for v6x and v6c #23488
base: main
Are you sure you want to change the base?
Conversation
@@ -174,7 +174,7 @@ int16_t CanardSocketCAN::transmit(const CanardTxQueueItem &txf, int timeout_ms) | |||
_send_tv->tv_usec = deadline_systick % 1000000ULL; | |||
_send_tv->tv_sec = (deadline_systick - _send_tv->tv_usec) / 1000000ULL; | |||
|
|||
return sendmsg(_fd, &_send_msg, 0); | |||
return sendmsg(_fd, &_send_msg, MSG_DONTWAIT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you guard this behind ifndef CONFIG_NET_CAN_RAW_TX_DEADLINE.
Only devices that don't implement NET_CAN_RAW_TX_DEADLINE suffer from the issue you're describing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, as I understand, stm32h7 is supposed to support CONFIG_NET_CAN_RAW_TX_DEADLINE (why not?) and here I enabled it. I looked deeper into the Nuttx CAN implementation and I can provide a few details of the root of the issue. It's quite interesting.
Each time when we send a CAN frame, we call can_sendmsg()
. This function notifies stm32_fdcan_sock with netdev_txnotify_dev()
that new TX data is available and then goes into net_timedwait(&state.snd_sem, UINT_MAX)
where the thread waits until the callback is called.
The problem is that fdcan_txavail_work()
calls sdevif_poll(&priv->dev, fdcan_txpoll)
only if the TX ring buffer is not full, otherwise it does nothing, and the associated callback is never called. In this case, the thread is blocked forever.
This is what I experience in practice with CONFIG_NET_CAN_RAW_TX_DEADLINE and without MSG_DONTWAIT:
- If I remove all Cyphal publishers except Heartbeat, it is fine with or without a connected CAN device, because the TX timeout is enough to keep the ring buffer always not full.
- However, we have a port.List publisher that periodically publishes a long message. Once it publishes, it overflows the ring buffer and blocks the thread forever.
Anyway:
- I wrapped sendmsg with CONFIG_NET_CAN_RAW_TX_DEADLINE as you suggested.
- I removed this CONFIG_NET_CAN_RAW_TX_DEADLINE config from fmu-v6c and fmu-v6x
- I also wrapped setsockopt(_fd, SOL_CAN_RAW, CAN_RAW_TX_DEADLINE, &on, sizeof(on)) because we don't need it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the driver either doesn't abort correctly.
Or it doesn't get an interrupt after an abort that the buffer is ready to transmit another can frame.
Anyhow this works, but ideally you want to implement tx timeout to drop frames at high bus loads or can errors. Bursting tx buffers on CAN bus could lock up the bus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @PetervdPerk-NXP
I've rebased and also slightly modified the return value of CanardSocketCAN::transmit for non-blocking calls. It used to drop frames from canard TX queue when it was busy. For example, Yukon could fail to retrieve the registers and port.List successfully. Now it exits the transmit cycle and tries to send it next time. Hopefully, the canard TX frames themselves have a timeout.
Test coverage:
- I made a flight test in Cyphal HITL simulator on this branch + a few more commits: https://review.px4.io/plot_app?log=6c5936cf-04e6-49f3-bad2-54df97beaf4d
- I also manually tested it with Yukon. I retrieved registers a few times and tried
cyphal status
to check the heap status: https://review.px4.io/plot_app?log=1d01fcd3-7044-4942-a07d-d1c59f295b07. When I disconnected the CAN for a few seconds, the heap was 7936/8000. When I reconnected CAN, it was fine again. I also got wq:uavcan was low on stack (132 bytes left).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to find a solution to fix blocking calls and use them with NET_CAN_RAW_TX_DEADLINE as it was designed. There are many fixes related to CAN in the origin Nuttx repository. The description of this fix looks like what I'm experiencing: apache/nuttx@3c54d82. But I can't just cherry-pick and check it because there are a lot of changes.
Do you know, if PX4 has plans to update the Nuttx version?
This pull request has been mentioned on Discussion Forum for PX4, Pixhawk, QGroundControl, MAVSDK, MAVLink. There might be relevant details there: https://discuss.px4.io/t/trouble-with-cyphal-of-pixhawk-v6xrt/40560/1 |
…blocking Cyphal when no CAN devices are connected
…d ifndef CONFIG_NET_CAN_RAW_TX_DEADLINE guards in cyphal socketcan driver
…r to retry sending later
2884841
to
6f96987
Compare
Solution
This PR adds Cyphal support for fmu-v6x and fmu-v6c.
I've added cyphal.px4board for these targets, ifup to rc.board_defaults, copied nsh/defconfig to cyphal/defconfig and enabled CAN FD and SocketCAN. For v6c it was also necessary to enable CONFIG_NET_UDP.
I've also added MSG_DONTWAIT to CanardSocketCAN when calling sendmsg. Without this flag, Cyphal blocks forever if no CAN devices are connected.
I would also add support for CubeOrange, but I don't have one right now.
Changelog Entry
For release notes:
Test coverage
I've tested fmu-v5, fmu-v6x and fmu-v6c:
Context
Extension of Cyphal support for new targets were requested here:
These recommendations were really useful: