Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Cyphal support for v6x and v6c #23488

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

PonomarevDA
Copy link
Contributor

Solution

This PR adds Cyphal support for fmu-v6x and fmu-v6c.

I've added cyphal.px4board for these targets, ifup to rc.board_defaults, copied nsh/defconfig to cyphal/defconfig and enabled CAN FD and SocketCAN. For v6c it was also necessary to enable CONFIG_NET_UDP.

I've also added MSG_DONTWAIT to CanardSocketCAN when calling sendmsg. Without this flag, Cyphal blocks forever if no CAN devices are connected.

I would also add support for CubeOrange, but I don't have one right now.

Changelog Entry

For release notes:

Feature/Bugfix: Add Cyphal support for v6x and v6c

Test coverage

I've tested fmu-v5, fmu-v6x and fmu-v6c:

  • With Cyphal devices: connect and disconnect devices to check that Cyphal remains working
  • In Cyphal HITL on my custom branch: a few flight tests

Context

Extension of Cyphal support for new targets were requested here:

These recommendations were really useful:

@@ -174,7 +174,7 @@ int16_t CanardSocketCAN::transmit(const CanardTxQueueItem &txf, int timeout_ms)
_send_tv->tv_usec = deadline_systick % 1000000ULL;
_send_tv->tv_sec = (deadline_systick - _send_tv->tv_usec) / 1000000ULL;

return sendmsg(_fd, &_send_msg, 0);
return sendmsg(_fd, &_send_msg, MSG_DONTWAIT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you guard this behind ifndef CONFIG_NET_CAN_RAW_TX_DEADLINE.
Only devices that don't implement NET_CAN_RAW_TX_DEADLINE suffer from the issue you're describing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, as I understand, stm32h7 is supposed to support CONFIG_NET_CAN_RAW_TX_DEADLINE (why not?) and here I enabled it. I looked deeper into the Nuttx CAN implementation and I can provide a few details of the root of the issue. It's quite interesting.

Each time when we send a CAN frame, we call can_sendmsg(). This function notifies stm32_fdcan_sock with netdev_txnotify_dev() that new TX data is available and then goes into net_timedwait(&state.snd_sem, UINT_MAX) where the thread waits until the callback is called.
The problem is that fdcan_txavail_work() calls sdevif_poll(&priv->dev, fdcan_txpoll) only if the TX ring buffer is not full, otherwise it does nothing, and the associated callback is never called. In this case, the thread is blocked forever.

This is what I experience in practice with CONFIG_NET_CAN_RAW_TX_DEADLINE and without MSG_DONTWAIT:

  • If I remove all Cyphal publishers except Heartbeat, it is fine with or without a connected CAN device, because the TX timeout is enough to keep the ring buffer always not full.
  • However, we have a port.List publisher that periodically publishes a long message. Once it publishes, it overflows the ring buffer and blocks the thread forever.

Anyway:

  • I wrapped sendmsg with CONFIG_NET_CAN_RAW_TX_DEADLINE as you suggested.
  • I removed this CONFIG_NET_CAN_RAW_TX_DEADLINE config from fmu-v6c and fmu-v6x
  • I also wrapped setsockopt(_fd, SOL_CAN_RAW, CAN_RAW_TX_DEADLINE, &on, sizeof(on)) because we don't need it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the driver either doesn't abort correctly.
Or it doesn't get an interrupt after an abort that the buffer is ready to transmit another can frame.

Anyhow this works, but ideally you want to implement tx timeout to drop frames at high bus loads or can errors. Bursting tx buffers on CAN bus could lock up the bus.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @PetervdPerk-NXP

I've rebased and also slightly modified the return value of CanardSocketCAN::transmit for non-blocking calls. It used to drop frames from canard TX queue when it was busy. For example, Yukon could fail to retrieve the registers and port.List successfully. Now it exits the transmit cycle and tries to send it next time. Hopefully, the canard TX frames themselves have a timeout.

Test coverage:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to find a solution to fix blocking calls and use them with NET_CAN_RAW_TX_DEADLINE as it was designed. There are many fixes related to CAN in the origin Nuttx repository. The description of this fix looks like what I'm experiencing: apache/nuttx@3c54d82. But I can't just cherry-pick and check it because there are a lot of changes.

Do you know, if PX4 has plans to update the Nuttx version?

@github-actions github-actions bot added the stale label Sep 5, 2024
@DronecodeBot
Copy link

This pull request has been mentioned on Discussion Forum for PX4, Pixhawk, QGroundControl, MAVSDK, MAVLink. There might be relevant details there:

https://discuss.px4.io/t/trouble-with-cyphal-of-pixhawk-v6xrt/40560/1

@github-actions github-actions bot removed the stale label Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants