Potential Issue: MAVLinkX on Tx causes crash #159

Open
jlpoltrack opened this issue Apr 17, 2024 · 28 comments


@jlpoltrack
Collaborator

jlpoltrack commented Apr 17, 2024

MAVLinkX seems to cause Tx to crash when there is a serial stream in one situation:

  1. When using STM32 F1 with GCC 12 (ST made default with latest 1.15.0 CubeIDE)

Potential areas to explore:

https://github.com/olliw42/mLRS/blob/main/mLRS/Common/thirdparty/mavlinkx.h#L19-L21

https://github.com/olliw42/mLRS/blob/main/mLRS/CommonTx/mavlink_interface_tx.h#L288-L293

@olliw42
Owner

olliw42 commented May 8, 2024

@jlpoltrack
point 2. has evaporated, right?

@jlpoltrack
Collaborator Author

@jlpoltrack point 2. has evaporated, right?

Yes, edited first post accordingly.

@olliw42
Owner

olliw42 commented Jul 16, 2024

@jlpoltrack
there is a new version v1.16.0 of STM32CubeIDE
it uses gcc12, version 12.3.rel1.20240612
from the last digits I would infer that it is newer than what you had with v1.15.0

Q: do you remember what exact gcc12 version you had with the above issue? It's not 12.3.rel1.20240612, right?

Q: do you think you could still reproduce the issue, and check whether it is there for 12.3.rel1.20240612?

btw, it seems that one also doesn't have to modify the .ld script, as was needed with v1.15.x. Can you confirm this?

@jlpoltrack
Collaborator Author

jlpoltrack commented Jul 16, 2024

Q: do you remember what exact gcc12 version you had with the above issue? It's not 12.3.rel1.20240612, right?

It was the version installed with CubeIDE 1.15.0 which is v12.3.rel1.20240306-1730 based on this source:

image

https://wiki.st.com/stm32mcu/wiki/STM32CubeIDE:STM32CubeIDE_errata_1.15.x

Q: do you think you could still reproduce the issue, and check whether it is there for 12.3.rel1.20240612?

The same issue appears - the connection looks good between Tx and Rx, however, as soon as the FC has initialized and starts emitting data the Tx crashes (no LEDs) and requires a power cycle. This doesn't seem to affect the Rx side - it just disconnects and shows a flashing red LED. Switching to Mavlink (instead of MavlinkX) shows the usual behavior.

Notes on the latest CubeIDE:

On a fresh 1.16.0 install, I get this previously seen warning which doesn't show up in GCC 11:

image

I would guess that this isn't a Tx specific issue, rather that as there is no data on the uplink there is nothing to potentially trigger it on the Rx side.

@olliw42
Owner

olliw42 commented Jul 16, 2024

gcc12: ok, so it seems to be a newer version

sad the issue is still there ... I guess I need to try to reproduce it ... I was hoping that they would come up relatively soon with a newer version, and that the issue would go away ... seems not to have paid off. THX for testing with 1.16.0.

yes, the warning was there before too I think. It's very dodgy, since it doesn't complain in other locations. Easy to get around by initializing the variable. Should not be the issue.
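
The warning itself only appears as a screenshot in this thread, so the following is a minimal sketch that assumes it is of the "may be used uninitialized" kind. `first_byte_or` and its parameters are hypothetical names for illustration, not mLRS code. It shows a variable that is written on only some paths and the workaround of giving it an initializer at its declaration.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, not mLRS code: returns the first byte of buf,
// or fallback if the buffer is empty.
uint8_t first_byte_or(const uint8_t* buf, size_t len, uint8_t fallback)
{
    uint8_t value = 0; // explicit initializer, the workaround mentioned above
    int have_value = 0;

    if (len > 0) {
        value = buf[0];
        have_value = 1;
    }

    // value is only read when have_value is set, so it has always been
    // written before it is read, even without the initializer
    return have_value ? value : fallback;
}
```

In more involved code the compiler cannot always prove that such a read is safe, which is one way a warning like this can show up in one place and not in another. The initializer does not change the behavior on any path that was already correct.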

Did you notice that the older .ld scripts do work now too (with v1.15.x one had to do a change which I can't recall)?

@jlpoltrack
Collaborator Author

Did you notice that the older .ld scripts do work now too (with v1.15.x one had to do a change which I can't recall)?

I don't have to make any code changes for the project to build, that being said I do see this other warning which I recall is related to the .ld scripts:

image

@olliw42
Owner

olliw42 commented Jul 16, 2024

ok, so it's still there. THX.

since you seem to be at it, have you tried different combinations of commenting out one of these three defines?
https://github.com/olliw42/mLRS/blob/main/mLRS/Common/thirdparty/mavlinkx.h#L19-L21

it should work with any combination commented out, but maybe there are some which don't cause the crash

Switching to Mavlink (instead of MavlinkX) shows the usual behavior.

does the behavior also depend on the mode you are using?

@jlpoltrack
Collaborator Author

since you seem to be at it, have you tried different combinations of commenting out one of these three defines?

does the behavior also depend on the mode you are using?

My quick test today was with FLRC - so would think the compression define wouldn't matter. I will comment them all out first and see if the behavior changes.

@olliw42
Owner

olliw42 commented Jul 16, 2024

maybe first trying with different modes ... I'm starting to suspect that it's not actually the mavlinkX library but some timing issue somewhere else ... in which case we would hunt the wrong ghost

@jlpoltrack
Collaborator Author

maybe first trying with different modes

Same behavior across all modes, for whatever reason it appeared to take slightly longer on 19 Hz for the Tx to crash.

@jlpoltrack
Collaborator Author

All three defines commented out also resulted in a Tx crash - not sure if enabling any of them would make sense to test.

@olliw42
Owner

olliw42 commented Jul 16, 2024

many thx. It kind of reinforces my thought that the issue is not so much with that library, but with how it is used from the outside. It would also fit with it being a Tx issue and not an Rx issue. THX a lot.

@olliw42
Owner

olliw42 commented Jul 20, 2024

@jlpoltrack
it's as it had to be ... I tried on my "default" 868 MHz system ... and the Tx indeed does crash, but for me it really takes long ... and really doesn't have such nice reproducibility as in your case
it also seems to have to do with serial data flowing, but it's really hard to say since it happens so nondeterministically

can you please tell: what hardware are you using?

could also maybe try this: start your stuff up but with the serial on the receiver disconnected ... let it run for a while, then connect the serial ... it's to better control the moment when the serial data starts flowing

@olliw42
Owner

olliw42 commented Jul 20, 2024

note: pull the latest main ... it has the changes which should avoid the errors/warnings with gcc12

@olliw42
Owner

olliw42 commented Jul 20, 2024

but I can have it running long also with serial data ... grrr

@olliw42
Owner

olliw42 commented Jul 20, 2024

.

@jlpoltrack
Collaborator Author

jlpoltrack commented Jul 20, 2024

can you please tell: what hardware are you using?

I'm using DIY E28 hardware. Recent tests were on FLRC with 230400 baud w/ F4 Flight Controller.

but I can have it running long also with serial data ... grrr

I would say within 30 seconds of FC initialization I get the crash on FLRC. I did notice it was somewhat slower on 19 Hz - maybe easier to reproduce on faster rates / higher baud rates?

@olliw42
Owner

olliw42 commented Jul 20, 2024

I'm using DIY E28 hardware. Recent tests were on FLRC with 230400 baud w/ F4 Flight Controller.

what diy e28 hardware? it's a target in main? which stm32?

@jlpoltrack
Collaborator Author

what diy e28 hardware? it's a target in main? which stm32?

Both Tx and Rx are the diy-e28dual-board02-f103cb boards.

@olliw42
Owner

olliw42 commented Jul 20, 2024

but it doesn't matter what you use as rx, right?
different board, same effect, right?
you also don't need to use a gcc12 compiled firmware on the receiver for it to happen, right?

@jlpoltrack
Collaborator Author

jlpoltrack commented Jul 20, 2024

but it doesn't matter what you use as rx, right?
different board, same effect, right?

I first found this issue on some other DIY G4 / E22 hardware (which I eventually scrapped). I don't think it is MCU specific.

you also don't need to use a gcc12 compiled firmware on the receiver for it to happen, right?

This is not a combination that I've tried.

@olliw42
Owner

olliw42 commented Jul 20, 2024

I first found this issue on some other DIY G4 / E22 hardware (which I eventually scrapped). I don't think it is MCU specific.

just went to tx-diye28dual-module02-g491re as tx, and rx-diy-board01-f103cb ... and it works fine, FLRC, mavlinkX, 115200 baud however

@jlpoltrack
Collaborator Author

tx-diye28dual-module02-g491re

Okay interesting, my DIY G4 hardware was G441

@olliw42
Owner

olliw42 commented Jul 20, 2024

so, using now tx-diy-e28dual-board02-f103cb ...
... I really can't see an issue
FLRC, mavlinkX, 115200
will try 230400 now ...

@olliw42
Owner

olliw42 commented Jul 20, 2024

ok, with 230400 I do get tx crashes ... sometimes after several dozen seconds ... sometimes quickly when the serial flows ... it comes quick when I try to connect with MP ...
ohhhkay ... at least it seems I do reproduce now ... that's good
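
For scale, here is a small sketch of the raw serial byte rate at the two baud rates tried above. It assumes 8N1 framing (10 bits on the wire per byte), which the thread does not state, and the function names are made up for illustration. The results are upper bounds on what the serial port can carry, not the actual stream rate of the FC, and they say nothing about the cause of the crash.

```c
#include <stdint.h>

// Assumption: 8N1 framing, i.e. 1 start + 8 data + 1 stop = 10 bits per byte.
uint32_t uart_max_bytes_per_second(uint32_t baud)
{
    return baud / 10u;
}

// Upper bound on the bytes the serial port can deliver during one radio
// frame period, for a given frame rate in Hz (integer division, rounds down).
uint32_t uart_max_bytes_per_frame(uint32_t baud, uint32_t frame_rate_hz)
{
    return uart_max_bytes_per_second(baud) / frame_rate_hz;
}
```

With these assumptions 230400 baud allows at most 23040 bytes/s, twice the 11520 bytes/s of 115200 baud, so any buffer fed from the serial side can fill up to twice as fast.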

@olliw42
Owner

olliw42 commented Jul 20, 2024

Risto could tell us now with his super debug tools and powers in which line it crashed ... let's see if I can find the issue the old way, will be hard I guess ...
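
One common way to find the crashing line on a Cortex-M part without a debug probe is sketched below; it is not something proposed in this thread. It assumes the crash actually ends up in a fault handler, which is not established here. On exception entry the core pushes eight words onto the active stack in a fixed order, and the stacked `pc` is the address of the faulting instruction. `fault_frame_t` and `decode_fault_frame` are hypothetical names, and how the handler obtains the stack pointer and reports the values is left out.

```c
#include <stdint.h>

// Register values pushed by the core on exception entry, in stacking order.
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} fault_frame_t;

// Copies the first eight stacked words, starting at the stack pointer that
// was active when the fault occurred, into a struct with named fields.
fault_frame_t decode_fault_frame(const uint32_t* sp)
{
    fault_frame_t f = { sp[0], sp[1], sp[2], sp[3], sp[4], sp[5], sp[6], sp[7] };
    return f;
}
```

The recovered `pc` can then be looked up in the linker .map file or resolved to a source line with `arm-none-eabi-addr2line` against the firmware's .elf file.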

@jlpoltrack
Collaborator Author

Risto could tell us now with his super debug tools and powers in which line it crashed ... let's see if I can find the issue the old way, will be hard I guess ...

Do you think it is baud rate related at the moment? I recall Risto has FRM303 which doesn't support FLRC...

@olliw42
Owner

olliw42 commented Jul 20, 2024

I don't know what I should think
... it's gcc12 related however, something which it does differently, right ... it was working all time long so far
