disable any background task #10894

Open
1 task done
0wwafa opened this issue Jan 22, 2025 · 18 comments
Labels
Status: Awaiting Response awaiting a response from the author

Comments

@0wwafa

0wwafa commented Jan 22, 2025

Board

ESP32 dev kit v1

Device Description

Classic esp32 wroom-32 devkit v1 with cp2102

Hardware Configuration

1 gpio connected to an external device

Version

v3.1.0

IDE Name

Arduino IDE

Operating System

Ubuntu 22.04

Flash frequency

40 MHz

PSRAM enabled

yes

Upload speed

921600

Description

Everything works, but I need to know how to completely disable every RTOS background task EXCEPT the UART.
I made a sketch that sets an interrupt on a GPIO change, then sends the timing as a single character (or 4) to the serial port on UART0.
The pulses on the GPIO arrive at a very fast rate (as close as 57 microseconds apart), so as soon as a pulse arrives and is measured, I send it to the serial port. Everything works, but every now and then (about 2 times in 30 seconds) I miss pulses and get a delay of around 1.2 milliseconds!
It's not my program and it's not the source.
It's as if something is locking up the device for that time.

Is there a way to disable anything except the "Serial" ?
In the setup of my sketch I only have one single interrupt on the gpio rising edge.

The crazy thing is that I can easily get those pulses using an ft232r device (which unfortunately is more imprecise than the esp32).
I would love to do the same with the esp32 but it seems there is a problem.

Note:
if I use slower timings on the source device this happens less.

Note2:
I tried using a smaller transmit buffer 8,16,32 and 64 bytes .. they all work and they all have the same problem...

Sketch

Unfortunately I can't provide the sketch. It would be too complex.
But there is a similar project called "Tapuino Next" which uses the same method.
They are unaware of the problem because they use slower speeds, though.

Debug Message

There are no relevant debug messages to show.

Other Steps to Reproduce

I am not sure where the problem lies...
Perhaps creating an interrupt every 60 microseconds and then sending a single byte on the serial port could trigger the problem.
I can try that if you can't provide a way to disable any background task (which would IMHO solve the problem)

I have checked existing issues, online documentation and the Troubleshooting Guide

  • I confirm I have checked existing issues, online documentation and Troubleshooting guide.
@0wwafa 0wwafa added the Status: Awaiting triage Issue is waiting for triage label Jan 22, 2025
@0wwafa
Author

0wwafa commented Jan 22, 2025

To be exact: the "event" happens every 17.8 seconds!

I measured by sending constant pulses from the source and timing when I was losing one.
I am losing one every 17.8 seconds and it's obviously not because of my code.

Something happens inside RTOS/ESP32 around that time... and whatever is going on stops for 1-2 milliseconds.

@0wwafa
Author

0wwafa commented Jan 22, 2025

if it helps, the event happens after 7140 interrupts (received pulses) but it can be unrelated to interrupts.

I really don't know why this happens but this will make any project streaming data from timed pulses miss out.

@0wwafa
Author

0wwafa commented Jan 22, 2025

the only reference to 17 or 18 seconds I found is this and it seems related: esp-rs/esp-hal#922

@me-no-dev
Member

You should use a hardware peripheral for this. RMT is a good choice in this case. It will give you the precise length of each pulse. You cannot just kill all background tasks in RTOS.

@0wwafa
Author

0wwafa commented Jan 22, 2025

UPDATE: I made another test and set the CPU to 160 MHz instead of the default 240 MHz, and the problem does not show up.

I think it might be something related to the watchdog or something similar.
With a tight code and very fast interrupts probably something happens and it should not.
Or the cpu overheats or just f*cks up for some reason.

The fact that the problem happens only at 240 MHz is proof that it's not related to my code.

@0wwafa
Author

0wwafa commented Jan 22, 2025

> You should use a hardware peripheral for this. RMT is a good choice in this case. It will give you the precise length of each pulse. You cannot just kill all background tasks in RTOS.

Please, don't change the subject. I want to use a simple IRQ. What's wrong with that?
Also RMT, as far as I know, does not allow me to stream data indefinitely.

My program is simple:

RISING EDGE >> IRQ >>> SEND BYTE ON SERIAL PORT AT 921600

The loop() is empty (for now).
In setup() I just set up the IRQ, and the serial send is also in the IRQ routine.
There is NO WAY this should not work on any reasonable system.

And it works at 160 MHz. At 240 MHz there is the glitch.

@me-no-dev
Member

> I want to use a simple IRQ. What's wrong with that?

Nothing wrong if you accept the consequences. It's a simple thing in a non-RTOS environment. I suggested a way to get the result you want.

> the loop() is empty (for now)

This is not a good approach on RTOS. Better to add delay(ms) inside to tell the scheduler that it should not switch to that task.

> Also RMT, as far as I know, does not allow me to stream data indefinitely.

You can have a look at this example on how to mimic a callback with a task, but if the timing of the serial bytes is not critical, you could wait in the loop for some pulses from RMT and send them out to Serial.

Everything thus far is just a theory from my side, because you did not provide any code to see what it's actually doing or be able to replicate the problem.

@0wwafa
Author

0wwafa commented Jan 22, 2025

UPDATE: it also happens at 160 MHz.

I wish to use the raw power of esp32, but it seems RTOS is ruining the party (and not only for me).

I can't believe that a simple program like this runs flawlessly for 18 seconds and then glitches (not my program's fault).

This is really unacceptable.

P.S.
RMT is even more complex and gives people a lot of problems (probably for the same reason since they use it for very fast logging)

Please solve this!

@me-no-dev
Member

Without minimal code to replicate the issue, there is nothing we can do. A basic description of the input signal is also needed.

@me-no-dev me-no-dev added Status: Awaiting Response awaiting a response from the author and removed Status: Awaiting triage Issue is waiting for triage labels Jan 23, 2025
@0wwafa
Author

0wwafa commented Jan 25, 2025

Just create a simple interrupt on a gpio and measure the distance between two pulses of a clock.

If the clock has a period of less than (let's say) 800 microseconds, you will SKIP pulses.
I don't know how to code that (yet) but it's pretty simple.

My code would be useless to you because it's monitoring an external source.
But the problem is NOT in the code.

You can check this code which does the same: https://github.com/sweetlilmre/TapuinoNext/blob/main/src/ESP32TapRecorder.cpp

The recorder WILL skip pulses for the same reason.

@dashxdr

dashxdr commented Jan 26, 2025

Maybe the problem stems from esp_timer_get_time(). The 17.8 second period is the same as the rollover period of a 32-bit counter running at 240 MHz. Maybe when CCOUNT overflows there is a lot of housekeeping that takes place.

There is also the possibility that every time that counter overflows the esp OS tries to adjust its real time clock so it doesn't drift too far from real time. The crystals are each unique, they're off by a few parts per million or whatever. I seem to recall messing with the ESP32 trying to measure perfectly spaced pulses and finding there was a jitter every so often, like there was a jump in the clock when it corrected the system time. I was going crazy trying to find the source of the clock adjustment it was syncing to...

As I recall I was running into trouble because the WIFI system can block one of the cpu cores for ungodly amounts of time. Another issue was that the CCOUNT values are not the same between the two cores; one is ahead of the other. So if your code uses raw CCOUNT values for timing and you do a read on one core and the next read is on the other core, you'll introduce an error. That can be fixed by locking code to a single core... the one WIFI isn't using.

Are you sure your code isn't introducing the trouble itself? Maybe you're not handling the rollover of a 32 bit counter correctly?
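
For reference, the arithmetic behind the 17.8 second figure, and one common way to take a difference across a 32-bit rollover, can be sketched in plain C++ that runs on a PC. The function names are made up for illustration, and this is only a sketch of the general technique, not the code from the sketch under discussion. The difference is only valid if less than one full wrap elapsed between the two readings.

```cpp
#include <cstdint>

// Seconds until a free-running 32-bit counter wraps at the given tick rate.
constexpr double rollover_seconds(double tick_hz) {
    return 4294967296.0 / tick_hz;  // 2^32 ticks
}

// Ticks elapsed between two readings of a 32-bit counter.
// Unsigned subtraction is modulo 2^32, so the result stays correct
// even when `now` has wrapped past zero and is numerically smaller.
constexpr uint32_t elapsed_ticks(uint32_t now, uint32_t earlier) {
    return static_cast<uint32_t>(now - earlier);
}

// 2^32 / 240e6 is roughly 17.9 s, which matches the period reported above.
static_assert(rollover_seconds(240e6) > 17.8 && rollover_seconds(240e6) < 18.0,
              "32-bit counter at 240 MHz wraps in about 17.9 s");
```

A comparison such as `now > earlier` on the raw values, by contrast, gives the wrong answer once per wrap, which is the kind of bug that would show up on exactly this period.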

@dashxdr

dashxdr commented Jan 26, 2025

I just ran some tests. I connected gpio19 to gpio18 and configured gpio18 as an input, with an interrupt generated on any transition. gpio19 is an output and I used the LEDC example code to produce a 9000 Hz square wave on it. In my interrupt code I output a byte for the current gpio18 state (0 or 1) and 3 bytes for the low 24 bits of esp_timer_get_time(). I store these values in a fifo and there is a mainline thread that blasts groups of them out over WIFI using udp. The ESP32 is running at 240 MHz and the gpio18 "scope" task is the only thing running on core 1. The wifi and UDP stuff is handled by core 0. The "scope" task is just dealing with fifos in memory and there is no semaphore protection at all: it has a write pointer into the fifo that it maintains, and the reader task has a read pointer that it maintains, so there's no conflict.
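
A host-side sketch of that kind of single-writer, single-reader fifo is below. It is not the code used in the test above: the class name, the 32-bit sample layout, and the use of std::atomic with acquire/release ordering (chosen here because writer and reader sit on different cores) are all assumptions for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Pack one sample: level in the top byte, low 24 bits of the timestamp below it.
// This layout is an assumption; the description only says 1 byte + 3 bytes.
constexpr uint32_t pack_sample(bool level, uint64_t t_us) {
    return (static_cast<uint32_t>(level) << 24) |
           static_cast<uint32_t>(t_us & 0xFFFFFFu);
}

// Single-producer / single-consumer ring buffer.
// Only the producer writes write_, only the consumer writes read_,
// so no lock is needed between them.
template <std::size_t N>
class SpscFifo {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Called from the producer (e.g. the interrupt side). Returns false if full.
    bool push(uint32_t value) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t r = read_.load(std::memory_order_acquire);
        if (w - r == N) return false;
        buf_[w & (N - 1)] = value;
        write_.store(w + 1, std::memory_order_release);
        return true;
    }

    // Called from the consumer (e.g. the task that sends data out). Returns false if empty.
    bool pop(uint32_t& out) {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        const std::size_t w = write_.load(std::memory_order_acquire);
        if (r == w) return false;
        out = buf_[r & (N - 1)];
        read_.store(r + 1, std::memory_order_release);
        return true;
    }

private:
    uint32_t buf_[N] = {};
    std::atomic<std::size_t> write_{0};
    std::atomic<std::size_t> read_{0};
};
```

The point of the design is that the interrupt side only does a store and an index update, while the slow output (UDP here, Serial in the original question) happens in the reader.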

There is nothing that occurs every 17.8 seconds or so. There are no lost transitions, the ESP32 is doing a good job keeping up. Ideally an interrupt will occur every 55.55 microseconds. Every so often I get a gap between transitions above 64 microseconds (highest I've seen is 68 microseconds), which means the interrupt latency is on the order of 13 microseconds worst case and it's usually much better than that.

I'm running this on an ESP32 WROVER and ESP-IDF v5.4-beta1 is my esp/idf version string. I'm coding in 'c'.

I tried using xthal_get_ccount() instead of esp_timer_get_time() and everything worked just as well, no issues, except the values returned are much bigger (they're counting 240 MHz ticks vs 1 MHz ticks).

Based on my results my belief is the problem is in your code somewhere.

@dashxdr

dashxdr commented Jan 27, 2025

perhaps related to this too: https://www.reddit.com/r/esp32/comments/wfuari/esp32_freezes_at_random_intervals_just_seconds/

I've had consistent reboots when a stray unterminated wire hanging off an unused pin picks up too much RF from the WIFI antenna. If I just tuck it out of the way so it's not as close and partially shielded by components the problem goes away... it would occur right when WIFI is initialized and the module connects to an access point.

Or maybe I just needed more capacitance on the power lines.

@TD-er
Contributor

TD-er commented Jan 27, 2025

> perhaps related to this too: https://www.reddit.com/r/esp32/comments/wfuari/esp32_freezes_at_random_intervals_just_seconds/
>
> I've had consistent reboots when a stray unterminated wire hanging off an unused pin picks up too much RF from the WIFI antenna. If I just tuck it out of the way so it's not as close and partially shielded by components the problem goes away... it would occur right when WIFI is initialized and the module connects to an access point.
>
> Or maybe I just needed more capacitance on the power lines.

Or you could set those unused pins to output low.
GPIO pins can sink quite a lot to GND.

@0wwafa
You really should show some code to reproduce it.
@dashxdr did apparently do a lot of testing based on the extremely limited amount of info you gave, and already gave quite a lot of good info on what might cause these issues you're seeing. However it remains just a guess if it is even related to the problems you're experiencing if you do not share some minimal code example.

@uzi18

uzi18 commented Jan 30, 2025

@TD-er code is linked here: #10894 (comment)
@0wwafa try switching to xthal_get_ccount(), mentioned by @dashxdr. This should be simple enough.
Maybe just divide diff by current cpu speed, so rest of code will be happy.
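
A minimal sketch of that conversion, written as plain C++ so it can be tried on a PC. The function name is hypothetical, `cpu_mhz` is assumed to be non-zero and to match the actual CPU clock, and the two cycle counts are assumed to come from the same core less than one counter wrap apart.

```cpp
#include <cstdint>

// Convert the difference between two 32-bit cycle-counter readings into
// microseconds. The subtraction is modulo 2^32, so a wrap between the two
// readings does not matter as long as there was only one.
constexpr uint32_t cycles_to_us(uint32_t start_cycles, uint32_t end_cycles,
                                uint32_t cpu_mhz) {
    return static_cast<uint32_t>(end_cycles - start_cycles) / cpu_mhz;
}

// 57 us at 240 MHz is 13680 cycles.
static_assert(cycles_to_us(0u, 240u * 57u, 240u) == 57u, "57 us at 240 MHz");
```

Integer division drops the fractional microsecond; code that needs finer resolution could keep working in cycles instead.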

@TD-er
Contributor

TD-er commented Jan 30, 2025

Ah missed that code, sorry about that.

@TD-er
Contributor

TD-er commented Jan 30, 2025

Looked into the code that was apparently linked (again sorry, missed it).

What struck me is that the void TapRecorder::CalcTapData(uint32_t signalTime) function takes a uint32_t while it is being called with a uint64_t.
So this is not a continuously incrementing value, as it wraps every 2^32 usec.
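
To illustrate what that implicit narrowing does, here is a small host-side C++ sketch. The helper names are invented for the example and it does not reproduce the linked code.

```cpp
#include <cstdint>

// What a uint32_t parameter receives when it is passed a 64-bit
// microsecond timestamp: only the low 32 bits survive.
constexpr uint32_t low32(uint64_t t_us) {
    return static_cast<uint32_t>(t_us);
}

// Minutes until the truncated microsecond value wraps back to zero.
constexpr double wrap_minutes() {
    return 4294967296.0 / 1e6 / 60.0;  // 2^32 us, about 71.6 minutes
}

// A difference of two truncated timestamps is still correct across one wrap,
// because unsigned subtraction is modulo 2^32.
constexpr uint32_t delta_us(uint64_t now_us, uint64_t earlier_us) {
    return static_cast<uint32_t>(low32(now_us) - low32(earlier_us));
}
```

So the truncation is harmless for code that only subtracts two nearby timestamps, and a problem for code that compares or accumulates the truncated value directly.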

Also you're doing quite a lot of double calculations in a callback function.
I don't see IRAM_ATTR or volatile or std::atomic<...> declarations of the members used, but maybe I'm overlooking those.

Another possible issue is that TAP_INFO doesn't seem to be initialized, so no idea what its content is. (or am I missing that too, browsing the code via GitHub, so a bit hard to navigate and memorize it all)
As far as I can see, it only gets reset in TapLoader::SeekToCounter which is conditional based on another value which is set based on uninitialized data.

Maybe you can change most of the calculations done in CYCLES_TO_COUNTER into a constexpr, so you only need to do a single multiplication.

DS_G * (sqrt( (N / D) + B ) - C)

N = cycles
A = (DS_V_PLAY / DS_D / PI)
B = ((DS_R * DS_R) / (DS_D * DS_D))
C = (DS_R / DS_D)
D = 1000000.0 * A

With A ... D being const
Not sure if float sqrt is much slower than integer sqrt, but maybe you can also make some optimizations there. See: https://en.wikipedia.org/wiki/Integer_square_root
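
A sketch of that precomputation in C++ follows. The DS_* values below are hypothetical placeholders, not the constants from the linked project, and the grouping follows the formula as written above, which has not been checked here against the actual CYCLES_TO_COUNTER macro.

```cpp
#include <cmath>

// Hypothetical placeholder values, for illustration only.
constexpr double kPi = 3.14159265358979323846;
constexpr double DS_G = 0.5;
constexpr double DS_D = 1.3e-5;
constexpr double DS_R = 1.1e-2;
constexpr double DS_V_PLAY = 4.8e-2;

// A..D from the formula above, evaluated once at compile time.
constexpr double kA = DS_V_PLAY / DS_D / kPi;
constexpr double kB = (DS_R * DS_R) / (DS_D * DS_D);
constexpr double kC = DS_R / DS_D;
constexpr double kInvD = 1.0 / (1000000.0 * kA);  // multiply instead of divide

// Precomputed form: one multiply, one add, one sqrt, one subtract, one multiply.
inline double cycles_to_counter(double cycles) {
    return DS_G * (std::sqrt(cycles * kInvD + kB) - kC);
}

// Direct form of the same formula, kept only to compare results.
inline double cycles_to_counter_direct(double cycles) {
    const double a = DS_V_PLAY / DS_D / kPi;
    const double d = 1000000.0 * a;
    return DS_G * (std::sqrt((cycles / d) + ((DS_R * DS_R) / (DS_D * DS_D))) -
                   (DS_R / DS_D));
}
```

A compiler will often fold such constant sub-expressions on its own when they appear as literals in a macro, so the gain from doing it by hand may be small; the sqrt remains the dominant cost either way.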

One more thing I just realized...
Taking a sqrt of a number may not always take the same amount of time.
So maybe there is some critical number which takes longer to sqrt...
Perhaps there is a way to get rid of this quite expensive operation?
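
For comparison, a bit-by-bit integer square root (one of the methods described on the Wikipedia page linked above) could look like the sketch below. Whether it is actually faster than the floating-point sqrt on the ESP32 is not something tested here. Its loop runs at most 32 times for any 64-bit input and uses only shifts, adds and compares.

```cpp
#include <cstdint>

// Largest r such that r * r <= n, computed one result bit at a time.
constexpr uint32_t isqrt(uint64_t n) {
    uint64_t result = 0;
    uint64_t bit = 1ull << 62;  // highest power of four that fits in 64 bits

    // Start at the highest power of four not greater than n.
    while (bit > n) bit >>= 2;

    while (bit != 0) {
        if (n >= result + bit) {
            n -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return static_cast<uint32_t>(result);
}
```

The result is truncated toward zero, so a formula that relied on the fractional part of the floating-point sqrt would need its input scaled up first.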

Oh, and you're missing a virtual on the destructor of the class inheriting from TapBase, as that class also gets inherited from.
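
A generic illustration of why that matters is below. The class names are hypothetical and unrelated to the linked project.

```cpp
// Base class that is deleted through a base pointer: its destructor
// must be virtual so the derived destructor also runs.
struct Base {
    virtual ~Base() = default;
};

struct Derived : Base {
    explicit Derived(bool* destroyed) : destroyed_(destroyed) {}
    ~Derived() override { *destroyed_ = true; }  // e.g. release a buffer here

private:
    bool* destroyed_;
};

// Returns true if ~Derived() ran when deleting through a Base pointer.
// Without `virtual` on ~Base() this delete would be undefined behavior.
inline bool destroyed_through_base() {
    bool destroyed = false;
    Base* p = new Derived(&destroyed);
    delete p;
    return destroyed;
}
```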

5 participants