Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vcos_semaphore_wait_timeout() and clock adjusting #658

Open
mdevaev opened this issue Nov 13, 2020 · 2 comments
Open

vcos_semaphore_wait_timeout() and clock adjusting #658

mdevaev opened this issue Nov 13, 2020 · 2 comments

Comments

@mdevaev
Copy link

mdevaev commented Nov 13, 2020

Describe the bug
vcos_semaphore_wait_timeout() uses CLOCK_REALTIME and sem_timedwait(). If the time was adjusted during vcos_semaphore_wait_timeout() (via NTP for example) then it will either wait longer than the specified timeout (if the clock is moved back), or it will not wait for it (if forward). When using sem_timedwait() this is a known problem (1, 2). For sem_timedwait() on Linux, this is the expected behavior (although in QNX there is a sem_timedwait_monotonic() that uses CLOCK_MONOTONIC) since it accepts an absolute timestamp. For vcos_semaphore_wait_timeout(), I think this behavior is incorrect, because the timeout value must be set to a relative value.

As a fix, I could use a check on my side: compare monotonic timestamps before and after calling vcos_semaphore_wait_timeout() and run this again if the timeout was not reached. But this solution will not help if the clock has been moved to the past and the wait may be increased.

To reproduce
The problem is very rare and I don't have a case for reproducing it. But I think my analysis seems to point to the problem fairly accurately. I encountered this bug when using ustreamer. It uses a semaphore when encoding via OMX.

Expected behaviour
The timeout should not be affected by the clock adjusting.

Actual behaviour
Adjusting the clock affects the timeout.

@mdevaev
Copy link
Author

mdevaev commented Nov 13, 2020

Another possible workaround is busyloop (pseudo-c):

long double deadline_ts = get_now_monotonic() + timeout;
VCOS_STATUS_T sem_status;

while (true) {
	sem_status = vcos_semaphore_trywait(sem);
	if (sem_status != VCOS_EAGAIN || get_now_monotonic() > deadline_ts) {
		return sem_status;
	}
	usleep(1000);
}

The obvious problem is usleep() and a large number of unnecessary calls of vcos_semaphore_trywait() (i.e. sem_trywait()).

@mtlynch
Copy link

mtlynch commented Nov 27, 2020

There is a consistent repro for this bug during boot. The Pi doesn't have a built-in clock, so when you shut it off, its clock will lag behind real-world time until the Pi syncs with NTP during boot.

If you run an application that calls vcos_semaphore_wait_timeout during boot, there's a race condition that makes it easy to trigger this issue. For example, imagine the following sequence:

  1. Raspberry Pi powers on and begins booting
  2. Pi OS launches FooApp as a systemd service during boot
  3. FooApp calls vcos_semaphore_wait_timeout with a 30 second timeout
  4. NTP service syncs time, advancing the system clock forward by 2 minutes
  5. FooApp instantly gets back VCOS_EAGAIN from vcos_semaphore_wait_timeout even though 30 seconds have not yet elapsed

This bug can also trigger at any other time that the Pi adjusts its clock forward or backwards, but it's easiest to trigger during boot, as there's generally a significant time jump forward for the time that the Pi has been powered off.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants