vcos_semaphore_wait_timeout() and clock adjusting #658

mdevaev · 2020-11-13T09:12:13Z

Describe the bug
vcos_semaphore_wait_timeout() uses CLOCK_REALTIME and sem_timedwait(). If the system time is adjusted while vcos_semaphore_wait_timeout() is waiting (via NTP, for example), the call either waits longer than the specified timeout (if the clock is moved back) or returns before the timeout has elapsed (if the clock is moved forward). This is a known problem with sem_timedwait() (1, 2). For sem_timedwait() itself on Linux this is the expected behavior, since it accepts an absolute timestamp (although QNX has a sem_timedwait_monotonic() that uses CLOCK_MONOTONIC). For vcos_semaphore_wait_timeout(), however, I think this behavior is incorrect, because its timeout is specified as a relative value.

As a workaround, I could add a check on my side: compare monotonic timestamps before and after calling vcos_semaphore_wait_timeout() and call it again if the timeout has not actually elapsed. But this will not help if the clock has been moved back, in which case the wait can last longer than requested.

To reproduce
The problem is very rare and I don't have a reliable way to reproduce it, but I think my analysis points to the cause fairly accurately. I encountered this bug when using ustreamer, which uses a semaphore when encoding via OMX.

Expected behaviour
The timeout should not be affected by clock adjustments.

Actual behaviour
Adjusting the clock affects the timeout.


mdevaev · 2020-11-13T21:04:02Z

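Another possible workaround is a busy loop (pseudo-C; get_now_monotonic() stands for a helper that returns the CLOCK_MONOTONIC time in seconds, and wait_monotonic() is just a placeholder name):

VCOS_STATUS_T wait_monotonic(VCOS_SEMAPHORE_T *sem, long double timeout) {
	// Deadline on the monotonic clock, so wall-clock jumps do not affect it
	long double deadline_ts = get_now_monotonic() + timeout;
	VCOS_STATUS_T sem_status;

	while (true) {
		sem_status = vcos_semaphore_trywait(sem);
		// Stop on success, on any status other than "not available", or on timeout
		if (sem_status != VCOS_EAGAIN || get_now_monotonic() > deadline_ts) {
			return sem_status;
		}
		usleep(1000); // poll roughly once per millisecond
	}
}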

The obvious drawbacks are the usleep() polling delay and the large number of unnecessary calls to vcos_semaphore_trywait() (i.e. sem_trywait()).

mtlynch · 2020-11-27T19:45:46Z

There is a consistent repro for this bug during boot. The Pi doesn't have a built-in real-time clock, so after it has been powered off, its clock lags behind real-world time until the Pi syncs with NTP during boot.

If you run an application that calls vcos_semaphore_wait_timeout during boot, there's a race condition that makes it easy to trigger this issue. For example, imagine the following sequence:

1. Raspberry Pi powers on and begins booting
2. Pi OS launches FooApp as a systemd service during boot
3. FooApp calls vcos_semaphore_wait_timeout with a 30 second timeout
4. NTP service syncs time, advancing the system clock forward by 2 minutes
5. FooApp instantly gets back VCOS_EAGAIN from vcos_semaphore_wait_timeout even though 30 seconds have not yet elapsed

This bug can also trigger at any other time that the Pi adjusts its clock forward or backwards, but it's easiest to trigger during boot, as there's generally a significant time jump forward for the time that the Pi has been powered off.

mdevaev mentioned this issue Nov 13, 2020

vcos_semaphore_wait_timeout() and RTC adjusting pikvm/ustreamer#56

Closed
