Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reapply "SYSLOG_DEFAULT: wrap up_putc/up_nputs calls with critical section" with a fix #14761

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

yamt
Copy link
Contributor

@yamt yamt commented Nov 13, 2024

Summary

Reapply "SYSLOG_DEFAULT: wrap up_putc/up_nputs calls with critical section" with a fix.

original change #14722
a revert #14751

Impact

Testing

So that it can be used in more situations.

The primary motivation here is to avoid crashes introduced by
apache#14722.

Tested:
- esp32-devkitc:wifi_smp (smp)
- esp32s3-devkit:smp (ostest, smp)
  (with apache#14755)
@github-actions github-actions bot added Area: Drivers Drivers issues Area: OS Components OS Components issues Size: S The size of the change in this PR is small labels Nov 13, 2024
@nuttxpr
Copy link

nuttxpr commented Nov 13, 2024

[Experimental Bot, please feedback here]

No. The PR is missing critical information required by the NuttX template.

  • Summary: While it mentions the original PR and revert, it lacks a clear explanation of why the reapplication is necessary and how the fix addresses the issues that led to the revert. What exactly was the fix?
  • Impact: The entire Impact section is missing. All fields should be addressed with either "NO" or a "YES" + explanation. This is crucial for reviewers to understand the potential consequences of the change.
  • Testing: While targets are listed, the "Testing logs before change" and "Testing logs after change" sections are empty. This makes it impossible to verify the effectiveness of the fix. What were the symptoms of the bug before, and how do the logs demonstrate that the bug is now resolved? Furthermore, mentioning "a few other unrelated local patches" raises concerns; testing should ideally be done on a clean branch to isolate the effects of the PR.

Therefore, the PR in its current state does not meet the NuttX requirements. It needs substantial additions to be considered complete.

@@ -275,7 +292,7 @@ irqstate_t enter_critical_section(void)

/* Check if we were called from an interrupt handler */

if (!up_interrupt_context())
if (!csection_degraded() && !up_interrupt_context())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe for style consistently you could use the dummy-if in the non-SMP case as well. This is fine by me too.

Copy link
Contributor

@pussuw pussuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. This also fixes recursive panic/assertion for me when trying to take the spinlock from assert!

@@ -75,6 +75,18 @@ volatile uint8_t g_cpu_nestcount[CONFIG_SMP_NCPUS];
* Private Functions
****************************************************************************/

static bool csection_degraded(void)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional checks will degrade SMP performance, why not add such checks to the syslog path?

cc @hujun260 ,
os ready checks removed by PR:
#13485

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yamt
The invocation of critical sections is still quite frequent. Previously, we removed those checks in order to enhance performance.
In some specific scenarios, we can use spin_lock_irqsave to replace the critical sections or modify the business logic judgments.
So far, I haven't encountered situations where such modifications are absolutely necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#12281 this patch remove the judgement

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added the check to enter_critical_section because i felt it logically belonged there, not the caller.

#12281 this patch remove the judgement

i was not aware of the history. if we decided that enter_critical_section should not be used early in the boot, it's better to have a DIAGASSERT there to assert it. how do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a long-term plan to reduce the number of calls and the scope of critical sections, as well as to decrease critical sections complexity.
A lot of work has already been done in this area, so we do not wish to add any logic within the critical sections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any logic, including DIAGASSERT, you mean? why?

Copy link
Contributor

@hujun260 hujun260 Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any logic, including DIAGASSERT, you mean? why?

This line indicate that enter_critical_section should not be used early in the boot

DEBUGASSERT(rtcb != NULL);

You should add checks at the place of invocation to ensure that the assert will not be triggered, rather than modifying the enter_critical_section

Comment on lines +237 to +239
irqstate_t flags = enter_critical_section();
up_putc(ch);
leave_critical_section(flags);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't see any benefit to protecting up_putc(), unlike up_nputs() which guarantees single-line integrity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please read the description in #14662.
this is NOT about the "messages-intermixed" issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if you've noticed that some drivers use spinlock + irqsave + tx fifo polling to implement up_putc():
https://github.com/apache/nuttx/blob/master/drivers/serial/uart_pl011.c#L671-L679
For the current workaround, I think it might be more profitable to make similar changes in up_putc(), since we're not entirely sure if there's other code calling up_putc().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spin_lock_irqsave doesn't fix the issue because serial.c uses critical section, not spinlock.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But whether it is the serial driver or syslog driver, up_putc() will eventually be called, right ? TX interrupt will wake up serial driver, but the final output is still protected by up_putc(), right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should serial.c use a spinlock / spin_lock_irqsave ? I cannot imagine why the big kernel lock (enter/exit_critical) needs to be invoked there (although I have not looked at the driver itself).

if CONFIG_SPINLOCK = 0, then spin_lock_irqsave reverts back to a local interrupt disable. Should work ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But whether it is the serial driver or syslog driver, up_putc() will eventually be called, right ? TX interrupt will wake up serial driver, but the final output is still protected by up_putc(), right?

no. drivers/serial uses a separate api. (uart_ops_s)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should serial.c use a spinlock / spin_lock_irqsave ? I cannot imagine why the big kernel lock (enter/exit_critical) needs to be invoked there (although I have not looked at the driver itself).

because we don't have a spinlock-interlocked blocking api? (do we?)

if CONFIG_SPINLOCK = 0, then spin_lock_irqsave reverts back to a local interrupt disable. Should work ?

i'm not sure what you mean here. is it about this PR?

Copy link
Contributor

@pussuw pussuw Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because we don't have a spinlock-interlocked blocking api? (do we?)

Not one that knows to yield.

In SMP case using enter/exit_critical does exactly the same thing as a spin lock but it locks many unrelated (not serial related) APIs in the kernel (like semaphores etc.) as well, as they use enter/exit_critical.

i'm not sure what you mean here. is it about this PR?

Just that in non-SMP case using spin_lock_irqsave would revert back to a local interrupt disable, like it does when using enter/exit_critical (if someone is worried about non-SMP perf.)

I seem to remember seeing a discussion about serial/syslog locking and where it should be, but IMO the mutual exclusion (whatever the chosen mechanism is..) should be on the device level as the shared resource here is the serial port device? So replacing the critical section from both serial.c and syslog.c by a single spinlock should not be an issue. As it would be the serial port lock.

Also, doesn't syslog support other devices than serial port as well ? Are they protected by enter/exit_critcal (I honestly don't know).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Drivers Drivers issues Area: OS Components OS Components issues Size: S The size of the change in this PR is small
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants