Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

smpcall: add nxsched_smp_call_async and nxsched_smp_call_single_async #14656

Merged
merged 1 commit into from
Nov 6, 2024

Conversation

hujun260
Copy link
Contributor

@hujun260 hujun260 commented Nov 5, 2024

Summary

reason:
The old implementation of the SMP call, even when using the "no wait" parameter, could still result in waiting, if invoking it within a critical section may lead to deadlocks. Therefore, in order to implement a truly asynchronous SMP call strategy, we have added nxsched_smp_call_async.

apache/nuttx-apps#2822

#14656

These two pull requests (PRs) must be merged together.

Impact

smp call

Testing

ci ostest
Build Host:

  • OS: Ubuntu 20.04
  • CPU: x86_64
  • Compiler: GCC 9.4.0

Configuring NuttX and compile:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic
-machine virt,virtualization=on,gic-version=3
-net none -chardev stdio,id=con,mux=on -serial chardev:con
-mon chardev=con,mode=readline -kernel ./nuttx

@github-actions github-actions bot added Area: OS Components OS Components issues Size: M The size of the change in this PR is medium labels Nov 5, 2024
@nuttxpr
Copy link

nuttxpr commented Nov 5, 2024

[Experimental Bot, please feedback here]

The PR appears to partially meet the NuttX requirements. While it provides a reason for the change and identifies the affected functional area, it lacks crucial details. Here's a breakdown:

Missing/Insufficient Information:

  • Summary: While the reason for the change is stated, the "what" and "how" are insufficiently explained. It mentions adding nxsched_smp_call_async, but doesn't describe how this new function differs from the existing implementation and what changes were made to achieve the asynchronous behavior.
  • Impact: Simply stating "smp call" isn't descriptive enough. The PR needs to address all the impact points listed in the requirements:
    • Is new feature added? Is existing feature changed? (Yes, a new function is added)
    • Impact on user: Will users need to modify their code to use the new asynchronous call?
    • Impact on build: Are there any new build dependencies or configuration options?
    • Impact on hardware: Does this change affect specific architectures or boards?
    • Impact on documentation: Is documentation updated to reflect this new function and its usage?
    • Impact on security: Are there any potential security implications of this change?
    • Impact on compatibility: Does this break backward compatibility?
  • Testing: While the build host details are provided, the target information is vague ("arch(sim,RISC-V,ARM,..), board:config"). It specifies qemu-armv8a:nsh_smp, which is better, but still lacks details about the specific configuration used. More importantly, the "Testing logs before change" and "Testing logs after change" sections are empty. These are crucial for demonstrating the change's effectiveness and for verifying that the new function works as intended. Include actual log output showcasing the previous behavior (e.g., demonstrating the unwanted wait) and the improved behavior after the change (showing the asynchronous operation).

Recommendations for Improvement:

  • Summary: Elaborate on how nxsched_smp_call_async works and what specific code modifications were implemented. For example, explain the mechanism used to achieve asynchronicity (e.g., using a separate worker thread, sending a signal, etc.).
  • Impact: Address all the impact points listed in the requirements with specific details. Don't just state "NO" or "YES"; explain why.
  • Testing:
    • Provide complete target details, including the specific qemu-armv8a configuration used.
    • Fill in the testing logs! This is the most important part. Show the output of relevant commands or tests before and after the change to demonstrate the improvement and verify the fix. The logs should clearly illustrate the difference in behavior.

By providing the missing information and following these recommendations, the PR will better meet the NuttX requirements and increase the likelihood of it being accepted.

include/nuttx/sched.h Outdated Show resolved Hide resolved
reason:
The old implementation of the SMP call, even when using the "no wait" parameter,
could still result in waiting, if invoking it within a critical section
may lead to deadlocks. Therefore, in order to implement a truly asynchronous SMP
call strategy, we have added nxsched_smp_call_async.

Signed-off-by: hujun5 <[email protected]>
Copy link
Contributor

@pussuw pussuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@xiaoxiang781216 xiaoxiang781216 merged commit c498991 into apache:master Nov 6, 2024
10 of 27 checks passed
@masayuki2009
Copy link
Contributor

@hujun260

A deadlock happens with esp32-devkitc:smp (QEMU)

/home/ishikawa/opensource/QEMU/qemu-esp-develop-8.2.0-20240122/build/qemu-system-xtensa -nographic -M esp32 -smp 2 -drive file=/home/ishikawa/qemu_efuse.bin,if=none,format=raw,id=efuse -global driver=nvram.esp32.efuse,property=drive,value=efuse -drive file=./nuttx/nuttx.merged.bin,if=mtd,format=raw
==12842==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
Adding SPI flash device
ets Jul 29 2019 12:21:46

rst:0x1 (POWERON_RESET),boot:0x12 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3ffb40f0,len:1056
load:0x40080000,len:23844
entry 0x40082858
*** Booting NuttX ***
dram: lma 0x00001020 vma 0x3ffb40f0 len 0x420    (1056)
iram: lma 0x00001448 vma 0x40080000 len 0x5d24   (23844)
padd: lma 0x00007178 vma 0x00000000 len 0x8e80   (36480)
imap: lma 0x00010000 vma 0x400e0000 len 0x1e914  (125204)
padd: lma 0x0002e91c vma 0x00000000 len 0x16fc   (5884)
dmap: lma 0x00030020 vma 0x3f400020 len 0x9c84   (40068)
total segments stored 6

NuttShell (NSH) NuttX-12.0.0
nsh>  
nsh> uname -a
NuttX 12.0.0 904b6ff85b Nov  6 2024 13:47:16 xtensa esp32-devkitc
nsh> mount
  /proc type procfs
nsh> ps
  PID GROUP CPU PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK            STACK    USED FILLED COMMAND
    0     0   0   0 FIFO     Kthread   - Assigned           0000000000000000 0003056 0000744  24.3%  CPU0 IDLE
    1     0   1   0 FIFO     Kthread   - Running            0000000000000000 0003056 0000432  14.1%  CPU1 IDLE
    2     2   0 100 RR       Task      - Running            0000000000000000 0003016 0002104  69.7%  nsh_main
...
End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       4bc5c    4bc5c
ordblks         7        6
mxordblk    2acd0    2acd0
uordblks     6c64     6c64
fordblks    44ff8    44ff8

user_main: smp call test
smp_call_test: Test start
smp_call_test: Call cpu 0, nowait

@hujun260
Copy link
Contributor Author

hujun260 commented Nov 6, 2024

@hujun260

A deadlock happens with esp32-devkitc:smp (QEMU)

/home/ishikawa/opensource/QEMU/qemu-esp-develop-8.2.0-20240122/build/qemu-system-xtensa -nographic -M esp32 -smp 2 -drive file=/home/ishikawa/qemu_efuse.bin,if=none,format=raw,id=efuse -global driver=nvram.esp32.efuse,property=drive,value=efuse -drive file=./nuttx/nuttx.merged.bin,if=mtd,format=raw
==12842==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
Adding SPI flash device
ets Jul 29 2019 12:21:46

rst:0x1 (POWERON_RESET),boot:0x12 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3ffb40f0,len:1056
load:0x40080000,len:23844
entry 0x40082858
*** Booting NuttX ***
dram: lma 0x00001020 vma 0x3ffb40f0 len 0x420    (1056)
iram: lma 0x00001448 vma 0x40080000 len 0x5d24   (23844)
padd: lma 0x00007178 vma 0x00000000 len 0x8e80   (36480)
imap: lma 0x00010000 vma 0x400e0000 len 0x1e914  (125204)
padd: lma 0x0002e91c vma 0x00000000 len 0x16fc   (5884)
dmap: lma 0x00030020 vma 0x3f400020 len 0x9c84   (40068)
total segments stored 6

NuttShell (NSH) NuttX-12.0.0
nsh>  
nsh> uname -a
NuttX 12.0.0 904b6ff85b Nov  6 2024 13:47:16 xtensa esp32-devkitc
nsh> mount
  /proc type procfs
nsh> ps
  PID GROUP CPU PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK            STACK    USED FILLED COMMAND
    0     0   0   0 FIFO     Kthread   - Assigned           0000000000000000 0003056 0000744  24.3%  CPU0 IDLE
    1     0   1   0 FIFO     Kthread   - Running            0000000000000000 0003056 0000432  14.1%  CPU1 IDLE
    2     2   0 100 RR       Task      - Running            0000000000000000 0003016 0002104  69.7%  nsh_main
...
End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena       4bc5c    4bc5c
ordblks         7        6
mxordblk    2acd0    2acd0
uordblks     6c64     6c64
fordblks    44ff8    44ff8

user_main: smp call test
smp_call_test: Test start
smp_call_test: Call cpu 0, nowait

Ok, let me check this issue

@masayuki2009
Copy link
Contributor

@hujun260
raspberrypi-pico:smp has the same issue.

@lupyuen
Copy link
Member

lupyuen commented Nov 6, 2024

Update: Sorry my repo may be outdated, lemme recheck...

Sorry @hujun260: maix-bit/knsh_smp build is failing due to g_call_data:

Configuration/Tool: maix-bit/knsh_smp
Error: smp_call.c:36:31: error: 'g_call_data' defined but not used [-Werror=unused-variable]
   36 | static struct smp_call_data_s g_call_data;
      |                               ^~~~~~~~~~~

https://github.com/nuttxpr2/nuttx/actions/runs/11697271041/job/32575765356#step:7:181

@hujun260
Copy link
Contributor Author

hujun260 commented Nov 6, 2024

@masayuki2009
please help me verify this patch #14663

@lupyuen
Copy link
Member

lupyuen commented Nov 6, 2024

@hujun260 I just confirmed that maix-bit/knsh_smp is failing in our NuttX Mirror due to g_call_data. Does the new patch fix the build? Thanks!

Configuration/Tool: maix-bit/knsh_smp
Error: smp_call.c:36:31: error: 'g_call_data' defined but not used [-Werror=unused-variable]
   36 | static struct smp_call_data_s g_call_data;
      |                               ^~~~~~~~~~~

https://github.com/NuttX/nuttx/actions/runs/11698181980/job/32578126735#step:7:181

@W-M-R W-M-R mentioned this pull request Nov 6, 2024
@hujun260
Copy link
Contributor Author

hujun260 commented Nov 6, 2024

@hujun260 I just confirmed that maix-bit/knsh_smp is failing in our NuttX Mirror due to g_call_data. Does the new patch fix the build? Thanks!

Configuration/Tool: maix-bit/knsh_smp
Error: smp_call.c:36:31: error: 'g_call_data' defined but not used [-Werror=unused-variable]
   36 | static struct smp_call_data_s g_call_data;
      |                               ^~~~~~~~~~~

https://github.com/NuttX/nuttx/actions/runs/11698181980/job/32578126735#step:7:181

apache/nuttx-apps#2825

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: OS Components OS Components issues Size: M The size of the change in this PR is medium
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants