Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

-Wframe-larger-than= in net/core/ethtool.c #206

Closed
nickdesaulniers opened this issue Oct 10, 2018 · 2 comments
Closed

-Wframe-larger-than= in net/core/ethtool.c #206

nickdesaulniers opened this issue Oct 10, 2018 · 2 comments
Labels
-Wframe-larger-than= [ARCH] arm32 This bug impacts ARCH=arm [BUG] linux A bug that should be fixed in the mainline kernel. [FIXED][LINUX] 5.1 This bug was fixed in Linux 5.1

Comments

@nickdesaulniers
Copy link
Member

net/core/ethtool.c:2596:5: warning: stack frame size of 1256 bytes in function 'dev_ethtool' [-Wframe-larger-than=]
int dev_ethtool(struct net *net, struct ifreq *ifr)
    ^

not a dupe of #39 , no KASAN

@nickdesaulniers nickdesaulniers added [BUG] linux A bug that should be fixed in the mainline kernel. [ARCH] x86_64 This bug impacts ARCH=x86_64 low priority This bug is not critical and not a priority -Wframe-larger-than= labels Oct 10, 2018
@nathanchance
Copy link
Member

Also applies to arm32 (multi_v7_defconfig):

net/core/ethtool.c:2526:5: warning: stack frame size of 1224 bytes in function 'dev_ethtool' [-Wframe-larger-than=]
int dev_ethtool(struct net *net, struct ifreq *ifr)
    ^
1 warning generated.

@nathanchance nathanchance added the [ARCH] arm32 This bug impacts ARCH=arm label Oct 17, 2018
@nickdesaulniers
Copy link
Member Author

I no longer observe this for x86. (haven't tested arm32 yet) on 5.2-rc1.

@nickdesaulniers nickdesaulniers removed the [ARCH] x86_64 This bug impacts ARCH=x86_64 label May 20, 2019
nathanchance pushed a commit that referenced this issue Nov 7, 2019
I got some khugepaged spew on a 32bit x86:

  BUG: sleeping function called from invalid context at include/linux/mmu_notifier.h:346
  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 25, name: khugepaged
  INFO: lockdep is turned off.
  CPU: 1 PID: 25 Comm: khugepaged Not tainted 5.4.0-rc5-elk+ #206
  Hardware name: System manufacturer P5Q-EM/P5Q-EM, BIOS 2203    07/08/2009
  Call Trace:
   dump_stack+0x66/0x8e
   ___might_sleep.cold.96+0x95/0xa6
   __might_sleep+0x2e/0x80
   collapse_huge_page.isra.51+0x5ac/0x1360
   khugepaged+0x9a9/0x20f0
   kthread+0xf5/0x110
   ret_from_fork+0x2e/0x38

Looks like it's due to CONFIG_HIGHPTE=y pte_offset_map()->kmap_atomic()
vs.  mmu_notifier_invalidate_range_start().  Let's do the naive approach
and just reorder the two operations.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 810e24e ("mm/mmu_notifiers: annotate with might_sleep()")
Signed-off-by: Ville Syrjl <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
@nickdesaulniers nickdesaulniers added the unreproducible Not or no longer reproducible label Feb 8, 2020
@agners agners added [FIXED][LINUX] 5.1 This bug was fixed in Linux 5.1 and removed unreproducible Not or no longer reproducible labels Feb 8, 2020
@tpimh tpimh removed the low priority This bug is not critical and not a priority label Feb 15, 2020
nathanchance pushed a commit that referenced this issue Oct 29, 2020
When unbinding the hns3 driver with the HNS3 VF, I got the following
kernel panic:

[  265.709989] Unable to handle kernel paging request at virtual address ffff800054627000
[  265.717928] Mem abort info:
[  265.720740]   ESR = 0x96000047
[  265.723810]   EC = 0x25: DABT (current EL), IL = 32 bits
[  265.729126]   SET = 0, FnV = 0
[  265.732195]   EA = 0, S1PTW = 0
[  265.735351] Data abort info:
[  265.738227]   ISV = 0, ISS = 0x00000047
[  265.742071]   CM = 0, WnR = 1
[  265.745055] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000009b54000
[  265.751753] [ffff800054627000] pgd=0000202ffffff003, p4d=0000202ffffff003, pud=00002020020eb003, pmd=00000020a0dfc003, pte=0000000000000000
[  265.764314] Internal error: Oops: 96000047 [#1] SMP
[  265.830357] CPU: 61 PID: 20319 Comm: bash Not tainted 5.9.0+ #206
[  265.836423] Hardware name: Huawei TaiShan 2280 V2/BC82AMDDA, BIOS 1.05 09/18/2019
[  265.843873] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
[  265.843890] pc : hclgevf_cmd_uninit+0xbc/0x300
[  265.861988] lr : hclgevf_cmd_uninit+0xb0/0x300
[  265.861992] sp : ffff80004c983b50
[  265.881411] pmr_save: 000000e0
[  265.884453] x29: ffff80004c983b50 x28: ffff20280bbce500
[  265.889744] x27: 0000000000000000 x26: 0000000000000000
[  265.895034] x25: ffff800011a1f000 x24: ffff800011a1fe90
[  265.900325] x23: ffff0020ce9b00d8 x22: ffff0020ce9b0150
[  265.905616] x21: ffff800010d70e90 x20: ffff800010d70e90
[  265.910906] x19: ffff0020ce9b0080 x18: 0000000000000004
[  265.916198] x17: 0000000000000000 x16: ffff800011ae32e8
[  265.916201] x15: 0000000000000028 x14: 0000000000000002
[  265.916204] x13: ffff800011ae32e8 x12: 0000000000012ad8
[  265.946619] x11: ffff80004c983b50 x10: 0000000000000000
[  265.951911] x9 : ffff8000115d0888 x8 : 0000000000000000
[  265.951914] x7 : ffff800011890b20 x6 : c0000000ffff7fff
[  265.951917] x5 : ffff80004c983930 x4 : 0000000000000001
[  265.951919] x3 : ffffa027eec1b000 x2 : 2b78ccbbff369100
[  265.964487] x1 : 0000000000000000 x0 : ffff800054627000
[  265.964491] Call trace:
[  265.964494]  hclgevf_cmd_uninit+0xbc/0x300
[  265.964496]  hclgevf_uninit_ae_dev+0x9c/0xe8
[  265.964501]  hnae3_unregister_ae_dev+0xb0/0x130
[  265.964516]  hns3_remove+0x34/0x88 [hns3]
[  266.009683]  pci_device_remove+0x48/0xf0
[  266.009692]  device_release_driver_internal+0x114/0x1e8
[  266.030058]  device_driver_detach+0x28/0x38
[  266.034224]  unbind_store+0xd4/0x108
[  266.037784]  drv_attr_store+0x40/0x58
[  266.041435]  sysfs_kf_write+0x54/0x80
[  266.045081]  kernfs_fop_write+0x12c/0x250
[  266.049076]  vfs_write+0xc4/0x248
[  266.052378]  ksys_write+0x74/0xf8
[  266.055677]  __arm64_sys_write+0x24/0x30
[  266.059584]  el0_svc_common.constprop.3+0x84/0x270
[  266.064354]  do_el0_svc+0x34/0xa0
[  266.067658]  el0_svc+0x38/0x40
[  266.070700]  el0_sync_handler+0x8c/0xb0
[  266.074519]  el0_sync+0x140/0x180

It looks like the BAR memory region had already been unmapped before we
start clearing CMDQ registers in it, which is pretty bad and the kernel
happily kills itself because of a Current EL Data Abort (on arm64).

Moving the CMDQ uninitialization a bit early fixes the issue for me.

Fixes: 862d969 ("net: hns3: do VF's pci re-initialization while PF doing FLR")
Signed-off-by: Zenghui Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-Wframe-larger-than= [ARCH] arm32 This bug impacts ARCH=arm [BUG] linux A bug that should be fixed in the mainline kernel. [FIXED][LINUX] 5.1 This bug was fixed in Linux 5.1
Projects
None yet
Development

No branches or pull requests

4 participants