Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unexpected IRQ trap at vector 00 #196

Open
shenki opened this issue Aug 12, 2020 · 0 comments
Open

unexpected IRQ trap at vector 00 #196

shenki opened this issue Aug 12, 2020 · 0 comments

Comments

@shenki
Copy link
Member

shenki commented Aug 12, 2020

happens whenever the host reboots?

5.8.0-00136-g25e684fb22df-dirty
witherspoon

Kernel tree is dev-5.8 plus the fsi compatability symlink patch, and -dirty is disabling the xdma and kcs4 devices.

[  292.255123] irq 0, desc: 9da69a1e, depth: 1, count: 0, unhandled: 0
[  292.261461] ->handle_irq():  4b7ada57, handle_bad_irq+0x0/0x244
[  292.267398] ->irq_data.chip(): db180b77, 0x80c139ac
[  292.272275] ->action(): 00000000
[  292.275509]    IRQ_NOPROBE set
[  292.278566]  IRQ_NOREQUEST set
[  292.281627] unexpected IRQ trap at vector 00
[  496.150264] irq 0, desc: 9da69a1e, depth: 1, count: 0, unhandled: 0
[  496.156603] ->handle_irq():  4b7ada57, handle_bad_irq+0x0/0x244
[  496.162544] ->irq_data.chip(): db180b77, 0x80c139ac
[  496.167429] ->action(): 00000000
[  496.170664]    IRQ_NOPROBE set
[  496.173722]  IRQ_NOREQUEST set
[  496.176782] unexpected IRQ trap at vector 00
# cat /proc/interrupts 
           CPU0       
  0:          2      none     Edge    
 18:      70081      AVIC  16 Edge      FTTMR010-TIMER1
 20:       2786      AVIC   2 Edge      eth0
 21:        286      AVIC   5 Edge      aspeed_vhub
 22:        354      AVIC  25 Edge      aspeed gfx
 23:          0      AVIC   7 Edge      aspeed-video
 32:       7870      AVIC   9 Edge      ttyS0
 33:       3145      AVIC  10 Edge      ttyS4
 34:      19326      AVIC   8 Edge      ipmi-bt-host, ttyS5
 36:          0     dummy   2 Edge      1e78a0c0.i2c-bus
 37:     362091     dummy   3 Edge      1e78a100.i2c-bus
 38:     200010     dummy   4 Edge      1e78a140.i2c-bus
 39:     193864     dummy   5 Edge      1e78a180.i2c-bus
 40:       2154     dummy   9 Edge      1e78a380.i2c-bus
 41:          0     dummy  10 Edge      1e78a3c0.i2c-bus
 42:      22006     dummy  11 Edge      1e78a400.i2c-bus
 43:          0     dummy  12 Edge      1e78a440.i2c-bus
 44:          0     dummy  13 Edge      1e78a480.i2c-bus
 45:          0  1e780000.gpio  13 Edge      air-water
 46:          0  1e780000.gpio  74 Edge      checkstop
 47:          0  1e780000.gpio 127 Edge      ps0-presence
 48:          0  1e780000.gpio 104 Edge      ps1-presence
 49:          0  1e780000.gpio  67 Edge      gpiolib
IPI0:          0  CPU wakeup interrupts
IPI1:          0  Timer broadcast interrupts
IPI2:          0  Rescheduling interrupts
IPI3:          0  Function call interrupts
IPI4:          0  CPU stop interrupts
IPI5:          0  IRQ work interrupts
IPI6:          0  completion interrupts
Err:          2
amboar pushed a commit to amboar/linux that referenced this issue Nov 26, 2020
…u_desc

[ Upstream commit 6a8be1b ]

With SLUB DEBUG CONFIG below crash is seen as kmem_cache_alloc
is being called in non-atomic context.

To fix this issue, use GFP_ATOMIC instead of GFP_KERNEL kzalloc.

[  357.217088] BUG: sleeping function called from invalid context at mm/slab.h:498
[  357.217091] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/0
[  357.217092] INFO: lockdep is turned off.
[  357.217095] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         5.9.0-rc5-wt-ath+ openbmc#196
[  357.217096] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
[  357.217097] Call Trace:
[  357.217098]  <IRQ>
[  357.217107]  ? ath11k_dp_htt_get_ppdu_desc+0xa9/0x170 [ath11k]
[  357.217110]  dump_stack+0x77/0xa0
[  357.217113]  ___might_sleep.cold+0xa6/0xb6
[  357.217116]  kmem_cache_alloc_trace+0x1f2/0x270
[  357.217122]  ath11k_dp_htt_get_ppdu_desc+0xa9/0x170 [ath11k]
[  357.217129]  ath11k_htt_pull_ppdu_stats.isra.0+0x96/0x270 [ath11k]
[  357.217135]  ath11k_dp_htt_htc_t2h_msg_handler+0xe7/0x1d0 [ath11k]
[  357.217137]  ? trace_hardirqs_on+0x1c/0x100
[  357.217143]  ath11k_htc_rx_completion_handler+0x207/0x370 [ath11k]
[  357.217149]  ath11k_ce_recv_process_cb+0x15e/0x1e0 [ath11k]
[  357.217151]  ? handle_irq_event+0x70/0xa8
[  357.217154]  ath11k_pci_ce_tasklet+0x10/0x30 [ath11k_pci]
[  357.217157]  tasklet_action_common.constprop.0+0xd4/0xf0
[  357.217160]  __do_softirq+0xc9/0x482
[  357.217162]  asm_call_on_stack+0x12/0x20
[  357.217163]  </IRQ>
[  357.217166]  do_softirq_own_stack+0x49/0x60
[  357.217167]  irq_exit_rcu+0x9a/0xd0
[  357.217169]  common_interrupt+0xa1/0x190
[  357.217171]  asm_common_interrupt+0x1e/0x40
[  357.217173] RIP: 0010:cpu_idle_poll.isra.0+0x2e/0x60
[  357.217175] Code: 8b 35 26 27 74 69 e8 11 c8 3d ff e8 bc fa 42 ff e8 e7 9f 4a ff fb 65 48 8b 1c 25 80 90 01 00 48 8b 03 a8 08 74 0b eb 1c f3 90 <48> 8b 03 a8 08 75 13 8b 0
[  357.217177] RSP: 0018:ffffffff97403ee0 EFLAGS: 00000202
[  357.217178] RAX: 0000000000000001 RBX: ffffffff9742b8c0 RCX: 0000000000b890ca
[  357.217180] RDX: 0000000000b890ca RSI: 0000000000000001 RDI: ffffffff968d0c49
[  357.217181] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[  357.217182] R10: ffffffff9742b8c0 R11: 0000000000000046 R12: 0000000000000000
[  357.217183] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000066fdf520
[  357.217186]  ? cpu_idle_poll.isra.0+0x19/0x60
[  357.217189]  do_idle+0x5f/0xe0
[  357.217191]  cpu_startup_entry+0x14/0x20
[  357.217193]  start_kernel+0x443/0x464
[  357.217196]  secondary_startup_64+0xa4/0xb0

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1

Signed-off-by: Wen Gong <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
amboar pushed a commit to amboar/linux that referenced this issue Nov 26, 2020
…adlock

[ Upstream commit df64880 ]

After base_lock which occupy by ath11k_regd_update, the softirq run for
WMI_REG_CHAN_LIST_CC_EVENTID maybe arrived and it also need to accuire
the spin lock, then deadlock happend, change to disable softirqis to solve it.

[  235.576990] ================================
[  235.576991] WARNING: inconsistent lock state
[  235.576993] 5.9.0-rc5-wt-ath+ openbmc#196 Not tainted
[  235.576994] --------------------------------
[  235.576995] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[  235.576997] kworker/u16:1/98 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  235.576998] ffff9655f75cad98 (&ab->base_lock){+.?.}-{2:2}, at: ath11k_regd_update+0x28/0x1d0 [ath11k]
[  235.577009] {IN-SOFTIRQ-W} state was registered at:
[  235.577013]   __lock_acquire+0x219/0x6e0
[  235.577015]   lock_acquire+0xb6/0x270
[  235.577018]   _raw_spin_lock+0x2c/0x70
[  235.577023]   ath11k_reg_chan_list_event.isra.0+0x10d/0x1e0 [ath11k]
[  235.577028]   ath11k_wmi_tlv_op_rx+0x3c3/0x560 [ath11k]
[  235.577033]   ath11k_htc_rx_completion_handler+0x207/0x370 [ath11k]
[  235.577039]   ath11k_ce_recv_process_cb+0x15e/0x1e0 [ath11k]
[  235.577041]   ath11k_pci_ce_tasklet+0x10/0x30 [ath11k_pci]
[  235.577043]   tasklet_action_common.constprop.0+0xd4/0xf0
[  235.577045]   __do_softirq+0xc9/0x482
[  235.577046]   asm_call_on_stack+0x12/0x20
[  235.577048]   do_softirq_own_stack+0x49/0x60
[  235.577049]   irq_exit_rcu+0x9a/0xd0
[  235.577050]   common_interrupt+0xa1/0x190
[  235.577052]   asm_common_interrupt+0x1e/0x40
[  235.577053]   cpu_idle_poll.isra.0+0x2e/0x60
[  235.577055]   do_idle+0x5f/0xe0
[  235.577056]   cpu_startup_entry+0x14/0x20
[  235.577058]   start_kernel+0x443/0x464
[  235.577060]   secondary_startup_64+0xa4/0xb0
[  235.577061] irq event stamp: 432035
[  235.577063] hardirqs last  enabled at (432035): [<ffffffff968d12b4>] _raw_spin_unlock_irqrestore+0x34/0x40
[  235.577064] hardirqs last disabled at (432034): [<ffffffff968d10d3>] _raw_spin_lock_irqsave+0x63/0x80
[  235.577066] softirqs last  enabled at (431998): [<ffffffff967115c1>] inet6_fill_ifla6_attrs+0x3f1/0x430
[  235.577067] softirqs last disabled at (431996): [<ffffffff9671159f>] inet6_fill_ifla6_attrs+0x3cf/0x430
[  235.577068]
[  235.577068] other info that might help us debug this:
[  235.577069]  Possible unsafe locking scenario:
[  235.577069]
[  235.577070]        CPU0
[  235.577070]        ----
[  235.577071]   lock(&ab->base_lock);
[  235.577072]   <Interrupt>
[  235.577073]     lock(&ab->base_lock);
[  235.577074]
[  235.577074]  *** DEADLOCK ***
[  235.577074]
[  235.577075] 3 locks held by kworker/u16:1/98:
[  235.577076]  #0: ffff9655f75b1d48 ((wq_completion)ath11k_qmi_driver_event){+.+.}-{0:0}, at: process_one_work+0x1d3/0x5d0
[  235.577079]  openbmc#1: ffffa33cc02f3e70 ((work_completion)(&ab->qmi.event_work)){+.+.}-{0:0}, at: process_one_work+0x1d3/0x5d0
[  235.577081]  openbmc#2: ffff9655f75cad50 (&ab->core_lock){+.+.}-{3:3}, at: ath11k_core_qmi_firmware_ready.part.0+0x4e/0x160 [ath11k]
[  235.577087]
[  235.577087] stack backtrace:
[  235.577088] CPU: 3 PID: 98 Comm: kworker/u16:1 Not tainted 5.9.0-rc5-wt-ath+ openbmc#196
[  235.577089] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
[  235.577095] Workqueue: ath11k_qmi_driver_event ath11k_qmi_driver_event_work [ath11k]
[  235.577096] Call Trace:
[  235.577100]  dump_stack+0x77/0xa0
[  235.577102]  mark_lock_irq.cold+0x15/0x3c
[  235.577104]  mark_lock+0x1d7/0x540
[  235.577105]  mark_usage+0xc7/0x140
[  235.577107]  __lock_acquire+0x219/0x6e0
[  235.577108]  ? sched_clock_cpu+0xc/0xb0
[  235.577110]  lock_acquire+0xb6/0x270
[  235.577116]  ? ath11k_regd_update+0x28/0x1d0 [ath11k]
[  235.577118]  ? atomic_notifier_chain_register+0x2d/0x40
[  235.577120]  _raw_spin_lock+0x2c/0x70
[  235.577125]  ? ath11k_regd_update+0x28/0x1d0 [ath11k]
[  235.577130]  ath11k_regd_update+0x28/0x1d0 [ath11k]
[  235.577136]  __ath11k_mac_register+0x3fb/0x480 [ath11k]
[  235.577141]  ath11k_mac_register+0x119/0x180 [ath11k]
[  235.577146]  ath11k_core_pdev_create+0x17/0xe0 [ath11k]
[  235.577150]  ath11k_core_qmi_firmware_ready.part.0+0x65/0x160 [ath11k]
[  235.577155]  ath11k_qmi_driver_event_work+0x1c5/0x230 [ath11k]
[  235.577158]  process_one_work+0x265/0x5d0
[  235.577160]  worker_thread+0x49/0x300
[  235.577161]  ? process_one_work+0x5d0/0x5d0
[  235.577163]  kthread+0x135/0x150
[  235.577164]  ? kthread_create_worker_on_cpu+0x60/0x60
[  235.577166]  ret_from_fork+0x22/0x30

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1

Signed-off-by: Wen Gong <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant