gpu related crashes with kernel >= 6.9.7 #309

oliverbestmann · 2024-07-17T06:13:57Z

Since updating from 6.9.5 to 6.9.6 (and 6.9.9), I get random GPU/DRM-related crashes after a few minutes of usage.

Jul 15 10:20:18 m1pro kernel: ------------[ cut here ]------------
Jul 15 10:20:18 m1pro kernel: asahi 406400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 15 10:20:18 m1pro kernel: WARNING: CPU: 0 PID: 15794 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0x110/0x168
Jul 15 10:20:18 m1pro kernel: Modules linked in: uinput xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq usbhid cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii snd_usb_audio snd_h>
Jul 15 10:20:18 m1pro kernel:  nvmem_spmi_mfd rtc_macsmc gpio_macsmc spi_hid_apple_of simple_mfd_spmi tps6598x spi_hid_apple regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart macsmc_rtkit nvmem_appl>
Jul 15 10:20:18 m1pro kernel: CPU: 0 PID: 15794 Comm: chromium Tainted: G S      W          6.9.9-asahi #1-NixOS
Jul 15 10:20:18 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 10:20:18 m1pro kernel: pstate: 61401009 (nZCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 10:20:18 m1pro kernel: pc : drm_sched_can_queue+0x110/0x168
Jul 15 10:20:18 m1pro kernel: lr : drm_sched_can_queue+0x110/0x168
Jul 15 10:20:18 m1pro kernel: sp : ffff800090397440
Jul 15 10:20:18 m1pro kernel: x29: ffff800090397440 x28: 0000000000000030 x27: ffff000014ad5000
Jul 15 10:20:18 m1pro kernel: x26: ffff80007a55d948 x25: 0000000000000000 x24: ffff000139b5dc00
Jul 15 10:20:18 m1pro kernel: x23: ffff800090397888 x22: ffff000139b5cb38 x21: ffff0005be57f5d8
Jul 15 10:20:18 m1pro kernel: x20: ffff00013bfb1c08 x19: ffff00013bfb1c08 x18: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 6572632065687420
Jul 15 10:20:18 m1pro kernel: x14: 6465656378652074 x13: 0000000000000000 x12: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
Jul 15 10:20:18 m1pro kernel: x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
Jul 15 10:20:18 m1pro kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
Jul 15 10:20:18 m1pro kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
Jul 15 10:20:18 m1pro kernel: Call trace:
Jul 15 10:20:18 m1pro kernel:  drm_sched_can_queue+0x110/0x168
Jul 15 10:20:18 m1pro kernel:  drm_sched_wakeup+0x18/0x7c
Jul 15 10:20:18 m1pro kernel:  drm_sched_entity_push_job+0x174/0x1e8
Jul 15 10:20:18 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0x12d8/0x1578 [asahi]
Jul 15 10:20:18 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 10:20:18 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 15 10:20:18 m1pro kernel:  drm_ioctl+0x23c/0x4e4
Jul 15 10:20:18 m1pro kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 15 10:20:18 m1pro kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 15 10:20:18 m1pro kernel:  do_el0_svc+0x40/0xf0
Jul 15 10:20:18 m1pro kernel:  el0_svc+0x34/0x11c
Jul 15 10:20:18 m1pro kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 15 10:20:18 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 15 10:20:18 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 15 10:20:18 m1pro kernel: Unable to handle kernel paging request at virtual address 006120492079636d
Jul 15 10:20:18 m1pro kernel: Mem abort info:
Jul 15 10:20:18 m1pro kernel:   ESR = 0x0000000096000004
Jul 15 10:20:18 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 15 10:20:18 m1pro kernel:   SET = 0, FnV = 0
Jul 15 10:20:18 m1pro kernel:   EA = 0, S1PTW = 0
Jul 15 10:20:18 m1pro kernel:   FSC = 0x04: level 0 translation fault
Jul 15 10:20:18 m1pro kernel: Data abort info:
Jul 15 10:20:18 m1pro kernel:   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
Jul 15 10:20:18 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 15 10:20:18 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 15 10:20:18 m1pro kernel: [006120492079636d] address between user and kernel address ranges
Jul 15 10:20:18 m1pro kernel: Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
Jul 15 10:20:18 m1pro kernel: Modules linked in: uinput xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq usbhid cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii snd_usb_audio snd_h>
Jul 15 10:20:18 m1pro kernel:  nvmem_spmi_mfd rtc_macsmc gpio_macsmc spi_hid_apple_of simple_mfd_spmi tps6598x spi_hid_apple regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart macsmc_rtkit nvmem_appl>
Jul 15 10:20:18 m1pro kernel: CPU: 0 PID: 15794 Comm: chromium Tainted: G S      W          6.9.9-asahi #1-NixOS
Jul 15 10:20:18 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 10:20:18 m1pro kernel: pstate: 21401009 (nzCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 10:20:18 m1pro kernel: pc : __kmalloc_node_track_caller+0xec/0x2bc
Jul 15 10:20:18 m1pro kernel: lr : __kmalloc_node_track_caller+0x98/0x2bc
Jul 15 10:20:18 m1pro kernel: sp : ffff800090395d40
Jul 15 10:20:18 m1pro kernel: x29: ffff800090395d50 x28: 00000000ffffffa0 x27: ffff000639ee3280
Jul 15 10:20:18 m1pro kernel: x26: ffffffa00000c984 x25: 0000000000212a9c x24: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x23: 736120492079616d x22: 00000000ffffffff x21: 0000000000000cc0
Jul 15 10:20:18 m1pro kernel: x20: ffff000001f2cb00 x19: 0000000000000318 x18: 00000000000000ff
Jul 15 10:20:18 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x11: 00000000ffffffa0 x10: 0000000000000008 x9 : ffffffffffffffff
Jul 15 10:20:18 m1pro kernel: x8 : c98580007a45d9c4 x7 : 0000000000000cc0 x6 : 0000000000000318
Jul 15 10:20:18 m1pro kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000064ce340
Jul 15 10:20:18 m1pro kernel: x2 : 0000000000000200 x1 : 736120492079616d x0 : ffff000001f2cb00
Jul 15 10:20:18 m1pro kernel: Call trace:
Jul 15 10:20:18 m1pro kernel:  __kmalloc_node_track_caller+0xec/0x2bc
Jul 15 10:20:18 m1pro kernel:  krealloc+0x9c/0x144
Jul 15 10:20:18 m1pro kernel:  _RINvNtCsKOPqOvr6FN_5alloc7raw_vec11finish_growNtNtB4_5alloc6GlobalECsirMamryJlsQ_5asahi+0x44/0xac [asahi]
Jul 15 10:20:18 m1pro kernel:  _RNvMs0_NtCsKOPqOvr6FN_5alloc3vecINtB5_3VechE21try_extend_from_sliceCsirMamryJlsQ_5asahi+0xc8/0x13c [asahi]
Jul 15 10:20:18 m1pro kernel:  _RINvMs8_NtCsirMamryJlsQ_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw6vertex17RunVertexG13V13_5INtNtB8_5alloc12GenericAllocBP_NtB1u_14HeapAllocationEE17new_init_preallocINtNtNtCsc1LFWrxnNA7_6kernel4init10___internal11InitClosureNCNCNvMs1_NtN>
Jul 15 10:20:18 m1pro kernel:  _RNvMs1_NtNtCsirMamryJlsQ_5asahi5queue6renderNtB7_18QueueInnerG13V13_513submit_render+0x1ba8/0x1dd0 [asahi]
Jul 15 10:20:18 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0xf74/0x1578 [asahi]
Jul 15 10:20:18 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 10:20:18 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 15 10:20:18 m1pro kernel:  drm_ioctl+0x23c/0x4e4
Jul 15 10:20:18 m1pro kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 15 10:20:18 m1pro kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 15 10:20:18 m1pro kernel:  do_el0_svc+0x40/0xf0
Jul 15 10:20:18 m1pro kernel:  el0_svc+0x34/0x11c
Jul 15 10:20:18 m1pro kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 15 10:20:18 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 15 10:20:18 m1pro kernel: Code: 54000c20 b9402a82 aa1703e1 aa1403e0 (f8626af9) 
Jul 15 10:20:18 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 15 10:20:18 m1pro kernel: Unable to handle kernel paging request at virtual address 006120492079636d
Jul 15 10:20:18 m1pro kernel: Mem abort info:
Jul 15 10:20:18 m1pro kernel:   ESR = 0x0000000096000004
Jul 15 10:20:18 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 15 10:20:18 m1pro kernel:   SET = 0, FnV = 0
Jul 15 10:20:18 m1pro kernel:   EA = 0, S1PTW = 0
Jul 15 10:20:18 m1pro kernel:   FSC = 0x04: level 0 translation fault
Jul 15 10:20:18 m1pro kernel: Data abort info:
Jul 15 10:20:18 m1pro kernel:   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
Jul 15 10:20:18 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 15 10:20:18 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 15 10:20:18 m1pro kernel: [006120492079636d] address between user and kernel address ranges
Jul 15 10:20:18 m1pro kernel: Internal error: Oops: 0000000096000004 [#2] PREEMPT SMP

Going back to 6.9.5 brings back a stable system.

jannau · 2024-07-17T07:27:03Z

There isn't much of a change between asahi-6.9.5-1 and asahi-6.9.6-1, and I don't see any relevant changes.

It looks like there is an issue with handling failing drm_sched_can_queue() calls. Is the (GPU) workload at the time of the error in any way remarkable?

mkurz · 2024-07-17T07:38:26Z

> It looks like there is an issue with handling failing drm_sched_can_queue() calls. Is the (GPU) workload at the time of the error in any way remarkable?

I ran into the same (or a similar) drm_sched_can_queue problem last week when I upgraded to 6.9.7-1. I downgraded to 6.9.6-1 and have had no issues since. I didn't report it because I thought all of this was WIP, but maybe this is a bug? (Or is it fixed in a newer release?)

Jul 08 22:08:45 mkurz-macbook-pro kernel: ------------[ cut here ]------------
Jul 08 22:08:45 mkurz-macbook-pro kernel: asahi 406400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 08 22:08:45 mkurz-macbook-pro kernel: WARNING: CPU: 1 PID: 4579 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0xf4/0x154
Jul 08 22:08:45 mkurz-macbook-pro kernel: Modules linked in: tls ppp_async l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_gene>
Jul 08 22:08:45 mkurz-macbook-pro kernel: CPU: 1 PID: 4579 Comm: Renderer Tainted: G S                 6.9.7-asahi-1-1-ARCH #1
Jul 08 22:08:45 mkurz-macbook-pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 08 22:08:45 mkurz-macbook-pro kernel: pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
Jul 08 22:08:45 mkurz-macbook-pro kernel: pc : drm_sched_can_queue+0xf4/0x154
Jul 08 22:08:45 mkurz-macbook-pro kernel: lr : drm_sched_can_queue+0xf4/0x154
Jul 08 22:08:45 mkurz-macbook-pro kernel: sp : ffff80009ba07440
Jul 08 22:08:45 mkurz-macbook-pro kernel: x29: ffff80009ba07440 x28: ffff00049660c000 x27: 000000000000000d
Jul 08 22:08:45 mkurz-macbook-pro kernel: x26: ffff80009ba07608 x25: ffff00001579df80 x24: ffff800081739000
Jul 08 22:08:45 mkurz-macbook-pro kernel: x23: ffff00000dce3000 x22: ffff000011ed6938 x21: ffff00049660c1d8
Jul 08 22:08:45 mkurz-macbook-pro kernel: x20: ffff00002e3a1c08 x19: ffff00002e3a1c08 x18: 0000000000000050
Jul 08 22:08:45 mkurz-macbook-pro kernel: x17: 636e757274202c74 x16: 696d696c20746964 x15: 6572632065687420
Jul 08 22:08:45 mkurz-macbook-pro kernel: x14: ffff80008153d288 x13: 2e657461636e7572 x12: 74202c74696d696c
Jul 08 22:08:45 mkurz-macbook-pro kernel: x11: ffff80008153d288 x10: 0000000000000316 x9 : ffff8000815ed288
Jul 08 22:08:45 mkurz-macbook-pro kernel: x8 : 000000000002ffe8 x7 : 00000000ffffe000 x6 : ffff8000815ed288
Jul 08 22:08:45 mkurz-macbook-pro kernel: x5 : 80000000ffffe000 x4 : 0000000000000002 x3 : ffff800081318008
Jul 08 22:08:45 mkurz-macbook-pro kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0001188f6d00
Jul 08 22:08:45 mkurz-macbook-pro kernel: Call trace:
Jul 08 22:08:45 mkurz-macbook-pro kernel:  drm_sched_can_queue+0xf4/0x154
Jul 08 22:08:45 mkurz-macbook-pro kernel:  drm_sched_wakeup+0x18/0x5c
Jul 08 22:08:45 mkurz-macbook-pro kernel:  drm_sched_entity_push_job+0x168/0x1c0
Jul 08 22:08:45 mkurz-macbook-pro kernel:  _RNvXsI_NtCsfLhZwm4SDSu_5asahi5queueNtB5_13QueueG13V12_3NtB5_5Queue6submit+0x131c/0x1604
Jul 08 22:08:45 mkurz-macbook-pro kernel:  _RNvNvXs_NtCsfLhZwm4SDSu_5asahi6driverNtB6_11AsahiDriverNtNtNtCs48FVigIbjZk_6kernel3drm3drv6Driver6IOCTL>
Jul 08 22:08:45 mkurz-macbook-pro kernel:  drm_ioctl_kernel+0xbc/0x130
Jul 08 22:08:45 mkurz-macbook-pro kernel:  drm_ioctl+0x20c/0x4c0
Jul 08 22:08:45 mkurz-macbook-pro kernel:  __arm64_sys_ioctl+0x2cc/0xc9c
Jul 08 22:08:45 mkurz-macbook-pro kernel:  invoke_syscall.constprop.0+0x50/0xe4
Jul 08 22:08:45 mkurz-macbook-pro kernel:  do_el0_svc+0x40/0xdc
Jul 08 22:08:45 mkurz-macbook-pro kernel:  el0_svc+0x38/0x160
Jul 08 22:08:45 mkurz-macbook-pro kernel:  el0t_64_sync_handler+0x120/0x12c
Jul 08 22:08:45 mkurz-macbook-pro kernel:  el0t_64_sync+0x190/0x194
Jul 08 22:08:45 mkurz-macbook-pro kernel: ---[ end trace 0000000000000000 ]---
Jul 08 22:08:45 mkurz-macbook-pro kernel: ------------[ cut here ]------------
Jul 08 22:08:45 mkurz-macbook-pro kernel: WARNING: CPU: 1 PID: 4579 at mm/slub.c:4358 free_large_kmalloc+0xac/0xe0
Jul 08 22:08:45 mkurz-macbook-pro kernel: Modules linked in: tls ppp_async l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_gene>
Jul 08 22:08:45 mkurz-macbook-pro kernel: CPU: 1 PID: 4579 Comm: Renderer Tainted: G S      W          6.9.7-asahi-1-1-ARCH #1
Jul 08 22:08:45 mkurz-macbook-pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 08 22:08:45 mkurz-macbook-pro kernel: pstate: 41400009 (nZcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
Jul 08 22:08:45 mkurz-macbook-pro kernel: pc : free_large_kmalloc+0xac/0xe0
Jul 08 22:08:45 mkurz-macbook-pro kernel: lr : kfree+0x160/0x1b4
Jul 08 22:08:45 mkurz-macbook-pro kernel: sp : ffff80009ba05dc0
Jul 08 22:08:45 mkurz-macbook-pro kernel: x29: ffff80009ba05dc0 x28: ffff000136110800 x27: ffff00009207e1c0
Jul 08 22:08:45 mkurz-macbook-pro kernel: x26: ffff000027a57008 x25: 0000000002806900 x24: ffffffa000000074
Jul 08 22:08:45 mkurz-macbook-pro kernel: x23: ffffffa6001ab980 x22: ffff80009ba06290 x21: 0000000000000001
Jul 08 22:08:45 mkurz-macbook-pro kernel: x20: ffff000400000500 x19: ffffff7fc4000000 x18: 000000000007815a
Jul 08 22:08:45 mkurz-macbook-pro kernel: x17: 0000000000000000 x16: 00000000ffff0000 x15: 00000000ffffffa6
Jul 08 22:08:45 mkurz-macbook-pro kernel: x14: 001ac0d800000000 x13: 9393939300000000 x12: 0000000000000000
Jul 08 22:08:45 mkurz-macbook-pro kernel: x11: 0000000000000000 x10: 00000000000002a0 x9 : 0000000000000000
Jul 08 22:08:45 mkurz-macbook-pro kernel: x8 : 0000000000000000 x7 : 00000000000002a0 x6 : ffff00039be81900
Jul 08 22:08:45 mkurz-macbook-pro kernel: x5 : ffff80009ba06348 x4 : ffff0001188f6d00 x3 : ffff8000a7be6140
Jul 08 22:08:45 mkurz-macbook-pro kernel: x2 : 0000000000000001 x1 : ffff000400000500 x0 : 0000000000000000
Jul 08 22:08:45 mkurz-macbook-pro kernel: Call trace:
Jul 08 22:08:45 mkurz-macbook-pro kernel:  free_large_kmalloc+0xac/0xe0
Jul 08 22:08:45 mkurz-macbook-pro kernel:  kfree+0x160/0x1b4
Jul 08 22:08:45 mkurz-macbook-pro kernel:  _RINvMs8_NtCsfLhZwm4SDSu_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw8fragment19RunFragmentG13V12_3INtNtB8_>
Jul 08 22:08:45 mkurz-macbook-pro kernel:  _RNvMs_NtNtCsfLhZwm4SDSu_5asahi5queue6renderNtB6_13QueueG13V12_313submit_render+0x169c/0x1da4
Jul 08 22:08:45 mkurz-macbook-pro kernel:  _RNvXsI_NtCsfLhZwm4SDSu_5asahi5queueNtB5_13QueueG13V12_3NtB5_5Queue6submit+0xfcc/0x1604
Jul 08 22:08:45 mkurz-macbook-pro kernel:  _RNvNvXs_NtCsfLhZwm4SDSu_5asahi6driverNtB6_11AsahiDriverNtNtNtCs48FVigIbjZk_6kernel3drm3drv6Driver6IOCTL>
Jul 08 22:08:45 mkurz-macbook-pro kernel:  drm_ioctl_kernel+0xbc/0x130
Jul 08 22:08:45 mkurz-macbook-pro kernel:  drm_ioctl+0x20c/0x4c0
Jul 08 22:08:45 mkurz-macbook-pro kernel:  __arm64_sys_ioctl+0x2cc/0xc9c
Jul 08 22:08:45 mkurz-macbook-pro kernel:  invoke_syscall.constprop.0+0x50/0xe4
Jul 08 22:08:45 mkurz-macbook-pro kernel:  do_el0_svc+0x40/0xdc
Jul 08 22:08:45 mkurz-macbook-pro kernel:  el0_svc+0x38/0x160
Jul 08 22:08:45 mkurz-macbook-pro kernel:  el0t_64_sync_handler+0x120/0x12c
Jul 08 22:08:45 mkurz-macbook-pro kernel:  el0t_64_sync+0x190/0x194
Jul 08 22:08:45 mkurz-macbook-pro kernel: ---[ end trace 0000000000000000 ]---
Jul 08 22:08:45 mkurz-macbook-pro kernel: object pointer: 0x00000000c6ae86e4

jannau · 2024-07-17T07:46:49Z

asahi-6.9.7-1 contains @asahilina's GPUVM changes, so a regression caused by them is at least possible.

cyrinux · 2024-07-17T08:27:47Z

Hi, I see that @mkurz runs a MacBook Pro; for information, I got this issue on an M2 Air. It is totally random but happens several times per day.

asahilina · 2024-07-17T08:36:57Z

This is in drm/sched so it's less likely to be GPUVM related...

> Jobs may not exceed the credit limit, truncate.

This is an impossible condition, since the job credit count is always 1 and the credit limit is 1280 or something like that. So I think there is some kind of memory corruption...
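
To illustrate why this check should never fire, here is a simplified Python model of the credit comparison behind the warning. It is a sketch, not the kernel's C code: the class and function names are hypothetical, and 1280 is only the approximate limit quoted above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Job:
    credits: int  # per the comment above, the asahi driver always uses 1


@dataclass
class Scheduler:
    credit_limit: int
    credits_in_flight: int = 0

    def available_credits(self) -> int:
        return self.credit_limit - self.credits_in_flight


def can_queue(sched: Scheduler, job: Job) -> Tuple[bool, bool]:
    """Return (can_queue, warned) for the next job (hypothetical helper)."""
    warned = job.credits > sched.credit_limit
    if warned:
        # Matches the "truncate" in the warning text: clamp the job to the limit.
        job.credits = sched.credit_limit
    return sched.available_credits() >= job.credits, warned


sched = Scheduler(credit_limit=1280)  # approximate value from the thread
print(can_queue(sched, Job(credits=1)))           # (True, False)
print(can_queue(sched, Job(credits=0xDEADBEEF)))  # (True, True), garbage value
```

With a credit count of 1, the warning branch is only reachable if the job or scheduler fields hold garbage, which is consistent with the memory-corruption theory.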

asahilina · 2024-07-17T08:48:51Z

The realloc crash has some interesting strings...

>>> bytes.fromhex("736120492079616d")[::-1]
b'may I as'

This string is not from the kernel... @oliverbestmann, do you have any idea where this came from?
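
The same decoding can be applied to any register in the dumps. The helper below is a small sketch (reg_to_text is a hypothetical name) that treats a 64-bit register value as eight little-endian ASCII bytes, which is what the one-liner above does by reversing the byte string.

```python
def reg_to_text(hex_value: str) -> str:
    """Interpret a 64-bit register value as 8 little-endian ASCII bytes."""
    raw = int(hex_value, 16).to_bytes(8, "little")
    # Replace non-printable bytes with '.' so pointers and zeros stay readable.
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


# Register values copied from the traces in this thread.
samples = {
    "x23 (krealloc oops)": "736120492079616d",
    "x15 (sched warning)": "6572632065687420",
    "x14 (sched warning)": "6465656378652074",
}
for name, value in samples.items():
    print(f"{name}: {reg_to_text(value)!r}")
```

x23 decodes to 'may I as', while x15 and x14 decode to ' the cre' and 't exceed', which look like fragments of the warning message itself rather than foreign data.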

asahilina · 2024-07-17T08:53:31Z

Also are we sure this is reproducible with v6.9.6 in at least some cases? Because then it can't be the GPUVM stuff...

jannau · 2024-07-17T09:14:41Z

If it's reproducible with asahi-6.9.6-1 there's no obvious change which would explain why it's not in asahi-6.9.5-1 as well. Nothing in git range-diff asahi-6.9.5-1...asahi-6.9.6-1 looks related.

asahilina · 2024-07-17T09:25:48Z

Are these kernels built with clang/llvm by any chance? So far everyone reporting this is on something other than Fedora, and Ella specifically pointed this out on Discord:

> i have a small hunch its a compiler bug in clang or ub in drm sched causing freezing when built with clang

jannau · 2024-07-17T09:26:17Z

@cyrinux please describe which systems you use. Do you use Fedora-Asahi-Remix?

@mkurz / @oliverbestmann do you use LLVM or gcc to build the kernel?

cyrinux · 2024-07-17T09:29:35Z

> @cyrinux please describe which systems you use. Do you use Fedora-Asahi-Remix?

I use nixos unstable with https://github.com/tpwrules/nixos-apple-silicon/ overlay. 😸

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x611f0320]
[    0.000000] Linux version 6.9.9-asahi (nixbld@localhost) (gcc (GCC) 13.3.0, GNU ld (GNU Binutils) 2.42) #1-NixOS SMP PREEMPT_DYNAMIC Tue Jan  1 00:00:00 UTC 1980
[    0.000000] random: crng init done
[    0.000000] Machine model: Apple MacBook Air (13-inch, M2, 2022)
[    0.000000] efi: EFI v2.10 by Das U-Boot

Ella-0 · 2024-07-17T09:29:38Z

> Are these kernels built with clang/llvm by any chance? So far everyone reporting this is on something other than Fedora, and Ella specifically pointed this out on Discord:

> i have a small hunch its a compiler bug in clang or ub in drm sched causing freezing when built with clang

My kernel is built with GCC.

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x612f0240]
[    0.000000] Linux version 6.9.7-asahi (ella@natsu) (gcc (GCC) 14.1.1 20240507, GNU ld (GNU Binutils) 2.42.0) #2 SMP PREEMPT Fri Jul  5 23:30:34 GMT 2024
[    0.000000] KASLR enabled
[    0.000000] random: crng init done
[    0.000000] Machine model: Apple MacBook Pro (14-inch, M1 Pro, 2021)
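
For anyone else checking which compiler built their kernel, the "Linux version" banner (from dmesg or /proc/version) already contains it. The sketch below pulls the compiler description out of such a banner; kernel_compiler is a hypothetical helper, and the regular expression assumes the banner layout seen in the logs above.

```python
import re


def kernel_compiler(banner: str) -> str:
    """Return the compiler description from a 'Linux version ...' banner."""
    # Assumed layout: Linux version <release> (<builder>) (<compiler>, <linker>) ...
    match = re.search(r"Linux version \S+ \([^)]*\) \(([^,]+),", banner)
    return match.group(1) if match else "unknown"


# Banner copied from the NixOS boot log earlier in this thread.
banner = (
    "Linux version 6.9.9-asahi (nixbld@localhost) "
    "(gcc (GCC) 13.3.0, GNU ld (GNU Binutils) 2.42) "
    "#1-NixOS SMP PREEMPT_DYNAMIC Tue Jan  1 00:00:00 UTC 1980"
)
compiler = kernel_compiler(banner)
print(compiler)                     # gcc (GCC) 13.3.0
print("clang" in compiler.lower())  # False
```

On a running system the banner could be read from /proc/version instead of being pasted in.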

asahilina · 2024-07-17T10:36:56Z

Please also report your Mesa versions, and the Rust version used for the kernel compile too.

At this point I'm pretty sure this is random memory corruption, but none of us on Fedora can reproduce it so far...

cyrinux · 2024-07-17T11:53:28Z

> Please also report your Mesa versions, and the Rust version used for the kernel compile too.

> At this point I'm pretty sure this is random memory corruption, but none of us on Fedora can reproduce it so far...

$ nix-store --query --requisites /run/current-system | cut -d- -f2- | sort -u | grep -E "(rustc|mesa)"
mesa-24.2.0
mesa-24.2.0-drivers
rustc-1.78.0
rustc-wrapper-1.78.0

mkurz · 2024-07-17T14:07:56Z

I am running Arch Linux ARM with all packages up to date, thanks to @joske's pull requests: https://github.com/AsahiLinux/PKGBUILDs/pulls/joske

You can find the PKGBUILD I am using here: https://github.com/joske/PKGBUILDs/tree/kernel/linux-asahi
However, the latest one in that branch uses 6.9.7-1, which crashed for me, so I downgraded to 6.9.6-1, one commit earlier: https://github.com/joske/PKGBUILDs/tree/42afae8c0c27efad565957f5213e096ef971c7bf/linux-asahi - no issues for 2 weeks now.
I just run makepkg -sicAL to build and install.

Jul 17 15:00:19 mkurz-macbook-pro kernel: Linux version 6.9.6-asahi-1-3-ARCH (linux-asahi@archlinux) (gcc (GCC) 14.1.1 20240507, GNU ld (GNU Binutils) 2.42.0) #1 SMP PREEMPT_DYNAMIC Mon, 08 Jul 2024 21:53:07 +0000

$ yay -Q | grep -E 'llvm|clang|mesa|rust|gcc|glibc'
clang 18.1.8-1
gcc 14.1.1+r1+g43b730b9134-1
gcc-libs 14.1.1+r1+g43b730b9134-1
glibc 2.39+r52+gf8e4623421-1
llvm 18.1.8-3
llvm-libs 18.1.8-3
mesa-asahi-edge 24.2.0_pre20240527-3
mesa-asahi-edge-debug 24.2.0_pre20240527-3
mesa-utils 9.0.0-4
rust-bindgen 0.69.4-1
rustup 1.27.1-1
spirv-llvm-translator 18.1.2-1

$ cat rust-toolchain.toml 
[toolchain]
channel = "1.76.0"
components = ["rustc", "cargo", "rust-src"]
targets = ["aarch64-unknown-linux-gnu"]

So for me this happened when going from 6.9.6-1 to 6.9.7-1.

mkurz · 2024-07-17T14:09:55Z

By the way, after upgrading llvm/clang I had to recompile Mesa.

asahilina · 2024-07-17T16:07:58Z

I'm bisecting configs and running into some scary mm-related crashes that have nothing to do with the GPU. I think there is some horrible regression here that affects some kernel configs...

Everyone, please post the value of these kernel configs:

CONFIG_ARM64_PA_BITS
CONFIG_ARM64_VA_BITS
CONFIG_PGTABLE_LEVELS

For reference, on Fedora we have:

CONFIG_ARM64_PA_BITS=48
CONFIG_ARM64_VA_BITS=48
CONFIG_PGTABLE_LEVELS=4
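
One way to collect exactly these values is sketched below. config_values is a hypothetical helper that scans kernel config text for exact option names; the sample text reuses the Fedora values above plus one disabled option in the usual "is not set" form. Where the config text comes from (for example a build tree's .config, or /proc/config.gz if the kernel exposes it) is an assumption that depends on the distribution.

```python
from typing import Dict, Iterable


def config_values(config_text: str, names: Iterable[str]) -> Dict[str, str]:
    """Return {option: value} for the requested options found in config text."""
    wanted = set(names)
    found = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        # Exact match only, so CONFIG_ARM64_VA_BITS_48=y is not confused
        # with CONFIG_ARM64_VA_BITS.
        if sep and key in wanted:
            found[key] = value
    return found


sample = """\
CONFIG_ARM64_PA_BITS=48
# CONFIG_ARM64_VA_BITS_52 is not set
CONFIG_ARM64_VA_BITS=48
CONFIG_PGTABLE_LEVELS=4
"""
keys = ["CONFIG_ARM64_PA_BITS", "CONFIG_ARM64_VA_BITS", "CONFIG_PGTABLE_LEVELS"]
for key, value in config_values(sample, keys).items():
    print(f"{key}={value}")
```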

maximbaz · 2024-07-17T16:15:07Z

Answering for NixOS (same setup as @cyrinux above), the values seem to be the same as on Fedora.

mkurz · 2024-07-17T19:38:54Z

From https://github.com/joske/PKGBUILDs/blob/kernel/linux-asahi/config:

$ grep -E 'CONFIG_ARM64_PA_BITS|CONFIG_ARM64_VA_BITS|CONFIG_PGTABLE_LEVELS' config 
CONFIG_PGTABLE_LEVELS=4
# CONFIG_ARM64_VA_BITS_36 is not set
# CONFIG_ARM64_VA_BITS_47 is not set
CONFIG_ARM64_VA_BITS_48=y
# CONFIG_ARM64_VA_BITS_52 is not set
CONFIG_ARM64_VA_BITS=48
CONFIG_ARM64_PA_BITS_48=y
CONFIG_ARM64_PA_BITS=48

These are the same when building 6.9.6-1 and 6.9.7-1.

The only difference in the config between the two kernels is: joske/PKGBUILDs@14913f3#diff-3a3fd6cbc5653e937609572c62143e181842a4a1ebdc1b55e9e2e34e6aa6c5fc

montchr · 2024-07-18T01:10:39Z

I just ran into this, also using https://github.com/tpwrules/nixos-apple-silicon/tree/6015c1e2f91896e0b7a983c2824c665af32f568a

Jul 17 20:30:16 tuvok kernel: ------------[ cut here ]------------
Jul 17 20:30:16 tuvok kernel: asahi 206400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 17 20:30:16 tuvok kernel: WARNING: CPU: 3 PID: 19136 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0x110/0x168
Jul 17 20:30:16 tuvok kernel: Modules linked in: usbhid xhci_plat_hcd xhci_hcd xt_mark snd_seq_dummy rfcomm snd_hrtimer snd_seq snd_seq_device qrtr nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype overlay bnep brcmfmac_wcc joydev hid_magicmouse appledrm macsmc_hwmon macsmc_reboot macsmc_power macsmc_hid ofpart tps6598x snd_soc_cs42l84 spi_nor apple_isp videobuf2_dma_sg snd_soc_tas2764 hid_apple videobuf2_memops videobuf2_v4l2 videodev apple_admac clk_apple_nco apple_dcp videobuf2_common asahi pwm_apple mux_core mc drm_dma_helper apple_soc_cpufreq snd_soc_apple_mca hci_bcm4377 brcmfmac bluetooth brcmutil snd_soc_macaudio leds_pwm cfg80211 ecdh_generic ecc rfkill xt_conntrack ip6t_rpfilter ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog nft_compat nf_tables uinput evdi(O) loop xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter veth tun tap macvlan bridge stp llc fuse nfnetlink ip_tables nvmem_spmi_mfd rtc_macsmc gpio_macsmc simple_mfd_spmi dockchannel_hid regmap_spmi phy_apple_atc pcie_apple pci_host_common
Jul 17 20:30:16 tuvok kernel:  typec macsmc_rtkit dwc3 nvme_apple macsmc mfd_core apple_rtkit_helper nvmem_apple_efuses spmi_apple_controller udc_core apple_dockchannel apple_sart pinctrl_apple_gpio i2c_pasemi_platform spi_apple i2c_pasemi_core apple_dart
Jul 17 20:30:16 tuvok kernel: CPU: 3 PID: 19136 Comm: Renderer Tainted: G S         O       6.9.9-asahi #1-NixOS
Jul 17 20:30:16 tuvok kernel: Hardware name: Apple MacBook Air (13-inch, M2, 2022) (DT)
Jul 17 20:30:16 tuvok kernel: pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
Jul 17 20:30:16 tuvok kernel: pc : drm_sched_can_queue+0x110/0x168
Jul 17 20:30:16 tuvok kernel: lr : drm_sched_can_queue+0x110/0x168
Jul 17 20:30:16 tuvok kernel: sp : ffff800098ea7440
Jul 17 20:30:16 tuvok kernel: x29: ffff800098ea7440 x28: 0000000000000030 x27: ffff00001262a000
Jul 17 20:30:16 tuvok kernel: x26: ffff80007a421910 x25: 0000000000000000 x24: ffff0000262c7300
Jul 17 20:30:16 tuvok kernel: x23: ffff800098ea7888 x22: ffff0001db450338 x21: ffff0000a78b99d8
Jul 17 20:30:16 tuvok kernel: x20: ffff00022c869208 x19: ffff00022c869208 x18: 0000000000000000
Jul 17 20:30:16 tuvok kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 6572632065687420
Jul 17 20:30:16 tuvok kernel: x14: 6465656378652074 x13: 0000000000000000 x12: 0000000000000000
Jul 17 20:30:16 tuvok kernel: x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
Jul 17 20:30:16 tuvok kernel: x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
Jul 17 20:30:16 tuvok kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
Jul 17 20:30:16 tuvok kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
Jul 17 20:30:16 tuvok kernel: Call trace:
Jul 17 20:30:16 tuvok kernel:  drm_sched_can_queue+0x110/0x168
Jul 17 20:30:16 tuvok kernel:  drm_sched_wakeup+0x18/0x7c
Jul 17 20:30:16 tuvok kernel:  drm_sched_entity_push_job+0x174/0x1e8
Jul 17 20:30:16 tuvok kernel:  _RNvXsJ_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG14V12_4NtB5_5Queue6submit+0x12d8/0x1578 [asahi]
Jul 17 20:30:16 tuvok kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 17 20:30:16 tuvok kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 17 20:30:16 tuvok kernel:  drm_ioctl+0x23c/0x4e4
Jul 17 20:30:16 tuvok kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 17 20:30:16 tuvok kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 17 20:30:16 tuvok kernel:  do_el0_svc+0x40/0xf0
Jul 17 20:30:16 tuvok kernel:  el0_svc+0x34/0x11c
Jul 17 20:30:16 tuvok kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 17 20:30:16 tuvok kernel:  el0t_64_sync+0x190/0x194
Jul 17 20:30:16 tuvok kernel: ---[ end trace 0000000000000000 ]---
Jul 17 20:30:16 tuvok kernel: Unable to handle kernel paging request at virtual address ffff000000000700
Jul 17 20:30:16 tuvok kernel: Mem abort info:
Jul 17 20:30:16 tuvok kernel:   ESR = 0x0000000096000007
Jul 17 20:30:16 tuvok kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 17 20:30:16 tuvok kernel:   SET = 0, FnV = 0
Jul 17 20:30:16 tuvok kernel:   EA = 0, S1PTW = 0
Jul 17 20:30:16 tuvok kernel:   FSC = 0x07: level 3 translation fault
Jul 17 20:30:16 tuvok kernel: Data abort info:
Jul 17 20:30:16 tuvok kernel:   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
Jul 17 20:30:16 tuvok kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 17 20:30:16 tuvok kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 17 20:30:16 tuvok kernel: swapper pgtable: 16k pages, 48-bit VAs, pgdp=0000000bc4b30000
Jul 17 20:30:16 tuvok kernel: [ffff000000000700] pgd=1800000bce3fc003, p4d=1800000bce3fc003, pud=1800000bce3f8003, pmd=1800000bce3f4003, pte=0000000000000000
Jul 17 20:30:16 tuvok kernel: Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
Jul 17 20:30:16 tuvok kernel: Modules linked in: usbhid xhci_plat_hcd xhci_hcd xt_mark snd_seq_dummy rfcomm snd_hrtimer snd_seq snd_seq_device qrtr nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype overlay bnep brcmfmac_wcc joydev hid_magicmouse appledrm macsmc_hwmon macsmc_reboot macsmc_power macsmc_hid ofpart tps6598x snd_soc_cs42l84 spi_nor apple_isp videobuf2_dma_sg snd_soc_tas2764 hid_apple videobuf2_memops videobuf2_v4l2 videodev apple_admac clk_apple_nco apple_dcp videobuf2_common asahi pwm_apple mux_core mc drm_dma_helper apple_soc_cpufreq snd_soc_apple_mca hci_bcm4377 brcmfmac bluetooth brcmutil snd_soc_macaudio leds_pwm cfg80211 ecdh_generic ecc rfkill xt_conntrack ip6t_rpfilter ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog nft_compat nf_tables uinput evdi(O) loop xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter veth tun tap macvlan bridge stp llc fuse nfnetlink ip_tables nvmem_spmi_mfd rtc_macsmc gpio_macsmc simple_mfd_spmi dockchannel_hid regmap_spmi phy_apple_atc pcie_apple pci_host_common
Jul 17 20:30:16 tuvok kernel:  typec macsmc_rtkit dwc3 nvme_apple macsmc mfd_core apple_rtkit_helper nvmem_apple_efuses spmi_apple_controller udc_core apple_dockchannel apple_sart pinctrl_apple_gpio i2c_pasemi_platform spi_apple i2c_pasemi_core apple_dart
Jul 17 20:30:16 tuvok kernel: CPU: 3 PID: 19136 Comm: Renderer Tainted: G S      W  O       6.9.9-asahi #1-NixOS
Jul 17 20:30:16 tuvok kernel: Hardware name: Apple MacBook Air (13-inch, M2, 2022) (DT)
Jul 17 20:30:16 tuvok kernel: pstate: a1400009 (NzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
Jul 17 20:30:16 tuvok kernel: pc : __kmalloc_node_track_caller+0xec/0x2bc
Jul 17 20:30:16 tuvok kernel: lr : __kmalloc_node_track_caller+0x98/0x2bc
Jul 17 20:30:16 tuvok kernel: sp : ffff800098ea5cf0
Jul 17 20:30:16 tuvok kernel: x29: ffff800098ea5d00 x28: 00000000005a0112 x27: ffff000081b13f00
Jul 17 20:30:16 tuvok kernel: x26: 00000000faa60000 x25: 00000000ffffffa0 x24: 0000000000000000
Jul 17 20:30:16 tuvok kernel: x23: ffff000000000500 x22: 00000000ffffffff x21: 0000000000000cc0
Jul 17 20:30:16 tuvok kernel: x20: ffff000001f48b00 x19: 0000000000000328 x18: 00000000000000ff
Jul 17 20:30:16 tuvok kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
Jul 17 20:30:16 tuvok kernel: x14: 0000000000000000 x13: 0000000100000000 x12: 0000000000000000
Jul 17 20:30:16 tuvok kernel: x11: 0000000000000001 x10: 0000000000000008 x9 : ffffffffffffffff
Jul 17 20:30:16 tuvok kernel: x8 : d0b580007a3219c4 x7 : 0000000000000cc0 x6 : 0000000000000328
Jul 17 20:30:16 tuvok kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 000000002ce914c3
Jul 17 20:30:16 tuvok kernel: x2 : 0000000000000200 x1 : ffff000000000500 x0 : ffff000001f48b00
Jul 17 20:30:16 tuvok kernel: Call trace:
Jul 17 20:30:16 tuvok kernel:  __kmalloc_node_track_caller+0xec/0x2bc
Jul 17 20:30:16 tuvok kernel:  krealloc+0x9c/0x144
Jul 17 20:30:16 tuvok kernel:  _RINvNtCsKOPqOvr6FN_5alloc7raw_vec11finish_growNtNtB4_5alloc6GlobalECsirMamryJlsQ_5asahi+0x44/0xac [asahi]
Jul 17 20:30:16 tuvok kernel:  _RNvMs0_NtCsKOPqOvr6FN_5alloc3vecINtB5_3VechE21try_extend_from_sliceCsirMamryJlsQ_5asahi+0xc8/0x13c [asahi]
Jul 17 20:30:16 tuvok kernel:  _RINvMs8_NtCsirMamryJlsQ_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw8fragment19RunFragmentG14V12_4INtNtB8_5alloc12GenericAllocBP_NtB1y_14HeapAllocationEE17new_init_preallocINtNtNtCsc1LFWrxnNA7_6kernel4init10___internal11InitClosureNCNCNvMs0_NtNtB8_5queue6renderNtB3Q_18QueueInnerG14V12_413submit_renders1_0s_0BP_NtNtB2O_5error5ErrorEIB2I_NCNCB3I_s2_0s_0NtNtBR_3raw19RunFragmentG14V12_4B4X_EB4X_B4X_NCB3I_s1_0NCB3I_s2_0EB8_+0x800/0x1ea8 [asahi]
Jul 17 20:30:16 tuvok kernel:  _RNvMs0_NtNtCsirMamryJlsQ_5asahi5queue6renderNtB7_18QueueInnerG14V12_413submit_render+0x162c/0x1cd0 [asahi]
Jul 17 20:30:16 tuvok kernel:  _RNvXsJ_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG14V12_4NtB5_5Queue6submit+0xf74/0x1578 [asahi]
Jul 17 20:30:16 tuvok kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 17 20:30:16 tuvok kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 17 20:30:16 tuvok kernel:  drm_ioctl+0x23c/0x4e4
Jul 17 20:30:16 tuvok kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 17 20:30:16 tuvok kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 17 20:30:16 tuvok kernel:  do_el0_svc+0x40/0xf0
Jul 17 20:30:16 tuvok kernel:  el0_svc+0x34/0x11c
Jul 17 20:30:16 tuvok kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 17 20:30:16 tuvok kernel:  el0t_64_sync+0x190/0x194
Jul 17 20:30:16 tuvok kernel: Code: 54000c20 b9402a82 aa1703e1 aa1403e0 (f8626af9)
Jul 17 20:30:16 tuvok kernel: ---[ end trace 0000000000000000 ]---
Jul 17 20:30:27 tuvok kernel: Unable to handle kernel paging request at virtual address ffff000000000700
Jul 17 20:30:27 tuvok kernel: Mem abort info:
Jul 17 20:30:27 tuvok kernel:   ESR = 0x0000000096000007
Jul 17 20:30:27 tuvok kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 17 20:30:27 tuvok kernel:   SET = 0, FnV = 0
Jul 17 20:30:27 tuvok kernel:   EA = 0, S1PTW = 0
Jul 17 20:30:27 tuvok kernel:   FSC = 0x07: level 3 translation fault
Jul 17 20:30:27 tuvok kernel: Data abort info:
Jul 17 20:30:27 tuvok kernel:   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
Jul 17 20:30:27 tuvok kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 17 20:30:27 tuvok kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 17 20:30:27 tuvok kernel: swapper pgtable: 16k pages, 48-bit VAs, pgdp=0000000bc4b30000
Jul 17 20:30:27 tuvok kernel: [ffff000000000700] pgd=1800000bce3fc003, p4d=1800000bce3fc003, pud=1800000bce3f8003, pmd=1800000bce3f4003, pte=0000000000000000
Jul 17 20:30:27 tuvok kernel: Internal error: Oops: 0000000096000007 [#2] PREEMPT SMP

asahilina · 2024-07-18T02:39:35Z

Sorry, I really need a consistent way to reproduce this to track it down. So far I've been unable to repro the drm_sched_can_queue crash myself. The only things I got are some unrelated crashes in futex code when trying @Ella-0's kernel config (which she sent me via Discord), which I bisected to that VA bits thing, ~~but that may be unrelated or maybe there is a deeper memory management issue~~ this turned out to be completely unrelated.

The crash itself makes no sense. It's memory corruption, where the drm_sched job gets clobbered with something else, and then somehow consistently after that the changes made by drm_sched directly cause a crash in the allocator, in what has to be a subsequent ioctl call because the drm_sched stuff is the last thing the ioctl does. That it's somehow this consistent is very, very strange. I would have expected heap corruption to manifest in more varied ways after the fact. The actual lifetimes of the allocations involved are extremely simple, so I'm 99% sure this isn't a silly lifetime problem in my code (at least not as it relates to the specific structures referenced in the crashes). The code in both the drm_sched_can_queue codepath and in at least one of the subsequent crash codepaths just allocates an object, uses it, and frees it. This is the kind of thing Rust makes almost impossible to get wrong. Unless there's a compiler bug somewhere, I don't see how it's possible for the root cause to be a simple lifetime issue, so I think this has to be a much deeper problem with memory management going wrong elsewhere, and we're just seeing the consequences somehow fairly consistently affect these structures in the GPU driver.

I tried running the same kernel under kASAN and came up with nothing. I also tried Ella's config with kASAN, still nothing, ~~and doing that avoided the crashes that correlated with 52-bit VA support being enabled too~~.

Best guess is there is a spurious page being freed or something like that, so memory is reused while it is still in use. I actually already ran into one of these before (fixed in 2bb1499) which would perfectly explain this kind of behavior, except for the fact that that particular one only happened on DART pagetable freeing which only really happens when unbinding drivers (which is why we didn't notice for so long). If there is a similar bug lurking somewhere else, but it only happens sometimes, then that might explain this and the other badness.

I'm 90% sure that there is an upstream regression in memory management somewhere here, but the only lead I have is that 52-bit VA thing, and I don't know if that is the same issue behind the drm_sched_can_queue crashes at this point or something else...

Edit: The 52-bit VA thing is unrelated unfortunately.

oliverbestmann · 2024-07-18T06:24:13Z

Sorry, it seems I am a bit late now, and this is probably nothing new, but still:
I am also using NixOS on a MacBook M1 Pro, and the same kernel config applies as for @maximbaz.
I checked my journalctl logs and it looks like I actually did not run 6.9.6 but 6.9.7, so @mkurz seems to be correct. All kernels are compiled with GCC 13.3.0, no clang. I do not know where it might have come from. Chromium might have been open, so it could come from anywhere.
I've checked the logs of multiple crashes and it looks like it is always the same stack trace, though the register values do differ.
For me the crashes seem to happen very quickly (~3 min after boot) when using the Zoom web app.
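
Comparing stack traces across crashes while ignoring register values and offsets can be done mechanically. The following is a minimal Python sketch, assuming the journal lines look like the ones pasted in this thread; `trace_signatures` is a hypothetical helper name, not part of any existing tool. It reduces each `Call trace:` block to its function names so that two crashes can be compared directly.

```python
import re

# Journal prefix up to and including "kernel: " (timestamp and hostname vary).
PREFIX = re.compile(r"^.*? kernel: ")
# A call-trace frame such as " drm_ioctl+0x23c/0x4e4", optionally followed by " [module]".
FRAME = re.compile(r"^\s+(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[\w+\])?\s*$")


def trace_signatures(log_text):
    """Return one tuple of function names per 'Call trace:' block."""
    traces = []
    current = None
    for raw in log_text.splitlines():
        line = PREFIX.sub("", raw, count=1)
        if line.strip() == "Call trace:":
            current = []
            traces.append(current)
            continue
        if current is None:
            continue
        match = FRAME.match(line)
        if match:
            current.append(match.group(1))
        else:
            # Anything that is not a frame ends the trace. This includes
            # frames that the journal pager truncated with a trailing ">".
            current = None
    return [tuple(t) for t in traces]
```

Two crashes share a signature when their tuples are equal, even if the offsets differ between kernel builds. Piping `journalctl -k` with `--no-pager` into the script avoids the truncated lines.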

asahilina · 2024-07-18T09:42:01Z

Unfortunately, I just confirmed that the 52-bit problem is completely unrelated. Upstream Linux is just broken with the combination of LPA2 (52-bit support), 16K pages, and non-LPA2 hardware. Please don't build with 52-bit support.

So now we're back to square one... I have no idea how to repro the GPU issue ;;
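
For anyone who wants to check whether their own kernel was built with 52-bit VA support, here is a minimal Python sketch. It assumes the kernel config is available as text (for example from `/proc/config.gz`, if the kernel exposes it, which is an assumption about the build); `va_bits_choice` is a hypothetical helper name. It only looks for the `CONFIG_ARM64_VA_BITS_*` choice that is set to `y`.

```python
import re


def va_bits_choice(config_text):
    """Return the CONFIG_ARM64_VA_BITS_* options that are set to y."""
    pattern = r"^(CONFIG_ARM64_VA_BITS_\d+)=y$"
    return re.findall(pattern, config_text, re.MULTILINE)


if __name__ == "__main__":
    import gzip

    # Assumption: the running kernel exposes its config at this path.
    with gzip.open("/proc/config.gz", "rt") as f:
        print(va_bits_choice(f.read()))
```

If the result contains `CONFIG_ARM64_VA_BITS_52`, the kernel was built with the option described above.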

oliverbestmann · 2024-07-18T14:26:28Z

Does this imply that it is working fine for you on a MacBook Pro M1 with Wayland and GNOME? Does running Chromium also work? What information would be helpful to you?

asahilina · 2024-07-18T15:09:11Z

That's the first time I hear gnome is involved, and also nobody mentioned chromium until your previous post ^^;; (the OP does in fact mention the process name is chromium in the oops log, but I missed that bit...)

The more info about the setup I get the better, and if you can try more workloads (for example, webgl tests and other browsery things) and see if you can find something that reproduces it fast that would be very useful...

Right now I'm testing chromium on an M2 Pro Mac Mini and a bunch of maps and webGL stuff doesn't seem to cause any issues, but this is on Fedora. If there's something about the userspace build that matters here, maybe I need to install another distro...

oliverbestmann · 2024-07-18T16:47:20Z

You are right, I only mentioned Wayland and GNOME in the issue tpwrules/nixos-apple-silicon#218, not here. I am sorry for that.

I just checked my previous boot logs to find everything I can. Here is a different stack trace. This one does not contain the warning about a kernel paging request:

Jul 15 09:32:20 m1pro kernel: ------------[ cut here ]------------
Jul 15 09:32:20 m1pro kernel: asahi 406400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 15 09:32:20 m1pro kernel: WARNING: CPU: 1 PID: 3268 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0x110/0x168
Jul 15 09:32:20 m1pro kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq bnep brcmfmac_wcc joydev hid_magicmouse hci_bcm4>
Jul 15 09:32:20 m1pro kernel:  tps6598x spi_hid_apple nvmem_spmi_mfd rtc_macsmc gpio_macsmc simple_mfd_spmi regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_app>
Jul 15 09:32:20 m1pro kernel: CPU: 1 PID: 3268 Comm: chromium Tainted: G S                 6.9.9-asahi #1-NixOS
Jul 15 09:32:20 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 09:32:20 m1pro kernel: pstate: 61401009 (nZCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 09:32:20 m1pro kernel: pc : drm_sched_can_queue+0x110/0x168
Jul 15 09:32:20 m1pro kernel: lr : drm_sched_can_queue+0x110/0x168
Jul 15 09:32:20 m1pro kernel: sp : ffff800095b37440
Jul 15 09:32:20 m1pro kernel: x29: ffff800095b37440 x28: 0000000000000030 x27: ffff00001332a000
Jul 15 09:32:20 m1pro kernel: x26: ffff80007a849948 x25: 0000000000000000 x24: ffff0000b5124b00
Jul 15 09:32:20 m1pro kernel: x23: ffff800095b37888 x22: ffff0000647efe38 x21: ffff0001259a9dd8
Jul 15 09:32:20 m1pro kernel: x20: ffff00005b10e808 x19: ffff00005b10e808 x18: 0000000000000000
Jul 15 09:32:20 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 6572632065687420
Jul 15 09:32:20 m1pro kernel: x14: 6465656378652074 x13: 0000000000000000 x12: 0000000000000000
Jul 15 09:32:20 m1pro kernel: x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
Jul 15 09:32:20 m1pro kernel: x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
Jul 15 09:32:20 m1pro kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
Jul 15 09:32:20 m1pro kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
Jul 15 09:32:20 m1pro kernel: Call trace:
Jul 15 09:32:20 m1pro kernel:  drm_sched_can_queue+0x110/0x168
Jul 15 09:32:20 m1pro kernel:  drm_sched_wakeup+0x18/0x7c
Jul 15 09:32:20 m1pro kernel:  drm_sched_entity_push_job+0x174/0x1e8
Jul 15 09:32:20 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0x12d8/0x1578 [asahi]
Jul 15 09:32:20 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 09:32:20 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 15 09:32:20 m1pro kernel:  drm_ioctl+0x23c/0x4e4
Jul 15 09:32:20 m1pro kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 15 09:32:20 m1pro kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 15 09:32:20 m1pro kernel:  do_el0_svc+0x40/0xf0
Jul 15 09:32:20 m1pro kernel:  el0_svc+0x34/0x11c
Jul 15 09:32:20 m1pro kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 15 09:32:20 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 15 09:32:20 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 15 09:32:20 m1pro kernel: ------------[ cut here ]------------
Jul 15 09:32:20 m1pro kernel: WARNING: CPU: 1 PID: 3268 at mm/slub.c:4358 free_large_kmalloc+0xdc/0x110
Jul 15 09:32:20 m1pro kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq bnep brcmfmac_wcc joydev hid_magicmouse hci_bcm4>
Jul 15 09:32:20 m1pro kernel:  tps6598x spi_hid_apple nvmem_spmi_mfd rtc_macsmc gpio_macsmc simple_mfd_spmi regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_app>
Jul 15 09:32:20 m1pro kernel: CPU: 1 PID: 3268 Comm: chromium Tainted: G S      W          6.9.9-asahi #1-NixOS
Jul 15 09:32:20 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 09:32:20 m1pro kernel: pstate: 41401009 (nZcv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 09:32:20 m1pro kernel: pc : free_large_kmalloc+0xdc/0x110
Jul 15 09:32:20 m1pro kernel: lr : kfree+0x180/0x1d0
Jul 15 09:32:20 m1pro kernel: sp : ffff800095b35c00
Jul 15 09:32:20 m1pro kernel: x29: ffff800095b35c00 x28: ffff00005a93e280 x27: ffff000120c746c0
Jul 15 09:32:20 m1pro kernel: x26: ffffffa0002aca00 x25: 0000000002995000 x24: ffffffa600910000
Jul 15 09:32:20 m1pro kernel: x23: ffff00005b10ea08 x22: ffffffa00000002c x21: 0000000000000001
Jul 15 09:32:20 m1pro kernel: x20: ffff000100000500 x19: ffffff7fc1000000 x18: 000000000000002b
Jul 15 09:32:20 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
Jul 15 09:32:20 m1pro kernel: x14: 0000000000000000 x13: 9393939300000000 x12: 0000000000000000
Jul 15 09:32:20 m1pro kernel: x11: 0000000000000000 x10: 0000000000000268 x9 : 0000000000000000
Jul 15 09:32:20 m1pro kernel: x8 : 0000000000000000 x7 : 0000000000000268 x6 : ffff00006533b200
Jul 15 09:32:20 m1pro kernel: x5 : ffff800095b361e0 x4 : ffff00005add2400 x3 : ffff8000992e6700
Jul 15 09:32:20 m1pro kernel: x2 : 0000000000000001 x1 : ffff000100000500 x0 : 0000000000180028
Jul 15 09:32:20 m1pro kernel: Call trace:
Jul 15 09:32:20 m1pro kernel:  free_large_kmalloc+0xdc/0x110
Jul 15 09:32:20 m1pro kernel:  kfree+0x180/0x1d0
Jul 15 09:32:20 m1pro kernel:  _RINvMs8_NtCsirMamryJlsQ_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw8fragment19RunFragmentG13V13_5INtNtB8_5alloc12GenericAllocBP_NtB1y_14HeapAllocationEE17new_init_preallocINtNtNtCsc1LFWrxnNA7_6kernel4init10__>
Jul 15 09:32:20 m1pro kernel:  _RNvMs1_NtNtCsirMamryJlsQ_5asahi5queue6renderNtB7_18QueueInnerG13V13_513submit_render+0x16e4/0x1dd0 [asahi]
Jul 15 09:32:20 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0xf74/0x1578 [asahi]
Jul 15 09:32:20 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 09:32:20 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 15 09:32:20 m1pro kernel:  drm_ioctl+0x23c/0x4e4
Jul 15 09:32:20 m1pro kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 15 09:32:20 m1pro kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 15 09:32:20 m1pro kernel:  do_el0_svc+0x40/0xf0
Jul 15 09:32:20 m1pro kernel:  el0_svc+0x34/0x11c
Jul 15 09:32:20 m1pro kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 15 09:32:20 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 15 09:32:20 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 15 09:32:20 m1pro kernel: object pointer: 0x00000000837d9730
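
A side note on reading the register dump above: the x14 and x15 values in the drm_sched_can_queue warning look like text rather than pointers. The Python sketch below decodes them as little-endian ASCII. This is an observation on the pasted dump, not something stated in the thread, and the likely explanation (leftovers from formatting the warning message) is an assumption.

```python
def reg_as_ascii(value_hex):
    """Interpret a 64-bit register value as 8 little-endian bytes of text."""
    raw = int(value_hex, 16).to_bytes(8, "little")
    # Replace non-printable bytes with "." so the output stays readable.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# x14 and x15 as printed in the register dump.
print(repr(reg_as_ascii("6465656378652074")))  # 't exceed'
print(repr(reg_as_ascii("6572632065687420")))  # ' the cre'
```

Both fragments appear in the message "Jobs may not exceed the credit limit, truncate.", so these two registers are probably not a sign of corruption by themselves.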

but then a few minutes later:

Jul 15 09:38:48 m1pro kernel: ------------[ cut here ]------------
Jul 15 09:38:48 m1pro kernel: asahi 406400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 15 09:38:48 m1pro kernel: WARNING: CPU: 1 PID: 3268 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0x110/0x168
Jul 15 09:38:48 m1pro kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq bnep brcmfmac_wcc joydev hid_magicmouse hci_bcm4377 bluetooth brcmfmac brcmutil cfg80211 ecdh_generic ecc uvcvideo usbhid videobuf2_vmalloc uvc appledrm rfkill ofpart snd_soc_cs42l84 spi_nor snd_soc_tas2764 apple_sio asahi snd_soc_apple_mca virt_dma apple_admac pwm_apple macsmc_reboot macsmc_power macsmc_hwmon macsmc_hid apple_isp videobuf2_dma_sg hid_apple videobuf2_memops videobuf2_v4l2 videodev apple_dcp videobuf2_common mux_apple_display_crossbar drm_dma_helper clk_apple_nco apple_soc_cpufreq mux_core cdc_mbim cdc_wdm snd_usb_audio snd_hwdep snd_usbmidi_lib>
Jul 15 09:38:48 m1pro kernel:  tps6598x spi_hid_apple nvmem_spmi_mfd rtc_macsmc gpio_macsmc simple_mfd_spmi regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_apple_efuses macsmc_rtkit macsmc pinctrl_apple_gpio spmi_apple_controller mfd_core phy_apple_atc typec apple_dart btrfs xor xor_neon raid6_pq
Jul 15 09:38:48 m1pro kernel: CPU: 1 PID: 3268 Comm: chromium Tainted: G S      W          6.9.9-asahi #1-NixOS
Jul 15 09:38:48 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 09:38:48 m1pro kernel: pstate: 61401009 (nZCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 09:38:48 m1pro kernel: pc : drm_sched_can_queue+0x110/0x168
Jul 15 09:38:48 m1pro kernel: lr : drm_sched_can_queue+0x110/0x168
Jul 15 09:38:48 m1pro kernel: sp : ffff800095b37440
Jul 15 09:38:48 m1pro kernel: x29: ffff800095b37440 x28: 0000000000000030 x27: ffff00001332a000
Jul 15 09:38:48 m1pro kernel: x26: ffff80007a849948 x25: 0000000000000000 x24: ffff0000b5124b00
Jul 15 09:38:48 m1pro kernel: x23: ffff800095b37888 x22: ffff0000647efe38 x21: ffff00005bf9add8
Jul 15 09:38:48 m1pro kernel: x20: ffff00005b10e808 x19: ffff00005b10e808 x18: 0000000000000000
Jul 15 09:38:48 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 6572632065687420
Jul 15 09:38:48 m1pro kernel: x14: 6465656378652074 x13: 0000000000000000 x12: 0000000000000000
Jul 15 09:38:48 m1pro kernel: x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
Jul 15 09:38:48 m1pro kernel: x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
Jul 15 09:38:48 m1pro kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
Jul 15 09:38:48 m1pro kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
Jul 15 09:38:48 m1pro kernel: Call trace:
Jul 15 09:38:48 m1pro kernel:  drm_sched_can_queue+0x110/0x168
Jul 15 09:38:48 m1pro kernel:  drm_sched_wakeup+0x18/0x7c
Jul 15 09:38:48 m1pro kernel:  drm_sched_entity_push_job+0x174/0x1e8
Jul 15 09:38:48 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0x12d8/0x1578 [asahi]
Jul 15 09:38:48 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 09:38:48 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 15 09:38:48 m1pro kernel:  drm_ioctl+0x23c/0x4e4
Jul 15 09:38:48 m1pro kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 15 09:38:48 m1pro kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 15 09:38:48 m1pro kernel:  do_el0_svc+0x40/0xf0
Jul 15 09:38:48 m1pro kernel:  el0_svc+0x34/0x11c
Jul 15 09:38:48 m1pro kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 15 09:38:48 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 15 09:38:48 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 15 09:38:48 m1pro kernel: Unable to handle kernel paging request at virtual address ffff000000000700
Jul 15 09:38:48 m1pro kernel: Mem abort info:
Jul 15 09:38:48 m1pro kernel:   ESR = 0x0000000096000007
Jul 15 09:38:48 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 15 09:38:48 m1pro kernel:   SET = 0, FnV = 0
Jul 15 09:38:48 m1pro kernel:   EA = 0, S1PTW = 0
Jul 15 09:38:48 m1pro kernel:   FSC = 0x07: level 3 translation fault
Jul 15 09:38:48 m1pro kernel: Data abort info:
Jul 15 09:38:48 m1pro kernel:   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
Jul 15 09:38:48 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 15 09:38:48 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 15 09:38:48 m1pro kernel: swapper pgtable: 16k pages, 48-bit VAs, pgdp=00000107c5b30000
Jul 15 09:38:48 m1pro kernel: [ffff000000000700] pgd=18000107cf028003, p4d=18000107cf028003, pud=18000107cf024003, pmd=18000107cf020003, pte=0000000000000000
Jul 15 09:38:48 m1pro kernel: Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
Jul 15 09:38:48 m1pro kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq bnep brcmfmac_wcc joydev hid_magicmouse hci_bcm4377 bluetooth brcmfmac brcmutil cfg80211 ecdh_generic ecc uvcvideo usbhid videobuf2_vmalloc uvc appledrm rfkill ofpart snd_soc_cs42l84 spi_nor snd_soc_tas2764 apple_sio asahi snd_soc_apple_mca virt_dma apple_admac pwm_apple macsmc_reboot macsmc_power macsmc_hwmon macsmc_hid apple_isp videobuf2_dma_sg hid_apple videobuf2_memops videobuf2_v4l2 videodev apple_dcp videobuf2_common mux_apple_display_crossbar drm_dma_helper clk_apple_nco apple_soc_cpufreq mux_core cdc_mbim cdc_wdm snd_usb_audio snd_hwdep snd_usbmidi_lib>
Jul 15 09:38:49 m1pro kernel:  tps6598x spi_hid_apple nvmem_spmi_mfd rtc_macsmc gpio_macsmc simple_mfd_spmi regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_apple_efuses macsmc_rtkit macsmc pinctrl_apple_gpio spmi_apple_controller mfd_core phy_apple_atc typec apple_dart btrfs xor xor_neon raid6_pq
Jul 15 09:38:49 m1pro gmrun[6128]: curl: (7) Failed to connect to 192.168.86.21 port 80 after 1 ms: Couldn't connect to server
Jul 15 09:38:49 m1pro kernel: CPU: 5 PID: 3268 Comm: chromium Tainted: G S      W          6.9.9-asahi #1-NixOS
Jul 15 09:38:49 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 09:38:49 m1pro kernel: pstate: a1401009 (NzCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 09:38:49 m1pro kernel: pc : __kmalloc_node_track_caller+0xec/0x2bc
Jul 15 09:38:49 m1pro kernel: lr : __kmalloc_node_track_caller+0x98/0x2bc
Jul 15 09:38:49 m1pro kernel: sp : ffff800095b35b30
Jul 15 09:38:49 m1pro kernel: x29: ffff800095b35b40 x28: ffff000053d9cc80 x27: ffff000120519980
Jul 15 09:38:49 m1pro kernel: x26: ffffffa0002b2500 x25: 00000000ffffffa6 x24: 0000000000000000
Jul 15 09:38:49 m1pro kernel: x23: ffff000000000500 x22: 00000000ffffffff x21: 0000000000000cc0
Jul 15 09:38:49 m1pro kernel: x20: ffff000001f2cb00 x19: 0000000000000358 x18: 000000000000002b
Jul 15 09:38:49 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
Jul 15 09:38:49 m1pro kernel: x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
Jul 15 09:38:49 m1pro kernel: x11: ffffffa0002ac9a8 x10: 0000000000000008 x9 : ffffffffffffffff
Jul 15 09:38:49 m1pro kernel: x8 : 00ce80007a7499c4 x7 : 0000000000000cc0 x6 : 0000000000000358
Jul 15 09:38:49 m1pro kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000002d68881
Jul 15 09:38:49 m1pro kernel: x2 : 0000000000000200 x1 : ffff000000000500 x0 : ffff000001f2cb00
Jul 15 09:38:49 m1pro kernel: Call trace:
Jul 15 09:38:49 m1pro kernel:  __kmalloc_node_track_caller+0xec/0x2bc
Jul 15 09:38:49 m1pro kernel:  krealloc+0x9c/0x144
Jul 15 09:38:49 m1pro kernel:  _RINvNtCsKOPqOvr6FN_5alloc7raw_vec11finish_growNtNtB4_5alloc6GlobalECsirMamryJlsQ_5asahi+0x44/0xac [asahi]
Jul 15 09:38:49 m1pro kernel:  _RNvMs0_NtCsKOPqOvr6FN_5alloc3vecINtB5_3VechE21try_extend_from_sliceCsirMamryJlsQ_5asahi+0xc8/0x13c [asahi]
Jul 15 09:38:49 m1pro kernel:  _RINvMs8_NtCsirMamryJlsQ_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw8fragment19RunFragmentG13V13_5INtNtB8_5alloc12GenericAllocBP_NtB1y_14HeapAllocationEE17new_init_preallocINtNtNtCsc1LFWrxnNA7_6kernel4init10___internal11InitClosureNCNCNvMs1_NtNtB8_5queue6renderNtB3Q_18QueueInnerG13V13_513submit_renders1_0s_0BP_NtNtB2O_5error5ErrorEIB2I_NCNCB3I_s2_0s_0NtNtBR_3raw19RunFragmentG13V13_5B4X_EB4X_B4X_NCB3I_s1_0NCB3I_s2_0EB8_+0x79c/0x1f80 [asahi]
Jul 15 09:38:49 m1pro kernel:  _RNvMs1_NtNtCsirMamryJlsQ_5asahi5queue6renderNtB7_18QueueInnerG13V13_513submit_render+0x16e4/0x1dd0 [asahi]
Jul 15 09:38:49 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0xf74/0x1578 [asahi]
Jul 15 09:38:49 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 09:38:49 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 15 09:38:49 m1pro kernel:  drm_ioctl+0x23c/0x4e4
Jul 15 09:38:49 m1pro kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 15 09:38:49 m1pro kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 15 09:38:49 m1pro kernel:  do_el0_svc+0x40/0xf0
Jul 15 09:38:49 m1pro kernel:  el0_svc+0x34/0x11c
Jul 15 09:38:49 m1pro kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 15 09:38:49 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 15 09:38:49 m1pro kernel: Code: 54000c20 b9402a82 aa1703e1 aa1403e0 (f8626af9) 
Jul 15 09:38:49 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 15 09:38:49 m1pro gmrun[3788]: conky: reading exec value failed (perhaps it's not the correct format?)
Jul 15 09:38:49 m1pro kernel: Unable to handle kernel paging request at virtual address ffff000000000700
Jul 15 09:38:49 m1pro kernel: Mem abort info:
Jul 15 09:38:49 m1pro kernel:   ESR = 0x0000000096000007
Jul 15 09:38:49 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 15 09:38:49 m1pro kernel:   SET = 0, FnV = 0
Jul 15 09:38:49 m1pro kernel:   EA = 0, S1PTW = 0
Jul 15 09:38:49 m1pro kernel:   FSC = 0x07: level 3 translation fault
Jul 15 09:38:49 m1pro kernel: Data abort info:
Jul 15 09:38:49 m1pro kernel:   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
Jul 15 09:38:49 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 15 09:38:49 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 15 09:38:49 m1pro kernel: swapper pgtable: 16k pages, 48-bit VAs, pgdp=00000107c5b30000
Jul 15 09:38:49 m1pro kernel: [ffff000000000700] pgd=18000107cf028003, p4d=18000107cf028003, pud=18000107cf024003, pmd=18000107cf020003, pte=0000000000000000
Jul 15 09:38:49 m1pro kernel: Internal error: Oops: 0000000096000007 [#2] PREEMPT SMP
Jul 15 09:38:49 m1pro kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq bnep brcmfmac_wcc joydev hid_magicmouse hci_bcm4377 bluetooth brcmfmac brcmutil cfg80211 ecdh_generic ecc uvcvideo usbhid videobuf2_vmalloc uvc appledrm rfkill ofpart snd_soc_cs42l84 spi_nor snd_soc_tas2764 apple_sio asahi snd_soc_apple_mca virt_dma apple_admac pwm_apple macsmc_reboot macsmc_power macsmc_hwmon macsmc_hid apple_isp videobuf2_dma_sg hid_apple videobuf2_memops videobuf2_v4l2 videodev apple_dcp videobuf2_common mux_apple_display_crossbar drm_dma_helper clk_apple_nco apple_soc_cpufreq mux_core cdc_mbim cdc_wdm snd_usb_audio snd_hwdep snd_usbmidi_lib>
Jul 15 09:38:49 m1pro kernel:  tps6598x spi_hid_apple nvmem_spmi_mfd rtc_macsmc gpio_macsmc simple_mfd_spmi regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_apple_efuses macsmc_rtkit macsmc pinctrl_apple_gpio spmi_apple_controller mfd_core phy_apple_atc typec apple_dart btrfs xor xor_neon raid6_pq
Jul 15 09:38:49 m1pro kernel: CPU: 1 PID: 2880 Comm: Xwayland Tainted: G S    D W          6.9.9-asahi #1-NixOS
Jul 15 09:38:49 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 09:38:49 m1pro kernel: pstate: a1401009 (NzCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 09:38:49 m1pro kernel: pc : __kmalloc_node_track_caller+0xec/0x2bc
Jul 15 09:38:49 m1pro kernel: lr : __kmalloc_node_track_caller+0x98/0x2bc
Jul 15 09:38:49 m1pro kernel: sp : ffff800093a87400
Jul 15 09:38:49 m1pro kernel: x29: ffff800093a87410 x28: ffff000081e06800 x27: ffff00001332a000
Jul 15 09:38:49 m1pro kernel: x26: 0000000000000001 x25: 0000000000048bc6 x24: 0000000000000000
Jul 15 09:38:49 m1pro kernel: x23: ffff000000000500 x22: 00000000ffffffff x21: 0000000000000dc0
Jul 15 09:38:49 m1pro kernel: x20: ffff000001f2cb00 x19: 0000000000000278 x18: 0000000000000000
Jul 15 09:38:49 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffe8c8d008
Jul 15 09:38:49 m1pro kernel: x14: 0000000000000000 x13: 0000000400000004 x12: ffff000081e06e00
Jul 15 09:38:49 m1pro kernel: x11: 0000000000000008 x10: fffffffffffffff8 x9 : 0000000000000000
Jul 15 09:38:49 m1pro kernel: x8 : 468980007a74fc60 x7 : 0000000000000dc0 x6 : 0000000000000278
Jul 15 09:38:49 m1pro kernel: x5 : ffff800093a877c8 x4 : 0000000000000000 x3 : 0000000002d68881
Jul 15 09:38:49 m1pro kernel: x2 : 0000000000000200 x1 : ffff000000000500 x0 : ffff000001f2cb00
Jul 15 09:38:49 m1pro kernel: Call trace:
Jul 15 09:38:49 m1pro kernel:  __kmalloc_node_track_caller+0xec/0x2bc
Jul 15 09:38:49 m1pro kernel:  krealloc+0x9c/0x144
Jul 15 09:38:49 m1pro kernel:  _RNvMsb_NtNtCsc1LFWrxnNA7_6kernel3drm5schedINtB5_6EntityNtNtCsirMamryJlsQ_5asahi5queue16QueueJobG13V13_5E7new_jobBV_+0x2c/0xc0 [asahi]
Jul 15 09:38:49 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0x8c8/0x1578 [asahi]
Jul 15 09:38:49 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 09:38:49 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c

Then I have one from 6.9.7:

Jul 08 09:38:09 m1pro kernel: asahi 406400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 08 09:38:09 m1pro kernel: WARNING: CPU: 1 PID: 3046 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0xec/0x144
Jul 08 09:38:09 m1pro kernel: Modules linked in: vhost_net vhost vhost_iotlb uinput xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr snd_seq_dummy snd_hrtimer snd_seq rfcomm bnep uvcvideo videobuf2_vmalloc uvc brcmfmac_wcc snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_seq_device usbhid joydev hci_bcm4377 hid_magicmouse bluetooth brcmfmac brcmutil cfg80211 ecdh_generic ecc appledrm snd_soc_macaudio ofpart snd_soc_cs42l84 spi_>
Jul 08 09:38:09 m1pro kernel:  spi_hid_apple_of simple_mfd_spmi tps6598x spi_hid_apple regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_apple_efuses macsmc_rtkit macsmc mfd_core pinctrl_apple_gpio spmi_apple_controller phy_apple_atc typec apple_dart btrfs xor xor_neon raid6_pq
Jul 08 09:38:09 m1pro kernel: CPU: 1 PID: 3046 Comm: chromium Tainted: G S                 6.9.7-asahi #1-NixOS
Jul 08 09:38:09 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 08 09:38:09 m1pro kernel: pstate: 61401009 (nZCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 08 09:38:09 m1pro kernel: pc : drm_sched_can_queue+0xec/0x144
Jul 08 09:38:09 m1pro kernel: lr : drm_sched_can_queue+0xec/0x144
Jul 08 09:38:09 m1pro kernel: sp : ffff800089e17440
Jul 08 09:38:09 m1pro kernel: x29: ffff800089e17440 x28: ffff800089e17888 x27: ffff000016c10000
Jul 08 09:38:09 m1pro kernel: x26: ffff80007a46d948 x25: 0000000000000000 x24: ffff000073c1a280
Jul 08 09:38:09 m1pro kernel: x23: ffff00007f00fc00 x22: ffff000073c3f638 x21: ffff00007f00fdd8
Jul 08 09:38:09 m1pro kernel: x20: ffff000072486e08 x19: ffff000072486e08 x18: fffffffffffd8e28
Jul 08 09:38:09 m1pro kernel: x17: 636e757274202c74 x16: 696d696c20746964 x15: 6572632065687420
Jul 08 09:38:09 m1pro kernel: x14: 6465656378652074 x13: ffff8000814cd310 x12: 0000000000000cff
Jul 08 09:38:09 m1pro kernel: x11: 0000000000000455 x10: ffff80008157d310 x9 : ffff8000814cd310
Jul 08 09:38:09 m1pro kernel: x8 : 00000000ffffdfff x7 : ffff80008157d310 x6 : 80000000ffffe000
Jul 08 09:38:09 m1pro kernel: x5 : 0000000000000456 x4 : 0000000000000002 x3 : ffff8000812b0008
Jul 08 09:38:09 m1pro kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00001641b600
Jul 08 09:38:09 m1pro kernel: Call trace:
Jul 08 09:38:09 m1pro kernel:  drm_sched_can_queue+0xec/0x144
Jul 08 09:38:09 m1pro kernel:  drm_sched_wakeup+0x18/0x54
Jul 08 09:38:09 m1pro kernel:  drm_sched_entity_push_job+0x15c/0x1a8
Jul 08 09:38:09 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0x12c4/0x157c [asahi]
Jul 08 09:38:09 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 08 09:38:09 m1pro kernel:  drm_ioctl_kernel+0xbc/0x128
Jul 08 09:38:09 m1pro kernel:  drm_ioctl+0x20c/0x4b4
Jul 08 09:38:09 m1pro kernel:  __arm64_sys_ioctl+0xac/0xf4
Jul 08 09:38:09 m1pro kernel:  invoke_syscall.constprop.0+0x50/0xec
Jul 08 09:38:09 m1pro kernel:  do_el0_svc+0x40/0xc8
Jul 08 09:38:09 m1pro kernel:  el0_svc+0x34/0xfc
Jul 08 09:38:09 m1pro kernel:  el0t_64_sync_handler+0x120/0x12c
Jul 08 09:38:09 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 08 09:38:09 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 08 09:38:09 m1pro zoom.desktop[3046]: [3046:3127:0708/093809.912244:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:09 m1pro kernel: Unable to handle kernel paging request at virtual address ffff000000000700
Jul 08 09:38:09 m1pro kernel: Unable to handle kernel paging request at virtual address ffff000000000700
Jul 08 09:38:09 m1pro kernel: Mem abort info:
Jul 08 09:38:09 m1pro kernel:   ESR = 0x0000000096000007
Jul 08 09:38:09 m1pro kernel: Mem abort info:
Jul 08 09:38:09 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 08 09:38:09 m1pro kernel:   ESR = 0x0000000096000007
Jul 08 09:38:09 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 08 09:38:10 m1pro kernel:   SET = 0, FnV = 0
Jul 08 09:38:10 m1pro kernel:   SET = 0, FnV = 0
Jul 08 09:38:10 m1pro kernel:   EA = 0, S1PTW = 0
Jul 08 09:38:10 m1pro kernel:   EA = 0, S1PTW = 0
Jul 08 09:38:10 m1pro kernel:   FSC = 0x07: level 3 translation fault
Jul 08 09:38:10 m1pro kernel: Data abort info:
Jul 08 09:38:10 m1pro kernel:   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
Jul 08 09:38:10 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 08 09:38:10 m1pro kernel:   FSC = 0x07: level 3 translation fault
Jul 08 09:38:10 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 08 09:38:10 m1pro kernel: Data abort info:
Jul 08 09:38:10 m1pro kernel: swapper pgtable: 16k pages, 48-bit VAs, pgdp=00000107c5fa0000
Jul 08 09:38:10 m1pro kernel:   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
Jul 08 09:38:10 m1pro kernel: [ffff000000000700] pgd=18000107cf028003
Jul 08 09:38:10 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 08 09:38:10 m1pro kernel: , p4d=18000107cf028003, pud=18000107cf024003, pmd=18000107cf020003, pte=0000000000000000
Jul 08 09:38:10 m1pro kernel: Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
Jul 08 09:38:10 m1pro kernel: Modules linked in: vhost_net vhost vhost_iotlb uinput xt_conntrack nft_chain_nat xt_MASQUERADE
Jul 08 09:38:10 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 08 09:38:10 m1pro kernel:  nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr snd_seq_dummy snd_hrtimer snd_seq rfcomm bnep uvcvideo videobuf2_vmalloc uvc brcmfmac_wcc snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_seq_device usbhid joydev
Jul 08 09:38:10 m1pro kernel: swapper pgtable: 16k pages, 48-bit VAs, pgdp=00000107c5fa0000
Jul 08 09:38:10 m1pro kernel:  hci_bcm4377 hid_magicmouse bluetooth brcmfmac brcmutil cfg80211 ecdh_generic ecc appledrm snd_soc_macaudio ofpart snd_soc_cs42l84 spi_nor rfkill snd_soc_tas2764 apple_sio asahi apple_admac snd_soc_apple_mca pwm_apple virt_dma
Jul 08 09:38:10 m1pro kernel: [ffff000000000700] pgd=18000107cf028003
Jul 08 09:38:10 m1pro kernel:  macsmc_reboot macsmc_hid macsmc_power apple_isp videobuf2_dma_sg videobuf2_memops hid_apple videobuf2_v4l2 videodev videobuf2_common mc clk_apple_nco apple_dcp apple_soc_cpufreq drm_dma_helper mux_apple_display_crossbar leds_pwm mux_core loop xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter veth tun tap
Jul 08 09:38:10 m1pro kernel: , p4d=18000107cf028003
Jul 08 09:38:10 m1pro kernel:  macvlan bridge stp llc fuse nfnetlink ip_tables xhci_plat_hcd xhci_hcd sdhci_pci cqhci sdhci mmc_core nvmem_spmi_mfd rtc_macsmc gpio_macsmc spi_hid_apple_of simple_mfd_spmi tps6598x spi_hid_apple regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_apple_efuses macsmc_rtkit macsmc mfd_core
Jul 08 09:38:10 m1pro kernel: , pud=18000107cf024003
Jul 08 09:38:10 m1pro kernel:  pinctrl_apple_gpio spmi_apple_controller phy_apple_atc typec apple_dart btrfs xor xor_neon raid6_pq
Jul 08 09:38:10 m1pro kernel: CPU: 2 PID: 3046 Comm: chromium Tainted: G S      W          6.9.7-asahi #1-NixOS
Jul 08 09:38:10 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 08 09:38:10 m1pro kernel: pstate: a1401009 (NzCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 08 09:38:10 m1pro kernel: pc : __kmalloc_node_track_caller+0xec/0x2a4
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093809.964931:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093809.977288:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.009285:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro kernel: , pmd=18000107cf020003
Jul 08 09:38:10 m1pro kernel: lr : __kmalloc_node_track_caller+0x98/0x2a4
Jul 08 09:38:10 m1pro kernel: sp : ffff800089e15b30
Jul 08 09:38:10 m1pro kernel: x29: ffff800089e15b40 x28: ffff000229055a80 x27: ffff00011ea08480
Jul 08 09:38:10 m1pro kernel: x26: ffffffa000084000 x25: 00000000ffffffa6 x24: 0000000000000000
Jul 08 09:38:10 m1pro kernel: x23: ffff000000000500 x22: 00000000ffffffff x21: 0000000000000cc0
Jul 08 09:38:10 m1pro kernel: x20: ffff000001f2cb00 x19: 0000000000000358 x18: 0000000000000008
Jul 08 09:38:10 m1pro kernel: x17: 0000000000000000
Jul 08 09:38:10 m1pro kernel: , pte=0000000000000000
Jul 08 09:38:10 m1pro kernel:  x16: 0000000000000000 x15: 0000000000000000
Jul 08 09:38:10 m1pro kernel: x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
Jul 08 09:38:10 m1pro kernel: x11: ffffffa00007e4a8 x10: 0000000000000008 x9 : ffffffffffffffff
Jul 08 09:38:10 m1pro kernel: x8 : 648d80007a36d99c x7 : 0000000000000cc0 x6 : 0000000000000358
Jul 08 09:38:10 m1pro kernel: x5 : ffff0000b3f4b1ac x4 : 0000000000000000 x3 : 0000000000ce7781
Jul 08 09:38:10 m1pro kernel: x2 : 0000000000000200 x1 : ffff000000000500 x0 : ffff000001f2cb00
Jul 08 09:38:10 m1pro kernel: Call trace:
Jul 08 09:38:10 m1pro kernel:  __kmalloc_node_track_caller+0xec/0x2a4
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.041288:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro kernel: 
Jul 08 09:38:10 m1pro kernel:  krealloc+0x7c/0xe4
Jul 08 09:38:10 m1pro kernel:  _RINvNtCsKOPqOvr6FN_5alloc7raw_vec11finish_growNtNtB4_5alloc6GlobalECsirMamryJlsQ_5asahi+0x44/0xac [asahi]
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.077402:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.109345:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.141292:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.177507:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.209386:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.241462:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.277387:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.309381:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.341297:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.377374:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.409408:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.441404:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro kernel:  _RNvMs0_NtCsKOPqOvr6FN_5alloc3vecINtB5_3VechE21try_extend_from_sliceCsirMamryJlsQ_5asahi+0xc8/0x13c [asahi]
Jul 08 09:38:10 m1pro kernel:  _RINvMs8_NtCsirMamryJlsQ_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw8fragment19RunFragmentG13V13_5INtNtB8_5alloc12GenericAllocBP_NtB1y_14HeapAllocationEE17new_init_preallocINtNtNtCsc1LFWrxnNA7_6kernel4init10___internal11InitClosureNCNCNvMs1_NtNtB8_5queue6renderNtB3Q_13QueueG13V13_513submit_renders1_0s_0BP_NtNtB2O_5error5ErrorEIB2I_NCNCB3I_s2_0s_0NtNtBR_3raw19RunFragmentG13V13_5B4S_EB4S_B4S_NCB3I_s1_0NCB3I_s2_0EB8_+0x79c/0x1f80 [asahi]
Jul 08 09:38:10 m1pro kernel:  _RNvMs1_NtNtCsirMamryJlsQ_5asahi5queue6renderNtB7_13QueueG13V13_513submit_render+0x16e4/0x1dd0 [asahi]
Jul 08 09:38:10 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0xf9c/0x157c [asahi]
Jul 08 09:38:10 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 08 09:38:10 m1pro kernel:  drm_ioctl_kernel+0xbc/0x128
Jul 08 09:38:10 m1pro kernel:  drm_ioctl+0x20c/0x4b4
Jul 08 09:38:10 m1pro kernel:  __arm64_sys_ioctl+0xac/0xf4
Jul 08 09:38:10 m1pro kernel:  invoke_syscall.constprop.0+0x50/0xec
Jul 08 09:38:10 m1pro kernel:  do_el0_svc+0x40/0xc8
Jul 08 09:38:10 m1pro kernel:  el0_svc+0x34/0xfc
Jul 08 09:38:10 m1pro kernel:  el0t_64_sync_handler+0x120/0x12c
Jul 08 09:38:10 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 08 09:38:10 m1pro kernel: Code: 54000b60 b9402a82 aa1703e1 aa1403e0 (f8626af9) 
Jul 08 09:38:10 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 08 09:38:10 m1pro kernel: Internal error: Oops: 0000000096000007 [#2] PREEMPT SMP

Chromium logged this warning 3260 times: Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE. It does not appear to be related, though: I get it multiple times per second while using the Zoom web app, even on 6.9.5, and the freeze also happens without the warning appearing at all.

It looks like it is not only chromium; here is a crash in Xwayland on 6.9.7:

Jul 06 17:57:04 m1pro kernel: asahi 406400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 06 17:57:04 m1pro kernel: WARNING: CPU: 1 PID: 2821 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0xec/0x144
Jul 06 17:57:04 m1pro kernel: Modules linked in: uas usb_storage xhci_plat_hcd xhci_hcd vhost_net vhost vhost_iotlb xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device bnep brcmfmac_wcc joydev hci_bcm4377 bluetooth hid_magicmouse brcmfmac brcmutil cfg80211 ecdh_generic ecc rfkill apple_isp asahi snd_soc_macaudio appledrm ofpar>
Jul 06 17:57:04 m1pro kernel:  udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_apple_efuses macsmc_rtkit macsmc pinctrl_apple_gpio spmi_apple_controller mfd_core phy_apple_atc typec apple_dart btrfs xor xor_neon raid6_pq
Jul 06 17:57:04 m1pro kernel: CPU: 1 PID: 2821 Comm: Xwayland Tainted: G S                 6.9.7-asahi #1-NixOS
Jul 06 17:57:04 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 06 17:57:04 m1pro kernel: pstate: 61401009 (nZCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 06 17:57:04 m1pro kernel: pc : drm_sched_can_queue+0xec/0x144
Jul 06 17:57:04 m1pro kernel: lr : drm_sched_can_queue+0xec/0x144
Jul 06 17:57:04 m1pro kernel: sp : ffff800093f17440
Jul 06 17:57:04 m1pro kernel: x29: ffff800093f17440 x28: ffff800093f17888 x27: ffff00001495d000
Jul 06 17:57:04 m1pro kernel: x26: 0000000000000000 x25: 0000000000000000 x24: ffff00005fd539c0
Jul 06 17:57:04 m1pro kernel: x23: ffff00059c506400 x22: ffff00004c279238 x21: ffff00059c5065d8
Jul 06 17:57:04 m1pro kernel: x20: ffff00007f658008 x19: ffff00007f658008 x18: fffffffffffd50e0
Jul 06 17:57:04 m1pro kernel: x17: 636e757274202c74 x16: 696d696c20746964 x15: 6572632065687420
Jul 06 17:57:04 m1pro kernel: x14: 6465656378652074 x13: ffff8000814cd310 x12: 0000000000000b8e
Jul 06 17:57:04 m1pro kernel: x11: 00000000000003da x10: ffff80008157d310 x9 : ffff8000814cd310
Jul 06 17:57:04 m1pro kernel: x8 : 00000000ffffdfff x7 : ffff80008157d310 x6 : 80000000ffffe000
Jul 06 17:57:04 m1pro kernel: x5 : 00000000000003db x4 : 0000000000000002 x3 : ffff8000812b0008
Jul 06 17:57:04 m1pro kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066486d80
Jul 06 17:57:04 m1pro kernel: Call trace:
Jul 06 17:57:04 m1pro kernel:  drm_sched_can_queue+0xec/0x144
Jul 06 17:57:04 m1pro kernel:  drm_sched_wakeup+0x18/0x54
Jul 06 17:57:04 m1pro kernel:  drm_sched_entity_push_job+0x15c/0x1a8
Jul 06 17:57:04 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0x12c4/0x157c [asahi]
Jul 06 17:57:04 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 06 17:57:04 m1pro kernel:  drm_ioctl_kernel+0xbc/0x128
Jul 06 17:57:04 m1pro kernel:  drm_ioctl+0x20c/0x4b4
Jul 06 17:57:04 m1pro kernel:  __arm64_sys_ioctl+0xac/0xf4
Jul 06 17:57:04 m1pro kernel:  invoke_syscall.constprop.0+0x50/0xec
Jul 06 17:57:04 m1pro kernel:  do_el0_svc+0x40/0xc8
Jul 06 17:57:04 m1pro kernel:  el0_svc+0x34/0xfc
Jul 06 17:57:04 m1pro kernel:  el0t_64_sync_handler+0x120/0x12c
Jul 06 17:57:04 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 06 17:57:04 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 06 17:57:04 m1pro kernel: ------------[ cut here ]------------
Jul 06 17:57:04 m1pro kernel: WARNING: CPU: 1 PID: 2821 at mm/slub.c:4358 free_large_kmalloc+0xac/0xe0
Jul 06 17:57:04 m1pro kernel: Modules linked in: uas usb_storage xhci_plat_hcd xhci_hcd vhost_net vhost vhost_iotlb xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device bnep brcmfmac_wcc joydev hci_bcm4377 bluetooth hid_magicmouse brcmfmac brcmutil cfg80211 ecdh_generic ecc rfkill apple_isp asahi snd_soc_macaudio appledrm ofpar>
Jul 06 17:57:04 m1pro kernel:  udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_apple_efuses macsmc_rtkit macsmc pinctrl_apple_gpio spmi_apple_controller mfd_core phy_apple_atc typec apple_dart btrfs xor xor_neon raid6_pq
Jul 06 17:57:04 m1pro kernel: CPU: 1 PID: 2821 Comm: Xwayland Tainted: G S      W          6.9.7-asahi #1-NixOS
Jul 06 17:57:04 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 06 17:57:04 m1pro kernel: pstate: 41401009 (nZcv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 06 17:57:04 m1pro kernel: pc : free_large_kmalloc+0xac/0xe0
Jul 06 17:57:04 m1pro kernel: lr : kfree+0x160/0x1b0
Jul 06 17:57:04 m1pro kernel: sp : ffff800093f15c00
Jul 06 17:57:04 m1pro kernel: x29: ffff800093f15c00 x28: ffff0005fc360e00 x27: ffff00025a28fc00
Jul 06 17:57:04 m1pro kernel: x26: ffffffa0000c4380 x25: 00000000007c5300 x24: ffffffa600ae7000
Jul 06 17:57:04 m1pro kernel: x23: ffff00001672aa08 x22: ffffffa00000001c x21: 0000000000000001
Jul 06 17:57:04 m1pro kernel: x20: ffff000500000500 x19: ffffff7fc5000000 x18: 000000000000000c
Jul 06 17:57:04 m1pro kernel: x17: 0000000000000000 x16: 0000000000000080 x15: 0000000000000000
Jul 06 17:57:04 m1pro kernel: x14: 0000000000000000 x13: 9393939300000000 x12: 0000000000000000
Jul 06 17:57:04 m1pro kernel: x11: 0000000000000000 x10: 0000000000000268 x9 : 0000000000000000
Jul 06 17:57:04 m1pro kernel: x8 : 0000000000000000 x7 : 0000000000000268 x6 : ffff0005a6d3db00
Jul 06 17:57:04 m1pro kernel: x5 : ffff800093f161e0 x4 : ffff000066486d80 x3 : ffff8000994ac080
Jul 06 17:57:04 m1pro kernel: x2 : 0000000000000001 x1 : ffff000500000500 x0 : 0000000000008128
Jul 06 17:57:04 m1pro kernel: Call trace:
Jul 06 17:57:04 m1pro kernel:  free_large_kmalloc+0xac/0xe0
Jul 06 17:57:04 m1pro kernel:  kfree+0x160/0x1b0
Jul 06 17:57:04 m1pro kernel:  _RINvMs8_NtCsirMamryJlsQ_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw8fragment19RunFragmentG13V13_5INtNtB8_5alloc12GenericAllocBP_NtB1y_14HeapAllocationEE17new_init_preallocINtNtNtCsc1LFWrxnNA7_6kernel4init10___internal11InitClosureNCNCNvMs1_NtNtB8_5queue6renderNtB3Q_13QueueG13V13_513submit_renders1_0s_0BP_NtNtB2O_5error5ErrorEIB2I_NCNCB3I_s2_0s_0NtNtBR_3raw19RunFragmentG13V13_5B4S_EB4S_B4S_NCB3I_s1_0>
Jul 06 17:57:04 m1pro kernel:  _RNvMs1_NtNtCsirMamryJlsQ_5asahi5queue6renderNtB7_13QueueG13V13_513submit_render+0x16e4/0x1dd0 [asahi]
Jul 06 17:57:04 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0xf9c/0x157c [asahi]
Jul 06 17:57:04 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 06 17:57:04 m1pro kernel:  drm_ioctl_kernel+0xbc/0x128
Jul 06 17:57:04 m1pro kernel:  drm_ioctl+0x20c/0x4b4
Jul 06 17:57:04 m1pro kernel:  __arm64_sys_ioctl+0xac/0xf4
Jul 06 17:57:04 m1pro kernel:  invoke_syscall.constprop.0+0x50/0xec
Jul 06 17:57:04 m1pro kernel:  do_el0_svc+0x40/0xc8
Jul 06 17:57:04 m1pro kernel:  el0_svc+0x34/0xfc
Jul 06 17:57:04 m1pro kernel:  el0t_64_sync_handler+0x120/0x12c
Jul 06 17:57:04 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 06 17:57:04 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 06 17:57:04 m1pro kernel: object pointer: 0x00000000f1f7ed20
Jul 06 17:57:04 m1pro kernel: Unable to handle kernel paging request at virtual address 000109050208110a
Jul 06 17:57:04 m1pro kernel: Mem abort info:
Jul 06 17:57:04 m1pro kernel:   ESR = 0x0000000096000004
Jul 06 17:57:04 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 06 17:57:04 m1pro kernel:   SET = 0, FnV = 0
Jul 06 17:57:04 m1pro kernel:   EA = 0, S1PTW = 0
Jul 06 17:57:04 m1pro kernel:   FSC = 0x04: level 0 translation fault
Jul 06 17:57:04 m1pro kernel: Data abort info:
Jul 06 17:57:04 m1pro kernel:   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
Jul 06 17:57:04 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 06 17:57:04 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 06 17:57:04 m1pro kernel: [000109050208110a] address between user and kernel address ranges
Jul 06 17:57:04 m1pro kernel: Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
Jul 06 17:57:04 m1pro kernel: Modules linked in: uas usb_storage xhci_plat_hcd xhci_hcd vhost_net vhost vhost_iotlb xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device bnep brcmfmac_wcc joydev hci_bcm4377 bluetooth hid_magicmouse brcmfmac brcmutil cfg80211 ecdh_generic ecc rfkill apple_isp asahi snd_soc_macaudio appledrm ofpar>
Jul 06 17:57:04 m1pro kernel:  udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_apple_efuses macsmc_rtkit macsmc pinctrl_apple_gpio spmi_apple_controller mfd_core phy_apple_atc typec apple_dart btrfs xor xor_neon raid6_pq
Jul 06 17:57:04 m1pro kernel: CPU: 2 PID: 2821 Comm: Xwayland Tainted: G S      W          6.9.7-asahi #1-NixOS
Jul 06 17:57:04 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 06 17:57:04 m1pro kernel: pstate: a1401009 (NzCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 06 17:57:04 m1pro kernel: pc : __kmalloc_node_track_caller+0xec/0x2a4
Jul 06 17:57:04 m1pro kernel: lr : __kmalloc_node_track_caller+0x98/0x2a4
Jul 06 17:57:04 m1pro kernel: sp : ffff800093f15d40
Jul 06 17:57:04 m1pro kernel: x29: ffff800093f15d50 x28: 00000000ffffffa0 x27: ffff0005fc360480
Jul 06 17:57:04 m1pro kernel: x26: ffffffa00000c984 x25: 00000000000174f6 x24: 0000000000000000
Jul 06 17:57:04 m1pro kernel: x23: f801090502080f0a x22: 00000000ffffffff x21: 0000000000000cc0
Jul 06 17:57:04 m1pro kernel: x20: ffff000001f2cb00 x19: 0000000000000318 x18: 00000000000000ff
Jul 06 17:57:04 m1pro kernel: x17: 0000000000000000 x16: 00000000000c0000 x15: 0000000000000000
Jul 06 17:57:04 m1pro kernel: x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
Jul 06 17:57:04 m1pro kernel: x11: 00000000ffffffa0 x10: 0000000000000008 x9 : ffffffffffffffff
Jul 06 17:57:04 m1pro kernel: x8 : 20fa80007a43199c x7 : 0000000000000cc0 x6 : 0000000000000318

Running a video conference on Zoom triggers the freeze the fastest for me: it takes only a few minutes for the system to freeze.

--

Regarding the build: Jul 15 09:01:14 m1pro kernel: Linux version 6.9.9-asahi (nixbld@localhost) (gcc (GCC) 13.3.0, GNU ld (GNU Binutils) 2.42) #1-NixOS SMP PREEMPT_DYNAMIC Tue Jan 1 00:00:00 UTC 1980
My kernel command line is: initrd=\EFI\nixos\x0f7ip2w2nzvaz8ywlshqalzbr7ys0ww-initrd-linux-6.9.9-asahi-initrd.efi init=/nix/store/p10vrn8z7q2vsssff5ysg00za6wl3vaf-nixos-system-m1pro-24.11.20240709.feb2849/init earlycon console=ttySAC0,115200n8 console=tty0 boot.shell_on_fail nvme_apple.flush_interval=0 mitigations=off loglevel=4.

You could probably just follow the installation instructions here to get the exact same kernel build, chromium, wayland + gnome (well, at least that's what nix promises you): https://github.com/tpwrules/nixos-apple-silicon/blob/main/docs/uefi-standalone.md

cjdell · 2024-07-21T23:13:03Z

Can also confirm stability with 50+ hours uptime. Love your hard work on this project. No plans on going back to macOS. 🙂

mkurz · 2024-07-22T14:21:21Z

Also testing asahi-6.9.9-7 on ALARM and so far looks good. Thanks!

mkurz · 2024-07-23T07:36:42Z

Actually, the title should be changed from "gpu related crashes with kernel >= 6.9.6" to "gpu related crashes with kernel >= 6.9.7" IMHO.

asahilina · 2024-07-23T08:27:06Z

The bug actually affects all of 6.9.x and probably a few earlier versions too; it's just a coincidence that it apparently only manifested starting with 6.9.7.


oliverbestmann · 2024-07-23T12:11:25Z

I've renamed it anyways, as it was a pretty consistent coincidence.

robclark · 2024-09-09T17:45:25Z

I suspect this might be a regression introduced when the drm_scheduler was converted to workqueues recently (instead of kthreads).

looks like the regression was introduced in:

commit a78422e9dff366b3a46ae44caf6ec8ded9c9fc2f
Author:     Danilo Krummrich <[email protected]>
AuthorDate: Fri Nov 10 01:16:33 2023 +0100
Commit:     Danilo Krummrich <[email protected]>
CommitDate: Fri Nov 10 02:54:29 2023 +0100

    drm/sched: implement dynamic job-flow control
    
    Currently, job flow control is implemented simply by limiting the number
    of jobs in flight. Therefore, a scheduler is initialized with a credit
    limit that corresponds to the number of jobs which can be sent to the
    hardware.
    
    This implies that for each job, drivers need to account for the maximum
    job size possible in order to not overflow the ring buffer.
    
    However, there are drivers, such as Nouveau, where the job size has a
    rather large range. For such drivers it can easily happen that job
    submissions not even filling the ring by 1% can block subsequent
    submissions, which, in the worst case, can lead to the ring run dry.
    
    In order to overcome this issue, allow for tracking the actual job size
    instead of the number of jobs. Therefore, add a field to track a job's
    credit count, which represents the number of credits a job contributes
    to the scheduler's credit limit.
    
    Signed-off-by: Danilo Krummrich <[email protected]>
    Reviewed-by: Luben Tuikov <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

I don't see any upstream users of ops->update_job_credits(), so someone should probably just send a revert of that patch
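To make the credit accounting from the quoted commit message concrete, here is a minimal userspace sketch. It is only a model under stated assumptions: the struct and function names (`credit_sched`, `credit_job`, `job_fits`, `submit_job`, `complete_job`) are hypothetical and not the kernel's, and the sketch ignores locking and the job queue entirely. It shows a scheduler with a fixed credit limit, jobs that each consume a number of credits, and the truncation that the "Jobs may not exceed the credit limit, truncate." warning in the logs above refers to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of credit-based flow control.
 * The struct and function names are illustrative, not the kernel's. */
struct credit_sched {
    uint32_t credit_limit;  /* capacity of the ring, in credits */
    uint32_t credit_count;  /* credits held by jobs currently in flight */
};

struct credit_job {
    uint32_t credits;       /* how much of the ring this job occupies */
};

static uint32_t available_credits(const struct credit_sched *s)
{
    assert(s->credit_count <= s->credit_limit);
    return s->credit_limit - s->credit_count;
}

/* A job larger than the whole ring is truncated to the limit so that it
 * can still make progress; this is the situation that the "Jobs may not
 * exceed the credit limit, truncate." warning reports. */
static bool job_fits(const struct credit_sched *s, struct credit_job *job)
{
    if (job->credits > s->credit_limit)
        job->credits = s->credit_limit;
    return available_credits(s) >= job->credits;
}

/* Account for the job's credits if it fits; otherwise it has to wait. */
static bool submit_job(struct credit_sched *s, struct credit_job *job)
{
    if (!job_fits(s, job))
        return false;
    s->credit_count += job->credits;
    return true;
}

/* Return the job's credits once the hardware is done with it. */
static void complete_job(struct credit_sched *s, const struct credit_job *job)
{
    assert(s->credit_count >= job->credits);
    s->credit_count -= job->credits;
}
```

With a limit of 8 credits, a 3-credit job is accepted, a following 6-credit job has to wait until the first one completes, and a 100-credit job has its credit count truncated to 8.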

asahilina · 2024-09-09T18:42:23Z

@robclark nouveau is using variable credits, just not update_job_credits(). The credits logic still accesses the job in the race path, so removing that callback is not enough.

robclark · 2024-09-09T20:11:20Z

> @robclark nouveau is using variable credits, just not update_job_credits(). The credits logic still accesses the job in the race path, so removing that callback is not enough.

Hmm, if nouveau is using that, it makes a revert more complicated. But that patch is fatally flawed: the whole point of a single-producer-single-consumer queue is that you have just a single producer and a single consumer, and that patch violates this rule.

asahilina · 2024-09-09T20:31:07Z

I suspect the correct fix is to remove the drm_sched_can_queue() condition entirely from drm_sched_wakeup() (so the work function is always woken up/queued and then simply no-ops if there is nothing to do, doing the check in the right context only), but I've already run into enough sharp edges in this code that I'm not going to be proposing that myself.

Edit: In fact this was already proposed here but for some reason Luben never implemented the proposed simplified drm_sched_wakeup() and only did a partial revert.

robclark · 2024-09-09T20:48:16Z

> I suspect the correct fix is to remove the drm_sched_can_queue() condition entirely from drm_sched_wakeup() (so the work function is always woken up/queued and then simply no-ops if there is nothing to do, doing the check in the right context only), but I've already run into enough sharp edges in this code that I'm not going to be proposing that myself.
>
> Edit: In fact this was already proposed here but for some reason Luben never implemented the proposed simplified drm_sched_wakeup() and only did a partial revert.

I've only looked briefly at the credit patches, but the call in drm_sched_wakeup() looks like it is only to try and avoid a wakeup. So yeah, removing that would be the thing to do.
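The difference between the two wakeup shapes discussed here can be sketched in a small single-threaded model. This is not the kernel code: every name below (`model_sched`, `model_job`, `wakeup_with_check`, `wakeup_always`, `run_worker`) is hypothetical, the queue is a plain linked list rather than the lockless SPSC queue, and a single-threaded program cannot reproduce the race itself. It only illustrates why dropping the check from the producer side is safe: the worker repeats the same check in consumer context and simply does nothing when no job fits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model; names are illustrative only. */
struct model_job {
    unsigned credits;
    struct model_job *next;
};

struct model_sched {
    unsigned credit_limit;
    unsigned credit_count;
    struct model_job *head, *tail;  /* stand-in for the SPSC job queue */
    bool work_pending;              /* stand-in for queueing the work item */
};

/* Producer side: append a job to the queue. */
static void push_job(struct model_sched *s, struct model_job *j)
{
    j->next = NULL;
    if (s->tail)
        s->tail->next = j;
    else
        s->head = j;
    s->tail = j;
}

/* Peeks at the queue head and reads the job. Only the consumer may do this,
 * because the consumer is the one that pops (and eventually frees) jobs. */
static bool can_queue(struct model_sched *s)
{
    struct model_job *j = s->head;

    if (!j)
        return false;
    if (j->credits > s->credit_limit)
        j->credits = s->credit_limit;
    return s->credit_limit - s->credit_count >= j->credits;
}

/* Shape of the wakeup before the fix: the producer dereferences the head job. */
static void wakeup_with_check(struct model_sched *s)
{
    if (can_queue(s))
        s->work_pending = true;
}

/* Shape of the wakeup after the fix: the producer only schedules the worker. */
static void wakeup_always(struct model_sched *s)
{
    s->work_pending = true;
}

/* Consumer: the only place that inspects and pops jobs. Returns the number
 * of jobs started; 0 means the wakeup was a harmless no-op. */
static unsigned run_worker(struct model_sched *s)
{
    unsigned started = 0;

    s->work_pending = false;
    while (can_queue(s)) {
        struct model_job *j = s->head;

        assert(j != NULL);  /* can_queue() returned true, so a head exists */
        s->head = j->next;
        if (!s->head)
            s->tail = NULL;
        s->credit_count += j->credits;
        started++;
    }
    return started;
}
```

In this model the only cost of `wakeup_always()` is an occasional worker run that starts zero jobs, which is the optimization being given up in exchange for never touching a job from the producer side.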

Fixes a race condition reported here: AsahiLinux#309 (comment)

The whole premise of lockless access to a single-producer-single-consumer queue is that there is just a single producer and single consumer. That means we can't call drm_sched_can_queue() (which is about queueing more work to the hw, not to the spsc queue) from anywhere other than the consumer (wq).

This call in the producer is just an optimization to avoid scheduling the consuming worker if it cannot yet queue more work to the hw. It is safe to drop this optimization to avoid the race condition.

Suggested-by: Asahi Lina <[email protected]>
Fixes: a78422e ("drm/sched: implement dynamic job-flow control")
Signed-off-by: Rob Clark <[email protected]>

commit 440d52b370b03b366fd26ace36bab20552116145 upstream.

Fixes a race condition reported here: AsahiLinux/linux#309 (comment)

The whole premise of lockless access to a single-producer-single-consumer queue is that there is just a single producer and single consumer. That means we can't call drm_sched_can_queue() (which is about queueing more work to the hw, not to the spsc queue) from anywhere other than the consumer (wq).

This call in the producer is just an optimization to avoid scheduling the consuming worker if it cannot yet queue more work to the hw. It is safe to drop this optimization to avoid the race condition.

Suggested-by: Asahi Lina <[email protected]>
Fixes: a78422e ("drm/sched: implement dynamic job-flow control")
Closes: AsahiLinux/linux#309
Cc: [email protected]
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Danilo Krummrich <[email protected]>
Tested-by: Janne Grunau <[email protected]>
Signed-off-by: Danilo Krummrich <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

oliverbestmann changed the title ~~gpu related crashes with kernel > 6.9.6~~ gpu related crashes with kernel >= 6.9.6 Jul 17, 2024

oliverbestmann mentioned this issue Jul 17, 2024

crashes with kernel >= 6.9.7 tpwrules/nixos-apple-silicon#218

Closed

mkurz mentioned this issue Jul 23, 2024

bump kernel and m1n1 AsahiLinux/PKGBUILDs#41

Open

oliverbestmann changed the title ~~gpu related crashes with kernel >= 6.9.6~~ gpu related crashes with kernel >= 6.9.7 Jul 23, 2024
