GPF for non-canonical address in dmu_zfetch_fini #16895

Closed
@cfallin

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Fedora |
| Distribution Version | 40 |
| Kernel Version | 6.12.4 (6.12.4-100.fc40.x86_64) |
| Architecture | x86-64 |
| OpenZFS Version | 2.2.7 |

Describe the problem you're observing

After a few hours to a few days of light operation (a file server on a home network), I see a kernel oops, shown in the logs below. Subsequently, the following symptoms persist until reboot:

  • Load average is pinned at 48 (on a 24-core system);
  • sync hangs forever;
  • some memory seems to be leaked or lost permanently and the memory stats become nonsensical: the system starts swapping heavily, and htop reports memory usage of "-4219161K/62.7G" (!);
  • the system sometimes becomes completely unresponsive and I need to power-cycle.

Describe how to reproduce the problem

I can't seem to find a reliable reproducer, but this crash does happen consistently (I'm power-cycling every few days). The machine's workload is a combination of some development work over ssh; Samba serving a network Time Machine volume that a macOS machine continuously backs up to; some zvols for a few VMs; and very occasional accesses to ~5TiB of data that is mostly at rest. There is one ZFS pool on a mirror of two large spinning disks, and another pool on NVMe for the home directory and zvols.

The system has generally been very stable for the 4.5 years I've had it. I migrated its volumes to ZFS 6 months ago and all was well until recently -- I suspect either a Fedora kernel upgrade or ZFS upgrade, but I can't correlate exactly. I'm running latest or close-to-latest versions of both (6.12.4 and 2.2.7 respectively) now.

Sorry I don't have more to go on here -- happy to try settings or collect other info as needed. Thanks!

Include any warning/errors/backtraces from the system logs

The ultimate "Oops" is:

kernel oops log
[50806.427187] Oops: general protection fault, probably for non-canonical address 0xbfff93fcd9822a28: 0000 [#1] PREEMPT SMP NOPTI
[50806.427202] CPU: 6 UID: 0 PID: 687 Comm: dbu_evict Tainted: P S      W  OE      6.12.4-100.fc40.x86_64 #1
[50806.427213] Tainted: [P]=PROPRIETARY_MODULE, [S]=CPU_OUT_OF_SPEC, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[50806.427218] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS MAX (MS-7B79), BIOS H.40 11/06/2019
[50806.427223] RIP: 0010:__list_del_entry_valid_or_report+0x43/0x80
[50806.427233] Code: ce 15 8b 00 48 b8 00 01 00 00 00 00 ad de 48 39 c2 0f 84 aa 15 8b 00 48 b8 22 01 00 00 00 00 ad de 48 39 c1 0f 84 83 15 8b 00 <48> 8b 31 48 39 fe 0f 85 63 15 8b 00 48 8b 42 08 48 39 c6 0f 85 42
[50806.427240] RSP: 0018:ffffa06241c4fd30 EFLAGS: 00010287
[50806.427248] RAX: dead000000000122 RBX: ffff93fcd98229e8 RCX: bfff93fcd9822a28
[50806.427254] RDX: ffff93fcd9822a28 RSI: fffffffffffffe88 RDI: ffff93fcd9822a28
[50806.427259] RBP: ffff93fcd9822a28 R08: 000000003fffffff R09: ffffffffe39024a0
[50806.427264] R10: 00000000002b001e R11: ffff93ff9e9217c0 R12: ffff93fcd9822a08
[50806.427269] R13: ffff93f0d11ea9c0 R14: ffff93f0cc084448 R15: ffff93f0cc084428
[50806.427274] FS:  0000000000000000(0000) GS:ffff93ff9e900000(0000) knlGS:0000000000000000
[50806.427280] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[50806.427286] CR2: 00007fa072037698 CR3: 000000013e822000 CR4: 0000000000350ef0
[50806.427291] Call Trace:
[50806.427297]  <TASK>
[50806.427302]  ? __die_body.cold+0x19/0x27
[50806.427312]  ? die_addr+0x3c/0x60
[50806.427321]  ? exc_general_protection+0x17d/0x400
[50806.427336]  ? asm_exc_general_protection+0x26/0x30
[50806.427351]  ? __list_del_entry_valid_or_report+0x43/0x80
[50806.427360]  dmu_zfetch_fini+0x75/0xf0 [zfs]
[50806.427654]  dnode_destroy+0x183/0x250 [zfs]
[50806.427910]  dnode_buf_evict_async+0x7d/0xf0 [zfs]
[50806.428159]  taskq_thread+0x2c7/0x500 [spl]
[50806.428182]  ? __pfx_default_wake_function+0x10/0x10
[50806.428194]  ? __pfx_dnode_buf_evict_async+0x10/0x10 [zfs]
[50806.428446]  ? __pfx_taskq_thread+0x10/0x10 [spl]
[50806.428462]  kthread+0xd2/0x100
[50806.428469]  ? __pfx_kthread+0x10/0x10
[50806.428475]  ret_from_fork+0x34/0x50
[50806.428482]  ? __pfx_kthread+0x10/0x10
[50806.428487]  ret_from_fork_asm+0x1a/0x30
[50806.428501]  </TASK>
[50806.428504] Modules linked in: vhost_net vhost vhost_iotlb tap xt_conntrack xt_MASQUERADE xt_mark snd_seq_dummy snd_hrtimer rpcrdma rdma_cm iw_cm ib_cm ib_core tun nf_tables ip6table_nat ip6table_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter rfkill vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) qrtr nct6775 nct6775_core hwmon_vid binfmt_misc vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi amd_atl intel_rapl_msr snd_hda_intel intel_rapl_common snd_intel_dspcfg snd_intel_sdw_acpi edac_mce_amd ses snd_hda_codec enclosure scsi_transport_sas joydev snd_hda_core kvm_amd snd_hwdep ppdev ee1004 snd_seq kvm snd_seq_device snd_pcm snd_timer r8169 wmi_bmof rapl snd pcspkr acpi_cpufreq i2c_piix4 soundcore i2c_smbus realtek zenpower(OE) parport_pc parport gpio_amdpt gpio_generic nfsd auth_rpcgss nfs_acl lockd grace nfs_localio sunrpc loop dm_multipath nfnetlink zram nouveau drm_ttm_helper ttm video gpu_sched crct10dif_pclmul i2c_algo_bit crc32_pclmul
[50806.428679]  crc32c_intel drm_gpuvm polyval_clmulni drm_exec polyval_generic mxm_wmi ghash_clmulni_intel nvme uas drm_display_helper sha512_ssse3 nvme_core usb_storage sha256_ssse3 sha1_ssse3 cec sp5100_tco zfs(POE) nvme_auth wmi spl(OE) scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables br_netfilter bridge stp llc fuse
[50806.428764] ---[ end trace 0000000000000000 ]---
[50806.428769] RIP: 0010:__list_del_entry_valid_or_report+0x43/0x80
[50806.428775] Code: ce 15 8b 00 48 b8 00 01 00 00 00 00 ad de 48 39 c2 0f 84 aa 15 8b 00 48 b8 22 01 00 00 00 00 ad de 48 39 c1 0f 84 83 15 8b 00 <48> 8b 31 48 39 fe 0f 85 63 15 8b 00 48 8b 42 08 48 39 c6 0f 85 42
[50806.428780] RSP: 0018:ffffa06241c4fd30 EFLAGS: 00010287
[50806.428785] RAX: dead000000000122 RBX: ffff93fcd98229e8 RCX: bfff93fcd9822a28
[50806.428790] RDX: ffff93fcd9822a28 RSI: fffffffffffffe88 RDI: ffff93fcd9822a28
[50806.428794] RBP: ffff93fcd9822a28 R08: 000000003fffffff R09: ffffffffe39024a0
[50806.428797] R10: 00000000002b001e R11: ffff93ff9e9217c0 R12: ffff93fcd9822a08
[50806.428801] R13: ffff93f0d11ea9c0 R14: ffff93f0cc084448 R15: ffff93f0cc084428
[50806.428806] FS:  0000000000000000(0000) GS:ffff93ff9e900000(0000) knlGS:0000000000000000
[50806.428811] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[50806.428815] CR2: 00007fa072037698 CR3: 000000013e822000 CR4: 0000000000350ef0
[53791.822323] BUG: unable to handle page fault for address: 000000d8203e2062
[53791.822332] #PF: supervisor read access in kernel mode
[53791.822334] #PF: error_code(0x0000) - not-present page
[53791.822337] PGD 129025067 P4D 129025067 PUD 0 
[53791.822342] Oops: Oops: 0000 [#2] PREEMPT SMP NOPTI
[53791.822347] CPU: 16 UID: 0 PID: 684 Comm: arc_prune Tainted: P S    D W  OE      6.12.4-100.fc40.x86_64 #1
[53791.822352] Tainted: [P]=PROPRIETARY_MODULE, [S]=CPU_OUT_OF_SPEC, [D]=DIE, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[53791.822354] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS MAX (MS-7B79), BIOS H.40 11/06/2019
[53791.822356] RIP: 0010:arc_released+0x15/0x30 [zfs]
[53791.822505] Code: 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 31 c0 48 83 7f 10 00 74 11 48 8b 07 <48> 81 78 60 c0 ab 6e c0 0f 94 c0 0f b6 c0 e9 78 90 da cd 0f 1f 84
[53791.822507] RSP: 0018:ffffa06241b7bb10 EFLAGS: 00010206
[53791.822511] RAX: 000000d8203e2002 RBX: ffff93f39f9d0000 RCX: 0000000000000001
[53791.822513] RDX: 0000000000000000 RSI: ffff93f67a7dad60 RDI: ffff93fef8801c40
[53791.822515] RBP: 0000000000000000 R08: 0000000000000030 R09: ffff93f0fee17700
[53791.822518] R10: ffff93f0fc025108 R11: 0000000000000002 R12: 0000000000000000
[53791.822520] R13: ffff93fb64f9a000 R14: ffff93f1b939e618 R15: 00000000001c6d1a
[53791.822522] FS:  0000000000000000(0000) GS:ffff93ff9ee00000(0000) knlGS:0000000000000000
[53791.822524] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[53791.822527] CR2: 000000d8203e2062 CR3: 0000000157e5a000 CR4: 0000000000350ef0
[53791.822529] Call Trace:
[53791.822533]  <TASK>
[53791.822537]  ? __die_body.cold+0x19/0x27
[53791.822543]  ? page_fault_oops+0x15a/0x2f0
[53791.822550]  ? exc_page_fault+0x7e/0x180
[53791.822554]  ? asm_exc_page_fault+0x26/0x30
[53791.822561]  ? arc_released+0x15/0x30 [zfs]
[53791.822678]  dbuf_rele_and_unlock+0x79/0x5d0 [zfs]
[53791.822799]  ? srso_return_thunk+0x5/0x5f
[53791.822804]  sa_handle_destroy+0x7e/0xd0 [zfs]
[53791.822940]  zfs_zinactive+0x92/0xf0 [zfs]
[53791.823049]  zfs_inactive+0x93/0x210 [zfs]
[53791.823153]  ? unmap_mapping_range+0x85/0x140
[53791.823159]  zpl_evict_inode+0x45/0x60 [zfs]
[53791.823259]  evict+0x118/0x2a0
[53791.823266]  prune_icache_sb+0x92/0xd0
[53791.823271]  super_cache_scan+0x152/0x1e0
[53791.823276]  zfs_prune+0x177/0x220 [zfs]
[53791.823378]  zpl_prune_sb+0x4e/0x80 [zfs]
[53791.823475]  arc_prune_task+0x22/0x40 [zfs]
[53791.823595]  taskq_thread+0x2c7/0x500 [spl]
[53791.823607]  ? __pfx_default_wake_function+0x10/0x10
[53791.823615]  ? __pfx_taskq_thread+0x10/0x10 [spl]
[53791.823623]  kthread+0xd2/0x100
[53791.823627]  ? __pfx_kthread+0x10/0x10
[53791.823630]  ret_from_fork+0x34/0x50
[53791.823634]  ? __pfx_kthread+0x10/0x10
[53791.823637]  ret_from_fork_asm+0x1a/0x30
[53791.823644]  </TASK>
[53791.823646] Modules linked in: vhost_net vhost vhost_iotlb tap xt_conntrack xt_MASQUERADE xt_mark snd_seq_dummy snd_hrtimer rpcrdma rdma_cm iw_cm ib_cm ib_core tun nf_tables ip6table_nat ip6table_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter rfkill vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) qrtr nct6775 nct6775_core hwmon_vid binfmt_misc vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi amd_atl intel_rapl_msr snd_hda_intel intel_rapl_common snd_intel_dspcfg snd_intel_sdw_acpi edac_mce_amd ses snd_hda_codec enclosure scsi_transport_sas joydev snd_hda_core kvm_amd snd_hwdep ppdev ee1004 snd_seq kvm snd_seq_device snd_pcm snd_timer r8169 wmi_bmof rapl snd pcspkr acpi_cpufreq i2c_piix4 soundcore i2c_smbus realtek zenpower(OE) parport_pc parport gpio_amdpt gpio_generic nfsd auth_rpcgss nfs_acl lockd grace nfs_localio sunrpc loop dm_multipath nfnetlink zram nouveau drm_ttm_helper ttm video gpu_sched crct10dif_pclmul i2c_algo_bit crc32_pclmul
[53791.823744]  crc32c_intel drm_gpuvm polyval_clmulni drm_exec polyval_generic mxm_wmi ghash_clmulni_intel nvme uas drm_display_helper sha512_ssse3 nvme_core usb_storage sha256_ssse3 sha1_ssse3 cec sp5100_tco zfs(POE) nvme_auth wmi spl(OE) scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables br_netfilter bridge stp llc fuse
[53791.823779] CR2: 000000d8203e2062
[53791.823783] ---[ end trace 0000000000000000 ]---
[53791.823785] RIP: 0010:__list_del_entry_valid_or_report+0x43/0x80
[53791.823789] Code: ce 15 8b 00 48 b8 00 01 00 00 00 00 ad de 48 39 c2 0f 84 aa 15 8b 00 48 b8 22 01 00 00 00 00 ad de 48 39 c1 0f 84 83 15 8b 00 <48> 8b 31 48 39 fe 0f 85 63 15 8b 00 48 8b 42 08 48 39 c6 0f 85 42
[53791.823791] RSP: 0018:ffffa06241c4fd30 EFLAGS: 00010287
[53791.823794] RAX: dead000000000122 RBX: ffff93fcd98229e8 RCX: bfff93fcd9822a28
[53791.823796] RDX: ffff93fcd9822a28 RSI: fffffffffffffe88 RDI: ffff93fcd9822a28
[53791.823798] RBP: ffff93fcd9822a28 R08: 000000003fffffff R09: ffffffffe39024a0
[53791.823800] R10: 00000000002b001e R11: ffff93ff9e9217c0 R12: ffff93fcd9822a08
[53791.823802] R13: ffff93f0d11ea9c0 R14: ffff93f0cc084448 R15: ffff93f0cc084428
[53791.823804] FS:  0000000000000000(0000) GS:ffff93ff9ee00000(0000) knlGS:0000000000000000
[53791.823807] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[53791.823809] CR2: 000000d8203e2062 CR3: 0000000157e5a000 CR4: 0000000000350ef0
[53791.823812] note: arc_prune[684] exited with irqs disabled

The full dmesg since boot is in this gist.
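
For anyone reading the register dumps, here is a small sketch that checks the addresses in the log above. The helper names (`is_canonical`, `differing_bits`) are made up for illustration, and the canonical check assumes 48-bit virtual addressing (4-level paging), which I have not confirmed for this machine. It only does arithmetic on values copied from the log; it does not identify a cause.

```python
# Values copied verbatim from the register dumps in the oops log.
RCX_FAULT = 0xBFFF93FCD9822A28   # non-canonical address from the first oops (RCX)
RDX_ENTRY = 0xFFFF93FCD9822A28   # list entry pointer in RDX/RDI/RBP, first oops
RAX_SECOND = 0x000000D8203E2002  # RAX in the second oops (arc_released)
CR2_SECOND = 0x000000D8203E2062  # faulting address (CR2) of the second oops


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if bits 63..(va_bits-1) are all 0 or all 1.

    Assumes va_bits=48 (4-level paging); hypothetical helper, not a kernel API.
    """
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


def differing_bits(a: int, b: int) -> list[int]:
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(64) if (diff >> i) & 1]


# First oops: the faulting pointer vs. the neighbouring, valid-looking pointer.
print(is_canonical(RCX_FAULT))               # False
print(is_canonical(RDX_ENTRY))               # True
print(differing_bits(RCX_FAULT, RDX_ENTRY))  # [62]

# Second oops: the faulting address is RAX plus a small offset.
print(hex(CR2_SECOND - RAX_SECOND))          # 0x60
print(is_canonical(CR2_SECOND))              # True (canonical, but not mapped)
```

Under those assumptions, the address in the first oops differs from the pointer held in RDX/RDI/RBP by exactly one bit (bit 62), which is why it is non-canonical and raises a general protection fault rather than a page fault. The address in the second oops is canonical but unmapped, which matches the "not-present page" report. A single differing bit is compatible with more than one explanation, and the log alone does not distinguish between them.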

Labels: Type: Defect (incorrect behavior, e.g. crash, hang)
