Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARC: Move header reallocation under db_mtx lock #15330

Closed
wants to merge 1 commit into from

Conversation

amotin
Copy link
Member

@amotin amotin commented Sep 29, 2023

As part of #14340 I've removed b_evict_lock, that played no role in ARC eviction process. But appears it played a secondary role of protecting b_hdr pointers in arc_buf_t during header reallocation by arc_hdr_realloc_crypt(), that, as found in #15293, may cause use after free races if some encrypted block is read while being synced after been writen.

After closer look on b_evict_lock I still do not believe it covered all the possible races, so I am not eager to resurrect it. Instead this refactors arc_hdr_realloc_crypt() into arc_realloc_crypt() and moves its calls from arc_write_ready() into upper levels ready callbacks, like dbuf_write_ready(), where it is protected by existing db_mtx, protecting also all the arc buffer accesses.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

As part of openzfs#14340 I've removed b_evict_lock, that played no role in
ARC eviction process.  But appears it played a secondary role of
protecting b_hdr pointers in arc_buf_t during header reallocation
by arc_hdr_realloc_crypt(), that, as found in openzfs#15293, may cause use
after free races if some encrypted block is read while being synced
after been writen.

After closer look on b_evict_lock I still do not believe it covered
all the possible races, so I am not eager to resurrect it.  Instead
this refactors arc_hdr_realloc_crypt() into arc_realloc_crypt()
and moves its calls from arc_write_ready() into upper levels ready
callbacks, like dbuf_write_ready(), where it is protected by
existing db_mtx, protecting also all the arc buffer accesses.

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
@rincebrain
Copy link
Contributor

rincebrain commented Sep 29, 2023

Assuming you're expecting this to help with #11679, I got this after a few runs of zfs_receive_raw on my horrible testbed with this branch:

[ 6120.704164] VERIFY3(hdr->b_l1hdr.b_pabd != NULL) failed (0000000000000000 != 0000000000000000)
[ 6120.817594] PANIC at arc.c:1992:arc_buf_untransform_in_place()
[ 6120.894362] Showing stack for process 14637
[ 6120.894386] CPU: 0 PID: 14637 Comm: zfs Tainted: P           OE     5.10.0-8-sparc64 #1 Debian 5.10.46-4
[ 6120.894396] Call Trace:
[ 6120.894492] [<000000001033141c>] spl_dumpstack+0x1c/0x2c [spl]
[ 6120.894525] [<00000000103314b0>] spl_panic+0x84/0xa4 [spl]
[ 6120.896016] [<00000000108aa8cc>] arc_buf_fill+0x168c/0x1a20 [zfs]
[ 6120.896659] [<00000000108aac84>] arc_untransform+0x24/0xa0 [zfs]
[ 6120.897298] [<00000000108c90b8>] dbuf_read_verify_dnode_crypt+0xf8/0x1a0 [zfs]
[ 6120.897938] [<00000000108d0bc0>] dbuf_read_impl.constprop.0+0x2a0/0xcc0 [zfs]
[ 6120.898630] [<00000000108d1854>] dbuf_read+0x274/0x640 [zfs]
[ 6120.899267] [<00000000108e1f30>] dmu_buf_hold+0x50/0x80 [zfs]
[ 6120.899894] [<00000000109f3d60>] zap_lockdir+0x20/0x100 [zfs]
[ 6120.900520] [<00000000109f5470>] zap_lookup+0x30/0x80 [zfs]
[ 6120.901153] [<0000000010973d3c>] sa_setup+0x17c/0x660 [zfs]
[ 6120.901774] [<0000000010a53ef4>] zfsvfs_init.part.0+0x374/0x560 [zfs]
[ 6120.902437] [<0000000010a56bac>] zfsvfs_create_impl+0x1ac/0x360 [zfs]
[ 6120.903059] [<0000000010a56dd8>] zfsvfs_create+0x78/0xa0 [zfs]
[ 6120.903681] [<0000000010a56e88>] zfs_domount+0x88/0x700 [zfs]
[ 6120.904299] [<0000000010a725dc>] zpl_mount+0x1fc/0x320 [zfs]

If I misunderstood your expectations for this patch, my apologies for the noise.

e: And on my x86_64 system, it dies with:

[628184.620592] BUG: kernel NULL pointer dereference, address: 0000000000000000
[628184.622154] #PF: supervisor read access in kernel mode
[628184.623682] #PF: error_code(0x0000) - not-present page
[628184.625181] PGD 0 P4D 0
[628184.626643] Oops: 0000 [#1] SMP NOPTI
[628184.628076] CPU: 14 PID: 1400289 Comm: dp_sync_taskq Kdump: loaded Tainted: P           OE     5.10.0-25-amd64 #1 Debian 5.10.191-1
[628184.630940] Hardware name: Micro-Star International Co., Ltd. MS-7D50/MEG X570S ACE MAX (MS-7D50), BIOS 1.40 05/24/2022
[628184.632443] RIP: 0010:arc_realloc_crypt+0x1c/0x670 [zfs]
[628184.633865] Code: ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 89 f6 41 55 49 89 fd 41 54 53 48 83 ec 08 <4c> 8b 27 41 8b 44 24 28 a9 00 00 04 00 0f 84 93 03 00 00 a9 00 00
[628184.636710] RSP: 0018:ffffa7d260087b50 EFLAGS: 00010286
[628184.638098] RAX: 802d008f820001ff RBX: 0000000000000000 RCX: 000000000000002d
[628184.639474] RDX: ffff95d180fc9840 RSI: 0000000000000000 RDI: 0000000000000000
[628184.640817] RBP: ffffa7d260087b80 R08: 0000000000000000 R09: ffff95d180fc9840
[628184.642150] R10: ffff95d326368ba0 R11: 0000000000000000 R12: ffff95d2cafc4200
[628184.643486] R13: 0000000000000000 R14: 0000000000000000 R15: ffff95d2def7ac50
[628184.644825] FS:  0000000000000000(0000) GS:ffff95e05eb80000(0000) knlGS:0000000000000000
[628184.646175] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[628184.647517] CR2: 0000000000000000 CR3: 00000007db00a000 CR4: 0000000000750ee0
[628184.648851] PKRU: 55555554
[628184.650149] Call Trace:
[628184.651416]  ? __die_body.cold+0x1a/0x1f
[628184.652656]  ? no_context+0x1a6/0x3c0
[628184.653868]  ? exc_page_fault+0x78/0x160
[628184.655072]  ? asm_exc_page_fault+0x1e/0x30
[628184.656297]  ? arc_realloc_crypt+0x1c/0x670 [zfs]
[628184.657508]  dbuf_write_ready+0x15b/0x870 [zfs]
[628184.658732]  zio_ready+0x93/0x7a0 [zfs]
[628184.659930]  zio_nowait+0x112/0x340 [zfs]
[628184.661092]  dbuf_sync_list+0xc5/0x160 [zfs]
[628184.662253]  dnode_sync+0x741/0x1c80 [zfs]
[628184.663353]  ? __switch_to+0x114/0x440
[628184.664470]  ? multilist_sublist_insert_head+0x3d/0x90 [zfs]
[628184.665593]  sync_dnodes_task+0x99/0x180 [zfs]
[628184.666661]  taskq_thread+0x2ed/0x600 [spl]
[628184.667699]  ? wake_up_q+0xa0/0xa0
[628184.668730]  ? taskq_thread_spawn+0x60/0x60 [spl]
[628184.669743]  kthread+0x118/0x140
[628184.670731]  ? __kthread_bind_mask+0x60/0x60
[628184.671724]  ret_from_fork+0x1f/0x30
[628184.672696] Modules linked in: ext4 crc16 mbcache jbd2 dm_mod loop zfs(POE) spl(OE) zenpower(OE) cpufreq_userspace cpufreq_powersave cpufreq_ondemand cpufreq_conservative nls_ascii nls_cp437 vfat fat edac_mce_amd snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg soundwire_intel kvm_amd soundwire_generic_allocation kvm snd_soc_core snd_usb_audio snd_compress soundwire_cadence irqbypass snd_hda_codec snd_usbmidi_lib snd_rawmidi snd_seq_device pl2303 rapl snd_hda_core mc snd_hwdep joydev evdev ftdi_sio soundwire_bus snd_pcm usbserial wmi_bmof snd_timer pcspkr efi_pstore sp5100_tco mxm_wmi watchdog snd ccp soundcore tpm_crb tpm_tis tpm_tis_core tpm rng_core button acpi_cpufreq binfmt_misc msr nfsd drm fuse auth_rpcgss nfs_acl lockd configfs grace sunrpc efivarfs ip_tables x_tables autofs4 xfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod hid_logitech_hidpp hid_logitech_dj hid_generic usbhid
[628184.672744]  hid crc32_pclmul ahci crc32c_intel libahci xhci_pci ghash_clmulni_intel aesni_intel xhci_hcd libaes ixgbe crypto_simd libata nvme r8169 cryptd xfrm_algo usbcore glue_helper nvme_core dca realtek scsi_mod mdio_devres i2c_piix4 ptp libphy t10_pi pps_core crc_t10dif usb_common mdio crct10dif_generic crct10dif_pclmul crct10dif_common wmi [last unloaded: spl]
[628184.684821] CR2: 0000000000000000

@amotin amotin marked this pull request as draft September 30, 2023 01:09
@amotin amotin closed this Oct 3, 2023
@amotin amotin deleted the realloc_crypt branch October 3, 2023 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants