-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zfs-2.3.0-rc3 patchset #16675
Open
behlendorf
wants to merge
34
commits into
openzfs:zfs-2.3-release
Choose a base branch
from
behlendorf:zfs-2.3.0-rc3-staging
base: zfs-2.3-release
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
zfs-2.3.0-rc3 patchset #16675
behlendorf
wants to merge
34
commits into
openzfs:zfs-2.3-release
from
behlendorf:zfs-2.3.0-rc3-staging
+997
−637
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The FreeBSD linux/compiler.h in OpenZFS was copied from a very old version of FreeBSD's linuxkpi's linux/compiler.h. There's no need for this duplication. Use FreeBSD's linuxkpi version instead, and provide zfs_fallthrough to augment it (it's all that's needed). Use #pragma once to avoid naming issues for guard variables. Since this is a complete rewrite, use my copyright here (the original code in FreeBSD still credits everybody). This works back at least to FreeBSD 12.4, which is not out of support, and all newer releases. Remove extra copies of macros that were defined elsewhere, but are now properly defined in LinuxKPI so are redundant. Sponsored-by: Netflix Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Warner Losh <[email protected]> Closes openzfs#16650
While mounting ZFS root during boot on Linux distributions from initrd, mount from busybox is effectively used which executes mount system call directly. This skips the ZFS helper mount.zfs, which checks and enables the mount options as specified in dataset properties. As a result, datasets mounted during boot from initrd do not have correct mount options as specified in ZFS dataset properties. There has been an attempt to use mount.zfs in zfs initrd script, responsible for mounting the ZFS root filesystem (PR#13305). This was later reverted (PR#14908) after discovering that using mount.zfs breaks mounting of snapshots on root (/) and other child datasets of root have the same issue (Issue#9461). This happens because switching from busybox mount to mount.zfs correctly parses the mount options but also adds 'mntpoint=/root' to the mount options, which is then prepended to the snapshot mountpoint in '.zfs/snapshot'. '/root' is the directory on Debian with initramfs-tools where root filesystem is mounted before pivot_root. When Linux runtime is reached, trying to access the snapshots on root results in automounting the snapshot on '/root/.zfs/*', which fails. This commit attempts to fix the automounting of snapshots on root, while using mount.zfs in initrd script. Since the mountpoint of dataset is stored in vfs_mntpoint field, we can check if current mountpoint of dataset and vfs_mntpoint are same or not. If they are not same, reset the vfs_mntpoint field with current mountpoint. This fixes the mountpoints of root dataset and children in respective vfs_mntpoint fields when we try to access the snapshots of root dataset or its children. With correct mountpoint for root dataset and children stored in vfs_mntpoint, all snapshots of root dataset are mounted correctly and become accessible. This fix will come into play only if current process, that is trying to access the snapshots is not in chroot context. The Linux kernel API that is used to convert struct path into char format (d_path), returns the complete path for given struct path. It works in chroot environment as well and returns the correct path from original filesystem root. However d_path fails to return the complete path if any directory from original root filesystem is mounted using --bind flag or --rbind flag in chroot environment. In this case, if we try to access the snapshot from outside the chroot environment, d_path returns the path correctly, i.e. it returns the correct path to the directory that is mounted with --bind flag. However inside the chroot environment, it only returns the path inside chroot. For now, there is not a better way in my understanding that gives the complete path in char format and handles the case where directories from root filesystem are mounted with --bind or --rbind on another path which user will later chroot into. So this fix gets enabled if current process trying to access the snapshot is not in chroot context. With the snapshots issue fixed for root filesystem, using mount.zfs in ZFS initrd script, mounts the datasets with correct mount options. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes openzfs#16646
More useful stuff, especially when trying to follow a disassembly. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16653
My eyes are going blurry looking at all those write calls. This is much nicer. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Rob Norris <[email protected]> Close openzfs#16653
Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16653
This is the sort of code that we get right once and never look at again. Anyone reading this code is already likely in the middle of a debugging nightmare, and then they have a wall of manual string construction and an unfamiliar and idiosyncratic library to deal with. So, comment the whole thing to try to make it clear what's going on. In pursuit of the above, I've added return checks to some of the libunwind calls, fixed the frame loop to not skip the "top" frame (however unseful it may be), and fix a couple of calls to spl_bt_u64_to_hex_str() which requested 18 digits instead of 16. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16653
For some reason it was dropped when split from kernel, that makes raidz_test to accumulate in RAM up to 100GB of logs we don't need. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16492 Closes openzfs#16566 Closes openzfs#16664
Some compiler/versions warn these typedefs according to openzfs#16660. The platform specific header sys/abd_os.h shouldn't define or use abd_t, as it's defined in its non-platform specific consumer sys/abd.h. Do the same as what FreeBSD header does. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes openzfs#16660 Closes openzfs#16665
Just another useful nugget of info in times of strife. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16667
Before 4.20, kernel_siginfo_t was just called siginfo_t. This was causing the kthread_dequeue_signal_3arg_task check, which uses kernel_siginfo_t, to fail on older kernels. In d6b8c17, we started checking for the "new" three-arg dequeue_signal() by testing for the "old" version. Because that test is explicitly using kernel_siginfo_t, it would fail, leading to the build trying to use the new three-arg version, which would then not compile. This commit fixes that by avoiding checking for the old 3-arg dequeue_signal entirely. Instead, we check for the new one, as well as the 4-arg form, and we use the old form as a fallback. This way, we never have to test for it explicitly, and once we're building HAVE_SIGINFO will make sure we get the right kernel_siginfo_t for it, so everything works out nice. Original-patch-by: Finix <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16666
All of our thread entry functions have this signature: void (*)(void*) __attribute__((noreturn)) The low-level `__thread_create()` function accepts a `thread_func_t` as the entry point, which is defined more simply as: void (*)(void *) And then the `thread_create()` and `thread_create_named()` macros cast the passed-in function point down to `thread_func_t`, that is, casting away the `noreturn` attribute. Clang considers casting between these two types to be invalid because both the caller and the callee may have elided parts of the stack frame save and restore, knowing that they won't be needed. Recent Linux appears to be setting `-Wcast-function-type-strict`, which causes this invalid cast to emit a warning, which with `-Werror` is converted to an error, breaking the build. This commit fixes this in the simplest possible way: adding `noreturn` to the `thread_func_t` attribute. Since all our thread entry functions already have this attribute, it's arguably a just a consistency fix anyway. I considered removing the casts in the macros, which silences the warnings, but it turns out that Clang has a bug that won't emit this error for implicit conversions, only explicit casts. So leaving them there seems like a reasonable belt-and-suspenders approach. Also, frankly, this whole mechanism seems a little undercooked inside LLVM, so I'm content go with my intuition about the smallest, least invaisve change. **NOTE**: `__thread_create` is exported by `spl.ko` and has a `thread_func_t` arg, so this is an ABI break. Whether that matters in practice, I have no idea. Further reading: - llvm/llvm-project@1aad641 - llvm/llvm-project#7325 - llvm/llvm-project#41465 Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16672 Closes openzfs#16673
The following tests have been observed to occasionally fail when running under the CI. Updated our exceptions list to track them. Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#16670
behlendorf
force-pushed
the
zfs-2.3.0-rc3-staging
branch
from
October 21, 2024 20:02
62a2009
to
d8a8c2f
Compare
Adding cryptsetup breaks some dialog things on Debian 11. Apply some workaround for it. Signed-off-by: Tino Reichardt <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
On 64bit FreeBSD this reduces one from 296 to 280 bytes. On small block workloads dbufs may consume gigabytes of ARC, and this saves 5% of it. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16684
Add a LUKS sanity test to trigger: openzfs#16631 Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes openzfs#16681
The macros `simd_stat_init()` and `simd_stat_fini()` in FreeBSD's `simd.h` are defined as zero, but they are actually only used as statements. Replace the definitions with `do {} while (0)` instead, to avoid gcc `-Wunused-value` warnings. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Toomas Soome <[email protected]> Signed-off-by: Dimitry Andric <[email protected]> Closes openzfs#16693
In FreeBSD's `zio_do_crypt_data()`, ensure that two `struct uio` variables are cleared before copying data out of them. This avoids accessing garbage data, and fixes gcc `-Wuninitialized` warnings. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Toomas Soome <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Dimitry Andric <[email protected]> Closes openzfs#16688
If on the first open device's logical ashift is bigger than set by pool's ashift property, ignore the last as unusable instead of creating vdev that will fail most of I/Os due to misalignment. Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16690
While reading some code @grwilson came across the above function that seemingly had no consumers besides a ztest callback that ensures that the tx_callback infrastructure works correctly. It turns out that Lustre is the main (and potentially the only) consumer of this. Refer to `osd_trans_commit_cb` of `lustre/osd-zfs/osd_handler.c` in the Lustre repo for more info. Let's add a comment highlighting this before someone removes it by mistake. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Serapheim Dimitropoulos <[email protected]> Closes openzfs#16698
It seems out our notion of "properly" aligned IO was incomplete. In particular, dm-crypt does its own splitting, and assumes that a logical block will never cross an order-0 page boundary (ie, the physical page size, not compound size). This effectively means that it needs to be possible to split a BIO at any page or block size boundary and have it work correctly. This updates the alignment check function to enforce these rules (to the extent possible). Our response to misaligned data is to make some new allocation that is properly aligned, and copy the data into it. It turns out that linearising (via abd_borrow_buf()) is not enough, because we allocate eg 4K blocks from a general purpose slab, and so may receive (or already have) a 4K block that crosses pages. So instead, we allocate a new ABD, which is guaranteed to be aligned properly to block sizes, and then copy everything into it, and back out on the way back. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16687 openzfs#16631 openzfs#15646 openzfs#15533 openzfs#14533
Freeing an ABD can take sleeping locks to update various stats. We aren't allowed to sleep on an interrupt handler. So, move the free off to the io_done callback. We should never have been freeing things in the interrupt handler, but we got away with it because we were usually freeing a linear ABD, which at most is returning two objects to a cache and never sleeping. Scatter ABDs can be used now, and those have more complex locking. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16687
I was surprised to discover today that `zpool online` and `zpool offline` don't print any information about why they failed in many cases, they just return 1 with no information about why. Let's improve that where we can without changing the library function. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Allan Jude <[email protected]> Signed-off-by: Rich Ercolani <[email protected]> Closes openzfs#16244
Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16692
Fedora 41 was released 10/29/24, and Fedora 39 will be EOL on 11/12/24. Update Fedora runners in the test suite. Some minor tweaks also needed to support ksh 1.0.10. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes openzfs#16700
behlendorf
force-pushed
the
zfs-2.3.0-rc3-staging
branch
from
November 1, 2024 17:03
d8a8c2f
to
634593c
Compare
I think we've done enough experiments. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16189 Closes openzfs#16712
Small block workloads may use a very large number of dirty records. During simple block cloning test due to BRT still using 4KB blocks I can easily see up to 2.5M of those used. Before this change dbuf_dirty_record_t structures representing them were allocated via kmem_zalloc(), that rounded their size up to 512 bytes. Introduction of specialized kmem cache allows to reduce the size from 512 to 408 bytes. Additionally, since override and raw params in dirty records are mutually exclusive, puting them into a union allows to reduce structure size down to 368 bytes, increasing the saving to 28%, that can be a 0.5GB or more of RAM. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16694
Not all udev devices have parent devices. Calling udev_device_get_ functions yield an assertion error if called with a NULL pointer. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Sietse <[email protected]> Co-authored-by: Sietse <[email protected]> Closes openzfs#16705 Closes openzfs#16717
behlendorf
force-pushed
the
zfs-2.3.0-rc3-staging
branch
from
November 5, 2024 00:48
634593c
to
dc1e3cf
Compare
This reverts commit b052035. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: tstabrawa <[email protected]> Closes openzfs#16568 Closes openzfs#16723
Avoids using fallback_migrate_folio, which starts unnecessary writeback (leading to BUG in migrate_folio_extra). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: tstabrawa <[email protected]> Closes openzfs#16568 Closes openzfs#16723
Currently, even though send_reader_thread prefetches spill block, do_dump() will not use it and issues its own blocking arc_read. This causes significant performance degradation when sending datasets with lots of spill blocks. For unmodified spill blocks, we also create send_range struct for them in send_reader_thread and issue prefetches for them. We piggyback them on the dnode send_range instead of enqueueing them so we don't break send_range_after check. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Co-authored-by: david.chen <[email protected]> Closes openzfs#16701
behlendorf
force-pushed
the
zfs-2.3.0-rc3-staging
branch
from
November 6, 2024 19:54
dc1e3cf
to
490bda0
Compare
a10e552 updated abd_free_linear_page() to no longer call abd_update_scatter_stat(). This meant that linear pages that were not attached to Direct I/O requests were not doing waste accounting for the ARC. This led to performance issues due to incorrect ARC accounting that resulted in 100% of CPU time being spent in arc_evict() during prolonged I/O workloads with the ARC. The call to abd_update_scatter_stats() is now conditionally called in abd_free_linear_page() when the ABD is not from a Direct I/O request. Reviewed-by: Mark Maybee <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Brian Atkinson <[email protected]> Closes openzfs#16729
When building on musl, we get: ``` In file included from tests/zfs-tests/cmd/getversion.c:22: /usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp] 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> In file included from module/os/linux/zfs/vdev_file.c:36: /usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp] 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> ``` Bug: https://bugs.gentoo.org/925235 Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Sam James <[email protected]> Closes openzfs#15925
This commit fixes JSON output for zfs list when user properties are requested with -o flag. This case needed to be handled specifically since zfs_prop_to_name does not return property name for user properties, instead it is stored in pl->pl_user_prop. Reviewed-by: Ameer Hamza <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes openzfs#16732
Signed-off-by: Brian Behlendorf <[email protected]>
behlendorf
force-pushed
the
zfs-2.3.0-rc3-staging
branch
from
November 7, 2024 19:34
490bda0
to
1a54b13
Compare
behlendorf
added
Status: Code Review Needed
Ready for review and testing
and removed
Status: Work in Progress
Not yet ready for general review
labels
Nov 7, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Motivation and Context
Initial proposed patchset for zfs-2.3.0-rc3.
Description
Bug fixes, build fixes, ZTS updates.
How Has This Been Tested?
Clean backports from master. Will be retested by the CI.
Types of changes