Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stable/cobia] Sync with upstream zfs-2.2-release branch #159

Merged
merged 28 commits into from
Aug 25, 2023

Conversation

ixhamza
Copy link

@ixhamza ixhamza commented Aug 25, 2023

Sync with upstream zfs-2.2-release branch

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

oromenahar and others added 28 commits July 27, 2023 16:11
Return the more descriptive EOPNOTSUPP instead of EXDEV when the
storage pool doesn't support block cloning.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Kay Pedersen <[email protected]>
Closes openzfs#15097
Signed-off-by: Brian Behlendorf <[email protected]>
Merge after upstream zfs-2.2-release rc3 tag
This gives `zdb -b` support for clone blocks.

Previously, it didn't know what clones were, so would count their space
allocation multiple times and then report leaked space (or, in debug,
would assert trying to claim blocks a second time).

This commit fixes those bugs, and reports the number of clones and the
space "used" (saved) by them.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes openzfs#15123
Before Linux 5.3, the filesystem's copy_file_range handler had to signal
back to the kernel that we can't fulfill the request and it should
fallback to a content copy. This is done by returning -EOPNOTSUPP.

This commit converts the EXDEV return from zfs_clone_range to
EOPNOTSUPP, to force the kernel to fallback for all the valid reasons it
might be unable to clone. Without it the copy_file_range() syscall will
return EXDEV to userspace, breaking its semantics.

Add test for copy_file_range fallbacks.  copy_file_range should always
fallback to a content copy whenever ZFS can't service the request with
cloning.

Reviewed-by: Brian Atkinson <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#15131
Reviewed-by: Brian Atkinson <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Kay Pedersen <[email protected]>
Closes openzfs#15128
glibc includes sys/types.h from stdlib.h. This is not the case for MUSL,
so explicitly include it. Fixes usage of uint_t.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Zach Dykstra <[email protected]>
Closes openzfs#15130
If looking up a snapdir inode failed, hold pool config – hold the 
snapshot – get its creation property – release it – release it, 
then use that as the [amc]time in the allocated inode. If that 
fails then fall back to current time. No performance impact since 
this is only done when allocating a new snapdir inode.
                                                       
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#15110
Closes openzfs#15117
When the vdev properties features was merged an extra check
was added in `spa_vdev_remove_top_check()` which checked
whether the vdev that we want to remove is already being
removed and if so return an EALREADY error.

```
static int
spa_vdev_remove_top_check(vdev_t *vd)
{
	... <snip> ...
	/*
	 * This device is already being removed
	 */
	if (vd->vdev_removing)
		return (SET_ERROR(EALREADY));
```

Before that change we'd still fail with an error but it
was a more generic one - here is the check that failed
later in the same function:
```
	/*
	 * There can not be a removal in progress.
	 */
	if (spa->spa_removing_phys.sr_state == DSS_SCANNING)
		return (SET_ERROR(EBUSY));
```

Changing the error code returned from that function changed
the behavior of the removal's library interface exposed to
the userland - `spa_vdev_remove()` now returns `EZFS_UNKNOWN`
instead of `EZFS_EBUSY` that was returning before.

This patch adds logic to make `spa_vdev_remove()` mindful
of the new EALREADY code and propagating `EZFS_EBUSY`
reverting to the previously established semantics of that
function.

Reviewed-by: Mark Maybee <[email protected]>
Reviewed-by: Matthew Ahrens <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes openzfs#15013
Closes openzfs#15129
For Native Debian packaging, zinject binary and man page is
packaged in ZFS test package. zinject is not not directly related
to ZTS and should be packaged with other utilities, like it is
present in zfs_<ver>.rpm/deb packages.

This commit moves zinject binary and man page from openzfs-zfs-test
to openzfs-zfsutils package.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes openzfs#15160
NAS-123369 / 24.04 / Packaging: Move zinject from openzfs-zfs-test to openzfs-zfsutils package
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Mateusz Piotrowski <[email protected]>
Closes openzfs#15141
When compiling a kernel with bcachefs and zfs,
the two macros will collide, making it impossible
to have both filesystems.

It is sufficient to just undefine the macro before calling it.

On why this should be in ZFS rather than bcachefs, currently,
bcachefs is not a in-tree filesystem, but,
it has a reasonably high chance of getting included soon.

This avoids the breakage in ZFS early,
this patch may be distributed downstream in NixOS
and is already used there.

Reviewed-by: Brian Atkinson <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Lahfa <[email protected]>
Closes openzfs#15144
POSIX timers target the process, not the thread (as does SIGINFO),
so we need to block it in the main thread which will die if interrupted.

Ref: https://101010.pl/@[email protected]/110731819189629373
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#15113
Return the more descriptive error codes instead of `EXDEV` when
the parameters don't match the requirements of the clone function.
Updated the comments in `brt.c` accordingly.
The first three errors are just invalid parameters, which zfs can
not handle.
The fourth error indicates that the block which should be cloned
is created and cloned or modified in the same transaction
group (`txg`).

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Kay Pedersen <[email protected]>
Closes openzfs#15148
Support mountpoint=legacy for the root dataset in the dracut zfs support
scripts.

mountpoint=/ or mountpoint=/sysroot also works.

Change zfs-env-bootfs.service to add zfsutil to BOOTFSFLAGS only for
root datasets with mountpoint != legacy.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ahelenia Ziemiańska <[email protected]>
Signed-off-by: Rafael Kitover <[email protected]>
Closes openzfs#15149
For Native Debian packaging, zinject binary and man page is
packaged in ZFS test package. zinject is not not directly related
to ZTS and should be packaged with other utilities, like it is
present in zfs_<ver>.rpm/deb packages.

This commit moves zinject binary and man page from openzfs-zfs-test
to openzfs-zfsutils package.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes openzfs#15160
In 019dea0 we removed the conversion from EAGAIN->EXDEV inside
zfs_clone_range(), but forgot to add a test for EAGAIN to the
copy_file_range() entry points to trigger fallback to a content copy.

This commit fixes that.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#15170
Closes openzfs#15172
If ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT is enabled in zed.rc, then
power off the drive's slot in the enclosure if it becomes FAULTED.
This can help silence misbehaving drives.  This assumes your drive
enclosure fully supports slot power control via sysfs.

Reviewed-by: @AllKind
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#15200
The transaction there does not produce any dirty data or log blocks,
so it should not be throttled. All other cases wait for TXG sync, by
which time the log block we are writing will be obsolete, so we can
skip waiting and just return error here instead.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15096
Fastwrite was introduced many years ago to improve ZIL writes spread
between multiple top-level vdevs by tracking number of allocated but
not written blocks and choosing vdev with smaller count.  It suposed
to reduce ZIL knowledge about allocation, but actually made ZIL to
even more actively report allocation code about the allocations,
complicating both ZIL and metaslabs code.

On top of that, it seems ZIO_FLAG_FASTWRITE setting in dmu_sync()
was lost many years ago, that was one of the declared benefits. Plus
introduction of embedded log metaslab class solved another problem
with allocation rotor accounting both normal and log allocations,
since in most cases those are now in different metaslab classes.

After all that, I'd prefer to simplify already too complicated ZIL,
ZIO and metaslab code if the benefit of complexity is not obvious.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15107
In most cases dmu_sync() works with dirty records directly and does
not need actual data. The only exception is dmu_sync_late_arrival().
To save some CPU time use dmu_buf_hold_noread*() in z*_get_data()
and explicitly call dbuf_read() in dmu_sync_late_arrival(). There
is also a chance that by that time TXG will already be synced and
we won't have to do it at all.

Reviewed-by: Brian Atkinson <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15153
If we get next block allocation error during log write, we trigger
transaction commit.  But the block we have just completed is still
written and transactions it covers will be acknowledged normally.
If after that we ignore the block during replay just because it is
the last in the chain, we may not replay some transactions that we
have acknowledged as synced, that is not right.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15132
The previous patch openzfs#14841 appeared to have significant flaw, causing
deadlocks if zl_get_data callback got blocked waiting for TXG sync.  I
already handled some of such cases in the original patch, but issue
 openzfs#14982 shown cases that were impossible to solve in that design.

This patch fixes the problem by postponing log blocks allocation till
the very end, just before the zios issue, leaving nothing blocking after
that point to cause deadlocks.  Before that point though any sleeps are
now allowed, not causing sync thread blockage.  This require slightly
more complicated lwb state machine to allocate blocks and issue zios
in proper order.  But with removal of special early issue workarounds
the new code is much cleaner now, and should even be more efficient.

Since this patch uses null zios between write, I've found that null
zios do not wait for logical children ready status in zio_ready(),
that makes parent write to proceed prematurely, producing incorrect
log blocks.  Added ZIO_CHILD_LOGICAL_BIT to zio_wait_for_children()
fixes it.

Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Mark Maybee <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15122
Sync with upstream zfs-2.2-release branch
@ixhamza ixhamza requested a review from amotin August 25, 2023 20:30
@amotin amotin merged commit b8f9529 into stable/cobia Aug 25, 2023
9 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.