
Raidz expand #14840

Closed
wants to merge 25 commits into from

Conversation

@chrisjsimpson commented May 8, 2023

Motivation and Context

Bring https://github.com/ahrens/zfs/tree/raidz-expand up to date with upstream openzfs/zfs master branch.

My goal here was to get this branch up to date with openzfs/zfs master in the hope it saves someone time from the more complex work of testing/reviewing.

Ref #12225

Description

Performed a manual rebase of ahrens' raidz-expand branch onto the openzfs/zfs master branch and manually resolved the conflicts.

How Has This Been Tested?

I have not tested this yet, and I have no immediate plans to, sorry.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • I appreciate this is probably not following Contributing/Pull Requests since it's a rebase of a PR
  • Nitpick: I messed up the git commit author while doing this on a test system

ahrens and others added 25 commits May 8, 2023 16:55
This feature allows disks to be added one at a time to a RAID-Z group,
expanding its capacity incrementally.  This feature is especially useful
for small pools (typically with only one RAID-Z group), where there
isn't sufficient hardware to add capacity by adding a whole new RAID-Z
group (typically doubling the number of disks).

== Initiating expansion ==

A new device (disk) can be attached to an existing RAIDZ vdev by
running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank
raidz2-0 sda`.  The new device will become part of the RAIDZ group.  A
"raidz expansion" will be initiated, and the new device will contribute
additional space to the RAIDZ group once the expansion completes.
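For example, starting an expansion on a hypothetical pool `tank` with a raidz1 vdev might look like the following sketch (pool, vdev, and device names are placeholders):

```sh
# Attach one new disk to the existing raidz1 top-level vdev.
# This starts the raidz expansion in the background.
zpool attach tank raidz1-0 /dev/sdf
```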

The `feature@raidz_expansion` on-disk feature flag must be `enabled` to
initiate an expansion, and it remains `active` for the life of the pool.
In other words, pools with expanded RAIDZ vdevs can not be imported by
older releases of the ZFS software.
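On an existing pool, the feature can be turned on before attaching, for example (a sketch; `tank` is a placeholder pool name):

```sh
# Enable the feature flag on the pool (zpool upgrade would enable all supported features).
zpool set feature@raidz_expansion=enabled tank
# Check the flag; it reads "enabled" now and "active" once an expansion has started.
zpool get feature@raidz_expansion tank
```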

== During expansion ==

The expansion entails reading all allocated space from existing disks in
the RAIDZ group, and rewriting it to the new disks in the RAIDZ group
(including the newly added device).

The expansion progress can be monitored with `zpool status`.
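A minimal monitoring sketch, assuming the placeholder pool name used above (the exact progress text is not reproduced here):

```sh
# Show pool health and the expansion progress line.
zpool status -v tank
# Optionally block until the expansion finishes; the raidz_expand wait
# activity is an assumption about this build of `zpool wait`.
zpool wait -t raidz_expand tank
```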

Data redundancy is maintained during (and after) the expansion.  If a
disk fails while the expansion is in progress, the expansion pauses
until the health of the RAIDZ vdev is restored (e.g. by replacing the
failed disk and waiting for reconstruction to complete).
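For instance, restoring redundancy after a failure mid-expansion is an ordinary replace (device names are hypothetical):

```sh
# Replace the failed member; the expansion resumes once the vdev is healthy again.
zpool replace tank /dev/sdc /dev/sdg
zpool status tank   # shows the resilver, then the resumed expansion
```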

The pool remains accessible during expansion.  Following a reboot or
export/import, the expansion resumes where it left off.

== After expansion ==

When the expansion completes, the additional space is available for use,
and is reflected in the `available` zfs property (as seen in `zfs list`,
`df`, etc).
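A quick way to confirm the new capacity, assuming the same placeholder names (dataset layout and mountpoints will differ per system):

```sh
zpool list tank     # pool SIZE/FREE grow by roughly the added disk's capacity
zfs list tank       # dataset AVAIL reflects the additional space
df -h /tank         # the mountpoint shows the larger available space
```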

Expansion does not change the number of failures that can be tolerated
without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after
expansion).

A RAIDZ vdev can be expanded multiple times.

After the expansion completes, old blocks remain with their old
data-to-parity ratio (e.g. a 5-wide RAIDZ2 has 3 data to 2 parity), but
distributed among the larger set of disks.  New blocks will be written
with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been
expanded once to 6-wide, has 4 data to 2 parity).  However, the RAIDZ
vdev's "assumed parity ratio" does not change, so slightly less space
than is expected may be reported for newly-written blocks, according to
`zfs list`, `df`, `ls -s`, and similar tools.
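A rough worked example of the ratios described above, using the 5-wide to 6-wide RAIDZ2 numbers from the text:

```sh
# Old blocks: 3 data out of 5 disks; new blocks: 4 data out of 6 disks.
awk 'BEGIN {
    printf "old block data fraction: %.2f\n", 3/5   # 0.60
    printf "new block data fraction: %.2f\n", 4/6   # 0.67
}'
# Space accounting keeps assuming the old 3/5 ratio, which is why tools such as
# zfs list, df, and ls -s may report slightly less free space for newly-written
# blocks than the 4/6 ratio would suggest.
```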

Sponsored-by: The FreeBSD Foundation
Contributions-by: Fedor Uporov <[email protected]>
Contributions-by: Stuart Maybee <[email protected]>
Contributions-by: Thorsten Behrens <[email protected]>
Contributions-by: Fmstrat <[email protected]>
Some blocks that were synced in the same txg as
raidz_reflow_complete_sync() can have an incorrect logical width.
Increasing the txg value that was added to the expand txgs array
can help in this case.
…e of reflow

The "shadow block" repair write was not acutally being executed due to
bypassing in lower layers.
The MMP uberblock could be overwritten by the scratch object if a raidz expansion is in progress.
Update the MMP uberblock from the scratch object side.
@chrisjsimpson (Author)

Sorry, my intention was to raise this against the https://github.com/ahrens/zfs fork, not here. I messed up with the PR UI here.
