
Raidz expand #14840

Closed
wants to merge 25 commits into from

Conversation

@chrisjsimpson commented May 8, 2023

Motivation and Context

Bring https://github.com/ahrens/zfs/tree/raidz-expand up to date with upstream openzfs/zfs master branch.

My goal here was to get this branch up to date with openzfs/zfs master in the hope it saves someone time from the more complex work of testing/reviewing.

Ref #12225

Description

Performed a manual rebase of ahrens' raidz-expand branch onto the openzfs/zfs master branch and manually resolved the conflicts.

How Has This Been Tested?

I have not tested this yet, and I have no immediate plans to, sorry.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • I appreciate this is probably not following Contributing/Pull Requests since it's a rebase of a PR
  • Nitpick: I messed up the git commit author while doing this on a test system

ahrens and others added 25 commits May 8, 2023 16:55
This feature allows disks to be added one at a time to a RAID-Z group,
expanding its capacity incrementally.  This feature is especially useful
for small pools (typically with only one RAID-Z group), where there
isn't sufficient hardware to add capacity by adding a whole new RAID-Z
group (typically doubling the number of disks).

== Initiating expansion ==

A new device (disk) can be attached to an existing RAIDZ vdev by
running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank
raidz2-0 sda`.  The new device will become part of the RAIDZ group.  A
"raidz expansion" will be initiated, and the new device will contribute
additional space to the RAIDZ group once the expansion completes.
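For example, starting an expansion on a hypothetical pool `tank` with a raidz1 vdev might look like the following sketch (pool, vdev, and device names are placeholders):

```sh
# Attach one new disk to the existing raidz1 top-level vdev.
# This starts the raidz expansion in the background.
zpool attach tank raidz1-0 /dev/sdf
```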

The `feature@raidz_expansion` on-disk feature flag must be `enabled` to
initiate an expansion, and it remains `active` for the life of the pool.
In other words, pools with expanded RAIDZ vdevs can not be imported by
older releases of the ZFS software.
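On an existing pool, the feature can be turned on before attaching, for example (a sketch; `tank` is a placeholder pool name):

```sh
# Enable the feature flag on the pool (zpool upgrade would enable all supported features).
zpool set feature@raidz_expansion=enabled tank
# Check the flag; it reads "enabled" now and "active" once an expansion has started.
zpool get feature@raidz_expansion tank
```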

== During expansion ==

The expansion entails reading all allocated space from existing disks in
the RAIDZ group, and rewriting it to the new disks in the RAIDZ group
(including the newly added device).

The expansion progress can be monitored with `zpool status`.
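A minimal monitoring sketch, assuming the placeholder pool name used above (the exact progress text is not reproduced here):

```sh
# Show pool health and the expansion progress line.
zpool status -v tank
# Optionally block until the expansion finishes; the raidz_expand wait
# activity is an assumption about this build of `zpool wait`.
zpool wait -t raidz_expand tank
```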

Data redundancy is maintained during (and after) the expansion.  If a
disk fails while the expansion is in progress, the expansion pauses
until the health of the RAIDZ vdev is restored (e.g. by replacing the
failed disk and waiting for reconstruction to complete).
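For instance, restoring redundancy after a failure mid-expansion is an ordinary replace (device names are hypothetical):

```sh
# Replace the failed member; the expansion resumes once the vdev is healthy again.
zpool replace tank /dev/sdc /dev/sdg
zpool status tank   # shows the resilver, then the resumed expansion
```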

The pool remains accessible during expansion.  Following a reboot or
export/import, the expansion resumes where it left off.

== After expansion ==

When the expansion completes, the additional space is available for use,
and is reflected in the `available` zfs property (as seen in `zfs list`,
`df`, etc).
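A quick way to confirm the new capacity, assuming the same placeholder names (dataset layout and mountpoints will differ per system):

```sh
zpool list tank     # pool SIZE/FREE grow by roughly the added disk's capacity
zfs list tank       # dataset AVAIL reflects the additional space
df -h /tank         # the mountpoint shows the larger available space
```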

Expansion does not change the number of failures that can be tolerated
without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after
expansion).

A RAIDZ vdev can be expanded multiple times.

After the expansion completes, old blocks remain with their old
data-to-parity ratio (e.g. a 5-wide RAIDZ2 has 3 data to 2 parity), but
distributed among the larger set of disks.  New blocks will be written
with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been
expanded once to 6-wide, has 4 data to 2 parity).  However, the RAIDZ
vdev's "assumed parity ratio" does not change, so slightly less space
than is expected may be reported for newly-written blocks, according to
`zfs list`, `df`, `ls -s`, and similar tools.
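A rough worked example of the ratios described above, using the 5-wide to 6-wide RAIDZ2 numbers from the text:

```sh
# Old blocks: 3 data out of 5 disks; new blocks: 4 data out of 6 disks.
awk 'BEGIN {
    printf "old block data fraction: %.2f\n", 3/5   # 0.60
    printf "new block data fraction: %.2f\n", 4/6   # 0.67
}'
# Space accounting keeps assuming the old 3/5 ratio, which is why tools such as
# zfs list, df, and ls -s may report slightly less free space for newly-written
# blocks than the 4/6 ratio would suggest.
```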

Sponsored-by: The FreeBSD Foundation
Contributions-by: Fedor Uporov <[email protected]>
Contributions-by: Stuart Maybee <[email protected]>
Contributions-by: Thorsten Behrens <[email protected]>
Contributions-by: Fmstrat <[email protected]>
Some blocks that were synced in the same txg as
raidz_reflow_complete_sync() can have an incorrect logical width.
Increasing the txg value that was added to the expand txgs array
can help in this case.
…e of reflow

The "shadow block" repair write was not acutally being executed due to
bypassing in lower layers.
The MMP uberblock could be overwritten by the scratch object if a raidz expansion is in progress.
Update the MMP uberblock from the scratch object side.
@chrisjsimpson (Author)

Sorry, my intention was to raise this against the https://github.com/ahrens/zfs fork, not here. I messed up with the PR UI here.
