Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize recv_fix_encryption_hierarchy() #16929

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

gamanakis
Copy link
Contributor

@gamanakis gamanakis commented Jan 5, 2025

Motivation and Context

Motivation: #16317
Closes: #16317

Description

recv_fix_encryption_hierarchy() in its present state goes through all
stream filesystems, and for each one traverses the snapshots in order to
find one that exists locally. This happens by calling guid_to_name() for
each snapshot, which iterates through all children of the filesystem.
This results in CPU utilization of 100% for several minutes (for ~1000
filesystems on a Ryzen 4350G) for 1 thread at the end of a raw receive
(-w, regardless whether encrypted or not, dryrun or not).

Fix this by following a different logic: using the top_fs name, call
gather_nvlist() to gather the nvlists for all local filesystems. For
each one filesystem, go through the snapshots to find the corresponding
stream's filesystem (since we know the snapshots guid and can search
with it in stream_avl for the stream's fs). Then go on to fix the
encryption roots and locations as in its present state.

Avoiding guid_to_name() iteratively makes
recv_fix_encryption_hierarchy() significantly faster (from several
minutes to seconds for ~1000 filesystems on a Ryzen 4350G).

Another problem is the following: in case we have promoted a clone of
the filesystem outside the top filesystem specified in zfs send, zfs
receive does not fail but returns an error:
recv_incremental_replication() fails to find its origin and errors out
with needagain=1. This results in recv_fix_hierarchy() not being called
which may render some children of the top fs not mountable since their
encryption root was not updated. To circumvent this make
recv_incremental_replication() silently ignore this error.

How Has This Been Tested?

An existing test was expanded.
Perfomance testing was done locally using the script from #16317

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@github-actions github-actions bot added the Status: Work in Progress Not yet ready for general review label Jan 5, 2025
@gamanakis gamanakis force-pushed the enc_hierarchy branch 7 times, most recently from 0a8bf07 to 13ee306 Compare January 6, 2025 17:15
@gamanakis gamanakis changed the title [DRAFT] speed up recv_fix_encryption_hierarchy() [DRAFT] optimize recv_fix_encryption_hierarchy() Jan 6, 2025
@gamanakis gamanakis force-pushed the enc_hierarchy branch 4 times, most recently from c217393 to 3aa17e1 Compare January 7, 2025 08:57
@gamanakis gamanakis changed the title [DRAFT] optimize recv_fix_encryption_hierarchy() optimize recv_fix_encryption_hierarchy() Jan 7, 2025
@gamanakis gamanakis marked this pull request as ready for review January 7, 2025 09:00
@github-actions github-actions bot added Status: Code Review Needed Ready for review and testing and removed Status: Work in Progress Not yet ready for general review labels Jan 7, 2025
@gamanakis gamanakis force-pushed the enc_hierarchy branch 2 times, most recently from 85ecc90 to 5660bcc Compare January 7, 2025 13:39
@gamanakis
Copy link
Contributor Author

gamanakis commented Jan 11, 2025

c63079c: rebased to master
a55562a: rebased to master

@gamanakis gamanakis marked this pull request as draft January 21, 2025 10:40
@github-actions github-actions bot added Status: Work in Progress Not yet ready for general review and removed Status: Code Review Needed Ready for review and testing labels Jan 21, 2025
@gamanakis
Copy link
Contributor Author

gamanakis commented Jan 21, 2025

Just converted this to a draft again. The original test was broken, in the sense that it is not checking the properties on the receiving pool, but rather in the source.

The problem is that even in its current form (ie before this PR) the corrected test fails: recv_fix_encryption() fails to change the encryption root of test/fs/child after the second recv, resulting in an unmountable fs as we have already changed its key.

EDIT:
This happens because recv_incremental_replication() detects we have the wrong origin (in case we had promoted earlier a clone) and terminates with softerr=1, which leads to recv_fix_encryption_hierarchy() being skipped.

Now all this happens only when we send a fs but have previously promoted a top clone outside of the top fs used in zfs send:

zpool export -a
zpool create test /root/A.vdev -f
zpool create testB /root/B.vdev -f

echo asdfgasdfg | zfs create -o encryption=on -o keyformat=passphrase test/fs
zfs snap -r test/fs@snap
zfs clone test/fs@snap test/clone
zfs create test/fs/child

zfs snap -r test/fs@before
zfs send -Rw test/fs@before | zfs recv -dF testB

echo "****** 1st SEND Completed ******"
echo qwertqwert | zfs change-key -o keyformat=passphrase test/fs/child
echo "****** Changed Key on Child ******"
zfs promote test/clone ########### Here we are promoting outside of the top fs ##########
zfs snap -r test/fs@after
zfs send -Rwi test/fs@before test/fs@after | zfs recv -vdF testB

zfs get encryptionroot test/fs/child
zfs get encryptionroot testB/fs/child

Address this by ignoring the error, allowing recv_fix_encryption_hierarchy() to run.

recv_fix_encryption_hierarchy() in its present state goes through all
stream filesystems, and for each one traverses the snapshots in order to
find one that exists locally. This happens by calling guid_to_name() for
each snapshot, which iterates through all children of the filesystem.
This results in CPU utilization of 100% for several minutes (for ~1000
filesystems on a Ryzen 4350G) for 1 thread at the end of a raw receive
(-w, regardless whether encrypted or not, dryrun or not).

Fix this by following a different logic: using the top_fs name, call
gather_nvlist() to gather the nvlists for all local filesystems. For
each one filesystem, go through the snapshots to find the corresponding
stream's filesystem (since we know the snapshots guid and can search
with it in stream_avl for the stream's fs). Then go on to fix the
encryption roots and locations as in its present state.

Avoiding guid_to_name() iteratively makes
recv_fix_encryption_hierarchy() significantly faster (from several
minutes to seconds for ~1000 filesystems on a Ryzen 4350G).

Another problem is the following: in case we have promoted a clone of
the filesystem outside the top filesystem specified in zfs send, zfs
receive does not fail but returns an error:
recv_incremental_replication() fails to find its origin and errors out
with needagain=1. This results in recv_fix_hierarchy() not being called
which may render some children of the top fs not mountable since their
encryption root was not updated. To circumvent this make
recv_incremental_replication() silently ignore this error.

Signed-off-by: George Amanakis <[email protected]>
@gamanakis gamanakis marked this pull request as ready for review January 21, 2025 13:44
@github-actions github-actions bot added Status: Code Review Needed Ready for review and testing and removed Status: Work in Progress Not yet ready for general review labels Jan 21, 2025
Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks right to me. Thanks for running this down and implementing some more efficient logic for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing
Projects
None yet
2 participants