Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

btrfs-progs: add option for recursive subvol snapshots #886

Open
wants to merge 18 commits into
base: devel
Choose a base branch
from

Conversation

maharmstone
Copy link
Contributor

Adds an option -R to btrfs subvolume snapshot, corresponding to the flag BTRFS_UTIL_CREATE_SNAPSHOT_RECURSIVE.

This is another resubmission of a missed patch of Omar's from 2018: https://lore.kernel.org/all/e42cdc5d5287269faf4d09e8c9786d0b3adeb658.1516991902.git.osandov@fb.com/

osandov and others added 13 commits August 15, 2024 14:58
Add new option --recursive 'btrfs subvol delete', causing it to pass the
BTRFS_UTIL_DELETE_SUBVOLUME_RECURSIVE flag through to libbtrfsutil.

This can work in two modes, depending on the user:

- regular user - this will skip subvolumes that are not accessible
- root (CAP_SYS_ADMIN) - no limitations

Pull-request: kdave#861
Signed-off-by: Mark Harmstone <[email protected]>
Co-authored-by: Omar Sandoval <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
[ Add details to man page, fix indent in the doc. ]
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Add a new option --subvol, which tells mkfs.btrfs to create the
specified directories as subvolumes when used with --rootdir.

Given a populated directory dir, the command

  $ mkfs.btrfs --rootdir dir --subvol usr --subvol home --subvol home/username img

will create subvolumes 'usr' and 'home' within the toplevel subvolume,
and subvolume 'username' within the 'home' subvolume. It will fail if
any of the directories do not yet exist.

Pull-request: kdave#868
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Mark Harmstone <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Add limit parameter so workflows are not skipped if they don't fit the
default limit 10. Add more workflows to clean up after recent updates.

Signed-off-by: David Sterba <[email protected]>
Remove last newline in the output of 'btrfs filesystem show', keep the
line between two filesystems so the devices are visually grouped
togehter.

Pull-request: kdave#866
Author: Matt Langford <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Added 0x prefix to HEX numbers and transform some tables to new format.

Pull-request: kdave#881
Signed-off-by: Yuwei Han <[email protected]>
[ Fix RST grammar errors ]
Signed-off-by: Qu Wenruo <[email protected]>
Change --subvol that it can accept flags, and add a "default" flag that
allows you to mark a subvolume as the default.

Signed-off-by: Mark Harmstone <[email protected]>
Adds a flag to mkfs.btrfs --subvol to allow subvolumes to be created
readonly.

Signed-off-by: Mark Harmstone <[email protected]>
Call btrfs_util_subvolume_create in create_one_subvolume rather than
calling the ioctl directly.

Signed-off-by: Mark Harmstone <[email protected]>
Co-authored-by: Omar Sandoval <[email protected]>
Call btrfs_util_subvolume_snapshot in cmd_subvolume_snapshot rather than
calling the ioctl directly.

Signed-off-by: Mark Harmstone <[email protected]>
Co-authored-by: Omar Sandoval <[email protected]>
Remove functions that after the previous two patches are no longer
referenced.

Signed-off-by: Mark Harmstone <[email protected]>
Co-authored-by: Omar Sandoval <[email protected]>
Currently the transaction log is more or less ignored by btrfs check,
meaning that it's possible for a FS with a corrupt log to pass btrfs
check, but be immediately corrupted by the kernel when it's mounted.

This patch adds a check that if there's an inode in the log, any pending
non-inlined csumed writes also have corresponding csum entries.

Signed-off-by: Mark Harmstone <[email protected]>
[ Small commit message update. ]
Signed-off-by: Qu Wenruo <[email protected]>
The new hard link detection and creation support is done by maintaining
an rb tree with the following members:

- st_ino, st_dev
  This is to record the stat() report from the host fs.
  With this two, we can detect if it's really a hard link (st_dev
  determines one filesystem/subvolume, and st_ino determines the inode
  number inside the fs).

- root
  This is btrfs root pointer. This a special requirement for the recent
  introduced "--subvol" option.

  As we can have the following corner case:

  rootdir/
  |- foobar_hardlink1
  |- foobar_hardlink2
  |- subv/		<- To be a subvolume inside btrfs
     |- foobar_hardlink3

  In above case, on the host fs, `subv/` directory is just a regular
  directory, but in the new btrfs it will be a subvolume.

  In that case, `foobar_hardlink3` cannot be created as a hard link,
  but a new inode.

- st_nlink and found_nlink
  Records the original reported number of links, and the nlinks we
  created inside btrfs.
  This is recorded in case we created all hard links and can remove
  the entry early.

- btrfs_ino
  This is the inode number inside btrfs.

And since we can handle hard links safely, remove all the related
warnings, and add a new note for `--subvol` option, warning about the
case where we need to split hard links due to subvolume boundary.

Signed-off-by: Qu Wenruo <[email protected]>
This introduces two new cases:

- 3 hardlinks without any subvolume
  This should results 3 hard links inside the btrfs.

- 3 hardlinks, but a subvolume will split 2 of them
  Then the 2 inside the same subvolume should still report 2 nlinks,
  but the lone one inside the new subvolume can only report 1 nlink.

Signed-off-by: Qu Wenruo <[email protected]>
Documentation/btrfs-subvolume.rst Outdated Show resolved Hide resolved
@@ -263,6 +263,9 @@ snapshot [-r] [-i <qgroupid>] <source> <dest>|[<dest>/]<name>

-r
Make the new snapshot read only.
-R
Recursively snapshot subvolumes beneath the source. This option cannot be
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like limitation but we can't do better in user space. This is possible in kernel but is quite complicated so we must exclude -r and -R for now. As a suggestion to documenation, the subvolume can be mad read-only afterwards by changing the property to ro.

And as this is using libbtrfsutil the same limitations apply (and need to be documented): it's not atomic and depends on the capabilities to enumerate the subvolumes. Similar to what the recursive deletion has, we might need a special section in documentation for that so we can refer to it instead of repeating the whole text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've added a bit to the man page.

I might also look into fixing the kernel so that this can be done properly.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still does not mention the limitations, nor referes to a section where it would be explained.

@adam900710 adam900710 added this to the v6.11 milestone Sep 5, 2024
@adam900710 adam900710 added enhancement UI/UX User interface change labels Sep 5, 2024
[BUG]
Sometimes test case btrfs/012 fails randomly, with the failure to read a
softlink:

     QA output created by 012
     Checking converted btrfs against the original one:
    -OK
    +readlink: Structure needs cleaning
     Checking saved ext2 image against the original one:
     OK

Furthermore, this will trigger a kernel error message:

 BTRFS critical (device dm-2): regular/prealloc extent found for non-regular inode 133081

[CAUSE]
For that specific inode 133081, the tree dump looks like this:

        item 127 key (133081 INODE_ITEM 0) itemoff 40984 itemsize 160
                generation 1 transid 1 size 4095 nbytes 4096
                block group 0 mode 120777 links 1 uid 0 gid 0 rdev 0
                sequence 0 flags 0x0(none)
        item 128 key (133081 INODE_REF 133080) itemoff 40972 itemsize 12
                index 2 namelen 2 name: l3
        item 129 key (133081 EXTENT_DATA 0) itemoff 40919 itemsize 53
                generation 4 type 1 (regular)
                extent data disk byte 2147483648 nr 38080512
                extent data offset 37974016 nr 4096 ram 38080512
                extent compression 0 (none)

Note that, the soft link inode size is 4095 at the max size (PATH_MAX,
removing the terminating NUL).
But the nbytes is 4096, exactly matching the sector size of the btrfs.

Thus it results the creation of a regular extent, but for btrfs we do
not accept a soft link with a regular/preallocated extent, thus kernel
rejects such read and failed the readlink call.

The root cause is in the convert code, where for soft links we always
create a data extent with its size + 1, causing the above problem.

I guess the original code is to handle the terminating NUL, but in btrfs
we never need to store the terminating NUL for inline extents nor
file names.

Thus this pitfall in btrfs-convert leads to the above invalid data
extent and fail the test case.

[FIX]
- Fix the ext2 and reiserfs symbolic link creation code
  To remove the terminating NUL.

- Add extra checks for the size of a symbolic link
  Btrfs has extra limits on the size of a symbolic link, as btrfs must
  store symbolic link targets as inlined extents.

  This means for 4K node sized btrfs, the size limit is smaller than the
  usual PATH_MAX - 1 (only around 4000 bytes instead of 4095).

  So for certain nodesize, some filesystems can not be converted to
  btrfs.
  (this should be rare, because the default nodesize is 16K already)

- Split the symbolic link and inline data extent size checks
  For symbolic links the real limit is PATH_MAX - 1 (removing the
  terminating NUL), but for inline data extents the limit is
  sectorsize - 1, which can be different from 4096 - 1 (e.g. 64K sector
  size).

Signed-off-by: Qu Wenruo <[email protected]>
symbolic links

[BUG]
There is a recent bug that btrfs/012 fails and kernel rejects to read a
symbolic link which is backed by a regular extent.

Furthremore in that case, "btrfs check" doesn't detect such problem at
all.

[CAUSE]
For symbolic links, we only allow inline file extents, and this means we
should only have a symbolic link target which is smaller than 4K.

But btrfs check doesn't handle symbolic link inodes any differently, thus
it doesn't check if the file extents are inlined or not, nor reporting
this problem as an error.

[FIX]
When processing data extents, if we find the owning inode is a symbolic
link, and the file extent is regular/preallocated, mark the inode with
I_ERR_FILE_EXTENT_TOO_LARGE error.

Signed-off-by: Qu Wenruo <[email protected]>
…inks

[BUG]
There is a recent bug that btrfs/012 fails and kernel rejects to read a
symbolic link which is backed by a regular extent.

Furthremore in that case, "btrfs check --mode=lowmem" doesn't detect such
problem at all.

[CAUSE]
For symbolic links, we only allow inline extents, and this means we should
only have a symbolic link target which is smaller than 4K.

But lowmem mode btrfs check doesn't handle symbolic link inodes any
differently, thus it doesn't check if the file extents are inlined or not,
nor reporting this problem as an error.

[FIX]
When processing data extents, if we find the owning inode is a symbolic
link, and the file extent is regular/preallocated, report an error for
the bad file extent item.

Signed-off-by: Qu Wenruo <[email protected]>
…link handling

The new test case will:

- Create a symbolic which contains a 4095 bytes sized target on ext4

- Convert the ext4 to btrfs

- Make sure we can still read the symbolic link
  For unpatched btrfs-convert, the resulted symbolic link will be rejected
  by kernel and fail.

Signed-off-by: Qu Wenruo <[email protected]>
@maharmstone
Copy link
Contributor Author

Pushed new version

Adds an option -R to btrfs subvolume snapshot, corresponding to the flag
BTRFS_UTIL_CREATE_SNAPSHOT_RECURSIVE.

This is another resubmission of a missed patch of Omar's from 2018:
https://lore.kernel.org/all/e42cdc5d5287269faf4d09e8c9786d0b3adeb658.1516991902.git.osandov@fb.com/

Signed-off-by: Mark Harmstone <[email protected]>
Co-authored-by: Omar Sandoval <[email protected]>
Create a snapshot of the subvolume *source* with the
name *name* in the *dest* directory.

If only *dest* is given, the subvolume will be named the basename of *source*.
If *source* is not a subvolume, btrfs returns an error.

If you wish to recursively create a readonly snapshot, you can run
:command:`btrfs property set <path> ro true` on each subvolume after this command completes.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity I'd suggest to mention that it has to be done from the leaves up in the tree consisting of subvolumes. Which may not be that easy to find out manually, so this could be a separate command for subvolume (in a separate patch). For parity an option to the snapshot command can be added that should read like it's doing the read-only change after the snapshots. This can be also added separatelly, we need to define the use cases first.

@@ -263,6 +263,9 @@ snapshot [-r] [-i <qgroupid>] <source> <dest>|[<dest>/]<name>

-r
Make the new snapshot read only.
-R
Recursively snapshot subvolumes beneath the source. This option cannot be
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still does not mention the limitations, nor referes to a section where it would be explained.

@kdave kdave force-pushed the devel branch 2 times, most recently from f6d1943 to 5ff9f62 Compare September 17, 2024 15:00
@kdave kdave removed this from the v6.11 milestone Sep 17, 2024
@kdave kdave added this to the v6.12 milestone Sep 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement UI/UX User interface change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants