WIP: Initial ZFS clones support #82

Merged: 1 commit from tuffnatty:clones_support into psy0rz:master on Jan 4, 2022
Conversation

@tuffnatty (Contributor)

This adds initial support for ZFS clones (#36). Datasets are now synced in creation order, so any clone comes after its origin. If a clone dataset is selected for sync, its origin snapshot must already exist on the target node. If it does not, an error occurs and the user is advised to retransfer the origin dataset with --other-snapshots.
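
To make the ordering concrete, here is a minimal sketch of syncing in creation order and checking a clone's origin, assuming hypothetical helpers rather than zfs-autobackup's actual code (createtxg and origin are real ZFS properties):

```python
# Minimal sketch (hypothetical helpers, not zfs-autobackup's real API):
# sort datasets by creation so any clone is processed after its origin,
# and fail with a retransfer hint when the origin is missing on target.
import subprocess

def zfs_get(dataset, prop):
    """Read a single ZFS property value with `zfs get`."""
    out = subprocess.run(
        ["zfs", "get", "-H", "-o", "value", prop, dataset],
        capture_output=True, text=True, check=True)
    return out.stdout.strip()

def creation_order(datasets):
    """Sort by createtxg so origins always precede their clones."""
    return sorted(datasets, key=lambda ds: int(zfs_get(ds, "createtxg")))

def check_origin(dataset, target_has_snapshot):
    """Raise, with a hint, when a clone's origin snapshot is not on target."""
    origin = zfs_get(dataset, "origin")  # "-" for datasets that are not clones
    if origin != "-" and not target_has_snapshot(origin):
        raise RuntimeError(
            "Origin %s of clone %s not on target; retransfer the origin "
            "dataset with --other-snapshots." % (origin, dataset))
```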

@coveralls commented Jun 22, 2021

Coverage Status

Coverage decreased (-0.5%) to 91.189% when pulling e11c332 on tuffnatty:clones_support into 07cb7cf on psy0rz:master.

@psy0rz (Owner) commented Jun 22, 2021

Awesome!

I have been thinking about clone support for a long time, but it's pretty hard to do in a clean and correct way. That's why I have some issues with this implementation:

  • There are a few ways this will fail, without a workaround:
    • If the origin isn't in the selected datasets, the --other-snapshots option won't help and the backup will fail.
    • If the dataset that has the origin was already replicated, but another snapshot was cloned later, there is no way to fix it (except to destroy and start over with --other-snapshots).
    • If a clone was promoted, so the clone/origin roles have reversed, sorting on creation time will give the wrong order (see the toy sketch after this list).
  • Using --other-snapshots shouldn't be needed to have clone support. zfs-autobackup should figure out what to do, and perhaps have some extra options to support clones.
  • As a general rule: I try to avoid magic and auto-detection as much as possible. It should only fail if it tries to do something you explicitly told it to do (e.g. use clones with --clone or something).
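
To make the promotion pitfall from the last bullet concrete, a toy sketch with invented dataset names and txg numbers (not real zfs output): zfs promote swaps the clone/origin roles but leaves creation times untouched, so a creation-order sort still sends the former origin first.

```python
# Toy illustration (invented data): after `zfs promote tank/promoted`,
# tank/original has become the clone, but it was created earlier, so a
# sort on creation still puts it first, which is now the wrong order.
datasets = [
    {"name": "tank/original", "createtxg": 100, "origin": "tank/promoted@snap"},
    {"name": "tank/promoted", "createtxg": 200, "origin": "-"},
]
ordered = sorted(datasets, key=lambda d: d["createtxg"])
# tank/original comes first, but its origin snapshot lives on
# tank/promoted, which has not been sent yet.
assert ordered[0]["name"] == "tank/original"
```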

So supporting cloning in every way imaginable gets VERY complicated. We assume clone support will have to be enabled with --clone, so we can do some extra checking and raise extra errors.

When a new dataset is to be created on the target, there are three possibilities (a sketch of this decision tree follows the list):

  • The source dataset has no origin: continue as we normally do. Done.
  • The source dataset has an origin (i.e. it is a clone):
    • Figure out if the origin is on the target already:
      • The obvious way is easy: check if the snapshot is in the determined target_path (like you do with make_target_path()).
      • But what if an admin has multiple backup sets in multiple target paths? Should we check the GUIDs of ALL the snapshots on the target node? This is resource intensive, so maybe we need an option for it (--clone-check-all?).
    • If we found it on the target, use it. Done.
    • If we don't have it on the target, figure out how to get it:
      • Is it a snapshot of a selected dataset?
        • Can we still send it? Then send that dataset first, and make sure the snapshot is selected (even if it's an "other" snapshot).
        • Can't send it because later snapshots have already been sent? Issue a warning and continue without cloning.
      • Is it a snapshot of another dataset? Then what do we want to do?
        • Abort with an error, and suggest --clone-ignore-other to ignore it?
        • Issue a warning, and continue without cloning?
        • Have an option like --clone-other, that will replicate all origin snapshots of that dataset?
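
Here is a self-contained sketch of that decision tree; everything in it, including the flag names, is a proposal from this thread rather than zfs-autobackup's real API:

```python
# Hypothetical sketch of the decision tree above; --clone-check-all and
# --clone-ignore-other are proposed options, not existing ones.
from dataclasses import dataclass

@dataclass
class Origin:
    snapshot: str            # e.g. "tank/base@snap1"
    guid: int                # ZFS snapshot GUID, stable across pools
    dataset_selected: bool   # is the origin's dataset in the backup set?
    still_sendable: bool     # or have later snapshots already been sent?

def plan(origin, target_snapshots, target_guids, clone_check_all=False):
    """Decide how to create a new dataset on the target."""
    if origin is None:
        return "send normally"                        # no origin: done
    if origin.snapshot in target_snapshots:
        return "clone from origin on target"          # found in target_path
    if clone_check_all and origin.guid in target_guids:
        return "clone from origin found by GUID"      # expensive global scan
    if origin.dataset_selected:
        if origin.still_sendable:
            return "send origin dataset first, then clone"
        return "warn and continue without cloning"    # too late to send it
    return "error: origin not selected (--clone-ignore-other to override?)"

# Example: a clone whose origin is neither on the target nor selected.
print(plan(Origin("tank/base@snap1", 42, False, True), set(), set()))
```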

Now keep in mind that this needs to be recursive as well: e.g. a clone of a snapshot that is part of a dataset that itself is a clone should also work. So that will need some rewriting of code.

Perhaps an option like --origin-snapshots, which also sends over all snapshots that are origins, would help (like a filtered version of --other-snapshots).
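
A toy sketch of that filter, with invented snapshot names and a hypothetical backup-set naming rule: keep the backup set's own snapshots plus any snapshot that some clone depends on, instead of everything as --other-snapshots does.

```python
# Toy sketch of --origin-snapshots as a filtered --other-snapshots
# (invented names; "offsite1" is a hypothetical backup set).
snapshots = ["tank/data@offsite1-20210622", "tank/data@manual", "tank/data@base"]
origins = {"tank/data@base"}  # snapshots that some clone depends on

def is_ours(snap):
    """Does this snapshot belong to the 'offsite1' backup set?"""
    return "@offsite1-" in snap

to_send = [s for s in snapshots if is_ours(s) or s in origins]
print(to_send)  # ['tank/data@offsite1-20210622', 'tank/data@base']
```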

Now my head is spinning, and I'm too tired to think about promoting/demoting clones. :)

Let me know what you think about all this.

@tuffnatty (Contributor, Author)

  • If the origin isn't in the selected datasets, the --other-snapshots option won't help and the backup will fail.

Well, it's really an extreme situation, so why not make users plan their backups thoroughly? ;-)
In this case we can either 1) fail (as the patch does now) or 2) emit a warning and skip the clone (or flatten/unCOW it, which is the current zfs-autobackup behaviour).

  • If the dataset that has the origin was already replicated, but another snapshot was cloned later, there is no way to fix it (except to destroy and start over with --other-snapshots).

Yes, sure, I ought to provide a more detailed error message, including the destroy part.

  • If a clone was promoted, so the clone/origin roles have reversed, sorting on creation time will give the wrong order.

We don't try to support promotion here; dataset renaming and recreation are also not handled correctly at the moment. Basically, this patch only allows for my specific use case, but I thought it could be useful to someone.

  • Using --other-snapshots shouldn't be needed to have clone support. zfs-autobackup should figure out what to do, and perhaps have some extra options to support clones.

I thought of --flatten-clones or --uncow-clones to keep the old behaviour.
Well, we could detect snapshots with the clones property and force-sync those whose clones are in the selected dataset list.
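
A rough sketch of that detection, assuming a subprocess call to zfs list; the clones property on a snapshot really does list the datasets cloned from it, but the helper itself is hypothetical, not the real implementation.

```python
# Rough sketch (hypothetical helper): find snapshots that have clones by
# reading the ZFS `clones` property, so their datasets can be force-synced.
import subprocess

def snapshots_with_clones(dataset):
    """Yield (snapshot, [clone, ...]) for snapshots that have clones."""
    out = subprocess.run(
        ["zfs", "list", "-H", "-r", "-t", "snapshot",
         "-o", "name,clones", dataset],
        capture_output=True, text=True, check=True)
    for line in out.stdout.splitlines():
        name, clones = line.split("\t")
        if clones not in ("", "-"):
            yield name, clones.split(",")
```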

  • As a general rule: I try to avoid magic and auto-detection as much as possible. It should only fail if it tries to do something you explicitly told it to do (e.g. use clones with --clone or something).
    So supporting cloning in every way imaginable gets VERY complicated. We assume clone support will have to be enabled with --clone, so we can do some extra checking and raise extra errors.

Another option then is to keep the old (flatten/unCOW) behaviour by default, emitting a warning ("Your clone {} is going to be unCOWed! Prepare to lose {} TB of space in 30 seconds!"), and add a --clones-magic option.

  • The source dataset has an origin (i.e. it is a clone):

    • Figure out if the origin is on the target already:

      • The obvious way is easy: check if the snapshot is in the determined target_path (like you do with make_target_path()).
      • But what if an admin has multiple backup sets in multiple target paths? Should we check the GUIDs of ALL the snapshots on the target node? This is resource intensive, so maybe we need an option for it (--clone-check-all?).

This could be documented as a configuration unsupported with --clones-magic for now.

A side thought: if zfs-autobackup had access to its full configuration, that is, a persistent storage (config file/zfs properties/...) as opposed to the command line of the current single run, it could peek there for the other backup sets' info.

  • If we found it on the target, use it. Done.

  • If we don't have it on the target, figure out how to get it:

    • Is it a snapshot of a selected dataset? Then send that dataset first, and make sure the snapshot is selected (even if it's an "other" snapshot).

If it's selected, then it's already sent, as datasets are sorted by creation.

  • Is it a snapshot of another dataset? Then what do we want to do?

    • Abort with an error, and suggest --clone-ignore-other to ignore it?
    • Issue a warning, and continue without cloning?
    • Have an option like --clone-other, that will replicate all origin snapshots of that dataset?

How about: --clone-other-origin=abort|skip|uncow|force? But really, the user should just select the origin dataset by setting the autobackup: property, and in a significant number of cases it's already set on a parent.
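
For illustration, here is how such an option could be declared with argparse; the flag and its semantics are only the proposal above, not an existing zfs-autobackup option.

```python
# Illustrative only: the proposed --clone-other-origin flag as an
# argparse option (not an existing zfs-autobackup option).
import argparse

parser = argparse.ArgumentParser(prog="zfs-autobackup")
parser.add_argument(
    "--clone-other-origin",
    choices=["abort", "skip", "uncow", "force"],
    default="abort",
    help="what to do when a clone's origin is outside the selected "
         "datasets: abort with an error, skip the clone, send it as an "
         "independent (unCOWed) stream, or replicate the origin too")

args = parser.parse_args(["--clone-other-origin", "skip"])
print(args.clone_other_origin)  # -> skip
```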

Now keep in mind that this needs to be recursive as well: e.g. a clone of a snapshot that is part of a dataset that itself is a clone should also work. So that will need some rewriting of code.

I think this specific configuration is supported by the current patch.

Perhaps an option like --origin-snapshots that also sends over all snapshots that are origins helps. (like a filtered version of --other-snapshots)

It's a good name.

@tuffnatty changed the title from "Initial ZFS clones support" to "WIP: Initial ZFS clones support" on Jun 22, 2021
@psy0rz (Owner) commented Jul 5, 2021

I'm busy with some other projects, but I will come back to this at some later time, after I have given it some more thought.

I agree we can at least try to support bookmarks in most cases to start with, and enhance it later. And perhaps emit warnings or errors to alert the user to "unreachable" origins. It's in my nature to immediately start thinking about all the possible edge cases that can cause failures. :)

@psy0rz (Owner) commented Oct 3, 2021

I'll get started on this after 3.1.1 is released.

@tuffnatty (Contributor, Author)

Rebased against current master.

@psy0rz merged commit e11c332 into psy0rz:master on Jan 4, 2022
@psy0rz (Owner) commented Jan 4, 2022

I accidentally merged it already. I was reviewing it, but decided I still need to fix some 3.1.x stuff first.

Will re-merge/refactor it after those things are sorted. Sorry!

@knuuuut commented Aug 30, 2023

It looks like this isn't implemented in 3.2.2. Without clone support I can't use it in my setup. Any news on when we can expect it?

@tuffnatty (Contributor, Author)

@knuuuut Clones are supported, with the caveats outlined in the discussion above.

@knuuuut commented Aug 31, 2023

Hmm, but I don't get the expected result.
Command:
zfs-autobackup -v --no-snapshot --no-holds --zfs-compressed --strip-path 2 test external04/test
Source:

zfs list -r -o origin,name backup01/hu/files/S101-IT01
ORIGIN                                      NAME
-                                           backup01/hu/files/S101-IT01
-                                           backup01/hu/files/S101-IT01/230114-0059
-                                           backup01/hu/files/S101-IT01/230427-0914
backup01/hu/files/S101-IT01/230427-0914@ro  backup01/hu/files/S101-IT01/230601-0116
backup01/hu/files/S101-IT01/230601-0116@ro  backup01/hu/files/S101-IT01/230801-0457
backup01/hu/files/S101-IT01/230801-0457@ro  backup01/hu/files/S101-IT01/230817-0817
backup01/hu/files/S101-IT01/230817-0817@ro  backup01/hu/files/S101-IT01/230817-2034
backup01/hu/files/S101-IT01/230817-2034@ro  backup01/hu/files/S101-IT01/230818-0838
backup01/hu/files/S101-IT01/230818-0838@ro  backup01/hu/files/S101-IT01/230818-2041
backup01/hu/files/S101-IT01/230818-2041@ro  backup01/hu/files/S101-IT01/230819-0845
backup01/hu/files/S101-IT01/230819-0845@ro  backup01/hu/files/S101-IT01/230819-2048
backup01/hu/files/S101-IT01/230819-2048@ro  backup01/hu/files/S101-IT01/230820-0851
backup01/hu/files/S101-IT01/230820-0851@ro  backup01/hu/files/S101-IT01/230820-2054
backup01/hu/files/S101-IT01/230820-2054@ro  backup01/hu/files/S101-IT01/230821-0857
backup01/hu/files/S101-IT01/230821-0857@ro  backup01/hu/files/S101-IT01/230821-2100
backup01/hu/files/S101-IT01/230821-2100@ro  backup01/hu/files/S101-IT01/230822-0903
backup01/hu/files/S101-IT01/230822-0903@ro  backup01/hu/files/S101-IT01/230822-2107
backup01/hu/files/S101-IT01/230822-2107@ro  backup01/hu/files/S101-IT01/230823-0911
backup01/hu/files/S101-IT01/230823-0911@ro  backup01/hu/files/S101-IT01/230823-2114
backup01/hu/files/S101-IT01/230823-2114@ro  backup01/hu/files/S101-IT01/230824-0917
backup01/hu/files/S101-IT01/230824-0917@ro  backup01/hu/files/S101-IT01/230824-2130
backup01/hu/files/S101-IT01/230824-2130@ro  backup01/hu/files/S101-IT01/230825-0933
backup01/hu/files/S101-IT01/230825-0933@ro  backup01/hu/files/S101-IT01/230825-2136
backup01/hu/files/S101-IT01/230825-2136@ro  backup01/hu/files/S101-IT01/230826-0939
backup01/hu/files/S101-IT01/230826-0939@ro  backup01/hu/files/S101-IT01/230826-2143
backup01/hu/files/S101-IT01/230826-2143@ro  backup01/hu/files/S101-IT01/230827-0947
backup01/hu/files/S101-IT01/230827-0947@ro  backup01/hu/files/S101-IT01/230827-2153
backup01/hu/files/S101-IT01/230827-2153@ro  backup01/hu/files/S101-IT01/230828-0956
backup01/hu/files/S101-IT01/230828-0956@ro  backup01/hu/files/S101-IT01/230828-2159
backup01/hu/files/S101-IT01/230828-2159@ro  backup01/hu/files/S101-IT01/230829-1011
backup01/hu/files/S101-IT01/230829-1011@ro  backup01/hu/files/S101-IT01/230829-2217
backup01/hu/files/S101-IT01/230829-2217@ro  backup01/hu/files/S101-IT01/230830-1020
backup01/hu/files/S101-IT01/230830-1020@ro  backup01/hu/files/S101-IT01/230830-2224
backup01/hu/files/S101-IT01/230830-2224@ro  backup01/hu/files/S101-IT01/230831-1026

Target after command:

zfs list -r -o origin,name external04/test
ORIGIN  NAME
-       external04/test
-       external04/test/files
-       external04/test/files/S101-IT01
-       external04/test/files/S101-IT01/230114-0059
-       external04/test/files/S101-IT01/230427-0914
-       external04/test/files/S101-IT01/230601-0116
-       external04/test/files/S101-IT01/230801-0457
-       external04/test/files/S101-IT01/230817-0817
-       external04/test/files/S101-IT01/230817-2034
-       external04/test/files/S101-IT01/230818-0838
-       external04/test/files/S101-IT01/230818-2041
-       external04/test/files/S101-IT01/230819-0845
-       external04/test/files/S101-IT01/230819-2048
-       external04/test/files/S101-IT01/230820-0851
-       external04/test/files/S101-IT01/230820-2054
-       external04/test/files/S101-IT01/230821-0857
-       external04/test/files/S101-IT01/230821-2100
-       external04/test/files/S101-IT01/230822-0903
-       external04/test/files/S101-IT01/230822-2107
-       external04/test/files/S101-IT01/230823-0911
-       external04/test/files/S101-IT01/230823-2114
-       external04/test/files/S101-IT01/230824-0917
-       external04/test/files/S101-IT01/230824-2130
-       external04/test/files/S101-IT01/230825-0933
-       external04/test/files/S101-IT01/230825-2136
-       external04/test/files/S101-IT01/230826-0939
-       external04/test/files/S101-IT01/230826-2143
-       external04/test/files/S101-IT01/230827-0947
-       external04/test/files/S101-IT01/230827-2153
-       external04/test/files/S101-IT01/230828-0956
-       external04/test/files/S101-IT01/230828-2159
-       external04/test/files/S101-IT01/230829-1011
-       external04/test/files/S101-IT01/230829-2217
-       external04/test/files/S101-IT01/230830-1020
-       external04/test/files/S101-IT01/230830-2224

What's wrong here?

@tuffnatty (Contributor, Author)

@knuuuut You're right. This PR got reverted in 883984f and never merged again. I'm going to rebase it against HEAD and submit it again; it's been working great for me.

@knuuuut commented Sep 1, 2023

@tuffnatty Thank you!
I'm looking forward to it being available via pip.
