specify source dataset(s) instead of property name #113

Open
digitalsignalperson opened this issue Jan 28, 2022 · 8 comments

@digitalsignalperson
Contributor

digitalsignalperson commented Jan 28, 2022

I'm wondering about the design choice of requiring an autobackup:$name property to be set in order to select the source datasets.

What are the advantages compared to just providing, e.g.:

  • the name of the source pool/dataset (or list of many, or path to file containing the lists)
  • optional recursive option
  • optional exclude list

Or, more generally, I'd be curious to hear any insight into the design choice.
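A minimal sketch of what name-based selection with the three options listed above could look like. It assumes the full list of dataset names is already available (for example from `zfs list -H -o name`). `select_explicit` and `in_subtree` are hypothetical helpers, not zfs-autobackup code. Matching on `root + "/"` keeps `rpool/database` from being treated as a child of `rpool/data`.

```python
from typing import Iterable, List


def in_subtree(name: str, root: str) -> bool:
    """True if `name` is `root` itself or one of its descendants."""
    return name == root or name.startswith(root + "/")


def select_explicit(
    all_datasets: Iterable[str],
    sources: Iterable[str],
    recursive: bool = False,
    exclude: Iterable[str] = (),
) -> List[str]:
    """Select datasets by name instead of by a ZFS user property (hypothetical helper)."""
    sources = list(sources)
    exclude = list(exclude)
    selected = []
    for name in all_datasets:
        # An excluded dataset also excludes everything below it.
        if any(in_subtree(name, ex) for ex in exclude):
            continue
        if recursive:
            match = any(in_subtree(name, src) for src in sources)
        else:
            match = name in sources
        if match:
            selected.append(name)
    return selected


datasets = ["rpool", "rpool/data", "rpool/data/db", "rpool/data/tmp", "rpool/home"]
print(select_explicit(datasets, ["rpool/data"], recursive=True, exclude=["rpool/data/tmp"]))
# prints ['rpool/data', 'rpool/data/db']
```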

Cons of using property to manage the config:

  • you need to modify the filesystem before you can use the tool; you can't just quickly sync from point A to point B
  • more complicated configuration management: half of the configuration lives in arguments, half in ZFS properties
  • the property may need to be filtered (removed) on the destination: e.g. when replicating to another pool on the same host, two copies of a dataset end up with the same autobackup:$name property, which is confusing. Possibly related to exclude_received?

The code looks like it would be clean to change without any issues (I don't see any other use of 'property_name'), by changing

source_datasets = source_node.selected_datasets(property_name=property_name, ...

to assign source_datasets a list parsed from a command-line argument instead.
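A minimal sketch of the command-line half of that change. It uses Python's argparse and a hypothetical repeatable `--source` option, which is not an existing zfs-autobackup flag. The sketch only produces a list of names. The elided `selected_datasets(...)` call stays as quoted, and the real tool would still have to map each name to its dataset object on `source_node`.

```python
import argparse
from typing import List, Optional


def parse_sources(argv: Optional[List[str]] = None) -> List[str]:
    """Parse a hypothetical repeatable --source option into a list of dataset names."""
    parser = argparse.ArgumentParser(prog="zfs-autobackup-sketch")
    parser.add_argument("backup_name")
    parser.add_argument("target_path")
    parser.add_argument(
        "--source",
        dest="sources",
        action="append",  # each occurrence appends one dataset name
        help="dataset to back up (may be given more than once)",
    )
    args = parser.parse_args(argv)
    return args.sources or []  # None when --source was never given


# Stands in for: source_datasets = source_node.selected_datasets(property_name=property_name, ...
source_datasets = parse_sources(
    ["offsite1", "backup/target", "--source", "rpool/a", "--source", "rpool/b"]
)
print(source_datasets)  # prints ['rpool/a', 'rpool/b']
```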

Possibly related to #41 (an rsync for ZFS? e.g. zfsync src_pool/data dst_pool/data)

Curious to hear your thoughts, cheers!

Edit: I wasn't thinking about snapshots and holds, which use self.args.backup_name; the backup name could still be an argument for those naming purposes. Or, in my case, I'd use --no-snapshot --no-holds.

@psy0rz
Owner

psy0rz commented Jan 28, 2022

I agree; I'm creating a separate zfs-rsync issue for this.

Which snapshots should it operate on? Just the latest common one? And if you run it again and there are newer snapshots, should it send those to the other side as increments as well?
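One possible answer, sketched under the assumption that snapshots can be matched by name and are listed oldest first. A more robust implementation might compare snapshot GUIDs instead. The sketch uses the newest snapshot both sides share as the incremental base. On a repeat run it sends every newer source snapshot as an increment. `plan_sync` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def plan_sync(
    source_snaps: List[str], target_snaps: List[str]
) -> Tuple[Optional[str], List[str]]:
    """Return (latest common snapshot, newer source snapshots to send after it).

    Both lists are assumed to be ordered oldest to newest.
    """
    on_target = set(target_snaps)
    # Walk the source from newest to oldest; the first shared name is the base.
    for i in range(len(source_snaps) - 1, -1, -1):
        if source_snaps[i] in on_target:
            return source_snaps[i], source_snaps[i + 1:]
    # No common snapshot: everything would have to start with a full send.
    return None, list(source_snaps)


print(plan_sync(["s1", "s2", "s3", "s4"], ["s1", "s2"]))  # prints ('s2', ['s3', 's4'])
```

Running it again after `s3` and `s4` have arrived on the target returns `('s4', [])`, meaning there is nothing left to send.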

@psy0rz
Owner

psy0rz commented Jan 28, 2022

Please have a look at #114 and comment over there.

@Scrin

Scrin commented Jan 28, 2022

There are definitely pros and cons to both approaches, and which one is better depends heavily on the context. As for insight into the original design choice, I'm not sure, but for me the ability to define what to back up on the "source system" rather than on the "backupper" was the primary reason I switched my primary backup solution to zfs_autobackup.

In my primary infrastructure design I have a bunch of servers which all contain both "critical" and "non-critical" data (critical being things like databases, non-critical being things like configurations, or data that can be trivially recreated on demand), and these all depend on the services running on each server.

The zfs_autobackup design lets me simplify my infrastructure setup and backup configuration: the setup scripts for the services (mainly Ansible) create the ZFS datasets each service needs, set their properties (such as tagging the critical datasets for backup) and, obviously, set up the services themselves.

This way, when a new service is created in my infrastructure setup or an existing one is added to a new server, everything that needs to be done can be done on that server alone; the backuppers that back up all the servers don't need to know which datasets are important to back up. The only knowledge my backuppers need is which servers to back up, where to back them up, and what the zpool names are. Thus config changes only need to be made on the server that holds the data related to the change, whether that is adding a completely new service or setting up a new instance of an existing one.
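In this workflow the backupper only needs to read what the source side has tagged. Below is a minimal sketch of that read side. It assumes the value semantics described in the project's dataset-property documentation: `true` selects a dataset and its descendants, `false` excludes it, `child` selects only descendants, and `parent` selects only the dataset itself. It also assumes the input is a dict of locally set values. `effective_value` and `property_selects` are hypothetical names, not zfs-autobackup internals.

```python
from typing import Dict, Optional, Tuple


def effective_value(name: str, local: Dict[str, str]) -> Tuple[Optional[str], bool]:
    """Resolve a property like ZFS inheritance does: the nearest ancestor that sets it.

    Returns (value, inherited); value is None when no ancestor sets the property.
    """
    current = name
    while True:
        if current in local:
            return local[current], current != name
        if "/" not in current:
            return None, False
        current = current.rsplit("/", 1)[0]


def property_selects(name: str, local: Dict[str, str]) -> bool:
    """Hypothetical selection rule for the values true, false, child and parent."""
    value, inherited = effective_value(name, local)
    if value == "true":
        return True
    if value == "child":
        return inherited  # only descendants of the dataset where it was set
    if value == "parent":
        return not inherited  # only the dataset where it was set
    return False  # unset, "false", or an unknown value


# Locally set values, e.g. from: zfs get -r -H -s local -o name,value autobackup:offsite1 rpool
local = {"rpool/db": "true", "rpool/db/scratch": "false", "rpool/services": "child"}
datasets = ["rpool", "rpool/db", "rpool/db/scratch", "rpool/services", "rpool/services/web"]
print([d for d in datasets if property_selects(d, local)])
# prints ['rpool/db', 'rpool/services/web']
```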

@psy0rz
Owner

psy0rz commented Jan 28, 2022

@Scrin Good point, I should state that more clearly in the documentation. zfs-autobackup lets other tools and admins select datasets on the source system without needing to access the backup server at all.

@digitalsignalperson
Copy link
Contributor Author

Thanks for sharing, I can see how the property is useful depending on the scenario. To avoid a breaking change, that could stay the default behavior, with a new optional argument to supply a list of sources instead (or a .txt or YAML file containing the list of sources).
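A minimal sketch of the plain-text variant. It assumes a hypothetical format with one dataset per line and `#` comments. Using `#` for comments is safe here because in ZFS names it only marks bookmarks and never appears in a dataset name. A YAML variant would need a third-party parser, so it is left out.

```python
from typing import List


def read_sources(text: str) -> List[str]:
    """Parse a hypothetical sources file: one dataset per line, '#' starts a comment."""
    sources = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if line:
            sources.append(line)
    return sources


example = """
# datasets to replicate
rpool/data
rpool/home   # user homes
"""
print(read_sources(example))  # prints ['rpool/data', 'rpool/home']
```

With a real file this would be `with open(path) as f: sources = read_sources(f.read())`.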

@psy0rz
Owner

psy0rz commented Jan 28, 2022

That's true. I could perhaps add --select=..., --select-child=... and --select-single=... (non-recursive).

The rest of the syntax stays the same, and you won't need to set properties (but you still can, and you could use both if you want).
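A sketch of how those three options could combine with property-based selection. It assumes `--select` is recursive, since `--select-single` is described as the non-recursive variant. It also assumes that "use both" means taking the union with the datasets the property already selected. `select_by_flags` and `under` are hypothetical names, and none of these flags exist yet.

```python
from typing import Iterable, List


def under(name: str, root: str) -> bool:
    """True if `name` is a descendant of `root` (not `root` itself)."""
    return name.startswith(root + "/")


def select_by_flags(
    all_datasets: Iterable[str],
    select: Iterable[str] = (),
    select_child: Iterable[str] = (),
    select_single: Iterable[str] = (),
    property_selected: Iterable[str] = (),
) -> List[str]:
    """Union of flag-based and property-based selection (hypothetical semantics)."""
    select, select_child, singles = list(select), list(select_child), set(select_single)
    chosen = set(property_selected)  # datasets already tagged via the property
    for name in all_datasets:
        if name in singles:  # --select-single: the dataset only
            chosen.add(name)
        elif any(name == r or under(name, r) for r in select):  # dataset and descendants
            chosen.add(name)
        elif any(under(name, r) for r in select_child):  # descendants only
            chosen.add(name)
    return sorted(chosen)


datasets = ["rpool", "rpool/a", "rpool/a/x", "rpool/b", "rpool/b/y", "rpool/c"]
print(select_by_flags(datasets, select=["rpool/a"], select_child=["rpool/b"],
                      select_single=["rpool"], property_selected=["rpool/c"]))
# prints ['rpool', 'rpool/a', 'rpool/a/x', 'rpool/b/y', 'rpool/c']
```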

@digitalsignalperson
Contributor Author

digitalsignalperson commented Nov 6, 2023

Any thoughts on this for a PR?
digitalsignalperson/zfs_autobackup@d0b58b9...digitalsignalperson:zfs_autobackup:v3.1.2-hacks

Example usage:

zfs-autobackup -v \
    --no-holds \
    --no-thinning \
    --no-snapshot \
    --other-snapshots \
    --min-change 1 \
    --strip-path=1 \
    --clear-mountpoint \
    backupname-does-nothing-here \
    rpool/test-destination \
    rpool/recursive-source-dataset/\* \
    rpool/some-source-dataset \
    rpool/some-other-source-dataset

If source paths are specified, I skip selecting datasets via the BACKUP-NAME property, but that could still be made an option. The BACKUP-NAME parameter is still used for snapshots and thinning in general, just not in this example, which uses --no-snapshot and --no-thinning.

Using it as a snapshot tool without specifying a TARGET-PATH is a little awkward with this argument order. I allowed "/None" to be used as a target path to solve this, but maybe there's a more sensible way to order the arguments or add other options.
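The positional layout in the example above is BACKUP-NAME, then TARGET-PATH, then sources. It can be sketched like this, with "/None" standing in for "no target". `split_positionals` is a hypothetical helper, not code from the linked branch. It also shows why a sentinel is needed: without one, the first source would be read as the target.

```python
from typing import List, Optional, Tuple

NO_TARGET = "/None"  # sentinel value described in the comment above


def split_positionals(args: List[str]) -> Tuple[str, Optional[str], List[str]]:
    """Split BACKUP-NAME TARGET-PATH [SOURCE...]; the sentinel means 'no target'."""
    if len(args) < 2:
        raise ValueError("expected BACKUP-NAME and TARGET-PATH (or /None)")
    backup_name, target, sources = args[0], args[1], args[2:]
    return backup_name, (None if target == NO_TARGET else target), sources


print(split_positionals(["snaps", "/None", "rpool/data"]))
# prints ('snaps', None, ['rpool/data'])
```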

@psy0rz
Owner

psy0rz commented Nov 16, 2023

Hmm, I'm not sure if I already responded to this somewhere?

I think this solution is too hackish; I would rather see --select-... options for this.

zfs-autobackup -v \
    --no-holds \
    --no-thinning \
    --no-snapshot \
    --other-snapshots \
    --min-change 1 \
    --strip-path=1 \
    --clear-mountpoint \
    --select-recursive=rpool/recursive-source-dataset \
    --select=rpool/some-source-dataset \
    --select=rpool/some-other-source-dataset \
    backupname-does-nothing-here \
    rpool/test-destination 

Have the --select options behave consistently with https://github.com/psy0rz/zfs_autobackup/wiki/Manual#dataset-property

e.g. something like --select, --select-recursive, --select-exclude, --select-child

And perhaps ignore the autobackup property when --select is used, or something like that.

Edwin
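One way to make the options "consistent with the dataset property" is to translate them into the same values the property uses. The resulting map could then be resolved with the same inheritance rules. The mapping below is an assumption: `--select` maps to `parent` (the dataset only), `--select-recursive` to `true`, `--select-child` to `child` and `--select-exclude` to `false`. It also follows the suggested precedence: when any positive `--select*` option is given, the autobackup property is ignored. The function name is hypothetical.

```python
from typing import Dict, Iterable, Optional


def select_to_property_values(
    select: Iterable[str] = (),
    select_recursive: Iterable[str] = (),
    select_child: Iterable[str] = (),
    select_exclude: Iterable[str] = (),
    property_values: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Translate hypothetical --select* options into autobackup-style property values."""
    positive: Dict[str, str] = {}
    for value, names in (
        ("parent", select),          # --select: the dataset only
        ("true", select_recursive),  # --select-recursive: dataset and descendants
        ("child", select_child),     # --select-child: descendants only
    ):
        for name in names:
            positive[name] = value
    # Any positive --select* option replaces the locally set property values.
    values = positive if positive else dict(property_values or {})
    for name in select_exclude:
        values[name] = "false"  # exclusions are applied in either case
    return values


print(select_to_property_values(
    select=["rpool/some-source-dataset"],
    select_recursive=["rpool/recursive-source-dataset"],
    property_values={"rpool/tagged": "true"},
))
# prints {'rpool/some-source-dataset': 'parent', 'rpool/recursive-source-dataset': 'true'}
```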


3 participants