Document support for different backup / sync scenarios #189

geek-merlin opened this issue Jan 12, 2021 · 0 comments
To keep this meta-ticket focused, please discuss specifics in their respective child issues. Thanks!

To clarify: probably most of what is needed here will not be implemented in this package, but hopefully added to the docs, maybe as pointers to other packages. Nevertheless, we should use this issue as a central place to coordinate. I'll keep the issue summary here updated with new info and subtickets.

Scenario backup-to-local

I want to back up my local files to a local disk.

A simple example is packaged with rdedup.

Next steps (see #181):

  • Make backup a self-contained command that uses env vars from a config file
  • Include auto-prune with configurable patterns (see prune command #176)
  • Add commands to list files per backup, and restore backups partially
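The first step above could be sketched roughly like this: a minimal, hypothetical Python wrapper that reads env vars from a config file and composes a self-contained backup command. The `backup.conf` format, the `BACKUP_SOURCE` variable, and the `tar | rdedup store` pipeline are illustrative assumptions, not an existing interface.

```python
import shlex

def load_config(path):
    """Parse simple KEY=VALUE lines into a dict (blank lines and '#' comments skipped)."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

def build_backup_command(env, backup_name):
    """Compose a tar | store pipeline; the 'rdedup store' invocation is illustrative."""
    src = env["BACKUP_SOURCE"]     # assumed config key
    repo = env["RDEDUP_DIR"]       # assumed config key
    return (f"tar -cf - {shlex.quote(src)} | "
            f"RDEDUP_DIR={shlex.quote(repo)} rdedup store {shlex.quote(backup_name)}")
```

A real command would then run this pipeline (and later the prune step) instead of just composing the string.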

Scenario backup-to-cloud

I want to back up my local files directly to cloud storage.

Next steps:

Variant (backup-to-local)-and-sync-to-cloud

I want my local backup mirrored on cloud storage.

For a single backup, this should work well via rclone (rsync, syncthing, or the nextcloud client should do fine, too).
Garbage collection happens on the machine holding the backup, and the deleted chunks must then be deleted in the cloud copy, too.
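The key property of the mirroring step is deletion propagation. rclone and rsync provide this for real; the toy `mirror` below only illustrates why it matters for GC: once local garbage collection removes chunks, the next sync must remove them from the cloud copy as well.

```python
import shutil
from pathlib import Path

def mirror(src: Path, dst: Path):
    """One-way sync: copy files to dst and delete files that no longer exist
    in src (the moral equivalent of `rclone sync` / `rsync --delete`).
    After local GC removes chunks, this propagates the removals to the mirror."""
    dst.mkdir(parents=True, exist_ok=True)
    src_files = {p.relative_to(src) for p in src.rglob("*") if p.is_file()}
    dst_files = {p.relative_to(dst) for p in dst.rglob("*") if p.is_file()}
    for rel in src_files:              # copy (toy: unconditionally; real tools compare checksums/mtimes)
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src / rel, target)
    for rel in dst_files - src_files:  # propagate deletions (GC'd chunks)
        (dst / rel).unlink()
```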

Variant (backup-to-cloud|backup-to-local-and-sync-to-cloud)-with-shared-cloud

I want to back up several machines (which likely share some identical files) to the cloud.

This is safe, as chunks are content-addressed, and as long as backups are namespaced by machine.
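Content addressing is what makes sharing safe: identical file contents hash to the same chunk key on every machine, so the shared store keeps only one copy, while backup metadata stays under a per-machine namespace. A toy illustration (the `backups/<machine>/latest` layout is just an assumption for the sketch):

```python
import hashlib

def chunk_key(data: bytes) -> str:
    """Content address: the key is derived from the data alone."""
    return hashlib.sha256(data).hexdigest()

# Two machines back up an identical file: it lands under the same key,
# so the shared store deduplicates it automatically.
store = {}
for machine in ("alice", "bob"):
    data = b"identical file contents"
    store[chunk_key(data)] = data                          # chunk space is shared
    store[f"backups/{machine}/latest"] = chunk_key(data)   # names are namespaced
```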

GC is a challenge, though: the garbage collection process(es) need to take a lock on the shared cloud storage. So we must ensure either that only one machine ever does GC, or that locking is done in the cloud storage itself (which is nontrivial if backups are merely copied to the cloud!).

Todo:

  • Figure out and document how to handle safe locking in the backup-to-local-and-sync-to-cloud-with-shared-cloud case, or whether it is reasonably possible at all.
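One conservative approach (an assumption for discussion, not something the package provides): treat a lock object in the shared store as a lease, and let GC proceed only if it can create that object exclusively. A local-filesystem sketch using atomic `O_EXCL` creation; on real cloud storage this needs an atomic create-if-absent primitive, which not every backend offers, hence the open question above.

```python
import os
import time
from pathlib import Path

class GCLock:
    """Best-effort exclusive lock via O_EXCL creation of a lock object."""
    def __init__(self, repo: Path, ttl: float = 3600.0):
        self.path = repo / "gc.lock"   # illustrative lock object name
        self.ttl = ttl

    def acquire(self) -> bool:
        # Break stale locks left behind by crashed GC runs.
        try:
            if time.time() - self.path.stat().st_mtime > self.ttl:
                self.path.unlink()
        except FileNotFoundError:
            pass
        try:
            # O_CREAT|O_EXCL fails atomically if the lock already exists.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return True
        except FileExistsError:
            return False

    def release(self):
        self.path.unlink(missing_ok=True)
```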

Feature: -with-shared-cloud-and-syncing

Once we have multiple machines backed up to shared cloud, we can leverage this to sync configurable parts of the machines' file system.

Todo:

  • Ensure that backups store extended attributes (e.g. by using a suitable tar option)
  • Implement distributed versioning via vector-clock xattrs for all files. (This effectively adds a per-machine version number to each file, incremented whenever a changed file is backed up. We can then check on which machines a file was changed, in particular whether there are edit conflicts, and act accordingly.)
  • As a first step, add the vector-clock logic and a "check remote copies" command that reports the local and remote change status of a file.
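The comparison such a "check remote copies" command would need can be sketched with standard vector-clock logic (pure illustration; nothing like this exists in the package yet):

```python
def bump(clock: dict, machine: str) -> dict:
    """Increment this machine's component when a changed file is backed up."""
    clock = dict(clock)
    clock[machine] = clock.get(machine, 0) + 1
    return clock

def compare(a: dict, b: dict) -> str:
    """Return 'equal', 'a<b', 'a>b', or 'conflict' (concurrent edits)."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "a<b"     # b strictly newer: safe to take the remote copy
    if b_le_a:
        return "a>b"     # a strictly newer: safe to push the local copy
    return "conflict"    # edited on different machines: needs resolution
```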

Variant (backup|sync)-with-no-delete-permissions

For security, we grant the backup (or sync) process only read and write permissions, but no delete permission, on the cloud storage. (So even if the machine being backed up is compromised, existing backups are safe.)
A separate, dedicated machine holds credentials with delete permission and performs pruning and garbage collection.

Todo:

  • Figure out whether the backup process can do without delete permissions (what about locking?). => How to do backup to add-only cloud storage #188
  • Figure out and document how separate processes for backup, sync, and GC can work together in a safe way.
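The intended permission split can be modeled as a capability wrapper (a hypothetical API; real enforcement would live in cloud IAM policies, not client code): the backup process only ever holds a handle without `delete`, while the dedicated GC machine holds the wider one.

```python
class AddOnlyStore:
    """Read/write view of a chunk store: what the backup process gets."""
    def __init__(self, backend: dict):
        self._backend = backend

    def put(self, key: str, data: bytes):
        # Re-writing a content-addressed chunk is harmless
        # (same key implies same data), so put is allowed.
        self._backend[key] = data

    def get(self, key: str) -> bytes:
        return self._backend[key]

class GCStore(AddOnlyStore):
    """Wider view with delete: only the dedicated prune/GC machine gets this."""
    def delete(self, key: str):
        del self._backend[key]
```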