Syncers
Currently supported syncers:
- RSync
- Amazon S3
- Rackspace Cloud Files
Storages are part of the main Backup procedure, in which the following actions take place:
- Creating archives (optionally compressed).
- Creating database backups (optionally compressed).
- The packaging (tar'ing) of these archives/databases.
- The optional encrypting of this final backup package.
- The storing of this backup package.
The last step is what Storages do. Syncers are not part of this procedure, and are run after the above procedure has completed.
A Syncer is used to keep source directories and their contents synchronized with a destination directory. Only the contents of the source directories that have changed are transferred to the destination. The source and destination locations may be on the same physical machine or remote, depending on the Syncer and its configuration.
When a Syncer is added to a backup model, it becomes part of the entire backup process for that model. Therefore, if the backup procedure above completes but a Syncer fails, the backup model will be considered as having failed and you will receive an appropriate Notification.
If you wish to more fully separate this backup procedure from the processing of your Syncer(s), you can simply set up
additional models that only perform your Syncer(s). These can still be run after your backup model has completed
by performing multiple triggers.
e.g. backup perform --trigger my_backup,my_syncer
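For example, a minimal sketch of a syncer-only model; the :my_syncer trigger name and the paths are illustrative:

Backup::Model.new(:my_syncer, 'My Syncer') do
  # This model contains only a Syncer, so it runs independently
  # of the main backup procedure performed by :my_backup.
  sync_with RSync::Local do |rsync|
    rsync.path = "~/backups/"
    rsync.directories do |directory|
      directory.add "/var/apps/my_app/public/uploads"
    end
  end
end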
Note that in doing so, you will now receive notifications from each - but notifications for each may also be configured
differently. Also, should the first trigger/model fail, the second trigger/model will still be performed - as long as
the failure isn't due to a fatal error that causes Backup to exit.
See Performing Backups and Technical Overview for more info.
RSync
The RSync Syncer supports 3 types of operations:
- RSync::Push -- Used to sync folders on the local system to a folder on a remote host.
- RSync::Pull -- Used to sync folders from a remote host to a folder on the local system.
- RSync::Local -- Used to sync folders on the local system to another local folder.
Additionally, RSync::Push and RSync::Pull support 3 different modes of operation:
- :ssh (default) -- Connects to the remote host via SSH and does not require the use of an rsync daemon.
- :ssh_daemon -- Connects via SSH, then spawns a single-use rsync daemon to allow certain daemon features to be used.
- :rsync_daemon -- Connects directly to an rsync daemon on the remote host via TCP.
Note that the :ssh and :ssh_daemon modes transfer data over an encrypted connection; :rsync_daemon does not.
The configuration of RSync::Push and RSync::Pull is identical; only the direction of the transfer differs. The following shows all the configuration options, along with an explanation of use based on the mode of operation.
Backup::Model.new(:my_backup, 'My Backup') do
  sync_with RSync::Push do |rsync| # or: sync_with RSync::Pull do |rsync|
    ##
    # :ssh is the default mode if not specified.
    rsync.mode = :ssh # or :ssh_daemon or :rsync_daemon
    ##
    # May be a hostname or IP address
    rsync.host = "123.45.678.90"
    ##
    # When using :ssh or :ssh_daemon mode, this will be the SSH port (default: 22).
    # When using :rsync_daemon mode, this is the rsync:// port (default: 873).
    rsync.port = 22
    ##
    # When using :ssh or :ssh_daemon mode, this is the remote user name used to connect via SSH.
    # This only needs to be specified if different than the user running Backup.
    #
    # The SSH user must have a passphrase-less SSH key setup to authenticate to the remote host.
    # If this is not desirable, you can provide the path to a specific SSH key for this purpose
    # using SSH's -i option in #additional_ssh_options
    rsync.ssh_user = "ssh_username"
    ##
    # If you need to pass additional options to the SSH command, specify them here.
    # These will be added to the rsync command like so:
    #   rsync -a -e "ssh -p 22 <additional_ssh_options>" ...
    rsync.additional_ssh_options = "-i '/path/to/id_rsa'"
    ##
    # When using :ssh_daemon or :rsync_daemon mode, this is the user used to authenticate to the rsync daemon.
    # This only needs to be specified if different than the user running Backup.
    rsync.rsync_user = "rsync_username"
    ##
    # When using :ssh_daemon or :rsync_daemon mode, if a password is needed to authenticate to the rsync daemon,
    # it may be supplied here. Backup will write this password to a temporary file, then use it with rsync's
    # --password-file option.
    rsync.rsync_password = "my_password"
    # If you prefer to supply the path to your own password file for this option, use:
    rsync.rsync_password_file = "/path/to/password_file"
    ##
    # If you need to pass additional options to the rsync command, specify them here.
    rsync.additional_rsync_options = "--sparse --exclude='some_pattern'"
    ##
    # When set to `true` this adds rsync's --delete option, which causes rsync to remove paths
    # from the destination (rsync.path) that no longer exist in the sources (rsync.directories).
    rsync.mirror = true
    ##
    # When set to `true`, rsync will compress the data being transferred.
    # Note that this only reduces the amount of data sent.
    # It does not result in compressed files on the destination.
    rsync.compress = true
    ##
    # Configures the directories to be sync'd to the rsync.path.
    #
    # For RSync::Push, these are local paths.
    # Relative paths will be relative to the directory where Backup is being run.
    # These paths are expanded, so '~/this/path' will expand to the $HOME directory of the user running Backup.
    #
    # For RSync::Pull, these are paths on the remote.
    # Relative paths (or paths that start with '~/') will be relative to the directory the `ssh_user` is placed
    # in upon logging in via SSH.
    #
    # Note that while rsync supports the use of trailing `/` on source directories to transfer a directory's
    # "contents" and not create the directory itself at the destination, Backup does not.
    # Trailing `/` will be ignored, and any directory added here will be created at the rsync.path destination.
    rsync.directories do |directory|
      directory.add "/var/apps/my_app/public/uploads"
      directory.add "/var/apps/my_app/logs"
    end
    ##
    # The "destination" path to sync the directories to.
    #
    # For RSync::Push, this will be a path on the remote.
    # Relative paths (or paths that start with '~/') will be relative to the directory the `ssh_user` is placed
    # in upon logging in via SSH.
    #
    # For RSync::Pull, this will be a local path.
    # Relative paths will be relative to the directory where Backup is being run.
    # This path is expanded, so '~/this/path' will expand to the $HOME directory of the user running Backup.
    rsync.path = "backups"
  end
end
The RSync::Local syncer is configured as follows:
sync_with RSync::Local do |rsync|
  rsync.path = "~/backups/"
  rsync.mirror = true
  rsync.directories do |directory|
    directory.add "/var/apps/my_app/public/uploads"
    directory.add "/var/apps/my_app/logs"
  end
end
With RSync::Local, all operations are local to the machine, where rsync acts as a smart file copy mechanism.
Both path and all paths added to directories will be expanded locally. Relative paths will be relative to the working directory where Backup is running. Paths beginning with ~/ will be expanded to the $HOME directory of the user running Backup.
Note that while rsync supports the use of trailing / on source directories to transfer a directory's "contents" and not create the directory itself at the destination, Backup does not. Trailing / will be ignored, and any directory added to rsync.directories will be created at the rsync.path destination.
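For example, under this rule the following two entries behave identically; this sketch assumes the directories exist locally:

rsync.directories do |directory|
  directory.add "/var/apps/my_app/logs"
  directory.add "/var/apps/my_app/logs/" # trailing '/' is ignored; 'logs' is still created at rsync.path
end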
Amazon S3
sync_with Cloud::S3 do |s3|
  s3.access_key_id     = "my_access_key_id"
  s3.secret_access_key = "my_secret_access_key"
  s3.bucket            = "my-bucket"
  s3.region            = "us-east-1"
  s3.path              = "/backups"
  s3.mirror            = true
  s3.concurrency_type  = :threads
  s3.concurrency_level = 50
  s3.directories do |directory|
    directory.add "/path/to/directory/to/sync"
    directory.add "/path/to/other/directory/to/sync"
  end
end
Additional Notes
AS OF BACKUP 3.0.22 WE NO LONGER USE THE s3sync NOR aproxacs-s3sync CLI UTILITY GEMS. BACKUP NOW HAS ITS OWN HAND-ROLLED, HIGH-PERFORMANCE AND MORE STABLE SYNCING SOLUTION WHICH USES THE fog AND parallel GEMS. THESE GEMS ARE BUNDLED IN BACKUP'S DEPENDENCY MANAGER SO IT "JUST WORKS".
Available regions:
- ap-northeast-1
- ap-southeast-1
- eu-west-1
- us-east-1
- us-west-1
NOTE Unlike RSync, which has the ability to transfer only parts of individual files, Amazon S3 does not support this and will have to transfer a whole file in order to update it. However, Amazon S3 does provide "checksums". This allows Backup to check if a file you synced up to Amazon S3 has or has not been updated locally since then. In the case that it has not changed, Backup will not re-upload it (and thus, save you bandwidth/system resources).
When mirroring is enabled (s3.mirror = true), Backup will keep an exact one-to-one mirror of your directories on Amazon S3. This means that if you add a file to one of the mirrored directories on your filesystem, it will be synced to Amazon S3, and if you remove a file from one of these directories, it will also be removed from Amazon S3.
The directory.add method allows you to add any directories you want to sync from your filesystem to your Amazon S3 account.
You can use the concurrency_type and concurrency_level options to increase concurrency on file transfers. The recommended concurrency_type is :threads. Threads do not require additional memory to spin up and give excellent parallel performance for operations such as file transfers, so they are highly encouraged. Processes, on the other hand, are discouraged: they consume a lot of RAM and are likely to cause problems if your machine doesn't have much memory and you spin up a bunch of processes. Feel free to experiment with a higher or lower concurrency_level, but we recommend using :threads rather than :processes when doing so.
Each unit of concurrency_level handles one additional file concurrently. This means that if you set concurrency_level = 50, Backup will perform 50 operations at a time. Operations include:
- Transferring a single file
- Removing a single file
So when Backup is busy mirroring your specified directories, it will perform 50 transfer/remove operations at the same time. It's very fast!
Rackspace Cloud Files
sync_with Cloud::CloudFiles do |cf|
  cf.username          = "my_username"
  cf.api_key           = "my_api_key"
  cf.container         = "my_container"
  cf.auth_url          = "https://auth.api.rackspacecloud.com"
  cf.servicenet        = false
  cf.path              = "/backups"
  cf.mirror            = true
  cf.concurrency_type  = :threads
  cf.concurrency_level = 50
  cf.directories do |directory|
    directory.add "/path/to/directory/to/sync"
    directory.add "/path/to/other/directory/to/sync"
  end
end
Additional Notes
Available Auth URLs
- https://auth.api.rackspacecloud.com (US - Default)
- https://lon.auth.api.rackspacecloud.com (UK)
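For example, a UK-based account would set the auth_url option shown in the configuration above to the UK endpoint:

cf.auth_url = "https://lon.auth.api.rackspacecloud.com"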
NOTE Unlike RSync, which has the ability to transfer only parts of individual files, Rackspace Cloud Files does not support this and will have to transfer a whole file in order to update it. However, Rackspace Cloud Files does provide "checksums". This allows Backup to check if a file you synced up to Rackspace Cloud Files has or has not been updated locally since then. In the case that it has not changed, Backup will not re-upload it (and thus, save you bandwidth/system resources).
Set servicenet = true if Backup runs on a Rackspace server. This avoids transfer charges and is more performant.
When mirroring is enabled (cf.mirror = true), Backup will keep an exact one-to-one mirror of your directories on Rackspace Cloud Files. This means that if you add a file to one of the mirrored directories on your filesystem, it will be synced to Rackspace Cloud Files, and if you remove a file from one of these directories, it will also be removed from Rackspace Cloud Files.
The directory.add method allows you to add any directories you want to sync from your filesystem to your Rackspace Cloud Files account.
The concurrency_type and concurrency_level options behave exactly as described for the Amazon S3 syncer above: :threads is recommended over :processes, and each unit of concurrency_level lets Backup transfer or remove one additional file concurrently.
If you want to create multiple RSync syncer objects then you might want to configure some default settings to reduce redundancy. For example, if you always want to enable mirroring and compression for RSync syncers, and you always want to use the same server host, port and ssh_user, then you can do the following:
Backup::Syncer::RSync::Push.defaults do |rsync|
  rsync.host     = "123.45.678.90"
  rsync.port     = 22
  rsync.ssh_user = "my_username"
  rsync.mirror   = true
  rsync.compress = true
end
With this in place, whenever you want to use the above default configuration, you can simply omit those options from your backup models, like so:
sync_with RSync::Push do |rsync|
  rsync.path = "~/backups/for_my_other_app"
  rsync.directories do |directory|
    directory.add "/var/apps/my_other_app/public/uploads"
    directory.add "/var/apps/my_other_app/logs"
  end
end
Since we didn't specify the host, port, ssh_user, mirror and compress options, they default to the values specified in the defaults block.
To set default configuration for S3 or CloudFiles, use Backup::Syncer::Cloud::S3.defaults and Backup::Syncer::Cloud::CloudFiles.defaults.
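For example, a minimal sketch of setting S3 syncer defaults, reusing the options shown earlier (the credential values are placeholders):

Backup::Syncer::Cloud::S3.defaults do |s3|
  # Placeholder credentials -- substitute your own.
  s3.access_key_id     = "my_access_key_id"
  s3.secret_access_key = "my_secret_access_key"
  s3.region            = "us-east-1"
  s3.mirror            = true
end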