Brian D. Burns edited this page Mar 27, 2013 · 22 revisions

Syncers

Currently supported syncers:

  • RSync
  • Amazon S3
  • Rackspace Cloud Files

Storages vs Syncers

Storages are part of the main Backup procedure, in which the following actions take place:

  • Creating archives (optionally compressed).
  • Creating database backups (optionally compressed).
  • The packaging (tar'ing) of these archives/databases.
  • The optional encrypting of this final backup package.
  • The storing of this backup package.

The last step is what Storages do. Syncers are not part of this procedure, and are run after the above procedure has completed.

A Syncer is used to keep source directories and their contents synchronized with a destination directory. Only the contents of the source directories that have changed are transferred to the destination. The source and destination locations may be on the same physical machine or remote, depending on the Syncer and its configuration.

Note

When a Syncer is added to a backup model, it becomes part of the entire backup process for that model. Therefore, if the backup procedure above completes but a Syncer fails, the backup model will be considered as having failed and you will receive an appropriate Notification.

If you wish to separate this backup procedure more fully from the processing of your Syncer(s), you can set up additional models that only perform your Syncer(s). These can still be run after your backup model has completed by performing multiple triggers.
e.g. backup perform --trigger my_backup,my_syncer
Note that in doing so, you will receive notifications from each model - and notifications for each may be configured differently. Also, should the first trigger/model fail, the second will still be performed, as long as the failure isn't due to a fatal error that causes Backup to exit.
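A syncer-only model, for example, can be as simple as the following sketch (the trigger name :my_syncer and the paths shown are only illustrations):

```ruby
# A model containing no archives or databases -- only a Syncer.
# The main backup procedure has nothing to package or store,
# so running this trigger performs just the sync.
Backup::Model.new(:my_syncer, 'Syncer-only model') do
  sync_with RSync::Local do |rsync|
    rsync.path = "~/backups"
    rsync.directories do |directory|
      directory.add "/var/apps/my_app/public/uploads"
    end
  end
end
```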

See Performing Backups and Technical Overview for more info.

RSync

The RSync Syncer supports 3 types of operations:

  • RSync::Push -- Used to sync folders on the local system to a folder on a remote host.

  • RSync::Pull -- Used to sync folders from a remote host to a folder on the local system.

  • RSync::Local -- Used to sync folders on the local system to another local folder.

Additionally, RSync::Push and RSync::Pull support 3 different modes of operation:

  • :ssh (default) -- Connects to the remote host via SSH and does not require the use of an rsync daemon.

  • :ssh_daemon -- Connects via SSH, then spawns a single-use rsync daemon to allow certain daemon features to be used.

  • :rsync_daemon -- Connects directly to an rsync daemon on the remote host via TCP.

Note that :ssh and :ssh_daemon modes transfer data over an encrypted connection. :rsync_daemon does not.

RSync::Push / RSync::Pull Configuration


The configuration of RSync::Push and RSync::Pull is identical; only the direction of the transfer differs. The following shows all the configuration options, along with an explanation of their use based on the mode of operation.

Backup::Model.new(:my_backup, 'My Backup') do
  sync_with RSync::Push do |rsync| # or: sync_with RSync::Pull do |rsync|
    ##
    # :ssh is the default mode if not specified.
    rsync.mode = :ssh # or :ssh_daemon or :rsync_daemon
    ##
    # May be a hostname or IP address
    rsync.host = "123.45.67.89"
    ##
    # When using :ssh or :ssh_daemon mode, this will be the SSH port (default: 22).
    # When using :rsync_daemon mode, this is the rsync:// port (default: 873).
    rsync.port = 22
    ##
    # When using :ssh or :ssh_daemon mode, this is the remote user name used to connect via SSH.
    # This only needs to be specified if different than the user running Backup.
    #
    # The SSH user must have a passphrase-less SSH key setup to authenticate to the remote host.
    # If this is not desirable, you can provide the path to a specific SSH key for this purpose
    # using SSH's -i option in #additional_ssh_options
    rsync.ssh_user = "ssh_username"
    ##
    # If you need to pass additional options to the SSH command, specify them here.
    # These will be added to the rsync command like so:
    #   rsync -a -e "ssh -p 22 <additional_ssh_options>" ...
    rsync.additional_ssh_options = "-i '/path/to/id_rsa'"
    ##
    # When using :ssh_daemon or :rsync_daemon mode, this is the user used to authenticate to the rsync daemon.
    # This only needs to be specified if different than the user running Backup.
    rsync.rsync_user = "rsync_username"
    ##
    # When using :ssh_daemon or :rsync_daemon mode, if a password is needed to authenticate to the rsync daemon,
    # it may be supplied here. Backup will write this password to a temporary file, then use it with rsync's
    # --password-file option.
    rsync.rsync_password = "my_password"
    # If you prefer to supply the path to your own password file for this option, use:
    rsync.rsync_password_file = "/path/to/password_file"
    ##
    # If you need to pass additional options to the rsync command, specify them here.
    rsync.additional_rsync_options = "--sparse --exclude='some_pattern'"
    ##
    # When set to `true` this adds rsync's --delete option, which causes rsync to remove paths
    # from the destination (rsync.path) that no longer exist in the sources (rsync.directories).
    rsync.mirror   = true
    ##
    # When set to `true`, rsync will compress the data being transferred.
    # Note that this only reduces the amount of data sent.
    # It does not result in compressed files on the destination.
    rsync.compress = true

    ##
    # Configures the directories to be sync'd to the rsync.path.
    #
    # For RSync::Push, these are local paths.
    # Relative paths will be relative to the directory where Backup is being run.
    # These paths are expanded, so '~/this/path' will expand to the $HOME directory of the user running Backup.
    #
    # For RSync::Pull, these are paths on the remote.
    # Relative paths (or paths that start with '~/') will be relative to the directory the `ssh_user` is placed
    # in upon logging in via SSH.
    #
    # Note that while rsync supports the use of trailing `/` on source directories to transfer a directory's
    # "contents" and not create the directory itself at the destination, Backup does not.
    # Trailing `/` will be ignored, and any directory added here will be created at the rsync.path destination.
    rsync.directories do |directory|
      directory.add "/var/apps/my_app/public/uploads"
      directory.add "/var/apps/my_app/logs"
    end

    ##
    # The "destination" path to sync the directories to.
    #
    # For RSync::Push, this will be a path on the remote.
    # Relative paths (or paths that start with '~/') will be relative to the directory the `ssh_user` is placed
    # in upon logging in via SSH.
    #
    # For RSync::Pull, this will be a local path.
    # Relative paths will be relative to the directory where Backup is being run.
    # This path is expanded, so '~/this/path' will expand to the $HOME directory of the user running Backup.
    rsync.path = "backups"
  end
end

RSync::Local Configuration


sync_with RSync::Local do |rsync|
  rsync.path     = "~/backups/"
  rsync.mirror   = true

  rsync.directories do |directory|
    directory.add "/var/apps/my_app/public/uploads"
    directory.add "/var/apps/my_app/logs"
  end
end

With RSync::Local, all operations are local to the machine, where rsync acts as a smart file copy mechanism.

Both path and all paths added to directories will be expanded locally. Relative paths will be relative to the working directory where backup is running. Paths beginning with ~/ will be expanded to the $HOME directory of the user running Backup.

Note that while rsync supports the use of trailing / on source directories to transfer a directory's "contents" and not create the directory itself at the destination, Backup does not. Trailing / will be ignored, and any directory added to rsync.directories will be created at the rsync.path destination.
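For instance, under this rule a trailing slash makes no difference -- both of the following forms create an uploads directory under rsync.path:

```ruby
rsync.directories do |directory|
  # Both forms produce <rsync.path>/uploads at the destination;
  # Backup ignores the trailing '/'.
  directory.add "/var/apps/my_app/public/uploads"
  # directory.add "/var/apps/my_app/public/uploads/"  # equivalent
end
```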

Amazon S3

sync_with Cloud::S3 do |s3|
  s3.access_key_id     = "my_access_key_id"
  s3.secret_access_key = "my_secret_access_key"
  s3.bucket            = "my-bucket"
  s3.region            = "us-east-1"
  s3.path              = "/backups"
  s3.mirror            = true
  s3.concurrency_type  = :threads
  s3.concurrency_level = 50

  s3.directories do |directory|
    directory.add "/path/to/directory/to/sync"
    directory.add "/path/to/other/directory/to/sync"
  end
end

Additional Notes

As of Backup 3.0.22, the s3sync and aproxacs-s3sync CLI utility gems are no longer used. Backup now has its own hand-rolled, high-performance and more stable syncing solution, which uses the fog and parallel gems. These gems are bundled in Backup's dependency manager, so it "just works".

Available regions:

  • ap-northeast-1
  • ap-southeast-1
  • eu-west-1
  • us-east-1
  • us-west-1

NOTE Unlike RSync, which can transfer only the changed portions of individual files, Amazon S3 must transfer a whole file in order to update it. However, Amazon S3 does provide "checksums". These allow Backup to check whether a file you synced up to Amazon S3 has been updated locally since then. If it has not changed, Backup will not re-upload it (and thus save you bandwidth/system resources).

When enabling mirroring (s3.mirror = true) Backup will keep an exact one-on-one mirror from your filesystem on Amazon S3. This means that if you add a file to one of the mirrored directories on your filesystem, it will sync it to Amazon S3. If you remove a file from one of these directories, it will also remove it from Amazon S3.

The directory.add method allows you to add any directories you want to sync from your filesystem to your Amazon S3 account.

You can use concurrency_type and concurrency_level to increase concurrency on file transfers. The recommended concurrency_type is :threads. Threads are cheap to spin up and perform very well for I/O-bound operations such as file transfers, so they are highly encouraged. Processes, on the other hand, are discouraged: they consume considerably more memory and are likely to cause problems on machines without much RAM. Feel free to experiment with the concurrency_level (higher or lower), but we recommend using :threads rather than :processes when doing so.

Each unit of concurrency_level allows one additional file to be handled concurrently. This means that if you set concurrency_level = 50, Backup will perform up to 50 operations at a time. Operations include:

  • Transferring a single file
  • Removing a single file

So when Backup is busy mirroring your specified directories, it will perform 50 transfer/remove operations at the same time. It's very fast!

Rackspace Cloud Files

sync_with Cloud::CloudFiles do |cf|
  cf.username          = "my_username"
  cf.api_key           = "my_api_key"
  cf.container         = "my_container"
  cf.auth_url          = "https://auth.api.rackspacecloud.com"
  cf.servicenet        = false
  cf.path              = "/backups"
  cf.mirror            = true
  cf.concurrency_type  = :threads
  cf.concurrency_level = 50

  cf.directories do |directory|
    directory.add "/path/to/directory/to/sync"
    directory.add "/path/to/other/directory/to/sync"
  end
end

Additional Notes

Available Auth URLs

NOTE Unlike RSync, which can transfer only the changed portions of individual files, Rackspace Cloud Files must transfer a whole file in order to update it. However, Rackspace Cloud Files does provide "checksums". These allow Backup to check whether a file you synced up to Rackspace Cloud Files has been updated locally since then. If it has not changed, Backup will not re-upload it (and thus save you bandwidth/system resources).

Set servicenet = true if Backup runs on a Rackspace server. This avoids transfer charges and improves performance.

When enabling mirroring (cf.mirror = true) Backup will keep an exact one-on-one mirror from your filesystem on Rackspace Cloud Files. This means that if you add a file to one of the mirrored directories on your filesystem, it will sync it to Rackspace Cloud Files. If you remove a file from one of these directories, it will also remove it from Rackspace Cloud Files.

The directory.add method allows you to add any directories you want to sync from your filesystem to your Rackspace Cloud Files account.

You can use concurrency_type and concurrency_level to increase concurrency on file transfers. The recommended concurrency_type is :threads. Threads are cheap to spin up and perform very well for I/O-bound operations such as file transfers, so they are highly encouraged. Processes, on the other hand, are discouraged: they consume considerably more memory and are likely to cause problems on machines without much RAM. Feel free to experiment with the concurrency_level (higher or lower), but we recommend using :threads rather than :processes when doing so.

Each unit of concurrency_level allows one additional file to be handled concurrently. This means that if you set concurrency_level = 50, Backup will perform up to 50 operations at a time. Operations include:

  • Transferring a single file
  • Removing a single file

So when Backup is busy mirroring your specified directories, it will perform 50 transfer/remove operations at the same time. It's very fast!

Default Configuration for the RSync Syncer

If you want to create multiple RSync syncer objects then you might want to configure some default settings to reduce redundancy. For example, if you always want to enable mirroring and compression for RSync syncers, and you always want to use the same server host, port and ssh_user, then you can do the following:

Backup::Syncer::RSync::Push.defaults do |rsync|
  rsync.host     = "123.45.67.89"
  rsync.port     = 22
  rsync.ssh_user = "my_username"
  rsync.mirror   = true
  rsync.compress = true
end

With this in place, whenever you want to use the above default configuration you can just omit it from your backup models, like so:

sync_with RSync::Push do |rsync|
  rsync.path = "~/backups/for_my_other_app"

  rsync.directories do |directory|
    directory.add "/var/apps/my_other_app/public/uploads"
    directory.add "/var/apps/my_other_app/logs"
  end
end

Since we didn't specify the host, port, ssh_user, mirror and compress options, they will default to the values specified in the defaults block.

To set default configuration for S3 or CloudFiles, use Backup::Syncer::Cloud::S3.defaults and Backup::Syncer::Cloud::CloudFiles.defaults.
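For example, a defaults block for the S3 syncer mirrors the model-level options shown earlier (the credentials here are placeholders):

```ruby
Backup::Syncer::Cloud::S3.defaults do |s3|
  s3.access_key_id     = "my_access_key_id"
  s3.secret_access_key = "my_secret_access_key"
  s3.bucket            = "my-bucket"
  s3.region            = "us-east-1"
  s3.mirror            = true
  s3.concurrency_type  = :threads
  s3.concurrency_level = 50
end
```

Individual sync_with Cloud::S3 blocks then only need to set path and the directories to sync.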
