Syncers

Currently supported syncers:

  • RSync
  • Amazon S3
  • Rackspace Cloud Files

Storages vs Syncers

Storages are part of the main Backup procedure, in which the following actions take place:

  • Creating archives (optionally compressed).
  • Creating database backups (optionally compressed).
  • Packaging (tar'ing) these archives/databases.
  • Optionally encrypting the final backup package.
  • Storing the final backup package.

The last step is what Storages do. Syncers are not part of this procedure, and are run after the above procedure has completed.

A Syncer is used to keep source directories and their contents synchronized with a destination directory. Only the contents of the source directories that have changed are transferred to the destination. The source and destination locations may be on the same physical machine or remote, depending on the Syncer and its configuration.

Note

When a Syncer is added to a backup model, it becomes part of the entire backup process for that model. Therefore, if the backup procedure above completes but a Syncer fails, that backup model will be considered as having failed and you will receive an appropriate Notification.

If you wish to more fully separate this backup procedure from the processing of your Syncer(s), you can set up additional models that only perform your Syncer(s); a sketch of such a model follows below. These can still be run after your backup model has completed by performing multiple triggers.
e.g. backup perform --trigger my_backup,my_syncer
Note that in doing so, you will receive notifications from each model - though notifications for each may be configured differently. Also, should the first trigger/model fail, the second will still be performed, as long as the failure isn't due to a fatal error that causes Backup to exit.
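
For example, a syncer-only model might look like the following. This is a minimal sketch; the :my_syncer trigger and the directory shown are only placeholders.

Model.new(:my_syncer, 'Syncer-only Model') do
  sync_with RSync::Local do |rsync|
    rsync.path = "~/backups"
    rsync.directories do |directory|
      directory.add "/var/apps/my_app/public/uploads"
    end
  end
end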

See Performing Backups and Technical Overview for more info.

RSync

The RSync Syncer supports 3 types of operations:

  • RSync::Push -- Used to sync folders on the local system to a folder on a remote host.

  • RSync::Pull -- Used to sync folders from a remote host to a folder on the local system.

  • RSync::Local -- Used to sync folders on the local system to another local folder.

Additionally, RSync::Push and RSync::Pull support 3 different modes of operation:

  • :ssh (default) -- Connects to the remote host via SSH and does not require the use of an rsync daemon.

  • :ssh_daemon -- Connects via SSH, then spawns a single-use rsync daemon to allow certain daemon features to be used.

  • :rsync_daemon -- Connects directly to an rsync daemon on the remote host via TCP.

Note that :ssh and :ssh_daemon modes transfer data over an encrypted connection. :rsync_daemon does not.
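
For example, an :ssh_daemon setup uses ssh_user to authenticate the SSH connection, while rsync_user and rsync_password authenticate to the spawned rsync daemon. This is only a sketch; the host and credentials are placeholders.

sync_with RSync::Push do |rsync|
  rsync.mode           = :ssh_daemon
  rsync.host           = "my-host.example.com"
  rsync.ssh_user       = "ssh_username"    # authenticates the SSH connection
  rsync.rsync_user     = "rsync_username"  # authenticates to the rsync daemon
  rsync.rsync_password = "my_password"
end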

RSync::Push / RSync::Pull Configuration

The configuration of RSync::Push and RSync::Pull is identical; only the direction of the transfer differs. The following shows all the configuration options, along with an explanation of their use based on the mode of operation.

Model.new(:my_backup, 'My Backup') do
  sync_with RSync::Push do |rsync| # or: sync_with RSync::Pull do |rsync|
    ##
    # :ssh is the default mode if not specified.
    rsync.mode = :ssh # or :ssh_daemon or :rsync_daemon
    ##
    # May be a hostname or IP address
    rsync.host = "123.45.678.90"
    ##
    # When using :ssh or :ssh_daemon mode, this will be the SSH port (default: 22).
    # When using :rsync_daemon mode, this is the rsync:// port (default: 873).
    rsync.port = 22
    ##
    # When using :ssh or :ssh_daemon mode, this is the remote user name used to connect via SSH.
    # This only needs to be specified if different than the user running Backup.
    #
    # The SSH user must have a passphrase-less SSH key setup to authenticate to the remote host.
    # If this is not desirable, you can provide the path to a specific SSH key for this purpose
    # using SSH's -i option in #additional_ssh_options
    rsync.ssh_user = "ssh_username"
    ##
    # If you need to pass additional options to the SSH command, specify them here.
    # Options may be given as a String (as shown) or an Array (see additional_rsync_options).
    # These will be added to the rsync command like so:
    #   rsync -a -e "ssh -p 22 <additional_ssh_options>" ...
    rsync.additional_ssh_options = "-i '/path/to/id_rsa'"
    ##
    # When using :ssh_daemon or :rsync_daemon mode, this is the user used to authenticate to the rsync daemon.
    # This only needs to be specified if different than the user running Backup.
    rsync.rsync_user = "rsync_username"
    ##
    # When using :ssh_daemon or :rsync_daemon mode, if a password is needed to authenticate to the rsync daemon,
    # it may be supplied here. Backup will write this password to a temporary file, then use it with rsync's
    # --password-file option.
    rsync.rsync_password = "my_password"
    # If you prefer to supply the path to your own password file for this option, use:
    rsync.rsync_password_file = "/path/to/password_file"
    ##
    # If you need to pass additional options to the rsync command, specify them here.
    # Options may be given as an Array (as shown) or as a String (see additional_ssh_options).
    rsync.additional_rsync_options = ['--sparse', "--exclude='some_pattern'"]
    ##
    # When set to `true` this adds rsync's --delete option, which causes rsync to remove paths
    # from the destination (rsync.path) that no longer exist in the sources (rsync.directories).
    rsync.mirror   = true
    ##
    # When set to `true`, rsync will compress the data being transferred.
    # Note that this only reduces the amount of data sent.
    # It does not result in compressed files on the destination.
    rsync.compress = true

    ##
    # Configures the directories to be sync'd to the rsync.path.
    #
    # For RSync::Push, these are local paths.
    # Relative paths will be relative to the directory where Backup is being run.
    # These paths are expanded, so '~/this/path' will expand to the $HOME directory of the user running Backup.
    #
    # For RSync::Pull, these are paths on the remote.
    # Relative paths (or paths that start with '~/') will be relative to the directory the `ssh_user` is placed
    # in upon logging in via SSH.
    #
    # Note that while rsync supports the use of trailing `/` on source directories to transfer a directory's
    # "contents" and not create the directory itself at the destination, Backup does not.
    # Trailing `/` will be ignored, and any directory added here will be created at the rsync.path destination.
    rsync.directories do |directory|
      directory.add "/var/apps/my_app/public/uploads"
      directory.add "/var/apps/my_app/logs"

      # Exclude files/folders.
      # Each pattern will be passed to rsync's `--exclude` option.
      #
      # Note: rsync is run using the `--archive` option,
      #       so be sure to read the `FILTER RULES` in `man rsync`.
      directory.exclude '*~'
      directory.exclude 'tmp/'
    end

    ##
    # The "destination" path to sync the directories to.
    #
    # For RSync::Push, this will be a path on the remote.
    # Relative paths (or paths that start with '~/') will be relative to the directory the `ssh_user` is placed
    # in upon logging in via SSH.
    #
    # For RSync::Pull, this will be a local path.
    # Relative paths will be relative to the directory where Backup is being run.
    # This path is expanded, so '~/this/path' will expand to the $HOME directory of the user running Backup.
    rsync.path = "backups"
  end
end

RSync::Local Configuration

sync_with RSync::Local do |rsync|
  rsync.path     = "~/backups/"
  rsync.mirror   = true

  rsync.directories do |directory|
    directory.add "/var/apps/my_app/public/uploads"
    directory.add "/var/apps/my_app/logs"

    # Exclude files/folders.
    # Each pattern will be passed to rsync's `--exclude` option.
    # rsync is run using the `--archive` option, so be sure to read the `FILTER RULES` in `man rsync`.
    directory.exclude '*~'
    directory.exclude 'tmp/'
  end
end

With RSync::Local, all operations are local to the machine; rsync simply acts as a smart file-copy mechanism.

Both path and all paths added to directories will be expanded locally. Relative paths will be relative to the working directory where Backup is running. Paths beginning with ~/ will be expanded to the $HOME directory of the user running Backup.

Note that while rsync supports the use of trailing / on source directories to transfer a directory's "contents" and not create the directory itself at the destination, Backup does not. Trailing / will be ignored, and any directory added to rsync.directories will be created at the rsync.path destination.

Cloud Syncers

Supported Cloud Services

  • Amazon S3
  • Rackspace Cloud Files

Unlike the RSync Syncer, which can transfer only the changed portions of individual files, Cloud Syncers check the MD5 checksum of each local file and transfer the entire file if the checksum of the remote copy does not match.
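
As a rough sketch of that decision - not Backup's actual implementation - the comparison amounts to something like:

require 'digest/md5'

# Hypothetical illustration: upload only when the local file's MD5
# differs from the checksum recorded for the remote object.
def needs_upload?(local_file, remote_md5)
  Digest::MD5.file(local_file).hexdigest != remote_md5
end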

Mirroring

When a Cloud Syncer's mirror option is set to true, Backup will remove all files from the remote that do not exist locally. File removal is performed after all updated files have been transferred, using bulk delete requests to minimize the number of requests made to the remote.

Concurrency

Cloud Syncers may perform several concurrent file transfers by setting the Syncer's thread_count. This allows for greater performance, especially when transferring many small files where more time is spent negotiating with the server than actually transferring data.
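
For example, to allow 10 concurrent transfers:

syncer.thread_count = 10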

Error Handling

Each file transfer will be retried if an error occurs. By default, each failed transfer will be retried 10 times, pausing 30 seconds before each retry. These defaults may be changed using:

syncer.max_retries = 10
syncer.retry_waitsec = 30

When an error occurs that causes Backup to retry the request, the error will be logged. Note that these messages will be logged as informational messages, so they will not generate warnings. If max_retries is exceeded, then an error will be raised and the Syncer will fail.

If mirror is enabled, the file deletion requests will be retried as well. However, if max_retries is exceeded for this operation, it will be logged as a warning.

Data Integrity

All files are uploaded along with an MD5 checksum, which the server uses to verify the data received. If the integrity check fails, the error will be handled as stated above and the file will be retransmitted.

Amazon S3

sync_with Cloud::S3 do |s3|
  # AWS Credentials
  s3.access_key_id     = "my_access_key_id"
  s3.secret_access_key = "my_secret_access_key"
  # Or, to use an IAM Profile:
  # s3.use_iam_profile = true

  s3.bucket            = "my-bucket"
  s3.region            = "us-east-1"
  s3.path              = "/backups"
  s3.mirror            = true
  s3.thread_count      = 10

  s3.directories do |directory|
    directory.add "/path/to/directory/to/sync"
    directory.add "/path/to/other/directory/to/sync"

    # Exclude files/folders.
    # The pattern may be a shell glob pattern (see `File.fnmatch`) or a Regexp.
    # All patterns will be applied when traversing each added directory.
    directory.exclude '**/*~'
    directory.exclude /\/tmp$/
  end
end

AWS Regions

  • us-east-1 - US Standard (Default)
  • us-west-2 - US West (Oregon)
  • us-west-1 - US West (Northern California)
  • eu-west-1 - EU (Ireland)
  • ap-southeast-1 - Asia Pacific (Singapore)
  • ap-southeast-2 - Asia Pacific (Sydney)
  • ap-northeast-1 - Asia Pacific (Tokyo)
  • sa-east-1 - South America (São Paulo)

File Size Limit

The maximum file size that can be transferred is 5 GiB. If a file is encountered that exceeds this limit, it will be skipped and a warning will be logged. Unlike the S3 Storage, Multipart Uploading is not used. Keep in mind that if a failure occurs and a retry is attempted, the entire file is re-transmitted.
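
If you know in advance which files exceed this limit, one workaround is to exclude them by pattern using the syncer's exclude support. The patterns below are only examples:

sync_with Cloud::S3 do |s3|
  s3.directories do |directory|
    directory.add "/path/to/directory/to/sync"
    # Skip file types known to exceed 5 GiB (example patterns)
    directory.exclude '**/*.iso'
    directory.exclude '**/*.vmdk'
  end
end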

Server-Side Encryption

You may configure your AWS S3 stored files to use Server-Side Encryption by adding the following:

sync_with Cloud::S3 do |s3|
  s3.encryption = :aes256
end

Reduced Redundancy Storage

You may configure your AWS S3 stored files to use Reduced Redundancy Storage by adding the following:

sync_with Cloud::S3 do |s3|
  s3.storage_class = :reduced_redundancy
end

Fog Options

If you need to pass additional options for fog, you can specify those using fog_options.

sync_with Cloud::S3 do |s3|
  s3.fog_options = {
    :path_style => true,
    :persistent => true,
    :connection_options => { :tcp_nodelay => true } # these are Excon options
  }
end

These options will be merged into those used to establish the connection via fog.
e.g. Fog::Storage.new({ :provider => 'AWS'}.merge(fog_options))

Rackspace Cloud Files

sync_with Cloud::CloudFiles do |cf|
  cf.username          = "my_username"
  cf.api_key           = "my_api_key"
  cf.container         = "my_container"
  cf.path              = "/backups"
  cf.mirror            = true
  cf.thread_count      = 10

  cf.directories do |directory|
    directory.add "/path/to/directory/to/sync"
    directory.add "/path/to/other/directory/to/sync"

    # Exclude files/folders from the sync.
    # The pattern may be a shell glob pattern (see `File.fnmatch`) or a Regexp.
    # All patterns will be applied when traversing each added directory.
    directory.exclude '**/*~'
    directory.exclude /\/tmp$/
  end
end

File Size Limit

The maximum file size that can be transferred is 5 GiB. If a file is encountered that exceeds this limit, it will be skipped and a warning will be logged. Unlike the CloudFiles Storage, SLO (Static Large Object) support is not available. Keep in mind that if a failure occurs and a retry is attempted, the entire file is re-transmitted.

Endpoints and Regions

By default, the US endpoint identity.api.rackspacecloud.com/v2.0 will be used. If you need to use another endpoint, specify the auth_url:

sync_with Cloud::CloudFiles do |cf|
  cf.auth_url = 'lon.identity.api.rackspacecloud.com/v2.0'
end

The default region is :dfw (Dallas). You may specify another region using:

sync_with Cloud::CloudFiles do |cf|
  cf.region = :ord # Chicago
end

If Backup is running on a Rackspace Cloud Server in the same data center as your Cloud Files server, you can enable the use of Rackspace's ServiceNet to avoid bandwidth charges by setting:

sync_with Cloud::CloudFiles do |cf|
  cf.servicenet = true
end

Fog Options

If you need to pass additional options for fog, you can specify those using fog_options.

sync_with Cloud::CloudFiles do |cf|
  cf.fog_options = {
    :persistent => true,
    :connection_options => { :tcp_nodelay => true } # these are Excon options
  }
end

These options will be merged into those used to establish the connection via fog.
e.g. Fog::Storage.new({ :provider => 'Rackspace'}.merge(fog_options))

Syncer ID

When you add a Syncer to your Backup Model, you may optionally add a unique identifier.

sync_with RSync::Push, 'Syncer #1' do |rsync|
  # etc...
end

This syncer_id will appear in the log messages when the Syncer starts and finishes:

Syncer::RSync::Push (Syncer #1) Started...
...etc...
Syncer::RSync::Push (Syncer #1) Finished!

This is not particularly important for Syncers and is currently only used for the log messages. It exists mainly for consistency, so that all components that may be added multiple times to a single Backup Model can uniquely identify themselves. For instance, Storages require this to keep Cycling data separate, and Databases use it to keep their backup dumps separate.
