Syncers
Currently supported syncers:
- RSync
- Amazon S3
- Rackspace Cloud Files
Storages are part of the main Backup procedure. The main Backup procedure is the one where the following actions take place:
- Copying files / dumping databases to the `~/Backup/.tmp` directory
- Packaging (tar'ing) the copied/organized files
- (Optionally) compressing the packaged file
- (Optionally) encrypting the packaged file
- Storing the packaged file
The last step is what storages do: store the final result of a backup file to the specified destination.
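For contrast, a storage-based model runs through all of the steps above. A sketch using Backup's DSL (the archive name, paths and credentials here are illustrative placeholders):

```rb
Backup::Model.new(:my_backup, 'My Backup') do
  # Step 1: copy files into the temporary working directory
  archive :app_uploads do |archive|
    archive.add '/var/apps/my_app/public/uploads'
  end

  compress_with Gzip            # optional: compress the packaged file

  encrypt_with OpenSSL do |e|   # optional: encrypt the packaged file
    e.password = 'my_password'
  end

  store_with S3 do |s3|         # store the final result remotely
    s3.access_key_id     = 'my_access_key_id'
    s3.secret_access_key = 'my_secret_access_key'
    s3.bucket            = 'my-bucket'
  end
end
```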
Syncers completely bypass this procedure. They are meant to transfer directories of data directly from the production server to the backup server. This is extremely useful when you have many gigabytes of data to transfer, for example "user-uploaded content" that has built up over time to 50GB worth of images, music, videos or other heavy file formats. With a Syncer you basically say: "Keep a mirror of this directory (/var/apps/my_app/public/music/) on my backup server in (/var/backups/my_app/music)". Then, every time Backup runs this Syncer, it won't copy over 50GB of data, tar it, and then transfer it; it transfers the actual user@production:/var/apps/my_app/public/music/ directory to user@backup:/var/backups/my_app/music. This way no additional disk storage is used on your production box for temporary files (copy, tar, compress, encrypt) before the transfer, a process which can be very CPU intensive, slow and expensive, and can cause your application(s) to slow down in the meantime.
The following examples should be placed in your Backup configuration file.
```rb
Backup::Model.new(:my_backup, 'My Backup') do
  # examples go here...
end
```
```rb
sync_with RSync do |rsync|
  rsync.ip       = "123.45.678.90"
  rsync.port     = 22
  rsync.username = "my_username"
  rsync.password = "my_password"
  rsync.path     = "~/backups/"
  rsync.mirror   = true
  rsync.compress = true
  rsync.additional_options = ['--some-option']

  rsync.directories do |directory|
    directory.add "/var/apps/my_app/public/uploads"
    directory.add "/var/apps/my_app/logs"
  end
end
```
Additional Notes
RSync has the ability to transfer parts of files, rather than the full files when a file updates. For example, say you have a text file of 100KB in size. Now you add another 50 lines of text, increasing the size by 5KB (So now the total is 105KB). Now, the next time Backup gets invoked, it'll see that the file changed, and will only transfer the additional 5KB that got added to the text file, rather than transferring the whole 105KB over again.
The `rsync.mirror` option, when set to `true`, tells RSync to keep an exact mirror of your production box's files on the backup server. This means that when files get removed from the /var/apps/my_app/public/uploads directory, they'll also be removed from the backup server during the next sync. When set to `false`, removed files are ignored and kept on the backup server.
The `rsync.compress` option, when set to `true`, tells RSync to compress the data being transferred. The compression applies only to the transfer itself; it improves transfer speed and lowers bandwidth usage, at the cost of additional CPU usage for compressing. Once the changes have been transferred, the data is automatically uncompressed back to its original state.
The `directory.add` method allows you to add the directories you want to sync from the production server to your backup server. When a path ends with a '/' (forward slash), only the contents (and sub-directories) of that directory are synced. If the provided path does not end with a '/', the directory itself is created on the backup server (thus syncing the whole directory, including its contents and all sub-directories).
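The trailing-slash distinction can be seen side by side; a sketch (the "assets" path below is a hypothetical example, not from the original configuration):

```rb
sync_with RSync do |rsync|
  rsync.path = "~/backups/"

  rsync.directories do |directory|
    # No trailing slash: creates ~/backups/uploads on the backup
    # server and syncs the directory itself into it.
    directory.add "/var/apps/my_app/public/uploads"

    # Trailing slash: syncs only the *contents* of assets directly
    # into ~/backups/, without creating an "assets" directory there.
    directory.add "/var/apps/my_app/public/assets/"
  end
end
```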
```rb
sync_with S3 do |s3|
  s3.access_key_id     = "my_access_key_id"
  s3.secret_access_key = "my_secret_access_key"
  s3.bucket            = "my-bucket"
  s3.region            = "us-east-1"
  s3.path              = "/backups"
  s3.mirror            = true
  s3.concurrency_type  = :threads
  s3.concurrency_level = 50

  s3.directories do |directory|
    directory.add "/path/to/directory/to/sync"
    directory.add "/path/to/other/directory/to/sync"
  end
end
```
Additional Notes
**As of Backup 3.0.22 we no longer use the `s3sync` or `aproxacs-s3sync` CLI utility gems.** Backup now has its own hand-rolled, high-performance and more stable syncing solution, which uses the `fog` and `parallel` gems. These gems are bundled in Backup's dependency manager, so it "just works".
Available regions:
- `ap-northeast-1`
- `ap-southeast-1`
- `eu-west-1`
- `us-east-1`
- `us-west-1`
NOTE Unlike RSync, which has the ability to transfer only parts of individual files, Amazon S3 does not support this and will have to transfer a whole file in order to update it. However, Amazon S3 does provide "checksums". This allows Backup to check if a file you synced up to Amazon S3 has or has not been updated locally since then. In the case that it has not changed, Backup will not re-upload it (and thus, save you bandwidth/system resources).
When enabling mirroring (`s3.mirror = true`), Backup will keep an exact one-on-one mirror of your filesystem on Amazon S3. This means that if you add a file to one of the mirrored directories on your filesystem, it will be synced to Amazon S3. If you remove a file from one of these directories, it will also be removed from Amazon S3.
The `directory.add` method allows you to add any directories you want to sync from your filesystem to your Amazon S3 account.
You can use the `concurrency_type` and `concurrency_level` options to increase concurrency on file transfers. The recommended `concurrency_type` is `:threads`. Threads do not require additional memory to spin up, and for operations such as file transfers they offer excellent parallel performance, so using them is highly encouraged. Processes, on the other hand, are discouraged: they consume a lot of RAM and are likely to cause problems if your machine doesn't have much memory and you spin up a bunch of processes. Feel free to experiment with a higher or lower `concurrency_level`, but we recommend using `:threads` rather than `:processes` when doing so.
Each unit of `concurrency_level` lets Backup handle one additional file concurrently. This means that with `concurrency_level = 50`, Backup will perform 50 operations at a time. Operations include:
- Transferring a single file
- Removing a single file
So when Backup is busy mirroring your specified directories, it will perform 50 transfer/remove operations at the same time. It's very fast!
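A rough illustration of how a concurrency level bounds the number of simultaneous operations. This is plain Ruby with a simple thread pool, not Backup's internals (which use the `parallel` gem); the helper name and dummy operations are made up for the sketch:

```rb
# Drain a queue of operations using at most `concurrency_level`
# worker threads running at the same time.
def process_concurrently(operations, concurrency_level)
  queue   = Queue.new
  results = Queue.new
  operations.each { |op| queue << op }

  workers = Array.new(concurrency_level) do
    Thread.new do
      loop do
        op = begin
          queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break
        end
        results << op.call
      end
    end
  end

  workers.each(&:join)
  Array.new(results.size) { results.pop }
end

# 10 dummy "transfer" operations handled by up to 3 threads at once.
ops  = (1..10).map { |i| -> { "transferred file #{i}" } }
done = process_concurrently(ops, 3)
puts done.size # => 10
```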
```rb
sync_with CloudFiles do |cf|
  cf.username          = "my_username"
  cf.api_key           = "my_api_key"
  cf.container         = "my_container"
  cf.auth_url          = "https://auth.api.rackspacecloud.com"
  cf.servicenet        = false
  cf.path              = "/backups"
  cf.mirror            = true
  cf.concurrency_type  = :threads
  cf.concurrency_level = 50

  cf.directories do |directory|
    directory.add "/path/to/directory/to/sync"
    directory.add "/path/to/other/directory/to/sync"
  end
end
```
Additional Notes
Available Auth URLs
- https://auth.api.rackspacecloud.com (US - Default)
- https://lon.auth.api.rackspacecloud.com (UK)
NOTE Unlike RSync, which has the ability to transfer only parts of individual files, Rackspace Cloud Files does not support this and will have to transfer a whole file in order to update it. However, Rackspace Cloud Files does provide "checksums". This allows Backup to check if a file you synced up to Rackspace Cloud Files has or has not been updated locally since then. In the case that it has not changed, Backup will not re-upload it (and thus, save you bandwidth/system resources).
Set `servicenet = true` if Backup runs on a Rackspace server. This avoids transfer charges and performs better.
When enabling mirroring (`cf.mirror = true`), Backup will keep an exact one-on-one mirror of your filesystem on Rackspace Cloud Files. This means that if you add a file to one of the mirrored directories on your filesystem, it will be synced to Rackspace Cloud Files. If you remove a file from one of these directories, it will also be removed from Rackspace Cloud Files.
The `directory.add` method allows you to add any directories you want to sync from your filesystem to your Rackspace Cloud Files account.
You can use the `concurrency_type` and `concurrency_level` options to increase concurrency on file transfers. The recommended `concurrency_type` is `:threads`. Threads do not require additional memory to spin up, and for operations such as file transfers they offer excellent parallel performance, so using them is highly encouraged. Processes, on the other hand, are discouraged: they consume a lot of RAM and are likely to cause problems if your machine doesn't have much memory and you spin up a bunch of processes. Feel free to experiment with a higher or lower `concurrency_level`, but we recommend using `:threads` rather than `:processes` when doing so.
Each unit of `concurrency_level` lets Backup handle one additional file concurrently. This means that with `concurrency_level = 50`, Backup will perform 50 operations at a time. Operations include:
- Transferring a single file
- Removing a single file
So when Backup is busy mirroring your specified directories, it will perform 50 transfer/remove operations at the same time. It's very fast!
If you want to create multiple RSync syncer objects then you might want to configure some default settings to reduce redundancy. For example, if you always want to enable mirroring and compression for RSync syncers, and you always want to use the same server ip, port, username and password, then you can do the following:
```rb
Backup::Configuration::Syncer::RSync.defaults do |rsync|
  rsync.ip       = "123.45.678.90"
  rsync.port     = 22
  rsync.username = "my_username"
  rsync.password = "my_password"
  rsync.mirror   = true
  rsync.compress = true
end
```
With this in place, whenever you want to use the above default configuration you can simply omit those settings from your backup models, like so:
```rb
sync_with RSync do |rsync|
  rsync.path = "~/backups/for_my_other_app"

  rsync.directories do |directory|
    directory.add "/var/apps/my_other_app/public/uploads"
    directory.add "/var/apps/my_other_app/logs"
  end
end
```
Since we didn't specify the `ip`, `port`, `username`, `password`, `mirror` and `compress` options, they'll default to the values we specified in the configuration block.
To set default configuration for S3, use `Backup::Configuration::Syncer::S3.defaults`.
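Following the same pattern as the RSync defaults above, an S3 defaults block might look like this (the credential values are placeholders):

```rb
Backup::Configuration::Syncer::S3.defaults do |s3|
  s3.access_key_id     = "my_access_key_id"
  s3.secret_access_key = "my_secret_access_key"
  s3.bucket            = "my-bucket"
  s3.mirror            = true
end
```

Any `sync_with S3` block in your models can then omit these options and set only the `path` and directories it needs.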