Syncers
Currently supported syncers:
- RSync
- Amazon S3
- Rackspace Cloud Files
Storages are part of the main Backup procedure. The main Backup procedure is the one where the following actions take place:
- Copying files / dumping databases to the `~/Backup/.tmp` directory
- Packaging (tar'ing) the copied/organized files
- (Optionally) compressing the packaged file
- (Optionally) encrypting the packaged file
- Storing the packaged file
The last step is what storages do: store the final result of a backup file to the specified destination.
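For contrast, a storage-based model runs through all of the steps above. A sketch using Backup's DSL (the archive name, paths and credentials here are illustrative placeholders):

```rb
Backup::Model.new(:my_backup, 'My Backup') do
  # Step 1: copy files into the temporary working directory
  archive :app_uploads do |archive|
    archive.add '/var/apps/my_app/public/uploads'
  end

  compress_with Gzip            # optional: compress the packaged file

  encrypt_with OpenSSL do |e|   # optional: encrypt the packaged file
    e.password = 'my_password'
  end

  store_with S3 do |s3|         # store the final result remotely
    s3.access_key_id     = 'my_access_key_id'
    s3.secret_access_key = 'my_secret_access_key'
    s3.bucket            = 'my-bucket'
  end
end
```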
Syncers completely bypass this procedure. They are meant to transfer directories of data directly from the production server to the backup server. This is extremely useful when you have many gigabytes of data to transfer, for example "user-uploaded content" that has built up over time to 50GB worth of images, music, videos or other heavy file formats. With a Syncer you basically say: "Keep a mirror of this directory (/var/apps/my_app/public/music/) on my backup server in (/var/backups/my_app/music)". Then, every time Backup runs this Syncer, it won't copy over 50GB of data, tar it, and then transfer it; it transfers the actual user@production:/var/apps/my_app/public/music/ directory to user@backup:/var/backups/my_app/music. This way no additional disk storage is used on your production box for temporary files (copy, tar, compress, encrypt) before the transfer, a process which can be very CPU intensive, slow and expensive, and can cause your application(s) to slow down in the meantime.
The following examples should be placed in your Backup configuration file.
```rb
Backup::Model.new(:my_backup, 'My Backup') do
  # examples go here...
end
```
```rb
sync_with RSync do |rsync|
  rsync.ip       = "123.45.678.90"
  rsync.port     = 22
  rsync.username = "my_username"
  rsync.password = "my_password"
  rsync.path     = "~/backups/"
  rsync.mirror   = true
  rsync.compress = true
  rsync.additional_options = ['--some-option']

  rsync.directories do |directory|
    directory.add "/var/apps/my_app/public/uploads"
    directory.add "/var/apps/my_app/logs"
  end
end
```
Additional Notes
RSync has the ability to transfer parts of files, rather than the full files when a file updates. For example, say you have a text file of 100KB in size. Now you add another 50 lines of text, increasing the size by 5KB (So now the total is 105KB). Now, the next time Backup gets invoked, it'll see that the file changed, and will only transfer the additional 5KB that got added to the text file, rather than transferring the whole 105KB over again.
The `rsync.mirror` option, when set to `true`, tells RSync to keep an exact mirror of your production box's files on the backup server. This means that when files get removed from the /var/apps/my_app/public/uploads directory, they'll also be removed from the backup server during the next sync. When set to `false`, removed files are ignored and kept on the backup server.
The `rsync.compress` option, when set to `true`, tells RSync to compress the data being transferred. The compression applies only to the transfer itself; it improves transfer speed and lowers bandwidth usage, at the cost of additional CPU usage for compressing. Once the changes have been transferred, the data is automatically uncompressed back to its original state.
The `directory.add` method allows you to add the directories you want to sync from the production server to your backup server. When a path ends with a '/' (forward slash), only the contents (and sub-directories) of that directory are synced. If the provided path does not end with a '/', the directory itself is created on the backup server (thus syncing the whole directory, including its contents and all sub-directories).
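The trailing-slash distinction can be seen side by side; a sketch (the "assets" path below is a hypothetical example, not from the original configuration):

```rb
sync_with RSync do |rsync|
  rsync.path = "~/backups/"

  rsync.directories do |directory|
    # No trailing slash: creates ~/backups/uploads on the backup
    # server and syncs the directory itself into it.
    directory.add "/var/apps/my_app/public/uploads"

    # Trailing slash: syncs only the *contents* of assets directly
    # into ~/backups/, without creating an "assets" directory there.
    directory.add "/var/apps/my_app/public/assets/"
  end
end
```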
```rb
sync_with S3 do |s3|
  s3.access_key_id     = "my_access_key_id"
  s3.secret_access_key = "my_secret_access_key"
  s3.bucket            = "my-bucket"
  s3.region            = "us-east-1"
  s3.path              = "/backups"
  s3.mirror            = true
  s3.concurrency_type  = :threads
  s3.concurrency_level = 50

  s3.directories do |directory|
    directory.add "/path/to/directory/to/sync"
    directory.add "/path/to/other/directory/to/sync"
  end
end
```
Additional Notes
**As of Backup 3.0.22 we no longer use the `s3sync` or `aproxacs-s3sync` CLI utility gems.** Backup now has its own hand-rolled, high-performance and more stable syncing solution, which uses the `fog` and `parallel` gems. These gems are bundled in Backup's dependency manager, so it "just works".
Available regions:
- `ap-northeast-1`
- `ap-southeast-1`
- `eu-west-1`
- `us-east-1`
- `us-west-1`
NOTE Unlike RSync, which has the ability to transfer only parts of individual files, Amazon S3 does not support this and will have to transfer a whole file in order to update it. However, Amazon S3 does provide "checksums". This allows Backup to check if a file you synced up to Amazon S3 has or has not been updated locally since then. In the case that it has not changed, Backup will not re-upload it (and thus, save you bandwidth/system resources).
When enabling mirroring (`s3.mirror = true`), Backup will keep an exact one-on-one mirror of your filesystem on Amazon S3. This means that if you add a file to one of the mirrored directories on your filesystem, it will be synced to Amazon S3. If you remove a file from one of these directories, it will also be removed from Amazon S3.
The `directory.add` method allows you to add any directories you want to sync from your filesystem to your Amazon S3 account.
You can use the `concurrency_type` and `concurrency_level` options to increase concurrency on file transfers. The recommended `concurrency_type` is `:threads`. Threads do not require additional memory to spin up, and for operations such as file transfers they offer excellent parallel performance, so using them is highly encouraged. Processes, on the other hand, are discouraged: they consume a lot of RAM and are likely to cause problems if your machine doesn't have much memory and you spin up a bunch of processes. Feel free to experiment with a higher or lower `concurrency_level`, but we recommend using `:threads` rather than `:processes` when doing so.
Each unit of `concurrency_level` lets Backup handle one additional file concurrently. This means that with `concurrency_level = 50`, Backup will perform 50 operations at a time. Operations include:
- Transferring a single file
- Removing a single file
So when Backup is busy mirroring your specified directories, it will perform 50 transfer/remove operations at the same time. It's very fast!
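A rough illustration of how a concurrency level bounds the number of simultaneous operations. This is plain Ruby with a simple thread pool, not Backup's internals (which use the `parallel` gem); the helper name and dummy operations are made up for the sketch:

```rb
# Drain a queue of operations using at most `concurrency_level`
# worker threads running at the same time.
def process_concurrently(operations, concurrency_level)
  queue   = Queue.new
  results = Queue.new
  operations.each { |op| queue << op }

  workers = Array.new(concurrency_level) do
    Thread.new do
      loop do
        op = begin
          queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break
        end
        results << op.call
      end
    end
  end

  workers.each(&:join)
  Array.new(results.size) { results.pop }
end

# 10 dummy "transfer" operations handled by up to 3 threads at once.
ops  = (1..10).map { |i| -> { "transferred file #{i}" } }
done = process_concurrently(ops, 3)
puts done.size # => 10
```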
```rb
sync_with CloudFiles do |cf|
  cf.username          = "my_username"
  cf.api_key           = "my_api_key"
  cf.container         = "my_container"
  cf.auth_url          = "https://auth.api.rackspacecloud.com"
  cf.servicenet        = false
  cf.path              = "/backups"
  cf.mirror            = true
  cf.concurrency_type  = :threads
  cf.concurrency_level = 50

  cf.directories do |directory|
    directory.add "/path/to/directory/to/sync"
    directory.add "/path/to/other/directory/to/sync"
  end
end
```
Additional Notes
Available Auth URLs
- https://auth.api.rackspacecloud.com (US - Default)
- https://lon.auth.api.rackspacecloud.com (UK)
NOTE Unlike RSync, which has the ability to transfer only parts of individual files, Rackspace Cloud Files does not support this and will have to transfer a whole file in order to update it. However, Rackspace Cloud Files does provide "checksums". This allows Backup to check if a file you synced up to Rackspace Cloud Files has or has not been updated locally since then. In the case that it has not changed, Backup will not re-upload it (and thus, save you bandwidth/system resources).
Set `servicenet = true` if Backup runs on a Rackspace server. This avoids transfer charges and performs better.
When enabling mirroring (`cf.mirror = true`), Backup will keep an exact one-on-one mirror of your filesystem on Rackspace Cloud Files. This means that if you add a file to one of the mirrored directories on your filesystem, it will be synced to Rackspace Cloud Files. If you remove a file from one of these directories, it will also be removed from Rackspace Cloud Files.
The `directory.add` method allows you to add any directories you want to sync from your filesystem to your Rackspace Cloud Files account.
You can use the `concurrency_type` and `concurrency_level` options to increase concurrency on file transfers. The recommended `concurrency_type` is `:threads`. Threads do not require additional memory to spin up, and for operations such as file transfers they offer excellent parallel performance, so using them is highly encouraged. Processes, on the other hand, are discouraged: they consume a lot of RAM and are likely to cause problems if your machine doesn't have much memory and you spin up a bunch of processes. Feel free to experiment with a higher or lower `concurrency_level`, but we recommend using `:threads` rather than `:processes` when doing so.
Each unit of `concurrency_level` lets Backup handle one additional file concurrently. This means that with `concurrency_level = 50`, Backup will perform 50 operations at a time. Operations include:
- Transferring a single file
- Removing a single file
So when Backup is busy mirroring your specified directories, it will perform 50 transfer/remove operations at the same time. It's very fast!
If you want to create multiple RSync syncer objects then you might want to configure some default settings to reduce redundancy. For example, if you always want to enable mirroring and compression for RSync syncers, and you always want to use the same server ip, port, username and password, then you can do the following:
```rb
Backup::Configuration::Syncer::RSync.defaults do |rsync|
  rsync.ip       = "123.45.678.90"
  rsync.port     = 22
  rsync.username = "my_username"
  rsync.password = "my_password"
  rsync.mirror   = true
  rsync.compress = true
end
```
With this in place, whenever you want to use the above default configuration you can simply omit those settings from your backup models, like so:
```rb
sync_with RSync do |rsync|
  rsync.path = "~/backups/for_my_other_app"

  rsync.directories do |directory|
    directory.add "/var/apps/my_other_app/public/uploads"
    directory.add "/var/apps/my_other_app/logs"
  end
end
```
Since we didn't specify the `ip`, `port`, `username`, `password`, `mirror` and `compress` options, they'll default to the values we specified in the configuration block.
To set default configuration for S3, use `Backup::Configuration::Syncer::S3.defaults`.
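Following the same pattern as the RSync defaults above, an S3 defaults block might look like this (the credential values are placeholders):

```rb
Backup::Configuration::Syncer::S3.defaults do |s3|
  s3.access_key_id     = "my_access_key_id"
  s3.secret_access_key = "my_secret_access_key"
  s3.bucket            = "my-bucket"
  s3.mirror            = true
end
```

Any `sync_with S3` block in your models can then omit these options and set only the `path` and directories it needs.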