create: add rsync protocol support for pulling files from other host #1811

Closed
ghost opened this issue Nov 6, 2016 · 24 comments
Comments

@ghost

ghost commented Nov 6, 2016

Short description: adapt the rsync algorithm to serve our needs.

Problem: if we cannot install the borg client on the target server, we can use sshfs to connect to it, but backing up anything from that server this way takes an extremely long time; waiting several days for a backup to complete is not practical. Another solution is to split the backup task into two subtasks: rsync to a local folder, then back up from that local folder to the borg repository. In this case we need double the disk space (for the local folder and for the borg repository), so it is not a good solution either.

Suggestion: we can adapt librsync (http://librsync.sourcefrog.net/ , https://github.com/librsync/librsync ) or tools that use librsync ( https://github.com/pgodel/rdiff-backup ), as suggested in this blog article https://translate.google.com/translate?sl=ru&tl=en&js=y&prev=_t&hl=ru&ie=UTF-8&u=http%3A%2F%2Fwebenterprise.ru%2Frsync-with-python%2F&edit-text= (it may be faster to implement with rdiff-backup first and later replace the rdiff-backup dependency with a librsync dependency).

We can use Python Wheels if we would like to spare the end user any librsync installation steps...


💰 there is a bounty for this

@ghost ghost changed the title Feature request Feature request [add rsync algorithm support] Nov 6, 2016
@enkore enkore changed the title Feature request [add rsync algorithm support] Add rsync protocol support Nov 6, 2016
@enkore
Contributor

enkore commented Nov 6, 2016

Using the rsync algorithm (this is what librsync implements) wouldn't make any difference if we still can't install Borg on the other end (because we would still need Borg on the other end, or some other software with a protocol Borg understands, that calculates the delta diffs).

Using the rsync protocol, on the other hand, would allow pulling backups from any host where rsync is installed and usable over SSH (or where the rsync daemon is running). This would be quite a useful feature indeed.

However, this protocol is not implemented by librsync. I'm not sure if there actually is any implementation other than rsync itself. Edit: I found one implementation of the protocol: https://github.com/gilbertchen/acrosync-library (licensed under RPL, not sure if it would be possible to use it)

librsync does not implement the rsync wire protocol. If you want to talk to an rsync server to transfer files you'll need to shell out to rsync. You cannot make use of librsync to talk to an rsync server.

If you really meant the algorithm, not the protocol, feel free to change the title back.

@anarcat
Contributor

anarcat commented Nov 6, 2016

seems like this is yet another case of #1070 - i can't believe i remember that issue number by heart now. :/

@ghost
Author

ghost commented Nov 6, 2016

Sorry, English is not my native language.

What I mean is that we have two alternatives:

a. ) run both commands:

"rsync --progress \
    -aicvzh \
    --partial \
    --append \
    --rsh=ssh -e 'ssh -i ~/.ssh/id_rsa' {user}@{host}:{directory} {temp_directory_local}".format(...)

plus

"""docker run -ti --rm \
    -v {repository_root_local}:/repository \
    -v {temp_directory_local}:/incoming \
    -v {borg_cache_directory}:/cache \
    -e BORG_PASSPHRASE='{borg_passphrase}' \
    -e BORG_CACHE_DIR='/cache' {IMAGE_NAME} \
    /bin/bash -c " borg create --verbose --stats --progress --list --compression lzma,9 /repository::{backup_name}  /incoming/* " """
                 .format(...)

or

b.) run only the second command (but use an sshfs directory instead of {temp_directory_local})

Of these, a.) looks much better, despite requiring more disk space, because b.) is really slow.

It would be good to find a solution that combines the low disk space requirements of b.) with the good performance of a.).

Yes, I mean the rsync protocol over SSH.

@ThomasWaldmann
Member

lib-acrosync looks like the right library for such a task. But (besides the unusual license that is rather restrictive for a library), it looks like there was no commit after 2015 and also it doesn't seem too popular.

Also, if we invested a lot of effort to support fetching files via rsync, we would merely replace the "you need borg on the remote side" problem with "you need rsync on the remote side".

@ghost
Author

ghost commented Nov 28, 2016

rsync is usually installed by default by many shared hosting providers (and it is possible to ask them to install it: it is popular, well known to many sysadmins, and a trusted tool). But yes, I agree, it would be better to support not only rsync (which is faster than scp and the preferred way when available) but also scp, for hosts without rsync. In some rarer cases only ftp(s) is available (but rsync and scp should be preferred, IMHO).

So it would really be better to have 3 ways to work with remote servers: rsync, ssh (scp/sftp), ftp(s).

I am not sure about the best implementation order (I would prefer to start with rsync: it is a well-known and popular tool, and system admins usually install it on request because they trust it).

But anyway, I hope someday borg will allow using all 3 methods to access a remote server...

@dave-fl

dave-fl commented Jan 17, 2017

Since you are considering rsync, would you also consider Backblaze B2? Their command-line API can be used, and it offers a sync option. There is also rclone, depending on how you plan to integrate.

@enkore
Contributor

enkore commented Jan 17, 2017

Wrong end ;)

This is about Borg being able to read files to be backed up from a server with rsync installed. You probably want to have a repository on B2, see #102 #1070

@DonRichie

DonRichie commented May 28, 2017

I have multiple issues due to the lack of pull functionality:

  • I don't want to expose my LAN devices to the internet. But that is the only way I am able to back up a remote server without losing deduplication. Some people may also be unable to expose their LAN because they have DS-Lite etc. SSH-tunneling hacks are not a great way to solve this problem either.
  • The second problem is that I can't easily pull from my OpenWrt router with an attached 1 TiB hard drive. There is no binary provided.

From what I have read around the net, the lack of a nice pull functionality is the major reason why people don't use borgbackup.
Isn't there any smart way to use the rsync protocol for deduplicated pulling? It is the one program available nearly everywhere.

My idea would be:
Can't you emulate the chunking process using the information rsync gives you?
If rsync says a file already exists unchanged, based on the file in the last backup:
-> Don't transfer it and let the chunking algorithm use the local file from the last backup.
If rsync says a file does not exist:
-> Download it and let the chunking algorithm use the newly downloaded file.
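
As a rough illustration of the idea above, rsync can be asked which files it would fetch without transferring anything, using its real --dry-run and --itemize-changes options. The sketch below is only an assumption about how such output could be consumed: the helper names are hypothetical, nothing like this exists in borg, and the local directory is assumed to be some readable copy of the last backup.

```python
from typing import List


def dry_run_command(remote: str, local_dir: str) -> List[str]:
    """Build an rsync call that only reports what it would change (hypothetical helper)."""
    return ["rsync", "--archive", "--dry-run", "--itemize-changes", remote, local_dir]


def files_to_transfer(itemized_output: str) -> List[str]:
    """Return paths of regular files that rsync reports it would fetch.

    In --itemize-changes output, a line starting with ">f" describes a
    regular file that would be received by the local side.
    """
    paths = []
    for line in itemized_output.splitlines():
        flags, sep, path = line.partition(" ")
        if sep and flags.startswith(">f"):
            paths.append(path)
    return paths


sample = (
    ">f+++++++++ new.txt\n"
    ">f.st...... changed.log\n"
    ".d..t...... ./\n"
    "*deleting   old.txt\n"
)
print(files_to_transfer(sample))
```

Files not listed would be taken from the previous backup; only the listed ones would need to be downloaded and chunked again.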

Another idea:
Provide a FUSE mount of the last backup (like you already can). Then allow writing on this mountpoint. This would make it possible for rsync to work on this mountpoint for syncing.
And after rsync finished create a new borgbackup of the changed mountpoint. This would also make it possible to attach any other remote file copying solution to borgbackup.

@ThomasWaldmann
Member

ThomasWaldmann commented May 28, 2017

Guess your "write onto FUSE-mounted borg archive" could already work, you just need to mount a writable overlay fs onto the borg mount. Linux Live CDs use such overlay filesystems for the same purpose.
But be aware FUSE does not support ACLs yet.

As this has nothing to do with "borg using rsync protocol" -> move this to different ticket.

@textshell
Member

@ThomasWaldmann Doing "write onto FUSE-mounted borg archive" with a borg mount and an overlayfs and then using borg create to back up that overlayfs sounds like a very fragile solution. Also, it will deadlock, because running a backup to the repo that is mounted as the backup source will not work.

@ALL
In principle, looking at https://rsync.samba.org/how-rsync-works.html it seems that rsync basically exchanges file lists and then the receiving side can get the whole file contents (assuming --whole-file mode). So with some amount of caching this would allow transferring just the files that rsync would consider to be changed (i.e. have rsync do change detection). But this would need to hook somewhere in the middle of the Archive class, and it looks like a lot of work.
So I think doing this without layers upon layers of duct tape would be possible, but it would need to hook into borg in ways that would require some refactoring. Also, a good remote file system should provide enough information for borg to detect unchanged files at least as well as rsync without --checksum. So the --whole-file solution seems to help only when --checksum-like change detection is desired.
I'm not sure how much more work delta-diff support would add. It seems most of that is done on the other side, with the receiver only reassembling from a list of directions, but this would add extracting from the repository while running a backup, plus more rsync-related complexity.
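
To make the "exchange file lists, then fetch only what changed" idea concrete: without --checksum, rsync's default quick check treats a file as changed when its size or modification time differs. A minimal sketch of that comparison against a cached file list might look like the following; the Entry type and quick_check function are hypothetical names for illustration, not part of borg or rsync.

```python
from typing import Dict, NamedTuple, Set, Tuple


class Entry(NamedTuple):
    """Metadata taken from a file list (hypothetical structure)."""
    size: int
    mtime: int


def quick_check(
    cached: Dict[str, Entry], current: Dict[str, Entry]
) -> Tuple[Set[str], Set[str]]:
    """Compare a cached file list with a fresh one.

    Returns (to_fetch, deleted): paths that are new or whose size/mtime
    differ, and paths that no longer exist on the remote side.
    """
    new = set(current) - set(cached)
    deleted = set(cached) - set(current)
    changed = {p for p in set(current) & set(cached) if current[p] != cached[p]}
    return new | changed, deleted


cached = {"a.txt": Entry(10, 100), "b.txt": Entry(20, 200), "c.txt": Entry(30, 300)}
current = {"a.txt": Entry(10, 100), "b.txt": Entry(25, 210), "d.txt": Entry(5, 400)}
to_fetch, deleted = quick_check(cached, current)
print(sorted(to_fetch), sorted(deleted))
```

Only the paths in to_fetch would have to be transferred in full; everything else could be referenced from the previous archive, which is the part that would need the hooks into the Archive class mentioned above.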

Another random idea: detect changed files somehow. Extract those files from a borg archive. rsync the new versions over the extracted ones. Make a delta archive of just those changes. This should all be scriptable outside of borg using at most slightly extended borg commands (i.e. by running the borg executable normally), with a final step using a special borg command to combine two archives (and maybe a list of removed files?) into a new archive. The last step would be a metadata-only operation. On the other hand, this sounds similar to the idea with the FUSE mount and then using the upper layer to somehow merge it back into the repository, but without using the FUSE mount in the last step (creating the new archive).

@enkore
Contributor

enkore commented May 29, 2017

> I don't want to expose my LAN devices to the internet. But that is the only way I am able to back up a remote server without losing deduplication. Some people may also be unable to expose their LAN because they have DS-Lite etc. SSH-tunneling hacks are not a great way to solve this problem either.

Your setup would be local target storage (in your LAN), and backing up from hosts in The Internet to that LAN storage, correct?

@DonRichie

DonRichie commented May 29, 2017

> Your setup would be local target storage (in your LAN), and backing up from hosts in The Internet to that LAN storage, correct?

Yes, exactly.
I have a DS-Lite internet connection, a self-administered Linux server on the internet, and a NAS in my LAN where the backups should go.

@level323

@DonRichie I'm not familiar with what a "DS-Lite" internet connection is, but I assume it is probably a dynamic-IP connection or something similar which makes port-forwarding difficult.

Given that your description, as I read it, seems to include full control (root access) of the internet machine, I suggest (IMO) a very workable solution is to use a tinc VPN link between your internet machine and the local machine hosting the borg backups. I've been using tinc for years and it's extremely reliable, and (IMO) setup is straightforward (esp. compared to OpenVPN). Once set up, it "just works" and keeps on working. Firewalling both endpoints to establish security to a level you're happy with would also be required, of course. Once you've got the tinc link up, you can run borg between the two machines as though they were on the same LAN.

@ThomasWaldmann
Member

@textshell oops, right, I didn't consider locking / that it must be same src / target repo.

@enkore
Contributor

enkore commented May 29, 2017

> @DonRichie I'm not familiar with what a "DS-Lite" internet connection is, but I assume it is probably a dynamic-IP connection or something similar which makes port-forwarding difficult.

DS-Lite is short for "Dual Stack Lite". It means that the internet connection only has IPv6 connectivity while IPv4 connectivity is established through a NAT at the carrier shared with many customers. Thus, hosting IPv4 services is impossible on such an "Internet" "connection".

@enkore
Contributor

enkore commented May 29, 2017

Note:
If we get around to merging the import-tar (create-from-tar) code (suggested for 1.2), then a simple tar pipe can be used (ssh foo@remote tar cf - /some/files | borg import-tar repo::archive -).
rsync protocol support seems like a rather marginal improvement over that, with much more complexity on our part (the rsync protocol is not documented [1]; there are no independent implementations except the strangely licensed one I brought up above; we would have to do what @textshell suggested, reading from / comparing with an existing archive; and how do you even test that thing? etc.).

[1] Beyond a document giving a basic overview, aptly summarized with the one sentence »A well-designed communications protocol has a number of characteristics. [...] Rsync's protocol has none of these good characteristics.«
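
For scripting, the tar pipe from the note could be driven from Python roughly as follows. This is a sketch under assumptions: it presumes that import-tar is available under the name used in the note, and the helper names are hypothetical. The remote tar command is quoted with shlex.join because ssh passes it to the remote shell as a single string.

```python
import shlex
import subprocess
from typing import List, Tuple


def tar_pipe_commands(
    remote: str, paths: List[str], archive: str
) -> Tuple[List[str], List[str]]:
    """Build the two halves of: ssh REMOTE tar cf - PATHS | borg import-tar ARCHIVE -"""
    producer = ["ssh", remote, shlex.join(["tar", "cf", "-", *paths])]
    consumer = ["borg", "import-tar", archive, "-"]  # assumes import-tar exists
    return producer, consumer


def run_tar_pipe(remote: str, paths: List[str], archive: str) -> int:
    """Run the pipe and return the first non-zero exit code, or 0."""
    producer_cmd, consumer_cmd = tar_pipe_commands(remote, paths, archive)
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    # Close our copy so tar receives SIGPIPE if borg exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return consumer_rc or producer_rc


print(tar_pipe_commands("foo@remote", ["/some/files"], "repo::archive"))
```

Note that this approach transfers every file on each run; deduplication then happens on the borg side, so it saves repository space but not bandwidth.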

@level323

@enkore Thanks for the info on DS-Lite

@DonRichie I should have mentioned that tinc has native IPv6 support.

@milkey-mouse
Contributor

I found some documentation on the rsync network protocol

@RonnyPfannschmidt
Contributor

@enkore wouldn't it make sense to close this issue as wontfix then?

@enkore enkore changed the title Add rsync protocol support create: add rsync protocol support for pulling files from other host Jul 23, 2017
@ThomasWaldmann
Member

There seems to be no progress here and it somehow looks like there is no good / easy way to "add rsync * for pulling files", so I am suggesting to close this.

@mariohock you put a bounty on this, would you agree with the money going into general borg organisation funds (usually used by me to put bounties on random tickets)?

@devZer0

devZer0 commented Mar 6, 2019

If this is closed, nobody will pick it up to work on. I would find this damn useful, as we actually have a lot of extra space (terabytes) in use for backup because of this missing feature.

@ThomasWaldmann
Member

There are some issues with this (such as adding additional dependencies and complexity, and that it can't work as well as with borg as the remote agent) and it looks like nobody wants to work on this, thus I am closing this.

Looks like there is a USD 10 bounty from one supporter ( @mariohock ) on this.

@mariohock would you agree that this is used otherwise for borg, or would you like to get the funds back (you could just submit a "fake" solution to this closed ticket)?

@mariohock

Too bad that the feature won't come. But I see your reasons. Still, I want to thank you for your good work.

I think I marked it as fixed in bountysource, but the interface wasn't really clear to me. I hope it worked.

@ThomasWaldmann
Member

I'll claim the bounty and then use the USD 10 for some other (generic) bounty.
