
Support for protocols other than SSH #1070

Open
Compizfox opened this issue May 21, 2016 · 32 comments

Comments

@Compizfox

Compizfox commented May 21, 2016

I'm currently using Duplicity for my off-site backups which works quite well apart from one thing: it lacks incremental forever backups.

Borg does have this feature.

However, my backup backend only supports WebDAV. In Duplicity I can directly use a WebDAV backend but Borg only seems to support SSH. I could mount the WebDAV server locally, but davfs2 isn't available on FreeBSD (the platform of my fileserver). There is an analog called wdfs, but it hasn't been updated since 2007. Besides, I've always been told that mounting (off-site) backups locally isn't a good idea because that makes it more likely that your backups can be overwritten in some way.

So, if I want to use Borg instead of Duplicity, I'd have two options, neither of which is ideal:

  • Mounting the WebDAV server locally using wdfs
  • Running Borg on a separate VM running Linux, mounting the WebDAV server locally using davfs2, using NFS to access the FreeBSD fileserver

So I was curious whether support for other protocols, like WebDAV, is a planned feature.

@ThomasWaldmann
Member

Well, we have quite a few plans for borg, but our own WebDAV backend is certainly not a priority for the core devs (I've heard that WebDAV is a rather shitty protocol, btw.).

That said, you could use any tool that either mounts WebDAV as a filesystem (as you tried, assuming the FS emulation is good enough), or first do a "local" backup and then additionally push that to your WebDAV server using separate tools.

There might be some other backends some day, but likely that will first require bigger internal changes (see also the already existing ticket about S3 support).

@verygreen
Contributor

The thing with WebDAV vs. SSH is that with SSH you can run a copy of borg on the server, and the two instances then talk to each other using SSH as the data pipe. This is not possible with WebDAV: no matter how you cut it, WebDAV = full file access without involving borg at all, which is universally bad for backups.

I imagine whoever holds your backups is not super keen on letting you run random commands on their hosts and that's why you get webdav (or others might get ftp and the like) - i.e. full filesystem access protocol.

@RonnyPfannschmidt
Contributor

WebDAV is not something that's implementable at a reasonable cost, due to its sloppy spec and many, many sloppy implementations.

@Compizfox
Author

Compizfox commented May 25, 2016

Alright, thanks all for your answers.

I have Borg running in combination with WDFS (an ages old FUSE FS for WebDAV) now which seems to work fine. Only the performance is much lower than what I had with Duplicity; I now get 20 Mb/s max on a 100 Mb/s connection, Duplicity used to max out that speed. I don't know if WDFS or Borg is the bottleneck of course, although I suspect the former.

On another machine (Debian) I'm trying Borg with davfs2 now, but here I'm running into some problems. When I try to initialise a Borg repository on a davfs2 mount, Borg crashes (see attachment for log).

borglog.txt

> The thing with WebDAV vs. SSH is that with SSH you can run a copy of borg on the server, and the two instances then talk to each other using SSH as the data pipe. This is not possible with WebDAV: no matter how you cut it, WebDAV = full file access without involving borg at all, which is universally bad for backups.

I don't see that as "universally bad" at all. Backup software like Borg and Duplicity don't require any software on the backend which is actually a big pro, because that means that I can use it with any backend I want (theoretically). Cloud providers don't let you run code on their backend.

So IMO it is actually a very good thing that Borg doesn't require software on the backend but can still do incremental forever backups. rdiff-backup, for example, can also do incremental forever, but it requires software on the backend (and can't do encryption, but that's an unrelated issue), so I simply can't use it.

> WebDAV is not something that's implementable at a reasonable cost, due to its sloppy spec and many, many sloppy implementations.

So how does Duplicity do this? Duplicity supports a whole shitload of storage backends (which is great), including WebDAV. This means you don't have to use some FUSE filesystem, which is a good thing.

@verygreen
Contributor

> I don't see that as "universally bad" at all. Backup software like Borg and Duplicity don't require any software on the backend which is actually a big pro, because that means that I can use it with any backend I want (theoretically). Cloud providers don't let you run code on their backend.

Well, I think there's a misunderstanding on your part. It's like saying "borg can write directly to a local fs, so it does not require any software on the backend to be effective (and to support ext4/zfs/whatever)."

> So IMO it is actually a very good thing that Borg doesn't require software on the backend but can still do incremental forever backups. rdiff-backup, for example, can also do incremental forever, but it requires software on the backend (and can't do encryption, but that's an unrelated issue), so I simply can't use it.

Borg absolutely does require software on the backend (the other borg instance), otherwise it's just reduced to a local filesystem access be it provided by a local kernel driver (native fs, fuse) or a layer in the app itself (ld_preloaded library, all accesses going through a translation layer like what I suspect duplicity does).
The difference here is: When you have borg (or rsync-backup or bacula or some such) backend running on the node, you can talk only via the backup protocol to it. It does not allow you to touch random files and perform random actions, you are confined to the protocol supported by this particular backup software and that's it. No file deletion or modifying random backup data. borg in append-only mode does not allow you to remove anything in the remote repo like that even if you ask to prune stuff.

When it is said that borg supports ssh out of the box, what is meant is "via ssh we can start another borg instance on the other end and then communicate with it via ssh pipe, in fact we recommend to use ssh force command to ensure only borg could be started for a particular key so nobody has unrestricted access to the repo, only via borg protocol".
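To make the "confined to the protocol" point concrete, here is a deliberately simplified Python model. The class and method names are hypothetical (this is not borg's RPC API, and borg's real append-only mode works differently in detail): the server only understands a fixed set of operations and, when append-only, refuses anything destructive, whereas a dumb file store would simply execute whatever the client asks.

```python
from __future__ import annotations


class AppendOnlyViolation(Exception):
    """Raised when a client asks for something destructive in append-only mode."""


class RestrictedRepoServer:
    """Hypothetical server side: executes protocol operations, never raw file access."""

    # The whole protocol; anything else is rejected outright.
    OPERATIONS = ("put", "get", "delete")

    def __init__(self, append_only: bool = True) -> None:
        self.append_only = append_only
        self._objects: dict[str, bytes] = {}

    def handle(self, op: str, key: str, data: bytes = b"") -> bytes:
        if op not in self.OPERATIONS:
            raise PermissionError(f"{op!r} is not part of the protocol")
        if op == "get":
            return self._objects[key]
        if self.append_only and (op == "delete" or key in self._objects):
            # Deleting or overwriting existing data is refused.
            raise AppendOnlyViolation(f"refusing to {op} {key!r} in append-only mode")
        if op == "put":
            self._objects[key] = data
        else:
            del self._objects[key]
        return b""
```

A compromised client talking to such a server can at worst add junk; it has no verb for wiping what is already stored, which is exactly what plain WebDAV/SFTP access cannot guarantee.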

@RonnyPfannschmidt
Contributor

A first step towards dumb storage might be a storage format that requires less logic.
I think it's an order of magnitude easier to just do atomic uploads of richer segments than to run the current locked segments in place.

@enkore
Contributor

enkore commented May 25, 2016

Proper cloud storage support will require a completely different storage design, since the environment is completely different. Repository+LoggedIO produce strong guarantees on consistency and integrity (somewhat dependent on file system and hardware), which isn't really possible with anything that doesn't give similar guarantees to a POSIX-y file system. Which is why there have been various failures observed with things like DAV or FTP FUSE FSes.

I think this has been discussed in the S3 thread already, but I'll recap anyway. A possible design could revolve around hash-packs (i.e. blobs of chunks identified by a hash over their ordered chunk IDs) to avoid the inevitable failures when trying to do segment-counting in the cloud. Those need indexing, and versioned objects (= the manifest) need another storage strategy. Compaction needs different strategies. Concurrent access needs different strategies. Locking is per se not possible in the storage. And things like append-only are not possible, etc.

It would just be an entirely different Repository implementation that will be much more complex. There is really nothing stopping that from happening (just saying).
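A minimal sketch of the hash-pack idea, assuming "hash-over-ordered-IDs" means hashing the pack's chunk IDs in order. SHA-256 stands in for borg's keyed chunk IDs, and `make_pack` and its helpers are hypothetical names. Because the pack name is derived from its content, a retried upload writes the same key instead of needing a shared segment counter, and the returned index is what a client would merge into its chunk index.

```python
from __future__ import annotations

import hashlib


def chunk_id(data: bytes) -> bytes:
    # Stand-in only: borg derives chunk IDs with a keyed MAC, not plain SHA-256.
    return hashlib.sha256(data).digest()


def pack_name(chunk_ids: list[bytes]) -> str:
    """Name a pack by hashing the IDs of the chunks it contains, in order."""
    h = hashlib.sha256()
    for cid in chunk_ids:
        h.update(len(cid).to_bytes(4, "big"))  # length prefix keeps the encoding unambiguous
        h.update(cid)
    return h.hexdigest()


def make_pack(chunks: list[bytes]) -> tuple[str, bytes, dict[bytes, tuple[int, int]]]:
    """Concatenate chunks into one blob and return (name, blob, index).

    The index maps each chunk ID to its (offset, length) inside the blob.
    """
    ids = [chunk_id(c) for c in chunks]
    index: dict[bytes, tuple[int, int]] = {}
    offset = 0
    for cid, chunk in zip(ids, chunks):
        index[cid] = (offset, len(chunk))
        offset += len(chunk)
    return pack_name(ids), b"".join(chunks), index
```

The same chunks in the same order always yield the same pack name, while reordering them yields a different one, which is why the hash is taken over the *ordered* IDs.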

@RonnyPfannschmidt
Contributor

there are 2 ways to go about that

a) direct chunk upload - great for S3, moot for DAV/sshfs
b) rich segments that don't need locking in the store (tricky to get right; needs exclusive create and uploads, a segment dependency graph, and graph merges)
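Option b) hinges on "exclusive create". On a POSIX filesystem that is `O_CREAT | O_EXCL`, which lets concurrent writers claim segment numbers without a lock. A sketch, where the `segment-NNNNNNNN` naming and `claim_segment` are made up for illustration:

```python
import os


def claim_segment(directory: str, data: bytes, start: int = 0) -> int:
    """Write data as the first free segment number >= start, without a lock.

    O_CREAT | O_EXCL makes creation fail if the name already exists, so two
    writers racing for the same number cannot both win it.
    """
    n = start
    while True:
        path = os.path.join(directory, f"segment-{n:08d}")
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            n += 1  # someone else already owns this number; try the next one
            continue
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return n
```

A real design would still need to write under a temporary name and then publish it, since readers could otherwise see a half-written segment; whether a given remote store offers an equivalent of `O_EXCL` at all is exactly the hard part.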

@enkore
Contributor

enkore commented May 25, 2016

> a) direct chunk upload - great for S3, moot for DAV/sshfs

S3 PUTs cost money as well. IIRC about 0.5 cents per 1000. Storing just a couple million [small] chunks will cost much more in PUTs than in actual storage.

Example: 100 GB in 10 million chunks = 3 USD/mo for storage, but the upload cost 50 USD. A hard drive is [much] cheaper in that scenario.
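The arithmetic above as a quick Python check. The prices are the 2016-era figures quoted in the comment, not current ones, and `chunks_per_pack` is a hypothetical batching factor showing why packing many chunks into one object matters for request costs.

```python
import math

# Figures from the comment above (2016-era, assumed; check current pricing).
PUT_USD_PER_1000 = 0.005         # "about 0.5 cents per 1000" PUT requests
STORAGE_USD_PER_GB_MONTH = 0.03  # implied by "100 GB = 3 USD/mo"


def upload_cost(n_chunks: int, chunks_per_pack: int = 1) -> float:
    """USD spent on PUT requests; chunks_per_pack > 1 models batching chunks."""
    n_puts = math.ceil(n_chunks / chunks_per_pack)
    return n_puts / 1000 * PUT_USD_PER_1000


def storage_cost(total_gb: float) -> float:
    """USD per month for storing total_gb."""
    return total_gb * STORAGE_USD_PER_GB_MONTH


print(f"storage: {storage_cost(100):.2f} USD/month")
print(f"upload, one PUT per chunk: {upload_cost(10_000_000):.2f} USD")
print(f"upload, 500 chunks per PUT: {upload_cost(10_000_000, 500):.2f} USD")
```

This prints 3.00 USD/month for storage, 50.00 USD for one PUT per chunk, and 0.10 USD when 500 chunks share one PUT.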

@Compizfox
Author

Compizfox commented May 25, 2016

> Borg absolutely does require software on the backend (the other borg instance), otherwise it's just reduced to a local filesystem access be it provided by a local kernel driver (native fs, fuse) or a layer in the app itself (ld_preloaded library, all accesses going through a translation layer like what I suspect duplicity does).

Maybe I'm misunderstanding you but Borg doesn't require Borg to be installed on the backend, right? Hence why I can use FUSE for WebDAV support. It does support it though. I quote from the README:

If Borg is installed on the remote host, big performance gains can be achieved compared to using a network filesystem (sshfs, nfs, ...).

 

> The difference here is: When you have borg (or rsync-backup or bacula or some such) backend running on the node, you can talk only via the backup protocol to it. It does not allow you to touch random files and perform random actions, you are confined to the protocol supported by this particular backup software and that's it. No file deletion or modifying random backup data. borg in append-only mode does not allow you to remove anything in the remote repo like that even if you ask to prune stuff.

I understand why this is better from a security point of view but for most off-site backups this is simply not possible.

@RonnyPfannschmidt
Contributor

borg needs to be on the server in the case of ssh, since an RPC protocol is used, not a file-locking protocol

@verygreen
Contributor

> Maybe I'm misunderstanding you but Borg doesn't require Borg to be installed on the backend, right?

It's required in case you are using ssh://.... repos. It's not required if you use sshfs through fuse.

@Compizfox
Author

Thanks, I understand now. I missed the part about sshfs. I thought it was also possible to use ssh:// repos without having Borg installed on the backend, but that is not the case.

@DavidCWGA

rclone supports most of the traditional cloud services including Amazon Cloud Drive. They are also open source. Perhaps some "borrowing" of code could work here?

https://github.com/ncw/rclone

@RonnyPfannschmidt
Contributor

It's in Go; I don't see any sane way to do that

@xeor

xeor commented Dec 4, 2016

I'm looking for other protocols as well, especially plain old HTTP or HTTPS.
I see the problem with supporting everything the borg/borg-server combo can do over other protocols. In my case, it's strictly one-way: no need for restore, stats, mounts or anything else.
I imagine many others think the same, but also, many people probably want to back up to S3 and similar providers as well.

@guillaume-uH57J9

guillaume-uH57J9 commented Sep 1, 2017

Support for more widespread protocols, such as sftp or webdav, would be helpful.

My use case would be a remote repository accessible only through sftp.
As a security measure, there is no remote shell access on that remote server. I have restricted remote ssh access using ChrootDirectory + ForceCommand.

Update:

I tried rclone which I discovered thanks to @DavidCWGA, and found this alternative solution:

  • Backup to a local borg repository
  • Use rclone to sync the local borg repository with a remote sftp server

Seeing the comments from @enkore, I suspect this solution has fewer guarantees w.r.t. consistency and integrity for the remote backup, although I don't know enough about borg to know what I'm really losing.

@ThomasWaldmann
Member

sshfs uses sftp, so: use sshfs? It will be slower than borg-ssh-borg.

@dimejo

dimejo commented Sep 1, 2017

> My use case would be a remote repository accessible only through sftp.
> As a security measure, there is no remote shell access on that remote server. I have restricted remote ssh access using ChrootDirectory + ForceCommand.

TBH, I would love to see more backends supported by borg, but in situations where you are able to provide SSH access I would always prefer that. Using SSH with a forced command and borg serve allows you to explicitly set a destination and prevent the client from deleting data. With pure SFTP there is no way to prevent a (hacked) client from deleting all backups.

@guillaume-uH57J9

Thanks @dimejo, this satisfies my use case, since in this situation I have complete control of the remote server, so I can install borg and edit the ssh configuration.

FYI Here's what I'm doing in my sshd_config:

# borg only group
Match Group borgonly
 ForceCommand borg serve --restrict-to-path $HOME --append-only
 X11Forwarding no
 AllowTcpForwarding no
 PermitTunnel no

@ttr

ttr commented Jun 12, 2019

Hello.
Apologies for pinging a 2-year-old topic, but I'm also interested in non-local/non-SSH backend options. The rclone mentioned above is one mitigation (mounting the external system over FUSE), but as already mentioned here and in the rclone docs, this has some limitations and potential issues. However, I've found that rclone has a 'serve' subcommand, and one of its modes is for restic (a different backup tool). By the looks of it, it's used from restic rather than standalone: restic uses the backend storages configured in rclone, and then starts rclone as the communication layer.
This is still not the same as running borg on the remote host, but it might be a good compromise: instead of relying on FUSE and external implementations, borg would depend on rclone's storage implementation, and the API between rclone and borg could be locked down to borg's needs. Obviously this would require work on both projects, but it seems it would be much less work than implementing every cloud backend individually.

@ThomasWaldmann
Member

I didn't use rclone yet, but isn't the default usage pattern to rclone a local directory to a remote storage (not using FUSE)? And if that local directory is your borg repo, you will have a remote clone of that afterwards (see FAQ about copying/cloning borg repos).

Considering rclone is in Go, I imagine it might be easier to integrate into other software written in Go, like restic, than into Python SW.

@ttr

ttr commented Jun 13, 2019

It is, and it's advised to not use fuse due to issues with fs cache (in fuse/kernel). This is why if borg talked directly to rclone (bypassing fuse, fs layer) it should be more stable.
Having a local borg repo synced to a remote:

  1. is not advised by the borg docs (better to have separate repositories)
  2. requires local storage (a problem for anyone who wants only remote backups; not my case).
    My ideal approach would be to have borg over SSH to my NAS and a 2nd repo at a cloud provider; right now I could use a FUSE mount, but it requires certain tuning (mostly disabling the local fs cache).

Re Go/Python: yes and no.
I'm not talking about integrating rclone into borg, but with it: there would need to be an API specification that both programs use to talk to each other, and yes, one implementation would be in Go, the other in Python.
That still might be quicker than implementing all cloud providers inside borg (and honestly, I would prefer the borg team to keep focusing on making sure the core of borg is close to flawless).

@ncw

ncw commented Nov 13, 2019

Hi! Rclone author here. I had a request for better borg integration with rclone... rclone mount mostly works but fails sometimes probably to do with those consistency issues noted above.

I implemented rclone serve restic to act as a special purpose gateway for restic. restic then uses an HTTP protocol to speak to rclone over a pipe to access all the backends rclone supports.

I was wondering how difficult something like that would be for borg? I see borg has borg serve which serves a similar purpose when run over an ssh connection. How difficult would it be to re-implement borg serve in Go and add it to rclone? Is the protocol stable? I couldn't find any docs for it but I haven't looked very hard ;-)

@DurvalMenezes

> How difficult would it be to re-implement borg serve in Go and add it to rclone?

That would be a killer feature to have, instantly opening all clouds supported by rclone for direct access by borg-backup users.

Borg developers, it would be very appreciated (by myself and tons of others) if you could help @ncw with this.

Thanks in Advance,
-- Durval.

@ThomasWaldmann
Member

@ncw have a look into remote.py, this is where the client/server code is located.

It is basically doing remote procedure calls over stdin/stdout/stderr channels of ssh.

The code there is more or less stable, but the RPC interface is made in a way so it is extensible.
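For anyone picking up @ncw's question, the general shape of "RPC over stdin/stdout" is sketched below. This is not borg's wire format (remote.py has its own framing and serialization); it uses JSON lines from the Python stdlib purely to show the pattern, and `serve` and the handler names are hypothetical.

```python
import json
from typing import Any, Callable, Dict, TextIO


def serve(stdin: TextIO, stdout: TextIO, handlers: Dict[str, Callable[..., Any]]) -> None:
    """Read one JSON request per line, dispatch it, write one JSON reply per line."""
    for line in stdin:
        if not line.strip():
            continue  # tolerate blank lines between requests
        request = json.loads(line)
        reply: Dict[str, Any] = {"id": request["id"]}
        handler = handlers.get(request["method"])
        if handler is None:
            # Only registered methods are reachable; nothing else exists for the client.
            reply["error"] = f"unknown method {request['method']!r}"
        else:
            try:
                reply["result"] = handler(*request.get("args", []))
            except Exception as exc:  # report the failure instead of killing the server
                reply["error"] = f"{type(exc).__name__}: {exc}"
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()
```

Under an SSH forced command, `serve(sys.stdin, sys.stdout, handlers)` would be the entry point; the client can only invoke what is in `handlers`, which is the same confinement property discussed earlier in the thread.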

@ppenguin

ppenguin commented Oct 27, 2020

Sorry to (again) revive this thread, I guess some never stop trying ;)

I'm tempted like others before me to consider plugging in a "bucket-based" cloud backend (like S3 or tardigrade) into borg.
I'm aware of and understand (part of) the concerns in this thread (although being a noob in borg internals).

Based on this comment by enkore in this thread, I was wondering whether it might be possible to mitigate the consequences/requirements by narrowing the use case, and I was hoping to get some advice/pointers on whether it could be feasible.

As I understand from Repository and LoggedIO, the repository's chunks and indexes (all files) are stored in the repository. If the repository is local, this is basically just writing files, but apparently it does some magic to maintain consistency and handle possible interruptions, etc.?

But what if one would be more blunt and store these files (chunks and index) by commands like s3 put s3://myrepobucket/123 (pseudo code), while e.g. requiring that:

  • one repo client only (locking)
  • redoing a complete chunk whose upload failed is acceptable (as I see in my repo, these are up to 500 MB)
  • The index (probably?) is cached locally during backup and copied to the repo at the end of a successful backup session
  • The index is downloaded locally before doing a further (incremental) backup session

Then the granularity of consistency would be the size of each uploaded chunk? I.e.: if my backup is interrupted I would need to do two things to get the remote repo in a consistent state:

  • remove the last uploaded file if the interruption has possibly corrupted it (to be confirmed: how does S3 handle an interrupted upload?)
  • upload the local index (representing the remote repo state) to the remote

One could reduce the risk by making uploads "more atomic" by uploading temp and renaming. (This mechanism also appears in LoggedIO I believe).
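The sequence proposed above (single writer, upload under a temporary name, move into place, index last, clean up leftovers after an interruption) could look roughly like this. `MemoryStore` is a hypothetical in-memory stand-in for a dumb object store, not any real SDK; note that S3 has no native rename, so `rename` there would be a copy plus a delete.

```python
from __future__ import annotations


class MemoryStore:
    """Hypothetical stand-in for a dumb object store (not a real SDK)."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def rename(self, src: str, dst: str) -> None:
        # Object stores such as S3 have no native rename: this would be copy + delete.
        self.objects[dst] = self.objects.pop(src)

    def delete(self, key: str) -> None:
        del self.objects[key]

    def list_prefix(self, prefix: str) -> list[str]:
        return sorted(k for k in self.objects if k.startswith(prefix))


def publish(store: MemoryStore, key: str, data: bytes) -> None:
    """Upload under tmp/ first, then move into place, so the final key is never partial."""
    tmp_key = "tmp/" + key
    store.put(tmp_key, data)
    store.rename(tmp_key, key)


def backup_session(store: MemoryStore, packs: dict[str, bytes], index: bytes) -> None:
    """Single writer assumed: upload all packs, then the locally maintained index."""
    for name, blob in packs.items():
        publish(store, "data/" + name, blob)
    # Index last: it must never reference a pack that is not on the remote yet.
    publish(store, "index", index)


def recover(store: MemoryStore) -> None:
    """After an interrupted session, drop half-finished uploads left under tmp/."""
    for key in store.list_prefix("tmp/"):
        store.delete(key)
```

Uploading the index last means an interruption at any point leaves the previous index valid; packs uploaded before the crash are merely unreferenced until the next session.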

At the risk of appearing too naive with this strategy, I'd love to hear some opinions on the rough feasibility of it.

(As a side note: I'm currently successfully running daily backups to a remote NAS over a slow link using an sshfs mount, and it works well for me (full backup size single TB range). The reason for pursuing the above is to make the borg client leaner/self-contained by removing the sshfs-fuse necessity, so it's easier to run on resource constrained devices. I have successfully run tests with borg on low-spec NAS with local or mounted repos, so part of the prerequisites seem given.)

@ThomasWaldmann
Member

borg2: other backends (using other protocols) can now get implemented much easier than before, since #8332 was merged.

@ncw

ncw commented Sep 19, 2024

I had a look at an example borgstore backend, the posixfs backend.

That interface looks relatively easy to implement.

It would be straightforward to make a backend for rclone. This could either use the binary directly or, probably preferably, start an API server on a unix socket or stdin/stdout.

Is that of interest?

I note that there is an sftp backend now which would work directly with rclone serve sftp - this could be used straight away.

I also note @ThomasWaldmann's pull request to use the restic server, which would also work with rclone serve restic and be more efficient than the sftp server.

So maybe a dedicated rclone backend isn't needed? Though I think it could be more seamless if it starts and stops any rclone components itself rather than the user having to start it.

Thoughts?

@ThomasWaldmann
Member

@ncw Yes, rclone as a borgstore backend is very interesting; if that works well, I guess we would not need to reinvent the wheel (wheel = talking to misc cloud storage providers).

sftp: that already works, but is slow, see borgbackup/borgstore#44 - could maybe be less slow for low-latency connection on localhost though.

borgstore REST client backend PR: there are some differences, so the restic server or the rclone restic-like server don't work "as is", but would need some (easy?) adaptions.

See also #5324, @buengese seems also interested.

@ppenguin

> So maybe a dedicated rclone backend isn't needed? Though I think it could be more seamless if it starts and stops any rclone components itself rather than the user having to start it.
>
> Thoughts?

In general I think this would be great, since it adds multiple backends at once. One caveat: I tried rclone on constrained systems a few years ago (a NAS in my case) and found the performance prohibitively slow and (if I remember correctly?) the memory consumption/requirements prohibitively high. My focus backend was storj/tardigrade at the time; I forget what others I tried with it. (I'm still focused on storj though, and in the meantime implemented a storj backend for kopia, because I didn't feel up to doing this in borg. It's not yet submitted upstream, still testing...)

In other words, rclone would certainly be great, but it maybe shouldn't be a reason to not explore/build "native" protocols for backends it supports directly in borg.

@ncw

ncw commented Sep 20, 2024

@ThomasWaldmann - I agree 100% with your comments on the sftp protocol and paramiko - have had lots of experience with both!

I have made a first attempt at an rclone backend here: borgbackup/borgstore#46

Labels
None yet
Projects
Status: Remote repositories and protocols
Development

No branches or pull requests