Support for protocols other than SSH #1070

Compizfox · 2016-05-21T00:12:05Z

I'm currently using Duplicity for my off-site backups which works quite well apart from one thing: it lacks incremental forever backups.

Borg does have this feature.

However, my backup backend only supports WebDAV. In Duplicity I can directly use a WebDAV backend but Borg only seems to support SSH. I could mount the WebDAV server locally, but davfs2 isn't available on FreeBSD (the platform of my fileserver). There is an analog called wdfs, but it hasn't been updated since 2007. Besides, I've always been told that mounting (off-site) backups locally isn't a good idea because that makes it more likely that your backups can be overwritten in some way.

So, if I want to use Borg instead of duplicity, I'd have two options, none of which are ideal:

Mounting the WebDAV server locally using wdfs
Running Borg on a separate VM running Linux, mounting the WebDAV server locally using davfs2, using NFS to access the FreeBSD fileserver

So I was curious whether support for other protocols, like WebDAV, is a planned feature.

ThomasWaldmann · 2016-05-21T00:57:26Z

Well, we have quite a few plans for borg, but our own WebDAV backend is definitely not a priority for the core devs (I heard that WebDAV is a rather shitty protocol, btw.).

That said, you could either use a tool that mounts WebDAV as a filesystem (as you tried, assuming that the FS emulation is good enough), or first do a "local" backup and then additionally push that to your WebDAV server using separate tools.

There might be some other backends some day, but likely that will first require bigger internal changes (see also the already existing ticket about S3 support).

verygreen · 2016-05-25T02:49:38Z

The thing with webdav vs ssh is that with SSH you can run a copy of borg on the server, and then the two talk to each other using ssh as the data pipe. This is not possible with webdav: no matter how you cut it, webdav = full file access without involving borg at all, which is universally bad for backups.

I imagine whoever holds your backups is not super keen on letting you run random commands on their hosts and that's why you get webdav (or others might get ftp and the like) - i.e. full filesystem access protocol.

RonnyPfannschmidt · 2016-05-25T07:45:33Z

webdav is not something that's implementable at a reasonable cost, due to a sloppy spec and many, many sloppy implementations

Compizfox · 2016-05-25T11:10:09Z

Alright, thanks all for your answers.

I have Borg running in combination with WDFS (an ages old FUSE FS for WebDAV) now which seems to work fine. Only the performance is much lower than what I had with Duplicity; I now get 20 Mb/s max on a 100 Mb/s connection, Duplicity used to max out that speed. I don't know if WDFS or Borg is the bottleneck of course, although I suspect the former.

On another machine (Debian) I'm trying Borg with davfs2 now, but here I'm running into some problems. When I try to initialise a Borg repository on a davfs2 mount, Borg crashes (see attachment for log).

borglog.txt

> The thing with webdav vs ssh is that with SSH you can run a copy of borg on the server, and then the two talk to each other using ssh as the data pipe. This is not possible with webdav: no matter how you cut it, webdav = full file access without involving borg at all, which is universally bad for backups.

I don't see that as "universally bad" at all. Backup software like Borg and Duplicity don't require any software on the backend which is actually a big pro, because that means that I can use it with any backend I want (theoretically). Cloud providers don't let you run code on their backend.

So IMO it is actually a very good thing that Borg doesn't require software on the backend but still can do incremental forever backups. rdiff-backup for example can also do incremental forever but it requires software on the backend (and can't do encryption, but that's an unrelated issue) so I simply can't use it.

> webdav is not something that's implementable at a reasonable cost, due to a sloppy spec and many, many sloppy implementations

So how does Duplicity do this? Duplicity supports a whole shitload of storage backends (which is great), including WebDAV. This means you don't have to use some FUSE filesystem, which is a good thing.

verygreen · 2016-05-25T14:29:03Z

> I don't see that as "universally bad" at all. Backup software like Borg and Duplicity don't require any software on the backend which is actually a big pro, because that means that I can use it with any backend I want (theoretically). Cloud providers don't let you run code on their backend.

Well, I think there's a misunderstanding here on your part. It's like saying "borg can write directly to a local fs, so it does not require any software on the backend to be effective (and to support ext4/zfs/whatever)".

> So IMO it is actually a very good thing that Borg doesn't require software on the backend but still can do incremental forever backups. rdiff-backup for example can also do incremental forever but it requires software on the backend (and can't do encryption, but that's an unrelated issue) so I simply can't use it.

Borg absolutely does require software on the backend (the other borg instance), otherwise it's just reduced to a local filesystem access be it provided by a local kernel driver (native fs, fuse) or a layer in the app itself (ld_preloaded library, all accesses going through a translation layer like what I suspect duplicity does).
The difference here is: When you have borg (or rsync-backup or bacula or some such) backend running on the node, you can talk only via the backup protocol to it. It does not allow you to touch random files and perform random actions, you are confined to the protocol supported by this particular backup software and that's it. No file deletion or modifying random backup data. borg in append-only mode does not allow you to remove anything in the remote repo like that even if you ask to prune stuff.

When it is said that borg supports ssh out of the box, what is meant is "via ssh we can start another borg instance on the other end and then communicate with it via ssh pipe, in fact we recommend to use ssh force command to ensure only borg could be started for a particular key so nobody has unrestricted access to the repo, only via borg protocol".
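As a rough illustration of the forced-command setup described above, the sketch below builds one `authorized_keys` line that pins a key to `borg serve`. The function name, the key string and the repository path are placeholders made up for this example; `--restrict-to-path` and `--append-only` are borg 1.x `serve` options, and `restrict` is an OpenSSH key option (7.2 and later). Treat it as a starting point under those assumptions, not as the recommended configuration.

```python
import shlex


def forced_command_entry(public_key: str, repo_path: str, append_only: bool = True) -> str:
    """Build one authorized_keys line that only allows `borg serve` for this key.

    `public_key` and `repo_path` are caller-supplied; nothing here is borg API.
    """
    cmd = ["borg", "serve", "--restrict-to-path", repo_path]
    if append_only:
        cmd.append("--append-only")
    # `restrict` turns off port/agent/X11 forwarding and PTY allocation for the key.
    # Note: a repo_path containing a double quote would need extra escaping.
    return f'command="{shlex.join(cmd)}",restrict {public_key}'


if __name__ == "__main__":
    # Placeholder key material and path, for illustration only.
    print(forced_command_entry("ssh-ed25519 AAAA... backup@client", "/srv/borg/client1"))
```

With such a line in place, a client holding that key can only speak the borg protocol to the repository; it gets no shell and no SFTP.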

RonnyPfannschmidt · 2016-05-25T14:44:07Z

A first step towards dumb storage might be a storage format that requires less logic.
I think it's an order of magnitude easier to just do atomic uploads of richer segments than it is to run the current locked segments in place.

enkore · 2016-05-25T14:56:07Z

Proper cloud storage support will require a completely different storage design, since the environment is completely different. Repository+LoggedIO produce strong guarantees on consistency and integrity (somewhat dependent on file system and hardware), which isn't really possible with anything that doesn't give similar guarantees to a POSIX-y file system. Which is why there have been various failures observed with things like DAV or FTP FUSE FSes.

I think this has been discussed in the S3 thread already, but I'll recap anyway. A possible design could revolve around hash-packs (i.e. blobs of chunks identified by a hash over the ordered chunk IDs) to avoid the inevitable failures when trying to do segment-counting in the cloud. Those need indexing, and versioned objects (= the manifest) need another storage strategy. Compaction needs different strategies. Concurrent access needs different strategies. Locking is per se not possible in the storage, and things like append-only are not possible either.
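To make the hash-pack idea a bit more concrete, here is a minimal sketch of how a pack could be named by a hash over its ordered chunk IDs. Everything in it is an assumption for illustration: the function names are invented, plain SHA-256 stands in for whatever ID function a real design would use, and this is not how borg's Repository stores data today.

```python
import hashlib


def pack_id(chunk_ids: list[bytes]) -> str:
    """Name a pack by hashing its chunk IDs in the order they are stored."""
    h = hashlib.sha256()
    for cid in chunk_ids:
        h.update(cid)  # IDs are fixed-length digests, so concatenation is unambiguous
    return h.hexdigest()


def build_pack(chunks: list[bytes]) -> tuple[str, bytes, dict[bytes, tuple[int, int]]]:
    """Return (pack name, pack payload, index mapping chunk ID -> (offset, length))."""
    ids = [hashlib.sha256(chunk).digest() for chunk in chunks]
    index: dict[bytes, tuple[int, int]] = {}
    offset = 0
    for cid, chunk in zip(ids, chunks):
        index[cid] = (offset, len(chunk))
        offset += len(chunk)
    return pack_id(ids), b"".join(chunks), index


if __name__ == "__main__":
    name, payload, index = build_pack([b"first chunk", b"second chunk"])
    print(name, len(payload), len(index))
```

Because the name depends only on the content, two clients never have to agree on a "next segment number"; the index is what still needs a storage strategy of its own.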

It would just be an entirely different Repository implementation that will be much more complex. There is really nothing stopping that from happening (just saying).

RonnyPfannschmidt · 2016-05-25T15:13:14Z

There are two ways to go about that:

a) direct chunk upload - great for s3, moot for dav/sshfs
b) rich segments that don't need locking in the store (tricky to get right, needs exclusive create and uploads and a segment dependency graph and graph merges)

enkore · 2016-05-25T15:25:32Z

> a) direct chunk upload - great for s3, moot for dav/sshfs

S3 PUTs cost money as well. Iirc about 0.5 cents per 1000. Storing just a couple million [small] chunks will cost much more in PUTs than actual storage costs.

E: 100 GB, 10 million chunks = 3 USD/mo for storage, but the upload cost 50 USD. A hard drive is [much] cheaper in that scenario.
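The arithmetic behind that estimate can be checked with a few lines of Python. The PUT price is the 0.5 cents per 1000 requests quoted above; the storage rate of 0.03 USD per GB per month is only inferred from the "3 USD/mo for 100 GB" figure, and real prices differ by provider and change over time.

```python
def put_cost_usd(num_objects: int, usd_per_1000_puts: float = 0.005) -> float:
    """One-off request cost for uploading `num_objects` separate objects."""
    return num_objects / 1000 * usd_per_1000_puts


def storage_cost_usd_per_month(gigabytes: float, usd_per_gb_month: float = 0.03) -> float:
    """Recurring storage cost; the default rate is an assumption derived from the comment."""
    return gigabytes * usd_per_gb_month


if __name__ == "__main__":
    print(f"upload:  {put_cost_usd(10_000_000):.2f} USD once")
    print(f"storage: {storage_cost_usd_per_month(100):.2f} USD per month")
```

Under these assumptions, uploading each chunk as its own object costs roughly as much as 16 to 17 months of storage, which is the motivation for packing many chunks into one object.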

Compizfox · 2016-05-25T16:29:49Z

> Borg absolutely does require software on the backend (the other borg instance), otherwise it's just reduced to a local filesystem access be it provided by a local kernel driver (native fs, fuse) or a layer in the app itself (ld_preloaded library, all accesses going through a translation layer like what I suspect duplicity does).

Maybe I'm misunderstanding you but Borg doesn't require Borg to be installed on the backend, right? Hence why I can use FUSE for WebDAV support. It does support it though. I quote from the README:

> If Borg is installed on the remote host, big performance gains can be achieved compared to using a network filesystem (sshfs, nfs, ...).

> The difference here is: When you have borg (or rsync-backup or bacula or some such) backend running on the node, you can talk only via the backup protocol to it. It does not allow you to touch random files and perform random actions, you are confined to the protocol supported by this particular backup software and that's it. No file deletion or modifying random backup data. borg in append-only mode does not allow you to remove anything in the remote repo like that even if you ask to prune stuff.

I understand why this is better from a security point of view but for most off-site backups this is simply not possible.

RonnyPfannschmidt · 2016-05-25T16:33:32Z

borg needs to be on the server in the ssh case, since an RPC protocol is used, not a file locking protocol

verygreen · 2016-05-25T16:48:17Z

> Maybe I'm misunderstanding you but Borg doesn't require Borg to be installed on the backend, right?

It's required in case you are using ssh://.... repos. It's not required if you use sshfs through fuse.

Compizfox · 2016-05-25T16:52:35Z

Thanks, I understand now. I missed the part about sshfs. I thought it was also possible to use ssh:// repos without having Borg installed on the backend, but that is not the case.

DavidCWGA · 2016-10-27T20:05:18Z

rclone supports most of the traditional cloud services including Amazon Cloud Drive. They are also open source. Perhaps some "borrowing" of code could work here?

https://github.com/ncw/rclone

RonnyPfannschmidt · 2016-10-28T06:02:54Z

It's in Go, I don't see any sane way to do that

xeor · 2016-12-04T22:32:17Z

I'm looking for other protocols as well; especially plain old http or https would be nice.
I see the problem of supporting everything that the borg/borg-server combo can do using other protocols. For my case, it's strictly one-way. No need for restore, stats, mounts or anything else.
I imagine many others think the same, but also, many people probably want to back up to S3 and similar providers as well.

guillaume-uH57J9 · 2017-09-01T16:31:53Z

Support for more widespread protocols, such as sftp or webdav, would be helpful.

My use case would be a remote repository accessible only through sftp.
As a security measure, there is no remote shell access on that remote server. I have restricted remote ssh access using ChrootDirectory + ForceCommand.

Update:

I tried rclone which I discovered thanks to @DavidCWGA, and found this alternative solution:

Backup to a local borg repository
Use rclone to sync the local borg repository with a remote sftp server

Seeing the comments from @enkore, I suspect this solution has fewer guarantees w.r.t. consistency and integrity for the remote backup, although I don't know enough about borg to know what I'm really losing.
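The two steps above could be scripted roughly as follows. This is only a sketch: the paths, the archive name and the rclone remote `mysftp:borg-repo` are hypothetical, the `borg create REPO::ARCHIVE PATH` form is borg 1.x syntax, and it assumes an rclone remote has already been configured.

```python
import subprocess


def backup_commands(local_repo: str, archive: str, source_dir: str, remote: str) -> list[list[str]]:
    """Return the two commands: back up locally with borg, then mirror the repo with rclone."""
    return [
        ["borg", "create", f"{local_repo}::{archive}", source_dir],
        # `rclone sync` makes the destination identical to the source, deletions included.
        ["rclone", "sync", local_repo, remote],
    ]


def run_backup(local_repo: str, archive: str, source_dir: str, remote: str) -> None:
    """Run both steps in order and stop at the first failure."""
    for cmd in backup_commands(local_repo, archive, source_dir, remote):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values; printed rather than executed.
    for cmd in backup_commands("/backup/repo", "docs-2017-09-01", "/home/me/docs", "mysftp:borg-repo"):
        print(" ".join(cmd))
```

Because the second step mirrors the local repository, damage to or deletion of the local copy is propagated to the remote on the next sync, which is one concrete way the remote has weaker guarantees than a repository written through `borg serve`.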

ThomasWaldmann · 2017-09-01T17:25:56Z

sshfs uses sftp, so: use sshfs? It will be slower than borg-ssh-borg.

dimejo · 2017-09-01T17:45:43Z

> My use case would be a remote repository accessible only through sftp.
> As a security measure, there is no remote shell access on that remote server. I have restricted remote ssh access using ChrootDirectory + ForceCommand.

TBH, I would love to see more backends supported by borg, but for situations where you are able to provide SSH access I would always prefer that. Using SSH with a forced command and borg serve allows you to explicitly set a destination and prevent the client from deleting data. With pure SFTP there is no way to prevent a (hacked) client from deleting all backups.

guillaume-uH57J9 · 2017-09-01T18:25:50Z

Thanks @dimejo, this satisfies my use case, since in this situation I have complete control of the remote server, so I can install borg and edit the ssh configuration.

FYI Here's what I'm doing in my sshd_config:

# borg only group
Match Group borgonly
 ForceCommand borg serve --restrict-to-path $HOME --append-only
 X11Forwarding no
 AllowTcpForwarding no
 PermitTunnel no

ttr · 2019-06-12T08:58:54Z

Hello.
Apologies for pinging a 2-year-old topic, but I'm also interested in non-local/non-ssh backend options. The rclone approach mentioned above is one mitigation (mounting the external system over fuse), but as already mentioned here and in the rclone docs, it has some limitations and potential issues. However, I've found that rclone has a 'serve' subcommand, and one of its variants is for restic (a different backup tool). By the looks of it, it's used from within restic rather than standalone: restic uses the storage backends configured in rclone and then starts rclone as a communication layer.
This is still not the same as running borg on the remote host, but it might be a good compromise: instead of relying on fuse and external implementations, borg would depend on rclone's FS implementation, but the API between rclone and borg could be locked down to what is needed. Obviously this would require work in both projects, but it seems much less work than implementing every cloud backend individually.

ThomasWaldmann · 2019-06-12T12:31:08Z

I didn't use rclone yet, but isn't the default usage pattern to rclone a local directory to a remote storage (not using FUSE)? And if that local directory is your borg repo, you will have a remote clone of that afterwards (see FAQ about copying/cloning borg repos).

Considering rclone is in Go, I imagine it might be easier to integrate into other software written in Go, like restic, than into Python SW.

ttr · 2019-06-13T11:10:40Z

It is, and it's advised not to use fuse due to issues with the fs cache (in fuse/kernel). This is why it should be more stable if borg talked directly to rclone (bypassing fuse and the fs layer).
Having a local copy of the borg repo synced to a remote:

is not advised by the borg docs (better to have separate repositories)
needs local storage (a problem if someone wants to have only remote backups, which is not my case).

My ideal approach would be to have borg over ssh to my NAS and a 2nd repo at a cloud provider. Right now I could use a fuse mount, but it requires certain tuning (mostly to disable the local fs cache).

Re go/python: yes and no.
I'm not talking about integrating rclone into borg but with it: there would need to be an API specification that both programs use to talk to each other, and yes, one implementation would be in Go, the other in Python.
That still might be quicker than implementing all cloud providers inside borg (and honestly I would prefer the borg team to keep its focus on making sure the core of borg is close to flawless).

ncw · 2019-11-13T22:10:21Z

Hi! Rclone author here. I had a request for better borg integration with rclone... rclone mount mostly works but fails sometimes probably to do with those consistency issues noted above.

I implemented rclone serve restic to act as a special purpose gateway for restic. restic then uses an HTTP protocol to speak to rclone over a pipe to access all the backends rclone supports.

I was wondering how difficult something like that would be for borg? I see borg has borg serve which serves a similar purpose when run over an ssh connection. How difficult would it be to re-implement borg serve in Go and add it to rclone? Is the protocol stable? I couldn't find any docs for it but I haven't looked very hard ;-)

DurvalMenezes · 2019-11-14T15:43:25Z

> How difficult would it be to re-implement borg serve in Go and add it to rclone?

That would be a killer feature to have, instantly opening all clouds supported by rclone for direct access by borg-backup users.

Borg developers, it would be very appreciated (by myself and tons of others) if you could help @ncw with this.

Thanks in Advance,
-- Durval.

ThomasWaldmann · 2019-11-14T21:52:29Z

@ncw have a look into remote.py, this is where the client/server code is located.

It is basically doing remote procedure calls over stdin/stdout/stderr channels of ssh.

The code there is more or less stable, but the RPC interface is made in a way so it is extensible.
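For readers unfamiliar with the pattern, here is a toy sketch of remote procedure calls over a pair of byte streams such as the stdin/stdout of an ssh session. It is not borg's protocol: the message layout, the `put`/`get` method names and the use of one JSON document per line are all assumptions chosen to keep the example short; remote.py defines the real calls and wire format.

```python
import io
import json


def serve(inp, out, handlers):
    """Read one JSON request per line from `inp`, write one JSON reply per line to `out`."""
    for line in inp:
        req = json.loads(line)
        try:
            result = handlers[req["method"]](*req.get("args", []))
            reply = {"id": req["id"], "result": result}
        except Exception as exc:  # report the failure to the caller instead of dying
            reply = {"id": req["id"], "error": str(exc)}
        out.write(json.dumps(reply) + "\n")
        out.flush()


def demo():
    """Run two calls against an in-memory 'server' and return the decoded replies."""
    store = {}
    handlers = {
        "put": lambda key, value: store.__setitem__(key, value),
        "get": lambda key: store[key],
    }
    requests = io.StringIO(
        json.dumps({"id": 1, "method": "put", "args": ["k", "v"]}) + "\n"
        + json.dumps({"id": 2, "method": "get", "args": ["k"]}) + "\n"
    )
    replies = io.StringIO()
    serve(requests, replies, handlers)
    return [json.loads(line) for line in replies.getvalue().splitlines()]


if __name__ == "__main__":
    print(demo())
```

In a real deployment `inp` and `out` would be the server process's standard streams, and the set of handlers is what limits the client to repository operations rather than arbitrary file access.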

ppenguin · 2020-10-27T16:14:09Z

Sorry to (again) revive this thread, I guess some never stop trying ;)

I'm tempted like others before me to consider plugging in a "bucket-based" cloud backend (like S3 or tardigrade) into borg.
I'm aware of and understand (part of) the concerns in this thread (although being a noob in borg internals).

Based on this comment by enkore in this thread, I was wondering if it is not possible to mitigate the consequences/requirements by narrowing the use case, and I was hoping to get some advice/pointers on whether it could be feasible.

As I understand from repository and LoggedIO, in there the repository chunks and indexes (all files) are stored in the repository. If the repository is local, it is basically just writing files, but apparently it is doing some magic to maintain consistency to handle possible interruptions etc.?

But what if one would be more blunt and store these files (chunks and index) by commands like s3 put s3://myrepobucket/123 (pseudo code), while e.g. requiring that:

one repo client only (locking)
to redo a complete chunk if failed is acceptable (as I see in my repo these are up to 500 MB)
The index (probably?) is cached locally during backup and copied to the repo at the end of a successful backup session
The index is downloaded locally before doing a further (incremental) backup session

Then the granularity of consistency would be the size of each uploaded chunk? I.e.: if my backup is interrupted I would need to do two things to get the remote repo in a consistent state:

remove the last uploaded file if the interruption has possibly corrupted it (to be confirmed: how does S3 handle an interrupted upload?)
upload the local index (representing the remote repo state) to the remote

One could reduce the risk by making uploads "more atomic" by uploading temp and renaming. (This mechanism also appears in LoggedIO I believe).
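The "upload to a temp name, then rename" idea looks like this on a POSIX filesystem. It is a sketch of the general technique, not of LoggedIO's actual code; the function name and the `.upload-` prefix are invented for the example, and whether the rename is truly atomic on a network or FUSE mount depends on that filesystem.

```python
import contextlib
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write `data` to `path` so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem as the target for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".upload-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to stable storage before renaming
        os.replace(tmp_path, path)  # atomic replacement on POSIX filesystems
    except BaseException:
        # Interrupted or failed: remove the partial temp file and leave `path` untouched.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo_dir:
        atomic_write(os.path.join(repo_dir, "123"), b"chunk data")
        print(os.listdir(repo_dir))
```

Object stores often have no rename operation, so this maps most directly onto filesystem-like backends; for a bucket-based store the equivalent guarantee would have to come from the store's own upload semantics.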

At the risk of appearing too naive with this strategy, I'd love to hear some opinions on the rough feasibility of it.

(As a side note: I'm currently successfully running daily backups to a remote NAS over a slow link using an sshfs mount, and it works well for me (full backup size single TB range). The reason for pursuing the above is to make the borg client leaner/self-contained by removing the sshfs-fuse necessity, so it's easier to run on resource constrained devices. I have successfully run tests with borg on low-spec NAS with local or mounted repos, so part of the prerequisites seem given.)

ThomasWaldmann · 2024-09-17T16:57:08Z

borg2: other backends (using other protocols) can now get implemented much easier than before, since #8332 was merged.

ncw · 2024-09-19T10:35:28Z

I had a look at an example borgstore backend, the posixfs backend.

That interface looks relatively easy to implement.

It would be straightforward to make a backend for rclone. This could either use the binary directly or, probably preferably, start an API server on a unix socket or stdin/stdout.

Is that of interest?

I note that there is an sftp backend now which would work directly with rclone serve sftp - this could be used straight away.

I also note @ThomasWaldmann's pull request to use the restic server, which would also work with rclone serve restic and be more efficient than the sftp server.

So maybe a dedicated rclone backend isn't needed? Though I think it could be more seamless if it starts and stops any rclone components itself rather than the user having to start it.

Thoughts?

ThomasWaldmann · 2024-09-19T10:51:53Z

@ncw Yes, rclone as a borgstore backend is very interesting - if that works well, I guess we would not need to reinvent the wheel (wheel = talking to misc cloud storage providers).

sftp: that already works, but is slow, see borgbackup/borgstore#44 - could maybe be less slow for low-latency connection on localhost though.

borgstore REST client backend PR: there are some differences, so the restic server or the rclone restic-like server don't work "as is", but would need some (easy?) adaptations.

See also #5324, @buengese seems also interested.

ppenguin · 2024-09-19T13:00:17Z

> So maybe a dedicated rclone backend isn't needed? Though I think it could be more seamless if it starts and stops any rclone components itself rather than the user having to start it.
>
> Thoughts?

In general this would be great I think, since it adds multiple backends at once. One caveat: I tried rclone on constrained systems a few years ago (NAS in my case) and I found the performance to be prohibitively slow and (if I remember correctly?) memory consumption/requirements prohibitively high. My focus backend was storj/tardigrade at the time, I forgot what others I tried with it. (I'm still focused on storj though, and in the meantime implemented a backend for kopia for it, because I didn't feel up to doing this in borg. It's not yet submitted to upstream, still testing...)

In other words, rclone would certainly be great, but it maybe shouldn't be a reason to not explore/build "native" protocols for backends it supports directly in borg.

ncw · 2024-09-20T14:51:37Z

@ThomasWaldmann - I agree 100% with your comments on the sftp protocol and paramiko - have had lots of experience with both!

I have made a first attempt at an rclone backend here: borgbackup/borgstore#46

JensRantil mentioned this issue Aug 1, 2016

Borg backup to Amazon S3 on FUSE? #102

Open

anarcat mentioned this issue Nov 6, 2016

create: add rsync protocol support for pulling files from other host #1811

Closed

enkore mentioned this issue Dec 22, 2016

sFTP support like obnam ? #1976

Closed

mikix mentioned this issue Feb 3, 2019

Allow more repository backends #4312

Closed

ncw mentioned this issue Feb 3, 2022

SFTP and new provider borgbase.com rclone/rclone#5974

Closed

ThomasWaldmann mentioned this issue Aug 15, 2024

use borgstore and other big changes #8332

Merged
