Support for ZREP_RESUME=yes when using ZREP_R=-R #141
this is a very long write up. |
I'll go check out the diffs and get it deployed to our test servers this evening. |
I tested by having a Comcast 'glitch' in the middle of sending tank/virt. When I connected back in and ran sync all, it failed because tank/virt/vm-101-disk-1 was only partially sent. There is a resume token for it sitting on the remote box.
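The leftover token shows up on the destination with something like this (just an illustrative check, using the host and dataset from the output quoted below):
ssh root@uswuxsdrtr01.--redacted-- zfs get -H -o value receive_resume_token tank/backups/usrbgof/virt/vm-101-disk-1   # prints '-' when there is no token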
|
hmm.
well, excellent. you have it in a "must resume to continue" state.
So you can then easily tell me why my code isn't working to detect that, right? :)
…On Mon, Dec 9, 2019 at 8:30 PM Aaron C. de Bruyn ***@***.***> wrote:
I tested by having a Comcast 'glitch' in the middle of sending tank/virt
When I connected back in and ran sync all it failed because tank/virt/vm-101-disk-1 partially sent. There is a resume token for it sitting on the remote box.
***@***.***:~# ZREP_SEND_FLAGS="--raw" ZREP_OUTFILTER="pv -eIrab" ZREP_RESUME=yes ZREP_R=-R ZREP_INC_FLAG=-i /usr/local/bin/zrep -t zrep-remote sync all
sending ***@***.***_000004 to uswuxsdrtr01.--redacted--:tank/backups/usrbgof/officeshare
106KiB [48.8KiB/s] [48.8KiB/s]
Expiring zrep snaps on tank/officeshare
Also running expire on uswuxsdrtr01.--redacted--:tank/backups/usrbgof/officeshare now...
Expiring zrep snaps on tank/backups/usrbgof/officeshare
sending ***@***.***_000003 to uswuxsdrtr01.--redacted--:tank/backups/usrbgof/users
100KiB [52.4KiB/s] [52.4KiB/s]
Expiring zrep snaps on tank/users
Also running expire on uswuxsdrtr01.--redacted--:tank/backups/usrbgof/users now...
Expiring zrep snaps on tank/backups/usrbgof/users
sending ***@***.***_000003 to uswuxsdrtr01.--redacted--:tank/backups/usrbgof/virt
cannot receive incremental stream: destination tank/backups/usrbgof/virt/vm-101-disk-1 contains partially-complete state from "zfs receive -s".
3.40MiB [0.00 B/s] [43.5KiB/s]
|
Unfortunately, I think the only way to resolve this is to walk through the child datasets and check if each one has been partially sent (maybe check the destination for a resume token?) and finish up the send. Say the destination is in the mixed state shown in the quoted reply below; maybe a sync all creates a new snapshot like zrep_000003 and then figures out what each child needs.
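Roughly, the walk I have in mind would be something like this (untested sketch, not something zrep does today; the host and paths are the ones from my test above):
DEST=tank/backups/usrbgof/virt
DEST_HOST=uswuxsdrtr01.--redacted--
# for each child on the destination, finish any partially-received stream
ssh root@"$DEST_HOST" zfs list -H -o name -r "$DEST" | while read -r dest_fs; do
    token=$(ssh -n root@"$DEST_HOST" zfs get -H -o value receive_resume_token "$dest_fs")
    if [ "$token" != "-" ]; then
        # the token encodes the source snapshot, so 'zfs send -t' can pick up where it left off
        zfs send -t "$token" | ssh root@"$DEST_HOST" zfs receive -s "$dest_fs"
    fi
done
|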
oh yeah...
I vaguely recall contemplating this type of issue, and wondering if ZFS would handle it properly.
now I think we can say, "it doesn't handle it properly".
remember that zrep doesn't implement -R itself. It relies on ZFS's handling of such things.
so... I'm reluctant to have zrep attempt to automatically recover this type of situation, because it always AVOIDS looking at filesystems in a -R bundle individually.
I have no idea what would happen, or what SHOULD be done, in this kind of case. I'd want to see some kind of official answer from the ZFS developers on this one.
-R is icky :-/
…On Sun, Jan 12, 2020 at 8:56 PM Aaron C. de Bruyn ***@***.***> wrote:
Unfortnately I think the only way to resolve this is to walk through the child datasets and check if each one has been partially sent (maybe check the destination for a resume token?) and finish up the send.
Say the destination has
***@***.***_000002
***@***.***_000002
***@***.***_000001 (but a partial send of zrep_000002)
***@***.***_000001 (zrep_000002 never got sent because of an earlier problem during the send of disk2)
Maybe a sync all creates a new snapshot like zrep_000003, and then figures out what each child needs
tank/virt needs zrep_000003 from zrep_000002
tank/virt/disk1 needs zrep_000003 from zrep_000002
tank/virt/disk2 needs to get the resume token, finish the send of zrep_000002, then send zrep_000003 from zrep_000002
tank/virt/disk3 needs zrep_000003 from zrep_000001 (or maybe zrep_000002 needs to be sent first?)
|
The more I think about this, the more annoying it is....
A recursive zfs snapshot is great for making sure all disks attached to a VM are snapshotted at the exact same moment... but zfs send -Ri is a terrible way to ensure all the child datasets and volumes get transferred. Last time I checked, zfs send -R will barf if you create a new zvol or dataset underneath, because the newly created dataset has no @some-other-snap and no existing data on the destination host, meaning it needs its own zfs send just to make it consistent with the rest of the filesystem (full commands in the quoted reply below).
I wonder if the ZFS devs are working on a solution to this, or if they feel it's up to individual admins / software devs to work around it... |
depends on how a sub-filesystem is implemented, I would think.
but given all the possibilities, there may not be a conceptually good solution.
It may simply come down to "don't use init and -R and resume support all together".
Do you think I should make zrep bail out if all 3 are attempted?
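(If so, the guard would be something like this; "$zrep_cmd" is just a stand-in for however zrep tracks the requested subcommand, not a real zrep variable:)
# sketch only, not actual zrep code
if [ "$ZREP_RESUME" = "yes" ] && [ -n "$ZREP_R" ] && [ "$zrep_cmd" = "init" ]; then
    echo "ERROR: ZREP_RESUME=yes cannot be combined with ZREP_R=-R for init; init each child individually instead" >&2
    exit 1
fi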
…On Mon, Jan 13, 2020 at 12:48 PM Aaron C. de Bruyn ***@***.***> wrote:
The more I think about this, the more annoying it is....
zfs snapshot -r ***@***.*** is great for making sure all disks
attached to a VM are snapshotted at the exact same moment....but zfs send
-Ri @some-other-snap ***@***.*** is a terrible way to ensure all the
child datasets and volumes get transferred.
Last time I checked, zfs send -R will barf if you create a new zvol or
dataset underneath...because the newly created dataset has no
@some-other-snap and it has no existing data on the destination
host...meaning it needs a zfs send -R ***@***.*** or a zfs
send ***@***.*** just to make it consistent with the rest
of the filesystem.
I wonder if the ZFS devs are working on a solution to this, or if they
feel it's up to individual admins / software devs to work around it...
|
If it bails out, it would make it impossible to simultaneously snapshot the underlying filesystems for backup unless an external tool was used. I think the current behavior is acceptable--you'll get an error from ZFS and it will fail to sync.
I'm debating writing a quick script to compare both the source and destination filesystems to figure out where things left off and manually bring them back into sync... sort of like a 'zrep fixup' for when an outage causes this...
As for I do that because when you run a The current workflow when there are no children or you aren't using
Maybe adjust it to:
If no resume tokens were found, just
That should bring everything into sync with one minor exception that I think is probably fine. If you delete a child dataset on the source server, it will exist forever on the destination server since it's not involved in syncing. I think it's probably a good idea to not have zrep delete datasets and just stick to snapshots.
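Something like this is what I'm picturing for the comparison pass (rough, untested sketch, not existing zrep functionality; host and dataset names are placeholders):
SRC=tank/virt
DEST=tank/backups/uslaysd/virt
DEST_HOST=--redacted--
zfs list -H -o name -r "$SRC" | while read -r src_fs; do
    dest_fs="$DEST${src_fs#$SRC}"
    # newest zrep snapshot on each side, plus any leftover resume token on the destination
    src_snap=$(zfs list -H -t snapshot -o name -s creation -d 1 "$src_fs" | grep '@zrep' | tail -1)
    dest_snap=$(ssh -n root@"$DEST_HOST" zfs list -H -t snapshot -o name -s creation -d 1 "$dest_fs" 2>/dev/null | grep '@zrep' | tail -1)
    token=$(ssh -n root@"$DEST_HOST" zfs get -H -o value receive_resume_token "$dest_fs" 2>/dev/null)
    echo "$src_fs: source=$src_snap dest=$dest_snap resume_token=$token"
done
|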
kinda sad / non-optimal though.
if a user is going for an init of very large filesystems, and there is a non-zero chance of interruption... seems like they would be better off doing an individual sync of each filesystem, with resume support. Then once the initial sync is done, somehow do a recursive incremental sync.
but... how do you get to the point where you can do a recursive incremental sync?
what happens if, after the individual syncs, you:
1. make a top-level recursive snapshot
2. make a normal top-level zrep incremental sync with the "copy intermediate snapshots" flag present?
does the step 1 snapshot get transferred over? and if so, does it then become a valid checkpoint for future incremental recursive syncs?
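(in raw zfs terms, the experiment would be roughly the following; the snapshot names, the incremental base, and the host are placeholders:)
# step 1: top-level recursive snapshot -- every child gets @manual_checkpoint atomically
zfs snapshot -r tank/virt@manual_checkpoint
# step 2: normal, non-recursive incremental send of the top level, including intermediate
# snapshots (-I), which is roughly what a zrep sync would run with ZREP_INC_FLAG=-I
zfs send -I tank/virt@zrep-remote_000001 tank/virt@manual_checkpoint | \
    ssh root@--redacted-- zfs receive tank/backups/uslaysd/virt
# then: did @manual_checkpoint arrive, and for which datasets?
ssh root@--redacted-- zfs list -t snapshot -r tank/backups/uslaysd/virt | grep manual_checkpoint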
|
I guess I'm missing a key bit of information, which is:
does
zfs snap -r top@snapname
do something "magical" with the snapshot(s) that somehow associates them together? or is it merely a convenience function that creates snapshots with exactly the same name, and exactly the same TIME... but after that, there is no association between the snapshots?
you seem to be assuming there is no association. I was kinda presuming there was. would be nice to know for sure.
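(one way to probe this, with a placeholder dataset name: recursive snapshots are documented as being taken atomically, so checking whether they share a createtxg at least shows how much they have in common on disk. whether zfs send -R cares about anything beyond the names lining up is the part that still needs a definitive answer.)
zfs snapshot -r tank/virt@probe
# if the recursive snapshot really is atomic, every @probe should report the same createtxg
zfs get -r -t snapshot -H -o name,value createtxg tank/virt | grep '@probe'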
|
Related to #55.
The resume feature works beautifully for me in all but one case.
Imagine you have a filesystem tank/virt with child volumes tank/virt/disk-1, tank/virt/disk-2, and tank/virt/disk-3. I want to ensure anything under tank/virt gets a simultaneous snapshot (so all the disks are consistent) and gets backed up. To accomplish that I run:
ZREP_SEND_FLAGS="--raw -v" ZREP_RESUME=yes ZREP_R=-R ZREP_INC_FLAG=-i /usr/local/bin/zrep -t zrep-remote init tank/virt --redacted-- tank/backups/uslaysd/virt
This snapshots everything under tank/virt and starts sending it off-site. If the transfer gets interrupted and I attempt to restart it, it errors out. If I had to guess, this occurs because zrep is looking at the remote box for tank/virt and sees it has already been transferred. The transfer left off at disk-2:
tank/backups/uslaysd/virt/disk-2 1-176c55bc6e-148-78<snip remainder of resume token>
I think this presents a challenge for zrep, as it would have to walk through the descendants on the remote system and figure out what each one needs.
I currently have to manually resume from this and it's a complex process. The first step is to find the name and resume token and then:
zfs send -vt <token> | ssh root@--redacted-- zfs receive -sv tank/backups/uslaysd/virt/disk-2
Once that particular disk is done, I have to look at what the remote has:
zfs list -r tank/backups/uslaysd/virt
Let's say it has disk-1 and disk-2, but it doesn't have disk-3. Now I have to do a 'normal' transfer of disk-3:
zfs send -vR tank/virt/disk-3@zrep-remote_000000 | ssh root@--redacted-- zfs receive -sv tank/backups/uslaysd/virt/disk-3
Once that completes, I should have a complete copy of tank/virt@zrep-remote_000000 on the backup box.
This last part I'm a bit fuzzy on--I think if I run a sync, it'll error out. I think I have to run a zrep sentsync tank/virt@zrep-remote_000000 first to make sure both sides agree on what they have, and then a sync will work. I need to do more testing to have a definitive answer on that one.
It basically kinda sucks that ZFS treats a
zfs send -R tank/virt@zrep-remote_000000 | zfs receive -s tank/backups/uslaysd/virt
as a bunch of individual units when handling a resume, instead of having maybe a -R flag that can be passed with zfs send -vt to include missing child datasets. It makes things more complex for zrep. :)
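For what it's worth, the last two manual steps could probably be scripted roughly like this (untested sketch, not existing zrep functionality; the sentsync call at the end is the part that still needs testing):
SRC=tank/virt
DEST=tank/backups/uslaysd/virt
DEST_HOST=--redacted--
SNAP=zrep-remote_000000
# full-send any child that never made it to the destination at all
zfs list -H -o name -r "$SRC" | while read -r fs; do
    dest_fs="$DEST${fs#$SRC}"
    if ! ssh -n root@"$DEST_HOST" zfs list "$dest_fs" >/dev/null 2>&1; then
        zfs send -v "$fs@$SNAP" | ssh root@"$DEST_HOST" zfs receive -sv "$dest_fs"
    fi
done
# then tell zrep both sides agree on what has been synced
zrep sentsync "$SRC@$SNAP"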