RFD 148 Snapper: VM snapshots #109
@twhiteman asked via chat and I answered
TBD.
There is a concern about receiving snapshots from untrusted sources. Unless we have RFD 14 implemented, we can't put the image somewhere that the customer could tamper with it.
IMGAPI images (I think) only cover the boot disk and require that the guest be rebooted to run a prepare-image script. That being said, it may make a lot of sense to extend IMGAPI to cover this use case. |
@mgerdts The subject has the wrong RFD number; Snapper: VM snapshots is RFD 148. |
What will we do about snapshots that are extremely large? Say a customer has a 1TB or higher instance. Will we be able to reliably send such large snapshots to manta? Will the customer also be charged for data usage in manta? |
On Wed, Jul 18, 2018 at 12:50 PM, Michael Zeller wrote:

> What will we do about snapshots that are extremely large? Say a customer has a 1TB or higher instance. Will we be able to reliably send such large snapshots to manta?
We will have to experiment with how reliable that is. The stories I've heard in the past of problems with replication failures of large ZFS streams have generally been around sending data over long distances with questionable networks in the middle. When transmissions involve terabytes of data and the network introduces a significant number of bit flips, errors that TCP's checksum cannot detect become a common enough problem to worry about.

Customers that want quick snapshots (plus upload) of such large instances should probably make a practice of sending snapshots frequently, so that the delta never gets that large. We may even want to (in a future project) offer a replication strategy that makes it possible to set such a schedule automatically. This scheme would favor replication to a live pool rather than to manta.
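For illustration, a minimal sketch of that kind of frequent incremental send, with hypothetical dataset, snapshot, and Manta object names, and assuming the Manta mput client is installed and configured:

```
# Initial full send of a hypothetical VM dataset to Manta (names illustrative only).
zfs snapshot zones/$VM_UUID@2018-07-18T00
zfs send zones/$VM_UUID@2018-07-18T00 |
    mput /$MANTA_USER/stor/snapshots/$VM_UUID/full

# Later sends include only the changes since the previous snapshot; if
# snapshots are taken frequently, each delta stays small.
zfs snapshot zones/$VM_UUID@2018-07-18T06
zfs send -i @2018-07-18T00 zones/$VM_UUID@2018-07-18T06 |
    mput /$MANTA_USER/stor/snapshots/$VM_UUID/incr-2018-07-18T06
```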
Receiving into a live pool also has an advantage for the "failure with huge streams" problem. From zfs(1M):
```
-s If the receive is interrupted, save the partially received state,
rather than deleting it. Interruption may be due to premature
termination of the stream (e.g. due to network failure or failure
of the remote system if the stream is being read over a network
connection), a checksum error in the stream, termination of the zfs
receive process, or unclean shutdown of the system.
The receive can be resumed with a stream generated by zfs send -t
token, where the token is the value of the receive_resume_token
property of the filesystem or volume which is received into.
To use this flag, the storage pool must have the extensible_dataset
feature enabled. See zpool-features(5) for details on ZFS feature
flags.
```
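To make that concrete, a rough sketch of resuming an interrupted receive, with hypothetical host and dataset names (assumes the extensible_dataset feature is enabled on the receiving pool):

```
# First attempt; -s keeps the partially received state if the stream is cut off.
zfs send zones/$VM_UUID@snap |
    ssh $SNAPPER_HOST zfs receive -s data/snapper/$VM_UUID

# After an interruption, fetch the resume token from the receiving dataset...
TOKEN=$(ssh $SNAPPER_HOST zfs get -H -o value receive_resume_token data/snapper/$VM_UUID)

# ...and restart the send from where it left off.
zfs send -t "$TOKEN" |
    ssh $SNAPPER_HOST zfs receive -s data/snapper/$VM_UUID
```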
> Will the customer also be charged for data usage in manta?
I would expect that any "per GiB per month" charge for snapshots would essentially be there to cover the costs of snapper's manta usage. Since this would likely not be in a customer's account, normal manta billing would not do the trick.
|
One minor concern I have is regarding the Manta paths for snapshots. Overall I like the scheme, but there's one (potentially) common use-case it would flake out on: regular database snapshots. Regularly snapshotting a database is a good idea, and since those snapshots will hopefully never be used, there's a monetary incentive to stick to incremental. This will result in a very deep directory structure. I don't know what Manta's directory path limit in chars is, but in practice HTTP headers over 8K are asking for trouble. |
@marsell said:
> I'm not so sure the monetary incentive is to always use incremental from the latest, as that means you can never remove any snapshot except the latest. If the source of an incremental is able to be chosen, it would allow for a scheme like a monthly full, daily incremental from the monthly full, hourly incremental from the previous daily or hourly.
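A sketch of that kind of layered schedule, with hypothetical snapshot names and output paths (each send names its incremental source explicitly; the snapshots are assumed to already exist):

```
# Monthly full.
zfs send zones/$VM_UUID@monthly-2018-07 > /var/tmp/$VM_UUID-monthly-2018-07.zfs

# Daily incremental, always taken against the monthly full.
zfs send -i @monthly-2018-07 zones/$VM_UUID@daily-2018-07-18 \
    > /var/tmp/$VM_UUID-daily-2018-07-18.zfs

# Hourly incremental, taken against the most recent daily (or hourly).
zfs send -i @daily-2018-07-18 zones/$VM_UUID@hourly-2018-07-18-13 \
    > /var/tmp/$VM_UUID-hourly-2018-07-18-13.zfs
```

With this layering, only the hourlies form a long chain; a daily can be restored from just the monthly full plus that daily.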
Quite a valid point here. Presuming we use IMGAPI, we can probably leverage whatever support it already has for not deleting images that have children. The proposed hierarchy is clearly not the only way to accomplish this. |
Thanks for starting this. I realize that this is an early draft; I think there are a couple of different classes of issues that are worth …

User Visible Aspects

First, while I understand the differentiation and practicality of a full …

Next, I have a bunch of questions about when can snapshots be taken.

One of the main points of the introduction is that this is supposed to …

Storage of Snapshots

In most cases writing to Manta will be done over the WAN. I think the …

Conversely, the local storage discussion isn't as straightforward. A …

It's not clear in the section that's talking about ZFS reservations as …

Intersection with Image Creation

Folks are also going to probably want to say something like can I take … |
I'll add specifics as to how I think https://apidocs.joyent.com/cloudapi/#CreateMachineSnapshot and related calls will be used.
I think the introduction and Anatomy of a VM snapshot made that clear. In particular:
> There are limitations. In particular, the following are not part of the snapshot.
These limitations match the limitations of snapshots currently supported with …

Until such a time as we are able to roll back all configuration, do we need to block configuration changes while snapshots exist?
> It is a crash-consistent image. Use snapshots if and only if your file system and consumers of raw disk can withstand an unexpected power outage.
I think I already covered this above. These snapshots will have many of the same issues as the snapshots that we already support.
I had initially proposed having some infrastructure zones with delegated datasets (snapper zones). There would be a set (minimum two, more over time) per data center. We would leverage the migration code (RFD 34) to send the VM's dataset hierarchy to the delegated datasets of two snapper zones. The stream would be received into the snapper zone's delegated dataset. @twhiteman suggested that things would be much simpler if we relied on Manta to handle replication and maintenance of redundancy in the face of failures. Further discussion led to the idea that storage in Manta may lead to a lot of overlap with IMGAPI. That would contribute nicely to another customer request - the ability to deploy clones from snapshots.
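As a rough sketch only (hypothetical UUIDs and host names; the actual transport would come from the RFD 34 migration work), receiving into a snapper zone's delegated dataset might look like:

```
# Recursive, replication-style snapshot and send of the VM's dataset hierarchy.
zfs snapshot -r zones/$VM_UUID@snapper-2018-07-18

# -s allows an interrupted receive to be resumed; -u avoids mounting the
# received datasets inside the snapper zone's delegated dataset.
zfs send -R zones/$VM_UUID@snapper-2018-07-18 |
    ssh $SNAPPER_CN zfs receive -s -u zones/$SNAPPER_ZONE_UUID/data/$VM_UUID

# The same stream would be sent to a second snapper zone for redundancy.
```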
If we had some form of elastic storage, Snapper would become much more practical, because the per-snapper limitations would become much more flexible and resilience could be delegated to the elastic storage. Elastic storage is not this project. We need clarity on the requirements to know which path we should be pursuing.
If storing to a file not in manta, the expectation is that the customer's NFS server could be mounted on each CN. In no way is this project about providing NFS, SMB, etc. If using a CN's NFS client is for some reason problematic, then we may be at a point of requiring temporary space at least as large as the largest VM and the ability to use scp or similar to copy it off host.
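For the non-Manta, file-based case, a minimal sketch (hypothetical NFS server, export, and paths):

```
# Mount the customer's NFS export on the CN.
mount -F nfs customer-nfs:/export/snapshots /mnt/customer-snapshots

# Stream the snapshot directly to a file on the share, so no local scratch
# space the size of the VM is needed on the CN.
zfs send -R zones/$VM_UUID@snap > /mnt/customer-snapshots/$VM_UUID-snap.zfs
```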
Will clarify
Will clarify |
I haven't had a chance to fully read and understand the RFD and all the discussion, as it is quite large and complex. But I'm a massive fan of KISS, and as an end user of Triton and SmartOS in production on our cloud, the core MVP functionality we're after is simply:

Easy
This is basic functionality that is missing and which we already have on our non-Triton-based SmartOS cloud, where it works fine - AFAIK it's trivial to implement. Having this would be exceptionally helpful! We don't use delegated datasets, but if we did, I imagine a "recursive" option for SmartOS zones that includes the delegated datasets would be handy, and the same for rollbacks. Otherwise, it just snapshots the zone root.
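For illustration, a sketch of the distinction being asked for, with a hypothetical zone UUID (in SmartOS a delegated dataset lives under the zone root, e.g. zones/$ZONE_UUID/data):

```
# Zone root only: children such as a delegated dataset are not included.
zfs snapshot zones/$ZONE_UUID@rootonly

# Requested "recursive" option: also snapshot children, including any
# delegated dataset under zones/$ZONE_UUID.
zfs snapshot -r zones/$ZONE_UUID@recursive

# Note: zfs rollback has no recursive form, so a recursive rollback would
# need to roll back each child dataset individually.
```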
Medium

Again, the above seems fairly straightforward and plugs a big gap in functionality easily.

Hard
It would be nice if Manta wasn't needed, as we don't have any intention of spinning up a Manta instance (and AFAIK Joyent doesn't currently have Manta in eu-ams-1, and support told me there were no plans to in the near term). The KISS principle suggests to me that for a non-Manta installation, images are pushed for storage on the headnode via a mechanism similar to whatever imgadm uses. Hope the above is helpful. |
This is for discussion of
RFD 148 VM Snapshots
https://github.com/joyent/rfd/blob/master/rfd/0148/README.md