Feature Request - Filter zero sized snapshots by written property. #34
Comments
I have just realised that it is irrelevant how much was written between current and next, only the amount between current and previous is important. I think I got onto the wrong track in my initial thinking. If written between current and next is zero then next must also have a 'USED' property of zero and will fail the current/prev filter and so will be deleted, and this process will continue until there is a non-zero written between current and the next remaining snapshot. If we require both previous and next when filtering then we would delete the snapshots in the order they are visited until the smallest set is discovered, rather than keeping the first point of the change. Consider:
If we delete snapshots that have nothing written since their previous snapshot, we would keep A, B and E, since C and D hold nothing that B doesn't. If the written property is called without the
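A minimal sketch of the "written since previous" filter described above, in Python rather than the project's own language. The snapshot names A to E come from the comment, but the byte counts are made up for illustration, and `keep_by_written` is a hypothetical helper, not part of the tool.

```python
from typing import List, Tuple


def keep_by_written(snapshots: List[Tuple[str, int]]) -> List[str]:
    """Return the names of snapshots worth keeping.

    `snapshots` is in chronological order; each entry is
    (name, bytes written since the previous snapshot).
    """
    kept = []
    for index, (name, written) in enumerate(snapshots):
        # The oldest snapshot has no predecessor, so it is always kept.
        # Any later snapshot is kept only if something was written
        # since the snapshot before it.
        if index == 0 or written > 0:
            kept.append(name)
    return kept


# Illustrative values only: C and D hold nothing that B doesn't.
example = [("A", 0), ("B", 4096), ("C", 0), ("D", 0), ("E", 8192)]
print(keep_by_written(example))
```

Because C and D both report zero written, deleting C does not change what D would report against B, so a single pass is enough in this simplified model.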
I've just noticed that you list/process the snapshots in reverse chronological order. This makes finding the 'written' since previous harder to do, and I think it is also less efficient than processing them in chronological order. Consider that in the typical case older snapshots are more likely to have already had zero sized snapshots cleaned up, so going in chronological order means that these snapshots will not be destroyed and so the cached

The choice of reverse chronological order has the effect of coalescing snapshots to the first point in time that some data existed, rather than the last point in time (i.e. currently the shared snapshots are removed newest to oldest so that the oldest becomes the unique holder), but was this intentional?
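To illustrate the coalescing effect, here is a toy model in Python. It assumes a snapshot's 'USED' is the set of blocks that no other remaining snapshot references, and it ignores the live dataset for simplicity. The block numbers and the helpers `unique_blocks` and `prune_zero_used` are hypothetical.

```python
from typing import Dict, FrozenSet, List


def unique_blocks(name: str, snaps: Dict[str, FrozenSet[int]]) -> FrozenSet[int]:
    """Blocks referenced by `name` and by no other remaining snapshot."""
    others = set()
    for other, blocks in snaps.items():
        if other != name:
            others |= blocks
    return snaps[name] - others


def prune_zero_used(snaps: Dict[str, FrozenSet[int]], order: List[str]) -> List[str]:
    """Visit snapshots in `order`, destroying any whose unique set is empty."""
    remaining = dict(snaps)
    for name in order:
        if not unique_blocks(name, remaining):
            del remaining[name]
    return sorted(remaining)


# B, C and D are identical, so each one is zero sized while the others exist.
snaps = {
    "A": frozenset({1, 2}),
    "B": frozenset({1, 3}),
    "C": frozenset({1, 3}),
    "D": frozenset({1, 3}),
    "E": frozenset({1, 4}),
}

print(prune_zero_used(snaps, ["E", "D", "C", "B", "A"]))  # newest first
print(prune_zero_used(snaps, ["A", "B", "C", "D", "E"]))  # oldest first
```

In this model the newest-first pass keeps A, B and E (the first point in time the shared data existed), while the oldest-first pass keeps A, D and E (the last point).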
@Parakleta could you show a
also, when creating a snapshot, what do you think is a proper
@mailinglists35 it depends on how you define zero sized. This is the point of the issue that I raised. I'm not sure what else you want to know than what I have already written. AFAIK a snapshot is always zero sized at the time when you create it, so your second question doesn't make sense.
OK, so it's taken me a while to break this down to actually figure out what I need, but I have no experience with Ruby so I'm hoping you'll consider adding this as an option for me.
Essentially I have discovered that the 'USED' space of a snapshot is only what is uniquely referenced by that snapshot. The setup I have is with two different overlapping automatic snapshots running, one is frequent but sparse (remove empties), and the other is infrequent but dense (keep empties). The aim is to have the last 500 changes in a data set at 5 minute intervals, and hourly intervals for the last 4 weeks. The problem is that where the two snapshot sequences overlap (i.e. on the hour) the sparse set always registers as zero-sized because it shares its dataset with the hourly snapshot and so is removed.
There is however a property called `written@snapshot` which describes how much data was written to the target since the requested snapshot. For example, requesting it on the `06h05U` snapshot with `06h00U` as the requested snapshot will say how much data was written to the `06h05U` snapshot that didn't exist in the `06h00U` snapshot. While this doesn't tell us the size of the `06h00U` snapshot, we know that it must also contain some data that the `06h05U` snapshot doesn't (if only because the metadata addressing the written data must have been updated), and so it is the last snapshot on that 'interval' to reference that data. This means that if this snapshot currently has a zero size, we do know that at some point in the future, if older snapshots are removed, this snapshot will become non-zero sized.

If we only used this metric then we may double our number of kept snapshots, because the previous kept snapshot may already hold the same data as us. Effectively the snapshot we now keep may actually represent the last point at which the kept snapshot before it remains unchanged. To differentiate this case we need to ensure that this snapshot also differs from the previous kept snapshot.
So, essentially what I am asking for is a filter for each zero-sized snapshot that checks whether data was written between the previous snapshot and itself, and between itself and the next snapshot. If data exists in both of those cases I would like to preserve the snapshot, because its 'USED' property would be non-zero if all snapshots not of its 'interval' were removed. The edge case of the most recent snapshot doesn't apply because it isn't subject to zero-sized removal anyway, and the oldest snapshot should treat its written against previous as non-zero since it must be unique on this 'interval'.
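The requested filter could look roughly like the following Python sketch. It assumes the snapshots of one 'interval' are already listed in chronological order with their 'USED' value and the amount written since the previous snapshot in that list. The `Snap` type, the function name and the sample numbers are all hypothetical.

```python
from typing import List, NamedTuple


class Snap(NamedTuple):
    name: str
    used: int      # the snapshot's 'USED' property, in bytes
    written: int   # bytes written since the previous snapshot in the list


def keep_two_sided(snaps: List[Snap]) -> List[str]:
    """Keep zero sized snapshots only if data was written on both sides."""
    kept = []
    last = len(snaps) - 1
    for i, snap in enumerate(snaps):
        # Non-zero snapshots and the most recent one are never candidates.
        if snap.used > 0 or i == last:
            kept.append(snap.name)
            continue
        # The oldest snapshot counts as having non-zero written since previous.
        since_prev = True if i == 0 else snap.written > 0
        # "Written between itself and next" is the next snapshot's written value.
        before_next = snaps[i + 1].written > 0
        if since_prev and before_next:
            kept.append(snap.name)
    return kept


# Illustrative values: 06h00U is zero sized only because an hourly
# snapshot shares its data, while 06h10U saw no change at all.
sample = [
    Snap("05h55U", 4096, 4096),
    Snap("06h00U", 0, 2048),
    Snap("06h05U", 1024, 1024),
    Snap("06h10U", 0, 0),
    Snap("06h15U", 0, 512),
]
print(keep_two_sided(sample))
```

With these values `06h00U` survives despite its zero 'USED', `06h10U` is dropped, and `06h15U` is kept only because it is the most recent snapshot.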