On Wed, Apr 24, 2019 at 13:43:03 -0500, Eric Blake wrote:
On 4/24/19 8:26 AM, Peter Krempa wrote:
> On Wed, Apr 17, 2019 at 09:09:03 -0500, Eric Blake wrote:
>> Introduce a bunch of new public APIs related to backup checkpoints.
>> Checkpoints are modeled heavily after virDomainSnapshotPtr (both
>> represent a point in time of the guest), although a snapshot exists
>> with the intent of rolling back to that state, while a checkpoint
>> exists to make it possible to create an incremental backup at a later
>> time.
>>
>> The following map shows the API relations to snapshots, with new APIs
>> on the right:
>>
>> Operate on a domain object to create/redefine a child:
>> virDomainSnapshotCreateXML         virDomainCheckpointCreateXML
>>
>> Operate on a child object for lifetime management:
>> virDomainSnapshotDelete            virDomainCheckpointDelete
>> virDomainSnapshotFree              virDomainCheckpointFree
>> virDomainSnapshotRef               virDomainCheckpointRef
>>
>> Operate on a child object to learn more about it:
>> virDomainSnapshotGetXMLDesc        virDomainCheckpointGetXMLDesc
>> virDomainSnapshotGetConnect        virDomainCheckpointGetConnect
>> virDomainSnapshotGetDomain         virDomainCheckpointGetDomain
>> virDomainSnapshotGetName           virDomainCheckpointGetName
>> virDomainSnapshotGetParent         virDomainCheckpointGetParent
>> virDomainSnapshotHasMetadata       virDomainCheckpointHasMetadata
One additional thing I remembered is that snapshots without metadata are
a dead end and don't make much sense. While backups without metadata can
be somewhat more usable, since we don't really need a VM definition to go
with them, I'd consider dropping that notion completely. If users want to
use backups, metadata should be present at all times.
>> virDomainSnapshotIsCurrent         virDomainCheckpointIsCurrent
>
> The 'current' checkpoint has very sparse documentation. While it makes
> some sense for the current snapshot to exist, I'm not persuaded we need
> this for checkpoints.
>
> In the case of checkpoints, the bitmaps backing them keep tracking the
> state even after you create a new bitmap, unlike snapshots. Thus it does
> not seem to make sense to me.
>
> If you think it does please elaborate, ideally in form of documentation.
I can add documentation, but the short answer is that there are two ways
to implement differential backups:
1. One active bitmap for every point in time that you might ever need a
differential backup from; if you have 20 checkpoints, then you have 20
active bitmaps, and each guest write potentially has to modify all 20
bitmaps, but a backup can directly use the desired bitmap and nothing
else:
Time:    T1     T2     T3     T4     present
Bitmap1: -----------------------------------
Bitmap2:        ----------------------------
Bitmap3:               ---------------------
Bitmap4:                      --------------
2. One disabled bitmap for every delta between two consecutive points in
time, plus one active bitmap since the most recent point in time; if you
have 20 checkpoints, then you have 19 disabled bitmaps and 1 active one,
where each guest write only has to modify 1 bitmap, but backups have to
construct a temporary bitmap that contains the union of all bitmaps that
span the time in question:
Time:    T1     T2     T3     T4     present
Bitmap1: -------
Bitmap2:        -------
Bitmap3:               -------
Bitmap4:                      --------------
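The trade-off between the two approaches can be made concrete with a small C model. It is a minimal sketch under stated assumptions: a hypothetical 64-block disk with one bit per block, and hypothetical Model, checkpoint(), guest_write() and backup_bitmap() names. It is not libvirt or hypervisor code; it only counts how many bitmaps each guest write touches and what a backup has to read.

```c
/* Hypothetical model only: a 64-block disk, one bit per block. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CHECKPOINTS 20

typedef struct {
    uint64_t bits[MAX_CHECKPOINTS]; /* one dirty bitmap per checkpoint */
    bool active[MAX_CHECKPOINTS];   /* does bitmap i still record writes? */
    int count;                      /* number of checkpoints created */
    long updates;                   /* bitmap modifications caused by writes */
} Model;

/* Create a checkpoint. Approach 1 keeps every older bitmap active;
 * approach 2 disables the previously active bitmap first. */
static void checkpoint(Model *m, int approach)
{
    assert(m->count < MAX_CHECKPOINTS);
    if (approach == 2 && m->count > 0)
        m->active[m->count - 1] = false;
    m->bits[m->count] = 0;
    m->active[m->count] = true;
    m->count++;
}

/* A guest write to block 'blk' is recorded in every active bitmap. */
static void guest_write(Model *m, int blk)
{
    assert(blk >= 0 && blk < 64);
    for (int i = 0; i < m->count; i++) {
        if (m->active[i]) {
            m->bits[i] |= UINT64_C(1) << blk;
            m->updates++;
        }
    }
}

/* Blocks changed since checkpoint 'since'. Approach 1 reads one bitmap
 * directly; approach 2 builds a temporary union of that bitmap and all
 * later ones. */
static uint64_t backup_bitmap(const Model *m, int approach, int since)
{
    assert(since >= 0 && since < m->count);
    if (approach == 1)
        return m->bits[since];
    uint64_t merged = 0;
    for (int i = since; i < m->count; i++)
        merged |= m->bits[i];
    return merged;
}
```

Running four rounds of "create a checkpoint, then write one block" against both models gives identical backup_bitmap() results for every starting checkpoint, while approach 1 performs 1+2+3+4 = 10 bitmap updates against 4 for approach 2.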
Given that guest writes are more frequent than backup operations, and
that creating a temporary bitmap that merges other bitmaps at the start
of a backup is not all that onerous, the code goes with approach 2. And,
since there is always 1 active bitmap while the rest are disabled, it
lends itself well to the notion of a current checkpoint (the one that is
tracking active guest writes).
This makes sense as long as other implementations do (or allow) the same.
If a hypervisor implements only approach 1, you won't have another
option.
In that case, though, we should model it differently. The current (active)
bitmap/checkpoint should be the one that actually receives bitmap updates.
That means the value should reflect the actual state of the VM rather
than something saved. The problem is that this is not compatible with
approach 1 described above, since the notion of a 'current' checkpoint
allows only one, while there may be several checkpoints receiving
updates.
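A hypothetical helper makes the mismatch concrete. Assuming 'current' were derived from live state (which bitmaps are still recording writes) rather than from saved metadata, approach 2 always yields exactly one answer, while approach 1 yields several candidates and no single checkpoint to report. The function name and return codes are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: derive the "current" checkpoint from which bitmaps
 * are still active. Returns the index of the single active bitmap, -1 if
 * none is active, or -2 if several are active (no single answer). */
static int current_checkpoint(const bool *active, int count)
{
    assert(count >= 0);
    int found = -1;
    for (int i = 0; i < count; i++) {
        if (!active[i])
            continue;
        if (found != -1)
            return -2;  /* approach 1: more than one bitmap receives updates */
        found = i;
    }
    return found;
}
```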
Also, it seems to me that the user does not need to care which one is
active at all. It's just an internal implementation detail. The user only
cares about when the checkpoints were created and whether they can back
up the differences between a time in the past and now.
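The user-facing question can be answered from creation times alone, without any notion of 'current'. A minimal sketch, assuming checkpoints are kept in ascending creation order and identified only by their creation time; the helper name is hypothetical. It picks the newest checkpoint created at or before the requested time, whose changed-block set covers every write since that time (possibly plus a few earlier ones).

```c
#include <assert.h>
#include <time.h>

/* Hypothetical: 'created' holds checkpoint creation times in ascending
 * order. Returns the index of the newest checkpoint created at or before
 * 'since', or -1 if there is none (only a full backup covers 'since'). */
static int checkpoint_for_backup(const time_t *created, int count, time_t since)
{
    assert(count >= 0);
    int best = -1;
    for (int i = 0; i < count && created[i] <= since; i++)
        best = i;
    return best;
}
```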
Also, it's probably worth mentioning that a bitmap currently does not
support a difference between two points in the past, which means that
from the top level both approaches behave identically.
I just don't see enough solid reasons to have 'current' as a property of
the checkpoint.
>> Operate on a domain object to list all children:
>> virDomainSnapshotNum               (no counterpart, this is the old
>>                                    virDomainSnapshotListNames racy interface)
>> virDomainSnapshotListAllSnapshots  virDomainListAllCheckpoints
>>
>> Operate on a child object to list descendents:
>> virDomainSnapshotNumChildren       (no counterpart, this is the old
>>                                    virDomainSnapshotListChildrenNames racy interface)
>> virDomainSnapshotListAllChildren   virDomainCheckpointListAllChildren
>>
>> Operate on a domain to locate a particular child:
>> virDomainSnapshotLookupByName      virDomainCheckpointLookupByName
>> virDomainHasCurrentSnapshot        virDomainHasCurrentCheckpoint
>> virDomainSnapshotCurrent           virDomainCheckpointCurrent
>>
>> Operate on a snapshot to roll back to earlier state:
>> virDomainSnapshotRevert            (no counterpart, instead checkpoints
>>                                    are used in incremental backups via
>
> This patch or a different one should also add docs to virDomainSnapshotRevert
> that outline what happens to the checkpoints when reverting snapshots.
Right now, it needs to be an error. Until we actually design how the two
should play together nicely, the most conservative approach is to state
that reverting a domain that has checkpoints risks enough confusion to be
forbidden in the initial implementation, even if we lift that
restriction later, once we iron out how it should work.
I think we need more than that. When a snapshot is taken, it also needs
to capture the state of the checkpoints. Otherwise we won't be able to
reconstruct the state from the point when the snapshot was taken.
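One way to picture this is a snapshot that carries a copy of the domain's checkpoint list, so reverting restores the checkpoints as they were. A minimal sketch with hypothetical CheckpointList, Snapshot and Domain types; it copies only checkpoint names, and how the underlying bitmaps would be preserved is left open, as in the discussion.

```c
#include <assert.h>
#include <string.h>

#define MAX_CP 8
#define NAME_LEN 32

/* Hypothetical types, for illustration only. */
typedef struct {
    char names[MAX_CP][NAME_LEN];  /* checkpoint names, oldest first */
    int count;
} CheckpointList;

typedef struct {
    CheckpointList checkpoints;    /* checkpoint state captured with the snapshot */
} Snapshot;

typedef struct {
    CheckpointList checkpoints;    /* live checkpoint state of the domain */
} Domain;

static void checkpoint_add(Domain *dom, const char *name)
{
    assert(dom->checkpoints.count < MAX_CP && strlen(name) < NAME_LEN);
    strcpy(dom->checkpoints.names[dom->checkpoints.count++], name);
}

static void snapshot_create(const Domain *dom, Snapshot *snap)
{
    snap->checkpoints = dom->checkpoints;   /* struct copy of the whole list */
}

static void snapshot_revert(Domain *dom, const Snapshot *snap)
{
    dom->checkpoints = snap->checkpoints;   /* restore checkpoints as they were */
}
```

Checkpoints created after the snapshot disappear on revert, which keeps the checkpoint list consistent with the reverted disk state.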