On 04/23/2018 05:31 AM, Vladimir Sementsov-Ogievskiy wrote:
21.04.2018 00:26, Eric Blake wrote:
> On 04/20/2018 01:24 PM, John Snow wrote:
>
>>>> Why is option 3 unworkable, exactly?:
>>>>
>>>> (3) Checkpoints exist as structures only with libvirt. They are saved
>>>> and remembered in the XML entirely.
>>>>
>>>> Or put another way:
>>>>
>>>> Can you explain to me why it's important for libvirt to be able to
>>>> reconstruct checkpoint information from a qcow2 file?
>>>>
>>> In short, it takes extra effort to keep metadata consistent when a
>>> libvirtd crash occurs. See [1] for a more detailed explanation,
>>> starting from the words "Yes it is possible".
>>>
>>> [1]
>>>
https://www.redhat.com/archives/libvir-list/2018-April/msg01001.html
> I'd argue the converse. Libvirt already knows how to do atomic updates
> of XML files that it tracks. If libvirtd crashes/restarts in the middle
> of an API call, you already have indeterminate results of whether the
> API worked or failed; once libvirtd is restarted, you'll have to
> probably retry the command. For all other cases, the API call
> completes, and either no XML changes were made (the command failed and
> reports the failure properly), or all XML changes were made (the command
> created the appropriate changes to track the new checkpoint, including
> whatever bitmap names have to be recorded to map the relation between
> checkpoints and bitmaps).
>
> Consider the case of internal snapshots. Already, we have the case
> where qemu itself does not track enough useful metadata about internal
> snapshots (right now, just a name and timestamp of creation); so libvirt
> additionally tracks further information in <domainsnapshot>: the name,
> timestamp, relationship to any previous snapshot (libvirt can then
> reconstruct a tree relationship between all snapshots; where a parent
> can have more than one child if you roll back to a snapshot and then
> execute the guest differently), the set of disks participating in the
> snapshot, and the <domain> description at the time of the snapshot (if
> you hotplug devices, or even the fact that creating external snapshots
> changes which file is the active qcow2 in a backing chain, you'll need
> to know how to roll back to the prior domain state as part of
> reverting). This is approximately the same set of information that a
> <domaincheckpoint> will need to track.
>
> I'm slightly tempted to just overload <domainsnapshot> to track three
> modes instead of two (internal, external, and now checkpoint); but think
> that will probably be a bit too confusing, so more likely I will create
> <domaincheckpoint> as a new object, but copy a lot of coding paradigms
> from <domainsnapshot>.
>
> So, from that point of view, libvirt tracking the relationship between
> qcow2 bitmaps in order to form checkpoint information can be done ALL
> with libvirt, and without NEEDING the qcow2 file to track any relations
> between bitmaps. BUT, libvirt's job can probably be made easier if
> qcow2 would, at the least, allow bitmaps to track their parent, and/or
> provide APIs to easily merge a parent..intermediate..child chain of
> related bitmaps into a single bitmap, for easy runtime
> creation of the temporary bitmap used to express the delta between two
> checkpoints.
I don't think this is a good idea:
https://www.redhat.com/archives/libvir-list/2018-April/msg01306.html
In short, I think, if we do something to support checkpoints in qemu
(updated BdrvDirtyBitmap, qapi, qcow2 and migration stream, new nbd meta
context), we'd better implement checkpoints than a .parent relationship.
[I'm going to answer this in response to the thread you've referenced.]
>
>> OK; I can't speak to the XML design (I'll leave that to Eric and other
>> libvirt engineers) but the data consistency issues make sense.
> And I'm still trying to figure out exactly what is needed, to capture
> everything needed to create checkpoints and take backups (both push and
> pull model). Reverting to data from an external backup may be a bit
> more manual, at least at first (after all, we STILL don't have decent
> libvirt support for rolling back to external snapshots, several years
> later). In other words, my focus right now is "how can we safely track
> checkpoints for capturing of point-in-time incremental backups with
> minimal guest downtime", rather than "given an incremental backup
> captured previously, how do we roll a guest back to that point in time".
>
>> ATM I am concerned that by shifting the snapshots into bitmap names that
>> you still leave yourself open for data corruption if these bitmaps are
>> modified outside of libvirt -- these third party tools can't possibly
>> understand the schema that they were created under.
>>
>> (Though I suppose very simply that if a bitmap is missing you'd be able
>> to detect that in libvirt and signal an error, but it's not very nice.)
> Well, we also have to realize that third-party tools shouldn't really be
> mucking around with bitmaps they don't understand. If you are going to
> manipulate a qcow2 file that contains persistent bitmaps, you should not
> delete a bitmap you did not create; and if the bitmap is autoloaded, you
> must obey the rules and amend the bitmap for any guest-visible changes
> you make during your data edits. Just like a third-party tool shouldn't
> really be deleting internal snapshots it didn't create. I don't think
> we have to worry as much about being robust to what a third party tool
> would do behind our backs (after all, the point of the pull model
> backups is so that third-party tools can track the backup in the format
> THEY choose, after reading the dirty bitmap and data over NBD, rather
> than having to learn qcow2).
>
>> I'll pick up discussion with Eric and Vladimir in the other portion of
>> this thread where we're discussing a checkpoints API and we'll pick this
>> up on QEMU list if need be.
> Yes, between this thread, and some IRC chats I've had with John in the
> meantime, it looks like we DO want some improvements on the qcow2 side
> of things on the qemu list.
>
> Other things that I need to capture from IRC:
>
> Right now, it sounds like the incremental backup model (whether push or
> pull) is heavily dependent on qcow2 files for persistent bitmaps. While
> libvirt can perform external snapshots by creating a qcow2 wrapper
> around any file type, and live commit can then merge that qcow2 file
> back into the original file, libvirt is already insistent that internal
> snapshots can only be taken if all disks are qcow2. So the same logic
> will apply to taking backups (whether the backup is incremental by
> starting from a checkpoint, or full over the complete disk contents).
>
> Also, how should checkpoints interact with external snapshots? Suppose
> I have:
>
> base <- snap1
>
> and create a checkpoint at time T1 (which really means I create a bitmap
> titled B1 to track all changes that occur _after_ T1). Then later I
> create an external snapshot, so that now I have:
>
> base <- snap1 <- snap2
>
> at that point, the bitmap B1 in snap1 is no longer being modified,
> because snap1 is read-only. But we STILL want to track changes since
> T1, which means we NEED a way in qemu to not only add snap2 as a new
> snapshot, but ALSO to create a new bitmap B2 in snap2, that tracks all
> changes (until the next checkpoint, of course). Whether B2 starts life
> empty (and libvirt just has to remember that it must merge snap1.B1 and
> snap2.B2 when constructing the delta), or whether B2 starts life as a
> clone of the final contents of snap1.B1, is something that we need to
> consider in qemu.
I'm sure that the latter is the right way, in which snapshots are actually
unrelated to checkpoints. We just have a "snapshot" of the bitmap in
the snapshot file.
This is roughly where I came down in terms of the "quick" way. If we
copy everything up into the new active layer there's not much else to
do. The existing commands and API in QEMU can just continue ignorant of
what happened.
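
A minimal sketch of the two options for B2, assuming a bitmap can be modeled as a Python set of dirty cluster indices. The function names are hypothetical and are not QEMU or libvirt APIs. It only illustrates that both options yield the same delta since T1, and that the clone option leaves nothing to merge afterwards:

```python
# Illustrative model only: a "bitmap" is a set of dirty cluster indices.
# Function names are hypothetical, not QEMU or libvirt APIs.

def delta_with_empty_b2(b1_final, writes_after_snapshot):
    """B2 starts empty; the caller must remember to merge snap1.B1 and snap2.B2."""
    b2 = set()
    b2 |= writes_after_snapshot      # guest writes land in snap2.B2
    return b1_final | b2             # explicit merge needed to get the delta since T1


def delta_with_cloned_b2(b1_final, writes_after_snapshot):
    """B2 starts as a clone of the final contents of snap1.B1; no later merge needed."""
    b2 = set(b1_final)               # copy the bitmap up into the new active layer
    b2 |= writes_after_snapshot      # guest writes land in snap2.B2
    return b2                        # B2 alone already describes the delta since T1


b1_final = {1, 2, 7}                 # clusters dirtied between T1 and the snapshot
later_writes = {2, 3}                # clusters dirtied after snap2 was created
same = delta_with_empty_b2(b1_final, later_writes) == delta_with_cloned_b2(b1_final, later_writes)
```

In the clone case, whoever reads B2 does not need to know that an external snapshot happened in between.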
Here is an additional interesting point: it works for internal snapshots
too, as bitmaps will go into the state through the migration channel (if we
enable the corresponding capability, of course).
> And if there is more than one bitmap on snap1, do we
> need to bring all of those bitmaps forward into snap2, or just the one
> that was currently active?
Again, I think, to make snapshots unrelated, it's better to keep them
all. Let a disk snapshot be a snapshot of the dirty bitmaps too.
Agree for the same reasons -- unless we want to complicate the bitmap
mechanisms ... which I think we do not.
> Similarly, if we later decide to live commit
> snap2 back into snap1, we'll want to merge the changes in snap2.B2 back
> into snap1.B1 (now that snap1 is once again active, it needs to track
> all changes that were merged in, and all future changes until the next
> snapshot).
And here we will just drop older versions of bitmaps.
I think:
- For any names that conflict, the bitmap in the backing layer is dropped.
- Any existing inactive bitmaps can stay around.
- Any existing active bitmaps will need to be updated to record the new
writes that were caused by the commit.
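
The three rules above could be sketched as follows, again modeling a bitmap as a set of dirty cluster indices plus an active flag. The `Bitmap` class and `commit_bitmaps` function are hypothetical illustrations, not QEMU code. The sketch also assumes that "existing" bitmaps means those already in the backing layer, and that bitmaps carried down from the committed layer already reflect the guest writes:

```python
from dataclasses import dataclass, field


@dataclass
class Bitmap:
    # Hypothetical model: dirty cluster indices plus an enabled/active flag.
    dirty: set = field(default_factory=set)
    active: bool = True


def commit_bitmaps(backing, top, committed):
    """Return the bitmaps of the backing layer after committing `top` into it.

    backing, top: dicts mapping bitmap name -> Bitmap
    committed:    set of clusters written into the backing layer by the commit
    """
    result = {}
    for name, bm in backing.items():
        if name in top:
            continue                          # rule 1: conflicting name, backing copy dropped
        kept = Bitmap(set(bm.dirty), bm.active)
        if kept.active:
            kept.dirty |= committed           # rule 3: active bitmaps record the commit's writes
        result[name] = kept                   # rule 2: inactive bitmaps stay around unchanged
    for name, bm in top.items():
        result[name] = Bitmap(set(bm.dirty), bm.active)   # newer version replaces the dropped one
    return result


backing = {"B1": Bitmap({1, 2}), "old": Bitmap({9}, active=False), "live": Bitmap({5})}
top = {"B1": Bitmap({1, 2, 3})}
merged = commit_bitmaps(backing, top, committed={3, 4})
```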
> Which means we need to at least be thinking about cross-node
> snapshot merges,
hmm, what is it?
[Assuming Eric explained in his reply.]
> even if, from the libvirt perspective, checkpoints are
> more of a per-drive attribute rather than a per-node attribute.
>