21.04.2018 00:26, Eric Blake wrote:
On 04/20/2018 01:24 PM, John Snow wrote:
>>> Why is option 3 unworkable, exactly?:
>>>
>>> (3) Checkpoints exist as structures only with libvirt. They are saved
>>> and remembered in the XML entirely.
>>>
>>> Or put another way:
>>>
>>> Can you explain to me why it's important for libvirt to be able to
>>> reconstruct checkpoint information from a qcow2 file?
>>>
>> In short, it takes extra effort to keep the metadata consistent when
>> libvirtd crashes. See the more detailed explanation
>> in [1], starting from the words "Yes it is possible".
>>
>> [1] https://www.redhat.com/archives/libvir-list/2018-April/msg01001.html
I'd argue the converse. Libvirt already knows how to do atomic updates
of XML files that it tracks. If libvirtd crashes/restarts in the middle
of an API call, you already have indeterminate results of whether the
API worked or failed; once libvirtd is restarted, you'll probably have
to retry the command. For all other cases, the API call
completes, and either no XML changes were made (the command failed and
reports the failure properly), or all XML changes were made (the command
created the appropriate changes to track the new checkpoint, including
whatever bitmap names have to be recorded to map the relation between
checkpoints and bitmaps).
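To illustrate the all-or-nothing property described above, here is a minimal sketch of the usual write-to-temp-then-rename technique. It is not libvirt's actual code; the helper name atomic_write is hypothetical, and a POSIX-style filesystem where rename within one directory is atomic is assumed.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` so readers see the old or the new content, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # switch to the new content in one step
    except BaseException:
        # On any failure the original file is untouched; drop the temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process dies before os.replace, the old XML is still in place; if it dies after, the new XML is complete. Only the caller's knowledge of which one happened is lost, which is the "retry the command" case.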
Consider the case of internal snapshots. Already, we have the case
where qemu itself does not track enough useful metadata about internal
snapshots (right now, just a name and timestamp of creation); so libvirt
additionally tracks further information in <domainsnapshot>: the name,
timestamp, relationship to any previous snapshot (libvirt can then
reconstruct a tree relationship between all snapshots; where a parent
can have more than one child if you roll back to a snapshot and then
execute the guest differently), the set of disks participating in the
snapshot, and the <domain> description at the time of the snapshot (if
you hotplug devices, or even the fact that creating external snapshots
changes which file is the active qcow2 in a backing chain, you'll need
to know how to roll back to the prior domain state as part of
reverting). This is approximately the same set of information that a
<domaincheckpoint> will need to track.
I'm slightly tempted to just overload <domainsnapshot> to track three
modes instead of two (internal, external, and now checkpoint); but think
that will probably be a bit too confusing, so more likely I will create
<domaincheckpoint> as a new object, but copy a lot of coding paradigms
from <domainsnapshot>.
So, from that point of view, libvirt tracking the relationship between
qcow2 bitmaps in order to form checkpoint information can be done ALL
with libvirt, and without NEEDING the qcow2 file to track any relations
between bitmaps. BUT, libvirt's job can probably be made easier if
qcow2 would, at the least, allow bitmaps to track their parent, and/or
provide APIs to easily merge a parent..intermediate..child chain of
related bitmaps into a single bitmap, for easy runtime
creation of the temporary bitmap used to express the delta between two
checkpoints.
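As a rough sketch of what merging a parent..intermediate..child chain means, assume each checkpoint owns one bitmap that records the clusters dirtied from that checkpoint until the next one, and that the parent links are tracked outside the image (for example in libvirt's XML). The names delta_between, parents and bitmaps are hypothetical, and Python sets of cluster indices stand in for real dirty bitmaps.

```python
def delta_between(parents, bitmaps, older, newer):
    """Return the clusters changed after checkpoint `older`, up to checkpoint `newer`.

    parents maps a checkpoint name to its parent's name (None for the root).
    bitmaps maps a checkpoint name to the set of clusters dirtied between
    that checkpoint and its child.
    """
    if older == newer:
        return set()
    merged = set()
    current = parents[newer]
    # Walk from the newer checkpoint back towards the older one,
    # OR-ing every bitmap on the way into one temporary bitmap.
    while current is not None:
        merged |= bitmaps[current]
        if current == older:
            return merged
        current = parents[current]
    raise ValueError(f"{older!r} is not an ancestor of {newer!r}")


parents = {"c1": None, "c2": "c1", "c3": "c2"}
bitmaps = {"c1": {0, 1}, "c2": {1, 5}, "c3": {7}}
print(sorted(delta_between(parents, bitmaps, "c1", "c3")))
```

The source bitmaps are left unmodified; only the temporary union is handed to the backup job.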
I don't think this is a good idea:
https://www.redhat.com/archives/libvir-list/2018-April/msg01306.html
In short, I think that if we change qemu to support checkpoints
(updated BdrvDirtyBitmap, qapi, qcow2 and migration stream, new nbd meta
context), we'd better implement checkpoints themselves rather than a
.parent relationship.
> OK; I can't speak to the XML design (I'll leave that to Eric and other
> libvirt engineers) but the data consistency issues make sense.
And I'm still trying to figure out exactly what is needed, to capture
everything needed to create checkpoints and take backups (both push and
pull model). Reverting to data from an external backup may be a bit
more manual, at least at first (after all, we STILL don't have decent
libvirt support for rolling back to external snapshots, several years
later). In other words, my focus right now is "how can we safely track
checkpoints for capturing of point-in-time incremental backups with
minimal guest downtime", rather than "given an incremental backup
captured previously, how do we roll a guest back to that point in time".
> ATM I am concerned that by shifting the snapshots into bitmap names
> you still leave yourself open for data corruption if these bitmaps are
> modified outside of libvirt -- these third party tools can't possibly
> understand the schema that they were created under.
>
> (Though I suppose very simply that if a bitmap is missing you'd be able
> to detect that in libvirt and signal an error, but it's not very nice.)
Well, we also have to realize that third-party tools shouldn't really be
mucking around with bitmaps they don't understand. If you are going to
manipulate a qcow2 file that contains persistent bitmaps, you should not
delete a bitmap you did not create; and if the bitmap is autoloaded, you
must obey the rules and amend the bitmap for any guest-visible changes
you make during your data edits. Just like a third-party tool shouldn't
really be deleting internal snapshots it didn't create. I don't think
we have to worry as much about being robust to what a third party tool
would do behind our backs (after all, the point of the pull model
backups is so that third-party tools can track the backup in the format
THEY choose, after reading the dirty bitmap and data over NBD, rather
than having to learn qcow2).
> I'll pick up discussion with Eric and Vladimir in the other portion of
> this thread where we're discussing a checkpoints API and we'll pick this
> up on QEMU list if need be.
Yes, between this thread, and some IRC chats I've had with John in the
meantime, it looks like we DO want some improvements on the qcow2 side
of things on the qemu list.
Other things that I need to capture from IRC:
Right now, it sounds like the incremental backup model (whether push or
pull) is heavily dependent on qcow2 files for persistent bitmaps. While
libvirt can perform external snapshots by creating a qcow2 wrapper
around any file type, and live commit can then merge that qcow2 file
back into the original file, libvirt is already insistent that internal
snapshots can only be taken if all disks are qcow2. So the same logic
will apply to taking backups (whether the backup is incremental by
starting from a checkpoint, or full over the complete disk contents).
Also, how should checkpoints interact with external snapshots? Suppose
I have:
base <- snap1
and create a checkpoint at time T1 (which really means I create a bitmap
titled B1 to track all changes that occur _after_ T1). Then later I
create an external snapshot, so that now I have:
base <- snap1 <- snap2
at that point, the bitmap B1 in snap1 is no longer being modified,
because snap1 is read-only. But we STILL want to track changes since
T1, which means we NEED a way in qemu to not only add snap2 as a new
snapshot, but ALSO to create a new bitmap B2 in snap2, that tracks all
changes (until the next checkpoint, of course). Whether B2 starts life
empty (and libvirt just has to remember that it must merge snap1.B1 and
snap2.B2 when constructing the delta), or whether B2 starts life as a
clone of the final contents of snap1.B1, is something that we need to
consider in qemu.
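The two options can be compared with a small simulation. This is only a sketch under stated assumptions: sets of cluster indices stand in for bitmaps, and the function names simulate and live_commit are hypothetical, not qemu or libvirt APIs.

```python
def simulate(clone_on_snapshot):
    """Model checkpoint T1, then an external snapshot, then more guest writes."""
    snap1_b1 = {2, 3}  # clusters dirtied between T1 and the external snapshot
    # Creating snap2 freezes snap1 (and B1); B2 starts empty or as a clone of B1.
    snap2_b2 = set(snap1_b1) if clone_on_snapshot else set()
    snap2_b2.update({3, 9})  # guest writes that land in snap2
    if clone_on_snapshot:
        delta = set(snap2_b2)  # B2 alone already covers everything since T1
    else:
        delta = snap1_b1 | snap2_b2  # the caller must remember to merge B1 and B2
    return snap1_b1, snap2_b2, delta


def live_commit(snap1_b1, snap2_b2):
    """After committing snap2 into snap1, B1 must also cover snap2's changes."""
    return snap1_b1 | snap2_b2


_, _, delta_empty = simulate(clone_on_snapshot=False)
_, _, delta_clone = simulate(clone_on_snapshot=True)
print(delta_empty == delta_clone)
```

Both options yield the same delta since T1; the difference is who does the bookkeeping. With an empty B2, the management layer has to track and merge the chain; with a cloned B2, the active image is self-contained.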
I'm sure that the latter is the right way: snapshots are then actually
unrelated to checkpoints. We just have a "snapshot" of the bitmap in the
snapshot file.
Here is an additional interesting point: it works for internal snapshots
too, as bitmaps will go into the saved state through the migration channel
(if we enable the corresponding capability, of course).
And if there is more than one bitmap on snap1, do we
need to bring all of those bitmaps forward into snap2, or just the one
that was currently active?
Again, I think that to keep snapshots unrelated to checkpoints, it's better
to keep them all. Let a disk snapshot be a snapshot of the dirty bitmaps too.
Similarly, if we later decide to live commit
snap2 back into snap1, we'll want to merge the changes in snap2.B2 back
into snap1.B1 (now that snap1 is once again active, it needs to track
all changes that were merged in, and all future changes until the next
snapshot).
And here we will just drop older versions of bitmaps.
Which means we need to at least be thinking about cross-node
snapshot merges,
Hmm, what is that?
even if, from the libvirt perspective, checkpoints are
more of a per-drive attribute rather than a per-node attribute.
--
Best regards,
Vladimir