
On Tue, Mar 12, 2019 at 11:52 PM Eric Blake <eblake@redhat.com> wrote:
On 3/12/19 4:35 PM, Nir Soffer wrote:
We don't have a need to list or define snapshots since we manage snapshots on the oVirt side. We want an API to list and redefine checkpoints.
But the proposed <domaincheckpoint> also has a <domain> subelement, so it has the same problem (the XML for a bulk operation can become prohibitively large).
Why do we need <domain> in a checkpoint?
The initial design for snapshots did not have <domain>, and it bit us hard; you cannot safely revert to a snapshot if you don't know the state of the domain at that time. The initial design for checkpoints has thus mandated that <domain> be present (my v5 series fails to redefine a checkpoint if you omit <domain>, even though it has a NO_DOMAIN flag during dumpxml for reduced output). If we are convinced that defining a snapshot without <domain> is safe enough, then I can relax the checkpoint code to allow redefined metadata without <domain> the way snapshots already have to do it, even though I was hoping that checkpoints could start life with fewer back-compats than snapshots have had to carry along. But I'd rather start strict and relax later (require <domain> and then remove it when proven safe), and not start loose (leave <domain> optional, and then wish we had made it mandatory).
You keep mentioning snapshots, but we are not going to redefine snapshots, only checkpoints. We need to do this only because qcow2 does not store a chain of bitmaps, but unordered bitmaps without any metadata, so we need to remind libvirt which bitmaps we have in every disk, and what the order of the bitmaps is. We would be happy if we could just specify the bitmaps and disks in the backup API, and avoid rebuilding the checkpoint history, just like we avoid rebuilding the snapshot history.
You can see here what we store in a checkpoint (based on your patches v2 or v3):
https://ovirt.org/develop/release-management/features/storage/incremental-ba...
Which is:
vm_checkpoints table
- checkpoint_id: UUID
- parent: UUID
- vm_id: UUID
- creation_date: TIMESTAMP

vm_checkpoint_disk_map table
- disk_id: UUID
- checkpoint_id: UUID
And no state of the <domain> at the time the checkpoint was created?
No, but the configuration of the VM is stored in oVirt, so the backup application can get this configuration from oVirt at the time of the backup.
Our use case is:
1. Start VM on some host
2. When oVirt starts a backup, it adds a checkpoint in the oVirt database
And does this database preserve topological ordering? If not, why not?
Using the parent UUID we can create the correct chain in the right order, and this is what we send to vdsm. Based on this, vdsm will generate the XML for libvirt.
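
For illustration, rebuilding the chain from the parent pointers could look like this (a sketch in Python; the row format is hypothetical, following the schema above):

def order_checkpoints(rows):
    # rows: vm_checkpoints rows as dicts with "checkpoint_id" and "parent"
    # keys. Returns the rows ordered from root to leaf; raises KeyError if
    # the chain is broken.
    by_parent = {row["parent"]: row for row in rows}
    chain = []
    parent = None  # the root checkpoint has no parent
    for _ in range(len(rows)):
        row = by_parent[parent]
        chain.append(row)
        parent = row["checkpoint_id"]
    return chain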
There is no domain info related to these checkpoints, and I hope we are not adding such requirements at this stage.
Rather, that requirement has been there in all of my drafts, and it sounds like you are trying to get me to relax that requirement to not need a <domain> during checkpoint metadata redefine.
We never understood this requirement from the HTML documentation built from your patches. Looking again at v3, I see:

    domain
        The inactive domain configuration at the time the checkpoint was created. Readonly.

And we have an example:

<domaincheckpoint>
  <description>Completion of updates after OS install</description>
  <disks>
    <disk name='vda' checkpoint='bitmap'/>
    <disk name='vdb' checkpoint='no'/>
  </disks>
</domaincheckpoint>

Which does not have a domain, since it is read-only. There is an example of the XML we get from libvirt:

<domaincheckpoint>
  <name>1525889631</name>
  <description>Completion of updates after OS install</description>
  <creationTime>1525889631</creationTime>
  <parent>
    <name>1525111885</name>
  </parent>
  <disks>
    <disk name='vda' checkpoint='bitmap' bitmap='1525889631'/>
    <disk name='vdb' checkpoint='no'/>
  </disks>
  <domain type='qemu'>
    <name>fedora</name>
    <uuid>93a5c045-6457-2c09-e56c-927cdf34e178</uuid>
    <memory>1048576</memory>
    ...
    <devices>
      <disk type='file' device='disk'>
        <driver name='qemu' type='qcow2'/>
        <source file='/path/to/file1'/>
        <target dev='vda' bus='virtio'/>
      </disk>
      <disk type='file' device='disk' snapshot='external'>
        <driver name='qemu' type='raw'/>
        <source file='/path/to/file2'/>
        <target dev='vdb' bus='virtio'/>
      </disk>
      ...
    </devices>
  </domain>
</domaincheckpoint>

But as the domain is read-only, we assumed that we don't need to pass it in the domaincheckpoint XML. We also don't need it since we keep the disk UUIDs included in every checkpoint, and from this you can find the information needed to generate the tiny checkpoint XML above (see the sketch at the end of this message).

Can you explain why the historic domain XML is needed for performing a backup? The caller of the API is responsible for giving you the list of bitmaps that should be included for every disk. Libvirt needs to create a new active bitmap and start the NBD server exposing the bitmaps specified by the caller. How is having the XML for every checkpoint/bitmap needed to perform these steps?

Note also that libvirt does not have the snapshot history on an oVirt host, since we manage snapshots externally. When libvirt starts a VM, the only thing it knows about snapshots is the chain of disks. Even if we restore the domain XML for every checkpoint, libvirt cannot validate it against the non-existing snapshots.

While we have been using persistent VMs since 4.2, our VMs are actually transient. We started to use persistent VMs because they are better tested, and we can use the VM <metadata> to persist stuff, instead of maintaining our own temporary metadata persistence solution.
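
For illustration, generating that tiny XML from what oVirt stores could look like this (a sketch; lookup_disk_target() is a hypothetical helper resolving a disk UUID to the disk's current target name, and it assumes the bitmap in each image is named after its checkpoint):

def checkpoint_xml(cp, disk_rows, lookup_disk_target):
    # cp: a vm_checkpoints row; disk_rows: its vm_checkpoint_disk_map rows.
    lines = ["<domaincheckpoint>",
             "  <name>%s</name>" % cp["checkpoint_id"]]
    if cp["parent"]:
        lines.append("  <parent><name>%s</name></parent>" % cp["parent"])
    lines.append("  <disks>")
    for d in disk_rows:
        lines.append("    <disk name='%s' checkpoint='bitmap' bitmap='%s'/>"
                     % (lookup_disk_target(d["disk_id"]), cp["checkpoint_id"]))
    lines.append("  </disks>")
    lines.append("</domaincheckpoint>")
    return "\n".join(lines)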
That's a different feature request - you're asking for virDomainCheckpointCreateXML(REDEFINE) to have some additional flag where the metadata re-creation fails if the underlying bitmap is corrupt/missing. Which may still be a good flag to add, but is something you can do with one-at-a-time checkpoint redefinitions, and doesn't, in itself, require bulk redefinition.
Checking for an unknown bitmap on disk is not possible unless you have the complete list of checkpoints. Unless you add something like:
virDomainCheckpointValidate()
Which will fail if the checkpoints known to libvirt do not match the bitmaps in the underlying qcow2 images.
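
Such a check can be approximated client-side; a rough sketch, assuming every disk participates in every checkpoint, that checkpoint names equal bitmap names, and a qemu-img new enough to report qcow2 bitmaps in its JSON output:

import json
import subprocess

def validate_checkpoints(dom, disk_paths):
    # Checkpoint names known to libvirt (assumed to equal bitmap names).
    known = {cp.getName() for cp in dom.listAllCheckpoints()}
    for path in disk_paths:
        # "-U" allows reading an image that is in use by a running VM.
        out = subprocess.check_output(
            ["qemu-img", "info", "-U", "--output=json", path])
        info = json.loads(out)
        # Recent qemu-img reports qcow2 bitmaps under format-specific data.
        data = info.get("format-specific", {}).get("data", {})
        on_disk = {b["name"] for b in data.get("bitmaps", [])}
        if on_disk != known:
            raise RuntimeError(
                "%s: bitmaps %s do not match checkpoints %s"
                % (path, sorted(on_disk), sorted(known)))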
Yes, an API like that may be better than trying to validate on a per-redefine basis whether libvirt metadata matches qcow2 metadata.
But even if we add this, the user will have to do:

try:
    for checkpoint in ordered_checkpoints:
        generate xml...
        redefine checkpoint...
except ...:
    remove all checkpoints...

validate checkpoints
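
With the Python binding this loop might look like the following (a sketch; flag names as in the checkpoint API that was eventually merged, and ordered_xmls holds one XML string per checkpoint, root first):

import libvirt

def redefine_checkpoints(dom, ordered_xmls):
    try:
        for xml in ordered_xmls:
            dom.checkpointCreateXML(
                xml, libvirt.VIR_DOMAIN_CHECKPOINT_CREATE_REDEFINE)
    except libvirt.libvirtError:
        # Roll back to a clean state: drop whatever metadata was defined;
        # METADATA_ONLY leaves the bitmaps in the images untouched.
        for cp in dom.listAllCheckpoints():
            cp.delete(libvirt.VIR_DOMAIN_CHECKPOINT_DELETE_METADATA_ONLY)
        raise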
Yes, but that's not hard. And it's no different whether the 'try/except' code is a single libvirt API (with bulk redefine) or multiple libvirt APIs (no bulk redefine, but easy enough to one-at-a-time redefine in topological order). Why you have to generate the XML instead of reusing the XML that libvirt produces is something I'm not sure of (that is, what's wrong with using libvirt's dumpxml output as your input, rather than trying to reconstruct it on the fly?).
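
That is, something along these lines (a sketch; store_checkpoint_xml() and load_checkpoint_xml() stand in for whatever persistence the client already has):

import libvirt

# At creation time, keep libvirt's own XML (including <domain>):
cp = dom.checkpointLookupByName("1525889631")
store_checkpoint_xml(cp.getName(), cp.getXMLDesc())  # hypothetical persistence

# At redefine time, feed the stored XML back unmodified:
dom.checkpointCreateXML(
    load_checkpoint_xml("1525889631"),  # hypothetical lookup
    libvirt.VIR_DOMAIN_CHECKPOINT_CREATE_REDEFINE)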
The XML libvirt produces may not be stable. For example, the disk with UUID "22e5a7d6-5409-482d-bda4-f5939b27aad4" may be called "sda" now, but when the VM was running on another host two weeks ago it may have been called "sdb", since the user added or removed some devices. When we create the XML using the disk UUID, we will always refer to the current name of the disk, regardless of the history of the VM.
Unless the XML for a single checkpoint is big, I don't see why all libvirt users need to redefine checkpoints one by one and handle all the possible errors (e.g. some checkpoints redefined, some not).
As currently proposed, <domaincheckpoint> includes a mandatory <domain> sub-element on redefinition, which means that the XML _is_ big for a series of checkpoints. It also means that you're better off storing the XML produced by libvirt than trying to regenerate it by hand, when it comes to metadata redefinition.
Note that vdsm may be killed in the middle of the redefine loop, and in this case we leave libvirt with partial info about the checkpoints, and we need to redefine the checkpoints again, handling this partial state.
But that's relatively easy - if you don't know whether libvirt might have partial data, then wipe the data and start the redefine loop from scratch.
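
Which amounts to running the rollback step up front, for example (a sketch; METADATA_ONLY keeps the bitmaps in the images):

def reset_checkpoint_metadata(dom):
    # Remove any stale checkpoint metadata before redefining from scratch;
    # the bitmaps in the qcow2 images are left intact.
    for cp in dom.listAllCheckpoints():
        cp.delete(libvirt.VIR_DOMAIN_CHECKPOINT_DELETE_METADATA_ONLY)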
But this easy boilerplate will be duplicated in all libvirt clients, instead of having a single API to restore the state. Of course, if we must keep the historic domain configuration for every checkpoint, a bulk API does not make sense, but I don't understand this requirement.

Nir