On Tue, Mar 12, 2019 at 11:02 PM Eric Blake <eblake@redhat.com> wrote:
On 3/12/19 3:29 PM, Nir Soffer wrote:

>>>   snapshots = srcdom.listAllSnapshots(
>>>       libvirt.VIR_DOMAIN_SNAPSHOT_LIST_TOPOLOGICAL)
>>>   for snapshot in snapshots:
>>>       xml = snapshot.getXMLDesc()
>>>       dstdom.snapshotCreateXML(
>>>           xml, libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE)
>>>
>>> I don't really see a reason why this is so hard that it justifies
>>> adding these bulk snapshot list/define APIs.
>>
>> Nir, do you want to chime in here?
>>
>
> We don't have a need to list or define snapshots since we manage snapshots
> on the oVirt side.
> We want an API to list and redefine checkpoints.

But the proposed <domaincheckpoint> also has a <domain> subelement, so
it has the same problem (the XML for a bulk operation can become
prohibitively large).

Why do we need <domain> in a checkpoint?

Here is what we store for a checkpoint (based on your patches v2 or v3):

vm_checkpoints table

- checkpoint_id: UUID
- parent: UUID
- vm_id: UUID
- creation_date: TIMESTAMP

vm_checkpoint_disk_map table

- disk_id: UUID
- checkpoint_id: UUID

>
> Our use case is:
>
> 1. Start VM on some host
> 2. When oVirt starts a backup, it adds a checkpoint in oVirt database

And does this database preserve topological ordering? If not, why not?

Using the parent UUID we can build the chain in the correct order, and this is
what we send to vdsm. Based on this, vdsm will generate the XML for libvirt.

There is no domain info related to these checkpoints, and I hope we are not
adding such requirements at this stage.
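
To make this concrete, here is a rough sketch (not actual vdsm code) of how
rows shaped like the tables above can be ordered by parent UUID and turned
into the minimal checkpoint XML we would hand to libvirt. It assumes a linear
chain (at most one child per parent, root has parent None) and that the disk
UUIDs were already mapped to libvirt target names such as "sda"; the XML
follows the <domaincheckpoint> format proposed in this series:

    def order_checkpoints(records):
        # Order checkpoint records so that every parent precedes its child.
        by_parent = {r["parent"]: r for r in records}
        ordered = []
        record = by_parent.get(None)  # the root checkpoint has no parent
        while record:
            ordered.append(record)
            record = by_parent.get(record["checkpoint_id"])
        return ordered

    def checkpoint_xml(record):
        # Bitmap names here follow the checkpoint UUID; "disks" holds the
        # libvirt target names of the disks included in this checkpoint.
        disks = "".join(
            '<disk name="%s" checkpoint="bitmap" bitmap="%s"/>'
            % (name, record["checkpoint_id"])
            for name in record["disks"])
        parent = ("<parent><name>%s</name></parent>" % record["parent"]
                  if record["parent"] else "")
        return ("<domaincheckpoint><name>%s</name>%s<disks>%s</disks>"
                "</domaincheckpoint>"
                % (record["checkpoint_id"], parent, disks))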
 
> 3. When libvirt starts a backup, libvirt adds the checkpoint oVirt created
> 4. Steps 2 and 3 are repeated while the VM is running...
> 5. Stop VM - at this point we undefine the VM, so libvirt loses all info
>    about checkpoints
> 6. Start VM on some host
> 7. oVirt uses the bulk checkpoint API to redefine all checkpoints in the
>    oVirt database
> 8. If there is an unknown bitmap on a disk, or a bitmap is missing, it
>    means that someone modified the disk behind oVirt's back, and oVirt
>    will delete the affected checkpoints and use the libvirt checkpoint
>    API to delete the checkpoints and bitmaps. This forces a full backup
>    for the next backup of the affected disks.
>
> We would like libvirt to fail to redefine the checkpoints if the info in
> the checkpoints does not match the bitmaps on the disks mentioned in the
> checkpoints.

That's a different feature request - you're asking for
virDomainCheckpointCreateXML(REDEFINE) to have some additional flag
where the metadata re-creation fails if the underlying bitmap is
corrupt/missing. Which may still be a good flag to add, but is something
you can do with one-at-a-time checkpoint redefinitions, and doesn't, in
itself, require bulk redefinition.

Checking for an unknown bitmap on a disk is not possible unless you have the
complete list of checkpoints. Unless you add something like:

    virDomainCheckpointValidate()

which would fail if the checkpoints known to libvirt do not match the bitmaps
in the underlying qcow2 images.
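
Usage would be something along these lines (both the call and its Python
spelling are hypothetical; this only illustrates the intended semantics):

    # Hypothetical API - not in libvirt. A single call that fails when
    # libvirt's checkpoint metadata disagrees with the on-disk bitmaps.
    try:
        dom.checkpointValidate()
    except libvirt.libvirtError:
        # Mismatch: delete the affected checkpoints and force a full
        # backup for the affected disks on the next run.
        ...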

But even if we add this, the user will have to do:

    try:
        for checkpoint in ordered_checkpoints:
            generate xml...
            redefine checkpoint...
    except ...:
        remove all checkpoints...

    validate checkpoints

Unless the XML for a single checkpoint is big, I don't see why all libvirt
users need to redefine checkpoints one by one and handle all the possible
errors (e.g. some checkpoints redefined, some not).

Note that vdsm may be killed in the middle of the redefine loop, and in this
case we leave libvirt with partial info about checkpoints, and we need to
redefine the checkpoints again, handling this partial state.

"We" means all libvirt clients that need to manage more than one host.
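
For comparison, this is roughly what every such client ends up writing with
one-at-a-time redefinition - a sketch assuming the checkpoint APIs land as
proposed in this series (checkpointCreateXML() with a REDEFINE flag,
listAllCheckpoints(), per-checkpoint delete()), and reusing the
order_checkpoints()/checkpoint_xml() helpers sketched above:

    import libvirt

    def redefine_checkpoints(dom, records):
        # Drop whatever partial metadata a previous interrupted run left
        # behind, so the redefine loop always starts from a known state.
        for cp in dom.listAllCheckpoints():
            cp.delete(libvirt.VIR_DOMAIN_CHECKPOINT_DELETE_METADATA_ONLY)

        try:
            for record in order_checkpoints(records):
                dom.checkpointCreateXML(
                    checkpoint_xml(record),
                    libvirt.VIR_DOMAIN_CHECKPOINT_CREATE_REDEFINE)
        except libvirt.libvirtError:
            # Some checkpoints were redefined and some were not; remove
            # everything so the next attempt does not see partial state.
            for cp in dom.listAllCheckpoints():
                cp.delete(
                    libvirt.VIR_DOMAIN_CHECKPOINT_DELETE_METADATA_ONLY)
            raise

A bulk redefine API would collapse this into a single call and move the
partial-state handling into libvirt.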

> To do this, libvirt must have the complete checkpoint state. It cannot have
> it if we define checkpoints one by one.

As long as you redefine checkpoints in topological order, I don't see
how defining one at a time or bulk changes whether libvirt can validate
that the checkpoints are correct.  And even then, redefining in
topological order is only necessary so that libvirt can ensure that any
<parent/> checkpoint exists before attempting to define the metadata for
a child checkpoint.

>
> If libvirt cannot fail, we need a way to list all checkpoints and bitmaps,
> so we can sync the oVirt database with reality.
>
> If we don't validate the actual bitmaps against the expected bitmaps, the
> backup may fail when starting (good), or succeed while including wrong data
> in the backup.
>
> It also does not make sense to redefine checkpoint state in an incremental
> way; this is like building a VM's XML with many devices using one call per
> device.

The argument here is that it will be the same amount of XML whether it
is one call per checkpoint or one call for all checkpoints (note - there
are multiple devices in one checkpoint, so it is NOT one call per
'device * checkpoint')

>
> The same checkpoint management will have to be duplicated in any libvirt
> client that manages more than one host (e.g. KubeVirt, OpenStack), since
> libvirt does not handle more than one host.

But again, if you are moving checkpoint metadata from one host to
another, doing so one-at-a-time in topological order shouldn't be much
harder than doing so in one single call, but without the drawbacks of
exceeding XML size over RPC call limits.

>
> For more info see
> - https://ovirt.org/develop/release-management/features/storage/incremental-backup.html
> - https://github.com/oVirt/vdsm/blob/master/lib/vdsm/api/vdsm-api.yml#L11525
> - https://github.com/oVirt/vdsm/blob/0874b89a19af2d7726ed2f0b9d674cf0bb48a67e/lib/vdsm/api/vdsm-api.yml#L8129
>

>> Unless Nir speaks up with strong reasons in favor of new bulk APIs
>> (rather than living with the new topological sort flags), I'm okay
>> dropping this series.
>
>
> I'm ok with dropping bulk snapshots, but I think libvirt should provide
> bulk APIs for checkpoints.
>
> Nir
>

Given that the same amount of XML is involved across RPC, I'm
hard-pressed to include bulk operations on one object without providing
it on both.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org