On Tue, Sep 06, 2022 at 12:19:38PM +0200, Peter Krempa wrote:
On Tue, Aug 23, 2022 at 18:32:23 +0200, Pavel Hrdina wrote:
> In order to save some CPU cycles we will collect all the necessary data
> to delete external snapshot before we even start. They will be later
> used by code that deletes the snapshots and updates metadata when
> needed.
>
> With external snapshots we need data that libvirt gets from running QEMU
> process so if the VM is not running we need to start paused QEMU process
> for the snapshot deletion and kill at afterwards.
>
> Signed-off-by: Pavel Hrdina <phrdina(a)redhat.com>
> ---
> src/qemu/qemu_snapshot.c | 144 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 142 insertions(+), 2 deletions(-)
First a few general notes for all upcoming patches.
1) All of the non-trivial functions dealing with snapshots should have
documentation explaing what they do. The snapshot code, as you certainly
know, is very complex and in many cases un-obvious. Having them will
save us (or anybody else wanting to maintain the code) some headaches in
the future.
2) I'd also suggest that you add a knowledge base internals article
outlining how snapshot deletion is supposed to work. In case there are
some user-visible caveats, we also have an user-focused article on
snapshots ( kbase/snapshots.rst ).
3) I'll try to assess the code also from the point of view of changes
needed to implement reversion of snapshots and the intricacies that
might bring for deletion.
> diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c
> index da9b4c30f0..dade5dcea4 100644
> --- a/src/qemu/qemu_snapshot.c
> +++ b/src/qemu/qemu_snapshot.c
> @@ -2232,6 +2232,120 @@ qemuSnapshotRevert(virDomainObj *vm,
> }
>
>
> +typedef struct {
> + virDomainMomentObj *parentSnap;
> + virDomainSnapshotDiskDef *snapDisk;
> + virDomainDiskDef *domDisk;
> + virDomainDiskDef *parentDomDisk;
> + virStorageSource *diskSrc;
> + virStorageSource *parentDiskSrc;
> + virStorageSource *prevDiskSrc;
> + qemuBlockJobData *job;
> +} qemuSnapshotDeleteExternalData;
> +
> +
> +static virDomainMomentObj*
> +qemuSnapshotFindParentSnapForDisk(virDomainMomentObj *snap,
> + virDomainSnapshotDiskDef *snapDisk)
> +{
> + virDomainMomentObj *parentSnap = snap->parent;
> +
> + while (parentSnap) {
> + ssize_t i;
> + virDomainSnapshotDef *parentSnapdef =
virDomainSnapshotObjGetDef(parentSnap);
> +
> + if (!parentSnapdef)
> + break;
> +
> + for (i = 0; i < parentSnapdef->ndisks; i++) {
> + virDomainSnapshotDiskDef *parentSnapDisk =
&(parentSnapdef->disks[i]);
> +
> + if (parentSnapDisk->snapshot != VIR_DOMAIN_SNAPSHOT_LOCATION_NO
&&
> + STREQ(snapDisk->name, parentSnapDisk->name)) {
> + return parentSnap;
> + }
> + }
> +
> + parentSnap = parentSnap->parent;
> + }
> +
> + return NULL;
> +}
> +
> +
> +static GPtrArray*
> +qemuSnapshotDeleteExternalPrepare(virDomainObj *vm,
> + virDomainMomentObj *snap)
> +{
> + ssize_t i;
> + virDomainSnapshotDef *snapdef = virDomainSnapshotObjGetDef(snap);
> + g_autoptr(GPtrArray) ret = g_ptr_array_new_full(0, g_free);
Is there any specific requirement to access these randomly?
(specifically why GPtrArray is used instead of e.g. a G(S)List?
There is no need to access these randomly but G(S)List doesn't work that
nicely with g_autoptr and dynamically allocated elements. For this case
we would have to use g_list_free_full().
> + for (i = 0; i < snapdef->ndisks; i++) {
> + g_autofree qemuSnapshotDeleteExternalData *data = NULL;
> + virDomainSnapshotDiskDef *snapDisk = &(snapdef->disks[i]);
> +
> + if (snapDisk->snapshot == VIR_DOMAIN_SNAPSHOT_LOCATION_NO)
> + continue;
[...]
> typedef struct _virQEMUMomentReparent virQEMUMomentReparent;
> struct _virQEMUMomentReparent {
> const char *dir;
> @@ -2504,9 +2618,9 @@ qemuSnapshotDeleteValidate(virDomainMomentObj *snap,
> }
>
> if (virDomainSnapshotIsExternal(snap) &&
> - !(flags & VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN_ONLY)) {
> + (flags & VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN)) {
> virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
> - _("deletion of external disk snapshots not
supported"));
> + _("deletion of external disk snapshot with internal
children disk snapshots not supported"));
This seems a bit unrelated to this patch.
> return -1;
> }
>
> @@ -2523,6 +2637,8 @@ qemuSnapshotDelete(virDomainObj *vm,
> int ret = -1;
> virDomainMomentObj *snap = NULL;
> bool metadata_only = !!(flags & VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY);
> + bool stop_qemu = false;
> + g_autoptr(GPtrArray) externalData = NULL;
>
> virCheckFlags(VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN |
> VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY |
> @@ -2537,6 +2653,25 @@ qemuSnapshotDelete(virDomainObj *vm,
> if (!metadata_only) {
> if (qemuSnapshotDeleteValidate(snap, flags) < 0)
> goto endjob;
> +
> + if (virDomainSnapshotIsExternal(snap) &&
> + !(flags & (VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN |
> + VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN_ONLY))) {
> + if (!virDomainObjIsActive(vm)) {
> + if (qemuProcessStart(NULL, driver, vm, NULL, VIR_ASYNC_JOB_NONE,
> + NULL, -1, NULL, NULL,
> + VIR_NETDEV_VPORT_PROFILE_OP_CREATE,
> + VIR_QEMU_PROCESS_START_PAUSED) < 0) {
Note that a user might attempt to call a 'virsh resume' here. This might
be a problem especially if you went the route of asynchronously handling
the deletion.
> + goto endjob;
> + }
> +
> + stop_qemu = true;
> + }
> +
> + externalData = qemuSnapshotDeleteExternalPrepare(vm, snap);
> + if (!externalData)
> + goto endjob;
Preferrably this operation will be done before starting qemu to do the
actual deletion. I understand that it has to be done after, to have
correct pointers after the copy of the definition needed for the
startup.
In this instance it probably is still better to do the checks first,
throw them away if qemu startup is needed and re-do them after, as the
startup of qemu is a waaay more expensive operation.
Yeah I don't like this ordering as well, I'll check what parts of the
qemuSnapshotDeleteExternalPrepare() can be done without qemu running.
> + }
> }
>
> if (flags & (VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN |
> @@ -2547,6 +2682,11 @@ qemuSnapshotDelete(virDomainObj *vm,
> }
>
> endjob:
> + if (stop_qemu) {
> + qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_SHUTDOWN,
> + VIR_ASYNC_JOB_NONE, 0);
So definitely user interaction of the VM needs to be forbidden during
this API because it would be bad if the VM gets terminated here.
Good point, will look into it.
Pavel