[libvirt] [PATCH 1/2] snapshots: allow create --redefine to force a new configuration

Currently there is no means to permanently modify the metadata stored within a snapshot if it does not pass the ABI compat checker. That is pretty much intentional but there are use cases where users verified that the change will work according to their needs and don't want to: snapshot-revert -> edit -> start, but instead store the modified domain definition within the metadata. To achive that this adds a --force argument to snapshot-create --redefine which will make it skip the check. Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com> --- include/libvirt/libvirt-domain-snapshot.h | 3 +++ src/conf/snapshot_conf.c | 3 ++- src/qemu/qemu_driver.c | 7 ++++++- tools/virsh-snapshot.c | 6 ++++++ tools/virsh.pod | 8 +++++++- 5 files changed, 24 insertions(+), 3 deletions(-) diff --git a/include/libvirt/libvirt-domain-snapshot.h b/include/libvirt/libvirt-domain-snapshot.h index 602e5def59..dc8ba2b4a9 100644 --- a/include/libvirt/libvirt-domain-snapshot.h +++ b/include/libvirt/libvirt-domain-snapshot.h @@ -71,6 +71,9 @@ typedef enum { VIR_DOMAIN_SNAPSHOT_CREATE_LIVE = (1 << 8), /* create the snapshot while the guest is running */ + VIR_DOMAIN_SNAPSHOT_CREATE_NOCHECK = (1 << 9), /* take definition as + provided without + ABI compat check */ } virDomainSnapshotCreateFlags; /* Take a snapshot of the current VM state */ diff --git a/src/conf/snapshot_conf.c b/src/conf/snapshot_conf.c index ffb1313c89..f95b104ad5 100644 --- a/src/conf/snapshot_conf.c +++ b/src/conf/snapshot_conf.c @@ -474,7 +474,8 @@ virDomainSnapshotRedefineValidate(virDomainSnapshotDefPtr def, if (other->def->dom) { if (def->dom) { - if (!virDomainDefCheckABIStability(other->def->dom, + if (!(flags & VIR_DOMAIN_SNAPSHOT_CREATE_NOCHECK) && + !virDomainDefCheckABIStability(other->def->dom, def->dom, xmlopt)) return -1; } else { diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index a16eab5467..f3d032f633 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -15680,7 +15680,8 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain, VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT | VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE | VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC | - VIR_DOMAIN_SNAPSHOT_CREATE_LIVE, NULL); + VIR_DOMAIN_SNAPSHOT_CREATE_LIVE | + VIR_DOMAIN_SNAPSHOT_CREATE_NOCHECK, NULL); VIR_REQUIRE_FLAG_RET(VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE, VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY, @@ -15689,6 +15690,10 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain, VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE, NULL); + VIR_REQUIRE_FLAG_RET(VIR_DOMAIN_SNAPSHOT_CREATE_NOCHECK, + VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE, + NULL); + if ((redefine && !(flags & VIR_DOMAIN_SNAPSHOT_CREATE_CURRENT)) || (flags & VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA)) update_current = false; diff --git a/tools/virsh-snapshot.c b/tools/virsh-snapshot.c index 6ea6e2744a..a680d2b410 100644 --- a/tools/virsh-snapshot.c +++ b/tools/virsh-snapshot.c @@ -146,6 +146,10 @@ static const vshCmdOptDef opts_snapshot_create[] = { .type = VSH_OT_BOOL, .help = N_("require atomic operation") }, + {.name = "force", + .type = VSH_OT_BOOL, + .help = N_("do not check XML for ABI compatibility") + }, VIRSH_COMMON_OPT_LIVE(N_("take a live snapshot")), {.name = NULL} }; @@ -177,6 +181,8 @@ cmdSnapshotCreate(vshControl *ctl, const vshCmd *cmd) flags |= VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC; if (vshCommandOptBool(cmd, "live")) flags |= VIR_DOMAIN_SNAPSHOT_CREATE_LIVE; + if (vshCommandOptBool(cmd, "force")) + flags |= VIR_DOMAIN_SNAPSHOT_CREATE_NOCHECK; if (!(dom = virshCommandOptDomain(ctl, cmd, NULL))) goto cleanup; diff --git a/tools/virsh.pod b/tools/virsh.pod index db72343159..a8cd15bad2 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -4570,7 +4570,7 @@ used to represent properties of snapshots. =item B<snapshot-create> I<domain> [I<xmlfile>] {[I<--redefine> [I<--current>]] | [I<--no-metadata>] [I<--halt>] [I<--disk-only>] [I<--reuse-external>] -[I<--quiesce>] [I<--atomic>] [I<--live>]} +[I<--quiesce>] [I<--atomic>] [I<--live>] [I<--force>]} Create a snapshot for domain I<domain> with the properties specified in I<xmlfile>. Normally, the only properties settable for a domain snapshot @@ -4629,6 +4629,12 @@ the guest is running. Both disk snapshot and domain memory snapshot are taken. This increases the size of the memory image of the external snapshot. This is currently supported only for full system external snapshots. +If I<--force> is specified, then the usual ABI compatibility check that +would be done on I<--redefine> will be skipped allowing to update the +domain definition stored in the snapshot metadata arbitrarily. +Note: this obviously involves extra risk, supplying I<--force> assures +libvirt that the snapshot will be compatible with the new configuration. + Existence of snapshot metadata will prevent attempts to B<undefine> a persistent domain. However, for transient domains, snapshot metadata is silently lost when the domain quits running (whether -- 2.17.1

This allows users to actively ignore validadtion if needed e.g. when changing metadata intentionally that does not pass the ABI checker. It is utilizing EDIT_RELAX of virsh-edit.c to set the VIR_DOMAIN_SNAPSHOT_CREATE_NOCHECK flag defined in the former patch. Fixes: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1802944 Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com> --- tools/virsh-snapshot.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/tools/virsh-snapshot.c b/tools/virsh-snapshot.c index a680d2b410..6177fc282d 100644 --- a/tools/virsh-snapshot.c +++ b/tools/virsh-snapshot.c @@ -563,7 +563,12 @@ cmdSnapshotEdit(vshControl *ctl, const vshCmd *cmd) (strstr(doc, "<state>disk-snapshot</state>") ? \ define_flags |= VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY : 0), \ edited = virDomainSnapshotCreateXML(dom, doc_edited, define_flags) +#define EDIT_RELAX \ + do { \ + define_flags |= VIR_DOMAIN_SNAPSHOT_CREATE_NOCHECK; \ + } while (0); #include "virsh-edit.c" +#undef EDIT_RELAX edited_name = virDomainSnapshotGetName(edited); if (STREQ(name, edited_name)) { -- 2.17.1

On Mon, Mar 18, 2019 at 16:21:18 +0100, Christian Ehrhardt wrote:
Currently there is no means to permanently modify the metadata stored within a snapshot if it does not pass the ABI compat checker.
Could you elaborate when this happens? The ABI check is quite important and if you change the ABI the guest may not be able to run. We are not really willing to allow the users to stab themselves in this case. Some cases may actually be bugs in the ABI check.

On Mon, Mar 18, 2019 at 4:58 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Mon, Mar 18, 2019 at 16:21:18 +0100, Christian Ehrhardt wrote:
Currently there is no means to permanently modify the metadata stored within a snapshot if it does not pass the ABI compat checker.
Could you elaborate when this happens? The ABI check is quite important and if you change the ABI the guest may not be able to run. We are not really willing to allow the users to stab themselves in this case. Some cases may actually be bugs in the ABI check.
Hi Peter, yeah reasonable question - let me elaborate on the case that I know. First of all this is -not- meant for --live snapshots, instead it is meant for disk snapshots - we might try to detect/limit that if you want. The other things one has to know is that we have realized that people stick way too long to their old machine types (qemu -M ...). There might be issues only solved in the newer features and in general it is recommended to use the latest type if possible. And to be fair, on the other hand it makes the maintenance burden more manageable if your delta ends at least after 10 years or so. So far we only removed some less used types and might reconsider removing the types altogether, but that would not mean it isn't recommended to update the type which still triggers the issue I try to improve here. The TL;DR of the above without stalling with even more details is, that we currently have the case where newer versions of qemu (intentionally) no more have the machine type that you saved your disk snapshot with. On a non-snapshot cases you can easily do your "virtual HW upgrade" through virsh edit and similar before you restart your guest. But for disk-snapshots there is no way anyone can modify the meta data. It is correct that the ABI checker complains (the types do differ), but then people just want to start the older disk snapshots content on a new machine type, see [1] for an example. I hope that helped to show that there are valid cases where you'd want to allow forcing changes, without any self-stabbing involved. [1]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1802944 -- Christian Ehrhardt Software Engineer, Ubuntu Server Canonical Ltd

On Tue, Mar 19, 2019 at 08:18:15 +0100, Christian Ehrhardt wrote:
On Mon, Mar 18, 2019 at 4:58 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Mon, Mar 18, 2019 at 16:21:18 +0100, Christian Ehrhardt wrote:
Currently there is no means to permanently modify the metadata stored within a snapshot if it does not pass the ABI compat checker.
Could you elaborate when this happens? The ABI check is quite important and if you change the ABI the guest may not be able to run. We are not really willing to allow the users to stab themselves in this case. Some cases may actually be bugs in the ABI check.
Hi Peter, yeah reasonable question - let me elaborate on the case that I know.
First of all this is -not- meant for --live snapshots, instead it is meant for disk snapshots - we might try to detect/limit that if you want.
The other things one has to know is that we have realized that people stick way too long to their old machine types (qemu -M ...). There might be issues only solved in the newer features and in general it is recommended to use the latest type if possible. And to be fair, on the other hand it makes the maintenance burden more manageable if your delta ends at least after 10 years or so. So far we only removed some less used types and might reconsider removing the types altogether, but that would not mean it isn't recommended to update the type which still triggers the issue I try to improve here.
The TL;DR of the above without stalling with even more details is, that we currently have the case where newer versions of qemu (intentionally) no more have the machine type that you saved your disk snapshot with.
On a non-snapshot cases you can easily do your "virtual HW upgrade" through virsh edit and similar before you restart your guest.
But for disk-snapshots there is no way anyone can modify the meta data. It is correct that the ABI checker complains (the types do differ), but then people just want to start the older disk snapshots content on a new machine type, see [1] for an example.
I hope that helped to show that there are valid cases where you'd want to allow forcing changes, without any self-stabbing involved.
[1]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1802944
So looking at the bug, user created a snapshot, then upgraded qemu to a version which dropped the old machine type. Reverting to a inactive internal snapshot in this case will restore the definition including the machine type. That is what snapshots are expected to do. Since the snaphsot is inactive the VM is not running. The user then tries to start the VM and gets an error that the machine type is not valid. At that point the user tries to 'virsh snapshot-edit' the machine type. If we'd allow the change, attempting to 'virsh start' would fail again. After reverting the snapshot the definition becomes active and needs to be edited using 'virsh edit'. I think in this case it does not make entirely much sense to add the flag as the user would have to revert again to the edited snapshot. Additionally the correct workflow in my opinion would be to revert to an old snapshot, use virsh-edit to fix anything required and then create a new snapshot which would capture the new configuration. This obviously will not work for active snapshots, but the described scenario would anyways require reverting as inactive and then discard the memory state. I'm not persuaded that this workaround is necessary.

On Wed, Mar 20, 2019 at 8:53 AM Peter Krempa <pkrempa@redhat.com> wrote:
On Tue, Mar 19, 2019 at 08:18:15 +0100, Christian Ehrhardt wrote:
On Mon, Mar 18, 2019 at 4:58 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Mon, Mar 18, 2019 at 16:21:18 +0100, Christian Ehrhardt wrote:
Currently there is no means to permanently modify the metadata stored within a snapshot if it does not pass the ABI compat checker.
Could you elaborate when this happens? The ABI check is quite important and if you change the ABI the guest may not be able to run. We are not really willing to allow the users to stab themselves in this case. Some cases may actually be bugs in the ABI check.
Hi Peter, yeah reasonable question - let me elaborate on the case that I know.
First of all this is -not- meant for --live snapshots, instead it is meant for disk snapshots - we might try to detect/limit that if you want.
The other things one has to know is that we have realized that people stick way too long to their old machine types (qemu -M ...). There might be issues only solved in the newer features and in general it is recommended to use the latest type if possible. And to be fair, on the other hand it makes the maintenance burden more manageable if your delta ends at least after 10 years or so. So far we only removed some less used types and might reconsider removing the types altogether, but that would not mean it isn't recommended to update the type which still triggers the issue I try to improve here.
The TL;DR of the above without stalling with even more details is, that we currently have the case where newer versions of qemu (intentionally) no more have the machine type that you saved your disk snapshot with.
On a non-snapshot cases you can easily do your "virtual HW upgrade" through virsh edit and similar before you restart your guest.
But for disk-snapshots there is no way anyone can modify the meta data. It is correct that the ABI checker complains (the types do differ), but then people just want to start the older disk snapshots content on a new machine type, see [1] for an example.
I hope that helped to show that there are valid cases where you'd want to allow forcing changes, without any self-stabbing involved.
[1]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1802944
So looking at the bug, user created a snapshot, then upgraded qemu to a version which dropped the old machine type.
Reverting to a inactive internal snapshot in this case will restore the definition including the machine type. That is what snapshots are expected to do.
Since the snaphsot is inactive the VM is not running. The user then tries to start the VM and gets an error that the machine type is not valid. At that point the user tries to 'virsh snapshot-edit' the machine type. If we'd allow the change, attempting to 'virsh start' would fail again. After reverting the snapshot the definition becomes active and needs to be edited using 'virsh edit'.
I think in this case it does not make entirely much sense to add the flag as the user would have to revert again to the edited snapshot.
Additionally the correct workflow in my opinion would be to revert to an old snapshot, use virsh-edit to fix anything required and then create a new snapshot which would capture the new configuration.
This obviously will not work for active snapshots, but the described scenario would anyways require reverting as inactive and then discard the memory state.
I'm not persuaded that this workaround is necessary.
Thanks Peter for taking a deeper look! And yes your summary seems correct. I personally still like to give admins the ability to force configs, but I'm ok if the general upstream opinion to that is no. I have asked the reporter - on the bug that I got - to chime in here and do the "convincing" as he is the affected person I think he is more able to do so - e.g. express the pain with the suggested workaround. -- Christian Ehrhardt Software Engineer, Ubuntu Server Canonical Ltd

On 3/20/19 3:16 AM, Christian Ehrhardt wrote:
So looking at the bug, user created a snapshot, then upgraded qemu to a version which dropped the old machine type.
Reverting to a inactive internal snapshot in this case will restore the definition including the machine type. That is what snapshots are expected to do.
We already have a FORCE flag for virDomainSnapshotRevert(); is that not sufficient to allow reverting to an inactive snapshot where the domain description is no longer viable in modern qemu? I'm trying to see why a second --force flag to create --redefine helps things. Also, since current snapshot --redefine code lets you omit <domain> from the snapshot, what's so hard about back-to-back redefines, the first which removes the non-viable <domain>, and the second which now adds a corrected one in (libvirt can no longer tell that the two definitions are ABI-incompatible, because of the intermediate state where libvirt behaves like pre-0.9.5 days when there was no <domain> and the burden was on the user instead of on libvirt to remain ABI compatible). If we add a FORCE flag during snapshot redefinition, would that mean we'd also need a FORCE flag during a redefinition of my proposed checkpoints? Note that with checkpoints, I've specifically stated that the <domain> element is going to be mandatory during redefine (at least, initially). The only reason it was not mandatory for snapshots is history (we had old releases that did not generate the <domain> subelement, before realizing that knowing the old state of the domain is important for proper reverts).
And yes your summary seems correct. I personally still like to give admins the ability to force configs, but I'm ok if the general upstream opinion to that is no.
I have asked the reporter - on the bug that I got - to chime in here and do the "convincing" as he is the affected person I think he is more able to do so - e.g. express the pain with the suggested workaround.
I look forward to more details. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org

On Wed, Mar 20, 2019 at 11:39:01 -0500, Eric Blake wrote:
On 3/20/19 3:16 AM, Christian Ehrhardt wrote:
So looking at the bug, user created a snapshot, then upgraded qemu to a version which dropped the old machine type.
Reverting to a inactive internal snapshot in this case will restore the definition including the machine type. That is what snapshots are expected to do.
We already have a FORCE flag for virDomainSnapshotRevert(); is that not sufficient to allow reverting to an inactive snapshot where the domain description is no longer viable in modern qemu? I'm trying to see why a second --force flag to create --redefine helps things.
In the bug description the virDomainSnapshotRevert call succeessfully reverted the state. Since the snapshot was taken in inactive state the VM was stopped. A subsequent 'virsh start' failed and reported that the machine type which was recorded was no longer suppored. The ABI stability check which is done a part of redefining the snapshot via 'virsh snapshot-edit' failed. Even if the ABI stability check would be overriden, a 'virsh start' would still fail until reverting to the snapshot again. In this case the configuration can be simply fixed by editing the current configuration after reverting the snapshpot by 'virsh edit'. While that does not allow to fix snapshots which were not reverted to I'm arguing that having the force flag is not required. This also does avoid potential other problems which users could inflict on themselves by forcing config without thinking.
participants (3)
-
Christian Ehrhardt
-
Eric Blake
-
Peter Krempa