[libvirt] [PATCH v2 0/3] Mark and document restore with managed save as risky

This series marks restore of an inactive qemu snapshot while there is managed saved state as risky due to the reasons explained in patch 1 and 3. Patch 2 is a simple reformatting of the documentation with no other changes in preparation of addition of more reasons why reverts might need to be forced. Changes from v1: - reword error message to "error: revert requires force: snapshot without memory state, removal of existing managed saved state strongly recommended to avoid corruption" - add documentation of the new behaviour Michael Weiser (3): qemu: Warn of restore with managed save being risky docs: Reformat snapshot-revert force reasons docs: Add snapshot-revert qemu managedsave force docs/manpages/virsh.rst | 39 ++++++++++++++++++++++++--------------- src/qemu/qemu_driver.c | 9 +++++++++ 2 files changed, 33 insertions(+), 15 deletions(-) -- 2.24.1

Internal snapshots of a non-running domain do not carry any memory state and restoring such a snapshot will not replace existing saved memory state. This allows a scenario, where a user first suspends a domain into managedsave, restores a non-running snapshot and then resumes the domain from managedsave. After that, the guest system will run with its previous memory state atop a different disk state. The most obvious possible fallout from this is extensive file system corruption. Swap content and RAID bitmaps might also be off. This has been discussed[1] and fixed[2] from the end-user perspective for virt-manager. This patch marks the restore operation as risky at the libvirt level, requiring the user to remove the saved memory state first or force the operation. [1] https://www.redhat.com/archives/virt-tools-list/2019-November/msg00011.html [2] https://www.redhat.com/archives/virt-tools-list/2019-December/msg00049.html Signed-off-by: Michael Weiser <michael.weiser@gmx.de> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Cc: Cole Robinson <crobinso@redhat.com> --- src/qemu/qemu_driver.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index ec8faf384c..691a9b45c7 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16652,6 +16652,15 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, _("must respawn qemu to start inactive snapshot")); goto endjob; } + if (vm->hasManagedSave && + !(snapdef->state == VIR_DOMAIN_SNAPSHOT_RUNNING || + snapdef->state == VIR_DOMAIN_SNAPSHOT_PAUSED)) { + virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY, "%s", + _("snapshot without memory state, removal of " + "existing managed saved state strongly " + "recommended to avoid corruption")); + goto endjob; + } } if (snap->def->dom) { -- 2.24.1

Reformat explanations of the snapshot-revert force reasons in preparation for more to be added. This is a simple reformat without any wording changes. Signed-off-by: Michael Weiser <michael.weiser@gmx.de> Cc: Cole Robinson <crobinso@redhat.com> --- docs/manpages/virsh.rst | 30 ++++++++++++++++-------------- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/docs/manpages/virsh.rst b/docs/manpages/virsh.rst index fea0527caf..f5f962cba1 100644 --- a/docs/manpages/virsh.rst +++ b/docs/manpages/virsh.rst @@ -6952,20 +6952,22 @@ transient domains cannot be inactive, it is required to use one of these flags when reverting to a disk snapshot of a transient domain. There are two cases where a snapshot revert involves extra risk, which -requires the use of *--force* to proceed. One is the case of a -snapshot that lacks full domain information for reverting -configuration (such as snapshots created prior to libvirt 0.9.5); -since libvirt cannot prove that the current configuration matches what -was in use at the time of the snapshot, supplying *--force* assures -libvirt that the snapshot is compatible with the current configuration -(and if it is not, the domain will likely fail to run). The other is -the case of reverting from a running domain to an active state where a -new hypervisor has to be created rather than reusing the existing -hypervisor, because it implies drawbacks such as breaking any existing -VNC or Spice connections; this condition happens with an active -snapshot that uses a provably incompatible configuration, as well as -with an inactive snapshot that is combined with the *--start* or -*--pause* flag. +requires the use of *--force* to proceed: + + * One is the case of a snapshot that lacks full domain information for + reverting configuration (such as snapshots created prior to libvirt + 0.9.5); since libvirt cannot prove that the current configuration matches + what was in use at the time of the snapshot, supplying *--force* assures + libvirt that the snapshot is compatible with the current configuration + (and if it is not, the domain will likely fail to run). + + * The other is the case of reverting from a running domain to an active + state where a new hypervisor has to be created rather than reusing the + existing hypervisor, because it implies drawbacks such as breaking any + existing VNC or Spice connections; this condition happens with an active + snapshot that uses a provably incompatible configuration, as well as with + an inactive snapshot that is combined with the *--start* or *--pause* + flag. snapshot-delete -- 2.24.1

On 1/2/20 1:00 PM, Michael Weiser wrote:
Reformat explanations of the snapshot-revert force reasons in preparation for more to be added. This is a simple reformat without any wording changes.
Signed-off-by: Michael Weiser <michael.weiser@gmx.de> Cc: Cole Robinson <crobinso@redhat.com> ---
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
docs/manpages/virsh.rst | 30 ++++++++++++++++-------------- 1 file changed, 16 insertions(+), 14 deletions(-)
diff --git a/docs/manpages/virsh.rst b/docs/manpages/virsh.rst index fea0527caf..f5f962cba1 100644 --- a/docs/manpages/virsh.rst +++ b/docs/manpages/virsh.rst @@ -6952,20 +6952,22 @@ transient domains cannot be inactive, it is required to use one of these flags when reverting to a disk snapshot of a transient domain.
There are two cases where a snapshot revert involves extra risk, which -requires the use of *--force* to proceed. One is the case of a -snapshot that lacks full domain information for reverting -configuration (such as snapshots created prior to libvirt 0.9.5); -since libvirt cannot prove that the current configuration matches what -was in use at the time of the snapshot, supplying *--force* assures -libvirt that the snapshot is compatible with the current configuration -(and if it is not, the domain will likely fail to run). The other is -the case of reverting from a running domain to an active state where a -new hypervisor has to be created rather than reusing the existing -hypervisor, because it implies drawbacks such as breaking any existing -VNC or Spice connections; this condition happens with an active -snapshot that uses a provably incompatible configuration, as well as -with an inactive snapshot that is combined with the *--start* or -*--pause* flag. +requires the use of *--force* to proceed: + + * One is the case of a snapshot that lacks full domain information for + reverting configuration (such as snapshots created prior to libvirt + 0.9.5); since libvirt cannot prove that the current configuration matches + what was in use at the time of the snapshot, supplying *--force* assures + libvirt that the snapshot is compatible with the current configuration + (and if it is not, the domain will likely fail to run). + + * The other is the case of reverting from a running domain to an active + state where a new hypervisor has to be created rather than reusing the + existing hypervisor, because it implies drawbacks such as breaking any + existing VNC or Spice connections; this condition happens with an active + snapshot that uses a provably incompatible configuration, as well as with + an inactive snapshot that is combined with the *--start* or *--pause* + flag.
snapshot-delete

Add documentation for additional reason why snapshot-revert might need to be forced. This explains why restoring an inactive snapshot while there is managed saved state is refused by default. Signed-off-by: Michael Weiser <michael.weiser@gmx.de> Cc: Cole Robinson <crobinso@redhat.com> --- docs/manpages/virsh.rst | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/docs/manpages/virsh.rst b/docs/manpages/virsh.rst index f5f962cba1..09be872fdf 100644 --- a/docs/manpages/virsh.rst +++ b/docs/manpages/virsh.rst @@ -6951,7 +6951,7 @@ no vm state leaves the domain in an inactive state. Passing either the transient domains cannot be inactive, it is required to use one of these flags when reverting to a disk snapshot of a transient domain. -There are two cases where a snapshot revert involves extra risk, which +There are a number of cases where a snapshot revert involves extra risk, which requires the use of *--force* to proceed: * One is the case of a snapshot that lacks full domain information for @@ -6961,7 +6961,7 @@ requires the use of *--force* to proceed: libvirt that the snapshot is compatible with the current configuration (and if it is not, the domain will likely fail to run). - * The other is the case of reverting from a running domain to an active + * Another is the case of reverting from a running domain to an active state where a new hypervisor has to be created rather than reusing the existing hypervisor, because it implies drawbacks such as breaking any existing VNC or Spice connections; this condition happens with an active @@ -6969,6 +6969,13 @@ requires the use of *--force* to proceed: an inactive snapshot that is combined with the *--start* or *--pause* flag. + * Also, libvirt will refuse to restore snapshots of inactive qemu domains + while there is managed saved state. This is because those snapshots do not + contain memory state and will therefore not replace the exising memory + state. This ends up switching a disk underneath a running system and will + likely cause extensive filesystem corruption or crashes due to swap content + mismatches when run. + snapshot-delete --------------- -- 2.24.1

On 1/2/20 1:00 PM, Michael Weiser wrote:
Add documentation for additional reason why snapshot-revert might need to be forced. This explains why restoring an inactive snapshot while there is managed saved state is refused by default.
Signed-off-by: Michael Weiser <michael.weiser@gmx.de> Cc: Cole Robinson <crobinso@redhat.com> --- docs/manpages/virsh.rst | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/docs/manpages/virsh.rst b/docs/manpages/virsh.rst index f5f962cba1..09be872fdf 100644 --- a/docs/manpages/virsh.rst +++ b/docs/manpages/virsh.rst @@ -6951,7 +6951,7 @@ no vm state leaves the domain in an inactive state. Passing either the transient domains cannot be inactive, it is required to use one of these flags when reverting to a disk snapshot of a transient domain.
-There are two cases where a snapshot revert involves extra risk, which +There are a number of cases where a snapshot revert involves extra risk, which requires the use of *--force* to proceed:
* One is the case of a snapshot that lacks full domain information for @@ -6961,7 +6961,7 @@ requires the use of *--force* to proceed: libvirt that the snapshot is compatible with the current configuration (and if it is not, the domain will likely fail to run).
- * The other is the case of reverting from a running domain to an active + * Another is the case of reverting from a running domain to an active state where a new hypervisor has to be created rather than reusing the existing hypervisor, because it implies drawbacks such as breaking any existing VNC or Spice connections; this condition happens with an active @@ -6969,6 +6969,13 @@ requires the use of *--force* to proceed: an inactive snapshot that is combined with the *--start* or *--pause* flag.
+ * Also, libvirt will refuse to restore snapshots of inactive qemu domains
Side note: there is mixed usage of both 'qemu' and 'QEMU' in this documentation, with more instances of 'QEMU'.
+ while there is managed saved state. This is because those snapshots do not + contain memory state and will therefore not replace the exising memory
s/exising/existing With this typo fixed: Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
+ state. This ends up switching a disk underneath a running system and will + likely cause extensive filesystem corruption or crashes due to swap content + mismatches when run. +
snapshot-delete ---------------
participants (2)
-
Daniel Henrique Barboza
-
Michael Weiser