On 9/9/19 5:17 PM, Daniel Henrique Barboza wrote:
> On 9/9/19 5:52 PM, Eric Blake wrote:
>> Commit f10562799 introduced a regression: if reverting to a snapshot
>> fails early (such as when we refuse to revert to an external
>> snapshot), we lose track of the domain's current snapshot.
>>
>> See: https://bugzilla.redhat.com/1738747
>>
>> Signed-off-by: Eric Blake <eblake(a)redhat.com>
>> ---
> I don't understand why f10562799 broke this code like this - I tried
> looking at the changes made in the commit, and the "if (snap)" has
> been there for a long time; no changes were made to the 'snap'
> variable either. But this change didn't break anything else, so:
>
> Reviewed-by: Daniel Henrique Barboza <danielhb413(a)gmail.com>
I'll update the commit message to give more details:
Before that patch, we maintained the notion of the current snapshot in
two places: vm->current_snapshot, and snap->def->current. If you have a
domain with two snapshots, s1 and s2 (where s2 is current) and want to
revert to s1, the old code cleared the def->current flag on s2 before
noticing that reverting to s1 was not possible, but at least left
vm->current_snapshot unchanged. That meant we had a _different_ bug -
the code was inconsistent on whether the domain had a current snapshot
(as long as libvirtd didn't shut down, vm->current_snapshot ruled, so s2
was still claimed as current; but if you restart libvirtd, the XML for
s2 didn't track that it was current, so after restart you'd have no
current snapshot). After that patch, the code was unconditionally
changing vm->current_snapshot to NULL (but at least was consistent
everywhere else that the XML never got out of sync with the notion of
which snapshot was current). It also didn't help that, after that patch,
the code clearing the snapshot occurred twice in the function - once
right after determining that the early checks had succeeded, and again
unconditionally on all failure paths.
So the fix is as simple as removing the unconditional clearing of s2 as
the current snapshot in the cleanup code, in favor of the earlier
clearing that happens only after the early checks succeed.
> ps: I think adding a test case for this failure (in tests/virsh-snapshot ?)
> is worth considering, especially given the changes from Maxiwell
> (adding an inactive XML to the snapshot) that can add more complexity
> to the snapshot mechanics.
I tried. But the test driver doesn't forbid 'snapshot-revert' on
external snapshots, and also doesn't try to write state to XML files, so
it never copied the problematic code from the qemu driver that could
even trigger the bug.
>> src/qemu/qemu_driver.c | 2 --
>> 1 file changed, 2 deletions(-)
>>
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index b28a26c3d6..093b15f500 100644
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -16941,8 +16941,6 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
>>              virDomainSnapshotSetCurrent(vm->snapshots, NULL);
>>              ret = -1;
>>          }
>> -    } else if (snap) {
>> -        virDomainSnapshotSetCurrent(vm->snapshots, NULL);
>>      }
>>
>>      if (ret == 0 && config && vm->persistent &&
>>          !(ret = virDomainSaveConfig(cfg->configDir, driver->caps,
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org