[libvirt] [PATCH] qemu: check for active domain after agent interaction

Commit b606bbb41 reminded me that any time we drop locks to run back-to-back guest interaction commands, we have to check that the guest didn't disappear in between the two commands. A quick audit found a couple of spots that were missing this check. * src/qemu/qemu_driver.c (qemuDomainShutdownFlags) (qemuDomainSetVcpusFlags): Check that domain is still up. Signed-off-by: Eric Blake <eblake@redhat.com> --- I found this by inspection, and did not try to actually reproduce if it would cause failures. I think a failure could be easier to reproduce by temporarily sticking a strategic sleep() call just prior to the qemuDomainObjExtAgent() call, and during the time while libvirtd is sleeping, manually cause the guest to exit. But we've definitely fixed issues like this before. src/qemu/qemu_driver.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 02088cc..4ae76e5 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1921,6 +1921,13 @@ static int qemuDomainShutdownFlags(virDomainPtr dom, unsigned int flags) (ret < 0 && (acpiRequested || !flags))) { qemuDomainSetFakeReboot(driver, vm, isReboot); + /* Even if agent failed, we have to check if guest went away + * by itself while our locks were down. */ + if (useAgent && !virDomainObjIsActive(vm)) { + ret = 0; + goto endjob; + } + qemuDomainObjEnterMonitor(driver, vm); ret = qemuMonitorSystemPowerdown(priv->mon); qemuDomainObjExitMonitor(driver, vm); @@ -4360,6 +4367,12 @@ qemuDomainSetVcpusFlags(virDomainPtr dom, unsigned int nvcpus, if (ncpuinfo < 0) goto endjob; + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + goto endjob; + } + if (qemuAgentUpdateCPUInfo(nvcpus, cpuinfo, ncpuinfo) < 0) goto endjob; -- 1.9.3

On Wed, Aug 20, 2014 at 03:35:08PM -0600, Eric Blake wrote:
Commit b606bbb41 reminded me that any time we drop locks to run back-to-back guest interaction commands, we have to check that the guest didn't disappear in between the two commands. A quick audit found a couple of spots that were missing this check.
* src/qemu/qemu_driver.c (qemuDomainShutdownFlags) (qemuDomainSetVcpusFlags): Check that domain is still up.
Signed-off-by: Eric Blake <eblake@redhat.com> ---
I found this by inspection, and did not try to actually reproduce if it would cause failures. I think a failure could be easier to reproduce by temporarily sticking a strategic sleep() call just prior to the qemuDomainObjExtAgent() call, and during the time while libvirtd is sleeping, manually cause the guest to exit. But we've definitely fixed issues like this before.
src/qemu/qemu_driver.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 02088cc..4ae76e5 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1921,6 +1921,13 @@ static int qemuDomainShutdownFlags(virDomainPtr dom, unsigned int flags) (ret < 0 && (acpiRequested || !flags))) { qemuDomainSetFakeReboot(driver, vm, isReboot);
+ /* Even if agent failed, we have to check if guest went away + * by itself while our locks were down. */ + if (useAgent && !virDomainObjIsActive(vm)) { + ret = 0; + goto endjob; + } +
I'm thinking about a place where we could stick this to make it more generic, but can't find one. That would be a completely different cleanup anyway, so ACK to this. Martin
qemuDomainObjEnterMonitor(driver, vm); ret = qemuMonitorSystemPowerdown(priv->mon); qemuDomainObjExitMonitor(driver, vm); @@ -4360,6 +4367,12 @@ qemuDomainSetVcpusFlags(virDomainPtr dom, unsigned int nvcpus, if (ncpuinfo < 0) goto endjob;
+ if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + goto endjob; + } + if (qemuAgentUpdateCPUInfo(nvcpus, cpuinfo, ncpuinfo) < 0) goto endjob;
-- 1.9.3
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 08/21/2014 12:50 AM, Martin Kletzander wrote:
On Wed, Aug 20, 2014 at 03:35:08PM -0600, Eric Blake wrote:
Commit b606bbb41 reminded me that any time we drop locks to run back-to-back guest interaction commands, we have to check that the guest didn't disappear in between the two commands. A quick audit found a couple of spots that were missing this check.
* src/qemu/qemu_driver.c (qemuDomainShutdownFlags) (qemuDomainSetVcpusFlags): Check that domain is still up.
Signed-off-by: Eric Blake <eblake@redhat.com> ---
I'm thinking about a place where we could stick this to make it more generic, but can't find one. That would be a completely different cleanup anyway, so ACK to this.
Thanks; pushed. About the only idea I had would be making qemuDomainObjExitAgent() return a value marked ATTRIBUTE_RETURN_CHECK, where the caller can be informed if the guest is still running every time it regains control after temporarily dropping locks; but that is, as you say, a completely different cleanup because it would touch a lot of callers. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
participants (2)
-
Eric Blake
-
Martin Kletzander