[libvirt] [PATCH 0/2] qemu: properly handle delayed vCPU hotunplug

Patch 1 would help in debugging this issue. Patch 2 fixes the bug. Peter Krempa (2): qemu: hotplug: Add debug log when dispatching device removal to existing thread qemu: hotplug: Reset device removal waiting code after vCPU unplug src/qemu/qemu_hotplug.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) -- 2.12.0

Note that the waiting thread is singalled in the debug logs to simplify debugging. --- src/qemu/qemu_hotplug.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 0c0885695..889f110a9 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -4362,6 +4362,7 @@ qemuDomainSignalDeviceRemoval(virDomainObjPtr vm, qemuDomainObjPrivatePtr priv = vm->privateData; if (STREQ_NULLABLE(priv->unplug.alias, devAlias)) { + VIR_DEBUG("Removal of device '%s' continues in waiting thread", devAlias); qemuDomainResetDeviceRemoval(vm); priv->unplug.status = status; virDomainObjBroadcast(vm); -- 2.12.0

On Fri, Mar 03, 2017 at 04:11 PM +0100, Peter Krempa <pkrempa@redhat.com> wrote:
Note that the waiting thread is singalled in the debug logs to simplify
s/singalled/signaled
debugging. --- src/qemu/qemu_hotplug.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 0c0885695..889f110a9 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -4362,6 +4362,7 @@ qemuDomainSignalDeviceRemoval(virDomainObjPtr vm, qemuDomainObjPrivatePtr priv = vm->privateData;
if (STREQ_NULLABLE(priv->unplug.alias, devAlias)) { + VIR_DEBUG("Removal of device '%s' continues in waiting thread", devAlias); qemuDomainResetDeviceRemoval(vm); priv->unplug.status = status; virDomainObjBroadcast(vm); -- 2.12.0
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
-- Beste Grüße / Kind regards Marc Hartmayer IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294

On Fri, Mar 03, 2017 at 16:11:49 +0100, Peter Krempa wrote:
Note that the waiting thread is singalled in the debug logs to simplify
s/sing/sign/ as noted by Marc. The double L may be OK depending on which side of the Atlantic Ocean is closer to your heart :-) ACK Jirka

On Fri, Mar 10, 2017 at 08:50:48 +0100, Jiri Denemark wrote:
On Fri, Mar 03, 2017 at 16:11:49 +0100, Peter Krempa wrote:
Note that the waiting thread is singalled in the debug logs to simplify
s/sing/sign/ as noted by Marc. The double L may be OK depending on which side of the Atlantic Ocean is closer to your heart :-)
I used the spelling pointed out by Marc. I refuse to accept that it reflects my preferred side of the Atlantic :) Thanks Peter

If the delivery of the DEVICE_DELETED event for the vCPU being deleted would time out, the code would not call 'qemuDomainResetDeviceRemoval'. Since the waiting thread did not unregister itself prior to stopping the waiting the monitor code would try to wake it up instead of dispatching it to the event worker. As a result the unplug process would not be completed and the definition would not be updated. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1428893 https://bugzilla.redhat.com/show_bug.cgi?id=1427801 --- src/qemu/qemu_hotplug.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 889f110a9..636a806f0 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -5334,6 +5334,7 @@ qemuDomainHotplugDelVcpu(virQEMUDriverPtr driver, int oldvcpus = virDomainDefGetVcpus(vm->def); unsigned int nvcpus = vcpupriv->vcpus; int rc; + int ret = -1; if (!vcpupriv->alias) { virReportError(VIR_ERR_OPERATION_UNSUPPORTED, @@ -5348,11 +5349,11 @@ qemuDomainHotplugDelVcpu(virQEMUDriverPtr driver, rc = qemuMonitorDelDevice(qemuDomainGetMonitor(vm), vcpupriv->alias); if (qemuDomainObjExitMonitor(driver, vm) < 0) - return -1; + goto cleanup; if (rc < 0) { virDomainAuditVcpu(vm, oldvcpus, oldvcpus - nvcpus, "update", false); - return -1; + goto cleanup; } if ((rc = qemuDomainWaitForDeviceRemoval(vm)) <= 0) { @@ -5360,10 +5361,17 @@ qemuDomainHotplugDelVcpu(virQEMUDriverPtr driver, virReportError(VIR_ERR_OPERATION_FAILED, "%s", _("vcpu unplug request timed out")); - return -1; + goto cleanup; } - return qemuDomainRemoveVcpu(driver, vm, vcpu); + if (qemuDomainRemoveVcpu(driver, vm, vcpu) < 0) + goto cleanup; + + ret = 0; + + cleanup: + qemuDomainResetDeviceRemoval(vm); + return ret; } -- 2.12.0

On Fri, Mar 03, 2017 at 16:11:50 +0100, Peter Krempa wrote:
If the delivery of the DEVICE_DELETED event for the vCPU being deleted would time out, the code would not call 'qemuDomainResetDeviceRemoval'.
Since the waiting thread did not unregister itself prior to stopping the waiting the monitor code would try to wake it up instead of dispatching it to the event worker. As a result the unplug process would not be completed and the definition would not be updated.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1428893 https://bugzilla.redhat.com/show_bug.cgi?id=1427801 --- src/qemu/qemu_hotplug.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
ACK Jirka
participants (3)
-
Jiri Denemark
-
Marc Hartmayer
-
Peter Krempa