[libvirt] [PATCH 0/5] qemu_hotplug: Fix a rare race condition when detaching a device twice

*** BLURB HERE *** Michal Prívozník (5): qemu_hotplug: Properly check for qemuMonitorDelDevice retval qemu_hotplug: Introduce and use qemuDomainDeleteDevice DO NOT APPLY: Simple reproducer DO NOT APPLY: Revert simple reproducer qemu_hotplug: Fix a rare race condition when detaching a device twice src/qemu/qemu_domain.h | 1 + src/qemu/qemu_hotplug.c | 99 ++++++++++++++++++++++++++++++++--------- 2 files changed, 79 insertions(+), 21 deletions(-) -- 2.19.2

Luckily, the function returns only 0 or -1 so all the checks work as expected. Anyway, our rule is that a positive value means success so if the function ever returns a positive value these checks will fail. Make them check for a negative value properly. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_hotplug.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 19b7e6e8ea..f43f80668c 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -5457,11 +5457,11 @@ int qemuDomainDetachControllerDevice(virQEMUDriverPtr driver, qemuDomainObjEnterMonitor(driver, vm); if (detach->type == VIR_DOMAIN_CONTROLLER_TYPE_PCI && - qemuDomainDetachExtensionDevice(priv->mon, &detach->info)) { + qemuDomainDetachExtensionDevice(priv->mon, &detach->info) < 0) { goto exit_monitor; } - if (qemuMonitorDelDevice(priv->mon, detach->info.alias)) { + if (qemuMonitorDelDevice(priv->mon, detach->info.alias) < 0) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; } @@ -6975,7 +6975,7 @@ qemuDomainDetachInputDevice(virDomainObjPtr vm, qemuDomainMarkDeviceForRemoval(vm, &input->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuMonitorDelDevice(priv->mon, input->info.alias)) { + if (qemuMonitorDelDevice(priv->mon, input->info.alias) < 0) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; } @@ -7018,7 +7018,7 @@ qemuDomainDetachVsockDevice(virDomainObjPtr vm, qemuDomainMarkDeviceForRemoval(vm, &vsock->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuMonitorDelDevice(priv->mon, vsock->info.alias)) { + if (qemuMonitorDelDevice(priv->mon, vsock->info.alias) < 0) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; } -- 2.19.2

On Tue, Mar 12, 2019 at 16:13:16 +0100, Michal Privoznik wrote:
Luckily, the function returns only 0 or -1 so all the checks work as expected. Anyway, our rule is that a positive value means success so if the function ever returns a positive value these checks will fail. Make them check for a negative value properly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_hotplug.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
The commit message suggests that only qemuMonitorDelDevice calls are fixed but this commit also fixes one other call. While the call is slightly related to this series qemuDomainDetachExtensionDevice properly handles return from qemuMonitorDelDevice and thus changing return value from there would not impact the code. ACK if you mention qemuDomainDetachExtensionDevice

The aim of this function will be to fix return value of qemuMonitorDelDevice() in one specific case. But that is yet to come. Right now this is nothing but a plain substitution. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_hotplug.c | 60 ++++++++++++++++++++++++++++------------- 1 file changed, 41 insertions(+), 19 deletions(-) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index f43f80668c..574477e916 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -67,6 +67,28 @@ VIR_LOG_INIT("qemu.qemu_hotplug"); unsigned long long qemuDomainRemoveDeviceWaitTime = 1000ull * 5; +/** + * qemuDomainDeleteDevice: + * @mon: qemu monitor + * @alias: device to remove + * + * A simple wrapper around qemuMonitorDelDevice(). + * @mon must be locked upon entry. + * + * Returns: 0 on success, + * -1 otherwise. + */ +static inline int +qemuDomainDeleteDevice(qemuMonitorPtr mon, + const char *alias) +{ + if (qemuMonitorDelDevice(mon, alias) < 0) + return -1; + + return 0; +} + + /** * qemuHotplugPrepareDiskSourceAccess: * @driver: qemu driver struct @@ -167,7 +189,7 @@ qemuDomainDetachZPCIDevice(qemuMonitorPtr mon, if (virAsprintf(&zpciAlias, "zpci%d", info->addr.pci.zpci.uid) < 0) goto cleanup; - if (qemuMonitorDelDevice(mon, zpciAlias) < 0) + if (qemuDomainDeleteDevice(mon, zpciAlias) < 0) goto cleanup; ret = 0; @@ -5229,7 +5251,7 @@ qemuDomainDetachVirtioDiskDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, &detach->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuMonitorDelDevice(priv->mon, detach->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, detach->info.alias) < 0) { if (qemuDomainObjExitMonitor(driver, vm) < 0) goto cleanup; virDomainAuditDisk(vm, detach->src, NULL, "detach", false); @@ -5267,7 +5289,7 @@ qemuDomainDetachDiskDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, &detach->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuMonitorDelDevice(priv->mon, detach->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, detach->info.alias) < 0) { if (qemuDomainObjExitMonitor(driver, vm) < 0) goto cleanup; virDomainAuditDisk(vm, detach->src, NULL, "detach", false); @@ -5461,7 +5483,7 @@ int qemuDomainDetachControllerDevice(virQEMUDriverPtr driver, goto exit_monitor; } - if (qemuMonitorDelDevice(priv->mon, detach->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, detach->info.alias) < 0) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; } @@ -5505,7 +5527,7 @@ qemuDomainDetachHostPCIDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, detach->info); qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorDelDevice(priv->mon, detach->info->alias); + ret = qemuDomainDeleteDevice(priv->mon, detach->info->alias); if (qemuDomainObjExitMonitor(driver, vm) < 0) ret = -1; @@ -5531,7 +5553,7 @@ qemuDomainDetachHostUSBDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, detach->info); qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorDelDevice(priv->mon, detach->info->alias); + ret = qemuDomainDeleteDevice(priv->mon, detach->info->alias); if (qemuDomainObjExitMonitor(driver, vm) < 0) ret = -1; @@ -5557,7 +5579,7 @@ qemuDomainDetachHostSCSIDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, detach->info); qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorDelDevice(priv->mon, detach->info->alias); + ret = qemuDomainDeleteDevice(priv->mon, detach->info->alias); if (qemuDomainObjExitMonitor(driver, vm) < 0) return -1; @@ -5584,7 +5606,7 @@ qemuDomainDetachSCSIVHostDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, detach->info); qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorDelDevice(priv->mon, detach->info->alias); + ret = qemuDomainDeleteDevice(priv->mon, detach->info->alias); if (qemuDomainObjExitMonitor(driver, vm) < 0) return -1; @@ -5612,7 +5634,7 @@ qemuDomainDetachMediatedDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, detach->info); qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorDelDevice(priv->mon, detach->info->alias); + ret = qemuDomainDeleteDevice(priv->mon, detach->info->alias); if (qemuDomainObjExitMonitor(driver, vm) < 0) ret = -1; @@ -5792,7 +5814,7 @@ qemuDomainDetachShmemDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, &shmem->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuMonitorDelDevice(priv->mon, shmem->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, shmem->info.alias) < 0) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; } @@ -5853,7 +5875,7 @@ qemuDomainDetachWatchdog(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, &watchdog->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuMonitorDelDevice(priv->mon, watchdog->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, watchdog->info.alias) < 0) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; } @@ -5903,7 +5925,7 @@ qemuDomainDetachRedirdevDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, &tmpRedirdevDef->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuMonitorDelDevice(priv->mon, tmpRedirdevDef->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, tmpRedirdevDef->info.alias) < 0) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; } @@ -5974,7 +5996,7 @@ qemuDomainDetachNetDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, &detach->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuMonitorDelDevice(priv->mon, detach->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, detach->info.alias) < 0) { if (qemuDomainObjExitMonitor(driver, vm) < 0) goto cleanup; virDomainAuditNet(vm, detach, NULL, "detach", false); @@ -6151,7 +6173,7 @@ int qemuDomainDetachChrDevice(virQEMUDriverPtr driver, goto cleanup; } } else { - if (qemuMonitorDelDevice(priv->mon, tmpChr->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, tmpChr->info.alias) < 0) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; } @@ -6207,7 +6229,7 @@ qemuDomainDetachRNGDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, &tmpRNG->info); qemuDomainObjEnterMonitor(driver, vm); - rc = qemuMonitorDelDevice(priv->mon, tmpRNG->info.alias); + rc = qemuDomainDeleteDevice(priv->mon, tmpRNG->info.alias); if (qemuDomainObjExitMonitor(driver, vm) || rc < 0) goto cleanup; @@ -6259,7 +6281,7 @@ qemuDomainDetachMemoryDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, &mem->info); qemuDomainObjEnterMonitor(driver, vm); - rc = qemuMonitorDelDevice(priv->mon, mem->info.alias); + rc = qemuDomainDeleteDevice(priv->mon, mem->info.alias); if (qemuDomainObjExitMonitor(driver, vm) < 0 || rc < 0) goto cleanup; @@ -6367,7 +6389,7 @@ qemuDomainHotplugDelVcpu(virQEMUDriverPtr driver, qemuDomainObjEnterMonitor(driver, vm); - rc = qemuMonitorDelDevice(qemuDomainGetMonitor(vm), vcpupriv->alias); + rc = qemuDomainDeleteDevice(qemuDomainGetMonitor(vm), vcpupriv->alias); if (qemuDomainObjExitMonitor(driver, vm) < 0) goto cleanup; @@ -6975,7 +6997,7 @@ qemuDomainDetachInputDevice(virDomainObjPtr vm, qemuDomainMarkDeviceForRemoval(vm, &input->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuMonitorDelDevice(priv->mon, input->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, input->info.alias) < 0) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; } @@ -7018,7 +7040,7 @@ qemuDomainDetachVsockDevice(virDomainObjPtr vm, qemuDomainMarkDeviceForRemoval(vm, &vsock->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuMonitorDelDevice(priv->mon, vsock->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, vsock->info.alias) < 0) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; } -- 2.19.2

On Tue, Mar 12, 2019 at 16:13:17 +0100, Michal Privoznik wrote:
The aim of this function will be to fix return value of qemuMonitorDelDevice() in one specific case. But that is yet to come. Right now this is nothing but a plain substitution.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_hotplug.c | 60 ++++++++++++++++++++++++++++------------- 1 file changed, 41 insertions(+), 19 deletions(-)
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index f43f80668c..574477e916 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -67,6 +67,28 @@ VIR_LOG_INIT("qemu.qemu_hotplug"); unsigned long long qemuDomainRemoveDeviceWaitTime = 1000ull * 5;
+/** + * qemuDomainDeleteDevice: + * @mon: qemu monitor + * @alias: device to remove + * + * A simple wrapper around qemuMonitorDelDevice().
You don't change this once this becomes a more complex helper, so you can perhaps leave it out.
+ * @mon must be locked upon entry. + * + * Returns: 0 on success, + * -1 otherwise. + */ +static inline int
Inline? Don't.
+qemuDomainDeleteDevice(qemuMonitorPtr mon, + const char *alias) +{
ACK with above addressed.

https://bugzilla.redhat.com/show_bug.cgi?id=1623389 Steps to reproduce: 1) cat shmem.xml <shmem name='my_shmem0'> <model type='ivshmem-plain'/> <size unit='M'>4</size> <alias name='ua-123'/> </shmem> 2) virsh attach-device vm1 shmem.xml 3) virsh detach-device-alias vm1 ua-123; virsh detach-device vm1 shmem.xml 4) observe that the device is still in the domain: virsh dumpxml vm1 <shmem name='my_shmem0'> <model type='ivshmem-plain'/> <size unit='M'>4</size> <alias name='ua-123'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/> </shmem> 5) But qemu has the device no more: virsh detach-device-alias vm1 ua-123 error: Failed to detach device with alias ua-123 error: internal error: unable to execute QEMU command 'device_del': Device 'ua-123' not found This reproducer is to make sure that DELETE_DEVICE event arrives while monitor is unlocked. It is very hard to time qemu and libvirt so that the event comes exactly at the time when detach-device from 3) is doing the monitor call. Simulate this by unlocking monitor, waiting a few seconds and locking the monitor again. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_hotplug.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 574477e916..0caf75659c 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -82,8 +82,12 @@ static inline int qemuDomainDeleteDevice(qemuMonitorPtr mon, const char *alias) { - if (qemuMonitorDelDevice(mon, alias) < 0) + if (qemuMonitorDelDevice(mon, alias) < 0) { + virObjectUnlock(mon); + sleep(10); + virObjectLock(mon); return -1; + } return 0; } -- 2.19.2

On Tue, Mar 12, 2019 at 16:13:18 +0100, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1623389
Steps to reproduce:
1) cat shmem.xml <shmem name='my_shmem0'> <model type='ivshmem-plain'/> <size unit='M'>4</size> <alias name='ua-123'/> </shmem>
2) virsh attach-device vm1 shmem.xml
3) virsh detach-device-alias vm1 ua-123; virsh detach-device vm1 shmem.xml
4) observe that the device is still in the domain: virsh dumpxml vm1 <shmem name='my_shmem0'> <model type='ivshmem-plain'/> <size unit='M'>4</size> <alias name='ua-123'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/> </shmem>
5) But qemu has the device no more: virsh detach-device-alias vm1 ua-123 error: Failed to detach device with alias ua-123 error: internal error: unable to execute QEMU command 'device_del': Device 'ua-123' not found
This reproducer is to make sure that DELETE_DEVICE event arrives while monitor is unlocked. It is very hard to time qemu and libvirt so that the event comes exactly at the time when detach-device from 3) is doing the monitor call. Simulate this by unlocking monitor, waiting a few seconds and locking the monitor again.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
I strongly suggest to not add sign-off to patches which are not meant to be pushed. This will actually make the otherwise useless sign-off somewhat usefull as the push-hook will reject pushing such a series.

This reverts previous commit. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_hotplug.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 0caf75659c..574477e916 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -82,12 +82,8 @@ static inline int qemuDomainDeleteDevice(qemuMonitorPtr mon, const char *alias) { - if (qemuMonitorDelDevice(mon, alias) < 0) { - virObjectUnlock(mon); - sleep(10); - virObjectLock(mon); + if (qemuMonitorDelDevice(mon, alias) < 0) return -1; - } return 0; } -- 2.19.2

https://bugzilla.redhat.com/show_bug.cgi?id=1623389 If a device is detached twice from the same domain the following race condition may happen: 1) The first DetachDevice() call will issue "device_del" on qemu monitor, but since the DEVICE_DELETED event did not arrive in time, the API ends claiming "Device detach request sent successfully". 2) The second DetachDevice() therefore still find the device in the domain and thus proceeds to detaching it again. It calls EnterMonitor() and qemuMonitorSend() trying to issue "device_del" command again. This gets both domain lock and monitor lock released. 3) At this point, qemu sends us the DEVICE_DELETED event which is going to be handled by the event loop which ends up calling qemuDomainSignalDeviceRemoval() to determine who is going to remove the device from domain definition. Whether it is the caller that marked the device for removal or whether it is going to be the event processing thread. 4) Because the device was marked for removal, qemuDomainSignalDeviceRemoval() returns true, which means the event is to be processed by the thread that has marked the device for removal (and is currently still trying to issue "device_del" command) 5) The thread finally issues the "device_del" command, which fails (obviously) and therefore it calls qemuDomainResetDeviceRemoval() to reset the device marking and quits immediately after, NOT removing any device from the domain definition. At this point, the device is still present in the domain definition but doesn't exist in qemu anymore. Worse, there is no way to remove it from the domain definition. Solution is to note down that we've seen the event and if the second "device_del" fails, not take it as a failure but carry on with the usual execution. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.h | 1 + src/qemu/qemu_hotplug.c | 83 +++++++++++++++++++++++++++++------------ 2 files changed, 60 insertions(+), 24 deletions(-) diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 9f468e5661..fb361515ba 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -218,6 +218,7 @@ typedef qemuDomainUnpluggingDevice *qemuDomainUnpluggingDevicePtr; struct _qemuDomainUnpluggingDevice { const char *alias; qemuDomainUnpluggingDeviceStatus status; + bool eventSeen; /* True if DEVICE_DELETED event arrived. */ }; diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 574477e916..93c0e14adf 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -70,22 +70,47 @@ unsigned long long qemuDomainRemoveDeviceWaitTime = 1000ull * 5; /** * qemuDomainDeleteDevice: * @mon: qemu monitor + * @vm: domain object * @alias: device to remove * * A simple wrapper around qemuMonitorDelDevice(). - * @mon must be locked upon entry. + * @mon must be locked upon entry, @vm shan't. * * Returns: 0 on success, * -1 otherwise. */ static inline int qemuDomainDeleteDevice(qemuMonitorPtr mon, + virDomainObjPtr vm, const char *alias) { - if (qemuMonitorDelDevice(mon, alias) < 0) - return -1; + qemuDomainObjPrivatePtr priv; + int ret = 0; - return 0; + if (qemuMonitorDelDevice(mon, alias) < 0) { + if (vm) { + /* It is safe to lock and unlock both @mon and @vm + * here because: + * a) qemuDomainObjEnterMonitor() ensures @mon is + * ref()'d + * b) The API that is calling us ensures that @vm is + * ref()'d + */ + virObjectUnlock(mon); + virObjectLock(vm); + priv = vm->privateData; + if (priv->unplug.eventSeen) + virResetLastError(); + else + ret = -1; + virObjectLock(mon); + virObjectUnlock(vm); + } else { + ret = -1; + } + } + + return ret; } @@ -189,7 +214,11 @@ qemuDomainDetachZPCIDevice(qemuMonitorPtr mon, if (virAsprintf(&zpciAlias, "zpci%d", info->addr.pci.zpci.uid) < 0) goto cleanup; - if (qemuDomainDeleteDevice(mon, zpciAlias) < 0) + /* zPCI devices are not exposed in domain XML yet. Therefore, + * they are treated as collateral devices which can't be + * unplugged directly at user's will. Hence, it's safe to + * pass NULL here. */ + if (qemuDomainDeleteDevice(mon, NULL, zpciAlias) < 0) goto cleanup; ret = 0; @@ -5165,6 +5194,7 @@ qemuDomainResetDeviceRemoval(virDomainObjPtr vm) { qemuDomainObjPrivatePtr priv = vm->privateData; priv->unplug.alias = NULL; + priv->unplug.eventSeen = false; } /* Returns: @@ -5187,7 +5217,8 @@ qemuDomainWaitForDeviceRemoval(virDomainObjPtr vm) return 1; until += qemuDomainRemoveDeviceWaitTime; - while (priv->unplug.alias) { + while (priv->unplug.alias && + !priv->unplug.eventSeen) { if ((rc = virDomainObjWaitUntil(vm, until)) == 1) return 0; @@ -5204,6 +5235,9 @@ qemuDomainWaitForDeviceRemoval(virDomainObjPtr vm) return -1; } + VIR_DEBUG("unplug.alias=%s unplug.eventSeen=%d", + NULLSTR(priv->unplug.alias), priv->unplug.eventSeen); + return 1; } @@ -5224,6 +5258,7 @@ qemuDomainSignalDeviceRemoval(virDomainObjPtr vm, VIR_DEBUG("Removal of device '%s' continues in waiting thread", devAlias); qemuDomainResetDeviceRemoval(vm); priv->unplug.status = status; + priv->unplug.eventSeen = true; virDomainObjBroadcast(vm); return true; } @@ -5251,7 +5286,7 @@ qemuDomainDetachVirtioDiskDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, &detach->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuDomainDeleteDevice(priv->mon, detach->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, vm, detach->info.alias) < 0) { if (qemuDomainObjExitMonitor(driver, vm) < 0) goto cleanup; virDomainAuditDisk(vm, detach->src, NULL, "detach", false); @@ -5289,7 +5324,7 @@ qemuDomainDetachDiskDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, &detach->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuDomainDeleteDevice(priv->mon, detach->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, vm, detach->info.alias) < 0) { if (qemuDomainObjExitMonitor(driver, vm) < 0) goto cleanup; virDomainAuditDisk(vm, detach->src, NULL, "detach", false); @@ -5483,7 +5518,7 @@ int qemuDomainDetachControllerDevice(virQEMUDriverPtr driver, goto exit_monitor; } - if (qemuDomainDeleteDevice(priv->mon, detach->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, vm, detach->info.alias) < 0) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; } @@ -5527,7 +5562,7 @@ qemuDomainDetachHostPCIDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, detach->info); qemuDomainObjEnterMonitor(driver, vm); - ret = qemuDomainDeleteDevice(priv->mon, detach->info->alias); + ret = qemuDomainDeleteDevice(priv->mon, vm, detach->info->alias); if (qemuDomainObjExitMonitor(driver, vm) < 0) ret = -1; @@ -5553,7 +5588,7 @@ qemuDomainDetachHostUSBDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, detach->info); qemuDomainObjEnterMonitor(driver, vm); - ret = qemuDomainDeleteDevice(priv->mon, detach->info->alias); + ret = qemuDomainDeleteDevice(priv->mon, vm, detach->info->alias); if (qemuDomainObjExitMonitor(driver, vm) < 0) ret = -1; @@ -5579,7 +5614,7 @@ qemuDomainDetachHostSCSIDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, detach->info); qemuDomainObjEnterMonitor(driver, vm); - ret = qemuDomainDeleteDevice(priv->mon, detach->info->alias); + ret = qemuDomainDeleteDevice(priv->mon, vm, detach->info->alias); if (qemuDomainObjExitMonitor(driver, vm) < 0) return -1; @@ -5606,7 +5641,7 @@ qemuDomainDetachSCSIVHostDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, detach->info); qemuDomainObjEnterMonitor(driver, vm); - ret = qemuDomainDeleteDevice(priv->mon, detach->info->alias); + ret = qemuDomainDeleteDevice(priv->mon, vm, detach->info->alias); if (qemuDomainObjExitMonitor(driver, vm) < 0) return -1; @@ -5634,7 +5669,7 @@ qemuDomainDetachMediatedDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, detach->info); qemuDomainObjEnterMonitor(driver, vm); - ret = qemuDomainDeleteDevice(priv->mon, detach->info->alias); + ret = qemuDomainDeleteDevice(priv->mon, vm, detach->info->alias); if (qemuDomainObjExitMonitor(driver, vm) < 0) ret = -1; @@ -5814,7 +5849,7 @@ qemuDomainDetachShmemDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, &shmem->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuDomainDeleteDevice(priv->mon, shmem->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, vm, shmem->info.alias) < 0) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; } @@ -5875,7 +5910,7 @@ qemuDomainDetachWatchdog(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, &watchdog->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuDomainDeleteDevice(priv->mon, watchdog->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, vm, watchdog->info.alias) < 0) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; } @@ -5925,7 +5960,7 @@ qemuDomainDetachRedirdevDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, &tmpRedirdevDef->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuDomainDeleteDevice(priv->mon, tmpRedirdevDef->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, vm, tmpRedirdevDef->info.alias) < 0) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; } @@ -5996,7 +6031,7 @@ qemuDomainDetachNetDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, &detach->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuDomainDeleteDevice(priv->mon, detach->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, vm, detach->info.alias) < 0) { if (qemuDomainObjExitMonitor(driver, vm) < 0) goto cleanup; virDomainAuditNet(vm, detach, NULL, "detach", false); @@ -6173,7 +6208,7 @@ int qemuDomainDetachChrDevice(virQEMUDriverPtr driver, goto cleanup; } } else { - if (qemuDomainDeleteDevice(priv->mon, tmpChr->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, vm, tmpChr->info.alias) < 0) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; } @@ -6229,7 +6264,7 @@ qemuDomainDetachRNGDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, &tmpRNG->info); qemuDomainObjEnterMonitor(driver, vm); - rc = qemuDomainDeleteDevice(priv->mon, tmpRNG->info.alias); + rc = qemuDomainDeleteDevice(priv->mon, vm, tmpRNG->info.alias); if (qemuDomainObjExitMonitor(driver, vm) || rc < 0) goto cleanup; @@ -6281,7 +6316,7 @@ qemuDomainDetachMemoryDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, &mem->info); qemuDomainObjEnterMonitor(driver, vm); - rc = qemuDomainDeleteDevice(priv->mon, mem->info.alias); + rc = qemuDomainDeleteDevice(priv->mon, vm, mem->info.alias); if (qemuDomainObjExitMonitor(driver, vm) < 0 || rc < 0) goto cleanup; @@ -6389,7 +6424,7 @@ qemuDomainHotplugDelVcpu(virQEMUDriverPtr driver, qemuDomainObjEnterMonitor(driver, vm); - rc = qemuDomainDeleteDevice(qemuDomainGetMonitor(vm), vcpupriv->alias); + rc = qemuDomainDeleteDevice(qemuDomainGetMonitor(vm), vm, vcpupriv->alias); if (qemuDomainObjExitMonitor(driver, vm) < 0) goto cleanup; @@ -6997,7 +7032,7 @@ qemuDomainDetachInputDevice(virDomainObjPtr vm, qemuDomainMarkDeviceForRemoval(vm, &input->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuDomainDeleteDevice(priv->mon, input->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, vm, input->info.alias) < 0) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; } @@ -7040,7 +7075,7 @@ qemuDomainDetachVsockDevice(virDomainObjPtr vm, qemuDomainMarkDeviceForRemoval(vm, &vsock->info); qemuDomainObjEnterMonitor(driver, vm); - if (qemuDomainDeleteDevice(priv->mon, vsock->info.alias) < 0) { + if (qemuDomainDeleteDevice(priv->mon, vm, vsock->info.alias) < 0) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; } -- 2.19.2

On Tue, Mar 12, 2019 at 16:13:20 +0100, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1623389
If a device is detached twice from the same domain the following race condition may happen:
1) The first DetachDevice() call will issue "device_del" on qemu monitor, but since the DEVICE_DELETED event did not arrive in time, the API ends claiming "Device detach request sent successfully".
2) The second DetachDevice() therefore still find the device in the domain and thus proceeds to detaching it again. It calls EnterMonitor() and qemuMonitorSend() trying to issue "device_del" command again. This gets both domain lock and monitor lock released.
3) At this point, qemu sends us the DEVICE_DELETED event which is going to be handled by the event loop which ends up calling qemuDomainSignalDeviceRemoval() to determine who is going to remove the device from domain definition. Whether it is the caller that marked the device for removal or whether it is going to be the event processing thread.
4) Because the device was marked for removal, qemuDomainSignalDeviceRemoval() returns true, which means the event is to be processed by the thread that has marked the device for removal (and is currently still trying to issue "device_del" command)
5) The thread finally issues the "device_del" command, which fails (obviously) and therefore it calls qemuDomainResetDeviceRemoval() to reset the device marking and quits immediately after, NOT removing any device from the domain definition.
At this point, the device is still present in the domain definition but doesn't exist in qemu anymore. Worse, there is no way to remove it from the domain definition.
Solution is to note down that we've seen the event and if the second "device_del" fails, not take it as a failure but carry on with the usual execution.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.h | 1 + src/qemu/qemu_hotplug.c | 83 +++++++++++++++++++++++++++++------------ 2 files changed, 60 insertions(+), 24 deletions(-)
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 9f468e5661..fb361515ba 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -218,6 +218,7 @@ typedef qemuDomainUnpluggingDevice *qemuDomainUnpluggingDevicePtr; struct _qemuDomainUnpluggingDevice { const char *alias; qemuDomainUnpluggingDeviceStatus status; + bool eventSeen; /* True if DEVICE_DELETED event arrived. */ };
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 574477e916..93c0e14adf 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -70,22 +70,47 @@ unsigned long long qemuDomainRemoveDeviceWaitTime = 1000ull * 5; /** * qemuDomainDeleteDevice: * @mon: qemu monitor + * @vm: domain object * @alias: device to remove * * A simple wrapper around qemuMonitorDelDevice(). - * @mon must be locked upon entry. + * @mon must be locked upon entry, @vm shan't.
Please use either the more contemporary short form or the long form.
* * Returns: 0 on success, * -1 otherwise. */ static inline int qemuDomainDeleteDevice(qemuMonitorPtr mon, + virDomainObjPtr vm, const char *alias) { - if (qemuMonitorDelDevice(mon, alias) < 0) - return -1; + qemuDomainObjPrivatePtr priv; + int ret = 0;
- return 0; + if (qemuMonitorDelDevice(mon, alias) < 0) { + if (vm) { + /* It is safe to lock and unlock both @mon and @vm + * here because: + * a) qemuDomainObjEnterMonitor() ensures @mon is + * ref()'d + * b) The API that is calling us ensures that @vm is + * ref()'d
Don't break the two lines.
+ */ + virObjectUnlock(mon); + virObjectLock(vm); + priv = vm->privateData; + if (priv->unplug.eventSeen)
You can't do this without also clearing priv->unplug.alias. If the event was not seen at this point you need to make sure that it will unconditionally be handled via the separate event handler thread prior to giving up the lock on VM. By not clearing it, you still have another chance at the event handler thinking the unplug will be handled here without actually doing so.
+ virResetLastError();
Additionally it would be better if qemuMonitorJSONDelDevice actually returns the error string or object without reporting it so that we can decide here not to report it at all rather than resetting it.
+ else + ret = -1; + virObjectLock(mon); + virObjectUnlock(vm); + } else { + ret = -1; + } + } + + return ret;

On Tue, Mar 12, 2019 at 16:41:15 +0100, Peter Krempa wrote:
On Tue, Mar 12, 2019 at 16:13:20 +0100, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1623389
If a device is detached twice from the same domain the following race condition may happen:
1) The first DetachDevice() call will issue "device_del" on qemu monitor, but since the DEVICE_DELETED event did not arrive in time, the API ends claiming "Device detach request sent successfully".
2) The second DetachDevice() therefore still find the device in the domain and thus proceeds to detaching it again. It calls EnterMonitor() and qemuMonitorSend() trying to issue "device_del" command again. This gets both domain lock and monitor lock released.
3) At this point, qemu sends us the DEVICE_DELETED event which is going to be handled by the event loop which ends up calling qemuDomainSignalDeviceRemoval() to determine who is going to remove the device from domain definition. Whether it is the caller that marked the device for removal or whether it is going to be the event processing thread.
4) Because the device was marked for removal, qemuDomainSignalDeviceRemoval() returns true, which means the event is to be processed by the thread that has marked the device for removal (and is currently still trying to issue "device_del" command)
5) The thread finally issues the "device_del" command, which fails (obviously) and therefore it calls qemuDomainResetDeviceRemoval() to reset the device marking and quits immediately after, NOT removing any device from the domain definition.
At this point, the device is still present in the domain definition but doesn't exist in qemu anymore. Worse, there is no way to remove it from the domain definition.
Solution is to note down that we've seen the event and if the second "device_del" fails, not take it as a failure but carry on with the usual execution.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.h | 1 + src/qemu/qemu_hotplug.c | 83 +++++++++++++++++++++++++++++------------ 2 files changed, 60 insertions(+), 24 deletions(-)
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 9f468e5661..fb361515ba 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -218,6 +218,7 @@ typedef qemuDomainUnpluggingDevice *qemuDomainUnpluggingDevicePtr; struct _qemuDomainUnpluggingDevice { const char *alias; qemuDomainUnpluggingDeviceStatus status; + bool eventSeen; /* True if DEVICE_DELETED event arrived. */ };
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 574477e916..93c0e14adf 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -70,22 +70,47 @@ unsigned long long qemuDomainRemoveDeviceWaitTime = 1000ull * 5; /** * qemuDomainDeleteDevice: * @mon: qemu monitor + * @vm: domain object * @alias: device to remove * * A simple wrapper around qemuMonitorDelDevice(). - * @mon must be locked upon entry. + * @mon must be locked upon entry, @vm shan't.
Please use either the more contemporary short form or the long form.
* * Returns: 0 on success, * -1 otherwise. */ static inline int qemuDomainDeleteDevice(qemuMonitorPtr mon, + virDomainObjPtr vm, const char *alias) { - if (qemuMonitorDelDevice(mon, alias) < 0) - return -1; + qemuDomainObjPrivatePtr priv; + int ret = 0;
- return 0; + if (qemuMonitorDelDevice(mon, alias) < 0) { + if (vm) { + /* It is safe to lock and unlock both @mon and @vm + * here because: + * a) qemuDomainObjEnterMonitor() ensures @mon is + * ref()'d + * b) The API that is calling us ensures that @vm is + * ref()'d
Don't break the two lines.
+ */ + virObjectUnlock(mon); + virObjectLock(vm); + priv = vm->privateData; + if (priv->unplug.eventSeen)
You can't do this without also clearing priv->unplug.alias. If the event was not seen at this point you need to make sure that it will unconditionally be handled via the separate event handler thread prior to giving up the lock on VM. By not clearing it, you still have another chance at the event handler thinking the unplug will be handled here without actually doing so.
+ virResetLastError();
Additionally it would be better if qemuMonitorJSONDelDevice actually returns the error string or object without reporting it so that we can decide here not to report it at all rather than resetting it.
After some more thinking I think that: 1) The new helper should encapsulate also the call to enter the monitor - that way you can avoid the super-hacky way that re-locks the monitor. - you can then intergrate setting of the alias into private data - resetting of it after exit of the monitor - checking that the event was seen This should work nicely as AFAIK all code paths using 'device_del' should only ever call this one API and do everything else. 2) Refactor the monitor APIs for device_del so that they return the error message reported by the monitor as success and return the copy of the error message. This will probably also require a different monitor error checking function That way you can get the error message and report it if you decide so. If any code path requires more than one monitor call, the 1) will also ensure that the alias which we are waiting for is set only during controlled amount of time and thus we are sure that the event thread handles any DEVICE_DELETED events otherwise.

On 3/12/19 7:57 PM, Peter Krempa wrote:
On Tue, Mar 12, 2019 at 16:41:15 +0100, Peter Krempa wrote:
On Tue, Mar 12, 2019 at 16:13:20 +0100, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1623389
If a device is detached twice from the same domain the following race condition may happen:
1) The first DetachDevice() call will issue "device_del" on qemu monitor, but since the DEVICE_DELETED event did not arrive in time, the API ends claiming "Device detach request sent successfully".
2) The second DetachDevice() therefore still find the device in the domain and thus proceeds to detaching it again. It calls EnterMonitor() and qemuMonitorSend() trying to issue "device_del" command again. This gets both domain lock and monitor lock released.
3) At this point, qemu sends us the DEVICE_DELETED event which is going to be handled by the event loop which ends up calling qemuDomainSignalDeviceRemoval() to determine who is going to remove the device from domain definition. Whether it is the caller that marked the device for removal or whether it is going to be the event processing thread.
4) Because the device was marked for removal, qemuDomainSignalDeviceRemoval() returns true, which means the event is to be processed by the thread that has marked the device for removal (and is currently still trying to issue "device_del" command)
5) The thread finally issues the "device_del" command, which fails (obviously) and therefore it calls qemuDomainResetDeviceRemoval() to reset the device marking and quits immediately after, NOT removing any device from the domain definition.
At this point, the device is still present in the domain definition but doesn't exist in qemu anymore. Worse, there is no way to remove it from the domain definition.
Solution is to note down that we've seen the event and if the second "device_del" fails, not take it as a failure but carry on with the usual execution.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.h | 1 + src/qemu/qemu_hotplug.c | 83 +++++++++++++++++++++++++++++------------ 2 files changed, 60 insertions(+), 24 deletions(-)
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 9f468e5661..fb361515ba 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -218,6 +218,7 @@ typedef qemuDomainUnpluggingDevice *qemuDomainUnpluggingDevicePtr; struct _qemuDomainUnpluggingDevice { const char *alias; qemuDomainUnpluggingDeviceStatus status; + bool eventSeen; /* True if DEVICE_DELETED event arrived. */ };
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 574477e916..93c0e14adf 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -70,22 +70,47 @@ unsigned long long qemuDomainRemoveDeviceWaitTime = 1000ull * 5; /** * qemuDomainDeleteDevice: * @mon: qemu monitor + * @vm: domain object * @alias: device to remove * * A simple wrapper around qemuMonitorDelDevice(). - * @mon must be locked upon entry. + * @mon must be locked upon entry, @vm shan't.
Please use either the more contemporary short form or the long form.
* * Returns: 0 on success, * -1 otherwise. */ static inline int qemuDomainDeleteDevice(qemuMonitorPtr mon, + virDomainObjPtr vm, const char *alias) { - if (qemuMonitorDelDevice(mon, alias) < 0) - return -1; + qemuDomainObjPrivatePtr priv; + int ret = 0;
- return 0; + if (qemuMonitorDelDevice(mon, alias) < 0) { + if (vm) { + /* It is safe to lock and unlock both @mon and @vm + * here because: + * a) qemuDomainObjEnterMonitor() ensures @mon is + * ref()'d + * b) The API that is calling us ensures that @vm is + * ref()'d
Don't break the two lines.
+ */ + virObjectUnlock(mon); + virObjectLock(vm); + priv = vm->privateData; + if (priv->unplug.eventSeen)
You can't do this without also clearing priv->unplug.alias. If the event was not seen at this point you need to make sure that it will unconditionally be handled via the separate event handler thread prior to giving up the lock on VM. By not clearing it, you still have another chance at the event handler thinking the unplug will be handled here without actually doing so.
+ virResetLastError();
Additionally it would be better if qemuMonitorJSONDelDevice actually returns the error string or object without reporting it so that we can decide here not to report it at all rather than resetting it.
After some more thinking I think that:
1) The new helper should encapsulate also the call to enter the monitor - that way you can avoid the super-hacky way that re-locks the monitor. - you can then intergrate setting of the alias into private data - resetting of it after exit of the monitor - checking that the event was seen This should work nicely as AFAIK all code paths using 'device_del' should only ever call this one API and do everything else.
2) Refactor the monitor APIs for device_del so that they return the error message reported by the monitor as success and return the copy of the error message. This will probably also require a different monitor error checking function
That way you can get the error message and report it if you decide so.
Alternatively, I can introduce an argument to qemuMonitorDelDevice() which would supress error reporting for this specific case (we still want to report errors from QEMU_CHECK_MONITOR(), qemuMonitorJSONMakeCommand(), ...), return say -2 so that my code knows device_del failed because the device is already missing.
If any code path requires more than one monitor call, the 1) will also ensure that the alias which we are waiting for is set only during controlled amount of time and thus we are sure that the event thread handles any DEVICE_DELETED events otherwise.
I started writing it this way. But problem is the qemuDomainDetachExtensionDevice() which is called in some rollback codes. For instance in qemuDomainAttachDiskGeneric: qemuDomainObjEnterMonitor(driver, vm); if (qemuHotplugDiskSourceAttach(priv->mon, diskdata) < 0) goto exit_monitor; if (qemuDomainAttachExtensionDevice(priv->mon, &disk->info) < 0) goto exit_monitor; if (qemuMonitorAddDevice(priv->mon, devstr) < 0) { ignore_value(qemuDomainDetachExtensionDevice(priv->mon, &disk->info)); goto exit_monitor; } if (qemuDomainObjExitMonitor(driver, vm) < 0) { But I guess I can do what I'm doing now - if some argument is NULL then I can assume caller has done EnterMonitor() and not call it again. Michal

On Wed, Mar 13, 2019 at 11:35:50 +0100, Michal Privoznik wrote:
On 3/12/19 7:57 PM, Peter Krempa wrote:
On Tue, Mar 12, 2019 at 16:41:15 +0100, Peter Krempa wrote:
On Tue, Mar 12, 2019 at 16:13:20 +0100, Michal Privoznik wrote:
[...]
Additionally it would be better if qemuMonitorJSONDelDevice actually returns the error string or object without reporting it so that we can decide here not to report it at all rather than resetting it.
After some more thinking I think that:
1) The new helper should encapsulate also the call to enter the monitor - that way you can avoid the super-hacky way that re-locks the monitor. - you can then intergrate setting of the alias into private data - resetting of it after exit of the monitor - checking that the event was seen This should work nicely as AFAIK all code paths using 'device_del' should only ever call this one API and do everything else.
2) Refactor the monitor APIs for device_del so that they return the error message reported by the monitor as success and return the copy of the error message. This will probably also require a different monitor error checking function
That way you can get the error message and report it if you decide so.
Alternatively, I can introduce an argument to qemuMonitorDelDevice() which would supress error reporting for this specific case (we still want to report errors from QEMU_CHECK_MONITOR(), qemuMonitorJSONMakeCommand(), ...), return say -2 so that my code knows device_del failed because the device is already missing.
Well, in this case you could check for the "DeviceNotFound" error class. The thing is you still want to report the original error from qemu in case when the event was not received on failure, so you'll have to pass it up anyways to do the decision where 'priv' is available after you exit the monitor. Note that even when a GenericError or any other error was received you still need to process the device deletion if it was hijacked by the current thread.
If any code path requires more than one monitor call, the 1) will also ensure that the alias which we are waiting for is set only during controlled amount of time and thus we are sure that the event thread handles any DEVICE_DELETED events otherwise.
I started writing it this way. But problem is the qemuDomainDetachExtensionDevice() which is called in some rollback
You certainly can do a separate wrapper for this. The extension device will never be handled through the local processing, thus it's alias should not be set in priv->unplug.alias, so it does not need this handling. We need to do this only for those where we do want to wait for the detach event for local processing.
codes. For instance in qemuDomainAttachDiskGeneric:
qemuDomainObjEnterMonitor(driver, vm);
if (qemuHotplugDiskSourceAttach(priv->mon, diskdata) < 0) goto exit_monitor;
if (qemuDomainAttachExtensionDevice(priv->mon, &disk->info) < 0) goto exit_monitor;
if (qemuMonitorAddDevice(priv->mon, devstr) < 0) { ignore_value(qemuDomainDetachExtensionDevice(priv->mon, &disk->info)); goto exit_monitor; }
if (qemuDomainObjExitMonitor(driver, vm) < 0) {
But I guess I can do what I'm doing now - if some argument is NULL then I can assume caller has done EnterMonitor() and not call it again.
Michal
participants (2)
-
Michal Privoznik
-
Peter Krempa