[libvirt] any better way to treat device-detach timeout?

Hi all,

I find that we remove devices only after the DEVICE_DELETED event (patch 3fbf78bd), and that a timeout is treated as success. The relevant code is shown below:

    rc = qemuDomainWaitForDeviceRemoval(vm);
    if (rc == 0 || rc == 1)
        ret = qemuDomainRemoveDiskDevice(driver, vm, detach);
    else
        ret = 0;

Here comes the problem:

1. We pass flags=3 (VIR_DOMAIN_AFFECT_LIVE|VIR_DOMAIN_AFFECT_CONFIG) to virDomainDetachDeviceFlags.
2. Waiting for the DEVICE_DELETED event times out, so the virDomainDetachDeviceFlags API returns 0 (success). Please be aware that vm->newdef (persistent) has the device removed while vm->def (transient) keeps it untouched.
3. Then we try to attach the same device with flags=3. Because newDef and def differ, the API virDomainDetachDeviceFlags() returns an error. All subsequent attach/detach jobs fail from then on.

So, shall we make it return failure when qemuDomainWaitForDeviceRemoval() fails? Or just remove the device from def after we get the DEVICE_DELETED event?

Any ideas? Thanks!

On Fri, Mar 11, 2016 at 07:40:54 +0000, Zhangbo (Oscar) wrote:
> So, shall we make it return FAIL when it fails in qemuDomainWaitForDeviceRemoval()?

No.

> Or just remove the device in def after we get the DEVICE_DELETED event?

Yes, and we already do that. The API is explicitly documented as a request to detach a device. It returns 0 when this request was successfully passed to the guest OS. Some devices (e.g., USB devices) are detached immediately, but others may require guest cooperation and may take a long time to detach, or the guest may even completely ignore the request.

Thus, once virDomainDetachDeviceFlags returns 0, you need to either periodically check the active XML to see if the device was removed, or register a callback for the VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED event, which is emitted when libvirt really removes the device from the domain (after getting DEVICE_DELETED from QEMU).

Jirka
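The polling option mentioned in the reply above can be sketched in plain C. This is a minimal illustration and not libvirt code: `fetch_xml_fn`, `wait_for_removal`, `struct fake_domain` and `fake_fetch` are hypothetical names introduced here, and the substring match is only a stand-in for real XML parsing. In a real client the fetch callback would presumably wrap `virDomainGetXMLDesc(dom, 0)` (whose result the caller must free) and sleep between polls.

```c
#include <stddef.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical callback type: returns the domain's current live XML,
 * or NULL if it could not be fetched. */
typedef const char *(*fetch_xml_fn)(void *opaque);

/* Poll the live XML up to max_polls times.
 * Returns true once the device marker no longer appears in the XML,
 * false if it is still present after the last poll. */
bool wait_for_removal(fetch_xml_fn fetch, void *opaque,
                      const char *device_marker, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        const char *xml = fetch(opaque);
        /* A failed fetch tells us nothing, so only a successful fetch
         * without the marker counts as "removed". */
        if (xml != NULL && strstr(xml, device_marker) == NULL)
            return true;
        /* A real client would sleep here before polling again. */
    }
    return false;
}

/* Stand-in domain for demonstration: the device stays visible for a
 * fixed number of fetches and then disappears. */
struct fake_domain {
    int polls_until_removed;
};

const char *fake_fetch(void *opaque)
{
    struct fake_domain *d = (struct fake_domain *)opaque;
    if (d->polls_until_removed > 0) {
        d->polls_until_removed--;
        return "<devices><disk><target dev='vdb'/></disk></devices>";
    }
    return "<devices></devices>";
}
```

A caller would treat a `false` result as "the guest has not released the device yet" rather than as a failed detach, which matches the request semantics described in the reply.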

I got it! Thanks!

After rethinking, I guess there's still a problem. Suppose that:

1. The device gets detached in QEMU after 10 seconds.
2. virDomainDetachDeviceFlags() has already returned success at the 5th second, and libvirt won't wait for the DEVICE_DELETED event after that.
3. The 'def' still has the device untouched.
4. Thus, 10 seconds later, the device is detached successfully in QEMU, but libvirt still has the device in def (transient XML).

This means that libvirt's device list is then incorrect, and we will fail to attach/detach that device from then on.

Oscar.
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Fri, Mar 11, 2016 at 09:26:23 +0000, Zhangbo (Oscar) wrote:
> After rethinking, I guess there's still a problem. Suppose that:
> 1. The device gets detached in QEMU after 10 seconds.
> 2. virDomainDetachDeviceFlags() has already returned success at the 5th second, and libvirt won't wait for the DEVICE_DELETED event after that.
> 3. The 'def' still has the device untouched.
> 4. Thus, 10 seconds later, the device is detached successfully in QEMU, but libvirt still has the device in def (transient XML).

Do you have any evidence (logs) of this happening? Such behavior would be a bug, since libvirt always listens for the DEVICE_DELETED event and acts on it even if virDomainDetachDeviceFlags has already returned.

Jirka
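To illustrate the point about late DEVICE_DELETED events, here is a toy model in C. It is not libvirt source: the struct, the enum names and both functions are hypothetical, and the meanings given to the return codes 0, 1 and 2 are assumptions (the thread only shows that 0 and 1 trigger immediate removal and that anything else is reported as success).

```c
#include <stdbool.h>

/* Toy model of one device tracked in both definitions of a domain. */
struct toy_domain {
    bool in_live_def;        /* models vm->def */
    bool in_persistent_def;  /* models vm->newDef */
    bool removal_pending;    /* unplug requested, event not seen yet */
};

/* Assumed meanings of the wait result; only the 0/1 versus "other"
 * split comes from the code quoted in the thread. */
enum wait_result {
    WAIT_UNSUPPORTED = 0,
    WAIT_REMOVED = 1,
    WAIT_TIMEOUT = 2
};

/* Models a detach with LIVE|CONFIG flags.
 * Returns 0 to signal that the request was accepted. */
int toy_detach(struct toy_domain *d, enum wait_result rc)
{
    d->in_persistent_def = false;   /* persistent config is updated directly */
    if (rc == WAIT_UNSUPPORTED || rc == WAIT_REMOVED)
        d->in_live_def = false;     /* removal can be finished right away */
    else
        d->removal_pending = true;  /* timed out: live def stays as it is */
    return 0;
}

/* Models the DEVICE_DELETED handler, which runs whenever the event
 * arrives, even long after toy_detach() has returned. */
void toy_on_device_deleted(struct toy_domain *d)
{
    d->in_live_def = false;
    d->removal_pending = false;
}
```

In this model the two definitions disagree only in the window between the timed-out detach and the arrival of the event; once the handler runs, they agree again, which is the behavior described in the reply.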

Wrong test, sorry for bothering! :)
participants (2)
- Jiri Denemark
- Zhangbo (Oscar)