On 18.05.2013 05:31, Li Wei wrote:
> On 05/17, Michal Privoznik wrote:
>> On 17.05.2013 11:37, Li Wei wrote:
>>> On 05/17/2013 04:52 PM, Michal Privoznik wrote:
>>>> On 17.05.2013 10:16, Li Wei wrote:
>>>>> On 05/17/2013 04:07 PM, Michal Privoznik wrote:
>>>>>> On 17.05.2013 07:47, Li Wei wrote:
>>>>>>> Hi Michal, Daniel
>>>>>>>
>>>>>>> It seems there is something wrong with the 2/2 part of this
patchset.
>>>>>>> When I do an "change-media" command in virsh, it
doesn't do any better
>>>>>>> than before, but even worse(I must to wait 5 secs to see the
error).
>>>>>>>
>>>>>>> I'm not family with libvirt, just add some log things in
the qemu_hotplug.c
>>>>>>> and found the tray_status never change to open, but with
michal's original
>>>>>>> patch (which do active poll on tray_status), I can do
"change-media" successfully
>>>>>>> every time.
>>>>>>
>>>>>> Yes & no. Thing is - originally we ignored tray status in the
guest. So
>>>>>> in case guest has locked the tray and thus ignored our eject
request, we
>>>>>> have changed the media anyway, resulting in possible I/O errors
within
>>>>>> guest.
>>>>>>
>>>>>> With my patch, we send eject request to the qemu, and actively
poll for
>>>>>> the tray to open. We check for up to 10 times every 200ms. So
guest has
>>>>>> 2 seconds to open the tray. If the guest doesn't eject media
and open
>>>>>> the tray we don't proceed with forcibly changing the media.
>>>>>>
>>>>>> These patches are actually a bug fix for
>>>>>>
https://bugzilla.redhat.com/show_bug.cgi?id=892289
>>>>>
>>>>> Hmm... I think you misunderstood my meaning(sorry for my pool
english),
>>>>> simply to say, I can get expected behavior by apply patch V1 but not
V2.
>>>>>
>>>>> In V1, we do active poll on qemuMonitorGetBlockInfo(), and in V2 we
just
>>>>> wait for origdisk->tray_status to become VIR_DOMAIN_DISK_TRAY_OPEN
but
>>>>> this never happened.
>>>>>
>>>>> Thanks,
>>>>> Wei
>>>>
>>>> Aaah, now I understand. So you are saying that the event is not being
>>>> delivered. What's the qemu version?
>>>
>>>
>>> I have the following version:
>>>
>>> virsh # version
>>> Compiled against library: libvirt 1.0.5
>>> Using library: libvirt 1.0.5
>>> Using API: QEMU 1.0.5
>>> Running hypervisor: QEMU 1.4.1
>>>
>>>>
>>>> If you turn the debug logs on, do you see DEVICE_TRAY_MOVED in them?
>>>>
http://wiki.libvirt.org/page/DebugLogs
>>>>
>>>> Or maybe the timeout is just short for your guest to eject the tray. So
>>>> after a while, does 'virsh change-media' succeed?
>>>
>>> Yes, I can see the DEVICE_TRAY_MOVED in the log file, it goes after
>>> the debug message "qemuDomainChangeEjectableMedia:103 : Waiting 500ms
for tray to open. Retries left 0".
>>> does that mean the timeout is too short?
>>>
>>> If the guest needs too much time to eject, I'm wondering why when I use
>>> the v1 patch, I can get succeed in about 1 to 2 seconds. I doubt if there
>>> are other problems cause tray_status not updated in time?
>>>
>>> I appended the log file for your reference, if you need any other
information,
>>> please tell me.
>>>
>>> Thanks,
>>> Wei
>>
>> I am pasting some interesting knobs from the log:
>>
>> 2013-05-17 09:04:46.234+0000: 27004: debug : qemuMonitorSend:892 :
QEMU_MONITOR_SEND_MSG: mon=0x7f7ac4000bd0
msg={"execute":"eject","arguments":{"device":"drive-ide0-1-0","force":false},"id":"libvirt-7"}
>> fd=-1
>> 2013-05-17 09:04:46.234+0000: 27001: debug : virObjectRef:295 : OBJECT_REF:
obj=0x7f7ac4000bd0
>> 2013-05-17 09:04:46.234+0000: 27001: debug : qemuMonitorIOWrite:458 :
QEMU_MONITOR_IO_WRITE: mon=0x7f7ac4000bd0
buf={"execute":"eject","arguments":{"device":"drive-ide0-1-0","force":false},"id":"libvirt-7"}
>> len=92 ret=92 errno=2
>> 2013-05-17 09:04:46.234+0000: 27001: debug : virObjectUnref:258 : OBJECT_UNREF:
obj=0x7f7ac4000bd0
>> 2013-05-17 09:04:46.234+0000: 27001: debug : virObjectRef:295 : OBJECT_REF:
obj=0x7f7ac4000bd0
>> 2013-05-17 09:04:46.234+0000: 27001: debug : qemuMonitorIOProcess:350 :
QEMU_MONITOR_IO_PROCESS: mon=0x7f7ac4000bd0 buf={"id": "libvirt-7",
"error": {"class": "GenericError", "desc":
"Device 'drive-ide0-1-0' is locked"}}
>> len=102
>> 2013-05-17 09:04:46.234+0000: 27001: debug : virObjectUnref:258 : OBJECT_UNREF:
obj=0x7f7ac4000bd0
>> 2013-05-17 09:04:46.235+0000: 27004: error : qemuMonitorJSONCheckError:352 :
internal error unable to execute QEMU command 'eject': Device
'drive-ide0-1-0' is locked
>> 2013-05-17 09:04:46.235+0000: 27004: debug : qemuDomainChangeEjectableMedia:103 :
Waiting 500ms for tray to open. Retries left 9
>> 2013-05-17 09:04:46.735+0000: 27004: debug : qemuDomainChangeEjectableMedia:103 :
Waiting 500ms for tray to open. Retries left 8
>> 2013-05-17 09:04:47.235+0000: 27004: debug : qemuDomainChangeEjectableMedia:103 :
Waiting 500ms for tray to open. Retries left 7
>> 2013-05-17 09:04:47.735+0000: 27004: debug : qemuDomainChangeEjectableMedia:103 :
Waiting 500ms for tray to open. Retries left 6
>> 2013-05-17 09:04:48.186+0000: 27001: debug : virObjectRef:295 : OBJECT_REF:
obj=0x7f7ac4000bd0
>> 2013-05-17 09:04:48.235+0000: 27004: debug : qemuDomainChangeEjectableMedia:103 :
Waiting 500ms for tray to open. Retries left 5
>> 2013-05-17 09:04:48.735+0000: 27004: debug : qemuDomainChangeEjectableMedia:103 :
Waiting 500ms for tray to open. Retries left 4
>> 2013-05-17 09:04:49.235+0000: 27004: debug : qemuDomainChangeEjectableMedia:103 :
Waiting 500ms for tray to open. Retries left 3
>> 2013-05-17 09:04:49.736+0000: 27004: debug : qemuDomainChangeEjectableMedia:103 :
Waiting 500ms for tray to open. Retries left 2
>> 2013-05-17 09:04:50.236+0000: 27004: debug : qemuDomainChangeEjectableMedia:103 :
Waiting 500ms for tray to open. Retries left 1
>> 2013-05-17 09:04:50.736+0000: 27004: debug : qemuDomainChangeEjectableMedia:103 :
Waiting 500ms for tray to open. Retries left 0
>> 2013-05-17 09:04:51.236+0000: 27004: error : qemuDomainChangeEjectableMedia:117 :
operation failed: Unable to eject media before changing it
>> 2013-05-17 09:04:51.236+0000: 27004: debug : virObjectUnref:258 : OBJECT_UNREF:
obj=0x7f7ac4000bd0
>> 2013-05-17 09:04:51.236+0000: 27004: debug : virDomainLockDiskDetach:282 :
plugin=0x7f7acc007c00 dom=0x7f7acc0a4160 disk=0x7f7ac4003690
>> 2013-05-17 09:04:51.236+0000: 27004: debug : virDomainLockManagerNew:128 :
plugin=0x7f7acc007c00 dom=0x7f7acc0a4160 withResources=0
>> 2013-05-17 09:04:51.236+0000: 27004: debug : virLockManagerPluginGetDriver:295 :
plugin=0x7f7acc007c00
>> 2013-05-17 09:04:51.236+0000: 27004: debug : virLockManagerNew:319 :
driver=0x7f7ae53dc5e0 type=0 nparams=5 params=0x7f7ad7d25830 flags=0
>> 2013-05-17 09:04:51.236+0000: 27001: debug : qemuMonitorIOProcess:350 :
QEMU_MONITOR_IO_PROCESS: mon=0x7f7ac4000bd0 buf={"timestamp":
{"seconds": 1368781488, "microseconds": 186399}, "event":
"DEVICE_TRAY_MOVED", "data": {"device":
"drive-ide0-1-0", "tray-open": true}}
>> len=151
>>
>>
>> What we can see here is:
>> 1) libvirt asks qemu to eject
>> 2) qemu responds with "Device is locked" - which is correct as it
emulates guest request
>> 3) However, after a while, the guest decides it's now safe to eject the media
(e.g. all reads have finished, and the device has been unmounted) which is done
>> 4) [not in the pasted log] After a while, you've decided to try again and
this time it succeeded, as the tray is already opened.
>>
>> What we may do here is lifting the number of retries up. You can try yourself:
>
> I have indeed increased retries to 20 and unfortunately that didn't work either,
> we can see in the log file that the DEVICE_TRAY_MOVED event received immediately
> after the final retry. After we doubled the retries, that would be more that enough
> time for the guest to eject the tray. So, maybe there is something wrong otherwhere.
>
> Thanks,
> Wei
I think I know where the bug is. When waiting on tray to open, we don't unlock the
domain. So later, when an event arrives, the first thing that the event handler does is
locking the domain. Hence, it seems like the event is delivered right after we've
failed waiting. That's just because after we've finished waiting, we've
unlocked the domain. And I think same applies for the domain monitor. So can you please
try this out:
This patch works fine, thanks :)
Tested-by: Li Wei <lw(a)cn.fujitsu.com>
Thanks,
Wei
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index d037c9d..767d5a1 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -95,14 +95,20 @@ int qemuDomainChangeEjectableMedia(virQEMUDriverPtr driver,
qemuDomainObjEnterMonitor(driver, vm);
ret = qemuMonitorEjectMedia(priv->mon, driveAlias, force);
+ virObjectRef(vm);
/* we don't want to report errors from media tray_open polling */
while (retries--) {
if (origdisk->tray_status == VIR_DOMAIN_DISK_TRAY_OPEN)
break;
+ qemuDomainObjExitMonitor(driver, vm);
+ virObjectUnlock(vm);
VIR_DEBUG("Waiting 500ms for tray to open. Retries left %d",
retries);
usleep(500 * 1000); /* sleep 500ms */
+ virObjectLock(vm);
+ qemuDomainObjEnterMonitor(driver, vm);
}
+ virObjectUnref(vm);
if (disk->src) {
/* deliberately don't depend on 'ret' as 'eject' may have
failed for the
Michal