On 04/01/2016 02:44 AM, Zhangbo (Oscar) wrote:
>> Hi all:
>> Suppose we have a guest domain which is pvops, for example, rhel6.4.
>>
>> Steps to reproduce the problem:
>> 1 start the guest with virDomainCreate()
>> 2 the API returns before the guest domain is fully available, which means
>> the disks, network interfaces and some important services are not yet
>> available inside the guest.
>> 3 we call virDomainShutdown() to shut down the guest.
>>
>> Expected result:
>> The guest shuts down.
>>
>> Actual result:
>> Because the guest is not available when we call virDomainShutdown(),
>> it cannot respond to our 'shutdown' xenstore request, so the guest
>> finishes booting later rather than shutting down.
>
> I don't think this is unique to a pvops guest kernel, or even a xen stack. I see
> the same behavior with qemu. 'virsh create dom.xml && virsh shutdown dom'
> results in the guest kernel missing the shutdown event and booting anyhow. I
> guess SeaBIOS could still be loading when the shutdown event is issued :-). The
> virDomainShutdownFlags documentation even states "that the guest OS may ignore
> the request". In my example, the guest OS isn't even alive yet.
>
>> So, the question is:
>> In libxl_driver (xen-hypervisor environment), how can we tell whether the
>> guest is available, and whether it is suitable to shut down the guest at that
>> moment?
>
> libxl has no API to determine if a guest OS has booted. In a qemu/kvm stack, I
> suppose qemu-ga is the preferred way to know when a guest OS has booted, or is
> far enough along to respond to shutdown events.
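
To illustrate the qemu-ga idea, here is a minimal sketch that polls the guest
agent with its guest-ping command until it answers. The function name and the
send_agent_command callable are hypothetical; the caller is assumed to supply
something that forwards a JSON string to the agent and returns the JSON reply,
raising an exception while the agent is unreachable.

```python
import json
import time


def wait_for_guest_agent(send_agent_command, timeout=60.0, interval=1.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll the guest agent with guest-ping until it answers or time runs out.

    send_agent_command is a caller-supplied (hypothetical) callable that takes
    a JSON command string and returns the agent's JSON reply as a string.
    """
    ping = json.dumps({"execute": "guest-ping"})
    deadline = clock() + timeout
    while True:
        try:
            reply = json.loads(send_agent_command(ping))
            if "return" in reply:
                return True  # agent answered, so the guest OS is running
        except Exception:
            pass  # agent not reachable yet (or reply not parseable); retry
        if clock() >= deadline:
            return False
        sleep(interval)
```

With the libvirt Python bindings, the callable could plausibly wrap
libvirt_qemu.qemuAgentCommand() for the domain in question. A True result only
shows that the agent is running, which is assumed here to be late enough in
boot for the guest to handle a shutdown request.
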
>
> One possible approach in xen, which is not supported by libvirt, would be to
> monitor the state of a device frontend in xenstore. E.g. when
> /local/domain/<domid>/device/vif/<vifid>/state reaches 4 (connected), you'll at
> least know the driver in the guest is up and running.
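
A rough sketch of that xenstore polling, assuming it runs in dom0 with the
xenstore-read command-line tool available. The helper names are made up for
illustration, and a state of 4 only says the frontend driver is connected, not
that the guest OS can already act on a shutdown request.

```python
import subprocess
import time

XENBUS_STATE_CONNECTED = 4  # the "connected" state mentioned above


def frontend_state_path(domid, vifid):
    """Build the xenstore path of a vif frontend's state node."""
    return f"/local/domain/{domid}/device/vif/{vifid}/state"


def read_xenstore(path):
    """Read one key via the xenstore-read CLI; None if missing or unavailable."""
    try:
        result = subprocess.run(["xenstore-read", path],
                                capture_output=True, text=True)
    except OSError:
        return None  # xenstore-read is not installed on this host
    if result.returncode != 0:
        return None  # key does not exist (yet)
    return result.stdout.strip()


def wait_for_frontend_connected(domid, vifid, read=read_xenstore,
                                timeout=60.0, interval=0.5,
                                sleep=time.sleep, clock=time.monotonic):
    """Poll the frontend state until it reads 4 (connected) or time runs out."""
    path = frontend_state_path(domid, vifid)
    deadline = clock() + timeout
    while True:
        value = read(path)
        if value is not None and value.strip() == str(XENBUS_STATE_CONNECTED):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Polling keeps the sketch short; a xenstore watch on the same path would avoid
the busy loop, but that is left out here.
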

I've tried that, but even the device state is not reliable: after the device
state changes to 4, the guest still calls "add_disk", which takes a while to
complete, and only then can it respond to the 'shutdown' xenstore request.

Yeah, I thought that was a longshot. Synchronization of the frontend and backend
drivers doesn't necessarily mean the OS is in a position to respond to the
shutdown event. Lacking a guest agent, another option would be to wait for the
guest's network stack to come alive, e.g. until it responds to pings or
connection requests.
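
As a sketch of the "connection requests" variant, the following polls a TCP
port on the guest until a connection succeeds. The function name, and the
assumption that the guest runs some TCP service (sshd on port 22, say) at a
known address, are mine and not something libvirt provides.

```python
import socket
import time


def wait_for_tcp_port(host, port, timeout=120.0, interval=2.0,
                      connect_timeout=1.0):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the guest's network stack and the
            # listening service are both up.
            with socket.create_connection((host, port),
                                          timeout=connect_timeout):
                return True
        except OSError:
            pass  # refused, unreachable, or timed out; guest not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller could invoke virDomainShutdown() only after this returns True. That
narrows the window but is still a heuristic, since a reachable port says
nothing definite about the guest's handling of shutdown requests.
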
Regards,
Jim