On 01/26/2012 02:13 PM, Luiz Capitulino wrote:
On Thu, 26 Jan 2012 20:41:13 +0100
Michal Privoznik<mprivozn(a)redhat.com> wrote:
> On 26.01.2012 20:35, Luiz Capitulino wrote:
>> On Thu, 26 Jan 2012 08:18:03 -0700
>> Eric Blake<eblake(a)redhat.com> wrote:
>>
>>> [adding qemu-devel]
>>>
>>> On 01/26/2012 07:46 AM, Daniel P. Berrange wrote:
>>>>> One thing that you'll probably notice is this
>>>>> 'set-support-level' command. Basically, it tells GA which qemu
>>>>> version it is running on. Ideally, this should be done as soon as
>>>>> GA starts up. However, that cannot be determined from the outside
>>>>> world, as GA doesn't emit any events yet.
>>>>> Ideally^2, this command should be left out, as it should be qemu
>>>>> itself that tells its own agent this kind of information.
>>>>> Anyway, I was going to call this command in qemuProcess{Startup,
>>>>> Reconnect,Attach}, but it won't work. We need to un-pause guest
>>>>> CPUs so the guest can boot and start GA, but that implies
>>>>> returning from qemuProcess*.
>>>>>
>>>>> So I am setting this just before the 'guest-suspend' command, as
>>>>> there is one more thing about GA: it is unable to remember anything
>>>>> across a restart (of the GA process). This has, by the way, exposed
>>>>> a flaw in our current code with FS freeze & thaw. If we freeze guest
>>>>> FS, and somebody restarts GA, a simple FS thaw will not succeed, as
>>>>> GA thinks the FS are not frozen. But that's a different cup of tea.
>>>>>
>>>>> Because of what is written above, we need to call set-level
>>>>> on every suspend.
>>>>
>>>>
>>>> IMHO all this says that the 'set-level' command is a conceptually,
>>>> unfixably broken design & should be killed in QEMU before it turns
>>>> into an even bigger mess.
>>
>> Can you elaborate on this? Michal and I talked on irc about making the
>> compatibility level persistent, would that help?
>>
>>>> Once we're in a situation where we need to call 'set-level' prior
>>>> to every single invocation, you might as well just allow the QEMU
>>>> version number to be passed in directly as an arg to the command
>>>> you are running, thus avoiding this horrificness.
>>>
>>> Qemu folks, would you care to chime in on this?
>>>
>>> Exactly how is the set-level command supposed to work? As I understand
>>> it, the goal is that if the guest has qemu-ga 1.1 installed, but is
>>> being run by qemu 1.0, then we want to ensure that any guest agent
>>> command supported by qemu-ga 1.1 but requiring features of qemu not
>>> present in qemu 1.0 will be properly rejected.
>>
>> Not exactly. The default support level of qemu-ga is qemu 1.0. This means
>> that by default qemu-ga will only support qemu 1.0 features, even when
>> running on qemu 2.0. The set-support-level command allows you to declare
>> that qemu 2.0 features are supported.
>>
>> Note that this is only about specific features that depend on host support,
>> like S3 suspend which is known to be buggy in current and old qemu.
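The version-gating semantics described above could be sketched roughly as follows. This is only an illustration of the idea, not the actual qemu-ga implementation; the function names and the 1.1 threshold for suspend are assumptions for the sake of the example:

```python
# Illustrative sketch of the proposed support-level gating in the agent.
# Names and version thresholds are hypothetical, not qemu-ga's real API.

support_level = (1, 0)  # default: agent assumes qemu 1.0 features only

def set_support_level(major, minor):
    """Host side (e.g. libvirt) declares which qemu version the agent
    is running on, unlocking newer host-dependent features."""
    global support_level
    support_level = (major, minor)

def guest_suspend_ram():
    """S3 suspend is cited in the thread as buggy on current/old qemu,
    so the agent would refuse it below a declared level of 1.1."""
    if support_level < (1, 1):
        raise RuntimeError("guest-suspend-ram needs support level >= 1.1")
    return "suspended"
```

The point of the default being (1, 0) is that a guest agent newer than its host qemu fails safe: host-dependent commands stay rejected until something explicitly raises the level.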
>>
>>> But whose job is it to tell the guest agent what version of qemu is
>>> running? Based on the above conversation, it looks like the current
>>> qemu implementation does not do any handshaking on its own when the
>>> guest agent first comes alive, which means that you are forcing the work
>>> on the management app (libvirt). And this is inherently racy - if the
>>> guest is allowed to restart its qemu-ga process at will, and each
>>> restart of that guest process triggers a need to redo the handshake,
>>> then libvirt can never reliably know what version the agent is running at.
>>
>> Making the set-support-level persistent would solve it, wouldn't it?
>
> Yes and no. We still need an event for when GA comes to life. If
> anybody tries to write something to a GA which is not running (and for
> the purpose of this scenario, assume it never will be), like
> 'set-support-level', and waits for an answer (which will never come),
> they will be blocked indefinitely. However, if they write it after the
> first event comes, everything is OK.
What if the event never reaches libvirt?
This problem is a lot more general and is not limited to the
set-support-level command. Maybe adding shutdown & start events can serve as
good hints, but they won't fix the problem.
Yah, startup events are a good indicator for issuing the guest-sync
sequence (we had them at one point, and planned to re-add them for QMP
integration; since libvirt is taking on this role for now, it might
make sense to re-add them now), but once that sequence is issued the agent
can still be manually stopped, or the guest-sync sequence itself can
time out.
And there's no way to reliably send a stop indicator. Maybe we could capture
shutdown events, but not consistently enough that we can base the
protocol on it (the agent might get pkill -9'd, for instance, and
virtio-serial doesn't currently plumb a guest-side disconnect through to the
chardev front-end, so you'd never know).
So, the only indication you'll ever get that your "session" ended is
either a timeout or, if we add it, a startup event. In either case the
response is to issue the reset sequence.
The way it would need to work with resets is: every time a command times
out, you:
1) Report the timeout error to the libvirt client/management app. Set
guest_agent_available = 0, such that further libvirt calls that depend
on it return "not currently available", or something to that effect.
2) Issue guest-sync with a new unique session id.
3) Read a JSON object/response:
   - if you time out, goto 2
   - if your response doesn't have the session id you're expecting,
     repeat 3) (it may be a response to a previous guest-sync RPC that
     you timed out on prematurely, but you can't just wait indefinitely,
     since it may never arrive)
4) Set guest_agent_available = 1 and proceed with normal operation until
the next timeout (or startup event, if we add one).
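The four-step reset sequence could be sketched roughly as below. This is a toy sketch of the caller's side, not libvirt's actual implementation: the transport is abstracted as send/recv callables, and the monotonically increasing session id is just one way to get a unique id per attempt:

```python
import itertools
import json

_session_ids = itertools.count(1)  # one choice of unique-id scheme

def resync(send, recv_line, max_attempts=3):
    """Sketch of the reset sequence: issue guest-sync with a fresh
    session id and discard stale responses until the matching id
    comes back. recv_line returns the next line from the agent,
    or None on timeout. The caller is expected to have already
    marked the agent unavailable (step 1) before calling this."""
    for _ in range(max_attempts):
        session_id = next(_session_ids)  # step 2: new unique id
        send(json.dumps({"execute": "guest-sync",
                         "arguments": {"id": session_id}}) + "\n")
        while True:  # step 3: read responses
            line = recv_line()
            if line is None:
                break  # timed out: goto step 2 with a new id
            resp = json.loads(line)
            if resp.get("return") == session_id:
                return True  # step 4: caller marks agent available
            # otherwise it's a stale response to an earlier
            # guest-sync we gave up on; keep reading
    return False  # still unreachable; agent stays unavailable
```

The key detail is the stale-response check: a reply carrying an old id is silently consumed rather than treated as the answer, which is exactly why each guest-sync needs a fresh id.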
IMHO, the best way to solve this is to issue the guest-sync command with
a timeout. If you get no answer, then try again later.