On Thu, 19 Jan 2012 10:15:55 -0700
Eric Blake <eblake(a)redhat.com> wrote:
On 01/19/2012 08:56 AM, Luiz Capitulino wrote:
> Long ago, commit 625a5be added the guest provided memory statistics to
> the query-balloon command. Unfortunately, it also introduced a severe
> bug: query-balloon would hang if the guest didn't respond. This, in turn,
> would also cause a hang in libvirt.
>
> Because of that, we decided to disable the guest memory stats feature
> (commit 11724ff).
>
> As we decided to let commands implement ad-hoc async mechanisms until we
> get a proper way to do it, I decided to try to re-enable that feature.
>
> My idea is to have a command and an event. The command gets the process
> started by sending a request to guest and returns. Later, when the guest
> makes the memory stats info available, it's sent to the client by means
> of a QMP event (please, take a look at patch 05/05 for full details).
>
> I'm not sure if that approach is good for libvirt though, so it would be
> very helpful to get their input (Eric, I'm CC'ing you here, but feel free
> to route this to someone else).
[I went ahead and cc'd the libvirt list]
Yes, libvirt can live with this approach. And having this in parallel
to a qemu-ga verb is nice, since, as it was pointed out, this would
allow interaction with guests that have a balloon device but not a guest
agent.
You may want to read this thread [1], for thoughts on the impact of
extending another existing blocking command into one that starts an
async operation and completes when an event is raised; libvirt can
expose both a blocking and an asynchronous implementation to the user on
top of the qemu model being just asynchronous.
[1]
https://www.redhat.com/archives/libvir-list/2012-January/msg00562.html
Thinking aloud - do we need a means to poll the state of the
balloon-stat query?
We could have a query-balloon-memory-stats command that returns the last
available stats (or none, if balloon-get-memory-stats wasn't issued). I
think it would also be better to move the stats info from the event to
the query command; this way the event would just signal that the stats
info is available.
I find that approach a bit more complicated though.
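To make the alternative concrete, here is a toy model of it, written as a sketch rather than as QEMU code. The class, its method names and the event name BALLOON_MEMORY_STATS_AVAILABLE are all assumptions made up for illustration. It only shows the split being discussed: the event carries no payload and the query command returns the last stored stats.

```python
class BalloonStatsModel:
    """Toy model of the 'query command plus signal-only event' idea.

    All names here are hypothetical; nothing below is QEMU's actual API.
    """

    def __init__(self, emit_event):
        self._emit_event = emit_event  # callable that delivers an event dict
        self._last_stats = None        # nothing known until the guest replies

    def start(self):
        # Models the start command: ask the guest and return at once.
        return {}

    def guest_replied(self, stats):
        # Store the stats, then signal availability without any payload.
        self._last_stats = dict(stats)
        self._emit_event({"event": "BALLOON_MEMORY_STATS_AVAILABLE"})

    def query(self):
        # Models query-balloon-memory-stats: last stats, or None if none yet.
        return self._last_stats
```

With this split a client that missed the event can still call query() later, at the cost of one more command to implement and document.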
On the one hand, if libvirtd issues the start
command, then gets stopped, then the event occurs, then libvirtd is
restarted, then libvirt won't know that the event was missed. On the
other hand, since this involves guest interaction, libvirt already has
to assume that the guest may be malicious and refuse to report stats
and/or report invalid stats, so libvirt would already have to be
prepared to give up if no event has arrived in a fixed amount of time,
and that also means that restarting libvirtd can just ignore any balloon
query that was in flight before the restart.
Yes, there's no guarantee the event will ever be sent. If it doesn't
arrive after a fixed amount of time, the best thing to do is to issue
the start command again.
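A minimal client-side sketch of that timeout-and-retry rule follows. The event name BALLOON_MEMORY_STATS, the helper request_stats and its parameters are assumptions for illustration; send_command stands in for whatever sends a QMP command, and events is a queue filled by whatever reads the monitor.

```python
import queue
import time

# Hypothetical names: the thread does not settle the final spellings.
START_COMMAND = "balloon-get-memory-stats"
STATS_EVENT = "BALLOON_MEMORY_STATS"


def request_stats(send_command, events, timeout=1.0, max_attempts=3):
    """Issue the start command, wait for the stats event, retry on timeout.

    send_command: callable taking a command name and returning at once.
    events: queue.Queue of event dicts delivered by the monitor reader.
    Returns the event's data dict, or None if the guest never answered.
    """
    for _ in range(max_attempts):
        send_command(START_COMMAND)
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # give up on this attempt and reissue the command
            try:
                event = events.get(timeout=remaining)
            except queue.Empty:
                break
            if event.get("event") == STATS_EVENT:
                return event.get("data", {})
            # Any other event is unrelated to this request; keep waiting.
    return None
```

Returning None after a bounded number of attempts keeps a silent or hostile guest from stalling the caller, which is the hang the original query-balloon had.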
So I guess I'm okay with just a start and an event, with no poll of the
last-known guest response. But it does mean that qemu has to gracefully
handle if libvirt makes two start requests in a row without any
intervening events, and conversely that libvirt has to be prepared for
an event that happens even when libvirt doesn't remember triggering the
start command.
There could be intervening events. Anything can happen between the
start command and the event (I/O error, VM stop, etc.). Libvirt has to be
prepared for that.
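The client-side bookkeeping this implies is small. The sketch below is only an illustration: StatsTracker and the event name BALLOON_MEMORY_STATS are hypothetical. It accepts a stats event whether or not a start is remembered, treats a repeated start as harmless, and leaves unrelated events to other handlers.

```python
# Hypothetical names: the thread does not settle the final spellings.
START_COMMAND = "balloon-get-memory-stats"
STATS_EVENT = "BALLOON_MEMORY_STATS"


class StatsTracker:
    """Tracks one outstanding stats request on the client side."""

    def __init__(self):
        self.pending = False    # True after a start, until stats arrive
        self.last_stats = None  # most recent stats seen, if any

    def start(self, send_command):
        # Two starts in a row are fine: the request just stays pending.
        send_command(START_COMMAND)
        self.pending = True

    def on_event(self, event):
        """Return True if the event was a stats event and was consumed."""
        if event.get("event") != STATS_EVENT:
            return False  # I/O error, VM stop, etc. are handled elsewhere
        # Accept the stats even if no start is remembered, for example
        # after the client was restarted while a request was in flight.
        self.last_stats = event.get("data", {})
        self.pending = False
        return True
```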
> Another interesting point is that there's another way of doing this:
> using qemu-ga instead. That is, qemu-ga could read that information
> from proc and return it. This is easier & simpler, as it doesn't involve
> guest communication. We also could return a lot more information if needed.
> The only disadvantage I can see is the dependency on qemu-ga...
Most likely, we would want to teach libvirt to use both methods, and
give the choice to the user on which approach to use when the guest
supports both.