Re: [libvirt] [PATCH] processSerialChangedEvent: Close agent monitor early

Wednesday, 6 May 2015

On 05/06/2015 02:46 AM, Michal Privoznik wrote:
...
 https://bugzilla.redhat.com/show_bug.cgi?id=890648

 So, imagine you've issued an API that involves guest agent. For
 instance, you want to query guest's IP addresses. So the API acquires
 QUERY_JOB, locks the guest agent and issues the agemt command. 
s/agemt/agent/

...
 However, for some reason, guest agent replies to initial ping
 correctly, but then crashes tragically while executing real command
 (in this case guest-network-get-interfaces). Since initial ping went
 well, libvirt thinks guest agent is accessible and awaits reply to the
 real command. But it will never come. What will is a monitor event. 
The monitor event is telling you that your side of the socket is still
valid, but the guest's side of the agenet is no longer connected.

...
 Our handler (processSerialChangedEvent) will try to acquire
 MODIFY_JOB, which will fail obviously because the other thread that's
 executing the API already holds a job. So the event handler exits
 early, and the QUERY_JOB is never released nor ended.

 I don't understand why, but qemu should in fact instead of sending a
 event on the monitor close the socket. 
Reads awkwardly; I think you were trying to say:

qemu should in fact be closing the socket instead of sending an event

But that doesn't make sense technically.  The fact that the agent is no
longer open in the guest is no reason to close the monitor socket, and
closing the agent socket is inappropriate in case the guest restarts the
guest agent and reopens the port.  Thus, listening to events on the
monitor socket about whether the agent socket is connected or dangling
is sufficient for us to know if the agent will be responsive or not.

...
 But leave that for a different
 discussion. Since qemu is not doing that, we must work around it and
 close and unregister the agent's FD from the even loop ourselves. This
 will wake up all callers stuck in waiting for agent's reply. 
Why do we have to kill the agent's fd?  If the guest restarts the agent,
how will we reconnect to it if we have killed our agent fd?

...
 Hopefully, it will make them finish the job too and even handler can
 acquire its job later. 
I agree that we want to detect agent death events as a reason to mark
the agent unresponsive, in such a way that any other job pending on the
agent then gracefully aborts, but I'm not sure that this is the right
solution.

...

 Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com&gt;
 ---
  src/qemu/qemu_driver.c | 9 ++++++++-
  1 file changed, 8 insertions(+), 1 deletion(-)

 diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
 index d173aa1..becb6ae 100644
 --- a/src/qemu/qemu_driver.c
 +++ b/src/qemu/qemu_driver.c
 @@ -4480,6 +4480,14 @@ processSerialChangedEvent(virQEMUDriverPtr driver,
      VIR_DEBUG("Changing serial port state %s in domain %p %s",
                devAlias, vm, vm->def->name);

 +    if (newstate == VIR_DOMAIN_CHR_DEVICE_STATE_DISCONNECTED &&
 +        virDomainObjIsActive(vm) && priv->agent) {
 +        /* Close agent monitor early, so that other threads
 +         * waiting for the agent to reply can finish and our
 +         * job we acquire below can succeed. */
 +        qemuAgentClose(priv->agent);
 +    }
 +
      if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0)
          goto cleanup;

 @@ -4508,7 +4516,6 @@ processSerialChangedEvent(virQEMUDriverPtr driver,
              }
          } else {
              if (priv->agent) {
 -                qemuAgentClose(priv->agent);
                  priv->agent = NULL;
                  priv->agentError = false;
              }

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [libvirt] [PATCH] processSerialChangedEvent: Close agent monitor early