[libvirt] qemu: Add timeout for monitor to avoid virsh getting stuck when the monitor dies.

Hello Guys,

When the qemu process hangs, virsh gets stuck on the hung guest and can't move forward. It can be reproduced with the following steps:

1. set up a virt guest with qemu-kvm, and start it
2. stop the qemu process with: kill -STOP `ps aux | grep qemu | grep -v grep | awk '{print $2}'`
3. run the following command: virsh list

I think we can add a timeout for the qemu monitor to resolve this problem: use virCondWaitUntil instead of virCondWait in qemuMonitorSend. What are your opinions?

Thanks!

On Wed, Apr 06, 2011 at 12:13:10 +0800, Mark Wu wrote:
Hello Guys,
When the qemu process hangs, virsh gets stuck on the hung guest and can't move forward. It can be reproduced with the following steps:
1. set up a virt guest with qemu-kvm, and start it
2. stop the qemu process with: kill -STOP `ps aux | grep qemu | grep -v grep | awk '{print $2}'`
3. run the following command: virsh list
I think we can add a timeout for the qemu monitor to resolve this problem: use virCondWaitUntil instead of virCondWait in qemuMonitorSend. What are your opinions?
This is not the right approach. Introducing a timeout into every monitor command sent to qemu is a bad thing.

I think the right approach is to have a simple API which would just return a domain's state without talking to its monitor or doing other complicated stuff. The new API could then be used by virsh list to list all domains even when they are blocked on the qemu monitor. After all, virsh is not really interested in the current memory consumption of domains when listing them.

The new API could also provide the reason which led to the current state, so that one can see the reason even without watching for libvirt events.

Jirka

On 2011-04-06 16:51, Jiri Denemark wrote:
On Wed, Apr 06, 2011 at 12:13:10 +0800, Mark Wu wrote:
Hello Guys,
When the qemu process hangs, virsh gets stuck on the hung guest and can't move forward. It can be reproduced with the following steps:
1. set up a virt guest with qemu-kvm, and start it
2. stop the qemu process with: kill -STOP `ps aux | grep qemu | grep -v grep | awk '{print $2}'`
3. run the following command: virsh list
I think we can add a timeout for the qemu monitor to resolve this problem: use virCondWaitUntil instead of virCondWait in qemuMonitorSend. What are your opinions?
This is not the right approach. Introducing a timeout into every monitor command sent to qemu is a bad thing. I think the right approach is to have a simple API which would just return a domain's state without talking to its monitor or doing other complicated stuff. The new API could then be used by virsh list to list all domains even when they are blocked on the qemu monitor. After all, virsh is not really interested in the current memory consumption of domains when listing them.
IMHO this won't work: if you restart libvirtd after the monitor is dead, you can't even get a connection to libvirtd anymore, see:
http://www.redhat.com/archives/libvir-list/2011-March/msg01422.html
How to reproduce:
1. set up a virt guest with qemu-kvm, and start it
2. stop the qemu process with: # kill -STOP `ps aux | grep qemu | grep -v grep | awk '{print $2}'`
3. run the following commands:
   # service libvirtd restart
   # virsh (hangs here)
The new API could also provide the reason which led to the current state, so that one can see the reason even without watching for libvirt events.
Jirka
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Thu, Apr 07, 2011 at 18:04:19 +0800, Osier Yang wrote:
On 2011-04-06 16:51, Jiri Denemark wrote:
This is not the right approach. Introducing a timeout into every monitor command sent to qemu is a bad thing. I think the right approach is to have a simple API which would just return a domain's state without talking to its monitor or doing other complicated stuff. The new API could then be used by virsh list to list all domains even when they are blocked on the qemu monitor. After all, virsh is not really interested in the current memory consumption of domains when listing them.
IMHO this won't work: if you restart libvirtd after the monitor is dead, you can't even get a connection to libvirtd anymore, see:
http://www.redhat.com/archives/libvir-list/2011-March/msg01422.html
How to reproduce:
1. set up a virt guest with qemu-kvm, and start it
2. stop the qemu process with: # kill -STOP `ps aux | grep qemu | grep -v grep | awk '{print $2}'`
3. run the following commands:
   # service libvirtd restart
   # virsh (hangs here)
Yes, but that's a separate issue, IMHO. virsh hangs at this point because libvirtd doesn't even get to the point where it is able to accept connections from clients.

So introducing a timeout for some specific operations to help solve this issue might be the right approach, but running all commands with a timeout is certainly not what we should do. Daniel described all the issues very nicely in his detailed email.

Jirka

On 2011-04-07 18:17, Jiri Denemark wrote:
On Thu, Apr 07, 2011 at 18:04:19 +0800, Osier Yang wrote:
On 2011-04-06 16:51, Jiri Denemark wrote:
This is not the right approach. Introducing a timeout into every monitor command sent to qemu is a bad thing. I think the right approach is to have a simple API which would just return a domain's state without talking to its monitor or doing other complicated stuff. The new API could then be used by virsh list to list all domains even when they are blocked on the qemu monitor. After all, virsh is not really interested in the current memory consumption of domains when listing them.
IMHO this won't work: if you restart libvirtd after the monitor is dead, you can't even get a connection to libvirtd anymore, see:
http://www.redhat.com/archives/libvir-list/2011-March/msg01422.html
How to reproduce:
1. set up a virt guest with qemu-kvm, and start it
2. stop the qemu process with: # kill -STOP `ps aux | grep qemu | grep -v grep | awk '{print $2}'`
3. run the following commands:
   # service libvirtd restart
   # virsh (hangs here)
Yes, but that's a separate issue, IMHO. virsh hangs at this point because libvirtd doesn't even get to the point where it is able to accept connections from clients. So introducing a timeout for some specific operations to help solve this issue might be the right approach, but running all commands with a timeout is certainly not what we should do. Daniel described all the issues very nicely in his detailed email.
Oh, I didn't mean your proposal won't work for "virsh list"; that will work well, I guess. :)

I meant that if we consider "virsh list can't work if the monitor dies" a problem, then we need to solve the even worse problem: libvirtd can't accept connections anymore if it's restarted.

Regards,
Osier

On Wed, Apr 06, 2011 at 12:13:10PM +0800, Mark Wu wrote:
Hello Guys,
When the qemu process hangs, virsh gets stuck on the hung guest and can't move forward. It can be reproduced with the following steps:
1. set up a virt guest with qemu-kvm, and start it
2. stop the qemu process with: kill -STOP `ps aux | grep qemu | grep -v grep | awk '{print $2}'`
3. run the following command: virsh list
I think we can add a timeout for the qemu monitor to resolve this problem: use virCondWaitUntil instead of virCondWait in qemuMonitorSend. What are your opinions?
Thanks!
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index fca8590..65d8de9 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -25,6 +25,7 @@
 #include <poll.h>
 #include <sys/un.h>
+#include <sys/time.h>
 #include <unistd.h>
 #include <fcntl.h>
@@ -691,11 +692,14 @@ int qemuMonitorClose(qemuMonitorPtr mon)
     return refs;
 }

+#define QEMU_JOB_WAIT_TIME (1000ull * 30)
+
 int qemuMonitorSend(qemuMonitorPtr mon,
                     qemuMonitorMessagePtr msg)
 {
     int ret = -1;
+    struct timeval now;
+    unsigned long long then;

     if (mon->eofcb) {
         msg->lastErrno = EIO;
@@ -706,7 +710,14 @@ int qemuMonitorSend(qemuMonitorPtr mon,
     qemuMonitorUpdateWatch(mon);

     while (!mon->msg->finished) {
-        if (virCondWait(&mon->notify, &mon->lock) < 0)
+        if (gettimeofday(&now, NULL) < 0) {
+            virReportSystemError(errno, "%s",
+                                 _("cannot get time of day"));
+            return -1;
+        }
+        then = (now.tv_sec * 1000ull) + (now.tv_usec / 1000);
+        then += QEMU_JOB_WAIT_TIME;
+        if (virCondWaitUntil(&mon->notify, &mon->lock, then) < 0)
             goto cleanup;
     }
This may seem simple, but it has a lot of nasty consequences.

Adding the timeout causes the thread to stop waiting for the monitor command reply, and an error is returned for that API call. If QEMU should recover, though, and more API calls are made which issue monitor commands, all those future commands will receive the wrong reply data.

If we are going to allow a timeout when waiting for a reply to a monitor command, then we need to mark the monitor as 'broken' and forbid all future use of it until the VM is restarted.

If it was a simple 'info' command, then we could potentially add code to read and discard the delayed reply later. If it was an action command, though, we can't simply discard the delayed reply, because that would result in libvirt's view of the world becoming out of sync with QEMU.

In certain cases we may be able to cope with this by listening for event notifications. E.g. if a 'stop' command times out, libvirt will think the VM is still running, but if QEMU later completes it, it will in fact be paused. We will see a 'PAUSED' event from QEMU that lets us re-sync our state. If something like a 'device_add' command times out, though, we have no way to gracefully recover.

NB, I'm not saying your patch is wrong. In fact I think it is potentially a good idea, but we need to make sure we are able to safely deal with the consequences of timing out first.

Regards,
Daniel

On Wed, Apr 06, 2011 at 11:02:58AM +0100, Daniel P. Berrange wrote:
On Wed, Apr 06, 2011 at 12:13:10PM +0800, Mark Wu wrote:
Hello Guys,
When the qemu process hangs, virsh gets stuck on the hung guest and can't move forward. It can be reproduced with the following steps:
1. set up a virt guest with qemu-kvm, and start it
2. stop the qemu process with: kill -STOP `ps aux | grep qemu | grep -v grep | awk '{print $2}'`
3. run the following command: virsh list
I think we can add a timeout for the qemu monitor to resolve this problem: use virCondWaitUntil instead of virCondWait in qemuMonitorSend. What are your opinions?
Thanks!
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index fca8590..65d8de9 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -25,6 +25,7 @@
 #include <poll.h>
 #include <sys/un.h>
+#include <sys/time.h>
 #include <unistd.h>
 #include <fcntl.h>
@@ -691,11 +692,14 @@ int qemuMonitorClose(qemuMonitorPtr mon)
     return refs;
 }

+#define QEMU_JOB_WAIT_TIME (1000ull * 30)
+
 int qemuMonitorSend(qemuMonitorPtr mon,
                     qemuMonitorMessagePtr msg)
 {
     int ret = -1;
+    struct timeval now;
+    unsigned long long then;

     if (mon->eofcb) {
         msg->lastErrno = EIO;
@@ -706,7 +710,14 @@ int qemuMonitorSend(qemuMonitorPtr mon,
     qemuMonitorUpdateWatch(mon);

     while (!mon->msg->finished) {
-        if (virCondWait(&mon->notify, &mon->lock) < 0)
+        if (gettimeofday(&now, NULL) < 0) {
+            virReportSystemError(errno, "%s",
+                                 _("cannot get time of day"));
+            return -1;
+        }
+        then = (now.tv_sec * 1000ull) + (now.tv_usec / 1000);
+        then += QEMU_JOB_WAIT_TIME;
+        if (virCondWaitUntil(&mon->notify, &mon->lock, then) < 0)
             goto cleanup;
     }
This may seem simple, but it has a lot of nasty consequences.
Adding the timeout causes the thread to stop waiting for the monitor command reply, and returns an error for that API call.
If QEMU should recover though, and more API calls are made which issue monitor commands, all those future commands will receive the wrong reply data.
If we are going to allow a timeout when waiting for a reply to a monitor command, then we need to mark the monitor as 'broken' and forbid all future use of it until the VM is restarted.
If it was a simple 'info' command, then we could potentially add code to read+discard the delayed reply later.
If it was an action command though, we can't simply discard the delayed reply, because that will result in libvirt's view of the world becoming out of sync with QEMU. In certain cases we may be able to cope with this, by listening for event notifications.
E.g. if a 'stop' command times out, libvirt will think the VM is still running, but if QEMU later completes it, it will in fact be paused. We will see a 'PAUSED' event from QEMU that lets us re-sync our state.
If something like a 'device_add' command times out, though, we have no way to gracefully recover.
NB, I'm not saying your patch is wrong. In fact I think it is potentially a good idea, but we need to make sure we are able to safely deal with the consequences of timing out first.
What I wonder is whether we shouldn't do tighter monitoring of the subprocesses instead of just relying on the pipe to the monitor to detect exit. At least on Linux we could use /proc/$pid/status to get more information about the process state; this would allow us to detect, maybe ahead of time, things like stopped processes or abnormal activity.

The problem, as you said, is that once we have sent a command to the monitor, not waiting for the answer puts the whole communication at risk. Ideally we should be able to signal the qemu child that we want to reset the interface, like the reset button on a VT220 terminal sending a break through the serial line; it would be really useful to have something to reset data asynchronously on a QEMU monitor. I don't know what would be the best way to implement this on a Unix/Linux pipe, though.

But I agree that using a timeout and aborting in the current framework doesn't work. I also agree with Jirka that a simpler API for just listing state, not involving a round-trip to the app, would be a significant improvement.

Daniel
participants (5):
- Daniel P. Berrange
- Daniel Veillard
- Jiri Denemark
- Mark Wu
- Osier Yang