[libvirt] [PATCH] qemu: Try multiple times to open unix monitor socket

Unlike the pty monitor (which we know exists since we scrape its path from
stdout), we have no way of knowing that the unix monitor socket should exist/
be initialized. As a result, some of my KVM guests randomly fail to start on
F10 host.

Try to open the unix socket in a 3 second timeout loop. Ignore EACCES (path
does not exist if a first time run) and ECONNREFUSED (leftover socket from
a previous run hasn't been removed yet). Fixes things for me.
---
 src/qemu_driver.c |   22 +++++++++++++++++++++-
 1 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/src/qemu_driver.c b/src/qemu_driver.c
index e2b7acb..200718b 100644
--- a/src/qemu_driver.c
+++ b/src/qemu_driver.c
@@ -874,6 +874,8 @@ qemudOpenMonitorUnix(virConnectPtr conn,
 {
     struct sockaddr_un addr;
     int monfd;
+    int timeout = 3; /* In seconds */
+    int ret, i;
 
     if ((monfd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) {
         virReportSystemError(conn, errno,
@@ -885,10 +887,28 @@ qemudOpenMonitorUnix(virConnectPtr conn,
     addr.sun_family = AF_UNIX;
     strncpy(addr.sun_path, monitor, sizeof(addr.sun_path));
 
-    if (connect(monfd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+    do {
+        ret = connect(monfd, (struct sockaddr *) &addr, sizeof(addr));
+
+        if (ret == 0)
+            break;
+
+        if (errno == EACCES || errno == ECONNREFUSED) {
+            /* EACCES       : Socket may not have shown up yet
+             * ECONNREFUSED : Leftover socket hasn't been removed yet */
+            continue;
+        }
+
         virReportSystemError(conn, errno, "%s",
                              _("failed to connect to monitor socket"));
         goto error;
+
+    } while ((++i <= timeout*5) && (usleep(.2 * 1000000) <= 0));
+
+    if (ret != 0) {
+        virReportSystemError(conn, errno, "%s",
+                             _("monitor socket did not show up."));
+        goto error;
     }
 
     if (qemudOpenMonitorCommon(conn, driver, vm, monfd, reconnect) < 0)
-- 
1.6.0.6
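For readers who want to try the retry idea outside of libvirt, here is a minimal standalone sketch of the same loop. It is not the code from the patch: the helper name connect_unix_retry is hypothetical, the loop counter is explicitly initialized, a fresh socket is created for each attempt (POSIX leaves the state of a socket unspecified after a failed connect()), and ENOENT is retried in addition to the two errno values named in the patch, on the assumption that a socket path that has not been created yet is reported that way.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not a libvirt function: connect to the UNIX
 * socket at `path`, retrying every 200 ms for about `timeout_sec`
 * seconds. Returns a connected fd, or -1 with errno set. */
static int connect_unix_retry(const char *path, int timeout_sec)
{
    const struct timespec delay = { 0, 200 * 1000 * 1000 }; /* 200 ms */
    struct sockaddr_un addr;
    int attempts = timeout_sec * 5;   /* 5 tries per second */
    int saved_errno = ETIMEDOUT;
    int fd, i;

    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);      /* length checked above */

    for (i = 0; i <= attempts; i++) {
        /* Fresh socket per attempt; see the note in the lead-in. */
        if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)
            return -1;

        if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0)
            return fd;

        saved_errno = errno;
        close(fd);

        /* Retry only on errors that can mean "not ready yet".
         * EACCES and ECONNREFUSED come from the patch; ENOENT is an
         * assumption added for this sketch. */
        if (saved_errno != EACCES &&
            saved_errno != ECONNREFUSED &&
            saved_errno != ENOENT)
            break;

        if (i < attempts)
            nanosleep(&delay, NULL);
    }

    errno = saved_errno;
    return -1;
}
```

A caller would use it as `fd = connect_unix_retry(monitor_path, 3);` and report errno if the result is negative. Any error outside the retry set fails immediately, as in the patch.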

On Tue, Jul 14, 2009 at 06:22:42PM -0400, Cole Robinson wrote:
> Unlike the pty monitor (which we know exists since we scrape its path from stdout), we have no way of knowing that the unix monitor socket should exist/ be initialized. As a result, some of my KVM guests randomly fail to start on F10 host.
>
> Try to open the unix socket in a 3 second timeout loop. Ignore EACCES (path does not exist if a first time run) and ECONNREFUSED (leftover socket from a previous run hasn't been removed yet). Fixes things for me.
It's always a bit annoying to end up with heuristics like this but if we don't have any other way, okay, ACK

thanks,

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine  http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

On Wed, Jul 15, 2009 at 11:40:42AM +0200, Daniel Veillard wrote:
> It's always a bit annoying to end up with heuristics like this but if we don't have any other way, okay, ACK
I don't like it much either, but this is no worse than what we had to do to find the /dev/pts/XXX path, where we waited in a loop for 3 seconds. ACK to this patch.

Long term we'll need to discuss with QEMU developers to find a better way to do this without needing a timeout. One idea: instead of passing a UNIX domain socket path to QEMU, create & bind the socket in libvirt and then pass the pre-opened FD to QEMU. This would guarantee that we can instantly connect to the monitor. Of course, the job of waiting then passes to the code that sends monitor commands.

Daniel

-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
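The pre-bound socket idea can be illustrated with a small standalone sketch. Once the managing process has called bind() and listen() itself, a later connect() to that path is queued in the listen backlog and succeeds right away, even though nobody has called accept() yet. The helper names listen_unix and connect_unix_once are hypothetical, and how the listening FD would actually be handed to QEMU is left out, since the thread treats that as something still to be worked out with the QEMU developers.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in a sockaddr_un for `path`; returns -1 if the path is too long. */
static int fill_unix_addr(struct sockaddr_un *addr, const char *path)
{
    if (strlen(path) >= sizeof(addr->sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);   /* length checked above */
    return 0;
}

/* Hypothetical helper: create, bind and listen on a UNIX socket in the
 * managing process. The returned fd is what a child process would
 * inherit and later accept() on. */
static int listen_unix(const char *path)
{
    struct sockaddr_un addr;
    int fd, saved_errno;

    if (fill_unix_addr(&addr, path) < 0)
        return -1;
    if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)
        return -1;

    unlink(path);   /* drop a leftover socket from a previous run */

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: a single connect() attempt, no retry loop. */
static int connect_unix_once(const char *path)
{
    struct sockaddr_un addr;
    int fd, saved_errno;

    if (fill_unix_addr(&addr, path) < 0)
        return -1;
    if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)
        return -1;

    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}
```

Calling listen_unix() before starting the child and connect_unix_once() straight afterwards needs no timeout for the connect itself. As the mail notes, the waiting does not disappear: the first monitor command would still block until the child gets around to accept().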

Daniel P. Berrange wrote:
> Long term we'll need to discuss with QEMU developers to find a better way to do this without needing a timeout. One idea: instead of passing a UNIX domain socket path to QEMU, create & bind the socket in libvirt and then pass the pre-opened FD to QEMU. This would guarantee that we can instantly connect to the monitor. Of course, the job of waiting then passes to the code that sends monitor commands.
What about qemu's -daemonize option:

  -daemonize
      Daemonize the QEMU process after initialization. QEMU will not
      detach from standard IO until it is ready to receive connections
      on any of its devices. This option is a useful way for external
      programs to launch QEMU without having to cope with
      initialization race conditions.

It looks like it was introduced in 0.9.0.

-jim
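If the quoted description holds, a launcher could wait for the foreground QEMU process to exit before touching the monitor socket, instead of polling. The sketch below shows only that waiting step. The helper name wait_for_launcher is hypothetical, and it assumes the caller is the parent of the process it started and that a zero exit status of that process means the daemonized instance is ready, which is inferred from the man page text above and not verified here.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: block until the directly forked launcher process
 * `pid` has exited. Returns 0 if it exited with status 0 (assumed to
 * mean "daemonized and ready"), -1 otherwise. */
static int wait_for_launcher(pid_t pid)
{
    int status;

    /* Restart waitpid() if a signal interrupts it. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;

    return -1;
}
```

After wait_for_launcher() returns 0, a single connect() to the monitor socket would be expected to succeed. A non-zero result would be treated as a startup failure.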

On Wed, Jul 15, 2009 at 09:49:03AM -0400, Jim Paris wrote:
> What about qemu's -daemonize option:
>
>   -daemonize
>       Daemonize the QEMU process after initialization. QEMU will not
>       detach from standard IO until it is ready to receive connections
>       on any of its devices. This option is a useful way for external
>       programs to launch QEMU without having to cope with
>       initialization race conditions.
>
> It looks like it was introduced in 0.9.0.
Hmm, that's a possibility. I'll have to take a look at what it does and whether it will have any interactions with stuff that libvirt does in between daemonizing & exec'ing QEMU.

Daniel

Daniel P. Berrange wrote:
> I don't like it much either, but this is no worse than what we had to do to find the /dev/pts/XXX path, where we waited in a loop for 3 seconds. ACK to this patch.
Thanks, pushed now.

- Cole
participants (4)
- Cole Robinson
- Daniel P. Berrange
- Daniel Veillard
- Jim Paris