[libvirt] [PATCH 2/5] daemonize qemu processes

Make sure vms don't get killed when the libvirtd quits unexpectedly. Needs the previous patch since it looks at the pid file. --- src/qemu_driver.c | 36 ++++++++++++++++++++++++------------ 1 files changed, 24 insertions(+), 12 deletions(-) diff --git a/src/qemu_driver.c b/src/qemu_driver.c index a2e573e..7804094 100644 --- a/src/qemu_driver.c +++ b/src/qemu_driver.c @@ -858,6 +858,7 @@ static int qemudStartVMDaemon(virConnectPtr conn, const char *emulator; uid_t uid = geteuid(); mode_t logmode; + pid_t child; FD_ZERO(&keepfd); @@ -988,12 +989,26 @@ static int qemudStartVMDaemon(virConnectPtr conn, for (i = 0 ; i < ntapfds ; i++) FD_SET(tapfds[i], &keepfd); - ret = virExec(conn, argv, progenv, &keepfd, &vm->pid, + ret = virExec(conn, argv, progenv, &keepfd, &child, vm->stdin_fd, &vm->stdout_fd, &vm->stderr_fd, - VIR_EXEC_NONBLOCK); - if (ret == 0) + VIR_EXEC_NONBLOCK | VIR_EXEC_DAEMON); + + /* wait for qemu process to to show up */ + if (ret == 0) { + int retries = 100; + while (retries) { + if ((ret = virFileReadPid(driver->stateDir, vm->def->name, &vm->pid)) == 0) + break; + usleep(10*1000); + retries--; + } + if (ret) + qemudLog(QEMUD_WARN, _("Domain %s didn't show up\n"), vm->def->name); + } + + if (ret == 0) { vm->state = migrateFrom ? VIR_DOMAIN_PAUSED : VIR_DOMAIN_RUNNING; - else + } else vm->def->id = -1; for (i = 0 ; argv[i] ; i++) @@ -1069,7 +1084,9 @@ static void qemudShutdownVMDaemon(virConnectPtr conn ATTRIBUTE_UNUSED, qemudLog(QEMUD_INFO, _("Shutting down VM '%s'\n"), vm->def->name); - kill(vm->pid, SIGTERM); + if (kill(vm->pid, SIGTERM) < 0) + qemudLog(QEMUD_ERROR, _("Failed to send SIGTERM to %s (%d): %s\n"), + vm->def->name, vm->pid, strerror(errno)); qemudVMData(driver, vm, vm->stdout_fd); qemudVMData(driver, vm, vm->stderr_fd); @@ -1089,13 +1106,8 @@ static void qemudShutdownVMDaemon(virConnectPtr conn ATTRIBUTE_UNUSED, vm->stderr_fd = -1; vm->monitor = -1; - if (waitpid(vm->pid, NULL, WNOHANG) != vm->pid) { - kill(vm->pid, SIGKILL); - if (waitpid(vm->pid, NULL, 0) != vm->pid) { - qemudLog(QEMUD_WARN, - "%s", _("Got unexpected pid, damn\n")); - } - } + /* shut it off for sure */ + kill(vm->pid, SIGKILL); vm->pid = -1; vm->def->id = -1; -- 1.6.0.2

On Fri, Dec 12, 2008 at 07:26:32PM +0100, Guido Günther wrote:
Make sure vms don't get killed when the libvirtd quits unexpectedly. Needs the previous patch since it looks at the pid file. [...] + /* wait for qemu process to to show up */ + if (ret == 0) { + int retries = 100; + while (retries) { + if ((ret = virFileReadPid(driver->stateDir, vm->def->name, &vm->pid)) == 0) + break; + usleep(10*1000); + retries--; + } + if (ret) + qemudLog(QEMUD_WARN, _("Domain %s didn't show up\n"), vm->def->name); + }
so we are waiting at most one second and waking up the process 100 times, I would suggest to relax that a bit, give it a bit more time like 10 seconds in case the system is trashing a bit 1 second may really be too short. And maybe wake up a bit less ... but patch looks good to me, it's just unfortunate we need to do some tuning there. thanks ! Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

On Mon, Dec 15, 2008 at 10:19:28AM +0100, Daniel Veillard wrote:
On Fri, Dec 12, 2008 at 07:26:32PM +0100, Guido Günther wrote:
Make sure vms don't get killed when the libvirtd quits unexpectedly. Needs the previous patch since it looks at the pid file. [...] + /* wait for qemu process to to show up */ + if (ret == 0) { + int retries = 100; + while (retries) { + if ((ret = virFileReadPid(driver->stateDir, vm->def->name, &vm->pid)) == 0) + break; + usleep(10*1000); + retries--; + } + if (ret) + qemudLog(QEMUD_WARN, _("Domain %s didn't show up\n"), vm->def->name); + }
so we are waiting at most one second and waking up the process 100 times, I would suggest to relax that a bit, give it a bit more time like 10 seconds in case the system is trashing a bit 1 second may really be too short. And maybe wake up a bit less ...
but patch looks good to me, it's just unfortunate we need to do some tuning there. I've put an updated version of this and the follow up patch to reconnect to running vms at: http://honk.sigxcpu.org/projects/libvirt/daemon-restart/0001-daemonize-qemu-... http://honk.sigxcpu.org/projects/libvirt/daemon-restart/0002-read-saved-vm-s... Just in case somebody else wants to try them. Cheers, -- Guido

On Fri, Dec 12, 2008 at 07:26:32PM +0100, Guido G?nther wrote:
Make sure vms don't get killed when the libvirtd quits unexpectedly. Needs the previous patch since it looks at the pid file.
--- src/qemu_driver.c | 36 ++++++++++++++++++++++++------------ 1 files changed, 24 insertions(+), 12 deletions(-)
diff --git a/src/qemu_driver.c b/src/qemu_driver.c index a2e573e..7804094 100644 --- a/src/qemu_driver.c +++ b/src/qemu_driver.c @@ -858,6 +858,7 @@ static int qemudStartVMDaemon(virConnectPtr conn, const char *emulator; uid_t uid = geteuid(); mode_t logmode; + pid_t child;
FD_ZERO(&keepfd);
@@ -988,12 +989,26 @@ static int qemudStartVMDaemon(virConnectPtr conn, for (i = 0 ; i < ntapfds ; i++) FD_SET(tapfds[i], &keepfd);
- ret = virExec(conn, argv, progenv, &keepfd, &vm->pid, + ret = virExec(conn, argv, progenv, &keepfd, &child, vm->stdin_fd, &vm->stdout_fd, &vm->stderr_fd, - VIR_EXEC_NONBLOCK); - if (ret == 0) + VIR_EXEC_NONBLOCK | VIR_EXEC_DAEMON); + + /* wait for qemu process to to show up */ + if (ret == 0) { + int retries = 100; + while (retries) { + if ((ret = virFileReadPid(driver->stateDir, vm->def->name, &vm->pid)) == 0) + break;
If we want to support versions of oQEMU without -pidfile, we'll have to conditionally run them without the _DAEMON flag.
+ usleep(10*1000); + retries--; + } + if (ret) + qemudLog(QEMUD_WARN, _("Domain %s didn't show up\n"), vm->def->name); + } + + if (ret == 0) { vm->state = migrateFrom ? VIR_DOMAIN_PAUSED : VIR_DOMAIN_RUNNING; - else + } else vm->def->id = -1;
for (i = 0 ; argv[i] ; i++) @@ -1069,7 +1084,9 @@ static void qemudShutdownVMDaemon(virConnectPtr conn ATTRIBUTE_UNUSED,
qemudLog(QEMUD_INFO, _("Shutting down VM '%s'\n"), vm->def->name);
- kill(vm->pid, SIGTERM); + if (kill(vm->pid, SIGTERM) < 0) + qemudLog(QEMUD_ERROR, _("Failed to send SIGTERM to %s (%d): %s\n"), + vm->def->name, vm->pid, strerror(errno));
We should add a guard around all the kill() statements for if (vm->pid > 1) kill.. One of the bugs I found when doing the LXC/UML drivers is that when relying on an external PID file, some things can go wrong in unexpected ways that just don't happen when getting the PID straight back from fork. This often ended up with vm->pid being -1, 0 with horrific results - "If pid equals -1, then sig is sent to every process for which the calling process has permission to send signals, except for process 1 (init)" Yes, I killed every single process on my dev machine several times before discovering this isues :-)
qemudVMData(driver, vm, vm->stdout_fd); qemudVMData(driver, vm, vm->stderr_fd); @@ -1089,13 +1106,8 @@ static void qemudShutdownVMDaemon(virConnectPtr conn ATTRIBUTE_UNUSED, vm->stderr_fd = -1; vm->monitor = -1;
- if (waitpid(vm->pid, NULL, WNOHANG) != vm->pid) { - kill(vm->pid, SIGKILL); - if (waitpid(vm->pid, NULL, 0) != vm->pid) { - qemudLog(QEMUD_WARN, - "%s", _("Got unexpected pid, damn\n")); - } - } + /* shut it off for sure */ + kill(vm->pid, SIGKILL);
Same here - we should add a guard for pid > 1 Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Mon, Dec 15, 2008 at 11:21:20AM +0000, Daniel P. Berrange wrote:
On Fri, Dec 12, 2008 at 07:26:32PM +0100, Guido G?nther wrote:
+ if (kill(vm->pid, SIGTERM) < 0) + qemudLog(QEMUD_ERROR, _("Failed to send SIGTERM to %s (%d): %s\n"), + vm->def->name, vm->pid, strerror(errno));
We should add a guard around all the kill() statements for
if (vm->pid > 1) kill..
One of the bugs I found when doing the LXC/UML drivers is that when relying on an external PID file, some things can go wrong in unexpected ways that just don't happen when getting the PID straight back from fork. This often ended up with vm->pid being -1, 0 with horrific results -
"If pid equals -1, then sig is sent to every process for which the calling process has permission to send signals, except for process 1 (init)"
Yes, I killed every single process on my dev machine several times before discovering this isues :-)
This call for an utility function int virKillProcess(pid_t pid, int sig) call checking for this, could also help on debugging/logging Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

On Mon, Dec 15, 2008 at 02:22:42PM +0100, Daniel Veillard wrote:
On Mon, Dec 15, 2008 at 11:21:20AM +0000, Daniel P. Berrange wrote:
On Fri, Dec 12, 2008 at 07:26:32PM +0100, Guido G?nther wrote:
+ if (kill(vm->pid, SIGTERM) < 0) + qemudLog(QEMUD_ERROR, _("Failed to send SIGTERM to %s (%d): %s\n"), + vm->def->name, vm->pid, strerror(errno));
We should add a guard around all the kill() statements for
if (vm->pid > 1) kill..
One of the bugs I found when doing the LXC/UML drivers is that when relying on an external PID file, some things can go wrong in unexpected ways that just don't happen when getting the PID straight back from fork. This often ended up with vm->pid being -1, 0 with horrific results -
"If pid equals -1, then sig is sent to every process for which the calling process has permission to send signals, except for process 1 (init)"
Yes, I killed every single process on my dev machine several times before discovering this isues :-)
This call for an utility function int virKillProcess(pid_t pid, int sig) call checking for this, could also help on debugging/logging I'll add that. I'd like to apply the other 3 out of 5 patches that monitor the state of the qemu instances nevertheless since they don't change any behaviour but only add the necessary status tracking to fork of the qemu/kvm processes later. This way we can easily check if the status file is in sync with the running qemu by comapring it to "virsh dumpxml". Cheers, -- Guido

On Mon, Dec 15, 2008 at 11:21:20AM +0000, Daniel P. Berrange wrote:
On Fri, Dec 12, 2008 at 07:26:32PM +0100, Guido G?nther wrote:
+ if (kill(vm->pid, SIGTERM) < 0) + qemudLog(QEMUD_ERROR, _("Failed to send SIGTERM to %s (%d): %s\n"), + vm->def->name, vm->pid, strerror(errno));
We should add a guard around all the kill() statements for
if (vm->pid > 1) kill..
One of the bugs I found when doing the LXC/UML drivers is that when relying on an external PID file, some things can go wrong in unexpected ways that just don't happen when getting the PID straight back from fork. This often ended up with vm->pid being -1, 0 with horrific results -
"If pid equals -1, then sig is sent to every process for which the calling process has permission to send signals, except for process 1 (init)"
Yes, I killed every single process on my dev machine several times before discovering this isues :-)
This call for an utility function int virKillProcess(pid_t pid, int sig) call checking for this, could also help on debugging/logging O.k. to apply the attached version of virKillProcess? I'm already using
On Mon, Dec 15, 2008 at 02:22:42PM +0100, Daniel Veillard wrote: this in the libvirtd restart patches. -- Guido

On Mon, Dec 29, 2008 at 03:36:15PM +0100, Guido G?nther wrote:
On Mon, Dec 15, 2008 at 11:21:20AM +0000, Daniel P. Berrange wrote:
On Fri, Dec 12, 2008 at 07:26:32PM +0100, Guido G?nther wrote:
+ if (kill(vm->pid, SIGTERM) < 0) + qemudLog(QEMUD_ERROR, _("Failed to send SIGTERM to %s (%d): %s\n"), + vm->def->name, vm->pid, strerror(errno));
We should add a guard around all the kill() statements for
if (vm->pid > 1) kill..
One of the bugs I found when doing the LXC/UML drivers is that when relying on an external PID file, some things can go wrong in unexpected ways that just don't happen when getting the PID straight back from fork. This often ended up with vm->pid being -1, 0 with horrific results -
"If pid equals -1, then sig is sent to every process for which the calling process has permission to send signals, except for process 1 (init)"
Yes, I killed every single process on my dev machine several times before discovering this isues :-)
This call for an utility function int virKillProcess(pid_t pid, int sig) call checking for this, could also help on debugging/logging O.k. to apply the attached version of virKillProcess? I'm already using
On Mon, Dec 15, 2008 at 02:22:42PM +0100, Daniel Veillard wrote: this in the libvirtd restart patches.
ACK Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Mon, Jan 05, 2009 at 09:58:06AM +0000, Daniel P. Berrange wrote:
On Mon, Dec 29, 2008 at 03:36:15PM +0100, Guido G?nther wrote:
On Mon, Dec 15, 2008 at 11:21:20AM +0000, Daniel P. Berrange wrote:
On Fri, Dec 12, 2008 at 07:26:32PM +0100, Guido G?nther wrote:
+ if (kill(vm->pid, SIGTERM) < 0) + qemudLog(QEMUD_ERROR, _("Failed to send SIGTERM to %s (%d): %s\n"), + vm->def->name, vm->pid, strerror(errno));
We should add a guard around all the kill() statements for
if (vm->pid > 1) kill..
One of the bugs I found when doing the LXC/UML drivers is that when relying on an external PID file, some things can go wrong in unexpected ways that just don't happen when getting the PID straight back from fork. This often ended up with vm->pid being -1, 0 with horrific results -
"If pid equals -1, then sig is sent to every process for which the calling process has permission to send signals, except for process 1 (init)"
Yes, I killed every single process on my dev machine several times before discovering this isues :-)
This call for an utility function int virKillProcess(pid_t pid, int sig) call checking for this, could also help on debugging/logging O.k. to apply the attached version of virKillProcess? I'm already using
On Mon, Dec 15, 2008 at 02:22:42PM +0100, Daniel Veillard wrote: this in the libvirtd restart patches.
ACK Applied. -- Guido
participants (3)
-
Daniel P. Berrange
-
Daniel Veillard
-
Guido Günther