[PATCH 0/2] qemu_process: Start probing process more robustly

I've found these in an old branch where dnsmasq starting is reworked
too. But that part is not ready yet, so let's merge at least the first
two patches.

Michal Prívozník (2):
  qemu_process: Be nicer to killing QEMU when probing caps
  qemu_process: Start QEMU for caps probing more robustly

 src/qemu/qemu_process.c | 64 +++++++++++++++++++++++++++--------------
 src/qemu/qemu_process.h |  1 -
 2 files changed, 43 insertions(+), 22 deletions(-)

--
2.34.1

The qemuProcessQMPStop() function is intended to kill the dummy QEMU
process we started only for querying capabilities. However, what we
executed may not be a plain QEMU binary; it may be a memcheck tool
(e.g. valgrind) that executes QEMU later. By switching to
virProcessKillPainfully() we allow this wrapper tool to exit
gracefully.

Another upside is that virProcessKillPainfully() reports an error, so
there is no need for us to call VIR_ERROR() ourselves.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_process.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index b19a6218d0..2e149699b0 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -9132,11 +9132,7 @@ qemuProcessQMPStop(qemuProcessQMP *proc)
     if (proc->pid != 0) {
         VIR_DEBUG("Killing QMP caps process %lld", (long long)proc->pid);
-        if (virProcessKill(proc->pid, SIGKILL) < 0 && errno != ESRCH)
-            VIR_ERROR(_("Failed to kill process %lld: %s"),
-                      (long long)proc->pid,
-                      g_strerror(errno));
-
+        virProcessKillPainfully(proc->pid, true);
         proc->pid = 0;
     }
--
2.34.1
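Editorial sketch: the idea behind the patch above is to ask the process to
terminate first and to fall back to SIGKILL only if it does not go away.
The helper below illustrates that escalation with plain POSIX calls. It is
not libvirt's virProcessKillPainfully(); the name kill_child_gracefully(),
the grace_ms parameter and the 10 ms polling step are assumptions made for
this example, and it only handles a direct child of the caller.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a libvirt API): terminate a direct child.
 * Returns 0 if the child exited after SIGTERM, 1 if SIGKILL was needed,
 * and -1 on error or if the child survived and force is false.
 */
int
kill_child_gracefully(pid_t pid, int grace_ms, bool force)
{
    const struct timespec step = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    int waited_ms = 0;

    assert(pid > 0); /* pid <= 0 would address a process group */

    /* Polite request first: a wrapper such as valgrind can clean up. */
    if (kill(pid, SIGTERM) < 0)
        return -1;

    for (;;) {
        pid_t r = waitpid(pid, NULL, WNOHANG);

        if (r == pid)
            return 0;   /* exited and reaped within the grace period */
        if (r < 0)
            return -1;  /* not our child, or another waitpid() error */
        if (waited_ms >= grace_ms)
            break;      /* still running and out of patience */

        nanosleep(&step, NULL);
        waited_ms += 10;
    }

    if (!force)
        return -1;

    /* Grace period expired: SIGKILL cannot be caught or ignored. */
    if (kill(pid, SIGKILL) < 0)
        return -1;

    return waitpid(pid, NULL, 0) == pid ? 1 : -1;
}
```

A caller that wants the old "kill immediately" behaviour can pass
grace_ms = 0 with force = true; a wrapper that handles SIGTERM gets up to
grace_ms milliseconds to shut down before it is killed.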

On Wed, Mar 16, 2022 at 04:39:35PM +0100, Michal Privoznik wrote:
> The qemuProcessQMPStop() function is intended to kill the dummy QEMU
> process we started only for querying capabilities. However, what we
> executed may not be a plain QEMU binary; it may be a memcheck tool
> (e.g. valgrind) that executes QEMU later. By switching to
> virProcessKillPainfully() we allow this wrapper tool to exit
> gracefully.
>
> Another upside is that virProcessKillPainfully() reports an error, so
> there is no need for us to call VIR_ERROR() ourselves.
>
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
>  src/qemu/qemu_process.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index b19a6218d0..2e149699b0 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -9132,11 +9132,7 @@ qemuProcessQMPStop(qemuProcessQMP *proc)
>      if (proc->pid != 0) {
>          VIR_DEBUG("Killing QMP caps process %lld", (long long)proc->pid);
> -        if (virProcessKill(proc->pid, SIGKILL) < 0 && errno != ESRCH)
> -            VIR_ERROR(_("Failed to kill process %lld: %s"),
> -                      (long long)proc->pid,
> -                      g_strerror(errno));
> -
> +        virProcessKillPainfully(proc->pid, true);

Unfortunately this uses virReportError(), so shouldn't we clear the
error, since this function is void?

If yes and you add the reset here, then

Reviewed-by: Martin Kletzander <mkletzan@redhat.com>

>          proc->pid = 0;
>      }
> --
> 2.34.1

When probing QEMU capabilities, we look at whatever <emulator/> was
specified in the domain XML and execute it with a couple of arguments
(-daemonize being one of them). Then, we use
virCommandSetErrorBuffer() to read stderr of the child process, hoping
to read a possible error message just before the process daemonizes
itself. Well, this works as long as the emulator binary behaves.

If the binary is evil and basically does the following:

  #!/bin/bash
  sleep 1h

then virCommandRun() called from qemuProcessQMPLaunch() doesn't return
for a whole hour (because it's stuck reading stderr of the child
process). This behavior of ours is very suboptimal.

The solution is to not rely on the binary behaving correctly on the
-daemonize argument, but to daemonize the process ourselves (via
virCommandDaemonize()) and then wait, with a timeout, for the monitor
to show up. This in turn means that we can no longer use
virCommandSetErrorBuffer(), but we can do the equivalent with
virCommandSetErrorFD() and a bit of code.

Sure, this doesn't shield us from malicious binaries 100%, but it
helps prevent depletion of worker threads.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_process.c | 58 +++++++++++++++++++++++++++++------------
 src/qemu/qemu_process.h |  1 -
 2 files changed, 42 insertions(+), 17 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 2e149699b0..d038f7e2ae 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -9166,7 +9166,6 @@ qemuProcessQMPFree(qemuProcessQMP *proc)
     g_free(proc->monpath);
     g_free(proc->monarg);
     g_free(proc->pidfile);
-    g_free(proc->stdErr);
     g_free(proc);
 }
@@ -9285,7 +9284,9 @@ static int
 qemuProcessQMPLaunch(qemuProcessQMP *proc)
 {
     const char *machine;
-    int status = 0;
+    VIR_AUTOCLOSE errfd = -1;
+    virTimeBackOffVar timebackoff;
+    const unsigned long long timeout = 30 * 1000; /* ms */
     int rc;
     if (proc->forceTCG)
@@ -9310,9 +9311,7 @@ qemuProcessQMPLaunch(qemuProcessQMP *proc)
                                      "-nographic",
                                      "-machine", machine,
                                      "-qmp", proc->monarg,
-                                     "-pidfile", proc->pidfile,
-                                     "-daemonize",
-                                     NULL);
+                                     NULL);
     virCommandAddEnvPassCommon(proc->cmd);
     virCommandClearCaps(proc->cmd);
@@ -9326,26 +9325,53 @@ qemuProcessQMPLaunch(qemuProcessQMP *proc)
     virCommandSetGID(proc->cmd, proc->runGid);
     virCommandSetUID(proc->cmd, proc->runUid);
-    virCommandSetErrorBuffer(proc->cmd, &(proc->stdErr));
+    virCommandSetPidFile(proc->cmd, proc->pidfile);
+    virCommandSetErrorFD(proc->cmd, &errfd);
+    virCommandDaemonize(proc->cmd);
-    if (virCommandRun(proc->cmd, &status) < 0)
+    if (virCommandRun(proc->cmd, NULL) < 0)
         return -1;
-    if (status != 0) {
-        VIR_DEBUG("QEMU %s exited with status %d", proc->binary, status);
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("Failed to start QEMU binary %s for probing: %s"),
-                       proc->binary,
-                       proc->stdErr ? proc->stdErr : _("unknown error"));
-        return -1;
-    }
-
     if ((rc = virPidFileReadPath(proc->pidfile, &proc->pid)) < 0) {
         virReportSystemError(-rc, _("Failed to read pidfile %s"), proc->pidfile);
         return -1;
     }
+    if (virTimeBackOffStart(&timebackoff, 1, timeout) < 0)
+        goto error;
+    while (virTimeBackOffWait(&timebackoff)) {
+        char errbuf[1024] = { 0 };
+
+        if (virFileExists(proc->monpath))
+            break;
+
+        if (virProcessKill(proc->pid, 0) == 0)
+            continue;
+
+        ignore_value(saferead(errfd, errbuf, sizeof(errbuf) - 1));
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("Failed to start QEMU binary %s for probing: %s"),
+                       proc->binary,
+                       errbuf[0] ? errbuf : _("unknown error"));
+        goto error;
+    }
+
+    if (!virFileExists(proc->monpath)) {
+        virReportError(VIR_ERR_OPERATION_TIMEOUT, "%s",
+                       _("QEMU monitor did not show up"));
+        goto error;
+    }
+
     return 0;
+
+ error:
+    virCommandAbort(proc->cmd);
+    if (proc->pid >= 0)
+        virProcessKillPainfully(proc->pid, true);
+    if (proc->pidfile)
+        unlink(proc->pidfile);
+
+    return -1;
 }
diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h
index 289cd74eb7..f73722846b 100644
--- a/src/qemu/qemu_process.h
+++ b/src/qemu/qemu_process.h
@@ -221,7 +221,6 @@ struct _qemuProcessQMP {
     char *libDir;
     uid_t runUid;
     gid_t runGid;
-    char *stdErr;
     char *monarg;
     char *monpath;
     char *pidfile;
--
2.34.1
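Editorial sketch: the waiting loop in the patch above has three possible
outcomes, namely the monitor path appears, the process disappears, or the
time budget runs out. The standalone POSIX example below shows that shape.
It does not use libvirt's virTimeBackOff helpers; wait_for_path(), the
wait_result enum and the doubling delay are assumptions made for
illustration, and the elapsed time only counts time spent sleeping, so the
timeout is approximate.

```c
#include <errno.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

enum wait_result { WAIT_READY, WAIT_DIED, WAIT_TIMEOUT };

/*
 * Hypothetical helper (not a libvirt API): wait until `path` exists,
 * process `pid` disappears, or roughly `timeout_ms` has been slept.
 */
enum wait_result
wait_for_path(const char *path, pid_t pid, unsigned long long timeout_ms)
{
    unsigned long long slept_ms = 0;
    unsigned long long delay_ms = 1;
    struct stat sb;

    for (;;) {
        struct timespec ts;

        if (stat(path, &sb) == 0)
            return WAIT_READY;

        /* Signal 0 only probes for existence; nothing is delivered.
         * An unreaped zombie child still counts as existing here. */
        if (kill(pid, 0) < 0 && errno == ESRCH)
            return WAIT_DIED;

        if (slept_ms >= timeout_ms)
            return WAIT_TIMEOUT;

        /* Never sleep past the remaining budget. */
        if (delay_ms > timeout_ms - slept_ms)
            delay_ms = timeout_ms - slept_ms;

        ts.tv_sec = delay_ms / 1000;
        ts.tv_nsec = (long)(delay_ms % 1000) * 1000000L;
        nanosleep(&ts, NULL);

        slept_ms += delay_ms;
        delay_ms *= 2; /* back off: 1, 2, 4, ... ms */
    }
}
```

With this split, a caller can report WAIT_DIED together with whatever the
process wrote to stderr, and report WAIT_TIMEOUT as a separate error, which
mirrors the two virReportError() calls in the patch.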

On Wed, Mar 16, 2022 at 04:39:36PM +0100, Michal Privoznik wrote:
> When probing QEMU capabilities, we look at whatever <emulator/> was
> specified in the domain XML and execute it with a couple of arguments
> (-daemonize being one of them). Then, we use
> virCommandSetErrorBuffer() to read stderr of the child process, hoping
> to read a possible error message just before the process daemonizes
> itself. Well, this works as long as the emulator binary behaves.
>
> If the binary is evil and basically does the following:
>
>   #!/bin/bash
>   sleep 1h
>
> then virCommandRun() called from qemuProcessQMPLaunch() doesn't return
> for a whole hour (because it's stuck reading stderr of the child
> process). This behavior of ours is very suboptimal.
>
> The solution is to not rely on the binary behaving correctly on the
> -daemonize argument, but to daemonize the process ourselves (via
> virCommandDaemonize()) and then wait, with a timeout, for the monitor
> to show up. This in turn means that we can no longer use
> virCommandSetErrorBuffer(), but we can do the equivalent with
> virCommandSetErrorFD() and a bit of code.
>
> Sure, this doesn't shield us from malicious binaries 100%, but it
> helps prevent depletion of worker threads.
>
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
>  src/qemu/qemu_process.c | 58 +++++++++++++++++++++++++++++------------
>  src/qemu/qemu_process.h |  1 -
>  2 files changed, 42 insertions(+), 17 deletions(-)
>
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 2e149699b0..d038f7e2ae 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -9166,7 +9166,6 @@ qemuProcessQMPFree(qemuProcessQMP *proc)
>      g_free(proc->monpath);
>      g_free(proc->monarg);
>      g_free(proc->pidfile);
> -    g_free(proc->stdErr);
>      g_free(proc);
>  }
> @@ -9285,7 +9284,9 @@ static int
>  qemuProcessQMPLaunch(qemuProcessQMP *proc)
>  {
>      const char *machine;
> -    int status = 0;
> +    VIR_AUTOCLOSE errfd = -1;
> +    virTimeBackOffVar timebackoff;
> +    const unsigned long long timeout = 30 * 1000; /* ms */

The comment seems misleading; just say it is 30 seconds. On that note,
30 seconds feels like too much to me, but that's always subjective.

This patch also invalidates the comment in qemuProcessQMPInit() about
-daemonize.

With those two things fixed

Reviewed-by: Martin Kletzander <mkletzan@redhat.com>

On Wed, Mar 16, 2022 at 04:39:36PM +0100, Michal Privoznik wrote:
> When probing QEMU capabilities, we look at whatever <emulator/> was
> specified in the domain XML and execute it with a couple of arguments
> (-daemonize being one of them). Then, we use
> virCommandSetErrorBuffer() to read stderr of the child process, hoping
> to read a possible error message just before the process daemonizes
> itself. Well, this works as long as the emulator binary behaves.
>
> If the binary is evil and basically does the following:
>
>   #!/bin/bash
>   sleep 1h
>
> then virCommandRun() called from qemuProcessQMPLaunch() doesn't return
> for a whole hour (because it's stuck reading stderr of the child
> process). This behavior of ours is very suboptimal.
>
> The solution is to not rely on the binary behaving correctly on the
> -daemonize argument, but to daemonize the process ourselves (via
> virCommandDaemonize()) and then wait, with a timeout, for the monitor
> to show up. This in turn means that we can no longer use
> virCommandSetErrorBuffer(), but we can do the equivalent with
> virCommandSetErrorFD() and a bit of code.
>
> Sure, this doesn't shield us from malicious binaries 100%, but it
> helps prevent depletion of worker threads.

I don't think malicious binaries are a threat we need to even
contemplate. Any scenario that triggers this involves privileged
access to libvirt.

I absolutely don't want us to inject a timeout into the startup
process, as timeouts are inherently fragile. We worked hard to
eliminate them in normal QEMU startup, and I don't think we want them
in capabilities probing either.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
participants (3)
- Daniel P. Berrangé
- Martin Kletzander
- Michal Privoznik