[RFC 0/1] Check for pid re-use before killing domain process

Libvirt stores pid of domain(e.g Qemu) process when a domain process is started and same pid is used while destroying domain process. There is always a race possible that before libvirt tries to kill domain process, actual domain process is already dead and same pid is used for another process. In that case libvirt may kill an innocent process. With this patch we store start time of domain process when domain is started and we match stime again while killing domain process if it does not match we can assume domain process is already dead. This patch series tries to create a generic interface which can be used for non-domain processes too, even though initial patch series handles pid re-use for domain process only. Proposed changes: 1. While creating a daemon process through virExec, create a file which stores start time of the process, along with the pid file. A Separate file for start time is created instead of storing start time too in *.pid file so that existing external user of <domain-id>*.pid file does not break. 2. Persist stime in domstatus in domain xml. 3. In virProcessKill before sending signal to process also verify start time of process. For sending signal to process pidfd_send_signal is used avoid any race between verifying start time and sending signal. Following steps are taken while killing a process. 3.1 Open a pid-fd for given pid with pidfd_open. 3.2 Verify start time of pid with stored start time. If start time does not match, assume process is already dead. 3.3 Send signal to pid-fd with pidfd_send_signal. Even if pid was re-used just after verifying star time, signal will be sent to a pidfd instance which was created before verifying timestamp. manish.mishra (1): Check for pid re-use before killing qemu process src/conf/domain_conf.c | 14 ++- src/conf/domain_conf.h | 1 + src/libvirt_private.syms | 7 ++ src/qemu/qemu_domain.h | 1 + src/qemu/qemu_process.c | 49 ++++++-- src/util/vircommand.c | 37 ++++++ src/util/vircommand.h | 3 + src/util/virpidfile.c | 113 +++++++++++++----- src/util/virpidfile.h | 17 +++ src/util/virprocess.c | 73 ++++++++++- src/util/virprocess.h | 8 ++ .../qemustatusxml2xmldata/backup-pull-in.xml | 2 +- .../blockjob-blockdev-in.xml | 2 +- .../blockjob-mirror-in.xml | 2 +- .../migration-in-params-in.xml | 2 +- .../migration-out-nbd-bitmaps-in.xml | 2 +- .../migration-out-nbd-in.xml | 2 +- .../migration-out-nbd-out.xml | 2 +- .../migration-out-nbd-tls-in.xml | 2 +- .../migration-out-nbd-tls-out.xml | 2 +- .../migration-out-params-in.xml | 2 +- tests/qemustatusxml2xmldata/modern-in.xml | 2 +- tests/qemustatusxml2xmldata/upgrade-in.xml | 2 +- tests/qemustatusxml2xmldata/upgrade-out.xml | 2 +- .../qemustatusxml2xmldata/vcpus-multi-in.xml | 2 +- 25 files changed, 295 insertions(+), 56 deletions(-) -- 2.22.3

Libvirt stores pid of domain(e.g Qemu) process when a domain process is started and same pid is used while destroying domain process. There is always a race possible that before libvirt tries to kill domain process, actual domain process is already dead and same pid is used for another process. In that case libvirt may kill an innocent process. With this patch we store start time of domain process when domain is started and we match stime again while killing domain process if it does not match we can assume domain process is already dead. This patch series tries to create a generic interface which can be used for non-domain processes too, even though initial patch series handles re-use of domain process only. Signed-off-by: manish.mishra <manish.mishra@nutanix.com> --- src/conf/domain_conf.c | 14 ++- src/conf/domain_conf.h | 1 + src/libvirt_private.syms | 7 ++ src/qemu/qemu_domain.h | 1 + src/qemu/qemu_process.c | 49 ++++++-- src/util/vircommand.c | 37 ++++++ src/util/vircommand.h | 3 + src/util/virpidfile.c | 113 +++++++++++++----- src/util/virpidfile.h | 17 +++ src/util/virprocess.c | 73 ++++++++++- src/util/virprocess.h | 8 ++ .../qemustatusxml2xmldata/backup-pull-in.xml | 2 +- .../blockjob-blockdev-in.xml | 2 +- .../blockjob-mirror-in.xml | 2 +- .../migration-in-params-in.xml | 2 +- .../migration-out-nbd-bitmaps-in.xml | 2 +- .../migration-out-nbd-in.xml | 2 +- .../migration-out-nbd-out.xml | 2 +- .../migration-out-nbd-tls-in.xml | 2 +- .../migration-out-nbd-tls-out.xml | 2 +- .../migration-out-params-in.xml | 2 +- tests/qemustatusxml2xmldata/modern-in.xml | 2 +- tests/qemustatusxml2xmldata/upgrade-in.xml | 2 +- tests/qemustatusxml2xmldata/upgrade-out.xml | 2 +- .../qemustatusxml2xmldata/vcpus-multi-in.xml | 2 +- 25 files changed, 295 insertions(+), 56 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index a609cc4f68..1639da04c7 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -19031,6 +19031,13 @@ virDomainObjParseXML(xmlXPathContextPtr ctxt, } obj->pid = (pid_t)val; + if (virXPathLong("string(./@stime)", ctxt, &val) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("invalid stime")); + return NULL; + } + obj->stime = val; + if ((n = virXPathNodeSet("./taint", ctxt, &taintNodes)) < 0) return NULL; for (i = 0; i < n; i++) { @@ -27372,10 +27379,13 @@ virDomainObjFormat(virDomainObj *obj, size_t i; state = virDomainObjGetState(obj, &reason); - virBufferAsprintf(&buf, "<domstatus state='%s' reason='%s' pid='%lld'>\n", + virBufferAsprintf(&buf, + "<domstatus state='%s' reason='%s' pid='%lld' " + "stime='%lld'>\n", virDomainStateTypeToString(state), virDomainStateReasonToString(state, reason), - (long long)obj->pid); + (long long)obj->pid, + obj->stime); virBufferAdjustIndent(&buf, 2); for (i = 0; i < VIR_DOMAIN_TAINT_LAST; i++) { diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 352b88eae5..7093180df5 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -3097,6 +3097,7 @@ struct _virDomainObj { virDomainJobObj *job; pid_t pid; /* 0 for no PID, avoid negative values like -1 */ + long long stime; /* Start time for domain process to check pid re-use. */ virDomainStateReason state; unsigned int autostart : 1; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 5be40dbefe..e61e057e60 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2115,6 +2115,7 @@ virCommandSetPidFile; virCommandSetPreExecHook; virCommandSetSELinuxLabel; virCommandSetSendBuffer; +virCommandSetStimeFile; virCommandSetUID; virCommandSetUmask; virCommandSetWorkingDirectory; @@ -3128,6 +3129,9 @@ virPidFileRelease; virPidFileReleasePath; virPidFileWrite; virPidFileWritePath; +virPidStimeFileAcquirePath; +virPidStimeFileBuildPath; +virPidStimeFileReadPath; # util/virpolkit.h @@ -3161,6 +3165,9 @@ virProcessGroupKill; virProcessKill; virProcessKillPainfully; virProcessKillPainfullyDelay; +virProcessKillPainfullyDelayWithStime; +virProcessKillPainfullyWithStime; +virProcessKillWithStime; virProcessNamespaceAvailable; virProcessRunInFork; virProcessRunInMountNamespace; diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index a22deaf113..30395491d9 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -114,6 +114,7 @@ struct _qemuDomainObjPrivate { bool beingDestroyed; char *pidfile; + char *stimefile; virDomainPCIAddressSet *pciaddrs; virDomainUSBAddressSet *usbaddrs; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 97336e2622..d0f60832fb 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -129,6 +129,12 @@ qemuProcessRemoveDomainStatus(virQEMUDriver *driver, errno != ENOENT) VIR_WARN("Failed to remove PID file for %s: %s", vm->def->name, g_strerror(errno)); + + if (priv->stimefile && + unlink(priv->stimefile) < 0 && + errno != ENOENT) + VIR_WARN("Failed to remove Stime file for %s: %s", + vm->def->name, g_strerror(errno)); } @@ -7124,6 +7130,21 @@ qemuProcessPrepareHost(virQEMUDriver *driver, return -1; } + VIR_FREE(priv->stimefile); + if (!(priv->stimefile = virPidStimeFileBuildPath(cfg->stateDir, vm->def->name))) { + virReportSystemError(errno, + "%s", _("Failed to build stimefile path.")); + return -1; + } + + if (unlink(priv->stimefile) < 0 && + errno != ENOENT) { + virReportSystemError(errno, + _("Cannot remove stale Stime file %s"), + priv->stimefile); + return -1; + } + VIR_DEBUG("Write domain masterKey"); if (qemuDomainWriteMasterKeyFile(driver, vm) < 0) return -1; @@ -7550,6 +7571,7 @@ qemuProcessLaunch(virConnectPtr conn, virCommandSetErrorFD(cmd, &logfile); virCommandNonblockingFDs(cmd); virCommandSetPidFile(cmd, priv->pidfile); + virCommandSetStimeFile(cmd, priv->stimefile); virCommandDaemonize(cmd); virCommandRequireHandshake(cmd); @@ -7574,6 +7596,12 @@ qemuProcessLaunch(virConnectPtr conn, goto cleanup; } + if ((rv = virPidStimeFileReadPath(priv->stimefile, &vm->stime)) < 0) { + VIR_DEBUG("Failed to get stime for pid=%lld vm=%p name=%s", + (long long)vm->pid, vm, vm->def->name); + goto cleanup; + } + VIR_DEBUG("Writing early domain status to disk"); if (virDomainObjSave(vm, driver->xmlopt, cfg->stateDir) < 0) goto cleanup; @@ -8009,18 +8037,21 @@ qemuProcessKill(virDomainObj *vm, unsigned int flags) } if (flags & VIR_QEMU_PROCESS_KILL_NOWAIT) { - virProcessKill(vm->pid, - (flags & VIR_QEMU_PROCESS_KILL_FORCE) ? - SIGKILL : SIGTERM); + virProcessKillWithStime(vm->pid, + (flags & VIR_QEMU_PROCESS_KILL_FORCE) ? + SIGKILL : SIGTERM, + vm->stime); return 0; } /* Request an extra delay of two seconds per current nhostdevs * to be safe against stalls by the kernel freeing up the resources */ - return virProcessKillPainfullyDelay(vm->pid, - !!(flags & VIR_QEMU_PROCESS_KILL_FORCE), - vm->def->nhostdevs * 2, - false); + return virProcessKillPainfullyDelayWithStime(vm->pid, + !!(flags & + VIR_QEMU_PROCESS_KILL_FORCE), + vm->def->nhostdevs * 2, + false, + vm->stime); } @@ -8702,6 +8733,10 @@ qemuProcessReconnect(void *opaque) if (!(priv->pidfile = virPidFileBuildPath(cfg->stateDir, obj->def->name))) goto error; + if (!(priv->stimefile = + virPidStimeFileBuildPath(cfg->stateDir, obj->def->name))) + goto error; + /* Restore the masterKey */ if (qemuDomainMasterKeyReadFile(priv) < 0) goto error; diff --git a/src/util/vircommand.c b/src/util/vircommand.c index 4e003266bf..1c95833457 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -125,6 +125,7 @@ struct _virCommand { pid_t pid; char *pidfile; + char *stimefile; bool reap; bool rawStatus; @@ -803,6 +804,8 @@ virExec(virCommand *cmd) if (cmd->pidfile) { int pidfilefd = -1; + int stimefilefd = -1; + long long unsigned stime = 0; char c; pidfilefd = virPidFileAcquirePath(cmd->pidfile, false, pid); @@ -814,6 +817,22 @@ virExec(virCommand *cmd) goto fork_error; } + if (cmd->stimefile) { + /* If stimefile is present, get start time of process and store + * in stimefile. + */ + if (virProcessGetStartTime(pid, &stime) < 0) { + virReportSystemError(errno, + _("Failed to get start time for pid %d"), + pid); + goto fork_error; + } + stimefilefd = virPidStimeFileAcquirePath(cmd->stimefile, false, + (long long)stime); + if (stimefilefd < 0) + goto fork_error; + } + c = '1'; if (safewrite(pipesync[1], &c, sizeof(c)) != sizeof(c)) { virReportSystemError(errno, "%s", _("Unable to notify child process")); @@ -1073,6 +1092,17 @@ virCommandSetPidFile(virCommand *cmd, const char *pidfile) } +void +virCommandSetStimeFile(virCommand *cmd, const char *stimefile) +{ + if (virCommandHasError(cmd)) + return; + + VIR_FREE(cmd->stimefile); + cmd->stimefile = g_strdup(stimefile); +} + + gid_t virCommandGetGID(virCommand *cmd) { @@ -2553,6 +2583,12 @@ virCommandRunAsync(virCommand *cmd, pid_t *pid) goto cleanup; } + if (cmd->stimefile && !cmd->pidfile) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Stime file can not be created without pidfile")); + goto cleanup; + } + if (dryRunBuffer || dryRunCallback) { g_autofree char *cmdstr = NULL; dryRunStatus = 0; @@ -3008,6 +3044,7 @@ virCommandFree(virCommand *cmd) } g_free(cmd->pidfile); + g_free(cmd->stimefile); if (cmd->reap) virCommandAbort(cmd); diff --git a/src/util/vircommand.h b/src/util/vircommand.h index c7a580e152..1b3bc285a5 100644 --- a/src/util/vircommand.h +++ b/src/util/vircommand.h @@ -59,6 +59,9 @@ void virCommandPassFD(virCommand *cmd, void virCommandSetPidFile(virCommand *cmd, const char *pidfile) ATTRIBUTE_NONNULL(2); +void virCommandSetStimeFile(virCommand *cmd, + const char *stimefile) ATTRIBUTE_NONNULL(2); + gid_t virCommandGetGID(virCommand *cmd) ATTRIBUTE_NONNULL(1); uid_t virCommandGetUID(virCommand *cmd) ATTRIBUTE_NONNULL(1); diff --git a/src/util/virpidfile.c b/src/util/virpidfile.c index bfd967c1af..b6be32708a 100644 --- a/src/util/virpidfile.c +++ b/src/util/virpidfile.c @@ -40,17 +40,31 @@ VIR_LOG_INIT("util.pidfile"); -char *virPidFileBuildPath(const char *dir, const char* name) +char *virPidFileBuildPathCommon(const char *dir, const char* name, + const char* delim) { g_auto(virBuffer) buf = VIR_BUFFER_INITIALIZER; virBufferAsprintf(&buf, "%s", dir); - virBufferEscapeString(&buf, "/%s.pid", name); + virBufferEscapeString(&buf, "/%s", name); + virBufferAsprintf(&buf, ".%s", delim); return virBufferContentAndReset(&buf); } +char *virPidFileBuildPath(const char *dir, const char* name) +{ + return virPidFileBuildPathCommon(dir, name, "pid"); +} + + +char *virPidStimeFileBuildPath(const char *dir, const char* name) +{ + return virPidFileBuildPathCommon(dir, name, "stime"); +} + + int virPidFileWritePath(const char *pidfile, pid_t pid) { @@ -102,39 +116,38 @@ int virPidFileWrite(const char *dir, } -int virPidFileReadPath(const char *path, - pid_t *pid) +int virPidFileReadPathCommon(const char *path, + long long *value) { int fd; int rc; ssize_t bytes; - long long pid_value = 0; - char pidstr[VIR_INT64_STR_BUFLEN]; + long long val = 0; + char valuestr[VIR_INT64_STR_BUFLEN]; char *endptr = NULL; - *pid = 0; + *value = 0; if ((fd = open(path, O_RDONLY)) < 0) { rc = -errno; goto cleanup; } - bytes = saferead(fd, pidstr, sizeof(pidstr)); + bytes = saferead(fd, valuestr, sizeof(valuestr)); if (bytes < 0) { rc = -errno; VIR_FORCE_CLOSE(fd); goto cleanup; } - pidstr[bytes] = '\0'; + valuestr[bytes] = '\0'; - if (virStrToLong_ll(pidstr, &endptr, 10, &pid_value) < 0 || - !(*endptr == '\0' || g_ascii_isspace(*endptr)) || - (pid_t) pid_value != pid_value) { + if (virStrToLong_ll(valuestr, &endptr, 10, &val) < 0 || + !(*endptr == '\0' || g_ascii_isspace(*endptr))) { rc = -EINVAL; goto cleanup; } - *pid = pid_value; + *value = val; rc = 0; cleanup: @@ -145,6 +158,31 @@ int virPidFileReadPath(const char *path, } +int virPidFileReadPath(const char *path, + pid_t *pid) +{ + long long val = 0; + int rc = 0; + + if ((rc = virPidFileReadPathCommon(path, &val))) { + return rc; + } + if ((pid_t)val != val) { + return -1; + } + *pid = val; + + return 0; +} + + +int virPidStimeFileReadPath(const char *path, + long long *val) +{ + return virPidFileReadPathCommon(path, val); +} + + int virPidFileRead(const char *dir, const char *name, pid_t *pid) @@ -362,12 +400,13 @@ int virPidFileDelete(const char *dir, return virPidFileDeletePath(pidfile); } -int virPidFileAcquirePath(const char *path, - bool waitForLock, - pid_t pid) + +int virPidFileAcquirePathCommon(const char *path, + bool waitForLock, + long long val) { int fd = -1; - char pidstr[VIR_INT64_STR_BUFLEN]; + char valstr[VIR_INT64_STR_BUFLEN]; if (path[0] == '\0') return 0; @@ -376,7 +415,7 @@ int virPidFileAcquirePath(const char *path, struct stat a, b; if ((fd = open(path, O_WRONLY|O_CREAT, 0644)) < 0) { virReportSystemError(errno, - _("Failed to open pid file '%s'"), + _("Failed to open file '%s'"), path); return -1; } @@ -391,7 +430,7 @@ int virPidFileAcquirePath(const char *path, if (fstat(fd, &b) < 0) { virReportSystemError(errno, - _("Unable to check status of pid file '%s'"), + _("Unable to check status of file '%s'"), path); VIR_FORCE_CLOSE(fd); return -1; @@ -399,17 +438,17 @@ int virPidFileAcquirePath(const char *path, if (virFileLock(fd, false, 0, 1, waitForLock) < 0) { virReportSystemError(errno, - _("Failed to acquire pid file '%s'"), + _("Failed to acquire file '%s'"), path); VIR_FORCE_CLOSE(fd); return -1; } - /* Now make sure the pidfile we locked is the same + /* Now make sure the file we locked is the same * one that now exists on the filesystem */ if (stat(path, &a) < 0) { - VIR_DEBUG("Pid file '%s' disappeared: %s", + VIR_DEBUG("File '%s' disappeared: %s", path, g_strerror(errno)); VIR_FORCE_CLOSE(fd); /* Someone else must be racing with us, so try again */ @@ -419,24 +458,24 @@ int virPidFileAcquirePath(const char *path, if (a.st_ino == b.st_ino) break; - VIR_DEBUG("Pid file '%s' was recreated", path); + VIR_DEBUG("File '%s' was recreated", path); VIR_FORCE_CLOSE(fd); /* Someone else must be racing with us, so try again */ } - g_snprintf(pidstr, sizeof(pidstr), "%lld", (long long) pid); + g_snprintf(valstr, sizeof(valstr), "%lld", val); if (ftruncate(fd, 0) < 0) { virReportSystemError(errno, - _("Failed to truncate pid file '%s'"), + _("Failed to truncate file '%s'"), path); VIR_FORCE_CLOSE(fd); return -1; } - if (safewrite(fd, pidstr, strlen(pidstr)) < 0) { + if (safewrite(fd, valstr, strlen(valstr)) < 0) { virReportSystemError(errno, - _("Failed to write to pid file '%s'"), + _("Failed to write to file '%s'"), path); VIR_FORCE_CLOSE(fd); } @@ -445,6 +484,26 @@ int virPidFileAcquirePath(const char *path, } +int virPidFileAcquirePath(const char *path, + bool waitForLock, + pid_t pid) +{ + return virPidFileAcquirePathCommon(path, + waitForLock, + (long long)pid); +} + + +int virPidStimeFileAcquirePath(const char *path, + bool waitForLock, + long long stime) +{ + return virPidFileAcquirePathCommon(path, + waitForLock, + stime); +} + + int virPidFileAcquire(const char *dir, const char *name, bool waitForLock, diff --git a/src/util/virpidfile.h b/src/util/virpidfile.h index e84542f298..7c8fc2ddf2 100644 --- a/src/util/virpidfile.h +++ b/src/util/virpidfile.h @@ -26,8 +26,13 @@ #include <sys/types.h> #include "internal.h" +char *virPidFileBuildPathCommon(const char *dir, + const char *name, + const char* delim); char *virPidFileBuildPath(const char *dir, const char *name); +char *virPidStimeFileBuildPath(const char *dir, + const char* name); int virPidFileWritePath(const char *path, pid_t pid) G_GNUC_WARN_UNUSED_RESULT; @@ -35,8 +40,13 @@ int virPidFileWrite(const char *dir, const char *name, pid_t pid) G_GNUC_WARN_UNUSED_RESULT; +int virPidFileReadPathCommon(const char *path, + long long *val) G_GNUC_WARN_UNUSED_RESULT; int virPidFileReadPath(const char *path, pid_t *pid) G_GNUC_WARN_UNUSED_RESULT; +int virPidStimeFileReadPath(const char *path, + long long *val) G_GNUC_WARN_UNUSED_RESULT; + int virPidFileRead(const char *dir, const char *name, pid_t *pid) G_GNUC_WARN_UNUSED_RESULT; @@ -56,9 +66,16 @@ int virPidFileDelete(const char *dir, const char *name); +int virPidFileAcquirePathCommon(const char *path, + bool waitForLock, + long long val) G_GNUC_WARN_UNUSED_RESULT; int virPidFileAcquirePath(const char *path, bool waitForLock, pid_t pid) G_GNUC_WARN_UNUSED_RESULT; +int virPidStimeFileAcquirePath(const char *path, + bool waitForLock, + long long stime) G_GNUC_WARN_UNUSED_RESULT; + int virPidFileAcquire(const char *dir, const char *name, bool waitForLock, diff --git a/src/util/virprocess.c b/src/util/virprocess.c index 11f36e00a8..9507d1d600 100644 --- a/src/util/virprocess.c +++ b/src/util/virprocess.c @@ -24,6 +24,10 @@ #include <fcntl.h> #include <signal.h> + +#ifndef WIN32 +# include <sys/syscall.h> +#endif #ifndef WIN32 # include <sys/wait.h> #endif @@ -232,6 +236,19 @@ virProcessWait(pid_t pid, int *exitstatus, bool raw) return -1; } +int virPidFdkill(int pidfd, int sig) +{ + siginfo_t siginfo; + + siginfo.si_code = SI_QUEUE; + siginfo.si_signo = sig; + siginfo.si_errno = 0; + siginfo.si_pid = getpid(); + siginfo.si_uid = getuid(); + + return syscall(SYS_pidfd_send_signal, pidfd, sig, &siginfo, 0); +} + #else /* WIN32 */ char * @@ -259,10 +276,18 @@ virProcessWait(pid_t pid, int *exitstatus G_GNUC_UNUSED, bool raw G_GNUC_UNUSED) #endif /* WIN32 */ - -/* send signal to a single process */ -int virProcessKill(pid_t pid, int sig) +/* send signal to a single process + * + * If a valid stime value is provided, verify start time of process + * before sending signal to it. Value 0 for stime is considered as + * invalid. + */ +int virProcessKillWithStime(pid_t pid, int sig, + long long stime) { + int pidfd; + unsigned long long starttime; + if (pid <= 1) { errno = ESRCH; return -1; @@ -312,7 +337,21 @@ int virProcessKill(pid_t pid, int sig) } return 0; #else - return kill(pid, sig); + pidfd = syscall(SYS_pidfd_open, pid, 0); + if (pidfd == -1) { + return -1; + } + + if (stime && (virProcessGetStartTime(pid, &starttime) >= 0) && + (long long)starttime != stime) { + /* If start time of process does not match, it means actual + * process is already dead and pid has been re-used. + */ + errno = ESRCH; + return -1; + } + + return virPidFdkill(pidfd, sig); #endif } @@ -334,6 +373,12 @@ int virProcessGroupKill(pid_t pid, int sig G_GNUC_UNUSED) } +int virProcessKill(pid_t pid, int sig) +{ + return virProcessKillWithStime(pid, sig, 0); +} + + /* get process group from a pid */ pid_t virProcessGroupGet(pid_t pid) { @@ -362,7 +407,10 @@ pid_t virProcessGroupGet(pid_t pid) * wait longer than the default. */ int -virProcessKillPainfullyDelay(pid_t pid, bool force, unsigned int extradelay, bool group) +virProcessKillPainfullyDelayWithStime(pid_t pid, bool force, + unsigned int extradelay, + bool group, + long long stime) { size_t i; /* This is in 1/5th seconds since polling is on a 0.2s interval */ @@ -408,7 +456,7 @@ virProcessKillPainfullyDelay(pid_t pid, bool force, unsigned int extradelay, boo if (group) rc = virProcessGroupKill(pid, signum); else - rc = virProcessKill(pid, signum); + rc = virProcessKillWithStime(pid, signum, stime); if (rc < 0) { if (errno != ESRCH) { @@ -431,11 +479,24 @@ virProcessKillPainfullyDelay(pid_t pid, bool force, unsigned int extradelay, boo } +int virProcessKillPainfullyDelay(pid_t pid, bool force, + unsigned int extradelay, + bool group) +{ + return virProcessKillPainfullyDelayWithStime(pid, force, extradelay, + group, 0); +} + int virProcessKillPainfully(pid_t pid, bool force) { return virProcessKillPainfullyDelay(pid, force, 0, false); } +int virProcessKillPainfullyWithStime(pid_t pid, bool force, long long int stime) +{ + return virProcessKillPainfullyDelayWithStime(pid, force, 0, false, stime); +} + #if WITH_DECL_CPU_SET_T int virProcessSetAffinity(pid_t pid, virBitmap *map, bool quiet) diff --git a/src/util/virprocess.h b/src/util/virprocess.h index 91ad5618db..0e550c4806 100644 --- a/src/util/virprocess.h +++ b/src/util/virprocess.h @@ -52,14 +52,21 @@ virProcessWait(pid_t pid, int *exitstatus, bool raw) G_GNUC_WARN_UNUSED_RESULT; int virProcessKill(pid_t pid, int sig); +int virProcessKillWithStime(pid_t pid, int sig, long long stime); int virProcessGroupKill(pid_t pid, int sig); pid_t virProcessGroupGet(pid_t pid); int virProcessKillPainfully(pid_t pid, bool force); +int virProcessKillPainfullyWithStime(pid_t pid, bool force, long long stime); int virProcessKillPainfullyDelay(pid_t pid, bool force, unsigned int extradelay, bool group); +int virProcessKillPainfullyDelayWithStime(pid_t pid, + bool force, + unsigned int extradelay, + bool group, + long long stime); int virProcessSetAffinity(pid_t pid, virBitmap *map, bool quiet); @@ -204,3 +211,4 @@ int virProcessGetStatInfo(unsigned long long *cpuTime, int virProcessGetSchedInfo(unsigned long long *cpuWait, pid_t pid, pid_t tid); +int virPidFdkill(int pidfd, int sig); diff --git a/tests/qemustatusxml2xmldata/backup-pull-in.xml b/tests/qemustatusxml2xmldata/backup-pull-in.xml index e7fdc6c478..b11077577a 100644 --- a/tests/qemustatusxml2xmldata/backup-pull-in.xml +++ b/tests/qemustatusxml2xmldata/backup-pull-in.xml @@ -1,4 +1,4 @@ -<domstatus state='running' reason='booted' pid='7690'> +<domstatus state='running' reason='booted' pid='7690' stime='24731'> <taint flag='high-privileges'/> <monitor path='/var/lib/libvirt/qemu/domain-4-copy/monitor.sock' type='unix'/> <namespaces> diff --git a/tests/qemustatusxml2xmldata/blockjob-blockdev-in.xml b/tests/qemustatusxml2xmldata/blockjob-blockdev-in.xml index b62b3149c2..361ca8641e 100644 --- a/tests/qemustatusxml2xmldata/blockjob-blockdev-in.xml +++ b/tests/qemustatusxml2xmldata/blockjob-blockdev-in.xml @@ -1,4 +1,4 @@ -<domstatus state='running' reason='booted' pid='7690'> +<domstatus state='running' reason='booted' pid='7690' stime='24731'> <taint flag='high-privileges'/> <monitor path='/var/lib/libvirt/qemu/domain-4-copy/monitor.sock' type='unix'/> <namespaces> diff --git a/tests/qemustatusxml2xmldata/blockjob-mirror-in.xml b/tests/qemustatusxml2xmldata/blockjob-mirror-in.xml index e9f1d77856..8f165be2d8 100644 --- a/tests/qemustatusxml2xmldata/blockjob-mirror-in.xml +++ b/tests/qemustatusxml2xmldata/blockjob-mirror-in.xml @@ -1,4 +1,4 @@ -<domstatus state='running' reason='booted' pid='3803518'> +<domstatus state='running' reason='booted' pid='3803518' stime='24731'> <taint flag='high-privileges'/> <monitor path='/var/lib/libvirt/qemu/test.monitor' type='unix'/> <vcpus> diff --git a/tests/qemustatusxml2xmldata/migration-in-params-in.xml b/tests/qemustatusxml2xmldata/migration-in-params-in.xml index 03773a089b..60d900453d 100644 --- a/tests/qemustatusxml2xmldata/migration-in-params-in.xml +++ b/tests/qemustatusxml2xmldata/migration-in-params-in.xml @@ -1,4 +1,4 @@ -<domstatus state='paused' reason='migration' pid='2296'> +<domstatus state='paused' reason='migration' pid='2296' stime='24731'> <taint flag='high-privileges'/> <taint flag='host-cpu'/> <monitor path='/var/lib/libvirt/qemu/domain-1-nest/monitor.sock' type='unix'/> diff --git a/tests/qemustatusxml2xmldata/migration-out-nbd-bitmaps-in.xml b/tests/qemustatusxml2xmldata/migration-out-nbd-bitmaps-in.xml index 4ee44ffbd4..eb62f2aec8 100644 --- a/tests/qemustatusxml2xmldata/migration-out-nbd-bitmaps-in.xml +++ b/tests/qemustatusxml2xmldata/migration-out-nbd-bitmaps-in.xml @@ -1,4 +1,4 @@ -<domstatus state='paused' reason='migration' pid='2458694'> +<domstatus state='paused' reason='migration' pid='2458694' stime='24731'> <monitor path='/var/lib/libvirt/qemu/domain-11-migr/monitor.sock' type='unix'/> <namespaces> <mount/> diff --git a/tests/qemustatusxml2xmldata/migration-out-nbd-in.xml b/tests/qemustatusxml2xmldata/migration-out-nbd-in.xml index 636accf054..0d9e9e1acf 100644 --- a/tests/qemustatusxml2xmldata/migration-out-nbd-in.xml +++ b/tests/qemustatusxml2xmldata/migration-out-nbd-in.xml @@ -1,4 +1,4 @@ -<domstatus state='running' reason='booted' pid='15433'> +<domstatus state='running' reason='booted' pid='15433' stime='24731'> <taint flag='high-privileges'/> <monitor path='/var/lib/libvirt/qemu/domain-4-upstream/monitor.sock' json='1' type='unix'/> <vcpus> diff --git a/tests/qemustatusxml2xmldata/migration-out-nbd-out.xml b/tests/qemustatusxml2xmldata/migration-out-nbd-out.xml index de92146eaa..e5cc8dd854 100644 --- a/tests/qemustatusxml2xmldata/migration-out-nbd-out.xml +++ b/tests/qemustatusxml2xmldata/migration-out-nbd-out.xml @@ -1,4 +1,4 @@ -<domstatus state='running' reason='booted' pid='15433'> +<domstatus state='running' reason='booted' pid='15433' stime='24731'> <taint flag='high-privileges'/> <monitor path='/var/lib/libvirt/qemu/domain-4-upstream/monitor.sock' type='unix'/> <vcpus> diff --git a/tests/qemustatusxml2xmldata/migration-out-nbd-tls-in.xml b/tests/qemustatusxml2xmldata/migration-out-nbd-tls-in.xml index 2cd6c9a5e9..f24bf6a9fb 100644 --- a/tests/qemustatusxml2xmldata/migration-out-nbd-tls-in.xml +++ b/tests/qemustatusxml2xmldata/migration-out-nbd-tls-in.xml @@ -1,4 +1,4 @@ -<domstatus state='running' reason='booted' pid='68472'> +<domstatus state='running' reason='booted' pid='68472' stime='24731'> <taint flag='high-privileges'/> <monitor path='/var/lib/libvirt/qemu/domain-3-upstream/monitor.sock' json='1' type='unix'/> <namespaces> diff --git a/tests/qemustatusxml2xmldata/migration-out-nbd-tls-out.xml b/tests/qemustatusxml2xmldata/migration-out-nbd-tls-out.xml index 6bdd128259..13a0ad10ea 100644 --- a/tests/qemustatusxml2xmldata/migration-out-nbd-tls-out.xml +++ b/tests/qemustatusxml2xmldata/migration-out-nbd-tls-out.xml @@ -1,4 +1,4 @@ -<domstatus state='running' reason='booted' pid='68472'> +<domstatus state='running' reason='booted' pid='68472' stime='24731'> <taint flag='high-privileges'/> <monitor path='/var/lib/libvirt/qemu/domain-3-upstream/monitor.sock' type='unix'/> <namespaces> diff --git a/tests/qemustatusxml2xmldata/migration-out-params-in.xml b/tests/qemustatusxml2xmldata/migration-out-params-in.xml index 24ee86e4c0..6a1888de1b 100644 --- a/tests/qemustatusxml2xmldata/migration-out-params-in.xml +++ b/tests/qemustatusxml2xmldata/migration-out-params-in.xml @@ -1,4 +1,4 @@ -<domstatus state='paused' reason='migration' pid='21586'> +<domstatus state='paused' reason='migration' pid='21586' stime='24731'> <taint flag='high-privileges'/> <taint flag='host-cpu'/> <monitor path='/var/lib/libvirt/qemu/domain-7-nest/monitor.sock' type='unix'/> diff --git a/tests/qemustatusxml2xmldata/modern-in.xml b/tests/qemustatusxml2xmldata/modern-in.xml index 7759034f7a..139a1ff441 100644 --- a/tests/qemustatusxml2xmldata/modern-in.xml +++ b/tests/qemustatusxml2xmldata/modern-in.xml @@ -1,4 +1,4 @@ -<domstatus state='running' reason='booted' pid='195139'> +<domstatus state='running' reason='booted' pid='195139' stime='24731'> <taint flag='high-privileges'/> <monitor path='/var/lib/libvirt/qemu/domain-1-upstream/monitor.sock' type='unix'/> <namespaces> diff --git a/tests/qemustatusxml2xmldata/upgrade-in.xml b/tests/qemustatusxml2xmldata/upgrade-in.xml index 7fa56429f4..08d089392c 100644 --- a/tests/qemustatusxml2xmldata/upgrade-in.xml +++ b/tests/qemustatusxml2xmldata/upgrade-in.xml @@ -1,4 +1,4 @@ -<domstatus state='running' reason='booted' pid='195139'> +<domstatus state='running' reason='booted' pid='195139' stime='24731'> <monitor path='/var/lib/libvirt/qemu/domain-1-upstream/monitor.sock' json='1' type='unix'/> <namespaces> <mount/> diff --git a/tests/qemustatusxml2xmldata/upgrade-out.xml b/tests/qemustatusxml2xmldata/upgrade-out.xml index e663b3dbb5..fb06b2f1c1 100644 --- a/tests/qemustatusxml2xmldata/upgrade-out.xml +++ b/tests/qemustatusxml2xmldata/upgrade-out.xml @@ -1,4 +1,4 @@ -<domstatus state='running' reason='booted' pid='195139'> +<domstatus state='running' reason='booted' pid='195139' stime='24731'> <monitor path='/var/lib/libvirt/qemu/domain-1-upstream/monitor.sock' type='unix'/> <namespaces> <mount/> diff --git a/tests/qemustatusxml2xmldata/vcpus-multi-in.xml b/tests/qemustatusxml2xmldata/vcpus-multi-in.xml index e71d6a83c7..de6679d0fa 100644 --- a/tests/qemustatusxml2xmldata/vcpus-multi-in.xml +++ b/tests/qemustatusxml2xmldata/vcpus-multi-in.xml @@ -1,4 +1,4 @@ -<domstatus state='running' reason='booted' pid='3803518'> +<domstatus state='running' reason='booted' pid='3803518' stime='24731'> <taint flag='high-privileges'/> <monitor path='/var/lib/libvirt/qemu/test.monitor' type='unix'/> <vcpus> -- 2.22.3

I believe that pidfd syscalls were introduced in kernel 5.2. Judging by our CI build setup, the oldest distrubution that we support is Alma Linux 8, which still has kernel version 4.18. On 10/6/22 4:33 AM, manish.mishra wrote:
Libvirt stores pid of domain(e.g Qemu) process when a domain process is started and same pid is used while destroying domain process. There is always a race possible that before libvirt tries to kill domain process, actual domain process is already dead and same pid is used for another process. In that case libvirt may kill an innocent process.
With this patch we store start time of domain process when domain is started and we match stime again while killing domain process if it does not match we can assume domain process is already dead.
This patch series tries to create a generic interface which can be used for non-domain processes too, even though initial patch series handles pid re-use for domain process only.
Proposed changes: 1. While creating a daemon process through virExec, create a file which stores start time of the process, along with the pid file. A Separate file for start time is created instead of storing start time too in *.pid file so that existing external user of <domain-id>*.pid file does not break. 2. Persist stime in domstatus in domain xml. 3. In virProcessKill before sending signal to process also verify start time of process. For sending signal to process pidfd_send_signal is used avoid any race between verifying start time and sending signal. Following steps are taken while killing a process. 3.1 Open a pid-fd for given pid with pidfd_open. 3.2 Verify start time of pid with stored start time. If start time does not match, assume process is already dead. 3.3 Send signal to pid-fd with pidfd_send_signal. Even if pid was re-used just after verifying star time, signal will be sent to a pidfd instance which was created before verifying timestamp.
manish.mishra (1): Check for pid re-use before killing qemu process
src/conf/domain_conf.c | 14 ++- src/conf/domain_conf.h | 1 + src/libvirt_private.syms | 7 ++ src/qemu/qemu_domain.h | 1 + src/qemu/qemu_process.c | 49 ++++++-- src/util/vircommand.c | 37 ++++++ src/util/vircommand.h | 3 + src/util/virpidfile.c | 113 +++++++++++++----- src/util/virpidfile.h | 17 +++ src/util/virprocess.c | 73 ++++++++++- src/util/virprocess.h | 8 ++ .../qemustatusxml2xmldata/backup-pull-in.xml | 2 +- .../blockjob-blockdev-in.xml | 2 +- .../blockjob-mirror-in.xml | 2 +- .../migration-in-params-in.xml | 2 +- .../migration-out-nbd-bitmaps-in.xml | 2 +- .../migration-out-nbd-in.xml | 2 +- .../migration-out-nbd-out.xml | 2 +- .../migration-out-nbd-tls-in.xml | 2 +- .../migration-out-nbd-tls-out.xml | 2 +- .../migration-out-params-in.xml | 2 +- tests/qemustatusxml2xmldata/modern-in.xml | 2 +- tests/qemustatusxml2xmldata/upgrade-in.xml | 2 +- tests/qemustatusxml2xmldata/upgrade-out.xml | 2 +- .../qemustatusxml2xmldata/vcpus-multi-in.xml | 2 +- 25 files changed, 295 insertions(+), 56 deletions(-)

On Tue, Oct 11, 2022 at 11:20:00AM -0500, Jonathon Jongsma wrote:
I believe that pidfd syscalls were introduced in kernel 5.2. Judging by our CI build setup, the oldest distrubution that we support is Alma Linux 8, which still has kernel version 4.18.
We need to support FreeBSD / macOS for the QEMU driver too, which becomes the same problem. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Thanks for review Jonathon, Daniel On 11/10/22 9:56 pm, Daniel P. Berrangé wrote:
On Tue, Oct 11, 2022 at 11:20:00AM -0500, Jonathon Jongsma wrote:
I believe that pidfd syscalls were introduced in kernel 5.2. Judging by our CI build setup, the oldest distrubution that we support is Alma Linux 8, which still has kernel version 4.18.
I did not know that, will it okay to put it in some condional check. Pidfd is used anyway to remove a very small window race, more important one is start time comparison. As these kind of issues can be very easily reproducible if domain process dies when libvirt is dead and by time libvirt comes pid is re-used. So atleast start time checking reduce race window even for older kernel.
We need to support FreeBSD / macOS for the QEMU driver too, which becomes the same problem.
Sorry Daniel i could not understand much of this, can you please give some more detail.
With regards, Daniel
Thanks Manish Mishra

On Tue, Oct 11, 2022 at 10:11:29PM +0530, manish.mishra wrote:
Thanks for review Jonathon, Daniel
On 11/10/22 9:56 pm, Daniel P. Berrangé wrote:
On Tue, Oct 11, 2022 at 11:20:00AM -0500, Jonathon Jongsma wrote:
I believe that pidfd syscalls were introduced in kernel 5.2. Judging by our CI build setup, the oldest distrubution that we support is Alma Linux 8, which still has kernel version 4.18.
I did not know that, will it okay to put it in some condional check. Pidfd is used anyway to remove a very small window race, more important one is start time comparison. As these kind of issues can be very easily reproducible if domain process dies when libvirt is dead and by time libvirt comes pid is re-used. So atleast start time checking reduce race window even for older kernel.
I presume you're seeing this on fairly old kernels ? On modern kernels, pid_max is ~4 billion, instead of 64k, so the chances of seeing PID reuse is tiny. If the risk is primarily from the situation where libvirtd was shutoff at the time QEMU stopped, then we ought to read /proc/$PID/stat in qemuProcessReconnect() and entirely skip any attempt to reconnect if we see the starttime has changed while we were stopped. Of course needs to be skipped on non-Linux. Also having an conditionally compiled pidfd check during the kill() path would be a second safety net, for modern Linux.
We need to support FreeBSD / macOS for the QEMU driver too, which becomes the same problem.
Sorry Daniel i could not understand much of this, can you please give some
Libvirt targets multiple platforms, not merely Linux. the QEMU driver is expected to run on Linux, FreeBSD and macOS, so cannot rely on Linux specific functionality - that has to be conditionally built. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 11/10/22 10:17 pm, Daniel P. Berrangé wrote:
On Tue, Oct 11, 2022 at 10:11:29PM +0530, manish.mishra wrote:
Thanks for review Jonathon, Daniel
On Tue, Oct 11, 2022 at 11:20:00AM -0500, Jonathon Jongsma wrote:
I believe that pidfd syscalls were introduced in kernel 5.2. Judging by our CI build setup, the oldest distrubution that we support is Alma Linux 8, which still has kernel version 4.18. I did not know that, will it okay to put it in some condional check. Pidfd is used anyway to remove a very small window race, more important one is start time comparison. As these kind of issues can be very easily reproducible if domain process dies when libvirt is dead and by time
On 11/10/22 9:56 pm, Daniel P. Berrangé wrote: libvirt comes pid is re-used. So atleast start time checking reduce race window even for older kernel. I presume you're seeing this on fairly old kernels ? On modern kernels, pid_max is ~4 billion, instead of 64k, so the chances of seeing PID reuse is tiny.
If the risk is primarily from the situation where libvirtd was shutoff at the time QEMU stopped, then we ought to read /proc/$PID/stat in qemuProcessReconnect() and entirely skip any attempt to reconnect if we see the starttime has changed while we were stopped. Of course needs to be skipped on non-Linux.
Also having an conditionally compiled pidfd check during the kill() path would be a second safety net, for modern Linux.
We need to support FreeBSD / macOS for the QEMU driver too, which becomes the same problem. Sorry Daniel i could not understand much of this, can you please give some Libvirt targets multiple platforms, not merely Linux. the QEMU driver is expected to run on Linux, FreeBSD and macOS, so cannot rely on Linux specific functionality - that has to be conditionally built.
Yes got it, this clarifies, thanks Daniel, I currently i had for non-windows platform, sure i will make is for specific, also check for kernel version. Are you okay to create a separate file for stime. I mean i have to keep this keeping in mind that i can not change .pid file and libvirt can restart any time so persisting in just domain xml by sharing stime with pipes may not be full proof.
With regards, Daniel
participants (3)
-
Daniel P. Berrangé
-
Jonathon Jongsma
-
manish.mishra