[libvirt PATCH 0/3] wait longer for virtiofsd to exit (virtio-fs epopee)

See patch 2/3. Ján Tomko (3): Introduce virPidFileForceCleanupPathDelay qemu: wait more for virtiofsd to exit util: fix typo src/libvirt_private.syms | 1 + src/qemu/qemu_virtiofs.c | 2 +- src/util/virpidfile.c | 16 +++++++++++++++- src/util/virpidfile.h | 2 ++ src/util/virprocess.c | 2 +- 5 files changed, 20 insertions(+), 3 deletions(-) -- 2.31.1

Add a version of virPidFileForceCleanupPath with an extradelay parameter for processes where the default timeout is not enough. Signed-off-by: Ján Tomko <jtomko@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virpidfile.c | 16 +++++++++++++++- src/util/virpidfile.h | 2 ++ 3 files changed, 18 insertions(+), 1 deletion(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 2efa787664..dc25dd6ac0 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -3027,6 +3027,7 @@ virPidFileConstructPath; virPidFileDelete; virPidFileDeletePath; virPidFileForceCleanupPath; +virPidFileForceCleanupPathDelay; virPidFileRead; virPidFileReadIfAlive; virPidFileReadPath; diff --git a/src/util/virpidfile.c b/src/util/virpidfile.c index c6389c1869..6c3d869460 100644 --- a/src/util/virpidfile.c +++ b/src/util/virpidfile.c @@ -511,10 +511,14 @@ virPidFileConstructPath(bool privileged, * called multiple times with the same path, be it in threads or * processes. This function does not raise any errors. * + * If @extradelay (in seconds) is specified, we wait for + * 15 + extradelay seconds more. + * * Returns 0 if the pidfile was successfully cleaned up, -1 otherwise. */ int -virPidFileForceCleanupPath(const char *path) +virPidFileForceCleanupPathDelay(const char *path, + unsigned int extradelay) { pid_t pid = 0; int fd = -1; @@ -532,6 +536,9 @@ virPidFileForceCleanupPath(const char *path) /* Only kill the process if the pid is valid one. 0 means * there is somebody else doing the same pidfile cleanup * machinery. */ + if (extradelay && + virProcessKillPainfullyDelay(pid, false, extradelay) >= 0) + pid = 0; if (pid) virProcessKillPainfully(pid, true); @@ -544,3 +551,10 @@ virPidFileForceCleanupPath(const char *path) return 0; } + + +int +virPidFileForceCleanupPath(const char *path) +{ + return virPidFileForceCleanupPathDelay(path, 0); +} diff --git a/src/util/virpidfile.h b/src/util/virpidfile.h index 370a59892e..ef26377375 100644 --- a/src/util/virpidfile.h +++ b/src/util/virpidfile.h @@ -73,4 +73,6 @@ int virPidFileConstructPath(bool privileged, const char *progname, char **pidfile); +int virPidFileForceCleanupPathDelay(const char *path, + unsigned int extradelay) ATTRIBUTE_NONNULL(1); int virPidFileForceCleanupPath(const char *path) ATTRIBUTE_NONNULL(1); -- 2.31.1

In some cases, such as doing intense I/O on slow filesystems, it can take virtiofsd as long as 42 seconds to exit. Add a delay of extra 45 seconds before we forcefully kill it. https://bugzilla.redhat.com/show_bug.cgi?id=1940276 Signed-off-by: Ján Tomko <jtomko@redhat.com> --- src/qemu/qemu_virtiofs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/qemu/qemu_virtiofs.c b/src/qemu/qemu_virtiofs.c index edaedf0304..3f99f0caa4 100644 --- a/src/qemu/qemu_virtiofs.c +++ b/src/qemu/qemu_virtiofs.c @@ -281,7 +281,7 @@ qemuVirtioFSStop(virQEMUDriver *driver G_GNUC_UNUSED, if (!(pidfile = qemuVirtioFSCreatePidFilename(vm, fs->info.alias))) goto cleanup; - if (virPidFileForceCleanupPath(pidfile) < 0) { + if (virPidFileForceCleanupPathDelay(pidfile, 30) < 0) { VIR_WARN("Unable to kill virtiofsd process"); } else { if (QEMU_DOMAIN_FS_PRIVATE(fs)->vhostuser_fs_sock) -- 2.31.1

On 6/18/21 1:45 PM, Ján Tomko wrote:
In some cases, such as doing intense I/O on slow filesystems, it can take virtiofsd as long as 42 seconds to exit.
Add a delay of extra 45 seconds before we forcefully kill it.
This is inaccessible :-( We should look into making it accessible.
Signed-off-by: Ján Tomko <jtomko@redhat.com> --- src/qemu/qemu_virtiofs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/qemu/qemu_virtiofs.c b/src/qemu/qemu_virtiofs.c index edaedf0304..3f99f0caa4 100644 --- a/src/qemu/qemu_virtiofs.c +++ b/src/qemu/qemu_virtiofs.c @@ -281,7 +281,7 @@ qemuVirtioFSStop(virQEMUDriver *driver G_GNUC_UNUSED, if (!(pidfile = qemuVirtioFSCreatePidFilename(vm, fs->info.alias))) goto cleanup;
Please put a comment here that the wait is 30 + 15 seconds. Otherwise 30 is misleading.
- if (virPidFileForceCleanupPath(pidfile) < 0) { + if (virPidFileForceCleanupPathDelay(pidfile, 30) < 0) { VIR_WARN("Unable to kill virtiofsd process"); } else { if (QEMU_DOMAIN_FS_PRIVATE(fs)->vhostuser_fs_sock)
Michal

On Fri, Jun 18, 2021 at 01:45:18PM +0200, Ján Tomko wrote:
In some cases, such as doing intense I/O on slow filesystems, it can take virtiofsd as long as 42 seconds to exit.
Add a delay of extra 45 seconds before we forcefully kill it.
https://bugzilla.redhat.com/show_bug.cgi?id=1940276
Signed-off-by: Ján Tomko <jtomko@redhat.com> --- src/qemu/qemu_virtiofs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/qemu/qemu_virtiofs.c b/src/qemu/qemu_virtiofs.c index edaedf0304..3f99f0caa4 100644 --- a/src/qemu/qemu_virtiofs.c +++ b/src/qemu/qemu_virtiofs.c @@ -281,7 +281,7 @@ qemuVirtioFSStop(virQEMUDriver *driver G_GNUC_UNUSED, if (!(pidfile = qemuVirtioFSCreatePidFilename(vm, fs->info.alias))) goto cleanup;
- if (virPidFileForceCleanupPath(pidfile) < 0) { + if (virPidFileForceCleanupPathDelay(pidfile, 30) < 0) {
The main process of virtiofsd is waiting for the child process on waitpid(). And virPidFileForceCleanupPath() (and virPidFileForceCleanupPathDelay()) will send SIGTERM to the main process, so waitpid() returns as EINTR and the main process exits. However, the child process and the sub threads may still remain for the IO completion. How about adding a timer to wait for the closing of the main process instead of sending SIGTERM? Like as follows: @@ -275,13 +278,35 @@ qemuVirtioFSStop(virQEMUDriver *driver G_GNUC_UNUSED, { g_autofree char *pidfile = NULL; virErrorPtr orig_err; + unsigned long long timeout = 60 * 1000; + virTimeBackOffVar timebackoff; + bool terminated = false; + pid_t pid = (pid_t) -1; virErrorPreserveLast(&orig_err); if (!(pidfile = qemuVirtioFSCreatePidFilename(vm, fs->info.alias))) goto cleanup; - if (virPidFileForceCleanupPath(pidfile) < 0) { + if (virPidFileReadPath(pidfile, &pid) < 0) + goto cleanup; + + /* in case the IO threads are still running, wait for the completion. */ + if (virProcessKill(pid, 0) == 0) { + if (virTimeBackOffStart(&timebackoff, 1, timeout) < 0) + goto cleanup; + + while (virTimeBackOffWait(&timebackoff)) { + if (virProcessKill(pid, 0) != 0) { + terminated = true; + break; + } + } + } else { + terminated = true; + } + + if ((virPidFileForceCleanupPath(pidfile) < 0) || !terminated) { VIR_WARN("Unable to kill virtiofsd process"); } else { if (QEMU_DOMAIN_FS_PRIVATE(fs)->vhostuser_fs_sock) -- 2.27.0 Thanks, Masa

On 6/21/21 4:53 AM, Masayoshi Mizuma wrote:
On Fri, Jun 18, 2021 at 01:45:18PM +0200, Ján Tomko wrote:
In some cases, such as doing intense I/O on slow filesystems, it can take virtiofsd as long as 42 seconds to exit.
Add a delay of extra 45 seconds before we forcefully kill it.
https://bugzilla.redhat.com/show_bug.cgi?id=1940276
Signed-off-by: Ján Tomko <jtomko@redhat.com> --- src/qemu/qemu_virtiofs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/qemu/qemu_virtiofs.c b/src/qemu/qemu_virtiofs.c index edaedf0304..3f99f0caa4 100644 --- a/src/qemu/qemu_virtiofs.c +++ b/src/qemu/qemu_virtiofs.c @@ -281,7 +281,7 @@ qemuVirtioFSStop(virQEMUDriver *driver G_GNUC_UNUSED, if (!(pidfile = qemuVirtioFSCreatePidFilename(vm, fs->info.alias))) goto cleanup;
- if (virPidFileForceCleanupPath(pidfile) < 0) { + if (virPidFileForceCleanupPathDelay(pidfile, 30) < 0) {
The main process of virtiofsd is waiting for the child process on waitpid(). And virPidFileForceCleanupPath() (and virPidFileForceCleanupPathDelay()) will send SIGTERM to the main process, so waitpid() returns as EINTR and the main process exits. However, the child process and the sub threads may still remain for the IO completion.
How about adding a timer to wait for the closing of the main process instead of sending SIGTERM? Like as follows:
That might help for this particular use case (some I/O threads take longer to finish), but for the Destroy API call, libvirt should be able to deal even with stuck threads. So the solution I proposed here is incomplete. I'll send a v2 that operates on the whole process group. Jano
@@ -275,13 +278,35 @@ qemuVirtioFSStop(virQEMUDriver *driver G_GNUC_UNUSED, { g_autofree char *pidfile = NULL; virErrorPtr orig_err; + unsigned long long timeout = 60 * 1000; + virTimeBackOffVar timebackoff; + bool terminated = false; + pid_t pid = (pid_t) -1;
virErrorPreserveLast(&orig_err);
if (!(pidfile = qemuVirtioFSCreatePidFilename(vm, fs->info.alias))) goto cleanup;
- if (virPidFileForceCleanupPath(pidfile) < 0) { + if (virPidFileReadPath(pidfile, &pid) < 0) + goto cleanup; + + /* in case the IO threads are still running, wait for the completion. */ + if (virProcessKill(pid, 0) == 0) { + if (virTimeBackOffStart(&timebackoff, 1, timeout) < 0) + goto cleanup; + + while (virTimeBackOffWait(&timebackoff)) { + if (virProcessKill(pid, 0) != 0) { + terminated = true; + break; + } + } + } else { + terminated = true; + } + + if ((virPidFileForceCleanupPath(pidfile) < 0) || !terminated) { VIR_WARN("Unable to kill virtiofsd process"); } else { if (QEMU_DOMAIN_FS_PRIVATE(fs)->vhostuser_fs_sock)

Signed-off-by: Ján Tomko <jtomko@redhat.com> --- src/util/virprocess.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/util/virprocess.c b/src/util/virprocess.c index 01d5d01d02..5fad0db63d 100644 --- a/src/util/virprocess.c +++ b/src/util/virprocess.c @@ -368,7 +368,7 @@ int virProcessKill(pid_t pid, int sig) * was killed forcibly, -1 if it is still alive, * or another error occurred. * - * Callers can proide an extra delay in seconds to + * Callers can provide an extra delay in seconds to * wait longer than the default. */ int -- 2.31.1

On 6/18/21 1:45 PM, Ján Tomko wrote:
See patch 2/3.
Ján Tomko (3): Introduce virPidFileForceCleanupPathDelay qemu: wait more for virtiofsd to exit util: fix typo
src/libvirt_private.syms | 1 + src/qemu/qemu_virtiofs.c | 2 +- src/util/virpidfile.c | 16 +++++++++++++++- src/util/virpidfile.h | 2 ++ src/util/virprocess.c | 2 +- 5 files changed, 20 insertions(+), 3 deletions(-)
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Michal
participants (4)
-
Jano Tomko
-
Ján Tomko
-
Masayoshi Mizuma
-
Michal Prívozník