[libvirt] [PATCH v4 0/2] Fix detection of slow guest shutdown

Hi,
after a good discussion a few days ago in
  https://www.redhat.com/archives/libvir-list/2018-August/msg00122.html
and a short-lived v2, untested back then, in
  https://www.redhat.com/archives/libvir-list/2018-August/msg00199.html
I finally got access to the right HW again and completed the series.

Now that it is retested and working, I feel safe to submit it without an
RFC prefix. I think this would be a great addition for better handling of
guests with plenty of host devices passed through.

With the new code in place I can shut down systems that have 12, 16 or even
more hostdevs attached without getting into the "zombie" mode where libvirt
forever considers the guest as "in shutdown" because it gave up waiting too
early while signal zero was still able to reach the process.

Scaling examples (extracted with gdb):
16 Devices:
  virProcessKillPainfullyDelay (pid=67096, force=true, extradelay=32)
12 Devices:
  virProcessKillPainfullyDelay (pid=68251, force=true, extradelay=24)

*Updates in v4*
 - VIR_DEBUG now reports the extradelay as requested (in seconds) and
   thereby mostly matches the gdb output seen above
 - header function prototype defines the variable name
 - clarify the usage of delay units
   - seconds (API call)
   - 5th of seconds (internal poll loop)
 - explain the request for 2*nhostdevs from the qemu shutdown code

*Updates in v3*
 - fixup some issues found in testing and code checks

*Updates in v2*
 - removed the "accept the lack of /proc/<pid> as valid process removal"
   approach due to valid concerns about reusing resources.
 - added a dynamic extra wait scaling with the amount of hostdevs

Christian Ehrhardt (2):
  process: wait longer on kill per assigned Hostdev
  process: wait longer 5->30s on hard shutdown

 src/libvirt_private.syms |  1 +
 src/qemu/qemu_process.c  |  7 +++++--
 src/util/virprocess.c    | 22 ++++++++++++++++++----
 src/util/virprocess.h    |  3 +++
 4 files changed, 27 insertions(+), 6 deletions(-)

--
2.17.1
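For readers unfamiliar with the "signal zero" probe the cover letter refers to, here is a minimal sketch in C. It only illustrates the POSIX behaviour of kill(pid, 0) and is not libvirt code; the helper name pid_still_signalable is hypothetical. As long as the kernel has not released the PID (for example while it is still resetting passed-through devices), the probe keeps succeeding.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not libvirt code): true while the kernel still
 * accepts signal 0 for pid, meaning the PID has not been released yet. */
static bool pid_still_signalable(pid_t pid)
{
    /* pid <= 0 would address process groups, not a single process. */
    assert(pid > 0);

    /* Signal 0 performs the existence and permission checks only;
     * nothing is delivered to the target. */
    if (kill(pid, 0) == 0)
        return true;

    /* EPERM: the process exists but we lack permission to signal it.
     * ESRCH: no such process, so it is really gone. */
    return errno == EPERM;
}
```

A caller would poll this with its own PID of interest, e.g. pid_still_signalable(getpid()) is true for the calling process itself.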

It was found that in cases with host devices virProcessKillPainfully
might be able to send signal zero to the target PID for quite a while
with the process already being gone from /proc/<PID>.

That is due to cleanup and reset of devices which might include a
secondary bus reset that on top of the actions taken has a 1s delay
to let the bus settle. Due to that guests with plenty of host devices
could easily exceed the default timeouts.

To solve that, this adds an extra delay of 2s per hostdev that is associated
with a VM.

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
---
 src/libvirt_private.syms |  1 +
 src/qemu/qemu_process.c  |  7 +++++--
 src/util/virprocess.c    | 20 +++++++++++++++++---
 src/util/virprocess.h    |  3 +++
 4 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index ca4a192a4a..47ea35f864 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -2605,6 +2605,7 @@ virProcessGetPids;
 virProcessGetStartTime;
 virProcessKill;
 virProcessKillPainfully;
+virProcessKillPainfullyDelay;
 virProcessNamespaceAvailable;
 virProcessRunInMountNamespace;
 virProcessSchedPolicyTypeFromString;
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index b42fda850f..64097b29cb 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -6817,8 +6817,11 @@ qemuProcessKill(virDomainObjPtr vm, unsigned int flags)
         return 0;
     }
 
-    ret = virProcessKillPainfully(vm->pid,
-                                  !!(flags & VIR_QEMU_PROCESS_KILL_FORCE));
+    /* Request an extra delay of two seconds per current nhostdevs
+     * to be safe against stalls by the kernel freeing up the resources */
+    ret = virProcessKillPainfullyDelay(vm->pid,
+                                       !!(flags & VIR_QEMU_PROCESS_KILL_FORCE),
+                                       vm->def->nhostdevs * 2);
 
     return ret;
 }
diff --git a/src/util/virprocess.c b/src/util/virprocess.c
index ecea27a2d4..4c7f2ed97c 100644
--- a/src/util/virprocess.c
+++ b/src/util/virprocess.c
@@ -341,15 +341,21 @@ int virProcessKill(pid_t pid, int sig)
  * Returns 0 if it was killed gracefully, 1 if it
  * was killed forcibly, -1 if it is still alive,
  * or another error occurred.
+ *
+ * Callers can provide an extra delay in seconds to
+ * wait longer than the default.
  */
 int
-virProcessKillPainfully(pid_t pid, bool force)
+virProcessKillPainfullyDelay(pid_t pid, bool force, unsigned int extradelay)
 {
     size_t i;
     int ret = -1;
+    /* This is in 1/5th seconds since polling is on a 0.2s interval */
+    unsigned int polldelay = 75 + (extradelay*5);
     const char *signame = "TERM";
 
-    VIR_DEBUG("vpid=%lld force=%d", (long long)pid, force);
+    VIR_DEBUG("vpid=%lld force=%d extradelay=%u",
+              (long long)pid, force, extradelay);
 
     /* This loop sends SIGTERM, then waits a few iterations (10 seconds)
      * to see if it dies. If the process still hasn't exited, and
@@ -357,9 +363,12 @@ virProcessKillPainfully(pid_t pid, bool force)
      * wait up to 5 seconds more for the process to exit before
      * returning.
      *
+     * An extra delay can be passed by the caller for cases that are
+     * expected to clean up slower than usual.
+     *
      * Note that setting @force could result in dataloss for the process.
      */
-    for (i = 0; i < 75; i++) {
+    for (i = 0; i < polldelay; i++) {
         int signum;
         if (i == 0) {
             signum = SIGTERM; /* kindly suggest it should exit */
@@ -402,6 +411,11 @@ virProcessKillPainfully(pid_t pid, bool force)
 }
 
 
+int virProcessKillPainfully(pid_t pid, bool force)
+{
+    return virProcessKillPainfullyDelay(pid, force, 0);
+}
+
 #if HAVE_SCHED_GETAFFINITY
 
 int virProcessSetAffinity(pid_t pid, virBitmapPtr map)
diff --git a/src/util/virprocess.h b/src/util/virprocess.h
index 3c5a882772..5faa0892fe 100644
--- a/src/util/virprocess.h
+++ b/src/util/virprocess.h
@@ -55,6 +55,9 @@ virProcessWait(pid_t pid, int *exitstatus, bool raw)
 int virProcessKill(pid_t pid, int sig);
 
 int virProcessKillPainfully(pid_t pid, bool force);
+int virProcessKillPainfullyDelay(pid_t pid,
+                                 bool force,
+                                 unsigned int extradelay);
 
 int virProcessSetAffinity(pid_t pid, virBitmapPtr map);
 
--
2.17.1
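The unit conversion in this patch (seconds at the API, fifths of a second in the poll loop) can be checked with a small standalone sketch. The constants 75, 5 and the factor 2 per hostdev come from the patch; the helper names below are hypothetical and exist only for illustration. The maximum wait assumes each poll iteration lasts the 0.2s stated in the patch comment.

```c
#include <assert.h>

/* Values taken from the patch: 75 base polls at 0.2 s each, plus
 * 5 polls for every extra second requested by the caller. */
#define BASE_POLLS 75u
#define POLLS_PER_SECOND 5u

/* The base budget must be a whole number of seconds for the integer
 * division in max_wait_seconds() to be exact. */
static_assert(BASE_POLLS % POLLS_PER_SECOND == 0, "base polls must be whole seconds");

/* Hypothetical helper mirroring the qemuProcessKill call site:
 * two extra seconds per assigned host device. */
static unsigned int extradelay_for_hostdevs(unsigned int nhostdevs)
{
    return nhostdevs * 2u;
}

/* Mirrors the polldelay expression from this patch. */
static unsigned int poll_iterations(unsigned int extradelay)
{
    return BASE_POLLS + extradelay * POLLS_PER_SECOND;
}

/* Upper bound of the total wait in seconds, assuming 0.2 s per poll. */
static unsigned int max_wait_seconds(unsigned int extradelay)
{
    return poll_iterations(extradelay) / POLLS_PER_SECOND;
}
```

With these helpers, 16 hostdevs give extradelay=32 (matching the gdb output in the cover letter), 235 poll iterations and at most 47 seconds; without hostdevs the budget stays at 75 iterations, i.e. 15 seconds.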

On Tue, Aug 21, 2018 at 02:33:25PM +0200, Christian Ehrhardt wrote:
> It was found that in cases with host devices virProcessKillPainfully
> might be able to send signal zero to the target PID for quite a while
> with the process already being gone from /proc/<PID>.
>
> That is due to cleanup and reset of devices which might include a
> secondary bus reset that on top of the actions taken has a 1s delay
> to let the bus settle. Due to that guests with plenty of Host devices
> could easily exceed the default timeouts.
>
> To solve that, this adds an extra delay of 2s per hostdev that is associated
> to a VM.
>
> Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
> ---
>  src/libvirt_private.syms |  1 +
>  src/qemu/qemu_process.c  |  7 +++++--
>  src/util/virprocess.c    | 20 +++++++++++++++++---
>  src/util/virprocess.h    |  3 +++
>  4 files changed, 26 insertions(+), 5 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

In cases where virProcessKillPainfully already realizes that SIGTERM
wasn't enough we are partially on a bad path already.
Maybe the system is overloaded or having serious trouble to free and
reap resources in time.

In those cases give the SIGKILL that was sent after 10 seconds some more
time to take effect if force was set (only then we are falling back to
SIGKILL anyway).

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
---
 src/util/virprocess.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/util/virprocess.c b/src/util/virprocess.c
index 4c7f2ed97c..3988f5546c 100644
--- a/src/util/virprocess.c
+++ b/src/util/virprocess.c
@@ -351,7 +351,7 @@ virProcessKillPainfullyDelay(pid_t pid, bool force, unsigned int extradelay)
     size_t i;
     int ret = -1;
     /* This is in 1/5th seconds since polling is on a 0.2s interval */
-    unsigned int polldelay = 75 + (extradelay*5);
+    unsigned int polldelay = (force ? 200 : 75) + (extradelay*5);
     const char *signame = "TERM";
 
     VIR_DEBUG("vpid=%lld force=%d extradelay=%u",
@@ -360,7 +360,7 @@ virProcessKillPainfullyDelay(pid_t pid, bool force, unsigned int extradelay)
     /* This loop sends SIGTERM, then waits a few iterations (10 seconds)
      * to see if it dies. If the process still hasn't exited, and
      * @force is requested, a SIGKILL will be sent, and this will
-     * wait up to 5 seconds more for the process to exit before
+     * wait up to 30 seconds more for the process to exit before
      * returning.
      *
      * An extra delay can be passed by the caller for cases that are
--
2.17.1
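The "5->30s" in the subject follows from the poll budget once the SIGTERM window is subtracted. The sketch below works this out; it assumes the SIGKILL point is at poll 50, which is derived from the "10 seconds" in the loop comment and the 0.2s interval rather than shown in the hunks above. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define POLLS_PER_SECOND 5u
/* Assumption: the SIGTERM window of 10 seconds ends at poll 50.
 * The exact iteration is not visible in the quoted hunks. */
#define SIGKILL_AT_POLL 50u

static_assert(SIGKILL_AT_POLL == 10u * POLLS_PER_SECOND, "SIGTERM window is 10 seconds");

/* Mirrors the polldelay expression after this patch. */
static unsigned int poll_iterations(bool force, unsigned int extradelay)
{
    return (force ? 200u : 75u) + extradelay * POLLS_PER_SECOND;
}

/* Seconds left in the loop once the SIGTERM window has passed,
 * assuming 0.2 s per poll. */
static unsigned int seconds_after_term_window(bool force, unsigned int extradelay)
{
    return (poll_iterations(force, extradelay) - SIGKILL_AT_POLL) / POLLS_PER_SECOND;
}
```

Under these assumptions a forced kill without hostdevs gets 30 seconds after SIGKILL instead of 5, and a forced kill with extradelay=32 gets 62 seconds; the non-forced path keeps its previous 5 seconds.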

On Tue, Aug 21, 2018 at 2:34 PM Christian Ehrhardt < christian.ehrhardt@canonical.com> wrote:
> Hi,
> after a good discussion a few days ago in
>   https://www.redhat.com/archives/libvir-list/2018-August/msg00122.html
> and a short lived but back then untested v2 in
>   https://www.redhat.com/archives/libvir-list/2018-August/msg00199.html
> I finally get access to the right HW again and completed the series.
>
> Being finally retested and working I finally feel safe to submit without
> a RFC prefix. I think this would be a great addition for a better handling
> of guests with plenty of host devices passed through.
>
> With the new code in place I can shutdown systems that have 12, 16 or even
> more hostdevs attached without getting into the "zombie" mode where libvirt
> will forever consider the guest as "in shutdown" as it gave up waiting too
> early because the signal zero still was able to reach it.
>
> Scaling examples (extracted with gdb):
> 16 Devices:
>   virProcessKillPainfullyDelay (pid=67096, force=true, extradelay=32)
> 12 Devices:
>   virProcessKillPainfullyDelay (pid=68251, force=true, extradelay=24)
>
> *Updates in v4*
>  - virDebug now reports the extradelay as requested (in seconds) and
>    thereby mostly matches the gdb output seen above
>  - header function prototype defines the variable name
>  - clarify the usage of delay units
>    - seconds (API call)
>    - 5th of seconds (internal poll loop)
>  - explain the request for 2*nhostdevs from the qemu shutdown code
>
> *Updates in v3*
>  - fixup some issues found in testing and code checks
>
> *Updates in v2*
>  - removed the "accept the lack of /proc/<pid> as valid process removal"
>    approach due to valid concerns about reusing ressources.
>  - added a dynamic extra wait scaling with the amount of hostdevs
>
> Christian Ehrhardt (2):
>   process: wait longer on kill per assigned Hostdev
>   process: wait longer 5->30s on hard shutdown
>
FYI, since there was no further feedback I pushed v4 with the appropriate
Reviewed-by tags.
Thanks everybody for your participation!

>  src/libvirt_private.syms |  1 +
>  src/qemu/qemu_process.c  |  7 +++++--
>  src/util/virprocess.c    | 22 ++++++++++++++++++----
>  src/util/virprocess.h    |  3 +++
>  4 files changed, 27 insertions(+), 6 deletions(-)
>
> --
> 2.17.1

--
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd
participants (2)
- Christian Ehrhardt
- Daniel P. Berrangé