
On Tue, Aug 21, 2018 at 02:20:21PM +0200, Christian Ehrhardt wrote:
On Tue, Aug 21, 2018 at 1:15 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Tue, Aug 14, 2018 at 11:27:33AM +0200, Christian Ehrhardt wrote:
It was found that in cases with host devices virProcessKillPainfully might be able to send signal zero to the target PID for quite a while with the process already being gone from /proc/<PID>.
That is due to cleanup and reset of devices which might include a secondary bus reset that on top of the actions taken has a 1s delay to let the bus settle. Due to that guests with plenty of Host devices could easily exceed the default timeouts.
To solve that, this adds an extra delay of 2s per hostdev that is associated to a VM.
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com> --- src/libvirt_private.syms | 1 + src/qemu/qemu_process.c | 5 +++-- src/util/virprocess.c | 18 +++++++++++++++--- src/util/virprocess.h | 1 + 4 files changed, 20 insertions(+), 5 deletions(-)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index ca4a192a4a..47ea35f864 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2605,6 +2605,7 @@ virProcessGetPids; virProcessGetStartTime; virProcessKill; virProcessKillPainfully; +virProcessKillPainfullyDelay; virProcessNamespaceAvailable; virProcessRunInMountNamespace; virProcessSchedPolicyTypeFromString; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 02fdc55156..b7bf8813da 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -6814,8 +6814,9 @@ qemuProcessKill(virDomainObjPtr vm, unsigned int flags) return 0; }
- ret = virProcessKillPainfully(vm->pid, - !!(flags & VIR_QEMU_PROCESS_KILL_FORCE)); + ret = virProcessKillPainfullyDelay(vm->pid, + !!(flags & VIR_QEMU_PROCESS_KILL_FORCE), + vm->def->nhostdevs * 2);
This API contract is a bit wierd. You've got an arbitray x2 multiplier here...
Out of the discussion with Alex I derived that due to the 1 sec settling of the bus plus some extra work 2 seconds per device would be a good rule of thumb. The delay is meant to be in seconds and here it requests 2*nhostdevs to get the amount needed.
And I'd not want to make the "unit" of the call "double-seconds" :-)
Let me add a comment at this call to explain the reasoning on the multiplier here.
-virProcessKillPainfully(pid_t pid, bool force) +virProcessKillPainfullyDelay(pid_t pid, bool force, unsigned int extradelay) { size_t i; int ret = -1; + unsigned int delay = 75 + (extradelay*5);
...and it gets another x5 multiplier here.
That *5 is due to the internal unit it polls being 0.2 seconds. I think externally anything other than seconds (double-seconds, half-deca-seconds, ...) makes no sense.
Opps, yes, I mis-understood the behaviour of the code. A comment is fine Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|