On Tue, Aug 21, 2018 at 02:20:21PM +0200, Christian Ehrhardt wrote:
On Tue, Aug 21, 2018 at 1:15 PM Daniel P. Berrangé
<berrange(a)redhat.com>
wrote:
> On Tue, Aug 14, 2018 at 11:27:33AM +0200, Christian Ehrhardt wrote:
> > It was found that in cases with host devices virProcessKillPainfully
> > might be able to send signal zero to the target PID for quite a while
> > with the process already being gone from /proc/<PID>.
> >
> > That is due to cleanup and reset of devices which might include a
> > secondary bus reset that on top of the actions taken has a 1s delay
> > to let the bus settle. Due to that guests with plenty of Host devices
> > could easily exceed the default timeouts.
> >
> > To solve that, this adds an extra delay of 2s per hostdev that is
> associated
> > to a VM.
> >
> > Signed-off-by: Christian Ehrhardt <christian.ehrhardt(a)canonical.com>
> > ---
> > src/libvirt_private.syms | 1 +
> > src/qemu/qemu_process.c | 5 +++--
> > src/util/virprocess.c | 18 +++++++++++++++---
> > src/util/virprocess.h | 1 +
> > 4 files changed, 20 insertions(+), 5 deletions(-)
> >
> > diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
> > index ca4a192a4a..47ea35f864 100644
> > --- a/src/libvirt_private.syms
> > +++ b/src/libvirt_private.syms
> > @@ -2605,6 +2605,7 @@ virProcessGetPids;
> > virProcessGetStartTime;
> > virProcessKill;
> > virProcessKillPainfully;
> > +virProcessKillPainfullyDelay;
> > virProcessNamespaceAvailable;
> > virProcessRunInMountNamespace;
> > virProcessSchedPolicyTypeFromString;
> > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> > index 02fdc55156..b7bf8813da 100644
> > --- a/src/qemu/qemu_process.c
> > +++ b/src/qemu/qemu_process.c
> > @@ -6814,8 +6814,9 @@ qemuProcessKill(virDomainObjPtr vm, unsigned int
> flags)
> > return 0;
> > }
> >
> > - ret = virProcessKillPainfully(vm->pid,
> > - !!(flags &
> VIR_QEMU_PROCESS_KILL_FORCE));
> > + ret = virProcessKillPainfullyDelay(vm->pid,
> > + !!(flags &
> VIR_QEMU_PROCESS_KILL_FORCE),
> > + vm->def->nhostdevs * 2);
>
> This API contract is a bit wierd. You've got an arbitray x2 multiplier
> here...
>
Out of the discussion with Alex I derived that due to the 1 sec settling of
the bus plus some extra work 2 seconds per device would be a good rule of
thumb.
The delay is meant to be in seconds and here it requests 2*nhostdevs to get
the amount needed.
And I'd not want to make the "unit" of the call "double-seconds"
:-)
Let me add a comment at this call to explain the reasoning on the
multiplier here.
> > -virProcessKillPainfully(pid_t pid, bool force)
> > +virProcessKillPainfullyDelay(pid_t pid, bool force, unsigned int
> extradelay)
> > {
> > size_t i;
> > int ret = -1;
> > + unsigned int delay = 75 + (extradelay*5);
>
> ...and it gets another x5 multiplier here.
>
That *5 is due to the internal unit it polls being 0.2 seconds.
I think externally anything other than seconds (double-seconds,
half-deca-seconds, ...) makes no sense.
Opps, yes, I mis-understood the behaviour of the code. A comment is fine
Regards,
Daniel
--
|:
https://berrange.com -o-
https://www.flickr.com/photos/dberrange :|
|:
https://libvirt.org -o-
https://fstop138.berrange.com :|
|:
https://entangle-photo.org -o-
https://www.instagram.com/dberrange :|