On Thu, Mar 07, 2024 at 09:32:45AM -0800, Andrea Bolognani wrote:
On Thu, Mar 07, 2024 at 05:15:46PM +0000, Daniel P. Berrangé wrote:
> On Thu, Mar 07, 2024 at 08:45:37AM -0800, Andrea Bolognani wrote:
> > On Thu, Mar 07, 2024 at 03:30:30PM +0000, Daniel P. Berrangé wrote:
> > > I wonder if something is hitting the 'max_client_requests' limit
> > > and getting stalled.
> > >
> > > The initial thread message here says the lockup is happening during
> > > bulk concurrent live migrations of 200 VMs, 5 at a time.
> > >
> > > The default 'max_client_requests' is 5.... DANGER WILL ROBINSON...
> > >
> > > With live migration making requests across multiple libvirt daemons,
> > > if the target host has filled its 5 requests queue with long running
> > > operations, and then a 'prepare migrate' call comes in, that'll
> > > get stalled behind a possibly slow operation at the RPC dispatch
> > > level.
> > >
> > > I'd suggest bumping 'max_client_requests' to 100 and seeing if
> > > the problem goes away.
> > >
> > > If so I wonder if we shouldn't raise our out of the box limits.
> > > '5' is pretty low considering the scale of virtualization hosts
> > > in the modern world, and where even my laptop has 20 CPUs and
> > > 64 GB of RAM.
> >
> > FWIW I was running a simple workload inside KubeVirt (a test case
> > that's part of its functional test suite and involves spawning and
> > subsequently migrating a single VM) yesterday and I could see
> > warnings about hitting max_client_requests in the logs.
>
> Hmm, I could have sworn we told KubeVirt to raise the limits in their
> config files quite a while ago, but maybe I'm mixing it up with
> OpenStack.

I just checked and they don't set the value at all.

I think this is all making the case for increasing the defaults :-)
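
For anyone who wants to test the suggestion from earlier in the thread
in the meantime, a minimal sketch of the change, assuming the monolithic
libvirtd reading /etc/libvirt/libvirtd.conf (with the modular daemons
the equivalent setting should live in the per-daemon file, e.g.
virtqemud.conf; the path and the value 100 are only the assumptions
used in this thread, not a tuned recommendation):

    # /etc/libvirt/libvirtd.conf (assumed location)
    # Per-client limit on concurrent RPC requests; default is 5.
    max_client_requests = 100

The daemon needs to be restarted for the new value to take effect.
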
With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|