Re: Error Cannot acquire state change lock from remoteDispatchDomainMigratePrepare3Params during live migration of domains

7 Mar 2024


On Thu, Mar 07, 2024 at 05:15:46PM +0000, Daniel P. Berrangé wrote:
...
On Thu, Mar 07, 2024 at 08:45:37AM -0800, Andrea Bolognani wrote:
...
On Thu, Mar 07, 2024 at 03:30:30PM +0000, Daniel P. Berrangé wrote:
...
I wonder if something is hitting the 'max_client_requests' limit and
getting stalled.
The initial thread message here says the lockup is happening during
bulk concurrent live migrations of 200 VMs, 5 at a time.
The default 'max_client_requests' is 5.... DANGER WILL ROBINSON...
With live migration making requests across multiple libvirt daemons,
if the target host has filled its 5-request queue with long running
operations, and then a "prepare migrate" call comes in, that'll get
stalled behind a possibly slow operation at the RPC dispatch level.
I'd suggest bumping 'max_client_requests' to 100 and seeing if the
problem goes away.
If so I wonder if we shouldn't raise our out of the box limits.
'5' is pretty low considering the scale of virtualization hosts
in the modern world, and where even my laptop has 20 CPUs and
64 GB of RAM.
FWIW I was running a simple workload inside KubeVirt (a test case
that's part of its functional test suite and involves spawning and
subsequently migrating a single VM) yesterday and I could see
warnings about hitting max_client_requests in the logs.
Hmm, I could have sworn we told KubeVirt to raise the limits in their
config files quite a while ago, but maybe I'm mixing it up with
OpenStack.
I just checked and they don't set the value at all.
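
To make the stall described above concrete, here is a toy model of a
request queue with a fixed in-flight limit. It is only a sketch and not
libvirt's actual dispatch code: the function name, the 30 second
duration of the long running operations and the 0.1 second "prepare
migrate" call are all made-up numbers for illustration, and every
request is assumed to arrive at t=0 in order.

```python
import heapq


def start_times(durations, limit):
    """Return the time at which each request starts being processed.

    Toy model (hypothetical, not libvirt code): all requests arrive at
    t=0 in the given order, at most `limit` may be in flight at once,
    and the rest wait in FIFO order for a slot to free up.
    """
    in_flight = []  # min-heap of finish times of running requests
    starts = []
    for duration in durations:
        if len(in_flight) < limit:
            start = 0.0  # a slot is free, dispatch immediately
        else:
            # Queue is full: wait until the earliest running request ends.
            start = heapq.heappop(in_flight)
        starts.append(start)
        heapq.heappush(in_flight, start + duration)
    return starts


# Five long running operations (assumed 30s each), then one short
# "prepare migrate" call (assumed 0.1s) arriving right behind them.
ops = [30, 30, 30, 30, 30, 0.1]

print(start_times(ops, limit=5)[-1])    # 30.0: stuck behind the slow ops
print(start_times(ops, limit=100)[-1])  # 0.0: dispatched immediately
```

With a limit of 5 the short call cannot even start until one of the
slow operations finishes, which is the kind of delay that could show up
as a lock acquisition timeout on the other side. The real knob is
'max_client_requests' in libvirtd.conf (or the equivalent config file
when using the modular daemons), so trying the suggestion would mean
setting it to 100 there and restarting the daemon.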

-- 
Andrea Bolognani / Red Hat / Virtualization