(Dropped invalid address from cc-list)
On Tue, May 02, 2017 at 15:33:47 +0530, Prerna wrote:
Hi all,
On my host, I have been seeing keepalive responses slow down
intermittently when issuing bulk power-offs.
With some tips from Danpb on the channel, I was able to trace via systemtap
that the main event loop would not run for about 6-9 seconds. This would
stall keepalives and kill client connections.
I was able to trace it to the fact that qemuProcessHandleEvent() needs
the VM lock, and it is called from the main loop. I had hook scripts
that slightly lengthened the time the power-off RPC took to complete,
and the resulting keepalive delays were noticeable.
I filed a bug about this a while ago:
https://bugzilla.redhat.com/show_bug.cgi?id=1402921
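Roughly, the stall works like this (a simplified, hypothetical sketch,
not the actual libvirt code): the event handler runs in the main loop
thread and blocks on the same per-VM mutex that an RPC worker is
holding while the hook script runs.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

typedef struct {
    pthread_mutex_t lock;   /* per-VM lock */
    const char *name;
} Vm;

/* RPC worker: holds the VM lock across a slow hook script. */
static void *rpcWorker(void *opaque)
{
    Vm *vm = opaque;

    pthread_mutex_lock(&vm->lock);
    sleep(6);                        /* stand-in for the hook script */
    pthread_mutex_unlock(&vm->lock);
    return NULL;
}

/* Runs in the main event loop thread. */
static void handleEvent(Vm *vm)
{
    pthread_mutex_lock(&vm->lock);   /* the whole loop stalls here */
    printf("event for %s handled\n", vm->name);
    pthread_mutex_unlock(&vm->lock);
}

int main(void)
{
    Vm vm = { PTHREAD_MUTEX_INITIALIZER, "guest1" };
    pthread_t worker;

    pthread_create(&worker, NULL, rpcWorker, &vm);
    sleep(1);            /* let the worker take the lock first */
    handleEvent(&vm);    /* keepalives etc. are blocked for ~5s */
    pthread_join(worker, NULL);
    return 0;
}

While handleEvent() sits in pthread_mutex_lock(), nothing else in the
event loop runs - keepalives included.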
I agree that the easiest solution is to release the VM lock before the
hook scripts are invoked.
However, I was wondering why we contend on the per-VM lock directly from
the main loop at all. Can we do this instead: have the main loop "park"
events on a separate event queue, and then have a dedicated thread pool
in the qemu driver pick up these raw events and grab the per-VM lock for
the VM in question?
That way, we can be sure that the main event loop is _never_ delayed,
no matter how long an RPC drags on.
If this sounds reasonable, I will be happy to post the driver rewrite
patches to that end.
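To make the parking idea concrete, here is a minimal pthread-based
sketch (all names invented, not a patch): the main loop only appends
the raw event under a short-lived queue lock and returns; a pool
thread is the one that blocks and later takes the per-VM lock.

#include <pthread.h>
#include <stddef.h>

typedef struct Event {
    struct Event *next;
    void *vm;                 /* the VM this event belongs to */
    int type;                 /* raw event data from QEMU */
} Event;

typedef struct {
    pthread_mutex_t lock;     /* protects the list; never held long */
    pthread_cond_t cond;
    Event *head, *tail;
} EventQueue;

/* Main-loop side: O(1), never touches any per-VM lock. */
void eventQueuePush(EventQueue *q, Event *ev)
{
    pthread_mutex_lock(&q->lock);
    ev->next = NULL;
    if (q->tail)
        q->tail->next = ev;
    else
        q->head = ev;
    q->tail = ev;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
}

/* Pool-thread side: this is where the waiting happens. */
Event *eventQueuePop(EventQueue *q)
{
    Event *ev;

    pthread_mutex_lock(&q->lock);
    while (!q->head)
        pthread_cond_wait(&q->cond, &q->lock);
    ev = q->head;
    q->head = ev->next;
    if (!q->head)
        q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    return ev;
}

A pool thread would loop on eventQueuePop(), lock ev->vm, and run what
qemuProcessHandleEvent() does today. The main loop then never waits on
anything but the queue lock, which is only ever held for a few pointer
updates.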
And this is the solution I planned to implement. Note that in the worst
case you need one thread per VM (if all of them are busy), but the
thread pool should not be needlessly large. Requests for a single VM
obviously need to be queued on the same thread.
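One possible way to get that per-VM ordering (building on the queue
sketch above; pickWorker() and the fixed pool size are invented for
illustration, not a design decision) is to give each pool thread its
own queue and route events by hashing the VM:

#include <stddef.h>
#include <stdint.h>

#define NWORKERS 8

extern EventQueue workerQueues[NWORKERS];  /* one queue per pool thread */

/* All events of one VM hash to one worker, so they stay ordered;
 * events for different VMs can still be handled in parallel. */
static size_t pickWorker(const void *vm)
{
    return ((uintptr_t)vm >> 4) % NWORKERS;  /* drop alignment bits */
}

void dispatchEvent(Event *ev)
{
    eventQueuePush(&workerQueues[pickWorker(ev->vm)], ev);
}

The trade-off is that a VM whose hook script stalls also delays any
other VMs that happen to hash to the same worker; growing the pool
toward one thread per busy VM, as described above, avoids that at the
cost of more threads.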