
Thanks for the quick response, Peter! This confirms the basic approach I had in mind. It needs some (not-so-small) cleanup of the qemu driver code, and I have already started on some of it. I am planning to have a constant number of event handler threads to start with. I'll try adding this as a configurable parameter in qemu.conf once the basic functionality is complete.

Thanks,
Prerna

On Tue, May 2, 2017 at 3:56 PM, Peter Krempa <pkrempa@redhat.com> wrote:
(Dropped invalid address from cc-list)
On Tue, May 02, 2017 at 15:33:47 +0530, Prerna wrote:
Hi all,
On my host, I have been seeing keepalive responses slow down intermittently when issuing bulk power offs. With some tips from Danpb on the channel, I was able to trace via systemtap that the main event loop would not run for about 6-9 seconds. This would stall keepalives and kill client connections.
I was able to trace it to the fact that qemuProcessHandleEvent() needed the vm lock, and this was called from the main loop. I had hook scripts that slightly lengthened the time the power off RPC took to complete, and the subsequent keepalive delays were noticeable.
I filed a bug about this a while ago:
https://bugzilla.redhat.com/show_bug.cgi?id=1402921
I agree that the easiest solution is to release the per-VM lock before hook scripts are run. However, I was wondering why we contend on the per-VM lock directly from the main loop at all? Can we do this instead: have the main loop "park" events in a separate event queue, and then have a dedicated thread pool in the qemu driver pick up these raw events and try grabbing the per-VM lock for that VM? That way, we can be sure that the main event loop is _never_ delayed, irrespective of an RPC dragging on.
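To make the proposal concrete, here is a minimal sketch of the "park events, handle them in a pool" idea. It is written in Python for brevity and is not libvirt code: `park_event`, `run_demo`, the VM names, and the pool size are all hypothetical, and a held `threading.Lock` merely stands in for an RPC that is holding the per-VM lock.

```python
import queue
import threading
import time

NUM_WORKERS = 4  # assumed constant pool size, purely illustrative


def run_demo():
    event_queue = queue.Queue()
    vm_locks = {"vm1": threading.Lock(), "vm2": threading.Lock()}
    handled = []
    handled_lock = threading.Lock()

    def park_event(vm_name, event):
        # Called from the "main loop": enqueue only, never take a VM lock.
        event_queue.put((vm_name, event))

    def worker():
        while True:
            item = event_queue.get()
            if item is None:  # shutdown sentinel
                event_queue.task_done()
                return
            vm_name, event = item
            # Only the worker blocks on the per-VM lock.
            with vm_locks[vm_name]:
                with handled_lock:
                    handled.append((vm_name, event))
            event_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()

    # Simulate a slow RPC (for example one running hook scripts) on vm1.
    vm_locks["vm1"].acquire()

    start = time.monotonic()
    park_event("vm1", "SHUTDOWN")
    park_event("vm2", "SHUTDOWN")
    main_loop_delay = time.monotonic() - start

    # Wait (bounded) until the vm2 event has been handled.
    deadline = time.monotonic() + 2.0
    while time.monotonic() < deadline:
        with handled_lock:
            if handled:
                break
        time.sleep(0.01)
    with handled_lock:
        before_release = list(handled)

    vm_locks["vm1"].release()  # the slow RPC finishes
    event_queue.join()
    for _ in threads:
        event_queue.put(None)
    for t in threads:
        t.join()
    return main_loop_delay, before_release, list(handled)
```

In this sketch the caller of `park_event` returns immediately even while vm1's lock is held, and the vm2 event is handled without waiting for vm1. Only the worker that picked up the vm1 event is stalled.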
If this sounds reasonable I will be happy to post the driver rewrite patches to that end.
And this is the solution I planned to implement. Note that in the worst case you need one thread per VM (if all are busy), but the thread pool should not be needlessly large. Obviously, requests for a single VM need to be queued to the same thread.
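One way to keep all requests for a single VM on the same thread with a fixed-size pool is to give each worker its own queue and pick the queue from a stable hash of the VM name. The sketch below is Python, not libvirt code, and is only one possible reading of the constraint above: `VmEventPool`, `worker_index`, and `demo_order` are hypothetical names, and the hashing scheme is an assumption rather than anything proposed in this thread.

```python
import queue
import threading
import zlib


class VmEventPool:
    """Fixed pool where each VM always maps to the same worker thread."""

    def __init__(self, num_workers):
        self.queues = [queue.Queue() for _ in range(num_workers)]
        self.log = []
        self.log_lock = threading.Lock()
        self.threads = [
            threading.Thread(target=self._run, args=(q,)) for q in self.queues
        ]
        for t in self.threads:
            t.start()

    def worker_index(self, vm_name):
        # Stable hash, so a given VM name always selects the same queue.
        return zlib.crc32(vm_name.encode("utf-8")) % len(self.queues)

    def submit(self, vm_name, event):
        self.queues[self.worker_index(vm_name)].put((vm_name, event))

    def _run(self, q):
        while True:
            item = q.get()
            if item is None:  # shutdown sentinel
                return
            # A real handler would take the per-VM lock here.
            with self.log_lock:
                self.log.append(item)

    def shutdown(self):
        for q in self.queues:
            q.put(None)
        for t in self.threads:
            t.join()


def demo_order():
    vms = ("vm-a", "vm-b", "vm-c", "vm-d")
    pool = VmEventPool(num_workers=3)
    for seq in range(5):
        for vm in vms:
            pool.submit(vm, seq)
    pool.shutdown()
    # Per-VM view of the order in which events were handled.
    return {vm: [e for v, e in pool.log if v == vm] for vm in vms}
```

Because each queue is FIFO and drained by exactly one thread, events for a given VM are handled in submission order even though there are fewer workers than VMs. The trade-off is that a slow VM delays the other VMs that hash to the same worker.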