[libvirt] Symptoms of main loop slowing down in libvirtd

Hi all,
On my host, I have been seeing instances of keepalive responses slow down intermittently when issuing bulk power offs. With some tips from Danpb on the channel, I was able to trace via systemtap that the main event loop would not run for about 6-9 seconds. This would stall keepalives and kill client connections. I was able to trace it to the fact that qemuProcessHandleEvent() needed the vm lock, and this was called from the main loop. I had hook scripts that slightly elongated the time the power off RPC took to complete, and the subsequent keepalive delays were noticeable.
I agree that the easiest solution is to drop the VM lock before hook scripts are invoked. However, I was wondering why we contend on the per-VM lock directly from the main loop at all. Can we do this instead: have the main loop "park" events on a separate event queue, and then have a dedicated thread pool in the qemu driver pick up these raw events and try grabbing the per-VM lock for that VM? That way, we can be sure that the main event loop is _never_ delayed, irrespective of an RPC dragging on.
If this sounds reasonable I will be happy to post the driver rewrite patches to that end.
Regards,
Prerna
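A rough C sketch of the "parking" approach described above, purely for illustration: the structure and function names (vm, vm_event, park_event, event_worker) are hypothetical placeholders, not actual libvirt code, and a real implementation would use libvirt's own locking and threading helpers rather than raw pthreads.

/* Illustrative sketch only (not libvirt code): the main loop "parks" raw
 * events on a queue and returns immediately; worker threads in the driver
 * take the per-VM lock, so the event loop itself never blocks on it.
 * All names here are hypothetical placeholders. */
#include <pthread.h>
#include <stdlib.h>

struct vm {
    pthread_mutex_t lock;        /* stands in for the per-VM (vm object) lock */
    int id;
};

struct vm_event {
    struct vm_event *next;
    struct vm *vm;               /* VM the raw event belongs to */
    int type;                    /* event type as reported by qemu */
};

static struct vm_event *queue_head, *queue_tail;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_cond = PTHREAD_COND_INITIALIZER;

/* Called from the main event loop: enqueue and return without touching the
 * per-VM lock, so keepalives keep getting serviced. */
void park_event(struct vm_event *ev)
{
    pthread_mutex_lock(&queue_lock);
    ev->next = NULL;
    if (queue_tail)
        queue_tail->next = ev;
    else
        queue_head = ev;
    queue_tail = ev;
    pthread_cond_signal(&queue_cond);
    pthread_mutex_unlock(&queue_lock);
}

static void handle_event_locked(struct vm *v, int type)
{
    /* ... the work qemuProcessHandleEvent() would do, now done with
     * v->lock held but off the main loop thread ... */
    (void)v; (void)type;
}

/* Body of each worker thread in the driver-side event pool. */
void *event_worker(void *opaque)
{
    (void)opaque;
    for (;;) {
        pthread_mutex_lock(&queue_lock);
        while (!queue_head)
            pthread_cond_wait(&queue_cond, &queue_lock);
        struct vm_event *ev = queue_head;
        queue_head = ev->next;
        if (!queue_head)
            queue_tail = NULL;
        pthread_mutex_unlock(&queue_lock);

        /* Contention on the per-VM lock now only delays this worker. */
        pthread_mutex_lock(&ev->vm->lock);
        handle_event_locked(ev->vm, ev->type);
        pthread_mutex_unlock(&ev->vm->lock);
        free(ev);
    }
    return NULL;
}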

(Dropped invalid address from cc-list)
On Tue, May 02, 2017 at 15:33:47 +0530, Prerna wrote:
Hi all, On my host, I have been seeing instances of keepalive responses slow down intermittently when issuing bulk power offs. With some tips from Danpb on the channel, I was able to trace via systemtap that the main event loop would not run for about 6-9 seconds. This would stall keepalives and kill client connections.
I was able to trace it to the fact that qemuProcessHandleEvent() needed the vm lock, and this was called from the main loop. I had hook scripts that slightly elongated the time the power off RPC took to complete, and the subsequent keepalive delays were noticeable.
I filed a bug about this a while ago: https://bugzilla.redhat.com/show_bug.cgi?id=1402921
I agree that the easiest solution is to drop the VM lock before hook scripts are invoked. However, I was wondering why we contend on the per-VM lock directly from the main loop at all. Can we do this instead: have the main loop "park" events on a separate event queue, and then have a dedicated thread pool in the qemu driver pick up these raw events and try grabbing the per-VM lock for that VM? That way, we can be sure that the main event loop is _never_ delayed, irrespective of an RPC dragging on.
If this sounds reasonable I will be happy to post the driver rewrite patches to that end.
And this is the solution I planned to do. Note that in the worst case you need to have one thread per VM (if all are busy), but note that the thread pool should not be needlessly large. Requests for a single VM obviously need to be queued to the same thread.

Thanks for the quick response, Peter! This ratifies the basic approach I had in mind. It needs some (not-so-small) cleanup of the qemu driver code, and I have already started cleaning up some of it. I am planning to have a constant number of event handler threads to start with. I'll try adding this as a configurable parameter in qemu.conf once basic functionality is completed.
Thanks,
Prerna
On Tue, May 2, 2017 at 3:56 PM, Peter Krempa <pkrempa@redhat.com> wrote:
(Dropped invalid address from cc-list)
On Tue, May 02, 2017 at 15:33:47 +0530, Prerna wrote:
Hi all, On my host, I have been seeing instances of keepalive responses slow down intermittently when issuing bulk power offs. With some tips from Danpb on the channel, I was able to trace via systemtap that the main event loop would not run for about 6-9 seconds. This would stall keepalives and kill client connections.
I was able to trace it to the fact that qemuProcessHandleEvent() needed the vm lock, and this was called from the main loop. I had hook scripts that slightly elongated the time the power off RPC took to complete, and the subsequent keepalive delays were noticeable.
I filed a bug about this a while ago:
https://bugzilla.redhat.com/show_bug.cgi?id=1402921
I agree that the easiest solution is to drop the VM lock before hook scripts are invoked. However, I was wondering why we contend on the per-VM lock directly from the main loop at all. Can we do this instead: have the main loop "park" events on a separate event queue, and then have a dedicated thread pool in the qemu driver pick up these raw events and try grabbing the per-VM lock for that VM? That way, we can be sure that the main event loop is _never_ delayed, irrespective of an RPC dragging on.
If this sounds reasonable I will be happy to post the driver rewrite patches to that end.
And this is the solution I planned to do. Note that in the worst case you need to have one thread per VM (if all are busy), but note that the thread pool should not be needlessly large. Requests for a single VM obviously need to be queued to the same thread.

On Tue, May 02, 2017 at 16:01:40 +0530, Prerna wrote:
[please don't top-post on technical lists]
Thanks for the quick response, Peter! This ratifies the basic approach I had in mind. It needs some (not-so-small) cleanup of the qemu driver code, and I have already started cleaning up some of it. I am planning to have a constant number of event handler threads to start with. I'll try adding this as a configurable parameter in qemu.conf once basic functionality is completed.
That is wrong, since you can't guarantee that it will not lock up. Since the workers handling monitor events tend to call monitor commands themselves, it's possible that a worker will get stuck due to an unresponsive qemu. Without allowing for the worst-case scenario of one thread per VM, you can't guarantee that the pool won't be depleted. If you want to fix this properly, you'll need a dynamic pool.

On Tue, May 2, 2017 at 4:07 PM, Peter Krempa <pkrempa@redhat.com> wrote:
On Tue, May 02, 2017 at 16:01:40 +0530, Prerna wrote:
[please don't top-post on technical lists]
Thanks for the quick response, Peter! This ratifies the basic approach I had in mind. It needs some (not-so-small) cleanup of the qemu driver code, and I have already started cleaning up some of it. I am planning to have a constant number of event handler threads to start with. I'll try adding this as a configurable parameter in qemu.conf once basic functionality is completed.
That is wrong, since you can't guarantee that it will not lock up. Since the workers handling monitor events tend to call monitor commands themselves, it's possible that a worker will get stuck due to an unresponsive qemu. Without allowing for the worst-case scenario of one thread per VM, you can't guarantee that the pool won't be depleted.
Once a worker thread "picks" an event, it will contend on the per-VM lock for that VM. Consequently, the handling for that event will be delayed until an existing RPC call for that VM completes.
If you want to fix this properly, you'll need a dynamic pool.
To improve the efficiency of the thread pool, we can try contending for a VM's lock for a specific time, say, N seconds, and then give up on that lock. The same thread in the pool can then move on to process events of the next VM. Note that this needs all VMs to be hashed to a constant number of threads in the pool, say 5. This ensures that each worker thread has a unique, non-overlapping set of VMs to work with. As an example, {VM_ID: 1, 6, 11, 16, 21, ...} are handled by the same worker thread. If this particular worker thread cannot acquire the requisite VM's lock, it will move on to the event list for the next VM, and so on. The use of pthread_mutex_trylock() ensures that the worker thread will never be stuck forever.
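Purely to illustrate the hashing-plus-trylock proposal above (which the replies below argue against), here is a minimal sketch under assumed, hypothetical names; a real implementation would use libvirt's locking wrappers rather than raw pthread_mutex_trylock(), and proper synchronization of the pending-event bookkeeping is elided for brevity.

/* Sketch only: a fixed pool where worker N handles only VMs whose
 * id % NWORKERS == N, and a trylock lets it skip a VM whose lock is
 * held by a long-running RPC and retry on a later pass.
 * Hypothetical names; not actual libvirt code. */
#include <pthread.h>
#include <time.h>

#define NWORKERS 5
#define MAX_VMS  1024

struct vm {
    pthread_mutex_t lock;        /* per-VM lock, also held during RPCs */
    int id;
    int pending_events;          /* raw events parked for this VM */
};

static struct vm *vm_list[MAX_VMS];   /* assumed to be filled in by the driver */
static int nvms;

static void process_pending_events(struct vm *v)
{
    /* ... handle the parked events while holding v->lock ... */
    v->pending_events = 0;
}

void *hashed_worker(void *opaque)
{
    int slot = *(int *)opaque;   /* this worker's index, 0 .. NWORKERS-1 */

    for (;;) {
        for (int i = 0; i < nvms; i++) {
            struct vm *v = vm_list[i];

            if (v->id % NWORKERS != slot || !v->pending_events)
                continue;        /* not hashed to us, or nothing to do */

            /* Never block: if an RPC holds the lock, move on to the next
             * VM hashed to this worker and come back on a later pass. */
            if (pthread_mutex_trylock(&v->lock) != 0)
                continue;

            process_pending_events(v);
            pthread_mutex_unlock(&v->lock);
        }

        /* Placeholder: a real worker would wait on a condition variable
         * for new events instead of polling. */
        struct timespec ts = { 0, 10 * 1000 * 1000 };
        nanosleep(&ts, NULL);
    }
    return NULL;
}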

On Tue, May 02, 2017 at 16:16:39 +0530, Prerna wrote:
On Tue, May 2, 2017 at 4:07 PM, Peter Krempa <pkrempa@redhat.com> wrote:
On Tue, May 02, 2017 at 16:01:40 +0530, Prerna wrote:
[please don't top-post on technical lists]
Thanks for the quick response, Peter! This ratifies the basic approach I had in mind. It needs some (not-so-small) cleanup of the qemu driver code, and I have already started cleaning up some of it. I am planning to have a constant number of event handler threads to start with. I'll try adding this as a configurable parameter in qemu.conf once basic functionality is completed.
That is wrong, since you can't guarantee that it will not lock up. Since the workers handling monitor events tend to call monitor commands themselves, it's possible that a worker will get stuck due to an unresponsive qemu. Without allowing for the worst-case scenario of one thread per VM, you can't guarantee that the pool won't be depleted.
Once a worker thread "picks" an event, it will contend on the per-VM lock for that VM. Consequently, the handling for that event will be delayed until an existing RPC call for that VM completes.
If you want to fix this properly, you'll need a dynamic pool.
To improve the efficiency of the thread pool, we can try contending for a VM's lock for a specific time, say, N seconds, and then give up on that lock. The same thread in the pool can then move on to process events of the next VM.
This would unnecessarily delay events which are not locked.
Note that this needs all VMs to be hashed to a constant number of threads in the pool, say 5. This ensures that each worker thread has a unique, non-overlapping set of VMs to work with.
How would this help?
As an example, {VM_ID: 1, 6, 11, 16, 21, ...} are handled by the same worker thread. If this particular worker thread cannot acquire the requisite VM's lock, it will move on to the event list for the next VM, and so on. The use of pthread_mutex_trylock() ensures that the worker thread will never be stuck forever.
No, I think this isn't the right approach at all. You could end up having all VMs handled by one thread, with the others being idle. I think the right approach will be to have a dynamic pool, which will handle incoming events. In the case when two events for a single VM should be handled in parallel, the same thread should pick them up in the order they arrived. That way, you will have at most a thread per VM, while normally you will have only one.
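A minimal sketch of the dynamic, per-VM-ordered scheme described above, assuming hypothetical structures (vm, vm_event, queue_vm_event): it spawns a detached thread per busy VM where a real implementation would draw workers from a dynamic thread pool, and it is not actual libvirt code.

/* Sketch: events for one VM sit in that VM's own FIFO list, and at most one
 * worker at a time drains it, so per-VM ordering is preserved and a stuck
 * VM ties up only its own thread. Hypothetical names throughout. */
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct vm_event {
    struct vm_event *next;
    int type;
};

struct vm {
    pthread_mutex_t lock;        /* per-VM lock (also held during RPCs) */
    pthread_mutex_t queue_lock;  /* protects the per-VM event list */
    struct vm_event *events;     /* FIFO of unprocessed raw events */
    bool worker_active;          /* a thread is already draining this VM */
};

static void handle_event(struct vm *v, int type)
{
    /* ... would call into the driver while holding v->lock ... */
    (void)v; (void)type;
}

static void *vm_event_worker(void *opaque)
{
    struct vm *v = opaque;

    for (;;) {
        pthread_mutex_lock(&v->queue_lock);
        struct vm_event *ev = v->events;
        if (!ev) {
            v->worker_active = false;   /* nothing left; thread goes away */
            pthread_mutex_unlock(&v->queue_lock);
            return NULL;
        }
        v->events = ev->next;
        pthread_mutex_unlock(&v->queue_lock);

        pthread_mutex_lock(&v->lock);   /* may wait on a long RPC, but only
                                         * this VM's events are delayed */
        handle_event(v, ev->type);
        pthread_mutex_unlock(&v->lock);
        free(ev);
    }
}

/* Called when the main loop receives a raw event for this VM: append it in
 * arrival order and start a worker only if none is draining the queue. */
void queue_vm_event(struct vm *v, struct vm_event *ev)
{
    pthread_mutex_lock(&v->queue_lock);
    ev->next = NULL;
    struct vm_event **tail = &v->events;
    while (*tail)
        tail = &(*tail)->next;
    *tail = ev;

    if (!v->worker_active) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, vm_event_worker, v) == 0) {
            pthread_detach(tid);
            v->worker_active = true;
        }
    }
    pthread_mutex_unlock(&v->queue_lock);
}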

On Tue, May 2, 2017 at 4:27 PM, Peter Krempa <pkrempa@redhat.com> wrote:
On Tue, May 02, 2017 at 16:16:39 +0530, Prerna wrote:
On Tue, May 2, 2017 at 4:07 PM, Peter Krempa <pkrempa@redhat.com> wrote:
On Tue, May 02, 2017 at 16:01:40 +0530, Prerna wrote:
[please don't top-post on technical lists]
Thanks for the quick response, Peter! This ratifies the basic approach I had in mind. It needs some (not-so-small) cleanup of the qemu driver code, and I have already started cleaning up some of it. I am planning to have a constant number of event handler threads to start with. I'll try adding this as a configurable parameter in qemu.conf once basic functionality is completed.
That is wrong, since you can't guarantee that it will not lock up. Since the workers handling monitor events tend to call monitor commands themselves, it's possible that a worker will get stuck due to an unresponsive qemu. Without allowing for the worst-case scenario of one thread per VM, you can't guarantee that the pool won't be depleted.
Once a worker thread "picks" an event, it will contend on the per-VM lock for that VM. Consequently, the handling for that event will be delayed until an existing RPC call for that VM completes.
If you want to fix this properly, you'll need a dynamic pool.
To improve the efficiency of the thread pool, we can try contending for a VM's lock for a specific time, say, N seconds, and then give up on that lock. The same thread in the pool can then move on to process events of the next VM.
This would unnecessarily delay events which are not locked.
Note that this needs all VMs to be hashed to a constant number of threads in the pool, say 5. This ensures that each worker thread has a unique, non-overlapping set of VMs to work with.
How would this help?
As an example, {VM_ID: 1, 6, 11, 16, 21, ...} are handled by the same worker thread. If this particular worker thread cannot acquire the requisite VM's lock, it will move on to the event list for the next VM, and so on. The use of pthread_mutex_trylock() ensures that the worker thread will never be stuck forever.
No, I think this isn't the right approach at all. You could end up having all VMs handled by one thread, with the others being idle. I think the right approach will be to have a dynamic pool, which will handle incoming events. In the case when two events for a single VM should be handled in parallel, the same thread should pick them up in the order they arrived. That way, you will have at most a thread per VM, while normally you will have only one.
I agree that a dynamic thread pool is helpful when there are events from distinct VMs that need to be processed at the same time. But I am also concerned about using the threads in this pool efficiently. If a few threads do nothing but contend on per-VM locks until the RPCs for those VMs complete, it is not a very efficient use of resources. I would rather have such a thread drop handling of this VM's events and do something useful while it is unable to grab this VM's lock. This is the reason I wanted to load-balance incoming events by VM IDs and hash them onto distinct threads. The idea was that a pthread always has something else to take up if the current VM's lock is not available. Would you have some suggestions on improving the efficacy of the thread pool as a whole?

On Tue, May 02, 2017 at 04:42:11PM +0530, Prerna wrote:
On Tue, May 2, 2017 at 4:27 PM, Peter Krempa <pkrempa@redhat.com> wrote:
On Tue, May 02, 2017 at 16:16:39 +0530, Prerna wrote:
On Tue, May 2, 2017 at 4:07 PM, Peter Krempa <pkrempa@redhat.com> wrote:
On Tue, May 02, 2017 at 16:01:40 +0530, Prerna wrote:
[please don't top-post on technical lists]
Thanks for the quick response, Peter! This ratifies the basic approach I had in mind. It needs some (not-so-small) cleanup of the qemu driver code, and I have already started cleaning up some of it. I am planning to have a constant number of event handler threads to start with. I'll try adding this as a configurable parameter in qemu.conf once basic functionality is completed.
That is wrong, since you can't guarantee that it will not lock up. Since the workers handling monitor events tend to call monitor commands themselves, it's possible that a worker will get stuck due to an unresponsive qemu. Without allowing for the worst-case scenario of one thread per VM, you can't guarantee that the pool won't be depleted.
Once a worker thread "picks" an event, it will contend on the per-VM lock for that VM. Consequently, the handling for that event will be delayed until an existing RPC call for that VM completes.
If you want to fix this properly, you'll need a dynamic pool.
To improve the efficiency of the thread pool, we can try contending for a VM's lock for a specific time, say, N seconds, and then give up on that lock. The same thread in the pool can then move on to process events of the next VM.
This would unnecessarily delay events which are not locked.
Note that this needs all VMs to be hashed to a constant number of threads in the pool, say 5. This ensures that each worker thread has a unique, non-overlapping set of VMs to work with.
How would this help?
As an example, {VM_ID: 1, 6, 11, 16, 21, ...} are handled by the same worker thread. If this particular worker thread cannot acquire the requisite VM's lock, it will move on to the event list for the next VM, and so on. The use of pthread_mutex_trylock() ensures that the worker thread will never be stuck forever.
No, I think this isn't the right approach at all. You could end up having all VMs handled by one thread, with the others being idle. I think the right approach will be to have a dynamic pool, which will handle incoming events. In the case when two events for a single VM should be handled in parallel, the same thread should pick them up in the order they arrived. That way, you will have at most a thread per VM, while normally you will have only one.
I agree that a dynamic thread pool is helpful when there are events from distinct VMs that need to be processed at the same time. But I am also concerned about using the threads in this pool efficiently. If a few threads do nothing but contend on per-VM locks until the RPCs for those VMs complete, it is not a very efficient use of resources. I would rather have such a thread drop handling of this VM's events and do something useful while it is unable to grab this VM's lock. This is the reason I wanted to load-balance incoming events by VM IDs and hash them onto distinct threads. The idea was that a pthread always has something else to take up if the current VM's lock is not available. Would you have some suggestions on improving the efficacy of the thread pool as a whole?
Hashing VMs onto threads is really overkill and also does not provide any kind of fairness, i.e., if one VM gets stuck for a long time, any other VM hashed onto the same thread will get delayed longer than VMs whose events occur much later. If we want fewer threads than VMs, then we should handle events FIFO for fairness.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
participants (3)
- Daniel P. Berrange
- Peter Krempa
- Prerna