On Tue, May 2, 2017 at 4:27 PM, Peter Krempa <pkrempa@redhat.com> wrote:
On Tue, May 02, 2017 at 16:16:39 +0530, Prerna wrote:
> On Tue, May 2, 2017 at 4:07 PM, Peter Krempa <pkrempa@redhat.com> wrote:
>
> > On Tue, May 02, 2017 at 16:01:40 +0530, Prerna wrote:
> >
> > [please don't top-post on technical lists]
> >
> > > Thanks for the quick response Peter !
> > > This ratifies the basic approach I had in mind.
> > > It needs some (not-so-small) cleanup of the qemu driver code, and I have
> > > already started cleaning up some of it. I am planning to have a constant
> > > number of event handler threads to start with. I'll try adding this as a
> > > configurable parameter in qemu.conf once basic functionality is
> > completed.
> >
> > That is wrong, since you can't guarantee that it will not lock up. Since
> > the workers handling monitor events tend to call monitor commands
> > themselves it's possible that it will get stuck due to unresponsive
> > qemu. Without having a worst-case-scenario of a thread per VM you can't
> > guarantee that the pool won't be depleted.
> >
>
> Once a worker thread "picks" an event, it will contend on the per-VM lock
> for that VM. Consequently, the handling for that event will be delayed
> until an existing RPC call for that VM completes.
>
>
> >
> > If you want to fix this properly, you'll need a dynamic pool.
> >
>
> To improve the efficiency of the thread pool, we can try contending for a
> VM's lock for a bounded time, say N seconds, and then give up on it. The
> same thread in the pool can then move on to process events of the next
> VM.

This would unnecessarily delay events for VMs which are not locked.

> Note that this needs all VMs to be hashed to a constant number of threads
> in the pool, say 5. This ensures that each worker thread has a unique,
> non-overlapping set of VMs to work with.

How would this help?

> As an example, {VM_ID: 1, 6, 11, 16, 21, ...} are handled by the same worker
> thread. If this particular worker thread cannot acquire the requisite VM's
> lock, it will move on to the event list for the next VM, and so on. The use
> of pthread_mutex_trylock() ensures that the worker thread will never be
> stuck forever.

No, I think this isn't the right approach at all. You could end up with
all VMs handled by one thread while the others sit idle. I think the
right approach will be to have a dynamic pool which handles incoming
events. In case two events for a single VM should be handled in
parallel, the same thread should pick them up in the order they arrived.
That way, you will have at most a thread per VM, while normally you will
have only one.

I agree that a dynamic thread pool is helpful when events from distinct VMs need to be processed at the same time.

But I am also concerned about using the threads in this pool efficiently. If a few threads simply block on per-VM locks until the RPCs for those VMs complete, that is not a very efficient use of resources. I would rather have such a thread drop the handling of that VM's events and do something useful while it cannot grab the VM's lock.

This is the reason I wanted to load-balance incoming events by VM ID and hash them onto distinct threads: a worker thread then always has something else to pick up if the current VM's lock is not available. Would you have any suggestions for improving the efficiency of the thread pool as a whole?
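
To make what I'm describing concrete, here is a very rough sketch of the
hashing + trylock loop I have in mind. All names and structures below are
invented purely for illustration (none of this is existing libvirt code),
and the sleep at the bottom stands in for a proper condition-variable wait:

/* Illustration only: not actual libvirt/qemu driver code. */
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

#define NUM_EVENT_WORKERS 5

struct vmEventQueue {
    int vmID;
    pthread_mutex_t vmLock;        /* stands in for the per-VM driver lock */
    int pendingEvents;             /* number of queued monitor events */
    struct vmEventQueue *next;     /* next VM hashed onto the same worker */
};

/* Static assignment: VM IDs 1, 6, 11, 16, 21, ... all land on the same
 * worker, so no two workers ever touch the same VM's event queue. The
 * dispatching side would call this when queuing an incoming event. */
size_t
workerForVM(int vmID)
{
    return (size_t)vmID % NUM_EVENT_WORKERS;
}

/* Stand-in for the real event handling, which may itself issue monitor
 * commands while the per-VM lock is held. */
static void
processPendingEvents(struct vmEventQueue *vm)
{
    vm->pendingEvents = 0;
}

/* Worker loop: skip VMs whose lock is currently held by an RPC worker
 * instead of blocking on them, and retry those VMs on the next pass. */
void *
eventWorker(void *opaque)
{
    struct vmEventQueue *vms = opaque;  /* the VMs hashed onto this worker */

    while (true) {
        struct vmEventQueue *vm;

        for (vm = vms; vm; vm = vm->next) {
            if (vm->pendingEvents == 0)
                continue;

            /* Someone else holds this VM's lock; don't wait, move on. */
            if (pthread_mutex_trylock(&vm->vmLock) != 0)
                continue;

            processPendingEvents(vm);
            pthread_mutex_unlock(&vm->vmLock);
        }

        usleep(10 * 1000);              /* stand-in for a condition wait */
    }

    return NULL;
}

The point of the trylock is that a worker never blocks behind a
long-running RPC for one VM; it just skips that VM on the current pass and
revisits it later, so the fixed-size pool stays busy as long as any of its
VMs have runnable events.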