On Fri, Jun 17, 2011 at 10:55:43 +0100, Daniel P. Berrange wrote:
> On Thu, Jun 16, 2011 at 04:03:36PM -0400, Dave Allan wrote:
> > Dan, can you suggest some possible strategies here? I don't have a
> > strong opinion on the implementation, although I agree with your
> > concern about spawning unlimited numbers of threads.
>
> As I mentioned, we need to make the QEMU monitor timeout after some
> period of time waiting, and ensure that the monitor for that VM cannot
> be used thereafter.

I'm not sure that's the best way to deal with this either. I have hated this
kind of timeout ever since I worked on Xen :-) The problem is that no matter
how big the timeout is, it is usually pretty easy to get into a situation
where it is not big enough. If anything in the system goes crazy (the easiest
way is just causing lots of disk writes), the monitor command times out and
you cannot do anything with the domain except destroy it (or shut it down
from inside), even after you have fixed the issue and the system has returned
to normal operation.
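
To make that failure mode concrete, here is a minimal sketch in Python. It is
not libvirt code: the Monitor class, the MonitorDead exception and the
reply_delay parameter are all hypothetical, and the delay is simulated rather
than measured. It only assumes the proposed behaviour, namely that one timed-out
command makes the monitor permanently unusable.

```python
class MonitorDead(Exception):
    """Raised when the monitor was disabled by an earlier timeout."""


class Monitor:
    """Hypothetical stand-in for a per-VM monitor; not libvirt's API."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds we are willing to wait for a reply
        self.dead = False       # set once any command has timed out

    def command(self, reply_delay):
        # reply_delay simulates how long QEMU takes to answer (no real sleep).
        if self.dead:
            raise MonitorDead("monitor unusable after an earlier timeout")
        if reply_delay > self.timeout:
            # One slow reply is enough to disable the monitor for good.
            self.dead = True
            raise TimeoutError("no reply within timeout")
        return "ok"
```

With a 30 second timeout, a single reply that takes 45 seconds under heavy
disk load disables the monitor, and a later command that would have been
answered in 1 second is still refused.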
Another issue is that the threads don't have to be stuck in the QEMU monitor
at all; they can be doing migration, for example. Let's say one client
connects to libvirtd and starts 5 migrations. Then 15 other clients connect
and each issues 1 additional migration. So we have 16 clients connected and
all 20 threads consumed. So even though a new client can connect to libvirtd,
it can't do anything (not even cancel the migrations) since no worker is free.
I know this is not a probable scenario, but I just wanted to show that we need
to think about more ways in which libvirtd can become unresponsive.
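
The arithmetic of that scenario can be checked with a small simulation. This
is only a sketch in Python, not libvirtd's dispatch code: the simulate()
function, the client names and the job names are made up for illustration,
and the pool size of 20 is the number of workers assumed in the scenario.

```python
WORKERS = 20  # pool size assumed in the scenario


def simulate(requests, workers=WORKERS):
    """Hand long-running jobs to free workers; return (running, queued)."""
    running, queued = [], []
    for client, job in requests:
        # A job gets a worker only while one is free; otherwise it waits.
        if len(running) < workers:
            running.append((client, job))
        else:
            queued.append((client, job))
    return running, queued


# One client starts 5 migrations, then 15 more clients start 1 each.
requests = [("client-1", "migrate")] * 5
requests += [(f"client-{n}", "migrate") for n in range(2, 17)]
# A new client connects and tries to cancel a migration.
requests.append(("client-17", "cancel-migration"))

running, queued = simulate(requests)
print(len(running), "workers busy;", "queued:", queued)
```

All 20 workers end up busy on behalf of 16 clients, and the cancel request
from the 17th client sits in the queue until some migration finishes.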
I'm afraid we won't find any perfect solution and we'll just need to take the
one that we think sucks less.
Jirka