On Sat, Jul 26, 2014 at 03:47:09PM +0800, James wrote:
On 2014/7/25 18:07, Martin Kletzander wrote:
> On Fri, Jul 25, 2014 at 04:45:55PM +0800, James wrote:
>> There is a situation where, when libvirtd is under heavy load (for example,
>> when we start a lot of VMs at the same time), some libvirt APIs may take a
>> long time to return. This blocks the upper-level job from finishing. Mostly
>> we can't wait forever, so we want a timeout mechanism to help us out: when
>> an API call takes longer than some limit, it returns a timeout as its
>> result, and we can do some rolling back.
>>
>> So my question is: is there a plan to provide a timeout, or a better
>> solution, for this kind of problem in the future? And if so, when?
>>
>
> Is it only because there are not enough workers available? If yes,
> then changing the limits in libvirtd.conf (both global and
> per-connection) might be the easiest way to go.
>
> Martin
Thank you for the quick reply.
Job pressure is just one reason for wanting a timeout mechanism. If something
really bad happens, such as a bug that stops a libvirt API from ever returning
(which is very rare), what can we do to make sure the job is not blocked by
the blocked API?
For example, process A calls libvirt API b, but b never returns, so A is
blocked there forever. What is the best thing for us to do?
As that is a pretty rare case that cannot be dealt with inside the API
(since the API is the place where it gets locked), it has to be dealt
with outside it.  I guess whatever you would do by hand is OK.  If,
for example, you are used to restarting libvirtd after the block is
detected, then restart it and try again.  You can spawn another
process that will do it if you want some fine-grained control, or you
can use client-side (and server-side) keepalive to be automatically
disconnected in case the block happens inside the event loop (but
keepalive won't catch a block outside the event loop).  I'm not sure
how to answer more properly since this is not libvirt-specific.  If
there's something libvirt-specific I missed, let me know.
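
To illustrate the "spawn another process" idea, here is a minimal sketch
in Python, assuming a POSIX host (it uses the "fork" start method).  The
names call_with_timeout, quick_api_call and stuck_api_call are made up
for illustration and are not part of libvirt; the two *_api_call
functions are only stand-ins for a real libvirt call.  The caller waits
for the child for a limited time and terminates it if nothing comes
back.  Note that this only unblocks the caller; it does not fix whatever
is stuck on the daemon side.

```python
import multiprocessing as mp
import time

# Assumption: POSIX host. "fork" lets the child run any callable without pickling it.
_ctx = mp.get_context("fork")


def _worker(conn, func, args):
    """Run func(*args) in the child and send back the result or the error."""
    try:
        conn.send(("ok", func(*args)))  # the result must be picklable
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def call_with_timeout(func, args=(), timeout=5.0):
    """Run func(*args) in a separate process, giving up after `timeout` seconds.

    Returns ("ok", result), ("error", message) or ("timeout", None).
    """
    parent_conn, child_conn = _ctx.Pipe(duplex=False)
    proc = _ctx.Process(target=_worker, args=(child_conn, func, args))
    proc.start()
    child_conn.close()  # the parent only needs the reading end
    try:
        if not parent_conn.poll(timeout):
            return ("timeout", None)
        try:
            return parent_conn.recv()
        except EOFError:
            return ("error", "worker exited without sending a result")
    finally:
        if proc.is_alive():
            proc.terminate()      # SIGTERM first
            proc.join(1.0)
            if proc.is_alive():
                proc.kill()       # SIGKILL as a last resort
        proc.join()
        parent_conn.close()


def quick_api_call():
    """Hypothetical stand-in for a libvirt call that returns promptly."""
    return "done"


def stuck_api_call():
    """Hypothetical stand-in for a libvirt call that never returns in time."""
    time.sleep(60)
```

With this, call_with_timeout(stuck_api_call, timeout=0.5) returns
("timeout", None) instead of blocking the caller, which can then roll
back or restart libvirtd as described above.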
Martin