On Tue, Aug 05, 2014 at 03:15:18PM +0800, James wrote:
> In fact, to deal with this kind of situation, we added some timeout
> code to libvirtd, in the remote_dispatch process.
> The mechanism works like this:
> 1. When we call an API, we start a timer thread. When the timer expires, it sets a
> timeout flag on the API call
> and returns a timeout result to the libvirt client.
> 2. When the API returns to the remote_dispatch level, it checks the timeout flag to decide
> what to do next.
> If the call timed out, we perform a rollback action, such as detaching the device if we
> attached it in the first place.
> This solution has some problems. First, we have to figure out suitable
> rollback actions. Second, I'm
> not sure it's the best way to solve this kind of blocking problem; it is not very elegant.
> What do you think about it?

I'm not sure what you want to know. Yes, there are problems like
"what rollback actions to do", which would depend on where the call
got stuck, and "what timeout should be set", which depends
on thousands of factors. I can't think of any elegant solution that
would properly prevent the call from locking up, mainly because this is
literally the Halting problem [1] plus a bit more.
I'd say that whatever works for you in this situation is OK, but it will
(most probably) work only for your particular scenario.
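
For illustration, the timer-and-flag mechanism described in the quoted
message might be sketched roughly as follows. This is a minimal Python
sketch under stated assumptions, not libvirt code (libvirtd is written in
C): the names Call, dispatch, api and rollback are all hypothetical, and
the "client" is modelled as a plain list of replies.

```python
import threading


class Call:
    """Hypothetical per-call state shared by the timer and the dispatcher."""

    def __init__(self):
        self.lock = threading.Lock()
        self.timed_out = False   # set by the timer when the deadline passes
        self.finished = False    # set by the dispatcher when the API returns
        self.replies = []        # stands in for what the client is sent


def dispatch(api, rollback, timeout):
    """Run api() with a deadline; roll back if the deadline was missed.

    Assumption: api and rollback are plain callables supplied by the caller.
    """
    call = Call()

    def on_timeout():
        # Runs in the timer thread. Reply "timeout" only if the API call
        # has not already completed.
        with call.lock:
            if call.finished:
                return
            call.timed_out = True
            call.replies.append("timeout")

    timer = threading.Timer(timeout, on_timeout)
    timer.start()
    try:
        result = api()           # may block for an arbitrary time
    finally:
        timer.cancel()           # no effect if the timer already fired

    # Back at the dispatch level: check the flag and decide what to do next.
    with call.lock:
        call.finished = True
        if call.timed_out:
            rollback()           # the client was already told "timeout"
        else:
            call.replies.append(result)
    return call.replies
```

Note that the sketch only covers the case where the call eventually
returns. If api() never returns, the rollback is never reached, and
choosing the rollback action still requires knowing how far the call got.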
Martin
[1] https://en.wikipedia.org/wiki/Halting_problem