On 05/05/2011 08:51 PM, Eric Blake wrote:
On 05/05/2011 02:29 AM, Michal Privoznik wrote:
> When we try to stop a domain's CPUs and qemu crashes meanwhile, a cleanup
> of the domain is performed and the domain gets into state VIR_DOMAIN_SHUTOFF.
> But this is later overwritten, which results in the domain being marked as
> running but with a negative ID.
> ---
> src/qemu/qemu_process.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 7691cbe..539e4f3 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -1919,7 +1919,7 @@ int qemuProcessStopCPUs(struct qemud_driver *driver, virDomainObjPtr vm)
> qemuDomainObjEnterMonitorWithDriver(driver, vm);
> ret = qemuMonitorStopCPUs(priv->mon);
> qemuDomainObjExitMonitorWithDriver(driver, vm);
> - if (ret < 0) {
> + if (ret < 0 && vm->state == VIR_DOMAIN_PAUSED) {
Interesting idea. But I think it is subsumed by the larger patch to
make qemuDomainObjExitMonitorWithDriver return a warn-unused-result
status that states whether the vm is still active at the point when the
lock was regained. That is, if we apply that patch, then this patch
might not be needed.
https://www.redhat.com/archives/libvir-list/2011-April/msg00689.html
I guess I still need to review that one.
Well, I took a look at that patch. It looks nice, but it is not exactly
what we need to prevent my case here.
The problem is in the qemuProcessStopCPUs function itself. From Wen's patch:
@@ -1908,7 +1916,8 @@ int qemuProcessStopCPUs(struct qemud_driver *driver, virDomainObjPtr vm)
vm->state = VIR_DOMAIN_PAUSED;
qemuDomainObjEnterMonitorWithDriver(driver, vm);
ret = qemuMonitorStopCPUs(priv->mon);
- qemuDomainObjExitMonitorWithDriver(driver, vm);
+ if (qemuDomainObjExitMonitorWithDriver(driver, vm) < 0)
+ ret = -1;
if (ret < 0) {
vm->state = oldState;
}
So what might happen: we want to stop the CPUs of a running domain. We
store VIR_DOMAIN_RUNNING into oldState (this is not seen in the snippet,
but anyway), mark the domain paused and start the monitor stuff. QEMU
dies, we detect EOF on the monitor and free the domain, which implies
setting vm->state = VIR_DOMAIN_SHUTOFF. qemuDomainObjExitMonitorWithDriver
returns -1, so we store -1 into ret. And here it comes: we check ret for
< 0, which evaluates true, and overwrite vm->state with oldState, which
we do not want.
So yes, Wen's patch does its work, but I believe this is a slightly
different case.
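
To make the sequence concrete, here is a minimal, self-contained sketch in C.
It is not libvirt code: the enum values, struct vm, the mon_result knob and
fake_monitor_stop_cpus() are hypothetical stand-ins that only model the state
changes described above. It compares the unconditional rollback with the
guarded one from the one-line patch.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the libvirt domain states. */
enum vm_state { VM_RUNNING, VM_PAUSED, VM_SHUTOFF };

/* Hypothetical knob selecting what the fake monitor call does. */
enum mon_result { MON_OK, MON_ERROR, MON_QEMU_DIED };

struct vm {
    enum vm_state state;
    enum mon_result mon;
};

/* Models enter monitor / stop CPUs / exit monitor as a single step.
 * If qemu dies, the EOF handling cleans up the domain and marks it
 * shut off before control returns to the caller. */
int fake_monitor_stop_cpus(struct vm *vm)
{
    assert(vm->state == VM_PAUSED);   /* caller marks the domain paused first */
    if (vm->mon == MON_QEMU_DIED) {
        vm->state = VM_SHUTOFF;       /* changed behind the caller's back */
        return -1;
    }
    return vm->mon == MON_ERROR ? -1 : 0;
}

/* Rollback on any failure, as in the snippet quoted above. */
int stop_cpus_unconditional(struct vm *vm)
{
    enum vm_state old_state = vm->state;
    int ret;

    vm->state = VM_PAUSED;
    ret = fake_monitor_stop_cpus(vm);
    if (ret < 0)
        vm->state = old_state;        /* also overwrites VM_SHUTOFF */
    return ret;
}

/* Rollback only if nobody else changed the state in the meantime. */
int stop_cpus_guarded(struct vm *vm)
{
    enum vm_state old_state = vm->state;
    int ret;

    vm->state = VM_PAUSED;
    ret = fake_monitor_stop_cpus(vm);
    if (ret < 0 && vm->state == VM_PAUSED)
        vm->state = old_state;
    return ret;
}
```

In this model, when the fake monitor reports that qemu died, the unconditional
variant leaves the domain marked as running, while the guarded variant keeps
the shut-off state. An ordinary monitor error is rolled back by both.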
Michal