On 06/25/2017 10:56 PM, Yi Wang wrote:
Libvirt forgets to remove inactive vm when failed to start a defined
vm.
That may result in residual domain in driver->domains on such condition:
during the process of starting a vm, undefine it, and qemu exit because
of some exception. As we undefined the vm successfully, the vm->persistent
was set to 0, we will always fail to undefine it until restart libvirtd.
This seems to be a strange sequence of operations, but the claim is that
by adding this logic to CreateWithFlags, then the problem you're facing
is resolved. However, is adding this to the Create logic the right thing
to do?
IIUC:
CreateXMLWithFlags is successful
Undefine domain is successful
-> Logic at top says, "if !vm->persistent", then fail
-> Logic at bottom has:
-> vm->persistent = 0
-> if !virDomainObjIsActive then qemuDomainRemoveInactive
... Thus if domain is active, expectation is that RemoveInactive is
done when the domain is "normally" shutdown...
... time goes by...
QEMU exits w/ some exception
-> qemuDomainDestroyFlags isn't run
-> ? Thus avoiding some libvirt logic to handle the exception (would
that be qemuProcessAutoDestroyAdd assuming VIR_DOMAIN_START_AUTODESTROY
was used) ?
So your fix is, the next time a CreateWithFlags happens, if anything
fails, then domain removal can be done with the assumption that
something within ObjStart will fail eventually because Undefine deleted
most traces of the guest.
Of course if libvirtd restart happens, then because the config no longer
exists the domain object isn't created.
I wonder if perhaps (yuck) another flag is required to indicate that a
domain went from persistent to !persistent in Undefine, but because it
was still running the RemoveInactive wasn't called.
The interesting part of that is how would the reaping happen. Seems this
would be part of some qemuProcessHandle* function. Something that would
be resetting vm->def->id and/or running qemuProcessStop (e.g. processing
via AutoDestroy)
Some more details and understanding why/what qemuProcessHandle* may (or
may not be) called I think is necessary.
It doesn't seem to me that domain removal for this condition should
require going through Start processing, but something in that monitor
event processing logic would seem to need to be taught about this
possibility.
Signed-off-by: Yi Wang <wang.yi59(a)zte.com.cn>
---
src/qemu/qemu_driver.c | 2 ++
1 file changed, 2 insertions(+)
There's two reasons to get to the endjob: label...
#1 virDomainObjIsActive (domain already active)
#2 qemuDomainObjStart fails
So this certainly wouldn't be right for the IsActive condition, under
"normal" circumstances. And again, I don't think requiring a subsequent
call to Start would be "correct" either. I am curious though if when in
this condition, is the vm->def->id set? IOW: What does 'virsh list'
indicate or even virsh dominfo $domain?
John
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index cdb727b..af8afab 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -7185,6 +7185,8 @@ qemuDomainCreateWithFlags(virDomainPtr dom, unsigned int flags)
endjob:
qemuProcessEndJob(driver, vm);
+ if (ret < 0)
+ qemuDomainRemoveInactive(driver, vm);
cleanup:
virDomainObjEndAPI(&vm);