On 07/24/2017 04:06 AM, Peter Krempa wrote:
> On Sat, Jul 22, 2017 at 04:55:49 -0400, Yi Wang wrote:
> > If a domain is started while it is being undefined, and the start fails
> > during ProcessLaunch because qemu exited unexpectedly in that window, the
> > domain can no longer be undefined until libvirtd is restarted. The reason
> > is that libvirtd unlocks the vm during qemuProcessStart, so
> > qemuDomainUndefineFlags can get the lock and set vm->persistent to 0, but
> > it does not remove the "active" domain.
> Shouldn't the startup code handle that? It definitely works when
> starting a transient domain, so making it transient while the startup
> code is executing should be the same case.
> Since we copy the definition prior to startup, there really should not
> be any problem in making the VM transient while it's being started.
FWIW:
This patch started as:
https://www.redhat.com/archives/libvir-list/2017-June/msg01081.html
which I reviewed earlier this month:
https://www.redhat.com/archives/libvir-list/2017-July/msg00278.html
but the responses to the initial review and a couple of follow-ups by the
submitter got unthreaded... A trail of a few breadcrumbs:
https://www.redhat.com/archives/libvir-list/2017-July/msg00387.html
https://www.redhat.com/archives/libvir-list/2017-July/msg00762.html
https://www.redhat.com/archives/libvir-list/2017-July/msg00864.html
In any case, the crux of the issue is that during startup the domain obj
lock is released right around the time we go to start the monitor (and
the vmagent); an extra reference is taken to keep the obj from being
deleted. Meanwhile, another thread runs Undefine. That thread gets the
lock, finds a persistent domain, clears the persistent bit at the end,
and calls RemoveInactive. Only after that does the startup thread get
the obj lock back. The concurrent Undefine causes some issues that send
the start into its failure path, but that path fails to remove the domain.
The command used to reproduce the start/undefine race, buried in one of
the responses, is:
virsh start win7 & sleep 0.2; virsh undefine win7
John