On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange(a)redhat.com>
The following sequence
1. Define a persistent QMEU guest
s/QMEU/QEMU
2. Start the QEMU guest
3. Stop libvirtd
4. Kill the QEMU process
5. Start libvirtd
6. List persistent guets
s/guets/guests
At the last step, the previously running persistent guest
will be missing. This is because of a race condition in the
QEMU driver startup code. It does
1. Load all VM state files
2. Spawn thread to reconnect to each VM
3. Load all VM config files
Only at the end of step 3, does the 'virDomainObjPtr' get
marked as "persistent". There is therefore a window where
the thread reconnecting to the VM will remove the persistent
VM from the list.
The easy fix is to simply switch the order of steps 2 & 3.
Signed-off-by: Daniel P. Berrange <berrange(a)redhat.com>
---
src/qemu/qemu_driver.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
The fix seems reasonable, although I immediately wondered why "at some
time" it was considered OK to reconnect before the persistent flag was
set for inactive guests. The condition/fault initially described
includes a host reboot in the processing. In that case (I would assume)
the guest would be restarted if autostart was set. Here, kill()'ing the
guest outside of libvirtd's control leaves the guest in some
unknown/unpredictable state. Is there something in that initial load
that could detect this condition better?
I tried following the steps without the patch, but on my host the guest
was still listed after the restart. So yes, it is a timing condition,
but what causes that timing condition?
Would setting the dom->persistent before the virObjectUnlock(dom) in
virDomainObjListLoadAllConfigs() change the results?
Beyond that, keeping the virConnectOpen() and qemuProcessReconnectAll()
calls "together" after the loading of the inactive persistent configs
seems to keep the code flow more normal. Whether that comes before or
after the Snapshot/ManagedSave load is, I suppose, just an
"implementation detail".
Also, other drivers follow the same sequence: load running domains,
reconnect, then load inactive/persistent configs. Should those have
similar patches as well?
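
To make the ordering question concrete, here is a toy model of how I read
the commit message. It is only a sketch and not libvirt code: struct
toy_dom, toy_reconnect(), toy_mark_persistent() and toy_startup() are all
hypothetical names, and the "drop a dead, non-persistent guest" rule is my
assumption about what the reconnect path does. It replays one interleaving
deterministically rather than using real threads.

```c
#include <stdbool.h>

/* Toy stand-in for a domain object; NOT the real virDomainObj. */
struct toy_dom {
    bool in_list;     /* still present in the driver's domain list */
    bool persistent;  /* set only once config loading has finished */
    bool proc_alive;  /* is the QEMU process still running? */
};

/* Assumed reconnect behaviour: a guest whose process is gone and
 * which is not (yet) marked persistent is dropped from the list. */
static void toy_reconnect(struct toy_dom *d)
{
    if (d->in_list && !d->proc_alive && !d->persistent)
        d->in_list = false;
}

/* Assumed end of config loading: mark a listed guest persistent. */
static void toy_mark_persistent(struct toy_dom *d)
{
    if (d->in_list)
        d->persistent = true;
}

/* Replays startup for a guest that was killed while libvirtd was
 * stopped. Returns true if the guest is still listed afterwards. */
bool toy_startup(bool reconnect_first)
{
    struct toy_dom d = {
        .in_list = true,      /* recreated from the VM state file */
        .persistent = false,
        .proc_alive = false,  /* QEMU was killed in step 4 */
    };

    if (reconnect_first) {
        /* Unlucky case: reconnect wins the race. */
        toy_reconnect(&d);
        toy_mark_persistent(&d);
    } else {
        /* Patched order: configs are loaded before reconnecting. */
        toy_mark_persistent(&d);
        toy_reconnect(&d);
    }
    return d.in_list;
}
```

In this model toy_startup(true) returns false (the guest disappears) and
toy_startup(false) returns true (the guest stays listed). If the real code
behaves like that, it would also explain why I could not reproduce the
problem: on my host the config load presumably finishes before the
reconnect thread gets that far.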
John
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index c613967..9c3daad 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
conn = virConnectOpen(cfg->uri);
- qemuProcessReconnectAll(conn, qemu_driver);
-
/* Then inactive persistent configs */
if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
cfg->configDir,
@@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
NULL, NULL) < 0)
goto error;
+ qemuProcessReconnectAll(conn, qemu_driver);
virDomainObjListForEach(qemu_driver->domains,
qemuDomainSnapshotLoad,