
On Mon, Oct 28, 2013 at 09:17:16AM -0400, John Ferlan wrote:
On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The following sequence
1. Define a persistent QMEU guest
s/QMEU/QEMU
2. Start the QEMU guest
3. Stop libvirtd
4. Kill the QEMU process
5. Start libvirtd
6. List persistent guets
s/guets/guests
At the last step, the previously running persistent guest will be missing. This is because of a race condition in the QEMU driver startup code. It does
1. Load all VM state files
2. Spawn thread to reconnect to each VM
3. Load all VM config files
Only at the end of step 3, does the 'virDomainObjPtr' get marked as "persistent". There is therefore a window where the thread reconnecting to the VM will remove the persistent VM from the list.
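To make the window concrete, here is a hedged sketch of the cleanup the reconnect thread performs when reconnection fails, modelled loosely on qemuProcessReconnect() (names and arguments are recalled from that era's tree, not quoted from it):

    /* Sketch of the reconnect thread's failure path (illustrative).
     * If talking to the QEMU monitor fails, the domain is stopped
     * and, unless already flagged persistent, removed entirely: */
    error:
        qemuProcessStop(driver, obj, VIR_DOMAIN_SHUTOFF_FAILED, 0);
        if (!obj->persistent)
            qemuDomainRemoveInactive(driver, obj); /* the racy removal */
        else
            virObjectUnlock(obj);

If this path runs before step 3 has flagged the guest persistent, a defined guest is removed as though it were transient.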
The easy fix is to simply switch the order of steps 2 & 3.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 src/qemu/qemu_driver.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
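For anyone following along in src/qemu/qemu_driver.c, the reordering amounts to something like this hedged sketch of qemuStateInitialize() (arguments elided; illustrative, not the literal diff):

    /* Before the fix: reconnect threads start before configs load. */
    virDomainObjListLoadAllConfigs(/* stateDir: state of running VMs */);
    qemuProcessReconnectAll(/* spawns one reconnect thread per VM */);
    virDomainObjListLoadAllConfigs(/* configDir: flags guests persistent */);

    /* After the fix (steps 2 & 3 swapped): by the time any reconnect
     * thread can fail and clean up, every defined guest already has
     * its persistent flag set, so it survives the cleanup. */
    virDomainObjListLoadAllConfigs(/* stateDir: state of running VMs */);
    virDomainObjListLoadAllConfigs(/* configDir: flags guests persistent */);
    qemuProcessReconnectAll(/* reconnect threads start last */);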
The fix seems reasonable, although I immediately wondered why at some point it was considered OK to reconnect before the persistent flag was set for inactive guests. The condition/fault initially described includes a host reboot in the processing. In that case (I would assume) the guest would be restarted if autostart was set. The external act of kill()'ing a guest outside of libvirtd's control leaves the guest in some unknown/unpredictable state. Is there something in that initial load that could detect this condition better? I tried following the steps without the patch, but on my host the guest was still listed after the restart - so yes, a timing condition - but what causes that timing condition?
Would setting dom->persistent before the virObjectUnlock(dom) in virDomainObjListLoadAllConfigs() change the results?
No, I tried that and it didn't help. Nonetheless, that should be fixed separately.
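For reference, that separate fix would be a small reorder in the loop of virDomainObjListLoadAllConfigs(); a hedged sketch, assuming the loop body is shaped roughly like this (not the actual source):

    /* Current shape: the persistent flag is written only after the
     * object lock is dropped, so another thread can briefly observe
     * a defined guest with the flag still clear: */
    if (dom) {
        virObjectUnlock(dom);
        if (!liveStatus)
            dom->persistent = 1;   /* written outside the lock: racy */
    }

    /* Suggested reorder: mark the domain persistent before unlocking. */
    if (dom) {
        if (!liveStatus)
            dom->persistent = 1;
        virObjectUnlock(dom);
    }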
Beyond that, keeping the virConnectOpen() and qemuProcessReconnectAll() "together" after the loading of the inactive persistent configs seems to keep the code flow more natural. Whether that comes before or after the Snapshot/ManagedSave load is, I suppose, just an "implementation detail".
Also, other drivers follow the load running, reconnect, and load inactive/persistent configs. Should those have similar patches as well?
Historically it was safe because reconnecting to QEMU was serialized. At some point we switched to using a separate thread for reconnecting to QEMU, which introduced a race. The other drivers are still serialized IIUC.

Daniel
--
|: http://berrange.com  -o-  http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org   -o-  http://virt-manager.org :|
|: http://autobuild.org  -o-  http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org  -o-  http://live.gnome.org/gtk-vnc :|