On Thu, Jan 23, 2014 at 03:06:16PM -0500, Adam Walters wrote:
This patchset implements a tiered driver loading system. I split the
hypervisor drivers out into their own tier, which is loaded after the
other drivers. This
has the net effect of ensuring that things like secrets, networks, etc., are
initialized and auto-started before any hypervisors, such as QEMU, LXC, etc.
This resolves the race condition currently present when starting libvirtd
while domains are running, which happens when restarting libvirtd after having
started at least one domain.
[snip]
For anyone who is not familiar with the race condition I mention above,
the basic description is that upon restarting libvirtd, any running QEMU
domains
using storage volumes are killed randomly due to their associated storage pool
not yet being online. This is due to storage pool auto-start not having
completed prior to QEMU initialization. In my prior testing, I found that this
race condition affected at least one domain approximately 40% of the time. I
sent this information to the mailing list back on 06DEC2013, if anyone is
interested in going back and re-reading my description.
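(For anyone skimming: the tiered loading being proposed boils down to
the ordering below. This is just an illustrative sketch with stand-in
function names and driver tables, not code from the patchset.)

  /* Illustrative sketch only: these driver tables and init functions are
   * stand-ins, not libvirt's real internal API. The point is the ordering:
   * every non-hypervisor driver finishes its initialization before any
   * hypervisor driver starts. */
  #include <stdio.h>

  typedef void (*initFunc)(void);

  static void secretInit(void)  { printf("secret: init\n"); }
  static void networkInit(void) { printf("network: init\n"); }
  static void storageInit(void) { printf("storage: init + detect active pools\n"); }
  static void qemuInit(void)    { printf("qemu: init + reconnect to running guests\n"); }
  static void lxcInit(void)     { printf("lxc: init\n"); }

  static initFunc baseTier[] = { secretInit, networkInit, storageInit };
  static initFunc hypervisorTier[] = { qemuInit, lxcInit };

  int main(void)
  {
      size_t i;

      /* Tier 1: everything the hypervisor drivers may depend on */
      for (i = 0; i < sizeof(baseTier)/sizeof(baseTier[0]); i++)
          baseTier[i]();

      /* Tier 2: hypervisor drivers, which can now assume secrets,
       * networks and storage pools are available */
      for (i = 0; i < sizeof(hypervisorTier)/sizeof(hypervisorTier[0]); i++)
          hypervisorTier[i]();

      return 0;
  }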
Ok, so the current sequence is /supposed/ to be
1 StateInitialize
1.1 StorageInitialize
1.1.1 Load all configs
1.1.2 Detect currently active pools
1.2 QEMUInitialize
1.2.1 Load all configs
1.2.2 Detect currently active guests
2 StateAutostart
2.1 StorageAutostart
2.1.1 Start any inactive pools marked autostart
2.2 QEMUAutostart
2.2.1 Start any inactive guests marked autostart
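In code form, the intended two-phase startup above would be roughly the
following (stand-in function names, not the real driver entry points):

  /* Rough sketch of the intended two-phase startup. Step 1.1.2 is the
   * part that turns out to be missing in the storage driver. */
  #include <stdio.h>

  static void storageLoadConfigs(void)  { printf("1.1.1 storage: load configs\n"); }
  static void storageDetectActive(void) { printf("1.1.2 storage: detect active pools\n"); }
  static void qemuLoadConfigs(void)     { printf("1.2.1 qemu: load configs\n"); }
  static void qemuDetectActive(void)    { printf("1.2.2 qemu: detect running guests\n"); }
  static void storageAutostart(void)    { printf("2.1.1 storage: start inactive autostart pools\n"); }
  static void qemuAutostart(void)       { printf("2.2.1 qemu: start inactive autostart guests\n"); }

  int main(void)
  {
      /* 1 StateInitialize */
      storageLoadConfigs();
      storageDetectActive();   /* should happen here, before QEMU init */
      qemuLoadConfigs();
      qemuDetectActive();

      /* 2 StateAutostart */
      storageAutostart();
      qemuAutostart();
      return 0;
  }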
Looking at the storage driver code, though, I see a flaw: step 1.1.2
does not actually exist. The check for existing active storage pools is
only done in step 2.1.1 as part of autostart, which as you say is
clearly too late.
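To make the flaw concrete, here's a sketch - with invented structs and
helpers, not the actual storage driver code - of where the "is this pool
already active?" probe currently sits:

  #include <stdbool.h>
  #include <stdio.h>

  typedef struct {
      const char *name;
      bool active;      /* libvirtd's in-memory view */
      bool autostart;
  } Pool;

  /* Stand-in for probing kernel/userspace state (e.g. an iSCSI session). */
  static bool poolBackendIsActive(const Pool *p) { (void)p; return true; }

  static void storageInitialize(Pool *pools, size_t n)
  {
      /* Today: only configs are loaded; 'active' is left false. */
      (void)pools; (void)n;
  }

  static void storageAutostart(Pool *pools, size_t n)
  {
      for (size_t i = 0; i < n; i++) {
          /* The active check only happens here - too late for QEMU init. */
          if (poolBackendIsActive(&pools[i]) || pools[i].autostart)
              pools[i].active = true;
      }
  }

  int main(void)
  {
      Pool pools[] = { { "default", false, false } };

      storageInitialize(pools, 1);
      /* The QEMU driver initializes at this point and sees the pool as
       * inactive, so guests using its volumes get treated as broken. */
      printf("after init: %s active=%d\n", pools[0].name, pools[0].active);

      storageAutostart(pools, 1);
      printf("after autostart: %s active=%d\n", pools[0].name, pools[0].active);
      return 0;
  }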
Also, the check for existing active storage pools looks like it only
works for storage pools which actually have some persistent kernel /
userspace state outside of libvirt, i.e. iSCSI pools will remain active
even when libvirt is not running, since the iSCSI initiator is outside
the scope of libvirt. If, however, we have an RBD pool using librbd
instead of the FUSE driver, there's no persistent state to detect. The
issue here is that the storage
driver is not correctly keeping track of what storage pools were
active prior to restart, i.e. any storage pools that were active before
the restart should still be active after it, *regardless* of the
autostart flag.
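What I'd expect instead is something along the lines of the sketch
below. The file paths and helper names are made up, but it shows the
idea of recording pool run-state somewhere that survives a daemon
restart, so that even a librbd-backed RBD pool (with nothing external to
probe) comes back as active, regardless of its autostart flag:

  #include <stdbool.h>
  #include <stdio.h>
  #include <unistd.h>

  /* Hypothetical run-state directory, not necessarily what libvirt uses. */
  #define STATE_DIR "/var/run/libvirt/storage"

  static void poolMarkActive(const char *name)
  {
      char path[256];
      FILE *f;

      snprintf(path, sizeof(path), STATE_DIR "/%s.active", name);
      f = fopen(path, "w");            /* record "this pool was running" */
      if (f)
          fclose(f);
  }

  static bool poolWasActive(const char *name)
  {
      char path[256];

      snprintf(path, sizeof(path), STATE_DIR "/%s.active", name);
      return access(path, F_OK) == 0;  /* marker survives a daemon restart */
  }

  int main(void)
  {
      const char *pool = "rbd-pool";

      /* While libvirtd is running: the pool gets started, so record that. */
      poolMarkActive(pool);

      /* After a libvirtd restart, during storage initialization: */
      if (poolWasActive(pool))
          printf("%s: restore to active, autostart flag irrelevant\n", pool);
      else
          printf("%s: leave inactive\n", pool);
      return 0;
  }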
So I understand why your proposed split of driver loading works to
solve this, but even then it only solves the startup
ordering problem. There's still the issue that we're reliant on
the autostart flag being set. If there's a guest with an RBD
volume against a pool that isn't set to autostart, we're still
doomed.
The storage drivers are just plain broken wrt libvirt restarts
because they're not correctly recording their state and restoring
back to the exact same state after restart.
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|