On Tue, Jan 28, 2014 at 8:05 AM, Daniel P. Berrange <berrange@redhat.com> wrote:
On Thu, Jan 23, 2014 at 03:06:16PM -0500, Adam Walters wrote:
> This patchset implements a tiered driver loading system. I split the hypervisor
> drivers out into their own tier, which is loaded after the other drivers. This
> has the net effect of ensuring that things like secrets, networks, etc., are
> initialized and auto-started before any hypervisors, such as QEMU, LXC, etc.
> This resolves the race condition currently present when starting libvirtd
> while domains are already running, i.e. when restarting libvirtd after at
> least one domain has been started.

[snip]

> For anyone who is not familiar with the race condition I mention above, the
> basic description is that upon restarting libvirtd, any running QEMU domains
> using storage volumes are killed randomly due to their associated storage pool
> not yet being online. This is due to storage pool auto-start not having
> completed prior to QEMU initialization. In my prior testing, I found that this
> race condition affected at least one domain approximately 40% of the time. I
> sent this information to the mailing list back on 06DEC2013, if anyone is
> interested in going back and re-reading my description.
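
To make the tiered loading described at the top of the quoted cover letter
more concrete, here is a rough sketch in C. All names here (the tier enum,
the driver table, virStateInitializeTiered) are invented for illustration and
are not the patchset's actual code: drivers register with a tier, and state
initialization walks the tiers in order so that the secret, network and
storage drivers are fully initialized and auto-started before any hypervisor
driver runs.

  /* Illustrative only -- enum values, struct fields and function names are
   * invented for this sketch, not taken from the patchset or libvirt. */
  #include <stddef.h>

  typedef enum {
      VIR_DRIVER_TIER_BASE = 0,        /* secrets, networks, storage, ... */
      VIR_DRIVER_TIER_HYPERVISOR = 1,  /* qemu, lxc, ... */
      VIR_DRIVER_TIER_LAST
  } virDriverTier;

  typedef struct {
      const char *name;
      virDriverTier tier;
      int (*stateInitialize)(void);
      int (*stateAutoStart)(void);
  } virStateDriverEntry;

  static virStateDriverEntry drivers[64];
  static size_t ndrivers;

  /* Finish tier 0 completely (initialize + autostart) before touching
   * tier 1, so a hypervisor driver never reconnects to running domains
   * while storage/network/secret state is still being brought up. */
  static int
  virStateInitializeTiered(void)
  {
      for (virDriverTier tier = VIR_DRIVER_TIER_BASE;
           tier < VIR_DRIVER_TIER_LAST; tier++) {
          for (size_t i = 0; i < ndrivers; i++) {
              if (drivers[i].tier != tier)
                  continue;
              if (drivers[i].stateInitialize &&
                  drivers[i].stateInitialize() < 0)
                  return -1;
              if (drivers[i].stateAutoStart &&
                  drivers[i].stateAutoStart() < 0)
                  return -1;
          }
      }
      return 0;
  }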

Ok, so the current sequence is /supposed/ to be

  1 StateInitialize
       1.1 StorageInitialize
            1.1.1 Load all configs
            1.1.2 Detect currently active pools
       1.2 QEMUInitialize
            1.2.1 Load all configs
            1.2.2 Detect currently active guests
  2 StateAutostart
       2.1 StorageAutostart
            2.1.1 Start any inactive pools marked autostart
       2.2 QEMUAutostart
            2.2.1 Start any inactive guests marked autostart


Looking at the storage driver code though, I see a flaw in that
1.1.2 does not actually exist. The check for existing active
storage pools is only done in step 2.1.1 as part of autostart,
which, as you say, is clearly too late.
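
To make that concrete, a rough sketch of where a "1.1.2 detect currently
active pools" step could sit follows. The types and helpers below are
stand-ins invented for illustration, not the real libvirt internals; the
point is only that the detection has to happen in the storage driver's state
initialization, before any hypervisor driver initializes, rather than as
part of autostart.

  /* Rough sketch only: StoragePool, check_active and
   * storage_state_initialize are invented names, not libvirt code. */
  #include <stdbool.h>
  #include <stddef.h>

  typedef struct {
      const char *name;
      bool autostart;
      bool active;
      /* backend probe: is this pool already running on the host?
       * (meaningful for iSCSI-style pools with external state) */
      bool (*check_active)(const char *name);
  } StoragePool;

  static StoragePool pools[64];   /* filled in by step 1.1.1 (config load) */
  static size_t npools;

  static int
  storage_state_initialize(void)
  {
      /* 1.1.1 Load all configs: assumed done, populating pools[] */

      /* 1.1.2 Detect currently active pools -- the missing step */
      for (size_t i = 0; i < npools; i++) {
          if (pools[i].check_active &&
              pools[i].check_active(pools[i].name))
              pools[i].active = true;
      }
      return 0;
  }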

Also, the check for existing active storage pools looks like it
only works for storage pools which actually have some persistent
kernel / userspace state outside of libvirt, i.e. iSCSI pools will
remain active even when libvirt is not running, since the iSCSI
initiator is outside the scope of libvirt. If we have an RBD pool,
however, which uses librbd instead of the FUSE driver, there is no
persistent state to detect. The issue here is that the storage
driver is not correctly keeping track of which storage pools were
active prior to restart, i.e. any storage pools active before
restart should still be active after restart, *regardless* of the
autostart flag.
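
One way to address that, sketched below with invented path and helper names
(this is not existing libvirt code), is for the storage driver to record a
small piece of state whenever a pool becomes active and to consult it again
during initialization. A pool with no kernel or userspace footprint outside
libvirt, such as one backed by librbd, can then be brought back up after a
daemon restart regardless of its autostart flag.

  /* Illustrative sketch only: POOL_STATE_DIR and the helper names are
   * assumptions made for this example, not libvirt's actual code. */
  #include <stdbool.h>
  #include <stdio.h>
  #include <unistd.h>

  #define POOL_STATE_DIR "/var/run/libvirt/storage"   /* assumed location */

  /* Call whenever a pool transitions to the active state. */
  static int
  pool_state_record(const char *pool_name)
  {
      char path[1024];
      FILE *fp;

      snprintf(path, sizeof(path), "%s/%s.active", POOL_STATE_DIR, pool_name);
      if (!(fp = fopen(path, "w")))
          return -1;
      fclose(fp);
      return 0;
  }

  /* Call whenever a pool is stopped cleanly, so stale markers don't linger. */
  static void
  pool_state_forget(const char *pool_name)
  {
      char path[1024];

      snprintf(path, sizeof(path), "%s/%s.active", POOL_STATE_DIR, pool_name);
      unlink(path);
  }

  /* Call during state initialization: a pool with a marker was active
   * before the restart and should be restored, autostart flag or not. */
  static bool
  pool_state_was_active(const char *pool_name)
  {
      char path[1024];

      snprintf(path, sizeof(path), "%s/%s.active", POOL_STATE_DIR, pool_name);
      return access(path, F_OK) == 0;
  }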

So I understand why your proposed split of driver loading works
to solve this, but even with this it only solves the startup
ordering problem. There's still the issue that we're reliant on
the autostart flag being set. If there's a guest with an RBD
volume against a pool that isn't set to autostart, we're still
doomed.

I hadn't really thought about that piece, as I always set my storage pools
to auto-start. I'm still not sure a race condition wouldn't remain if the
storage driver continues to use a hard-coded connection to 'qemu:///', though.
I'm not familiar enough with the libvirt code to know exactly at which point
the QEMU driver can accept a connection to its URI and provide access to
information about networks, secrets, etc.

Note that I was also able to reproduce the race condition with a file-backed
storage pool, so even setting aside the semi-exotic RBD piece, I think there
would still be problems even if 1.1.2 were added to the code. I could certainly
look into implementing something to store state information for storage pools;
I'll likely look toward the existing domain code for an idea of how to implement it.

That issue does sort of bleed into my other patchset, the one implementing the
'config:///' URI, which we had some discussion about prior to the fixup and
resubmission.
 

The storage drivers are just plain broken wrt libvirt restarts
because they're not correctly recording their state and restoring
back to the exact same state after restart.

Daniel
--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|