This patchset implements a tiered driver loading system. I split the hypervisor
drivers out into their own tier, which is loaded after the other drivers. This
has the net effect of ensuring that things like secrets, networks, etc., are
initialized and auto-started before any hypervisors, such as QEMU, LXC, etc.
This resolves the race condition that currently occurs when libvirtd is
restarted while at least one domain is running.

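To illustrate the idea, here is a minimal, self-contained C sketch of
two-tier loading. It is not code from the patches: every identifier below
(demo_tier, demo_state_driver, demo_load_tiered, and so on) is hypothetical,
and the real field and function names in libvirt may differ. Each tier is
fully initialized and auto-started before the next tier is touched.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tiers; the names are illustrative only. */
enum demo_tier {
    DEMO_TIER_BASE = 0,     /* secrets, networks, storage, ... */
    DEMO_TIER_HYPERVISOR,   /* QEMU, LXC, ... */
    DEMO_TIER_LAST
};

/* Hypothetical state driver with a tier field. */
struct demo_state_driver {
    const char *name;
    enum demo_tier tier;
    int (*initialize)(void);    /* returns 0 on success, -1 on failure */
    void (*auto_start)(void);
};

/* Records the order in which callbacks ran so it can be inspected. */
#define DEMO_LOG_MAX 16
const char *demo_log[DEMO_LOG_MAX];
size_t demo_log_len;

static void demo_record(const char *event)
{
    if (demo_log_len < DEMO_LOG_MAX)
        demo_log[demo_log_len++] = event;
}

/* Returns the position of an event in the log, or -1 if it never ran. */
int demo_index_of(const char *event)
{
    for (size_t i = 0; i < demo_log_len; i++) {
        if (strcmp(demo_log[i], event) == 0)
            return (int)i;
    }
    return -1;
}

static int storage_init(void) { demo_record("storage:init"); return 0; }
static void storage_auto(void) { demo_record("storage:autostart"); }
static int qemu_init(void) { demo_record("qemu:init"); return 0; }
static void qemu_auto(void) { demo_record("qemu:autostart"); }

/* Registration order deliberately lists the hypervisor first. */
const struct demo_state_driver demo_drivers[] = {
    { "qemu",    DEMO_TIER_HYPERVISOR, qemu_init,    qemu_auto },
    { "storage", DEMO_TIER_BASE,       storage_init, storage_auto },
};

/* Initialize and auto-start one tier at a time, lowest tier first. */
int demo_load_tiered(const struct demo_state_driver *drivers, size_t n)
{
    for (int tier = 0; tier < DEMO_TIER_LAST; tier++) {
        /* Pass 1: initialize every driver that belongs to this tier. */
        for (size_t i = 0; i < n; i++) {
            if ((int)drivers[i].tier != tier || !drivers[i].initialize)
                continue;
            if (drivers[i].initialize() < 0)
                return -1;
        }
        /* Pass 2: auto-start this tier before moving to the next one. */
        for (size_t i = 0; i < n; i++) {
            if ((int)drivers[i].tier == tier && drivers[i].auto_start)
                drivers[i].auto_start();
        }
    }
    return 0;
}
```

Calling demo_load_tiered(demo_drivers, 2) runs the storage callbacks before
the QEMU callbacks even though QEMU is registered first, which is the
ordering guarantee described above.
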
This patchset also works without my config driver patchset, which I am about
to submit. Without the config driver patchset, however, RBD storage pools
using CephX authentication cannot be auto-started due to a circular
dependency between the QEMU and storage drivers. This may also affect other
storage backends, but I currently only have the capacity to test with RBD and
file-backed storage pools.

The reason this interferes with RBD storage pools is that the storage driver
currently has a hard-coded connection to QEMU in order to look up secrets.
After this patchset, the QEMU driver is not loaded until the storage driver
has completed its initialization and auto-start routines, which breaks
secret lookups during pool auto-start. Any pool type that does not need data
from outside of the base storage pool definition should continue to
auto-start, and should no longer be affected by the current race condition. I
have verified that file-based storage pools auto-start correctly after this
patchset and are no longer affected by the race condition.

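The dependency problem can be modelled with a small C sketch. This is a toy
model under stated assumptions, not libvirt code: demo_pool,
demo_hypervisor_ready, demo_lookup_secret_via_hypervisor, and
demo_autostart_pools are all hypothetical names. It only shows why a pool
that needs a secret cannot auto-start while the hypervisor tier is not yet
loaded, whereas a pool without external dependencies can.

```c
#include <stdbool.h>
#include <stddef.h>

/* Whether a pool needs data from outside its own definition. */
enum demo_pool_auth {
    DEMO_POOL_AUTH_NONE,    /* e.g. a file-backed pool */
    DEMO_POOL_AUTH_SECRET   /* e.g. an RBD pool using CephX */
};

struct demo_pool {
    const char *name;
    enum demo_pool_auth auth;
    bool active;
};

/* Assumption: becomes true only after the hypervisor tier is initialized. */
bool demo_hypervisor_ready = false;

/*
 * Stand-in for a secret lookup that is routed through a hypervisor
 * connection. It returns NULL while the hypervisor driver is not loaded.
 */
static const char *demo_lookup_secret_via_hypervisor(void)
{
    return demo_hypervisor_ready ? "demo-secret" : NULL;
}

/* Auto-start every pool whose dependencies are available right now.
 * Returns the number of pools started by this call. */
size_t demo_autostart_pools(struct demo_pool *pools, size_t n)
{
    size_t started = 0;

    for (size_t i = 0; i < n; i++) {
        if (pools[i].active)
            continue;
        /* A pool that needs a secret is skipped if the lookup fails. */
        if (pools[i].auth == DEMO_POOL_AUTH_SECRET &&
            demo_lookup_secret_via_hypervisor() == NULL)
            continue;
        pools[i].active = true;
        started++;
    }
    return started;
}
```

With demo_hypervisor_ready left false, as it would be while the base tier is
auto-starting, only the pool without authentication becomes active. Running
the same function again after setting the flag starts the remaining pool,
which is roughly the gap the config driver patchset is meant to close.
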
For anyone who is not familiar with the race condition mentioned above: upon
restarting libvirtd, running QEMU domains that use storage volumes are
killed at random because their associated storage pool is not yet online.
This happens because storage pool auto-start has not completed before QEMU
initialization. In my prior testing, this race condition affected at least
one domain approximately 40% of the time. I sent this information to the
mailing list on 06DEC2013, if anyone is interested in going back and
re-reading my description.

I would appreciate any comments and suggestions about this patchset. It works
for me on four machines running three different Linux distributions (Arch
Linux, Gentoo, and CentOS), so I expect it to work on most systems.

Adam Walters (2):
  driver: Implement new state driver field
  libvirt: Implement tiered driver loading

 src/check-driverimpls.pl                |  1 +
 src/driver.h                            |  7 +++++
 src/interface/interface_backend_netcf.c |  1 +
 src/libvirt.c                           | 45 ++++++++++++++++++++-------------
 src/libxl/libxl_driver.c                |  1 +
 src/lxc/lxc_driver.c                    |  1 +
 src/network/bridge_driver.c             |  1 +
 src/node_device/node_device_hal.c       |  1 +
 src/node_device/node_device_udev.c      |  1 +
 src/nwfilter/nwfilter_driver.c          |  1 +
 src/qemu/qemu_driver.c                  |  1 +
 src/remote/remote_driver.c              |  1 +
 src/secret/secret_driver.c              |  1 +
 src/storage/storage_driver.c            |  1 +
 src/uml/uml_driver.c                    |  1 +
 src/xen/xen_driver.c                    |  1 +
 16 files changed, 48 insertions(+), 18 deletions(-)

--
1.8.5.2