On Mon, Jan 20, 2014 at 11:27 AM, Daniel P. Berrange <berrange@redhat.com> wrote:
On Fri, Dec 20, 2013 at 11:03:53PM -0500, Adam Walters wrote:
> This patchset adds a driver named 'config' that allows access to configuration
> data, such as secret and storage definitions. This is a pre-requisite for my
> next patchset which resolves the race condition on libvirtd startup and the
> circular dependencies between QEMU and the storage driver.

I vaguely recall something being mentioned in the past, but not the
details. Can you explain the details of the circular dependency
problem that you're currently facing ?

Peter's description is nearly complete. I have actually reproduced the issue even with file-backed volumes,
though only when using the 'volume' disk type (illustrated below). There was a thread on the list late last year
where I brought up the problem, but the discussion died off. I then worked on the patches here to resolve the
issue. This certainly isn't the only possible resolution, merely the best one that I could think of and implement
on my own.
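
For reference, by the 'volume' disk type I mean a domain disk defined roughly along these lines (a minimal sketch;
the pool and volume names are placeholders):

  <disk type='volume' device='disk'>
    <driver name='qemu' type='raw'/>
    <source pool='default' volume='guest.img'/>
    <target dev='vda' bus='virtio'/>
  </disk>

A domain defined this way can only be started once the storage driver has brought the referenced pool online,
which is why the driver load order matters in the first place.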

In more detail, there are actually two problems (and thus the two patchsets I submitted) that I discovered in libvirt.

The first is the unpredictable driver load order.

In this issue, all of the VMs I ran (all QEMU-based, using either RBD- or file-backed storage pool volumes) were
randomly restarted due to a race condition in libvirt: the QEMU driver and the storage driver start simultaneously.
If QEMU manages to start before the storage driver, the domains get restarted because their storage pools are still
offline (all pools are offline initially). The patchset that changes libvirt's driver loading code resolves this
particular issue. Once I fixed this locally, I ran into the second problem below.

The second issue is the circular dependency problem. This issue is only really visible once the driver load race condition is resolved.

In this issue, the storage driver has a hard-coded connection to QEMU, as Peter mentioned in his reply. This is the only problem
that the config driver actually resolves. Without this (or a similar patch to fix the inter-dependency between QEMU and storage),
an erratum would be needed noting that secrets with storage pools should not be used in production due to known issues. If I
remember correctly, RBD-backed pools used to require authentication. If that has not changed (and I will be honest that I have
not checked in a while, since I use cephx authentication), then no RBD pools should be used for the time being, unless the
user is okay with their VMs being restarted any time libvirt restarts. That should be a rare event, but it is painful when not expected.
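
To make the dependency concrete, the configuration involved looks roughly like this (a sketch; the UUID, monitor
host, pool names, and username are placeholders): a cephx secret definition, plus an RBD pool whose source
references that secret.

  <!-- cephx secret definition (placeholder UUID) -->
  <secret ephemeral='no' private='no'>
    <uuid>aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee</uuid>
    <usage type='ceph'>
      <name>client.libvirt secret</name>
    </usage>
  </secret>

  <!-- RBD pool that needs the secret above in order to start -->
  <pool type='rbd'>
    <name>rbd-pool</name>
    <source>
      <name>libvirt-pool</name>
      <host name='mon1.example.com' port='6789'/>
      <auth type='ceph' username='libvirt'>
        <secret uuid='aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee'/>
      </auth>
    </source>
  </pool>

To start that pool, the storage driver has to resolve the secret, and today it does so through its hard-coded QEMU
connection, which is where the circular dependency comes from.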

My third patchset was merely a resubmission of my code adding support for RBD storage pools to the domain XML. Currently, you
can define an RBD pool, but not actually use its volumes in a domain. I didn't like having to duplicate my cephx username, key,
and Ceph monitor hosts in every domain's XML, so I wrote the patch to fix it. I resubmitted it due to the length of time that had
passed; the only change was a rebase onto the latest libvirt.
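
To illustrate the duplication (again, the names, host, and UUID are placeholders): every domain that uses RBD
directly currently has to carry something like the following, repeating the auth and monitor details that are
already in the pool definition, whereas with the patch the domain could simply use the 'volume' disk form shown
earlier:

  <disk type='network' device='disk'>
    <driver name='qemu' type='raw'/>
    <source protocol='rbd' name='libvirt-pool/guest-image'>
      <host name='mon1.example.com' port='6789'/>
    </source>
    <auth username='libvirt'>
      <secret type='ceph' uuid='aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee'/>
    </auth>
    <target dev='vda' bus='virtio'/>
  </disk>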
 

> The basic rationale behind this idea is that there exist circumstances under
> which a driver may need to access things such as secrets during a time at
> which there is no active connection to a hypervisor. Without a connection,
> the data can't be accessed currently. I felt that this was a much simpler
> solution to the problem than building new APIs that do not require a connection
> to operate.

We have a handful of places in our code where we call out to public
APIs to implement some piece of functionality. I've never been all
that happy about these scenarios. If we call the public API directly,
they cause havoc with error reporting because we end up dispatching
and resetting errors in the middle of a nested API call. Calling the
driver methods directly by doing conn->driver->invokeMethod() is
somewhat tedious & error prone because we're then skipping various
sanity tests that the public APIs do.  With ACL checking now implemented
we also have the slightly odd situation that a public API which is
documented as requiring permission 'xxxx' may also require permissions
'yyyy' and 'zzzz' to deal with other public APIs we invoke secretly.

I think there is a fairly strong argument that our internal implementations
could be decoupled from the public APIs, so we can call methods internally
without having to go via the public APIs at all.

On the flip side though, there's also a long term desire to separate
the different drivers into separate daemons, eg so the secrets driver
might move into a libvirt-secretsd daemon, which might explicitly
require us to go via the public APIs.


If there were an internal API that did not require a connection, that would probably be a better solution, at least as long
as libvirt stays one monolithic process. In the specific problem I have run into, the conn variable is NULL, because no
connection exists (or is able to exist). In that case, trying conn->driver->invokeMethod() would fail unless there is
some magic in the code I'm not aware of.
 
> This driver is technically what one may call a hypervisor driver, but it
> does not implement any domain operations. It simply exists to handle requests
> by drivers for access to information that would otherwise require a connection.
> The URI used for this driver is 'config:///' and has been tested working on 4
> different machines of mine, running three different distributions of Linux
> (Archlinux, Gentoo, and CentOS). Being a very simple driver, I would expect
> it to work pretty much anywhere.
>
> I would love to hear any comments and suggestions you may have about this
> driver. At the very least this plus my next patchset resolves the startup
> race condition on my machine. If a more robust setup (likely a new internal
> API) is in the works, this driver could act as a band-aid to allow access
> to this type of data in the interim if a better resolution is a ways off.

One big concern I have about this approach is the fact that once we add
this 'config:///' driver, it is very hard for us to ever remove it again,
since this concept leaks out into the public API. So we're going to want
to be very sure that this is something we wish to support long term, and
I don't really have such confidence myself.

I'd like to understand a bit more about the dependency issue you're facing
to see if there's something else we can do to address it.

I agree that adding a new connection driver would make it difficult to remove later. In fact, as I (think I) mentioned in my
patch descriptions, adding the config driver is not required to resolve the race condition. It (or some similar fix to the
circular dependency between storage and QEMU) is, however, needed to allow storage volumes of any kind to use secrets on
running VMs across a libvirt restart. Without a fix for that issue, any VM using a volume with a secret will be killed and
restarted upon a libvirt restart. In fact, if the driver load order is fixed but the circular dependency is not, a restart no
longer randomly reboots VMs on my system; it always does. To be fair, in that case it never reboots a VM with a file-backed
volume once the race condition is removed, and I've never seen a VM get rebooted on a libvirt restart when not using storage
pools, either.

Unless I'm mistaken, though, storage pools look like the future in libvirt. They make the configuration of a VM a lot simpler,
since the storage-specific details are held in the pool definition, and they let management frontends support new storage
backends with little to no work, which is a big plus for any administrator. Currently, you're pretty much stuck with a
file-backed VM unless you want to hand-craft your XML.
 

Daniel
--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|