Re: [libvirt] lock manager and libguestfs use (was: Re: RFC [2/3]: The lock manager plugin driver API)

10 Nov 2010


      On Wed, Nov 10, 2010 at 04:17:13PM +0000, Richard W.M. Jones wrote:
...
On Wed, Nov 10, 2010 at 03:52:49PM +0000, Daniel P. Berrange wrote:
...
The load_drv() method there is to allow the plugin implementation
to block its use with a libvirtd that is too old. eg, say we add
a new API 'foo', and the plugin implements that API. It wants to
block its use in any libvirt which doesn't know about the 'foo'
What sort of thing could 'foo' be?
...
entry point, because such use could be dangerous to safety.
You should just rename the lock manager .so file in this case.  But
it'd be best to write an enduring minimum API and ABI now to avoid
having to do this.
...
The
load_drv() method allows that to be handled. If load_drv() fails
libvirt/libguestfs *must* refuse to run any VMs.
Well, libguestfs has other alternatives.  Ignoring the lock manager is
one, allowing read-only calls to proceed is another one.
Allowing read-only calls to continue isn't guarenteed to be safe.
The lock manager plugin may wish to deny even readonly  usage, to
ensure that there aren't data consistency complication with NFS
close-to-open behaviour during migration.  If a lock manager is
configured for use and it fails, not running any VMs is the only
safe thing, until the admin has fixed the config problem.
...
...
...
(2) Without compiler help it is hard to call load_drv/unload_drv at
the right time in a thread-safe way.
In libvirt context they are loaded when the QEMU driver is
initialized in libvirtd which is still single threaded.
In libguestfs, you've no global "driver" state, so I'd
suggest it is easiest to just call load_drv before starting
the VM, and unload_drv after stopping it. The API contract
requires that a plugin needs to cope with being loaded
multiple times.
There are still issues with this.  Multiple threads can call
guestfs_launch (on different handles), resulting in reentrant calls to
load_drv.  I'm not convinced that the API as proposed is safe against
this sort of thing:
http://lists.fedoraproject.org/pipermail/devel/2010-January/129117.html
As mentioned below, these APIs are intended for doing global
state management, instead they are for version/feature
compatibility checks.
...
...
...
(3) Having all the functions buried inside a structure means I cannot
use ordinary techniques (ie. dlsym) to find out if a function is
supported.  Because I cannot use dlsym, I'm forced to use hard-coded
ABI checks!
You're thinking only about libvirtd checking what a plugin
supports. The load_drv() method is about the *opposite*,
allowing the plugin to check what libvirtd supports. You
can't do that with dlsym, or __attribute__().
You can by renaming the lock manager DSO.  Still, better to avoid
having to do that by having at least a minimum stable ABI which you
support forever.
There can be multiple lock manager plugin impls, with different
requirements. Renaming the DSO impacts all implementations, where
as the load_drv() callback is intended to allow individual impls
to define their own minimum requirements.
...
...
In addition to the reasons already mentioned for constructor not
being able to replace load_drv,  "destructor" can't replace the
unload_drv method. The unload_drv method is there to allow the
plugin to refuse to allow module unloading from the app.
Why not just pause in the resource release function (shutdown_drv is
it?)  I'm still unclear what sort of locking would allow you to
release the resource but still be unable to unload the module.
It isn't about the locking preventing unload. There are a variety
of scenarios in which is it unsafe to unload *any* dlopen'd 
library. eg, if the module has set a thread local variable with
a destructor, it is not safe to unload the module unless all
threads have terminated, or all thread locals cleaned up. The
module has no way to ensure this, so the only safe option is to
never dlclose() it, otherwise you'll get SIGBUS or SIGSEGV.  The
unload_drv() method return code is to allow for this scenario to
be dealt with safely.
...
...
...
The thing here is that I have to translate the XML into something that
the lock manager wants.  Why not let me just pass the XML?  Or the
virDomainPtr?
At the point where libvirt drivers invoke the plugin, there is no
virDomainPtr in available, only the internal objects virDomainDefPtr
which aren't something we want to expose since they're not ABI
stable. In addition from the context where the plugin executes, it
likely won't be able to use the libvirt APIs safely without causing
deadlock in libvirt drivers.
It isn't passing the XML because we didn't want the lock manager
implementations to all have to re-implement XML parsing, and again
the libvirt driver code doesn't have the XML around at this point.
This is unfriendly to other callers ...
Unfortunately this is a no-win situation. Either the plugin impls
have to parse the XML, or the plugin callers have to. On balance
there are more plugin impls than callers, and one of the callers
already knows how to parse the XML, so I put the burden of XML
parsing on the caller rather than the plugin impl.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|