Re: [libvirt] Redesigning Libvirt: Better supporting non-hypervisor agnostic concepts

20 Nov 2017


      On Sun, Nov 19, 2017 at 10:21:32PM +0100, Martin Kletzander wrote:
...
On Wed, Nov 15, 2017 at 06:19:38PM +0000, Daniel P. Berrange wrote:
...
On Wed, Nov 15, 2017 at 05:57:45PM +0000, Richard W.M. Jones wrote:
...
On Tue, Nov 14, 2017 at 05:25:03PM +0000, Daniel P. Berrange wrote:
...
I would anticipate a standalone process "libvirt-qemu" that an
application can spawn, providing a normal domain XML file via the
command line or stdin. It would then connect to libvirtd to register
its existence and claim its ownership of the guest name +
UUID. Assuming that succeeds, 'libvirt-qemu' would directly spawn
QEMU.
To be really clear about this, the application would run something
like:
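  /* Build the domain XML; asprintf(3) allocates the buffer. */
  asprintf (&libvirt_xml, "<domain><uuid>%s</uuid> etc etc", uuid);
  libvirt_xml_file = /* write libvirt_xml to a temporary file */;
  if (fork () == 0) {
    /* Child: run the proposed libvirt-qemu shim on the XML file. */
    execlp ("libvirt-qemu",
            "libvirt-qemu", "--config", libvirt_xml_file, NULL);
    _exit (127); /* reached only if exec failed */
  }
  /* Parent: look the new domain up through libvirtd. */
  dom = virDomainLookupByUUID (conn, uuid);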
libvirt-qemu would exec(2) qemu?
Yes, that is pretty much exactly what I am suggesting.
...
Above I've assumed that we need to get a libvirt handle for ongoing
interactions with the new qemu process.  Would we get that via the
name or UUID from the XML, ie. calling virDomainLookupByUUID?  I guess
there's some raciness here.  The libvirt domain wouldn't exist
immediately in the application process.
The libvirt-qemu would register itself with libvirtd, and then libguestfs
would have to speak to libvirtd for ongoing management. Though for the
purposes of shutdown, it would be valid to just kill() the children
directly if desired.  In the second mail in this series, I describe a way
to decompose libvirtd, whereupon ongoing management could be handled inside
libvirt-qemu itself. That would potentially avoid the need for libguestfs
to talk to libvirtd at all. Though this would be a secondary piece of work.
Initially we could avoid the raciness if libvirt-qemu implemented the
systemd startup notification protocol. That would let libvirt-qemu notify
libguestfs (or whoever spawns it), /after/ it has successfully registered
itself with libvirtd.  So you can then virDomainLookupByUUID without any
race (ie you can then assume that if virDomainLookupByUUID fails, it means
the new QEMU has already quit).
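
A minimal sketch of that readiness handshake, assuming the filesystem-socket
form of the systemd notification protocol: the spawner binds an AF_UNIX
datagram socket, passes its path in $NOTIFY_SOCKET, and waits for a
"READY=1" datagram. The function names and the timeout are illustrative
only, and the child here is a stand-in for libvirt-qemu (which does not
exist yet), so the point where it would register with libvirtd is just a
comment.

```c
#include <poll.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

/* Shim side: send "READY=1" to the datagram socket named by
 * $NOTIFY_SOCKET. Only filesystem paths are handled in this sketch.
 * Returns 0 on success, -1 on error. */
static int notify_ready(void)
{
    const char *path = getenv("NOTIFY_SOCKET");
    struct sockaddr_un addr;
    int fd, rc = -1;

    if (!path || path[0] != '/' || strlen(path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    if ((fd = socket(AF_UNIX, SOCK_DGRAM, 0)) < 0)
        return -1;
    if (sendto(fd, "READY=1", 7, 0,
               (struct sockaddr *)&addr, sizeof(addr)) == 7)
        rc = 0;
    close(fd);
    return rc;
}

/* Application side: create the notification socket, spawn a child and
 * wait for it to report readiness. Returns 1 if "READY=1" arrived,
 * 0 on timeout or an unexpected message, -1 on setup error. */
static int spawn_and_wait_ready(const char *sock_path, int timeout_ms)
{
    struct sockaddr_un addr;
    struct pollfd pfd;
    char buf[64];
    ssize_t n = -1;
    pid_t pid;
    int fd;

    if (strlen(sock_path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, sock_path);
    unlink(sock_path);              /* remove a stale socket, if any */

    if ((fd = socket(AF_UNIX, SOCK_DGRAM, 0)) < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fd);
        unlink(sock_path);
        return -1;
    }
    if (pid == 0) {
        /* Child: a real caller would exec the shim here. The stand-in
         * pretends registration with libvirtd has just succeeded. */
        close(fd);
        setenv("NOTIFY_SOCKET", sock_path, 1);
        _exit(notify_ready() == 0 ? 0 : 1);
    }

    /* Parent: the socket was bound before fork(), so an early
     * notification is queued rather than lost. */
    pfd.fd = fd;
    pfd.events = POLLIN;
    if (poll(&pfd, 1, timeout_ms) > 0)
        n = recv(fd, buf, sizeof(buf) - 1, 0);
    waitpid(pid, NULL, 0);          /* the stand-in exits immediately */
    close(fd);
    unlink(sock_path);

    return n == 7 && memcmp(buf, "READY=1", 7) == 0;
}
```

Only after spawn_and_wait_ready() returns 1 would the application call
virDomainLookupByUUID; a real shim would stay running instead of being
reaped straight away.
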
I like the whole idea.  I'm replying here because this is the most
relevant part of this particular sub-thread.  Would it be too much for us
to go beyond this and offer more functionality without actually talking
to the daemon?  Let's say we:
- return the UUID instead of requiring it
- allow having more signal handlers than for just SIGTERM
- maybe add some simple protocol that libvirt-qemu shim would implement
 on stdin/out
These three things would make the shim usable by programs not compiled
against libvirt at all, maybe even directly by users.  I'm not saying this must be
something we strive for from day one, just something we could consider
not forbidding.
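
To make the third point concrete, here is a rough sketch of what a
line-based protocol on the shim's stdin/stdout could look like. Everything
in it is hypothetical: the command names ("uuid", "shutdown"), the
"OK"/"ERR" reply format and the function names are invented for
illustration and are not part of any proposal in this thread.

```c
#include <stdio.h>
#include <string.h>

/* What the serve loop should do after handling a command. */
enum shim_action { SHIM_CONTINUE, SHIM_SHUTDOWN };

/* Handle one command line (already stripped of its newline) and write
 * the reply into `reply`. The command set is purely hypothetical. */
static enum shim_action
shim_handle_line(const char *line, const char *uuid,
                 char *reply, size_t len)
{
    if (strcmp(line, "uuid") == 0) {
        /* Lets the caller learn a UUID that the shim picked itself. */
        snprintf(reply, len, "OK %s", uuid);
        return SHIM_CONTINUE;
    }
    if (strcmp(line, "shutdown") == 0) {
        snprintf(reply, len, "OK");
        return SHIM_SHUTDOWN;
    }
    snprintf(reply, len, "ERR unknown command");
    return SHIM_CONTINUE;
}

/* Read commands from `in` and write one reply line per command to
 * `out`, until "shutdown" is received or the input reaches EOF. */
static void shim_serve(FILE *in, FILE *out, const char *uuid)
{
    char line[256], reply[256];

    while (fgets(line, sizeof(line), in)) {
        enum shim_action act;

        line[strcspn(line, "\r\n")] = '\0';
        act = shim_handle_line(line, uuid, reply, sizeof(reply));
        fprintf(out, "%s\n", reply);
        fflush(out);
        if (act == SHIM_SHUTDOWN)
            break;
    }
}
```

A shim would call shim_serve (stdin, stdout, uuid) from its main loop, so
a caller needs nothing beyond a pair of pipes to talk to it.
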
I guess there's a distinction between what the app has to do vs what
the shim has to do. At least in the short-medium term, the shim itself
would still need to communicate with libvirtd (or other related daemons)
if it needs to resolve other objects, e.g. to resolve the "default" virtual
network. The application does not necessarily need to talk to libvirtd
though. It would be enough to spawn the libvirt-qemu shim, and query
stats out of cgroup directly and then kill() the shim when done. If
you want to dynamically make changes to the QEMU config on the fly
though, then we need some kind of API, and I'm not seeing a compelling
reason to change the API we currently provide apps. It could, however,
be possible for that libvirt.so remote driver to connect directly to
the shim to do its work.
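
As a rough illustration of the "no libvirtd" path for the application,
the sketch below reads a single-value cgroup file and then stops the shim
with kill(). The helper names are made up, where the guest's cgroup lives
is an assumption left to the caller (for example a cgroup v2
"memory.current" file), and it assumes the shim shuts QEMU down when it
receives SIGTERM.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read one unsigned integer from a single-value cgroup file.
 * Returns 0 on success, -1 if the file is missing or malformed. */
static int read_cgroup_u64(const char *path, unsigned long long *value)
{
    FILE *fp = fopen(path, "r");
    int ok;

    if (!fp)
        return -1;
    ok = fscanf(fp, "%llu", value) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}

/* Send SIGTERM to the shim and reap it.
 * Returns the waitpid() status, or -1 on error. */
static int stop_shim(pid_t pid)
{
    int status;

    if (kill(pid, SIGTERM) < 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

The application would keep the pid returned by fork() when it spawned the
shim, poll read_cgroup_u64() for stats while the guest runs, and call
stop_shim() when done.
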

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|