On Fri, Jan 05, 2007 at 09:14:30PM +0000, Daniel P. Berrange wrote:
The following series of (2) patches adds a QEMU driver to libvirt. The first patch
provides a daemon for managing QEMU instances, the second provides a driver letting
libvirt manage QEMU via the daemon.
Basic architecture
------------------
The reason for the daemon architecture is twofold:
- At this time, there is no (practical) way to enumerate QEMU instances, or
reliably connect to the monitor console of an existing process. There is
also no way to determine the guest configuration associated with a daemon.
Okay, we admitted that principle in the first round of QEmu patches last
year. The only question I have is about the multiplication of running daemons
for libvirt, as we also have another one already for read-only xen hypervisor
access. We could either keep daemon usages very specific (which also makes it
easy to restrict their privileges) or try to unify them. I guess
from a security POV it's still better to keep them separate, and anyway
they are relatively unlikely to be run at the same time (KVM and Xen
on the same Node).
- It is desirable to be able to manage QEMU instances using either an
unprivileged local client, or a remote client. The daemon can provide
connectivity via UNIX domain sockets, or IPv4 / IPv6, and layer in suitable
authentication / encryption via TLS and/or SASL protocols.
Cf. my previous mail: yes, authentication is key. Could you elaborate on how
the remote access and the authentication are set up? See my previous mail
on the remote xend access; we should try to unify these and set up a specific
page to document remote accesses.
Anthony Liguori is working on patches for QEMU with the goal of addressing the
first point. For example, an extra command line argument will cause QEMU to save
a PID file and create a UNIX socket for its monitor at a well-defined path. More
functionality in the monitor console will allow the guest configuration to be
reverse engineered from a running guest. Even with those patches, however, it will
still be desirable to have a daemon to provide more flexible connectivity, and to
facilitate implementing libvirt APIs which are host (rather than guest) related.
Thus I expect that over time we can simply enhance the daemon to take advantage of
newer capabilities in the QEMU monitor, but keep the same basic libvirt driver
architecture.
Okay, Work in Progress.
Considering some of the other hypervisor technologies out there, in particular
User Mode Linux and lhype, it may well become possible to let this QEMU daemon
also provide the management of these guests - allowing re-use of the single driver
backend in the libvirt client library itself.
Which reopens the question: one multi-featured daemon, or multiple simpler
(but possibly redundant) daemons?
XML format
----------
As discussed in the previous mail thread, the XML format for describing guests
with the QEMU backend is the same structure as that for Xen guests, with the
following enhancements:
- The 'type' attribute on the top level <domain> tag can take one of the
values 'qemu', 'kqemu' or 'kvm' instead of 'xen'. This selects between
the different virtualization approaches QEMU can provide.
- The '<type>' attribute within the <os> block of the XML (for now) is
still expected to be 'hvm' (indicating full virtualization), although
I'm trying to think of a better name, since it's not technically hardware
accelerated unless you're using KVM.
Yeah, I don't have a good value to suggest except "unknown", because basically
we don't know a priori what the running OS will be.
- The '<type>' attribute within the <os> block of the XML can have two
optional 'arch' and 'machine' attributes. The former selects the CPU
architecture to be emulated; the latter the specific machine to have
QEMU emulate (determine those supported by QEMU using 'qemu -M ?').
Okay, I hope we will have enough flexibility in the virNodeInfo model to
express the various combinations; we have a 32 char string for this, I guess
that should be sufficient, but I don't know how to express that in the best way.
I will see how the patch does it. From my recollection of posts on qemu-devel,
some of the machine names can be a bit long on specific emulated targets. At
least we should be okay for a PC architecture.
- The <kernel>, <initrd>, <cmdline> elements can be used to specify
an explicit kernel to boot off[1], otherwise it'll do a boot of the
cdrom, harddisk / floppy (based on <boot> element). Well, the kernel
bits are parsed at least. I've not got around to using them when
building the QEMU argv yet.
Okay
- The disk devices are configured in the same way as Xen HVM guests, eg you
have to use hda -> hdd, and/or fda -> fdb. Only hdc can be selected
as a cdrom device.
Good !
- The network configuration is work in progress. QEMU has many ways to
set up networking. I use the 'type' attribute to select between the
different approaches 'user', 'tap', 'server', 'client', 'mcast', mapping
them directly onto QEMU command line arguments. You can specify a
MAC address as usual too. I need to implement auto-generation of MAC
addresses if omitted. Most of them have extra bits of metadata though
which I've not figured out appropriate XML for yet. Thus when building
the QEMU argv I currently just hardcode 'user' networking.
Okay, since user is the default in QEmu (assuming I remember correctly :-)
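Since MAC auto-generation is still on the todo list, here is one possible shape for it - a hypothetical helper, not the actual patch. 52:54:00 is the prefix QEMU conventionally uses for its own NICs; the remaining three octets are randomized:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch of MAC auto-generation for omitted <mac> elements.
 * Caller should seed the PRNG (srand) once at daemon startup. */
static void qemud_generate_mac(char *buf, size_t len)
{
    snprintf(buf, len, "52:54:00:%02x:%02x:%02x",
             (unsigned)(rand() & 0xff),
             (unsigned)(rand() & 0xff),
             (unsigned)(rand() & 0xff));
}
```

The generated string can then be dropped straight into the XML as the 'address' attribute when none was supplied.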
- The QEMU binary is determined automatically based on the requested
CPU architecture, defaulting to i686 if none is specified. It is possible
to override the default binary using the <emulator> element within the
<devices> section. This differs from what was previously discussed, because
recent work by Anthony merging VMI + KVM to give paravirt guests means
that the <loader> element is best kept to refer to the VMI ROM (or other
ROM like files :-) - this is also closer to Xen semantics anyway.
Hum, the ROM, one more parameter; actually we may need to provide for
multiple of them at some point if they start mapping non-contiguous areas.
Connectivity
------------
The namespace under which all connection URIs come is 'qemud'. Thereafter
there are several options. First, two well-known local hypervisor connections:
- qemud:///session
This is a per-user private hypervisor connection. The libvirt daemon and
qemu guest processes just run as whatever UNIX user your client app is
running as. This lets unprivileged users use the qemu driver without needing
any kind of admin rights. Obviously you can't use the KQEMU or KVM accelerators
unless the /dev/ device node is chmod/chown'd to give you access.
The communication goes over a UNIX domain socket, mode 0600, created
in the abstract namespace at $HOME/.qemud.d/sock.
Okay, makes sense. Everything runs under the user's privileges and there is no
escalation.
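For reference, binding in the Linux abstract namespace just means the socket name never touches the filesystem (sun_path starts with a NUL byte), so the socket disappears automatically when the daemon exits. A minimal sketch, with an assumed helper name:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Sketch of binding a socket in the Linux abstract namespace, as the
 * per-user daemon does: sun_path[0] is '\0', so the "path" is just a
 * kernel-side name, never a filesystem entry. */
static int qemud_bind_abstract(const char *name)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    addr.sun_path[0] = '\0';  /* abstract namespace marker */
    strncpy(addr.sun_path + 1, name, sizeof(addr.sun_path) - 2);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

One consequence worth noting: abstract names carry no filesystem permissions, so the mode-0600 restriction has to be enforced some other way (e.g. checking peer credentials).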
- qemud:///system
This is a system-wide privileged hypervisor connection. There is only one
of these on any given machine. The libvirt_qemud daemon would be started
ahead of time (by an init script), possibly running as root, or maybe under
a dedicated system user account (and the KQEMU/KVM devices chown'd to match).
Would it be hard to allow autostart? That's what we do for the read-only
xen hypervisor access. Avoiding starting up stuff in init.d when we have no
guarantee it will be used, and auto-shutdown when there is no client, is IMHO
generally nicer, but that feature can just be added later possibly; the main
drawback is that it requires a setuid binary.
The admin would optionally also make it listen on IPv4/6 addrs to allow
remote communication (see next URI example).
The local communication goes over one of two possible UNIX domain sockets,
both in the abstract namespace under the directory /var/run. The first socket,
called 'qemud', is mode 0600, so only privileged apps (ie root) can access it,
and gives full control capabilities. The other, called 'qemud-ro', is mode 0666,
and any clients connecting to it will be restricted to only read-only libvirt
operations by the server.
- qemud://hostname:port/
This lets you connect to a daemon over IPv4 or IPv6. If omitted, the port
defaults to 8123 (will probably change it). That way you can reach a system
daemon on a remote host - assuming it was configured to listen on IPv4/6 interfaces.
Hum, for that the daemon needs to be started statically too.
Currently there is zero auth or encryption, but I'm planning to make it
mandatory to use the TLS protocol - using the GNU TLS library. This will give
encryption, and mutual authentication using either x509 certificates or
PGP keys & trustdbs, or perhaps both :-) Will probably start off by implementing
PGP since I understand it better.
So if you wanted to remotely manage a server, you'd copy the server's
certificate/public key to the client into a well known location. Similarly
you'd generate a keypair for the client & copy its public key to the
server. Perhaps I'll allow clients without a key to connect in read-only
mode. Need to prototype it first and then write up some ideas.
Okay, though there are multiple authentication and encryption libraries,
and picking the Right One may not be possible; there are so many options,
and people may have specific infrastructure in place. Anyway the current state
is no-auth, so anything will be better :-)
Server architecture
-------------------
The server is a fairly simple beast. It is single-threaded using non-blocking I/O
and poll() for all operations. It will listen on multiple sockets for incoming
connections. The protocol used for client-server comms is a very simple binary
message format close to the existing libvirt_proxy.
Good, so we keep similar implementations. Any possibility of sharing part of
that code? These are always very sensitive areas, both for security and edge cases
in the communication.
Client sends a message, server receives it, performs the appropriate operation
& sends a reply to the client. The client (ie libvirt driver) blocks after
sending its message until it gets a reply. The server does non-blocking reads
from the client, buffering until it has a single complete message, then processes
it, populates the buffer with a reply and does non-blocking writes to send it
back to the client. It won't try to read a further message from the client until
it has sent the entire reply back; ie, it is a totally synchronous message flow -
no batching/pipelining of messages.
Honestly I think that's good enough; I don't see hundreds of QEmu instances
having to be monitored remotely from a single Node. On a monitoring machine
things may be accelerated by multithreading the gathering process to talk to
multiple Nodes in parallel. At least on the server side I prefer to keep things
as straight as possible.
During the time the server is processing a message it is not dealing with any
other I/O, but thus far all the operations are very fast, so this isn't a serious
issue, and there are ways to deal with it if there are operations which turn out
to take a long time. I certainly want to avoid multi-threading in the server at
all costs!
completely agree :-)
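The single-threaded design described above boils down to one poll() array covering every fd - listening sockets, connected clients, QEMU stdout/stderr. A minimal sketch of that shape (structure and names are illustrative, not the daemon's actual ones):

```c
#include <poll.h>
#include <stddef.h>

/* One watch table for the whole daemon: each fd has an associated
 * handler that is dispatched when the fd becomes ready. */
struct qemud_watch {
    struct pollfd fds[64];
    void (*handlers[64])(int fd, short revents);
    size_t nfds;
};

/* Run a single iteration of the event loop; returns the number of
 * fds that had pending events. */
static int qemud_run_once(struct qemud_watch *w, int timeout_ms)
{
    int dispatched = 0;
    if (poll(w->fds, w->nfds, timeout_ms) <= 0)
        return 0;
    for (size_t i = 0; i < w->nfds; i++) {
        if (w->fds[i].revents) {
            if (w->handlers[i])
                w->handlers[i](w->fds[i].fd, w->fds[i].revents);
            dispatched++;
        }
    }
    return dispatched;
}
```

Since every message is processed to completion before the loop polls again, the synchronous request/reply flow falls out of this structure for free.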
As well as monitoring the listening & client sockets, the poll() event loop in the
server also captures stdout & stderr from the QEMU processes. Currently we just
dump this to stdout of the daemon, but I expect we can log it somewhere. When we
start accessing the QEMU monitor there will be another fd in the event loop - ie
the pseudo-TTY (or UNIX socket) on which we talk to the monitor.
At some point we will need to look at adding a Console dump API, that will
be doable for Xen too, but it's not urgent since nobody requested it yet :-)
Inactive guests
---------------
Guests created using 'virsh create' (or the equiv API) are treated as 'transient'
domains - ie their config files are not saved to disk. This is consistent with
the behaviour in the Xen backend. Guests created using 'virsh define', however,
are saved out to disk in $HOME/.qemud.d for the per-user session daemon. The
system-wide daemon should use /etc/qemud.d, but currently it's still /root/.qemud.d.
Maybe this should be asked on the qemu-devel list, Fabrice and Co. may have
a preference on where to store config related stuff for QEmu even if it's not
directly part of QEmu.
The config files are simply saved as the libvirt XML blob, ensuring no data
conversion issues. In any case, QEMU doesn't currently have any config file
format we can leverage. The list of inactive guests is loaded at startup of the
daemon. New config files are expected to be created via the API - files manually
created in the directory after initial startup are not seen. Might like to change
this later.
Hum, maybe we could use FAM/gamin if found at configure time, but well,
it's just an additional feature; let's just avoid any unneeded timer.
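The startup scan of the config directory could look roughly like this - helper and callback names are assumptions, not the daemon's actual interface:

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Does a directory entry look like a saved guest config? */
static int qemud_is_config(const char *name)
{
    const char *dot = strrchr(name, '.');
    return dot != NULL && strcmp(dot, ".xml") == 0;
}

/* Walk the config directory (e.g. $HOME/.qemud.d) once at startup,
 * handing each *.xml file to the loader callback (may be NULL to
 * just count). Returns the number of configs loaded, or -1. */
static int qemud_scan_configs(const char *dir, int (*load)(const char *path))
{
    struct dirent *ent;
    int count = 0;
    DIR *d = opendir(dir);
    if (!d)
        return -1;

    while ((ent = readdir(d)) != NULL) {
        if (qemud_is_config(ent->d_name)) {
            char path[1024];
            snprintf(path, sizeof(path), "%s/%s", dir, ent->d_name);
            if (!load || load(path) == 0)
                count++;
        }
    }
    closedir(d);
    return count;
}
```

Running this once at startup matches the stated behaviour: configs added by hand afterwards are simply never seen until a restart (or until FAM/gamin support is added).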
XML Examples
------------
This is a guest using plain qemu, with x86_64 architecture and an ISA-only
(ie no PCI) machine emulation. I was actually running this on a 32-bit
host :-) VNC is configured to run on port 5906. QEMU can't automatically
choose a VNC port, so if one isn't specified we assign one based on the
domain ID. This should be fixed in QEMU....
<domain type='qemu'>
  <name>demo1</name>
  <uuid>4dea23b3-1d52-d8f3-2516-782e98a23fa0</uuid>
  <memory>131072</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='isapc'>hvm</type>
  </os>
  <devices>
    <disk type='file' device='disk'>
      <source file='/home/berrange/fedora/diskboot.img'/>
      <target dev='hda'/>
    </disk>
    <interface type='user'>
      <mac address='24:42:53:21:52:45'/>
    </interface>
    <graphics type='vnc' port='5906'/>
  </devices>
</domain>
A second example, this time using KVM acceleration. Note how I specify a
non-default path to QEMU to pick up the KVM build of QEMU. Normally the KVM
binary will default to /usr/bin/qemu-kvm - this may change depending on
how distro packaging of KVM turns out - it may even be merged into the regular
QEMU binaries.
<domain type='kvm'>
  <name>demo2</name>
  <uuid>4dea24b3-1d52-d8f3-2516-782e98a23fa0</uuid>
  <memory>131072</memory>
  <vcpu>1</vcpu>
  <os>
    <type>hvm</type>
  </os>
  <devices>
    <emulator>/home/berrange/usr/kvm-devel/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <source file='/home/berrange/fedora/diskboot.img'/>
      <target dev='hda'/>
    </disk>
    <interface type='user'>
      <mac address='24:42:53:21:52:45'/>
    </interface>
    <graphics type='vnc' port='-1'/>
  </devices>
</domain>
Okay, I'm nearing completion of a Relax-NG schema allowing XML instances to be
validated; I will augment it to allow the changes, but based on last week's
discussion it should not be too hard and should still retain good validation
properties.
Outstanding work
----------------
- TLS support. Need to add TLS encryption & authentication to both the client
and server side for IPv4/6 communications. This will obviously add a dependency
on libgnutls.so in libvirt & the daemon. I don't consider this a major problem,
since every non-trivial network app these days uses TLS. The other possible impl,
OpenSSL, has GPL-compatibility issues, so is not considered.
- Change the wire format to use fixed size data types (ie, int8, int16, int32, etc)
instead of the size-dependent int/long types. At the same time, define some rules
for the byte ordering. Client must match server ordering? Server must accept the
client's desired ordering? Everyone must use BE regardless of server/client format?
I'm inclined to say client must match server, since it distributes the byte-swapping
overhead to all clients and lets the common case of x86->x86 be a no-op.
Hum, on the other hand if you do the conversion as suggested by IETF rules
it's easier to find the places where the conversion is missing, unless you
forgot to ntoh and hton on both client and server code. Honestly I would
not take the performance hit into consideration at that level, and not now;
the RPC is gonna totally dominate it by orders of magnitude in my opinion.
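Under the IETF convention argued for here, a fixed-size header is converted with htonl/ntohl at exactly one place on each side, which makes a missing conversion easy to spot. An illustrative sketch (field names and layout are assumptions, not the actual wire format):

```c
#include <arpa/inet.h>  /* htonl / ntohl */
#include <stdint.h>

/* Hypothetical fixed-size message header: every field is a fixed-width
 * type and travels big-endian on the wire. */
struct qemud_packet_header {
    uint32_t magic;    /* protocol version / sanity check */
    uint32_t type;     /* which operation this message requests */
    uint32_t length;   /* payload bytes following the header */
};

/* Convert host byte order -> network (wire) byte order in place. */
static void qemud_header_to_wire(struct qemud_packet_header *h)
{
    h->magic  = htonl(h->magic);
    h->type   = htonl(h->type);
    h->length = htonl(h->length);
}

/* Convert network (wire) byte order -> host byte order in place. */
static void qemud_header_from_wire(struct qemud_packet_header *h)
{
    h->magic  = ntohl(h->magic);
    h->type   = ntohl(h->type);
    h->length = ntohl(h->length);
}
```

On x86 the htonl calls do real byte swaps; on big-endian hosts they compile to no-ops, so the x86->x86 common case pays the cost the "client matches server" scheme was trying to avoid - which is the trade-off under discussion.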
- Add a protocol version message as a first option, to let us change the protocol
at will later while maintaining compat with older libvirt client libraries.
Yeah, this also ensures you get a functioning server on that port!
- Improve support for describing the various QEMU network configurations
- Finish boot options - boot device order & explicit kernel
- Open & use connection to QEMU monitor which will let us implement pause/resume,
suspend/restore drivers, and device hotplug / media changes.
- Return sensible data for virNodeInfo - will need to have operating system dependent
code here - parsing /proc for Linux to determine available RAM & CPU speed. Who
knows what for Solaris / BSD ?!? Anyone know of remotely standard ways of doing
this? Accurate host memory reporting is the only really critical data item we need.
The GNOME guys tried that, maybe dig up the gst (gnome system tools) code
base :-)
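For the Linux case, pulling MemTotal out of /proc/meminfo is straightforward; a sketch with assumed helper names (the other platforms will indeed need their own code paths):

```c
#include <stdio.h>
#include <string.h>

/* Parse one /proc/meminfo line; returns the MemTotal value in kB,
 * or -1 if this isn't the MemTotal line. */
static long qemud_parse_meminfo_line(const char *line)
{
    long kb;
    if (sscanf(line, "MemTotal: %ld kB", &kb) == 1)
        return kb;
    return -1;
}

/* Return total host memory in kB, or -1 on non-Linux / error. */
static long qemud_host_memory_kb(void)
{
    char line[256];
    long kb = -1;
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        kb = qemud_parse_meminfo_line(line);
        if (kb >= 0)
            break;
    }
    fclose(f);
    return kb;
}
```

Splitting the line parser out keeps the format-dependent part testable without a real /proc; CPU speed could be scraped from /proc/cpuinfo the same way.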
- There is a fair bit of duplication in various helper functions between the daemon
and the various libvirt driver backends. We should probably pull this stuff out into
a separate lib/ directory, build it into a static library and then link that into
libvirt, virsh & the qemud daemon as needed.
Yes definitely !
This all sounds excellent, thanks a lot !!!
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/