[libvirt] Notes from the KVM Forum relevant to libvirt

I was at the KVM Forum / LinuxCon last week and there were many interesting things discussed which are relevant to ongoing libvirt development. Here is the list that caught my attention. If I have missed any, fill in the gaps....

- Sandbox/container KVM. The Solaris port of KVM puts QEMU inside a zone so that an exploit of QEMU can't escape into the full OS. Containers are Linux's parallel of Zones, and while not nearly as secure yet, it would still be worth making more use of container support to confine QEMU.

- Events for object changes. We already have async events for virDomainPtr. We need the same for virInterfacePtr, virStoragePoolPtr, virStorageVolPtr and virNodeDevPtr, so that at the very least applications can be notified when objects are created or removed. For virNodeDevPtr we also want to be notified when properties change (e.g. CD-ROM media change). (See the registration sketch below.)

- CGroups passthrough. There is a lot of experimentation with cgroups. We don't want to expose cgroups as a direct concept in the libvirt API, but we should consider putting a generic cgroups get/set in the libvirt-qemu.so library, or create a libvirt-linux.so library. We would also likely add a <linux:cgroups> XML element to store arbitrary tunables in the XML, with the same (low) level of support as with qemu:XXX, of course.

- CPUSet for changing CPU + memory NUMA pinning. The cpuset cgroups controller is able to actually move a guest's memory between NUMA nodes. We can already change VCPU pinning, but we need a new API to do node pinning of the whole VM, so we can ensure the I/O threads are also moved. We also need an API to move the memory pinning to new nodes.

- Guest NUMA topology. If we have guests with RAM size > node size, we need to expose a NUMA topology into the guest. The CPU/memory pinning APIs will also need to be able to pin individual guest NUMA nodes to individual host NUMA nodes.

- AHCI controller. IDE is going the way of the dodo. We need to add support for QEMU's new AHCI controller. This is quite simple; we already have a 'sata' disk type we can wire up to QEMU.

- VFIO PCI passthrough. The current PCI assignment code may well be changed to use something called 'VFIO'. This will need some work in libvirt to support new CLI arg syntax, and probably some SELinux work.

- QCow3. There will soon be a QCow3 format. We need to add code to detect it and extract backing stores, etc. Trivial, since the primary header format will still be the same as QCow2.

- QMP completion. Given Anthony's plan for a complete replacement of the current CLI + monitor syntax in QEMU 2.0 (a long way out), he has dropped objections to adding new commands to QMP in the near future. So all existing HMP commands will immediately be made available in QMP, with no attempt to re-design them now. The need for the HMP passthrough command will therefore soon go away.

- Migration + VEPA/VNLink failures. As raised previously on this list, Cisco really wants libvirt to have the ability to do migration and optionally *not* fail, even if the VEPA/VNLink setup fails. This will require an event notification to the app if a failure of a device backend occurs, an API to let the admin app fix the device backend (virDomainUpdateDevice), and some way to tell migration what bits are allowed to fail.

- Virtio SCSI. We need to support this new stuff in QEMU when it is eventually implemented. It will mean we avoid the PCI slot usage problems inherent in virtio-blk, and get other things like multipath and decent SCSI passthrough support.

- USB 2.0. We need to support this in libvirt asap. It is very important for the desktop experience and to support better integration with SPICE. This also gets us proper USB port addressing. Fun footnote: QEMU USB has *never* supported migration. The USB tablet only works by sheer luck, as OSes see the device disappear on migration and come back with a different device ID/port addr, and so do a re-initialize!

- Native KVM tool. The problem statement was that the QEMU code is too big/complex and its command line args are too complex, so let's rewrite from scratch to make the code small and the CLI simple. They achieve this, but of course primarily because they lack so many features compared to QEMU. They had libvirt support as a bullet point on their preso, but I'm not expecting it to replace the current QEMU KVM support in the foreseeable future, given its current level of features and the size of its dev team compared to QEMU/KVM. They did have some fun demos of booting using the host OS filesystem though. We can actually do the same with regular KVM/libvirt but there's no nice demo tool to show it off. I'm hoping to create one....

- Shared memory devices. Some people doing high performance work are using the QEMU shared memory device. We don't support this (the ivshmem device) in libvirt yet. Fairly niche use cases, but it might be nice to have this.

- SDK / Docs. Request for a more SDK-like approach to KVM development tools and documentation. Also want to simplify libvirt operations. The exposure of the virt-install internal API as official GObjects would have significantly helped the project Ricardo (from IBM) described in his presentation. Of course, no one can deny that we need more documentation in every area.

- USB managed mode. As we do with PCI passthrough, we should be able to detach a USB device from the host OS and perform a reset before attaching it to the guest, and most importantly track which USB devices have been given to which guest, so we don't assign the same device twice. We have all the necessary APIs, we just need to wire them up.

- PCI passthrough. We need to support setting of MAC addr, VLAN and VEPA/VNLink properties against VFs from SRIOV NICs that are assigned to a guest.

For those who were not at the KVM Forum, the presentations are already available online at:

http://www.linux-kvm.org/page/KVM_Forum_2011

All the sessions were also video recorded, so sometime in the next week or two there should be OGG videos of the talks being uploaded to the same site.

Regards,
Daniel

-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
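(For reference, the proposed interface/storage/node-device events would presumably mirror the pattern of the existing virDomainPtr async events. A minimal sketch of that existing registration flow, using the real virConnectDomainEventRegisterAny API; error handling trimmed for brevity, compile with: gcc demo.c -lvirt)

#include <stdio.h>
#include <libvirt/libvirt.h>

/* Callback invoked for domain lifecycle changes; the proposed
 * virInterfacePtr/virStoragePoolPtr/virNodeDevPtr events would
 * presumably follow the same register-callback-and-dispatch shape. */
static int lifecycle_cb(virConnectPtr conn, virDomainPtr dom,
                        int event, int detail, void *opaque)
{
    printf("domain %s: lifecycle event %d (detail %d)\n",
           virDomainGetName(dom), event, detail);
    return 0;
}

int main(void)
{
    virEventRegisterDefaultImpl();              /* must precede virConnectOpen */
    virConnectPtr conn = virConnectOpen("qemu:///system");

    virConnectDomainEventRegisterAny(conn, NULL /* all domains */,
                                     VIR_DOMAIN_EVENT_ID_LIFECYCLE,
                                     VIR_DOMAIN_EVENT_CALLBACK(lifecycle_cb),
                                     NULL, NULL);

    while (virEventRunDefaultImpl() == 0)       /* dispatch events forever */
        ;

    virConnectClose(conn);
    return 0;
}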

On Tue, Aug 23, 2011 at 12:15 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
> - Sandbox/container KVM. The Solaris port of KVM puts QEMU inside a zone so that an exploit of QEMU can't escape into the full OS. Containers are Linux's parallel of Zones, and while not nearly as secure yet, it would still be worth making more use of container support to confine QEMU.
Can you elaborate on why Linux containers are "not nearly as secure" [as Solaris Zones]? Containers are just another attempt at isolating the QEMU process. SELinux works differently but can also do many of the same things. I like containers more because they are simpler than labelling everything.
> - Native KVM tool. [...] They did have some fun demos of booting using the host OS filesystem though. We can actually do the same with regular KVM/libvirt but there's no nice demo tool to show it off. I'm hoping to create one....
Yep, it's virtfs, which QEMU has supported for a while. The trick is setting things up so that the Linux guest boots from virtfs.

Stefan

On Tue, Aug 23, 2011 at 04:24:46PM +0100, Stefan Hajnoczi wrote:
> Can you elaborate on why Linux containers are "not nearly as secure" [as Solaris Zones]?
Mostly because the Linux namespace functionality is far from complete, notably lacking proper UID/GID/capability separation, and UID/GID virtualization wrt filesystems. The longer answer is here:

https://wiki.ubuntu.com/UserNamespace

So at this time you can't build a secure container on Linux relying on DAC alone. You have to add a MAC layer on top of the container to get the full security benefits, which obviously defeats the point of using the container as a backup for failure in the MAC layer.
> Yep, it's virtfs, which QEMU has supported for a while. The trick is setting things up so that the Linux guest boots from virtfs.
It isn't actually that hard from a technical POV, it is just that most (all?) distros' typical initrd files lack support for specifying 9p over virtio as a root filesystem.

Daniel
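(For anyone who wants to experiment: a rough sketch of what booting a guest with its root on 9p over virtio looks like, assuming a kernel with 9p and virtio support built in and no initrd in the way. The path and the rootdev/rootfs identifiers are illustrative, not from the thread:)

qemu-kvm \
    -kernel /boot/vmlinuz \
    -append 'root=rootfs rw rootfstype=9p rootflags=trans=virtio' \
    -fsdev local,id=rootdev,path=/srv/guest-root,security_model=passthrough \
    -device virtio-9p-pci,fsdev=rootdev,mount_tag=rootfs

(The mount_tag -- here 'rootfs' -- is what the kernel's root= option names; honoring those root options is exactly the step Daniel says most initrds lack today.)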

On Tue, Aug 23, 2011 at 4:31 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
> Mostly because the Linux namespace functionality is far from complete, notably lacking proper UID/GID/capability separation, and UID/GID virtualization wrt filesystems. The longer answer is here:
>
> https://wiki.ubuntu.com/UserNamespace
>
> So at this time you can't build a secure container on Linux relying on DAC alone. You have to add a MAC layer on top of the container to get the full security benefits, which obviously defeats the point of using the container as a backup for failure in the MAC layer.
Thanks, that is interesting. I still don't understand why that is a problem. Linux containers (lxc) use a different pid namespace (no ptrace worries), a file system root restricted to a subdirectory tree, forbid most device nodes, etc. Why does the user namespace matter for security in this case?

I think it matters when giving multiple containers access to the same file system. Is that what you'd like to do for libvirt?

Stefan
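(For the record, a minimal sketch of the kind of isolation lxc sets up: a child in fresh pid/mount/network namespaces via clone(2). Requires CAP_SYS_ADMIN; error handling trimmed. Note the absence of CLONE_NEWUSER from the flag list -- UIDs stay global, which is the gap this thread is about:)

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static char stack[1024 * 1024];

static int child(void *arg)
{
    /* In the new pid namespace this process is PID 1; host processes
     * cannot be seen (or ptraced) from in here. */
    printf("child sees itself as pid %d\n", (int)getpid());
    return 0;
}

int main(void)
{
    /* Stack grows down, so pass the top of the buffer to clone(). */
    pid_t pid = clone(child, stack + sizeof(stack),
                      CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET | SIGCHLD,
                      NULL);
    if (pid < 0) {
        perror("clone");
        return 1;
    }
    waitpid(pid, NULL, 0);
    return 0;
}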

On Wed, Aug 24, 2011 at 03:20:57PM +0100, Stefan Hajnoczi wrote:
> Thanks, that is interesting. I still don't understand why that is a problem. Linux containers (lxc) use a different pid namespace (no ptrace worries), a file system root restricted to a subdirectory tree, forbid most device nodes, etc. Why does the user namespace matter for security in this case?
A number of reasons really...

1. If user ID '0' on the host starts a container, and a process inside the container does 'setuid(500)', then any user outside the container with UID 500 will be able to kill that process. Only user ID '0' should have been allowed to do that.

2. It will also let non-root user IDs on the host OS start containers and have root uid=0 inside the container.

3. Finally, any files created inside the container with, say, uid 500 will be accessible by any other process with UID 500, in either the host or any other container.
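(A minimal illustration of point 1, assuming a kernel without UID namespaces: compile this on the host and run it as UID 500 against the host-visible PID of a container process that called setuid(500). The DAC permission check compares raw UIDs with no per-container translation, so the signal is delivered across the container boundary.)

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <container-pid-as-seen-on-host>\n", argv[0]);
        return 1;
    }

    /* Succeeds whenever our UID matches the target's, regardless of
     * which container the target process lives in. */
    pid_t victim = atoi(argv[1]);
    if (kill(victim, SIGTERM) == 0)
        printf("signal delivered across the container boundary\n");
    else
        perror("kill");
    return 0;
}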
> I think it matters when giving multiple containers access to the same file system. Is that what you'd like to do for libvirt?
Each container would have to share a (readonly) view onto the host filesystem so it can see the QEMU emulator install / libraries. There would also have to be some writable areas per QEMU container. QEMU inside the container would be set to run as some non-root UID (from the container's POV). So both problems 1 & 3 above would impact the security of this confinement.

Daniel

On Wed, Aug 24, 2011 at 3:46 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
> A number of reasons really...
>
> 1. If user ID '0' on the host starts a container, and a process inside the container does 'setuid(500)', then any user outside the container with UID 500 will be able to kill that process. Only user ID '0' should have been allowed to do that.
>
> 2. It will also let non-root user IDs on the host OS start containers and have root uid=0 inside the container.
>
> 3. Finally, any files created inside the container with, say, uid 500 will be accessible by any other process with UID 500, in either the host or any other container.
These points mean that the host can peek inside containers and has access to their processes/files. But from the point of view of a libvirt running inside a container there is no security problem.

This is kind of like saying that root on the host can modify KVM guest disk images. That is true, but I don't see it as a security problem, because root on the host is the trusted part of the system.
> Each container would have to share a (readonly) view onto the host filesystem so it can see the QEMU emulator install / libraries. There would also have to be some writable areas per QEMU container. QEMU inside the container would be set to run as some non-root UID (from the container's POV). So both problems 1 & 3 above would impact the security of this confinement.
But is there a way to escape confinement? If not, then this is secure.

Stefan

On Thu, Aug 25, 2011 at 10:10:27AM +0100, Stefan Hajnoczi wrote:
> But is there a way to escape confinement? If not, then this is secure.
The filesystem UID/GID ownership is the most likely way you can escape the confinement. You would have to be very careful to ensure that each container's view of the filesystem did not include any directories with files that are assigned to another container, since the UID separation would not prevent access to another container's resources.

This is rather tedious but could be just about doable; it gets harder, though, when you throw in things like sysfs and PCI device assignment. E.g. a guest with a PCI device assigned gets given ownership of the files in /sys/bus/pci/devices/0000:00:XX:XX/, and since there is no UID namespacing, these will be accessible to any other container with the same UID. To hack around this when starting up a container, you would probably have to bind mount an empty tmpfs over the top of all the PCI device paths you wanted to block in sysfs.

Obviously you can get around this by running each guest as a different user ID, but that is one of the things we wanted to avoid by using containers, and it ought not to be needed if containers were actually secure.

Daniel
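(A sketch of that masking hack, as a host-side helper run before starting the container. The 0000:00:XX:XX placeholder is kept from the text above and needs a real device address; the mount requires CAP_SYS_ADMIN:)

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    /* Hide an assigned PCI device's sysfs directory from other
     * containers by mounting an empty read-only tmpfs over it.
     * Repeat for every device path that has to be blocked. */
    if (mount("tmpfs", "/sys/bus/pci/devices/0000:00:XX:XX/",
              "tmpfs", MS_RDONLY, "size=4k") < 0)
        perror("mount tmpfs over sysfs path");
    return 0;
}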

On Thu, Aug 25, 2011 at 11:03 AM, Daniel P. Berrange <berrange@redhat.com> wrote:
> The filesystem UID/GID ownership is the most likely way you can escape the confinement. You would have to be very careful to ensure that each container's view of the filesystem did not include any directories with files that are assigned to another container, since the UID separation would not prevent access to another container's resources.
>
> This is rather tedious but could be just about doable; it gets harder, though, when you throw in things like sysfs and PCI device assignment. [...]
Ah, I hadn't thought of /sys/bus/pci or /sys/bus/usb! Thanks for the explanation; it does seem like the design would get messy.

Stefan

Quoting Stefan Hajnoczi (stefanha@gmail.com):
>> [...] To hack around this when starting up a container, you would probably have to bind mount an empty tmpfs over the top of all the PCI device paths you wanted to block in sysfs.
Which of course is easily undoable by root in the container :)
> Ah, I hadn't thought of /sys/bus/pci or /sys/bus/usb! Thanks for the explanation; it does seem like the design would get messy.
And plenty more, e.g. http://blog.bofh.it/debian/id_413

See http://sourceforge.net/mailarchive/message.php?msg_id=27878921 for someone actively using Smack to help mitigate this (which could also be done with SELinux). But yes, this is exactly what the user namespace is designed to address. The week before last we got a proof of concept of a filesystem being assigned to a user namespace, which would just about allow user namespaces to be useful in a container. It's up at:

git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-userns-devel.git

When I return from vacation I need to continue work on pushing at least the first part of that patchset.

-serge

On Thu, Aug 25, 2011 at 08:58:27AM -0500, Serge E. Hallyn wrote:
>> [...] To hack around this when starting up a container, you would probably have to bind mount an empty tmpfs over the top of all the PCI device paths you wanted to block in sysfs.
> Which of course is easily undoable by root in the container :)
Yep, you'd have to make sure QEMU was non-root for it to be at all practical.
> And plenty more, e.g. http://blog.bofh.it/debian/id_413
Cool, a nice demo :-)
> See http://sourceforge.net/mailarchive/message.php?msg_id=27878921 for someone actively using Smack to help mitigate this (which could also be done with SELinux).
Yes, I've got the same done with SELinux, but haven't posted it for review yet, since it needs more testing and some policy additions:

https://gitorious.org/~berrange/libvirt/staging/commits/lxc-svirt

Of course, in the context of this discussion, QEMU already runs under SELinux, and my desire for containers was to act as a safety net for when SELinux fails for some reason (or is disabled by an admin), so back to square one wrt security :-)

Daniel

Quoting Daniel P. Berrange (berrange@redhat.com):
> Yes, I've got the same done with SELinux, but haven't posted it for review yet, since it needs more testing and some policy additions:
>
> https://gitorious.org/~berrange/libvirt/staging/commits/lxc-svirt
Neat.
> Of course, in the context of this discussion, QEMU already runs under SELinux, and my desire for containers was to act as a safety net for when SELinux fails for some reason (or is disabled by an admin), so back to square one wrt security :-)
You also might consider seccomp2, WHEN it lands :) I trust that once qemu is running, it doesn't need too baroque a set of system calls.

-serge
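(For context, a sketch of what the kernel already offered at the time of this thread -- seccomp "mode 1", aka strict mode. It confines the calling thread to read(), write(), exit() and sigreturn(), which is far too tight for a running QEMU; that is exactly why the configurable filter mode Serge calls seccomp2 is needed:)

#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    /* Enter strict seccomp: from here on, any syscall other than
     * read/write/exit/sigreturn kills the process with SIGKILL. */
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
        return 1;

    write(1, "confined\n", 9);

    /* Use the raw exit(2) syscall: glibc's _exit() calls exit_group(),
     * which strict mode does not allow. */
    syscall(SYS_exit, 0);
    return 0;   /* not reached */
}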

Hello,

Is it planned to support sVirt for LXC? I know the current libvirt does not support sVirt for LXC. I found that a branch at https://gitorious.org/~berrange/libvirt/staging/commits/lxc-svirt seems to support sVirt for LXC.

I downloaded the tar file and overwrote libvirt-0.9.6 with the branch. I could build and install it with a little modification of the source code, mainly disabling 0.9.7-specific new methods. However, even with that, an LXC instance does not have proper sVirt labels.

Could anyone tell me the status of sVirt support for LXC?

Thanks,
David.

On 11/02/2011 10:37 AM, Dong-In David Kang wrote:
> Hello,
Replying to a random previous message, even if you change the subject line, doesn't create a new thread. Your message got buried in an existing thread, making it harder to find; in the future, it is better to start a new thread via a fresh email rather than replying to an existing mail.
> Is it planned to support sVirt for LXC?
Eventually. It's a work in progress, and tracking this mailing list you will see as it improves.
> I know the current libvirt does not support sVirt for LXC. I found that a branch at https://gitorious.org/~berrange/libvirt/staging/commits/lxc-svirt seems to support sVirt for LXC.
Yes, that's Daniel's staging area as he works on improving the situation.
> I downloaded the tar file and overwrote libvirt-0.9.6 with the branch.
Not recommended. Running a staging-area patch is rather risky, and it's better to wait for it to hit upstream libvirt.git first, especially if you want support from this list.

-- 
Eric Blake   eblake@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org

Thank you for the info. I'll keep watching this mailing list.

David.

----------------------
Dr. Dong-In "David" Kang
Computer Scientist
USC/ISI
participants (6)

- Daniel P. Berrange
- Dong-In David Kang
- Eric Blake
- Serge E. Hallyn
- Serge Hallyn
- Stefan Hajnoczi