[Libvir] Proposal for dealing with host devices

The following document illustrates an API for host device enumeration, creation and deletion. This has a number of use cases:

 - PCI passthrough. Need to enumerate PCI devices, get their domain, bus, slot, function IDs to be able to pass through to a guest. Need to disable the host kernel driver. Need to get metadata about the device to present to the user, eg model / vendor.

 - USB passthrough. Need to enumerate USB devices, get their bus, device IDs to be able to pass through to a guest. Need to disable the host kernel driver. Need to get metadata about the device to present to the user, eg model / vendor.

 - Fibre Channel. Need to enumerate SCSI HBAs with FC capability and get their WWNN and WWPN (World Wide Node/Port Name) to enable the administrator to associate with the SAN. This relates to the SCSI storage pool impl I posted last week.

 - NPIV. Need to create/delete NPIV virtual Fibre Channel adapters. This relates to the SCSI storage pool impl I posted last week.

 - Networking. Need to enumerate NICs, bridges, VLANs, bonding.

This all sounds like a lot of data / stuff to manage, but the good news is that there is already an application which does most of this for us, ie HAL. So at the basic level all we need to do is map the HAL properties into a libvirt XML format for describing devices.

I have chosen to define a very direct mapping.

 - At the top level, a "<device>" element
 - A short 'name' and longer 'key', both unique to the host. The key is the HAL 'udi' value
 - A 'parent' key to show nesting of devices.
 - Optional 'bus' information inside a <bus> element. Bus names will map straight onto HAL bus names. The content is bus specific
 - Optional 'capability' information inside one or more <capability> elements. NB a single device can provide several capabilities. Capability names will map straight onto HAL capability names. The content is capability specific.
 - Capabilities can be nested for specialization, eg a 'net' capability can have a '80211' sub-capability if it is a wifi device.

Now some example XML descriptions....

An arbitrary PCI device

  <device>
    <name>pci_8086_27c5</name>
    <key>/org/freedesktop/Hal/devices/pci_8086_27c5</key>
    <parent>/org/freedesktop/Hal/devices/computer</parent>
    <bus type="pci">
      <vendor id="32902">Intel Corporation</vendor>
      <product id="10202">82801G (ICH7 Family) SMBus Controller</product>
      <address domain="0000" bus="00" slot="1f" function="3"/>
    </bus>
  </device>

An arbitrary USB device

  <device>
    <name>usb_device_483_2016_noserial</name>
    <key>/org/freedesktop/Hal/devices/usb_device_483_2016_noserial</key>
    <parent>/org/freedesktop/Hal/devices/usb_device_0_0_0000_00_1d_3</parent>
    <bus type="usb">
      <vendor id="1155">SGS Thomson Microelectronics</vendor>
      <product id="8214">Fingerprint Reader</product>
      <address bus="003" dev="005"/>
    </bus>
  </device>

A SCSI HBA

  <device>
    <name>pci_8086_27df_scsi_host</name>
    <key>/org/freedesktop/Hal/devices/pci_8086_27df_scsi_host</key>
    <parent>/org/freedesktop/Hal/devices/pci_8086_27df</parent>
    <capability type="scsihost">
      <capability type="fc">
        <address wwnn="023432532532632" wwpn="32453253252352"/>
        <vports max="4"/>
      </capability>
    </capability>
  </device>

As an example, consider a wireless NIC

  <device>
    <name>net_00_13_02_b9_f9_d3_0</name>
    <key>/org/freedesktop/Hal/devices/net_00_13_02_b9_f9_d3_0</key>
    <parent>/org/freedesktop/Hal/devices/pci_8086_4227</parent>
    <capability type="net">
      <hwaddr>00:13:02:b9:f9:d3</hwaddr>
      <name>eth0</name>
      <capability type="80211"/>
    </capability>
  </device>

Notice how the specific functional devices like NICs, HBAs, are children of the physical USB or PCI device. This is where the hierarchy comes in.

There are a few other types of devices we want explicit representations for: block devices, sound devices, storage devices, input devices. I won't describe them here, but they follow the same pattern of mapping the XML onto the HAL properties in their <capability> tags.

There are some devices HAL does not represent, so we'll have to augment the HAL information. Specifically devices which don't correspond to a physical device, eg

 - Bonding NICs
 - Bridges
 - VLANs

There are also cases where HAL does not have enough properties, so we again need to augment the data

 - Fibre Channel / NPIV: Add WWPN, WWNN

The API to deal with this is really very simple. APIs to query all devices, or query devices based on a capability, or bus type:

  int virNodeNumOfDevices(virConnectPtr conn)
  int virNodeListDevices(virConnectPtr conn, char **const names, int maxnames)

  int virNodeNumOfDevicesByCap(virConnectPtr conn, const char *cap)
  int virNodeListDevicesByCap(virConnectPtr conn, const char *cap, char **const names, int maxnames)

  int virNodeNumOfDevicesByBus(virConnectPtr conn, const char *bus)
  int virNodeListDevicesByBus(virConnectPtr conn, const char *bus, char **const names, int maxnames)

Then APIs to obtain a virNodeDevicePtr object corresponding to a device name / key:

  virNodeDevicePtr virNodeDeviceLookupByName(virConnectPtr conn, const char *name)
  virNodeDevicePtr virNodeDeviceLookupByName(virConnectPtr conn, const char *key)

  int virNodeDeviceFree(virNodeDevicePtr dev)

An API to get the XML description:

  char * virNodeDeviceDumpXML(virConnectPtr conn, unsigned int flags)

Finally an API to create / delete devices - this is only for devices with certain capabilities

  virNodeDevicePtr virNodeDeviceCreate(virConnectPtr conn, const char *xml)
  int virNodeDeviceDestroy(virNodeDevicePtr dev)

BTW if you want to see all the HAL metadata on your machine run

  lshal

I don't propose to expose all the data - only specific properties we have immediate need for. The HAL spec describes the meaning of various props

  http://people.freedesktop.org/~david/hal-spec/hal-spec.html

Dan.
-- |: Red Hat, Engineering, Boston -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Wed, Apr 02, 2008 at 02:05:58AM +0100, Daniel P. Berrange wrote:
The following document illustrates an API for host device enumeration, creation and deletion. This has a number of use cases:
[...]
This all sounds like a lot of data / stuff to manage, but the good news is that there is already an application which does most of this for us, ie HAL. So at the basic level all we need to do is map the HAL properties into a libvirt XML format for describing devices.
Having gone through the description a couple of times, this makes sense to me. I just have a couple of remarks:

 - basically we are now extending libvirt to be a generic accessor for host physical data. It's fine because we need it, but ...
 - if the hald daemons (I have 5 HAL-related daemons running on my F8 desktops) had exported things in a secure way, most of this would not be needed, right ?

But having a remote secure way to access hardware data is needed; if libvirt is the first one to provide it, why not ! It will certainly make things easier for libvirt users.
I have chosen to define a very direct mapping.
Agreed, no need to remap. Basically the only thing really HAL-specific in the exposed XML data are the HAL names, which are internal unique names for addressing the devices, and public names usable for user communication are also available, so that's fine. [...]
Now some example XML descriptions.... [..]
Notice how the specific functional devices like NICs, HBAs, are children of the physical USB or PCI device. This is where the hierarchy comes in.
Well except the hierarchy is not reflected at the XML structure level. But I understand we need to be able to isolate devices descriptions and this could get too complex to be represented by a tree, so that's fine.
There are a few other types of devices we want explicit representations for: block devices, sound devices, storage devices, input devices. I won't describe them here, but they follow the same pattern of mapping the XML onto the HAL properties in their <capability> tags.
There are some devices HAL does not represent so we'll have to augment the HAL information. Specifically devices which don't correspond to a physical device, eg
- Bonding NICs - Bridges - VLANs
how much of that could be separated and considered a local network topology and extracted with the network APIs instead ?
The API to deal with this is really very simple. APIs to query all devices, or query devices based on a capability, or bus type:
int virNodeNumOfDevices(virConnectPtr conn)
int virNodeListDevices(virConnectPtr conn, char **const names, int maxnames)
I would add a flags parameter for future extensibility of those 2 entry points, for example to be able to query 'active' devices.
int virNodeNumOfDevicesByCap(virConnectPtr conn, const char *cap)
int virNodeListDevicesByCap(virConnectPtr conn, const char *cap, char **const names, int maxnames)
How do you know the proper values for cap (or bus below) ?
int virNodeNumOfDevicesByBus(virConnectPtr conn, const char *bus)
int virNodeListDevicesByBus(virConnectPtr conn, const char *bus, char **const names, int maxnames)
Okay, that's server-side filtering; why not make it a bit more generic ?

  int virNodeNumOfDevicesSubset(virConnectPtr conn, const char *selector);
  int virNodeListDevicesSubset(virConnectPtr conn, const char *selector, char **const names, int maxnames)

where the selector could be something like:

  bus='usb'
  cap='net'
  bus='pci' and cap='net'

To me the problem may quickly become how to filter the available information while still allowing the API to be 1/ extensive 2/ fast (limited round trips) when you know what you want. Ultimately, you may want to move a domain from one machine with a very specific kind of device to another one in the pool with the same hardware (and not used yet by a running domain), and querying for this may need a relatively specific selector; that's why I think just selecting on bus or on capability may not be sufficient. Maybe HAL has a better API for this. Maybe that's too complex, but even with the existing API I think you need to be able to enumerate the bus and capability values, to avoid hardcoding HAL-specific knowledge in the application.
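A minimal sketch of how a selector of the form suggested here might be evaluated against one device. Nothing below is part of libvirt or HAL: the struct node_dev record, the function names, and the grammar (key='value' terms joined only by "and", with "bus" and "cap" as the only keys) are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical in-memory device record; not a libvirt type. */
struct node_dev {
    const char *bus;      /* e.g. "pci", or NULL if the device has no bus */
    const char *caps[4];  /* capability names, NULL-terminated if fewer than 4 */
};

static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char)*p))
        p++;
    return p;
}

/* Test one key='value' term; returns 1, 0, or -1 for an unknown key. */
static int term_matches(const char *key, size_t klen,
                        const char *val, size_t vlen,
                        const struct node_dev *dev)
{
    if (klen == 3 && strncmp(key, "bus", 3) == 0)
        return dev->bus && strlen(dev->bus) == vlen &&
               strncmp(dev->bus, val, vlen) == 0;
    if (klen == 3 && strncmp(key, "cap", 3) == 0) {
        for (size_t i = 0; i < 4 && dev->caps[i]; i++)
            if (strlen(dev->caps[i]) == vlen &&
                strncmp(dev->caps[i], val, vlen) == 0)
                return 1;
        return 0;
    }
    return -1;
}

/* Returns 1 on match, 0 on no match, -1 on a malformed selector. */
int selector_matches(const char *selector, const struct node_dev *dev)
{
    assert(selector && dev);
    const char *p = skip_ws(selector);
    int result = 1;

    for (;;) {
        const char *key = p;
        while (isalpha((unsigned char)*p))
            p++;
        size_t klen = (size_t)(p - key);
        if (klen == 0 || p[0] != '=' || p[1] != '\'')
            return -1;
        const char *val = p + 2;
        const char *end = strchr(val, '\'');
        if (!end)
            return -1;
        int m = term_matches(key, klen, val, (size_t)(end - val), dev);
        if (m < 0)
            return -1;
        result = result && m;
        p = skip_ws(end + 1);
        if (*p == '\0')
            return result;
        /* Only "and" is accepted between terms in this sketch. */
        if (strncmp(p, "and", 3) != 0 || !isspace((unsigned char)p[3]))
            return -1;
        p = skip_ws(p + 3);
    }
}
```

With a device record whose bus is "pci" and whose capabilities include "net", the selector bus='pci' and cap='net' matches, bus='usb' does not, and an unquoted bus=pci is rejected as malformed. A server-side implementation would presumably parse the selector once and apply it to every cached device rather than re-parsing per device.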
Then APIs to obtain a virNodeDevicePtr object corresponding to a device name / key :
virNodeDevicePtr virNodeDeviceLookupByName(virConnectPtr conn, const char *name)
virNodeDevicePtr virNodeDeviceLookupByName(virConnectPtr conn, const char *key)
you mean virNodeDeviceLookupByKey there
int virNodeDeviceFree(virNodeDevicePtr dev)
An API to get the XML description:
char * virNodeDeviceDumpXML(virConnectPtr conn, unsigned int flags)
Finally an API to create / delete devices - this is only for devices with certain capabilities
virNodeDevicePtr virNodeDeviceCreate(virConnectPtr conn, const char *xml)
int virNodeDeviceDestroy(virNodeDevicePtr dev)
Except for the querying capability this sounds fine to me. Of course at some point people may need to look up change information (I'm not sure we want to provide a callback-based API; something querying for changes since the last check might be more useful, and the state could be preserved in libvirtd).
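The "changes since last check" idea could be sketched with a sequence-numbered change log kept by the daemon. This is only an illustration under stated assumptions: the dev_event enum, the dev_change record and changes_since() are hypothetical names, and the log is assumed to be held in memory in ascending sequence order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds; not taken from libvirt or HAL. */
enum dev_event { DEV_ADDED, DEV_REMOVED, DEV_CHANGED };

/* One entry in a daemon-side change log (assumed layout). */
struct dev_change {
    unsigned long seq;     /* monotonically increasing sequence number */
    enum dev_event event;
    const char *name;      /* device name the event refers to */
};

/*
 * Copy every change with seq greater than last_seen into out[],
 * up to max_out entries. Returns the number of entries copied.
 * The caller remembers the highest seq it has seen and passes it
 * back on the next poll.
 */
size_t changes_since(const struct dev_change *log, size_t n,
                     unsigned long last_seen,
                     struct dev_change *out, size_t max_out)
{
    assert(out || max_out == 0);
    size_t copied = 0;

    for (size_t i = 0; i < n && copied < max_out; i++)
        if (log[i].seq > last_seen)
            out[copied++] = log[i];
    return copied;
}
```

A client that last saw sequence number 1 and polls a log holding entries 1 to 3 receives entries 2 and 3; polling again with 3 returns nothing. The only state the client has to keep between polls is that one number.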
I don't propose to expose all the data - only specific properties we have immediate need for. The HAL spec describes the meaning of various props
The only thing I would like to avoid is the immediate-need viewpoint leading to an API which we would have to modify and deprecate once it has been used for a couple of years :-) But as-is, except for the mechanism for server-side list filtering, this looks just right to me, thanks !

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

On Wed, Apr 02, 2008 at 08:48:17AM -0400, Daniel Veillard wrote:
On Wed, Apr 02, 2008 at 02:05:58AM +0100, Daniel P. Berrange wrote:
The following document illustrates an API for host device enumeration, creation and deletion. This has a number of use cases:
[...]
This all sounds like a lot of data / stuff to manage, but the good news is that there is already an application which does most of this for us, ie HAL. So at the basic level all we need to do is map the HAL properties into a libvirt XML format for describing devices.
Having gone through the description a couple of times, this makes sense to me. I just have a couple of remarks: - basically we are now extending libvirt to be a generic accessor for host physical data. It's fine because we need it, but ...
Yes, that is basically correct. We have some limited host physical data like NUMA topology, CPU capabilities. This is extending that idea to cover devices too.
- if the hald daemons (I have 5 HAL-related daemons running on my F8 desktops) had exported things in a secure way, most of this would not be needed, right ?
Yes & no. While this API is mostly virtualization agnostic, we'd still need to provide a different impl for VMware or any other hypervisor where you don't have a Dom0 + HAL host OS available. Fortunately all our drivers do have that currently, so it's easy in the short term, & by minimizing the amount of data in the XML it's practical to provide an impl for VMware in the future if desired.
But having a remote secure way to access hardware data is needed, if libvirt is the first one to provide it, why not ! It will certainly make things easier for libvirt users.
Adding remote support to HAL directly would involve adding Kerberos, SSL, x509 support to DBus, and then defining a new security mechanism for DBus to replace the local SELinux controls, or getting them to work cross network. Not to mention the extra administrative setup step. So I think it is simpler all round to proxy it in libvirt.
[...]
Now some example XML descriptions.... [..]
Notice how the specific functional devices like NICs, HBAs, are children of the physical USB or PCI device. This is where the hierarchy comes in.
Well except the hierarchy is not reflected at the XML structure level. But I understand we need to be able to isolate devices descriptions and this could get too complex to be represented by a tree, so that's fine.
We could define an official way to nest the <device> XML fragments, but I can't think of any application use case (yet) where I'd want them all nested. So I figure it's best to keep it simple for now.
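Even without nested XML, the key and parent values carried by each <device> are enough for an application to rebuild the hierarchy when it wants one. The following is a sketch only, assuming the application has already extracted those two strings into a flat array; struct dev_rec and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat record mirroring the <key> and <parent> elements. */
struct dev_rec {
    const char *key;
    const char *parent;  /* NULL for the root device */
};

/* Linear lookup by key; returns NULL when the key is unknown. */
const struct dev_rec *find_by_key(const struct dev_rec *devs, size_t n,
                                  const char *key)
{
    assert(devs || n == 0);
    if (!key)
        return NULL;
    for (size_t i = 0; i < n; i++)
        if (strcmp(devs[i].key, key) == 0)
            return &devs[i];
    return NULL;
}

/*
 * Number of ancestors above dev. Returns -1 if a parent key does not
 * resolve, or if the chain is longer than n, which can only happen
 * when the parent links form a cycle.
 */
int ancestor_count(const struct dev_rec *devs, size_t n,
                   const struct dev_rec *dev)
{
    int depth = 0;

    while (dev->parent) {
        dev = find_by_key(devs, n, dev->parent);
        if (!dev || (size_t)++depth > n)
            return -1;
    }
    return depth;
}
```

For the wireless NIC in the examples, whose parent is pci_8086_4227, the count would be 2 if that PCI device in turn hangs directly off the "computer" root (an assumption, since the thread does not show the PCI device's own record).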
There are some devices HAL does not represent so we'll have to augment the HAL information. Specifically devices which don't correspond to a physical device, eg
- Bonding NICs - Bridges - VLANs
how much of that could be separated and considered a local network topology and extracted with the network APIs instead ?
I'm not entirely sure to be honest - that's certainly an option I've considered. Even with this host device API I think we'll need some extra APIs to deal with network device configs, because they get very complex & there's a fair amount of stateful configuration / lifecycle transitions to track, which isn't something that can be expressed in this generic device enumeration API. So perhaps we should leave out the bonding/bridge/vlan stuff from this API for now...
int virNodeNumOfDevices(virConnectPtr conn)
int virNodeListDevices(virConnectPtr conn, char **const names, int maxnames)
I would add a flags parameter for future extensibility of those 2 entry points, for example to be able to query 'active' devices.
There isn't really a concept of 'active' in these APIs, since they're really expressing hardware devices & functional capabilities exposed by the hardware. Each different type of device has its own 'lifecycle' which may or may not involve a concept of 'active', as well as a number of other states. So I think it's best to keep lifecycle tracking out of this API & concentrate just on device enumeration and metadata, which is basically what HAL does.
int virNodeNumOfDevicesByCap(virConnectPtr conn, const char *cap)
int virNodeListDevicesByCap(virConnectPtr conn, const char *cap, char **const names, int maxnames)
How do you know the proper values for cap (or bus below) ?
They're mostly defined in the HAL spec, but there'll be some extra ones that we add - eg for FibreChannel / NPIV vports. Off the top of my head there is

Capabilities

  net
  net.80203
  net.80211
  sound
  storage
  storage.cdrom
  storage.raid
  input
  input.keyboard
  input.mouse
  input.tablet
  input.joystick
  scsi_host
  scsi_host.vport

Bus

  pci
  usb

int virNodeNumOfDevicesByBus(virConnectPtr conn, const char *bus)
int virNodeListDevicesByBus(virConnectPtr conn, const char *bus, char **const names, int maxnames)
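The dotted names in this list suggest one possible matching rule: a query for a general capability also selects its specializations. The thread does not say the ByCap calls would behave this way, so the rule and the function name cap_matches() are assumptions; a sketch:

```c
#include <assert.h>
#include <string.h>

/*
 * Returns 1 if cap equals query or is a dotted specialization of it
 * (so "net.80211" matches a query for "net"), 0 otherwise.
 * Prefixes that do not end on a '.' boundary do not match.
 */
int cap_matches(const char *cap, const char *query)
{
    assert(cap && query);
    size_t qlen = strlen(query);

    if (strncmp(cap, query, qlen) != 0)
        return 0;
    /* The first qlen bytes are equal, so cap[qlen] is within cap. */
    return cap[qlen] == '\0' || cap[qlen] == '.';
}
```

Under this rule a query for "storage" selects both "storage" and "storage.cdrom", while a query for "net.80211" does not select a plain "net" device.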
Okay that's server side filtering why not make it a bit more generic ?
int virNodeNumOfDevicesSubset(virConnectPtr conn, const char *selector);
int virNodeListDevicesSubset(virConnectPtr conn, const char *selector, char **const names, int maxnames)
where the selector could be something like:
  bus='usb'
  cap='net'
  bus='pci' and cap='net'
To me the problem may quickly become how to filter the available information while still allowing the API to be 1/ extensive 2/ fast (limited round trips) when you know what you want.
Yep, that's an option - on a typical machine I expect you'd have 100 - 200 core devices + 1 or more devices for each block device. So if you have a SAN exporting hundreds of LUNs, each with several partitions, you could get many hundreds of devices listed via this API. So filtering is definitely needed.
Ultimately, you may want to move a domain from one machine with a very specific kind of device to another one in the pool with the same hardware (and not used yet by a running domain), and querying for this may need a relatively specific selector, that's why I think just selecting on bus or on capability may not be sufficient.
Maybe HAL has a better API for this.
The HAL API is basically matching the ByBus/ByCapability stuff I showed above, but we don't have to map 1-to-1 here, because we don't have to query HAL in real time. I expect we'll query HAL & cache the data in a more suitable intermediate format in libvirt.
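A sketch of how the count-then-list pattern of the proposed ByBus calls might be served from such a cache rather than from live HAL queries. The cached_dev record and both function names are hypothetical. Unlike the proposed API, which takes a char **const names array, this sketch hands back borrowed pointers into the cache and does no allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache entry; the real intermediate format is undecided. */
struct cached_dev {
    const char *name;
    const char *bus;  /* NULL when the device has no <bus> element */
};

static int on_bus(const struct cached_dev *d, const char *bus)
{
    /* A NULL bus filter selects every device. */
    return !bus || (d->bus && strcmp(d->bus, bus) == 0);
}

/* Count cached devices on the given bus. */
int cache_num_by_bus(const struct cached_dev *cache, size_t n,
                     const char *bus)
{
    int count = 0;

    for (size_t i = 0; i < n; i++)
        if (on_bus(&cache[i], bus))
            count++;
    return count;
}

/* Store up to maxnames matching names; returns how many were stored. */
int cache_list_by_bus(const struct cached_dev *cache, size_t n,
                      const char *bus, const char **names, int maxnames)
{
    assert(names || maxnames <= 0);
    int stored = 0;

    for (size_t i = 0; i < n && stored < maxnames; i++)
        if (on_bus(&cache[i], bus))
            names[stored++] = cache[i].name;
    return stored;
}
```

A caller would first ask for the count, size its array accordingly, then call the list function. Since devices can appear or disappear between the two calls, the list function's return value, not the earlier count, says how many entries are valid.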
virNodeDevicePtr virNodeDeviceLookupByName(virConnectPtr conn, const char *name)
virNodeDevicePtr virNodeDeviceLookupByName(virConnectPtr conn, const char *key)
you mean virNodeDeviceLookupByKey there
Yes. Cut & paste mistake
virNodeDevicePtr virNodeDeviceCreate(virConnectPtr conn, const char *xml)
int virNodeDeviceDestroy(virNodeDevicePtr dev)
Except for the querying capability this sounds fine to me. Of course at some point people may need to look up change information (I'm not sure we want to provide a callback-based API; something querying for changes since the last check might be more useful, and the state could be preserved in libvirtd).
HAL lets you get notifications on property changes, so we can definitely provide some form of callback to be notified when a device changes, or when a device is created / deleted.
I don't propose to expose all the data - only specific properties we have immediate need for. The HAL spec describes the meaning of various props
The only thing I would like to avoid is the immediate-need viewpoint leading to an API which we would have to modify and deprecate once it has been used for a couple of years :-)
Most of the stuff we'd have to add over time is likely to be just new elements/attributes in the XML. The HAL API itself hasn't really changed in any significant way in the last few years - they aim to provide a stable API themselves, which is good for us.

Dan.

The API looks "sane" (modulo my usual concerns about passing random XML around, dynamic typing, etc.).

My only question is: how dynamic are these devices? Could they be listed in the ordinary capabilities XML? USB devices are of course dynamic, but you also talk about caching the output of hald.

Rich.

-- 
Richard Jones, Emerging Technologies, Red Hat http://et.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into Xen guests. http://et.redhat.com/~rjones/virt-p2v

On Wed, Apr 02, 2008 at 05:30:25PM +0100, Richard W.M. Jones wrote:
The API looks "sane" (modulo my usual concerns about passing random XML around, dynamic typing, etc.).
My only question is: how dynamic are these devices? Could they be listed in the ordinary capabilities XML? USB devices are of course dynamic, but you also talk about caching the output of hald.
My concern with putting it in the capabilities XML is that there can be an enormous number of devices, which will result in an XML doc that can be MBs in size. Aside from being inefficient for apps, I think this would be unmanageable for admins looking at 'virsh capabilities'.

The devices are dynamic, in the sense that they can appear / disappear at any time - eg USB devices - and their properties can also change at runtime. This shouldn't be a problem, even if we do cache in libvirt, because HAL can provide notifications for all these events, enabling us to refresh the cache as needed.

Dan.
participants (3)
- Daniel P. Berrange
- Daniel Veillard
- Richard W.M. Jones