[libvirt] [PATCH 0/3] start of patches to support VFIO device assignment

This work isn't finished, but since there's a freeze coming up in a few days, I thought I should send these patches that *are* finished to get them out of the way before the last minute rush. Patch 1 is trivial. Patch 2 is longer, but completely mechanical (NOTE: there may be some uses of that struct living in code that I lack the environment to compile for, so test builds by people with odd platforms would be appreciated!) Patch 3 is standard parser/formatter/RNG/docs for the new XML. I haven't added any test cases, because lacking backend functionality I could only test xml-to-xml, which would require adding yet another test xml file, and I'd rather wait until the backend commandline-building code is in place, then simply add the new element to a few existing test XML files. In email awhile ago, danpb had suggested <driver name='vfio|qemu'/> I've used <driver name='kvm'/> because, unlike vhost-net where <driver name='qemu'/> indicates that the driver used is in the usermode qemu process (and *not* in the kernel), in this case <driver name='kvm'/> indicates that the driver used is in the kernel (and I've seen it referenced in several places as being "done by KVM", but never as "done by QEMU" (makes sense, since qemu is all in userland) Laine Stump (3): qemu: detect vfio-pci device assignment support conf: put hostdev pci address in a struct conf: formatter/parser/RNG/docs for hostdev <driver name='kvm|vfio'/> docs/formatdomain.html.in | 42 ++++++++++++++++++++- docs/formatnetwork.html.in | 15 ++++++++ docs/schemas/domaincommon.rng | 79 ++++++++++++++++++++++++++-------------- docs/schemas/network.rng | 11 ++++++ src/conf/domain_audit.c | 8 ++-- src/conf/domain_conf.c | 48 +++++++++++++++++++++--- src/conf/domain_conf.h | 15 +++++++- src/network/bridge_driver.c | 24 ++++++------ src/qemu/qemu_capabilities.c | 3 ++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 34 ++++++++--------- src/qemu/qemu_hostdev.c | 42 ++++++++++----------- src/qemu/qemu_hotplug.c | 14 +++---- src/security/security_apparmor.c | 8 ++-- src/security/security_dac.c | 16 ++++---- src/security/security_selinux.c | 16 ++++---- src/security/virt-aa-helper.c | 10 ++--- src/xen/xend_internal.c | 10 ++--- src/xenxs/xen_sxpr.c | 16 ++++---- src/xenxs/xen_xm.c | 16 ++++---- 20 files changed, 286 insertions(+), 142 deletions(-) -- 1.7.11.7

--- src/qemu/qemu_capabilities.c | 3 +++ src/qemu/qemu_capabilities.h | 1 + 2 files changed, 4 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index ef291c0..0a3fb06 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -220,6 +220,8 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST, "machine-usb-opt", "tpm-passthrough", "tpm-tis", + + "vfio-pci", /* 140 */ ); struct _virQEMUCaps { @@ -1347,6 +1349,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = { { "virtio-rng-ccw", QEMU_CAPS_DEVICE_VIRTIO_RNG }, { "rng-random", QEMU_CAPS_OBJECT_RNG_RANDOM }, { "rng-egd", QEMU_CAPS_OBJECT_RNG_EGD }, + { "vfio-pci", QEMU_CAPS_DEVICE_VFIO_PCI }, }; static struct virQEMUCapsStringFlags virQEMUCapsObjectPropsVirtioBlk[] = { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 4e76799..2acc1c8 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -178,6 +178,7 @@ enum virQEMUCapsFlags { QEMU_CAPS_MACHINE_USB_OPT = 137, /* -machine xxx,usb=on/off */ QEMU_CAPS_DEVICE_TPM_PASSTHROUGH = 138, /* -tpmdev passthrough */ QEMU_CAPS_DEVICE_TPM_TIS = 139, /* -device tpm_tis */ + QEMU_CAPS_DEVICE_VFIO_PCI = 140, /* -device vfio-pci */ QEMU_CAPS_LAST, /* this must always be the last item */ }; -- 1.7.11.7

On 04/23/2013 10:45 AM, Laine Stump wrote:
--- src/qemu/qemu_capabilities.c | 3 +++ src/qemu/qemu_capabilities.h | 1 + 2 files changed, 4 insertions(+)
ACK -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

There will soon be other items related to pci hostdevs that need to be in the same part of the hostdevsubsys union as the pci address (which is currently a single member called "pci". This patch replaces the single member named pci with a struct named pci that contains a single member named "addr". --- src/conf/domain_audit.c | 8 ++++---- src/conf/domain_conf.c | 12 ++++++------ src/conf/domain_conf.h | 4 +++- src/network/bridge_driver.c | 24 +++++++++++------------ src/qemu/qemu_command.c | 34 ++++++++++++++++---------------- src/qemu/qemu_hostdev.c | 42 ++++++++++++++++++++-------------------- src/qemu/qemu_hotplug.c | 14 +++++++------- src/security/security_apparmor.c | 8 ++++---- src/security/security_dac.c | 16 +++++++-------- src/security/security_selinux.c | 16 +++++++-------- src/security/virt-aa-helper.c | 10 +++++----- src/xen/xend_internal.c | 10 +++++----- src/xenxs/xen_sxpr.c | 16 +++++++-------- src/xenxs/xen_xm.c | 16 +++++++-------- 14 files changed, 116 insertions(+), 114 deletions(-) diff --git a/src/conf/domain_audit.c b/src/conf/domain_audit.c index 6d0ae48..d81744c 100644 --- a/src/conf/domain_audit.c +++ b/src/conf/domain_audit.c @@ -382,10 +382,10 @@ virDomainAuditHostdev(virDomainObjPtr vm, virDomainHostdevDefPtr hostdev, switch (hostdev->source.subsys.type) { case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: if (virAsprintf(&address, "%.4x:%.2x:%.2x.%.1x", - hostdev->source.subsys.u.pci.domain, - hostdev->source.subsys.u.pci.bus, - hostdev->source.subsys.u.pci.slot, - hostdev->source.subsys.u.pci.function) < 0) { + hostdev->source.subsys.u.pci.addr.domain, + hostdev->source.subsys.u.pci.addr.bus, + hostdev->source.subsys.u.pci.addr.slot, + hostdev->source.subsys.u.pci.addr.function) < 0) { VIR_WARN("OOM while encoding audit message"); goto cleanup; } diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index dc0ecaa..b7fea36 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -3552,7 +3552,7 @@ virDomainHostdevSubsysPciDefParseXML(const xmlNodePtr node, if (cur->type == XML_ELEMENT_NODE) { if (xmlStrEqual(cur->name, BAD_CAST "address")) { virDevicePCIAddressPtr addr = - &def->source.subsys.u.pci; + &def->source.subsys.u.pci.addr; if (virDevicePCIAddressParseXML(cur, addr) < 0) goto out; @@ -8990,10 +8990,10 @@ static int virDomainHostdevMatchSubsysPCI(virDomainHostdevDefPtr a, virDomainHostdevDefPtr b) { - if (a->source.subsys.u.pci.domain == b->source.subsys.u.pci.domain && - a->source.subsys.u.pci.bus == b->source.subsys.u.pci.bus && - a->source.subsys.u.pci.slot == b->source.subsys.u.pci.slot && - a->source.subsys.u.pci.function == b->source.subsys.u.pci.function) + if (a->source.subsys.u.pci.addr.domain == b->source.subsys.u.pci.addr.domain && + a->source.subsys.u.pci.addr.bus == b->source.subsys.u.pci.addr.bus && + a->source.subsys.u.pci.addr.slot == b->source.subsys.u.pci.addr.slot && + a->source.subsys.u.pci.addr.function == b->source.subsys.u.pci.addr.function) return 1; return 0; } @@ -13701,7 +13701,7 @@ virDomainHostdevDefFormatSubsys(virBufferPtr buf, break; case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: if (virDevicePCIAddressFormat(buf, - def->source.subsys.u.pci, + def->source.subsys.u.pci.addr, includeTypeInAddr) != 0) virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("PCI address Formatting failed")); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index f1f01fa..a9f86c0 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -399,7 +399,9 @@ struct _virDomainHostdevSubsys { unsigned vendor; unsigned product; } usb; - virDevicePCIAddress pci; /* host address */ + struct { + virDevicePCIAddress addr; /* host address */ + } pci; } u; }; diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 27dd230..510b4f3 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -3891,7 +3891,7 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) iface->data.network.actual->data.hostdev.def.mode = VIR_DOMAIN_HOSTDEV_MODE_SUBSYS; iface->data.network.actual->data.hostdev.def.managed = netdef->forward.managed ? 1 : 0; iface->data.network.actual->data.hostdev.def.source.subsys.type = dev->type; - iface->data.network.actual->data.hostdev.def.source.subsys.u.pci = dev->device.pci; + iface->data.network.actual->data.hostdev.def.source.subsys.u.pci.addr = dev->device.pci; /* merge virtualports from interface, network, and portgroup to * arrive at actual virtualport to use @@ -4222,7 +4222,7 @@ networkNotifyActualDevice(virDomainNetDefPtr iface) for (ii = 0; ii < netdef->forward.nifs; ii++) { if (netdef->forward.ifs[ii].type == VIR_NETWORK_FORWARD_HOSTDEV_DEVICE_PCI && - virDevicePCIAddressEqual(&hostdev->source.subsys.u.pci, + virDevicePCIAddressEqual(&hostdev->source.subsys.u.pci.addr, &netdef->forward.ifs[ii].device.pci)) { dev = &netdef->forward.ifs[ii]; break; @@ -4234,10 +4234,10 @@ networkNotifyActualDevice(virDomainNetDefPtr iface) _("network '%s' doesn't have " "PCI device %04x:%02x:%02x.%x in use by domain"), netdef->name, - hostdev->source.subsys.u.pci.domain, - hostdev->source.subsys.u.pci.bus, - hostdev->source.subsys.u.pci.slot, - hostdev->source.subsys.u.pci.function); + hostdev->source.subsys.u.pci.addr.domain, + hostdev->source.subsys.u.pci.addr.bus, + hostdev->source.subsys.u.pci.addr.slot, + hostdev->source.subsys.u.pci.addr.function); goto error; } @@ -4380,8 +4380,8 @@ networkReleaseActualDevice(virDomainNetDefPtr iface) for (ii = 0; ii < netdef->forward.nifs; ii++) { if (netdef->forward.ifs[ii].type == VIR_NETWORK_FORWARD_HOSTDEV_DEVICE_PCI && - virDevicePCIAddressEqual(&hostdev->source.subsys.u.pci, - &netdef->forward.ifs[ii].device.pci)) { + virDevicePCIAddressEqual(&hostdev->source.subsys.u.pci.addr, + &netdef->forward.ifs[ii].device.pci)) { dev = &netdef->forward.ifs[ii]; break; } @@ -4392,10 +4392,10 @@ networkReleaseActualDevice(virDomainNetDefPtr iface) _("network '%s' doesn't have " "PCI device %04x:%02x:%02x.%x in use by domain"), netdef->name, - hostdev->source.subsys.u.pci.domain, - hostdev->source.subsys.u.pci.bus, - hostdev->source.subsys.u.pci.slot, - hostdev->source.subsys.u.pci.function); + hostdev->source.subsys.u.pci.addr.domain, + hostdev->source.subsys.u.pci.addr.bus, + hostdev->source.subsys.u.pci.addr.slot, + hostdev->source.subsys.u.pci.addr.function); goto error; } diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 05c12b2..9e7c1d5 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -4109,10 +4109,10 @@ qemuOpenPCIConfig(virDomainHostdevDefPtr dev) int configfd = -1; if (virAsprintf(&path, "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/config", - dev->source.subsys.u.pci.domain, - dev->source.subsys.u.pci.bus, - dev->source.subsys.u.pci.slot, - dev->source.subsys.u.pci.function) < 0) { + dev->source.subsys.u.pci.addr.domain, + dev->source.subsys.u.pci.addr.bus, + dev->source.subsys.u.pci.addr.slot, + dev->source.subsys.u.pci.addr.function) < 0) { virReportOOMError(); return -1; } @@ -4135,9 +4135,9 @@ qemuBuildPCIHostdevDevStr(virDomainHostdevDefPtr dev, const char *configfd, virBufferAddLit(&buf, "pci-assign"); virBufferAsprintf(&buf, ",host=%.2x:%.2x.%.1x", - dev->source.subsys.u.pci.bus, - dev->source.subsys.u.pci.slot, - dev->source.subsys.u.pci.function); + dev->source.subsys.u.pci.addr.bus, + dev->source.subsys.u.pci.addr.slot, + dev->source.subsys.u.pci.addr.function); virBufferAsprintf(&buf, ",id=%s", dev->info->alias); if (configfd && *configfd) virBufferAsprintf(&buf, ",configfd=%s", configfd); @@ -4167,9 +4167,9 @@ qemuBuildPCIHostdevPCIDevStr(virDomainHostdevDefPtr dev) char *ret = NULL; if (virAsprintf(&ret, "host=%.2x:%.2x.%.1x", - dev->source.subsys.u.pci.bus, - dev->source.subsys.u.pci.slot, - dev->source.subsys.u.pci.function) < 0) + dev->source.subsys.u.pci.addr.bus, + dev->source.subsys.u.pci.addr.slot, + dev->source.subsys.u.pci.addr.function) < 0) virReportOOMError(); return ret; @@ -6766,10 +6766,10 @@ qemuBuildCommandLine(virConnectPtr conn, _("PCI device %04x:%02x:%02x.%x " "allocated from network %s is already " "in use by domain %s"), - hostdev->source.subsys.u.pci.domain, - hostdev->source.subsys.u.pci.bus, - hostdev->source.subsys.u.pci.slot, - hostdev->source.subsys.u.pci.function, + hostdev->source.subsys.u.pci.addr.domain, + hostdev->source.subsys.u.pci.addr.bus, + hostdev->source.subsys.u.pci.addr.slot, + hostdev->source.subsys.u.pci.addr.function, net->data.network.name, def->name); goto error; @@ -8629,9 +8629,9 @@ qemuParseCommandLinePCI(const char *val) def->mode = VIR_DOMAIN_HOSTDEV_MODE_SUBSYS; def->managed = true; def->source.subsys.type = VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI; - def->source.subsys.u.pci.bus = bus; - def->source.subsys.u.pci.slot = slot; - def->source.subsys.u.pci.function = func; + def->source.subsys.u.pci.addr.bus = bus; + def->source.subsys.u.pci.addr.slot = slot; + def->source.subsys.u.pci.addr.function = func; return def; error: diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index 0db9ddd..308fdcd 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -1,7 +1,7 @@ /* * qemu_hostdev.c: QEMU hostdev management * - * Copyright (C) 2006-2007, 2009-2012 Red Hat, Inc. + * Copyright (C) 2006-2007, 2009-2013 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -51,10 +51,10 @@ qemuGetPciHostDeviceList(virDomainHostdevDefPtr *hostdevs, int nhostdevs) if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) continue; - dev = virPCIDeviceNew(hostdev->source.subsys.u.pci.domain, - hostdev->source.subsys.u.pci.bus, - hostdev->source.subsys.u.pci.slot, - hostdev->source.subsys.u.pci.function); + dev = virPCIDeviceNew(hostdev->source.subsys.u.pci.addr.domain, + hostdev->source.subsys.u.pci.addr.bus, + hostdev->source.subsys.u.pci.addr.slot, + hostdev->source.subsys.u.pci.addr.function); if (!dev) { virObjectUnref(list); return NULL; @@ -96,10 +96,10 @@ qemuGetActivePciHostDeviceList(virQEMUDriverPtr driver, if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) continue; - dev = virPCIDeviceNew(hostdev->source.subsys.u.pci.domain, - hostdev->source.subsys.u.pci.bus, - hostdev->source.subsys.u.pci.slot, - hostdev->source.subsys.u.pci.function); + dev = virPCIDeviceNew(hostdev->source.subsys.u.pci.addr.domain, + hostdev->source.subsys.u.pci.addr.bus, + hostdev->source.subsys.u.pci.addr.slot, + hostdev->source.subsys.u.pci.addr.function); if (!dev) { virObjectUnref(list); return NULL; @@ -141,10 +141,10 @@ int qemuUpdateActivePciHostdevs(virQEMUDriverPtr driver, if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) continue; - dev = virPCIDeviceNew(hostdev->source.subsys.u.pci.domain, - hostdev->source.subsys.u.pci.bus, - hostdev->source.subsys.u.pci.slot, - hostdev->source.subsys.u.pci.function); + dev = virPCIDeviceNew(hostdev->source.subsys.u.pci.addr.domain, + hostdev->source.subsys.u.pci.addr.bus, + hostdev->source.subsys.u.pci.addr.slot, + hostdev->source.subsys.u.pci.addr.function); if (!dev) goto cleanup; @@ -219,10 +219,10 @@ qemuDomainHostdevPciSysfsPath(virDomainHostdevDefPtr hostdev, char **sysfs_path) { virPCIDeviceAddress config_address; - config_address.domain = hostdev->source.subsys.u.pci.domain; - config_address.bus = hostdev->source.subsys.u.pci.bus; - config_address.slot = hostdev->source.subsys.u.pci.slot; - config_address.function = hostdev->source.subsys.u.pci.function; + config_address.domain = hostdev->source.subsys.u.pci.addr.domain; + config_address.bus = hostdev->source.subsys.u.pci.addr.bus; + config_address.slot = hostdev->source.subsys.u.pci.addr.slot; + config_address.function = hostdev->source.subsys.u.pci.addr.function; return virPCIDeviceAddressGetSysfsFile(&config_address, sysfs_path); } @@ -544,10 +544,10 @@ int qemuPrepareHostdevPCIDevices(virQEMUDriverPtr driver, if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) continue; - dev = virPCIDeviceNew(hostdev->source.subsys.u.pci.domain, - hostdev->source.subsys.u.pci.bus, - hostdev->source.subsys.u.pci.slot, - hostdev->source.subsys.u.pci.function); + dev = virPCIDeviceNew(hostdev->source.subsys.u.pci.addr.domain, + hostdev->source.subsys.u.pci.addr.bus, + hostdev->source.subsys.u.pci.addr.slot, + hostdev->source.subsys.u.pci.addr.function); /* original states "unbind_from_stub", "remove_slot", * "reprobe" were already set by pciDettachDevice in diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 238d0d7..f6b2fc8 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1035,7 +1035,7 @@ int qemuDomainAttachHostPciDevice(virQEMUDriverPtr driver, qemuDomainObjEnterMonitor(driver, vm); ret = qemuMonitorAddPCIHostDevice(priv->mon, - &hostdev->source.subsys.u.pci, + &hostdev->source.subsys.u.pci.addr, &guestAddr); qemuDomainObjExitMonitor(driver, vm); @@ -2343,8 +2343,8 @@ qemuDomainDetachHostPciDevice(virQEMUDriverPtr driver, if (qemuIsMultiFunctionDevice(vm->def, detach->info)) { virReportError(VIR_ERR_OPERATION_FAILED, _("cannot hot unplug multifunction PCI device: %.4x:%.2x:%.2x.%.1x"), - subsys->u.pci.domain, subsys->u.pci.bus, - subsys->u.pci.slot, subsys->u.pci.function); + subsys->u.pci.addr.domain, subsys->u.pci.addr.bus, + subsys->u.pci.addr.slot, subsys->u.pci.addr.function); goto cleanup; } @@ -2375,8 +2375,8 @@ qemuDomainDetachHostPciDevice(virQEMUDriverPtr driver, virObjectLock(driver->activePciHostdevs); virObjectLock(driver->inactivePciHostdevs); - pci = virPCIDeviceNew(subsys->u.pci.domain, subsys->u.pci.bus, - subsys->u.pci.slot, subsys->u.pci.function); + pci = virPCIDeviceNew(subsys->u.pci.addr.domain, subsys->u.pci.addr.bus, + subsys->u.pci.addr.slot, subsys->u.pci.addr.function); if (pci) { activePci = virPCIDeviceListSteal(driver->activePciHostdevs, pci); if (activePci && @@ -2518,8 +2518,8 @@ int qemuDomainDetachHostDevice(virQEMUDriverPtr driver, case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: virReportError(VIR_ERR_OPERATION_FAILED, _("host pci device %.4x:%.2x:%.2x.%.1x not found"), - subsys->u.pci.domain, subsys->u.pci.bus, - subsys->u.pci.slot, subsys->u.pci.function); + subsys->u.pci.addr.domain, subsys->u.pci.addr.bus, + subsys->u.pci.addr.slot, subsys->u.pci.addr.function); break; case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB: if (subsys->u.usb.bus && subsys->u.usb.device) { diff --git a/src/security/security_apparmor.c b/src/security/security_apparmor.c index 9dd8d74..122edd4 100644 --- a/src/security/security_apparmor.c +++ b/src/security/security_apparmor.c @@ -823,10 +823,10 @@ AppArmorSetSecurityHostdevLabel(virSecurityManagerPtr mgr, case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: { virPCIDevicePtr pci = - virPCIDeviceNew(dev->source.subsys.u.pci.domain, - dev->source.subsys.u.pci.bus, - dev->source.subsys.u.pci.slot, - dev->source.subsys.u.pci.function); + virPCIDeviceNew(dev->source.subsys.u.pci.addr.domain, + dev->source.subsys.u.pci.addr.bus, + dev->source.subsys.u.pci.addr.slot, + dev->source.subsys.u.pci.addr.function); if (!pci) goto done; diff --git a/src/security/security_dac.c b/src/security/security_dac.c index 38f7ba0..8576081 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -508,10 +508,10 @@ virSecurityDACSetSecurityHostdevLabel(virSecurityManagerPtr mgr, case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: { virPCIDevicePtr pci = - virPCIDeviceNew(dev->source.subsys.u.pci.domain, - dev->source.subsys.u.pci.bus, - dev->source.subsys.u.pci.slot, - dev->source.subsys.u.pci.function); + virPCIDeviceNew(dev->source.subsys.u.pci.addr.domain, + dev->source.subsys.u.pci.addr.bus, + dev->source.subsys.u.pci.addr.slot, + dev->source.subsys.u.pci.addr.function); if (!pci) goto done; @@ -588,10 +588,10 @@ virSecurityDACRestoreSecurityHostdevLabel(virSecurityManagerPtr mgr, case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: { virPCIDevicePtr pci = - virPCIDeviceNew(dev->source.subsys.u.pci.domain, - dev->source.subsys.u.pci.bus, - dev->source.subsys.u.pci.slot, - dev->source.subsys.u.pci.function); + virPCIDeviceNew(dev->source.subsys.u.pci.addr.domain, + dev->source.subsys.u.pci.addr.bus, + dev->source.subsys.u.pci.addr.slot, + dev->source.subsys.u.pci.addr.function); if (!pci) goto done; diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index c620a2e..98e9183 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1340,10 +1340,10 @@ virSecuritySELinuxSetSecurityHostdevSubsysLabel(virDomainDefPtr def, case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: { virPCIDevicePtr pci = - virPCIDeviceNew(dev->source.subsys.u.pci.domain, - dev->source.subsys.u.pci.bus, - dev->source.subsys.u.pci.slot, - dev->source.subsys.u.pci.function); + virPCIDeviceNew(dev->source.subsys.u.pci.addr.domain, + dev->source.subsys.u.pci.addr.bus, + dev->source.subsys.u.pci.addr.slot, + dev->source.subsys.u.pci.addr.function); if (!pci) goto done; @@ -1502,10 +1502,10 @@ virSecuritySELinuxRestoreSecurityHostdevSubsysLabel(virSecurityManagerPtr mgr, case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: { virPCIDevicePtr pci = - virPCIDeviceNew(dev->source.subsys.u.pci.domain, - dev->source.subsys.u.pci.bus, - dev->source.subsys.u.pci.slot, - dev->source.subsys.u.pci.function); + virPCIDeviceNew(dev->source.subsys.u.pci.addr.domain, + dev->source.subsys.u.pci.addr.bus, + dev->source.subsys.u.pci.addr.slot, + dev->source.subsys.u.pci.addr.function); if (!pci) goto done; diff --git a/src/security/virt-aa-helper.c b/src/security/virt-aa-helper.c index 3d3b772..3c9515f 100644 --- a/src/security/virt-aa-helper.c +++ b/src/security/virt-aa-helper.c @@ -2,7 +2,7 @@ /* * virt-aa-helper: wrapper program used by AppArmor security driver. * - * Copyright (C) 2010-2012 Red Hat, Inc. + * Copyright (C) 2010-2013 Red Hat, Inc. * Copyright (C) 2009-2011 Canonical Ltd. * * This library is free software; you can redistribute it and/or @@ -1029,10 +1029,10 @@ get_files(vahControl * ctl) case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: { virPCIDevicePtr pci = virPCIDeviceNew( - dev->source.subsys.u.pci.domain, - dev->source.subsys.u.pci.bus, - dev->source.subsys.u.pci.slot, - dev->source.subsys.u.pci.function); + dev->source.subsys.u.pci.addr.domain, + dev->source.subsys.u.pci.addr.bus, + dev->source.subsys.u.pci.addr.slot, + dev->source.subsys.u.pci.addr.function); if (pci == NULL) continue; diff --git a/src/xen/xend_internal.c b/src/xen/xend_internal.c index 83d6d39..860bf11 100644 --- a/src/xen/xend_internal.c +++ b/src/xen/xend_internal.c @@ -2565,7 +2565,7 @@ xenDaemonAttachDeviceFlags(virDomainPtr domain, virDevicePCIAddress PCIAddr; - PCIAddr = dev->data.hostdev->source.subsys.u.pci; + PCIAddr = dev->data.hostdev->source.subsys.u.pci.addr; if (virAsprintf(&target, "PCI device: %.4x:%.2x:%.2x", PCIAddr.domain, PCIAddr.bus, PCIAddr.slot) < 0) { virReportOOMError(); @@ -3753,10 +3753,10 @@ virDomainXMLDevID(virDomainPtr domain, virDomainHostdevDefPtr def = dev->data.hostdev; if (virAsprintf(&bdf, "%04x:%02x:%02x.%0x", - def->source.subsys.u.pci.domain, - def->source.subsys.u.pci.bus, - def->source.subsys.u.pci.slot, - def->source.subsys.u.pci.function) < 0) { + def->source.subsys.u.pci.addr.domain, + def->source.subsys.u.pci.addr.bus, + def->source.subsys.u.pci.addr.slot, + def->source.subsys.u.pci.addr.function) < 0) { virReportOOMError(); return -1; } diff --git a/src/xenxs/xen_sxpr.c b/src/xenxs/xen_sxpr.c index f2c345b..76e57ee 100644 --- a/src/xenxs/xen_sxpr.c +++ b/src/xenxs/xen_sxpr.c @@ -1089,10 +1089,10 @@ xenParseSxprPCI(virDomainDefPtr def, dev->mode = VIR_DOMAIN_HOSTDEV_MODE_SUBSYS; dev->managed = false; dev->source.subsys.type = VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI; - dev->source.subsys.u.pci.domain = domainID; - dev->source.subsys.u.pci.bus = busID; - dev->source.subsys.u.pci.slot = slotID; - dev->source.subsys.u.pci.function = funcID; + dev->source.subsys.u.pci.addr.domain = domainID; + dev->source.subsys.u.pci.addr.bus = busID; + dev->source.subsys.u.pci.addr.slot = slotID; + dev->source.subsys.u.pci.addr.function = funcID; if (VIR_REALLOC_N(def->hostdevs, def->nhostdevs+1) < 0) { virDomainHostdevDefFree(dev); @@ -2044,10 +2044,10 @@ xenFormatSxprPCI(virDomainHostdevDefPtr def, virBufferPtr buf) { virBufferAsprintf(buf, "(dev (domain 0x%04x)(bus 0x%02x)(slot 0x%02x)(func 0x%x))", - def->source.subsys.u.pci.domain, - def->source.subsys.u.pci.bus, - def->source.subsys.u.pci.slot, - def->source.subsys.u.pci.function); + def->source.subsys.u.pci.addr.domain, + def->source.subsys.u.pci.addr.bus, + def->source.subsys.u.pci.addr.slot, + def->source.subsys.u.pci.addr.function); } diff --git a/src/xenxs/xen_xm.c b/src/xenxs/xen_xm.c index 0e9f8d2..275e6ed 100644 --- a/src/xenxs/xen_xm.c +++ b/src/xenxs/xen_xm.c @@ -877,10 +877,10 @@ xenParseXM(virConfPtr conf, int xendConfigVersion, hostdev->managed = false; hostdev->source.subsys.type = VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI; - hostdev->source.subsys.u.pci.domain = domainID; - hostdev->source.subsys.u.pci.bus = busID; - hostdev->source.subsys.u.pci.slot = slotID; - hostdev->source.subsys.u.pci.function = funcID; + hostdev->source.subsys.u.pci.addr.domain = domainID; + hostdev->source.subsys.u.pci.addr.bus = busID; + hostdev->source.subsys.u.pci.addr.slot = slotID; + hostdev->source.subsys.u.pci.addr.function = funcID; if (VIR_REALLOC_N(def->hostdevs, def->nhostdevs+1) < 0) { virDomainHostdevDefFree(hostdev); @@ -1470,10 +1470,10 @@ xenFormatXMPCI(virConfPtr conf, char *buf; if (virAsprintf(&buf, "%04x:%02x:%02x.%x", - def->hostdevs[i]->source.subsys.u.pci.domain, - def->hostdevs[i]->source.subsys.u.pci.bus, - def->hostdevs[i]->source.subsys.u.pci.slot, - def->hostdevs[i]->source.subsys.u.pci.function) < 0) { + def->hostdevs[i]->source.subsys.u.pci.addr.domain, + def->hostdevs[i]->source.subsys.u.pci.addr.bus, + def->hostdevs[i]->source.subsys.u.pci.addr.slot, + def->hostdevs[i]->source.subsys.u.pci.addr.function) < 0) { virReportOOMError(); goto error; } -- 1.7.11.7

On 04/23/2013 10:45 AM, Laine Stump wrote:
There will soon be other items related to pci hostdevs that need to be in the same part of the hostdevsubsys union as the pci address (which is currently a single member called "pci". This patch replaces the single member named pci with a struct named pci that contains a single member named "addr". --- src/conf/domain_audit.c | 8 ++++---- src/conf/domain_conf.c | 12 ++++++------ src/conf/domain_conf.h | 4 +++- src/network/bridge_driver.c | 24 +++++++++++------------ src/qemu/qemu_command.c | 34 ++++++++++++++++---------------- src/qemu/qemu_hostdev.c | 42 ++++++++++++++++++++-------------------- src/qemu/qemu_hotplug.c | 14 +++++++------- src/security/security_apparmor.c | 8 ++++---- src/security/security_dac.c | 16 +++++++-------- src/security/security_selinux.c | 16 +++++++-------- src/security/virt-aa-helper.c | 10 +++++----- src/xen/xend_internal.c | 10 +++++----- src/xenxs/xen_sxpr.c | 16 +++++++-------- src/xenxs/xen_xm.c | 16 +++++++-------- 14 files changed, 116 insertions(+), 114 deletions(-)
Mostly mechanical; the meat of the patch is:
+++ b/src/conf/domain_conf.h @@ -399,7 +399,9 @@ struct _virDomainHostdevSubsys { unsigned vendor; unsigned product; } usb; - virDevicePCIAddress pci; /* host address */ + struct { + virDevicePCIAddress addr; /* host address */ + } pci; } u;
here. ACK.
@@ -2343,8 +2343,8 @@ qemuDomainDetachHostPciDevice(virQEMUDriverPtr driver, if (qemuIsMultiFunctionDevice(vm->def, detach->info)) { virReportError(VIR_ERR_OPERATION_FAILED, _("cannot hot unplug multifunction PCI device: %.4x:%.2x:%.2x.%.1x"), - subsys->u.pci.domain, subsys->u.pci.bus, - subsys->u.pci.slot, subsys->u.pci.function); + subsys->u.pci.addr.domain, subsys->u.pci.addr.bus, + subsys->u.pci.addr.slot, subsys->u.pci.addr.function);
Spacing was unusual here, but that's pre-existing.
goto cleanup; }
@@ -2375,8 +2375,8 @@ qemuDomainDetachHostPciDevice(virQEMUDriverPtr driver,
virObjectLock(driver->activePciHostdevs); virObjectLock(driver->inactivePciHostdevs); - pci = virPCIDeviceNew(subsys->u.pci.domain, subsys->u.pci.bus, - subsys->u.pci.slot, subsys->u.pci.function); + pci = virPCIDeviceNew(subsys->u.pci.addr.domain, subsys->u.pci.addr.bus, + subsys->u.pci.addr.slot, subsys->u.pci.addr.function);
And again. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

A domain's <interface> or <hostdev>, as well as a <network>'s <forward>, can now have an optional <driver name='kvm|vfio'/> element. As of this patch, there is no functionality behind this new knob - this patch adds support to the domain and network formatter/parser, and to the RNG and documentation. When then backend is added, legacy KVM PCI device assignment will continue to be used when no driver name is specified (or if <driver name='kvm'/> is specified), but if driver name is 'vfio', the new UEFI Secure Boot compatible VFIO device assignment will be used. Note that the parser doesn't automatically insert the current default value of this setting. This is done on purpose because the two possibilities are functionally equivalent from the guest's point of view, and we want to be able to automatically start using vfio as the default (even for existing domains) at some time in the future. This is similar to what was done with the "vhost" driver option in <interface>. --- docs/formatdomain.html.in | 42 ++++++++++++++++++++++- docs/formatnetwork.html.in | 15 ++++++++ docs/schemas/domaincommon.rng | 79 ++++++++++++++++++++++++++++--------------- docs/schemas/network.rng | 11 ++++++ src/conf/domain_conf.c | 36 ++++++++++++++++++++ src/conf/domain_conf.h | 11 ++++++ 6 files changed, 166 insertions(+), 28 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 888c005..0e5b00c 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -2340,7 +2340,22 @@ the device as can be found with the <code>lspci</code> or with <code>virsh nodedev-list</code>. <a href="#elementsAddress">See above</a> for - more details on the address element. + more details on the address element.</dd> + <dt><code>driver</code></dt> + <dd> + PCI devices can have an optional <code>driver</code> + subelement that specifies which backend driver to use for PCI + device assignment. Use the <code>name</code> attribute to + select either "vfio" (for the new VFIO device assignment + backend, which is compatible with UEFI SecureBoot) or "kvm" + (for the legacy device assignment handled directly by the KVM + kernel module)<span class="since">Since 1.0.5 (QEMU and KVM + only, requires kernel 3.6 or newer)</span>. Currently, "kvm" + is the default used by libvirt when not explicitly provided, + but since the two are functionally equivalent, this default + could be changed in the future with no impact to domains that + don't specify anything. + </dd> </dl> @@ -2947,6 +2962,18 @@ </p> <p> + To use VFIO device assignment rather than traditional/legacy KVM + device assignment (VFIO is a new method of device assignment + that is compatible with UEFI Secure Boot), a type='hostdev' + interface can have an optional <code>driver</code> sub-element + with a <code>name</code> attribute set to "vfio". To use legacy + KVM device assignment you can set <code>name</code> to "kvm" (or + simply omit the <code><driver></code> element, since "kvm" + is currently the default). + <span class="since">Since 1.0.5 (QEMU and KVM only, requires kernel 3.6 or newer)</span> + </p> + + <p> Note that this "intelligent passthrough" of network devices is very similar to the functionality of a standard <hostdev> device, the difference being that this method allows specifying @@ -2964,6 +2991,7 @@ ... <devices> <interface type='hostdev'> + <driver name='vfio'/> <source> <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> </source> @@ -3095,6 +3123,18 @@ qemu-kvm -net nic,model=? /dev/null to 'qemu' without error. <span class="since">Since 0.8.8 (QEMU and KVM only)</span> </dd> + <dd> + For interfaces of type='hostdev' (PCI passthrough devices) + the <code>name</code> attribute can optionally be set to + "vfio" or "kvm". "vfio" tells libvirt to use VFIO device + assignment rather than traditional KVM device assignment (VFIO + is a new method of device assignment that is compatible with + UEFI Secure Boot), and "kvm" tells libvirt to use the legacy + device assignment performed directly by the kvm kernel module + (the default is currently "kvm", but is subject to change). + <span class="since">Since 1.0.5 (QEMU and KVM only, requires kernel 3.6 or newer)</span> + </dd> + <dt><code>txmode</code></dt> <dd> The <code>txmode</code> attribute specifies how to handle diff --git a/docs/formatnetwork.html.in b/docs/formatnetwork.html.in index 4dd0415..b188896 100644 --- a/docs/formatnetwork.html.in +++ b/docs/formatnetwork.html.in @@ -279,6 +279,20 @@ use the traditional <code>< hostdev></code> device definition. <span class="since"> Since 0.10.0</span> + <p> + To use VFIO device assignment rather than + traditional/legacy KVM device assignment (VFIO is a new + method of device assignment that is compatible with UEFI + Secure Boot), a <forward type='hostdev'> interface + can have an optional <code>driver</code> sub-element + with a <code>name</code> attribute set to "vfio". To use + legacy KVM device assignment you can + set <code>name</code> to "kvm" (or simply omit the + <driver> element, since "kvm" is currently the + default). + <span class="since">Since 1.0.5 (QEMU and KVM only, requires kernel 3.6 or newer)</span> + </p> + <p>Note that this "intelligent passthrough" of network devices is very similar to the functionality of a standard <code>< hostdev></code> device, the @@ -360,6 +374,7 @@ <pre> ... <forward mode='hostdev' managed='yes'> + <driver name='vfio'/> <address type='pci' domain='0' bus='4' slot='0' function='1'/> <address type='pci' domain='0' bus='4' slot='0' function='2'/> <address type='pci' domain='0' bus='4' slot='0' function='3'/> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 3976b82..0bab37b 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -1923,28 +1923,40 @@ </optional> <optional> <element name="driver"> - <optional> - <attribute name="name"> - <choice> - <value>qemu</value> - <value>vhost</value> - </choice> - </attribute> - </optional> - <optional> - <attribute name="txmode"> - <choice> - <value>iothread</value> - <value>timer</value> - </choice> - </attribute> - </optional> - <optional> - <ref name="ioeventfd"/> - </optional> - <optional> - <ref name="event_idx"/> - </optional> + <choice> + <group> + <attribute name="name"> + <choice> + <value>kvm</value> + <value>vfio</value> + </choice> + </attribute> + </group> + <group> + <optional> + <attribute name="name"> + <choice> + <value>qemu</value> + <value>vhost</value> + </choice> + </attribute> + </optional> + <optional> + <attribute name="txmode"> + <choice> + <value>iothread</value> + <value>timer</value> + </choice> + </attribute> + </optional> + <optional> + <ref name="ioeventfd"/> + </optional> + <optional> + <ref name="event_idx"/> + </optional> + </group> + </choice> <empty/> </element> </optional> @@ -3084,14 +3096,27 @@ <attribute name="type"> <value>pci</value> </attribute> - <element name="source"> + <interleave> <optional> - <ref name="startupPolicy"/> + <element name="driver"> + <attribute name="name"> + <choice> + <value>kvm</value> + <value>vfio</value> + </choice> + </attribute> + <empty/> + </element> </optional> - <element name="address"> - <ref name="pciaddress"/> + <element name="source"> + <optional> + <ref name="startupPolicy"/> + </optional> + <element name="address"> + <ref name="pciaddress"/> + </element> </element> - </element> + </interleave> </define> <define name="hostdevsubsysusb"> diff --git a/docs/schemas/network.rng b/docs/schemas/network.rng index 6c3fae2..493edae 100644 --- a/docs/schemas/network.rng +++ b/docs/schemas/network.rng @@ -149,6 +149,17 @@ </attribute> </element> </optional> + <optional> + <element name="driver"> + <attribute name="name"> + <choice> + <value>kvm</value> + <value>vfio</value> + </choice> + </attribute> + <empty/> + </element> + </optional> </interleave> </element> </optional> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index b7fea36..ae1f6e2 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -581,6 +581,12 @@ VIR_ENUM_IMPL(virDomainHostdevSubsys, VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_LAST, "usb", "pci") +VIR_ENUM_IMPL(virDomainHostdevSubsysPciBackend, + VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_LAST, + "default", + "kvm", + "vfio") + VIR_ENUM_IMPL(virDomainHostdevCaps, VIR_DOMAIN_HOSTDEV_CAPS_TYPE_LAST, "storage", "misc", @@ -3599,6 +3605,8 @@ virDomainHostdevDefParseXMLSubsys(xmlNodePtr node, { xmlNodePtr sourcenode; char *managed = NULL; + char *backendStr = NULL; + int backend; int ret = -1; /* @managed can be read from the xml document - it is always an @@ -3651,7 +3659,20 @@ virDomainHostdevDefParseXMLSubsys(xmlNodePtr node, case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: if (virDomainHostdevSubsysPciDefParseXML(sourcenode, def, flags) < 0) goto error; + + backend = VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_DEFAULT; + if ((backendStr = virXPathString("string(./driver/@name)", ctxt)) && + (((backend = virDomainHostdevSubsysPciBackendTypeFromString(backendStr)) < 0) || + backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_DEFAULT)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unknown PCI device <driver name='%s'/> " + "has been specified"), backendStr); + goto error; + } + def->source.subsys.u.pci.backend = backend; + break; + case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB: if (virDomainHostdevSubsysUsbDefParseXML(sourcenode, def) < 0) goto error; @@ -3662,9 +3683,11 @@ virDomainHostdevDefParseXMLSubsys(xmlNodePtr node, virDomainHostdevSubsysTypeToString(def->source.subsys.type)); goto error; } + ret = 0; error: VIR_FREE(managed); + VIR_FREE(backendStr); return ret; } @@ -13665,6 +13688,19 @@ virDomainHostdevDefFormatSubsys(virBufferPtr buf, unsigned int flags, bool includeTypeInAddr) { + if (def->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && + def->source.subsys.u.pci.backend != VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_DEFAULT) { + const char *backend = virDomainHostdevSubsysPciBackendTypeToString(def->source.subsys.u.pci.backend); + + if (!backend) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("unexpected pci hostdev driver name type %d"), + def->source.subsys.u.pci.backend); + return -1; + } + virBufferAsprintf(buf, "<driver name='%s'/>\n", backend); + } + virBufferAddLit(buf, "<source"); if (def->startupPolicy) { const char *policy; diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index a9f86c0..bb75da8 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -384,6 +384,16 @@ enum virDomainHostdevSubsysType { VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_LAST }; +/* the backend driver used for PCI hostdev devices */ +typedef enum { + VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_DEFAULT, /* currently kvm, could change */ + VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_KVM, /* force legacy kvm style */ + VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_VFIO, /* force vfio */ + + VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_LAST +} virDomainHostdevSubsysPciBackendType; + +VIR_ENUM_DECL(virDomainHostdevSubsysPciBackend) typedef struct _virDomainHostdevSubsys virDomainHostdevSubsys; typedef virDomainHostdevSubsys *virDomainHostdevSubsysPtr; @@ -401,6 +411,7 @@ struct _virDomainHostdevSubsys { } usb; struct { virDevicePCIAddress addr; /* host address */ + int backend; /* enum virDomainHostdevSubsysPciBackendType */ } pci; } u; }; -- 1.7.11.7

On 04/23/2013 10:45 AM, Laine Stump wrote:
A domain's <interface> or <hostdev>, as well as a <network>'s <forward>, can now have an optional <driver name='kvm|vfio'/> element. As of this patch, there is no functionality behind this new knob - this patch adds support to the domain and network formatter/parser, and to the RNG and documentation.
When then backend is added, legacy KVM PCI device assignment will
s/then/the/
continue to be used when no driver name is specified (or if <driver name='kvm'/> is specified), but if driver name is 'vfio', the new UEFI Secure Boot compatible VFIO device assignment will be used.
Note that the parser doesn't automatically insert the current default value of this setting. This is done on purpose because the two possibilities are functionally equivalent from the guest's point of view, and we want to be able to automatically start using vfio as the default (even for existing domains) at some time in the future. This is similar to what was done with the "vhost" driver option in <interface>.
Yes, leaving an unspecified element as hypervisor default has worked for us in the past, especially when the choice of host driver should not be guest-visible.
--- docs/formatdomain.html.in | 42 ++++++++++++++++++++++- docs/formatnetwork.html.in | 15 ++++++++ docs/schemas/domaincommon.rng | 79 ++++++++++++++++++++++++++++--------------- docs/schemas/network.rng | 11 ++++++ src/conf/domain_conf.c | 36 ++++++++++++++++++++ src/conf/domain_conf.h | 11 ++++++ 6 files changed, 166 insertions(+), 28 deletions(-)
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 888c005..0e5b00c 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -2340,7 +2340,22 @@ the device as can be found with the <code>lspci</code> or with <code>virsh nodedev-list</code>. <a href="#elementsAddress">See above</a> for - more details on the address element. + more details on the address element.</dd> + <dt><code>driver</code></dt> + <dd> + PCI devices can have an optional <code>driver</code> + subelement that specifies which backend driver to use for PCI + device assignment. Use the <code>name</code> attribute to + select either "vfio" (for the new VFIO device assignment + backend, which is compatible with UEFI SecureBoot) or "kvm" + (for the legacy device assignment handled directly by the KVM + kernel module)<span class="since">Since 1.0.5 (QEMU and KVM + only, requires kernel 3.6 or newer)</span>. Currently, "kvm" + is the default used by libvirt when not explicitly provided, + but since the two are functionally equivalent, this default + could be changed in the future with no impact to domains that + don't specify anything.
IIRC, we have talked about managing host devices across all hypervisors, since it is a shared host resource. If that's true, then what happens if LXC wants to start using a driver? Or should LXC only use vfio and never use kvm? Also, didn't you say that xen used yet another driver, in which case we should consider listing that in the RNG.
+ device assignment performed directly by the kvm kernel module + (the default is currently "kvm", but is subject to change). + <span class="since">Since 1.0.5 (QEMU and KVM only, requires kernel 3.6 or newer)</span>
Long line (several times).
+++ b/docs/schemas/domaincommon.rng @@ -1923,28 +1923,40 @@ </optional> <optional> <element name="driver"> - <optional> - <attribute name="name"> - <choice> - <value>qemu</value> - <value>vhost</value> - </choice> - </attribute> - </optional> - <optional> - <attribute name="txmode"> - <choice> - <value>iothread</value> - <value>timer</value> - </choice> - </attribute> - </optional> - <optional> - <ref name="ioeventfd"/> - </optional> - <optional> - <ref name="event_idx"/> - </optional> + <choice> + <group> + <attribute name="name"> + <choice> + <value>kvm</value> + <value>vfio</value> + </choice> + </attribute> + </group> + <group> + <optional> + <attribute name="name"> + <choice> + <value>qemu</value> + <value>vhost</value>
So for define='interface-options', we are extending the exiting <driver> so that there are now four values for <driver name=.../>, but only the two older options allow other attributes. Can the two ever overlap, or are we safe in that kvm/vfio are for hostdev interface passthrough, while qemu/vhost is for emulation, so the two types of driver choices never make sense at once?
+ </choice> + </attribute> + </optional> + <optional> + <attribute name="txmode"> + <choice> + <value>iothread</value> + <value>timer</value> + </choice> + </attribute> + </optional> + <optional> + <ref name="ioeventfd"/> + </optional> + <optional> + <ref name="event_idx"/> + </optional> + </group> + </choice> <empty/> </element> </optional> @@ -3084,14 +3096,27 @@ <attribute name="type"> <value>pci</value> </attribute> - <element name="source"> + <interleave> <optional> - <ref name="startupPolicy"/> + <element name="driver"> + <attribute name="name"> + <choice> + <value>kvm</value> + <value>vfio</value> + </choice> + </attribute> + <empty/>
Whereas for define='hostdevsubsyspci', we are completely adding a new subelement, with only two choices for the new attribute.
+ </element> </optional> - <element name="address"> - <ref name="pciaddress"/> + <element name="source"> + <optional> + <ref name="startupPolicy"/> + </optional> + <element name="address"> + <ref name="pciaddress"/> + </element> </element> - </element> + </interleave> </define>
<define name="hostdevsubsysusb"> diff --git a/docs/schemas/network.rng b/docs/schemas/network.rng index 6c3fae2..493edae 100644 --- a/docs/schemas/network.rng +++ b/docs/schemas/network.rng @@ -149,6 +149,17 @@ </attribute> </element> </optional> + <optional> + <element name="driver"> + <attribute name="name"> + <choice> + <value>kvm</value> + <value>vfio</value> + </choice> + </attribute> + <empty/> + </element> + </optional> </interleave>
Looks okay to me. I'm not sure if we are ready to push this without more patches in the series, but at least I think the XML is on the right track. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 04/23/2013 07:34 PM, Eric Blake wrote:
On 04/23/2013 10:45 AM, Laine Stump wrote:
A domain's <interface> or <hostdev>, as well as a <network>'s <forward>, can now have an optional <driver name='kvm|vfio'/> element. As of this patch, there is no functionality behind this new knob - this patch adds support to the domain and network formatter/parser, and to the RNG and documentation.
When then backend is added, legacy KVM PCI device assignment will s/then/the/
continue to be used when no driver name is specified (or if <driver name='kvm'/> is specified), but if driver name is 'vfio', the new UEFI Secure Boot compatible VFIO device assignment will be used.
Note that the parser doesn't automatically insert the current default value of this setting. This is done on purpose because the two possibilities are functionally equivalent from the guest's point of view, and we want to be able to automatically start using vfio as the default (even for existing domains) at some time in the future. This is similar to what was done with the "vhost" driver option in <interface>. Yes, leaving an unspecified element as hypervisor default has worked for us in the past, especially when the choice of host driver should not be guest-visible.
Doing that has turned out badly, though, in the case where the choice of driver *does* make a visible difference in the guest (I'm thinking of <interface> model), which is why I felt the need to explain my decision.
--- docs/formatdomain.html.in | 42 ++++++++++++++++++++++- docs/formatnetwork.html.in | 15 ++++++++ docs/schemas/domaincommon.rng | 79 ++++++++++++++++++++++++++++--------------- docs/schemas/network.rng | 11 ++++++ src/conf/domain_conf.c | 36 ++++++++++++++++++++ src/conf/domain_conf.h | 11 ++++++ 6 files changed, 166 insertions(+), 28 deletions(-)
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 888c005..0e5b00c 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -2340,7 +2340,22 @@ the device as can be found with the <code>lspci</code> or with <code>virsh nodedev-list</code>. <a href="#elementsAddress">See above</a> for - more details on the address element. + more details on the address element.</dd> + <dt><code>driver</code></dt> + <dd> + PCI devices can have an optional <code>driver</code> + subelement that specifies which backend driver to use for PCI + device assignment. Use the <code>name</code> attribute to + select either "vfio" (for the new VFIO device assignment + backend, which is compatible with UEFI SecureBoot) or "kvm" + (for the legacy device assignment handled directly by the KVM + kernel module)<span class="since">Since 1.0.5 (QEMU and KVM + only, requires kernel 3.6 or newer)</span>. Currently, "kvm" + is the default used by libvirt when not explicitly provided, + but since the two are functionally equivalent, this default + could be changed in the future with no impact to domains that + don't specify anything. IIRC, we have talked about managing host devices across all hypervisors, since it is a shared host resource. If that's true, then what happens if LXC wants to start using a driver? Or should LXC only use vfio and never use kvm?
Well, nobody but the qemu hypervisor driver can/should ever use "kvm". In general, the contents of a <driver> subelement don't need to be identical for all hypervisors, so it's okay if other hypervisor drivers ignore it, or have their own set of attributes/values.
Also, didn't you say that xen used yet another driver, in which case we should consider listing that in the RNG.
Just as kvm has its own internal device assignment implementation, I guess so does xen. kvm likes having devices bound to the "pci-stub" driver before it starts messing with them, and xen likes having them bound to the "pciback" driver. Since I don't know what would be the proper name for xen's internal implementation, and they don't support anything else anyway, I don't think we need to add anything for it yet. In the future, xen may decide to support VFIO device assignment too, in which case they could also recognize the "vfio" value.
+ device assignment performed directly by the kvm kernel module + (the default is currently "kvm", but is subject to change). + <span class="since">Since 1.0.5 (QEMU and KVM only, requires kernel 3.6 or newer)</span> Long line (several times).
+++ b/docs/schemas/domaincommon.rng @@ -1923,28 +1923,40 @@ </optional> <optional> <element name="driver"> - <optional> - <attribute name="name"> - <choice> - <value>qemu</value> - <value>vhost</value> - </choice> - </attribute> - </optional> - <optional> - <attribute name="txmode"> - <choice> - <value>iothread</value> - <value>timer</value> - </choice> - </attribute> - </optional> - <optional> - <ref name="ioeventfd"/> - </optional> - <optional> - <ref name="event_idx"/> - </optional> + <choice> + <group> + <attribute name="name"> + <choice> + <value>kvm</value> + <value>vfio</value> + </choice> + </attribute> + </group> + <group> + <optional> + <attribute name="name"> + <choice> + <value>qemu</value> + <value>vhost</value> So for define='interface-options', we are extending the exiting <driver>
s/exiting/existing/ (I couldn't pass up a chance to do a little "review of the review" :-)
so that there are now four values for <driver name=.../>, but only the two older options allow other attributes.
Can the two ever overlap, or are we safe in that kvm/vfio are for hostdev interface passthrough, while qemu/vhost is for emulation, so the two types of driver choices never make sense at once?
No, they can never overlap. qemu and vhost are only used if the interface model is virtio, and kvm/vfio are only used if the type of the interface is hostdev. The one problematic situation would be if someone had this: <interface type='network'> <source network='mynet'/> <model name='virtio'/> <driver name='qemu'/> ... </interface> (so the <interface> is expecting mynet to just be some type of bridge or macvtap network) but the "mynet" network was actually a pool of hostdevs. In this case the <driver name='qemu'/> from the <interface> would be ignored (just as the <model name='virtio'/> is ignored), but <driver name> could be specified in the network, and would be retrieved from there. Anyway, I don't see that ever cropping up in real life - nobody is going to point their <interface> at a network where they don't know in advance whether or not it will be a <forward mode='hostdev'> network - if they did that, they would have no idea what kind of network device the guest would see.
+ </choice> + </attribute> + </optional> + <optional> + <attribute name="txmode"> + <choice> + <value>iothread</value> + <value>timer</value> + </choice> + </attribute> + </optional> + <optional> + <ref name="ioeventfd"/> + </optional> + <optional> + <ref name="event_idx"/> + </optional> + </group> + </choice> <empty/> </element> </optional> @@ -3084,14 +3096,27 @@ <attribute name="type"> <value>pci</value> </attribute> - <element name="source"> + <interleave> <optional> - <ref name="startupPolicy"/> + <element name="driver"> + <attribute name="name"> + <choice> + <value>kvm</value> + <value>vfio</value> + </choice> + </attribute> + <empty/> Whereas for define='hostdevsubsyspci', we are completely adding a new subelement, with only two choices for the new attribute.
Right, because there wasn't previously any <driver> element for <hostdev>
+ </element> </optional> - <element name="address"> - <ref name="pciaddress"/> + <element name="source"> + <optional> + <ref name="startupPolicy"/> + </optional> + <element name="address"> + <ref name="pciaddress"/> + </element> </element> - </element> + </interleave> </define>
<define name="hostdevsubsysusb"> diff --git a/docs/schemas/network.rng b/docs/schemas/network.rng index 6c3fae2..493edae 100644 --- a/docs/schemas/network.rng +++ b/docs/schemas/network.rng @@ -149,6 +149,17 @@ </attribute> </element> </optional> + <optional> + <element name="driver"> + <attribute name="name"> + <choice> + <value>kvm</value> + <value>vfio</value> + </choice> + </attribute> + <empty/> + </element> + </optional> </interleave> Looks okay to me.
I'm not sure if we are ready to push this without more patches in the series, but at least I think the XML is on the right track.
Yeah, this patch can't go in without some functionality behind it. Thanks for looking through this though. You did manage to pick up on one of the things that caused the most thought by me (the qemu/vhost vs. kvm/vfio thing for <interface>).

On 04/23/2013 12:45 PM, Laine Stump wrote:
This work isn't finished, but since there's a freeze coming up in a few days, I thought I should send these patches that *are* finished to get them out of the way before the last minute rush.
Patch 1 is trivial.
Patch 2 is longer, but completely mechanical (NOTE: there may be some uses of that struct living in code that I lack the environment to compile for, so test builds by people with odd platforms would be appreciated!)
Patch 3 is standard parser/formatter/RNG/docs for the new XML. I haven't added any test cases, because lacking backend functionality I could only test xml-to-xml, which would require adding yet another test xml file, and I'd rather wait until the backend commandline-building code is in place, then simply add the new element to a few existing test XML files.
In email awhile ago, danpb had suggested
<driver name='vfio|qemu'/>
I've used
<driver name='kvm'/>
because, unlike vhost-net where <driver name='qemu'/> indicates that the driver used is in the usermode qemu process (and *not* in the kernel), in this case <driver name='kvm'/> indicates that the driver used is in the kernel (and I've seen it referenced in several places as being "done by KVM", but never as "done by QEMU" (makes sense, since qemu is all in userland)
Laine Stump (3): qemu: detect vfio-pci device assignment support conf: put hostdev pci address in a struct conf: formatter/parser/RNG/docs for hostdev <driver name='kvm|vfio'/>
I pushed these three patches after making the small changes Eric requested. Thanks for the reviews.
participants (2)
-
Eric Blake
-
Laine Stump