[libvirt] [PATCH 0/6] Use more PCIe, less legacy PCI slots

These patches use three methods to get more of the PCI devices onto PCIe slots on Q35 and aarch64/virt machinetypes: 1) When virtio devices can present themselves as PCIe if they're plugged into a PCIe controller, do that. (This capability is detected by looking for presence of the "disable-modern" option on the virtio-net device. If you have a better idea for how to detect it, please let me know.) 2) Any devices that aren't hotpluggable anyway will no longer request a hotplug-capable slot. This, along with a change to auto-assign legacy PCI devices to pcie-root (as long as they don't require hotplug) means the devices will now be assigned to pcie-root. 3) Also using the fact that devices that won't be hotplugged can be assigned to pcie-root, the devices that *do* support hotplug have a new optional subelement "<hotplug require='no'/>" (it defaults to "yes" for historical reasons). If hotplug require is set to 'no', even a PCI device can be auto-assigned to pcie-root. I haven't yet removed the dmi-to-pci-bridge that is added by default, and we still will only auto-add pci-bridge (so if you're adding a virtio device without <hostplug require='no'/> then you will also need to add a <controller type='pci' model='pcie-root-port'/> to plug it into). These 6 patches together resolve: https://bugzilla.redhat.com/show_bug.cgi?id=1330024 They need to be applied on top of the series I sent on Friday: https://www.redhat.com/archives/libvir-list/2016-August/msg00380.html Laine Stump (6): conf: restrict what type of buses will accept a pci-bridge conf: permit auto-assignment of *non-hotpluggable* PCI devices to pcie-root qemu: add capabilities bit for virtio-net "disable-modern" option conf: new <hotplug require='yes|no'/> element for hotpluggable devices qemu: don't require a hotpluggable slot for devices that can't be hotplugged qemu: auto-assign virtio devices to PCIe slots when appropriate docs/formatdomain.html.in | 37 +++++++ docs/schemas/domaincommon.rng | 17 +++ src/conf/device_conf.h | 15 +++ src/conf/domain_addr.c | 44 ++++++-- src/conf/domain_addr.h | 4 +- src/conf/domain_conf.c | 28 +++++ src/qemu/qemu_capabilities.c | 2 + src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_domain_address.c | 94 ++++++++++++++-- tests/qemucapabilitiesdata/caps_2.4.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.5.0.x86_64.xml | 1 + .../caps_2.6.0-gicv2.aarch64.xml | 1 + .../caps_2.6.0-gicv3.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_2.6.0.ppc64le.xml | 1 + tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml | 1 + tests/qemuxml2argvdata/qemuxml2argv-autoindex.args | 22 ++-- .../qemuxml2argv-bios-nvram-secure.args | 2 +- .../qemuxml2argv-machine-smm-opt.args | 2 +- .../qemuxml2argv-q35-hotpluggable.args | 50 +++++++++ .../qemuxml2argv-q35-hotpluggable.xml | 62 +++++++++++ .../qemuxml2argv-q35-pm-disable-fallback.args | 3 +- .../qemuxml2argv-q35-pm-disable.args | 3 +- .../qemuxml2argv-q35-usb2-multi.args | 10 +- .../qemuxml2argv-q35-usb2-reorder.args | 10 +- .../qemuxml2argv-q35-virtio-pcie.args | 54 +++++++++ .../qemuxml2argv-q35-virtio-pcie.xml | 69 ++++++++++++ tests/qemuxml2argvtest.c | 18 +++ .../qemuxml2xmlout-autoindex.xml | 18 +-- .../qemuxml2xmlout-q35-hotpluggable.xml | 103 +++++++++++++++++ .../qemuxml2xmlout-q35-usb2-multi.xml | 8 +- .../qemuxml2xmlout-q35-usb2-reorder.xml | 8 +- .../qemuxml2xmlout-q35-virtio-pcie.xml | 123 +++++++++++++++++++++ tests/qemuxml2xmltest.c | 15 +++ 34 files changed, 765 insertions(+), 64 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-q35-hotpluggable.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-q35-hotpluggable.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-hotpluggable.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-virtio-pcie.xml -- 2.7.4

A pci-bridge has *almost* the same rules as a legacy PCI endpoint device for where it can be automatically connected, and until now both had been considered identical. There is one pairing that is okay when specifically requested by the user (i.e. manual assignment), but we want to avoid it when auto-assigning addresses - plugging a pci-bridge directly into pcie-root (it is cleaner to plug in a dmi-to-pci-bridge, then plug the pci-bridge into that). In order to allow that difference, this patch makes a separate CONNECT_TYPE for pci-bridge, and uses it to restrict auto-assigned addresses for pci-bridges to be only on pci-root, pci-expander-bus, dmi-to-pci-bridge, or on another pci-bridge. This makes no difference for now, but an upcoming patch is going to allow non-hotpluggable PCI endpoint devices to be plugged directly into pcie-root, and without this differentiating patch, pci-bridges would also be auto-assigned to pcie-root. --- src/conf/domain_addr.c | 29 ++++++++++++++++++++++------- src/conf/domain_addr.h | 4 +++- 2 files changed, 25 insertions(+), 8 deletions(-) diff --git a/src/conf/domain_addr.c b/src/conf/domain_addr.c index 92f9573..8956295 100644 --- a/src/conf/domain_addr.c +++ b/src/conf/domain_addr.c @@ -51,11 +51,14 @@ virDomainPCIControllerModelToConnectType(virDomainControllerModelPCI model) return 0; case VIR_DOMAIN_CONTROLLER_MODEL_PCI_BRIDGE: - /* pci-bridge is treated like a standard - * PCI endpoint device, because it can plug into any - * standard PCI slot (it just can't be hotplugged). + /* pci-bridge is treated like a standard PCI endpoint device + * when its address is manually assigned, but needs special + * treatment when auto-assigning addresses (in that case we + * avoid plugging into anything except pci-root, + * dmi-to-pci-bridge, pci-expander-bus, or another + * pci-bridge). */ - return VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + return VIR_PCI_CONNECT_TYPE_PCI_BRIDGE; case VIR_DOMAIN_CONTROLLER_MODEL_PCI_EXPANDER_BUS: /* pci-expander-bus can only be plugged into pci-root @@ -118,6 +121,12 @@ virDomainPCIAddressFlagsCompatible(virPCIDeviceAddressPtr addr, */ if (devFlags & VIR_PCI_CONNECT_HOTPLUGGABLE) busFlags |= VIR_PCI_CONNECT_HOTPLUGGABLE; + /* if the device is a pci-bridge, allow manually + * assigning to any bus that would also accept a + * standard PCI device. + */ + if (devFlags & VIR_PCI_CONNECT_TYPE_PCI_BRIDGE) + devFlags |= VIR_PCI_CONNECT_TYPE_PCI_DEVICE; } /* If this bus doesn't allow the type of connection (PCI @@ -144,6 +153,8 @@ virDomainPCIAddressFlagsCompatible(virPCIDeviceAddressPtr addr, connectStr = "pci-expander-bus"; } else if (devFlags & VIR_PCI_CONNECT_TYPE_PCIE_EXPANDER_BUS) { connectStr = "pcie-expander-bus"; + } else if (devFlags & VIR_PCI_CONNECT_TYPE_PCI_BRIDGE) { + connectStr = "pci-bridge"; } else { /* this should never happen. If it does, there is a * bug in the code that sets the flag bits for devices. @@ -253,19 +264,22 @@ virDomainPCIAddressBusSetModel(virDomainPCIAddressBusPtr bus, case VIR_DOMAIN_CONTROLLER_MODEL_PCI_ROOT: bus->flags = (VIR_PCI_CONNECT_HOTPLUGGABLE | VIR_PCI_CONNECT_TYPE_PCI_DEVICE | + VIR_PCI_CONNECT_TYPE_PCI_BRIDGE | VIR_PCI_CONNECT_TYPE_PCI_EXPANDER_BUS); bus->minSlot = 1; bus->maxSlot = VIR_PCI_ADDRESS_SLOT_LAST; break; case VIR_DOMAIN_CONTROLLER_MODEL_PCI_BRIDGE: bus->flags = (VIR_PCI_CONNECT_HOTPLUGGABLE | - VIR_PCI_CONNECT_TYPE_PCI_DEVICE); + VIR_PCI_CONNECT_TYPE_PCI_DEVICE | + VIR_PCI_CONNECT_TYPE_PCI_BRIDGE); bus->minSlot = 1; bus->maxSlot = VIR_PCI_ADDRESS_SLOT_LAST; break; case VIR_DOMAIN_CONTROLLER_MODEL_PCI_EXPANDER_BUS: bus->flags = (VIR_PCI_CONNECT_HOTPLUGGABLE | - VIR_PCI_CONNECT_TYPE_PCI_DEVICE); + VIR_PCI_CONNECT_TYPE_PCI_DEVICE | + VIR_PCI_CONNECT_TYPE_PCI_BRIDGE); bus->minSlot = 0; bus->maxSlot = VIR_PCI_ADDRESS_SLOT_LAST; break; @@ -286,7 +300,8 @@ virDomainPCIAddressBusSetModel(virDomainPCIAddressBusPtr bus, case VIR_DOMAIN_CONTROLLER_MODEL_DMI_TO_PCI_BRIDGE: /* slots 0 - 31, standard PCI slots, * but *not* hot-pluggable */ - bus->flags = VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + bus->flags = (VIR_PCI_CONNECT_TYPE_PCI_DEVICE | + VIR_PCI_CONNECT_TYPE_PCI_BRIDGE); bus->minSlot = 0; bus->maxSlot = VIR_PCI_ADDRESS_SLOT_LAST; break; diff --git a/src/conf/domain_addr.h b/src/conf/domain_addr.h index 596cd4c..0072a08 100644 --- a/src/conf/domain_addr.h +++ b/src/conf/domain_addr.h @@ -43,6 +43,7 @@ typedef enum { VIR_PCI_CONNECT_TYPE_DMI_TO_PCI_BRIDGE = 1 << 6, VIR_PCI_CONNECT_TYPE_PCI_EXPANDER_BUS = 1 << 7, VIR_PCI_CONNECT_TYPE_PCIE_EXPANDER_BUS = 1 << 8, + VIR_PCI_CONNECT_TYPE_PCI_BRIDGE = 1 << 9, } virDomainPCIConnectFlags; /* a combination of all bits that describe the type of connections @@ -55,7 +56,8 @@ typedef enum { VIR_PCI_CONNECT_TYPE_PCIE_ROOT_PORT | \ VIR_PCI_CONNECT_TYPE_DMI_TO_PCI_BRIDGE | \ VIR_PCI_CONNECT_TYPE_PCI_EXPANDER_BUS | \ - VIR_PCI_CONNECT_TYPE_PCIE_EXPANDER_BUS) + VIR_PCI_CONNECT_TYPE_PCIE_EXPANDER_BUS | \ + VIR_PCI_CONNECT_TYPE_PCI_BRIDGE) /* combination of all bits that could be used to connect a normal * endpoint device (i.e. excluding the connection possible between an -- 2.7.4

Although the slots in pcie-root (aka pcie.0 aka "the PCIe Root Complex") are PCIe slots, and the PCIe spec usually doesn't like putting legacy PCI devices in PCIe slots, these are an exception - "according to my sources" (i.e. I haven't read the spec myself, but know people who have), the PCIe spec expressly permits legacy PCI devices plugged directly into the root complex. This patch changes the connection-type flags for pcie-root to allow legacy PCI endpoint devices. For now this has no effect in practice, since qemuDomainAssignDevicePCISlots() sets the HOTPLUGGABLE flag for all devices except PCI controllers, and pcie-root doesn't allow devices that have HOTPLUGGABLE set - that will be remedied in later patches. --- src/conf/domain_addr.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/conf/domain_addr.c b/src/conf/domain_addr.c index 8956295..5533b11 100644 --- a/src/conf/domain_addr.c +++ b/src/conf/domain_addr.c @@ -285,12 +285,12 @@ virDomainPCIAddressBusSetModel(virDomainPCIAddressBusPtr bus, break; case VIR_DOMAIN_CONTROLLER_MODEL_PCIE_ROOT: - /* slots 1 - 31, no hotplug, PCIe endpoint device or - * pcie-root-port only, unless the address was specified in - * user config *and* the particular device being attached also - * allows it. + /* slots 1 - 31, *no hotplug* + * both PCIe and legacy PCI devices are allowed, as well as + * pcie-root-port, dmi-to-pci-bridge, and pcie-expander-bus. */ bus->flags = (VIR_PCI_CONNECT_TYPE_PCIE_DEVICE | + VIR_PCI_CONNECT_TYPE_PCI_DEVICE | VIR_PCI_CONNECT_TYPE_PCIE_ROOT_PORT | VIR_PCI_CONNECT_TYPE_DMI_TO_PCI_BRIDGE | VIR_PCI_CONNECT_TYPE_PCIE_EXPANDER_BUS); -- 2.7.4

The presence of this option on virtio-net also indicates that it is available for other virtio devices, and that those devices appear as PCIe devices when plugged into a PCIe controller. --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + tests/qemucapabilitiesdata/caps_2.4.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.5.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.6.0-gicv2.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_2.6.0-gicv3.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_2.6.0.ppc64le.xml | 1 + tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml | 1 + 9 files changed, 10 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 43e3ea7..54f6955 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -340,6 +340,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST, "display", /* 230 */ "intel-iommu", "smm", + "virtio-net-disable-modern", ); @@ -1579,6 +1580,7 @@ static struct virQEMUCapsStringFlags virQEMUCapsObjectPropsVirtioBlk[] = { static struct virQEMUCapsStringFlags virQEMUCapsObjectPropsVirtioNet[] = { { "tx", QEMU_CAPS_VIRTIO_TX_ALG }, { "event_idx", QEMU_CAPS_VIRTIO_NET_EVENT_IDX }, + { "disable-modern", QEMU_CAPS_VIRTIO_NET_DISABLE_MODERN }, }; static struct virQEMUCapsStringFlags virQEMUCapsObjectPropsVirtioSCSI[] = { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index d249e2e..dd6b4ca 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -373,6 +373,7 @@ typedef enum { QEMU_CAPS_DISPLAY, /* -display */ QEMU_CAPS_DEVICE_INTEL_IOMMU, /* -device intel-iommu */ QEMU_CAPS_MACHINE_SMM_OPT, /* -machine xxx,smm=on/off/auto */ + QEMU_CAPS_VIRTIO_NET_DISABLE_MODERN, /* -device virtio-net,disable-modern */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/tests/qemucapabilitiesdata/caps_2.4.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.4.0.x86_64.xml index 339ee1f..4dbcea5 100644 --- a/tests/qemucapabilitiesdata/caps_2.4.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.4.0.x86_64.xml @@ -184,6 +184,7 @@ <flag name='display'/> <flag name='intel-iommu'/> <flag name='smm'/> + <flag name='virtio-net-disable-modern'/> <version>2004000</version> <kvmVersion>0</kvmVersion> <package></package> diff --git a/tests/qemucapabilitiesdata/caps_2.5.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.5.0.x86_64.xml index c1a68d0..3e2038f 100644 --- a/tests/qemucapabilitiesdata/caps_2.5.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.5.0.x86_64.xml @@ -189,6 +189,7 @@ <flag name='display'/> <flag name='intel-iommu'/> <flag name='smm'/> + <flag name='virtio-net-disable-modern'/> <version>2005000</version> <kvmVersion>0</kvmVersion> <package></package> diff --git a/tests/qemucapabilitiesdata/caps_2.6.0-gicv2.aarch64.xml b/tests/qemucapabilitiesdata/caps_2.6.0-gicv2.aarch64.xml index 85d7d3f..2acb6a2 100644 --- a/tests/qemucapabilitiesdata/caps_2.6.0-gicv2.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_2.6.0-gicv2.aarch64.xml @@ -158,6 +158,7 @@ <flag name='tls-creds-x509'/> <flag name='display'/> <flag name='smm'/> + <flag name='virtio-net-disable-modern'/> <version>2005094</version> <kvmVersion>0</kvmVersion> <package></package> diff --git a/tests/qemucapabilitiesdata/caps_2.6.0-gicv3.aarch64.xml b/tests/qemucapabilitiesdata/caps_2.6.0-gicv3.aarch64.xml index deb1257..396cee4 100644 --- a/tests/qemucapabilitiesdata/caps_2.6.0-gicv3.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_2.6.0-gicv3.aarch64.xml @@ -158,6 +158,7 @@ <flag name='tls-creds-x509'/> <flag name='display'/> <flag name='smm'/> + <flag name='virtio-net-disable-modern'/> <version>2005094</version> <kvmVersion>0</kvmVersion> <package></package> diff --git a/tests/qemucapabilitiesdata/caps_2.6.0.ppc64le.xml b/tests/qemucapabilitiesdata/caps_2.6.0.ppc64le.xml index 2b7ea0e..3d65867 100644 --- a/tests/qemucapabilitiesdata/caps_2.6.0.ppc64le.xml +++ b/tests/qemucapabilitiesdata/caps_2.6.0.ppc64le.xml @@ -152,6 +152,7 @@ <flag name='tls-creds-x509'/> <flag name='display'/> <flag name='smm'/> + <flag name='virtio-net-disable-modern'/> <version>2005094</version> <kvmVersion>0</kvmVersion> <package></package> diff --git a/tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml index 495c114..0a65fb2 100644 --- a/tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml @@ -195,6 +195,7 @@ <flag name='display'/> <flag name='intel-iommu'/> <flag name='smm'/> + <flag name='virtio-net-disable-modern'/> <version>2006000</version> <kvmVersion>0</kvmVersion> <package></package> diff --git a/tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml index fafffa6..f52a4c2 100644 --- a/tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml @@ -192,6 +192,7 @@ <flag name='display'/> <flag name='intel-iommu'/> <flag name='smm'/> + <flag name='virtio-net-disable-modern'/> <version>2006091</version> <kvmVersion>0</kvmVersion> <package> (v2.7.0-rc1-52-g42e0d60)</package> -- 2.7.4

When faced with a guest device that requires a PCI address but doesn't have one manually assigned in the config, libvirt has always insisted (well... *tried* to insist) on auto-assigning an address that is on a PCI controller that supports hotplug. One big problem with this is that it prevents automatic use of a Q35 (or aarch64/virt) machine's pcie-root (since the PCIe root complex doesn't support hotplug). In order to promote simpler domain configs (more devices on pcie-root rather than on a pci-bridge), this patch adds a new sub-element to all guest devices that have a PCI address and support hotplug: <hotplug require='no'/> For devices that have hotplug require='no', we turn off the VIR_PCI_CONNECT_HOGPLUGGABLE bit in the devFlags when searching for an available PCI address. Since pcie-root now allows standard PCI devices, this results in those devices being placed on pcie-root rather than pci-bridge. NB: According to the code in qemuDomainAssignDevicePCISlots() and qemuDomainAttachDeviceLive(), the following device types are the only ones that use a PCI address *and* support hotplug: <interface> <hostdev> <rng> <serial> <disk> (only virtio) <controller> (hotplug only for SCSI controllers) These devices use a PCI address, but don't support hotplug, so the RNG doesn't allow them to have a <hotplug> element: <filesystem> <sound> <balloon> <watchdog> <video> <shmem> <input> --- docs/formatdomain.html.in | 37 ++++++++ docs/schemas/domaincommon.rng | 17 ++++ src/conf/device_conf.h | 15 +++ src/conf/domain_addr.c | 7 ++ src/conf/domain_conf.c | 28 ++++++ .../qemuxml2argv-q35-hotpluggable.args | 50 ++++++++++ .../qemuxml2argv-q35-hotpluggable.xml | 62 +++++++++++++ tests/qemuxml2argvtest.c | 8 ++ .../qemuxml2xmlout-q35-hotpluggable.xml | 103 +++++++++++++++++++++ tests/qemuxml2xmltest.c | 5 + 10 files changed, 332 insertions(+) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-q35-hotpluggable.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-q35-hotpluggable.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-hotpluggable.xml diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index eddb8dd..d5292f1 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -3038,6 +3038,43 @@ assign a PCI address for the device rather than some other type of address that may also be appropriate for that same device (e.g. virtio-mmio). + <p> + If a device that needs a PCI address doesn't have one + explicitly provided in the XML configuration, libvirt will + automatically assign an appropriate address before saving + the configuration. During this automatic assignment, if a + device supports being hotplugged/unplugged (attaching and + detaching the device while the virtual machine is running) + libirt's default behavior is to assign that device to a slot + on a PCI controller that also supports having its devices + hotplugged/unplugged. <i>(In the qemu driver as of libvirt + 2.2.0, these devices can be hotplugged: <interface>, + <hostdev>, <rng>, <disk> (virtio-only), + <controller> (SCSI controllers only), and + <serial> (PCI serial only). Likewise, the following + PCI controller models support having their connected devices + hotplugged: pci-root, pci-bridge, pci-expander-bus, + pcie-root-bus, and pcie-switch-downstream-port.)</i> To allow + auto-assigning hotpluggable devices to PCI controllers that + don't support hotplug (for example, to assign the device to + pcie-root), those devices that support hotplug can have an + optional + <code><hotplug></code> element with a single + attribute <code>require</code> that can be used to inform + libvirt that this particular device <b>will not</b> require + hotplugging; libvirt will then look for open slots in all + appropriate controllers regardless of whether or not they + support hotplug. + </p> + <pre> + ... + <interface type='network'> + <source network='default'/> + <mac address='52:54:00:5d:c7:9e'/> + <hotplug require='no'/> + </interface> + ...</pre> + </dd> <dt><code>drive</code></dt> <dd>Drive addresses have the following additional diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 052f28c..e45bec1 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -1171,6 +1171,7 @@ <optional> <ref name="alias"/> </optional> + <ref name="hotplug"/> <optional> <ref name="address"/> </optional> @@ -1745,6 +1746,7 @@ <optional> <ref name="alias"/> </optional> + <ref name="hotplug"/> <optional> <ref name="address"/> </optional> @@ -2619,6 +2621,7 @@ <optional> <ref name="alias"/> </optional> + <ref name="hotplug"/> <optional> <ref name="address"/> </optional> @@ -3274,6 +3277,7 @@ <optional> <ref name="alias"/> </optional> + <ref name="hotplug"/> <optional> <ref name="address"/> </optional> @@ -3877,6 +3881,7 @@ <optional> <ref name="rom"/> </optional> + <ref name="hotplug"/> <optional> <ref name="address"/> </optional> @@ -4717,6 +4722,17 @@ </element> </define> + <define name="hotplug"> + <optional> + <element name="hotplug"> + <attribute name="require"> + <ref name="virYesNo"/> + </attribute> + <empty/> + </element> + </optional> + </define> + <define name="memorydev"> <element name="memory"> <attribute name="model"> @@ -4786,6 +4802,7 @@ <optional> <ref name="alias"/> </optional> + <ref name="hotplug"/> <optional> <ref name="address"/> </optional> diff --git a/src/conf/device_conf.h b/src/conf/device_conf.h index 8443de6..236bc96 100644 --- a/src/conf/device_conf.h +++ b/src/conf/device_conf.h @@ -142,6 +142,21 @@ typedef struct _virDomainDeviceInfo { /* bootIndex is only used for disk, network interface, hostdev * and redirdev devices */ unsigned int bootIndex; + /* hotpluggable values: + + * "no" - there is definitely no need for this device to be in a + * slot that supports hotplug. + * + * "yes" - assume hotplugging is required and refuse to put in any + * slot that doesn't support hotplug. (if the device + * itself doesn't support hotplug, that is an error) + * + * unspecified - the device can *manually* be placed in a + * non-hotpluggable slot, but that won't be automatically + * done (unless the device itself doesn't support + * hotplug). + */ + int hotpluggable; /* virTristateBool */ } virDomainDeviceInfo, *virDomainDeviceInfoPtr; diff --git a/src/conf/domain_addr.c b/src/conf/domain_addr.c index 5533b11..8f2a78f 100644 --- a/src/conf/domain_addr.c +++ b/src/conf/domain_addr.c @@ -709,6 +709,13 @@ virDomainPCIAddressReserveNextSlot(virDomainPCIAddressSetPtr addrs, virDomainPCIConnectFlags flags) { virPCIDeviceAddress addr; + + /* if config says to no require hotplug, then don't check for + * it when looking for an open address. + */ + if (dev->hotpluggable == VIR_TRISTATE_BOOL_NO) + flags &= ~VIR_PCI_CONNECT_HOTPLUGGABLE; + if (virDomainPCIAddressGetNextSlot(addrs, &addr, flags) < 0) return -1; diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 82876f3..8ebbca9 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -4765,6 +4765,18 @@ virDomainDeviceInfoFormat(virBufferPtr buf, virBufferAddLit(buf, "/>\n"); } + if (info->hotpluggable) { + const char *require = virTristateBoolTypeToString(info->hotpluggable); + + if (!require) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("unexpected hotplug require value %d"), + info->hotpluggable); + return -1; + } + virBufferAsprintf(buf, "<hotplug require='%s'/>\n", require); + } + if (info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE || info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_VIRTIO_S390) return 0; @@ -5277,6 +5289,7 @@ virDomainDeviceInfoParseXML(xmlNodePtr node, xmlNodePtr alias = NULL; xmlNodePtr boot = NULL; xmlNodePtr rom = NULL; + xmlNodePtr hotplug = NULL; char *type = NULL; int ret = -1; @@ -5303,6 +5316,9 @@ virDomainDeviceInfoParseXML(xmlNodePtr node, (flags & VIR_DOMAIN_DEF_PARSE_ALLOW_ROM) && xmlStrEqual(cur->name, BAD_CAST "rom")) { rom = cur; + } else if (hotplug == NULL && + xmlStrEqual(cur->name, BAD_CAST "hotplug")) { + hotplug = cur; } } cur = cur->next; @@ -5335,6 +5351,18 @@ virDomainDeviceInfoParseXML(xmlNodePtr node, info->romfile = virXMLPropString(rom, "file"); } + if (hotplug) { + char *require = virXMLPropString(hotplug, "require"); + if (require && + ((info->hotpluggable = virTristateBoolTypeFromString(require)) <= 0)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown hotplug require value '%s'"), require); + VIR_FREE(require); + goto cleanup; + } + VIR_FREE(require); + } + if (!address) return 0; diff --git a/tests/qemuxml2argvdata/qemuxml2argv-q35-hotpluggable.args b/tests/qemuxml2argvdata/qemuxml2argv-q35-hotpluggable.args new file mode 100644 index 0000000..0020bfe --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-q35-hotpluggable.args @@ -0,0 +1,50 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/libexec/qemu-kvm \ +-name q35-test \ +-S \ +-M q35 \ +-m 2048 \ +-smp 2,sockets=2,cores=1,threads=1 \ +-uuid 11dbdcdd-4c3b-482b-8903-9bdb8c0a2774 \ +-nographic \ +-nodefaults \ +-monitor unix:/tmp/lib/domain--1-q35-test/monitor.sock,server,nowait \ +-no-acpi \ +-boot c \ +-device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e \ +-device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x0 \ +-device ich9-usb-ehci1,id=usb,bus=pcie.0,addr=0x1d.0x7 \ +-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pcie.0,multifunction=on,\ +addr=0x1d \ +-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pcie.0,addr=0x1d.0x1 \ +-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pcie.0,addr=0x1d.0x2 \ +-drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-sata0-0-0 \ +-device ide-drive,bus=ide.0,drive=drive-sata0-0-0,id=sata0-0-0 \ +-drive file=/dev/HostVG/QEMUGuest2,format=raw,if=none,id=drive-virtio-disk1 \ +-device virtio-blk-pci,bus=pci.2,addr=0x3,drive=drive-virtio-disk1,\ +id=virtio-disk1 \ +-drive file=/dev/HostVG/QEMUGuest2,format=raw,if=none,id=drive-virtio-disk2 \ +-device virtio-blk-pci,bus=pcie.0,addr=0x4,drive=drive-virtio-disk2,\ +id=virtio-disk2 \ +-device rtl8139,vlan=0,id=net0,mac=00:11:22:33:44:55,bus=pci.2,addr=0x1 \ +-net user,vlan=0,name=hostnet0 \ +-device rtl8139,vlan=1,id=net1,mac=00:11:22:33:44:55,bus=pcie.0,addr=0x2 \ +-net user,vlan=1,name=hostnet1 \ +-device virtio-net-pci,vlan=2,id=net2,mac=00:11:22:33:44:55,bus=pci.2,addr=0x2 \ +-net user,vlan=2,name=hostnet2 \ +-device virtio-net-pci,vlan=3,id=net3,mac=00:11:22:33:44:55,bus=pcie.0,\ +addr=0x3 \ +-net user,vlan=3,name=hostnet3 \ +-vga qxl \ +-global qxl-vga.ram_size=67108864 \ +-global qxl-vga.vram_size=33554432 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x4 \ +-object rng-random,id=objrng0,filename=/dev/random \ +-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pcie.0,addr=0x5 \ +-object rng-random,id=objrng1,filename=/dev/random \ +-device virtio-rng-pci,rng=objrng1,id=rng1,bus=pci.2,addr=0x5 diff --git a/tests/qemuxml2argvdata/qemuxml2argv-q35-hotpluggable.xml b/tests/qemuxml2argvdata/qemuxml2argv-q35-hotpluggable.xml new file mode 100644 index 0000000..5c826ba --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-q35-hotpluggable.xml @@ -0,0 +1,62 @@ +<domain type='qemu'> + <name>q35-test</name> + <uuid>11dbdcdd-4c3b-482b-8903-9bdb8c0a2774</uuid> + <memory unit='KiB'>2097152</memory> + <currentMemory unit='KiB'>2097152</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='x86_64' machine='q35'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/libexec/qemu-kvm</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='sda' bus='sata'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest2'/> + <target dev='vdb' bus='virtio'/> + </disk> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest2'/> + <target dev='vdc' bus='virtio'/> + <hotplug require='no'/> + </disk> + <video> + <model type='qxl' ram='65536' vram='32768' vgamem='8192' heads='1'/> + </video> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='rtl8139'/> + </interface> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='rtl8139'/> + <hotplug require='no'/> + </interface> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='virtio'/> + </interface> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='virtio'/> + <hotplug require='no'/> + </interface> + <rng model='virtio'> + <hotplug require='no'/> + <backend model='random'>/dev/random</backend> + </rng> + <rng model='virtio'> + <hotplug require='yes'/> + <backend model='random'>/dev/random</backend> + </rng> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 2de4f0a..2be634c 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1677,6 +1677,14 @@ mymain(void) QEMU_CAPS_PCI_MULTIFUNCTION, QEMU_CAPS_ICH9_USB_EHCI1, QEMU_CAPS_DEVICE_VIDEO_PRIMARY, QEMU_CAPS_VGA_QXL, QEMU_CAPS_DEVICE_QXL); + DO_TEST("q35-hotpluggable", + QEMU_CAPS_DEVICE_PCI_BRIDGE, + QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, + QEMU_CAPS_ICH9_AHCI, + QEMU_CAPS_PCI_MULTIFUNCTION, QEMU_CAPS_ICH9_USB_EHCI1, + QEMU_CAPS_DEVICE_VIDEO_PRIMARY, + QEMU_CAPS_VGA_QXL, QEMU_CAPS_DEVICE_QXL, + QEMU_CAPS_DEVICE_VIRTIO_RNG, QEMU_CAPS_OBJECT_RNG_RANDOM); DO_TEST("pcie-root-port", QEMU_CAPS_DEVICE_PCI_BRIDGE, QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-hotpluggable.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-hotpluggable.xml new file mode 100644 index 0000000..e771125 --- /dev/null +++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-hotpluggable.xml @@ -0,0 +1,103 @@ +<domain type='qemu'> + <name>q35-test</name> + <uuid>11dbdcdd-4c3b-482b-8903-9bdb8c0a2774</uuid> + <memory unit='KiB'>2097152</memory> + <currentMemory unit='KiB'>2097152</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='x86_64' machine='q35'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/libexec/qemu-kvm</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='sda' bus='sata'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest2'/> + <target dev='vdb' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x03' function='0x0'/> + </disk> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest2'/> + <target dev='vdc' bus='virtio'/> + <hotplug require='no'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </disk> + <controller type='usb' index='0' model='ich9-ehci1'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x7'/> + </controller> + <controller type='usb' index='0' model='ich9-uhci1'> + <master startport='0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x0' multifunction='on'/> + </controller> + <controller type='usb' index='0' model='ich9-uhci2'> + <master startport='2'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x1'/> + </controller> + <controller type='usb' index='0' model='ich9-uhci3'> + <master startport='4'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x2'/> + </controller> + <controller type='sata' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/> + </controller> + <controller type='pci' index='0' model='pcie-root'/> + <controller type='pci' index='1' model='dmi-to-pci-bridge'> + <model name='i82801b11-bridge'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/> + </controller> + <controller type='pci' index='2' model='pci-bridge'> + <model name='pci-bridge'/> + <target chassisNr='2'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/> + </controller> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='rtl8139'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/> + </interface> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='rtl8139'/> + <hotplug require='no'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </interface> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='virtio'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/> + </interface> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='virtio'/> + <hotplug require='no'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </interface> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <video> + <model type='qxl' ram='65536' vram='32768' vgamem='8192' heads='1' primary='yes'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> + </video> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x02' slot='0x04' function='0x0'/> + </memballoon> + <rng model='virtio'> + <backend model='random'>/dev/random</backend> + <hotplug require='no'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </rng> + <rng model='virtio'> + <backend model='random'>/dev/random</backend> + <hotplug require='yes'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x05' function='0x0'/> + </rng> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 7601a5f..2d8145b 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -691,6 +691,11 @@ mymain(void) QEMU_CAPS_ICH9_AHCI, QEMU_CAPS_PCI_MULTIFUNCTION, QEMU_CAPS_ICH9_USB_EHCI1, QEMU_CAPS_DEVICE_VIDEO_PRIMARY, QEMU_CAPS_VGA_QXL, QEMU_CAPS_DEVICE_QXL); + DO_TEST("q35-hotpluggable", + QEMU_CAPS_DEVICE_PCI_BRIDGE, QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, + QEMU_CAPS_ICH9_AHCI, QEMU_CAPS_PCI_MULTIFUNCTION, + QEMU_CAPS_ICH9_USB_EHCI1, QEMU_CAPS_DEVICE_VIDEO_PRIMARY, + QEMU_CAPS_VGA_QXL, QEMU_CAPS_DEVICE_QXL); DO_TEST("pcie-root", QEMU_CAPS_DEVICE_PCI_BRIDGE, QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, -- 2.7.4

On 08/08/2016 04:56 AM, Laine Stump wrote:
When faced with a guest device that requires a PCI address but doesn't have one manually assigned in the config, libvirt has always insisted (well... *tried* to insist) on auto-assigning an address that is on a PCI controller that supports hotplug. One big problem with this is that it prevents automatic use of a Q35 (or aarch64/virt) machine's pcie-root (since the PCIe root complex doesn't support hotplug).
In order to promote simpler domain configs (more devices on pcie-root rather than on a pci-bridge), this patch adds a new sub-element to all guest devices that have a PCI address and support hotplug:
<hotplug require='no'/>
For devices that have hotplug require='no', we turn off the VIR_PCI_CONNECT_HOGPLUGGABLE bit in the devFlags when searching for an available PCI address. Since pcie-root now allows standard PCI devices, this results in those devices being placed on pcie-root rather than pci-bridge.
I've been playing around with this and, by itself, it works very well. With this solved, combined with taking advantage of PCIe for virtio when available, it's very easy to create q35 domains that have no legacy-PCI without needing to resort to manually assigning addresses. However, there is still another item that we need to be able to configure - stating a preference of legacy PCI vs. PCIe when both are available for a device (again, the aim is to do this *without* needing to manually assign an address). The following devices have this choice: 1) vfio assigned devices 2) virtio devices 3) the nec-xhci USB controller You might think that it would always be preferable to use PCIe if it's available, but especially in these "early days" of using PCIe in guests it would be useful to have to ability to *easily* force use of a legacy PCI slot in case some PCIe-related bug is encountered (in particular, people have pointed out in discussions about vfio device assignment that it could be possible for a guest OS to misbehave when presented with a device's PCIe configuration block (which hasn't been visible in the past because the device was attached to a legacy PCI slot)). In order maintain functionality while any such bugs are figured out and fixed, we need to be able to force the device onto a PCI slot. There are two ways of doing this: 1) manually specify the full PCI address of a legacy PCI slot in the config 2) provide an option in the config that simply says "use any PCI slot" or "use any PCIe slot". Assuming that (1) is too cumbersome, we need to come up with a reasonable name/location for a config option (providing the backend for it will be trivial). Some possible places: 2a) add a new attribute to the <address> element I don't like this option because that makes it impossible to easily force re-addressing of the devices in a domain by simply removing all the <address> lines. (Yes, I know that's a non-issue in production, especially when there is some other management system (OpenStack, oVirt) sitting on top of libvirt. But it is a *big* help for developers who are messing around with it). 2b) Add a new attribute to an existing subelement, e.g. <target preferredBus='pci'/> This makes parsing and formatting cumbersome, because every device type has its own code to parse/format its <target> subelement. Also, in the case of <interface>, the <target> subelement is being mis-used to hold the name *on the host* of the tap device, and it would be confusing to see something like this: <target dev='vnet1' preferredBus='pci'/> 2c) add a new subelement just for this, e.g. <bus prefer='pci'/> or ??? I don't like this because it adds to the toplevel clutter in the devices. XML's hierarchichal structure is useful to organize attributes so they are easier to comprehend, and we should take advantage of that as much as possible. 2d) Try to find a common subelement that can be used for *all* address assignment preferences/restrictions, including hotplug. This is the option that has prompted my writing this message in response to my own patch mail. What it, instead of: <hotplug require='no'/> <bus prefer='pci'/> (or whatever) we had something like this? <addressPreferences hotplug='no' bus='pci'/> (*PLEASE* think of a better name!) In my mind, the choice is between 1 and 2d - if everyone thinks this is something only needed during a short transitional stage, maybe (1) is an adequate solution. If not, then we should decide now on the name for this option, and potentially rename the hotplug option accordingly. What are your opinions? (BTW, just to throw another wrench into the works - I think it would also be useful to be able to specify a numa node for devices, so that a device could be placed on a particular numa node in the guest (i.e. on a particular pci[e]-expander-bus or one of its subordinate buses) without needing to know the full PCI address. That could be done by specifying it the same way it's done in the pci[e]-expander-bus itself: <hostdev ....> <target> <node>2</node> </target> ... </hostdev> or it could be made a part of this new proposed element: <addressPreferences hotplug='no' bus='pci' numa='2'/> This is something that we will want in the long term (not just a temporary method of working around potential bugs), so if we're going to want it in a separate element rather than in <target>, we'll need to consider it *now* in order to avoid giving the wrong name to the new hotplug option defined in the parent of this message.)

On Mon, Aug 08, 2016 at 12:41:48PM -0400, Laine Stump wrote:
On 08/08/2016 04:56 AM, Laine Stump wrote:
When faced with a guest device that requires a PCI address but doesn't have one manually assigned in the config, libvirt has always insisted (well... *tried* to insist) on auto-assigning an address that is on a PCI controller that supports hotplug. One big problem with this is that it prevents automatic use of a Q35 (or aarch64/virt) machine's pcie-root (since the PCIe root complex doesn't support hotplug).
In order to promote simpler domain configs (more devices on pcie-root rather than on a pci-bridge), this patch adds a new sub-element to all guest devices that have a PCI address and support hotplug:
<hotplug require='no'/>
For devices that have hotplug require='no', we turn off the VIR_PCI_CONNECT_HOGPLUGGABLE bit in the devFlags when searching for an available PCI address. Since pcie-root now allows standard PCI devices, this results in those devices being placed on pcie-root rather than pci-bridge.
I've been playing around with this and, by itself, it works very well. With this solved, combined with taking advantage of PCIe for virtio when available, it's very easy to create q35 domains that have no legacy-PCI without needing to resort to manually assigning addresses.
However, there is still another item that we need to be able to configure - stating a preference of legacy PCI vs. PCIe when both are available for a device (again, the aim is to do this *without* needing to manually assign an address). The following devices have this choice:
1) vfio assigned devices 2) virtio devices 3) the nec-xhci USB controller
You might think that it would always be preferable to use PCIe if it's available, but especially in these "early days" of using PCIe in guests it would be useful to have to ability to *easily* force use of a legacy PCI slot in case some PCIe-related bug is encountered (in particular, people have pointed out in discussions about vfio device assignment that it could be possible for a guest OS to misbehave when presented with a device's PCIe configuration block (which hasn't been visible in the past because the device was attached to a legacy PCI slot)).
In order maintain functionality while any such bugs are figured out and fixed, we need to be able to force the device onto a PCI slot. There are two ways of doing this:
1) manually specify the full PCI address of a legacy PCI slot in the config 2) provide an option in the config that simply says "use any PCI slot" or "use any PCIe slot".
Assuming that (1) is too cumbersome, we need to come up with a reasonable name/location for a config option (providing the backend for it will be trivial). Some possible places:
I prefer a variant on (1), which is to specifcy an address with only the controller index filled out. eg given a q35 bridge topology <controller type='pci' index='0' model='pcie-root'/> <controller type='pci' index='1' model='dmi-to-pci-bridge'> <model name='i82801b11-bridge'/> </controller> <controller type='pci' index='2' model='pci-bridge'> <model name='pci-bridge'/> <target chassisNr='56'/> </controller> A device would use <address type="pci" controller="2"> (for pci-bridge placement) <address type="pci" controller="0"> (for pcie-root placement) This trivially expaneds to cover the NUMA use case too, without having to invent further elements duplicating NUMA node info again. <controller type='pci' index='0' model='pci-root'/> <controller type='pci' index='1' model='pci-expander-bus'> <model name='pxb'/> <target busNr='254'> <node>0</node> </target> </controller> <controller type='pci' index='2' model='pci-expander-bus'> <model name='pxb'/> <target busNr='255'> <node>1</node> </target> </controller> eg device uses <address type="pci" controller="1"> (for pxb on NUMA node 0) <address type="pci" controller="2"> (for pxb on NUMA node 1) Yes, when you first boot a guest, this means the mgmt app has to know what controllers exist by default, and/or specify controllers, but I think that's ultimately preferrable than inventing an indefinitely growing list of extra XML elements to provide tweaks to teh PCI address assginment logic. We can simplify life of mgmt apps in a different way, but using the domain XML capabilities to provide full data on the default controllers used for each machine type. So apps would not need to have any machine type specific logic in them - they can write code that's entirely metadata driven based on the domain capabilities. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 08/09/2016 04:05 AM, Daniel P. Berrange wrote:
On Mon, Aug 08, 2016 at 12:41:48PM -0400, Laine Stump wrote:
When faced with a guest device that requires a PCI address but doesn't have one manually assigned in the config, libvirt has always insisted (well... *tried* to insist) on auto-assigning an address that is on a PCI controller that supports hotplug. One big problem with this is that it prevents automatic use of a Q35 (or aarch64/virt) machine's pcie-root (since the PCIe root complex doesn't support hotplug).
In order to promote simpler domain configs (more devices on pcie-root rather than on a pci-bridge), this patch adds a new sub-element to all guest devices that have a PCI address and support hotplug:
<hotplug require='no'/>
For devices that have hotplug require='no', we turn off the VIR_PCI_CONNECT_HOGPLUGGABLE bit in the devFlags when searching for an available PCI address. Since pcie-root now allows standard PCI devices, this results in those devices being placed on pcie-root rather than pci-bridge. I've been playing around with this and, by itself, it works very well. With
On 08/08/2016 04:56 AM, Laine Stump wrote: this solved, combined with taking advantage of PCIe for virtio when available, it's very easy to create q35 domains that have no legacy-PCI without needing to resort to manually assigning addresses.
However, there is still another item that we need to be able to configure - stating a preference of legacy PCI vs. PCIe when both are available for a device (again, the aim is to do this *without* needing to manually assign an address). The following devices have this choice:
1) vfio assigned devices 2) virtio devices 3) the nec-xhci USB controller
You might think that it would always be preferable to use PCIe if it's available, but especially in these "early days" of using PCIe in guests it would be useful to have to ability to *easily* force use of a legacy PCI slot in case some PCIe-related bug is encountered (in particular, people have pointed out in discussions about vfio device assignment that it could be possible for a guest OS to misbehave when presented with a device's PCIe configuration block (which hasn't been visible in the past because the device was attached to a legacy PCI slot)).
In order maintain functionality while any such bugs are figured out and fixed, we need to be able to force the device onto a PCI slot. There are two ways of doing this:
1) manually specify the full PCI address of a legacy PCI slot in the config 2) provide an option in the config that simply says "use any PCI slot" or "use any PCIe slot". Assuming that (1) is too cumbersome, we need to come up with a reasonable name/location for a config option (providing the backend for it will be trivial). Some possible places: I prefer a variant on (1), which is to specifcy an address with only the controller index filled out. eg given a q35 bridge topology
<controller type='pci' index='0' model='pcie-root'/> <controller type='pci' index='1' model='dmi-to-pci-bridge'> <model name='i82801b11-bridge'/> </controller> <controller type='pci' index='2' model='pci-bridge'> <model name='pci-bridge'/> <target chassisNr='56'/> </controller>
A device would use
<address type="pci" controller="2"> (for pci-bridge placement) <address type="pci" controller="0"> (for pcie-root placement)
This trivially expaneds to cover the NUMA use case too, without having to invent further elements duplicating NUMA node info again.
<controller type='pci' index='0' model='pci-root'/> <controller type='pci' index='1' model='pci-expander-bus'> <model name='pxb'/> <target busNr='254'> <node>0</node> </target> </controller> <controller type='pci' index='2' model='pci-expander-bus'> <model name='pxb'/> <target busNr='255'> <node>1</node> </target> </controller>
eg device uses
<address type="pci" controller="1"> (for pxb on NUMA node 0) <address type="pci" controller="2"> (for pxb on NUMA node 1)
Yes, when you first boot a guest, this means the mgmt app has to know what controllers exist by default, and/or specify controllers,
It also has to know the rules about which controllers support what type of devices, and which controllers can plug into which other controllers - the full topology of PCI controllers will need to be spelled out specifically in advance, so that the management app can give the index number of the correct controller. As it stands now, some amount of this is already necessary, just because libvirt currently will only automatically add pci-bridge devices when there aren't enough PCI slots available (the code to automatically add pcie controllers as necessary is yet to be written). Still, this is all it takes to get 8 properly connected downstream-ports: <controller type='pci' model='pcie-root-port'/> <controller type='pci' model='pcie-switch-upstream-port'/> <controller type='pci' model='pcie-switch-downstream-port'/> <controller type='pci' model='pcie-switch-downstream-port'/> <controller type='pci' model='pcie-switch-downstream-port'/> <controller type='pci' model='pcie-switch-downstream-port'/> <controller type='pci' model='pcie-switch-downstream-port'/> <controller type='pci' model='pcie-switch-downstream-port'/> <controller type='pci' model='pcie-switch-downstream-port'/> <controller type='pci' model='pcie-switch-downstream-port'/> (note that you don't need to say where they are connected, nor do you need to give an index to each one. As long as there is an available slot of the correct type for each controller as it's encountered, libvirt will hook them up and index them correctly) If the controller number was needed to specify in the device address, then not only would each controller need to be manually given an index (if you were specifying the entire domain in a single go), but each device would need to be given a unique setting - instead of just saying "I want one of these" (in the future, if there isn't "one of those" available, everything necessary to provide it will be automatically created), you have to say "I want *this* one, implying that you know everything about what "this one" provides (and you will need to manually setup "this one" and everything else required to provide it). Also, note that since each pci-root-port and pci-switch-downstream-port has only a single slot, in those cases this: <address type="pci" controller="1"/> is really just: <address type="pci" bus="1"/> and for the other cases (pci[e]-root, pci-bridge, dmi-to-pci-bridge) you will still have to keep track of how many devices you've assigned to a particular controller, to make sure that your request doesn't overflow the 31 (or 32) device limit (which brings up another problem of doing it this way - different controllers have different numbers of useful slots, so the management app will have to know about that too).
but I think that's ultimately preferrable than inventing an indefinitely growing list of extra XML elements to provide tweaks to teh PCI address assginment logic.
We can simplify life of mgmt apps in a different way, but using the domain XML capabilities to provide full data on the default controllers used for each machine type.
And pcie/pci info on every endpoint device (I agree with your assertion that we should treat every device as potentially capable of hotplug, so we don't need that info for endpoints).
So apps would not need to have any machine type specific logic in them - they can write code that's entirely metadata driven based on the domain capabilities.
The metadata would need to include: 1) upstream connection type (each type of PCI controller needs a different connection type. I've learned through all this that each controller has different rules about what it can be connected to. So far there are 9 different types, including pci-endpoint and pcie-endpoint) 2) accepted downstream connection types 3) min and max slot (currently known possibilities: 0-0, 0-31, or 0-32) 4) hotplug supported or not (yes, the management app really would need to know about this, even if they intended for all devices to be hotpluggable - they need to be able to avoid pcie-root and dmi-to-pci-bridge) Then the management app would need to keep track of which addresses are in use for each controller, so that it would be able to avoid putting too many devices on any controller. Also, the management app will now need to know for *every* device whether that device in this particular version of qemu is a PCIe device, a legacy PCI device, or one of the silly "dual mode" devices that changes according to the type of bus it's plugged into. Otherwise it won't know which controller it should plug the device into. Finally, the management app would need to add logic to grow the bus topology correctly as needed. Sometimes this is simple and sometimes not. For example, if there is an open slot available on pcie-root, you can get another hotpluggable pcie slot by simply adding a pcie-root-port. If not, then you need to find an open pcie-switch-downstream-port or pcie-root-port, then connect a pcie-switch-upstream-port to that, and connect one or more pcie-switch-downstream-ports to that. (note this implies that you should always maintain at least one unoccupied root-port, downstream-port, or an open slot on pcie-root). The whole point of my exercise has been to 1) get rid of as much legacy PCI baggage as possible, and 2) do this in a way that prevents/eliminates the need for multiple management apps to re-implement and maintain essentially the same code (especially since my trip through PCIe-land has shown me that it is fraught with lack of documentation (about the controller implementations, not the PCIe spec), misinformation (changing over time), misconceptions, and lots of annoying little details). I started out kind of liking your alternate idea, since it seemed to accomplish that in a simpler way that didn't require adding yet another knob later. But after thinking it through, I think that it provides very little extra functionality, and would require a lot of extra work for the management app, even in a simple case where you didn't care about PCI vs PCIe. In the end, my opinion is that the job of creating a proper PCIe topology and assigning devices to the correct address in that topology is a non-trivial job that should be implemented in one place. I think libvirt is in the proper place to do that, it just needs to be provided with a small amount of information from the user/management app so that libvirt won't be making any policy decisions, but just assuring that those policies are properly applied.

On Tue, Aug 09, 2016 at 02:14:15PM -0400, Laine Stump wrote:
On 08/09/2016 04:05 AM, Daniel P. Berrange wrote:
On Mon, Aug 08, 2016 at 12:41:48PM -0400, Laine Stump wrote:
When faced with a guest device that requires a PCI address but doesn't have one manually assigned in the config, libvirt has always insisted (well... *tried* to insist) on auto-assigning an address that is on a PCI controller that supports hotplug. One big problem with this is that it prevents automatic use of a Q35 (or aarch64/virt) machine's pcie-root (since the PCIe root complex doesn't support hotplug).
In order to promote simpler domain configs (more devices on pcie-root rather than on a pci-bridge), this patch adds a new sub-element to all guest devices that have a PCI address and support hotplug:
<hotplug require='no'/>
For devices that have hotplug require='no', we turn off the VIR_PCI_CONNECT_HOGPLUGGABLE bit in the devFlags when searching for an available PCI address. Since pcie-root now allows standard PCI devices, this results in those devices being placed on pcie-root rather than pci-bridge. I've been playing around with this and, by itself, it works very well. With
On 08/08/2016 04:56 AM, Laine Stump wrote: this solved, combined with taking advantage of PCIe for virtio when available, it's very easy to create q35 domains that have no legacy-PCI without needing to resort to manually assigning addresses.
However, there is still another item that we need to be able to configure - stating a preference of legacy PCI vs. PCIe when both are available for a device (again, the aim is to do this *without* needing to manually assign an address). The following devices have this choice:
1) vfio assigned devices 2) virtio devices 3) the nec-xhci USB controller
You might think that it would always be preferable to use PCIe if it's available, but especially in these "early days" of using PCIe in guests it would be useful to have to ability to *easily* force use of a legacy PCI slot in case some PCIe-related bug is encountered (in particular, people have pointed out in discussions about vfio device assignment that it could be possible for a guest OS to misbehave when presented with a device's PCIe configuration block (which hasn't been visible in the past because the device was attached to a legacy PCI slot)).
In order maintain functionality while any such bugs are figured out and fixed, we need to be able to force the device onto a PCI slot. There are two ways of doing this:
1) manually specify the full PCI address of a legacy PCI slot in the config 2) provide an option in the config that simply says "use any PCI slot" or "use any PCIe slot". Assuming that (1) is too cumbersome, we need to come up with a reasonable name/location for a config option (providing the backend for it will be trivial). Some possible places: I prefer a variant on (1), which is to specifcy an address with only the controller index filled out. eg given a q35 bridge topology
<controller type='pci' index='0' model='pcie-root'/> <controller type='pci' index='1' model='dmi-to-pci-bridge'> <model name='i82801b11-bridge'/> </controller> <controller type='pci' index='2' model='pci-bridge'> <model name='pci-bridge'/> <target chassisNr='56'/> </controller>
A device would use
<address type="pci" controller="2"> (for pci-bridge placement) <address type="pci" controller="0"> (for pcie-root placement)
This trivially expaneds to cover the NUMA use case too, without having to invent further elements duplicating NUMA node info again.
<controller type='pci' index='0' model='pci-root'/> <controller type='pci' index='1' model='pci-expander-bus'> <model name='pxb'/> <target busNr='254'> <node>0</node> </target> </controller> <controller type='pci' index='2' model='pci-expander-bus'> <model name='pxb'/> <target busNr='255'> <node>1</node> </target> </controller>
eg device uses
<address type="pci" controller="1"> (for pxb on NUMA node 0) <address type="pci" controller="2"> (for pxb on NUMA node 1)
Yes, when you first boot a guest, this means the mgmt app has to know what controllers exist by default, and/or specify controllers,
It also has to know the rules about which controllers support what type of devices, and which controllers can plug into which other controllers - the full topology of PCI controllers will need to be spelled out specifically in advance, so that the management app can give the index number of the correct controller.
As it stands now, some amount of this is already necessary, just because libvirt currently will only automatically add pci-bridge devices when there aren't enough PCI slots available (the code to automatically add pcie controllers as necessary is yet to be written). Still, this is all it takes to get 8 properly connected downstream-ports:
<controller type='pci' model='pcie-root-port'/> <controller type='pci' model='pcie-switch-upstream-port'/> <controller type='pci' model='pcie-switch-downstream-port'/> <controller type='pci' model='pcie-switch-downstream-port'/> <controller type='pci' model='pcie-switch-downstream-port'/> <controller type='pci' model='pcie-switch-downstream-port'/> <controller type='pci' model='pcie-switch-downstream-port'/> <controller type='pci' model='pcie-switch-downstream-port'/> <controller type='pci' model='pcie-switch-downstream-port'/> <controller type='pci' model='pcie-switch-downstream-port'/>
(note that you don't need to say where they are connected, nor do you need to give an index to each one. As long as there is an available slot of the correct type for each controller as it's encountered, libvirt will hook them up and index them correctly)
If the controller number was needed to specify in the device address, then not only would each controller need to be manually given an index (if you were specifying the entire domain in a single go), but each device would need to be given a unique setting - instead of just saying "I want one of these" (in the future, if there isn't "one of those" available, everything necessary to provide it will be automatically created), you have to say "I want *this* one, implying that you know everything about what "this one" provides (and you will need to manually setup "this one" and everything else required to provide it).
Also, note that since each pci-root-port and pci-switch-downstream-port has only a single slot, in those cases this:
<address type="pci" controller="1"/>
is really just:
<address type="pci" bus="1"/>
and for the other cases (pci[e]-root, pci-bridge, dmi-to-pci-bridge) you will still have to keep track of how many devices you've assigned to a particular controller, to make sure that your request doesn't overflow the 31 (or 32) device limit (which brings up another problem of doing it this way - different controllers have different numbers of useful slots, so the management app will have to know about that too).
but I think that's ultimately preferrable than inventing an indefinitely growing list of extra XML elements to provide tweaks to teh PCI address assginment logic.
We can simplify life of mgmt apps in a different way, but using the domain XML capabilities to provide full data on the default controllers used for each machine type.
And pcie/pci info on every endpoint device (I agree with your assertion that we should treat every device as potentially capable of hotplug, so we don't need that info for endpoints).
So apps would not need to have any machine type specific logic in them - they can write code that's entirely metadata driven based on the domain capabilities.
The metadata would need to include:
1) upstream connection type (each type of PCI controller needs a different connection type. I've learned through all this that each controller has different rules about what it can be connected to. So far there are 9 different types, including pci-endpoint and pcie-endpoint) 2) accepted downstream connection types 3) min and max slot (currently known possibilities: 0-0, 0-31, or 0-32) 4) hotplug supported or not (yes, the management app really would need to know about this, even if they intended for all devices to be hotpluggable - they need to be able to avoid pcie-root and dmi-to-pci-bridge)
Then the management app would need to keep track of which addresses are in use for each controller, so that it would be able to avoid putting too many devices on any controller.
Also, the management app will now need to know for *every* device whether that device in this particular version of qemu is a PCIe device, a legacy PCI device, or one of the silly "dual mode" devices that changes according to the type of bus it's plugged into. Otherwise it won't know which controller it should plug the device into.
Finally, the management app would need to add logic to grow the bus topology correctly as needed. Sometimes this is simple and sometimes not. For example, if there is an open slot available on pcie-root, you can get another hotpluggable pcie slot by simply adding a pcie-root-port. If not, then you need to find an open pcie-switch-downstream-port or pcie-root-port, then connect a pcie-switch-upstream-port to that, and connect one or more pcie-switch-downstream-ports to that. (note this implies that you should always maintain at least one unoccupied root-port, downstream-port, or an open slot on pcie-root).
The whole point of my exercise has been to 1) get rid of as much legacy PCI baggage as possible, and 2) do this in a way that prevents/eliminates the need for multiple management apps to re-implement and maintain essentially the same code (especially since my trip through PCIe-land has shown me that it is fraught with lack of documentation (about the controller implementations, not the PCIe spec), misinformation (changing over time), misconceptions, and lots of annoying little details). I started out kind of liking your alternate idea, since it seemed to accomplish that in a simpler way that didn't require adding yet another knob later. But after thinking it through, I think that it provides very little extra functionality, and would require a lot of extra work for the management app, even in a simple case where you didn't care about PCI vs PCIe.
In the end, my opinion is that the job of creating a proper PCIe topology and assigning devices to the correct address in that topology is a non-trivial job that should be implemented in one place. I think libvirt is in the proper place to do that, it just needs to be provided with a small amount of information from the user/management app so that libvirt won't be making any policy decisions, but just assuring that those policies are properly applied.
Ultimately what you are suggesting here is to stuff a massive amount of VM configuration policy logic inside libvirt have a bunch of extra XML elements which control how that policy works. When some application finds some aspect of the policy that doesn't work for them, they'll have to solve all of the hard problems themselves anyway, or live with a policy that doesn't work right for them. Ultimately we'll end up adding countless new XML knobs to libvirt whose sole purpose is to be inputs to ever more complex policy inside libvirt. We've had a long standing rule that libvirt avoids implementing policy rules whereever possible, because it is inevitable that you'll never satisfy all downstream consumers of libvirt. IMHO this rule has been one of our best decisions and so I'm not in favour of abandoning that now. While we do have some existing PCI addressing logic, this was unavoidable when we first added it, due to the need to move address assignment out of QEMU and into libvirt, while maintaining back-compat for existing apps using libvirt. This is now talking about adding logic for cases where we have no corresponding logic in QEMU and no need for backcompat, so there is not a compelling reason why this has to live in libvirt. So I'm strong -1 on the entire approach in this patch series. I think it needs to be re-worked so that we can expose the neccessary information in domain capabilities to allow applications using libvirt to implement their own policies. This would also allow us to create a shared library outside of libvirt which can implement some "standard" policies which can be shared if apps so wish to have them eg this is the roll libvirt-designer was intended to fill - provide an opinionated policy driver API. The attractive thing about using libvirt-designer or an equivalent API layer outside libvirt, so that all the policy control knobs will exist as actual APIs inside of extra XML elements.. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Mon, Aug 08, 2016 at 04:56:55AM -0400, Laine Stump wrote:
When faced with a guest device that requires a PCI address but doesn't have one manually assigned in the config, libvirt has always insisted (well... *tried* to insist) on auto-assigning an address that is on a PCI controller that supports hotplug. One big problem with this is that it prevents automatic use of a Q35 (or aarch64/virt) machine's pcie-root (since the PCIe root complex doesn't support hotplug).
I'm not really seeing why lack of use of pcie-root is a real problem. It makes total sense to always use the bridge instead of root if the latter does not support hotplug. At the time you first create a guest, you may not even realize that you'll need to be able to hot-unplug a particular device later. So putting any device on pcie-root is tieing your hands for the entire lifetime of the guest.
In order to promote simpler domain configs (more devices on pcie-root rather than on a pci-bridge), this patch adds a new sub-element to all guest devices that have a PCI address and support hotplug:
<hotplug require='no'/>
For devices that have hotplug require='no', we turn off the VIR_PCI_CONNECT_HOGPLUGGABLE bit in the devFlags when searching for an available PCI address. Since pcie-root now allows standard PCI devices, this results in those devices being placed on pcie-root rather than pci-bridge.
I still really hate the idea of adding elements to the XML config that have nothing todo with the actual guest ABI config, but instead are setting logic policy in libvirt. But more generally I don't see why anyone would ever want to set require=no for this - it just ties your hands for later.
NB: According to the code in qemuDomainAssignDevicePCISlots() and qemuDomainAttachDeviceLive(), the following device types are the only ones that use a PCI address *and* support hotplug:
<interface> <hostdev> <rng> <serial> <disk> (only virtio) <controller> (hotplug only for SCSI controllers)
These devices use a PCI address, but don't support hotplug, so the RNG doesn't allow them to have a <hotplug> element:
<filesystem> <sound> <balloon> <watchdog> <video> <shmem> <input>
IMHO it is really bad practice to make any assumption about what devices support hotplug & what don't. In fact if any device type does not support hotplug I think it is fair to argue that is simply a bug / missing feature in libvirt right now. I see no reason why any of those devices should not be able to support hotplug in general. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 08/09/2016 03:54 AM, Daniel P. Berrange wrote:
When faced with a guest device that requires a PCI address but doesn't have one manually assigned in the config, libvirt has always insisted (well... *tried* to insist) on auto-assigning an address that is on a PCI controller that supports hotplug. One big problem with this is that it prevents automatic use of a Q35 (or aarch64/virt) machine's pcie-root (since the PCIe root complex doesn't support hotplug). I'm not really seeing why lack of use of pcie-root is a real problem. It makes total sense to always use the bridge instead of root if the latter does not support hotplug. At the time you first create a guest, you may not even realize that you'll need to be able to hot-unplug a particular device later. So putting any device on pcie-root is tieing your hands for
On Mon, Aug 08, 2016 at 04:56:55AM -0400, Laine Stump wrote: the entire lifetime of the guest.
1) simpler bus topology, less devices - the less there is, the less resources are used, the less can go wrong (among other advantages) 2) In particular, there is a lot of call for making Q35 and aarch64/virt machines "legacy-PCI-free". But some of the emulated devices are PCI-only, and until PCIe alternatives are available, this provides a method of getting the use of these devices (yes, I realize that in itself screws over the "legacy-PCI-free" demand) at least without requiring the addition of a pci-bridge or the dreaded dmi-to-pci-bridge. 3) Even for PCIe devices, there currently is no way to have a hotpluggable device without using a vendor-specific PCI controller that doesn't necessarily exist on the real hardware that is being emulated (pcie-root-port emulates an Intel IOH3420, and pcie-switch-downstream-port is a TI X3130; I have serious doubts that either of those are used (or will ever be used) in any real aarch64 hardware). Requiring those controllers to be added into the virtual machine may lead to incompatibilities with OSes that have been designed only for real hardware and may be lacking the proper drivers for these controllers. 4) libvirt should not be making policy decisions about things like whether or not a device should be hotpluggable. On the 440fx, *all* devices are in hotplug-capable slots, so this is a non-issue. On Q35 and other machinetypes with a pcie-root, there is a choice, and that choice should be made by the user/management app, not forced on them by libvirt. As far as not realizing that you'll need to hot-unplug some device: that's true and that's why the default is to assume hotplug will be needed. But certainly for most devices you can say with a very high degree of certainty that you'll never hot-unplug them. Why force extra infrastructure for something that will definitely never be used? Although it shouldn't be the default, we really do need a way to allow avoiding the extra trappings of hotplug without requiring the user to specify the full address.
In order to promote simpler domain configs (more devices on pcie-root rather than on a pci-bridge), this patch adds a new sub-element to all guest devices that have a PCI address and support hotplug:
<hotplug require='no'/>
For devices that have hotplug require='no', we turn off the VIR_PCI_CONNECT_HOGPLUGGABLE bit in the devFlags when searching for an available PCI address. Since pcie-root now allows standard PCI devices, this results in those devices being placed on pcie-root rather than pci-bridge. I still really hate the idea of adding elements to the XML config that have nothing todo with the actual guest ABI config, but instead are setting logic policy in libvirt. But more generally I don't see why anyone would ever want to set require=no for this - it just ties your hands for later.
So that you can explicitly acknowledge that you don't want the extra trappings that hotplug requires.
NB: According to the code in qemuDomainAssignDevicePCISlots() and qemuDomainAttachDeviceLive(), the following device types are the only ones that use a PCI address *and* support hotplug:
<interface> <hostdev> <rng> <serial> <disk> (only virtio) <controller> (hotplug only for SCSI controllers)
These devices use a PCI address, but don't support hotplug, so the RNG doesn't allow them to have a <hotplug> element:
<filesystem> <sound> <balloon> <watchdog> <video> <shmem> <input> IMHO it is really bad practice to make any assumption about what devices support hotplug & what don't. In fact if any device type does not support hotplug I think it is fair to argue that is simply a bug / missing feature in libvirt right now. I see no reason why any of those devices should not be able to support hotplug in general.
Yes, good point; I suppose that if a future version of libvirt adds the ability to hotplug those devices, someone may be upset that the virtual machine they created in the past can't hot-unplug those devices. But does that mean we shouldn't be placing the default video device directly at 00:1.0, or the default USB controller at 00:1D.x?

qemuDomainAssignDevicePCISlots() blindly sets the HOTPLUGGABLE flag for all devices, even though only a subset of devices actually support hotplugging. In particular, these devices do support hotplug: <interface> <hostdev> <rng> <serial> <disk> (only virtio) <controller> (SCSI controllers only) and these devices don't support hotplug: <filesystem> <sound> <balloon> <watchdog> <video> <shmem> <input> By turning off the HOTPLUGGABLE flag for the devices that couldn't be hotplugged anyway, we can convince libvirt to auto-assign them to pcie-root (which doesn't support hotplug, but *does* allow both PCIe and legacy PCI devices). --- src/qemu/qemu_domain_address.c | 67 +++++++++++++++++++--- tests/qemuxml2argvdata/qemuxml2argv-autoindex.args | 22 +++---- .../qemuxml2argv-bios-nvram-secure.args | 2 +- .../qemuxml2argv-machine-smm-opt.args | 2 +- .../qemuxml2argv-q35-hotpluggable.args | 6 +- .../qemuxml2argv-q35-pm-disable-fallback.args | 3 +- .../qemuxml2argv-q35-pm-disable.args | 3 +- .../qemuxml2argv-q35-usb2-multi.args | 10 ++-- .../qemuxml2argv-q35-usb2-reorder.args | 10 ++-- .../qemuxml2xmlout-autoindex.xml | 18 +++--- .../qemuxml2xmlout-q35-hotpluggable.xml | 6 +- .../qemuxml2xmlout-q35-usb2-multi.xml | 8 +-- .../qemuxml2xmlout-q35-usb2-reorder.xml | 8 +-- 13 files changed, 108 insertions(+), 57 deletions(-) diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 3d52d72..87a1268 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -1021,19 +1021,16 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, } } - /* all other devices that plug into a PCI slot are treated as a - * PCI endpoint devices that require a hotplug-capable slot - * (except for some special cases which have specific handling - * below) - */ - flags = VIR_PCI_CONNECT_HOTPLUGGABLE | VIR_PCI_CONNECT_TYPE_PCI_DEVICE; - for (i = 0; i < def->nfss; i++) { if (!virDeviceInfoPCIAddressWanted(&def->fss[i]->info)) continue; /* Only support VirtIO-9p-pci so far. If that changes, * we might need to skip devices here */ + + /* <filesystem> devices are not hotpluggable, legacy PCI */ + flags = VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + if (virDomainPCIAddressReserveNextSlot(addrs, &def->fss[i]->info, flags) < 0) goto error; @@ -1049,6 +1046,9 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, !virDeviceInfoPCIAddressWanted(&def->nets[i]->info)) { continue; } + /* most <interface> devices are hotpluggable, legacy PCI */ + flags = VIR_PCI_CONNECT_HOTPLUGGABLE | VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + if (virDomainPCIAddressReserveNextSlot(addrs, &def->nets[i]->info, flags) < 0) goto error; @@ -1064,6 +1064,8 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, def->sounds[i]->model == VIR_DOMAIN_SOUND_MODEL_USB) continue; + /* <sound> devices are not hotpluggable, legacy PCI */ + flags = VIR_PCI_CONNECT_TYPE_PCI_DEVICE; if (virDomainPCIAddressReserveNextSlot(addrs, &def->sounds[i]->info, flags) < 0) goto error; @@ -1129,6 +1131,9 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, break; } + /* USB2 controllers not hotpluggable, legacy PCI */ + flags = VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + if (!foundAddr) { /* This is the first part of the controller, so need * to find a free slot & then reserve a function */ @@ -1142,6 +1147,7 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, addrs->lastaddr.function = 0; addrs->lastaddr.multi = VIR_TRISTATE_SWITCH_ABSENT; } + /* Finally we can reserve the slot+function */ if (virDomainPCIAddressReserveAddr(addrs, &addr, flags, false, foundAddr) < 0) @@ -1150,6 +1156,19 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, def->controllers[i]->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI; def->controllers[i]->info.addr.pci = addr; } else { + flags = VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + /* USB3 (nec-xhci) controllers can be either PCIe or + * legacy PCI, depending on the type of controller they + * are plugged into. + */ + if (def->controllers[i]->type == VIR_DOMAIN_CONTROLLER_TYPE_USB && + def->controllers[i]->model == VIR_DOMAIN_CONTROLLER_MODEL_USB_NEC_XHCI) + flags |= VIR_PCI_CONNECT_TYPE_PCIE_DEVICE; + + /* SCSI controllers can be hot plugged. All others can't */ + if (def->controllers[i]->type == VIR_DOMAIN_CONTROLLER_TYPE_SCSI) + flags |= VIR_PCI_CONNECT_HOTPLUGGABLE; + if (virDomainPCIAddressReserveNextSlot(addrs, &def->controllers[i]->info, flags) < 0) @@ -1184,6 +1203,9 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, goto error; } + /* <disk> devices are hotpluggable, legacy PCI */ + flags = VIR_PCI_CONNECT_HOTPLUGGABLE | VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + if (virDomainPCIAddressReserveNextSlot(addrs, &def->disks[i]->info, flags) < 0) goto error; @@ -1197,6 +1219,9 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, def->hostdevs[i]->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) continue; + /* <hostdev> devices are hotpluggable, legacy PCI */ + flags = VIR_PCI_CONNECT_HOTPLUGGABLE | VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + if (virDomainPCIAddressReserveNextSlot(addrs, def->hostdevs[i]->info, flags) < 0) @@ -1207,6 +1232,10 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, if (def->memballoon && def->memballoon->model == VIR_DOMAIN_MEMBALLOON_MODEL_VIRTIO && virDeviceInfoPCIAddressWanted(&def->memballoon->info)) { + + /* <balloon> devices are NOT hotpluggable, legacy PCI */ + flags = VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + if (virDomainPCIAddressReserveNextSlot(addrs, &def->memballoon->info, flags) < 0) @@ -1219,6 +1248,9 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, !virDeviceInfoPCIAddressWanted(&def->rngs[i]->info)) continue; + /* <rng> devices are hotpluggable, legacy PCI */ + flags = VIR_PCI_CONNECT_HOTPLUGGABLE | VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + if (virDomainPCIAddressReserveNextSlot(addrs, &def->rngs[i]->info, flags) < 0) goto error; @@ -1228,6 +1260,10 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, if (def->watchdog && def->watchdog->model == VIR_DOMAIN_WATCHDOG_MODEL_I6300ESB && virDeviceInfoPCIAddressWanted(&def->watchdog->info)) { + + /* <hostdev> devices are NOT hotpluggable, legacy PCI */ + flags = VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + if (virDomainPCIAddressReserveNextSlot(addrs, &def->watchdog->info, flags) < 0) goto error; @@ -1237,6 +1273,10 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, * assigned address. */ if (def->nvideos > 0 && virDeviceInfoPCIAddressWanted(&def->videos[0]->info)) { + + /* <video> devices are NOT hotpluggable, legacy PCI */ + flags = VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + if (virDomainPCIAddressReserveNextSlot(addrs, &def->videos[0]->info, flags) < 0) goto error; @@ -1251,6 +1291,10 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, } if (!virDeviceInfoPCIAddressWanted(&def->videos[i]->info)) continue; + + /* <video> devices are NOT hotpluggable, legacy PCI */ + flags = VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + if (virDomainPCIAddressReserveNextSlot(addrs, &def->videos[i]->info, flags) < 0) goto error; @@ -1261,6 +1305,9 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, if (!virDeviceInfoPCIAddressWanted(&def->shmems[i]->info)) continue; + /* <shmem> devices are NOT hotpluggable, legacy PCI */ + flags = VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + if (virDomainPCIAddressReserveNextSlot(addrs, &def->shmems[i]->info, flags) < 0) goto error; @@ -1270,6 +1317,9 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, !virDeviceInfoPCIAddressWanted(&def->inputs[i]->info)) continue; + /* <input> devices are NOT hotpluggable, legacy PCI */ + flags = VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + if (virDomainPCIAddressReserveNextSlot(addrs, &def->inputs[i]->info, flags) < 0) goto error; @@ -1284,6 +1334,9 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, !virDeviceInfoPCIAddressWanted(&chr->info)) continue; + /* <serial> devices are hotpluggable, legacy PCI */ + flags = VIR_PCI_CONNECT_HOTPLUGGABLE | VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + if (virDomainPCIAddressReserveNextSlot(addrs, &chr->info, flags) < 0) goto error; } diff --git a/tests/qemuxml2argvdata/qemuxml2argv-autoindex.args b/tests/qemuxml2argvdata/qemuxml2argv-autoindex.args index 43b9661..cddf7a0 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-autoindex.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-autoindex.args @@ -38,16 +38,16 @@ addr=0x1d \ -device ich9-usb-ehci1,id=usb1,bus=pcie.0,addr=0x1a.0x7 \ -device ich9-usb-uhci1,masterbus=usb1.0,firstport=0,bus=pcie.0,\ multifunction=on,addr=0x1a \ --device ich9-usb-uhci1,masterbus=usb2.0,firstport=0,bus=pci.2,multifunction=on,\ -addr=0x1 \ --device ich9-usb-uhci2,masterbus=usb2.0,firstport=2,bus=pci.2,addr=0x1.0x1 \ --device ich9-usb-uhci3,masterbus=usb2.0,firstport=4,bus=pci.2,addr=0x1.0x2 \ --device ich9-usb-ehci1,id=usb2,bus=pci.2,addr=0x1.0x7 \ --device nec-usb-xhci,id=usb3,bus=pci.2,addr=0x2 \ --device ich9-usb-uhci1,masterbus=usb4.0,firstport=0,bus=pci.2,multifunction=on,\ -addr=0x3 \ --device ich9-usb-uhci2,masterbus=usb4.0,firstport=2,bus=pci.2,addr=0x3.0x1 \ --device ich9-usb-uhci3,masterbus=usb4.0,firstport=4,bus=pci.2,addr=0x3.0x2 \ --device ich9-usb-ehci1,id=usb4,bus=pci.2,addr=0x3.0x7 \ +-device ich9-usb-uhci1,masterbus=usb2.0,firstport=0,bus=pcie.0,\ +multifunction=on,addr=0x4 \ +-device ich9-usb-uhci2,masterbus=usb2.0,firstport=2,bus=pcie.0,addr=0x4.0x1 \ +-device ich9-usb-uhci3,masterbus=usb2.0,firstport=4,bus=pcie.0,addr=0x4.0x2 \ +-device ich9-usb-ehci1,id=usb2,bus=pcie.0,addr=0x4.0x7 \ +-device nec-usb-xhci,id=usb3,bus=pcie.0,addr=0x3 \ +-device ich9-usb-uhci1,masterbus=usb4.0,firstport=0,bus=pcie.0,\ +multifunction=on,addr=0x6 \ +-device ich9-usb-uhci2,masterbus=usb4.0,firstport=2,bus=pcie.0,addr=0x6.0x1 \ +-device ich9-usb-uhci3,masterbus=usb4.0,firstport=4,bus=pcie.0,addr=0x6.0x2 \ +-device ich9-usb-ehci1,id=usb4,bus=pcie.0,addr=0x6.0x7 \ -drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-sata0-0-0 \ -device ide-drive,bus=ide.0,drive=drive-sata0-0-0,id=sata0-0-0 diff --git a/tests/qemuxml2argvdata/qemuxml2argv-bios-nvram-secure.args b/tests/qemuxml2argvdata/qemuxml2argv-bios-nvram-secure.args index c014254..93d62a7 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-bios-nvram-secure.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-bios-nvram-secure.args @@ -26,4 +26,4 @@ readonly=on \ -device scsi-disk,bus=scsi0.0,channel=0,scsi-id=0,lun=0,\ drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \ -serial pty \ --device virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x2 +-device virtio-balloon-pci,id=balloon0,bus=pcie.0,addr=0x2 diff --git a/tests/qemuxml2argvdata/qemuxml2argv-machine-smm-opt.args b/tests/qemuxml2argvdata/qemuxml2argv-machine-smm-opt.args index e49d7e9..05738c1 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-machine-smm-opt.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-machine-smm-opt.args @@ -22,4 +22,4 @@ QEMU_AUDIO_DRV=none \ -drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-scsi0-0-0-0 \ -device scsi-disk,bus=scsi0.0,channel=0,scsi-id=0,lun=0,\ drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \ --device virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x2 +-device virtio-balloon-pci,id=balloon0,bus=pcie.0,addr=0x2 diff --git a/tests/qemuxml2argvdata/qemuxml2argv-q35-hotpluggable.args b/tests/qemuxml2argvdata/qemuxml2argv-q35-hotpluggable.args index 0020bfe..c3368ae 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-q35-hotpluggable.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-q35-hotpluggable.args @@ -43,8 +43,8 @@ addr=0x3 \ -vga qxl \ -global qxl-vga.ram_size=67108864 \ -global qxl-vga.vram_size=33554432 \ --device virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x4 \ +-device virtio-balloon-pci,id=balloon0,bus=pcie.0,addr=0x5 \ -object rng-random,id=objrng0,filename=/dev/random \ --device virtio-rng-pci,rng=objrng0,id=rng0,bus=pcie.0,addr=0x5 \ +-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pcie.0,addr=0x6 \ -object rng-random,id=objrng1,filename=/dev/random \ --device virtio-rng-pci,rng=objrng1,id=rng1,bus=pci.2,addr=0x5 +-device virtio-rng-pci,rng=objrng1,id=rng1,bus=pci.2,addr=0x4 diff --git a/tests/qemuxml2argvdata/qemuxml2argv-q35-pm-disable-fallback.args b/tests/qemuxml2argvdata/qemuxml2argv-q35-pm-disable-fallback.args index deae687..31a1d71 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-q35-pm-disable-fallback.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-q35-pm-disable-fallback.args @@ -19,5 +19,4 @@ QEMU_AUDIO_DRV=none \ -global PIIX4_PM.disable_s4=1 \ -boot n \ -device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e \ --device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x0 \ --device virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x1 +-device virtio-balloon-pci,id=balloon0,bus=pcie.0,addr=0x2 diff --git a/tests/qemuxml2argvdata/qemuxml2argv-q35-pm-disable.args b/tests/qemuxml2argvdata/qemuxml2argv-q35-pm-disable.args index 871340f..f0dda09 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-q35-pm-disable.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-q35-pm-disable.args @@ -19,5 +19,4 @@ QEMU_AUDIO_DRV=none \ -global ICH9-LPC.disable_s4=1 \ -boot n \ -device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e \ --device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x0 \ --device virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x1 +-device virtio-balloon-pci,id=balloon0,bus=pcie.0,addr=0x2 diff --git a/tests/qemuxml2argvdata/qemuxml2argv-q35-usb2-multi.args b/tests/qemuxml2argvdata/qemuxml2argv-q35-usb2-multi.args index d465c69..9d2875d 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-q35-usb2-multi.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-q35-usb2-multi.args @@ -28,11 +28,11 @@ addr=0x1d \ multifunction=on,addr=0x1a \ -device ich9-usb-uhci2,masterbus=usb1.0,firstport=2,bus=pcie.0,addr=0x1a.0x1 \ -device ich9-usb-uhci3,masterbus=usb1.0,firstport=4,bus=pcie.0,addr=0x1a.0x2 \ --device ich9-usb-ehci1,id=usb2,bus=pci.2,addr=0x1.0x7 \ --device ich9-usb-uhci1,masterbus=usb2.0,firstport=0,bus=pci.2,multifunction=on,\ -addr=0x1 \ --device ich9-usb-uhci2,masterbus=usb2.0,firstport=2,bus=pci.2,addr=0x1.0x1 \ --device ich9-usb-uhci3,masterbus=usb2.0,firstport=4,bus=pci.2,addr=0x1.0x2 \ +-device ich9-usb-ehci1,id=usb2,bus=pcie.0,addr=0x3.0x7 \ +-device ich9-usb-uhci1,masterbus=usb2.0,firstport=0,bus=pcie.0,\ +multifunction=on,addr=0x3 \ +-device ich9-usb-uhci2,masterbus=usb2.0,firstport=2,bus=pcie.0,addr=0x3.0x1 \ +-device ich9-usb-uhci3,masterbus=usb2.0,firstport=4,bus=pcie.0,addr=0x3.0x2 \ -drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-sata0-0-0 \ -device ide-drive,bus=ide.0,drive=drive-sata0-0-0,id=sata0-0-0 \ -vga qxl \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-q35-usb2-reorder.args b/tests/qemuxml2argvdata/qemuxml2argv-q35-usb2-reorder.args index 87d2ce7..a915b76 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-q35-usb2-reorder.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-q35-usb2-reorder.args @@ -28,11 +28,11 @@ addr=0x1d \ multifunction=on,addr=0x1a \ -device ich9-usb-uhci3,masterbus=usb1.0,firstport=4,bus=pcie.0,addr=0x1a.0x2 \ -device ich9-usb-uhci2,masterbus=usb1.0,firstport=2,bus=pcie.0,addr=0x1a.0x1 \ --device ich9-usb-ehci1,id=usb2,bus=pci.2,addr=0x1.0x7 \ --device ich9-usb-uhci3,masterbus=usb2.0,firstport=4,bus=pci.2,addr=0x1.0x2 \ --device ich9-usb-uhci2,masterbus=usb2.0,firstport=2,bus=pci.2,addr=0x1.0x1 \ --device ich9-usb-uhci1,masterbus=usb2.0,firstport=0,bus=pci.2,multifunction=on,\ -addr=0x1 \ +-device ich9-usb-ehci1,id=usb2,bus=pcie.0,addr=0x3.0x7 \ +-device ich9-usb-uhci3,masterbus=usb2.0,firstport=4,bus=pcie.0,addr=0x3.0x2 \ +-device ich9-usb-uhci2,masterbus=usb2.0,firstport=2,bus=pcie.0,addr=0x3.0x1 \ +-device ich9-usb-uhci1,masterbus=usb2.0,firstport=0,bus=pcie.0,\ +multifunction=on,addr=0x3 \ -drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-sata0-0-0 \ -device ide-drive,bus=ide.0,drive=drive-sata0-0-0,id=sata0-0-0 \ -vga qxl \ diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-autoindex.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-autoindex.xml index 086a1cf..0502b36 100644 --- a/tests/qemuxml2xmloutdata/qemuxml2xmlout-autoindex.xml +++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-autoindex.xml @@ -112,36 +112,36 @@ </controller> <controller type='usb' index='2' model='ich9-uhci1'> <master startport='0'/> - <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0' multifunction='on'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0' multifunction='on'/> </controller> <controller type='usb' index='2' model='ich9-uhci2'> <master startport='2'/> - <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x1'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x1'/> </controller> <controller type='usb' index='2' model='ich9-uhci3'> <master startport='4'/> - <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x2'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x2'/> </controller> <controller type='usb' index='2' model='ich9-ehci1'> - <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x7'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x7'/> </controller> <controller type='usb' index='3' model='nec-xhci'> - <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </controller> <controller type='usb' index='4' model='ich9-uhci1'> <master startport='0'/> - <address type='pci' domain='0x0000' bus='0x02' slot='0x03' function='0x0' multifunction='on'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0' multifunction='on'/> </controller> <controller type='usb' index='4' model='ich9-uhci2'> <master startport='2'/> - <address type='pci' domain='0x0000' bus='0x02' slot='0x03' function='0x1'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x1'/> </controller> <controller type='usb' index='4' model='ich9-uhci3'> <master startport='4'/> - <address type='pci' domain='0x0000' bus='0x02' slot='0x03' function='0x2'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x2'/> </controller> <controller type='usb' index='4' model='ich9-ehci1'> - <address type='pci' domain='0x0000' bus='0x02' slot='0x03' function='0x7'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x7'/> </controller> <controller type='sata' index='0'> <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/> diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-hotpluggable.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-hotpluggable.xml index e771125..18ea795 100644 --- a/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-hotpluggable.xml +++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-hotpluggable.xml @@ -87,17 +87,17 @@ <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> </video> <memballoon model='virtio'> - <address type='pci' domain='0x0000' bus='0x02' slot='0x04' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> </memballoon> <rng model='virtio'> <backend model='random'>/dev/random</backend> <hotplug require='no'/> - <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> </rng> <rng model='virtio'> <backend model='random'>/dev/random</backend> <hotplug require='yes'/> - <address type='pci' domain='0x0000' bus='0x02' slot='0x05' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x04' function='0x0'/> </rng> </devices> </domain> diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-usb2-multi.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-usb2-multi.xml index 06c0699..74b345d 100644 --- a/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-usb2-multi.xml +++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-usb2-multi.xml @@ -60,19 +60,19 @@ <address type='pci' domain='0x0000' bus='0x00' slot='0x1a' function='0x2'/> </controller> <controller type='usb' index='2' model='ich9-ehci1'> - <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x7'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x7'/> </controller> <controller type='usb' index='2' model='ich9-uhci1'> <master startport='0'/> - <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0' multifunction='on'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0' multifunction='on'/> </controller> <controller type='usb' index='2' model='ich9-uhci2'> <master startport='2'/> - <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x1'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x1'/> </controller> <controller type='usb' index='2' model='ich9-uhci3'> <master startport='4'/> - <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x2'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x2'/> </controller> <controller type='sata' index='0'> <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/> diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-usb2-reorder.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-usb2-reorder.xml index 1007095..6ac275f 100644 --- a/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-usb2-reorder.xml +++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-usb2-reorder.xml @@ -60,19 +60,19 @@ <address type='pci' domain='0x0000' bus='0x00' slot='0x1a' function='0x1'/> </controller> <controller type='usb' index='2' model='ich9-ehci1'> - <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x7'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x7'/> </controller> <controller type='usb' index='2' model='ich9-uhci3'> <master startport='4'/> - <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x2'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x2'/> </controller> <controller type='usb' index='2' model='ich9-uhci2'> <master startport='2'/> - <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x1'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x1'/> </controller> <controller type='usb' index='2' model='ich9-uhci1'> <master startport='0'/> - <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0' multifunction='on'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0' multifunction='on'/> </controller> <controller type='sata' index='0'> <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/> -- 2.7.4

Until now libvirt has always assumed that virtio devices were legacy PCI, so they would never be assigned to a PCIe slot. This patch looks for QEMU_CAPS_VIRTIO_NET_DISABLE_MODERN as an indicator that all the virtio devices in a qemu binary are capable of operating as PCIe, and if so it changes the connect flags to place them on a PCIe slot rather than legacy PCI. One problem that this could lead to with current libvirt - there isn't yet any provision for auto-adding any other kind of PCI controller except pci-bridge, and there are no PCIe controllers that support hotplug in the default configuration of machinetypes that have a pcie-root (Q35 and aarch64/virt). This means that if you add a virtio device to a configuration with no extra pci controllers, libvirt will fail to find an address for it (unless you add "<hotplug require='no'/> to the device - then it will be assigned to pcie-root). I'm hoping that the controller-auto-add will soon be refactored to auto-add pcie-*-port controllers, which will solve this problem. In the meantime, it's simple enough to just add a few lines like this (one for each hotpluggable virtio device) to the config: <controller type='pci' model='pcie-root-port'/> --- src/qemu/qemu_domain_address.c | 43 +++++-- .../qemuxml2argv-q35-virtio-pcie.args | 54 +++++++++ .../qemuxml2argv-q35-virtio-pcie.xml | 69 ++++++++++++ tests/qemuxml2argvtest.c | 10 ++ .../qemuxml2xmlout-q35-virtio-pcie.xml | 123 +++++++++++++++++++++ tests/qemuxml2xmltest.c | 10 ++ 6 files changed, 300 insertions(+), 9 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-virtio-pcie.xml diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 87a1268..4dfd6be 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -998,14 +998,23 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, size_t i, j; virDomainPCIConnectFlags flags = 0; /* initialize to quiet gcc warning */ virPCIDeviceAddress tmp_addr; + bool virtioPCIe = false; /* PCI controllers */ for (i = 0; i < def->ncontrollers; i++) { if (def->controllers[i]->type == VIR_DOMAIN_CONTROLLER_TYPE_PCI) { virDomainControllerModelPCI model = def->controllers[i]->model; + if (model == VIR_DOMAIN_CONTROLLER_MODEL_PCIE_ROOT) { + /* if qemu has the disable-modern option for virtio-net, + * then its virtio devices will present themselves as PCIe devices + * when plugged into a PCIe slot + */ + virtioPCIe = virQEMUCapsGet(qemuCaps, + QEMU_CAPS_VIRTIO_NET_DISABLE_MODERN); + continue; + } if (model == VIR_DOMAIN_CONTROLLER_MODEL_PCI_ROOT || - model == VIR_DOMAIN_CONTROLLER_MODEL_PCIE_ROOT || !virDeviceInfoPCIAddressWanted(&def->controllers[i]->info)) continue; @@ -1046,8 +1055,13 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, !virDeviceInfoPCIAddressWanted(&def->nets[i]->info)) { continue; } - /* most <interface> devices are hotpluggable, legacy PCI */ - flags = VIR_PCI_CONNECT_HOTPLUGGABLE | VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + + /* use PCIe for virtio-net if the machinetype supports it */ + if (virtioPCIe && STREQ(def->nets[i]->model, "virtio")) + flags = VIR_PCI_CONNECT_TYPE_PCIE_DEVICE; + else + flags = VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + flags |= VIR_PCI_CONNECT_HOTPLUGGABLE; if (virDomainPCIAddressReserveNextSlot(addrs, &def->nets[i]->info, flags) < 0) @@ -1203,8 +1217,12 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, goto error; } - /* <disk> devices are hotpluggable, legacy PCI */ - flags = VIR_PCI_CONNECT_HOTPLUGGABLE | VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + /* use PCIe if the machinetype supports it */ + if (virtioPCIe) + flags = VIR_PCI_CONNECT_TYPE_PCIE_DEVICE; + else + flags = VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + flags |= VIR_PCI_CONNECT_HOTPLUGGABLE; if (virDomainPCIAddressReserveNextSlot(addrs, &def->disks[i]->info, flags) < 0) @@ -1233,8 +1251,11 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, def->memballoon->model == VIR_DOMAIN_MEMBALLOON_MODEL_VIRTIO && virDeviceInfoPCIAddressWanted(&def->memballoon->info)) { - /* <balloon> devices are NOT hotpluggable, legacy PCI */ - flags = VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + /* <balloon> devices are NOT hotpluggable, suport PCIe in newer qemus */ + if (virtioPCIe) + flags = VIR_PCI_CONNECT_TYPE_PCIE_DEVICE; + else + flags = VIR_PCI_CONNECT_TYPE_PCI_DEVICE; if (virDomainPCIAddressReserveNextSlot(addrs, &def->memballoon->info, @@ -1248,8 +1269,12 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, !virDeviceInfoPCIAddressWanted(&def->rngs[i]->info)) continue; - /* <rng> devices are hotpluggable, legacy PCI */ - flags = VIR_PCI_CONNECT_HOTPLUGGABLE | VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + /* <rng> devices are hotpluggable, might be legacy PCI or PCIe */ + if (virtioPCIe) + flags = VIR_PCI_CONNECT_TYPE_PCIE_DEVICE; + else + flags = VIR_PCI_CONNECT_TYPE_PCI_DEVICE; + flags |= VIR_PCI_CONNECT_HOTPLUGGABLE; if (virDomainPCIAddressReserveNextSlot(addrs, &def->rngs[i]->info, flags) < 0) diff --git a/tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.args b/tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.args new file mode 100644 index 0000000..9cb150c --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.args @@ -0,0 +1,54 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/libexec/qemu-kvm \ +-name q35-test \ +-S \ +-M q35 \ +-m 2048 \ +-smp 2,sockets=2,cores=1,threads=1 \ +-uuid 11dbdcdd-4c3b-482b-8903-9bdb8c0a2774 \ +-nographic \ +-nodefaults \ +-monitor unix:/tmp/lib/domain--1-q35-test/monitor.sock,server,nowait \ +-no-acpi \ +-boot c \ +-device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e \ +-device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x0 \ +-device ioh3420,port=0x10,chassis=3,id=pci.3,bus=pcie.0,addr=0x2 \ +-device ioh3420,port=0x18,chassis=4,id=pci.4,bus=pcie.0,addr=0x3 \ +-device ioh3420,port=0x20,chassis=5,id=pci.5,bus=pcie.0,addr=0x4 \ +-device ioh3420,port=0x28,chassis=6,id=pci.6,bus=pcie.0,addr=0x5 \ +-device ich9-usb-ehci1,id=usb,bus=pcie.0,addr=0x1d.0x7 \ +-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pcie.0,multifunction=on,\ +addr=0x1d \ +-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pcie.0,addr=0x1d.0x1 \ +-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pcie.0,addr=0x1d.0x2 \ +-drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-sata0-0-0 \ +-device ide-drive,bus=ide.0,drive=drive-sata0-0-0,id=sata0-0-0 \ +-drive file=/dev/HostVG/QEMUGuest2,format=raw,if=none,id=drive-virtio-disk1 \ +-device virtio-blk-pci,bus=pci.4,addr=0x0,drive=drive-virtio-disk1,\ +id=virtio-disk1 \ +-drive file=/dev/HostVG/QEMUGuest2,format=raw,if=none,id=drive-virtio-disk2 \ +-device virtio-blk-pci,bus=pcie.0,addr=0x8,drive=drive-virtio-disk2,\ +id=virtio-disk2 \ +-device rtl8139,vlan=0,id=net0,mac=00:11:22:33:44:55,bus=pci.2,addr=0x1 \ +-net user,vlan=0,name=hostnet0 \ +-device rtl8139,vlan=1,id=net1,mac=00:11:22:33:44:55,bus=pcie.0,addr=0x6 \ +-net user,vlan=1,name=hostnet1 \ +-device virtio-net-pci,vlan=2,id=net2,mac=00:11:22:33:44:55,bus=pci.3,addr=0x0 \ +-net user,vlan=2,name=hostnet2 \ +-device virtio-net-pci,vlan=3,id=net3,mac=00:11:22:33:44:55,bus=pcie.0,\ +addr=0x7 \ +-net user,vlan=3,name=hostnet3 \ +-vga qxl \ +-global qxl-vga.ram_size=67108864 \ +-global qxl-vga.vram_size=33554432 \ +-device virtio-balloon-pci,id=balloon0,bus=pcie.0,addr=0x9 \ +-object rng-random,id=objrng0,filename=/dev/random \ +-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pcie.0,addr=0xa \ +-object rng-random,id=objrng1,filename=/dev/random \ +-device virtio-rng-pci,rng=objrng1,id=rng1,bus=pci.5,addr=0x0 diff --git a/tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.xml b/tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.xml new file mode 100644 index 0000000..d3c7c05 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.xml @@ -0,0 +1,69 @@ +<domain type='qemu'> + <name>q35-test</name> + <uuid>11dbdcdd-4c3b-482b-8903-9bdb8c0a2774</uuid> + <memory unit='KiB'>2097152</memory> + <currentMemory unit='KiB'>2097152</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='x86_64' machine='q35'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/libexec/qemu-kvm</emulator> + <controller type='pci' model='pcie-root'/> + <controller type='pci' model='dmi-to-pci-bridge'/> + <controller type='pci' model='pci-bridge'/> + <controller type='pci' model='pcie-root-port'/> + <controller type='pci' model='pcie-root-port'/> + <controller type='pci' model='pcie-root-port'/> + <controller type='pci' model='pcie-root-port'/> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='sda' bus='sata'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest2'/> + <target dev='vdb' bus='virtio'/> + </disk> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest2'/> + <target dev='vdc' bus='virtio'/> + <hotplug require='no'/> + </disk> + <video> + <model type='qxl' ram='65536' vram='32768' vgamem='8192' heads='1'/> + </video> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='rtl8139'/> + </interface> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='rtl8139'/> + <hotplug require='no'/> + </interface> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='virtio'/> + </interface> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='virtio'/> + <hotplug require='no'/> + </interface> + <rng model='virtio'> + <hotplug require='no'/> + <backend model='random'>/dev/random</backend> + </rng> + <rng model='virtio'> + <hotplug require='yes'/> + <backend model='random'>/dev/random</backend> + </rng> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 2be634c..ce0158d 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1685,6 +1685,16 @@ mymain(void) QEMU_CAPS_DEVICE_VIDEO_PRIMARY, QEMU_CAPS_VGA_QXL, QEMU_CAPS_DEVICE_QXL, QEMU_CAPS_DEVICE_VIRTIO_RNG, QEMU_CAPS_OBJECT_RNG_RANDOM); + DO_TEST("q35-virtio-pcie", + QEMU_CAPS_DEVICE_PCI_BRIDGE, + QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, + QEMU_CAPS_DEVICE_IOH3420, + QEMU_CAPS_ICH9_AHCI, + QEMU_CAPS_PCI_MULTIFUNCTION, QEMU_CAPS_ICH9_USB_EHCI1, + QEMU_CAPS_DEVICE_VIDEO_PRIMARY, + QEMU_CAPS_VGA_QXL, QEMU_CAPS_DEVICE_QXL, + QEMU_CAPS_DEVICE_VIRTIO_RNG, QEMU_CAPS_OBJECT_RNG_RANDOM, + QEMU_CAPS_VIRTIO_NET_DISABLE_MODERN); DO_TEST("pcie-root-port", QEMU_CAPS_DEVICE_PCI_BRIDGE, QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-virtio-pcie.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-virtio-pcie.xml new file mode 100644 index 0000000..70d0cd2 --- /dev/null +++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-virtio-pcie.xml @@ -0,0 +1,123 @@ +<domain type='qemu'> + <name>q35-test</name> + <uuid>11dbdcdd-4c3b-482b-8903-9bdb8c0a2774</uuid> + <memory unit='KiB'>2097152</memory> + <currentMemory unit='KiB'>2097152</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='x86_64' machine='q35'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/libexec/qemu-kvm</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='sda' bus='sata'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest2'/> + <target dev='vdb' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/> + </disk> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest2'/> + <target dev='vdc' bus='virtio'/> + <hotplug require='no'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/> + </disk> + <controller type='pci' index='0' model='pcie-root'/> + <controller type='pci' index='1' model='dmi-to-pci-bridge'> + <model name='i82801b11-bridge'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/> + </controller> + <controller type='pci' index='2' model='pci-bridge'> + <model name='pci-bridge'/> + <target chassisNr='2'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/> + </controller> + <controller type='pci' index='3' model='pcie-root-port'> + <model name='ioh3420'/> + <target chassis='3' port='0x10'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </controller> + <controller type='pci' index='4' model='pcie-root-port'> + <model name='ioh3420'/> + <target chassis='4' port='0x18'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </controller> + <controller type='pci' index='5' model='pcie-root-port'> + <model name='ioh3420'/> + <target chassis='5' port='0x20'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </controller> + <controller type='pci' index='6' model='pcie-root-port'> + <model name='ioh3420'/> + <target chassis='6' port='0x28'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </controller> + <controller type='usb' index='0' model='ich9-ehci1'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x7'/> + </controller> + <controller type='usb' index='0' model='ich9-uhci1'> + <master startport='0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x0' multifunction='on'/> + </controller> + <controller type='usb' index='0' model='ich9-uhci2'> + <master startport='2'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x1'/> + </controller> + <controller type='usb' index='0' model='ich9-uhci3'> + <master startport='4'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x2'/> + </controller> + <controller type='sata' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/> + </controller> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='rtl8139'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/> + </interface> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='rtl8139'/> + <hotplug require='no'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> + </interface> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='virtio'/> + <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/> + </interface> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='virtio'/> + <hotplug require='no'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> + </interface> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <video> + <model type='qxl' ram='65536' vram='32768' vgamem='8192' heads='1' primary='yes'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> + </video> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/> + </memballoon> + <rng model='virtio'> + <backend model='random'>/dev/random</backend> + <hotplug require='no'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/> + </rng> + <rng model='virtio'> + <backend model='random'>/dev/random</backend> + <hotplug require='yes'/> + <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/> + </rng> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 2d8145b..40c284d 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -696,6 +696,16 @@ mymain(void) QEMU_CAPS_ICH9_AHCI, QEMU_CAPS_PCI_MULTIFUNCTION, QEMU_CAPS_ICH9_USB_EHCI1, QEMU_CAPS_DEVICE_VIDEO_PRIMARY, QEMU_CAPS_VGA_QXL, QEMU_CAPS_DEVICE_QXL); + DO_TEST("q35-virtio-pcie", + QEMU_CAPS_DEVICE_PCI_BRIDGE, + QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, + QEMU_CAPS_DEVICE_IOH3420, + QEMU_CAPS_ICH9_AHCI, + QEMU_CAPS_PCI_MULTIFUNCTION, QEMU_CAPS_ICH9_USB_EHCI1, + QEMU_CAPS_DEVICE_VIDEO_PRIMARY, + QEMU_CAPS_VGA_QXL, QEMU_CAPS_DEVICE_QXL, + QEMU_CAPS_DEVICE_VIRTIO_RNG, QEMU_CAPS_OBJECT_RNG_RANDOM, + QEMU_CAPS_VIRTIO_NET_DISABLE_MODERN); DO_TEST("pcie-root", QEMU_CAPS_DEVICE_PCI_BRIDGE, QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, -- 2.7.4

I had forgotten to take care of assigning virtio-scsi-pci to a PCIe address when appropriate. I'm squashing this small patch into PATCH 6/6. --- src/qemu/qemu_domain_address.c | 12 ++++++++++-- tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.args | 8 ++++++-- tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.xml | 5 +++++ tests/qemuxml2argvtest.c | 1 + tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-virtio-pcie.xml | 12 ++++++++++-- tests/qemuxml2xmltest.c | 1 + 6 files changed, 33 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 4dfd6be..d218ac8 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -1179,9 +1179,17 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, def->controllers[i]->model == VIR_DOMAIN_CONTROLLER_MODEL_USB_NEC_XHCI) flags |= VIR_PCI_CONNECT_TYPE_PCIE_DEVICE; - /* SCSI controllers can be hot plugged. All others can't */ - if (def->controllers[i]->type == VIR_DOMAIN_CONTROLLER_TYPE_SCSI) + if (def->controllers[i]->type == VIR_DOMAIN_CONTROLLER_TYPE_SCSI) { + /* if it is a virtio-scsi controller and qemu supports PCIe for + * virtio devices, then auto-assign a PCIe address + */ + if (def->controllers[i]->model + == VIR_DOMAIN_CONTROLLER_MODEL_SCSI_VIRTIO_SCSI && virtioPCIe) + flags = VIR_PCI_CONNECT_TYPE_PCIE_DEVICE; + + /* All SCSI controllers can be hot plugged. All other controllers can't */ flags |= VIR_PCI_CONNECT_HOTPLUGGABLE; + } if (virDomainPCIAddressReserveNextSlot(addrs, &def->controllers[i]->info, diff --git a/tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.args b/tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.args index 9cb150c..8b9d74a 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.args @@ -27,10 +27,14 @@ QEMU_AUDIO_DRV=none \ addr=0x1d \ -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pcie.0,addr=0x1d.0x1 \ -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pcie.0,addr=0x1d.0x2 \ +-device virtio-scsi-pci,id=scsi0,bus=pci.4,addr=0x0 \ -drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-sata0-0-0 \ -device ide-drive,bus=ide.0,drive=drive-sata0-0-0,id=sata0-0-0 \ +-drive file=/aarch64.raw,format=raw,if=none,id=drive-scsi0-0-0-1 \ +-device scsi-disk,bus=scsi0.0,channel=0,scsi-id=0,lun=1,\ +drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 \ -drive file=/dev/HostVG/QEMUGuest2,format=raw,if=none,id=drive-virtio-disk1 \ --device virtio-blk-pci,bus=pci.4,addr=0x0,drive=drive-virtio-disk1,\ +-device virtio-blk-pci,bus=pci.5,addr=0x0,drive=drive-virtio-disk1,\ id=virtio-disk1 \ -drive file=/dev/HostVG/QEMUGuest2,format=raw,if=none,id=drive-virtio-disk2 \ -device virtio-blk-pci,bus=pcie.0,addr=0x8,drive=drive-virtio-disk2,\ @@ -51,4 +55,4 @@ addr=0x7 \ -object rng-random,id=objrng0,filename=/dev/random \ -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pcie.0,addr=0xa \ -object rng-random,id=objrng1,filename=/dev/random \ --device virtio-rng-pci,rng=objrng1,id=rng1,bus=pci.5,addr=0x0 +-device virtio-rng-pci,rng=objrng1,id=rng1,bus=pci.6,addr=0x0 diff --git a/tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.xml b/tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.xml index d3c7c05..37816fa 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.xml +++ b/tests/qemuxml2argvdata/qemuxml2argv-q35-virtio-pcie.xml @@ -21,11 +21,16 @@ <controller type='pci' model='pcie-root-port'/> <controller type='pci' model='pcie-root-port'/> <controller type='pci' model='pcie-root-port'/> + <controller type='scsi' model='virtio-scsi'/> <disk type='block' device='disk'> <source dev='/dev/HostVG/QEMUGuest1'/> <target dev='sda' bus='sata'/> <address type='drive' controller='0' bus='0' target='0' unit='0'/> </disk> + <disk type='file' device='disk'> + <source file='/aarch64.raw'/> + <target dev='sdb' bus='scsi'/> + </disk> <disk type='block' device='disk'> <source dev='/dev/HostVG/QEMUGuest2'/> <target dev='vdb' bus='virtio'/> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index ce0158d..9685101 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1690,6 +1690,7 @@ mymain(void) QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, QEMU_CAPS_DEVICE_IOH3420, QEMU_CAPS_ICH9_AHCI, + QEMU_CAPS_VIRTIO_SCSI, QEMU_CAPS_PCI_MULTIFUNCTION, QEMU_CAPS_ICH9_USB_EHCI1, QEMU_CAPS_DEVICE_VIDEO_PRIMARY, QEMU_CAPS_VGA_QXL, QEMU_CAPS_DEVICE_QXL, diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-virtio-pcie.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-virtio-pcie.xml index 70d0cd2..304e031 100644 --- a/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-virtio-pcie.xml +++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-virtio-pcie.xml @@ -19,10 +19,15 @@ <target dev='sda' bus='sata'/> <address type='drive' controller='0' bus='0' target='0' unit='0'/> </disk> + <disk type='file' device='disk'> + <source file='/aarch64.raw'/> + <target dev='sdb' bus='scsi'/> + <address type='drive' controller='0' bus='0' target='0' unit='1'/> + </disk> <disk type='block' device='disk'> <source dev='/dev/HostVG/QEMUGuest2'/> <target dev='vdb' bus='virtio'/> - <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/> </disk> <disk type='block' device='disk'> <source dev='/dev/HostVG/QEMUGuest2'/> @@ -60,6 +65,9 @@ <target chassis='6' port='0x28'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> </controller> + <controller type='scsi' index='0' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/> + </controller> <controller type='usb' index='0' model='ich9-ehci1'> <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x7'/> </controller> @@ -117,7 +125,7 @@ <rng model='virtio'> <backend model='random'>/dev/random</backend> <hotplug require='yes'/> - <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/> </rng> </devices> </domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 40c284d..3dcc8f1 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -701,6 +701,7 @@ mymain(void) QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, QEMU_CAPS_DEVICE_IOH3420, QEMU_CAPS_ICH9_AHCI, + QEMU_CAPS_VIRTIO_SCSI, QEMU_CAPS_PCI_MULTIFUNCTION, QEMU_CAPS_ICH9_USB_EHCI1, QEMU_CAPS_DEVICE_VIDEO_PRIMARY, QEMU_CAPS_VGA_QXL, QEMU_CAPS_DEVICE_QXL, -- 2.7.4

The e1000e is an emulated network device based on the Intel 82574, present in qemu 2.7.0 and later. Among other differences from the e1000, it presents itself as a PCIe device rather than legacy PCI. In order to get it assigned to a PCIe controller, this patch updates the flags setting for network devices when the model name is "e1000e". (Note that for some reason libvirt has never validated the network device model names other than to check that there are no dangerous characters in them. That should probably change, but is the subject of another patch.) --- src/qemu/qemu_domain_address.c | 5 +- .../qemuxml2argvdata/qemuxml2argv-q35-e1000e.args | 43 ++++++++++ tests/qemuxml2argvdata/qemuxml2argv-q35-e1000e.xml | 50 ++++++++++++ tests/qemuxml2argvtest.c | 9 +++ .../qemuxml2xmlout-q35-e1000e.xml | 92 ++++++++++++++++++++++ tests/qemuxml2xmltest.c | 9 +++ 6 files changed, 207 insertions(+), 1 deletion(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-q35-e1000e.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-q35-e1000e.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-e1000e.xml diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 4dfd6be..1ba999d 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -999,6 +999,7 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, virDomainPCIConnectFlags flags = 0; /* initialize to quiet gcc warning */ virPCIDeviceAddress tmp_addr; bool virtioPCIe = false; + bool havePCIeRoot = false; /* PCI controllers */ for (i = 0; i < def->ncontrollers; i++) { @@ -1012,6 +1013,7 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, */ virtioPCIe = virQEMUCapsGet(qemuCaps, QEMU_CAPS_VIRTIO_NET_DISABLE_MODERN); + havePCIeRoot = true; continue; } if (model == VIR_DOMAIN_CONTROLLER_MODEL_PCI_ROOT || @@ -1057,7 +1059,8 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, } /* use PCIe for virtio-net if the machinetype supports it */ - if (virtioPCIe && STREQ(def->nets[i]->model, "virtio")) + if ((virtioPCIe && STREQ(def->nets[i]->model, "virtio")) || + (havePCIeRoot && STREQ(def->nets[i]->model, "e1000e"))) flags = VIR_PCI_CONNECT_TYPE_PCIE_DEVICE; else flags = VIR_PCI_CONNECT_TYPE_PCI_DEVICE; diff --git a/tests/qemuxml2argvdata/qemuxml2argv-q35-e1000e.args b/tests/qemuxml2argvdata/qemuxml2argv-q35-e1000e.args new file mode 100644 index 0000000..a1167c8 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-q35-e1000e.args @@ -0,0 +1,43 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/libexec/qemu-kvm \ +-name q35-test \ +-S \ +-M q35 \ +-m 2048 \ +-smp 2,sockets=2,cores=1,threads=1 \ +-uuid 11dbdcdd-4c3b-482b-8903-9bdb8c0a2774 \ +-nographic \ +-nodefaults \ +-monitor unix:/tmp/lib/domain--1-q35-test/monitor.sock,server,nowait \ +-no-acpi \ +-boot c \ +-device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e \ +-device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x0 \ +-device ioh3420,port=0x10,chassis=3,id=pci.3,bus=pcie.0,addr=0x2 \ +-device ioh3420,port=0x18,chassis=4,id=pci.4,bus=pcie.0,addr=0x3 \ +-device ich9-usb-ehci1,id=usb,bus=pcie.0,addr=0x1d.0x7 \ +-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pcie.0,multifunction=on,\ +addr=0x1d \ +-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pcie.0,addr=0x1d.0x1 \ +-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pcie.0,addr=0x1d.0x2 \ +-drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-sata0-0-0 \ +-device ide-drive,bus=ide.0,drive=drive-sata0-0-0,id=sata0-0-0 \ +-drive file=/dev/HostVG/QEMUGuest2,format=raw,if=none,id=drive-virtio-disk1 \ +-device virtio-blk-pci,bus=pci.4,addr=0x0,drive=drive-virtio-disk1,\ +id=virtio-disk1 \ +-drive file=/dev/HostVG/QEMUGuest2,format=raw,if=none,id=drive-virtio-disk2 \ +-device virtio-blk-pci,bus=pcie.0,addr=0x5,drive=drive-virtio-disk2,\ +id=virtio-disk2 \ +-device e1000e,vlan=0,id=net0,mac=00:11:22:33:44:55,bus=pci.3,addr=0x0 \ +-net user,vlan=0,name=hostnet0 \ +-device e1000e,vlan=1,id=net1,mac=00:11:22:33:44:55,bus=pcie.0,addr=0x4 \ +-net user,vlan=1,name=hostnet1 \ +-vga qxl \ +-global qxl-vga.ram_size=67108864 \ +-global qxl-vga.vram_size=33554432 \ +-device virtio-balloon-pci,id=balloon0,bus=pcie.0,addr=0x6 diff --git a/tests/qemuxml2argvdata/qemuxml2argv-q35-e1000e.xml b/tests/qemuxml2argvdata/qemuxml2argv-q35-e1000e.xml new file mode 100644 index 0000000..97d6248 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-q35-e1000e.xml @@ -0,0 +1,50 @@ +<domain type='qemu'> + <name>q35-test</name> + <uuid>11dbdcdd-4c3b-482b-8903-9bdb8c0a2774</uuid> + <memory unit='KiB'>2097152</memory> + <currentMemory unit='KiB'>2097152</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='x86_64' machine='q35'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/libexec/qemu-kvm</emulator> + <controller type='pci' model='pcie-root'/> + <controller type='pci' model='dmi-to-pci-bridge'/> + <controller type='pci' model='pci-bridge'/> + <controller type='pci' model='pcie-root-port'/> + <controller type='pci' model='pcie-root-port'/> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='sda' bus='sata'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest2'/> + <target dev='vdb' bus='virtio'/> + </disk> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest2'/> + <target dev='vdc' bus='virtio'/> + <hotplug require='no'/> + </disk> + <video> + <model type='qxl' ram='65536' vram='32768' vgamem='8192' heads='1'/> + </video> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='e1000e'/> + </interface> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='e1000e'/> + <hotplug require='no'/> + </interface> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index ce0158d..efe4985 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1695,6 +1695,15 @@ mymain(void) QEMU_CAPS_VGA_QXL, QEMU_CAPS_DEVICE_QXL, QEMU_CAPS_DEVICE_VIRTIO_RNG, QEMU_CAPS_OBJECT_RNG_RANDOM, QEMU_CAPS_VIRTIO_NET_DISABLE_MODERN); + DO_TEST("q35-e1000e", + QEMU_CAPS_DEVICE_PCI_BRIDGE, + QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, + QEMU_CAPS_DEVICE_IOH3420, + QEMU_CAPS_ICH9_AHCI, + QEMU_CAPS_PCI_MULTIFUNCTION, QEMU_CAPS_ICH9_USB_EHCI1, + QEMU_CAPS_DEVICE_VIDEO_PRIMARY, + QEMU_CAPS_VGA_QXL, QEMU_CAPS_DEVICE_QXL, + QEMU_CAPS_VIRTIO_NET_DISABLE_MODERN); DO_TEST("pcie-root-port", QEMU_CAPS_DEVICE_PCI_BRIDGE, QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-e1000e.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-e1000e.xml new file mode 100644 index 0000000..a70e849 --- /dev/null +++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-q35-e1000e.xml @@ -0,0 +1,92 @@ +<domain type='qemu'> + <name>q35-test</name> + <uuid>11dbdcdd-4c3b-482b-8903-9bdb8c0a2774</uuid> + <memory unit='KiB'>2097152</memory> + <currentMemory unit='KiB'>2097152</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='x86_64' machine='q35'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/libexec/qemu-kvm</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='sda' bus='sata'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest2'/> + <target dev='vdb' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/> + </disk> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest2'/> + <target dev='vdc' bus='virtio'/> + <hotplug require='no'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </disk> + <controller type='pci' index='0' model='pcie-root'/> + <controller type='pci' index='1' model='dmi-to-pci-bridge'> + <model name='i82801b11-bridge'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/> + </controller> + <controller type='pci' index='2' model='pci-bridge'> + <model name='pci-bridge'/> + <target chassisNr='2'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/> + </controller> + <controller type='pci' index='3' model='pcie-root-port'> + <model name='ioh3420'/> + <target chassis='3' port='0x10'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </controller> + <controller type='pci' index='4' model='pcie-root-port'> + <model name='ioh3420'/> + <target chassis='4' port='0x18'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </controller> + <controller type='usb' index='0' model='ich9-ehci1'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x7'/> + </controller> + <controller type='usb' index='0' model='ich9-uhci1'> + <master startport='0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x0' multifunction='on'/> + </controller> + <controller type='usb' index='0' model='ich9-uhci2'> + <master startport='2'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x1'/> + </controller> + <controller type='usb' index='0' model='ich9-uhci3'> + <master startport='4'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x2'/> + </controller> + <controller type='sata' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/> + </controller> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='e1000e'/> + <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/> + </interface> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='e1000e'/> + <hotplug require='no'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </interface> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <video> + <model type='qxl' ram='65536' vram='32768' vgamem='8192' heads='1' primary='yes'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> + </video> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> + </memballoon> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 40c284d..ea984e7 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -706,6 +706,15 @@ mymain(void) QEMU_CAPS_VGA_QXL, QEMU_CAPS_DEVICE_QXL, QEMU_CAPS_DEVICE_VIRTIO_RNG, QEMU_CAPS_OBJECT_RNG_RANDOM, QEMU_CAPS_VIRTIO_NET_DISABLE_MODERN); + DO_TEST("q35-e1000e", + QEMU_CAPS_DEVICE_PCI_BRIDGE, + QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, + QEMU_CAPS_DEVICE_IOH3420, + QEMU_CAPS_ICH9_AHCI, + QEMU_CAPS_PCI_MULTIFUNCTION, QEMU_CAPS_ICH9_USB_EHCI1, + QEMU_CAPS_DEVICE_VIDEO_PRIMARY, + QEMU_CAPS_VGA_QXL, QEMU_CAPS_DEVICE_QXL, + QEMU_CAPS_VIRTIO_NET_DISABLE_MODERN); DO_TEST("pcie-root", QEMU_CAPS_DEVICE_PCI_BRIDGE, QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, -- 2.7.4

On Mon, 2016-08-08 at 04:56 -0400, Laine Stump wrote:
These patches use three methods to get more of the PCI devices onto PCIe slots on Q35 and aarch64/virt machinetypes: 1) When virtio devices can present themselves as PCIe if they're plugged into a PCIe controller, do that. (This capability is detected by looking for presence of the "disable-modern" option on the virtio-net device. If you have a better idea for how to detect it, please let me know.)
How does this play along with Ján's series[1] implementing a way to control the protocol revision for virtio devices? IIUC, you wouldn't set disable-modern=off or disable-legacy=on explicitly, but rely instead on the fact that a mixed-mode virtio device will present itself as PCIe if plugged into a PCIe slot. Is that right? I think we should alter the connect flags for virtio devices based on the protocol revision, as decided by the user with either Ján's or a comparable approach, rather than using PCIe automatically if available. At the very least, the current approaches don't mix - if both your series and Ján's were to be merged now, a device that has been configured to use virtio 0.9 would end up assigned to a PCIe slot.
2) Any devices that aren't hotpluggable anyway will no longer request a hotplug-capable slot. This, along with a change to auto-assign legacy PCI devices to pcie-root (as long as they don't require hotplug) means the devices will now be assigned to pcie-root. 3) Also using the fact that devices that won't be hotplugged can be assigned to pcie-root, the devices that *do* support hotplug have a new optional subelement "<hotplug require='no'/>" (it defaults to "yes" for historical reasons). If hotplug require is set to 'no', even a PCI device can be auto-assigned to pcie-root. I haven't yet removed the dmi-to-pci-bridge that is added by default, and we still will only auto-add pci-bridge (so if you're adding a virtio device without <hostplug require='no'/> then you will also need to add a <controller type='pci' model='pcie-root-port'/> to plug it into).
I don't get any dmi-to-pci-bridge on aarch64 virt guests, which is good because we don't want it at the moment :) But it doesn't get auto-added either, not even when it would be required because I'm trying to add a legacy PCI device. So yeah, more work needed there I guess. [1] https://www.redhat.com/archives/libvir-list/2016-August/msg00412.html -- Andrea Bolognani / Red Hat / Virtualization

On 08/09/2016 09:11 AM, Andrea Bolognani wrote:
On Mon, 2016-08-08 at 04:56 -0400, Laine Stump wrote:
These patches use three methods to get more of the PCI devices onto PCIe slots on Q35 and aarch64/virt machinetypes:
1) When virtio devices can present themselves as PCIe if they're plugged into a PCIe controller, do that. (This capability is detected by looking for presence of the "disable-modern" option on the virtio-net device. If you have a better idea for how to detect it, please let me know.) How does this play along with Ján's series[1] implementing a way to control the protocol revision for virtio devices?
It doesn't, because those patches haven't been pushed yet. But it doesn't conflict either (except that Jan and I are both checking for essentially the same epoch difference in virtio by looking at different options - he looks for "disable-legacy" in any virtio device, while I look for "disable-modern" in virtio-net). Note that setting disable-modern/disable-legacy does *not* necessarily do what you think, and we don't want to always have to set virtio revision='1.0' in order to get a PCIe device, nor do we want all PCIe devices to be revision 1.0-only. There is a case for having disable-modern=off,disable-legacy=off even for a device installed on a PCIe slot (it's compatible with old guest drivers, but new guest drivers can take advantage of virtio 1.0, and it is legacy-PCI-free. On the other hand, it requires the PCIe controller to reserve IO port space, which we try to avoid.) For example, I just tried every combination of: disable-modern=off,disable-legacy=off (i.e. "all versions") disable-modern=on,disable-legacy=off ("0.9") disable-modern=off,disable-legacy=on ("1.0") with these three types of slots: pcie-root (PCIe root complex "integrated" device) pcie-root-port (pcie) pci-bridge (legacy PCI) I found that setting "1.0" doesn't force the device to be PCIe - if it's on either pci-bridge or pcie-root, it will still be properly advertised as a PCI device (the "Express Device" capability has been removed from the PCI capabilities data). The difference is that it will no longer request/require IO port space, and the device ID changes from 1af4:1000 to 1af4:1041. So there is no problem with putting a device with revision='1.0' into a legacy PCI slot. Setting 0.9 *does* force the device to be a PCI device even if it's plugged into a PCIe slot - the "Express is removed from the PCI capabilities. So definitely if 0.9 is set in order to disable virtio-1.0, we should not put it in a PCIe slot. This leads to the conclusion that any guest OS on Q35 that doesn't have virtio-1.0 drivers will require one of the following: 1) require that we can force both disables to OFF (that's the only way to get a true PCIe device that supports virtio-0.9) (NB: starting with qemu-2.7.0, the default for virtio devices plugged into a pcie-*-port slot will be "disable-legacy=on" (in order to conserve IO port space), and the only way for us to set disable-legacy=off will be to force virtio-0.9, i.e. there will be *no method* of getting a PCIe device that supports old virtio.) 2) live with having a dmi-to-pci-bridge and pci-bridge so that we'll have a proper place to plug in an 0.9-only device. 3) provide a simple way to force the 0.9-only virtio device onto pcie-root (pcie.0) (but of course that doesn't support hotplug...) Anyway, back to the topic - we *don't* want to be using the virtio revision setting to force PCI vs. PCIe (or think that we have to set it in order to get one or the other); the only information we can take from that setting to aid us in the PCI/PCIe decision is that when 0.9 is set, we definitely can't put the device on a PCIe slot; otherwise, it can be put on either type of slot and it will work correctly.
IIUC, you wouldn't set disable-modern=off or disable-legacy=on explicitly, but rely instead on the fact that a mixed-mode virtio device will present itself as PCIe if plugged into a PCIe slot. Is that right?
I don't set anything. I pay attention to the default behavior of a virtio device with no disable-* options set, because I'm working with code that has no way of setting those options.
I think we should alter the connect flags for virtio devices based on the protocol revision, as decided by the user with either Ján's or a comparable approach, rather than using PCIe automatically if available.
See above. The only thing we can change in the connect flags based on virtio version is to turn off PCI_ENDPOINT if it's set to 0.9. And we certainly don't want to force anyone to set 1.0-only just to get their device on a PCIe slot (nor do we want to require PCIe for someone who does set 1.0-only).
At the very least, the current approaches don't mix - if both your series and Ján's were to be merged now, a device that has been configured to use virtio 0.9 would end up assigned to a PCIe slot.
The only reason they don't appear to mix is because Jan's patches don't account for mine, and my patches don't account for his, because none of them are pushed yet. There's no intrinsic reason why they couldn't though. And as a matter of fact I applied my patches on top of his, and their logic doesn't conflict in any way (there were trivial merge problems due to both of us defining new qemu capabilities). As I've mentioned elsewhere (and you also suggest), the CONNECT setting should be modified to turn off PCIE_ENDPOINT and turn on PCI_ENDPOINT when 0.9-only is selected.
2) Any devices that aren't hotpluggable anyway will no longer request a hotplug-capable slot. This, along with a change to auto-assign legacy PCI devices to pcie-root (as long as they don't require hotplug) means the devices will now be assigned to pcie-root.
3) Also using the fact that devices that won't be hotplugged can be assigned to pcie-root, the devices that *do* support hotplug have a new optional subelement "<hotplug require='no'/>" (it defaults to "yes" for historical reasons). If hotplug require is set to 'no', even a PCI device can be auto-assigned to pcie-root.
I haven't yet removed the dmi-to-pci-bridge that is added by default, and we still will only auto-add pci-bridge (so if you're adding a virtio device without <hostplug require='no'/> then you will also need to add a <controller type='pci' model='pcie-root-port'/> to plug it into). I don't get any dmi-to-pci-bridge on aarch64 virt guests, which is good because we don't want it at the moment :)
Yes, because you've already removed it, in the way that I want to remove it from Q35. (but because Q35 guests are more likely to need a PCI-only device than aarch64 guests, I can't really do that until I've updated the "auto-create more slots" code to create a dmi-to-pci-bridge when necessary. (Hmm, but here's a problem - if there aren't currently any legacy PCI devices in a config (and thus no dmi-to-pci-bridge or pci-bridge), and someone decides to hotplug a legacy-PCI device, *then* what do you do? I don't want to have these legacy controllers in *every damn config in the world* just in case some moro... er "well meaning user" thinks they want to hotplug an rtl_8139 emulated ethernet...)
But it doesn't get auto-added either, not even when it would be required because I'm trying to add a legacy PCI device. So yeah, more work needed there I guess.
Yep, the only kind of controller we know how to auto-add is pci-bridge. Okay, I'm done. Finished. Kaput. Fini. Bittim, tükendim. Time for bed.

On Wed, 2016-08-10 at 00:32 -0400, Laine Stump wrote:
On Mon, 2016-08-08 at 04:56 -0400, Laine Stump wrote:
These patches use three methods to get more of the PCI devices onto PCIe slots on Q35 and aarch64/virt machinetypes: 1) When virtio devices can present themselves as PCIe if they're plugged into a PCIe controller, do that. (This capability is detected by looking for presence of the "disable-modern" option on the virtio-net device. If you have a better idea for how to detect it, please let me know.) How does this play along with Ján's series[1] implementing a way to control the protocol revision for virtio devices? It doesn't, because those patches haven't been pushed yet. But it doesn't conflict either (except that Jan and I are both checking for essentially the same epoch difference in virtio by looking at different
On 08/09/2016 09:11 AM, Andrea Bolognani wrote: options - he looks for "disable-legacy" in any virtio device, while I look for "disable-modern" in virtio-net). Note that setting disable-modern/disable-legacy does *not* necessarily do what you think, and we don't want to always have to set virtio revision='1.0' in order to get a PCIe device, nor do we want all PCIe devices to be revision 1.0-only. There is a case for having disable-modern=off,disable-legacy=off even for a device installed on a PCIe slot (it's compatible with old guest drivers, but new guest drivers can take advantage of virtio 1.0, and it is legacy-PCI-free. On the other hand, it requires the PCIe controller to reserve IO port space, which we try to avoid.)
Sheesh, you're right, I got it backwards - it's not that you need 1.0-only for PCIe, you *can't* have PCIe unless you also enable 1.0. Thanks for pointing out the mistake.
For example, I just tried every combination of: disable-modern=off,disable-legacy=off (i.e. "all versions") disable-modern=on,disable-legacy=off ("0.9") disable-modern=off,disable-legacy=on ("1.0") with these three types of slots: pcie-root (PCIe root complex "integrated" device) pcie-root-port (pcie) pci-bridge (legacy PCI) I found that setting "1.0" doesn't force the device to be PCIe - if it's on either pci-bridge or pcie-root, it will still be properly advertised as a PCI device (the "Express Device" capability has been removed from the PCI capabilities data). The difference is that it will no longer request/require IO port space, and the device ID changes from 1af4:1000 to 1af4:1041. So there is no problem with putting a device with revision='1.0' into a legacy PCI slot.
The pcie-root bit is weird - I know legacy PCI devices can be assigned directly to it, but I'd expect a device that can do both PCI and PCIe to present itself as PCIe when plugged into a PCIe-capable slot. Does this mean we have to be careful *not* to assign virtio devices to pcie-root unless they're 0.9-only, in order to chase our goal of having PCI-free guests?
Setting 0.9 *does* force the device to be a PCI device even if it's plugged into a PCIe slot - the "Express is removed from the PCI capabilities. So definitely if 0.9 is set in order to disable virtio-1.0, we should not put it in a PCIe slot. This leads to the conclusion that any guest OS on Q35 that doesn't have virtio-1.0 drivers will require one of the following: 1) require that we can force both disables to OFF (that's the only way to get a true PCIe device that supports virtio-0.9)
That sounds like a good idea, and Ján's series already covers that use case IIRC: you'd just have to add <virtio revision='0.9'/> <virtio revision='1.0'/> to the device. That gives you full control.
(NB: starting with qemu-2.7.0, the default for virtio devices plugged into a pcie-*-port slot will be "disable-legacy=on" (in order to conserve IO port space), and the only way for us to set disable-legacy=off will be to force virtio-0.9, i.e. there will be *no method* of getting a PCIe device that supports old virtio.)
Even though disable-legacy will default to on, I assume we would still be able to set it to off explicitly? Or would that be prevented altogether?
2) live with having a dmi-to-pci-bridge and pci-bridge so that we'll have a proper place to plug in an 0.9-only device. 3) provide a simple way to force the 0.9-only virtio device onto pcie-root (pcie.0) (but of course that doesn't support hotplug...) Anyway, back to the topic - we *don't* want to be using the virtio revision setting to force PCI vs. PCIe (or think that we have to set it in order to get one or the other); the only information we can take from that setting to aid us in the PCI/PCIe decision is that when 0.9 is set, we definitely can't put the device on a PCIe slot; otherwise, it can be put on either type of slot and it will work correctly.
Indeed.
At the very least, the current approaches don't mix - if both your series and Ján's were to be merged now, a device that has been configured to use virtio 0.9 would end up assigned to a PCIe slot. The only reason they don't appear to mix is because Jan's patches don't account for mine, and my patches don't account for his, because none of them are pushed yet. There's no intrinsic reason why they couldn't though.
Just to be clear, I didn't mean to suggest that you were doing anything wrong: I just noticed both you and Ján were working on partially overlapping features that don't necessarily work well together (see example above). Discussing how to make them work well together was the whole point of me bringing that up :)
2) Any devices that aren't hotpluggable anyway will no longer request a hotplug-capable slot. This, along with a change to auto-assign legacy PCI devices to pcie-root (as long as they don't require hotplug) means the devices will now be assigned to pcie-root. 3) Also using the fact that devices that won't be hotplugged can be assigned to pcie-root, the devices that *do* support hotplug have a new optional subelement "<hotplug require='no'/>" (it defaults to "yes" for historical reasons). If hotplug require is set to 'no', even a PCI device can be auto-assigned to pcie-root. I haven't yet removed the dmi-to-pci-bridge that is added by default, and we still will only auto-add pci-bridge (so if you're adding a virtio device without <hostplug require='no'/> then you will also need to add a <controller type='pci' model='pcie-root-port'/> to plug it into). I don't get any dmi-to-pci-bridge on aarch64 virt guests, which is good because we don't want it at the moment :) Yes, because you've already removed it, in the way that I want to remove it from Q35. (but because Q35 guests are more likely to need a PCI-only device than aarch64 guests, I can't really do that until I've updated the "auto-create more slots" code to create a dmi-to-pci-bridge when necessary. (Hmm, but here's a problem - if there aren't currently any legacy PCI devices in a config (and thus no dmi-to-pci-bridge or pci-bridge), and someone decides to hotplug a legacy-PCI device, *then* what do you do? I don't want to have these legacy controllers in *every damn config in the world* just in case some moro... er "well meaning user" thinks they want to hotplug an rtl_8139 emulated ethernet...)
I don't really have an answer for that, I'm afraid. What I can say is: there are many areas in libvirt where we choose the setup that ensures the widest compatibility by default, so that users who don't want to care too much about the details can just stick to that and have very high chances of never running into trouble. I think that's a sensible course of action, as long as you also allow users who know what they're doing and want to build a very precise setup to do so. With that in mind, auto-adding a dmi-to-pci-bridge/pci-bridge combo by default is probably not too bad *assuming* you have a way to opt out of it. -- Andrea Bolognani / Red Hat / Virtualization

On 08/10/2016 07:13 AM, Andrea Bolognani wrote:
On Wed, 2016-08-10 at 00:32 -0400, Laine Stump wrote:
On 08/09/2016 09:11 AM, Andrea Bolognani wrote:
On Mon, 2016-08-08 at 04:56 -0400, Laine Stump wrote:
These patches use three methods to get more of the PCI devices onto PCIe slots on Q35 and aarch64/virt machinetypes:
1) When virtio devices can present themselves as PCIe if they're plugged into a PCIe controller, do that. (This capability is detected by looking for presence of the "disable-modern" option on the virtio-net device. If you have a better idea for how to detect it, please let me know.)
How does this play along with Ján's series[1] implementing a way to control the protocol revision for virtio devices?
It doesn't, because those patches haven't been pushed yet. But it doesn't conflict either (except that Jan and I are both checking for essentially the same epoch difference in virtio by looking at different options - he looks for "disable-legacy" in any virtio device, while I look for "disable-modern" in virtio-net).
Note that setting disable-modern/disable-legacy does *not* necessarily do what you think, and we don't want to always have to set virtio revision='1.0' in order to get a PCIe device, nor do we want all PCIe devices to be revision 1.0-only. There is a case for having disable-modern=off,disable-legacy=off even for a device installed on a PCIe slot (it's compatible with old guest drivers, but new guest drivers can take advantage of virtio 1.0, and it is legacy-PCI-free. On the other hand, it requires the PCIe controller to reserve IO port space, which we try to avoid.) Sheesh, you're right, I got it backwards - it's not that you need 1.0-only for PCIe, you *can't* have PCIe unless you also enable 1.0. Thanks for pointing out the mistake.
No, I think all the double negatives are confusing you :-) The only way to *not* get PCIe on a pci-*-port connected device is if you specify 0.9-only. with disable-modern=off,disable-legacy=off, you'll get: pcie-root: pci/0.9+1.0 pcie-*-port: pcie/0.9+1.0 pci-bridge: pci/0.9+1.0 with disable-modern=on,disable-legacy=off (i.e. <version revision='0.9'/>): pcie-root: pci/0.9 pcie-*-port: pci/0.9 <-- we want to avoid this one pci-bridge: pci/0.9 with disable-modern=on,disable-legacy=off (i.e. <version revision='0.9'/>): pcie-root: pci/1.0 pcie-*-port: pcie/1.0 pci-bridge: pci/1.0 These are the implications that I can see of setting 0.9-only or 1.0-only: 0.9-only - PCI ID 1af4:1000, the device is always PCI, never PCIe, uses ioport 1.0-only - PCI ID 1af4:1041, the device is PCIe when on a pcie-*-port, otherwise PCI, doesn't use ioport 0.9+1.0 - PCI ID 1af4:1000, same rules for PCIe vs. PCI as 1.0-only, uses ioport (Note that in qemu 2.6.0, the default was 0.9+1.0 everywhere. in qemu 2.7.0, the default changes depending on what the device is plugged into - in particular, when plugged into a pcie-*-port, the default is 1.0-only.)
For example, I just tried every combination of:
disable-modern=off,disable-legacy=off (i.e. "all versions") disable-modern=on,disable-legacy=off ("0.9") disable-modern=off,disable-legacy=on ("1.0")
with these three types of slots:
pcie-root (PCIe root complex "integrated" device) pcie-root-port (pcie) pci-bridge (legacy PCI)
I found that setting "1.0" doesn't force the device to be PCIe - if it's on either pci-bridge or pcie-root, it will still be properly advertised as a PCI device (the "Express Device" capability has been removed from the PCI capabilities data). The difference is that it will no longer request/require IO port space, and the device ID changes from 1af4:1000 to 1af4:1041. So there is no problem with putting a device with revision='1.0' into a legacy PCI slot. The pcie-root bit is weird - I know legacy PCI devices can be assigned directly to it, but I'd expect a device that can do both PCI and PCIe to present itself as PCIe when plugged into a PCIe-capable slot.
pcie-root *is* a bit weird. Alex pointed out to me yesterday that on our Lenovo laptops, we have an ethernet device at 00:19.0 that shows itself as a PCI device, but is using the e1000e driver (e1000e is a PCIe device). On the other hand, the same system has an audio device at 00:1b.0 that shows up as a PCIe device (you can see this by running "lspci -v" as root and looking for a "Capabilities" line that says "Express ....".)
Does this mean we have to be careful *not* to assign virtio devices to pcie-root unless they're 0.9-only, in order to chase our goal of having PCI-free guests?
No. That's just the way pcie-root works. I think the important thing is to avoid plugging legacy-only devices into pcie-*-port, and to avoid dmi-to-pci-bridge and pci-bridge if at all possible. (And actually I don't see any upside to making virtio devices 0.9-only for any reason except to protect against a bug in virtio-1.0 - virtio devices will always properly "downgrade" themselves to legacy PCI when their connection warrants it).
Setting 0.9 *does* force the device to be a PCI device even if it's plugged into a PCIe slot - the "Express is removed from the PCI capabilities. So definitely if 0.9 is set in order to disable virtio-1.0, we should not put it in a PCIe slot. This leads to the conclusion that any guest OS on Q35 that doesn't have virtio-1.0 drivers will require one of the following:
1) require that we can force both disables to OFF (that's the only way to get a true PCIe device that supports virtio-0.9) That sounds like a good idea, and Ján's series already covers that use case IIRC: you'd just have to add
<virtio revision='0.9'/> <virtio revision='1.0'/>
to the device. That gives you full control.
Right. I misunderstood that when I went through his patches before. I had assumed it could be specified only once (and was thus confused about why he stored it as a bitmap rather than a single enum value).
(NB: starting with qemu-2.7.0, the default for virtio devices plugged into a pcie-*-port slot will be "disable-legacy=on" (in order to conserve IO port space), and the only way for us to set disable-legacy=off will be to force virtio-0.9, i.e. there will be *no method* of getting a PCIe device that supports old virtio.) Even though disable-legacy will default to on, I assume we would still be able to set it to off explicitly? Or would that be prevented altogether?
Yes, qemu allows it to be explicitly set to off. It's just the default that is changed.
2) live with having a dmi-to-pci-bridge and pci-bridge so that we'll have a proper place to plug in an 0.9-only device.
3) provide a simple way to force the 0.9-only virtio device onto pcie-root (pcie.0) (but of course that doesn't support hotplug...)
Anyway, back to the topic - we *don't* want to be using the virtio revision setting to force PCI vs. PCIe (or think that we have to set it in order to get one or the other); the only information we can take from that setting to aid us in the PCI/PCIe decision is that when 0.9 is set, we definitely can't put the device on a PCIe slot; otherwise, it can be put on either type of slot and it will work correctly. Indeed.
At the very least, the current approaches don't mix - if both your series and Ján's were to be merged now, a device that has been configured to use virtio 0.9 would end up assigned to a PCIe slot.
The only reason they don't appear to mix is because Jan's patches don't account for mine, and my patches don't account for his, because none of them are pushed yet. There's no intrinsic reason why they couldn't though. Just to be clear, I didn't mean to suggest that you were doing anything wrong: I just noticed both you and Ján were working on partially overlapping features that don't necessarily work well together (see example above). Discussing how to make them work well together was the whole point of me bringing that up :)
2) Any devices that aren't hotpluggable anyway will no longer request a hotplug-capable slot. This, along with a change to auto-assign legacy PCI devices to pcie-root (as long as they don't require hotplug) means the devices will now be assigned to pcie-root.
3) Also using the fact that devices that won't be hotplugged can be assigned to pcie-root, the devices that *do* support hotplug have a new optional subelement "<hotplug require='no'/>" (it defaults to "yes" for historical reasons). If hotplug require is set to 'no', even a PCI device can be auto-assigned to pcie-root.
I haven't yet removed the dmi-to-pci-bridge that is added by default, and we still will only auto-add pci-bridge (so if you're adding a virtio device without <hostplug require='no'/> then you will also need to add a <controller type='pci' model='pcie-root-port'/> to plug it into).
I don't get any dmi-to-pci-bridge on aarch64 virt guests, which is good because we don't want it at the moment :)
Yes, because you've already removed it, in the way that I want to remove it from Q35. (but because Q35 guests are more likely to need a PCI-only device than aarch64 guests, I can't really do that until I've updated the "auto-create more slots" code to create a dmi-to-pci-bridge when necessary.
(Hmm, but here's a problem - if there aren't currently any legacy PCI devices in a config (and thus no dmi-to-pci-bridge or pci-bridge), and someone decides to hotplug a legacy-PCI device, *then* what do you do? I don't want to have these legacy controllers in *every damn config in the world* just in case some moro... er "well meaning user" thinks they want to hotplug an rtl_8139 emulated ethernet...) I don't really have an answer for that, I'm afraid.
What I can say is: there are many areas in libvirt where we choose the setup that ensures the widest compatibility by default, so that users who don't want to care too much about the details can just stick to that and have very high chances of never running into trouble.
I think that's a sensible course of action, as long as you also allow users who know what they're doing and want to build a very precise setup to do so.
With that in mind, auto-adding a dmi-to-pci-bridge/pci-bridge combo by default is probably not too bad *assuming* you have a way to opt out of it.
Right now the only way to opt out is to put some other PCI controller at index 1 (e.g. a pci-root-port). For some other default devices (e.g. usb controllers and memory balloon) we take care of this with "model='none'", e.g.: <controller type='usb' model='none'/> I've actually never really liked that, and it doesn't directly translate to this use case anyway - it's not that we don't want *any* PCI controllers, it's that we don't want a controller of a particular model. Maybe we just need to think about the whole "reserve a spot for a potential future hotplug device" in a different way for PCIe - on systems with pci-root, we never really made a conscious decision that we had to have open slots available for hotplug - it just automatically happened that way because for legacy PCI, *all* slots (including on the root bus) support hotplug, and each PCI bus starts out with 31 open slots, so you're almost 100% assured that any given config would have an empty slot available for potential hotplug. PCIe doesn't work that way though - a basic machine has exactly 0 slots that support hotplug and (except for adding legacy-PCI bridges) any PCI controller that you add has only a single slot, so the chance of just coincidentally having a slot open for hotplug is exactly 0%. So are we just taking something that happened by chance with pci-based machines and turning it into a requirement for pcie? Instead, maybe we could just tell people if they might want hotplug, to add a few empty pci-*-port devices. (after all, if we were going to add them automatically, how many should we add? 1? 32? No matter what you pick, it's too many for some, and not enough for others.

On Wed, 2016-08-10 at 12:10 -0400, Laine Stump wrote:
Note that setting disable-modern/disable-legacy does *not* necessarily do what you think, and we don't want to always have to set virtio revision='1.0' in order to get a PCIe device, nor do we want all PCIe devices to be revision 1.0-only. There is a case for having disable-modern=off,disable-legacy=off even for a device installed on a PCIe slot (it's compatible with old guest drivers, but new guest drivers can take advantage of virtio 1.0, and it is legacy-PCI-free. On the other hand, it requires the PCIe controller to reserve IO port space, which we try to avoid.) Sheesh, you're right, I got it backwards - it's not that you need 1.0-only for PCIe, you *can't* have PCIe unless you also enable 1.0. Thanks for pointing out the mistake. No, I think all the double negatives are confusing you :-) The only way to *not* get PCIe on a pci-*-port connected device is if you specify 0.9-only.
For once, I was not confused! ;) I wrote
you *can't* have PCIe unless you also enable 1.0
meaning you won't get PCIe unless you enable 1.0 by using either <virtio revision='1.0'/> on its own or both <virtio revision='0.9'/> *and* <virtio revision='1.0'/>. That, in turn, implies that using <virtio revision='0.9'/> on its own will result in a legacy PCI device. All of this only applies to endpoint devices connected to a PCIe controller, of course. If you're connecting to a legacy PCI controller you're going to get a legacy PCI device no matter what virtio versions you enable[1].
with disable-modern=off,disable-legacy=off, you'll get: pcie-root: pci/0.9+1.0 pcie-*-port: pcie/0.9+1.0 pci-bridge: pci/0.9+1.0 with disable-modern=on,disable-legacy=off (i.e. <version revision='0.9'/>): pcie-root: pci/0.9 pcie-*-port: pci/0.9 <-- we want to avoid this one pci-bridge: pci/0.9 with disable-modern=on,disable-legacy=off (i.e. <version revision='0.9'/>):
Should have been "disable-modern=off,disable-legacy=on (i.e. <virtio revision='1.0'/>)", I guess :)
pcie-root: pci/1.0 pcie-*-port: pcie/1.0 pci-bridge: pci/1.0 These are the implications that I can see of setting 0.9-only or 1.0-only: 0.9-only - PCI ID 1af4:1000, the device is always PCI, never PCIe, uses ioport 1.0-only - PCI ID 1af4:1041, the device is PCIe when on a pcie-*-port, otherwise PCI, doesn't use ioport 0.9+1.0 - PCI ID 1af4:1000, same rules for PCIe vs. PCI as 1.0-only, uses ioport
I wonder if it would have been better to assign different PCI IDs to 0.9, 0.9+1.0 and 1.0 devices. They do behave differently enough that it might have been warranted. On the other hand, I can see why 0.9+1.0 would have the same PCI ID as 0.9 - so that existing drivers would recognize the device. Just typing out loud, really :)
The pcie-root bit is weird - I know legacy PCI devices can be assigned directly to it, but I'd expect a device that can do both PCI and PCIe to present itself as PCIe when plugged into a PCIe-capable slot. pcie-root *is* a bit weird. Alex pointed out to me yesterday that on our Lenovo laptops, we have an ethernet device at 00:19.0 that shows itself as a PCI device, but is using the e1000e driver (e1000e is a PCIe device). On the other hand, the same system has an audio device at 00:1b.0 that shows up as a PCIe device (you can see this by running "lspci -v" as root and looking for a "Capabilities" line that says "Express ....".)
At first I thought that maybe the e1000e driver was able to manage both PCI and PCIe devices, or that the Ethernet device was actually PCI rather than PCIe, but it looks like neither is the case, and it just shows up as PCI instead of PCIe because of... Reasons? /me sighs
Does this mean we have to be careful *not* to assign virtio devices to pcie-root unless they're 0.9-only, in order to chase our goal of having PCI-free guests? No. That's just the way pcie-root works. I think the important thing is to avoid plugging legacy-only devices into pcie-*-port, and to avoid dmi-to-pci-bridge and pci-bridge if at all possible. (And actually I don't see any upside to making virtio devices 0.9-only for any reason except to protect against a bug in virtio-1.0 - virtio devices will always properly "downgrade" themselves to legacy PCI when their connection warrants it).
Okay, going the other way around: should we try and always add a pcie-root-port between pcie-root and a virtio device that's either 0.9+1.0 or 1.0, to try and force it to expose the Express capabilities? Would that buy us anything? Does a 1.0-only virtio device plugged directly into pcie-root still use no IO ports even though it shows up as legacy PCI?
(Hmm, but here's a problem - if there aren't currently any legacy PCI devices in a config (and thus no dmi-to-pci-bridge or pci-bridge), and someone decides to hotplug a legacy-PCI device, *then* what do you do? I don't want to have these legacy controllers in *every damn config in the world* just in case some moro... er "well meaning user" thinks they want to hotplug an rtl_8139 emulated ethernet...) I don't really have an answer for that, I'm afraid. What I can say is: there are many areas in libvirt where we choose the setup that ensures the widest compatibility by default, so that users who don't want to care too much about the details can just stick to that and have very high chances of never running into trouble. I think that's a sensible course of action, as long as you also allow users who know what they're doing and want to build a very precise setup to do so. With that in mind, auto-adding a dmi-to-pci-bridge/pci-bridge combo by default is probably not too bad *assuming* you have a way to opt out of it. Right now the only way to opt out is to put some other PCI controller at index 1 (e.g. a pci-root-port). For some other default devices (e.g. usb controllers and memory balloon) we take care of this with "model='none'", e.g.: <controller type='usb' model='none'/> I've actually never really liked that, and it doesn't directly translate to this use case anyway - it's not that we don't want *any* PCI controllers, it's that we don't want a controller of a particular model. Maybe we just need to think about the whole "reserve a spot for a potential future hotplug device" in a different way for PCIe - on systems with pci-root, we never really made a conscious decision that we had to have open slots available for hotplug - it just automatically happened that way because for legacy PCI, *all* slots (including on the root bus) support hotplug, and each PCI bus starts out with 31 open slots, so you're almost 100% assured that any given config would have an empty slot available for potential hotplug. PCIe doesn't work that way though - a basic machine has exactly 0 slots that support hotplug and (except for adding legacy-PCI bridges) any PCI controller that you add has only a single slot, so the chance of just coincidentally having a slot open for hotplug is exactly 0%. So are we just taking something that happened by chance with pci-based machines and turning it into a requirement for pcie? Instead, maybe we could just tell people if they might want hotplug, to add a few empty pci-*-port devices. (after all, if we were going to add them automatically, how many should we add? 1? 32? No matter what you pick, it's too many for some, and not enough for others.
Mh, if we plugged a dmi-to-pci-bridge and pci-bridge, plus 30 or so pcie-root-ports, the non-expert user would never have to care about not having free hotpluggable PCI *or* PCIe slots. We could *not* do that if the exising topology is in any way different from the one we would build. But this approach would be extremely fragile, I'm afraid, so let's not go there. We also said times and times again that we don't want libvirt to carry any policy, and all of this stuff stinks like policy to me. Maybe we should go with an extremely basic default topology (eg. pcie-root only) and try to plug devices in any way that would work, without trying to satisfy other constraints like "leave a few hotpluggable PCIe (and PCI?) slots ready for additional devices". I agree that we need to have a way to make life easier for higher-level tools, and that we should centralize this complex stuff to avoid having the same issues over and over again. But maybe that should be a separate API rather than the core address assignment logic? Or a tool built on top of libvirt? Dan mentioned in another sub-thread that libvirt-designer was supposed to be that tool. I haven't looked into it yet, to be honest, but I'll definitely do that Really Soon Now™. [1] Or rather not disable. -- Andrea Bolognani / Red Hat / Virtualization
participants (3)
-
Andrea Bolognani
-
Daniel P. Berrange
-
Laine Stump