[libvirt] [PATCH 0/2] RFC: qemu: aarch64 virtio-pci work

These patches apply on top of the series I just posted:

  [PATCH 0/9] tests: qemu: unconditionally enable QEMU_CAPS_DEVICE
  https://www.redhat.com/archives/libvir-list/2016-January/msg01233.html

These specific changes were discussed here:

  https://www.redhat.com/archives/libvir-list/2015-December/msg00217.html

We need a way for apps to tell libvirt 'use virtio-pci, not virtio-mmio' for aarch64 -M virt. We can't really enable this by default yet, since most OSes don't actually support it. So we decided the best course of action is to allow users to pass a manual PCI controller, and libvirt will interpret that to mean it's okay to use virtio-pci.

Patch #1 undoes the current behavior of unconditionally adding a q35-style PCIe controller setup for mach-virt when qemu supports it. In the future we will probably re-enable something like this, but maybe not for a year or so, until the bits proliferate to distros.

Patch #2 tries to implement the manual PCI controller bit. Unfortunately it's not that easy to get the ideal XML to work, so this isn't ready for committing. See the commit message for more details.

Cole Robinson (2):
  qemu: aarch64: Don't add PCIe controller by default
  HACK: qemu: aarch64: Use virtio-pci if user specifies PCI controller

 src/qemu/qemu_command.c                            | 34 ++++++++----
 src/qemu/qemu_domain.c                             |  4 --
 ...l2argv-aarch64-pci-manual-nocontroller-fail.xml | 43 ++++++++++++++++
 .../qemuxml2argv-aarch64-virtio-pci-default.args   |  2 -
 ...l2argv-aarch64-virtio-pci-manual-addresses.args |  2 -
 ...ml2argv-aarch64-virtio-pci-manual-addresses.xml |  1 +
 ...2argv-aarch64-virtio-pci-manual-controller.args | 36 +++++++++++++
 ...l2argv-aarch64-virtio-pci-manual-controller.xml | 50 ++++++++++++++++++
 tests/qemuxml2argvtest.c                           | 22 ++++++--
 .../qemuxml2xmlout-aarch64-virtio-pci-default.xml  | 10 ----
 ...2xmlout-aarch64-virtio-pci-manual-addresses.xml | 11 +---
 ...xmlout-aarch64-virtio-pci-manual-controller.xml | 60 ++++++++++++++++++++++
 tests/qemuxml2xmltest.c                            |  6 +++
 13 files changed, 239 insertions(+), 42 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-aarch64-pci-manual-nocontroller-fail.xml
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-controller.args
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-controller.xml
 create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-aarch64-virtio-pci-manual-controller.xml

-- 
2.5.0
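For reference, the 'manual PCI controller' opt-in described above boils down to the user adding a single element to their domain XML. A minimal sketch, adapted from the test data added in patch #2 (the surrounding elements are elided):

<domain type='qemu'>
  ...
  <devices>
    <!-- Presence of this controller tells libvirt the guest OS can
         handle virtio-pci, so PCI addresses are assigned to virtio
         devices instead of virtio-mmio ones -->
    <controller type='pci' index='0' model='pci-root'/>
    ...
  </devices>
</domain>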

This was discussed here:
https://www.redhat.com/archives/libvir-list/2015-December/msg00217.html

The summary is that apps/users are in a tough spot WRT aarch64 -M virt and PCIe support: qemu has all the plumbing, but most distros don't support it yet. Presently libvirt adds a PCIe controller to the XML for new enough qemu, but this patch drops that behavior.

Upcoming patches will instead require users to manually specify a pcie controller in the XML as a way of telling libvirt 'the OS supports PCI', at which point libvirt can use it as a target for virtio-pci.
---
 src/qemu/qemu_domain.c                             |  4 ----
 .../qemuxml2argv-aarch64-virtio-pci-default.args   |  2 --
 tests/qemuxml2argvtest.c                           |  6 ++----
 .../qemuxml2xmlout-aarch64-virtio-pci-default.xml  | 10 ----------
 tests/qemuxml2xmltest.c                            |  2 ++
 5 files changed, 4 insertions(+), 20 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 1df1b74..aea1ea5 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -1109,10 +1109,6 @@ qemuDomainDefAddDefaultDevices(virDomainDefPtr def,
     case VIR_ARCH_AARCH64:
         addDefaultUSB = false;
         addDefaultMemballoon = false;
-        if (STREQ(def->os.machine, "virt") ||
-            STRPREFIX(def->os.machine, "virt-")) {
-            addPCIeRoot = virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_GPEX);
-        }
         break;

     case VIR_ARCH_PPC64:
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-default.args b/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-default.args
index a49bc82..f879560 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-default.args
+++ b/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-default.args
@@ -21,8 +21,6 @@ QEMU_AUDIO_DRV=none \
 -initrd /aarch64.initrd \
 -append 'earlyprintk console=ttyAMA0,115200n8 rw root=/dev/vda rootwait' \
 -dtb /aarch64.dtb \
--device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1 \
--device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1 \
 -device virtio-serial-device,id=virtio-serial0 \
 -usb \
 -drive file=/aarch64.raw,format=raw,if=none,id=drive-virtio-disk0 \
diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index cfb46ef..b2f636e 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -1642,9 +1642,7 @@ mymain(void)
             QEMU_CAPS_DEVICE_VIRTIO_RNG, QEMU_CAPS_OBJECT_RNG_RANDOM);
     /* Demonstrates the virtio-pci default... namely that there isn't any!
-       q35 style PCI controllers will be added if the binary supports it,
-       but virtio-mmio is always used unless PCI addresses are manually
-       specified. */
+       However this might change in the future... */
     DO_TEST("aarch64-virtio-pci-default",
             QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_DTB,
             QEMU_CAPS_DEVICE_VIRTIO_MMIO,
@@ -1653,7 +1651,7 @@ mymain(void)
             QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE);
     /* Example of using virtio-pci with no explicit PCI controller
        but with manual PCI addresses */
-    DO_TEST("aarch64-virtio-pci-manual-addresses",
+    DO_TEST_PARSE_ERROR("aarch64-virtio-pci-manual-addresses",
             QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_DTB,
             QEMU_CAPS_DEVICE_VIRTIO_MMIO,
             QEMU_CAPS_DEVICE_VIRTIO_RNG, QEMU_CAPS_OBJECT_RNG_RANDOM,
diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-aarch64-virtio-pci-default.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-aarch64-virtio-pci-default.xml
index 6f1c53b..db2119c 100644
--- a/tests/qemuxml2xmloutdata/qemuxml2xmlout-aarch64-virtio-pci-default.xml
+++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-aarch64-virtio-pci-default.xml
@@ -31,16 +31,6 @@
       <target dev='vda' bus='virtio'/>
       <address type='virtio-mmio'/>
     </disk>
-    <controller type='pci' index='0' model='pcie-root'/>
-    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
-      <model name='i82801b11-bridge'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
-    </controller>
-    <controller type='pci' index='2' model='pci-bridge'>
-      <model name='pci-bridge'/>
-      <target chassisNr='2'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
-    </controller>
     <controller type='virtio-serial' index='0'>
       <address type='virtio-mmio'/>
     </controller>
diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c
index 581129c..cac401c 100644
--- a/tests/qemuxml2xmltest.c
+++ b/tests/qemuxml2xmltest.c
@@ -731,12 +731,14 @@ mymain(void)
             QEMU_CAPS_DEVICE_VIRTIO_RNG, QEMU_CAPS_OBJECT_RNG_RANDOM,
             QEMU_CAPS_OBJECT_GPEX, QEMU_CAPS_DEVICE_PCI_BRIDGE,
             QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, QEMU_CAPS_VIRTIO_SCSI);
+    /*
     DO_TEST_FULL("aarch64-virtio-pci-manual-addresses", WHEN_ACTIVE,
             QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_DTB,
             QEMU_CAPS_DEVICE_VIRTIO_MMIO,
             QEMU_CAPS_DEVICE_VIRTIO_RNG, QEMU_CAPS_OBJECT_RNG_RANDOM,
             QEMU_CAPS_OBJECT_GPEX, QEMU_CAPS_DEVICE_PCI_BRIDGE,
             QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, QEMU_CAPS_VIRTIO_SCSI);
+    */
     DO_TEST("aarch64-gic");
     DO_TEST("aarch64-gicv3");
-- 
2.5.0

On 01/28/2016 04:14 PM, Cole Robinson wrote:
This was discussed here:
https://www.redhat.com/archives/libvir-list/2015-December/msg00217.html
The summary is that apps/users are in a tough spot WRT aarch64 -M virt and PCIe support: qemu has all the plumbing, but most distros don't support it yet. Presently libvirt adds a PCIe controller to the XML for new enough qemu, but this patch drops that behavior.
I read back through all of these messages and am more thoroughly undecided than ever about the proper thing to do.

I guess the conclusion at the end of that thread was that the best thing to do was (as you're doing in this patch) to temporarily (?) not add a pcie-root controller to a domain's XML for aarch64 -M virt, and if there is no PCI controller, use virtio-mmio for the address of all devices; but if a pcie-root is manually added, to instead use PCI addresses.

My main issue with that is that the XML then doesn't reflect the hardware exactly in cases where the binary is creating a pcie-root automatically (but we're not using it). The alternative would be to force specifying the address type of each device, though, and since that is impractical for management apps to deal with, I guess keying off the presence/absence of a manually-created pcie-root is the lesser of the evils :-P.

Assuming that we're all in agreement with this, ACK to this patch, but only in combination with whatever followup is necessary to support doing the auto-addressing based on the presence of pcie-root. (I'm actually wondering if, once we have that 2nd part figured out, it may be necessary to combine the two patches into one in order to not break any future bisecting to find bugs related to aarch64 guests using devices with PCI addresses, but we'll get to that later...)
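Concretely, the two outcomes being weighed here differ only in the auto-assigned <address> elements that libvirt emits. A sketch adapted from the test data in this series (the disk element is illustrative; only the address type changes):

<!-- No PCI controller in the input XML: devices get virtio-mmio addresses -->
<disk type='file' device='disk'>
  <source file='/aarch64.raw'/>
  <target dev='vda' bus='virtio'/>
  <address type='virtio-mmio'/>
</disk>

<!-- Root PCI controller added manually by the user: devices get PCI addresses -->
<controller type='pci' index='0' model='pci-root'/>
<disk type='file' device='disk'>
  <source file='/aarch64.raw'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>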

On Wed, 2016-02-17 at 12:40 -0500, Laine Stump wrote:
On 01/28/2016 04:14 PM, Cole Robinson wrote:
This was discussed here: https://www.redhat.com/archives/libvir-list/2015-December/msg00217.html The summary is that apps/users are in a tough spot WRT aarch64 -M virt and PCIe support: qemu has all the plumbing, but most distros don't support it yet. Presently libvirt adds a PCIe controller to the XML for new enough qemu, but this patch drops that behavior.

I read back through all of these messages and am more thoroughly undecided than ever about the proper thing to do.
Same here :)
I guess the end of that thread was that the best thing to do was (as you're doing in this patch) to temporarily (?) not add a pcie-root controller to a domain's XML for aarch64 -M virt, and if there is no PCI controller, then use virtio-mmio for the address of all devices; but if a pcie-root is manually added, to instead use pci addresses.

My main issue with that is that the XML is then not reflecting the hardware exactly in cases where the binary is creating a pcie-root automatically (but we're not using it). The alternative would be to force specifying the address type of each device though, and since that is impractical for management apps to deal with, I guess keying off the presence/absence of a manually-created pcie-root is the less evil of the options :-P.
This is my concern as well. Not automatically adding the dmi-to-pci-bridge and pci-bridge sounds like a step in the right direction, as QEMU will happily work without them; but the PCIe root controller is always added (if my reading of the output of 'info qtree' and poking around the guest is correct), and the domain XML should reflect this fact. Deviating from the actual (virtual) hardware is something that's bound to come back and bite us at some point, I'm afraid, and we should avoid it.

Cheers.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team

If a user manually specifies this XML snippet for aarch64 machvirt:

  <controller type='pci' index='0' model='pci-root'/>

Libvirt will interpret this to mean that the OS supports virtio-pci, and will allocate PCI addresses (instead of virtio-mmio) for virtio devices.

This is a giant hack. Trying to improve it led me into the maze of PCI address code and I gave up for now. Here are the issues:

* I'd prefer that to be model='pcie-root', which matches what qemu-system-aarch64 -M virt actually provides by default... however libvirt isn't happy with a single pcie-root specified by the user; it will error with:

    error: unsupported configuration: failed to create PCI bridge on bus 1:
    too many devices with fixed addresses

  Instead this patch uses hacks to make pci-root use the pcie.0 bus for aarch64, since that code path already works.

* It may even be nice to make specifying <controller type='pci'/> map to 'give me the recommended PCI setup for this VM'... then we could adjust what that default means in the future, maybe switching to something like what q35 uses. This would make apps' lives easier. But presently that violates the XML schema, and libvirt can't handle lack of model= anyways:

    error: internal error: Invalid PCI controller model -1
---
 src/qemu/qemu_command.c                            | 34 ++++++++----
 ...l2argv-aarch64-pci-manual-nocontroller-fail.xml | 43 ++++++++++++++++
 ...l2argv-aarch64-virtio-pci-manual-addresses.args |  2 -
 ...ml2argv-aarch64-virtio-pci-manual-addresses.xml |  1 +
 ...2argv-aarch64-virtio-pci-manual-controller.args | 36 +++++++++++++
 ...l2argv-aarch64-virtio-pci-manual-controller.xml | 50 ++++++++++++++++++
 tests/qemuxml2argvtest.c                           | 20 ++++++--
 ...2xmlout-aarch64-virtio-pci-manual-addresses.xml | 11 +---
 ...xmlout-aarch64-virtio-pci-manual-controller.xml | 60 ++++++++++++++++++++++
 tests/qemuxml2xmltest.c                            |  8 ++-
 10 files changed, 239 insertions(+), 26 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-aarch64-pci-manual-nocontroller-fail.xml
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-controller.args
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-controller.xml
 create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-aarch64-virtio-pci-manual-controller.xml

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 8943270..4b0f070 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -1064,7 +1064,10 @@ qemuAssignDeviceControllerAlias(virDomainDefPtr domainDef,
          * (including the hardcoded pci-root controller on
          * multibus-capable qemus). */
-        return virAsprintf(&controller->info.alias, "pci.%d", controller->idx);
+        if (domainDef->os.arch == VIR_ARCH_AARCH64)
+            return virAsprintf(&controller->info.alias, "pcie.%d", controller->idx);
+        else
+            return virAsprintf(&controller->info.alias, "pci.%d", controller->idx);
     } else if (controller->type == VIR_DOMAIN_CONTROLLER_TYPE_IDE) {
         /* for any machine based on e.g. I440FX or G3Beige, the
          * first (and currently only) IDE controller is an integrated
@@ -1393,15 +1396,28 @@ static int
 qemuDomainAssignARMVirtioMMIOAddresses(virDomainDefPtr def,
                                        virQEMUCapsPtr qemuCaps)
 {
-    if (((def->os.arch == VIR_ARCH_ARMV7L) ||
-        (def->os.arch == VIR_ARCH_AARCH64)) &&
-        (STRPREFIX(def->os.machine, "vexpress-") ||
-            STREQ(def->os.machine, "virt") ||
-            STRPREFIX(def->os.machine, "virt-")) &&
-        virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_MMIO)) {
-        qemuDomainPrimeVirtioDeviceAddresses(
-            def, VIR_DOMAIN_DEVICE_ADDRESS_TYPE_VIRTIO_MMIO);
+    size_t i;
+
+    if ((def->os.arch != VIR_ARCH_ARMV7L) &&
+        (def->os.arch != VIR_ARCH_AARCH64))
+        return 0;
+
+    if (!(STRPREFIX(def->os.machine, "vexpress-") ||
+          STREQ(def->os.machine, "virt") ||
+          STRPREFIX(def->os.machine, "virt-")))
+        return 0;
+
+    if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_MMIO))
+        return 0;
+
+    /* If there's a PCIe controller in the XML, we will use PCI virtio */
+    for (i = 0; i < def->ncontrollers; i++) {
+        if (def->controllers[i]->type == VIR_DOMAIN_CONTROLLER_TYPE_PCI)
+            return 0;
     }
+
+    qemuDomainPrimeVirtioDeviceAddresses(
+        def, VIR_DOMAIN_DEVICE_ADDRESS_TYPE_VIRTIO_MMIO);
     return 0;
 }
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-aarch64-pci-manual-nocontroller-fail.xml b/tests/qemuxml2argvdata/qemuxml2argv-aarch64-pci-manual-nocontroller-fail.xml
new file mode 100644
index 0000000..6a44f19
--- /dev/null
+++ b/tests/qemuxml2argvdata/qemuxml2argv-aarch64-pci-manual-nocontroller-fail.xml
@@ -0,0 +1,43 @@
+<domain type='qemu'>
+  <name>aarch64test</name>
+  <uuid>496d7ea8-9739-544b-4ebd-ef08be936e8b</uuid>
+  <memory unit='KiB'>1048576</memory>
+  <currentMemory unit='KiB'>1048576</currentMemory>
+  <vcpu placement='static'>1</vcpu>
+  <os>
+    <type arch='aarch64' machine='virt'>hvm</type>
+    <kernel>/aarch64.kernel</kernel>
+    <initrd>/aarch64.initrd</initrd>
+    <cmdline>earlyprintk console=ttyAMA0,115200n8 rw root=/dev/vda rootwait</cmdline>
+    <dtb>/aarch64.dtb</dtb>
+    <boot dev='hd'/>
+  </os>
+  <features>
+    <acpi/>
+    <apic/>
+    <pae/>
+  </features>
+  <cpu mode='custom' match='exact'>
+    <model fallback='allow'>cortex-a53</model>
+  </cpu>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>restart</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-aarch64</emulator>
+    <disk type='file' device='disk'>
+      <source file='/aarch64.raw'/>
+      <target dev='sda' bus='scsi'/>
+      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
+    </disk>
+    <controller type='scsi' index='0' model='virtio-scsi'>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
+    </controller>
+    <interface type='user'>
+      <mac address='52:54:00:09:a4:37'/>
+      <model type='virtio'/>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
+    </interface>
+  </devices>
+</domain>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-addresses.args b/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-addresses.args
index 142fd5b..f091c89 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-addresses.args
+++ b/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-addresses.args
@@ -21,8 +21,6 @@ QEMU_AUDIO_DRV=none \
 -initrd /aarch64.initrd \
 -append 'earlyprintk console=ttyAMA0,115200n8 rw root=/dev/vda rootwait' \
 -dtb /aarch64.dtb \
--device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1 \
--device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1 \
 -device virtio-scsi-pci,id=scsi0,bus=pcie.0,addr=0x3 \
 -usb \
 -drive file=/aarch64.raw,format=raw,if=none,id=drive-scsi0-0-0-0 \
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-addresses.xml b/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-addresses.xml
index 6a44f19..80a6592 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-addresses.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-addresses.xml
@@ -31,6 +31,7 @@
       <target dev='sda' bus='scsi'/>
       <address type='drive' controller='0' bus='0' target='0' unit='0'/>
     </disk>
+    <controller type='pci' index='0' model='pci-root'/>
     <controller type='scsi' index='0' model='virtio-scsi'>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
     </controller>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-controller.args b/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-controller.args
new file mode 100644
index 0000000..dfa07da
--- /dev/null
+++ b/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-controller.args
@@ -0,0 +1,36 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/home/test \
+USER=test \
+LOGNAME=test \
+QEMU_AUDIO_DRV=none \
+/usr/bin/qemu-system-aarch64 \
+-name aarch64test \
+-S \
+-M virt \
+-cpu cortex-a53 \
+-m 1024 \
+-smp 1 \
+-uuid 496d7ea8-9739-544b-4ebd-ef08be936e8b \
+-nographic \
+-nodefconfig \
+-nodefaults \
+-monitor unix:/tmp/test-monitor,server,nowait \
+-boot c \
+-kernel /aarch64.kernel \
+-initrd /aarch64.initrd \
+-append 'earlyprintk console=ttyAMA0,115200n8 rw root=/dev/vda rootwait' \
+-dtb /aarch64.dtb \
+-device virtio-serial-pci,id=virtio-serial0,bus=pcie.0,addr=0x2 \
+-usb \
+-drive file=/aarch64.raw,format=raw,if=none,id=drive-virtio-disk0 \
+-device virtio-blk-pci,bus=pcie.0,addr=0x3,drive=drive-virtio-disk0,\
+id=virtio-disk0 \
+-device virtio-net-pci,vlan=0,id=net0,mac=52:54:00:09:a4:37,bus=pcie.0,addr=0x1 \
+-net user,vlan=0,name=hostnet0 \
+-serial pty \
+-chardev pty,id=charconsole1 \
+-device virtconsole,chardev=charconsole1,id=console1 \
+-device virtio-balloon-pci,id=balloon0,bus=pcie.0,addr=0x4 \
+-object rng-random,id=objrng0,filename=/dev/random \
+-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pcie.0,addr=0x5
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-controller.xml b/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-controller.xml
new file mode 100644
index 0000000..c5ea728
--- /dev/null
+++ b/tests/qemuxml2argvdata/qemuxml2argv-aarch64-virtio-pci-manual-controller.xml
@@ -0,0 +1,50 @@
+<domain type='qemu'>
+  <name>aarch64test</name>
+  <uuid>496d7ea8-9739-544b-4ebd-ef08be936e8b</uuid>
+  <memory unit='KiB'>1048576</memory>
+  <currentMemory unit='KiB'>1048576</currentMemory>
+  <vcpu placement='static'>1</vcpu>
+  <os>
+    <type arch='aarch64' machine='virt'>hvm</type>
+    <kernel>/aarch64.kernel</kernel>
+    <initrd>/aarch64.initrd</initrd>
+    <cmdline>earlyprintk console=ttyAMA0,115200n8 rw root=/dev/vda rootwait</cmdline>
+    <dtb>/aarch64.dtb</dtb>
+    <boot dev='hd'/>
+  </os>
+  <features>
+    <acpi/>
+    <apic/>
+    <pae/>
+  </features>
+  <cpu mode='custom' match='exact'>
+    <model fallback='allow'>cortex-a53</model>
+  </cpu>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>restart</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-aarch64</emulator>
+    <disk type='file' device='disk'>
+      <source file='/aarch64.raw'/>
+      <target dev='vda' bus='virtio'/>
+    </disk>
+    <controller type='pci' index='0' model='pci-root'/>
+    <interface type='user'>
+      <mac address='52:54:00:09:a4:37'/>
+      <model type='virtio'/>
+    </interface>
+    <console type='pty'/>
+    <console type='pty'>
+      <target type='virtio' port='0'/>
+    </console>
+    <memballoon model='virtio'/>
+    <!--
+        This actually doesn't work in practice because vexpress only has
+        4 virtio slots available, rng makes 5 -->
+    <rng model='virtio'>
+      <backend model='random'>/dev/random</backend>
+    </rng>
+  </devices>
+</domain>
diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index b2f636e..6c5a8ab 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -1649,9 +1649,23 @@ mymain(void)
             QEMU_CAPS_DEVICE_VIRTIO_RNG, QEMU_CAPS_OBJECT_RNG_RANDOM,
             QEMU_CAPS_OBJECT_GPEX, QEMU_CAPS_DEVICE_PCI_BRIDGE,
             QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE);
-    /* Example of using virtio-pci with no explicit PCI controller
-       but with manual PCI addresses */
-    DO_TEST_PARSE_ERROR("aarch64-virtio-pci-manual-addresses",
+    /* Example of using virtio-pci with manual PCIe controller and auto
+       allocated addresses */
+    DO_TEST("aarch64-virtio-pci-manual-controller",
+            QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_DTB,
+            QEMU_CAPS_DEVICE_VIRTIO_MMIO,
+            QEMU_CAPS_DEVICE_VIRTIO_RNG, QEMU_CAPS_OBJECT_RNG_RANDOM,
+            QEMU_CAPS_OBJECT_GPEX, QEMU_CAPS_DEVICE_PCI_BRIDGE,
+            QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, QEMU_CAPS_VIRTIO_SCSI);
+    /* Example of using virtio-pci with manual PCIe controller and
+       manual PCI addresses */
+    DO_TEST("aarch64-virtio-pci-manual-addresses",
+            QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_DTB,
+            QEMU_CAPS_DEVICE_VIRTIO_MMIO,
+            QEMU_CAPS_DEVICE_VIRTIO_RNG, QEMU_CAPS_OBJECT_RNG_RANDOM,
+            QEMU_CAPS_OBJECT_GPEX, QEMU_CAPS_DEVICE_PCI_BRIDGE,
+            QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, QEMU_CAPS_VIRTIO_SCSI);
+    DO_TEST_PARSE_ERROR("aarch64-pci-manual-nocontroller-fail",
             QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_DTB,
             QEMU_CAPS_DEVICE_VIRTIO_MMIO,
             QEMU_CAPS_DEVICE_VIRTIO_RNG, QEMU_CAPS_OBJECT_RNG_RANDOM,
diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-aarch64-virtio-pci-manual-addresses.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-aarch64-virtio-pci-manual-addresses.xml
index 4add271..80a6592 100644
--- a/tests/qemuxml2xmloutdata/qemuxml2xmlout-aarch64-virtio-pci-manual-addresses.xml
+++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-aarch64-virtio-pci-manual-addresses.xml
@@ -31,19 +31,10 @@
       <target dev='sda' bus='scsi'/>
       <address type='drive' controller='0' bus='0' target='0' unit='0'/>
     </disk>
+    <controller type='pci' index='0' model='pci-root'/>
     <controller type='scsi' index='0' model='virtio-scsi'>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
     </controller>
-    <controller type='pci' index='0' model='pcie-root'/>
-    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
-      <model name='i82801b11-bridge'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
-    </controller>
-    <controller type='pci' index='2' model='pci-bridge'>
-      <model name='pci-bridge'/>
-      <target chassisNr='2'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
-    </controller>
     <interface type='user'>
       <mac address='52:54:00:09:a4:37'/>
       <model type='virtio'/>
diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-aarch64-virtio-pci-manual-controller.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-aarch64-virtio-pci-manual-controller.xml
new file mode 100644
index 0000000..7be2cbd
--- /dev/null
+++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-aarch64-virtio-pci-manual-controller.xml
@@ -0,0 +1,60 @@
+<domain type='qemu'>
+  <name>aarch64test</name>
+  <uuid>496d7ea8-9739-544b-4ebd-ef08be936e8b</uuid>
+  <memory unit='KiB'>1048576</memory>
+  <currentMemory unit='KiB'>1048576</currentMemory>
+  <vcpu placement='static'>1</vcpu>
+  <os>
+    <type arch='aarch64' machine='virt'>hvm</type>
+    <kernel>/aarch64.kernel</kernel>
+    <initrd>/aarch64.initrd</initrd>
+    <cmdline>earlyprintk console=ttyAMA0,115200n8 rw root=/dev/vda rootwait</cmdline>
+    <dtb>/aarch64.dtb</dtb>
+    <boot dev='hd'/>
+  </os>
+  <features>
+    <acpi/>
+    <apic/>
+    <pae/>
+  </features>
+  <cpu mode='custom' match='exact'>
+    <model fallback='allow'>cortex-a53</model>
+  </cpu>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>restart</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-aarch64</emulator>
+    <disk type='file' device='disk'>
+      <source file='/aarch64.raw'/>
+      <target dev='vda' bus='virtio'/>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
+    </disk>
+    <controller type='pci' index='0' model='pci-root'/>
+    <controller type='virtio-serial' index='0'>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
+    </controller>
+    <interface type='user'>
+      <mac address='52:54:00:09:a4:37'/>
+      <model type='virtio'/>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
+    </interface>
+    <serial type='pty'>
+      <target port='0'/>
+    </serial>
+    <console type='pty'>
+      <target type='serial' port='0'/>
+    </console>
+    <console type='pty'>
+      <target type='virtio' port='1'/>
+    </console>
+    <memballoon model='virtio'>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
+    </memballoon>
+    <rng model='virtio'>
+      <backend model='random'>/dev/random</backend>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
+    </rng>
+  </devices>
+</domain>
diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c
index cac401c..e7f8845 100644
--- a/tests/qemuxml2xmltest.c
+++ b/tests/qemuxml2xmltest.c
@@ -731,14 +731,18 @@ mymain(void)
             QEMU_CAPS_DEVICE_VIRTIO_RNG, QEMU_CAPS_OBJECT_RNG_RANDOM,
             QEMU_CAPS_OBJECT_GPEX, QEMU_CAPS_DEVICE_PCI_BRIDGE,
             QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, QEMU_CAPS_VIRTIO_SCSI);
-    /*
+    DO_TEST_FULL("aarch64-virtio-pci-manual-controller", WHEN_ACTIVE,
+            QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_DTB,
+            QEMU_CAPS_DEVICE_VIRTIO_MMIO,
+            QEMU_CAPS_DEVICE_VIRTIO_RNG, QEMU_CAPS_OBJECT_RNG_RANDOM,
+            QEMU_CAPS_OBJECT_GPEX, QEMU_CAPS_DEVICE_PCI_BRIDGE,
+            QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, QEMU_CAPS_VIRTIO_SCSI);
     DO_TEST_FULL("aarch64-virtio-pci-manual-addresses", WHEN_ACTIVE,
             QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_DTB,
             QEMU_CAPS_DEVICE_VIRTIO_MMIO,
             QEMU_CAPS_DEVICE_VIRTIO_RNG, QEMU_CAPS_OBJECT_RNG_RANDOM,
             QEMU_CAPS_OBJECT_GPEX, QEMU_CAPS_DEVICE_PCI_BRIDGE,
             QEMU_CAPS_DEVICE_DMI_TO_PCI_BRIDGE, QEMU_CAPS_VIRTIO_SCSI);
-    */
     DO_TEST("aarch64-gic");
     DO_TEST("aarch64-gicv3");
-- 
2.5.0

On 01/28/2016 04:14 PM, Cole Robinson wrote:
If a user manually specifies this XML snippet for aarch64 machvirt:
<controller type='pci' index='0' model='pci-root'/>
As you've noted below, this isn't correct. aarch64 machvirt has no implicit pci-root controller (aka "pci.0"). It instead has a pcie-root controller ("pcie.0"). Since a pci[e]-root controller cannot be explicitly added, by definition this couldn't work.
Libvirt will interpret this to mean that the OS supports virtio-pci, and will allocate PCI addresses (instead of virtio-mmio) for virtio devices.
This is a giant hack. Trying to improve it led me into the maze of PCI address code and I gave up for now. Here are the issues:
* I'd prefer that to be model='pcie-root', which matches what qemu-system-aarch64 -M virt actually provides by default... however libvirt isn't happy with a single pcie-root specified by the user; it will error with:
error: unsupported configuration: failed to create PCI bridge on bus 1: too many devices with fixed addresses
That's not the right error, but it's caused by the fact that libvirt wants the pci-bridge device to be plugged into a standard PCI slot, but all the slots of pcie-root are PCIe slots. Since we now know that qemu doesn't mind if any standard PCI device is plugged into a PCIe slot, the decision of how we want to solve this problem depends on whether or not we want the devices in question to be hot-pluggable - the ports of pcie-root do not support hot-plugging devices (at least on Q35), while the ports on pci-bridge do. So if we require that all devices be hot-pluggable, then we have a few choices:

1) create the same PCI controller Frankenstein we currently have for Q35 - a dmi-to-pci-bridge plugged into pcie-root, and a pci-bridge plugged into dmi-to-pci-bridge. This is easiest because it already works, but it does create an extra unnecessary controller.

2) auto-add a pci-bridge in cases when there is a pcie-root but no standard PCI slots. This would take only a slight amount more work.

3) auto-add a pcie-root-port to each port of the pcie-root controller. This would still leave us with PCIe ports, so we would need to teach libvirt that it's okay to plug PCI devices into PCIe ports.

If we don't require hot-pluggability, then we can just teach the address-assignment code that PCI devices can plug into non-hotpluggable PCIe ports and we're done. Or we can do a hybrid that's kind of a continuation of the "use PCI if it's available, otherwise mmio" - we could do this:

A) If there are any standard PCI slots, then auto-assign to PCI slots (creating new pci-bridge controllers as necessary)
B) else if there are any PCIe slots, then auto-assign to hot-pluggable PCIe if available, or straight PCIe if not.
C) else use virtio-mmio.

-------------------------------------------

Mixed in with all of this discussion is my thinking that we should have some way to specify, in XML, constraints for the address of each device *without specifying the address itself*.
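For reference, option 1 is the controller set libvirt already auto-creates for Q35 guests (it also appears in the patch's removed test output above); spelled out in domain XML it would look roughly like this (index and address values illustrative):

```xml
<controller type='pci' index='0' model='pcie-root'/>
<controller type='pci' index='1' model='dmi-to-pci-bridge'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
</controller>
<controller type='pci' index='2' model='pci-bridge'>
  <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
</controller>
```

Devices then get standard PCI addresses on bus 2 (the pci-bridge), which is what makes them hot-pluggable.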
Things we need to be able to specify:

1) Is the device PCI-only vs. PCIe-only vs. either one (maybe this could be used in the future to constrain to virtio-mmio as well)?

2) Must the device be hot-pluggable? (default would be yes)

3) guest-side NUMA node? (I'm not sure if this needs to be user specifiable - in the case of a vfio-assigned device, I think all we need to do is inform the guest which NUMA node the device is on in the host (via putting it on a PXB controller that is configured with that same NUMA node number). For emulated devices - is there any use to putting an *emulated* device on the same controller as a particular vfio-assigned device that is on a specific node? If not, then maybe it will never matter.)

It would be better if these "address constraints" were in a different part of the XML than the <address> element itself - this would maintain the simplicity of being able to just remove all <address> elements in order to force libvirt to re-assign all device addresses. This isn't something that needs doing immediately, but worth keeping in mind while putting together something that works for aarch64.
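As a concrete illustration of item 3, libvirt's pci-expander-bus controller model (QEMU's pxb) is the existing way to associate a guest-side NUMA node with a PCI bus; a sketch, with illustrative busNr and node values:

```xml
<controller type='pci' index='1' model='pci-expander-bus'>
  <target busNr='254'>
    <node>1</node>
  </target>
</controller>
```

A vfio-assigned device given an address on bus 1 would then be reported to the guest as belonging to NUMA node 1.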
Instead this patch uses hacks to make pci-root use the pcie.0 bus for aarch64, since that code path already works.
I think that's a dead-end that we would have to back-track on, so probably not a good solution even temporarily. Here's an attempt at a plan:

1) change the PCI address assignment code so that for aarch64/virt it prefers PCIe addresses, but still requires hot-pluggable (currently it almost always prefers PCI, and requires hot-pluggable). (alternate - if aarch64 doesn't support pcie-root-port or pcie-switch-*-port, then don't require hot-pluggable either).

2) put something on the front of that that checks for the existence of pcie-root, and if it's not found, uses virtio-mmio instead (is there something already that auto-adds the virtio-mmio address? I haven't looked and am too lazy to do so now).

At this point, as long as you manually add a bunch of pcie-root-port controllers along with the manual pcie-root, everything should just work. Then we would go to step 3:

3) enhance the auto-assign code so that, in addition to auto-adding a pci-bridge when needed, it would auto-add either a single pcie-root-port or a pcie-switch-upstream-port and 32 pcie-switch-downstream-ports anytime a hotpluggable PCIe port was needed and couldn't be found. (The latter assumes that aarch64 supports those controllers.)

Does that make any sense? I could try to code some of this up if you could test it (or help me get set up to test it myself).
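Under that plan, the minimal user-supplied XML to opt in to virtio-pci would be something like the following sketch (the number of root ports is up to the user):

```xml
<controller type='pci' index='0' model='pcie-root'/>
<controller type='pci' index='1' model='pcie-root-port'/>
<controller type='pci' index='2' model='pcie-root-port'/>
<controller type='pci' index='3' model='pcie-root-port'/>
```

with libvirt auto-assigning each virtio device to one of the hot-pluggable root ports.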

On Wed, 2016-02-17 at 15:03 -0500, Laine Stump wrote:
On 01/28/2016 04:14 PM, Cole Robinson wrote:
If a user manually specifies this XML snippet for aarch64 machvirt: <controller type='pci' index='0' model='pci-root'/> As you've noted below, this isn't correct. aarch64 machvirt has no implicit pci-root controller (aka "pci.0"). It instead has a pcie-root controller ("pcie.0"). Since a pci[e]-root controller cannot be explicitly added, by definition this couldn't work. Libvirt will interpret this to mean that the OS supports virtio-pci, and will allocate PCI addresses (instead of virtio-mmio) for virtio devices. This is a giant hack. Trying to improve it led me into the maze of PCI address code and I gave up for now. Here are the issues: * I'd prefer that to be model='pcie-root' which matches what qemu-system-aarch64 -M virt actually provides by default... however libvirt isn't happy with a single pcie-root specified by the user, it will error with: error: unsupported configuration: failed to create PCI bridge on bus 1: too many devices with fixed addresses That's not the right error, but it's caused by the fact that libvirt wants the pci-bridge device to be plugged into a standard PCI slot, but all the slots of pcie-root are PCIe slots. Since we now know that qemu doesn't mind if any standard PCI device is plugged into a PCIe slot,
Should we rely on this behavior? Isn't this something that might change in the future? Or at least be quite puzzling for users? Just thinking out loud :)
the decision of how we want to solve this problem depends on whether or not we want the devices in question to be hot-pluggable - the ports of pcie-root do not support hot-plugging devices (at least on Q35), while the ports on pci-bridge do. So if we require that all devices be hot-pluggable, then we have a few choices: 1) create the same PCI controller Frankenstein we currently have for Q35 - a dmi-to-pci-bridge plugged into pcie-root, and a pci-bridge plugged into dmi-to-pci-bridge. This is easiest because it already works, but it does create an extra unnecessary controller.
This is the current situation, right? qemu-kvm in current aarch64 RHEL doesn't have the i82801b11-bridge device compiled in, by the way. However, since qemu-system-aarch64 in Fedora 23 *does* have it, I assume enabling it would simply be a matter of flipping a build configuration bit.
2) auto-add a pci-bridge in cases when there is a pcie-root but no standard PCI slots. This would take only a slight amount more work. 3) auto-add a pcie-root-port to each port of the pcie-root controller. This would still leave us with PCIe ports, so we would need to teach libvirt that it's okay to plug PCI devices into PCIe ports.
As mentioned above, I'm not sure this is a good idea. Maybe I'm just afraid of my own shadow though :)
If we don't require hot-pluggability, then we can just teach the address-assignment code that PCI devices can plug into non-hotpluggable PCIe ports and we're done. Or we can do a hybrid that's kind of a continuation of the "use PCI if it's available, otherwise mmio" - we could do this: A) If there are any standard PCI slots, then auto-assign to PCI slots (creating new pci-bridge controllers as necessary) B) else if there are any PCIe slots, then auto-assign to hot-pluggable PCIe if available, or straight PCIe if not. C) else use virtio-mmio. ------------------------------------------- Mixed in with all of this discussion is my thinking that we should have some way to specify, in XML, constraints for the address of each device *without specifying the address itself*. Things we need to be able to specify: 1) Is the device PCI-only vs. PCIe-only vs. either one (maybe this could be used in the future to constrain to virtio-mmio as well)? 2) Must the device be hot-pluggable? (default would be yes) 3) guest-side NUMA node? (I'm not sure if this needs to be user specifiable - in the case of a vfio-assigned device, I think all we need to do is inform the guest which NUMA node the device is on in the host (via putting it on a PXB controller that is configured with that same NUMA node number). For emulated devices - is there any use to putting an *emulated* device on the same controller as a particular vfio-assigned device that is on a specific node? If not, then maybe it will never matter). It would be better if these "address constraints" were in a different part of the XML than the <address> element itself - this would maintain the simplicity of being able to just remove all <address> elements in order to force libvirt to re-assign all device addresses. This isn't something that needs doing immediately, but worth keeping in mind while putting together something that works for aarch64.
Instead this patch uses hacks to make pci-root use the pcie.0 bus for aarch64, since that code path already works.
I think that's a dead-end that we would have to back-track on, so probably not a good solution even temporarily.
Here's an attempt at a plan: 1) change the PCI address assignment code so that for aarch64/virt it prefers PCIe addresses, but still requires hot-pluggable (currently it almost always prefers PCI, and requires hot-pluggable). (alternate - if aarch64 doesn't support pcie-root-port or pcie-switch-*-port, then don't require hot-pluggable either). 2) put something on the front of that that checks for the existence of pcie-root, and if it's not found, uses virtio-mmio instead (is there something already that auto-adds the virtio-mmio address? I haven't looked and am too lazy to do so now). At this point, as long as you manually add a bunch of pcie-root-port controllers along with the manual pcie-root, everything should just work. Then we would go to step 3: 3) enhance the auto-assign code so that, in addition to auto-adding a pci-bridge when needed, it would auto-add either a single pcie-root-port or a pcie-switch-upstream-port and 32 pcie-switch-downstream-ports anytime a hotpluggable PCIe port was needed and couldn't be found. (the latter assumes that aarch64 supports those controllers). Does that make any sense? I could try to code some of this up if you could test it (or help me get set up to test it myself).
I'm not sure I fully understand all of the above, but I'll pitch in with my own proposal regardless :)

First, we make sure that

  <controller type='pci' index='0' model='pcie-root'/>

is always added automatically to the domain XML when using the mach-virt machine type. Then, if

  <controller type='pci' index='1' model='dmi-to-pci-bridge'/>
  <controller type='pci' index='2' model='pci-bridge'/>

is present as well we default to virtio-pci, otherwise we use the current default of virtio-mmio. This should allow management applications, based on knowledge about the guest OS, to easily pick between the two address schemes.

Does this sound like a good idea?

Cheers.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team

On 02/26/2016 10:13 AM, Andrea Bolognani wrote:
On Wed, 2016-02-17 at 15:03 -0500, Laine Stump wrote:
On 01/28/2016 04:14 PM, Cole Robinson wrote:
If a user manually specifies this XML snippet for aarch64 machvirt:
<controller type='pci' index='0' model='pci-root'/>
As you've noted below, this isn't correct. aarch64 machvirt has no implicit pci-root controller (aka "pci.0"). It instead has a pcie-root controller ("pcie.0"). Since a pci[e]-root controller cannot be explicitly added, by definition this couldn't work.
Libvirt will interpret this to mean that the OS supports virtio-pci, and will allocate PCI addresses (instead of virtio-mmio) for virtio devices.
This is a giant hack. Trying to improve it led me into the maze of PCI address code and I gave up for now. Here are the issues:
* I'd prefer that to be model='pcie-root' which matches what qemu-system-aarch64 -M virt actually provides by default... however libvirt isn't happy with a single pcie-root specified by the user, it will error with:
error: unsupported configuration: failed to create PCI bridge on bus 1: too many devices with fixed addresses
That's not the right error, but it's caused by the fact that libvirt wants the pci-bridge device to be plugged into a standard PCI slot, but all the slots of pcie-root are PCIe slots. Since we now know that qemu doesn't mind if any standard PCI device is plugged into a PCIe slot, Should we rely on this behavior? Isn't this something that might change in the future? Or at least be quite puzzling for users?
Just thinking out loud :)
It was my identical thinking that led to libvirt being initially very strict about plugging PCI into PCI and PCIe into PCIe. I've since received reasonable assurances that qemu will continue to be permissive about plugging PCI things into PCIe, so I allow it, but still default to "purity".
the decision of how we want to solve this problem depends on whether or not we want the devices in question to be hot-pluggable - the ports of pcie-root do not support hot-plugging devices (at least on Q35), while the ports on pci-bridge do. So if we require that all devices be hot-pluggable, then we have a few choices:
1) create the same PCI controller Frankenstein we currently have for Q35 - a dmi-to-pci-bridge plugged into pcie-root, and a pci-bridge plugged into dmi-to-pci-bridge. This is easiest because it already works, but it does create an extra unnecessary controller. This is the current situation, right?
qemu-kvm in current aarch64 RHEL doesn't have the i82801b11-bridge device compiled in, by the way. However, since qemu-system-aarch64 in Fedora 23 *does* have it, I assume enabling it would simply be a matter of flipping a build configuration bit.
2) auto-add a pci-bridge in cases when there is a pcie-root but no standard PCI slots. This would take only a slight amount more work.
3) auto-add a pcie-root-port to each port of the pcie-root controller. This would still leave us with PCIe ports, so we would need to teach libvirt that it's okay to plug PCI devices into PCIe ports. As mentioned above, I'm not sure this is a good idea. Maybe I'm just afraid of my own shadow though :)
If we don't require hot-pluggability, then we can just teach the address-assignment code that PCI devices can plug into non-hotpluggable PCIe ports and we're done.
Or we can do a hybrid that's kind of a continuation of the "use PCI if it's available, otherwise mmio" - we could do this:
A) If there are any standard PCI slots, then auto-assign to PCI slots (creating new pci-bridge controllers as necessary)
B) else if there are any PCIe slots, then auto-assign to hot-pluggable PCIe if available, or straight PCIe if not.
C) else use virtio-mmio.
-------------------------------------------
Mixed in with all of this discussion is my thinking that we should have some way to specify, in XML, constraints for the address of each device *without specifying the address itself*. Things we need to be able to specify:
1) Is the device PCI-only vs. PCIe-only vs. either one (maybe this could be used in the future to constrain to virtio-mmio as well)?
2) Must the device be hot-pluggable? (default would be yes)
3) guest-side NUMA node? (I'm not sure if this needs to be user specifiable - in the case of a vfio-assigned device, I think all we need to do is inform the guest which NUMA node the device is on in the host (via putting it on a PXB controller that is configured with that same NUMA node number). For emulated devices - is there any use to putting an *emulated* device on the same controller as a particular vfio-assigned device that is on a specific node? If not, then maybe it will never matter).
It would be better if these "address constraints" were in a different part of the XML than the <address> element itself - this would maintain the simplicity of being able to just remove all <address> elements in order to force libvirt to re-assign all device addresses.
This isn't something that needs doing immediately, but worth keeping in mind while putting together something that works for aarch64.
Instead this patch uses hacks to make pci-root use the pcie.0 bus for aarch64, since that code path already works.
I think that's a dead-end that we would have to back-track on, so probably not a good solution even temporarily.
Here's an attempt at a plan:
1) change the PCI address assignment code so that for aarch64/virt it prefers PCIe addresses, but still requires hot-pluggable (currently it almost always prefers PCI, and requires hot-pluggable). (alternate - if aarch64 doesn't support pcie-root-port or pcie-switch-*-port, then don't require hot-pluggable either).
2) put something on the front of that that checks for existence of pcie-root, and if it's not found, uses virtio-mmio instead (is there something already that auto-adds the virtio-mmio address? I haven't looked and am too lazy to do so now).
At this point, as long as you manually add a bunch of pcie-root-port controllers along with the manual pcie-root, everything should just work. Then we would go to step 3:
3) enhance the auto-assign code so that, in addition to auto-adding a pci-bridge when needed, it would auto-add either a single pcie-root-port or a pcie-switch-upstream-port and 32 pcie-switch-downstream-ports anytime a hotpluggable PCIe port was needed and couldn't be found. (the latter assumes that aarch64 supports those controllers).
Does that make any sense? I could try to code some of this up if you could test it (or help me get set up to test it myself). I'm not sure I fully understand all of the above, but I'll pitch in with my own proposal regardless :)
First, we make sure that
<controller type='pci' index='0' model='pcie-root'/>
is always added automatically to the domain XML when using the mach-virt machine type. Then, if
<controller type='pci' index='1' model='dmi-to-pci-bridge'/> <controller type='pci' index='2' model='pci-bridge'/>
is present as well we default to virtio-pci, otherwise we use the current default of virtio-mmio. This should allow management applications, based on knowledge about the guest OS, to easily pick between the two address schemes.
Does this sound like a good idea?
... or a variation of that, anyway :-)

What I think: If there are *any* pci controllers *beyond* pcie-root, or if there are any devices that already have a PCI address, then assign PCI addresses, else use mmio.

To make this more useful, we will want some enhancements:

1) add a "hotpluggable" attribute to <target> for every device (this will be a bit ugly, because there isn't a unified parser/formatter for the <target> element :-( )

2) add a "busHint" ("busType", "preferredBus", ??) attribute too, to allow specifying pci vs pcie (is it worth adding "mmio" here? Seems it will be deprecated before long anyway...). It will default to pcie rather than pci on platforms that support it.

3) change the auto-assign to pay attention to hotpluggable and preferredBus (damn! I need to think of a better name!)

This way someone can define a new aarch64 domain and just toss a bunch of pcie-root-ports into it:

  <controller type='pci' model='pcie-root-port'/>
  <controller type='pci' model='pcie-root-port'/>
  <controller type='pci' model='pcie-root-port'/>
  <controller type='pci' model='pcie-root-port'/>
  ...

then add a bunch of devices. The pcie-root-ports will be auto-assigned to ports on pcie.0, and the devices will be auto-assigned to root-ports. (Alternately, they could toss in a bunch of pcie-downstream-switch-ports. These would trigger an auto-add of a pcie-upstream-switch-port, which would trigger an auto-add of a pcie-root-port, and those would all be connected.)

(Ooh! Or instead of hotpluggable and preferredBus in the devices' <target>, just "preferredConnection", which would be exactly the model of pci controller preferred. So you'd do something like this:

  <interface type='network'>
    <target preferredConnection='pcie-switch-downstream-port'/>
    ...
  </interface>

This would look for an existing pcie-switch-downstream-port. If it couldn't find one available, it would add one (and all the underlying controllers necessary). I'm just rambling now, so I'll stop.
But I think a good first step is the simple thing in the "What I think" paragraph.

On Fri, 2016-03-04 at 17:05 -0500, Laine Stump wrote:
I'm not sure I fully understand all of the above, but I'll pitch in with my own proposal regardless :) First, we make sure that <controller type='pci' index='0' model='pcie-root'/> is always added automatically to the domain XML when using the mach-virt machine type. Then, if <controller type='pci' index='1' model='dmi-to-pci-bridge'/> <controller type='pci' index='2' model='pci-bridge'/> is present as well we default to virtio-pci, otherwise we use the current default of virtio-mmio. This should allow management applications, based on knowledge about the guest OS, to easily pick between the two address schemes. Does this sound like a good idea? ... or a variation of that, anyway :-) What I think: If there are *any* pci controllers *beyond* pcie-root, or if there are any devices that already have a PCI address, then assign PCI addresses, else use mmio.
This sounds reasonable. However, I'm wondering if we shouldn't be even more explicit about this... From libvirt's point of view we just need to agree on some sort of "trigger" that causes it to allocate PCI addresses instead of MMIO addresses, and for that purpose "any PCI controller or any device with a PCI address" is perfectly fine; looking at that from the point of view of a user, though? Not sure about it.

What about adding something like

  <preferences>
    <preference name="defaultAddressType" value="pci"/>
  </preferences>

to the domain XML? AFAICT libvirt doesn't have any element that could be used for this purpose at the moment, but maybe having a generic way to set domain-wide preferences could be useful in other situations as well?
To make this more useful, we will want some enhancements: 1) add a "hotpluggable" attribute to <target> for every device (this will be a bit ugly, because there isn't a unified parser/formatter for the <target> element :-( ) 2) add a "busHint" ("busType", "preferredBus", ??) attribute too, to allow specifying pci vs pcie (is it worth adding "mmio" here? Seems it will be deprecated before long anyway...). It will default to pcie rather than pci on platforms that support it. 3) change the auto-assign to pay attention to hotpluggable and preferredBus (damn! I need to think of a better name!) This way someone can define a new aarch64 domain and just toss a bunch of pcie-root-ports into it: <controller type='pci' model='pcie-root-port'/> <controller type='pci' model='pcie-root-port'/> <controller type='pci' model='pcie-root-port'/> <controller type='pci' model='pcie-root-port'/> ... then add a bunch of devices. The pcie-root-ports will be auto-assigned to ports on pcie.0, and the devices will be auto-assigned to root-ports. (alternately, they could toss in a bunch of pcie-downstream-switch-ports. These would trigger an auto-add of a pcie-upstream-switch-port, which would trigger an auto-add of a pcie-root-port, and those would all be connected). (Ooh! Or instead of hotpluggable and preferredBus in the devices' <target>, just "preferredConnection" which would be exactly the model of pci controller preferred. So you'd do something like this: <interface type='network'> <target preferredConnection='pcie-switch-downstream-port'/> ... </interface> This would look for an existing pcie-switch-downstream-port. If it couldn't find one available, it would add one (and all the underlying controllers necessary).
I'm assuming these examples refer to the case of a guest starting up and not device hotplug, because neither pcie-root-port nor pcie-switch-{up,down}stream-port can be added at runtime.

Looks to me like having both 'hotpluggable' and 'preferredBus' would be more user-friendly, i.e. looking at the XML tells you whether you'll be able to hot-unplug a device without having to remember what a pcie-switch-downstream-port is.

This still doesn't help when it comes to providing PCI or PCIe slots suitable for hotplugging: if you want to have three PCIe devices at startup and still be able to hotplug up to two PCIe devices later on, my understanding is that you would need to have something like

  <!-- pretend this is repeated five times -->
  <controller type='pci' model='pcie-root-port'/>

or

  <controller type='pci' model='pcie-switch-upstream-port'/>
  <!-- pretend this is repeated five times -->
  <controller type='pci' model='pcie-switch-downstream-port'/>

Could this make use of a preferences mechanism such as the one hinted at above? For example

  <preferences>
    <preference name='pcieHotplugSlots' value='5'/>
    <preference name='pciHotplugSlots' value='0'/>
  </preferences>

would generate either of the above PCIe topologies, while

  <preferences>
    <preference name='pcieHotplugSlots' value='0'/>
    <preference name='pciHotplugSlots' value='5'/>
  </preferences>

would add a dmi-to-pci-bridge and a pci-bridge in order to present the hotplug-capable PCI slots to the guest.

Cheers.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team

On Wed, Mar 09, 2016 at 01:40:36PM +0100, Andrea Bolognani wrote:
On Fri, 2016-03-04 at 17:05 -0500, Laine Stump wrote:
I'm not sure I fully understand all of the above, but I'll pitch in with my own proposal regardless :) First, we make sure that <controller type='pci' index='0' model='pcie-root'/> is always added automatically to the domain XML when using the mach-virt machine type. Then, if <controller type='pci' index='1' model='dmi-to-pci-bridge'/> <controller type='pci' index='2' model='pci-bridge'/> is present as well we default to virtio-pci, otherwise we use the current default of virtio-mmio. This should allow management applications, based on knowledge about the guest OS, to easily pick between the two address schemes. Does this sound like a good idea? ... or a variation of that, anyway :-) What I think: If there are *any* pci controllers *beyond* pcie-root, or if there are any devices that already have a PCI address, then assign PCI addresses, else use mmio.
This sounds reasonable.
However, I'm wondering if we shouldn't be even more explicit about this... From libvirt's point of view we just need to agree on some sort of "trigger" that causes it to allocate PCI addresses instead of MMIO addresses, and for that purpose "any PCI controller or any device with a PCI address" is perfectly fine; looking at that from the point of view of a user, though? Not sure about it.
What about adding something like
<preferences> <preference name="defaultAddressType" value="pci"/> </preferences>
to the domain XML? AFAICT libvirt doesn't have any element that could be used for this purpose at the moment, but maybe having a generic way to set domain-wide preferences could be useful in other situations as well?
[snip]

Looking at this mail and laine's before I really get the impression we are over-thinking this all. The automatic address assignment by libvirt was written such that it matched what QEMU would have done, so that we could introduce the concept of device addressing without breaking existing apps which didn't supply addresses. The automatic addressing logic certainly wasn't ever intended to be able to implement arbitrary policies.

As a general rule libvirt has always aimed to stay out of the policy business, and focus on providing the mechanism only. So when we start talking about adding <preferences> to the XML this is a sure sign that we've gone too far into trying to implement policy in libvirt.

From the POV of being able to tell libvirt to assign PCI instead of MMIO addresses for the virt machine, I think it suffices to have libvirt accept a simple <address type="pci"/> element (ie with no bus/slot/func filled in). If applications care about assigning devices to specific PCI buses, ie to distinguish between PCI & PCIe, or to pick a different bus when there is one bus per NUMA node, then really it is time for the application to start specifying the full <address> element with all details, and not try to invent ever more complex policies inside libvirt which will never deal with all application use cases.

Regards,
Daniel

-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
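Concretely, the suggestion above would mean a device definition like the following sketch of the proposed (not yet implemented) form:

```xml
<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
  <!-- type given, bus/slot/function left for libvirt to fill in -->
  <address type='pci'/>
</interface>
```

where the bare address type acts as the trigger for PCI address assignment while leaving the actual slot choice to libvirt.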

On 03/09/2016 09:54 AM, Daniel P. Berrange wrote:
On Wed, Mar 09, 2016 at 01:40:36PM +0100, Andrea Bolognani wrote:
On Fri, 2016-03-04 at 17:05 -0500, Laine Stump wrote:
I'm not sure I fully understand all of the above, but I'll pitch in with my own proposal regardless :)
First, we make sure that
<controller type='pci' index='0' model='pcie-root'/>
is always added automatically to the domain XML when using the mach-virt machine type. Then, if
<controller type='pci' index='1' model='dmi-to-pci-bridge'/> <controller type='pci' index='2' model='pci-bridge'/>
is present as well we default to virtio-pci, otherwise we use the current default of virtio-mmio. This should allow management applications, based on knowledge about the guest OS, to easily pick between the two address schemes.
Does this sound like a good idea? ... or a variation of that, anyway :-)
What I think: If there are *any* pci controllers *beyond* pcie-root, or if there are any devices that already have a PCI address, then assign PCI addresses, else use mmio. This sounds reasonable.
However, I'm wondering if we shouldn't be even more explicit about this... From libvirt's point of view we just need to agree on some sort of "trigger" that causes it to allocate PCI addresses instead of MMIO addresses, and for that purpose "any PCI controller or any device with a PCI address" is perfectly fine; looking at that from the point of view of a user, though? Not sure about it.
What about adding something like
<preferences> <preference name="defaultAddressType" value="pci"/> </preferences>
to the domain XML? AFAICT libvirt doesn't have any element that could be used for this purpose at the moment, but maybe having a generic way to set domain-wide preferences could be useful in other situations as well? [snip]
Looking at this mail and laine's before I really get the impression we are over-thinking this all. The automatic address assignment by libvirt was written such that it matched what QEMU would have done, so that we could introduce the concept of device addressing without breaking existing apps which didn't supply addresses. The automatic addressing logic certainly wasn't ever intended to be able to implement arbitrary policies.
As a general rule libvirt has always aimed to stay out of the policy business, and focus on providing the mechanism only. So when we start talking about adding <preferences> to the XML this is a sure sign that we've gone too far into trying to implement policy in libvirt.
From the POV of being able to tell libvirt to assign PCI instead of MMIO addresses for the virt machine, I think it suffices to have libvirt accept a simple <address type="pci"/> element (ie with no bus/slot/func filled in).
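Concretely, that would mean accepting something like the following per-device sketch, where the bare address element requests auto-assignment of a PCI address without pinning a slot (illustrative only):

```xml
<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
  <!-- no domain/bus/slot/function attributes: libvirt would pick a
       free PCI slot instead of defaulting to virtio-mmio -->
  <address type='pci'/>
</interface>
```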
That's just a more limited case of the same type of feature creep though - you're specifying a preference for what type of address you want, just in a different place. The odd thing is that we're adding it only to remedy what is apparently a temporary problem, and even then it's a problem that wouldn't exist if we just said simply this: "If virtio-mmio is specified, then use virtio-mmio; if no address is specified, use pci". This puts the burden on people insisting on the old / soon-to-be-deprecated (?) virtio-mmio address type to add a little something to the XML (and a very simple little something at that!). For a short time, this could cause problems for people who create new domains using qemu binaries that don't have a pci bus on aarch64/virt yet, but that will be easily solvable (by adding "<address type='virtio-mmio'/>", which is nearly as simple as typing "<address type='pci'/>").

A downside of setting your preference with <address type="pci"/> vs. having the preference in a separate element is that you can no longer cause a "re-addressing" of all the devices by simply removing all the <address> elements (and it's really that that got me thinking about adding hints/preferences in the <target> element). I don't know how important that is; I suppose when the producer of the XML is some other software it's a non-issue, while when the producer is a human using virsh edit it would make life much simpler once in a while (and the suggestion in the previous paragraph would eliminate that problem anyway).

So is there a specific reason why we need to keep virtio-mmio as the default, and require explicitly saying "address type='pci'" in order to get a PCI address?
If applications care about assigning devices to specific PCI buses, i.e. to distinguish between PCI & PCIe, or to pick a different bus when there is one bus per NUMA node, then really it is time for the application to start specifying the full <address> element with all details, and not try to invent ever more complex policies inside libvirt which will never deal with all application use cases.
First, I agree with your vigilance against unnecessary feature creep, both because keeping things simple makes it easier to maintain and use, and also because it's very difficult (or even impossible) to change something once it's gone in, in case you have second thoughts.

But, expanding beyond just the aarch64/virt mmio vs. pci problem (which I had already done in this thread, and which is what got us into this "feature creep" discussion in the first place :-)...

I remember when we were first adding support for Q35 that we said it should be as easy to create a domain with the Q35 machine type as it is for 440fx, i.e. you should be able to do it without manually specifying any PCI addresses (that's why we have the strange default bus hierarchy, with a dmi-to-pci-bridge and a pci-bridge, and all devices going onto the pci-bridge). But of course 440fx is very simplistic - all slots are standard PCI and all are hotpluggable. On Q35 there could be room for choices though, in particular PCI vs. PCIe and hotpluggable vs. not (and yeah, also what NUMA node the bus is on; I agree that one is "getting out there"). So since there are choices, libvirt has by definition begun implementing policy when it auto-assigns *any* PCI address (the current policy is "require all auto-assigned device addresses to be hotpluggable standard PCI"). And if libvirt doesn't provide an easy way to choose, it will have implemented a de facto mandatory policy (unless you force full specification of PCI addresses, which goes against the "make it easy" goal).
And the current policy is looking (for Q35 and aarch64/virt at least) less and less like the correct one - in the years since it was put in, 1) we've learned that qemu doesn't care, and will not ever care, that a PCI device has been plugged into a PCIe slot, 2) we now have support for several PCIe controllers we didn't previously have, 3) people have started complaining that their devices are in PCI slots rather than PCIe, 4) even though it was stated at the time I wrote the code to auto-add a pci-bridge to every Q35 domain that pci-bridge's failure to support hotplug was a temporary bug, I just learned yesterday that it apparently still doesn't work, and 5) some platforms (e.g. our favorite aarch64/virt) are emulating a hardware platform that has *never* had standard PCI slots, only PCIe.

Beyond that, there is no place that provides a simple encoding of which type of controller provides which type of slot, and what is allowed to plug into what. If you require the management application/human to manually specify all the PCI addresses as soon as they have a need for one of these basic characteristics, then not only has it become cumbersome to define a domain (because the management app has to maintain a data structure to keep track of which PCI addresses are in use and which aren't), but it means that the management application also needs to know all sorts of rules about which PCI controllers are actually pcie vs. pci, and which accept hotplug devices vs. which don't, as well as things like the min/max slot number for each controller, and which ones can plug into where, e.g. a pcie-root-port can only plug into pcie-root or pcie-expander-bus, and a pcie-switch-downstream-port can only plug into a pcie-switch-upstream-port, etc. Requiring a management app to get all of that right just so that they can pick between a hotpluggable and non-hotpluggable slot seems like an overly large burden (and prone to error).
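As a sketch of the nesting rules mentioned above (addresses chosen for illustration, not taken from the patches), a valid PCIe hierarchy has to be built up controller by controller:

```xml
<controller type='pci' index='0' model='pcie-root'/>
<!-- a pcie-root-port may only plug into pcie-root (or pcie-expander-bus) -->
<controller type='pci' index='1' model='pcie-root-port'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
</controller>
<!-- a switch upstream port plugs into a root port... -->
<controller type='pci' index='2' model='pcie-switch-upstream-port'>
  <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</controller>
<!-- ...and downstream ports may only plug into an upstream port -->
<controller type='pci' index='3' model='pcie-switch-downstream-port'>
  <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
</controller>
```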
In the end, if libvirt makes it simple for the management app to specify what kind of slot it wants, rather than requiring it to specify the exact slot, then libvirt isn't making any policy decisions, it's just making it easier for the management app to implement its own policy decisions, without requiring the management app to know all the details about which controller implements what kind of connection etc, and that does seem to fit within libvirt's purpose.

On Wed, Mar 09, 2016 at 02:09:22PM -0500, Laine Stump wrote:
On 03/09/2016 09:54 AM, Daniel P. Berrange wrote:
From the POV of being able to tell libvirt to assign PCI instead of MMIO addresses for the virt machine, I think it suffices to have libvirt accept a simple <address type="pci"/> element (ie with no bus/slot/func filled in).
That's just a more limited case of the same type of feature creep though - you're specifying a preference for what type of address you want, just in a different place. (The odd thing is that we're adding it only to remedy what is apparently a temporary problem, and even then it's a problem that wouldn't exist if we just said simply this: "If virtio-mmio is specified, then use virtio-mmio; If no address is specified, use pci". This puts the burden on people insisting on the old / soon-to-be-deprecated (?) virtio-mmio address type to add a little something to the XML (and a very simple little something at that!). For a short time, this could cause problems for people who create new domains using qemu binaries that don't have a pci bus on aarch64/virt yet, but that will be easily solvable (by adding "<address type='virtio-mmio'/>", which is nearly as simple as typing "<address type='pci'/>").
So, I've just seen that QEMU has decided that as of QEMU 2.6, the virt machine type will start to be versioned. This is quite convenient I think as it gives us a nice thing to hook on. ie if we see the non-versioned machine type 'virt' then we use virtio-mmio addressing, however, if we see a versioned virt-X.Y.Z machine type, then we can assume pci by default.

Since the long term plan for AArch64 is to use PCI for everything, this gives us nice default behaviour from this point onwards, while not breaking compatibility for existing early adopters.

Of course people with "legacy" mmio-only guests will still have a little pain to run them on new QEMU, but honestly I think that's worth it since it will avoid us long term pain in the world where aarch64 uses pci for everything.

Regards, Daniel

--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
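The distinction above hinges on the machine string alone; a sketch of the two cases (the exact versioned name depends on the QEMU release - 'virt-2.6' here is illustrative):

```xml
<!-- unversioned: keep the existing virtio-mmio default for compatibility -->
<type arch='aarch64' machine='virt'>hvm</type>

<!-- versioned: new enough to assume PCI addressing by default -->
<type arch='aarch64' machine='virt-2.6'>hvm</type>
```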

On Thu, 2016-03-10 at 09:56 +0000, Daniel P. Berrange wrote:
So, I've just seen that QEMU has decided that as of QEMU 2.6, the virt machine type will start to be versioned. This is quite convenient I think as it gives us a nice thing to hook on. ie we see a non-versioned machine type of 'virt' then we use virtio-mmio addressing, however, if we see a versioned virt-X.Y.Z machine type, then we can assume pci by default.
Since the long term plan for AArch64 is to use PCI for everything, this gives us nice default behaviour from this point onwards, while not breaking compatibility for existing early adopters.
Of course people with "legacy" mmio-only guests will still have a little pain to run them on new QEMU, but honestly I think that's worth it since it will avoid us long term pain in the world where aarch64 uses pci for everything
I think it's way too early to flip the switch and default to PCI addresses: my understanding is that guest OS support is expected to be spotty at best for at least a couple more years.

Cheers.

--
Andrea Bolognani
Software Engineer - Virtualization Team

On Thu, Mar 10, 2016 at 02:34:02PM +0100, Andrea Bolognani wrote:
On Thu, 2016-03-10 at 09:56 +0000, Daniel P. Berrange wrote:
So, I've just seen that QEMU has decided that as of QEMU 2.6, the virt machine type will start to be versioned. This is quite convenient I think as it gives us a nice thing to hook on. ie we see a non-versioned machine type of 'virt' then we use virtio-mmio addressing, however, if we see a versioned virt-X.Y.Z machine type, then we can assume pci by default.
Since the long term plan for AArch64 is to use PCI for everything, this gives us nice default behaviour from this point onwards, while not breaking compatibility for existing early adopters.
Of course people with "legacy" mmio-only guests will still have a little pain to run them on new QEMU, but honestly I think that's worth it since it will avoid us long term pain in the world where aarch64 uses pci for everything
I think it's way too early to flip the switch and default to PCI addresses: my understanding is that guest OS support is expected to be spotty at best for at least a couple more years.
I don't buy that. For a start the only virtio-mmio and/or pci devices we are giving to guests are virtio devices (blk, net, balloon, etc). The only current guests supporting these devices are Linux, and Linux has support for PCI on aarch64. So any Linux distro that ships today is going to support PCI. Other guests (*BSD) which may choose to support aarch64 in future would likely go straight to PCI and not bother with virtio-mmio. Likewise if we did ever get a Windows guest on aarch64 with win-virtio drivers I would expect them to be PCI based.

So AFAICT, the only distros which are going to be using virtio-mmio are the Linux aarch64 early-access/tech-preview releases. I don't really see them as an important target. Even if we switch to PCI by default, I'd still expect virt-manager to be capable of being told to override the default and assign mmio addresses instead.

Regards, Daniel

On 03/10/2016 08:40 AM, Daniel P. Berrange wrote:
On Thu, Mar 10, 2016 at 02:34:02PM +0100, Andrea Bolognani wrote:
On Thu, 2016-03-10 at 09:56 +0000, Daniel P. Berrange wrote:
So, I've just seen that QEMU has decided that as of QEMU 2.6, the virt machine type will start to be versioned. This is quite convenient I think as it gives us a nice thing to hook on. ie we see a non-versioned machine type of 'virt' then we use virtio-mmio addressing, however, if we see a versioned virt-X.Y.Z machine type, then we can assume pci by default.
Since the long term plan for AArch64 is to use PCI for everything, this gives us nice default behaviour from this point onwards, while not breaking compatibility for existing early adopters.
Of course people with "legacy" mmio-only guests will still have a little pain to run them on new QEMU, but honestly I think that's worth it since it will avoid us long term pain in the world where aarch64 uses pci for everything
I think it's way too early to flip the switch and default to PCI addresses: my understanding is that guest OS support is expected to be spotty at best for at least a couple more years.
I don't buy that. For a start the only virtio-mmio and/or pci devices we are giving to guests are virtio devices (blk, net, balloon, etc). The only current guests supporting these devices are Linux, and Linux has support for PCI on aarch64. So any Linux distro that ships today is going to support PCI. Other guests (*BSD) which may choose to support aarch64 in future would likely go straight to PCI and not bother with virtio-mmio. Likewise if we did ever get a Windows guest on aarch64 with win-virtio drivers I would expect them to be PCI based.
So AFAICT, the only distros which are going to be using virtio-mmio are the Linux aarch64 early-access/tech-preview releases. I don't really see them as an important target. Even if we switch to PCI by default, I'd still expect virt-manager to be capable of being told to override the default and assign mmio addresses instead.
It's not that simple. The kernel supports the devices fine, but there are other bits in the chain. For example Fedora 23 + AAVMF does not work with virtio-pci... I don't understand the specifics but it's likely to do with the as-yet-not-fully-upstreamed aarch64 ACPI support that we have in RHELSA. I don't think any released distro works out of the box with AAVMF + virtio-pci. CCing Drew, who can maybe shed more light on the specifics.

That said, maybe the simpler thing is to do what you suggest: default to virtio-pci for versioned virt-*, then teach tools to force <address type='virtio-mmio'/> via the XML until we know distros can support it. That means some difficult transition time for the tools in the short to medium term, since for example an older virt-manager (which doesn't explicitly specify virtio-mmio) will generate an incorrect XML config for Fedora 23 with a new libvirt (which defaults to virtio-pci). But maybe it's just best to get it out of the way now...

- Cole
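The per-device opt-out mentioned above would look something like this (a sketch; the explicit address type pins the legacy transport even if the machine type's default flips to virtio-pci):

```xml
<disk type='file' device='disk'>
  <source file='/var/lib/libvirt/images/guest.img'/>
  <target dev='vda' bus='virtio'/>
  <!-- explicitly request the legacy mmio transport -->
  <address type='virtio-mmio'/>
</disk>
```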

On 03/09/2016 02:09 PM, Laine Stump wrote:
On 03/09/2016 09:54 AM, Daniel P. Berrange wrote:
On Wed, Mar 09, 2016 at 01:40:36PM +0100, Andrea Bolognani wrote:
On Fri, 2016-03-04 at 17:05 -0500, Laine Stump wrote:
I'm not sure I fully understand all of the above, but I'll pitch in with my own proposal regardless :)

First, we make sure that

<controller type='pci' index='0' model='pcie-root'/>

is always added automatically to the domain XML when using the mach-virt machine type. Then, if

<controller type='pci' index='1' model='dmi-to-pci-bridge'/>
<controller type='pci' index='2' model='pci-bridge'/>

is present as well we default to virtio-pci, otherwise we use the current default of virtio-mmio. This should allow management applications, based on knowledge about the guest OS, to easily pick between the two address schemes.

Does this sound like a good idea?

... or a variation of that, anyway :-)

What I think: If there are *any* pci controllers *beyond* pcie-root, or if there are any devices that already have a PCI address, then assign PCI addresses, else use mmio. This sounds reasonable.
However, I'm wondering if we shouldn't be even more explicit about this... From libvirt's point of view we just need to agree on some sort of "trigger" that causes it to allocate PCI addresses instead of MMIO addresses, and for that purpose "any PCI controller or any device with a PCI address" is perfectly fine; looking at that from the point of view of a user, though? Not sure about it.
What about adding something like
<preferences>
  <preference name="defaultAddressType" value="pci"/>
</preferences>
to the domain XML? AFAICT libvirt doesn't have any element that could be used for this purpose at the moment, but maybe having a generic way to set domain-wide preferences could be useful in other situations as well? [snip]
Looking at this mail and Laine's before I really get the impression we are over-thinking this all. The automatic address assignment by libvirt was written such that it matched what QEMU would have done, so that we could introduce the concept of device addressing without breaking existing apps which didn't supply addresses. The automatic addressing logic certainly wasn't ever intended to be able to implement arbitrary policies.
As a general rule libvirt has always aimed to stay out of the policy business, and focus on providing the mechanism only. So when we start talking about adding <preferences> to the XML this is a sure sign that we've gone too far into trying to implement policy in libvirt.
From the POV of being able to tell libvirt to assign PCI instead of MMIO addresses for the virt machine, I think it suffices to have libvirt accept a simple <address type="pci"/> element (ie with no bus/slot/func filled in).
That's just a more limited case of the same type of feature creep though - you're specifying a preference for what type of address you want, just in a different place. The odd thing is that we're adding it only to remedy what is apparently a temporary problem, and even then it's a problem that wouldn't exist if we just said simply this: "If virtio-mmio is specified, then use virtio-mmio; if no address is specified, use pci". This puts the burden on people insisting on the old / soon-to-be-deprecated (?) virtio-mmio address type to add a little something to the XML (and a very simple little something at that!). For a short time, this could cause problems for people who create new domains using qemu binaries that don't have a pci bus on aarch64/virt yet, but that will be easily solvable (by adding "<address type='virtio-mmio'/>", which is nearly as simple as typing "<address type='pci'/>").
A downside of setting your preference with <address type="pci"/> vs. having the preference in a separate element is that you can no longer cause a "re-addressing" of all the devices by simply removing all the <address> elements (and it's really that that got me thinking about adding hints/preferences in the <target> element). I don't know how important that is; I suppose when the producer of the XML is some other software it's a non-issue, while when the producer is a human using virsh edit it would make life much simpler once in a while (and the suggestion in the previous paragraph would eliminate that problem anyway).
So is there a specific reason why we need to keep virtio-mmio as the default, and require explicitly saying "address type='pci'" in order to get a PCI address?
I explained this in my reply to Dan just now, but basically no currently released distro works with virtio-pci. However, long term I've suggested from the start that we switch to virtio-pci by default, keying off a new enough -M virt version as an arbitrary marker in time when hopefully most common distros work with virtio-pci.
If applications care about assigning devices to specific PCI buses, i.e. to distinguish between PCI & PCIe, or to pick a different bus when there is one bus per NUMA node, then really it is time for the application to start specifying the full <address> element with all details, and not try to invent ever more complex policies inside libvirt which will never deal with all application use cases.
First, I agree with your vigilance against unnecessary feature creep. Both because keeping things simple makes it easier to maintain and use, and also because it's very difficult (or even impossible) to change something once it's gone in, in the case that you have second thoughts.
But, expanding beyond just the aarch64/virt mmio vs. pci problem (which I had already done in this thread, and which is what got us into this "feature creep" discussion in the first place :-)...
I remember when we were first adding support for Q35 that we said it should be as easy to create a domain with the Q35 machine type as it is for 440fx, i.e. you should be able to do it without manually specifying any PCI addresses (that's why we have the strange default bus hierarchy, with a dmi-to-pci-bridge and a pci-bridge, and all devices going onto the pci-bridge). But of course 440fx is very simplistic - all slots are standard PCI and all are hotpluggable. On Q35 there could be room for choices though, in particular PCI vs. PCIe and hotpluggable vs. not (and yeah, also what NUMA node the bus is on; I agree that one is "getting out there"). So since there are choices, libvirt has by definition begun implementing policy when it auto-assigns *any* PCI address (the current policy is "require all auto-assigned device addresses to be hotpluggable standard PCI"). And if libvirt doesn't provide an easy way to choose, it will have implemented a de facto mandatory policy (unless you force full specification of PCI addresses, which goes against the "make it easy" goal).
And the current policy is looking (for Q35 and aarch64/virt at least) less and less like the correct one - in the years since it was put in, 1) we've learned that qemu doesn't care, and will not ever care, that a PCI device has been plugged into a PCIe slot, 2) we now have support for several PCIe controllers we didn't previously have, 3) people have started complaining that their devices are in PCI slots rather than PCIe, 4) even though it was stated at the time I wrote the code to auto-add a pci-bridge to every Q35 domain that pci-bridge's failure to support hotplug was a temporary bug, I just learned yesterday that it apparently still doesn't work, and 5) some platforms (e.g. our favorite aarch64/virt) are emulating a hardware platform that has *never* had standard PCI slots, only PCIe.
Beyond that, there is no place that provides a simple encoding of which type of controller provides which type of slot, and what is allowed to plug into what. If you require the management application/human to manually specify all the PCI addresses as soon as they have a need for one of these basic characteristics, then not only has it become cumbersome to define a domain (because the management app has to maintain a data structure to keep track of which PCI addresses are in use and which aren't), but it means that the management application also needs to know all sorts of rules about which PCI controllers are actually pcie vs. pci, and which accept hotplug devices vs. which don't, as well as things like the min/max slot number for each controller, and which ones can plug into where, e.g. a pcie-root-port can only plug into pcie-root or pcie-expander-bus, and a pcie-switch-downstream-port can only plug into a pcie-switch-upstream-port, etc. Requiring a management app to get all of that right just so that they can pick between a hotpluggable and non-hotpluggable slot seems like an overly large burden (and prone to error).
In the end, if libvirt makes it simple for the management app to specify what kind of slot it wants, rather than requiring it to specify the exact slot, then libvirt isn't making any policy decisions, it's just making it easier for the management app to implement its own policy decisions, without requiring the management app to know all the details about which controller implements what kind of connection etc, and that does seem to fit within libvirt's purpose.
I guess the main thing would be to understand use cases... outside of aarch64, is there a real reason for q35 that apps might need one address policy over another? If it's something super obscure, like only for testing or dev, then I think asking people to do it manually is fine.

At least for the arm case I think we can side-step the question by adding the <address type='pci'/> allocation request, and flipping the default from virtio-mmio to virtio-pci at some future point.

- Cole
participants (4):
- Andrea Bolognani
- Cole Robinson
- Daniel P. Berrange
- Laine Stump