Re: [libvirt] [PATCH] qemu: Permit PCI-free aarch64 mach-virt guests

17 Jun 2016

On Fri, 2016-06-17 at 11:36 -0400, Laine Stump wrote:
...
On 06/17/2016 08:43 AM, Andrea Bolognani wrote:
...
There has been some progress lately in enabling virtio-pci on
aarch64 guests; however, guest OS support is still spotty at best,
so most guests are going to be using virtio-mmio instead.
 
Currently, mach-virt guests are closely modeled after q35 guests,
and that includes always adding a dmi-to-pci-bridge that's just
impossible to get rid of.
 
Yeah. Sorry my simple patches from yesterday didn't get you as far as 
you needed to go.
That's quite all right, you did a bunch of work and I merely
walked the last mile :)
...
...
            * other than the pcie-root. This is so that there will be hot-pluggable
-         * PCI slots available
+         * PCI slots available.
+         *
+         * We skip this step for aarch64 mach-virt guests, where we want to
+         * be able to have a pure virtio-mmio topology
            */
           if (virDomainControllerFind(def, VIR_DOMAIN_CONTROLLER_TYPE_PCI, 1) < 0 &&
+            !qemuDomainMachineIsVirt(def) &&
 
You're assuming that the only virt* machinetypes will be aarch64, which 
may be reasonable now, but not in the future (periodically someone from 
qemu will mention the idea of a "virt" machinetype for x86, which is 
legacy-free and accepts only virtio devices). Wouldn't a more specific 
comparison be better here (and in the other places in this patch)?
I'll reply to this one in the sub-thread Martin started.
...
...
               !virDomainDefAddController(def, VIR_DOMAIN_CONTROLLER_TYPE_PCI, 1,
                                          VIR_DOMAIN_CONTROLLER_MODEL_DMI_TO_PCI_BRIDGE))
               goto cleanup;
 
Okay, so pcie-root is still always "there", which isn't a problem since 
the presence of pcie-root in the config doesn't actually change anything 
in the qemu commandline or resulting virtual machine (it's a default 
device in qemu that can't be removed).
Exactly, that's why it *has* to be in the domain XML. Since
it can't be removed or disabled, it should always be displayed
to the user.
...
...

diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c
index ca3adc0..883264a 100644
--- a/src/qemu/qemu_domain_address.c
+++ b/src/qemu/qemu_domain_address.c
@@ -1483,9 +1483,15 @@ qemuDomainAssignPCIAddresses(virDomainDefPtr def,
           }
   
           /* Reserve 1 extra slot for a (potential) bridge only if buses
-         * are not fully reserved yet
+         * are not fully reserved yet.
+         *
+         * We don't reserve the extra slot for aarch64 mach-virt guests
+         * either because we want to be able to have pure virtio-mmio
+         * guests, and reserving this slot would force us to add at least
+         * a dmi-to-pci-bridge to an otherwise PCI-free topology
            */
           if (!buses_reserved &&
+            !qemuDomainMachineIsVirt(def) &&
               virDomainPCIAddressReserveNextSlot(addrs, &info, flags) < 0)
               goto cleanup;
 
You could save some time by just putting the whole thing from "for (i = 
0; i < addrs->nbuses; i++)" down to that "goto cleanup" inside an "if 
(!qemuDomainMachineIsVirt(def)) { ... }". (maybe replacing 
qemuDomainMachineIsVirt() with something more specific, as noted above).
I didn't change that because it would have added one level
of nesting to the code, and qemuDomainPCIBusFullyReserved()
should be fairly cheap anyway. We can always revisit it
later.
...
As we've discussed in IRC, eventually there should be a way to disable 
this for *every* platform, not just aarch64/virt. Possibly an 
"openPCISlots" attribute somewhere in the domain that tells how many 
slots should be left open for hotplug (with default being 1 for 
x86/440fx, and up for discussion on other platforms).
Agreed. We're not quite there yet, unfortunately, so for now
this mach-virt specific fix will have to do :)
...
...
         <target dev='sda' bus='scsi'/>
         <address type='drive' controller='0' bus='0' target='0' unit='0'/>
       </disk>
+    <controller type='pci' index='0' model='pcie-root'/>
 
This surprised me for a minute, since you're now explicitly adding 
pcie-root even though it's still implicitly added for you (and, for 
example, qemuxml2argvtest completes successfully without it). But I 
guess it doesn't hurt anything to have it in the file, as it makes it 
obvious that the dmi-to-pci-bridge isn't just being added on top of nothing.
Yeah, that was the pretty much my reasoning :)
...
(Hopefully all of this will soon be a thing of the past - for *all* 
arches/machinetypes if you put in a device with <address type='pci'/> 
(or a device that can only be connected via PCI), then all the necessary 
pci controllers should just magically appear, otherwise not).
I think we're all looking forward to that brighter day! :)
...
ACK.
Thanks, I've pushed it.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team