On 12/07/2015 10:37 AM, Cole Robinson wrote:
On 12/07/2015 07:27 AM, Daniel P. Berrange wrote:
> On Sun, Dec 06, 2015 at 09:46:56PM -0500, Cole Robinson wrote:
>> Hi all,
>>
>> I'm trying to figure out how apps should request virtio-pci for libvirt +
>> qemu + arm/aarch64. Let me provide some background.
>>
>> qemu's arm/aarch64 original virtio support is via virtio-mmio, libvirt XML
>> <address type='virtio-mmio'/>. Currently this is what libvirt sets as the
>> address default for all arm/aarch64 virtio devices in the XML. Long term
>> though all arm virt will likely be using virtio-pci: it's faster, enables
>> hotplug, is more x86 like, etc.
>>
>> Support for virtio-pci is newer and not as widespread. qemu has had the
>> necessary support since 2.4 at least, but the guest side isn't well
>> distributed yet. For example, Fedora 23 and earlier don't work out of the
>> box with virtio-pci. Internal RHELSA (RHEL Server for Aarch64) builds have
>> it recently working AFAIK.
>>
>> Libvirt has some support for enabling virtio-pci with aarch64, commits added
>> by Pavel Fedin in v1.2.19. (See e8d55172544c1fafe31a9e09346bdebca4f0d6f9). The
>> patches add a PCIe controller automatically to the XML (and qemu commandline)
>> if qemu-system-aarch64 supports it. However virtio-mmio is still used as the
>> default virtio address, given the current lack of OS support.
>>
>> So we are at the point where libvirt apps want to enable this, but presently
>> there isn't a good solution; the only option is to fully allocate <address
>> type='pci' ...> for each virtio device in the XML. This is suboptimal for 2
>> reasons:
>>
>> #1) apps need to duplicate libvirt's non-trivial address type=pci allocation
>> logic
>>
>> #2) apps have to add an <address> block for every virtio device, which is
>> less friendly than the x86 case where this is rarely required. Any XML device
>> snippets that work for x86 likely won't give the desired result for aarch64,
>> since they will default to virtio-mmio. Think virsh attach-device/attach-disk
>> commands
> Yeah this is very undesirable for a default out of the box config - we should
> always strive to "do the best thing" when no address is given.
>
>> Here are some possible solutions:
>>
>> * Drop the current behavior of adding a PCIe controller unconditionally, and
>> instead require apps to specify it in the XML. Then, if libvirt sees a PCIe
>> controller in the XML, default the virtio address type to pci. Apps will know
>> if the OS they are installing supports virtio-pci (eventually via libosinfo),
>> so this is the way we can implicitly ask libvirt 'allocate us pci addresses'
> Yes, clearly we need to record in libosinfo whether an OS can do PCI vs
> MMIO.
>
>> Upsides:
>> - Solves both the stated problems.
>> - Simplest addition for applications IMO
>>
>> Downsides:
>> - Requires a libvirt behavior change, no longer adding the PCIe controller by
>> default. But in practice I don't think it will really affect anyone, since
>> there isn't really any OS support for virtio-pci yet, and no apps support it
>> either AFAIK.
>> - The PCIe controller is not strictly about virtio-pci, it's for enabling
>> plain emulated PCI devices as well. So there is a use case for using the PCIe
>> controller for a graphics card even while your OS doesn't yet support
>> virtio-pci. In the big picture though this is a small time window with current
>> OS, and users can work around it by manually requesting <address
>> type='virtio-mmio'/>, so medium/long term this isn't a big deal IMO
>> - The PCIe controller XML is:
>> <controller type='pci' index='0' model='pcie-root'/>
>> <controller type='pci' index='1' model='dmi-to-pci-bridge'/>
>> <controller type='pci' index='2' model='pci-bridge'/>
>> I have no idea if that's always going to be the expected XML, maybe it's not
>> wise to hardcode that in apps. Laine?
That was only intended for the Q35 machinetype (but somehow all of them
got turned on by a patch to add pcie-root to aarch64 virt machinetypes).
pcie-root is included in the hardware by qemu on Q35, and can't be
removed. dmi-to-pci-bridge translates from the PCIe ports of pcie-root
to standard pci ports (but non-hotpluggable), and pci-bridge converts
from non-hotpluggable PCI to hotpluggable PCI, which is the kind of
slots that management applications expect to be available.
Other machinetypes don't need to do this same thing. For that matter, in
the future this may not be the most desirable way to go for Q35: in the
2 (or is it 3) years since Q35 support was added, I've learned that
pretty much every emulated PCI device in qemu can be plugged into a PCIe
port (on the Q35 machinetype at least) with no complaints from qemu, and
we now have pcie-root-port and pcie-switch-downstream-port (with
matching pcie-switch-upstream-port) that can accept hotplugged devices,
so a Q35 machine could now be constructed as:
<controller type='pci' index='0' model='pcie-root'/>
<controller type='pci' index='1' model='pcie-root-port'/>
<controller type='pci' index='2' model='pcie-switch-upstream-port'/>
<controller type='pci' index='3' model='pcie-switch-downstream-port'/>
<controller type='pci' index='4' model='pcie-switch-downstream-port'/>
...
and the address assignment could be modified to allow auto-selection of
PCIe ports for PCI devices (the downstream ports support hotplugging
devices, but can't be hotplugged themselves, and they can only be
plugged into an upstream port, which can only be plugged into a
root-port (or a downstream-port)).
At any rate, I don't think the current PCIe bus structure should be
hardcoded anywhere. We can change what libvirt does by default any time;
existing configs will continue with what we set up in the past (and thus
won't suffer "your hardware has changed! Reactivate!!" problems), but
newly created ones will use whatever new model we come up with.
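
To make the plugging rules above a bit more concrete, here is a rough
Python sketch of a topology check. It is only an illustration, not what
libvirt does: check_topology and ALLOWED_PARENTS are hypothetical names,
and the rule that a pcie-root-port plugs into pcie-root is an assumption
read off the example XML rather than something stated outright.

```python
# Allowed parent models for each controller model, following the rules
# described in the text. The pcie-root-port entry is an assumption taken
# from the example XML, not a statement of libvirt's actual rules.
ALLOWED_PARENTS = {
    "pcie-root": set(),  # built into the machine, has no parent
    "pcie-root-port": {"pcie-root"},
    "pcie-switch-upstream-port": {"pcie-root-port", "pcie-switch-downstream-port"},
    "pcie-switch-downstream-port": {"pcie-switch-upstream-port"},
}


def check_topology(controllers):
    """Return a list of error strings for (index, model, parent_index) tuples.

    parent_index is None for the root. Hypothetical helper, illustration only.
    """
    models = {index: model for index, model, _ in controllers}
    errors = []
    for index, model, parent in controllers:
        if model not in ALLOWED_PARENTS:
            errors.append(f"controller {index}: unknown model {model}")
            continue
        allowed = ALLOWED_PARENTS[model]
        if parent is None:
            # Only a model with no allowed parents may be left unplugged.
            if allowed:
                errors.append(f"controller {index}: {model} needs a parent")
            continue
        parent_model = models.get(parent)
        if parent_model not in allowed:
            errors.append(
                f"controller {index}: {model} cannot plug into {parent_model}"
            )
    return errors


# The Q35 layout from the XML above, with explicit parent indexes.
q35_example = [
    (0, "pcie-root", None),
    (1, "pcie-root-port", 0),
    (2, "pcie-switch-upstream-port", 1),
    (3, "pcie-switch-downstream-port", 2),
    (4, "pcie-switch-downstream-port", 2),
]
```

Calling check_topology(q35_example) returns an empty list, while a
downstream port plugged directly into pcie-root is reported as an error.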
>>
>>
>> * Next idea: Users specify something like <address type='pci'/> and
>> libvirt fills in the address for us.
>>
>> Upsides:
>> - We can stick with the current PCIe controller default and avoid some of the
>> problems mentioned above.
>> - An auto address feature may be useful in other contexts as well.
>>
>> Downsides:
>> - Seems potentially tricky to implement in libvirt code. There are many places
>> that check type=pci and key off that, seems like it would be easy to miss
>> updating a check and cause regressions. Maybe we could add a new type like
>> auto-pci to make it explicit. There's probably some implementation trick to
>> make this safe, but at first glance it looked a little dicey.
> I'm not sure it is actually all that hairy - it might be as simple as
> updating only qemuAssignDevicePCISlots so that instead of:
>
>     if (def->controllers[i]->info.type !=
>         VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE)
>         continue;
>
> it handles type=pci with no values set too
>
I think my fear was that there are other places in domain_conf that check for
ADDRESS_TYPE_PCI before we even get to assigning PCI slots. But I'll poke at it.

This is very possible. ADDRESS_TYPE_PCI means that the PCI address is
"valid". 0000:00:00.0 is a valid PCI address (although it happens to be
reserved on any x86 architecture). For the bit early on after the parse
we may need to have a separate "valid address" flag beyond the type to
prevent confusion. I do like the idea of being able to say "<address
type='pci'/>" to select the bus without specifying an address though.
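
As a rough illustration of the "separate valid address flag" idea, here
is a small Python sketch. It is not libvirt code: DeviceAddress,
wants_pci_slot and default_type are hypothetical names, and the logic is
only my reading of the check quoted above. The point is that an all-zero
address cannot stand in for "not set", so "unset" is tracked separately
from the type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceAddress:
    # Hypothetical stand-in for a device's address info; field names are made up.
    type: str = "none"            # "none", "pci", "virtio-mmio", ...
    pci: Optional[tuple] = None   # (domain, bus, slot, function) once known

    @property
    def has_valid_pci_address(self):
        # 0000:00:00.0 is a legal address, so "unset" is tracked as None
        # instead of being inferred from all-zero values.
        return self.type == "pci" and self.pci is not None


def wants_pci_slot(addr, default_type="pci"):
    """Decide whether the allocator should pick a PCI slot for this device.

    default_type is what an address-less device gets on this machine type
    (an assumption for the sketch: "pci" on x86, "virtio-mmio" on arm today).
    """
    if addr.type == "none":
        return default_type == "pci"
    # New case: <address type='pci'/> with no values selects the bus only.
    return addr.type == "pci" and addr.pci is None
```

With this split, a bare <address type='pci'/> on an arm guest would
still be handed to the PCI allocator, while a fully specified address,
including 0000:00:00.0, would be left alone.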
>> - Doesn't really solve problem #2 mentioned up above... maybe we could change
>> the address allocation logic to default to virtio-pci if there's already a
>> virtio-pci device in the XML. But it's more work.
>> - More work for apps, but nothing horrible.
>>
>> * Change the default address type from virtio-mmio to pci, if qemu supports
>> it. I'm listing this for completeness. In the short term this doesn't make
>> sense as there aren't any OS releases that will work with this default.
>> However it might be worth considering for the future, maybe keying off a
>> particular qemu version or machine type. I suspect 2 years from now no one is
>> going to be using virtio-mmio so long term it's not an ideal default.
> Yeah, when QEMU starts doing versioned machine types for AArch64 we could
> do this, but then this just kind of flips the problem around - apps now
> need to manually add <address type="mmio"> for every device if deploying
> an OS that can't do PCI. Admittedly this is slightly easier, since address
> rules for mmio are simpler than address rules for PCI.
>
>> I think the first option is best (keying off the PCIe controller specified by
>> the user), with a longer term plan to change the default from mmio to pci. But
>> I'm not really sold on anything either way. So I'm interested if anyone else
>> has ideas.
> I guess I'd tend towards option 1 too - only adding PCI controller if we
> actually want to use PCI with the guest.
I *kind of* agree with this, but not completely.
I think that if we know for sure based on introspection of the virtual
machine (or based on verified knowledge that we hardcode into libvirt)
that there is a PCI controller of some type that is implemented in the
machine and no way to remove it, we should put that information in the
XML. Any controllers that are optional, and don't exist in the virtual
machine if they're not added to the commandline, can be auto-added only
if needed.
So for example, if a Q35 domain was created that had just 2 emulated PCI
devices, we could auto-add just enough ports to accommodate that. (For
that matter, adding an emulated PCI device would cause the auto-add of a
pcie-switch-downstream-port, which might cause the auto-add of a
pcie-switch-upstream-port, which might cause the auto-add of a
pcie-root-port; the 2nd emulated PCI device would cause an auto-add of
another pcie-switch-downstream-port which would find a spot in the
existing pcie-switch-upstream-port.)

The one problem with doing this would be that there would be no free
ports for hotplugging, and I'm not sure what the best way to deal with
that is. I suppose any attempt at hotplugging a new device would lead to
an error, after which you would shut down the virtual machine, manually
add in a port, then start it up again; or maybe when there are *any* PCI
devices, we could always make sure there were 'several' extra ports for
hotplugging? Neither sounds ideal...
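
The auto-add chain described above could look roughly like the Python
sketch below. It is only an illustration under stated assumptions:
auto_add_ports is a hypothetical helper, the limit of 32 downstream
ports per upstream port is a made-up number, and spare_ports models the
"several extra ports for hotplugging" option.

```python
def auto_add_ports(num_devices, spare_ports=0, ports_per_upstream=32):
    """Return the controller models auto-added for num_devices PCI devices.

    ports_per_upstream is an assumed capacity, for illustration only.
    spare_ports extra downstream ports are added when any device exists.
    """
    controllers = ["pcie-root"]  # always present on the machine
    wanted = num_devices + (spare_ports if num_devices else 0)
    free_upstream_slots = 0
    for _ in range(wanted):
        if free_upstream_slots == 0:
            # No upstream port with room left: add a root port and an
            # upstream port plugged into it.
            controllers.append("pcie-root-port")
            controllers.append("pcie-switch-upstream-port")
            free_upstream_slots = ports_per_upstream
        # Each device (or spare) gets its own downstream port.
        controllers.append("pcie-switch-downstream-port")
        free_upstream_slots -= 1
    return controllers
```

For two devices and no spares this yields the same five controllers as
the Q35 example earlier in this mail: pcie-root, one pcie-root-port, one
pcie-switch-upstream-port and two pcie-switch-downstream-ports.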