On 12/07/2015 10:33 AM, Cole Robinson wrote:
> On 12/07/2015 03:19 AM, Pavel Fedin wrote:
>> Hello!
>>
>>> - The PCIe controller XML is:
>>>     <controller type='pci' index='0' model='pcie-root'/>
>>>     <controller type='pci' index='1' model='dmi-to-pci-bridge'/>
>>>     <controller type='pci' index='2' model='pci-bridge'/>
>>> I have no idea if that's always going to be the expected XML, maybe it's not
>>> wise to hardcode that in apps.
>> Since we are discussing this, i have a question.
>> Why do we construct exactly this thing? What is "dmi-to-pci-bridge" and why do we need it?
A dmi-to-pci-bridge plugs into a PCIe port (on real hardware at least,
it isn't allowed in a normal PCI slot) and provides standard PCI slots
downstream. So it is a way to convert from PCIe to PCI. However, you
can't hot-plug devices into these standard PCI slots, which is why a
pci-bridge is plugged into one of the slots of the dmi-to-pci-bridge -
to provide standard PCI slots that can accept hot-plugged devices (which
is what most management apps expect to be available).
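
To make that chaining concrete, here's roughly what the three controllers look
like on a Q35 guest once addresses are filled in (the slot/bus numbers below
are only illustrative; libvirt normally auto-assigns them):

    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <model name='i82801b11-bridge'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>

i.e. the dmi-to-pci-bridge occupies a slot on bus 0 (pcie-root), the pci-bridge
occupies a slot on bus 1 (the dmi-to-pci-bridge), and hot-pluggable PCI
endpoints then get addresses on bus 2.
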
There was considerable discussion about this when support for the Q35
chipset was added, and this bus structure was taken directly from the
advice I was given by the qemu/pci people. At the time I was concerned
about whether or not it should be allowed to plug standard PCI devices
into PCIe slots and vice versa (since that is physically not possible on
real hardware); we've since learned that qemu doesn't have much of a
problem with this in most cases, and I've loosened up the restrictions
in libvirt (auto-assign will match types, but you can force most
endpoint devices into any PCI or PCIe slot you like).
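
For example (device and slot chosen purely for illustration), giving a
conventional PCI device an explicit bus 0 address pins it directly onto
pcie-root instead of letting auto-assign put it on the pci-bridge:

    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </sound>
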
For AARCH64, though... well, if you want to know why it's added for that
machinetype, I guess you'd need to talk to the person who turned on
addPCIeRoot for AARCH64 :-). I actually wondered about that recently
when I was tinkering with auto-adding USB2 controllers when the
machinetype is Q35 (i.e. "why are we adding an Intel/x86-specific
controller to AARCH64 machines?").
(BTW, the name "dmi-to-pci-bridge" was chosen specifically to *not* be
platform-specific. It happens that currently the only example of this
type of controller is i82801b11-bridge, as can be seen in the xml here:
    <model name='i82801b11-bridge'/>
but if some other pci controller in the future behaves in the same way,
it could also be classified as a dmi-to-pci-bridge (with a
correspondingly different <model name .../>).)
>> I guess this is something PC-specific,
>> may be this layout has been copied from some real PC model, but i don't see any
>> practical sense in it.
> This is likely just a side effect of the libvirt code requesting PCIe for
> aarch64, but the original PCIe support was added for the x86 q35 layout.
Correct. The "addPCIeRoot" bool, created only with the Q35 bus structure
in mind, was overloaded to also create the other controllers, and that
detail was missed when support for pcie-root on aarch64 virt
machinetypes was added.
>> Also, there are two problems with "pci-bridge":
>> 1. Live migration of this thing in qemu is broken. After migration the bridge screws
>> up, lspci says "invalid header", and i don't know whether it actually works because i
>> never attach anything behind it, because of (2).
I didn't even know aarch64 migration was working...
Is this only a problem on aarch64, or is there a migration problem with
pci-bridge on x86 as well? (It's possible there is but it hasn't been
noticed, because pci-bridge likely isn't used much outside of q35
machines, and q35 was prohibited from migrating until qemu 2.4 due to
the embedded sata driver which didn't support migration.)
>> 2. After pcie-root we have PCI-X, which supports MSI-X. And after pci-bridge we seem
>> to have a plain PCI, which supports only plain MSI. The problem here is that virtio
>> seems to work only with MSI-X in any advanced mode (multiqueue, vhost, etc). If i
>> place it behind the bridge (and default libvirt's logic is to place the device there),
>> MSI-X will not work.
libvirt's PCI address allocation logic can certainly be changed, as long
as it's done in a way that won't break auto-allocation for any older
machinetypes. For example, we could make virtio devices prefer to be
plugged into a PCIe port (probably a pcie-switch-downstream-port or
pcie-root-port).
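
As a rough sketch of what that might look like (bus/slot numbers illustrative,
and assuming the machinetype actually accepts an ioh3420 root port - see the
question below), the virtio device would hang off a root port on pcie-root
instead of the pci-bridge:

    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='ioh3420'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </interface>
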
BTW, does the aarch64 virt machinetype support any controller aside from
the embedded pcie-root? Normally the ports on pcie-root don't support
hotplug directly, but need a pcie-root-port (or a set of switch ports)
plugged into them at boot time. The only examples of these types of
controllers that libvirt knows about are based on Intel chips (ioh3420,
x3130-whatever).
>> The same applies to passthrough VFIO devices. This is especially painful because on
>> real-life ARM64 platforms builtin hardware seems to mandate MSI-X. For example on
>> ThunderX NIC driver simply does not support anything except MSI-X.
>>
Maybe this is just a factor of libvirt specifying the wrong bits on the
aarch64 command line. Do you have a working qemu commandline outside of libvirt?
Similar question - is this problem present on x86 as well?
>>> * Next idea: Users specify something like <address type='pci'/> and
>>> libvirt fills in the address for us.
>> I like this one, and IMHO this would be nice to have regardless of the default.
>> Manual assignment of PCI layout is a tedious process, which is not always necessary.
>> I think it is quite logical to allow the user just to say: "I want this device on
>> the PCI bus", and do the rest for him.
Agreed, I'll look into it in addition to the user PCI controller bits.
Right now we will auto-add only a pci-bridge if no available slot is
found for a pci device, but we will (should anyway) auto-assign a slot
on an *existing* PCIe controller if the device has PCIE as the preferred
slot type. It would be really cool if the "current machinetype" had a
"preferred controller type" for pci devices with no manually assigned
address, and would auto-add the appropriate controller according to that
(with parents auto-added as necessary).
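
In other words (and to be clear, this is the behavior being proposed, not what
libvirt does today), the user would write just:

    <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci'/>
    </interface>

and libvirt would pick a bus/slot on the machinetype's preferred controller
type, auto-adding that controller (and any parents it needs) if no suitable
slot already exists.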