On Wed, 2016-08-10 at 12:10 -0400, Laine Stump wrote:
> > Note that setting disable-modern/disable-legacy does *not* necessarily
> > do what you think, and we don't want to always have to set virtio
> > revision='1.0' in order to get a PCIe device, nor do we want all PCIe
> > devices to be revision 1.0-only. There is a case for having
> > disable-modern=off,disable-legacy=off even for a device installed on a
> > PCIe slot (it's compatible with old guest drivers, but new guest drivers
> > can take advantage of virtio 1.0, and it is legacy-PCI-free. On the
> > other hand, it requires the PCIe controller to reserve IO port space,
> > which we try to avoid.)
>
> Sheesh, you're right, I got it backwards - it's not that you
> need 1.0-only for PCIe, you *can't* have PCIe unless you also
> enable 1.0. Thanks for pointing out the mistake.
No, I think all the double negatives are confusing you :-) The only way
to *not* get PCIe on a pcie-*-port connected device is if you specify
0.9-only.
For once, I was not confused! ;) I wrote

  you *can't* have PCIe unless you also enable 1.0

meaning you won't get PCIe unless you enable 1.0 by using
either <virtio revision='1.0'/> on its own or both
<virtio revision='0.9'/> *and* <virtio revision='1.0'/>.
That, in turn, implies that using <virtio revision='0.9'/>
on its own will result in a legacy PCI device.
All of this only applies to endpoint devices connected to
a PCIe controller, of course. If you're connecting to a
legacy PCI controller you're going to get a legacy PCI device
no matter what virtio versions you enable[1].
with disable-modern=off,disable-legacy=off, you'll get:
pcie-root: pci/0.9+1.0
pcie-*-port: pcie/0.9+1.0
pci-bridge: pci/0.9+1.0
with disable-modern=on,disable-legacy=off (i.e. <version revision='0.9'/>):
pcie-root: pci/0.9
pcie-*-port: pci/0.9 <-- we want to avoid this one
pci-bridge: pci/0.9
with disable-modern=on,disable-legacy=off (i.e. <version revision='0.9'/>):
Should have been "disable-modern=off,disable-legacy=on (i.e.
<virtio revision='1.0'/>)", I guess :)
pcie-root: pci/1.0
pcie-*-port: pcie/1.0
pci-bridge: pci/1.0
These are the implications that I can see of setting 0.9-only or 1.0-only:
0.9-only - PCI ID 1af4:1000, the device is always PCI, never PCIe, uses
ioport
1.0-only - PCI ID 1af4:1041, the device is PCIe when on a pcie-*-port,
otherwise PCI, doesn't use ioport
0.9+1.0 - PCI ID 1af4:1000, same rules for PCIe vs. PCI as 1.0-only,
uses ioport
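As a rough sketch, the matrix and the three rules above can be written down as one small function. The names here (Presentation, virtio_presentation) are made up for illustration and are not part of libvirt or QEMU; the PCI IDs are simply the ones quoted above, and "pcie-*-port" is taken literally as a name pattern.

```python
from typing import NamedTuple


class Presentation(NamedTuple):
    bus: str           # "pci" or "pcie", as the guest sees the device
    pci_id: str        # PCI ID as quoted in the list above
    uses_ioport: bool  # whether IO port space is needed


def virtio_presentation(controller: str, legacy: bool, modern: bool) -> Presentation:
    """Model the rules from the thread for a virtio endpoint device.

    controller: model of the controller the device is plugged into,
                e.g. "pcie-root", "pcie-root-port", "pci-bridge".
    legacy:     0.9 enabled (disable-legacy=off).
    modern:     1.0 enabled (disable-modern=off).
    """
    if not (legacy or modern):
        raise ValueError("at least one virtio revision must be enabled")

    # "pcie-*-port" taken literally: any model named pcie-<something>-port.
    on_pcie_port = controller.startswith("pcie-") and controller.endswith("-port")

    # The device is PCIe only behind a pcie-*-port, and only if 1.0 is enabled.
    bus = "pcie" if (on_pcie_port and modern) else "pci"

    # 0.9-only and 0.9+1.0 share an ID; 1.0-only gets a different one.
    pci_id = "1af4:1000" if legacy else "1af4:1041"

    # IO port space is needed whenever 0.9 is enabled.
    return Presentation(bus, pci_id, uses_ioport=legacy)
```

Calling virtio_presentation("pcie-root-port", legacy=True, modern=False) gives the pci/0.9 case on a pcie-*-port, which is the one flagged above as the combination to avoid.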
I wonder if it would have been better to assign different PCI
IDs to 0.9, 0.9+1.0 and 1.0 devices. They do behave differently
enough that it might have been warranted.
On the other hand, I can see why 0.9+1.0 would have the same
PCI ID as 0.9 - so that existing drivers would recognize the
device.
Just typing out loud, really :)
> The pcie-root bit is weird - I know legacy PCI devices can be
> assigned directly to it, but I'd expect a device that can do
> both PCI and PCIe to present itself as PCIe when plugged into
> a PCIe-capable slot.
pcie-root *is* a bit weird. Alex pointed out to me yesterday that on our
Lenovo laptops, we have an ethernet device at 00:19.0 that shows itself
as a PCI device, but is using the e1000e driver (e1000e is a PCIe
device). On the other hand, the same system has an audio device at
00:1b.0 that shows up as a PCIe device (you can see this by running
"lspci -v" as root and looking for a "Capabilities" line that says
"Express ....".)
At first I thought that maybe the e1000e driver was able to
manage both PCI and PCIe devices, or that the Ethernet device
was actually PCI rather than PCIe, but it looks like neither
is the case, and it just shows up as PCI instead of PCIe
because of... Reasons?
/me sighs
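For anyone who wants to repeat the "lspci -v" check on many devices, here is a minimal sketch that scans the text output for the "Express" capability line. It assumes the usual lspci -v layout (a slot address at the start of each device stanza, indented "Capabilities:" lines below it); the function name and the sample text are made up for illustration, not real output from the laptop mentioned above.

```python
import re
from typing import Dict

# A device stanza starts with a slot address such as "00:1b.0"
# (optionally prefixed by a PCI domain, e.g. "0000:00:1b.0").
_SLOT_RE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")


def express_devices(lspci_output: str) -> Dict[str, bool]:
    """Map each slot address to True if it lists an Express capability."""
    result: Dict[str, bool] = {}
    current = None
    for line in lspci_output.splitlines():
        match = _SLOT_RE.match(line)
        if match:
            current = match.group(1)
            result[current] = False
        elif current and "Capabilities:" in line and "Express" in line:
            result[current] = True
    return result


# Illustrative sample only, shaped like "lspci -v" output.
SAMPLE = """\
00:19.0 Ethernet controller: Example NIC
\tCapabilities: [c8] Power Management version 2
\tKernel driver in use: e1000e

00:1b.0 Audio device: Example audio
\tCapabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
"""

if __name__ == "__main__":
    for slot, is_express in express_devices(SAMPLE).items():
        print(slot, "PCIe" if is_express else "PCI")
```

On a real system the input would come from running lspci -v as root, since capabilities are not shown to unprivileged users.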
> Does this mean we have to be careful *not* to assign virtio
> devices to pcie-root unless they're 0.9-only, in order to
> chase our goal of having PCI-free guests?
No. That's just the way pcie-root works. I think the important thing is
to avoid plugging legacy-only devices into pcie-*-port, and to avoid
dmi-to-pci-bridge and pci-bridge if at all possible. (And actually I
don't see any upside to making virtio devices 0.9-only for any reason
except to protect against a bug in virtio-1.0 - virtio devices will
always properly "downgrade" themselves to legacy PCI when their
connection warrants it).
Okay, going the other way around: should we try and always
add a pcie-root-port between pcie-root and a virtio device
that's either 0.9+1.0 or 1.0, to try and force it to expose
the Express capabilities? Would that buy us anything?
Does a 1.0-only virtio device plugged directly into pcie-root
still use no IO ports even though it shows up as legacy PCI?
> > (Hmm, but here's a problem - if there aren't currently any legacy PCI
> > devices in a config (and thus no dmi-to-pci-bridge or pci-bridge), and
> > someone decides to hotplug a legacy-PCI device, *then* what do you do? I
> > don't want to have these legacy controllers in *every damn config in the
> > world* just in case some moro... er "well meaning user" thinks they want
> > to hotplug an rtl_8139 emulated ethernet...)
>
> I don't really have an answer for that, I'm afraid.
>
> What I can say is: there are many areas in libvirt where we
> choose the setup that ensures the widest compatibility by
> default, so that users who don't want to care too much about
> the details can just stick to that and have very high chances
> of never running into trouble.
>
> I think that's a sensible course of action, as long as you
> also allow users who know what they're doing and want to
> build a very precise setup to do so.
>
> With that in mind, auto-adding a dmi-to-pci-bridge/pci-bridge
> combo by default is probably not too bad *assuming* you have
> a way to opt out of it.
Right now the only way to opt out is to put some other PCI controller at
index 1 (e.g. a pcie-root-port). For some other default devices (e.g. usb
controllers and memory balloon) we take care of this with
"model='none'", e.g.:
<controller type='usb' model='none'/>
I've actually never really liked that, and it doesn't directly translate
to this use case anyway - it's not that we don't want *any* PCI
controllers, it's that we don't want a controller of a particular model.
Maybe we just need to think about the whole "reserve a spot for a
potential future hotplug device" in a different way for PCIe - on
systems with pci-root, we never really made a conscious decision that we
had to have open slots available for hotplug - it just automatically
happened that way because for legacy PCI, *all* slots (including on the
root bus) support hotplug, and each PCI bus starts out with 31 open
slots, so you're almost 100% assured that any given config would have an
empty slot available for potential hotplug. PCIe doesn't work that way
though - a basic machine has exactly 0 slots that support hotplug and
(except for adding legacy-PCI bridges) any PCI controller that you add
has only a single slot, so the chance of just coincidentally having a
slot open for hotplug is exactly 0%. So are we just taking something
that happened by chance with pci-based machines and turning it into a
requirement for pcie? Instead, maybe we could just tell people who
might want hotplug to add a few empty pcie-*-port devices. (After all,
if we were going to add them automatically, how many should we add? 1?
32? No matter what you pick, it's too many for some, and not enough for
others.)
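To make "add a few empty ports" concrete, here is a small sketch that generates the controller elements a user or management tool would put in the domain XML. It assumes pcie-root sits at index 0, so that spare ports can start at index 1; the function name and the default count are invented for this example and do not reflect any libvirt policy.

```python
import xml.etree.ElementTree as ET
from typing import List


def spare_root_ports(count: int, first_index: int = 1) -> List[ET.Element]:
    """Build <controller> elements for `count` empty pcie-root-port controllers.

    first_index is an assumption: pcie-root is taken to be index 0,
    so the spare ports start right after it.
    """
    return [
        ET.Element(
            "controller",
            {"type": "pci", "index": str(first_index + i), "model": "pcie-root-port"},
        )
        for i in range(count)
    ]


if __name__ == "__main__":
    # Print three spare hotplug-capable ports, one element per line.
    for port in spare_root_ports(3):
        print(ET.tostring(port, encoding="unicode"))
```

How many ports to generate is exactly the question raised above, which is why the count is left to the caller rather than given a default.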
Mh, if we plugged a dmi-to-pci-bridge and pci-bridge, plus 30
or so pcie-root-ports, the non-expert user would never have to
care about not having free hotpluggable PCI *or* PCIe slots.
We could *not* do that if the existing topology is in any way
different from the one we would build. But this approach would
be extremely fragile, I'm afraid, so let's not go there.
We also said time and time again that we don't want libvirt
to carry any policy, and all of this stuff stinks like policy
to me. Maybe we should go with an extremely basic default
topology (eg. pcie-root only) and try to plug devices in any
way that would work, without trying to satisfy other
constraints like "leave a few hotpluggable PCIe (and PCI?)
slots ready for additional devices".
I agree that we need to have a way to make life easier for
higher-level tools, and that we should centralize this complex
stuff to avoid having the same issues over and over again. But
maybe that should be a separate API rather than the core
address assignment logic? Or a tool built on top of libvirt?
Dan mentioned in another sub-thread that libvirt-designer was
supposed to be that tool. I haven't looked into it yet, to be
honest, but I'll definitely do that Really Soon Now™.
[1] Or rather not disable.
--
Andrea Bolognani / Red Hat / Virtualization