On Tue, 2016-11-01 at 13:46 +1100, David Gibson wrote:
On Mon, Oct 31, 2016 at 03:10:23PM +1100, Alexey Kardashevskiy wrote:
>
> On 31/10/16 13:53, David Gibson wrote:
> >
> > On Fri, Oct 28, 2016 at 12:07:12PM +0200, Greg Kurz wrote:
> > >
> > > On Fri, 28 Oct 2016 18:56:40 +1100
> > > Alexey Kardashevskiy <aik(a)ozlabs.ru> wrote:
> > >
> > > >
> > > > At the moment sPAPR PHB creates a root bus of TYPE_PCI_BUS type.
> > > > This means that vfio-pci devices attached to it (and this is
> > > > a default behaviour) hide PCIe extended capabilities as
> > > > the bus does not pass a pci_bus_is_express(pdev->bus) check.
> > > >
> > > > This change adds a default PCI bus type property to sPAPR PHB
> > > > and uses TYPE_PCIE_BUS if none is passed; older machines get
> > > > TYPE_PCI_BUS for backward compatibility, as the bus type is used
> > > > in the bus name, so the root bus name becomes "pcie.0" instead of
> > > > "pci.0".
> > > >
> > > > Signed-off-by: Alexey Kardashevskiy <aik(a)ozlabs.ru>
> > > > ---
> > > >
> > > > What can possibly go wrong with such change of a name?
> > > > From the devices' perspective, I cannot see anything.
> > > >
> > > > libvirt might get upset as "pci.0" will not be available.
> > > > Would it make sense to create pcie.0 as a root bus and always
> > > > add a PCIe->PCI bridge and name its bus "pci.0"?
> > > >
> > > > Or create the root bus from TYPE_PCIE_BUS and force its name to
> > > > "pci.0"? pci_register_bus() can do this.
> > > >
> > > >
> > > > ---
> > > > hw/ppc/spapr.c | 5 +++++
> > > > hw/ppc/spapr_pci.c | 5 ++++-
> > > > include/hw/pci-host/spapr.h | 1 +
> > > > 3 files changed, 10 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > index 0b3820b..a268511 100644
> > > > --- a/hw/ppc/spapr.c
> > > > +++ b/hw/ppc/spapr.c
> > > > @@ -2541,6 +2541,11 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
> > > > .driver = TYPE_SPAPR_PCI_HOST_BRIDGE, \
> > > > .property = "mem64_win_size", \
> > > > .value = "0", \
> > > > + }, \
> > > > + { \
> > > > + .driver = TYPE_SPAPR_PCI_HOST_BRIDGE, \
> > > > + .property = "root_bus_type", \
> > > > + .value = TYPE_PCI_BUS, \
> > > > },
> > > >
> > > > static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index,
> > > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > > > index 7cde30e..2fa1f22 100644
> > > > --- a/hw/ppc/spapr_pci.c
> > > > +++ b/hw/ppc/spapr_pci.c
> > > > @@ -1434,7 +1434,9 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
> > > > bus = pci_register_bus(dev, NULL,
> > > > pci_spapr_set_irq, pci_spapr_map_irq, sphb,
> > > > &sphb->memspace, &sphb->iospace,
> > > > - PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
> > > > + PCI_DEVFN(0, 0), PCI_NUM_PINS,
> > > > + sphb->root_bus_type ? sphb->root_bus_type :
> > > > + TYPE_PCIE_BUS);
> > >
> > > Shouldn't we ensure that sphb->root_bus_type is either
> > > TYPE_PCIE_BUS or TYPE_PCI_BUS?
> >
> > Yes, I think so. In fact, I think it would be better to make the
> > property a boolean that just selects PCI-E, rather than this, which
> > exposes qemu (semi-)internal type names on the command line.
>
> Sure, a "pcie-root" boolean property should do.
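
The boolean-property idea can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the struct, the field name pcie_root and the helper are hypothetical, and the two type strings are local stand-ins assumed to match QEMU's definitions.

```c
#include <stdbool.h>

/* Local stand-ins for the QEMU type names; assumed, not taken from the patch. */
#define TYPE_PCI_BUS  "PCI"
#define TYPE_PCIE_BUS "PCIE"

/* Hypothetical, cut-down PHB state holding only the proposed boolean. */
typedef struct {
    bool pcie_root;
} SketchPHBState;

/*
 * Map the boolean to a bus type name. A boolean has only two values,
 * so no string validation is needed and no internal type name has to
 * appear on the command line.
 */
static const char *sketch_root_bus_type(const SketchPHBState *sphb)
{
    return sphb->pcie_root ? TYPE_PCIE_BUS : TYPE_PCI_BUS;
}
```

Under this sketch, older machine types would set pcie_root to false through their compat properties and keep a TYPE_PCI_BUS root, while newer ones would default to true.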
>
> However this is not my main concern, I rather wonder if we have to have
> pci.0 when we pick PCIe for the root.
Right.
I've added Andrea Bolognani to the CC list to get a libvirt perspective.
Thanks for doing so: changes such as this one can have quite
an impact on the upper layers of the stack, so the earlier
libvirt is involved in the discussion the better.
I'm going to go a step further and cross-post to libvir-list
in order to give other libvirt contributors a chance to chime
in too.
Andrea,
To summarise the issue here:
* As I've said before, the PAPR spec kinda-sorta abstracts the
difference between vanilla PCI and PCI-E
* However, because within qemu we're declaring the bus as PCI that
means some PCI-E devices aren't working right
* In particular it means that PCI-E extended config space isn't
available
The proposal is to change (on newer machine types) the spapr PHB code
to declare a PCI-E bus instead. AIUI this still won't make the root
complex guest visible (which it's not supposed to be under PAPR), and
the guest shouldn't see a difference in most cases - it will still see
the PAPR abstracted PCIish bus, but will now be able to get extended
config space.
The possible problem from a libvirt perspective is that doing this in
the simplest way in qemu would change the name of the default bus from
pci.0 to pcie.0. We have two suggested ways to mitigate this:
1) Automatically create a PCI-E to PCI bridge, so that new machine
types will have both a pcie.0 and pci.0 bus
2) Force the name of the bus to be pci.0, even though it's treated
as PCI-E in other ways.
We're trying to work out exactly what will and won't cause trouble for
libvirt.
Option 2) is definitely a no-no, as we don't want to be piling
up even more hacks and architecture-specific code: the PCI
Express Root Bus should be called pcie.0, just as it is on q35
and mach-virt machine types.
Option 1) doesn't look too bad, but devices that are added
automatically by QEMU are an issue since we need to hardcode
knowledge of them into libvirt if we want the rest of the PCI
address allocation logic to handle them correctly.
Moreover, libvirt now has the ability to build a legacy PCI
topology without user intervention, if needed to plug in
legacy devices, on machines that have a PCI Express Root Bus,
which makes the additional bridge fully redundant...
... or at least it would, if we actually had a proper
PCIe-to-PCI bridge; AFAIK, though, the closest we have is the
i82801b11-bridge that is Intel-specific despite having so far
been abused as a generic PCIe-to-PCI bridge. I'm not even
sure whether it would work at all on ppc64.
Moving from legacy PCI to PCI Express would definitely be an
improvement, in my opinion. As mentioned, that's already the
case for at least two other architectures, so the more we can
standardize on that, the better.
That said, considering that a big part of the PCI address
allocation logic is based off whether the specific machine
type exposes a legacy PCI Root Bus or a PCI Express Root Bus,
libvirt will need a way to be able to tell which one is which.
Version checks are pretty much out of the question, as they
fail as soon as downstream releases enter the picture. A
few ways we could deal with the situation:
1) switch to PCI Express on newer machine types, and
expose some sort of capability through QMP so that
libvirt can know about the switch
2) switch between legacy PCI and PCI Express based on a
machine type option. libvirt would be able to find out
whether the option is available or not, and default to
either
<controller type='pci' model='pci-root'/>
or
<controller type='pci' model='pcie-root'/>
based on that. In order to support multiple PHBs
properly, those would have to be switchable with an
option as well
3) create an entirely new machine type, eg. pseries-pcie
or whatever someone with the ability to come up with
decent names can suggest :) That would make ppc64
similar to x86, where i440fx and q35 have different
root buses. libvirt would learn about the new machine
type, know that it has a PCI Express Root Bus, and
behave accordingly
Option 1) would break horribly with existing libvirt
versions, and so would Option 2) if we default to using
PCI Express. Option 2) with default to legacy PCI and
option 3) would work just fine with existing libvirt
versions AFAICT, but wouldn't of course expose the new
capabilities.
Option 3) is probably the one that will be least confusing
to users; we might even decide to take the chance and fix
other small annoyances with the current pseries machine
type, if there are any. On the other hand, it might very well
be considered to be too big a hammer for such a small nail.
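
To make the "default to legacy PCI" variant of Option 2) concrete, here is a rough sketch of the decision a management layer would have to make. It is not libvirt code: the struct, its fields and the function name are all hypothetical, and only the two model strings come from the XML above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what a management layer learned by probing QEMU. */
typedef struct {
    bool option_available;  /* machine type offers the PCI/PCIe switch */
    bool pcie_requested;    /* the switch was turned on for this guest */
} SketchMachineCaps;

/* Returns the model attribute for the default <controller type='pci'>. */
static const char *sketch_default_root_model(const SketchMachineCaps *caps)
{
    if (caps != NULL && caps->option_available && caps->pcie_requested) {
        return "pcie-root";
    }
    /* Unknown or older machine types keep the legacy default. */
    return "pci-root";
}
```

The point of the sketch is that the choice hinges on a probed capability rather than on a QEMU version number, so it keeps working with downstream releases.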
--
Andrea Bolognani / Red Hat / Virtualization