On Mon, 2013-04-08 at 12:37 -0400, Laine Stump wrote:
On 04/05/2013 03:26 PM, Alex Williamson wrote:
> On Fri, 2013-04-05 at 14:42 -0400, Laine Stump wrote:
>> On 04/05/2013 01:38 PM, Daniel P. Berrange wrote:
>>> On Fri, Apr 05, 2013 at 12:32:04PM -0400, Laine Stump wrote:
>>>> On 04/03/2013 11:50 AM, Ján Tomko wrote:
>>>>> From: liguang <lig.fnst(a)cn.fujitsu.com>
>>>>>
>>>>> add a new controller type, then one can
>>>>> define a pci-bridge controller like this:
>>>>> <controller type='pci-bridge' index='0'/>
>>>> In the next patch we're prohibiting exactly this config (index='0')
>>>> because the pre-existing pci bus on the "pc-*" machinetypes is already
>>>> named pci.0. If we don't allow it, we shouldn't include it as an example
>>>> in the commit log :-)
>>> NB, it isn't always named 'pci.0' - on many arches it is merely 'pci'.
>> Yeah, I'm just using that as a convenient shorthand. The final decision
>> on whether to use pci.0 or pci happens down in the qemuBuildCommandline().
>>
>>>> More on this - one of the things this points out is that there is no
>>>> representation in the config of the pci.0 bus, it's just assumed to
>>>> always be there. That is the case for pc-* machinetypes (and probably
>>>> several others with PCI buses), but for q35, there is no pci.0 bus in
>>>> the basic machine, only a pcie.0; if you want a pci.0 on q35 (which
>>>> *will* be necessary in order to attach any pci devices, so I imagine we
>>>> will always want one), you have to attach a pcie->pci bridge, which is
>>>> the device "i82801b11-bridge", to pcie.0.
>>>> The reason I bring this up here is that I'm wondering:
>>>>
>>>> 1) should we have some representation of the default pci.0 bus in the
>>>> config, even though it is just "always there" for the pc machinetypes
>>>> and there is no way to disable it, and nothing on the commandline that
>>>> specifies its existence?
>>> Yep, we should be aiming for the XML to fully describe the machine
>>> hardware. So since we're adding the concept of PCI controllers/bridges
>>> etc to the XML, we should be auto-adding the default bus to the XML.
>>>
>>>> 2) For the q35 machinetype, should we just always add an
>>>> i82801b11-bridge device and name it pci.0? Or should that need to be
>>>> present in the xml?
>>> We've been burnt before auto-adding stuff that ought to have
>>> been optional. So I'd tend towards only having the minimal
>>> config that is required. If the users want this, let them
>>> explicitly ask for the bridge
Okay. This makes for a larger burden on the
user/virt-manager/boxes/libvirt-designer, but does prevent us from
setting up an undesirable default that we can't rescue ourselves from :-)
>>>
>>> Also from the apps POV the QEMU device name is irrelevant. The
>>> XML config works off the PCI addresses. So there's no need
>>> to force/specialcase a i82801b11-bridge to use the name
>>> 'pci.0'.
>>
>> Sure. I just mean "pci bus 0" (hmm, but actually this does point out a
>> problem with my logic - the same namespace (well, "numbering space") is
>> used for both pcie and pci buses, so on a q35 system, bus=0 is already
>> taken by pcie.0; that means that the first pci bus would need to use a
>> different bus number anyway, so it wouldn't be so easy to switch an
>> existing domain from pc to q35 - every PCI device would need to have its
>> bus number modified. I suppose that's reasonable to expect, though.
> I would think you'd want to differentiate PCI from PCIe anyway. PCI is
> a bus and you have 32 slots per bus to fill. PCIe is a point-to-point
> link and you really only have slot 0 available. Perhaps that puts them
> in different number spaces already.
Are you saying that it's okay to have a bus=0 for pci and a different
bus=0 for pcie?
In bus=<identifier> the identifier needs to be unique, but it's not a
bus #, it's just an identifier.
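For instance, something like (a made-up id, option names from memory):
-device pci-bridge,chassis_nr=1,id=foo,bus=pci.0,addr=0a.0
A device then attaches with bus=foo; the bus number the guest assigns
behind the bridge is a separate thing entirely.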
I was hoping that what is used in libvirt's config could mirror as
closely as possible the numbering that you see in the output of lspci on
the guest, but it sounds like that numbering is something done at the
whim of the guest, with no basis in (standard) reality, is that right?
Correct, the BIOS determines the initial bus numbers and it can do
it however it wants. Most guests won't renumber buses, but they can if
they want. It's a lost cause to expect any correlation between the
libvirt bus identifier and the actual bus number.
>>>> 3) Most important - depending on the answers to (1) and (2), should we
>>>> maybe name this device "pci", and use a different backend depending on
>>>> index and machinetype? (or alternately explicitly specifiable with a
>>>> <driver> subelement). To be specific, we would have:
>>>>
>>>> <controller type='pci' index='0'/>
>>>>
>>>> which on pc machinetypes would just be a placeholder in the config (and
>>>> always inserted if it wasn't there, for machinetypes that have a pci
>>>> bus). On the q35 machinetype, that same line would equate to adding an
>>>> i82801b11-bridge device (with source defaulting to
>>>> bus=pcie.0,addr=1e.0). This would serve several purposes:
>>>>
>>>> a) on pc machinetypes, it would be a visual aid indicating that pci.0
>>>> exists, and that index='0' isn't available for a new pci controller.
>>>>
>>>> b) it would make switching a domain config from pc to q35 simpler, since
>>>> pci.0 would always already be in place for attaching pci devices
>>>> (including pci.1, pci.2, etc)
>>>>
>>>> c) it would make the config a true complete description of the machine
>>>> being created.
>>>>
>>>> (I've suggested naming the controller "pci" rather than "pci-bridge"
>>>> because in the case of a "root" bus like pci.0 it seems to not be a
>>>> "bridge", but maybe the name "pci-bridge" is always appropriate, even
>>>> when it's a root bus. Maybe someone with better pci/pcie knowledge can
>>>> provide an opinion on this)
>>> I think "pci" is a little too generic - how about we call it
'pci-root'
>> Okay, so a separate "pci-root" device along with "pci-bridge"? What I
>> was really hoping was to have all PCI buses represented in a common way
>> in the config. How about a controller called "pci" with different types,
>> "root" and "bridge"? And since they use the same numbering space as pcie
>> buses, maybe the pcie controllers (including the root and the hubs and
>> ???) would be different types of PCI controllers. That would make it
>> easier (i.e. *possible*) to avoid collisions in use of bus numbers.
>>
>> Alex or mst, any advice/opinions on how to represent all the different
>> q35 devices that consume bus numbers in a succinct fashion?
> Note that none of these are really bus numbers, they're just bus
> identifiers. The BIOS and the running guest define the bus numbers.
> "root" also has special meaning in PCI, so for instance I wouldn't
name
> a bus behind the i82801b11-bridge "pci-root". Somehow we also need to
> deal with what can be attached where. For instance a pci-bridge is a
> PCI device and can only go on a PCI bus. The equivalent structure on
> PCIe is an upstream switch port with some number of downstream switch
> ports. Each of those are specific to the bus type.
I think we're starting to get closer to the concrete problem that's
bothering me. As I understand it (and again - "what I understand" has
repeatedly been shown to be incorrect in this thread :-):
* There are multiple different types of devices that provide a bus with
1 or more "slots" that PCI devices (e.g., the virtio-net-pci device, the
e1000 network device, etc.) can be plugged into.
* In the config for those devices, there is a required (auto-generated
if not explicitly provided) <address> element that indicates what
controller that device is plugged into, e.g.:
<interface type='direct'>
...
<address type='pci' domain='0' bus='0' slot='3' function='0'/>
...
</interface>
The above is not really how QEMU works. QEMU PCI devices take an addr=
parameter that specifies the "slot.function". The bus= option is not
numeric. That's the identifier value. So if you create a bus with:
-device i82801b11-bridge,id=dmi-to-pci-bridge,addr=1e.0
Then to put a device on that bus you'd do:
-device e1000,id=e1000-net-0,bus=dmi-to-pci-bridge,addr=0.0
We don't have a way to generate new domains yet, but I imagine it would
require a PCI host bridge device and be a parameter to that. For
instance:
-device pci-host-bridge,domain=1,id=domain1-pcie.0
Which would create a new "pci-root". You would then specify a device in
the domain using the same bus= notation. An e1000 on bus=pcie.0 is in
domain0, but a nic on bus=domain1-pcie.0 is in domain1. This is all
just speculation though.
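(On the libvirt side, I'd guess that would just map to the existing
domain attribute already present in the <address> element, e.g.
<address type='pci' domain='1' bus='0' slot='3' function='0'/>,
once there's actually something for it to refer to.)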
* domain is always hardcoded to 0, and in the past bus was also always
hardcoded to 0 because until now there has only been a single place
where PCI devices could be connected - the builtin pci.0 bus, which is a
part of the basic "pc" (and some others) virtual machine and provides 32
slots.
Again, bus= is an identifier and I would guess that it will implicitly
specify the domain when we get that far. Libvirt specifying a numerical
domain:bus:slot.fn and expecting the device to appear there to the guest
is a flawed concept.
* Now we are adding the ability to define new PCI buses, for now just a
single kind - a pci-bridge controller, which itself must connect to an
existing PCI slot, and provides 32 new PCI slots. But in the future
there will be more different types of controllers that provide one or
more PCI slots where PCI devices/controllers can be plugged in.
* In these patches adding support for pci-bridge, we are making the
assumption that there is a 1:1 correspondence between the "index='n'"
attribute of the pci-bridge controller and the "bus='n'" attribute of
the <address> element in devices that will be plugged into that
controller. So for example if we have:
<controller type='pci-bridge' index='1'>
<address type='pci' domain='0' bus='0' slot='10' function='0'/>
</controller>
and then change the <interface> definition above to say "bus='1'", that
interface device will plug into this new bus at slot 3.
Yes, you can do this, but there's no guarantee that the guest won't see
that as bus number 7. '1' is just the name of the bus libvirt is using.
It could also be named 'foo'.
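To make that concrete, I'd expect the XML above to turn into something
like this (a sketch, option names from memory):
-device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0a.0
-device e1000,id=e1000-net-0,bus=pci.1,addr=03.0
where "pci.1" is purely the identifier libvirt chose for index='1'.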
* So let's assume that we add a new controller called "dmi-to-pci-bridge":
<controller type='dmi-to-pci-bridge' index='0'/>
Ignoring for now the question of what address we give in the definition
of *this* device (which is itself problematic - do we need a new "pcie"
address type?), if some device is then defined with
<address type='pci' bus='0' .../>
How do we differentiate between that meaning "the pci-ptp controller
that is index='0'" and "the pci-bridge controller that is index='0'"?
Do we need to expand our <address> element further? If, as I think you
suggest, we have multiple different kinds of controllers that provide
PCI slots, each with its own namespace, the current pci address element
is inadequate to unambiguously describe where a pci device should be
plugged in.
Perhaps we should be referencing the "<alias name='nnn'/>" element of
each controller in the pci address of the target device, e.g.:
<controller type='pci-bridge' index='0'>
<alias name='pci.0'/> <!-- obviously on a machine with no builtin pci.0! -->
</controller>
<controller type='dmi-to-pci-bridge' index='0'>
<alias name='dmi-to-pci-bridge.0'/>
</controller>
<interface type='direct'>
...
<address type='pci' controller='dmi-to-pci-bridge.0' slot='3' function='0'/>
</interface>
(or, since this "controller" attribute really obsoletes the numeric
"bus" attribute, maybe it could be "bus='dmi-to-pci-bridge.0'", and we
could continue to support "bus='0'" for legacy configs).
Yes, exactly. The id= of the controller becomes the bus= identifier for
interfaces on that bus.
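i.e. for your example the generated command line might look something
like (again just a sketch, untested):
-device i82801b11-bridge,id=dmi-to-pci-bridge.0,bus=pcie.0,addr=1e.0
-device e1000,id=e1000-net-0,bus=dmi-to-pci-bridge.0,addr=03.0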
I believe right now the alias name is always auto-generated; we would
need to change that so that, when explicitly provided, it would be
guaranteed to never change (and if that's not possible to do in a
backward-compatible manner, then we need to come up with some new
attribute to use in this manner).
Alternately, we could add new types to address, one for each new type of
controller, then define the devices like this:
<interface type='direct'>
<address type='pci-bridge' bus='0' slot='3' function='0'/>
</interface>
<interface type='direct'>
<address type='dmi-to-pci-bridge' bus='0' slot='3' function='0'/>
</interface>
(yes, I know you wouldn't want to plug a network device into the
dmi-to-pci-bridge directly, this is just for the sake of example)
You'll notice that this makes the bus attribute obsolete.
I'm not sure how you get multiple devices on the same bus using this
model.
(side note: I know that this discussion has gone far beyond just talking
about adding a single new type of controller (pci-bridge), but how we do
this device will have implications far beyond, so we need to figure it
out now.)
> For PCIe, we create new buses for root ports (ioh3420), upstream switch
> ports (xio3130-upstream), downstream switch ports (xio3130-downstream),
> and the dmi-to-pci bridge (i82801b11-bridge). For PCI, PCI-to-PCI
> bridges create new buses (pci-bridge and dec-21154).
>
> One of my goals is to move us away from emulation of specific chips and
> create more devices like pci-bridge that adhere to the standard, but
> don't try to emulate a specific device. Then we might have "root-port",
> "pcie-upstream-switch-port", "pcie-downstream-switch-port", and
> "dmi-to-pci-bridge" (none of these names have been discussed).
That makes sense to me at the level of libvirt, but in qemu don't you
need to "emulate specific devices" anyway, in order for the guest OS to
operate properly? If that's the case and there are different chips that
implement the same functionality in a different manner, how would you
decide which of those should be chosen as "the *only* dmi-to-pci-bridge"?
The "pci-bridge" is an example of a generic device. We've created our
our virtual hardware that adheres to the necessary specifications but
doesn't emulate a specific piece of physical hardware. Root bridges,
switch ports, PCIe-to-PCI bridges, in fact even chipsets can be done the
same way. These interconnect components typically use generic class
drivers in the guest. If we go the next step and emulate specific
devices then we also need to implement all the hardware bugs and
limitations for that device as well as all the value add extensions for
the device that may or may not add value on a virtual platform. For
instance, why should our root ports be limited in width or speed to the
same as the physical hardware? The dmi-to-pci bridge might just end up
being a pcie-to-pci bridge, the dmi bus isn't really visible to the
guest anyway (but on q35 we need to install it at a specific location
because we're emulating specific hardware).

Thanks,
Alex