On Tue, Apr 16, 2013 at 12:35:29PM -0400, Laine Stump wrote:
On 04/15/2013 06:14 PM, Don Dutile wrote:
> On 04/15/2013 04:09 PM, Laine Stump wrote:
>> On 04/15/2013 06:29 AM, Daniel P. Berrange wrote:
>>> On Fri, Apr 12, 2013 at 11:46:15AM -0400, Laine Stump wrote:
>>>> On 04/11/2013 07:23 AM, Michael S. Tsirkin wrote:
>>>>> On Thu, Apr 11, 2013 at 07:03:56AM -0400, Laine Stump wrote:
>>>>>> On 04/10/2013 05:26 AM, Daniel P. Berrange wrote:
>>>>>>>>> So if we later allowed for multiple PCI roots, then we'd have
>>>>>>>>> something like
>>>>>>>>>
>>>>>>>>>    <controller type="pci-root" index="0">
>>>>>>>>>      <model name="i440FX"/>
>>>>>>>>>    </controller>
>>>>>>>>>    <controller type="pci-root" index="1">
>>>>>>>>>      <model name="i440FX"/>
>>>>>>>>>    </controller>
>>>>>>>>>    <controller type="pci" index="0">   <!-- Host bridge 1 -->
>>>>>>>>>      <address type='pci' domain='0' bus='0' slot='0'/>
>>>>>>>>>    </controller>
>>>>>>>>>    <controller type="pci" index="0">   <!-- Host bridge 2 -->
>>>>>>>>>      <address type='pci' domain='1' bus='0' slot='0'/>
>>>>>>>>>    </controller>
>>>>
>>>> There is a problem here - within a given controller type, we will now
>>>> have the possibility of multiple controllers with the same index - the
>>>> differentiating attribute will be in the <address> subelement, which
>>>> could create some awkwardness. Maybe instead this should be handled
>>>> with a different model of pci controller, and we can add a "domain"
>>>> attribute at the toplevel rather than specifying an <address>?
>>> IIUC there is a limit on the number of PCI buses you can create per
>>> domain, due to fixed size of PCI addresses. Google suggests to me
>>> the limit is 256. So for domain 1, we could just start index at
>>> 256, and domain 2 at 512, etc
>>
>> Okay. Whether we choose that method, or a separate domain attribute, I'm
>> satisfied that we'll be able to find a way to solve it when the time
>> comes (and it hasn't yet), so we can ignore that problem for now.
>>
>>
> *PLEASE* don't create a new/competing naming/numbering scheme for
> differentiating PCI domains.... as much as I dislike the overuse of the
> term 'domain', it's what is used. No sane person is going to look to
> assign PCI bus numbers > 256 in order to get new/different domains.
> The name sucks, but that's what it's called in the code, and what
> customers are used to.
I infer from this that you're okay with:

  <controller type='pci' domain='n' index='n'>

when defining a new controller (using "index" instead of "bus" is a bit
bothersome, but it is following current convention; should we reconsider
and just call it "bus" this time?), and:

  <address domain='n' bus='n' slot='n' function='n'/>

for specifying where to connect a PCI/PCIe device (including PCI[e]
controllers).
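
Just to illustrate (the index/bus/slot numbers here are invented for the
example, not a proposal for any default layout), a second PCI domain
might then look something like:

  <controller type='pci' domain='1' index='0'>
    <model type='pci-root'/>
  </controller>
  <controller type='pci' domain='1' index='1'>
    <model type='pci-bridge'/>
    <address type='pci' domain='1' bus='0' slot='1'/>
  </controller>
  <interface type='blah'>
    ...
    <address type='pci' domain='1' bus='1' slot='2' function='0'/>
  </interface>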
I think I prefer index to bus, since bus would make people
think it's the bus number.
I'm still confused about what exactly "domain" means.
If you identify where a device is connected using the
index value, why do you need a separate domain?
>>>
>>>
>>>> Comment: I'm not quite convinced that we really need the separate
>>>> "pci-root" device. Since 1) every pci-root will *always* have either a
>>>> pcie-root-bus or a pci-bridge connected to it, 2) the pci-root-bus
>>>> will
>>>> only ever be connected to the pci-root, and 3) the pci-bridge that
>>>> connects to it will need special handling within the pci-bridge case
>>>> anyway, why not:
>>>>
>>>> 1) eliminate the separate pci-root controller type
>>> Ok, lets leave it out - we can always add it later if desired.
>>
>> Okay.
>>
>
> Not so fast.... something that represents the PCI Root Complex might be
> handy -- error handling and embedded devices (like IOMMUs,
> intr-remapping table)
> come to mind... ACPI tables (if they get duped from real systems) may
> need unconventional naming schemes for qemu if an RC isn't modelled.
If I understand it correctly, that's what I intend <model
type='pcie-root'/> to be - it is what shows up as "bus 0" in the
output of lspci.
Everything, including embedded/integrated devices, is shown as
connected to domain 0 bus 0, isn't it?
There does not have to be a bus 0 in the output of lspci.
Current guests enumerate bus numbers from 0, so there usually
is one, but this might not hold in the future.
I am not sure whether libvirt uses any magic values itself,
but I think it's preferable not to have magic numbers.
In fact, we only use numbers for the index field for compatibility.
What Dan had originally suggested was a separate "root" that only
specified "i440FX" or "q35", and that the devices I'm naming "pci-root"
and "pcie-root" in this latest draft would just magically connect to
that "root" by giving their own connection address as "slot='0'" of
their own bus. That seemed redundant, so I suggested removing it and
just having a special kind of PCI controller that has no connection
address.
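
(In XML terms, just restating that: the root would simply be

  <controller type='pci' index='0'>
    <model type='pcie-root'/>    <!-- or 'pci-root' on pc-* -->
  </controller>

with no <address> element at all, and everything else states its
connection point explicitly.)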
Maybe for the root it makes sense to optionally specify
domain instead of index in the address.
>
>>>
>>>> 2) within <controller type='pci'>, a new <model type='pci-root-bus'/>
>>>> will be added.
>>>>
>>>> 3) a pcie-root-bus will automatically be added for q35 machinetypes,
>>>> and pci-root-bus for any machinetype that supports a PCI bus (e.g.
>>>> "pc-*")
>>>>
>>>> 4) model type='pci-root-bus' will behave like pci-bridge, except that
>>>> it will be an implicit device (nothing on qemu commandline) and it
>>>> won't need an <address> element (neither will pcie-root-bus).
>>>>
>>>> 5) to support multiple domains, we can simply add a "domain" attribute
>>>> to the toplevel of controller.
>>> Or use index numbers modulo 256 to identify domain numbers.
>>
>>
>> Right. One or the other. But we can defer that discussion.
>>
> Just say 'domain' .... again! ;-)
Okay, I'm on board with you (and we have quite a while to convince
anyone who isn't :-)
Looks sane. So there shouldn't be any need for the domain
attribute anywhere as long as you only have one root, right?
>
>>
>>
>>> One note on q35 - we need to make sure whatever we do in terms of
>>> creating default <controller>s in the XML 'just works' for applications.
>>> eg if they define a guest using <type machine="q35">hvm</type>, and then
>>> add an <interface>, it should do the right thing wrt PCI
>>> addressing/connectivity.
>>> We must not require applications to manually add <controller> elements
>>> for q35 for things to work. Adding <controller>s must purely be an
>>> opt-in for apps which have the detailed knowledge required & need full
>>> control over bus layout.
>>
>> Yep. What I see happening is that the place where we currently add
>> default controllers will, in the future, automatically add this for
>> machinetype pc* and rhel-*:
>>
>> <controller type='pci'> <!-- implied index='0' -->
>> <model type='pci-root'/>
>> </controller>
>>
>> and for machinetype q35* it will add (something like):
>>
>> <controller type='pci'> <!-- index='0' -->
>> <model type='pcie-root'/>
>> </controller>
>> <controller type='pci'> <!-- index='1' -->
>> <model type='dmi-to-pci-bridge'/>
>> <address type='pci' bus='0' slot='0x1e'/>
>> </controller>
>> <controller type='pci'> <!-- index='2' -->
>> <model type='pci-bridge'/>
>> <address type='pci' bus='1' slot='1'/>
>> </controller>
>>
>> The slot-auto-reserve code will look through all pci controllers and
>> only auto-reserve slots on controllers appropriate for the given device
>> - controller 0 is already inappropriate for PCI devices, and we can mark
>> the dmi-to-pci-bridge type as being inappropriate for auto-reserve
>> (since, if I recall correctly, I was told that you can't hotplug devices
>> on that bus). So, all new PCI devices in the config will get addresses
>> with bus='2'.
>>
>> Of course this means that it will not be possible to switch an existing
>> domain config from pc to q35 simply by changing the machinetype - the
>> bus number in the address of all devices will need to be changed from 0
>> to 2. But this is another case of "opt in", and already requires editing
>> the domain config anyway. If someone creates a brand new q35 machine
>> though, all PCI devices will get added with bus='whatever is the bus
>> number of the first pci-root or pci-bridge controller' (in this case,
>> '2').
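
If I'm reading that right, then on a freshly defined q35 guest an
<interface> added without an explicit <address> would end up with
something roughly like (exact slot chosen by the auto-assignment code):

  <interface type='blah'>
    ...
    <address type='pci' bus='2' slot='1' function='0'/>
  </interface>

i.e. it lands on the auto-added pci-bridge rather than on pcie-root or
the dmi-to-pci-bridge.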
>>
>> So, here are the proposed pci controller types cleaned up and
>> re-summarized, followed by an example.
>>
>> <controller type='pci'>
>> =======================
>>
>> This will be used for *all* of the following PCI controller devices
>> supported by qemu:
>>
>>
>> <model type='pci-root'/> (implicit/integrated)
>> ------------------------
>>
>> Upstream: implicit connection to the host
>> Downstream: 32 slots (slot 0 reserved), PCI devices only
>> qemu commandline: nothing (implicit in the pc-* etc. machinetypes)
>>
>> This controller represents a pc* (or rhel-*) machine's integrated PCI
>> bus (pci.0) and provides places for PCI devices to connect (including
>> the "pci-bridge" type of PCI controller).
>>
>> There is only one of these controllers, and it will *always* be
>> index='0', and will have no <address> element.
>>
> ok.
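
For comparison, on a pc-* guest a device would then just point at bus 0
of that implicit controller (the slot number here is picked arbitrarily
for the example):

  <controller type='pci'>
    <model type='pci-root'/>
  </controller>
  <interface type='blah'>
    ...
    <address type='pci' bus='0' slot='4' function='0'/>
  </interface>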
>
>>
>> <model type='pcie-root'/> (implicit/integrated)
>> -------------------------
>>
>> Upstream: implicit connection to the host
>> Downstream: 32 slots (slot 0 reserved), PCIe devices only, no
>> hotplug.
>> qemu commandline: nothing (implicit in the q35-* machinetype)
>>
>> This controller represents a q35's PCI "root complex", and provides
>> places for PCIe devices to connect. Examples are root-ports,
>> dmi-to-pci-bridges, sata controllers, and integrated sound/usb/ethernet
>> devices (do any of those integrated devices that can be connected to
>> the pcie-root-bus exist yet?).
>>
>> There is only one of these controllers, and it will *always* be
>> index='0', and will have no <address> element.
>>
> ok.
>
>>
>> <model type='root-port'/> (ioh3420)
>> -------------------------
>>
>> Upstream: PCIe, connect to pcie-root-bus *only* (?)
>> Downstream: 1 slot, PCIe devices only (?)
>> qemu commandline: -device ioh3420,...
>>
>> These can only connect to the "pcie-root" of a q35. Any PCIe
>> device can connect to it, including an upstream-switch-port.
>>
> ioh on q35; ich9/10/xx for other intel chipsets
>
>>
>> <model type='upstream-switch-port'/> (x3130-upstream)
>> ------------------------------------
>>
>> Upstream: PCIe, connect to pcie-root-bus, root-port, or
>> downstream-switch-port (?)
>> Downstream: 32 slots, connect *only* to downstream-switch-port
>> qemu-commandline: -device x3130-upstream
>>
>>
>> This is the upper side of a switch that can multiplex multiple devices
>> onto a single port. It's only useful when one or more downstream switch
>> ports are connected to it.
>>
>>
>> <model type='downstream-switch-port'/> (xio3130-downstream)
>> --------------------------------------
>>
>> Upstream: connect *only* to upstream-switch-port
>> Downstream: 1 slot, any PCIe device
>> qemu commandline: -device xio3130-downstream
>>
>> You can connect one or more of these to an upstream-switch-port in order
>> to effectively plug multiple devices into a single PCIe port.
>>
> ugh! one cannot have 3130-downstream w/o 3130upstream;
> simplify: PCIe-PPB-up; PCIe-PPB-down -- then it can be anything (not
> TI, not IDT, not Intel, etc.).
Are you just saying that the upstream and downstream ports must be a
matching set? That was the intent with naming them
(up|down)stream-switch-port - since I've been led to believe that all
the different chipsets end up providing the same externally-visible
functionality (as long as all the building blocks are used from the same
set), I've just used functional names rather than specific
chipset/device names, so that the appropriate choice can be made at
domain start time, based on what's available in the qemu being run (or
the machine type that is chosen).
(the "qemu commandline" line merely tells what would be used to
implement this controller *with existing qemu 1.4 devices*)
(Hmmm - do you think that changing something like the type of upstream
and downstream switch ports would lead to Windows requiring a license
re-activation? If so, we may need to rethink this and hardcode the
specific device that's used :-/)
>
>>
>> <model type='dmi-to-pci-bridge'/> (i82801b11-bridge)
>> ---------------------------------
Based on Alex's feedback, do we maybe want to name this device
"pcie-to-pci-bridge" instead? (or maybe just "pcie-to-pci"?)
>>
>> (btw, what does "dmi" mean?)
>>
>> Upstream: pcie-root *only*
>> Downstream: 32 slots, any PCI device (including "pci-bridge"),
>> no hotplug (?)
>> qemu commandline: -device i82801b11-bridge,...
>>
>> This is the gateway to the world of standard old PCI.
>>
> why needed?
My understanding is that this is the only type of bridge that can be
directly connected to pcie-root (the "root complex") and provide plain
old PCI slots.
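
In other words (if I have this right - the index/slot numbers are just
for the example), the minimal way to get a plain PCI slot on a q35
guest would be:

  <controller type='pci' index='1'>
    <model type='dmi-to-pci-bridge'/>
    <address type='pci' bus='0' slot='0x1e'/>
  </controller>

with PCI devices then addressed on bus='1' (but no hotplug there).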
>
>>
>> <model type='pci-bridge'/> (pci-bridge)
>> --------------------------
For consistency with the above "pcie-to-pci-bridge" type, should we name
this "pci-to-pci-bridge" instead? (or maybe just "pci-to-pci"?)
>>
>> Upstream: PCI, connect to 1) pci-root, 2) dmi-to-pci-bridge
>> 3) another pci-bridge
>> Downstream: any PCI device, 32 slots
>> qemu commandline: -device pci-bridge,...
>>
>> This differs from dmi-to-pci-bridge in that its upstream connection is
>> PCI rather than PCIe (so it will work on an i440FX system, which has a
>> pci-root rather than pcie-root) and that hotplug is supported. In
>> general, if a guest will have any PCI devices, one of these
>> controllers should be added, and the PCI devices connected to it
>> rather than to the dmi-to-pci-bridge.
>>
>> ************************************
>> (For q35, we *may* decide to always auto-add a dmi-to-pci-bridge at
>> 00:1E.0, and a pci-bridge on slot 1 of the dmi-to-pci-bridge. This
>> will allow a continuation of the tradition of simply adding new
>> devices to the config without worrying about where they connect.)
>>
>>
>> ============================================================================
>>
>> Just to make sure this config model will work, here is the XML to
>> replicate the layout (only the ones involved in the PCI tree, along with
>> 3 ethernet devices as examples) of the X58 hardware I have sitting under
>> my desk (I've attached lspci and virsh nodedev-list --tree output from
>> that machine):
>>
>>
>> <controller type='pci' index='0'>
>> <model type='pcie-root'/>
>> </controller>
>>
>> <controller type='pci' index='1'>
>> <model type='root-port'/>
>> <address type='pci' bus='0' slot='1'/>
>> </controller>
>>
>> (there is a scsi controller connected to bus='1')
>>
>>
>> <controller type='pci' index='2'>
>> <model type='root-port'/>
>> <address type='pci' bus='0' slot='3'/>
>> </controller>
>>
>> (the VGA controller is connected to bus='2')
>>
>> <controller type='pci' index='3'>
>> <model type='root-port'/>
>> <address type='pci' bus='0' slot='7'/>
>> </controller>
>>
>> (PCIe SRIOV network card (in external PCIe slot) connected to bus='3')
>>
>> <controller type='pci' index='4'>
>> <model type='root-port'/>
>> <address type='pci' bus='0' slot='0x1c' function='0'/>
>> </controller>
>>
>> (unused PCIe slot available on bus='4')
>>
>> <!-- pcie-root (0:1c.4) -> root-port (5:0.0) -> onboard ethernet -->
>> <controller type='pci' index='5'>
>> <model type='root-port'/>
>> <address type='pci' bus='0' slot='0x1c' function='4'/>
>> </controller>
>> <interface type='blah'>
>> ...
>> <mac address='00:27:13:53:db:76'/>
>> <address type='pci' bus='5' slot='0' function='0'/>
>> </interface>
>>
>> <!-- more complicated connection to 2nd systemboard ethernet -->
>> <!-- pcie-root -> (0:1c.5) root-port -> (6:0.0) upstream-switch-port
>> -> (7:3.0) downstream-switch-port -> (9:0.0) ethernet -->
>> <controller type='pci' index='6'>
>> <model type='root-port'/>
>> <address type='pci' bus='0' slot='0x1c' function='5'/>
>> </controller>
>> <controller type='pci' index='7'>
>> <model type='upstream-switch-port'/>
>> <address type='pci' bus='6' slot='0' function='0'/>
>> </controller>
>> <controller type='pci' index='8'>
>> <model type='downstream-switch-port'/>
>> <address type='pci' bus='7' slot='2' function='0'/>
>> </controller>
>> <controller type='pci' index='9'>
>> <model type='downstream-switch-port'/>
>> <address type='pci' bus='7' slot='3' function='0'/>
>> </controller>
>> <interface type='blah'>
>> ...
>> <mac address='00:27:13:53:db:77'/>
>> <address type='pci' bus='9' slot='0' function='0'/>
>> </interface>
>>
>>
>> <!-- old-fashioned PCI ethernet in an external PCI slot -->
>> <controller type='pci' index='0x0a'>
>> <model type='dmi-to-pci-bridge'/>
>> <address type='pci' bus='0' slot='0x1e' function='0'/>
>> </controller>
>> <interface type='blah'>
>> ...
>> <mac address='00:03:47:7b:63:e6'/>
>> <address type='pci' bus='0x0a' slot='0x0e' function='0'/>
>> </interface>
>>
>> So I think this will all work. Does anyone see any problems?
>>
>> If not, then we can draw it all back to the *current* patchset - support
>> for multiple PCI buses using the pci-bridge device. For *that*, we only
>> need to implement the follow bits of the above:
>>
>> 1) There will be a new <controller type='pci'> device, with a <model
>> type='xyz'/> subelement. Initially we will support types "pci-root" and
>> "pci-bridge" (all the other types discussed above can be added later).
>> pci-root will have *no <address>* element (and will generate nothing on
>> the qemu commandline, but will create a 32 slot "bus='0'" to plug PCI
>> devices into). pci-bridge will have an <address> element, will generate
>> a -device option on the qemu commandline, and will also create a 32 slot
>> "bus='n'" to plug PCI devices into.
>>
>> 2) for machinetypes that have a PCI bus, the config should have this
>> controller auto-added:
>>
>> <controller type='pci'>
>> <model type='pci-root'/>
>> </controller>
>>
>> This will make bus='0' available (but add nothing to the qemu
>> commandline). Any attempt to add a PCI device when there is no bus
>> available should be an error.
>>
>> 3) The way to add more buses will be to add a controller like this:
>>
>> <controller type='pci'>
>> <model type='pci-bridge'/>
>> </controller>
>>
> for legacy PCI, yes;
Right. When I said "current patchset", that's what I was referring to. I
started this whole design discussion as a sub-thread of a patchset that
just adds support for the "pci-bridge" device to the existing
pc-*/rhel-* machinetypes. I only got into the discussion of all the PCIe
controller types because I wanted to make sure that the new controller
type we added to support that would be logically expandable to support
all of the controllers needed for q35/PCIe support (which is in the
queue, but will happen later).
> but for PCIe, one needs PCIe-PPB-up & at least one PCI-PPB-down
> One _cannot_ have just a single pci-bridge except as driving bridge from
> a root-complex port.
I had thought that the possibilities for a bridge that provides a legacy
PCI bus were:
1) create an i82801b11-bridge device connected to the root complex (or
to any root-port or downstream-switch-port). This new bus will provide
32 legacy PCI slots (slot 0 reserved), but devices (currently) can't be
hot-plugged.
2) create a pci-bridge device connected to any existing legacy PCI slot
(e.g. one of the slots of an i82801b11-bridge). This new bus will
provide 32 legacy PCI slots (slot 0 reserved) and hot-plug *is* supported.
The (up|down)stream-switch-port based "buses" provide PCIe slots, not PCI.
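
Put as XML (again, the index/bus/slot numbers are just examples), my
mental model of those two options is:

  <!-- (1) pcie-root -> i82801b11-bridge: PCI slots, no hotplug -->
  <controller type='pci' index='1'>
    <model type='dmi-to-pci-bridge'/>
    <address type='pci' bus='0' slot='0x1e'/>
  </controller>

  <!-- (2) a pci-bridge plugged into one of those slots: PCI slots
       with hotplug -->
  <controller type='pci' index='2'>
    <model type='pci-bridge'/>
    <address type='pci' bus='1' slot='1'/>
  </controller>

with hot-pluggable PCI devices then going on bus='2'.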
>
>> 4) When <controller type='usb'> was added, resulting in auto-generated
>> devices, that caused problems when migrating from a host with newer
>> libvirt to one with older libvirt. We need to make sure we don't suffer
>> the same problem this time. See the following two BZes for details
>> (unless you have a better memory than me! :-):
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=815503
>> https://bugzilla.redhat.com/show_bug.cgi?id=856864
>>
>> (and note how danpb eerily prophesied the current pending situation :-)
>>
>>
>> I think everything else about Jan's/Liguang's pci-bridge patches can
>> remain.
>>
>
>