On 04/15/2013 04:09 PM, Laine Stump wrote:
On 04/15/2013 06:29 AM, Daniel P. Berrange wrote:
> On Fri, Apr 12, 2013 at 11:46:15AM -0400, Laine Stump wrote:
>> On 04/11/2013 07:23 AM, Michael S. Tsirkin wrote:
>>> On Thu, Apr 11, 2013 at 07:03:56AM -0400, Laine Stump wrote:
>>>> On 04/10/2013 05:26 AM, Daniel P. Berrange wrote:
>>>>>>> So if we later allowed for multiple PCI roots, then we'd have something
>>>>>>> like
>>>>>>>
>>>>>>> <controller type="pci-root" index="0">
>>>>>>> <model name="i440FX"/>
>>>>>>> </controller>
>>>>>>> <controller type="pci-root" index="1">
>>>>>>> <model name="i440FX"/>
>>>>>>> </controller>
>>>>>>> <controller type="pci" index="0"> <!-- Host bridge 1 -->
>>>>>>> <address type='pci' domain='0' bus='0' slot='0'/>
>>>>>>> </controller>
>>>>>>> <controller type="pci" index="0"> <!-- Host bridge 2 -->
>>>>>>> <address type='pci' domain='1' bus='0' slot='0'/>
>>>>>>> </controller>
>>
>> There is a problem here - within a given controller type, we will now
>> have the possibility of multiple controllers with the same index - the
>> differentiating attribute will be in the <address> subelement, which
>> could create some awkwardness. Maybe instead this should be handled with
>> a different model of pci controller, and we can add a "domain" attribute
>> at the toplevel rather than specifying an <address>?
> IIUC there is a limit on the number of PCI buses you can create per
> domain, due to fixed size of PCI addresses. Google suggests to me
> the limit is 256. So for domain 1, we could just start index at
> 256, and domain 2 at 512, etc
Okay. Whether we choose that method, or a separate domain attribute, I'm
satisfied that we'll be able to find a way to solve it when the time
comes (and it hasn't yet), so we can ignore that problem for now.
*PLEASE* don't create a new/competing naming/numbering scheme for differentiating
PCI domains.... as much as I dislike the overuse of the term 'domain', it's what
is used. No sane person is going to look to assign PCI bus numbers > 256 in order
to get new/different domains.
The name sucks, but that's what it's called in the code, and what customers are
used to.
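For comparison, the index-based alternative quoted above boils down to the arithmetic below. This is only an illustrative Python sketch: the function names are made up for this mail, the 256-buses-per-domain figure is the one quoted above, and nothing here is libvirt code.

```python
from typing import Tuple

# Assumption taken from the quoted text: at most 256 buses per PCI domain
# (the bus field of a PCI address is 8 bits wide).
BUSES_PER_DOMAIN = 256


def split_index(index: int) -> Tuple[int, int]:
    """Map a flat controller index to (domain, bus). Hypothetical helper."""
    if index < 0:
        raise ValueError("controller index must be non-negative")
    return divmod(index, BUSES_PER_DOMAIN)


def join_index(domain: int, bus: int) -> int:
    """Inverse mapping: (domain, bus) back to a flat controller index."""
    if domain < 0 or not 0 <= bus < BUSES_PER_DOMAIN:
        raise ValueError("domain or bus out of range")
    return domain * BUSES_PER_DOMAIN + bus
```

With an explicit "domain" attribute the same information is simply written out as two separate numbers, which is the form the objection above argues for.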
>
>
>> Comment: I'm not quite convinced that we really need the separate
>> "pci-root" device. Since 1) every pci-root will *always* have either a
>> pcie-root-bus or a pci-bridge connected to it, 2) the pci-root-bus will
>> only ever be connected to the pci-root, and 3) the pci-bridge that
>> connects to it will need special handling within the pci-bridge case
>> anyway, why not:
>>
>> 1) eliminate the separate pci-root controller type
> Ok, let's leave it out - we can always add it later if desired.
Okay.
Not so fast.... something that represents the PCI Root Complex might be
handy -- error handling and embedded devices (like IOMMUs, intr-remapping table)
come to mind... ACPI tables (if they get duped from real systems) may
need unconventional naming schemes for qemu if an RC isn't modelled.
>
>> 2) within <controller type='pci'>, a new <model type='pci-root-bus'/>
>> will be added.
>>
>> 3) a pcie-root-bus will automatically be added for q35 machinetypes, and
>> pci-root-bus for any machinetype that supports a PCI bus (e.g. "pc-*")
>>
>> 4) model type='pci-root-bus' will behave like pci-bridge, except that it
>> will be an implicit device (nothing on qemu commandline) and it won't
>> need an <address> element (neither will pcie-root-bus).
>>
>> 5) to support multiple domains, we can simply add a "domain" attribute
>> to the toplevel of controller.
> Or use index numbers modulo 256 to identify domain numbers.
Right. One or the other. But we can defer that discussion.
Just say 'domain' .... again! ;-)
> One note on q35 - we need to make sure whatever we do in terms of creating
> default <controller>s in the XML 'just works' for applications. eg if they
> define a guest using <type machine="q35">hvm</type>, and then add a
> <interface>, it should do the right thing wrt PCI addressing/connectivity.
> We must not require applications to manually add <controller> elements
> for q35 for things to work. Adding <controller>s must purely be an opt-in
> for apps which have the detailed knowledge required & need full control
> over bus layout.
Yep. What I see happening is that the place where we currently add
default controllers will, in the future, automatically add this for
machinetype pc* and rhel-*:
<controller type='pci'> <!-- implied index='0' -->
<model type='pci-root'/>
</controller>
and for machinetype q35* it will add (something like):
<controller type='pci'> <!-- index='0' -->
<model type='pcie-root'/>
</controller>
<controller type='pci'> <!-- index='1' -->
<model type='dmi-to-pci-bridge'/>
<address type='pci' bus='0' slot='0x1e'/>
</controller>
<controller type='pci'> <!-- index='2' -->
<model type='pci-bridge'/>
<address type='pci' bus='1' slot='1'/>
</controller>
The slot-auto-reserve code will look through all pci controllers and
only auto-reserve slots on controllers appropriate for the given device
- controller 0 is already inappropriate for PCI devices, and we can mark
the dmi-to-pci-bridge type as being inappropriate for auto-reserve
(since, if I recall correctly, I was told that you can't hotplug devices
on that bus). So, all new PCI devices in the config will get addresses
with bus='2'.
Of course this means that it will not be possible to switch an existing
domain config from pc to q35 simply by changing the machinetype - the
bus number in the address of all devices will need to be changed from 0
to 2. But this is another case of "opt in", and already requires editing
the domain config anyway. If someone creates a brand new q35 machine
though, all PCI devices will get added with bus='whatever is the bus
number of the first pci-root or pci-bridge controller' (in this case, '2').
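The default-controller and auto-reserve behaviour described above could be sketched like this in Python. The function names, the (index, model) tuples and the machine-name matching are assumptions made for illustration; none of this is existing libvirt code.

```python
# Models whose slots may be auto-reserved for ordinary PCI devices.
# pcie-root and dmi-to-pci-bridge are deliberately left out, as described above.
AUTO_RESERVE_MODELS = {"pci-root", "pci-bridge"}


def default_controllers(machine):
    """Return the auto-added PCI controllers as (index, model) pairs.

    Hypothetical helper: the name matching is a simplification of the
    pc*/rhel-* and q35* machinetype families mentioned in the mail.
    """
    if "q35" in machine:
        return [(0, "pcie-root"), (1, "dmi-to-pci-bridge"), (2, "pci-bridge")]
    if machine.startswith(("pc", "rhel-")):
        return [(0, "pci-root")]
    return []  # machinetype without a PCI bus


def pick_auto_bus(controllers):
    """Return the bus number a new PCI device without an address would get."""
    for index, model in sorted(controllers):
        if model in AUTO_RESERVE_MODELS:
            return index
    raise ValueError("no PCI bus available for this device")
```

Under these assumptions a brand new q35 guest gets bus 2 for its PCI devices, while a pc-* guest keeps bus 0.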
So, here are the proposed pci controller types cleaned up and
re-summarized, followed by an example.
<controller type='pci'>
=======================
This will be used for *all* of the following PCI controller devices
supported by qemu:
<model type='pci-root'/> (implicit/integrated)
------------------------
Upstream: implicit connection to the host
Downstream: 32 slots (slot 0 reserved), PCI devices only
qemu commandline: nothing (implicit in the pc-* etc. machinetypes)
This controller represents a pc* (or rhel-*) machine's integrated PCI
bus (pci.0) and provides places for PCI devices to connect (including
the "pci-bridge" type of PCI controller).
There is only one of these controllers, and it will *always* be
index='0', and will have no <address> element.
ok.
<model type='pcie-root'/> (implicit/integrated)
-------------------------
Upstream: implicit connection to the host
Downstream: 32 slots (slot 0 reserved), PCIe devices only, no hotplug.
qemu commandline: nothing (implicit in the q35-* machinetype)
This controller represents a q35's PCI "root complex", and provides
places for PCIe devices to connect. Examples are root-ports,
dmi-to-pci-bridges, sata controllers, integrated sound/usb/ethernet
devices (do any of those integrated devices that can be connected to
the pcie-root-bus exist yet?).
There is only one of these controllers, and it will *always* be
index='0', and will have no <address> element.
ok.
<model type='root-port'/> (ioh3420)
-------------------------
Upstream: PCIe, connect to pcie-root-bus *only* (?)
Downstream: 1 slot, PCIe devices only (?)
qemu commandline: -device ioh3420,...
These can only connect to the "pcie-root" of a q35. Any PCIe
devices can connect to it, including an upstream-switch-port.
ioh on q35; ich9/10/xx for other intel chipsets
<model type='upstream-switch-port'/> (x3130-upstream)
------------------------------------
Upstream: PCIe, connect to pcie-root-bus, root-port, or
downstream-switch-port (?)
Downstream: 32 slots, connect *only* to downstream-switch-port
qemu commandline: -device x3130-upstream
This is the upper side of a switch that can multiplex multiple devices
onto a single port. It's only useful when one or more downstream switch
ports are connected to it.
<model type='downstream-switch-port'/> (xio3130-downstream)
--------------------------------------
Upstream: connect *only* to upstream-switch-port
Downstream: 1 slot, any PCIe device
qemu commandline: -device xio3130-downstream
You can connect one or more of these to an upstream-switch-port in order
to effectively plug multiple devices into a single PCIe port.
ugh! one cannot have 3130-downstream w/o 3130-upstream;
simplify: PCIe-PPB-up; PCIe-PPB-down -- then it can be anything (not TI, not IDT,
not Intel, etc.).
<model type='dmi-to-pci-bridge'/> (i82801b11-bridge)
---------------------------------
(btw, what does "dmi" mean?)
Upstream: pcie-root *only*
Downstream: 32 slots, any PCI device (including "pci-bridge"),
no hotplug (?)
qemu commandline: -device i82801b11-bridge,...
This is the gateway to the world of standard old PCI.
why needed?
<model type='pci-bridge'/> (pci-bridge)
--------------------------
Upstream: PCI, connect to 1) pci-root, 2) dmi-to-pci-bridge
3) another pci-bridge
Downstream: any PCI device, 32 slots
qemu commandline: -device pci-bridge,...
This differs from dmi-to-pci-bridge in that its upstream connection is
PCI rather than PCIe (so it will work on an i440FX system, which has a
pci-root rather than pcie-root) and that hotplug is supported. In
general, if a guest will have any PCI devices, one of these
controllers should be added, and the PCI devices connected to it
rather than to the dmi-to-pci-bridge.
************************************
(For q35, we *may* decide to always auto-add a dmi-to-pci-bridge at
00:1E.0, and a pci-bridge on slot 1 of the dmi-to-pci-bridge. This
will allow a continuation of the tradition of simply adding new
devices to the config without worrying about where they connect.)
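The upstream rules listed for each model above can be collected into one table and checked mechanically. The following Python sketch does that; the table only restates the rules from this mail (including the ones still marked with "(?)"), and the data layout and function name are hypothetical.

```python
# model -> set of controller models it may plug into (its upstream side).
# An empty set means "implicit, no <address> element".
ALLOWED_UPSTREAM = {
    "pci-root": set(),
    "pcie-root": set(),
    "root-port": {"pcie-root"},
    "upstream-switch-port": {"pcie-root", "root-port", "downstream-switch-port"},
    "downstream-switch-port": {"upstream-switch-port"},
    "dmi-to-pci-bridge": {"pcie-root"},
    "pci-bridge": {"pci-root", "dmi-to-pci-bridge", "pci-bridge"},
}


def check_topology(controllers):
    """Check a {index: (model, parent_bus_or_None)} mapping; return error strings."""
    errors = []
    for index, (model, parent) in sorted(controllers.items()):
        allowed = ALLOWED_UPSTREAM.get(model)
        if allowed is None:
            errors.append(f"controller {index}: unknown model {model!r}")
        elif not allowed:
            # Root buses are implicit and must not carry an address.
            if parent is not None:
                errors.append(f"controller {index}: {model} takes no <address>")
        elif parent not in controllers:
            errors.append(f"controller {index}: missing or unknown parent bus")
        elif controllers[parent][0] not in allowed:
            errors.append(
                f"controller {index}: {model} cannot plug into "
                f"{controllers[parent][0]}"
            )
    return errors
```

For instance, the q35 defaults `{0: ("pcie-root", None), 1: ("dmi-to-pci-bridge", 0), 2: ("pci-bridge", 1)}` produce no errors, whereas a pci-bridge plugged directly into bus 0 of that layout is reported.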
============================================================================
Just to make sure this config model will work, here is the XML to
replicate the layout (only the ones involved in the PCI tree, along with
3 ethernet devices as examples) of the X58 hardware I have sitting under
my desk (I've attached lspci and virsh nodedev-list --tree output from
that machine):
<controller type='pci' index='0'>
<model type='pcie-root'/>
</controller>
<controller type='pci' index='1'>
<model type='root-port'/>
<address type='pci' bus='0' slot='1'/>
</controller>
( there is a scsi controller connected to bus='1')
<controller type='pci' index='2'>
<model type='root-port'/>
<address type='pci' bus='0' slot='3'/>
</controller>
(the VGA controller is connected to bus='2')
<controller type='pci' index='3'>
<model type='root-port'/>
<address type='pci' bus='0' slot='7'/>
</controller>
(PCIe SRIOV network card (in external PCIe slot) connected to bus='3')
<controller type='pci' index='4'>
<model type='root-port'/>
<address type='pci' bus='0' slot='0x1c' function='0'/>
</controller>
(unused PCIe slot available on bus='4')
<!-- pcie-root (0:1c.4) -> root-port (5:0.0) -> onboard ethernet -->
<controller type='pci' index='5'>
<model type='root-port'/>
<address type='pci' bus='0' slot='0x1c' function='4'/>
</controller>
<interface type='blah'>
...
<mac address='00:27:13:53:db:76'/>
<address type='pci' bus='5' slot='0' function='0'/>
</interface>
<!-- more complicated connection to 2nd systemboard ethernet -->
<!-- pcie-root -> (0:1c.5)root-port -> (6:0.0)upstream-switch-port
-> (7:3.0)downstream-switch-port -> (9:0.0)ethernet -->
<controller type='pci' index='6'>
<model type='root-port'/>
<address type='pci' bus='0' slot='0x1c' function='5'/>
</controller>
<controller type='pci' index='7'>
<model type='upstream-switch-port'/>
<address type='pci' bus='6' slot='0' function='0'/>
</controller>
<controller type='pci' index='8'>
<model type='downstream-switch-port'/>
<address type='pci' bus='7' slot='2' function='0'/>
</controller>
<controller type='pci' index='9'>
<model type='downstream-switch-port'/>
<address type='pci' bus='7' slot='3' function='0'/>
</controller>
<interface type='blah'>
...
<mac address='00:27:13:53:db:77'/>
<address type='pci' bus='9' slot='0' function='0'/>
</interface>
<!-- old-fashioned PCI ethernet in an external PCI slot -->
<controller type='pci' index='0x0a'>
<model type='dmi-to-pci-bridge'/>
<address type='pci' bus='0' slot='0x1e' function='0'/>
</controller>
<interface type='blah'>
...
<mac address='00:03:47:7b:63:e6'/>
<address type='pci' bus='0x0a' slot='0x0e' function='0'/>
</interface>
So I think this will all work. Does anyone see any problems?
If not, then we can draw it all back to the *current* patchset - support
for multiple PCI buses using the pci-bridge device. For *that*, we only
need to implement the following bits of the above:
1) There will be a new <controller type='pci'> device, with a <model
type='xyz'/> subelement. Initially we will support types "pci-root" and
"pci-bridge" (all the other types discussed above can be added later).
pci-root will have *no <address>* element (and will generate nothing on
the qemu commandline, but will create a 32 slot "bus='0'" to plug PCI
devices into). pci-bridge will have an <address> element, will generate
a -device option on the qemu commandline, and will also create a 32 slot
"bus='n'" to plug PCI devices into.
2) for machinetypes that have a PCI bus, the config should have this
controller auto-added:
<controller type='pci'>
<model type='pci-root'/>
</controller>
This will make bus='0' available (but add nothing to the qemu
commandline). Any attempt to add a PCI device when there is no bus
available should be an error.
3) The way to add more buses will be to add a controller like this:
<controller type='pci'>
<model type='pci-bridge'/>
</controller>
for legacy PCI, yes; but for PCIe, one needs PCIe-PPB-up & at least one
PCIe-PPB-down.
One _cannot_ have just a single pci-bridge except as driving bridge from
a root-complex port.
4) When <controller type='usb'> was added, resulting in auto-generated
devices, that caused problems when migrating from a host with newer
libvirt to one with older libvirt. We need to make sure we don't suffer
the same problem this time. See the following two BZes for details
(unless you have a better memory than me! :-):
https://bugzilla.redhat.com/show_bug.cgi?id=815503
https://bugzilla.redhat.com/show_bug.cgi?id=856864
(and note how danpb eerily prophesied the current pending situation :-)
I think everything else about Jan's/Liguang's pci-bridge patches can remain.
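
Items 1) to 3) above imply a simple consistency check: every PCI address in the config must name a bus created by some <controller type='pci'>. A rough Python sketch of that check follows. The function names are hypothetical, the implied-index rule (position among the pci controllers) is an assumption based on the "implied index" comments earlier in this mail, and the XML is just the proposed format, not something libvirt accepts today.

```python
import xml.etree.ElementTree as ET


def pci_buses(devices):
    """Return {bus_number: model} for every <controller type='pci'> element."""
    buses = {}
    pci = [c for c in devices.iter("controller") if c.get("type") == "pci"]
    for position, ctrl in enumerate(pci):
        raw = ctrl.get("index")
        # Assumption: a missing index defaults to the controller's position.
        index = int(raw, 0) if raw is not None else position
        model = ctrl.find("model")
        buses[index] = model.get("type") if model is not None else None
    return buses


def unplugged_devices(devices_xml):
    """List devices whose PCI address names a bus that no controller provides."""
    devices = ET.fromstring(devices_xml)
    buses = pci_buses(devices)
    errors = []
    for dev in devices:
        addr = dev.find("address")
        if addr is None or addr.get("type") != "pci":
            continue
        bus = int(addr.get("bus", "0"), 0)  # accepts '2' as well as '0x0a'
        if bus not in buses:
            errors.append(f"<{dev.tag}>: no PCI bus {bus} available")
    return errors
```

Run against a <devices> fragment containing a pci-root, a pci-bridge at bus='0', and an interface at bus='1', this reports nothing; remove the controllers and the interface is flagged, which corresponds to the "should be an error" case in item 2).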