On Thu, 2013-02-07 at 17:31 +0000, Daniel P. Berrange wrote:
> On Wed, Feb 06, 2013 at 01:15:05PM -0700, Alex Williamson wrote:
> On Wed, 2013-02-06 at 14:13 -0500, Laine Stump wrote:
> > > 2) Are there other issues aside from implicit controller devices I
> > > need to consider for q35? For example, are there any devices that (as
> > > I recall is the case for some devices on "pc") may or may not be
> > > present, but if they are present they are always at a particular PCI
> > > address (meaning that address must be reserved)? I've also just
> > > learned that certain types of PCIe devices must be plugged into
> > > certain locations on the guest bus? ("root complex" devices - is there
> > > a good source of background info to learn the meaning of terms like
> > > that, and the rules of engagement? libvirt will need to know/follow
> > > these rules.)
> >
> > The GMCH (Graphics & Memory Controller Hub) defines:
> >
> > 00.0 - Host bridge
> > 01.0 - x16 root port for external graphics
> > 02.0,1 - integrated graphics device (IGD)
> > 03.0,1,2,3 - management engine subsystem
> >
> > And the ICH defines:
> >
> > 19.0 - Embedded ethernet (e1000e)
> > 1a.* - UHCI/EHCI
> > 1b.0 - HDA audio
> > 1c.* - PCIe root ports
> > 1d.* - UHCI/EHCI
> > 1e.0 - PCI bridge
> > 1f.0 - ISA bridge
> > 1f.2,5 - SATA
> > 1f.3 - SMBUS
> >
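The slot maps above lend themselves to a static table. A minimal sketch of how such a reservation table could be encoded (the names and layout are mine, not libvirt's):

```python
# Illustrative data table of the spec-defined Q35 slot assignments on
# pcie.0, keyed by (slot, function); "*" stands in for "all functions".
Q35_FIXED_SLOTS = {
    # GMCH
    (0x00, 0): "host bridge",
    (0x01, 0): "x16 root port for external graphics",
    (0x02, 0): "integrated graphics device (IGD)",
    (0x02, 1): "integrated graphics device (IGD)",
    (0x03, 0): "management engine subsystem",
    (0x03, 1): "management engine subsystem",
    (0x03, 2): "management engine subsystem",
    (0x03, 3): "management engine subsystem",
    # ICH
    (0x19, 0): "embedded ethernet (e1000e)",
    (0x1a, "*"): "UHCI/EHCI",
    (0x1b, 0): "HDA audio",
    (0x1c, "*"): "PCIe root ports",
    (0x1d, "*"): "UHCI/EHCI",
    (0x1e, 0): "PCI bridge",
    (0x1f, 0): "ISA bridge",
    (0x1f, 2): "SATA",
    (0x1f, 5): "SATA",
    (0x1f, 3): "SMBUS",
}

def slot_reserved(slot, function):
    """True if pcie.0 slot.function is claimed by a chipset device."""
    return (slot, function) in Q35_FIXED_SLOTS or (slot, "*") in Q35_FIXED_SLOTS
```

An auto-placement pass could then refuse to hand out any (slot, function) for which slot_reserved() is true.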
> > Personally, I think these slots should be reserved for only the spec
> > defined devices, and I'm not all that keen on using the remaining slots
> > for anything else. Users should of course be allowed to put anything
> > anywhere, but libvirt auto-placement should follow some rules.
> >
> > All of the above sit on what we now call bus pcie.0. This is a root
> > complex, which implies that all of the endpoints are root complex
> > integrated endpoints. Being an integrated endpoint restricts aspects of
> > the device. I've already found out the hard way that Windows actually
> > cares about this and will ignore PCI assigned devices of type
> > "Endpoint" when attached to the root complex bus. (Endpoint, root
> > complex, etc. are defined in the PCIe spec; the above slot use is
> > defined in the respective chipset spec.)
> >
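For reference, the "type" in question is the Device/Port Type field, bits 7:4 of the 16-bit PCI Express Capabilities register at offset 2 into the PCIe capability (ID 0x10). A sketch of decoding it from that register value:

```python
# Device/Port Type encodings from the PCIe base spec.
PCIE_DEVICE_TYPES = {
    0x0: "PCI Express Endpoint",
    0x1: "Legacy PCI Express Endpoint",
    0x4: "Root Port of PCI Express Root Complex",
    0x5: "Upstream Port of PCI Express Switch",
    0x6: "Downstream Port of PCI Express Switch",
    0x7: "PCI Express to PCI/PCI-X Bridge",
    0x8: "PCI/PCI-X to PCI Express Bridge",
    0x9: "Root Complex Integrated Endpoint",
    0xa: "Root Complex Event Collector",
}

def pcie_device_type(pcie_caps_reg):
    """Decode Device/Port Type from the PCIe Capabilities register."""
    return PCIE_DEVICE_TYPES.get((pcie_caps_reg >> 4) & 0xF, "reserved")
```

Windows ignoring type-0x0 "Endpoint" devices on pcie.0 is exactly a mismatch between this field and the device's position in the topology.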
> > What I'd like to see is to implement the PCI bridge at 1e.0 to expose
> > a complete, virgin PCI bus. libvirt should use that as the default
> > location for any PCI device that's not a chipset component. We might be
> > able to get away with installing our e1000 at 19.0, but otherwise I'm
> > thinking that the list only includes uhci/ehci, hda, ahci, and the
> > chipset components themselves (smbus, isa, root ports, etc...). We
> > don't have "IGD", so our graphics should go on the PCI bus and the PCI
> > bridge should include functioning VGA enable bits. Maybe QXL wants to
> > make itself a PCIe device, in which case it should be attached behind a
> > PCIe root port at slot 01.0. Secondary PCIe graphics attach to root
> > ports behind 1c.*. This is the same framework within which real
> > hardware has to work.
> >
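The default-placement rule being proposed could be sketched roughly like this (device names and bus labels are illustrative, not libvirt's actual model):

```python
# Chipset components keep their spec-defined slots on pcie.0;
# everything else defaults to the plain PCI bus behind the 1e.0 bridge.
CHIPSET_SLOTS = {
    "sata":       (0x1f, 2),
    "smbus":      (0x1f, 3),
    "isa-bridge": (0x1f, 0),
    "hda-audio":  (0x1b, 0),
}

def default_placement(device_model):
    slot_fn = CHIPSET_SLOTS.get(device_model)
    if slot_fn is not None:
        return ("pcie.0", slot_fn)       # fixed chipset location
    return ("pci-bridge-bus", None)      # bridge bus at 1e.0, any free slot
```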
> > Assigned devices get interesting due to the PCIe type. We've never had
> > any problems attaching PCIe devices to PCI buses on PIIX (but it may be
> > holding back our ability to support graphics passthrough), so assigned
> > devices can probably be attached to the PCI bus. More appropriate would
> > be to attach "Endpoints" behind root ports and "Integrated Endpoints"
> > to the root complex. I've got some code that will mangle the PCIe type
> > to match its location in the topology, but it needs more work. That
> > should help make things more flexible.
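The "more appropriate" rule above amounts to choosing an attachment point from the device's PCIe type; a minimal sketch (labels are mine):

```python
# Map an assigned device's PCIe Device/Port Type to where it should
# be attached in a Q35 guest topology.
def attach_point(pcie_type):
    if pcie_type == "Root Complex Integrated Endpoint":
        return "pcie.0"                  # directly on the root complex
    if pcie_type == "PCI Express Endpoint":
        return "pcie-root-port"          # behind a root port (1c.*)
    return "pci-bus"                     # conventional PCI, behind 1e.0
```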
> So taking all this into account, there are a couple of pieces of info
> libvirt will need to know in order to assign device addresses / buses
> sensibly:
>
>  - What devices / buses are hardcoded to be always present (and their
>    addresses)
>  - What extra "integrated" devices are available for optional
>    enablement, and their mandatory addresses (if any)
>  - What bus any other devices should be placed on by default.
>
> With this in mind, libvirt's address assignment code is basically
> already broken for anything which isn't an x86 system with a PIIX
> controller, e.g. address assignment for QEMU ARM / PPC / etc is
> mostly fubar. Regardless of what Q35 involves, libvirt needs to
> sort out the existing mess it has in this area. Sorting this out
> should then make support for Q35 more or less trivial, since all
> the hard work will have been done.
>
> Since it doesn't sound like QEMU has a practical means to supply
> the data required without actually running the machine, there
> are only two options I see:
>
>  - Libvirt maintains a set of data tables with all the info for
>    all QEMU machine types, in all system emulators. This will
>    need updating as QEMU gains new machine types.
>  - Libvirt tries to configure the VM without any addresses,
>    then launches QEMU and tries to introspect it to figure out
>    what QEMU assigned, then shuts it down and records the addresses
>    in XML for later use at the real startup point.
>
> Neither of these is particularly appealing to me, but I have a
> preference for maintaining data tables, since that would be much
> simpler than trying to introspect things, and more efficient too.
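For concreteness, the introspection option would amount to booting the guest once and asking QMP query-pci which addresses QEMU chose. A sketch of digesting such a response; the sample below is abridged and illustrative, not verbatim QEMU output:

```python
import json

# Abridged, illustrative shape of a QMP "query-pci" reply.
sample_response = json.loads("""
{"return": [{"bus": 0, "devices": [
    {"slot": 0,  "function": 0, "class_info": {"desc": "Host bridge"}},
    {"slot": 31, "function": 0, "class_info": {"desc": "ISA bridge"}}
]}]}
""")

def assigned_addresses(response):
    """Flatten a query-pci reply into (slot, function) -> description."""
    addrs = {}
    for bus in response["return"]:
        for dev in bus["devices"]:
            addrs[(dev["slot"], dev["function"])] = dev["class_info"]["desc"]
    return addrs
```

Even with this, nothing in the reply says which empty slots are reserved by the chipset versus genuinely free, which is the gap Alex raises below.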
It doesn't seem like introspection really works unless we have a fully
populated base configuration for you to parse. Even then I'm not sure
how you'd figure out whether you should be placing devices to fill the
gaps on a given bus or not. -M q35 as we have it today is just a
bare-minimum shell which doesn't tell much of anything on inspection.
You need the blueprint derived from the chipset spec of how to put all
the other components together, which seems more like the data tables
you describe.

Thanks,
Alex