[libvirt] q35 machine type and libvirt.

Now that qemu is getting the q35 machine type, libvirt needs to support it.

As far as I understand, from libvirt's point of view, q35 is just another x86_64 system, but with a different set of implicit devices, and possibly some extra rules limiting which devices can be plugged into which bus/slot on the guest. That means that in order to support it, we will need to recognize when a q35-based machine type is being created, and auto-add all the implicit devices to libvirt's config model for that domain, then pay attention to the extra rules when assigning addresses for all the user-added devices.

We already add implicit controllers/devices for pc-based machine types; as a matter of fact, currently, libvirt improperly assumes (for the purposes of adding implicit devices) that *every* virtual machine is based on the "pc" machine type (or rather it just doesn't pay attention), so it always adds all the implicit devices for a pc machine type for every domain. This of course is already incorrect for many (probably all?) non-x86 machine types, even before we add q35 into the mix. To fix this, it might be reasonable (and arguably, it's necessary to fix the problem in a backward-compatible manner) to just set up a table of machinetype ==> implicit device lists, look up the machine type in this table, and add the devices needed for that machine type. This goes against libvirt's longstanding view of machinetype as being an opaque value that it merely passes through to qemu, but it's manageable for the existing machine types (even including q35), since it's a finite set. But it starts to be a pain to maintain when you think about future additions - yet another case where new functionality in qemu will require an update to libvirt before it can be fully used via libvirt.

In the long term, it would be very useful / more easily maintainable to have a qemu status command available via QMP which would return the list of implicit devices (and their PCI addresses) for any requested machine type. It would be necessary that this command be callable multiple times within a single execution of qemu, giving it a different machinetype each time. This way libvirt could first query the list of available machinetypes in this particular qemu binary, then request the list of implicit devices for each machine type (libvirt runs each available qemu binary *once* the first time it's requested, and caches all such capabilities information so that it doesn't need to re-run qemu again and again). My limited understanding of qemu's code is that qemu itself doesn't have a table of this information as data, but instead has lines of code that are executed to create it, thus making it impractical to provide the list of devices for a machinetype without actually instantiating a machine of that type. What's the feasibility of adding such a capability (and in the process likely making the list of implicit devices in qemu itself table/data driven rather than constructed with lines of code)?

More questions:

1) It seems that the exact list of devices for the basic q35 machine type hasn't been settled on yet, is that correct?

2) Are there other issues aside from implicit controller devices I need to consider for q35? For example, are there any devices that (as I recall is the case for some devices on "pc") may or may not be present, but if they are present they are always at a particular PCI address (meaning that address must be reserved)? I've also just learned that certain types of PCIe devices must be plugged into certain locations on the guest bus? ("root complex" devices - is there a good source of background info to learn the meaning of terms like that, and the rules of engagement? libvirt will need to know/follow these rules.)

3) What new types of devices/controllers must be supported for a properly functioning q35 machine?
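As a concrete sketch of the QMP idea above: query-machines already exists in QMP and lists the machine types a binary supports; the second command below, query-machine-implicit-devices, is purely hypothetical - it is the command being proposed here, with an invented name and reply shape (the device names are taken from qemu's q35 tree, output abridged):

    -> { "execute": "query-machines" }
    <- { "return": [ { "name": "pc-1.4", "alias": "pc", "is-default": true },
                     { "name": "pc-q35-1.4", "alias": "q35" } ] }

    -> { "execute": "query-machine-implicit-devices",
         "arguments": { "machine": "q35" } }
    <- { "return": [ { "device": "mch",       "addr": "00.0" },
                     { "device": "ICH9-LPC",  "addr": "1f.0" },
                     { "device": "ich9-ahci", "addr": "1f.2" } ] }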

On Wed, 2013-02-06 at 14:13 -0500, Laine Stump wrote:
Now that qemu is getting the q35 machine type, libvirt needs to support it.
As far as I understand, from libvirt's point of view, q35 is just another x86_64 system, but with a different set of implicit devices, and possibly some extra rules limiting which devices can be plugged into which bus/slot on the guest. That means that in order to support it, we will need to recognize when a q35-based machine type is being created, and auto-add all the implicit devices to libvirt's config model for that domain, then pay attention to the extra rules when assigning addresses for all the user-added devices.
We already add implicit controllers/devices for pc-based machine types; as a matter of fact, currently, libvirt improperly assumes (for the purposes of adding implicit devices) that *every* virtual machine is based on the "pc" machine type (or rather it just doesn't pay attention), so it always adds all the implicit devices for a pc machine type for every domain. This of course is already incorrect for many (probably all?) non-x86 machine types, even before we add q35 into the mix. To fix this, it might be reasonable (and arguably, it's necessary to fix the problem in a backward-compatible manner) to just set up a table of machinetype ==> implicit device lists, look up the machine type in this table, and add the devices needed for that machine type. This goes against libvirt's longstanding view of machinetype as being an opaque value that it merely passes through to qemu, but it's manageable for the existing machine types (even including q35), since it's a finite set. But it starts to be a pain to maintain when you think about future additions - yet another case where new functionality in qemu will require an update to libvirt before it can be fully used via libvirt.
In the long term, it would be very useful / more easily maintainable to have a qemu status command available via QMP which would return the list of implicit devices (and their PCI addresses) for any requested machine type. It would be necessary that this command be callable multiple times within a single execution of qemu, giving it a different machinetype each time. This way libvirt could first query the list of available machinetypes in this particular qemu binary, then request the list of implicit devices for each machine type (libvirt runs each available qemu binary *once* the first time it's requested, and caches all such capabilities information so that it doesn't need to re-run qemu again and again). My limited understanding of qemu's code is that qemu itself doesn't have a table of this information as data, but instead has lines of code that are executed to create it, thus making it impractical to provide the list of devices for a machinetype without actually instantiating a machine of that type. What's the feasibility of adding such a capability (and in the process likely making the list of implicit devices in qemu itself table/data driven rather than constructed with lines of code)?
More questions:
1) It seems that the exact list of devices for the basic q35 machine type hasn't been settled on yet, is that correct?
I think what we have currently is just a stepping stone to a base configuration. At a minimum, we're missing the PCI bridge attached to the ICH, which is where I think libvirt should attach non-chipset component devices. Next would be PCIe root ports where emulated and assigned PCIe devices could be attached.
2) Are there other issues aside from implicit controller devices I need to consider for q35? For example, are there any devices that (as I recall is the case for some devices on "pc") may or may not be present, but if they are present they are always at a particular PCI address (meaning that address must be reserved)? I've also just learned that certain types of PCIe devices must be plugged into certain locations on the guest bus? ("root complex" devices - is there a good source of background info to learn the meaning of terms like that, and the rules of engagement? libvirt will need to know/follow these rules.)
The GMCH (Graphics & Memory Controller Hub) defines:

00.0 - Host bridge
01.0 - x16 root port for external graphics
02.0,1 - integrated graphics device (IGD)
03.0,1,2,3 - management engine subsystem

And the ICH defines:

19.0 - Embedded ethernet (e1000e)
1a.* - UHCI/EHCI
1b.0 - HDA audio
1c.* - PCIe root ports
1d.* - UHCI/EHCI
1e.0 - PCI Bridge
1f.0 - ISA Bridge
1f.2,5 - SATA
1f.3 - SMBUS

Personally, I think these slots should be reserved for only the spec defined devices, and I'm not all that keen on using the remaining slots for anything else. Users should of course be allowed to put anything anywhere, but libvirt auto-placement should follow some rules.

All of the above sit on what we now call bus pcie.0. This is a root complex, which implies that all of the endpoints are root complex integrated endpoints. Being an integrated endpoint restricts aspects of the device. I've already found out the hard way that Windows actually cares about this and will ignore PCI assigned devices of type "Endpoint" when attached to the root complex bus. (Endpoint, root complex, etc. are defined in the PCIe spec; the above slot use is defined in the respective chipset spec.)

What I'd like to see is to implement the PCI-bridge at 1e.0 to expose a complete, virgin PCI bus. libvirt should use that as the default location for any PCI device that's not a chipset component. We might be able to get away with installing our e1000 at 19.0, but otherwise I'm thinking that the list only includes uhci/ehci, hda, ahci, and the chipset components themselves (smbus, isa, root ports, etc...). We don't have "IGD", so our graphics should go on the PCI bus and the PCI bridge should include functioning VGA enable bits. Maybe QXL wants to make itself a PCIe device, in which case it should be attached behind a PCIe root port at slot 01.0. Secondary PCIe graphics attach to root ports behind 1c.*. This is the same framework within which real hardware has to work.

Assigned devices get interesting due to the PCIe type. We've never had any problems attaching PCIe devices to PCI buses on PIIX (but it may be holding back our ability to support graphics passthrough), so assigned devices can probably be attached to the PCI bus. More appropriate would be to attach "Endpoints" behind root ports and "Integrated Endpoints" to the root complex. I've got some code that will mangle the PCIe type to match its location in the topology, but it needs more work. That should help make things more flexible.
3) What new types of devices/controllers must be supported for a properly functioning q35 machine?
AHCI, bridges, root ports (we can skip these w/o PCIe devices, but for hotplug we might want them fully populated - otherwise everything gets hotplugged to the PCI bus). Thanks, Alex
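To illustrate the fully-populated case Alex describes, here is a hypothetical q35 command line that pre-plugs two root ports and the PCIe-to-PCI bridge. The ioh3420 and i82801b11-bridge device names come up again later in this thread; the addresses follow the ICH slot table above, but the exact options and layout here are just one plausible arrangement, not a recommendation:

    qemu-system-x86_64 -M q35 ... \
        -device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=0,chassis=1,id=rp0 \
        -device ioh3420,bus=pcie.0,addr=1c.1,port=1,chassis=2,id=rp1 \
        -device i82801b11-bridge,bus=pcie.0,addr=1e.0,id=pci.1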

On Wed, Feb 06, 2013 at 01:15:05PM -0700, Alex Williamson wrote:
On Wed, 2013-02-06 at 14:13 -0500, Laine Stump wrote:
2) Are there other issues aside from implicit controller devices I need to consider for q35? For example, are there any devices that (as I recall is the case for some devices on "pc") may or may not be present, but if they are present they are always at a particular PCI address (meaning that address must be reserved)? I've also just learned that certain types of PCIe devices must be plugged into certain locations on the guest bus? ("root complex" devices - is there a good source of background info to learn the meaning of terms like that, and the rules of engagement? libvirt will need to know/follow these rules.)
The GMCH (Graphics & Memory Controller Hub) defines:
00.0 - Host bridge
01.0 - x16 root port for external graphics
02.0,1 - integrated graphics device (IGD)
03.0,1,2,3 - management engine subsystem
And the ICH defines:
19.0 - Embedded ethernet (e1000e)
1a.* - UHCI/EHCI
1b.0 - HDA audio
1c.* - PCIe root ports
1d.* - UHCI/EHCI
1e.0 - PCI Bridge
1f.0 - ISA Bridge
1f.2,5 - SATA
1f.3 - SMBUS
Personally, I think these slots should be reserved for only the spec defined devices, and I'm not all that keen on using the remaining slots for anything else. Users should of course be allowed to put anything anywhere, but libvirt auto-placement should follow some rules.
All of the above sit on what we now call bus pcie.0. This is a root complex, which implies that all of the endpoints are root complex integrated endpoints. Being an integrated endpoint restricts aspects of the device. I've already found out the hard way that Windows actually cares about this and will ignore PCI assigned devices of type "Endpoint" when attached to the root complex bus. (Endpoint, root complex, etc. are defined in the PCIe spec; the above slot use is defined in the respective chipset spec.)
What I'd like to see is to implement the PCI-bridge at 1e.0 to expose a complete, virgin PCI bus. libvirt should use that as the default location for any PCI device that's not a chipset component. We might be able to get away with installing our e1000 at 19.0, but otherwise I'm thinking that the list only includes uhci/ehci, hda, ahci, and the chipset components themselves (smbus, isa, root ports, etc...). We don't have "IGD", so our graphics should go on the PCI bus and the PCI bridge should include functioning VGA enable bits. Maybe QXL wants to make itself a PCIe device, in which case it should be attached behind a PCIe root port at slot 01.0. Secondary PCIe graphics attach to root ports behind 1c.*. This is the same framework within which real hardware has to work.
Assigned devices get interesting due to the PCIe type. We've never had any problems attaching PCIe devices to PCI buses on PIIX (but it may be holding back our ability to support graphics passthrough), so assigned devices can probably be attached to the PCI bus. More appropriate would be to attach "Endpoints" behind root ports and "Integrated Endpoints" to the root complex. I've got some code that will mangle the PCIe type to match its location in the topology, but it needs more work. That should help make things more flexible.
So taking all this into account, there are a couple of pieces of info libvirt will need to know in order to assign device addresses / buses sensibly:

- What devices / buses are hardcoded to be always present (and their addresses)
- What extra "integrated" devices are available for optional enablement and their mandatory addresses (if any)
- What bus any other devices should be placed on by default.

With this in mind, libvirt's address assignment code is basically already broken for anything which isn't an x86 system with a PIIX controller, e.g. address assignment for QEMU ARM / PPC / etc is mostly fubar. Regardless of what Q35 involves, libvirt needs to sort out the existing mess it has in this area. Sorting this out should then make support for Q35 more or less trivial, since all the hard work will have been done.

Since it doesn't sound like QEMU has a practical means to supply the data required without actually running the machine, there are only two options I see:

- Libvirt maintains a set of data tables with all the info for all QEMU machine types, in all system emulators. This will need updating as QEMU gains new machine types.
- Libvirt tries to configure the VM without any addresses, then launches QEMU and tries to introspect it to figure out what QEMU assigned. Then shut it down, record the addresses in XML for later usage at the real startup point.

Neither of these is particularly appealing to me, but I have a preference for recording data tables, since it would be much simpler than trying to introspect things & more efficient too.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
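A minimal sketch of what the data-table option could look like on the libvirt side, assuming a machinetype ==> implicit-device mapping as discussed above. All identifiers here are invented for illustration; this is not actual libvirt code:

    /* Hypothetical machine-type -> implicit-device table; purely an
     * illustration of the "data tables" option, not libvirt code.  */
    typedef struct {
        const char *model;   /* qemu device model, e.g. "ich9-ahci" */
        int slot;            /* fixed PCI slot, or -1 if not fixed  */
        int function;
    } ImplicitDev;

    typedef struct {
        const char *machine;      /* machine type name, e.g. "q35"  */
        const ImplicitDev *devs;  /* terminated by model == NULL    */
    } MachineImplicitDevs;

    static const ImplicitDev q35Devs[] = {
        { "mch",       0x00, 0 },  /* host bridge  */
        { "ICH9-LPC",  0x1f, 0 },  /* ISA bridge   */
        { "ich9-ahci", 0x1f, 2 },  /* SATA (AHCI)  */
        { NULL, 0, 0 },
    };

    static const MachineImplicitDevs machineTable[] = {
        { "q35", q35Devs },
        /* "pc", ARM and PPC machine types would need entries too */
        { NULL, NULL },
    };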

On Thu, 2013-02-07 at 17:31 +0000, Daniel P. Berrange wrote:
On Wed, Feb 06, 2013 at 01:15:05PM -0700, Alex Williamson wrote:
On Wed, 2013-02-06 at 14:13 -0500, Laine Stump wrote:
2) Are there other issues aside from implicit controller devices I need to consider for q35? For example, are there any devices that (as I recall is the case for some devices on "pc") may or may not be present, but if they are present they are always at a particular PCI address (meaning that address must be reserved)? I've also just learned that certain types of PCIe devices must be plugged into certain locations on the guest bus? ("root complex" devices - is there a good source of background info to learn the meaning of terms like that, and the rules of engagement? libvirt will need to know/follow these rules.)
The GMCH (Graphics & Memory Controller Hub) defines:
00.0 - Host bridge
01.0 - x16 root port for external graphics
02.0,1 - integrated graphics device (IGD)
03.0,1,2,3 - management engine subsystem
And the ICH defines:
19.0 - Embedded ethernet (e1000e)
1a.* - UHCI/EHCI
1b.0 - HDA audio
1c.* - PCIe root ports
1d.* - UHCI/EHCI
1e.0 - PCI Bridge
1f.0 - ISA Bridge
1f.2,5 - SATA
1f.3 - SMBUS
Personally, I think these slots should be reserved for only the spec defined devices, and I'm not all that keen on using the remaining slots for anything else. Users should of course be allowed to put anything anywhere, but libvirt auto-placement should follow some rules.
All of the above sit on what we now call bus pcie.0. This is a root complex, which implies that all of the endpoints are root complex integrated endpoints. Being an integrated endpoint restricts aspects of the device. I've already found out the hard way that Windows actually cares about this and will ignore PCI assigned devices of type "Endpoint" when attached to the root complex bus. (Endpoint, root complex, etc. are defined in the PCIe spec; the above slot use is defined in the respective chipset spec.)
What I'd like to see is to implement the PCI-bridge at 1e.0 to expose a complete, virgin PCI bus. libvirt should use that as the default location for any PCI device that's not a chipset component. We might be able to get away with installing our e1000 at 19.0, but otherwise I'm thinking that the list only includes uhci/ehci, hda, ahci, and the chipset components themselves (smbus, isa, root ports, etc...). We don't have "IGD", so our graphics should go on the PCI bus and the PCI bridge should include functioning VGA enable bits. Maybe QXL wants to make itself a PCIe device, in which case it should be attached behind a PCIe root port at slot 01.0. Secondary PCIe graphics attach to root ports behind 1c.*. This is the same framework within which real hardware has to work.
Assigned devices get interesting due to the PCIe type. We've never had any problems attaching PCIe devices to PCI buses on PIIX (but it may be holding back our ability to support graphics passthrough), so assigned devices can probably be attached to the PCI bus. More appropriate would be to attach "Endpoints" behind root ports and "Integrated Endpoints" to the root complex. I've got some code that will mangle the PCIe type to match its location in the topology, but it needs more work. That should help make things more flexible.
So taking all this into account, there are a couple of pieces of info libvirt will need to know in order to assign device addresses / buses sensibly:
- What devices / buses are hardcoded to be always present (and their addresses)
- What extra "integrated" devices are available for optional enablement and their mandatory addresses (if any)
- What bus any other devices should be placed on by default.
With this in mind, libvirt's address assignment code is basically already broken for anything which isn't an x86 system with a PIIX controller, e.g. address assignment for QEMU ARM / PPC / etc is mostly fubar. Regardless of what Q35 involves, libvirt needs to sort out the existing mess it has in this area. Sorting this out should then make support for Q35 more or less trivial, since all the hard work will have been done.
Since it doesn't sound like QEMU has a practical means to supply the data required without actually running the machine, there are only two options I see:
- Libvirt maintains a set of data tables with all the info for all QEMU machine types, in all system emulators. This will need updating as QEMU gains new machine types.
- Libvirt tries to configure the VM without any addresses, then launches QEMU and tries to introspect it to figure out what QEMU assigned. Then shut it down, record the addresses in XML for later usage at the real startup point.
Neither of these is particularly appealing to me, but I have a preference for recording data tables, since it would be much simpler than trying to introspect things & more efficient too.
It doesn't seem like introspection really works unless we have a fully populated base configuration for you to parse. Even then I'm not sure how you'd figure out whether you should be placing devices to fill the gaps on a given bus or not. -M q35 as we have it today is just a bare minimum shell which doesn't tell much of anything on inspection. You need the blueprint derived from the chipset spec of how to put all the other components together, which seems more like the data tables you describe. Thanks, Alex

Cc'ing a few QOMmers...

Laine Stump <laine@redhat.com> writes:
Now that qemu is getting the q35 machine type, libvirt needs to support it.
As far as I understand, from libvirt's point of view, q35 is just another x86_64 system, but with a different set of implicit devices, and possibly some extra rules limiting which devices can be plugged into which bus/slot on the guest. That means that in order to support it, we will need to recognize when a q35-based machine type is being created, and auto-add all the implicit devices to libvirt's config model for that domain, then pay attention to the extra rules when assigning addresses for all the user-added devices.
We already add implicit controllers/devices for pc-based machine types; as a matter of fact, currently, libvirt improperly assumes (for the purposes of adding implicit devices) that *every* virtual machine is based on the "pc" machine type (or rather it just doesn't pay attention), so it always adds all the implicit devices for a pc machine type for every domain. This of course is already incorrect for many (probably all?) non-x86 machine types, even before we add q35 into the mix. To fix this, it might be reasonable (and arguably, it's necessary to fix the problem in a backward-compatible manner) to just set up a table of machinetype ==> implicit device lists, look up the machine type in this table, and add the devices needed for that machine type. This goes against libvirt's longstanding view of machinetype as being an opaque value that it merely passes through to qemu, but it's manageable for the existing machine types (even including q35), since it's a finite set. But it starts to be a pain to maintain when you think about future additions - yet another case where new functionality in qemu will require an update to libvirt before it can be fully used via libvirt.
In the long term, it would be very useful / more easily maintainable to have a qemu status command available via QMP which would return the list of implicit devices (and their PCI addresses) for any requested machine type.
You want to ask QEMU to describe the "board", i.e. the machine without any optional devices. The way you can do that now is to create a machine without any optional devices, then introspect.

Introspection tools:

* info qtree (HMP only)

  Mentioned for completeness. You really want QMP here.

* qom-list and qom-get (QMP only)

  These let you examine QOM as an attributed graph. Non-qdevified devices are invisible. Show stopper only when this includes devices libvirt needs to know about. For instance, a mandatory device can be safely ignored as long as its resources cannot conflict with anything libvirt might want to plug in.

Ceterum censeo we need full qdevification.
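For instance, a qom-list / qom-get session against a machine started with -S might look roughly like this. The exchange below is illustrative: the property names and the value returned for "type" depend on the qemu version and on how far the machine has been QOMified:

    -> { "execute": "qom-list", "arguments": { "path": "/machine" } }
    <- { "return": [ { "name": "type", "type": "string" },
                     { "name": "peripheral", "type": "child<container>" },
                     { "name": "unattached", "type": "child<container>" } ] }

    -> { "execute": "qom-get",
         "arguments": { "path": "/machine", "property": "type" } }
    <- { "return": "pc-q35-1.4-machine" }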
It would be necessary that this command be callable multiple times within a single execution of qemu, giving it a different machinetype each time.
This isn't feasible without major surgery, as far as I can tell.
This way libvirt could first query the list of available machinetypes in this particular qemu binary, then request the list of implicit devices for each machine type (libvirt runs each available qemu binary *once* the first time it's requested, and caches all such capabilities information so that it doesn't need to re-run qemu again and again). My limited understanding of qemu's code is that qemu itself doesn't have a table of this information as data, but instead has lines of code that are executed to create it, thus making it impractical to provide the list of devices for a machinetype without actually instantiating a machine of that type. What's the feasibility of adding such a capability (and in the process likely making the list of implicit devices in qemu itself table/data driven rather than constructed with lines of code)?
Machines are created by ad hoc board setup code. This code is generally written to run once. And by once I mean one board gets initialized once, and never destroyed. To support destruction, we'd have to find out what needs to be cleaned up (allocated resources, global state, ...), and clean it up. For all of the boards. Even the barely maintained ones.

Let's take a step back from the swamp we're in and examine the swamp^H^H^H^H^Hmeadow we're trying to reach.

qdev used to be "declarative" in the sense that device models and their properties are declared by data. Easy to introspect. The (far away) goal was to extend this so that machine types become data, too.

QOM isn't declarative, it's fully dynamic, i.e. classes and properties are created by code. I never quite understood why, but I'm sure there are good reasons.

I suspect the goal of having machine types as data has been dropped, and the new plan is to create machines for introspection. That assumes creating and destroying machines won't be a big deal once they're fully QOMified.
More questions:
Alex replied to these, and I have nothing to add.

On 02/06/2013 02:13 PM, Laine Stump wrote:
Now that qemu is getting the q35 machine type, libvirt needs to support it.
In an attempt to make sure that libvirt actually does something useful with qemu's new machine type, I'm revisiting this topic and trying to get a more thorough understanding. I've developed my own list of what I *think* are the reasons for wanting a new machine type in qemu (and from that it should be more apparent what libvirt needs to do with it), and am wondering how far off the mark I am:

* Protection against obsolescence (PIIX (pc machine type) is 17 years old (?) and some OSes may drop support for it).

* Passthrough of PCIe hardware? Is this really an issue? (e.g. the VFs of a PCIe SRIOV network card can already be passed through as standard PCI devices)

* Larger bus space? Is this solved even without the new machine type by simply supporting the existing pci-bridge device and allowing multiple buses?

* Support for new emulated hardware. (what is important?)

Are any of these misguided thinking on my part? Are there other real advantages over using the pc-* machine type?

As an adjunct to that, in some conversations people have mentioned the fact that the Q35 chipset is already out of production, and that supporting something more recent, like the X58, might be better in the long run. Would the main advantage of that be supportability? Or are there other concrete differences in a newer chipset that might by themselves make it worth considering? (Conversely, would attempting to write the drivers for a newer chipset just be busywork that only netted a more complex virtual machine but no useful gains?)

So, based on the above (and whatever other gains are to be had from the new machine type) what is needed/expected from libvirt? Here's my rough list (a sketch of the resulting XML follows below):

* setup different default buses/controllers/devices based on machine type (including possibility of using pcie.0 as the root)

* table of fixed locations for certain devices (if present) (again, based on machine type)

* restrict certain *types* of devices to certain *types* of slots? (I'm a bit fuzzy on this one, but this has to do with the "root complex" vs. "endpoint" vs. "integrated endpoint" that Alex mentioned).

* support some new emulated chipset devices?

* Anything else specific required to passthrough pcie devices?
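As a sketch of where the first bullet might lead, here is hypothetical domain XML for a q35 guest. Only machine='q35' reflects anything that exists today; the controller model names ('pcie-root', 'dmi-to-pci-bridge') are invented placeholders for whatever libvirt ends up calling the implicit root complex and the 1e.0 PCIe-to-PCI bridge:

    <os>
      <type arch='x86_64' machine='q35'>hvm</type>
    </os>
    <devices>
      <!-- implicit root complex, qemu's pcie.0 -->
      <controller type='pci' index='0' model='pcie-root'/>
      <!-- PCIe-to-PCI bridge at 1e.0, providing a plain PCI bus -->
      <controller type='pci' index='1' model='dmi-to-pci-bridge'>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
      </controller>
    </devices>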

On Wed, 2013-02-27 at 13:20 -0500, Laine Stump wrote:
On 02/06/2013 02:13 PM, Laine Stump wrote:
Now that qemu is getting the q35 machine type, libvirt needs to support it.
In an attempt to make sure that libvirt actually does something useful with qemu's new machine type, I'm revisiting this topic and trying to get a more thorough understanding.
I've developed my own list of what I *think* are the reasons for wanting a new machine type in qemu (and from that it should be more apparent what libvirt needs to do with it), and am wondering how far off the mark I am:
* Protection against obsolescence (PIIX (pc machine type) is 17 years old (?) and some OSes may drop support for it).
* Passthrough of PCIe hardware? Is this really an issue? (e.g. the VFs of a PCIe SRIOV network card can already be passed through as standard PCI devices)
We can't expose PCIe extended capabilities without a PCI express chipset. This includes error reporting and possibly use of advanced features for iommu interaction and caching. We also can't do any kind of IGD passthrough without a GMCH-like chipset.
* Larger bus space? Is this solved even without the new machine type by simply supporting the existing pci-bridge device and allowing multiple buses?
* Support for new emulated hardware. (what is important?)
Emulated devices have the same extended capabilities limitation.
Are any of these misguided thinking on my part? Are there other real advantages over using the pc-* machine type?
As an adjunct to that, in some conversations people have mentioned the fact that the Q35 chipset is already out of production, and that supporting something more recent, like the X58, might be better in the long run. Would the main advantage of that be supportability? Or are there other concrete differences in a newer chipset that might by themselves make it worth considering? (Conversely, would attempting to write the drivers for a newer chipset just be busywork that only netted a more complex virtual machine but no useful gains?)
IMHO, an X58 or newer Q77 chipset would only build on and swap out components of something like Q35. Anything modern will have the same basic layout: some concept of a PCIe root complex with PCIe-to-PCI bridge(s) branching out to legacy PCI buses and root ports, possibly with PCIe switches, connecting to PCIe endpoints. All of that will be the same for Q35/X58 or newer desktop chips.
So, based on the above (and whatever other gains are to be had from the new machine type) what is needed/expected from libvirt? Here's my rough list:
* setup different default buses/controllers/devices based on machine type (including possibility of using pcie.0 as the root)
* table of fixed locations for certain devices (if present) (again, based on machine type)
* restrict certain *types* of devices to certain *types* of slots? (I'm a bit fuzzy on this one, but this has to do with the "root complex" vs. "endpoint" vs. "integrated endpoint" that Alex mentioned).
Right, somehow libvirt needs to know, or qemu needs to tell it, something about the devices it's plugging in. If you were to grab your trusty 10/100Mbps Legacy PCI ethernet card and try to plug it into a motherboard, you'd quickly notice that you can only plug it into certain slots. This is the same problem. PCI devices are attached to a legacy PCI bus, which means they need to be behind a PCIe-to-PCI bridge. Legacy Endpoints and Endpoints are plug-in PCIe cards, so they need to be plugged in behind a PCIe switch or root port. Integrated Endpoints are motherboard components; per the spec, they shouldn't even really be hotpluggable. They attach directly to the root complex.
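Encoded as a table, the placement rules in that paragraph might look like the following. This is a hedged sketch: the enum, the function, and the bus labels are all invented for illustration and are not libvirt code:

    /* Where should libvirt auto-place a device?  Invented helper
     * encoding the rules from the paragraph above.               */
    typedef enum {
        DEV_LEGACY_PCI,        /* conventional PCI device           */
        DEV_PCIE_ENDPOINT,     /* PCIe (Legacy) Endpoint, a card    */
        DEV_PCIE_INTEGRATED,   /* Root Complex Integrated Endpoint  */
    } DevType;

    static const char *defaultBus(DevType t)
    {
        switch (t) {
        case DEV_LEGACY_PCI:
            /* plain PCI bus behind the PCIe-to-PCI bridge at 1e.0 */
            return "pci-bus-behind-bridge";
        case DEV_PCIE_ENDPOINT:
            /* behind a root port or switch downstream port */
            return "pcie-root-port-or-downstream";
        case DEV_PCIE_INTEGRATED:
            /* directly on the root complex; effectively no hotplug */
            return "pcie.0";
        }
        return NULL;
    }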
* support some new emulated chipset devices?
-M q35 + -device ioh3420 (root port) + -device i82801b11-bridge (pcie-to-pci bridge)
* Anything else specific required to passthrough pcie devices?
I just sent out an RFC asking about VGA routing; it's possible libvirt will need to understand some concept of a primary VGA device and specify it somehow to QEMU.

It's also possible (probable) that we'll make assigned devices able to mangle PCIe types to make it more flexible where they can be attached. For instance, you may not want to have your HD audio device exposed as a root complex device if that disables hotplug, so we may mangle it to look like a regular endpoint. Windows is picky about root complex devices, so we may also mangle endpoint to integrated endpoint as well.

Also be aware that all PCIe "buses" except the root complex are actually point-to-point links, so a root port connects to one PCIe device (which may be multifunction). PCIe switches are needed to get fan-out to connect multiple devices, one device per downstream port. It would be interesting to see how much we can abuse the topology in a virtual system, but it's probably best not to confuse guests.

Thanks,
Alex
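A sketch of that fan-out using the switch port devices in qemu's tree (x3130-upstream / xio3130-downstream), hanging an assigned device off one downstream port. The option spellings, addresses, and host device are illustrative only:

    -device ioh3420,bus=pcie.0,addr=1c.0,port=0,chassis=1,id=rp0 \
    -device x3130-upstream,bus=rp0,id=up0 \
    -device xio3130-downstream,bus=up0,chassis=2,id=dp0 \
    -device xio3130-downstream,bus=up0,chassis=3,id=dp1 \
    -device vfio-pci,host=01:00.0,bus=dp0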

Alex Williamson <alex.williamson@redhat.com> writes:
On Wed, 2013-02-27 at 13:20 -0500, Laine Stump wrote:
On 02/06/2013 02:13 PM, Laine Stump wrote:
Now that qemu is getting the q35 machine type, libvirt needs to support it.
In an attempt to make sure that libvirt actually does something useful with qemu's new machine type, I'm revisiting this topic and trying to get a more thorough understanding.
I've developed my own list of what I *think* are the reasons for wanting a new machine type in qemu (and from that it should be more apparent what libvirt needs to do with it), and am wondering how far off the mark I am:
Right, somehow libvirt needs to know, or qemu needs to tell it, something about the devices it's plugging in. If you were to grab your trusty 10/100Mbps Legacy PCI ethernet card and try to plug it into a motherboard, you'd quickly notice that you can only plug it into certain slots. This is the same problem. PCI devices are attached to a legacy PCI bus, which means they need to be behind a PCIe-to-PCI bridge. Legacy Endpoints and Endpoints are plug-in PCIe cards, so they need to be plugged in behind a PCIe switch or root port. Integrated Endpoints are motherboard components; per the spec, they shouldn't even really be hotpluggable. They attach directly to the root complex.
We could do this with QOM. The chipset could have a set of link properties for the integrated devices. For instance:

    Q35Chipset
        Link<E1000> integrated_nic;
        Link<StdVGA> integrated_vga;
        ...

We should prevent PCI bus plugging for slots "owned" by integrated devices. libvirt has a way to probe for links that it can add, including what types are allowed for it.

Regards,

Anthony Liguori
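For reference, QOM link properties are created with object_property_add_link(); a rough pseudo-C rendering of Anthony's idea follows. Q35ChipsetState, the Q35_CHIPSET cast, and the property names are invented here, and the function's exact signature has varied across qemu versions:

    /* Pseudo-C sketch of link properties for integrated devices;
     * the state struct and property names are invented.          */
    static void q35_chipset_initfn(Object *obj)
    {
        Q35ChipsetState *s = Q35_CHIPSET(obj);

        /* libvirt could discover these links (and the types they
         * accept) via qom-list on the chipset object.            */
        object_property_add_link(obj, "integrated-nic", "e1000",
                                 (Object **)&s->integrated_nic, NULL);
        object_property_add_link(obj, "integrated-vga", "VGA",
                                 (Object **)&s->integrated_vga, NULL);
    }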
* support some new emulated chipset devices?
-M q35 + -device ioh3420 (root port) + -device i82801b11-bridge (pcie-to-pci bridge)
* Anything else specific required to passthrough pcie devices?
I just sent out an RFC asking about VGA routing; it's possible libvirt will need to understand some concept of a primary VGA device and specify it somehow to QEMU.
It's also possible (probable) that we'll make assigned devices able to mangle PCIe types to make it more flexible where they can be attached. For instance, you may not want to have your HD audio device exposed as a root complex device if that disables hotplug, so we may mangle it to look like a regular endpoint. Windows is picky about root complex devices, so we may also mangle endpoint to integrated endpoint as well.
Also be aware that all PCIe "buses" except the root complex are actually point-to-point links, so a root port connects to one PCIe device (which may be multifunction). PCIe switches are needed to get fan-out to connect multiple devices, one device per downstream port. It would be interesting to see how much we can abuse the topology in a virtual system, but it's probably best not to confuse guests. Thanks,
Alex