On Wed, 2013-02-27 at 13:20 -0500, Laine Stump wrote:
On 02/06/2013 02:13 PM, Laine Stump wrote:
> Now that qemu is getting the q35 machine type, libvirt needs to support it.
In an attempt to make sure that libvirt actually does something useful
with qemu's new machine type, I'm revisiting this topic and trying to
get a more thorough understanding.
I've developed my own list of what I *think* are the reasons for wanting
a new machine type in qemu (and from that it should be more apparent
what libvirt needs to do with it), and am wondering how far off the mark
I am:
* Protection against obsolescence (PIIX (pc machine type) is 17 years old (?)
and some OSes may drop support for it).
* Passthrough of PCIe hardware? Is this really an issue? (e.g. the
  VFs of a PCIe SR-IOV network card can already be passed through as
  standard PCI devices)
We can't expose PCIe extended capabilities without a PCI Express
chipset. This includes error reporting and possibly use of advanced
features for IOMMU interaction and caching.
We also can't do any kind of IGD passthrough without a GMCH-like
chipset.
* Larger bus space? Is this solved even without the new machine type
  by simply supporting the existing pci-bridge device and allowing
  multiple buses? (see the sketch after this list)
* Support for new emulated hardware. (what is important?)
Emulated devices have the same extended capabilities limitation.
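
(To make the pci-bridge option above concrete: a rough, untested
sketch on the existing pc machine type, using current qemu device
names; the disk image and NIC model are just placeholders.)

  # untested sketch: extra legacy PCI bus hung off pci.0 via the
  # pci-bridge device; the disk image path is a placeholder
  qemu-system-x86_64 -M pc -m 1024 -drive file=/path/to/guest.img \
      -device pci-bridge,id=bridge1,chassis_nr=1,bus=pci.0,addr=0x4 \
      -device e1000,netdev=net0,bus=bridge1,addr=0x1 \
      -netdev user,id=net0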
Is any of this misguided thinking on my part? Are there other real
advantages over using the pc-* machine type?
As an adjunct to that, in some conversations people have mentioned the fact that the Q35
chipset is already out of production, and that supporting something more recent, like the
X58, might be better in the long run. Would the main advantage of that be supportability?
Or are there other concrete differences in a newer chipset that might by themselves make
it worth considering? (Conversely, would attempting to write the drivers for a newer
chipset just be busywork that only netted a more complex virtual machine but no useful
gains?)
IMHO, an X58 or newer Q77 chipset would only build on and swap out
components of something like Q35. Anything modern will have the same
basic layout: some concept of a PCIe root complex, with PCIe-to-PCI
bridge(s) branching out to legacy PCI buses, and root ports, possibly
with PCIe switches, connecting to PCIe endpoints. All of that will be
the same for Q35/X58 or newer desktop chips.
So, based on the above (and whatever other gains are to be had from
the new machine type), what is needed/expected from libvirt? Here's my
rough list:
* set up different default buses/controllers/devices based on machine
  type (including the possibility of using pcie.0 as the root)
* table of fixed locations for certain devices (if present) (again,
based on machine type)
* restrict certain *types* of devices to certain *types* of slots?
  (I'm a bit fuzzy on this one, but it has to do with the "root
  complex" vs. "endpoint" vs. "integrated endpoint" distinction that
  Alex mentioned.)
Right, somehow libvirt needs to know, or qemu needs to tell it,
something about the devices it's plugging in. If you were to grab your
trusty 10/100Mbps legacy PCI ethernet card and try to plug it into a
motherboard, you'd quickly notice that you can only plug it into
certain slots. This is the same problem. A PCI device is attached to a
legacy PCI bus, which means it needs to be behind a PCIe-to-PCI
bridge. Legacy Endpoints and Endpoints are plug-in PCIe cards, so they
need to be plugged in behind a PCIe switch or root port. Integrated
Endpoints are motherboard components; per the spec, they shouldn't
even really be hot-pluggable. They attach directly to the root complex.
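
(To make that mapping concrete, a rough, untested sketch with current
qemu device names; the guest image and the assigned device's host
address (01:00.0) are placeholders. The legacy PCI NIC sits behind the
PCIe-to-PCI bridge, the assigned PCIe device sits behind a root port,
and integrated endpoints, like q35's built-in AHCI controller, live
directly on pcie.0.)

  # untested sketch: guest image and host address 01:00.0 are placeholders
  qemu-system-x86_64 -M q35 -m 2048 -drive file=/path/to/guest.img \
      -device i82801b11-bridge,id=pci_bridge,bus=pcie.0,addr=0x1e \
      -device rtl8139,netdev=net0,bus=pci_bridge,addr=0x1 \
      -netdev user,id=net0 \
      -device ioh3420,id=root_port1,bus=pcie.0,port=1,chassis=1,addr=0x2 \
      -device vfio-pci,host=01:00.0,bus=root_port1,addr=0x0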
* support some new emulated chipset devices?
-M q35 + -device ioh3420 (root port) + -device i82801b11-bridge
(PCIe-to-PCI bridge)
* Anything else specific required to passthrough pcie devices?
I just sent out an RFC asking about VGA routing; it's possible libvirt
will need to understand some concept of a primary VGA device and
specify it somehow to QEMU.
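
(Speculative and untested, but primary VGA assignment might eventually
look something like this, using vfio-pci's experimental x-vga option
and disabling the emulated VGA; the host address is a placeholder.)

  # speculative: x-vga is an experimental vfio-pci option;
  # the host address and guest image are placeholders
  qemu-system-x86_64 -M q35 -m 2048 -drive file=/path/to/guest.img \
      -vga none \
      -device ioh3420,id=root_port1,bus=pcie.0,port=1,chassis=1,addr=0x2 \
      -device vfio-pci,host=01:00.0,bus=root_port1,addr=0x0,x-vga=on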
It's also possible (probable) that we'll make assigned devices able to
mangle PCIe types to make it more flexible where they can be attached.
For instance, you may not want to have your HD audio device exposed as
a root complex device if that disables hotplug, so we may mangle it to
look like a regular endpoint. Windows is picky about root complex
devices, so we may also mangle endpoints into integrated endpoints.
Also be aware that all PCIe "buses" except the root complex are
actually point-to-point links, so a root port connects to one PCIe
device (which may be multifunction). PCIe switches are needed to
provide fan-out and connect multiple devices, one device per
downstream port. It would be interesting to see how much we can abuse
the topology in a virtual system, but it's probably best not to
confuse guests.
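
(As an untested sketch of that fan-out, with current qemu device
names; the emulated NICs here are just stand-ins for arbitrary
endpoints.)

  # untested sketch: one root port feeds a PCIe switch whose downstream
  # ports each connect a single endpoint; guest image is a placeholder
  qemu-system-x86_64 -M q35 -m 2048 -drive file=/path/to/guest.img \
      -device ioh3420,id=root_port1,bus=pcie.0,port=1,chassis=1,addr=0x2 \
      -device x3130-upstream,id=upstream1,bus=root_port1,addr=0x0 \
      -device xio3130-downstream,id=downstream1,bus=upstream1,port=0,chassis=2,addr=0x0 \
      -device xio3130-downstream,id=downstream2,bus=upstream1,port=1,chassis=3,addr=0x1 \
      -device e1000,netdev=net0,bus=downstream1,addr=0x0 \
      -netdev user,id=net0 \
      -device virtio-net-pci,netdev=net1,bus=downstream2,addr=0x0 \
      -netdev user,id=net1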
Thanks,
Alex