[libvirt] [PATCH v2 RESEND 00/12] PCI passthrough support on s390

Abstract ======== The PCI representation in QEMU has recently been extended for S390 allowing configuration of zPCI attributes like uid (user-defined identifier) and fid (PCI function identifier). The details can be found here: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07262.html To support the new zPCI feature of the S390 platform, two new XML attributes, @uid and @fid, are introduced for device addresses of type 'pci', i.e.: <hostdev mode='subsystem' type='pci'> <driver name='vfio'/> <source> <address domain='0x0001' bus='0x00' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' uid='0x0003' fid='0x00000027'/> </hostdev> uid and fid are optional attributes. If they are defined by the user, unique values within the guest domain must be used. If they are not specified and the architecture requires them, they are automatically generated with non-conflicting values. Current implementation is the most seamless one for the user as it unites the address specific data of a PCI device on one XML element. It could accommodate both specifying our special parameters (uid and fid) and re-using standard statements (domain, bus, slot and function) for PCI devices. User can still specify bus/slot/function for the virtualized PCI devices in the XML. Thus uid/fid act as an extension to the PCI address and are stored in a new structure 'virZPCIDeviceAddress' which is a member of common PCI Address structure. Additionally, two hashtables are used for assignment and reservation of uid/fid. In support of extending the PCI address, a new PCI address extension flag is introduced. This extension flag allows is not only dedicated for the S390 platform but also other architectures needing certain extensions to PCI address space. Code Base ========= commit in master: 767f9e1449b1a36111532847f0c62dc758263c42 qemu: validate: Enforce compile time switch type checking for videos Change Log ========== v1->v2: 1. Separate test commit and merge testcases into corresponding commits that introduce the functionalities firstly. 2. Spare some checks for zpci device. 3. Add vsock and controller support. 4. Add uin32 type schema. 5. Rename zpciuid and zpcifid to zpci_uid and zpci_fid. 6. Always return multibus support on S390. Yi Min Zhao (12): conf: Add definitions for 'uid' and 'fid' PCI address attributes qemu: Introduce zPCI capability conf: Introduce a new PCI address extension flag qemu: Enable PCI multi bus for S390 guests qemu: Auto add pci-root for s390/s390x guests conf: Introduce address caching for PCI extensions conf: Introduce parser, formatter for uid and fid conf: Allocate/release 'uid' and 'fid' in PCI address qemu: Generate and use zPCI device in QEMU command line qemu: Add hotpluging support for PCI devices on S390 guests docs: Add 'uid' and 'fid' information news: Update news for PCI address extension attributes docs/formatdomain.html.in | 9 +- docs/news.xml | 11 + docs/schemas/basictypes.rng | 31 ++ docs/schemas/domaincommon.rng | 1 + src/conf/device_conf.c | 73 +++++ src/conf/device_conf.h | 1 + src/conf/domain_addr.c | 346 +++++++++++++++++++++ src/conf/domain_addr.h | 29 ++ src/conf/domain_conf.c | 6 + src/libvirt_private.syms | 4 + src/qemu/qemu_capabilities.c | 6 + src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 115 +++++++ src/qemu/qemu_command.h | 4 + src/qemu/qemu_domain.c | 1 + src/qemu/qemu_domain_address.c | 181 ++++++++++- src/qemu/qemu_hotplug.c | 182 ++++++++++- src/util/virpci.h | 13 + tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.7.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.8.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.9.0.s390x.xml | 1 + tests/qemuxml2argvdata/disk-virtio-s390-zpci.args | 27 ++ tests/qemuxml2argvdata/disk-virtio-s390-zpci.xml | 17 + .../hostdev-vfio-zpci-autogenerate.args | 26 ++ .../hostdev-vfio-zpci-autogenerate.xml | 18 ++ .../hostdev-vfio-zpci-boundaries.args | 30 ++ .../hostdev-vfio-zpci-boundaries.xml | 26 ++ .../hostdev-vfio-zpci-multidomain-many.args | 40 +++ .../hostdev-vfio-zpci-multidomain-many.xml | 67 ++++ tests/qemuxml2argvdata/hostdev-vfio-zpci.args | 26 ++ tests/qemuxml2argvdata/hostdev-vfio-zpci.xml | 19 ++ tests/qemuxml2argvtest.c | 14 + tests/qemuxml2xmloutdata/disk-virtio-s390-zpci.xml | 29 ++ tests/qemuxml2xmloutdata/hostdev-vfio-zpci.xml | 30 ++ tests/qemuxml2xmltest.c | 3 + 38 files changed, 1377 insertions(+), 15 deletions(-) create mode 100644 tests/qemuxml2argvdata/disk-virtio-s390-zpci.args create mode 100644 tests/qemuxml2argvdata/disk-virtio-s390-zpci.xml create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-autogenerate.args create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-autogenerate.xml create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-boundaries.args create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-boundaries.xml create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-multidomain-many.args create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-multidomain-many.xml create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci.args create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci.xml create mode 100644 tests/qemuxml2xmloutdata/disk-virtio-s390-zpci.xml create mode 100644 tests/qemuxml2xmloutdata/hostdev-vfio-zpci.xml -- Yi Min

Add zPCI definitions in preparation of extending the PCI address with parameters uid (user-defined identifier) and fid (PCI function identifier). Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com> Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> --- src/util/virpci.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/src/util/virpci.h b/src/util/virpci.h index 794b7e59db..3546992fee 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -36,12 +36,22 @@ typedef virPCIDeviceAddress *virPCIDeviceAddressPtr; typedef struct _virPCIDeviceList virPCIDeviceList; typedef virPCIDeviceList *virPCIDeviceListPtr; +typedef struct _virZPCIDeviceAddress virZPCIDeviceAddress; +typedef virZPCIDeviceAddress *virZPCIDeviceAddressPtr; +struct _virZPCIDeviceAddress { + unsigned int zpci_fid; + unsigned int zpci_uid; + bool fid_assigned; + bool uid_assigned; +}; + struct _virPCIDeviceAddress { unsigned int domain; unsigned int bus; unsigned int slot; unsigned int function; int multi; /* virTristateSwitch */ + virZPCIDeviceAddressPtr zpci; }; typedef enum { -- Yi Min

On Tue, 2018-07-10 at 16:02 +0800, Yi Min Zhao wrote:
struct _virPCIDeviceAddress { unsigned int domain; unsigned int bus; unsigned int slot; unsigned int function; int multi; /* virTristateSwitch */ + virZPCIDeviceAddressPtr zpci;
This should probably be an embedded structure rather than a pointer to a separate, heap allocated structure. I remember Laine having somewhat strong feelings about the topic, so I'll leave arguing for or against that to him :) -- Andrea Bolognani / Red Hat / Virtualization

Let's introduce zPCI capability. Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com> Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.7.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.8.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.9.0.s390x.xml | 1 + 8 files changed, 9 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index c7da916f9a..07f58fd014 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -502,6 +502,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST, "sev-guest", "machine.pseries.cap-hpt-max-page-size", "machine.pseries.cap-htm", + "zpci", ); @@ -1143,6 +1144,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = { { "vhost-vsock-device", QEMU_CAPS_DEVICE_VHOST_VSOCK }, { "mch", QEMU_CAPS_DEVICE_MCH }, { "sev-guest", QEMU_CAPS_SEV_GUEST }, + { "zpci", QEMU_CAPS_DEVICE_ZPCI }, }; static struct virQEMUCapsStringFlags virQEMUCapsDevicePropsVirtioBalloon[] = { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index a048a1cf02..0e51b74585 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -486,6 +486,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */ QEMU_CAPS_SEV_GUEST, /* -object sev-guest,... */ QEMU_CAPS_MACHINE_PSERIES_CAP_HPT_MAX_PAGE_SIZE, /* -machine pseries.cap-hpt-max-page-size */ QEMU_CAPS_MACHINE_PSERIES_CAP_HTM, /* -machine pseries.cap-htm */ + QEMU_CAPS_DEVICE_ZPCI, /* -device zpci */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml index 5e22e21224..623dc69973 100644 --- a/tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml @@ -119,6 +119,7 @@ <flag name='sdl-gl'/> <flag name='blockdev-del'/> <flag name='vhost-vsock'/> + <flag name='zpci'/> <version>2010000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>307899</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml index 6ca2e57ef8..cafedfe50f 100644 --- a/tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml @@ -126,6 +126,7 @@ <flag name='blockdev-del'/> <flag name='vhost-vsock'/> <flag name='tpm-emulator'/> + <flag name='zpci'/> <version>2011000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>346751</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml index 87d189e58d..3f274234f1 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml @@ -133,6 +133,7 @@ <flag name='vhost-vsock'/> <flag name='chardev-fd-pass'/> <flag name='tpm-emulator'/> + <flag name='zpci'/> <version>2012000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>375999</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.7.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.7.0.s390x.xml index 9ed25178f8..b04a9fbfd5 100644 --- a/tests/qemucapabilitiesdata/caps_2.7.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_2.7.0.s390x.xml @@ -107,6 +107,7 @@ <flag name='nbd-tls'/> <flag name='virtual-css-bridge'/> <flag name='sdl-gl'/> + <flag name='zpci'/> <version>2007000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>220792</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.8.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.8.0.s390x.xml index 646239ff25..b2b267be8d 100644 --- a/tests/qemucapabilitiesdata/caps_2.8.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_2.8.0.s390x.xml @@ -110,6 +110,7 @@ <flag name='virtual-css-bridge'/> <flag name='sdl-gl'/> <flag name='vhost-vsock'/> + <flag name='zpci'/> <version>2007093</version> <kvmVersion>0</kvmVersion> <microcodeVersion>246206</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.9.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.9.0.s390x.xml index 09d68e1f18..f908ab88f3 100644 --- a/tests/qemucapabilitiesdata/caps_2.9.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_2.9.0.s390x.xml @@ -114,6 +114,7 @@ <flag name='sdl-gl'/> <flag name='blockdev-del'/> <flag name='vhost-vsock'/> + <flag name='zpci'/> <version>2009000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>269625</microcodeVersion> -- Yi Min

This patch introduces a new attribute PCI address extension flag to deal with the extension PCI attributes such as 'uid' and 'fid' on the S390 platform. Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com> --- src/conf/device_conf.h | 1 + src/conf/domain_addr.h | 5 ++ src/qemu/qemu_domain_address.c | 135 ++++++++++++++++++++++++++++++++++++++++- 3 files changed, 139 insertions(+), 2 deletions(-) diff --git a/src/conf/device_conf.h b/src/conf/device_conf.h index a31ce9c376..6f926dff1d 100644 --- a/src/conf/device_conf.h +++ b/src/conf/device_conf.h @@ -164,6 +164,7 @@ struct _virDomainDeviceInfo { * assignment, never saved and never reported. */ int pciConnectFlags; /* enum virDomainPCIConnectFlags */ + int pciAddressExtFlags; /* enum virDomainPCIAddressExtensionFlags */ char *loadparm; /* PCI devices will only be automatically placed on a PCI bus diff --git a/src/conf/domain_addr.h b/src/conf/domain_addr.h index 5ad9d8ef3d..5219d2f208 100644 --- a/src/conf/domain_addr.h +++ b/src/conf/domain_addr.h @@ -29,6 +29,11 @@ # define VIR_PCI_ADDRESS_SLOT_LAST 31 # define VIR_PCI_ADDRESS_FUNCTION_LAST 7 +typedef enum { + VIR_PCI_ADDRESS_EXTENSION_NONE = 0, /* no extension */ + VIR_PCI_ADDRESS_EXTENSION_ZPCI = 1 << 0, /* zpci support */ +} virDomainPCIAddressExtensionFlags; + typedef enum { VIR_PCI_CONNECT_HOTPLUGGABLE = 1 << 0, /* is hotplug needed/supported */ diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 6ea80616af..c634d216f5 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -480,6 +480,58 @@ qemuDomainAssignARMVirtioMMIOAddresses(virDomainDefPtr def, } +static bool +qemuDomainDeviceSupportZPCI(virDomainDeviceDefPtr device) +{ + switch ((virDomainDeviceType) device->type) { + case VIR_DOMAIN_DEVICE_CHR: + return false; + + case VIR_DOMAIN_DEVICE_CONTROLLER: + case VIR_DOMAIN_DEVICE_NONE: + case VIR_DOMAIN_DEVICE_DISK: + case VIR_DOMAIN_DEVICE_LEASE: + case VIR_DOMAIN_DEVICE_FS: + case VIR_DOMAIN_DEVICE_NET: + case VIR_DOMAIN_DEVICE_INPUT: + case VIR_DOMAIN_DEVICE_SOUND: + case VIR_DOMAIN_DEVICE_VIDEO: + case VIR_DOMAIN_DEVICE_HOSTDEV: + case VIR_DOMAIN_DEVICE_WATCHDOG: + case VIR_DOMAIN_DEVICE_GRAPHICS: + case VIR_DOMAIN_DEVICE_HUB: + case VIR_DOMAIN_DEVICE_REDIRDEV: + case VIR_DOMAIN_DEVICE_SMARTCARD: + case VIR_DOMAIN_DEVICE_MEMBALLOON: + case VIR_DOMAIN_DEVICE_NVRAM: + case VIR_DOMAIN_DEVICE_RNG: + case VIR_DOMAIN_DEVICE_SHMEM: + case VIR_DOMAIN_DEVICE_TPM: + case VIR_DOMAIN_DEVICE_PANIC: + case VIR_DOMAIN_DEVICE_MEMORY: + case VIR_DOMAIN_DEVICE_IOMMU: + case VIR_DOMAIN_DEVICE_VSOCK: + case VIR_DOMAIN_DEVICE_LAST: + break; + } + return true; +} + + +static virDomainPCIAddressExtensionFlags +qemuDomainDeviceCalculatePCIAddressExtensionFlags(virQEMUCapsPtr qemuCaps, + virDomainDeviceDefPtr dev) +{ + virDomainPCIAddressExtensionFlags extFlags = 0; + + if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_ZPCI) && + qemuDomainDeviceSupportZPCI(dev)) + extFlags |= VIR_PCI_ADDRESS_EXTENSION_ZPCI; + + return extFlags; +} + + /** * qemuDomainDeviceCalculatePCIConnectFlags: * @@ -961,6 +1013,55 @@ qemuDomainFillAllPCIConnectFlags(virDomainDefPtr def, } +/** + * qemuDomainFillDevicePCIExtensionFlagsIter: + * + * @def: the entire DomainDef + * @dev: The device to be checked + * @info: virDomainDeviceInfo within the device + * @opaque: qemu capabilities + * + * Sets the pciAddressExtFlags for a single device's info. Has properly + * formatted arguments to be called by virDomainDeviceInfoIterate(). + * + * Always returns 0 - there is no failure. + */ +static int +qemuDomainFillDevicePCIExtensionFlagsIter(virDomainDefPtr def ATTRIBUTE_UNUSED, + virDomainDeviceDefPtr dev, + virDomainDeviceInfoPtr info, + void *opaque) +{ + virQEMUCapsPtr qemuCaps = opaque; + + info->pciAddressExtFlags + = qemuDomainDeviceCalculatePCIAddressExtensionFlags(qemuCaps, dev); + return 0; +} + + +/** + * qemuDomainFillAllPCIExtensionFlags: + * + * @def: the entire DomainDef + * @qemuCaps: as you'd expect + * + * Set the info->pciAddressExtFlags for all devices in the domain. + * + * Returns 0 on success or -1 on failure (the only possibility of + * failure would be some internal problem with + * virDomainDeviceInfoIterate()) + */ +static int +qemuDomainFillAllPCIExtensionFlags(virDomainDefPtr def, + virQEMUCapsPtr qemuCaps) +{ + return virDomainDeviceInfoIterate(def, + qemuDomainFillDevicePCIExtensionFlagsIter, + qemuCaps); +} + + /** * qemuDomainFindUnusedIsolationGroupIter: * @def: domain definition @@ -1235,6 +1336,29 @@ qemuDomainFillDevicePCIConnectFlags(virDomainDefPtr def, } +/** + * qemuDomainFillDevicePCIExtensionFlags: + * + * @dev: The device to be checked + * @qemuCaps: as you'd expect + * + * Set the info->pciAddressExtFlags for a single device. + * + * No return value. + */ +static void +qemuDomainFillDevicePCIExtensionFlags(virDomainDeviceDefPtr dev, + virQEMUCapsPtr qemuCaps) +{ + virDomainDeviceInfoPtr info = virDomainDeviceGetInfo(dev); + + if (info) { + info->pciAddressExtFlags + = qemuDomainDeviceCalculatePCIAddressExtensionFlags(qemuCaps, dev); + } +} + + static int qemuDomainPCIAddressReserveNextAddr(virDomainPCIAddressSetPtr addrs, virDomainDeviceInfoPtr dev) @@ -2363,6 +2487,9 @@ qemuDomainAssignPCIAddresses(virDomainDefPtr def, if (qemuDomainFillAllPCIConnectFlags(def, qemuCaps, driver) < 0) goto cleanup; + if (qemuDomainFillAllPCIExtensionFlags(def, qemuCaps) < 0) + goto cleanup; + if (qemuDomainSetupIsolationGroups(def) < 0) goto cleanup; @@ -2398,7 +2525,8 @@ qemuDomainAssignPCIAddresses(virDomainDefPtr def, */ virDomainDeviceInfo info = { .pciConnectFlags = (VIR_PCI_CONNECT_HOTPLUGGABLE | - VIR_PCI_CONNECT_TYPE_PCI_DEVICE) + VIR_PCI_CONNECT_TYPE_PCI_DEVICE), + .pciAddressExtFlags = VIR_PCI_ADDRESS_EXTENSION_NONE }; bool buses_reserved = true; @@ -2435,7 +2563,8 @@ qemuDomainAssignPCIAddresses(virDomainDefPtr def, qemuDomainHasPCIeRoot(def)) { virDomainDeviceInfo info = { .pciConnectFlags = (VIR_PCI_CONNECT_HOTPLUGGABLE | - VIR_PCI_CONNECT_TYPE_PCIE_DEVICE) + VIR_PCI_CONNECT_TYPE_PCIE_DEVICE), + .pciAddressExtFlags = VIR_PCI_ADDRESS_EXTENSION_NONE }; /* if there isn't an empty pcie-root-port, this will @@ -2952,6 +3081,8 @@ qemuDomainEnsurePCIAddress(virDomainObjPtr obj, qemuDomainFillDevicePCIConnectFlags(obj->def, dev, priv->qemuCaps, driver); + qemuDomainFillDevicePCIExtensionFlags(dev, priv->qemuCaps); + return virDomainPCIAddressEnsureAddr(priv->pciaddrs, info, info->pciConnectFlags); } -- Yi Min

On Tue, 2018-07-10 at 16:02 +0800, Yi Min Zhao wrote: [...]
+static bool +qemuDomainDeviceSupportZPCI(virDomainDeviceDefPtr device) +{ + switch ((virDomainDeviceType) device->type) { + case VIR_DOMAIN_DEVICE_CHR: + return false; + + case VIR_DOMAIN_DEVICE_CONTROLLER: + case VIR_DOMAIN_DEVICE_NONE: + case VIR_DOMAIN_DEVICE_DISK: + case VIR_DOMAIN_DEVICE_LEASE: + case VIR_DOMAIN_DEVICE_FS: + case VIR_DOMAIN_DEVICE_NET: + case VIR_DOMAIN_DEVICE_INPUT: + case VIR_DOMAIN_DEVICE_SOUND: + case VIR_DOMAIN_DEVICE_VIDEO: + case VIR_DOMAIN_DEVICE_HOSTDEV: + case VIR_DOMAIN_DEVICE_WATCHDOG: + case VIR_DOMAIN_DEVICE_GRAPHICS: + case VIR_DOMAIN_DEVICE_HUB: + case VIR_DOMAIN_DEVICE_REDIRDEV: + case VIR_DOMAIN_DEVICE_SMARTCARD: + case VIR_DOMAIN_DEVICE_MEMBALLOON: + case VIR_DOMAIN_DEVICE_NVRAM: + case VIR_DOMAIN_DEVICE_RNG: + case VIR_DOMAIN_DEVICE_SHMEM: + case VIR_DOMAIN_DEVICE_TPM: + case VIR_DOMAIN_DEVICE_PANIC: + case VIR_DOMAIN_DEVICE_MEMORY: + case VIR_DOMAIN_DEVICE_IOMMU: + case VIR_DOMAIN_DEVICE_VSOCK: + case VIR_DOMAIN_DEVICE_LAST: + break;
Did you validate that all of the above can be used with zPCI? Either way, at least _NONE and _LAST should definitely result in an error being reported, as well as the default case which should be included; use virReportEnumRangeError() for convenience. [...]
+static virDomainPCIAddressExtensionFlags +qemuDomainDeviceCalculatePCIAddressExtensionFlags(virQEMUCapsPtr qemuCaps, + virDomainDeviceDefPtr dev) +{ + virDomainPCIAddressExtensionFlags extFlags = 0; + + if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_ZPCI) && + qemuDomainDeviceSupportZPCI(dev)) + extFlags |= VIR_PCI_ADDRESS_EXTENSION_ZPCI;
The libvirt code style guidelines[1] state that this should be formatted as if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_ZPCI) && qemuDomainDeviceSupportZPCI(dev)) { extFlags |= VIR_PCI_ADDRESS_EXTENSION_ZPCI; } [1] https://libvirt.org/hacking.html -- Andrea Bolognani / Red Hat / Virtualization

在 2018/7/24 下午9:54, Andrea Bolognani 写道:
On Tue, 2018-07-10 at 16:02 +0800, Yi Min Zhao wrote: [...]
+static bool +qemuDomainDeviceSupportZPCI(virDomainDeviceDefPtr device) +{ + switch ((virDomainDeviceType) device->type) { + case VIR_DOMAIN_DEVICE_CHR: + return false; + + case VIR_DOMAIN_DEVICE_CONTROLLER: + case VIR_DOMAIN_DEVICE_NONE: + case VIR_DOMAIN_DEVICE_DISK: + case VIR_DOMAIN_DEVICE_LEASE: + case VIR_DOMAIN_DEVICE_FS: + case VIR_DOMAIN_DEVICE_NET: + case VIR_DOMAIN_DEVICE_INPUT: + case VIR_DOMAIN_DEVICE_SOUND: + case VIR_DOMAIN_DEVICE_VIDEO: + case VIR_DOMAIN_DEVICE_HOSTDEV: + case VIR_DOMAIN_DEVICE_WATCHDOG: + case VIR_DOMAIN_DEVICE_GRAPHICS: + case VIR_DOMAIN_DEVICE_HUB: + case VIR_DOMAIN_DEVICE_REDIRDEV: + case VIR_DOMAIN_DEVICE_SMARTCARD: + case VIR_DOMAIN_DEVICE_MEMBALLOON: + case VIR_DOMAIN_DEVICE_NVRAM: + case VIR_DOMAIN_DEVICE_RNG: + case VIR_DOMAIN_DEVICE_SHMEM: + case VIR_DOMAIN_DEVICE_TPM: + case VIR_DOMAIN_DEVICE_PANIC: + case VIR_DOMAIN_DEVICE_MEMORY: + case VIR_DOMAIN_DEVICE_IOMMU: + case VIR_DOMAIN_DEVICE_VSOCK: + case VIR_DOMAIN_DEVICE_LAST: + break; Did you validate that all of the above can be used with zPCI? Yes, I did. But how far can zPCI be used is a question. I make sure that if their address type can be PCI type zPCI address could be expanded. But we don't guarantee they can be really used in qemu. Like RNG device can't be used because it doesn't support MSIx which is required on S390.
Either way, at least _NONE and _LAST should definitely result in an error being reported, as well as the default case which should be included; use virReportEnumRangeError() for convenience. Will update this as your comment.
+static virDomainPCIAddressExtensionFlags +qemuDomainDeviceCalculatePCIAddressExtensionFlags(virQEMUCapsPtr qemuCaps, + virDomainDeviceDefPtr dev) +{ + virDomainPCIAddressExtensionFlags extFlags = 0; + + if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_ZPCI) && + qemuDomainDeviceSupportZPCI(dev)) + extFlags |= VIR_PCI_ADDRESS_EXTENSION_ZPCI; The libvirt code style guidelines[1] state that this should be
[...] formatted as
if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_ZPCI) && qemuDomainDeviceSupportZPCI(dev)) { extFlags |= VIR_PCI_ADDRESS_EXTENSION_ZPCI; }
[1] https://libvirt.org/hacking.html I see a lot of code in libvirt use this style. Is it new guideline?

On Wed, 2018-07-25 at 16:58 +0800, Yi Min Zhao wrote: [...]
+ case VIR_DOMAIN_DEVICE_CONTROLLER: + case VIR_DOMAIN_DEVICE_DISK: + case VIR_DOMAIN_DEVICE_LEASE: + case VIR_DOMAIN_DEVICE_FS: + case VIR_DOMAIN_DEVICE_NET: + case VIR_DOMAIN_DEVICE_INPUT: + case VIR_DOMAIN_DEVICE_SOUND: + case VIR_DOMAIN_DEVICE_VIDEO: + case VIR_DOMAIN_DEVICE_HOSTDEV: + case VIR_DOMAIN_DEVICE_WATCHDOG: + case VIR_DOMAIN_DEVICE_GRAPHICS: + case VIR_DOMAIN_DEVICE_HUB: + case VIR_DOMAIN_DEVICE_REDIRDEV: + case VIR_DOMAIN_DEVICE_SMARTCARD: + case VIR_DOMAIN_DEVICE_MEMBALLOON: + case VIR_DOMAIN_DEVICE_NVRAM: + case VIR_DOMAIN_DEVICE_RNG: + case VIR_DOMAIN_DEVICE_SHMEM: + case VIR_DOMAIN_DEVICE_TPM: + case VIR_DOMAIN_DEVICE_PANIC: + case VIR_DOMAIN_DEVICE_MEMORY: + case VIR_DOMAIN_DEVICE_IOMMU: + case VIR_DOMAIN_DEVICE_VSOCK:
Did you validate that all of the above can be used with zPCI?
Yes, I did. But how far can zPCI be used is a question. I make sure that if their address type can be PCI type zPCI address could be expanded. But we don't guarantee they can be really used in qemu. Like RNG device can't be used because it doesn't support MSIx which is required on S390.
If that's the case, I'm not sure the current solution is what we want: if someone creates a guest and includes a RNG device in the configuration, they probably expect it to, well, work. IIUC, with the current implementation they would get a non-working RNG device instead, which certainly feels suboptimal.
[...]
+static virDomainPCIAddressExtensionFlags +qemuDomainDeviceCalculatePCIAddressExtensionFlags(virQEMUCapsPtr qemuCaps, + virDomainDeviceDefPtr dev) +{ + virDomainPCIAddressExtensionFlags extFlags = 0; + + if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_ZPCI) && + qemuDomainDeviceSupportZPCI(dev)) + extFlags |= VIR_PCI_ADDRESS_EXTENSION_ZPCI;
The libvirt code style guidelines[1] state that this should be formatted as
if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_ZPCI) && qemuDomainDeviceSupportZPCI(dev)) { extFlags |= VIR_PCI_ADDRESS_EXTENSION_ZPCI; }
I see a lot of code in libvirt use this style. Is it new guideline?
It's certainly been around for the past 3+ years. There's a lot of code in libvirt that's *way* older than that, though :) -- Andrea Bolognani / Red Hat / Virtualization

在 2018/7/25 下午9:31, Andrea Bolognani 写道:
+ case VIR_DOMAIN_DEVICE_CONTROLLER: + case VIR_DOMAIN_DEVICE_DISK: + case VIR_DOMAIN_DEVICE_LEASE: + case VIR_DOMAIN_DEVICE_FS: + case VIR_DOMAIN_DEVICE_NET: + case VIR_DOMAIN_DEVICE_INPUT: + case VIR_DOMAIN_DEVICE_SOUND: + case VIR_DOMAIN_DEVICE_VIDEO: + case VIR_DOMAIN_DEVICE_HOSTDEV: + case VIR_DOMAIN_DEVICE_WATCHDOG: + case VIR_DOMAIN_DEVICE_GRAPHICS: + case VIR_DOMAIN_DEVICE_HUB: + case VIR_DOMAIN_DEVICE_REDIRDEV: + case VIR_DOMAIN_DEVICE_SMARTCARD: + case VIR_DOMAIN_DEVICE_MEMBALLOON: + case VIR_DOMAIN_DEVICE_NVRAM: + case VIR_DOMAIN_DEVICE_RNG: + case VIR_DOMAIN_DEVICE_SHMEM: + case VIR_DOMAIN_DEVICE_TPM: + case VIR_DOMAIN_DEVICE_PANIC: + case VIR_DOMAIN_DEVICE_MEMORY: + case VIR_DOMAIN_DEVICE_IOMMU: + case VIR_DOMAIN_DEVICE_VSOCK: Did you validate that all of the above can be used with zPCI? Yes, I did. But how far can zPCI be used is a question. I make sure that if their address type can be PCI type zPCI address could be expanded. But we don't guarantee they can be really used in qemu. Like RNG device can't be used because it doesn't support MSIx which is required on S390. If that's the case, I'm not sure the current solution is what we want: if someone creates a guest and includes a RNG device in the configuration, they probably expect it to, well, work. IIUC, with
On Wed, 2018-07-25 at 16:58 +0800, Yi Min Zhao wrote: [...] the current implementation they would get a non-working RNG device instead, which certainly feels suboptimal. Actually the guest can't startup with RNG device instead of a running guest with a non-working RNG device and qemu would report error. The user can get the information regarding startup failure. I think this is proper. If RNG could support MSIx in future, we won't need to do any thing in Libvirt.
+static virDomainPCIAddressExtensionFlags +qemuDomainDeviceCalculatePCIAddressExtensionFlags(virQEMUCapsPtr qemuCaps, + virDomainDeviceDefPtr dev) +{ + virDomainPCIAddressExtensionFlags extFlags = 0; + + if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_ZPCI) && + qemuDomainDeviceSupportZPCI(dev)) + extFlags |= VIR_PCI_ADDRESS_EXTENSION_ZPCI; The libvirt code style guidelines[1] state that this should be
[...] formatted as
if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_ZPCI) && qemuDomainDeviceSupportZPCI(dev)) { extFlags |= VIR_PCI_ADDRESS_EXTENSION_ZPCI; }
[1] https://libvirt.org/hacking.html I see a lot of code in libvirt use this style. Is it new guideline? It's certainly been around for the past 3+ years. There's a lot of code in libvirt that's *way* older than that, though :)

QEMU on s390 supports PCI multibus since forever. But zPCI, as extension of PCI device on s390, is the significant capability. Only when zPCI capability is existing, we consider QEMU supports PCI multibus properly. So let enable PCI multibus only if zPCI is supported. Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com> Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com> --- src/qemu/qemu_capabilities.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 07f58fd014..7cba7eecdc 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -1760,6 +1760,10 @@ bool virQEMUCapsHasPCIMultiBus(virQEMUCapsPtr qemuCaps, return false; } + /* S390 supports PCI-multibus. */ + if (ARCH_IS_S390(def->os.arch)) + return true; + /* If 'virt' supports PCI, it supports multibus. * No extra conditions here for simplicity. */ -- Yi Min

On Tue, Jul 10, 2018 at 04:02:18PM +0800, Yi Min Zhao wrote:
QEMU on s390 supports PCI multibus since forever. But zPCI, as extension of PCI device on s390, is the significant capability. Only when zPCI capability is existing, we consider QEMU supports PCI multibus properly. So let enable PCI multibus only if zPCI is supported.
This comment is now outdated, since we assume multibus in all cases (even though it only makes sense with ZPCI) Jano
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com> Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com> --- src/qemu/qemu_capabilities.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 07f58fd014..7cba7eecdc 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -1760,6 +1760,10 @@ bool virQEMUCapsHasPCIMultiBus(virQEMUCapsPtr qemuCaps, return false; }
+ /* S390 supports PCI-multibus. */ + if (ARCH_IS_S390(def->os.arch)) + return true; + /* If 'virt' supports PCI, it supports multibus. * No extra conditions here for simplicity. */ -- Yi Min
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

在 2018/7/23 下午7:36, Ján Tomko 写道:
On Tue, Jul 10, 2018 at 04:02:18PM +0800, Yi Min Zhao wrote:
QEMU on s390 supports PCI multibus since forever. But zPCI, as extension of PCI device on s390, is the significant capability. Only when zPCI capability is existing, we consider QEMU supports PCI multibus properly. So let enable PCI multibus only if zPCI is supported.
This comment is now outdated, since we assume multibus in all cases (even though it only makes sense with ZPCI)
Jano
Thanks! I forgot this.
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com> Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com> --- src/qemu/qemu_capabilities.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 07f58fd014..7cba7eecdc 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -1760,6 +1760,10 @@ bool virQEMUCapsHasPCIMultiBus(virQEMUCapsPtr qemuCaps, return false; }
+ /* S390 supports PCI-multibus. */ + if (ARCH_IS_S390(def->os.arch)) + return true; + /* If 'virt' supports PCI, it supports multibus. * No extra conditions here for simplicity. */ -- Yi Min
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

The pci-root depends on zpci capability. So autogenerate pci-root if zpci exists. Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com> Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com> --- src/qemu/qemu_domain.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 964fe97963..55923bf6e0 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -3227,6 +3227,7 @@ qemuDomainDefAddDefaultDevices(virDomainDefPtr def, case VIR_ARCH_S390X: addDefaultUSB = false; addPanicDevice = true; + addPCIRoot = virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_ZPCI); break; case VIR_ARCH_SPARC: -- Yi Min

This patch provides a caching mechanism for the device address extensions uid and fid on S390. For efficient sparse address allocation, we introduce two hash tables for uid/fid which hold the address set information per domain. Also in order to improve performance of searching available value, we introduce our own callbacks for the two hashtables. In this way, uid/fid is saved in hash key and hash value could be any non-NULL pointer due to no operation on hash value. That is also the reason why we don't introduce hash value free callback. Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com> --- src/conf/domain_addr.c | 79 ++++++++++++++++++++++++++++++++++++++++++ src/conf/domain_addr.h | 9 +++++ src/libvirt_private.syms | 1 + src/qemu/qemu_domain_address.c | 5 +++ 4 files changed, 94 insertions(+) diff --git a/src/conf/domain_addr.c b/src/conf/domain_addr.c index 16f7ffa928..82f7889679 100644 --- a/src/conf/domain_addr.c +++ b/src/conf/domain_addr.c @@ -27,11 +27,23 @@ #include "virlog.h" #include "virstring.h" #include "domain_addr.h" +#include "virhashcode.h" #define VIR_FROM_THIS VIR_FROM_DOMAIN VIR_LOG_INIT("conf.domain_addr"); +static void +virDomainPCIAddressSetExtensionFree(virDomainPCIAddressSetPtr addrs) +{ + if (!addrs || !addrs->zpciIds) + return; + + virHashFree(addrs->zpciIds->uids); + virHashFree(addrs->zpciIds->fids); + VIR_FREE(addrs->zpciIds); +} + virDomainPCIConnectFlags virDomainPCIControllerModelToConnectType(virDomainControllerModelPCI model) { @@ -741,6 +753,72 @@ virDomainPCIAddressReleaseAddr(virDomainPCIAddressSetPtr addrs, addrs->buses[addr->bus].slot[addr->slot].functions &= ~(1 << addr->function); } + +static uint32_t virZPCIAddrCode(const void *name, uint32_t seed) +{ + unsigned int value = *((unsigned int *)name); + return virHashCodeGen(&value, sizeof(value), seed); +} + + +static bool virZPCIAddrEqual(const void *namea, const void *nameb) +{ + return *((unsigned int *)namea) == *((unsigned int *)nameb); +} + + +static void *virZPCIAddrCopy(const void *name) +{ + unsigned int *copy; + + if (VIR_ALLOC(copy) < 0) + return NULL; + + *copy = *((unsigned int *)name); + return (void *)copy; +} + + +static void virZPCIAddrKeyFree(void *name) +{ + VIR_FREE(name); +} + + +int +virDomainPCIAddressSetExtensionAlloc(virDomainPCIAddressSetPtr addrs, + virDomainPCIAddressExtensionFlags extFlags) +{ + if (extFlags == VIR_PCI_ADDRESS_EXTENSION_ZPCI) { + if (addrs->zpciIds) + return 0; + + if (VIR_ALLOC(addrs->zpciIds) < 0) + return -1; + + if (!(addrs->zpciIds->uids = virHashCreateFull(10, NULL, + virZPCIAddrCode, + virZPCIAddrEqual, + virZPCIAddrCopy, + virZPCIAddrKeyFree))) + goto error; + + if (!(addrs->zpciIds->fids = virHashCreateFull(10, NULL, + virZPCIAddrCode, + virZPCIAddrEqual, + virZPCIAddrCopy, + virZPCIAddrKeyFree))) + goto error; + } + + return 0; + + error: + virDomainPCIAddressSetExtensionFree(addrs); + return -1; +} + + virDomainPCIAddressSetPtr virDomainPCIAddressSetAlloc(unsigned int nbuses) { @@ -767,6 +845,7 @@ virDomainPCIAddressSetFree(virDomainPCIAddressSetPtr addrs) if (!addrs) return; + virDomainPCIAddressSetExtensionFree(addrs); VIR_FREE(addrs->buses); VIR_FREE(addrs); } diff --git a/src/conf/domain_addr.h b/src/conf/domain_addr.h index 5219d2f208..d1ec869da4 100644 --- a/src/conf/domain_addr.h +++ b/src/conf/domain_addr.h @@ -116,6 +116,11 @@ typedef struct { } virDomainPCIAddressBus; typedef virDomainPCIAddressBus *virDomainPCIAddressBusPtr; +typedef struct { + virHashTablePtr uids, fids; +} virDomainZPCIAddressIds; +typedef virDomainZPCIAddressIds *virDomainZPCIAddressIdsPtr; + struct _virDomainPCIAddressSet { virDomainPCIAddressBus *buses; size_t nbuses; @@ -125,6 +130,7 @@ struct _virDomainPCIAddressSet { bool areMultipleRootsSupported; /* If true, the guest can use the pcie-to-pci-bridge controller */ bool isPCIeToPCIBridgeSupported; + virDomainZPCIAddressIdsPtr zpciIds; }; typedef struct _virDomainPCIAddressSet virDomainPCIAddressSet; typedef virDomainPCIAddressSet *virDomainPCIAddressSetPtr; @@ -132,6 +138,9 @@ typedef virDomainPCIAddressSet *virDomainPCIAddressSetPtr; char *virDomainPCIAddressAsString(virPCIDeviceAddressPtr addr) ATTRIBUTE_NONNULL(1); +int virDomainPCIAddressSetExtensionAlloc(virDomainPCIAddressSetPtr addrs, + virDomainPCIAddressExtensionFlags extFlags); + virDomainPCIAddressSetPtr virDomainPCIAddressSetAlloc(unsigned int nbuses); void virDomainPCIAddressSetFree(virDomainPCIAddressSetPtr addrs); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index e688981c3e..3d82d489de 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -116,6 +116,7 @@ virDomainPCIAddressReserveAddr; virDomainPCIAddressReserveNextAddr; virDomainPCIAddressSetAllMulti; virDomainPCIAddressSetAlloc; +virDomainPCIAddressSetExtensionAlloc; virDomainPCIAddressSetFree; virDomainPCIAddressSlotInUse; virDomainPCIAddressValidate; diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index c634d216f5..210f94e15e 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -1477,6 +1477,11 @@ qemuDomainPCIAddressSetCreate(virDomainDefPtr def, addrs->dryRun = dryRun; + /* create zpci address set for s390 domain */ + if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_ZPCI) && + virDomainPCIAddressSetExtensionAlloc(addrs, VIR_PCI_ADDRESS_EXTENSION_ZPCI)) + goto error; + /* pSeries domains support multiple pci-root controllers */ if (qemuDomainIsPSeries(def)) addrs->areMultipleRootsSupported = true; -- Yi Min

On Tue, 2018-07-10 at 16:02 +0800, Yi Min Zhao wrote:
This patch provides a caching mechanism for the device address extensions uid and fid on S390. For efficient sparse address allocation, we introduce two hash tables for uid/fid which hold the address set information per domain. Also in order to improve performance of searching available value, we introduce our own callbacks for the two hashtables. In this way, uid/fid is saved in hash key and hash value could be any non-NULL pointer due to no operation on hash value. That is also the reason why we don't introduce hash value free callback.
Pretty much assuming your hash table implementation doesn't have any issues, because I lack the expertise to review it properly :) Some code style issues below. [...]
+static uint32_t virZPCIAddrCode(const void *name, uint32_t seed)
The return type and each of the function arguments should be on separate lines, like static uint32_t virZPCIAddrCode(const void *name, uint32_t seed) [...]
+static bool virZPCIAddrEqual(const void *namea, const void *nameb)
Same. [...]
+static void *virZPCIAddrCopy(const void *name)
Same. [...]
+static void virZPCIAddrKeyFree(void *name)
Same. [...]
+int +virDomainPCIAddressSetExtensionAlloc(virDomainPCIAddressSetPtr addrs, + virDomainPCIAddressExtensionFlags extFlags) +{ + if (extFlags == VIR_PCI_ADDRESS_EXTENSION_ZPCI) {
This should probably be if (extFlags & VIR_PCI_ADDRESS_EXTENSION_ZPCI) since we're dealing with flags, but given the way you end up calling the function it might be okay as it is. -- Andrea Bolognani / Red Hat / Virtualization

在 2018/7/24 下午10:03, Andrea Bolognani 写道:
On Tue, 2018-07-10 at 16:02 +0800, Yi Min Zhao wrote:
This patch provides a caching mechanism for the device address extensions uid and fid on S390. For efficient sparse address allocation, we introduce two hash tables for uid/fid which hold the address set information per domain. Also in order to improve performance of searching available value, we introduce our own callbacks for the two hashtables. In this way, uid/fid is saved in hash key and hash value could be any non-NULL pointer due to no operation on hash value. That is also the reason why we don't introduce hash value free callback. Pretty much assuming your hash table implementation doesn't have any issues, because I lack the expertise to review it properly :)
Some code style issues below.
[...]
+static uint32_t virZPCIAddrCode(const void *name, uint32_t seed) The return type and each of the function arguments should be on separate lines, like
static uint32_t virZPCIAddrCode(const void *name, uint32_t seed)
[...]
+static bool virZPCIAddrEqual(const void *namea, const void *nameb) Same.
[...]
+static void *virZPCIAddrCopy(const void *name) Same.
[...]
+static void virZPCIAddrKeyFree(void *name) Same.
[...]
+int +virDomainPCIAddressSetExtensionAlloc(virDomainPCIAddressSetPtr addrs, + virDomainPCIAddressExtensionFlags extFlags) +{ + if (extFlags == VIR_PCI_ADDRESS_EXTENSION_ZPCI) { This should probably be
if (extFlags & VIR_PCI_ADDRESS_EXTENSION_ZPCI)
since we're dealing with flags, but given the way you end up calling the function it might be okay as it is.
Thanks for your comments!

This patch introduces new XML parser/formatter functions. Uid is 16-bit and non-zero. Fid is 32-bit. They are added as two new attributes of PCI address, and parsed/formatted along with PCI address parser/formatter. The related test is also added. Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com> Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com> --- docs/schemas/basictypes.rng | 31 +++++++++ docs/schemas/domaincommon.rng | 1 + src/conf/device_conf.c | 73 ++++++++++++++++++++++ src/conf/domain_addr.c | 3 + src/conf/domain_conf.c | 6 ++ src/util/virpci.h | 3 + tests/qemuxml2argvdata/disk-virtio-s390-zpci.xml | 17 +++++ tests/qemuxml2argvdata/hostdev-vfio-zpci.xml | 19 ++++++ tests/qemuxml2xmloutdata/disk-virtio-s390-zpci.xml | 29 +++++++++ tests/qemuxml2xmloutdata/hostdev-vfio-zpci.xml | 30 +++++++++ tests/qemuxml2xmltest.c | 3 + 11 files changed, 215 insertions(+) create mode 100644 tests/qemuxml2argvdata/disk-virtio-s390-zpci.xml create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci.xml create mode 100644 tests/qemuxml2xmloutdata/disk-virtio-s390-zpci.xml create mode 100644 tests/qemuxml2xmloutdata/hostdev-vfio-zpci.xml diff --git a/docs/schemas/basictypes.rng b/docs/schemas/basictypes.rng index 1a18cd31b1..615e39bccb 100644 --- a/docs/schemas/basictypes.rng +++ b/docs/schemas/basictypes.rng @@ -65,6 +65,17 @@ </data> </choice> </define> + <define name="uint32"> + <choice> + <data type="string"> + <param name="pattern">(0x)?[0-9a-fA-F]{1,8}</param> + </data> + <data type="int"> + <param name="minInclusive">0</param> + <param name="maxInclusive">4294967295</param> + </data> + </choice> + </define> <define name="UUID"> <choice> @@ -111,6 +122,26 @@ </attribute> </optional> </define> + <define name="zpciaddress"> + <optional> + <attribute name="uid"> + <choice> + <data type="string"> + <param name="pattern">(0x)?[0-9a-fA-F]{1,4}</param> + </data> + <data type="unsignedInt"> + <param name="minInclusive">1</param> + <param name="maxInclusive">65535</param> + </data> + </choice> + </attribute> + </optional> + <optional> + <attribute name="fid"> + <ref name="uint32"/> + </attribute> + </optional> + </define> <!-- a 6 byte MAC address in ASCII-hex format, eg "12:34:56:78:9A:BC" --> <!-- The lowest bit of the 1st byte is the "multicast" bit. a --> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index bd687ce9d3..904e49ddb7 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -5168,6 +5168,7 @@ <value>pci</value> </attribute> <ref name="pciaddress"/> + <ref name="zpciaddress"/> </group> <group> <attribute name="type"> diff --git a/src/conf/device_conf.c b/src/conf/device_conf.c index d69f94fadf..72ad0652a6 100644 --- a/src/conf/device_conf.c +++ b/src/conf/device_conf.c @@ -32,6 +32,74 @@ #define VIR_FROM_THIS VIR_FROM_DEVICE +static int +virZPCIDeviceAddressIsValid(virZPCIDeviceAddressPtr zpci) +{ + if (!zpci->uid_assigned) + return 1; + + if (zpci->zpci_uid > VIR_DOMAIN_DEVICE_ZPCI_MAX_UID || + zpci->zpci_uid == 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid PCI address uid='0x%x', " + "must be > 0x0 and <= 0x%x"), + zpci->zpci_uid, + VIR_DOMAIN_DEVICE_ZPCI_MAX_UID); + return 0; + } + + return 1; +} + +static int +virZPCIDeviceAddressParseXML(xmlNodePtr node, + virPCIDeviceAddressPtr addr) +{ + char *uid, *fid; + int ret = -1; + virZPCIDeviceAddressPtr def = NULL; + + if (VIR_ALLOC(def) < 0) + return -1; + + uid = virXMLPropString(node, "uid"); + fid = virXMLPropString(node, "fid"); + + if (uid) { + if (virStrToLong_uip(uid, NULL, 0, &def->zpci_uid) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot parse <address> 'uid' attribute")); + goto cleanup; + } + def->uid_assigned = true; + } + + if (fid) { + if (virStrToLong_uip(fid, NULL, 0, &def->zpci_fid) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot parse <address> 'fid' attribute")); + goto cleanup; + } + def->fid_assigned = true; + } + + if (uid || fid) { + if (!virZPCIDeviceAddressIsValid(def)) + goto cleanup; + + addr->zpci = def; + def = NULL; + } + + ret = 0; + + cleanup: + VIR_FREE(uid); + VIR_FREE(fid); + VIR_FREE(def); + return ret; +} + int virDomainDeviceInfoCopy(virDomainDeviceInfoPtr dst, virDomainDeviceInfoPtr src) @@ -57,6 +125,8 @@ void virDomainDeviceInfoClear(virDomainDeviceInfoPtr info) { VIR_FREE(info->alias); + if (info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI) + VIR_FREE(info->addr.pci.zpci); memset(&info->addr, 0, sizeof(info->addr)); info->type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE; VIR_FREE(info->romfile); @@ -245,6 +315,9 @@ virPCIDeviceAddressParseXML(xmlNodePtr node, if (!virPCIDeviceAddressIsEmpty(addr) && !virPCIDeviceAddressIsValid(addr, true)) goto cleanup; + if (virZPCIDeviceAddressParseXML(node, addr) < 0) + goto cleanup; + ret = 0; cleanup: diff --git a/src/conf/domain_addr.c b/src/conf/domain_addr.c index 82f7889679..66e61f3550 100644 --- a/src/conf/domain_addr.c +++ b/src/conf/domain_addr.c @@ -1038,6 +1038,9 @@ virDomainPCIAddressReserveNextAddr(virDomainPCIAddressSetPtr addrs, dev->isolationGroup, function) < 0) return -1; + if (dev->pciAddressExtFlags & VIR_PCI_ADDRESS_EXTENSION_ZPCI) + addr.zpci = dev->addr.pci.zpci; + if (virDomainPCIAddressReserveAddrInternal(addrs, &addr, flags, dev->isolationGroup, false) < 0) return -1; diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 7ab2953d83..9d662b4c72 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -6327,6 +6327,12 @@ virDomainDeviceInfoFormat(virBufferPtr buf, info->addr.pci.slot, info->addr.pci.function); } + if (info->addr.pci.zpci) { + virBufferAsprintf(buf, " uid='0x%.4x'", + info->addr.pci.zpci->zpci_uid); + virBufferAsprintf(buf, " fid='0x%.8x'", + info->addr.pci.zpci->zpci_fid); + } if (info->addr.pci.multi) { virBufferAsprintf(buf, " multifunction='%s'", virTristateSwitchTypeToString(info->addr.pci.multi)); diff --git a/src/util/virpci.h b/src/util/virpci.h index 3546992fee..fdec201058 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -36,6 +36,9 @@ typedef virPCIDeviceAddress *virPCIDeviceAddressPtr; typedef struct _virPCIDeviceList virPCIDeviceList; typedef virPCIDeviceList *virPCIDeviceListPtr; +# define VIR_DOMAIN_DEVICE_ZPCI_MAX_UID 0xFFFF +# define VIR_DOMAIN_DEVICE_ZPCI_MAX_FID 0xFFFFFFFF + typedef struct _virZPCIDeviceAddress virZPCIDeviceAddress; typedef virZPCIDeviceAddress *virZPCIDeviceAddressPtr; struct _virZPCIDeviceAddress { diff --git a/tests/qemuxml2argvdata/disk-virtio-s390-zpci.xml b/tests/qemuxml2argvdata/disk-virtio-s390-zpci.xml new file mode 100644 index 0000000000..f64f8bf229 --- /dev/null +++ b/tests/qemuxml2argvdata/disk-virtio-s390-zpci.xml @@ -0,0 +1,17 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory>219136</memory> + <vcpu>1</vcpu> + <os> + <type arch='s390x' machine='s390-ccw'>hvm</type> + </os> + <devices> + <emulator>/usr/bin/qemu-system-s390x</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0' uid='0x0019' fid='0x0000001f'/> + </disk> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/hostdev-vfio-zpci.xml b/tests/qemuxml2argvdata/hostdev-vfio-zpci.xml new file mode 100644 index 0000000000..b9c8466444 --- /dev/null +++ b/tests/qemuxml2argvdata/hostdev-vfio-zpci.xml @@ -0,0 +1,19 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory>219100</memory> + <os> + <type arch='s390x' machine='s390-ccw'>hvm</type> + </os> + <devices> + <emulator>/usr/bin/qemu-system-s390x</emulator> + <controller type='pci' index='0' model='pci-root'/> + <hostdev mode='subsystem' type='pci' managed='no'> + <driver name='vfio'/> + <source> + <address domain='0x0000' bus='0x00' slot='0x00' function='0x0'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0' uid='0x0019' fid='0x0000001f'/> + </hostdev> + </devices> +</domain> diff --git a/tests/qemuxml2xmloutdata/disk-virtio-s390-zpci.xml b/tests/qemuxml2xmloutdata/disk-virtio-s390-zpci.xml new file mode 100644 index 0000000000..04fe0e6ab9 --- /dev/null +++ b/tests/qemuxml2xmloutdata/disk-virtio-s390-zpci.xml @@ -0,0 +1,29 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='s390x' machine='s390-ccw'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-s390x</emulator> + <disk type='block' device='disk'> + <driver name='qemu' type='raw'/> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0' uid='0x0019' fid='0x0000001f'/> + </disk> + <controller type='pci' index='0' model='pci-root'/> + <memballoon model='virtio'> + <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0000'/> + </memballoon> + <panic model='s390'/> + </devices> +</domain> diff --git a/tests/qemuxml2xmloutdata/hostdev-vfio-zpci.xml b/tests/qemuxml2xmloutdata/hostdev-vfio-zpci.xml new file mode 100644 index 0000000000..6d94490a50 --- /dev/null +++ b/tests/qemuxml2xmloutdata/hostdev-vfio-zpci.xml @@ -0,0 +1,30 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='s390x' machine='s390-ccw'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-s390x</emulator> + <controller type='pci' index='0' model='pci-root'/> + <hostdev mode='subsystem' type='pci' managed='no'> + <driver name='vfio'/> + <source> + <address domain='0x0000' bus='0x00' slot='0x00' function='0x0'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0' uid='0x0019' fid='0x0000001f'/> + </hostdev> + <memballoon model='virtio'> + <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0000'/> + </memballoon> + <panic model='s390'/> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index bbb995656e..e3282e2b98 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -397,6 +397,8 @@ mymain(void) QEMU_CAPS_VIRTIO_SCSI); DO_TEST("disk-virtio-scsi-ioeventfd", QEMU_CAPS_VIRTIO_SCSI); + DO_TEST("disk-virtio-s390-zpci", + QEMU_CAPS_DEVICE_ZPCI, QEMU_CAPS_CCW); DO_TEST("disk-scsi-megasas", QEMU_CAPS_SCSI_MEGASAS); DO_TEST("disk-scsi-mptsas1068", @@ -476,6 +478,7 @@ mymain(void) DO_TEST("hostdev-usb-address", NONE); DO_TEST("hostdev-pci-address", NONE); DO_TEST("hostdev-vfio", NONE); + DO_TEST("hostdev-vfio-zpci", QEMU_CAPS_DEVICE_ZPCI, QEMU_CAPS_CCW); DO_TEST("hostdev-mdev-precreated", NONE); DO_TEST("pci-rom", NONE); DO_TEST("pci-rom-disabled", NONE); -- Yi Min

On Tue, 2018-07-10 at 16:02 +0800, Yi Min Zhao wrote:
+ <define name="uint32"> + <choice> + <data type="string"> + <param name="pattern">(0x)?[0-9a-fA-F]{1,8}</param> + </data> + <data type="int">
This should probably be unignedInt instead of int, but other uint* types defined in the file also use int so if anything changing all of them would be the job for a follow-up patch. [...]
+ <define name="zpciaddress"> + <optional> + <attribute name="uid"> + <choice> + <data type="string"> + <param name="pattern">(0x)?[0-9a-fA-F]{1,4}</param> + </data> + <data type="unsignedInt"> + <param name="minInclusive">1</param> + <param name="maxInclusive">65535</param> + </data> + </choice> + </attribute>
I don't see why you wouldn't just use the existing uint16 type here. Is that because uid can't be zero? Making sure that's actually the case is a job for the PostParse() or Validate() callback anyway, because schema validation is entirely opt-in and thus can't be relied upon. [...]
+static int +virZPCIDeviceAddressIsValid(virZPCIDeviceAddressPtr zpci) +{ + if (!zpci->uid_assigned) + return 1; + + if (zpci->zpci_uid > VIR_DOMAIN_DEVICE_ZPCI_MAX_UID || + zpci->zpci_uid == 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid PCI address uid='0x%x', " + "must be > 0x0 and <= 0x%x"), + zpci->zpci_uid, + VIR_DOMAIN_DEVICE_ZPCI_MAX_UID); + return 0; + }
fid should be validated as well. [...]
+<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory>219136</memory> + <vcpu>1</vcpu> + <os> + <type arch='s390x' machine='s390-ccw'>hvm</type>
The correct machine type would be s390-ccw-virtio. There are a bunch of existing test files that incorrectly use s390-ccw, feel free to clean that up as well ;) [...]
diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index bbb995656e..e3282e2b98 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -397,6 +397,8 @@ mymain(void) QEMU_CAPS_VIRTIO_SCSI); DO_TEST("disk-virtio-scsi-ioeventfd", QEMU_CAPS_VIRTIO_SCSI); + DO_TEST("disk-virtio-s390-zpci", + QEMU_CAPS_DEVICE_ZPCI, QEMU_CAPS_CCW); DO_TEST("disk-scsi-megasas", QEMU_CAPS_SCSI_MEGASAS); DO_TEST("disk-scsi-mptsas1068", @@ -476,6 +478,7 @@ mymain(void) DO_TEST("hostdev-usb-address", NONE); DO_TEST("hostdev-pci-address", NONE); DO_TEST("hostdev-vfio", NONE); + DO_TEST("hostdev-vfio-zpci", QEMU_CAPS_DEVICE_ZPCI, QEMU_CAPS_CCW);
I haven't actually tried that, but you should be able to add the test cases to qemuxml2argvtest at the same time as you add them here, for consistency's sake. -- Andrea Bolognani / Red Hat / Virtualization

在 2018/7/24 下午10:25, Andrea Bolognani 写道:
On Tue, 2018-07-10 at 16:02 +0800, Yi Min Zhao wrote:
+ <define name="uint32"> + <choice> + <data type="string"> + <param name="pattern">(0x)?[0-9a-fA-F]{1,8}</param> + </data> + <data type="int"> This should probably be unignedInt instead of int, but other uint* types defined in the file also use int so if anything changing all of them would be the job for a follow-up patch. OK. I recorded this. [...] + <define name="zpciaddress"> + <optional> + <attribute name="uid"> + <choice> + <data type="string"> + <param name="pattern">(0x)?[0-9a-fA-F]{1,4}</param> + </data> + <data type="unsignedInt"> + <param name="minInclusive">1</param> + <param name="maxInclusive">65535</param> + </data> + </choice> + </attribute> I don't see why you wouldn't just use the existing uint16 type here. Is that because uid can't be zero? Making sure that's actually the case is a job for the PostParse() or Validate() callback anyway, because schema validation is entirely opt-in and thus can't be relied upon. All right.
[...]
+static int +virZPCIDeviceAddressIsValid(virZPCIDeviceAddressPtr zpci) +{ + if (!zpci->uid_assigned) + return 1; + + if (zpci->zpci_uid > VIR_DOMAIN_DEVICE_ZPCI_MAX_UID || + zpci->zpci_uid == 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid PCI address uid='0x%x', " + "must be > 0x0 and <= 0x%x"), + zpci->zpci_uid, + VIR_DOMAIN_DEVICE_ZPCI_MAX_UID); + return 0; + } fid should be validated as well. FID has been defined in schema. It is impossible to overflow uint32 range. So...is it required to validate FID here?
[...]
+<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory>219136</memory> + <vcpu>1</vcpu> + <os> + <type arch='s390x' machine='s390-ccw'>hvm</type> The correct machine type would be s390-ccw-virtio.
There are a bunch of existing test files that incorrectly use s390-ccw, feel free to clean that up as well ;) Sure.
[...]
diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index bbb995656e..e3282e2b98 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -397,6 +397,8 @@ mymain(void) QEMU_CAPS_VIRTIO_SCSI); DO_TEST("disk-virtio-scsi-ioeventfd", QEMU_CAPS_VIRTIO_SCSI); + DO_TEST("disk-virtio-s390-zpci", + QEMU_CAPS_DEVICE_ZPCI, QEMU_CAPS_CCW); DO_TEST("disk-scsi-megasas", QEMU_CAPS_SCSI_MEGASAS); DO_TEST("disk-scsi-mptsas1068", @@ -476,6 +478,7 @@ mymain(void) DO_TEST("hostdev-usb-address", NONE); DO_TEST("hostdev-pci-address", NONE); DO_TEST("hostdev-vfio", NONE); + DO_TEST("hostdev-vfio-zpci", QEMU_CAPS_DEVICE_ZPCI, QEMU_CAPS_CCW); I haven't actually tried that, but you should be able to add the test cases to qemuxml2argvtest at the same time as you add them here, for consistency's sake.
The qemu cmd generator is introduced in latter patch. I add the test cases in the corresponding patch.

On Thu, 2018-07-26 at 15:15 +0800, Yi Min Zhao wrote:
在 2018/7/24 下午10:25, Andrea Bolognani 写道:
[...]
+static int +virZPCIDeviceAddressIsValid(virZPCIDeviceAddressPtr zpci) +{ + if (!zpci->uid_assigned) + return 1; + + if (zpci->zpci_uid > VIR_DOMAIN_DEVICE_ZPCI_MAX_UID || + zpci->zpci_uid == 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid PCI address uid='0x%x', " + "must be > 0x0 and <= 0x%x"), + zpci->zpci_uid, + VIR_DOMAIN_DEVICE_ZPCI_MAX_UID); + return 0; + }
fid should be validated as well.
FID has been defined in schema. It is impossible to overflow uint32 range. So...is it required to validate FID here?
Mh, fair enough. Not for the schema part (as I mentioned earlier, that's entirely optional so it can't be relied upon when it comes to validating data) but for the 32-bit fid fitting into a 32-bit datatype by definition. Perhaps add a quick comment pointing out why validating fid is not necessary... Additional thoughts: should you check fid_assigned up there as well? Would it be a good idea to use the more specific uint16_t and uint32_t datatypes, and define VIR_DOMAIN_DEVICE_ZPCI_MAX_*ID in terms of the standard UINT*_MAX macros? [...]
+ DO_TEST("hostdev-vfio-zpci", QEMU_CAPS_DEVICE_ZPCI, QEMU_CAPS_CCW);
I haven't actually tried that, but you should be able to add the test cases to qemuxml2argvtest at the same time as you add them here, for consistency's sake.
The qemu cmd generator is introduced in latter patch. I add the test cases in the corresponding patch.
You should be able to add them to the xml2argv test even before you teach libvirt how to generate a QEMU command line for the new feature; then, when you add the missing pieces, it will be very clear from the diff what exactly changed. But as long as you make sure test end up in both xml2argv and xml2xml by the end of the series, it doesn't really matter. -- Andrea Bolognani / Red Hat / Virtualization

在 2018/7/26 下午4:21, Andrea Bolognani 写道:
在 2018/7/24 下午10:25, Andrea Bolognani 写道:
[...]
+static int +virZPCIDeviceAddressIsValid(virZPCIDeviceAddressPtr zpci) +{ + if (!zpci->uid_assigned) + return 1; + + if (zpci->zpci_uid > VIR_DOMAIN_DEVICE_ZPCI_MAX_UID || + zpci->zpci_uid == 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid PCI address uid='0x%x', " + "must be > 0x0 and <= 0x%x"), + zpci->zpci_uid, + VIR_DOMAIN_DEVICE_ZPCI_MAX_UID); + return 0; + } fid should be validated as well. FID has been defined in schema. It is impossible to overflow uint32 range. So...is it required to validate FID here? Mh, fair enough. Not for the schema part (as I mentioned earlier,
On Thu, 2018-07-26 at 15:15 +0800, Yi Min Zhao wrote: that's entirely optional so it can't be relied upon when it comes to validating data) but for the 32-bit fid fitting into a 32-bit datatype by definition. Perhaps add a quick comment pointing out why validating fid is not necessary...
Additional thoughts: should you check fid_assigned up there as well? Would it be a good idea to use the more specific uint16_t and uint32_t datatypes, and define VIR_DOMAIN_DEVICE_ZPCI_MAX_*ID in terms of the standard UINT*_MAX macros? I thought again bout fid_assigned check. You're right. We should check it. For your comments on macros, good idea.
[...]
+ DO_TEST("hostdev-vfio-zpci", QEMU_CAPS_DEVICE_ZPCI, QEMU_CAPS_CCW); I haven't actually tried that, but you should be able to add the test cases to qemuxml2argvtest at the same time as you add them here, for consistency's sake. The qemu cmd generator is introduced in latter patch. I add the test cases in the corresponding patch. You should be able to add them to the xml2argv test even before you teach libvirt how to generate a QEMU command line for the new feature; then, when you add the missing pieces, it will be very clear from the diff what exactly changed.
But as long as you make sure test end up in both xml2argv and xml2xml by the end of the series, it doesn't really matter.
OK. Thanks!

This patch adds new functions for reservation, assignment and release to handle the uid/fid. If the uid/fid is defined in the domain XML, they will be reserved directly in collecting phase. If any of them is not defined, we will find out an available value for it from zPCI address hashtable, and reserve it. For hotplug case, there might be or not zPCI definition. So allocate and reserve uid/fid for undefined case. Assign if needed and reserve uid/fid for defined case. If the user define zPCI extension address but zPCI capability doesn't exist, an error will be reported. Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com> --- src/conf/domain_addr.c | 264 +++++++++++++++++++++++++++++++++++++++++ src/conf/domain_addr.h | 15 +++ src/libvirt_private.syms | 3 + src/qemu/qemu_domain_address.c | 41 ++++++- 4 files changed, 321 insertions(+), 2 deletions(-) diff --git a/src/conf/domain_addr.c b/src/conf/domain_addr.c index 66e61f3550..f5742f4430 100644 --- a/src/conf/domain_addr.c +++ b/src/conf/domain_addr.c @@ -33,6 +33,147 @@ VIR_LOG_INIT("conf.domain_addr"); +static int +virDomainZPCIAddressReserveId(virHashTablePtr set, unsigned int id, + const char *name) +{ + if (virHashLookup(set, &id)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("zPCI %s %u is already reserved"), name, id); + return -1; + } + + if (virHashAddEntry(set, &id, (void*)1) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Failed to reserve %s %u"), name, id); + return -1; + } + + return 0; +} + +static int +virDomainZPCIAddressReserveUid(virHashTablePtr set, virZPCIDeviceAddressPtr addr) +{ + return virDomainZPCIAddressReserveId(set, addr->zpci_uid, "uid"); +} + +static int +virDomainZPCIAddressReserveFid(virHashTablePtr set, virZPCIDeviceAddressPtr addr) +{ + return virDomainZPCIAddressReserveId(set, addr->zpci_fid, "fid"); +} + + +static bool +virDomainZPCIAddressAssignId(virHashTablePtr set, unsigned int *id, + unsigned int min, unsigned int max, + const char *name) +{ + while (virHashLookup(set, &min)) { + if (min == max) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("There is no more free %s."), name); + return false; + } + ++min; + } + *id = min; + + return true; +} + +static int +virDomainZPCIAddressAssignUid(virHashTablePtr set, virZPCIDeviceAddressPtr addr) +{ + if (addr->uid_assigned) + return 0; + + addr->uid_assigned = virDomainZPCIAddressAssignId(set, &addr->zpci_uid, 1, + VIR_DOMAIN_DEVICE_ZPCI_MAX_UID, "uid"); + return addr->uid_assigned ? 0 : -1; +} + +static int +virDomainZPCIAddressAssignFid(virHashTablePtr set, virZPCIDeviceAddressPtr addr) +{ + if (addr->fid_assigned) + return 0; + + addr->fid_assigned = virDomainZPCIAddressAssignId(set, &addr->zpci_fid, 0, + VIR_DOMAIN_DEVICE_ZPCI_MAX_FID, "fid"); + return addr->fid_assigned ? 0 : -1; +} + + +static void +virDomainZPCIAddressReleaseUid(virHashTablePtr set, virZPCIDeviceAddressPtr addr) +{ + if (!addr->uid_assigned) + return; + + if (virHashRemoveEntry(set, &addr->zpci_uid)) + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Release uid %u failed"), addr->zpci_uid); + + addr->uid_assigned = false; +} + +static void +virDomainZPCIAddressReleaseFid(virHashTablePtr set, virZPCIDeviceAddressPtr addr) +{ + if (!addr->fid_assigned) + return; + + if (virHashRemoveEntry(set, &addr->zpci_fid)) + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Release fid %u failed"), addr->zpci_fid); + + addr->fid_assigned = false; +} + + +static void +virDomainZPCIAddressReleaseIds(virDomainZPCIAddressIdsPtr zpciIds, + virPCIDeviceAddressPtr addr) +{ + if (!zpciIds || !addr->zpci) + return; + + virDomainZPCIAddressReleaseUid(zpciIds->uids, addr->zpci); + + virDomainZPCIAddressReleaseFid(zpciIds->fids, addr->zpci); + + VIR_FREE(addr->zpci); +} + + +static int +virDomainZPCIAddressReserveNextUid(virHashTablePtr uids, + virZPCIDeviceAddressPtr zpci) +{ + if (virDomainZPCIAddressAssignUid(uids, zpci) < 0) + return -1; + + if (virDomainZPCIAddressReserveUid(uids, zpci) < 0) + return -1; + + return 0; +} + +static int +virDomainZPCIAddressReserveNextFid(virHashTablePtr fids, + virZPCIDeviceAddressPtr zpci) +{ + if (virDomainZPCIAddressAssignFid(fids, zpci) < 0) + return -1; + + if (virDomainZPCIAddressReserveFid(fids, zpci) < 0) + return -1; + + return 0; +} + static void virDomainPCIAddressSetExtensionFree(virDomainPCIAddressSetPtr addrs) { @@ -44,6 +185,116 @@ virDomainPCIAddressSetExtensionFree(virDomainPCIAddressSetPtr addrs) VIR_FREE(addrs->zpciIds); } +static int +virDomainZPCIAddressReserveAddr(virDomainPCIAddressSetPtr addrs, + virPCIDeviceAddressPtr addr) +{ + if (!addr->zpci) + return 0; + + if (addr->zpci->uid_assigned && + virDomainZPCIAddressReserveUid(addrs->zpciIds->uids, addr->zpci)) + return -1; + + if (addr->zpci->fid_assigned && + virDomainZPCIAddressReserveFid(addrs->zpciIds->fids, addr->zpci)) { + virDomainZPCIAddressReleaseUid(addrs->zpciIds->uids, addr->zpci); + return -1; + } + + return 0; +} + +static int +virDomainZPCIAddressReserveNextAddr(virDomainPCIAddressSetPtr addrs, + virZPCIDeviceAddressPtr zpci) +{ + if (!zpci->uid_assigned && + virDomainZPCIAddressReserveNextUid(addrs->zpciIds->uids, zpci)) + return -1; + + if (!zpci->fid_assigned && + virDomainZPCIAddressReserveNextFid(addrs->zpciIds->fids, zpci)) { + virDomainZPCIAddressReleaseUid(addrs->zpciIds->uids, zpci); + return -1; + } + + return 0; +} + + +int +virDomainPCIAddressExtensionReserveAddr(virDomainPCIAddressSetPtr addrs, + virPCIDeviceAddressPtr addr, + virDomainPCIAddressExtensionFlags extFlags) +{ + if (extFlags & VIR_PCI_ADDRESS_EXTENSION_ZPCI) { + /* Reserve uid/fid to ZPCI device which has defined uid/fid + * in the domain. + */ + if (virDomainZPCIAddressReserveAddr(addrs, addr) < 0) + return -1; + } + + return 0; +} + + +int +virDomainPCIAddressExtensionReserveNextAddr(virDomainPCIAddressSetPtr addrs, + virPCIDeviceAddressPtr dev, + virDomainPCIAddressExtensionFlags extFlags) +{ + if (extFlags & VIR_PCI_ADDRESS_EXTENSION_ZPCI) { + /* Assign and reserve uid/fid to ZPCI device which has not defined uid/fid + * in the domain. + */ + virZPCIDeviceAddress zpci = { 0 }; + + if (dev->zpci) + zpci = *dev->zpci; + + if (virDomainZPCIAddressReserveNextAddr(addrs, &zpci) < 0) + return -1; + + if (!addrs->dryRun) { + if (!dev->zpci && VIR_ALLOC(dev->zpci) < 0) + return -1; + *dev->zpci = zpci; + } + } + + return 0; +} + +static int +virDomainPCIAddressEnsureExtensionAddr(virDomainPCIAddressSetPtr addrs, + virDomainDeviceInfoPtr dev) +{ + virZPCIDeviceAddressPtr zpci = dev->addr.pci.zpci; + + if (zpci && !dev->pciAddressExtFlags) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("zPCI is not supported.")); + return -1; + } + + if (!zpci) { + return virDomainPCIAddressExtensionReserveNextAddr(addrs, &dev->addr.pci, + dev->pciAddressExtFlags); + } else { + if (virDomainZPCIAddressAssignUid(addrs->zpciIds->uids, zpci)) + return -1; + + if (virDomainZPCIAddressAssignFid(addrs->zpciIds->fids, zpci)) + return -1; + + if (virDomainZPCIAddressReserveAddr(addrs, &dev->addr.pci)) + return -1; + } + + return 0; +} + virDomainPCIConnectFlags virDomainPCIControllerModelToConnectType(virDomainControllerModelPCI model) { @@ -740,12 +991,25 @@ virDomainPCIAddressEnsureAddr(virDomainPCIAddressSetPtr addrs, ret = virDomainPCIAddressReserveNextAddr(addrs, dev, flags, -1); } + if (virDomainPCIAddressEnsureExtensionAddr(addrs, dev) < 0) + goto cleanup; + cleanup: VIR_FREE(addrStr); return ret; } +void +virDomainPCIAddressExtensionReleaseAddr(virDomainPCIAddressSetPtr addrs, + virPCIDeviceAddressPtr addr, + int extFlags) +{ + if ((extFlags & VIR_PCI_ADDRESS_EXTENSION_ZPCI)) + virDomainZPCIAddressReleaseIds(addrs->zpciIds, addr); +} + + void virDomainPCIAddressReleaseAddr(virDomainPCIAddressSetPtr addrs, virPCIDeviceAddressPtr addr) diff --git a/src/conf/domain_addr.h b/src/conf/domain_addr.h index d1ec869da4..a1f71f15f4 100644 --- a/src/conf/domain_addr.h +++ b/src/conf/domain_addr.h @@ -164,6 +164,16 @@ bool virDomainPCIAddressSlotInUse(virDomainPCIAddressSetPtr addrs, virPCIDeviceAddressPtr addr) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int virDomainPCIAddressExtensionReserveAddr(virDomainPCIAddressSetPtr addrs, + virPCIDeviceAddressPtr addr, + virDomainPCIAddressExtensionFlags extFlags) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); + +int virDomainPCIAddressExtensionReserveNextAddr(virDomainPCIAddressSetPtr addrs, + virPCIDeviceAddressPtr addr, + virDomainPCIAddressExtensionFlags extFlags) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); + int virDomainPCIAddressReserveAddr(virDomainPCIAddressSetPtr addrs, virPCIDeviceAddressPtr addr, virDomainPCIConnectFlags flags, @@ -185,6 +195,11 @@ void virDomainPCIAddressReleaseAddr(virDomainPCIAddressSetPtr addrs, virPCIDeviceAddressPtr addr) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +void virDomainPCIAddressExtensionReleaseAddr(virDomainPCIAddressSetPtr addrs, + virPCIDeviceAddressPtr addr, + int extFlags) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); + void virDomainPCIAddressSetAllMulti(virDomainDefPtr def) ATTRIBUTE_NONNULL(1); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 3d82d489de..163f249e07 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -111,6 +111,9 @@ virDomainPCIAddressAsString; virDomainPCIAddressBusIsFullyReserved; virDomainPCIAddressBusSetModel; virDomainPCIAddressEnsureAddr; +virDomainPCIAddressExtensionReleaseAddr; +virDomainPCIAddressExtensionReserveAddr; +virDomainPCIAddressExtensionReserveNextAddr; virDomainPCIAddressReleaseAddr; virDomainPCIAddressReserveAddr; virDomainPCIAddressReserveNextAddr; diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 210f94e15e..1a823c7616 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -1368,6 +1368,24 @@ qemuDomainPCIAddressReserveNextAddr(virDomainPCIAddressSetPtr addrs, } +static int +qemuDomainAssignPCIAddressExtension(virDomainDefPtr def ATTRIBUTE_UNUSED, + virDomainDeviceDefPtr device ATTRIBUTE_UNUSED, + virDomainDeviceInfoPtr info, + void *opaque) +{ + virDomainPCIAddressSetPtr addrs = opaque; + virPCIDeviceAddressPtr addr = &info->addr.pci; + virDomainPCIAddressExtensionFlags extFlags = info->pciAddressExtFlags; + + if (info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI) { + if (virDomainPCIAddressExtensionReserveNextAddr(addrs, addr, extFlags) < 0) + return -1; + } + + return 0; +} + static int qemuDomainCollectPCIAddress(virDomainDefPtr def ATTRIBUTE_UNUSED, virDomainDeviceDefPtr device, @@ -1385,7 +1403,12 @@ qemuDomainCollectPCIAddress(virDomainDefPtr def ATTRIBUTE_UNUSED, * parent, and will have its address collected during the scan * of the parent's device type. */ - return 0; + if (info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI || + info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) + return virDomainPCIAddressExtensionReserveAddr(addrs, addr, + info->pciAddressExtFlags); + else + return 0; } /* If we get to here, the device has a PCI address assigned in the @@ -1456,6 +1479,11 @@ qemuDomainCollectPCIAddress(virDomainDefPtr def ATTRIBUTE_UNUSED, goto cleanup; } + if (virDomainPCIAddressExtensionReserveAddr(addrs, addr, + info->pciAddressExtFlags) < 0) { + goto cleanup; + } + ret = 0; cleanup: return ret; @@ -2549,6 +2577,9 @@ qemuDomainAssignPCIAddresses(virDomainDefPtr def, if (qemuDomainAssignDevicePCISlots(def, qemuCaps, addrs) < 0) goto cleanup; + if (virDomainDeviceInfoIterate(def, qemuDomainAssignPCIAddressExtension, addrs) < 0) + goto cleanup; + /* Only for *new* domains with pcie-root (and no other * manually specified PCI controllers in the definition): If, * after assigning addresses/reserving slots for all devices, @@ -2643,6 +2674,9 @@ qemuDomainAssignPCIAddresses(virDomainDefPtr def, if (qemuDomainAssignDevicePCISlots(def, qemuCaps, addrs) < 0) goto cleanup; + if (virDomainDeviceInfoIterate(def, qemuDomainAssignPCIAddressExtension, addrs) < 0) + goto cleanup; + /* set multi attribute for devices at function 0 of * any slot that has multiple functions in use */ @@ -3102,8 +3136,11 @@ qemuDomainReleaseDeviceAddress(virDomainObjPtr vm, if (!devstr) devstr = info->alias; - if (virDeviceInfoPCIAddressPresent(info)) + if (virDeviceInfoPCIAddressPresent(info)) { virDomainPCIAddressReleaseAddr(priv->pciaddrs, &info->addr.pci); + virDomainPCIAddressExtensionReleaseAddr(priv->pciaddrs, &info->addr.pci, + info->pciAddressExtFlags); + } if (virDomainUSBAddressRelease(priv->usbaddrs, info) < 0) VIR_WARN("Unable to release USB address on %s", NULLSTR(devstr)); -- Yi Min

On Tue, 2018-07-10 at 16:02 +0800, Yi Min Zhao wrote:
This patch adds new functions for reservation, assignment and release to handle the uid/fid. If the uid/fid is defined in the domain XML, they will be reserved directly in collecting phase. If any of them is not defined, we will find out an available value for it from zPCI address hashtable, and reserve it. For hotplug case, there might be or not zPCI definition. So allocate and reserve uid/fid for undefined case. Assign if needed and reserve uid/fid for defined case. If the user define zPCI extension address but zPCI capability doesn't exist, an error will be reported.
For this patch I once again didn't look too closely to the implementation, sorry. [...]
+static int +virDomainZPCIAddressReserveId(virHashTablePtr set, unsigned int id, + const char *name)
One argument per line, please. There are more instances in the patch, but I'm not going to point them all out :) [...]
+static int +virDomainZPCIAddressAssignUid(virHashTablePtr set, virZPCIDeviceAddressPtr addr) +{ + if (addr->uid_assigned) + return 0; + + addr->uid_assigned = virDomainZPCIAddressAssignId(set, &addr->zpci_uid, 1, + VIR_DOMAIN_DEVICE_ZPCI_MAX_UID, "uid");
Messed up alignment. More instances further down. [...]
+static void +virDomainZPCIAddressReleaseUid(virHashTablePtr set, virZPCIDeviceAddressPtr addr) +{ + if (!addr->uid_assigned) + return; + + if (virHashRemoveEntry(set, &addr->zpci_uid)) + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Release uid %u failed"), addr->zpci_uid);
Curly braces are required here. More instances further down. Looking at
+static void +virDomainZPCIAddressReleaseFid(virHashTablePtr set, virZPCIDeviceAddressPtr addr)
and
+static void +virDomainZPCIAddressReleaseIds(virDomainZPCIAddressIdsPtr zpciIds, + virPCIDeviceAddressPtr addr)
and
+static int +virDomainZPCIAddressReserveAddr(virDomainPCIAddressSetPtr addrs, + virPCIDeviceAddressPtr addr)
you're being awfully inconsistent about the datatypes you're passing around... Unless I've missed something that makes doing so impossible, please try to make it so only the top-level datatypes (DomainPCIAddressSet and PCIDeviceAddress) are passed around. [...]
+static int +virDomainZPCIAddressReserveNextAddr(virDomainPCIAddressSetPtr addrs, + virZPCIDeviceAddressPtr zpci) +{ + if (!zpci->uid_assigned && + virDomainZPCIAddressReserveNextUid(addrs->zpciIds->uids, zpci)) + return -1;
Messed up indentation. Also, missing curly braces. [...]
+int +virDomainPCIAddressExtensionReserveAddr(virDomainPCIAddressSetPtr addrs, + virPCIDeviceAddressPtr addr, + virDomainPCIAddressExtensionFlags extFlags) +{ + if (extFlags & VIR_PCI_ADDRESS_EXTENSION_ZPCI) { + /* Reserve uid/fid to ZPCI device which has defined uid/fid + * in the domain. + */
Messed up indentation. [...]
+int +virDomainPCIAddressExtensionReserveNextAddr(virDomainPCIAddressSetPtr addrs, + virPCIDeviceAddressPtr dev, + virDomainPCIAddressExtensionFlags extFlags) +{ + if (extFlags & VIR_PCI_ADDRESS_EXTENSION_ZPCI) { + /* Assign and reserve uid/fid to ZPCI device which has not defined uid/fid + * in the domain. + */
Messed up indentation. [...]
+static int +virDomainPCIAddressEnsureExtensionAddr(virDomainPCIAddressSetPtr addrs, + virDomainDeviceInfoPtr dev)
This should be virDomainPCIAddressExtensionEnsureAddr() for consistency's sake.
@@ -1385,7 +1403,12 @@ qemuDomainCollectPCIAddress(virDomainDefPtr def ATTRIBUTE_UNUSED, * parent, and will have its address collected during the scan * of the parent's device type. */ - return 0; + if (info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI || + info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) + return virDomainPCIAddressExtensionReserveAddr(addrs, addr, + info->pciAddressExtFlags); + else + return 0;
This doesn't look right: the comment specifically states that the PCI address will be handled by the parent device in this case, why wouldn't the zPCI address not be handled in the same way? -- Andrea Bolognani / Red Hat / Virtualization

在 2018/7/24 下午10:58, Andrea Bolognani 写道:
@@ -1385,7 +1403,12 @@ qemuDomainCollectPCIAddress(virDomainDefPtr def ATTRIBUTE_UNUSED, * parent, and will have its address collected during the scan * of the parent's device type. */ - return 0; + if (info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI || + info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) + return virDomainPCIAddressExtensionReserveAddr(addrs, addr, + info->pciAddressExtFlags); + else + return 0; This doesn't look right: the comment specifically states that the PCI address will be handled by the parent device in this case, why wouldn't the zPCI address not be handled in the same way? Thanks for other comments! I cutted off them. For this comment, we have to collect zPCI address in case that zPCI address is specified but PCI address is not. I think I shall split two checks. Original code is: if (!virDeviceInfoPCIAddressPresent(info) || ((device->type == VIR_DOMAIN_DEVICE_HOSTDEV) && (device->data.hostdev->parent.type != VIR_DOMAIN_DEVICE_NONE))) Separate them and only add the new code for !virDeviceInfoPCIAddressPresent(info) case.

On Fri, 2018-07-27 at 13:22 +0800, Yi Min Zhao wrote:
在 2018/7/24 下午10:58, Andrea Bolognani 写道:
@@ -1385,7 +1403,12 @@ qemuDomainCollectPCIAddress(virDomainDefPtr def ATTRIBUTE_UNUSED, * parent, and will have its address collected during the scan * of the parent's device type. */ - return 0; + if (info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI || + info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) + return virDomainPCIAddressExtensionReserveAddr(addrs, addr, + info->pciAddressExtFlags); + else + return 0;
This doesn't look right: the comment specifically states that the PCI address will be handled by the parent device in this case, why wouldn't the zPCI address not be handled in the same way?
For this comment, we have to collect zPCI address in case that zPCI address is specified but PCI address is not. I think I shall split two checks. Original code is: if (!virDeviceInfoPCIAddressPresent(info) || ((device->type == VIR_DOMAIN_DEVICE_HOSTDEV) && (device->data.hostdev->parent.type != VIR_DOMAIN_DEVICE_NONE))) Separate them and only add the new code for !virDeviceInfoPCIAddressPresent(info) case.
I didn't look too closely, but I think you might have to change virDeviceInfoPCIAddressPresent() itself so that it knows about PCI address extensions and behaves accordingly. Basically, with the introduction of PCI address extensions, you're making questions such as "does this device have a PCI address assigned to it?" a lot less trivial to answer, and you need to ensure this doesn't cause existing assumption to no longer hold. -- Andrea Bolognani / Red Hat / Virtualization

在 2018/7/30 下午7:08, Andrea Bolognani 写道:
On Fri, 2018-07-27 at 13:22 +0800, Yi Min Zhao wrote:
在 2018/7/24 下午10:58, Andrea Bolognani 写道:
@@ -1385,7 +1403,12 @@ qemuDomainCollectPCIAddress(virDomainDefPtr def ATTRIBUTE_UNUSED, * parent, and will have its address collected during the scan * of the parent's device type. */ - return 0; + if (info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI || + info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) + return virDomainPCIAddressExtensionReserveAddr(addrs, addr, + info->pciAddressExtFlags); + else + return 0; This doesn't look right: the comment specifically states that the PCI address will be handled by the parent device in this case, why wouldn't the zPCI address not be handled in the same way? For this comment, we have to collect zPCI address in case that zPCI address is specified but PCI address is not. I think I shall split two checks. Original code is: if (!virDeviceInfoPCIAddressPresent(info) || ((device->type == VIR_DOMAIN_DEVICE_HOSTDEV) && (device->data.hostdev->parent.type != VIR_DOMAIN_DEVICE_NONE))) Separate them and only add the new code for !virDeviceInfoPCIAddressPresent(info) case. I didn't look too closely, but I think you might have to change virDeviceInfoPCIAddressPresent() itself so that it knows about PCI address extensions and behaves accordingly.
Basically, with the introduction of PCI address extensions, you're making questions such as "does this device have a PCI address assigned to it?" a lot less trivial to answer, and you need to ensure this doesn't cause existing assumption to no longer hold.
I have a new idea. qemuDomainCollectPCIAddress() is called in one place. So we could add a new function qemuDomainCollectPCIExtensionAddress(), and call it after qemuDomainCollectPCIAddress() is called. Then we could collect pci address and extension address separately.

Add new functions to generate zPCI command string and append it to QEMU command line. And the related tests are added. Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com> Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com> --- src/qemu/qemu_command.c | 115 +++++++++++++++++++++ src/qemu/qemu_command.h | 4 + tests/qemuxml2argvdata/disk-virtio-s390-zpci.args | 27 +++++ .../hostdev-vfio-zpci-autogenerate.args | 26 +++++ .../hostdev-vfio-zpci-autogenerate.xml | 18 ++++ .../hostdev-vfio-zpci-boundaries.args | 30 ++++++ .../hostdev-vfio-zpci-boundaries.xml | 26 +++++ .../hostdev-vfio-zpci-multidomain-many.args | 40 +++++++ .../hostdev-vfio-zpci-multidomain-many.xml | 67 ++++++++++++ tests/qemuxml2argvdata/hostdev-vfio-zpci.args | 26 +++++ tests/qemuxml2argvtest.c | 14 +++ 11 files changed, 393 insertions(+) create mode 100644 tests/qemuxml2argvdata/disk-virtio-s390-zpci.args create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-autogenerate.args create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-autogenerate.xml create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-boundaries.args create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-boundaries.xml create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-multidomain-many.args create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-multidomain-many.xml create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci.args diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index cea31e6a24..30c57a942c 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -2136,6 +2136,68 @@ qemuBuildDriveDevStr(const virDomainDef *def, return NULL; } +char * +qemuBuildZPCIDevStr(virDomainDeviceInfoPtr dev) +{ + virBuffer buf = VIR_BUFFER_INITIALIZER; + + virBufferAddLit(&buf, "zpci"); + virBufferAsprintf(&buf, ",uid=%u", dev->addr.pci.zpci->zpci_uid); + virBufferAsprintf(&buf, ",fid=%u", dev->addr.pci.zpci->zpci_fid); + virBufferAsprintf(&buf, ",target=%s", dev->alias); + virBufferAsprintf(&buf, ",id=zpci%u", dev->addr.pci.zpci->zpci_uid); + + if (virBufferCheckError(&buf) < 0) { + virBufferFreeAndReset(&buf); + return NULL; + } + + return virBufferContentAndReset(&buf); +} + +bool +qemuCheckDeviceIsZPCI(virDomainDeviceInfoPtr info) +{ + if (info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI && + ((info->pciAddressExtFlags & VIR_PCI_ADDRESS_EXTENSION_ZPCI) || + info->addr.pci.zpci)) + return true; + + return false; +} + +static int +qemuAppendZPCIDevStr(virCommandPtr cmd, + virDomainDeviceInfoPtr dev) +{ + char *devstr = NULL; + + virCommandAddArg(cmd, "-device"); + if (!(devstr = qemuBuildZPCIDevStr(dev))) + return -1; + + virCommandAddArg(cmd, devstr); + + VIR_FREE(devstr); + return 0; +} + +static int +qemuBuildExtensionCommandLine(virCommandPtr cmd, + virQEMUCapsPtr qemuCaps, + virDomainDeviceInfoPtr dev) +{ + if (qemuCheckDeviceIsZPCI(dev)) { + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_ZPCI)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("This QEMU doesn't support zpci devices")); + return -1; + } + return qemuAppendZPCIDevStr(cmd, dev); + } + + return 0; +} static int qemuBuildFloppyCommandLineOptions(virCommandPtr cmd, @@ -2256,6 +2318,9 @@ qemuBuildDiskCommandLine(virCommandPtr cmd, bootindex) < 0) return -1; } else { + if (qemuBuildExtensionCommandLine(cmd, qemuCaps, &disk->info) < 0) + return -1; + virCommandAddArg(cmd, "-device"); if (!(optstr = qemuBuildDriveDevStr(def, disk, bootindex, @@ -2455,6 +2520,9 @@ qemuBuildFSDevCommandLine(virCommandPtr cmd, virCommandAddArg(cmd, optstr); VIR_FREE(optstr); + if (qemuBuildExtensionCommandLine(cmd, qemuCaps, &fs->info) < 0) + return -1; + virCommandAddArg(cmd, "-device"); if (!(optstr = qemuBuildFSDevStr(def, fs, qemuCaps))) return -1; @@ -2939,6 +3007,11 @@ qemuBuildControllerDevCommandLine(virCommandPtr cmd, goto cleanup; if (devstr) { + if (qemuBuildExtensionCommandLine(cmd, qemuCaps, + &cont->info) < 0) { + VIR_FREE(devstr); + goto cleanup; + } virCommandAddArg(cmd, "-device"); virCommandAddArg(cmd, devstr); VIR_FREE(devstr); @@ -3742,6 +3815,9 @@ qemuBuildWatchdogCommandLine(virCommandPtr cmd, if (!def->watchdog) return 0; + if (qemuBuildExtensionCommandLine(cmd, qemuCaps, &def->watchdog->info) < 0) + return -1; + virCommandAddArg(cmd, "-device"); optstr = qemuBuildWatchdogDevStr(def, watchdog, qemuCaps); @@ -3826,6 +3902,9 @@ qemuBuildMemballoonCommandLine(virCommandPtr cmd, if (qemuBuildVirtioOptionsStr(&buf, def->memballoon->virtio, qemuCaps) < 0) goto error; + if (qemuBuildExtensionCommandLine(cmd, qemuCaps, &def->memballoon->info) < 0) + goto error; + virCommandAddArg(cmd, "-device"); virCommandAddArgBuffer(cmd, &buf); return 0; @@ -4048,6 +4127,9 @@ qemuBuildInputCommandLine(virCommandPtr cmd, virDomainInputDefPtr input = def->inputs[i]; char *devstr = NULL; + if (qemuBuildExtensionCommandLine(cmd, qemuCaps, &input->info) < 0) + return -1; + if (qemuBuildInputDevStr(&devstr, def, input, qemuCaps) < 0) return -1; @@ -4189,6 +4271,9 @@ qemuBuildSoundCommandLine(virCommandPtr cmd, if (sound->model == VIR_DOMAIN_SOUND_MODEL_PCSPK) { virCommandAddArgList(cmd, "-soundhw", "pcspk", NULL); } else { + if (qemuBuildExtensionCommandLine(cmd, qemuCaps, &sound->info) < 0) + return -1; + virCommandAddArg(cmd, "-device"); if (!(str = qemuBuildSoundDevStr(def, sound, qemuCaps))) return -1; @@ -4425,6 +4510,9 @@ qemuBuildVideoCommandLine(virCommandPtr cmd, if (video->primary) { if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIDEO_PRIMARY)) { + if (qemuBuildExtensionCommandLine(cmd, qemuCaps, + &def->videos[i]->info) < 0) + return -1; virCommandAddArg(cmd, "-device"); if (!(str = qemuBuildDeviceVideoStr(def, video, qemuCaps))) @@ -4437,6 +4525,10 @@ qemuBuildVideoCommandLine(virCommandPtr cmd, return -1; } } else { + if (qemuBuildExtensionCommandLine(cmd, qemuCaps, + &def->videos[i]->info) < 0) + return -1; + virCommandAddArg(cmd, "-device"); if (!(str = qemuBuildDeviceVideoStr(def, video, qemuCaps))) @@ -5304,6 +5396,10 @@ qemuBuildHostdevCommandLine(virCommandPtr cmd, VIR_COMMAND_PASS_FD_CLOSE_PARENT); } } + + if (qemuBuildExtensionCommandLine(cmd, qemuCaps, hostdev->info) < 0) + return -1; + virCommandAddArg(cmd, "-device"); devstr = qemuBuildPCIHostdevDevStr(def, hostdev, bootIndex, configfd_name, qemuCaps); @@ -5767,6 +5863,9 @@ qemuBuildRNGCommandLine(virLogManagerPtr logManager, virCommandAddArgBuffer(cmd, &buf); /* add the device */ + if (qemuBuildExtensionCommandLine(cmd, qemuCaps, &rng->info) < 0) + return -1; + if (!(tmp = qemuBuildRNGDevStr(def, rng, qemuCaps))) return -1; virCommandAddArgList(cmd, "-device", tmp, NULL); @@ -8303,6 +8402,9 @@ qemuBuildVhostuserCommandLine(virQEMUDriverPtr driver, virCommandAddArg(cmd, "-netdev"); virCommandAddArg(cmd, netdev); + if (qemuBuildExtensionCommandLine(cmd, qemuCaps, &net->info) < 0) + goto cleanup; + if (!(nic = qemuBuildNicDevStr(def, net, bootindex, queues, qemuCaps))) { goto cleanup; @@ -8584,6 +8686,9 @@ qemuBuildInterfaceCommandLine(virQEMUDriverPtr driver, * New way: -netdev type=tap,id=netdev1 -device e1000,id=netdev1 */ if (qemuDomainSupportsNicdev(def, net)) { + if (qemuBuildExtensionCommandLine(cmd, qemuCaps, &net->info) < 0) + goto cleanup; + if (!(nic = qemuBuildNicDevStr(def, net, bootindex, vhostfdSize, qemuCaps))) goto cleanup; @@ -9008,6 +9113,9 @@ qemuBuildShmemCommandLine(virLogManagerPtr logManager, switch ((virDomainShmemModel)shmem->model) { case VIR_DOMAIN_SHMEM_MODEL_IVSHMEM: + if (qemuBuildExtensionCommandLine(cmd, qemuCaps, &shmem->info) < 0) + return -1; + devstr = qemuBuildShmemDevLegacyStr(def, shmem, qemuCaps); break; @@ -9026,6 +9134,9 @@ qemuBuildShmemCommandLine(virLogManagerPtr logManager, ATTRIBUTE_FALLTHROUGH; case VIR_DOMAIN_SHMEM_MODEL_IVSHMEM_DOORBELL: + if (qemuBuildExtensionCommandLine(cmd, qemuCaps, &shmem->info) < 0) + return -1; + devstr = qemuBuildShmemDevStr(def, shmem, qemuCaps); break; @@ -10181,6 +10292,10 @@ qemuBuildVsockCommandLine(virCommandPtr cmd, virCommandPassFD(cmd, priv->vhostfd, VIR_COMMAND_PASS_FD_CLOSE_PARENT); priv->vhostfd = -1; + + if (qemuBuildExtensionCommandLine(cmd, qemuCaps, &vsock->info) < 0) + goto cleanup; + virCommandAddArgList(cmd, "-device", devstr, NULL); ret = 0; diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h index 4f1b360130..78746577c1 100644 --- a/src/qemu/qemu_command.h +++ b/src/qemu/qemu_command.h @@ -174,6 +174,10 @@ char *qemuBuildRedirdevDevStr(const virDomainDef *def, virDomainRedirdevDefPtr dev, virQEMUCapsPtr qemuCaps); +char *qemuBuildZPCIDevStr(virDomainDeviceInfoPtr dev); + +bool qemuCheckDeviceIsZPCI(virDomainDeviceInfoPtr info); + int qemuNetworkPrepareDevices(virDomainDefPtr def); int qemuGetDriveSourceString(virStorageSourcePtr src, diff --git a/tests/qemuxml2argvdata/disk-virtio-s390-zpci.args b/tests/qemuxml2argvdata/disk-virtio-s390-zpci.args new file mode 100644 index 0000000000..d6a46f8b15 --- /dev/null +++ b/tests/qemuxml2argvdata/disk-virtio-s390-zpci.args @@ -0,0 +1,27 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-s390x \ +-name QEMUGuest1 \ +-S \ +-machine s390-ccw,accel=tcg,usb=off,dump-guest-core=off \ +-m 214 \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-boot c \ +-drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-virtio-disk0 \ +-device zpci,uid=25,fid=31,target=virtio-disk0,id=zpci25 \ +-device virtio-blk-pci,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,\ +id=virtio-disk0 \ +-device virtio-balloon-ccw,id=balloon0,devno=fe.0.0000 diff --git a/tests/qemuxml2argvdata/hostdev-vfio-zpci-autogenerate.args b/tests/qemuxml2argvdata/hostdev-vfio-zpci-autogenerate.args new file mode 100644 index 0000000000..db47d13a31 --- /dev/null +++ b/tests/qemuxml2argvdata/hostdev-vfio-zpci-autogenerate.args @@ -0,0 +1,26 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-s390x \ +-name QEMUGuest1 \ +-S \ +-machine s390-ccw,accel=tcg,usb=off,dump-guest-core=off \ +-m 214 \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-boot c \ +-device zpci,uid=1,fid=0,target=hostdev0,id=zpci1 \ +-device vfio-pci,host=00:00.0,id=hostdev0,bus=pci.0,addr=0x1 \ +-device zpci,uid=2,fid=1,target=balloon0,id=zpci2 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x2 diff --git a/tests/qemuxml2argvdata/hostdev-vfio-zpci-autogenerate.xml b/tests/qemuxml2argvdata/hostdev-vfio-zpci-autogenerate.xml new file mode 100644 index 0000000000..41c43cee23 --- /dev/null +++ b/tests/qemuxml2argvdata/hostdev-vfio-zpci-autogenerate.xml @@ -0,0 +1,18 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory>219100</memory> + <os> + <type arch='s390x' machine='s390-ccw'>hvm</type> + </os> + <devices> + <emulator>/usr/bin/qemu-system-s390x</emulator> + <hostdev mode='subsystem' type='pci'> + <driver name='vfio'/> + <source> + <address domain='0x0000' bus='0x00' slot='0x00' function='0x0'/> + </source> + <address type='pci'/> + </hostdev> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/hostdev-vfio-zpci-boundaries.args b/tests/qemuxml2argvdata/hostdev-vfio-zpci-boundaries.args new file mode 100644 index 0000000000..97cf46c02d --- /dev/null +++ b/tests/qemuxml2argvdata/hostdev-vfio-zpci-boundaries.args @@ -0,0 +1,30 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-s390x \ +-name QEMUGuest1 \ +-S \ +-machine s390-ccw,accel=tcg,usb=off,dump-guest-core=off \ +-m 214 \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-boot c \ +-device zpci,uid=3,fid=2,target=pci.1,id=zpci3 \ +-device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x1 \ +-device zpci,uid=65535,fid=4294967295,target=hostdev0,id=zpci65535 \ +-device vfio-pci,host=ffff:00:00.0,id=hostdev0,bus=pci.1,addr=0x1f \ +-device zpci,uid=1,fid=0,target=hostdev1,id=zpci1 \ +-device vfio-pci,host=00:00.0,id=hostdev1,bus=pci.0,addr=0x2 \ +-device zpci,uid=2,fid=1,target=balloon0,id=zpci2 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvdata/hostdev-vfio-zpci-boundaries.xml b/tests/qemuxml2argvdata/hostdev-vfio-zpci-boundaries.xml new file mode 100644 index 0000000000..2c54016348 --- /dev/null +++ b/tests/qemuxml2argvdata/hostdev-vfio-zpci-boundaries.xml @@ -0,0 +1,26 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory>219100</memory> + <os> + <type arch='s390x' machine='s390-ccw'>hvm</type> + </os> + <devices> + <emulator>/usr/bin/qemu-system-s390x</emulator> + <controller type='pci' index='0' model='pci-root'/> + <hostdev mode='subsystem' type='pci'> + <driver name='vfio'/> + <source> + <address domain='0xffff' bus='0x00' slot='0x00' function='0x0'/> + </source> + <address type='pci' domain='0x0000' bus='0x01' slot='0x1f' function='0x0' fid='0xffffffff' uid='0xffff'/> + </hostdev> + <hostdev mode='subsystem' type='pci'> + <driver name='vfio'/> + <source> + <address domain='0x0000' bus='0x00' slot='0x00' function='0x0'/> + </source> + <address type='pci' fid='0x00000000' uid='0x0001'/> + </hostdev> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/hostdev-vfio-zpci-multidomain-many.args b/tests/qemuxml2argvdata/hostdev-vfio-zpci-multidomain-many.args new file mode 100644 index 0000000000..c4490df7d0 --- /dev/null +++ b/tests/qemuxml2argvdata/hostdev-vfio-zpci-multidomain-many.args @@ -0,0 +1,40 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-s390x \ +-name QEMUGuest1 \ +-S \ +-machine s390-ccw,accel=tcg,usb=off,dump-guest-core=off \ +-m 214 \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-boot c \ +-device zpci,uid=35,fid=63,target=hostdev0,id=zpci35 \ +-device vfio-pci,host=0001:00:00.0,id=hostdev0,bus=pci.0,addr=0x3 \ +-device zpci,uid=53,fid=104,target=hostdev1,id=zpci53 \ +-device vfio-pci,host=0002:00:00.0,id=hostdev1,bus=pci.0,addr=0x1 \ +-device zpci,uid=1,fid=0,target=hostdev2,id=zpci1 \ +-device vfio-pci,host=0003:00:00.0,id=hostdev2,bus=pci.0,addr=0x2 \ +-device zpci,uid=2,fid=1,target=hostdev3,id=zpci2 \ +-device vfio-pci,host=0004:00:00.0,id=hostdev3,bus=pci.0,addr=0x5 \ +-device zpci,uid=83,fid=2,target=hostdev4,id=zpci83 \ +-device vfio-pci,host=0005:00:00.0,id=hostdev4,bus=pci.0,addr=0x7 \ +-device zpci,uid=3,fid=114,target=hostdev5,id=zpci3 \ +-device vfio-pci,host=0006:00:00.0,id=hostdev5,bus=pci.0,addr=0x9 \ +-device zpci,uid=23,fid=3,target=hostdev6,id=zpci23 \ +-device vfio-pci,host=0007:00:00.0,id=hostdev6,bus=pci.0,addr=0x4 \ +-device zpci,uid=4,fid=40,target=hostdev7,id=zpci4 \ +-device vfio-pci,host=0008:00:00.0,id=hostdev7,bus=pci.0,addr=0x6 \ +-device zpci,uid=5,fid=4,target=balloon0,id=zpci5 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 diff --git a/tests/qemuxml2argvdata/hostdev-vfio-zpci-multidomain-many.xml b/tests/qemuxml2argvdata/hostdev-vfio-zpci-multidomain-many.xml new file mode 100644 index 0000000000..2063f57f33 --- /dev/null +++ b/tests/qemuxml2argvdata/hostdev-vfio-zpci-multidomain-many.xml @@ -0,0 +1,67 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory>219100</memory> + <os> + <type arch='s390x' machine='s390-ccw'>hvm</type> + </os> + <devices> + <emulator>/usr/bin/qemu-system-s390x</emulator> + <controller type='pci' index='0' model='pci-root'/> + <hostdev mode='subsystem' type='pci'> + <driver name='vfio'/> + <source> + <address domain='0x0001' bus='0x00' slot='0x00' function='0x0'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0' uid='0x0023' fid='0x0000003f'/> + </hostdev> + <hostdev mode='subsystem' type='pci'> + <driver name='vfio'/> + <source> + <address domain='0x0002' bus='0x00' slot='0x00' function='0x0'/> + </source> + <address type='pci' uid='0x0035' fid='0x00000068'/> + </hostdev> + <hostdev mode='subsystem' type='pci'> + <driver name='vfio'/> + <source> + <address domain='0x0003' bus='0x00' slot='0x00' function='0x0'/> + </source> + </hostdev> + <hostdev mode='subsystem' type='pci'> + <driver name='vfio'/> + <source> + <address domain='0x0004' bus='0x00' slot='0x00' function='0x0'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </hostdev> + <hostdev mode='subsystem' type='pci'> + <driver name='vfio'/> + <source> + <address domain='0x0005' bus='0x00' slot='0x00' function='0x0'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' uid='0x0053'/> + </hostdev> + <hostdev mode='subsystem' type='pci'> + <driver name='vfio'/> + <source> + <address domain='0x0006' bus='0x00' slot='0x00' function='0x0'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0' fid='0x00000072'/> + </hostdev> + <hostdev mode='subsystem' type='pci'> + <driver name='vfio'/> + <source> + <address domain='0x0007' bus='0x00' slot='0x00' function='0x0'/> + </source> + <address type='pci' uid='0x0017'/> + </hostdev> + <hostdev mode='subsystem' type='pci'> + <driver name='vfio'/> + <source> + <address domain='0x0008' bus='0x00' slot='0x00' function='0x0'/> + </source> + <address type='pci' fid='0x00000028'/> + </hostdev> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/hostdev-vfio-zpci.args b/tests/qemuxml2argvdata/hostdev-vfio-zpci.args new file mode 100644 index 0000000000..cc2d872bc2 --- /dev/null +++ b/tests/qemuxml2argvdata/hostdev-vfio-zpci.args @@ -0,0 +1,26 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-s390x \ +-name QEMUGuest1 \ +-S \ +-machine s390-ccw,accel=tcg,usb=off,dump-guest-core=off \ +-m 214 \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-boot c \ +-device zpci,uid=25,fid=31,target=hostdev0,id=zpci25 \ +-device vfio-pci,host=00:00.0,id=hostdev0,bus=pci.0,addr=0x8 \ +-device zpci,uid=1,fid=0,target=balloon0,id=zpci1 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x1 diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 3be5af03aa..307b0f5d9d 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1014,6 +1014,8 @@ mymain(void) QEMU_CAPS_CCW, QEMU_CAPS_VIRTIO_S390); DO_TEST("disk-virtio-scsi-ccw", QEMU_CAPS_VIRTIO_SCSI, QEMU_CAPS_CCW, QEMU_CAPS_VIRTIO_S390); + DO_TEST("disk-virtio-s390-zpci", QEMU_CAPS_DEVICE_ZPCI, + QEMU_CAPS_CCW, QEMU_CAPS_VIRTIO_S390); DO_TEST("disk-order", QEMU_CAPS_DRIVE_BOOT, QEMU_CAPS_VIRTIO_BLK_SCSI); DO_TEST("disk-virtio-drive-queues", @@ -1574,6 +1576,18 @@ mymain(void) QEMU_CAPS_DEVICE_VFIO_PCI); DO_TEST_PARSE_ERROR("hostdev-mdev-invalid-target-address", QEMU_CAPS_DEVICE_VFIO_PCI); + DO_TEST("hostdev-vfio-zpci", + QEMU_CAPS_DEVICE_VFIO_PCI, QEMU_CAPS_DEVICE_ZPCI); + DO_TEST("hostdev-vfio-zpci-multidomain-many", + QEMU_CAPS_DEVICE_VFIO_PCI, X_QEMU_CAPS_HOST_PCI_MULTIDOMAIN, + QEMU_CAPS_DEVICE_PCI_BRIDGE, QEMU_CAPS_DEVICE_ZPCI); + DO_TEST("hostdev-vfio-zpci-autogenerate", + QEMU_CAPS_DEVICE_VFIO_PCI, QEMU_CAPS_DEVICE_ZPCI); + DO_TEST("hostdev-vfio-zpci-boundaries", + QEMU_CAPS_DEVICE_VFIO_PCI, X_QEMU_CAPS_HOST_PCI_MULTIDOMAIN, + QEMU_CAPS_DEVICE_PCI_BRIDGE, QEMU_CAPS_DEVICE_ZPCI); + DO_TEST_FAILURE("hostdev-vfio-zpci", + QEMU_CAPS_DEVICE_VFIO_PCI); DO_TEST("pci-rom", NONE); DO_TEST("pci-rom-disabled", NONE); DO_TEST("pci-rom-disabled-invalid", NONE); -- Yi Min

On Tue, 2018-07-10 at 16:02 +0800, Yi Min Zhao wrote: [...]
+bool +qemuCheckDeviceIsZPCI(virDomainDeviceInfoPtr info) +{ + if (info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI && + ((info->pciAddressExtFlags & VIR_PCI_ADDRESS_EXTENSION_ZPCI) || + info->addr.pci.zpci)) + return true;
Missing curly braces. Also, do you really need to check both the flags and the presence of the zPCI address bits? It looks like either one or the other should be enough or, if that's not the case, it should be made so because having to check for two separate conditions makes me feel like it would introduce bugs in the long run. [...]
+char *qemuBuildZPCIDevStr(virDomainDeviceInfoPtr dev); + +bool qemuCheckDeviceIsZPCI(virDomainDeviceInfoPtr info);
Is this really necessary? Can't these two functions be static? [...]
--- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1014,6 +1014,8 @@ mymain(void) QEMU_CAPS_CCW, QEMU_CAPS_VIRTIO_S390); DO_TEST("disk-virtio-scsi-ccw", QEMU_CAPS_VIRTIO_SCSI, QEMU_CAPS_CCW, QEMU_CAPS_VIRTIO_S390); + DO_TEST("disk-virtio-s390-zpci", QEMU_CAPS_DEVICE_ZPCI, + QEMU_CAPS_CCW, QEMU_CAPS_VIRTIO_S390); DO_TEST("disk-order", QEMU_CAPS_DRIVE_BOOT, QEMU_CAPS_VIRTIO_BLK_SCSI); DO_TEST("disk-virtio-drive-queues", @@ -1574,6 +1576,18 @@ mymain(void) QEMU_CAPS_DEVICE_VFIO_PCI); DO_TEST_PARSE_ERROR("hostdev-mdev-invalid-target-address", QEMU_CAPS_DEVICE_VFIO_PCI); + DO_TEST("hostdev-vfio-zpci", + QEMU_CAPS_DEVICE_VFIO_PCI, QEMU_CAPS_DEVICE_ZPCI); + DO_TEST("hostdev-vfio-zpci-multidomain-many", + QEMU_CAPS_DEVICE_VFIO_PCI, X_QEMU_CAPS_HOST_PCI_MULTIDOMAIN,
Capabilities with X_QEMU prefix are no longer used, so you should not list them here.
+ QEMU_CAPS_DEVICE_PCI_BRIDGE, QEMU_CAPS_DEVICE_ZPCI); + DO_TEST("hostdev-vfio-zpci-autogenerate", + QEMU_CAPS_DEVICE_VFIO_PCI, QEMU_CAPS_DEVICE_ZPCI); + DO_TEST("hostdev-vfio-zpci-boundaries", + QEMU_CAPS_DEVICE_VFIO_PCI, X_QEMU_CAPS_HOST_PCI_MULTIDOMAIN, + QEMU_CAPS_DEVICE_PCI_BRIDGE, QEMU_CAPS_DEVICE_ZPCI); + DO_TEST_FAILURE("hostdev-vfio-zpci", + QEMU_CAPS_DEVICE_VFIO_PCI);
Please add these to qemuxml2xmltest at the same time. -- Andrea Bolognani / Red Hat / Virtualization

On Tue, 2018-07-10 at 16:02 +0800, Yi Min Zhao wrote: [...]
+bool +qemuCheckDeviceIsZPCI(virDomainDeviceInfoPtr info) +{ + if (info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI && + ((info->pciAddressExtFlags & VIR_PCI_ADDRESS_EXTENSION_ZPCI) || + info->addr.pci.zpci)) + return true; Missing curly braces.
Also, do you really need to check both the flags and the presence of the zPCI address bits? It looks like either one or the other should be enough or, if that's not the case, it should be made so because having to check for two separate conditions makes me feel like it would introduce bugs in the long run. This is actually a problem. I add info->addr.pci.zpci check in order to check if the user specifies zpci address in xml even though it has no zpci support. The callers of this function checks zpci capability. If no zpci cap but
在 2018/7/24 下午11:09, Andrea Bolognani 写道: the user specfies zpci address, report an error. I will change the logic and remove the check on info->addr.pci.zpci.
[...]
+char *qemuBuildZPCIDevStr(virDomainDeviceInfoPtr dev); + +bool qemuCheckDeviceIsZPCI(virDomainDeviceInfoPtr info); Is this really necessary? Can't these two functions be static?
They are also used in qemu_hotplug.c file.
[...]
--- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1014,6 +1014,8 @@ mymain(void) QEMU_CAPS_CCW, QEMU_CAPS_VIRTIO_S390); DO_TEST("disk-virtio-scsi-ccw", QEMU_CAPS_VIRTIO_SCSI, QEMU_CAPS_CCW, QEMU_CAPS_VIRTIO_S390); + DO_TEST("disk-virtio-s390-zpci", QEMU_CAPS_DEVICE_ZPCI, + QEMU_CAPS_CCW, QEMU_CAPS_VIRTIO_S390); DO_TEST("disk-order", QEMU_CAPS_DRIVE_BOOT, QEMU_CAPS_VIRTIO_BLK_SCSI); DO_TEST("disk-virtio-drive-queues", @@ -1574,6 +1576,18 @@ mymain(void) QEMU_CAPS_DEVICE_VFIO_PCI); DO_TEST_PARSE_ERROR("hostdev-mdev-invalid-target-address", QEMU_CAPS_DEVICE_VFIO_PCI); + DO_TEST("hostdev-vfio-zpci", + QEMU_CAPS_DEVICE_VFIO_PCI, QEMU_CAPS_DEVICE_ZPCI); + DO_TEST("hostdev-vfio-zpci-multidomain-many", + QEMU_CAPS_DEVICE_VFIO_PCI, X_QEMU_CAPS_HOST_PCI_MULTIDOMAIN, Capabilities with X_QEMU prefix are no longer used, so you should not list them here.
Sure.
+ QEMU_CAPS_DEVICE_PCI_BRIDGE, QEMU_CAPS_DEVICE_ZPCI); + DO_TEST("hostdev-vfio-zpci-autogenerate", + QEMU_CAPS_DEVICE_VFIO_PCI, QEMU_CAPS_DEVICE_ZPCI); + DO_TEST("hostdev-vfio-zpci-boundaries", + QEMU_CAPS_DEVICE_VFIO_PCI, X_QEMU_CAPS_HOST_PCI_MULTIDOMAIN, + QEMU_CAPS_DEVICE_PCI_BRIDGE, QEMU_CAPS_DEVICE_ZPCI); + DO_TEST_FAILURE("hostdev-vfio-zpci", + QEMU_CAPS_DEVICE_VFIO_PCI); Please add these to qemuxml2xmltest at the same time.

On Fri, 2018-07-27 at 15:53 +0800, Yi Min Zhao wrote:
在 2018/7/24 下午11:09, Andrea Bolognani 写道:
Also, do you really need to check both the flags and the presence of the zPCI address bits? It looks like either one or the other should be enough or, if that's not the case, it should be made so because having to check for two separate conditions makes me feel like it would introduce bugs in the long run.
This is actually a problem. I add info->addr.pci.zpci check in order to check if the user specifies zpci address in xml even though it has no zpci support. The callers of this function checks zpci capability. If no zpci cap but the user specfies zpci address, report an error.
I will change the logic and remove the check on info->addr.pci.zpci.
Yeah, you definitely want to report an error if the QEMU binary doesn't support zPCI or the user is attempting to do something silly like add zPCI-related information to devices attached to an x86_64 guest...
+char *qemuBuildZPCIDevStr(virDomainDeviceInfoPtr dev); + +bool qemuCheckDeviceIsZPCI(virDomainDeviceInfoPtr info);
Is this really necessary? Can't these two functions be static?
They are also used in qemu_hotplug.c file.
Right, I overlooked that :) That said, they aren't used in the hotplug code until the next patch, so it would IMHO make sense to define them as static in this patch and export them in the next one, at the same time as you actually start using them outside of the file. [...]
+ DO_TEST("hostdev-vfio-zpci", + QEMU_CAPS_DEVICE_VFIO_PCI, QEMU_CAPS_DEVICE_ZPCI); + DO_TEST("hostdev-vfio-zpci-multidomain-many", + QEMU_CAPS_DEVICE_VFIO_PCI, X_QEMU_CAPS_HOST_PCI_MULTIDOMAIN,
Capabilities with X_QEMU prefix are no longer used, so you should not list them here.
Sure.
I forgot to mention, please have a single capability per line and make sure the set of capabilities used in xml2xml and xml2argv is exactly the same. -- Andrea Bolognani / Red Hat / Virtualization

This commit adds hotplug support for PCI devices on S390 guests. There's no need to implement hot unplug for zPCI as QEMU implements an unplug callback which will unplug both PCI and zPCI device in a cascaded way. Currently, the following PCI devices are supported: virtio-blk-pci virtio-net-pci virtio-rng-pci virtio-input-host-pci virtio-keyboard-pci virtio-mouse-pci virtio-tablet-pci vfio-pci SCSIVhost device Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com> Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com> --- src/qemu/qemu_hotplug.c | 182 +++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 172 insertions(+), 10 deletions(-) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 23f6d1daba..e5c193905c 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -154,6 +154,84 @@ qemuHotplugPrepareDiskAccess(virQEMUDriverPtr driver, } +static int +qemuDomainAttachZPCIDevice(qemuMonitorPtr mon, + virDomainDeviceInfoPtr info) +{ + int ret = -1; + char *devstr_zpci = NULL; + + if (!(devstr_zpci = qemuBuildZPCIDevStr(info))) + goto cleanup; + + if (qemuMonitorAddDevice(mon, devstr_zpci) < 0) + goto cleanup; + + ret = 0; + + cleanup: + VIR_FREE(devstr_zpci); + return ret; +} + + +static int +qemuDomainDetachZPCIDevice(qemuMonitorPtr mon, + virDomainDeviceInfoPtr info) +{ + char *zpciAlias = NULL; + int ret = -1; + + if (virAsprintf(&zpciAlias, "zpci%d", info->addr.pci.zpci->zpci_uid) < 0) + goto cleanup; + + if (qemuMonitorDelDevice(mon, zpciAlias) < 0) + goto cleanup; + + ret = 0; + + cleanup: + VIR_FREE(zpciAlias); + return ret; +} + + +static int +qemuDomainAttachExtensionDevice(qemuMonitorPtr mon, + virDomainDeviceInfoPtr info, + virQEMUCapsPtr qemuCaps) +{ + if (qemuCheckDeviceIsZPCI(info)) { + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_ZPCI)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("This QEMU doesn't support zpci devices")); + return -1; + } + return qemuDomainAttachZPCIDevice(mon, info); + } + + return 0; +} + + +static int +qemuDomainDetachExtensionDevice(qemuMonitorPtr mon, + virDomainDeviceInfoPtr info, + virQEMUCapsPtr qemuCaps) +{ + if (qemuCheckDeviceIsZPCI(info)) { + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_ZPCI)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("This QEMU doesn't support zpci devices")); + return -1; + } + return qemuDomainDetachZPCIDevice(mon, info); + } + + return 0; +} + + static int qemuHotplugWaitForTrayEject(virQEMUDriverPtr driver, virDomainObjPtr vm, @@ -403,9 +481,16 @@ qemuDomainAttachDiskGeneric(virQEMUDriverPtr driver, if (qemuBlockStorageSourceAttachApply(priv->mon, data) < 0) goto exit_monitor; - if (qemuMonitorAddDevice(priv->mon, devstr) < 0) + if (qemuDomainAttachExtensionDevice(priv->mon, &disk->info, + priv->qemuCaps) < 0) goto exit_monitor; + if (qemuMonitorAddDevice(priv->mon, devstr) < 0) { + ignore_value(qemuDomainDetachExtensionDevice(priv->mon, &disk->info, + priv->qemuCaps)); + goto exit_monitor; + } + if (qemuDomainObjExitMonitor(driver, vm) < 0) { ret = -2; goto error; @@ -519,7 +604,17 @@ int qemuDomainAttachControllerDevice(virQEMUDriverPtr driver, goto cleanup; qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorAddDevice(priv->mon, devstr); + + if (qemuDomainAttachExtensionDevice(priv->mon, &controller->info, + priv->qemuCaps) < 0) { + ignore_value(qemuDomainObjExitMonitor(driver, vm)); + goto cleanup; + } + + if ((ret = qemuMonitorAddDevice(priv->mon, devstr)) < 0) + ignore_value(qemuDomainDetachExtensionDevice(priv->mon, &controller->info, + priv->qemuCaps)); + if (qemuDomainObjExitMonitor(driver, vm) < 0) { releaseaddr = false; ret = -1; @@ -961,7 +1056,8 @@ qemuDomainAttachNetDevice(virQEMUDriverPtr driver, goto cleanup; } - if (qemuDomainIsS390CCW(vm->def) && + if (net->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI && + qemuDomainIsS390CCW(vm->def) && virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_CCW)) { net->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_CCW; if (!(ccwaddrs = virDomainCCWAddressSetCreateFromDomain(vm->def))) @@ -1031,7 +1127,17 @@ qemuDomainAttachNetDevice(virQEMUDriverPtr driver, goto try_remove; qemuDomainObjEnterMonitor(driver, vm); + + if (qemuDomainAttachExtensionDevice(priv->mon, &net->info, + priv->qemuCaps) < 0) { + ignore_value(qemuDomainObjExitMonitor(driver, vm)); + virDomainAuditNet(vm, NULL, net, "attach", false); + goto try_remove; + } + if (qemuMonitorAddDevice(priv->mon, nicstr) < 0) { + ignore_value(qemuDomainDetachExtensionDevice(priv->mon, &net->info, + priv->qemuCaps)); ignore_value(qemuDomainObjExitMonitor(driver, vm)); virDomainAuditNet(vm, NULL, net, "attach", false); goto try_remove; @@ -1247,8 +1353,17 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, goto error; qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorAddDeviceWithFd(priv->mon, devstr, - configfd, configfd_name); + + if ((ret = qemuDomainAttachExtensionDevice(priv->mon, hostdev->info, + priv->qemuCaps)) < 0) + goto exit_monitor; + + if ((ret = qemuMonitorAddDeviceWithFd(priv->mon, devstr, + configfd, configfd_name)) < 0) + ignore_value(qemuDomainDetachExtensionDevice(priv->mon, hostdev->info, + priv->qemuCaps)); + + exit_monitor: if (qemuDomainObjExitMonitor(driver, vm) < 0) goto error; @@ -1904,9 +2019,16 @@ qemuDomainAttachRNGDevice(virQEMUDriverPtr driver, if (qemuMonitorAddObject(priv->mon, &props, &objAlias) < 0) goto exit_monitor; - if (qemuMonitorAddDevice(priv->mon, devstr) < 0) + if (qemuDomainAttachExtensionDevice(priv->mon, &rng->info, + priv->qemuCaps) < 0) goto exit_monitor; + if (qemuMonitorAddDevice(priv->mon, devstr) < 0) { + ignore_value(qemuDomainDetachExtensionDevice(priv->mon, &rng->info, + priv->qemuCaps)); + goto exit_monitor; + } + if (qemuDomainObjExitMonitor(driver, vm) < 0) { releaseaddr = false; goto cleanup; @@ -2398,8 +2520,18 @@ qemuDomainAttachSCSIVHostDevice(virQEMUDriverPtr driver, qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorAddDeviceWithFd(priv->mon, devstr, vhostfd, vhostfdName); + if ((ret = qemuDomainAttachExtensionDevice(priv->mon, hostdev->info, + priv->qemuCaps)) < 0) + goto exit_monitor; + + if ((ret = qemuMonitorAddDeviceWithFd(priv->mon, devstr, vhostfd, + vhostfdName)) < 0) { + ignore_value(qemuDomainDetachExtensionDevice(priv->mon, hostdev->info, + priv->qemuCaps)); + goto exit_monitor; + } + exit_monitor: if (qemuDomainObjExitMonitor(driver, vm) < 0 || ret < 0) goto audit; @@ -2644,8 +2776,15 @@ qemuDomainAttachShmemDevice(virQEMUDriverPtr driver, release_backing = true; - if (qemuMonitorAddDevice(priv->mon, shmstr) < 0) + if (qemuDomainAttachExtensionDevice(priv->mon, &shmem->info, + priv->qemuCaps) < 0) + goto exit_monitor; + + if (qemuMonitorAddDevice(priv->mon, shmstr) < 0) { + ignore_value(qemuDomainDetachExtensionDevice(priv->mon, &shmem->info, + priv->qemuCaps)); goto exit_monitor; + } if (qemuDomainObjExitMonitor(driver, vm) < 0) { release_address = false; @@ -2818,8 +2957,19 @@ qemuDomainAttachInputDevice(virQEMUDriverPtr driver, goto cleanup; qemuDomainObjEnterMonitor(driver, vm); - if (qemuMonitorAddDevice(priv->mon, devstr) < 0) + + if (input->bus == VIR_DOMAIN_INPUT_BUS_VIRTIO) { + if (qemuDomainAttachExtensionDevice(priv->mon, &input->info, + priv->qemuCaps) < 0) + goto exit_monitor; + } + + if (qemuMonitorAddDevice(priv->mon, devstr) < 0) { + if (input->bus == VIR_DOMAIN_INPUT_BUS_VIRTIO) + ignore_value(qemuDomainDetachExtensionDevice(priv->mon, &input->info, + priv->qemuCaps)); goto exit_monitor; + } if (qemuDomainObjExitMonitor(driver, vm) < 0) { releaseaddr = false; @@ -2897,9 +3047,16 @@ qemuDomainAttachVsockDevice(virQEMUDriverPtr driver, goto cleanup; qemuDomainObjEnterMonitor(driver, vm); - if (qemuMonitorAddDeviceWithFd(priv->mon, devstr, vsockPriv->vhostfd, fdname) < 0) + + if (qemuDomainAttachExtensionDevice(priv->mon, &vsock->info, priv->qemuCaps) < 0) goto exit_monitor; + if (qemuMonitorAddDeviceWithFd(priv->mon, devstr, vsockPriv->vhostfd, fdname) < 0) { + ignore_value(qemuDomainDetachExtensionDevice(priv->mon, &vsock->info, + priv->qemuCaps)); + goto exit_monitor; + } + if (qemuDomainObjExitMonitor(driver, vm) < 0) { releaseaddr = false; goto cleanup; @@ -4917,6 +5074,11 @@ int qemuDomainDetachControllerDevice(virQEMUDriverPtr driver, qemuDomainMarkDeviceForRemoval(vm, &detach->info); qemuDomainObjEnterMonitor(driver, vm); + if (detach->type == VIR_DOMAIN_CONTROLLER_TYPE_PCI && + qemuDomainDetachExtensionDevice(priv->mon, &detach->info, priv->qemuCaps)) { + ignore_value(qemuDomainObjExitMonitor(driver, vm)); + goto cleanup; + } if (qemuMonitorDelDevice(priv->mon, detach->info.alias)) { ignore_value(qemuDomainObjExitMonitor(driver, vm)); goto cleanup; -- Yi Min

Update 'Device address' section to describe the 'uid' and 'fid' attributes. Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com> Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com> --- docs/formatdomain.html.in | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index a3afe137bf..32dcf7aaae 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -3816,7 +3816,14 @@ (<span class="since">since 0.9.7, requires QEMU 0.13</span>). <code>multifunction</code> defaults to 'off', but should be set to 'on' for function 0 of a slot that will - have multiple functions used.<br/> + have multiple functions used. (<span class="since">Since 4.6.0 + </span>), PCI address extensions depending on the architecture + are supported. E.g. for S390 guests, PCI addresses have + additional attributes: <code>uid</code> (a hex value between + 0x1 and 0xffff, inclusive), and <code>fid</code> (a hex value + between 0x0 and 0xffffffff, inclusive), which are used by PCI + devices on S390 for User-defined Identifiers and Function + Identifiers.<br/> <span class="since">Since 1.3.5</span>, some hypervisor drivers may accept an <code><address type='pci'/></code> element with no other attributes as an explicit request to -- Yi Min

Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com> Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com> --- docs/news.xml | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/docs/news.xml b/docs/news.xml index 773c95b0da..bcdf350e55 100644 --- a/docs/news.xml +++ b/docs/news.xml @@ -44,6 +44,17 @@ support should be available to the guest. </description> </change> + <change> + <summary> + qemu: Added support for PCI device passthrough on S390 + </summary> + <description> + The new zPCI attributes uid (user-defined identifier) + and fid (PCI function identifier) of the S390 platform + extend the PCI device address to support the PCI + passthrough on S390. + </description> + </change> </section> <section title="Improvements"> </section> -- Yi Min

On Tue, 10 Jul 2018 16:02:14 +0800 Yi Min Zhao <zyimin@linux.ibm.com> wrote:
Abstract ======== The PCI representation in QEMU has recently been extended for S390 allowing configuration of zPCI attributes like uid (user-defined identifier) and fid (PCI function identifier). The details can be found here: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07262.html
To support the new zPCI feature of the S390 platform, two new XML attributes, @uid and @fid, are introduced for device addresses of type 'pci', i.e.: <hostdev mode='subsystem' type='pci'> <driver name='vfio'/> <source> <address domain='0x0001' bus='0x00' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' uid='0x0003' fid='0x00000027'/> </hostdev>
uid and fid are optional attributes. If they are defined by the user, unique values within the guest domain must be used. If they are not specified and the architecture requires them, they are automatically generated with non-conflicting values.
Current implementation is the most seamless one for the user as it unites the address specific data of a PCI device on one XML element. It could accommodate both specifying our special parameters (uid and fid) and re-using standard statements (domain, bus, slot and function) for PCI devices. User can still specify bus/slot/function for the virtualized PCI devices in the XML.
Thus uid/fid act as an extension to the PCI address and are stored in a new structure 'virZPCIDeviceAddress' which is a member of common PCI Address structure. Additionally, two hashtables are used for assignment and reservation of uid/fid.
In support of extending the PCI address, a new PCI address extension flag is introduced. This extension flag allows is not only dedicated for the S390 platform but also other architectures needing certain extensions to PCI address space.
FWIW, from my QEMU POV there's nothing obviously wrong in here, but I'm not familiar with the libvirt code base. So I'll leave this to the libvirt developers.

在 2018/7/17 下午8:35, Cornelia Huck 写道:
On Tue, 10 Jul 2018 16:02:14 +0800 Yi Min Zhao <zyimin@linux.ibm.com> wrote:
Abstract ======== The PCI representation in QEMU has recently been extended for S390 allowing configuration of zPCI attributes like uid (user-defined identifier) and fid (PCI function identifier). The details can be found here: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07262.html
To support the new zPCI feature of the S390 platform, two new XML attributes, @uid and @fid, are introduced for device addresses of type 'pci', i.e.: <hostdev mode='subsystem' type='pci'> <driver name='vfio'/> <source> <address domain='0x0001' bus='0x00' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' uid='0x0003' fid='0x00000027'/> </hostdev>
uid and fid are optional attributes. If they are defined by the user, unique values within the guest domain must be used. If they are not specified and the architecture requires them, they are automatically generated with non-conflicting values.
Current implementation is the most seamless one for the user as it unites the address specific data of a PCI device on one XML element. It could accommodate both specifying our special parameters (uid and fid) and re-using standard statements (domain, bus, slot and function) for PCI devices. User can still specify bus/slot/function for the virtualized PCI devices in the XML.
Thus uid/fid act as an extension to the PCI address and are stored in a new structure 'virZPCIDeviceAddress' which is a member of common PCI Address structure. Additionally, two hashtables are used for assignment and reservation of uid/fid.
In support of extending the PCI address, a new PCI address extension flag is introduced. This extension flag allows is not only dedicated for the S390 platform but also other architectures needing certain extensions to PCI address space. FWIW, from my QEMU POV there's nothing obviously wrong in here, but I'm not familiar with the libvirt code base. So I'll leave this to the libvirt developers.
Thanks! Libvirt developers have not given any comment on v2 until now. I'm afraid the end of this month is coming soon.

Hi all, The next release is coming soon. I'm really expecting the comments from all of you. Hope you could consider a review before the next review. Thank you very much! Yi Min 在 2018/7/10 下午4:02, Yi Min Zhao 写道:
Abstract ======== The PCI representation in QEMU has recently been extended for S390 allowing configuration of zPCI attributes like uid (user-defined identifier) and fid (PCI function identifier). The details can be found here: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07262.html
To support the new zPCI feature of the S390 platform, two new XML attributes, @uid and @fid, are introduced for device addresses of type 'pci', i.e.: <hostdev mode='subsystem' type='pci'> <driver name='vfio'/> <source> <address domain='0x0001' bus='0x00' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' uid='0x0003' fid='0x00000027'/> </hostdev>
uid and fid are optional attributes. If they are defined by the user, unique values within the guest domain must be used. If they are not specified and the architecture requires them, they are automatically generated with non-conflicting values.
Current implementation is the most seamless one for the user as it unites the address specific data of a PCI device on one XML element. It could accommodate both specifying our special parameters (uid and fid) and re-using standard statements (domain, bus, slot and function) for PCI devices. User can still specify bus/slot/function for the virtualized PCI devices in the XML.
Thus uid/fid act as an extension to the PCI address and are stored in a new structure 'virZPCIDeviceAddress' which is a member of common PCI Address structure. Additionally, two hashtables are used for assignment and reservation of uid/fid.
In support of extending the PCI address, a new PCI address extension flag is introduced. This extension flag allows is not only dedicated for the S390 platform but also other architectures needing certain extensions to PCI address space.
Code Base ========= commit in master: 767f9e1449b1a36111532847f0c62dc758263c42 qemu: validate: Enforce compile time switch type checking for videos
Change Log ========== v1->v2: 1. Separate test commit and merge testcases into corresponding commits that introduce the functionalities firstly. 2. Spare some checks for zpci device. 3. Add vsock and controller support. 4. Add uin32 type schema. 5. Rename zpciuid and zpcifid to zpci_uid and zpci_fid. 6. Always return multibus support on S390.
Yi Min Zhao (12): conf: Add definitions for 'uid' and 'fid' PCI address attributes qemu: Introduce zPCI capability conf: Introduce a new PCI address extension flag qemu: Enable PCI multi bus for S390 guests qemu: Auto add pci-root for s390/s390x guests conf: Introduce address caching for PCI extensions conf: Introduce parser, formatter for uid and fid conf: Allocate/release 'uid' and 'fid' in PCI address qemu: Generate and use zPCI device in QEMU command line qemu: Add hotpluging support for PCI devices on S390 guests docs: Add 'uid' and 'fid' information news: Update news for PCI address extension attributes
docs/formatdomain.html.in | 9 +- docs/news.xml | 11 + docs/schemas/basictypes.rng | 31 ++ docs/schemas/domaincommon.rng | 1 + src/conf/device_conf.c | 73 +++++ src/conf/device_conf.h | 1 + src/conf/domain_addr.c | 346 +++++++++++++++++++++ src/conf/domain_addr.h | 29 ++ src/conf/domain_conf.c | 6 + src/libvirt_private.syms | 4 + src/qemu/qemu_capabilities.c | 6 + src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 115 +++++++ src/qemu/qemu_command.h | 4 + src/qemu/qemu_domain.c | 1 + src/qemu/qemu_domain_address.c | 181 ++++++++++- src/qemu/qemu_hotplug.c | 182 ++++++++++- src/util/virpci.h | 13 + tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.7.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.8.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.9.0.s390x.xml | 1 + tests/qemuxml2argvdata/disk-virtio-s390-zpci.args | 27 ++ tests/qemuxml2argvdata/disk-virtio-s390-zpci.xml | 17 + .../hostdev-vfio-zpci-autogenerate.args | 26 ++ .../hostdev-vfio-zpci-autogenerate.xml | 18 ++ .../hostdev-vfio-zpci-boundaries.args | 30 ++ .../hostdev-vfio-zpci-boundaries.xml | 26 ++ .../hostdev-vfio-zpci-multidomain-many.args | 40 +++ .../hostdev-vfio-zpci-multidomain-many.xml | 67 ++++ tests/qemuxml2argvdata/hostdev-vfio-zpci.args | 26 ++ tests/qemuxml2argvdata/hostdev-vfio-zpci.xml | 19 ++ tests/qemuxml2argvtest.c | 14 + tests/qemuxml2xmloutdata/disk-virtio-s390-zpci.xml | 29 ++ tests/qemuxml2xmloutdata/hostdev-vfio-zpci.xml | 30 ++ tests/qemuxml2xmltest.c | 3 + 38 files changed, 1377 insertions(+), 15 deletions(-) create mode 100644 tests/qemuxml2argvdata/disk-virtio-s390-zpci.args create mode 100644 tests/qemuxml2argvdata/disk-virtio-s390-zpci.xml create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-autogenerate.args create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-autogenerate.xml create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-boundaries.args create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-boundaries.xml create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-multidomain-many.args create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci-multidomain-many.xml create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci.args create mode 100644 tests/qemuxml2argvdata/hostdev-vfio-zpci.xml create mode 100644 tests/qemuxml2xmloutdata/disk-virtio-s390-zpci.xml create mode 100644 tests/qemuxml2xmloutdata/hostdev-vfio-zpci.xml

On Tue, Jul 10, 2018 at 04:02:14PM +0800, Yi Min Zhao wrote:
Abstract ======== The PCI representation in QEMU has recently been extended for S390 allowing configuration of zPCI attributes like uid (user-defined identifier) and fid (PCI function identifier). The details can be found here: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07262.html
To support the new zPCI feature of the S390 platform, two new XML attributes, @uid and @fid, are introduced for device addresses of type 'pci', i.e.: <hostdev mode='subsystem' type='pci'> <driver name='vfio'/> <source> <address domain='0x0001' bus='0x00' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' uid='0x0003' fid='0x00000027'/> </hostdev>
uid and fid are optional attributes. If they are defined by the user, unique values within the guest domain must be used. If they are not specified and the architecture requires them, they are automatically generated with non-conflicting values.
Current implementation is the most seamless one for the user as it unites the address specific data of a PCI device on one XML element. It could accommodate both specifying our special parameters (uid and fid) and re-using standard statements (domain, bus, slot and function) for PCI devices. User can still specify bus/slot/function for the virtualized PCI devices in the XML.
Thus uid/fid act as an extension to the PCI address and are stored in a new structure 'virZPCIDeviceAddress' which is a member of common PCI Address structure. Additionally, two hashtables are used for assignment and reservation of uid/fid.
In support of extending the PCI address, a new PCI address extension flag is introduced. This extension flag allows is not only dedicated for the S390 platform but also other architectures needing certain extensions to PCI address space.
Code Base ========= commit in master: 767f9e1449b1a36111532847f0c62dc758263c42 qemu: validate: Enforce compile time switch type checking for videos
Change Log ========== v1->v2: 1. Separate test commit and merge testcases into corresponding commits that introduce the functionalities firstly. 2. Spare some checks for zpci device. 3. Add vsock and controller support. 4. Add uin32 type schema. 5. Rename zpciuid and zpcifid to zpci_uid and zpci_fid. 6. Always return multibus support on S390.
Yi Min Zhao (12): conf: Add definitions for 'uid' and 'fid' PCI address attributes qemu: Introduce zPCI capability conf: Introduce a new PCI address extension flag qemu: Enable PCI multi bus for S390 guests qemu: Auto add pci-root for s390/s390x guests conf: Introduce address caching for PCI extensions conf: Introduce parser, formatter for uid and fid conf: Allocate/release 'uid' and 'fid' in PCI address qemu: Generate and use zPCI device in QEMU command line qemu: Add hotpluging support for PCI devices on S390 guests docs: Add 'uid' and 'fid' information news: Update news for PCI address extension attributes
The patches look OK to me, therefore: Reviewed-by: Ján Tomko <jtomko@redhat.com> But I'd like to hear from some other developer who touched the PCI code last (Laine? Andrea?) Jano

On Mon, Jul 23, 2018, 2:37 PM Ján Tomko <jtomko@redhat.com> wrote:
But I'd like to hear from some other developer who touched the PCI code last (Laine? Andrea?)
I'm looking at this on the screen of my phone as I stare out the hotel window at an afternoon rainstorm over Razgrad, Bulgaria, and won't be back to my office for another week, so I'm afraid I'm not in much of a position to make any useful comment :-)

On Mon, 2018-07-23 at 13:37 +0200, Ján Tomko wrote:
On Tue, Jul 10, 2018 at 04:02:14PM +0800, Yi Min Zhao wrote:
Abstract ======== The PCI representation in QEMU has recently been extended for S390 allowing configuration of zPCI attributes like uid (user-defined identifier) and fid (PCI function identifier). The details can be found here: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07262.html
To support the new zPCI feature of the S390 platform, two new XML attributes, @uid and @fid, are introduced for device addresses of type 'pci', i.e.: <hostdev mode='subsystem' type='pci'> <driver name='vfio'/> <source> <address domain='0x0001' bus='0x00' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' uid='0x0003' fid='0x00000027'/> </hostdev>
uid and fid are optional attributes. If they are defined by the user, unique values within the guest domain must be used. If they are not specified and the architecture requires them, they are automatically generated with non-conflicting values.
Current implementation is the most seamless one for the user as it unites the address specific data of a PCI device on one XML element. It could accommodate both specifying our special parameters (uid and fid) and re-using standard statements (domain, bus, slot and function) for PCI devices. User can still specify bus/slot/function for the virtualized PCI devices in the XML.
Thus uid/fid act as an extension to the PCI address and are stored in a new structure 'virZPCIDeviceAddress' which is a member of common PCI Address structure. Additionally, two hashtables are used for assignment and reservation of uid/fid.
In support of extending the PCI address, a new PCI address extension flag is introduced. This extension flag allows is not only dedicated for the S390 platform but also other architectures needing certain extensions to PCI address space. [...]
The patches look OK to me, therefore:
Reviewed-by: Ján Tomko <jtomko@redhat.com>
But I'd like to hear from some other developer who touched the PCI code last (Laine? Andrea?)
Thanks for bringing this to my attention - it had somehow managed to escape my radar so far :) I've quickly gone through the patches and spotted some minor code style issues that I'd like to see addressed (I'll point them out separately); first, though, I have a couple of questions about the general idea behind the feature. Looking at both the generated QEMU command line and the qemu-devel message linked above, I seem to understand the zpci device is basically some sort of adapter that sits between a regular PCI device (emulated or otherwise) and an s390 guest and presents an ID-based interface to the latter; so IIUC the s390 guest doesn't actually see a PCI device, but a couple of IDs that can (somehow) be used to access the underlying PCI device.
From whatever little s390 knowledge I have, I recall the architecture using peculiar ways to address and access devices, so PCI not being usable wouldn't surprise me that much; what I find odd, however, is that regular PCI seems to be available at least on the host side, because otherwise stuff like
-device zpci,uid=1,fid=1,target=vpci0 -device vfio-pci,host=0000:00:00.0,id=vpci0 wouldn't be possible. So, would anyone with s390 knowledge please spend a few words explaining how the various pieces fit together? Assuming the above is a correct reading of the situation, it would seem the IDs are the only guest-visible part that we need to make sure doesn't change during the lifetime of the guest, while the usual bus/slot/function triplet assigned to the underlying PCI device doesn't actually matter. And if that's the case, wouldn't something like <address type='zpci' uid='1' fid='1'/> be a better representation of the arrangement? We could leave it up to QEMU to assign addresses to the PCI devices in this case... But maybe they still matter for things such as migration? Or maybe I've just gotten it wrong altogether? :) -- Andrea Bolognani / Red Hat / Virtualization

On Tue, 24 Jul 2018 10:20:48 +0200 Andrea Bolognani <abologna@redhat.com> wrote:
Looking at both the generated QEMU command line and the qemu-devel message linked above, I seem to understand the zpci device is basically some sort of adapter that sits between a regular PCI device (emulated or otherwise) and an s390 guest and presents an ID-based interface to the latter; so IIUC the s390 guest doesn't actually see a PCI device, but a couple of IDs that can (somehow) be used to access the underlying PCI device.
From whatever little s390 knowledge I have, I recall the architecture using peculiar ways to address and access devices, so PCI not being usable wouldn't surprise me that much; what I find odd, however, is that regular PCI seems to be available at least on the host side, because otherwise stuff like
-device zpci,uid=1,fid=1,target=vpci0 -device vfio-pci,host=0000:00:00.0,id=vpci0
wouldn't be possible. So, would anyone with s390 knowledge please spend a few words explaining how the various pieces fit together?
For some hints, let me point to https://virtualpenguins.blogspot.de/2018/02/notes-on-pci-on-s390x.html -- for implementation details, I'll point to the IBM folks :)
Assuming the above is a correct reading of the situation, it would seem the IDs are the only guest-visible part that we need to make sure doesn't change during the lifetime of the guest, while the usual bus/slot/function triplet assigned to the underlying PCI device doesn't actually matter. And if that's the case, wouldn't something like
<address type='zpci' uid='1' fid='1'/>
be a better representation of the arrangement? We could leave it up to QEMU to assign addresses to the PCI devices in this case... But maybe they still matter for things such as migration? Or maybe I've just gotten it wrong altogether? :)

On 07/24/2018 10:20 AM, Andrea Bolognani wrote:
On Mon, 2018-07-23 at 13:37 +0200, Ján Tomko wrote:
On Tue, Jul 10, 2018 at 04:02:14PM +0800, Yi Min Zhao wrote:
Abstract ======== The PCI representation in QEMU has recently been extended for S390 allowing configuration of zPCI attributes like uid (user-defined identifier) and fid (PCI function identifier). The details can be found here: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg07262.html
To support the new zPCI feature of the S390 platform, two new XML attributes, @uid and @fid, are introduced for device addresses of type 'pci', i.e.: <hostdev mode='subsystem' type='pci'> <driver name='vfio'/> <source> <address domain='0x0001' bus='0x00' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' uid='0x0003' fid='0x00000027'/> </hostdev>
uid and fid are optional attributes. If they are defined by the user, unique values within the guest domain must be used. If they are not specified and the architecture requires them, they are automatically generated with non-conflicting values.
Current implementation is the most seamless one for the user as it unites the address specific data of a PCI device on one XML element. It could accommodate both specifying our special parameters (uid and fid) and re-using standard statements (domain, bus, slot and function) for PCI devices. User can still specify bus/slot/function for the virtualized PCI devices in the XML.
Thus uid/fid act as an extension to the PCI address and are stored in a new structure 'virZPCIDeviceAddress' which is a member of common PCI Address structure. Additionally, two hashtables are used for assignment and reservation of uid/fid.
In support of extending the PCI address, a new PCI address extension flag is introduced. This extension flag allows is not only dedicated for the S390 platform but also other architectures needing certain extensions to PCI address space. [...]
The patches look OK to me, therefore:
Reviewed-by: Ján Tomko <jtomko@redhat.com>
But I'd like to hear from some other developer who touched the PCI code last (Laine? Andrea?)
Thanks for bringing this to my attention - it had somehow managed to escape my radar so far :)
I've quickly gone through the patches and spotted some minor code style issues that I'd like to see addressed (I'll point them out separately); first, though, I have a couple of questions about the general idea behind the feature.
Looking at both the generated QEMU command line and the qemu-devel message linked above, I seem to understand the zpci device is basically some sort of adapter that sits between a regular PCI device (emulated or otherwise) and an s390 guest and presents an ID-based interface to the latter; so IIUC the s390 guest doesn't actually see a PCI device, but a couple of IDs that can (somehow) be used to access the underlying PCI device.
From whatever little s390 knowledge I have, I recall the architecture using peculiar ways to address and access devices, so PCI not being usable wouldn't surprise me that much; what I find odd, however, is that regular PCI seems to be available at least on the host side, because otherwise stuff like
-device zpci,uid=1,fid=1,target=vpci0 -device vfio-pci,host=0000:00:00.0,id=vpci0
wouldn't be possible. So, would anyone with s390 knowledge please spend a few words explaining how the various pieces fit together?
Assuming the above is a correct reading of the situation, it would seem the IDs are the only guest-visible part that we need to make sure doesn't change during the lifetime of the guest, while the usual bus/slot/function triplet assigned to the underlying PCI device doesn't actually matter. And if that's the case, wouldn't something like
<address type='zpci' uid='1' fid='1'/>
Then a pci device on s390 would need a special address type zpci in libvirt and the design idea for libvirt is to stay compatible with pci. Therefore uid and fid are optional attributes and if not specified on s390 they are generated. The patch series also allows on s390 to specify the pci address type just with attributes uid and/or fid causing the rest of the pci address attributes to be generated. That means <address type='pci' uid='1' fid='1'/> is a valid pci address on s390 but would get expanded during definition into e.g. <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' uid='1' fid='1'/> Also <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'> is a valid pci address on s390 and is would be expanded during definition into e.g. <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' uid='1' fid='1'/>
be a better representation of the arrangement? We could leave it up to QEMU to assign addresses to the PCI devices in this case... But maybe they still matter for things such as migration? Or maybe I've just gotten it wrong altogether? :)
-- Mit freundlichen Grüßen/Kind regards Boris Fiuczynski IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Köderitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294

On Tue, 2018-07-24 at 11:31 +0200, Boris Fiuczynski wrote:
On 07/24/2018 10:20 AM, Andrea Bolognani wrote:
Assuming the above is a correct reading of the situation, it would seem the IDs are the only guest-visible part that we need to make sure doesn't change during the lifetime of the guest, while the usual bus/slot/function triplet assigned to the underlying PCI device doesn't actually matter. And if that's the case, wouldn't something like
<address type='zpci' uid='1' fid='1'/>
Then a pci device on s390 would need a special address type zpci in libvirt and the design idea for libvirt is to stay compatible with pci.
Being compatible with the existing PCI machinery is certainly a good idea when it makes sense to do so, but I'm not quite convinced that is the case here. According to Cornelia's blog post on the subject, the PCI topology inside the guest will be determined entirely by the IDs. Is there even a way to eg. use bridges to create a non-flat PCI hierarchy? Or to have several PCI devices share the same bus or slot? If none of the above applies, then that doesn't look a whole lot like PCI to me :) Moreover, we already have several address types in addition to PCI such as USB, virtio-mmio, spapr-vio, ccw... Adding yet another one is not a problem if it makes the interface more sensible. More concrete questions: one of the zPCI test cases includes <controller type='pci' index='1' model='pci-bridge'/> <hostdev mode='subsystem' type='pci' managed='no'> <driver name='vfio'/> <source> <address domain='0xffff' bus='0x00' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x01' slot='0x1f' function='0x0' uid='0xffff' fid='0xffffffff'/> </hostdev> which translates to -device zpci,uid=3,fid=2,target=pci.1,id=zpci3 \ -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x1 \ -device zpci,uid=65535,fid=4294967295,target=hostdev0,id=zpci65535 \ -device vfio-pci,host=ffff:00:00.0,id=hostdev0,bus=pci.1,addr=0x1f \ How does the pci-bridge controller show up in the guest, if at all? Do the bus= and addr= attributes of vfio-pci and pci-bridge in the example above matter eg. for migration purposes?
Therefore uid and fid are optional attributes and if not specified on s390 they are generated. The patch series also allows on s390 to specify the pci address type just with attributes uid and/or fid causing the rest of the pci address attributes to be generated.
This goes without saying: users should not have to worry about addresses at all unless they have very specific needs. -- Andrea Bolognani / Red Hat / Virtualization

On 07/24/2018 03:41 PM, Andrea Bolognani wrote:
On Tue, 2018-07-24 at 11:31 +0200, Boris Fiuczynski wrote:
On 07/24/2018 10:20 AM, Andrea Bolognani wrote:
Assuming the above is a correct reading of the situation, it would seem the IDs are the only guest-visible part that we need to make sure doesn't change during the lifetime of the guest, while the usual bus/slot/function triplet assigned to the underlying PCI device doesn't actually matter. And if that's the case, wouldn't something like
<address type='zpci' uid='1' fid='1'/>
Then a pci device on s390 would need a special address type zpci in libvirt and the design idea for libvirt is to stay compatible with pci.
Being compatible with the existing PCI machinery is certainly a good idea when it makes sense to do so, but I'm not quite convinced that is the case here.
From a user perspective: I take your example below and reduce it to pci only like this: <controller type='pci' index='1' model='pci-bridge'/> <hostdev mode='subsystem' type='pci' managed='no'> <driver name='vfio'/> <source> <address domain='0xffff' bus='0x00' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x01' slot='0x1f' function='0x0'/> </hostdev> This works on x86 as well as on s390 where as your suggested zpci address type would not allow this. This is what I wanted to express with the word "compatible". As I wrote before: It would also be valid for a user to not care about the attributes domain, bus, slot and function and reduce the specified set of attributes to e.g. <address type='pci' uid='0xffff'/>
According to Cornelia's blog post on the subject, the PCI topology inside the guest will be determined entirely by the IDs. Is there even a way to eg. use bridges to create a non-flat PCI hierarchy? Or to have several PCI devices share the same bus or slot?
If none of the above applies, then that doesn't look a whole lot like PCI to me :)
Moreover, we already have several address types in addition to PCI such as USB, virtio-mmio, spapr-vio, ccw... Adding yet another one is not a problem if it makes the interface more sensible.
Sure you can add one more but wouldn't you end up with e.g. a hostdev model vfio-pci with an address type of pci on all pci supporting architectures but s390 where you need to use zpci? What would be the benefit for the user or higher level management software? Actually I would not like to introduce special handling unless required.
More concrete questions: one of the zPCI test cases includes
<controller type='pci' index='1' model='pci-bridge'/> <hostdev mode='subsystem' type='pci' managed='no'> <driver name='vfio'/> <source> <address domain='0xffff' bus='0x00' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x01' slot='0x1f' function='0x0' uid='0xffff' fid='0xffffffff'/> </hostdev>
which translates to
-device zpci,uid=3,fid=2,target=pci.1,id=zpci3 \ -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x1 \ -device zpci,uid=65535,fid=4294967295,target=hostdev0,id=zpci65535 \ -device vfio-pci,host=ffff:00:00.0,id=hostdev0,bus=pci.1,addr=0x1f \
How does the pci-bridge controller show up in the guest, if at all?
Do the bus= and addr= attributes of vfio-pci and pci-bridge in the example above matter eg. for migration purposes?
Therefore uid and fid are optional attributes and if not specified on s390 they are generated. The patch series also allows on s390 to specify the pci address type just with attributes uid and/or fid causing the rest of the pci address attributes to be generated.
This goes without saying: users should not have to worry about addresses at all unless they have very specific needs.
-- Mit freundlichen Grüßen/Kind regards Boris Fiuczynski IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Köderitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294

On Tue, 2018-07-24 at 17:15 +0200, Boris Fiuczynski wrote:
On 07/24/2018 03:41 PM, Andrea Bolognani wrote:
Being compatible with the existing PCI machinery is certainly a good idea when it makes sense to do so, but I'm not quite convinced that is the case here.
From a user perspective: I take your example below and reduce it to pci only like this:
<controller type='pci' index='1' model='pci-bridge'/> <hostdev mode='subsystem' type='pci' managed='no'> <driver name='vfio'/> <source> <address domain='0xffff' bus='0x00' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x01' slot='0x1f' function='0x0'/> </hostdev>
This works on x86 as well as on s390 where as your suggested zpci address type would not allow this. This is what I wanted to express with the word "compatible". As I wrote before: It would also be valid for a user to not care about the attributes domain, bus, slot and function and reduce the specified set of attributes to e.g. <address type='pci' uid='0xffff'/>
That's not really what users and management applications pass to libvirt, though: a more realistic example would be <hostdev mode='subsystem' type='pci'> <driver name='vfio'/> <source> <address domain='0xffff' bus='0x00' slot='0x00' function='0x0'/> </source> </hostdev> eg. you specify the host address and leave coming up with a suitable guest address entirely up to libvirt, in which case whether the resulting address is type=pci or type=zpci hardly matters. If you want to take device address assignment upon yourself, then you're gonna have to assign addresses to controllers as well, not to mention specify the entire PCI topology with all which that entails... Not exactly a common scenario.
According to Cornelia's blog post on the subject, the PCI topology inside the guest will be determined entirely by the IDs. Is there even a way to eg. use bridges to create a non-flat PCI hierarchy? Or to have several PCI devices share the same bus or slot?
If none of the above applies, then that doesn't look a whole lot like PCI to me :)
Moreover, we already have several address types in addition to PCI such as USB, virtio-mmio, spapr-vio, ccw... Adding yet another one is not a problem if it makes the interface more sensible.
Sure you can add one more but wouldn't you end up with e.g. a hostdev model vfio-pci with an address type of pci on all pci supporting architectures but s390 where you need to use zpci? What would be the benefit for the user or higher level management software? Actually I would not like to introduce special handling unless required.
I'm all for offering users an interface that abstracts as many platform-specific quirks as possible, but there's a balance to be found and we should be careful not to lean too much the opposite way. With my current understanding, it doesn't look to me like zPCI behaves similarly enough to how PCI behaves on other platforms for us to sensibly describe both using the same interface, and the fact that QEMU had to come up with a specific middleware device seems to confirm my suspicion... In any case, would you mind answering the questions below? That would certainly help me gain a better understanding of the whole issue.
More concrete questions: one of the zPCI test cases includes
<controller type='pci' index='1' model='pci-bridge'/> <hostdev mode='subsystem' type='pci' managed='no'> <driver name='vfio'/> <source> <address domain='0xffff' bus='0x00' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x01' slot='0x1f' function='0x0' uid='0xffff' fid='0xffffffff'/> </hostdev>
which translates to
-device zpci,uid=3,fid=2,target=pci.1,id=zpci3 \ -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x1 \ -device zpci,uid=65535,fid=4294967295,target=hostdev0,id=zpci65535 \ -device vfio-pci,host=ffff:00:00.0,id=hostdev0,bus=pci.1,addr=0x1f \
How does the pci-bridge controller show up in the guest, if at all?
Do the bus= and addr= attributes of vfio-pci and pci-bridge in the example above matter eg. for migration purposes?
-- Andrea Bolognani / Red Hat / Virtualization

在 2018/7/24 下午11:43, Andrea Bolognani 写道:
On 07/24/2018 03:41 PM, Andrea Bolognani wrote:
Being compatible with the existing PCI machinery is certainly a good idea when it makes sense to do so, but I'm not quite convinced that is the case here. From a user perspective: I take your example below and reduce it to pci only like this:
<controller type='pci' index='1' model='pci-bridge'/> <hostdev mode='subsystem' type='pci' managed='no'> <driver name='vfio'/> <source> <address domain='0xffff' bus='0x00' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x01' slot='0x1f' function='0x0'/> </hostdev>
This works on x86 as well as on s390 where as your suggested zpci address type would not allow this. This is what I wanted to express with the word "compatible". As I wrote before: It would also be valid for a user to not care about the attributes domain, bus, slot and function and reduce the specified set of attributes to e.g. <address type='pci' uid='0xffff'/> That's not really what users and management applications pass to
On Tue, 2018-07-24 at 17:15 +0200, Boris Fiuczynski wrote: libvirt, though: a more realistic example would be
<hostdev mode='subsystem' type='pci'> <driver name='vfio'/> <source> <address domain='0xffff' bus='0x00' slot='0x00' function='0x0'/> </source> </hostdev>
eg. you specify the host address and leave coming up with a suitable guest address entirely up to libvirt, in which case whether the resulting address is type=pci or type=zpci hardly matters.
If you want to take device address assignment upon yourself, then you're gonna have to assign addresses to controllers as well, not to mention specify the entire PCI topology with all which that entails... Not exactly a common scenario.
According to Cornelia's blog post on the subject, the PCI topology inside the guest will be determined entirely by the IDs. Is there even a way to eg. use bridges to create a non-flat PCI hierarchy? Or to have several PCI devices share the same bus or slot?
If none of the above applies, then that doesn't look a whole lot like PCI to me :)
Moreover, we already have several address types in addition to PCI such as USB, virtio-mmio, spapr-vio, ccw... Adding yet another one is not a problem if it makes the interface more sensible. Sure you can add one more but wouldn't you end up with e.g. a hostdev model vfio-pci with an address type of pci on all pci supporting architectures but s390 where you need to use zpci? What would be the benefit for the user or higher level management software? Actually I would not like to introduce special handling unless required. I'm all for offering users an interface that abstracts as many platform-specific quirks as possible, but there's a balance to be found and we should be careful not to lean too much the opposite way.
With my current understanding, it doesn't look to me like zPCI behaves similarly enough to how PCI behaves on other platforms for us to sensibly describe both using the same interface, and the fact that QEMU had to come up with a specific middleware device seems to confirm my suspicion...
In any case, would you mind answering the questions below? That would certainly help me gain a better understanding of the whole issue.
More concrete questions: one of the zPCI test cases includes
<controller type='pci' index='1' model='pci-bridge'/> <hostdev mode='subsystem' type='pci' managed='no'> <driver name='vfio'/> <source> <address domain='0xffff' bus='0x00' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x01' slot='0x1f' function='0x0' uid='0xffff' fid='0xffffffff'/> </hostdev>
which translates to
-device zpci,uid=3,fid=2,target=pci.1,id=zpci3 \ -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x1 \ -device zpci,uid=65535,fid=4294967295,target=hostdev0,id=zpci65535 \ -device vfio-pci,host=ffff:00:00.0,id=hostdev0,bus=pci.1,addr=0x1f \
How does the pci-bridge controller show up in the guest, if at all? Qemu hides pci-bridge devices and just exposes pci devices to the guest. In above example, indeed, qemu will generate a pci-bridge device and it will be existing in pci topology. But the guest can't see it. This is very special.
Do the bus= and addr= attributes of vfio-pci and pci-bridge in the example above matter eg. for migration purposes? Do you mean we leave attribute generation for bus and addr to qemu?

On Wed, 2018-07-25 at 16:34 +0800, Yi Min Zhao wrote:
在 2018/7/24 下午11:43, Andrea Bolognani 写道:
More concrete questions: one of the zPCI test cases includes
<controller type='pci' index='1' model='pci-bridge'/> <hostdev mode='subsystem' type='pci' managed='no'> <driver name='vfio'/> <source> <address domain='0xffff' bus='0x00' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x01' slot='0x1f' function='0x0' uid='0xffff' fid='0xffffffff'/> </hostdev>
which translates to
-device zpci,uid=3,fid=2,target=pci.1,id=zpci3 \ -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x1 \ -device zpci,uid=65535,fid=4294967295,target=hostdev0,id=zpci65535 \ -device vfio-pci,host=ffff:00:00.0,id=hostdev0,bus=pci.1,addr=0x1f \
How does the pci-bridge controller show up in the guest, if at all?
Qemu hides pci-bridge devices and just exposes pci devices to the guest. In above example, indeed, qemu will generate a pci-bridge device and it will be existing in pci topology. But the guest can't see it. This is very special.
Yeah, that's kinda problematic as it violates the principle of least surprise... If s390 guests can only see a flat PCI topology, then we should find a way to reject bridges altogether instead of allowing the user to create them (or even create them automatically) only for them not to show up in the guest.
Do the bus= and addr= attributes of vfio-pci and pci-bridge in the example above matter eg. for migration purposes?
Do you mean we leave attribute generation for bus and addr to qemu?
That would be the idea, but of course it can only work if the address of the underlying PCI device can change without affecting the guest in any way, including migration. If that's not the case, and the PCI address needs to be as stable as the IDs, then I don't think there's a way to avoid storing it in the guest XML, no matter how confusing that will end up looking. -- Andrea Bolognani / Red Hat / Virtualization

在 2018/7/24 下午11:43, Andrea Bolognani 写道:
More concrete questions: one of the zPCI test cases includes
<controller type='pci' index='1' model='pci-bridge'/> <hostdev mode='subsystem' type='pci' managed='no'> <driver name='vfio'/> <source> <address domain='0xffff' bus='0x00' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x01' slot='0x1f' function='0x0' uid='0xffff' fid='0xffffffff'/> </hostdev>
which translates to
-device zpci,uid=3,fid=2,target=pci.1,id=zpci3 \ -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x1 \ -device zpci,uid=65535,fid=4294967295,target=hostdev0,id=zpci65535 \ -device vfio-pci,host=ffff:00:00.0,id=hostdev0,bus=pci.1,addr=0x1f \
How does the pci-bridge controller show up in the guest, if at all? Qemu hides pci-bridge devices and just exposes pci devices to the guest. In above example, indeed, qemu will generate a pci-bridge device and it will be existing in pci topology. But the guest can't see it. This is very special. Yeah, that's kinda problematic as it violates the principle of least surprise... If s390 guests can only see a flat PCI topology, then we should find a way to reject bridges altogether instead of allowing
On Wed, 2018-07-25 at 16:34 +0800, Yi Min Zhao wrote: the user to create them (or even create them automatically) only for them not to show up in the guest. If we reject bridges, there would be only one pci bus that maximum 32 pci devices could be plugged to. This kind of limitation is more
在 2018/7/25 下午11:44, Andrea Bolognani 写道: problematic IMO.
Do the bus= and addr= attributes of vfio-pci and pci-bridge in the example above matter eg. for migration purposes? Do you mean we leave attribute generation for bus and addr to qemu? That would be the idea, but of course it can only work if the address of the underlying PCI device can change without affecting the guest in any way, including migration. If that's not the case, and the PCI address needs to be as stable as the IDs, then I don't think there's a way to avoid storing it in the guest XML, no matter how confusing that will end up looking.
I think it relies on pci base code. Although we don't expose pci address to the guest, any pci device still exists PCI tree tolopogy in qemu. IIUC, this has effect on qemu process itself. For example, if we hotplug a pci device permanently, it will be dynamically assigned with a pci address, and this address might change after shutdown and start again the guest and also might change in destination during migration.

On Thu, 2018-07-26 at 13:43 +0800, Yi Min Zhao wrote:
在 2018/7/25 下午11:44, Andrea Bolognani 写道:
How does the pci-bridge controller show up in the guest, if at all?
Qemu hides pci-bridge devices and just exposes pci devices to the guest. In above example, indeed, qemu will generate a pci-bridge device and it will be existing in pci topology. But the guest can't see it. This is very special.
Yeah, that's kinda problematic as it violates the principle of least surprise... If s390 guests can only see a flat PCI topology, then we should find a way to reject bridges altogether instead of allowing the user to create them (or even create them automatically) only for them not to show up in the guest.
If we reject bridges, there would be only one pci bus that maximum 32 pci devices could be plugged to. This kind of limitation is more problematic IMO.
I see how that would be pretty limiting.
From the test cases I see a zpci devices, with its own uid and fid, is created for the pci-bridge as well... Is that intentional?
Do the bus= and addr= attributes of vfio-pci and pci-bridge in the example above matter eg. for migration purposes?
Do you mean we leave attribute generation for bus and addr to qemu?
That would be the idea, but of course it can only work if the address of the underlying PCI device can change without affecting the guest in any way, including migration. If that's not the case, and the PCI address needs to be as stable as the IDs, then I don't think there's a way to avoid storing it in the guest XML, no matter how confusing that will end up looking.
I think it relies on pci base code. Although we don't expose pci address to the guest, any pci device still exists PCI tree tolopogy in qemu. IIUC, this has effect on qemu process itself. For example, if we hotplug a pci device permanently, it will be dynamically assigned with a pci address, and this address might change after shutdown and start again the guest and also might change in destination during migration.
Okay, if that's the case we'll definitely have to store the PCI address in the guest XML :( -- Andrea Bolognani / Red Hat / Virtualization

在 2018/7/26 下午7:00, Andrea Bolognani 写道:
> How does the pci-bridge controller show up in the guest, if at all? Qemu hides pci-bridge devices and just exposes pci devices to the guest. In above example, indeed, qemu will generate a pci-bridge device and it will be existing in pci topology. But the guest can't see it. This is very special. Yeah, that's kinda problematic as it violates the principle of least surprise... If s390 guests can only see a flat PCI topology, then we should find a way to reject bridges altogether instead of allowing the user to create them (or even create them automatically) only for them not to show up in the guest. If we reject bridges, there would be only one pci bus that maximum 32 pci devices could be plugged to. This kind of limitation is more problematic IMO. I see how that would be pretty limiting.
From the test cases I see a zpci devices, with its own uid and fid, is created for the pci-bridge as well... Is that intentional?
Firstly pci bridge can be auto-generated if a pci device is to be plugged to non-existing pci bus. IIUC, pci-bridge is treated as a controller device in libvirt. So I think, it's pretty readable not only in libvirt xml but also in qtree, if we assign zpci device for it. Otherwise address type of pci-bridge is pci type but has no uid and fid. Isn't it odd? Of course, we could avoid assigning a zpci device to pci-bridge. Actually my old patch did in this way.

On Thu, Jul 26, 2018 at 07:17:03PM +0800, Yi Min Zhao wrote:
在 2018/7/26 下午7:00, Andrea Bolognani 写道:
> > How does the pci-bridge controller show up in the guest, if at all? Qemu hides pci-bridge devices and just exposes pci devices to the guest. In above example, indeed, qemu will generate a pci-bridge device and it will be existing in pci topology. But the guest can't see it. This is very special. Yeah, that's kinda problematic as it violates the principle of least surprise... If s390 guests can only see a flat PCI topology, then we should find a way to reject bridges altogether instead of allowing the user to create them (or even create them automatically) only for them not to show up in the guest. If we reject bridges, there would be only one pci bus that maximum 32 pci devices could be plugged to. This kind of limitation is more problematic IMO. I see how that would be pretty limiting.
From the test cases I see a zpci devices, with its own uid and fid, is created for the pci-bridge as well... Is that intentional?
Firstly pci bridge can be auto-generated if a pci device is to be plugged to non-existing pci bus. IIUC, pci-bridge is treated as a controller device in libvirt. So I think, it's pretty readable not only in libvirt xml but also in qtree, if we assign zpci device for it. Otherwise address type of pci-bridge is pci type but has no uid and fid. Isn't it odd?
From the libvirt side we must avoid any scenario where QEMU auto-adds devices behind our back. If adding a device requires adding a controller
libvirt must do this explicitly and record it in the XML. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Thu, 2018-07-26 at 12:22 +0100, Daniel P. Berrangé wrote:
On Thu, Jul 26, 2018 at 07:17:03PM +0800, Yi Min Zhao wrote:
在 2018/7/26 下午7:00, Andrea Bolognani 写道:
From the test cases I see a zpci devices, with its own uid and fid, is created for the pci-bridge as well... Is that intentional?
Firstly pci bridge can be auto-generated if a pci device is to be plugged to non-existing pci bus. IIUC, pci-bridge is treated as a controller device in libvirt. So I think, it's pretty readable not only in libvirt xml but also in qtree, if we assign zpci device for it. Otherwise address type of pci-bridge is pci type but has no uid and fid. Isn't it odd?
Everything about zPCI is odd ;) I guess there's no harm in creating an additional zpci device, and as you say it will keep things a bit more consistent, which is good.
From the libvirt side we must avoid any scenario where QEMU auto-adds devices behind our back. If adding a device requires adding a controller libvirt must do this explicitly and record it in the XML.
Definitely. My question was whether the corresponding zpci device should be created as well... -- Andrea Bolognani / Red Hat / Virtualization

On Thu, Jul 26, 2018 at 01:47:52PM +0200, Andrea Bolognani wrote:
On Thu, 2018-07-26 at 12:22 +0100, Daniel P. Berrangé wrote:
On Thu, Jul 26, 2018 at 07:17:03PM +0800, Yi Min Zhao wrote:
在 2018/7/26 下午7:00, Andrea Bolognani 写道:
From the test cases I see a zpci devices, with its own uid and fid, is created for the pci-bridge as well... Is that intentional?
Firstly pci bridge can be auto-generated if a pci device is to be plugged to non-existing pci bus. IIUC, pci-bridge is treated as a controller device in libvirt. So I think, it's pretty readable not only in libvirt xml but also in qtree, if we assign zpci device for it. Otherwise address type of pci-bridge is pci type but has no uid and fid. Isn't it odd?
Everything about zPCI is odd ;)
I guess there's no harm in creating an additional zpci device, and as you say it will keep things a bit more consistent, which is good.
From the libvirt side we must avoid any scenario where QEMU auto-adds devices behind our back. If adding a device requires adding a controller libvirt must do this explicitly and record it in the XML.
Definitely. My question was whether the corresponding zpci device should be created as well...
I'm not sure I understand it fully, but it sounds like zpci devices are providing info that is guest ABI sensitive, which would mean libvirt must control and record it. So from that POV we should create zpci devices Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Thu, 2018-07-26 at 12:52 +0100, Daniel P. Berrangé wrote:
On Thu, Jul 26, 2018 at 01:47:52PM +0200, Andrea Bolognani wrote:
On Thu, 2018-07-26 at 12:22 +0100, Daniel P. Berrangé wrote:
From the libvirt side we must avoid any scenario where QEMU auto-adds devices behind our back. If adding a device requires adding a controller libvirt must do this explicitly and record it in the XML.
Definitely. My question was whether the corresponding zpci device should be created as well...
I'm not sure I understand it fully, but it sounds like zpci devices are providing info that is guest ABI sensitive, which would mean libvirt must control and record it. So from that POV we should create zpci devices
Making sure the guest-visible address is stable is taken care of by the additional PCI address attributes uid and fid, which are then used when creating the zpci device. PCI controllers such as pci-bridge are, however, not visible to the guest and as such it would be possible to skip the corresponding zpci devices; it's been argued, though, that it's better to create them regardless in order to maintain consistency. -- Andrea Bolognani / Red Hat / Virtualization
participants (7)
-
Andrea Bolognani
-
Boris Fiuczynski
-
Cornelia Huck
-
Daniel P. Berrangé
-
Ján Tomko
-
Laine Stump
-
Yi Min Zhao