HI Jan,
The spapr-vfio-pci-host-bridge controller is being reworked in
qemu-ppc to handle all the iommu
groups generically(i.e iommu group need not be mentioned in the CLI).
https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-February/124922.html
This would help avoiding the sysfs scanning and allows further simplification
of libvirt patches.
I will post my patches post testing once the qemu changes are merged.
Thanks,
Shiva
On Tue, Dec 9, 2014 at 3:21 PM, Shivaprasad bhat
<shivaprasadbhat(a)gmail.com> wrote:
On Mon, Dec 8, 2014 at 9:01 PM, Ján Tomko <jtomko(a)redhat.com>
wrote:
> This has a qemu: prefix, even though the parsing is done in conf:
>
> It would look nicer split into several commits:
> 1) parsing the new controller type (without auto-adding them) + qemuxml2xml
> test of the controllers
> 2) auto-adding the controllers
> 3) adding the 0 domain parameter to many functions (if needed)
>
I added the prefix to mention this is for qemu only. With the
splitting that you are
suggesting, I think I can be more specific.
> On 11/17/2014 11:04 AM, Shivaprasad G Bhat wrote:
>> This patch will get the iommu group for host devices by XML configuration
>> of the vfio host bridge controller or from the sysfs.
>>
>> On pseries, the iommu group can be shared by multiple host devices and
>> they would share the same spapr-vfio-host-bus controller. A new
>> controller is added for every new iommu group. Every spapr-vfio-host-bridge
>> in the cli creates a new pci domain in the guest. For Example,
>> -device spapr-pci-vfio-host-bridge,iommu=1,id=SOMEDOMAIN,index=1
>> The "SOMEDOMAIN" is the id for new pci domain inside guest.
>>
>> spapr-pci-vfio-host-bridge is actually a PCI host bridge with VFIO support.
>> It hosts pci-bridges, and has all the features/behaviours similar to the
>> default emulated pci host bridge. The controller can host both pci and pcie
>> devices. For convenience, this root controller uses model "pci-root".
>>
>> The sample controller tags would look like below:
>> <controller type='spapr-vfio-pci' index='0'
model='pci-root'
>> iommuGroupNum='3' domain='1'/>
>> <controller type='spapr-vfio-pci' index='1'
model='pci-bridge'
>> iommuGroupNum='3' domain='1'>
>
> Would it make sense to fill out the iommuGroup on domain startup, depending on
> the devices that were assigned there?
>
> So that libvirt user can do:
> <controller type='spapr-vfio-pci' domain='1'/>
> <controller type='spapr-vfio-pci' domain='2'/>
>
> Then attach two hostdevs to these domains without knowing their IOMMU group?
>
> (Although the user needs to know which devices are in the same group to attach
> more than one to the same domain)
I have added the iommu group to the controller to have 1:1 mapping of
domain to iommu. This helps in validating the Addresses to say if a
device is assigned to the correct domain address. (patch 3 does this).
As you suggested below in one of the comments, If the iommu group detection be
delayed till the process start, I think we can do away with iommu group
in the controller xml. I'll see how I can avoid this in v3.
>
>> <address type='pci' domain='0x0001' bus='0x00'
slot='0x02'
>> function='0x0'/>
>> </controller>
>> <controller type='spapr-vfio-pci' index='0'
model='pci-root'
>> iommuGroupNum='13' domain='2'/>
>>
>> Like other architectures, unassigned and unmanaged devices in the iommu group
>> need to be detached manually before the guest is created .
>>
>> The spapr-vfio-pci controllers are removed when there are no corresponding
>> hostdevices in xml.
>>
>> Signed-off-by: Shivaprasad G Bhat <sbhat(a)linux.vnet.ibm.com>
>> Signed-off-by: Pradipta Kumar Banerjee <bpradip(a)in.ibm.com>
>> Reviewed-by: Prerna Saxena <prerna(a)linux.vnet.ibm.com>
>> ---
>> docs/schemas/domaincommon.rng | 28 +++++++
>> src/bhyve/bhyve_domain.c | 2
>> src/conf/domain_conf.c | 165 +++++++++++++++++++++++++++++++++++++++--
>> src/conf/domain_conf.h | 19 +++++
>> src/libvirt_private.syms | 1
>> src/qemu/qemu_command.c | 4 +
>> src/qemu/qemu_domain.c | 12 +--
>> src/qemu/qemu_driver.c | 6 +
>> 8 files changed, 222 insertions(+), 15 deletions(-)
>>
>
>
>> @@ -12113,6 +12165,95 @@ virDomainResourceDefParse(xmlNodePtr node,
>> return NULL;
>> }
>>
>> +int
>> +virDomainDefMaybeAddHostdevSpaprPCIVfioControllers(virDomainDefPtr def)
>> +{
>> + size_t i, j;
>> + virDomainHostdevDefPtr hostdev;
>> + virDomainControllerDefPtr controller;
>> + int ret = -1;
>> + int maxDomainId = 0;
>> + int skip;
>> +
>> + if (!(ARCH_IS_PPC64(def->os.arch)) ||
>> + !(def->os.machine && STRPREFIX(def->os.machine,
"pseries")))
>> + return 0;
>> +
>> + for (i = 0; i < def->nhostdevs; i++) {
>> + hostdev = def->hostdevs[i];
>> + if (IS_PCI_VFIO_HOSTDEV(hostdev))
>> + hostdev->source.subsys.u.pci.iommu = -1;
>> + }
>> + /* The hostdevs belonging to same iommu are
>> + * all part of same domain.
>> + */
>> + for (i = 0; i < def->ncontrollers; i++) {
>> + controller = def->controllers[i];
>> + if (controller->type == VIR_DOMAIN_CONTROLLER_TYPE_SPAPR_PCI_VFIO
&&
>> + controller->model == VIR_DOMAIN_CONTROLLER_MODEL_PCI_ROOT)
>> + for (j = 0; j < def->nhostdevs; j++) {
>> + hostdev = def->hostdevs[j];
>> + if (IS_PCI_VFIO_HOSTDEV(hostdev))
>> + if (hostdev->info->addr.pci.domain ==
controller->domain)
>> + hostdev->source.subsys.u.pci.iommu =
controller->opts.spaprvfio.iommuGroupNum;
>> + }
>> + if (controller->domain > maxDomainId)
>> + maxDomainId = controller->domain;
>> + }
>> + /* If the spapr-vfio controller doesnt exist for the hostdev
>> + * add a controller for that iommu group.
>> + */
>> + for (i = 0; i < def->nhostdevs; i++) {
>> + skip = 0;
>> + hostdev = def->hostdevs[i];
>> + if (IS_PCI_VFIO_HOSTDEV(hostdev)) {
>> + virPCIDeviceAddressPtr addr;
>> + int iommu = -1;
>> + if (hostdev->source.subsys.u.pci.iommu == -1) {
>> + addr =
(virPCIDeviceAddressPtr)&hostdev->source.subsys.u.pci.addr;
>> + if ((iommu = virPCIDeviceAddressGetIOMMUGroupNum(addr)) < 0)
>> + goto error;
>
> I don't know if we should access /sysfs on XML parsing. Could we add the
> controllers at startup instead?
Are you saying we auto-add the controller around, qemuProcessStart OR
qemuPrepareHostDevices()
function calls? The normal domxml-to-native would still need these
controllers right ?
>
>> + hostdev->source.subsys.u.pci.iommu = iommu;
>> +
>> + for (j = 0; j < def->ncontrollers; j++) {
>> + controller = def->controllers[j];
>> + if (controller->type ==
VIR_DOMAIN_CONTROLLER_TYPE_SPAPR_PCI_VFIO &&
>> + controller->model ==
VIR_DOMAIN_CONTROLLER_MODEL_PCI_ROOT) {
>> + if (iommu ==
controller->opts.spaprvfio.iommuGroupNum)
>> + skip = 1;
>> + }
>> + }
>> + if (skip)
>> + continue;
>> + if (virDomainDefMaybeAddController(def,
VIR_DOMAIN_CONTROLLER_TYPE_SPAPR_PCI_VFIO,
>> + ++maxDomainId, 0,
VIR_DOMAIN_CONTROLLER_MODEL_PCI_ROOT) < 0)
>> + goto error;
>> +
def->controllers[def->ncontrollers-1]->opts.spaprvfio.iommuGroupNum = iommu;
>> + }
>> + }
>> + }
>> +
>> + /* Remove redundant controllers which dont serve any hostdevs. */
>
> We don't delete unused controllers of other types, what if the user explicitly
> requests them?
If there are stale controllers, the iommu group which they serve would
get busy, as the
guest would start using them even though there are no hostdevs from
that group. Since
we auto-add them, I felt its better to remove them if unnecessary.
>
> Jan
>
>> + for (i = 0; i < def->ncontrollers; i++) {
>> + controller = def->controllers[i];
>> + if (controller->type == VIR_DOMAIN_CONTROLLER_TYPE_SPAPR_PCI_VFIO) {
>> + int usedController = 0;
>> + for (j = 0; j < def->nhostdevs; j++) {
>> + hostdev = def->hostdevs[j];
>> + if (IS_PCI_VFIO_HOSTDEV(hostdev))
>> + if (hostdev->source.subsys.u.pci.iommu ==
controller->opts.spaprvfio.iommuGroupNum)
>> + usedController = 1;
>> + }
>
>