On Fri, 2017-08-25 at 12:12 +0200, Peter Krempa wrote:
> > @@ -1012,6 +1012,18 @@
qemuDomainFillDeviceIsolationGroup(virDomainDefPtr def,
> > tmp = virPCIDeviceAddressGetIOMMUGroupNum(hostAddr);
> >
> > if (tmp < 0) {
> > + /* If there's already a PCI address assigned to the device
> > + * we move on instead of erroring out.
> > + *
> > + * This means we still don't allow non-existing host
> > + * devices to be assigned to guests; however, if the host
> > + * device existed at some point in the past but no longer
> > + * does, which can happen very easily when dealing with VFs,
> > + * we prevent the guest from disappearing and give the user
> > + * an opportunity to edit its configuration */
> > + if (virDeviceInfoPCIAddressPresent(info))
> > + goto skip;
> > +
> > virReportError(VIR_ERR_INTERNAL_ERROR,
> > _("Can't look up isolation group for host
device "
> > "%04x:%02x:%02x.%x"),
Why on earth is this in the domain address callback? That means that
it's filled on parse. That's silly for a thing like the isolation group.
The isolation group is necessary only when you are going to start the VM
so only then it should be determined. That would prevent a bug like this
altogether.
Also there's PCI hotplug...
Well, the whole point of introducing isolation groups
was so that isolation constraint would be respected when
assigning PCI addresses, and guest startup is way too
late for that.
If host devices are not isolated correctly on pSeries
guests, depending on the situation the result can be
either degraded error handling (undesirable, but maybe
not too bad) or the guest failing to start (pretty bad).
So picking the right PCI addresses is critical, and we
need to know the isolation group in order to do it.
Note that these constraints only apply to pSeries, and
accordingly the code above only runs for such guests.
That said, I hadn't realized that configuring guests
to use devices that are not yet or no longer available
on the host was supposed to work, but since that's the
case the current behavior is basically a regression
because it prevents the users from doing something
that they would have been able to do in the past.
I'll post a v2 that changes the behavior so that we
fall back to the old ways (no isolation) for devices
that are not available on the host at the time the
guest is configured.
Thanks for pointing out my mistake :)
--
Andrea Bolognani / Red Hat / Virtualization