On Mon, 7 Oct 2019 18:11:32 -0300
Daniel Henrique Barboza <danielhb413(a)gmail.com> wrote:
(--- long post warning ---)
This is a work that derived from the discussions I had with
Laine Stump and Alex Williamson in [1]. I'll provide a quick
gist below.
----------
Today, Libvirt does not have proper support for partial
assignment of functions of passed-through PCI multifunction
devices (hostdev with VFIO-PCI). By partial assignment I mean
the guest being able to use just some, not all, virtual functions
of the device. Even if the functions themselves become useless in
the host, some functions might not be safe to be used
by the guest, thus the user should be able to limit them.
Not safe in what way? Patch 2/4 says some devices might be "security
sensitive", but the fact that this patch is necessary implies that the
host kernel already considers the devices non-isolated. They must be
in the same iommu group to have this issue. Is there a concrete
example of a device where a user would want this configuration? The
case I can think of is not a security issue, but a functional one
where GPU and audio functions are grouped together and maybe the audio
function doesn't work well when assigned, or maybe we just want the
guest to default to another audio device and it's easier if we just
don't expose this on-card audio.
I mentioned 'proper' because today it is possible to get this
done in Libvirt if we use managed='no' in the hostdevs. If the
user makes the proper setup (i.e. detaching all devices in the IOMMU
group) and uses managed='no', Libvirt will launch the guest just with the
functions declared in the XML. The technical reason for this is
simple: in virHostdevPreparePCIDevices() we do not take into account
that multifunction PCI devices require the whole IOMMU group to be
detached, not just the devices being declared in def->hostdevs.
As a result, managed='yes' will not work in this scenario, causing
errors in QEMU launch.
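For illustration only, the managed='no' workaround described above might
look like the fragment below. It is a minimal sketch that assumes a
hypothetical two-function device at 0000:01:00.0 and 0000:01:00.1 (made-up
addresses), and that the user has already detached both functions from
their host drivers by hand. Only function 0 is declared, so only function 0
reaches the guest.

```xml
<!-- Hypothetical device: 0000:01:00.0 and 0000:01:00.1 share an IOMMU group.
     Both were detached from the host manually before starting the guest.
     Function 1 is deliberately not declared, so the guest never sees it. -->
<hostdev mode='subsystem' type='pci' managed='no'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
</hostdev>
```

The same fragment with managed='yes' is the case that fails at QEMU launch,
because Libvirt would detach only the declared function.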
The discussion I started in [1] was motivated by my attempt
at automatically detaching the IOMMU group inside the prepare function
with managed='yes' devices. Laine discarded this idea, arguing
that the concept of partial assignment will cause user confusion
if Libvirt starts to handle things without the user being fully
aware. [1] also discussed the possibility of declaring the
functions that won't be assigned to the guest in the XML, forcing
the user to be aware that these functions will be lost in the host,
as a possible approach for a solution.
-----------
This series tries to solve the partial assignment of multifunction
hostdev PCI devices by introducing a new hostdev attribute called
'assigned'. This is how it works:
- it is a boolean value that will be effective just for
multifunction hostdev PCI devices, since there's no other
occurrence for this kind of use in Libvirt. Trying to
declare assigned='yes|no' in any other PCI hostdev device
will cause parse errors;
- default value if the attribute is not present is
'assigned=yes';
- <address> element will be forbidden if the hostdev is declared
with assigned='no'. This is to make it more evident to the user
that this is a function that the guest will NOT be using, with
a bonus that we will not need to calculate an address that
won't be used;
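As a rough illustration of the three rules above, a guest definition under
this proposal might look like the sketch below. The 'assigned' attribute
exists only in this series, not in released Libvirt; its placement on the
<hostdev> element is one reading of the description, and all PCI addresses
are made up.

```xml
<!-- Function 0: given to the guest. 'assigned' is omitted, so it
     defaults to 'yes' and a guest <address> is allowed. -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</hostdev>

<!-- Function 1: declared so it is detached from the host, but NOT
     exposed to the guest. Proposed attribute, hypothetical syntax. -->
<hostdev mode='subsystem' type='pci' managed='yes' assigned='no'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
  </source>
  <!-- no guest <address> here: it would be rejected with assigned='no' -->
</hostdev>
```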
It seems more intuitive to me to use the guest <address> element to
expose this. libvirt often makes use of 'none' to declare empty
devices, so maybe <address type='none'/> would be more in line with
precedent. Thanks,
Alex