On 7/18/19 11:56 AM, Daniel Henrique Barboza wrote:
> On 7/18/19 12:29 PM, Laine Stump wrote:
>> On 7/18/19 10:29 AM, Daniel Henrique Barboza wrote:
>>> Hi,
>>>
>>> I have a PoC that enables partial coldplug assignment of multifunction
>>> PCI devices with managed mode. At this moment, Libvirt can't handle
>>> this scenario - the code will detach only the hostdevs from the XML,
>>> when in fact all the devices in the IOMMU group need to be detached.
>>> This can be verified by the fact that Libvirt handles the unmanaged
>>> scenario well, as long as the user detaches the whole IOMMU group
>>> beforehand.
>>>
>>> I have played with 2 approaches. The one I am planning to contribute
>>> back is a change inside virHostdevGetPCIHostDeviceList() that adds
>>> the extra PCI devices for detach/re-attach in case a multifunction
>>> PCI device in managed mode is present in the XML.
>>
>>
>> If you're thinking of doing that automatically, then I should warn
>> you that we discussed that a long time ago and decided it was a bad
>> idea, because it was likely someone would, e.g., try to assign an
>> audio device to their guest that happened to be one function of a
>> multifunction device that also contained a disk controller (or some
>> other device) that the host needed for proper operation.
>
>
> Let's say that I have a multifunction PCI card with 4 functions, and I
> want a guest to use only function 0 of that card. At this moment, I'm
> only able to do that if I manually execute nodedev-detach on all 4
> functions beforehand and use function 0 as a hostdev with managed=false.
>
> What I've implemented is a way of doing the detach/re-attach of the
> whole IOMMU group for the user if the hostdev is set with managed=true
> (and perhaps I should also consider verifying the 'multifunction=yes'
> attribute as well, for more clarity). I am not trying to assign all the
> devices in the IOMMU group to the guest - not sure if that's what you
> were talking about up there, but I'm happy to emphasize that's not the
> case.
No, we're talking about the same thing. We specifically talked about the
possibility of doing exactly this several years ago, and decided against it.
> Now, yes, if the user is unaware of the consequences of detaching all
> devices of the IOMMU group from the host, bad things can happen. If
> that's what you're saying, fair enough. I can make an argument about
> how we can't shield the user from his/her own 'unawareness' forever,
> but in the end it's better to be on the safe side.
We really shouldn't do anything with any host device if it's not
explicitly given in the config.
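
(Just to make the manual alternative concrete - PCI addresses made up,
assuming all four functions share a single IOMMU group - that's:

   # virsh nodedev-detach pci_0000_01_00_0
   # virsh nodedev-detach pci_0000_01_00_1
   # virsh nodedev-detach pci_0000_01_00_2
   # virsh nodedev-detach pci_0000_01_00_3

plus a single hostdev in the domain XML for the one function the guest
actually gets:

   <hostdev mode='subsystem' type='pci' managed='no'>
     <source>
       <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
     </source>
   </hostdev>
)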
>> It may be that in *your* particular case, you understand that the
>> functions you don't want to assign to the guest are not otherwise
>> used, and it's not dangerous to suddenly detach them from their host
>> driver. But you can't assume that will always be the case.
>>
>>
>> If you *really* can't accept just assigning all the devices in that
>> IOMMU group to the guest (thus making them all explicitly listed in
>> the config, and obvious to the administrator that they won't be
>> available on the host) and simply not using them, then you either
>> need to separately detach those particular functions from the host,
>> or come up with a way of having the domain config explicitly list
>> them as "detached from the host but not actually attached to the guest".
>>
> I can live with that - it will automate the detach/re-attach process,
> which is my goal here, and it forces the user to know exactly what is
> going to be detached from the host, minimizing errors. If no one is
> against adding an extra parameter 'unassigned=true' to the hostdev in
> these cases, I can make this happen.
I don't have any ideological opinion against that (maybe there's a
better name for the attribute, but I can't think of it).
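
Something like this, maybe (purely illustrative syntax with a made-up
address - the exact spelling of the attribute is still up for grabs):

   <hostdev mode='subsystem' type='pci' managed='yes' unassigned='yes'>
     <source>
       <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
     </source>
   </hostdev>

i.e. the function gets detached from the host along with the rest of
its IOMMU group, but is never actually handed to the guest.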
But to back up a bit - what is it about managed='yes' that makes you
want to do it that way instead of managed='no'? Do you really ever need
the devices to be bound to the host driver? Or are you just using
managed='yes' because there's no standard/convenient place to
configure devices to be permanently bound to vfio-pci immediately when
the host boots? Truthfully, a great majority of the most troubling bugs
with device assignment are due to use of managed='yes', since it
exercises the kernel's device driver binding/unbinding code so much and
reveals strange races in the (usually kernel) code, but in almost all
cases the devices being assigned to guests are *never* used directly by
the host anyway, so there is no point in repeatedly rebinding the host
driver to the device - it just sits there unused [1] until the next time
it is needed by a guest, at which point it gets rebound to vfio-pci,
rinse, repeat.

I think we should spend more time making it easier to have devices
"pre-bound" to vfio-pci at boot time, so that we could discourage use
of managed='yes' (not "instead of" what you're doing, but "in addition
to" it).
[1] (In the case of network device VFs, often it isn't "unused", but
instead is *improperly* used on the host due to NetworkManager insisting
on setting the device IFF_UP and starting up a DHCP client. So it's not
just finding races in the kernel driver binding/initialization code, but
also falling prey to (imho) the poor choice of NM to force all
interfaces up and run DHCP on all unconfigured interfaces.)
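
(A partial workaround for anyone bitten by this - interface names made
up - is to explicitly mark the VFs as unmanaged in NetworkManager.conf:

   [keyfile]
   unmanaged-devices=interface-name:eth2;interface-name:eth3

but users shouldn't have to discover that on their own.)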