On 12/16/19 6:03 PM, Daniel Henrique Barboza wrote:
On 12/16/19 7:28 PM, Cole Robinson wrote:
> On 12/16/19 8:36 AM, Daniel Henrique Barboza wrote:
>> changes from version 3 [1]:
>> - removed last 2 patches that made function 0 of
>> PCI multifunction devices mandatory
>> - new patch: news.xml update
>> - changed 'since' version to 6.0.0 in patch 4
>> - unassigned hostdevs are now getting qemu aliases
>>
>> [1]
>>
https://www.redhat.com/archives/libvir-list/2019-November/msg01263.html
>>
>> Daniel Henrique Barboza (5):
>> Introducing new address type='unassigned' for PCI hostdevs
>> qemu: handle unassigned PCI hostdevs in command line
>> virhostdev.c: check all IOMMU devs in virHostdevPreparePCIDevices
>> formatdomain.html.in: document <address type='unassigned'/>
>> news.xml: add address type='unassigned' entry
>>
>
> Codewise it looks fine now. But I'm looking more closely at patch #3 and
> realizing that it can explicitly reject a previously accepted VM config.
> And indeed, now that I give it a test with my GPU passthrough setup, it
> is rejecting my previosly working config.
>
> error: Requested operation is not valid: All devices of the same IOMMU
> group 1 of the PCI device 0000:01:00.0 must belong to domain win10
>
> I've attached the nodedev XML for the three devices with iommuGroup 1.
> Only the two nvidia devices are assigned to my VM, but not the PCIe
> controller device.
>
> Is the libvirt heuristic missing something? Or is this acting as
> expected?
You mentioned that you declared 3 devices of IOMMU group 1. Unless the
code in
patch 3 has a bug, there are more PCI hostdevs in IOMMU group 1 that
were left
out of the domain XML.
>
> I didn't quite gather that this is a change to reject previously
> accepted configurations, so I will defer to Laine and Alex as to whether
> this should be committed.
I mentioned in the commit msg of patch 03 that this would break working
configurations that didn't comply with the new 'all devices of the IOMMU
group must be included in the domain XML' directive. Perhaps this is
worth
mentioning in the 'news' page to warn users about it.
No, this shouldn't be a requirement at all. In my mind the purpose of
these patches is to make something work (in a safe manner) that failed
before, *not* to add new restrictions that break things that already
work. (Sorry I wasn't paying more attention to the patches earlier).
About breaking existing configurations, there is the possibility of not
going forward with patch 03, which is enforcing this rule of declaring
all the
IOMMU group. Existing domains will keep working as usual, the option to
unassign devices will still be present, but the user will have to deal
with
the potential QEMU errors if not all PCI devices were detached from
the host.
In this case, the 'unassigned' type will become more of a ON/OFF
switch to
add/remove the PCI hostdev from the guest without removing it from the
domain XML. It is still useful, but we lose the idea of all the IOMMU
devices being described in the domain XML, which is something Laine
mentioned it would be desirable in one of the RFCs.
I don't actually recall saying that :-). I haven't looked in the list
archives, but what I *can* imagine myself saying is that only devices
mentioned in the XML should be manipulated in any way by libvirt. So,
for example, you shouldn't unbind device X from its host driver if there
is nothing in the XML telling you to do that. But if a device isn't
mentioned in the XML, and is already bound to some driver that is
acceptable to the VFIO subsystem (e.g. vfio-pci, pci-stub or no driver
at all (? is that right Alex?)) then that should not create any problem.
Doing otherwise would break too many existing configs. (For example, my
own assigned-GPU config, which assumes that all the devices are already
bound to the proper driver, and uses "managed='no'")