Hi Daniel,
Thanks for the response. I was thinking we could do the detach operation in libvirt so that the
user does not need to worry about whether the device is detached. But I agree that what you
suggested makes more sense, so I will go with that.
Thanks
Manish Mishra
On 17/06/21, 1:45 AM, "Daniel Henrique Barboza" <danielhb413(a)gmail.com> wrote:
On 6/9/21 4:38 PM, Manish Mishra wrote:
Hi Everyone,
We want to add extra options to the device XML to skip reattaching PCI passthrough devices.
The following is the current XML format for PCI passthrough devices added to a domain:
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x00' slot='0x1a' function='0x7'/>
  </source>
</hostdev>
When we set managed='yes' in the XML, libvirt takes responsibility for detaching the device
when the domain (guest VM) starts and reattaching it when the domain shuts down. We have
observed cases where guest VM shutdown takes a long time because it is blocked on the
reattach of a PCI passthrough device. Since the domain lock is held during this time, libvirt
is also largely unresponsive: even basic operations such as 'virsh list' are blocked.
Reattaching a device to the host can block because of a buggy driver, or because
initialization of the device itself takes a long time in some cases.
I am more interested in hearing about the problem of this buggy driver
holding the domain lock during device reattach and compromising 'virsh'
operations, and in seeing whether something can be done to mitigate that,
instead of creating an XML workaround for a driver problem.
We want to add the following extra options to resolve this:
1. *skipReAttach* (optional flag)
   In some cases we do not need to reattach the device to the host because it may be reserved
   only for guests; with this flag we can skip the reattach operation on the host. We do not
   want to modify the managed flag, to avoid regressions, so we are thinking of adding a new
   optional flag.
2. *reAttachDriverName* (optional flag)
   Name of the driver to reattach the device to instead of the default, to avoid reattaching
   to a buggy driver. Currently libvirt lets the host auto-select the driver for the device.
Yes, we can use managed='no', but in that case the user has to take responsibility for
detaching the device before starting the domain, which we do not want. Please let us know
your views on this.
The case you mentioned above, "we do not need to reattach device to host
as it may be reserved only for guests", is one of the most common uses
for managed='no' AFAIK. The user/sysadmin must detach the device from
the host, but only once. After that the device can remain detached from
the host, and guests can use it freely as long as you don't reboot the
host (or reattach the device). The scenario you described fits the
managed='no' mechanics fine IMO.
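For managed='no', the one-time detach is usually done with 'virsh nodedev-detach',
which takes libvirt's node device name rather than the raw PCI address. The sketch
below is a minimal Python illustration under stated assumptions. It takes the hostdev
XML from the original message, switched to managed='no', and derives both the sysfs
address and the node device name from <source><address>. The helper name pci_names
is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hostdev snippet from the original message, switched to managed='no'.
HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='no'>
  <source>
    <address domain='0x0000' bus='0x00' slot='0x1a' function='0x7'/>
  </source>
</hostdev>
"""


def pci_names(hostdev_xml: str) -> tuple[str, str]:
    """Return (sysfs PCI address, libvirt node device name) for a PCI hostdev."""
    addr = ET.fromstring(hostdev_xml).find("./source/address")
    if addr is None:
        raise ValueError("hostdev has no <source><address> element")
    # Attributes are hex strings such as '0x1a'; int(x, 16) accepts the 0x prefix.
    dom, bus, slot, fn = (int(addr.get(k), 16)
                          for k in ("domain", "bus", "slot", "function"))
    sysfs = f"{dom:04x}:{bus:02x}:{slot:02x}.{fn:x}"
    nodedev = f"pci_{dom:04x}_{bus:02x}_{slot:02x}_{fn:x}"
    return sysfs, nodedev


if __name__ == "__main__":
    sysfs, nodedev = pci_names(HOSTDEV_XML)
    print(sysfs)                               # 0000:00:1a.7
    print(f"virsh nodedev-detach {nodedev}")   # virsh nodedev-detach pci_0000_00_1a_7
```

The printed command is what a sysadmin would run once, before starting guests. After
that, the device stays detached until it is reattached or the host reboots.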
If you want to automate the detach process, you can use a libvirt QEMU
hook (/etc/libvirt/hooks/qemu) to detach the device when the domain
starts, if it isn't already detached. Note that this has the same effect
as the "skipReAttach" option you proposed.
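A rough Python sketch of such a hook follows. libvirt runs /etc/libvirt/hooks/qemu as
an ordinary executable with the domain name, operation and sub-operation as arguments,
and passes the full domain XML on stdin. This is an illustration, not a tested tool,
and it rests on several assumptions:

- the vfio-pci module is loaded;
- only PCI hostdevs with managed='no' are handled;
- it acts only on the "prepare" operation.

It rebinds through sysfs (driver_override, unbind, drivers_probe) rather than calling
virsh, because libvirt's hook documentation warns that a hook must not call back into
libvirt. All helper names are hypothetical.

```python
#!/usr/bin/env python3
# Sketch of /etc/libvirt/hooks/qemu (assumptions, not a tested tool):
# before a domain starts, bind its managed='no' PCI hostdevs to vfio-pci
# via sysfs if they are not already bound to it.
import os
import sys
import xml.etree.ElementTree as ET

TARGET_DRIVER = "vfio-pci"  # assumes the vfio-pci module is already loaded


def write(path: str, value: str) -> None:
    with open(path, "w") as fh:
        fh.write(value)


def unmanaged_pci_addrs(domain_xml: str) -> list[str]:
    """Sysfs addresses (DDDD:BB:SS.F) of PCI hostdevs with managed='no'."""
    addrs = []
    for hd in ET.fromstring(domain_xml).iter("hostdev"):
        # An omitted managed attribute is treated as 'no'.
        if hd.get("type") != "pci" or hd.get("managed", "no") != "no":
            continue
        a = hd.find("./source/address")
        if a is None:
            continue
        d, b, s, f = (int(a.get(k), 16)
                      for k in ("domain", "bus", "slot", "function"))
        addrs.append(f"{d:04x}:{b:02x}:{s:02x}.{f:x}")
    return addrs


def ensure_bound(addr: str, sysfs: str = "/sys") -> bool:
    """Rebind one device to TARGET_DRIVER; return True if it had to be moved."""
    dev = os.path.join(sysfs, "bus/pci/devices", addr)
    driver_link = os.path.join(dev, "driver")
    current = (os.path.basename(os.path.realpath(driver_link))
               if os.path.exists(driver_link) else None)
    if current == TARGET_DRIVER:
        return False  # already detached from the host driver: nothing to do
    # driver_override restricts the next probe of this device to TARGET_DRIVER.
    write(os.path.join(dev, "driver_override"), TARGET_DRIVER)
    if current is not None:
        write(os.path.join(driver_link, "unbind"), addr)  # release host driver
    write(os.path.join(sysfs, "bus/pci/drivers_probe"), addr)  # re-probe
    return True


def main(argv: list[str]) -> None:
    # libvirt invokes: qemu <domain> <operation> <sub-operation> <extra>
    if len(argv) < 3:
        return
    domain_xml = sys.stdin.read()  # drain stdin even when the op is ignored
    if argv[2] == "prepare":
        for addr in unmanaged_pci_addrs(domain_xml):
            ensure_bound(addr)  # an uncaught exception exits non-zero


if __name__ == "__main__":
    main(sys.argv)
```

A failure while rebinding makes the script exit non-zero. For the prepare hook, that
should cause libvirt to abort the domain start rather than boot a guest without its
device. The script has to be executable, and libvirtd may need a restart to detect a
newly added hook.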
Making a design around faulty drivers isn't ideal. If the driver you're
using starts to have problems with the detach operation as well,
'skipReAttach' will do you no good. You'll have to fall back to
'managed=no' to circumvent that.
Even setting the motivation aside, I'm not sure about the utility of
having more forms of PCI assignment management (e.g.
managed=yes|no|detach|reattach). managed=yes|no seems to cover most use
cases where the device driver works properly.
Laine, what do you think?
Thanks,
Daniel