On 2/22/21 1:12 AM, Laine Stump wrote:
On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote:
> virHostdevReAttachPCIDevices() is called when we want to re-attach
> a list of hostdevs back to the host, either on the shutdown path or
> via a 'virsh detach-device' call. This function always counts on the
> existence of the device in the host to work, but this can lead to
> problems. For example, an SR-IOV device can be removed via an admin
> "echo 0 > /sys/bus/pci/devices/<addr>/sriov_numvfs", making the kernel
> fire an eventfd_signal() to the process, asking the process to
> release the device. The result might vary depending on the device driver
> and OS/arch, but two possible outcomes are:
>
> 1) the hypervisor driver will detach the device from the VM, issuing a
> delete event to Libvirt. This can be observed in QEMU;
>
> 2) the 'echo 0 > ...' will hang waiting for the device to be unplugged.
> This means that the VM process failed/refused to release the hostdev back
> to the host, and the hostdev will be detached during VM shutdown.
>
> Today we don't behave well in either case. We'll fail to remove the PCI device
> reference from mgr->activePCIHostdevs and mgr->inactivePCIHostdevs because
> we rely on the existence of the PCI device config file in sysfs. Attempting
> to re-use the same device (assuming it is now back in the host)
> can result in an error like this:
>
> $ ./run tools/virsh start vm1-sriov --console
> error: Failed to start domain vm1-sriov
> error: Requested operation is not valid: PCI device 0000:01:00.2 is in use by driver QEMU, domain vm1-sriov
>
> For (1), a VM destroy/start cycle is needed to re-use the VF in the guest.
> For (2), the effect is more severe, requiring a libvirtd daemon restart
> to use the VF again in any guest.
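To make the "relies on the existence of the PCI device config file in sysfs"
point above concrete: presence effectively means the VF's directory is still
visible under /sys/bus/pci/devices/. A minimal, untested C sketch of that kind
of check (the helper name is made up for illustration and is not libvirt's
actual code):

    #include <stdio.h>
    #include <unistd.h>

    /* Illustrative only: a VF such as 0000:01:00.2 counts as "present" while
     * its sysfs config file exists; after "echo 0 > .../sriov_numvfs" the
     * path disappears and this check starts failing. */
    static int
    pciDeviceStillPresent(const char *addr)
    {
        char path[256];

        snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/config", addr);
        return access(path, F_OK) == 0;
    }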
>
> We can make it a bit better by checking, during virHostdevReAttachPCIDevices(),
> if there is any missing PCI device that would be left behind in the activePCIHostdevs
> and inactivePCIHostdevs lists. Remove any missing device found from both lists,
> unconditionally, matching the current state of the host. This change affects
> the code path in (1) (processDeviceDeletedEvent into qemuDomainRemoveDevice, all
> the way back to qemuHostdevReAttachPCIDevices) and also in (2) (qemuProcessStop
> into qemuHostdevReAttachDomainDevices).
>
> NB: Although this patch enables the possibility of 'outside Libvirt' SR-IOV
> hotunplug of PCI devices, if the hypervisor and the PCI driver cope with it,
> our goal is to mitigate what is still considered an user oopsie. For
> all
This is one of those odd cases where you use "a" before a word starting with a
vowel rather than "an". It's that way because the *sound* starting the next
word is "you" rather than "ooh".
Oh yeah, thanks for pointing this out. I learned this 20+ years ago and
forgot it along the way. Heavy metal lyrics and World of Warcraft public
chat aren't the best sources for learning proper English grammar, it
seems.
> supported purposes, the admin must remove the SR-IOV VFs from all running domains
> before removing the VFs from the host.
>
> Resolves: https://gitlab.com/libvirt/libvirt/-/issues/72
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
> src/hypervisor/virhostdev.c | 38 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 38 insertions(+)
>
> diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c
> index c708791eec..ed43733e71 100644
> --- a/src/hypervisor/virhostdev.c
> +++ b/src/hypervisor/virhostdev.c
> @@ -1077,6 +1077,40 @@ virHostdevReAttachPCIDevicesImpl(virHostdevManagerPtr mgr,
> }
> +static void
> +virHostdevDeleteMissingPCIDevices(virHostdevManagerPtr mgr,
> + virDomainHostdevDefPtr *hostdevs,
> + int nhostdevs)
> +{
> + size_t i;
> +
> + if (nhostdevs == 0)
> + return;
> +
> + virObjectLock(mgr->activePCIHostdevs);
> + virObjectLock(mgr->inactivePCIHostdevs);
> +
> + for (i = 0; i < nhostdevs; i++) {
> + virDomainHostdevDef *hostdev = hostdevs[i];
> + virDomainHostdevSubsysPCI *pcisrc = &hostdev->source.subsys.u.pci;
> + g_autoptr(virPCIDevice) pci = NULL;
> +
> + if (virHostdevGetPCIHostDevice(hostdev, &pci) != -2)
> + continue;
> +
> + /* The PCI device from 'hostdev' does not exist in the host
> + * anymore. Delete it from both active and inactive lists to
> + * reflect the current host state.
> + */
> + virPCIDeviceListDel(mgr->activePCIHostdevs, &pcisrc->addr);
> + virPCIDeviceListDel(mgr->inactivePCIHostdevs, &pcisrc->addr);
> + }
> +
> + virObjectUnlock(mgr->activePCIHostdevs);
> + virObjectUnlock(mgr->inactivePCIHostdevs);
It makes me nervous that you're unlocking these lists in the same order that you lock
them, since that could theoretically lead to a deadlock. I haven't even thought beyond
this single function to figure out whether it's even a possibility in this case, but I'd
still feel better if you unlocked in the opposite order that you lock.
That was a huge oversight. I've fixed it to unlock in the reverse order.
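For reference, the fixed hunk ends up along these lines (just a sketch of the
corrected ordering, reusing the same lock calls as in the patch above):

    virObjectLock(mgr->activePCIHostdevs);
    virObjectLock(mgr->inactivePCIHostdevs);

    /* ... drop devices that vanished from the host from both lists ... */

    /* unlock in the reverse order of locking */
    virObjectUnlock(mgr->inactivePCIHostdevs);
    virObjectUnlock(mgr->activePCIHostdevs);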
Thanks for the reviews, Laine. I dropped the patch you NACKed and fixed
everything else following your suggestions. We're in freeze ATM and this
isn't critical or urgent, so I'll wait for the next release before pushing
it.
Thanks,
DHB
With that fixed:
Reviewed-by: Laine Stump <laine@redhat.com>
> +}
> +
> +
> /* @oldStateDir:
> * For upgrade purpose: see virHostdevRestoreNetConfig
> */
> @@ -1102,6 +1136,10 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr mgr,
> virHostdevReAttachPCIDevicesImpl(mgr, drv_name, dom_name, pcidevs,
> hostdevs, nhostdevs, oldStateDir);
> +
> + /* Handle the case where PCI devices from the host went missing
> + * during the domain lifetime */
> + virHostdevDeleteMissingPCIDevices(mgr, hostdevs, nhostdevs);
> }
>