Gitlab issue #72 [1] reports that removing SR-IOVs VFs before
removing the devices from the running domains can have strange
consequences. QEMU might be able to hotunplug the device inside the
guest, but Libvirt will not be aware of that, and then the guest is
now inconsistent with the domain definition.
There's also the possibility of the VFs removal not succeeding
while the domain is running but then, as soon as the domain
is shutdown, all the VFs are removed. Libvirt can't handle
the removal of the PCI devices while trying to reattach the
hostdevs, and the Libvirt daemon can be left in an inconsistent
state (see [2]).
This patch starts to address the issue related in Gitlab #72, most
notably the issue described in [2]. When shutting down a domain
with SR-IOV hostdevs that got missing, virHostdevReAttachPCIDevices()
is failing the whole process and failing to reattach all the
PCI devices, including the ones that aren't related to the VFs that
went missing. Let's make it more resilient with host changes by
changing virHostdevGetPCIHostDevice() to return an exclusive error
code '-2' for this case. virHostdevGetPCIHostDeviceList() can then
tell when virHostdevGetPCIHostDevice() failed to find the PCI
device of a hostdev and continue to make the list of PCI devices.
virHostdevReAttachPCIDevices() will now be able to proceed reattaching
all other valid PCI devices, at least. The 'ghost hostdevs' will be
handled later on.
[1]
https://gitlab.com/libvirt/libvirt/-/issues/72
[2]
https://gitlab.com/libvirt/libvirt/-/issues/72#note_459032148
Signed-off-by: Daniel Henrique Barboza <danielhb413(a)gmail.com>
---
src/hypervisor/virhostdev.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c
index 7338ac0700..18373deb41 100644
--- a/src/hypervisor/virhostdev.c
+++ b/src/hypervisor/virhostdev.c
@@ -220,7 +220,8 @@ virHostdevManagerGetDefault(void)
* is returned.
*
* Returns: 0 on success (@pci might be NULL though),
- * -1 otherwise (with error reported).
+ * -1 otherwise (with error reported),
+ * -2 PCI device not found. @pci will be NULL
*/
static int
virHostdevGetPCIHostDevice(const virDomainHostdevDef *hostdev,
@@ -235,6 +236,9 @@ virHostdevGetPCIHostDevice(const virDomainHostdevDef *hostdev,
hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI)
return 0;
+ if (!virPCIDeviceExists(&pcisrc->addr))
+ return -2;
+
actual = virPCIDeviceNew(&pcisrc->addr);
if (!actual)
@@ -270,7 +274,7 @@ virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr *hostdevs, int
nhostdevs)
virDomainHostdevDefPtr hostdev = hostdevs[i];
g_autoptr(virPCIDevice) pci = NULL;
- if (virHostdevGetPCIHostDevice(hostdev, &pci) < 0)
+ if (virHostdevGetPCIHostDevice(hostdev, &pci) == -1)
return NULL;
if (!pci)
--
2.26.2