Sorry for the delay in answering. I usually get the replies to my
patches in my inbox because I usually get my email CC'ed in the
reply.
On 1/20/21 10:33 PM, Laine Stump wrote:
On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote:
> Gitlab issue #72 [1] reports that removing SR-IOVs VFs before
> removing the devices from the running domains can have strange
> consequences.
Resolution: "Don't do that!!" :-)
Seriously, though, there are any number of things that someone with root access to the
host could do that would completely confuse libvirt and break everything. For example,
assigning a PF to a guest after having assigned one or more of that PF's VFs to
guests.
But I guess there's no harm in trying to mitigate the damage. As long as it
doesn't complicate the code to the extent that it becomes more confusing, difficult to
maintain, and liable to other regressions.
This was the idea with this series. I'm trying my best to not interfere
with working code/logic by adding these mitigations.
> QEMU might be able to hotunplug the device inside the
> guest, but Libvirt will not be aware of that,
If qemu knows enough to unplug the device from the guest, then I'm presuming that it
must also be sending a DEVICE_DELETED event up to libvirt. Of course libvirt isn't
expecting this, and so probably throws it away, right?
---- long story warning ----
We have 2 scenarios I'm trying to deal with in this work: a SR-IOV disappearing
from the host, while a running domain was using one of its functions, in
a condition where:
1) the VF was coldplugged in the domain (i.e. it was in the domain XML
before starting the domain)
2) the VF was hotplugged in the domain
In scenario (2), QEMU will throw a DEVICE_DEL event and we'll capture it
accordingly via processDeviceDeleteEvent ->qemuDomainRemoveDevice ->
qemuDomainRemoveHostDevice. The problem is,by the time we reach
qemuDomainRemoveHostDevice, the VF is already gone from the host and we didn't
foresee this as a possibility.
All of this is pinned on how we interpret virPCIDeviceNew(). My initial idea
for this work was to change virPCIDeviceNew() to allow the object to be created,
regardless of whether the device actually exists in the host, and then we
would have a 'present' flag to determine if the device is present or not.
I got terrified by the idea of changing the semantics of virPCIDeviceNew(),
which will return -1 if the device isn't present, for all Libvirt devices that
uses the object, just to handle a niche case. I decided then to do the
'device is present' checks you'll see around in this series.
As far as scenario (1) goes, I don't fully understand why it happens TBH. For
QEMU there is no difference - QEMU will throw a DEVICE_DEL event for a VF that
disappears from the host, even if the VF was coldplugged. When launching the VM
with Libvirt, QEMU internals aren't aware of kernel intent to remove the device.
Looking in the vfio driver in the kernel I learned that this intent is expressed by
an eventfd call of vfio_pci_request. My somewhat educated guess is that Libvirt
is wrapping up the QEMU process in a way that QEMU never sees this eventfd. The
result, as you can see in the bug, is that unless the admin kills the terminal where
"echo 0 > num_vfs" was issue, 'echo' will lock for as long as the
domain using the VF
keeps running.
And when the domain is shut down the VF is removed from the process, and the host,
and then we have a similar problem in our shutdown path, via qemuProcessStop.
> and then the guest is
> now inconsistent with the domain definition.
>
> There's also the possibility of the VFs removal not succeeding
> while the domain is running but then, as soon as the domain
> is shutdown, all the VFs are removed. Libvirt can't handle
> the removal of the PCI devices while trying to reattach the
> hostdevs, and the Libvirt daemon can be left in an inconsistent
> state (see [2]).
>
> This patch starts to address the issue related in Gitlab #72, most
> notably the issue described in [2]. When shutting down a domain
> with SR-IOV hostdevs that got missing, virHostdevReAttachPCIDevices()
> is failing the whole process and failing to reattach all the
> PCI devices, including the ones that aren't related to the VFs that
> went missing. Let's make it more resilient with host changes by
> changing virHostdevGetPCIHostDevice() to return an exclusive error
> code '-2' for this case. virHostdevGetPCIHostDeviceList() can then
> tell when virHostdevGetPCIHostDevice() failed to find the PCI
> device of a hostdev and continue to make the list of PCI devices.
>
> virHostdevReAttachPCIDevices() will now be able to proceed reattaching
> all other valid PCI devices, at least. The 'ghost hostdevs' will be
> handled later on.
>
> [1]
https://gitlab.com/libvirt/libvirt/-/issues/72
> [2]
https://gitlab.com/libvirt/libvirt/-/issues/72#note_459032148
>
> Signed-off-by: Daniel Henrique Barboza <danielhb413(a)gmail.com>
> ---
> src/hypervisor/virhostdev.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c
> index bd35397f2c..dbba36193b 100644
> --- a/src/hypervisor/virhostdev.c
> +++ b/src/hypervisor/virhostdev.c
> @@ -220,7 +220,8 @@ virHostdevManagerGetDefault(void)
> * is returned.
> *
> * Returns: 0 on success (@pci might be NULL though),
> - * -1 otherwise (with error reported).
> + * -1 otherwise (with error reported),
> + * -2 PCI device not found. @pci will be NULL
> */
> static int
> virHostdevGetPCIHostDevice(const virDomainHostdevDef *hostdev,
> @@ -235,6 +236,9 @@ virHostdevGetPCIHostDevice(const virDomainHostdevDef *hostdev,
> hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI)
> return 0;
> + if (!virPCIDeviceExists(&pcisrc->addr))
> + return -2;
What happens if the device disappeared and then came back before you got here? (i.e. by
setting sriov_numvfs = 0, and then sriov_numvfs = 128 (or whatever)?
I'm not sure what would happen. I never though that this could trigger a TOCTOU
situation.
I **think** that this wouldn't be too nefarious - we would remove the device from
the domain definition and the device would be there in the host. Would need to
try to simulate this to be sure though.
Thanks,
DHB
It's too late in the day for my to process this too much, so I'm going to come
back to it tomorrow. I will continue on to see if there is other "simple" stuff
in subsequent patches though - even if we can't get all of this pushed right away,
there's a significant amount that can be pushed.
> +
> actual = virPCIDeviceNew(&pcisrc->addr);
> if (!actual)
> @@ -270,7 +274,7 @@ virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr *hostdevs,
int nhostdevs)
> virDomainHostdevDefPtr hostdev = hostdevs[i];
> g_autoptr(virPCIDevice) pci = NULL;
> - if (virHostdevGetPCIHostDevice(hostdev, &pci) < 0)
> + if (virHostdevGetPCIHostDevice(hostdev, &pci) == -1)
> return NULL;
> if (!pci)