On 2/1/22 09:28, Dmitrii Shcherbakov wrote:
SmartNIC DPUs may not expose some privileged eswitch operations
to the hypervisor hosts. For example, this happens with Bluefield
devices running in the ECPF (default) mode [1] for security reasons. While
VF MAC address programming is possible via an RTM_SETLINK operation,
trying to set a VLAN ID in the same operation may fail with EPERM.
Specifically for the mlx5 driver this behavior was altered in the Linux
kernel upstream [2] to avoid getting EPERM when trying to program VLAN 0.
This may also get backported to older downstream kernels (e.g. [3]).
However, Libvirt could potentially handle this case gracefully without
needing a specific kernel version or depending on a specific driver fix.
In the kernel a relevant call chain may look like
do_setlink -> do_setvfinfo -> dev->netdev_ops->set_vf_vlan
which calls a driver-specific function like [4] eventually.
The equivalent ip link commands below provide an illustration:
1. This will work without an error:
sudo ip link set enp130s0f0 vf 2 mac de:ad:be:ef:ca:fe
2. Setting (or clearing) a VLAN will fail with EPERM:
sudo ip link set enp130s0f0 vf 2 vlan 0
RTNETLINK answers: Operation not permitted
3. This is what Libvirt attempts to do today when trying to clear a
VF VLAN at the same time as programming a VF MAC:
sudo ip link set enp130s0f0 vf 2 vlan 0 mac de:ad:be:ef:ca:fe
RTNETLINK answers: Operation not permitted
If setting an explicit VLAN ID results in an EPERM, clearing a VLAN
(setting a VLAN ID to 0) can be handled gracefully by ignoring the
EPERM error with the rationale being that if we cannot set this state
in the first place, we cannot clear it either.
Thus, virNetDevSetVfConfig is split into two distinct functions. If
clearing a VLAN ID fails with EPERM when clearing is implicit, the
error is simply ignored. For explicit clearing EPERM is still a
fatal error.
Both new functions rely virNetDevSendVfSetLinkRequest that implements
common functionality related to formatting a request, sending it and
handling error conditions and returns 0 or an error since in both cases
the payload is either NLMSG_DONE (no error) or NLMSG_ERROR where an
error message is needed by the caller to handle known cases
appropriately. This function allows the conditional code to be unit tested.
An alternative to this could be providing a higher level control plane
mechanism that would provide metadata about a device being remotely
managed in which case Libvirt would avoid trying to set or clear a
VLAN ID. This would be more complicated since other software (like Nova
in the OpenStack case) would have to annotate every guest device with an
attribute indicating whether a device is remotely managed or not based
on operator provided configuration so that Libvirt can act on this and
avoid VLAN programming.
https://gitlab.com/dmitriis/libvirt/-/pipelines/460293963
v8 change:
* Rebased on top of the latest changes to Libvirt;
* Added relevant upstream Linux and downstream kernel references to
this cover letter.
[1]
https://docs.mellanox.com/display/BlueFieldSWv35111601/Modes+of+Operation...
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit...
[3]
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1957753
[4]
https://github.com/torvalds/linux/blob/v5.15/drivers/net/ethernet/mellano...
Dmitrii Shcherbakov (3):
Set VF MAC and VLAN ID in two different operations
Allow VF vlanid to be passed as a pointer
Ignore EPERM on implicit clearing of VF VLAN ID
NEWS.rst | 14 ++
src/hypervisor/virhostdev.c | 4 +-
src/libvirt_private.syms | 7 +
src/util/virnetdev.c | 256 +++++++++++++++++++++++++-----------
src/util/virnetdevpriv.h | 44 +++++++
tests/virnetdevtest.c | 249 ++++++++++++++++++++++++++++++++++-
6 files changed, 496 insertions(+), 78 deletions(-)
create mode 100644 src/util/virnetdevpriv.h
Sorry for allowing this go to v8 without proper review. To make it up to
you, here's my suggestion: problems I've raised in my review are easy to
solve. I've uploaded the 'fixup' commits onto my gilab here:
https://gitlab.com/MichalPrivoznik/libvirt/-/commits/vf_vlan_id/
and if you agree, I'd squash them in respective commits and merge.
Laine, any thoughts?
Michal