Re: plug pre-created tap devices to libvirt guests

30 Aug 2020

      ...
On Tue, Jun 30, 2020 at 04:02:05PM +0100, Daniel P. Berrangé wrote:
...
On Tue, Jun 30, 2020 at 12:59:03PM +0200, Miguel Duarte de Mora Barroso wrote:
...
On Mon, Apr 6, 2020 at 4:03 PM Laine Stump <lstump redhat com> wrote:
...
On 4/6/20 9:54 AM, Daniel P. Berrangé wrote:
...
On Mon, Apr 06, 2020 at 03:47:01PM +0200, Miguel Duarte de Mora Barroso wrote:
...
Hi all,
I'm aware that it is possible to plug pre-created macvtap devices to
libvirt guests - tracked in RFE [0].
My interpretation of the wording in [1] and [2] is that it is also
possible to plug pre-created tap devices into libvirt guests - that
would be a requirement to allow kubevirt to run with fewer capabilities
in the pods that encapsulate the VMs.
I took a look at the libvirt code ([3] & [4]), and, from my limited
understanding, I got the impression that plugging existing interfaces
via `managed='no'` is only possible for macvtap interfaces.
No, it works for standard tap devices as well.
The reason the BZs and commit logs talk mostly about macvtap rather than
tap is that 1) that's what kubevirt people had asked for and 2) it
already *mostly* worked for tap devices, so most of the work was related
to macvtap. (My memory is already fuzzy, but I think there were a couple
of privileged operations we still tried to do for standard tap devices
even if they were precreated. Standard disclaimer: I often misremember, so
this memory could be wrong! But precreated tap devices definitely do work.)
It's been a while since I started this thread, but lately I've
understood better how tap devices work, and that new insight makes me
wonder about a couple of things.
Our ultimate goal in kubevirt is for a kubernetes pod that doesn't have
the NET_ADMIN capability to consume a pre-created tap device.
After looking at the current libvirt code, I don't think that is
currently supported, since we'll *always* enter the
`virNetDevTapCreate` function in [1] (I'm interested in the *tap*
scenario).
The tap device is effectively created in that function - [2] - by
opening the clone device (/dev/net/tun) and calling `ioctl(fd,
TUNSETIFF,...)` on it. AFAIK, both of those operations *require* the
NET_ADMIN capability. If I'm correct, this means that the current
libvirt implementation makes our goals impossible to achieve.
AFAIK, that is not correct - CAP_NET_ADMIN isn't required to open
or create a tap device - only to add the tap device to a bridge.
So if you create the tap device & attach it to a bridge ahead of
time, libvirt should then be able to open it and give it to QEMU.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv...
((uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)) ||
                (gid_valid(tun->group) && !in_egroup_p(tun->group))) &&
              !ns_capable(net->user_ns, CAP_NET_ADMIN);
This is called by the TUNSETIFF code.
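To make the quoted condition easier to follow, here is a small Python model
of it. This is only an illustrative sketch, not kernel code: the `Caller`
class and its fields are made-up stand-ins for the process credentials,
`None` stands in for an owner or group that was never set, and
`has_net_admin` stands in for `ns_capable(net->user_ns, CAP_NET_ADMIN)`.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Caller:
    """Hypothetical stand-in for the calling process' credentials."""
    euid: int
    # Effective gid plus supplementary groups (what in_egroup_p() consults).
    groups: Set[int] = field(default_factory=set)
    # Models ns_capable(net->user_ns, CAP_NET_ADMIN).
    has_net_admin: bool = False


def tun_not_capable(owner: Optional[int], group: Optional[int],
                    caller: Caller) -> bool:
    """Return True when the modelled check would refuse the caller."""
    # uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)
    owner_mismatch = owner is not None and caller.euid != owner
    # gid_valid(tun->group) && !in_egroup_p(tun->group)
    group_mismatch = group is not None and group not in caller.groups
    # A mismatch only matters when the caller also lacks CAP_NET_ADMIN.
    return (owner_mismatch or group_mismatch) and not caller.has_net_admin
```

Under this model, a caller whose effective uid and groups match the device's
owner and group passes the check without CAP_NET_ADMIN, and any mismatch is
refused unless the caller holds CAP_NET_ADMIN.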
AFAICT, that means if you fchown(tapfd, uid, gid) to the uid+gid of
libvirtd, it should not require CAP_NET_ADMIN.
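A rough sketch of the two sides of that workflow, in Python for brevity. In
the kernel, the owner and group compared by the check quoted above are set
with the TUNSETOWNER and TUNSETGROUP ioctls, so the sketch uses those. The
function names are hypothetical, the ioctl numbers are the values from
<linux/if_tun.h> as they are encoded on x86-64 (other architectures may
differ), and the ifreq layout assumes a 64-bit Linux host. It has not been
checked against what libvirt or QEMU actually do.

```python
import fcntl
import os
import struct

# Assumed values from <linux/if_tun.h> / <linux/if.h> on x86-64.
TUNSETIFF = 0x400454CA
TUNSETPERSIST = 0x400454CB
TUNSETOWNER = 0x400454CC
TUNSETGROUP = 0x400454CE
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000
IFNAMSIZ = 16


def build_ifreq(name: str, flags: int = IFF_TAP | IFF_NO_PI) -> bytes:
    """Pack a struct ifreq: 16-byte name, 16-bit flags, padding to 40 bytes."""
    raw = name.encode()
    if len(raw) >= IFNAMSIZ:
        raise ValueError("interface name too long")
    return struct.pack("16sH22x", raw, flags)


def precreate_tap(name: str, uid: int, gid: int) -> None:
    """Privileged side: create a persistent tap owned by uid/gid."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name))  # create the device
        fcntl.ioctl(fd, TUNSETOWNER, uid)              # sets tun->owner
        fcntl.ioctl(fd, TUNSETGROUP, gid)              # sets tun->group
        fcntl.ioctl(fd, TUNSETPERSIST, 1)              # keep it after close
    finally:
        os.close(fd)


def open_existing_tap(name: str) -> int:
    """Unprivileged side: attach to the pre-created tap and return its fd."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name))
    except OSError:
        os.close(fd)  # e.g. EPERM when the ownership check refuses us
        raise
    return fd
```

The idea would be to run something like `precreate_tap("tap0", uid, gid)`
from a privileged helper, and `open_existing_tap("tap0")` from the
unprivileged process that hands the fd to the guest.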
Regards,
Daniel
I have no idea if this message will get linked into the thread properly, but
I came across this and wanted to comment on the mystery without having an actual
email to reply to or headers.

I recently ran into this issue as well, and found that even *with* NET_ADMIN at
the container level, trying to launch Qemu directly results in:

qemu-system-x86_64: -netdev tap,id=hostnet0,ifname=tap0: could not
configure /dev/net/tun (tap0): Permission denied

So as a note I'd say even Libvirt aside, Qemu is trying to do this as well:
https://github.com/qemu/qemu/blob/0982a56a551556c704dc15752dabf57b4be1c640/n...

But it's unclear where the EPERM is coming from in the kernel at tun_set_iff().

Of note, if I give Qemu a non-existing tap name, it will create it, but if
I give it an existing tap name, I get EPERM.

Marcus