On 9/16/19 10:44 AM, Nikolay Shirokovskiy wrote:
Hi, all
I recently discovered that <target dev=''/> input is allowed[*]
So you're saying this previously failed? Catastrophically (SEGV)? Or did
it log an error and refuse to start the domain?
by qemu driver's defineXML for <interface> element. On
start it will
turn into 'tap<N>'. At the same time empty value is not allowed by
schema. This name is generated by kernel on linux.
Hmm, yeah I hadn't ever thought about it, but I guess that makes sense,
since that's the way the tunctl utility works by default (opens
/dev/net/tun, then does a TUNSETIFF ioctl with an empty string as the
name of the device), and it get devices name tap<N>.
I wonder how to deal with this.
1. Fix schema to allow empty value for dev attribute. Additionally fix
migrating to clear out this autogenerated name. This way we have 2
options to autogenerate dev name. First as described above. Second
is by ommiting target element. The naming will be 'vnet<N>' in the
latter case. Having these 2 options looks redundant.
Yeah, this option is bad/undesirable.
2. Don't allow to start domains with such configuration. We can not
fail definition because disappearing VMs on libvirt update looks
too unpleasant.
1) (Nevermind - you answered below) How long has this worked? If it's
*very* new, and defining a domain with <target dev=''/> used to fail,
then it's actually a *good* thing to fail immediately.
2) If you put the check for present-but-empty target dev in
virDomainNetDefValidate(), then the validation will be done any time a
new device is parsed, but specifically *won't* be done during the domain
parse when libvirtd starts up. This avoids the "disappearing domains"
problem you mention (and is specifically why the separate validation
functions were added - there is one hypervisor-agnostic function for
each type of device, and a separate function for the entire domain (all
in conf/domain_conf.c), as well as similar hypervisor-specific
validation functions in qemu/qemu_domain.c (look for the functions below
qemuDomainDeviceDefValidate(), e.g. the function for network interfaces
is qemuDomainDeviceDefValidateNetwork(); no, don't ask me why the name
uses a different pattern than the hypervisor-agnostic functions!).
(You'll want to add this validation in the hypervisor-agnostic function,
BTW.)
This way we don't have two different set of
autogenerated names. This can even go unnoticed given we fail
such starts anyway after [1] (which is fixed by [*]).
So you're saying that <target dev=''/> has worked since
"forever" (as
far as you know), until it stopped working in 4.6.0, and the patch you
posted this morning fixes it, right?
I vote for putting a check in virDomainNetDefValidate() that errors out
if it finds target dev present but empty. There may be some who would
say "it's existing behavior and has been like this for a long time, so
we have to preserve it", but since the schema doesn't allow it and we've
never documented it, I think it's reasonable to disallow it (except for,
as you say, existing domain definitions at reboot time; NB: there will
still be an error if a domain containing this now-illegal setting is
ever edited).
[1] is
commited on 2018 Jul 23 and released in 4.6.0. However Centos 7
is based on 4.5.0 and virNetDevTapCreate has logic to copy
generated tap name from kernel like since the beginning (I managed
to follow the history to 2011).
Yes, we rely on it to fill in the numbers of the vnet<N> devices. That
way we let the kernel prevent duplicate device names rather than needing
to (attempt to) prevent races between threads, as we have to do with
macvtap devices (since macvtap doesn't have this capability).
Nikolay
[*] at least with recent "virStrncpy: fix to successfully copy empty string"
applied
[1] 7d70a63b util: Improve virStrncpy() implementation