[libvirt] Problem with virInterfaceCreate(), IFF_UP, and NetworkManager

Due to "a checkered past" (a myriad of minor issues, changing over time), libvirt's semi-official position on the virInterface*() APIs and NetworkManager is that virInterface*() is only supported if NM is disabled. We do still attempt to make it work as well as possible, but normally I only test those APIs on systems that have NM disabled and use the network service (RHEL/Fedora/CentOS systems here) instead.
On a seemingly unrelated note, a few months ago mprivozn pushed a patch that makes it an error to call virInterfaceCreate() (i.e. "ifup") for an interface that is already active. (The "active" state of an interface is determined by looking at the interface's IFF_UP flag, and also IFF_RUNNING if the interface isn't a bridge device.) Previously, this was allowed, as it is common practice to ifup an interface to make new config take effect.
Last week, I happened to test the "virsh iface-bridge" command on a system with NM enabled. That command gave an error about the interface being already active, so I tried again, this time ifdowning the interface in advance - I *still* got the error. Further investigation and questioning of NM developers led me to the realization that when NM is enabled, all interfaces *always* have IFF_UP and IFF_RUNNING set, even if they are ifdowned. Further, if NM is active there is no way to determine an interface's "active" status via ioctl() or netlink; instead, you must first determine whether NM is active and, if it is, call an NM API (I got this much information from NM developers directly; I haven't yet investigated exactly what the API is).
NM developers say that this pinning-up of the IFF_UP flag has been done for a long time, and is necessary to do interface auto-config. I think it is violating a long-standing assumption (if not a standard) about the meaning of IFF_UP, and I'm not convinced that it really is a necessity (certainly once a config file is present for an interface, it shouldn't be needed), but then I haven't spent as much time in that problem space as they have.
In the meantime, the virInterfaceCreate() API fails 100% of the time on any system that has NM enabled. My dilemma now is whether to attempt to effect change in NM's use of IFF_UP so that it once again can be used as an indicator of whether or not an interface is active, or to just give in and 1) officially declare that virInterface*() isn't supported if NM is enabled until 2) we add code to netcf that detects when NM is active and learns how to query interface status from NM instead of the standard ioctl(SIOCGIFFLAGS).
And if the latter is preferred, should we in the meantime perhaps revert the patch that made virInterfaceCreate() an error if the interface was active? Or just leave it completely broken?
Any opinions?

On 10.11.2014 16:41, Laine Stump wrote:
NM developers say that this pinning-up of the IFF_UP flag has been done for a long time, and is necessary to do interface auto-config. I think it is violating a long-standing assumption (if not a standard) about the meaning of IFF_UP, and I'm not convinced that it really is a necessity (certainly once a config file is present for an interface, it shouldn't be needed), but then I haven't spent as much time in that problem space as they have.
Looking into kernel code reveals that IFF_UP really is meant for that. I mean, from the kernel sources (include/uapi/linux/if.h):
 * @IFF_UP: interface is up. Can be toggled through sysfs.
I think NM shouldn't be misusing this. I'd suggest getting back to the NM developers and trying to persuade them to reconsider.
In the meantime, the virInterfaceCreate() API fails 100% of the time on any system that has NM enabled. My dilemma now is whether to attempt to effect change in NM's use of IFF_UP so that it once again can be used as an indicator of whether or not an interface is active, or to just give in and 1) officially declare that virInterface*() isn't supported if NM is enabled until 2) we add code to netcf that detects when NM is active and learns how to query interface status from NM instead of the standard ioctl(SIOCGIFFLAGS).
And if the latter is preferred, should we in the meantime perhaps revert the patch that made virInterfaceCreate() an error if the interface was active? Or just leave it completely broken?
We can revert the patch in the meantime, I guess.
Any opinions?
Michal

On Mon, Nov 10, 2014 at 10:41:26AM -0500, Laine Stump wrote:
NM developers say that this pinning-up of the IFF_UP flag has been done for a long time, and is necessary to do interface auto-config. I think it is violating a long-standing assumption (if not a standard) about the meaning of IFF_UP, and I'm not convinced that it really is a necessity (certainly once a config file is present for an interface, it shouldn't be needed), but then I haven't spent as much time in that problem space as they have.
Yep, I understand their motivation here - with IPv6 address auto-config, you really want your NICs to be permanently up (or "online"), so that if an IPv6 router advertisement arrives it "just works" without the user needing to turn this on manually.
IIUC, they essentially have 3 states
- offline - IFF_UP not set - don't think this is used unless you explicitly tell NM to disable the interface (ie airplane mode for the wifi NIC)
- unconfigured - IFF_UP set and no addresses present
- configured - IFF_UP set and addresses present.
Our API design is really looking at the transition between "offline" and "configured". We don't have the concept of "unconfigured" really.
In the meantime, the virInterfaceCreate() API fails 100% of the time on any system that has NM enabled. My dilemma now is whether to attempt to effect change in NM's use of IFF_UP so that it once again can be used as an indicator of whether or not an interface is active, or to just give in and 1) officially declare that virInterface*() isn't supported if NM is enabled until 2) we add code to netcf that detects when NM is active and learns how to query interface status from NM instead of the standard ioctl(SIOCGIFFLAGS).
And if the latter is preferred, should we in the meantime perhaps revert the patch that made virInterfaceCreate() an error if the interface was active? Or just leave it completely broken?
Any opinions?
It feels like when NM is present on the system, libvirt should still honour the IFF_UP flag. ie it should always report all the NICs managed by network manager to be "up" if IFF_UP is set. I think this implies we should not forbid the virInterfaceCreate API if the state is IFF_UP.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 11/10/2014 11:51 AM, Daniel P. Berrange wrote:
On Mon, Nov 10, 2014 at 10:41:26AM -0500, Laine Stump wrote:
NM developers say that this pinning-up of the IFF_UP flag has been done for a long time, and is necessary to do interface auto-config. I think it is violating a long-standing assumption (if not a standard) about the meaning of IFF_UP, and I'm not convinced that it really is a necessity (certainly once a config file is present for an interface, it shouldn't be needed), but then I haven't spent as much time in that problem space as they have.
Yep, I understand their motivation here - with IPv6 address auto-config, you really want your NICs to be permanently up (or "online"), so that if an IPv6 router advertisement arrives it "just works" without the user needing to turn this on manually.
IIUC, they essentially have 3 states
- offline - IFF_UP not set - don't think this is used unless you explicitly tell NM to disable the interface (ie airplane mode for the wifi NIC)
- unconfigured - IFF_UP set and no addresses present
- configured - IFF_UP set and addresses present.
Our API design is really looking at the transition between "offline" and "configured". We don't have the concept of "unconfigured" really.
"offline" is what I think *should* be the state for any interface that has an ifcfg file, and in that ifcfg file has the following:
  BOOTPROTO=none (no IPADDRx)
  IPV6_AUTOCONF=no (no IPV6 addresses)
*and* either of (ONBOOT=no + the device was never ifup'ed, *or* the device has been ifdowned).
In other words, in my mind running "ifdown eth0" does exactly mean that I want the interface to be "offline". That isn't what is happening though.
In the meantime, the virInterfaceCreate() API fails 100% of the time on any system that has NM enabled. My dilemma now is whether to attempt to effect change in NM's use of IFF_UP so that it once again can be used as an indicator of whether or not an interface is active, or to just give in and 1) officially declare that virInterface*() isn't supported if NM is enabled until 2) we add code to netcf that detects when NM is active and learns how to query interface status from NM instead of the standard ioctl(SIOCGIFFLAGS).
And if the latter is preferred, should we in the meantime perhaps revert the patch that made virInterfaceCreate() an error if the interface was active? Or just leave it completely broken?
Any opinions?
It feels like when NM is present on the system, libvirt should still honour the IFF_UP flag. ie it should always report all the NICs managed by network manager to be "up" if IFF_UP is set.
I think this implies we should not forbid the virInterfaceCreate API if the state is IFF_UP.
Although I don't have a problem reverting libvirt's behavior to allow ifup'ing an interface that is already active (since I was a detractor when it was added in the first place :-), I'd still get better closure if I had a good explanation of why an interface that has an associated config file and has been explicitly marked "down" (or "offline" or whatever you want to call it) by the admin needs to have IFF_UP set. I've CC'ed Dan Williams from NetworkManager in hopes that he can provide one.
participants (3): Daniel P. Berrange, Laine Stump, Michal Privoznik