[libvirt] [PATCH] macvtap: Work-around failing nl_connect calls (weird problem)

When trying to start / stop a domain with macvtap device (direct type of interface) having a device description like this one here <interface type='direct'> <source dev='static' mode='vepa'/> </interface> then I see netlink related errors when a 'virsh edit' session is happening at the same time. So, to reproduce this error you should try the following (on a kernel supporting macvtap): virsh edit <macvtap domain> -> do not terminate the edit sessions virsh start <macvtap domain> -> works virsh destroy <macvtap domain> -> leaves a macvtap device due to nl_connect failing virsh start <macvtap domain> -> does not start anymore That should make it fail. The work-around basically keeps on allocating new netlink library handles until the nl_connect() succeeds. New netlink library handles cause the next available port (intern to libnl; nl_pid) to be allocated. For every ongoing virsh edit session, one new handle seems to be required. So for 2 virsh edit session, the 3rd one (usually) works. I do not know what in the system is causing this, but my guess it that 'something' is blocking the same port (nl_pid) -- it may not be in libvirt but in a dependent library that's not using libnl (?). Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Index: libvirt-acl/src/util/macvtap.c =================================================================== --- libvirt-acl.orig/src/util/macvtap.c +++ libvirt-acl/src/util/macvtap.c @@ -120,13 +120,43 @@ int nlComm(struct nl_msg *nl_msg, fd_set readfds; int fd; int n; - struct nl_handle *nlhandle = nl_handle_alloc(); + struct nl_handle **nlhandles = NULL; struct nlmsghdr *nlmsg = nlmsg_hdr(nl_msg); + unsigned int idx = 0, num_elms = 1, i; - if (!nlhandle) - return -1; +realloc: + if (VIR_REALLOC_N(nlhandles, num_elms * sizeof(struct nl_handle *)) < 0) { + virReportOOMError(); + rc = -1; + goto err_exit; + } + + for (i = idx; i < num_elms ; i++) + nlhandles[i] = NULL; + +next_handle: + nlhandles[idx] = nl_handle_alloc(); - if (nl_connect(nlhandle, NETLINK_ROUTE) < 0) { + if (nlhandles[idx] == NULL) { + virReportOOMError(); + rc = -1; + goto err_exit; + } + + if (nl_connect(nlhandles[idx], NETLINK_ROUTE) < 0) { + VIR_DEBUG0("Could not create netlink socket - trying a new one\n"); + /* get a new handle and keep the ones we have */ + idx++; + if (idx < num_elms) + goto next_handle; + /* need to reallocate */ + num_elms += 10; + if (idx < 500) + goto realloc; + + macvtapError(VIR_ERR_INTERNAL_ERROR, "%s [%s]", + _("Could not create netlink socket"), + nl_geterror()); rc = -1; goto err_exit; } @@ -135,15 +165,16 @@ int nlComm(struct nl_msg *nl_msg, nlmsg->nlmsg_pid = getpid(); - nbytes = nl_send_auto_complete(nlhandle, nl_msg); + nbytes = nl_send_auto_complete(nlhandles[idx], nl_msg); if (nbytes < 0) { virReportSystemError(errno, - "%s", _("cannot send to netlink socket")); + "%s [%s]", _("cannot send to netlink socket"), + nl_geterror()); rc = -1; goto err_exit; } - fd = nl_socket_get_fd(nlhandle); + fd = nl_socket_get_fd(nlhandles[idx]); FD_ZERO(&readfds); FD_SET(fd, &readfds); @@ -160,7 +191,7 @@ int nlComm(struct nl_msg *nl_msg, goto err_exit; } - *respbuflen = nl_recv(nlhandle, &nladdr, respbuf, NULL); + *respbuflen = nl_recv(nlhandles[idx], &nladdr, respbuf, NULL); if (*respbuflen <= 0) rc = -1; @@ -171,7 +202,12 @@ err_exit: *respbuflen = 0; } - nl_handle_destroy(nlhandle); + if (nlhandles) + for (idx = 0; idx < num_elms; idx++) + nl_handle_destroy(nlhandles[idx]); + + VIR_FREE(nlhandles); + return rc; }

On Mon, Feb 14, 2011 at 02:34:28PM -0500, Stefan Berger wrote:
When trying to start / stop a domain with macvtap device (direct type of interface) having a device description like this one here
<interface type='direct'> <source dev='static' mode='vepa'/> </interface>
then I see netlink related errors when a 'virsh edit' session is happening at the same time.
Leaving 'virsh edit' active, means there is a virConnectPtr object open in libvirtd. This in turn means there is a netcf instance active in libvirtd. This in turn means there is a netlink socket open by netcf in libvirtd.
So, to reproduce this error you should try the following (on a kernel supporting macvtap):
virsh edit <macvtap domain> -> do not terminate the edit sessions virsh start <macvtap domain> -> works virsh destroy <macvtap domain> -> leaves a macvtap device due to nl_connect failing virsh start <macvtap domain> -> does not start anymore
That should make it fail.
The work-around basically keeps on allocating new netlink library handles until the nl_connect() succeeds. New netlink library handles cause the next available port (intern to libnl; nl_pid) to be allocated. For every ongoing virsh edit session, one new handle seems to be required. So for 2 virsh edit session, the 3rd one (usually) works. I do not know what in the system is causing this, but my guess it that 'something' is blocking the same port (nl_pid) -- it may not be in libvirt but in a dependent library that's not using libnl (?).
This problem sequence seems to imply something crazy like us only being allowed to have one netlink socket open per $PID ??? Is that really correct ? Looking at the kernel code they seem todo some kind of hash based on PID & return EBUSY in that..which suggests there may be some kind of restriction...
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Index: libvirt-acl/src/util/macvtap.c =================================================================== --- libvirt-acl.orig/src/util/macvtap.c +++ libvirt-acl/src/util/macvtap.c @@ -120,13 +120,43 @@ int nlComm(struct nl_msg *nl_msg, fd_set readfds; int fd; int n; - struct nl_handle *nlhandle = nl_handle_alloc(); + struct nl_handle **nlhandles = NULL; struct nlmsghdr *nlmsg = nlmsg_hdr(nl_msg); + unsigned int idx = 0, num_elms = 1, i;
- if (!nlhandle) - return -1; +realloc: + if (VIR_REALLOC_N(nlhandles, num_elms * sizeof(struct nl_handle *)) < 0) { + virReportOOMError(); + rc = -1; + goto err_exit; + } + + for (i = idx; i < num_elms ; i++) + nlhandles[i] = NULL; + +next_handle: + nlhandles[idx] = nl_handle_alloc();
- if (nl_connect(nlhandle, NETLINK_ROUTE) < 0) { + if (nlhandles[idx] == NULL) { + virReportOOMError(); + rc = -1; + goto err_exit; + } + + if (nl_connect(nlhandles[idx], NETLINK_ROUTE) < 0) { + VIR_DEBUG0("Could not create netlink socket - trying a new one\n"); + /* get a new handle and keep the ones we have */ + idx++; + if (idx < num_elms) + goto next_handle; + /* need to reallocate */ + num_elms += 10; + if (idx < 500) + goto realloc; + + macvtapError(VIR_ERR_INTERNAL_ERROR, "%s [%s]", + _("Could not create netlink socket"), + nl_geterror()); rc = -1; goto err_exit; } @@ -135,15 +165,16 @@ int nlComm(struct nl_msg *nl_msg,
nlmsg->nlmsg_pid = getpid();
- nbytes = nl_send_auto_complete(nlhandle, nl_msg); + nbytes = nl_send_auto_complete(nlhandles[idx], nl_msg); if (nbytes < 0) { virReportSystemError(errno, - "%s", _("cannot send to netlink socket")); + "%s [%s]", _("cannot send to netlink socket"), + nl_geterror()); rc = -1; goto err_exit; }
- fd = nl_socket_get_fd(nlhandle); + fd = nl_socket_get_fd(nlhandles[idx]);
FD_ZERO(&readfds); FD_SET(fd, &readfds); @@ -160,7 +191,7 @@ int nlComm(struct nl_msg *nl_msg, goto err_exit; }
- *respbuflen = nl_recv(nlhandle, &nladdr, respbuf, NULL); + *respbuflen = nl_recv(nlhandles[idx], &nladdr, respbuf, NULL); if (*respbuflen <= 0) rc = -1;
@@ -171,7 +202,12 @@ err_exit: *respbuflen = 0; }
- nl_handle_destroy(nlhandle); + if (nlhandles) + for (idx = 0; idx < num_elms; idx++) + nl_handle_destroy(nlhandles[idx]); + + VIR_FREE(nlhandles); + return rc; }
This approach feels like a nasty hack to me and potentially still leaves us with a problem in netcf which is also using netlink sockets. I think we need to get a clearer picture of what the root cause is before going for this kind of patch Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 02/14/2011 02:51 PM, Daniel P. Berrange wrote:
This approach feels like a nasty hack to me and potentially still leaves us with a problem in netcf which is also using netlink sockets. I think we need to get a clearer picture of what the root cause is before going for this kind of patch
Correct, I am 'fixing' this in the wrong place. The issues is in the call sequence nl_handle = nl_handle_alloc() nl_connect(nl_handle, NETLINK_ROUTE) with the second one failing taking merely input from the 1st one. These are obviously two libnl calls. Something is either not using libn or not using it correctly. Thanks for pointing out netcf. I looked at libnetcf code and found this sequence here: [...] int netlink_init(struct netcf *ncf) { ncf->driver->nl_sock = nl_handle_alloc(); if (ncf->driver->nl_sock == NULL) goto error; if (nl_connect(ncf->driver->nl_sock, NETLINK_ROUTE) < 0) { goto error; } This seems to be doing the same as I do. Maybe there is yet 'something else' that's using netlink sockets. What's also strange is that the first 'virsh start' still works, but the subsequent 'virsh destroy' then does not. Stefan

On 02/14/2011 03:30 PM, Stefan Berger wrote:
On 02/14/2011 02:51 PM, Daniel P. Berrange wrote:
This approach feels like a nasty hack to me and potentially still leaves us with a problem in netcf which is also using netlink sockets. I think we need to get a clearer picture of what the root cause is before going for this kind of patch
Correct, I am 'fixing' this in the wrong place. The issues is in the call sequence
nl_handle = nl_handle_alloc() nl_connect(nl_handle, NETLINK_ROUTE)
with the second one failing taking merely input from the 1st one. These are obviously two libnl calls. Something is either not using libn or not using it correctly. Thanks for pointing out netcf. I looked at libnetcf code and found this sequence here:
[...] int netlink_init(struct netcf *ncf) {
ncf->driver->nl_sock = nl_handle_alloc(); if (ncf->driver->nl_sock == NULL) goto error; if (nl_connect(ncf->driver->nl_sock, NETLINK_ROUTE) < 0) { goto error; }
This seems to be doing the same as I do. Maybe there is yet 'something else' that's using netlink sockets. What's also strange is that the first 'virsh start' still works, but the subsequent 'virsh destroy' then does not.
One definte problem in libnl is that the 'port allocation' (generate_local_port()) is not thread-safe, even though I think it's the library's responsibility to lock, not libvirt introducing a lock that we need to grab before calling into netcf and grabbing in macvtap. Unless libnl fixes this, I believe there will be no other way than retrying. One will eventually bind and exclude a concurrent thread from binding. Regards, Stefan

On 02/14/2011 05:22 PM, Stefan Berger wrote:
On 02/14/2011 03:30 PM, Stefan Berger wrote:
On 02/14/2011 02:51 PM, Daniel P. Berrange wrote:
This approach feels like a nasty hack to me and potentially still leaves us with a problem in netcf which is also using netlink sockets. I think we need to get a clearer picture of what the root cause is before going for this kind of patch
Correct, I am 'fixing' this in the wrong place. The issues is in the call sequence
nl_handle = nl_handle_alloc() nl_connect(nl_handle, NETLINK_ROUTE)
with the second one failing taking merely input from the 1st one. These are obviously two libnl calls. Something is either not using libn or not using it correctly. Thanks for pointing out netcf. I looked at libnetcf code and found this sequence here:
[...] int netlink_init(struct netcf *ncf) {
ncf->driver->nl_sock = nl_handle_alloc(); if (ncf->driver->nl_sock == NULL) goto error; if (nl_connect(ncf->driver->nl_sock, NETLINK_ROUTE) < 0) { goto error; }
This seems to be doing the same as I do. Maybe there is yet 'something else' that's using netlink sockets. What's also strange is that the first 'virsh start' still works, but the subsequent 'virsh destroy' then does not.
One definte problem in libnl is that the 'port allocation' (generate_local_port()) is not thread-safe, even though I think it's the library's responsibility to lock, not libvirt introducing a lock that we need to grab before calling into netcf and grabbing in macvtap. Unless libnl fixes this, I believe there will be no other way than retrying. One will eventually bind and exclude a concurrent thread from binding.
It's late but this doesn't look right even now in libnl (libnl-debuginfo-1.1-12.fc14.x86_64): port allocation (socket.c ; line 134): used_ports_map[i] |= (1UL << n); - that's going to set a bit port deallocation (socket.c; line 156) : used_ports_map[nr / 32] &= ~((nr % 32) + 1); - that's going to produce garbage; no wonder things don't work used_ports_map[nr / 32] &= ~(1 << (nr % 32)); or used_ports_map[nr / 32] &= ~(1 << (nr & 0x1f)); - would probably be much better Stefan

On 02/14/2011 11:43 PM, Stefan Berger wrote:
One definte problem in libnl is that the 'port allocation' (generate_local_port()) is not thread-safe, even though I think it's the library's responsibility to lock, not libvirt introducing a lock that we need to grab before calling into netcf and grabbing in macvtap. Unless libnl fixes this, I believe there will be no other way than retrying. One will eventually bind and exclude a concurrent thread from binding.
It's late but this doesn't look right even now in libnl (libnl-debuginfo-1.1-12.fc14.x86_64):
port allocation (socket.c ; line 134): used_ports_map[i] |= (1UL << n);
- that's going to set a bit
port deallocation (socket.c; line 156) : used_ports_map[nr / 32] &= ~((nr % 32) + 1);
- that's going to produce garbage; no wonder things don't work
used_ports_map[nr / 32] &= ~(1 << (nr % 32));
or
used_ports_map[nr / 32] &= ~(1 << (nr & 0x1f));
- would probably be much better
I am withdrawing this patch. I filed bugzilla(s) against libnl, which needs patching. I hope it comes back quickly in form of an updated rpm. Stefan
participants (2)
-
Daniel P. Berrange
-
Stefan Berger