[libvirt] [PATCH 0 of 3] LXC network interface support

This set is actually authored by Dave Leskovec. He is moving on to some other work and I will be working this into the tree. I have made a few changes to this, mostly to allow this code to compile out if the proper kernel and userspace support is not present. I am submitting it as an RFC because the required kernel patches are not yet upstream (even in -mm) and there are some missing userspace patches to iproute2 as well. I expect there to be quite a few comments and suggestions about the veth.c patch, so I wanted to go ahead and get that started while we wait for upstream to catch up. Comments appreciated!

# HG changeset patch # User Dave Leskovec <dlesko@linux.vnet.ibm.com> # Date 1213891164 25200 # Node ID 386c067de8995028dd11f70602081c31682dd293 # Parent 8d2afc533c91c4796512e1e71c8283e86eafd18a [LXC] Add functions to manage veth device pairs This gives us the ability to create a veth pair so that we can move one into the network namespace of an LXC container. diff -r 8d2afc533c91 -r 386c067de899 configure.in --- a/configure.in Tue Jun 17 15:55:03 2008 +0000 +++ b/configure.in Thu Jun 19 08:59:24 2008 -0700 @@ -301,6 +301,20 @@ if test "$with_qemu" = "yes" ; then AC_CHECK_HEADERS([linux/param.h linux/sockios.h linux/if_bridge.h linux/if_tun.h],, AC_MSG_ERROR([You must install kernel-headers in order to compile libvirt])) +fi + +dnl +dnl check for patched iproute2 for lxc network support +dnl +if test "$with_lxc" = "yes" ; then + AC_MSG_CHECKING([for NETNS support]) + if ip link help 2>&1 | grep -q netns; then + with_lxc_netns="yes" + AC_DEFINE([HAVE_NETNS], [], [Kernel has NETNS support]) + else + with_lxc_netns="no" + fi + AC_MSG_RESULT($with_lxc_netns) fi dnl Need to test if pkg-config exists diff -r 8d2afc533c91 -r 386c067de899 src/Makefile.am --- a/src/Makefile.am Tue Jun 17 15:55:03 2008 +0000 +++ b/src/Makefile.am Thu Jun 19 08:59:24 2008 -0700 @@ -64,6 +64,7 @@ lxc_driver.c lxc_driver.h \ lxc_conf.c lxc_conf.h \ lxc_container.c lxc_container.h \ + veth.c veth.h \ nodeinfo.h nodeinfo.c \ util.c util.h diff -r 8d2afc533c91 -r 386c067de899 src/veth.c --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/src/veth.c Thu Jun 19 08:59:24 2008 -0700 @@ -0,0 +1,247 @@ +/* + * Copyright IBM Corp. 2008 + * + * veth.c: file description + * + * Authors: + * David L. Leskovec <dlesko at linux.vnet.ibm.com> + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include <config.h> + +#ifdef HAVE_NETNS + +#include <string.h> + +#include "veth.h" +#include "internal.h" +#include "memory.h" +#include "util.h" + +#define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__) +#define DEBUG0(msg) VIR_DEBUG(__FILE__, "%s", msg) + +/* Functions */ +/** + * getFreeVethName: + * @veth: name for veth device (NULL to find first open) + * @maxLen: max length of veth name + * @startDev: device number to start at (x in vethx) + * + * Looks in /sys/class/net/ to find the first available veth device + * name. + * + * Returns 0 on success or -1 in case of error + */ +static int getFreeVethName(char *veth, int maxLen, int startDev) +{ + int rc = -1; + int devNum = startDev; + char path[PATH_MAX]; + + snprintf(path, PATH_MAX, "/sys/class/net/veth%d/", devNum); + while (virFileExists(path)) { + ++devNum; + sprintf(path, "/sys/class/net/veth%d/", devNum); + } + + snprintf(veth, maxLen, "veth%d", devNum); + + rc = devNum; + + return rc; +} + +/** + * vethCreate: + * @veth1: name for one end of veth pair + * @veth1MaxLen: max length of veth1 name + * @veth2: name for one end of veth pair + * @veth2MaxLen: max length of veth1 name + * + * Creates a veth device pair using the ip command: + * ip link add veth1 type veth peer name veth2 + * NOTE: If veth1 and veth2 names are not specified, ip will auto assign + * names. There seems to be two problems here - + * 1) There doesn't seem to be a way to determine the names of the + * devices that it creates. They show up in ip link show and + * under /sys/class/net/ however there is no guarantee that they + * are the devices that this process just created. + * 2) Once one of the veth devices is moved to another namespace, it + * is no longer visible in the parent namespace. This seems to + * confuse the name assignment causing it to fail with File exists. + * Because of these issues, this function currently forces the caller + * to fully specify the veth device names. + * + * Returns 0 on success or -1 in case of error + */ +int vethCreate(char* veth1, int veth1MaxLen, + char* veth2, int veth2MaxLen) +{ + int rc = -1; + const char *argv[] = { + "ip", "link", "add", veth1, "type", "veth", "peer", "name", veth2, NULL + }; + int cmdResult; + int vethDev = 0; + + if ((NULL == veth1) || (NULL == veth2)) { + goto error_out; + } + + DEBUG("veth1: %s veth2: %s", veth1, veth2); + + if (1 > strlen(veth1)) { + vethDev = getFreeVethName(veth1, veth1MaxLen, 0); + ++vethDev; + DEBUG("assigned veth1: %s", veth1); + } + + if (1 > strlen(veth2)) { + vethDev = getFreeVethName(veth2, veth2MaxLen, vethDev); + DEBUG("assigned veth2: %s", veth2); + } + + rc = virRun(NULL, (char**)argv, &cmdResult); + + if (0 == rc) { + rc = cmdResult; + } + +error_out: + return rc; +} + +/** + * vethDelete: + * @veth: name for one end of veth pair + * + * This will delete both veth devices in a pair. Only one end needs to + * be specified. The ip command will identify and delete the other veth + * device as well. + * ip link del veth + * + * Returns 0 on success or -1 in case of error + */ +int vethDelete(const char *veth) +{ + int rc = -1; + const char *argv[] = {"ip", "link", "del", veth, NULL}; + int cmdResult; + + if (NULL == veth) { + goto error_out; + } + + DEBUG("veth: %s", veth); + + rc = virRun(NULL, (char**)argv, &cmdResult); + + if (0 == rc) { + rc = cmdResult; + } + +error_out: + return rc; +} + +/** + * vethInterfaceUpOrDown: + * @veth: name of veth device + * @upOrDown: 0 => down, 1 => up + * + * Enables a veth device using the ifconfig command. A NULL inetAddress + * will cause it to be left off the command line. + * + * Returns 0 on success or -1 in case of error + */ +int vethInterfaceUpOrDown(const char* veth, int upOrDown) +{ + int rc = -1; + char upOrDownString[8]; + const char *argv[] = {"ifconfig", veth, upOrDownString, NULL}; + int cmdResult; + + if (NULL == veth) { + goto error_out; + } + + if (0 == upOrDown) { + strcpy(upOrDownString, "down"); + } else { + strcpy(upOrDownString, "up"); + } + + rc = virRun(NULL, (char**)argv, &cmdResult); + + if (0 == rc) { + rc = cmdResult; + } + +error_out: + return rc; +} + +/** + * moveInterfaceToNetNs: + * @interface: name of device + * @pidInNs: PID of process in target net namespace + * + * Moves the given device into the target net namespace specified by the given + * pid using this command: + * ip link set interface netns pidInNs + * + * Returns 0 on success or -1 in case of error + */ +int moveInterfaceToNetNs(const char* interface, int pidInNs) +{ + int rc; + /* offset of the pid field in the following args */ + const int pidArgvOffset = 5; + const char *argv[] = { + "ip", "link", "set", interface, "netns", NULL, NULL + }; + int cmdResult; + int len; + + if (NULL == interface) { + goto error_out; + } + + if (0 != VIR_ALLOC_N(argv[pidArgvOffset], (sizeof(int) * 3) + 1)) { + goto error_out; + } + len = snprintf(argv[pidArgvOffset], (sizeof(int) * 3) + 1, "%d", pidInNs); + if (len >= (sizeof(int) * 3) + 1) { + goto cleanup; + } + + rc = virRun(NULL, (char**)argv, &cmdResult); + + if (0 == rc) { + rc = cmdResult; + } + +cleanup: + VIR_FREE(argv[pidArgvOffset]); + +error_out: + return rc; +} + +#endif /* HAVE_NETNS */ + diff -r 8d2afc533c91 -r 386c067de899 src/veth.h --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/src/veth.h Thu Jun 19 08:59:24 2008 -0700 @@ -0,0 +1,39 @@ +/* + * Copyright IBM Corp. 2008 + * + * veth.h: file description + * + * Authors: + * David L. Leskovec <dlesko at linux.vnet.ibm.com> + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#ifndef VETH_H +#define VETH_H + +#include <config.h> + +#ifdef HAVE_NETNS + +/* Function declarations */ +int vethCreate(char* veth1, int veth1MaxLen, char* veth2, + int veth2MaxLen); +int vethDelete(const char* veth); +int vethInterfaceUpOrDown(const char* veth, int upOrDown); +int moveInterfaceToNetNs(const char *interface, int pidInNs); + +#endif /* HAVE_NETNS */ +#endif /* VETH_H */

On Thu, Jun 19, 2008 at 08:59:52AM -0700, Dan Smith wrote:
# HG changeset patch # User Dave Leskovec <dlesko@linux.vnet.ibm.com> # Date 1213891164 25200 # Node ID 386c067de8995028dd11f70602081c31682dd293 # Parent 8d2afc533c91c4796512e1e71c8283e86eafd18a [LXC] Add functions to manage veth device pairs
This gives us the ability to create a veth pair so that we can move one into the network namespace of an LXC container.
Basically it's executing ip commands, so the real code changes are outside of libvirt.
diff -r 8d2afc533c91 -r 386c067de899 configure.in --- a/configure.in Tue Jun 17 15:55:03 2008 +0000 +++ b/configure.in Thu Jun 19 08:59:24 2008 -0700 @@ -301,6 +301,20 @@ if test "$with_qemu" = "yes" ; then AC_CHECK_HEADERS([linux/param.h linux/sockios.h linux/if_bridge.h linux/if_tun.h],, AC_MSG_ERROR([You must install kernel-headers in order to compile libvirt])) +fi + +dnl +dnl check for patched iproute2 for lxc network support +dnl +if test "$with_lxc" = "yes" ; then + AC_MSG_CHECKING([for NETNS support]) + if ip link help 2>&1 | grep -q netns; then + with_lxc_netns="yes" + AC_DEFINE([HAVE_NETNS], [], [Kernel has NETNS support]) + else + with_lxc_netns="no" + fi + AC_MSG_RESULT($with_lxc_netns) fi
Hum, it's not only a kernel feature, it looks more like a dependancy on iproute, maybe that can be checked at runtime when initializing the lxc module, no ? [...]
+static int getFreeVethName(char *veth, int maxLen, int startDev) +{ + int rc = -1; + int devNum = startDev; + char path[PATH_MAX]; + + snprintf(path, PATH_MAX, "/sys/class/net/veth%d/", devNum); + while (virFileExists(path)) { + ++devNum; + sprintf(path, "/sys/class/net/veth%d/", devNum);
use snprintf there too
+ } + + snprintf(veth, maxLen, "veth%d", devNum); + + rc = devNum; + + return rc; +} + +/** + * vethCreate: + * @veth1: name for one end of veth pair + * @veth1MaxLen: max length of veth1 name + * @veth2: name for one end of veth pair + * @veth2MaxLen: max length of veth1 name + * + * Creates a veth device pair using the ip command: + * ip link add veth1 type veth peer name veth2 + * NOTE: If veth1 and veth2 names are not specified, ip will auto assign + * names. There seems to be two problems here - + * 1) There doesn't seem to be a way to determine the names of the + * devices that it creates. They show up in ip link show and + * under /sys/class/net/ however there is no guarantee that they + * are the devices that this process just created. + * 2) Once one of the veth devices is moved to another namespace, it + * is no longer visible in the parent namespace. This seems to + * confuse the name assignment causing it to fail with File exists. + * Because of these issues, this function currently forces the caller + * to fully specify the veth device names. + * + * Returns 0 on success or -1 in case of error + */ +int vethCreate(char* veth1, int veth1MaxLen, + char* veth2, int veth2MaxLen) +{ + int rc = -1; + const char *argv[] = {
I think Jim insists on having const char const * [], but I bet he will explain this better than me :-)
+ "ip", "link", "add", veth1, "type", "veth", "peer", "name", veth2, NULL + }; [...] +/** + * moveInterfaceToNetNs: + * @interface: name of device + * @pidInNs: PID of process in target net namespace + * + * Moves the given device into the target net namespace specified by the given + * pid using this command: + * ip link set interface netns pidInNs + * + * Returns 0 on success or -1 in case of error + */ +int moveInterfaceToNetNs(const char* interface, int pidInNs) +{ + int rc; + /* offset of the pid field in the following args */ + const int pidArgvOffset = 5; + const char *argv[] = { + "ip", "link", "set", interface, "netns", NULL, NULL + }; + int cmdResult; + int len; + + if (NULL == interface) { + goto error_out; + } + + if (0 != VIR_ALLOC_N(argv[pidArgvOffset], (sizeof(int) * 3) + 1)) { + goto error_out; + }
Hum, here i don't understand, if argv is defined as local stack data, I find a bit confusing to use it in the argument for VIR_ALLOC_N. I would use an intermediate local variable to make this cleaner/ easier to read.
+ len = snprintf(argv[pidArgvOffset], (sizeof(int) * 3) + 1, "%d", pidInNs);
yeah I'm getting very confused by that code :-)
+ if (len >= (sizeof(int) * 3) + 1) { + goto cleanup; + } + + rc = virRun(NULL, (char**)argv, &cmdResult); + + if (0 == rc) { + rc = cmdResult; + } + +cleanup: + VIR_FREE(argv[pidArgvOffset]); + +error_out: + return rc; +} + +#endif /* HAVE_NETNS */ + diff -r 8d2afc533c91 -r 386c067de899 src/veth.h --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/src/veth.h Thu Jun 19 08:59:24 2008 -0700 @@ -0,0 +1,39 @@ +/* + * Copyright IBM Corp. 2008 + * + * veth.h: file description + * + * Authors: + * David L. Leskovec <dlesko at linux.vnet.ibm.com> + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */
Oh can you use /* * veth.h: .... a proper file description :-) .... * * Copyright IBM Corp. 2008 * * See COPYING.LIB for the License of this software * * Authors: * David L. Leskovec <dlesko at linux.vnet.ibm.com> */ as the header template for the two new files to match the other files. COPYING.LIB is of course LGPL 2.1 thanks ! Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

DV> Hum, it's not only a kernel feature, it looks more like a DV> dependancy on iproute, maybe that can be checked at runtime when DV> initializing the lxc module, no ? I suppose I could actually let it fail when it tries to do the actual move. Would you rather a specific test during the define stage to reject the <interface> if the non-netns 'ip' command is found?
+ if (0 != VIR_ALLOC_N(argv[pidArgvOffset], (sizeof(int) * 3) + 1)) { + goto error_out; + }
DV> Hum, here i don't understand, if argv is defined as local stack data, DV> I find a bit confusing to use it in the argument for VIR_ALLOC_N. I would DV> use an intermediate local variable to make this cleaner/ easier to read.
+ len = snprintf(argv[pidArgvOffset], (sizeof(int) * 3) + 1, "%d", pidInNs);
DV> yeah I'm getting very confused by that code :-) Agreed, I'll fix it up :) DV> Oh can you use DV> /* DV> * veth.h: .... a proper file description :-) .... DV> * DV> * Copyright IBM Corp. 2008 DV> * DV> * See COPYING.LIB for the License of this software DV> * DV> * Authors: DV> * David L. Leskovec <dlesko at linux.vnet.ibm.com> DV> */ DV> as the header template for the two new files to match the other files. DV> COPYING.LIB is of course LGPL 2.1 Okay, I'll change it. Thanks! -- Dan Smith IBM Linux Technology Center Open Hypervisor Team email: danms@us.ibm.com

On Thu, Jun 19, 2008 at 10:46:40AM -0700, Dan Smith wrote:
DV> Hum, it's not only a kernel feature, it looks more like a DV> dependancy on iproute, maybe that can be checked at runtime when DV> initializing the lxc module, no ?
I suppose I could actually let it fail when it tries to do the actual move. Would you rather a specific test during the define stage to reject the <interface> if the non-netns 'ip' command is found?
Well i was wondering if you could not try to detect if the feature is available for example in lxcProbe() then keep the status.
DV> yeah I'm getting very confused by that code :-)
Agreed, I'll fix it up :)
okay :-)
DV> as the header template for the two new files to match the other files. DV> COPYING.LIB is of course LGPL 2.1
Okay, I'll change it.
Cool, thanks ! Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

Daniel Veillard <veillard@redhat.com> wrote: ...
+int vethCreate(char* veth1, int veth1MaxLen, + char* veth2, int veth2MaxLen) +{ + int rc = -1; + const char *argv[] = {
I think Jim insists on having const char const * [], but I bet he will explain this better than me :-)
+ "ip", "link", "add", veth1, "type", "veth", "peer", "name", veth2, NULL
You're right that I encourage the "const correct" approach ;-) To tell the compiler (and more importantly, the reader) that you have a const array of const strings, you'd declare it like this: const char *const argv[] = { or, equivalently, the const can go after the type name of "char": char const *const argv[] = { However, argv is one place where it pays to relax const-correctness guidelines, at least in C, because so many interfaces require non-const "char **argv" pointers. IMHO, it's better to avoid const altogether in this limited case than to be forced to litter the code with ugly and dangerous const-adjusting casts to accommodate "correctness".

JM> However, argv is one place where it pays to relax JM> const-correctness guidelines, at least in C, because so many JM> interfaces require non-const "char **argv" pointers. IMHO, it's JM> better to avoid const altogether in this limited case than to be JM> forced to litter the code with ugly and dangerous const-adjusting JM> casts to accommodate "correctness". I agree with the last statement, but in this case, leaving the const off the declaration results in a compiler warning about discarding the inherent const qualifier from a literal string. A *very* quick look in src/*.c doesn't reveal any other examples of all-literal argv[] lists like this. I'm inclined to leave the const and the cast, but if you have a better suggestion, I'll be glad to make the change. Thanks! -- Dan Smith IBM Linux Technology Center Open Hypervisor Team email: danms@us.ibm.com

DS> # HG changeset patch DS> # User Dave Leskovec <dlesko@linux.vnet.ibm.com> DS> # Date 1213891164 25200 DS> # Node ID 386c067de8995028dd11f70602081c31682dd293 DS> # Parent 8d2afc533c91c4796512e1e71c8283e86eafd18a DS> [LXC] Add functions to manage veth device pairs My apologies for the HG headers. I forgot the --plain when I sent it out... -- Dan Smith IBM Linux Technology Center Open Hypervisor Team email: danms@us.ibm.com

Dan Smith wrote:
# HG changeset patch # User Dave Leskovec <dlesko@linux.vnet.ibm.com> # Date 1213891164 25200 # Node ID 386c067de8995028dd11f70602081c31682dd293 # Parent 8d2afc533c91c4796512e1e71c8283e86eafd18a [LXC] Add functions to manage veth device pairs
This gives us the ability to create a veth pair so that we can move one into the network namespace of an LXC container.
diff -r 8d2afc533c91 -r 386c067de899 configure.in --- a/configure.in Tue Jun 17 15:55:03 2008 +0000 +++ b/configure.in Thu Jun 19 08:59:24 2008 -0700 @@ -301,6 +301,20 @@ if test "$with_qemu" = "yes" ; then AC_CHECK_HEADERS([linux/param.h linux/sockios.h linux/if_bridge.h linux/if_tun.h],, AC_MSG_ERROR([You must install kernel-headers in order to compile libvirt])) +fi + +dnl +dnl check for patched iproute2 for lxc network support +dnl +if test "$with_lxc" = "yes" ; then + AC_MSG_CHECKING([for NETNS support]) + if ip link help 2>&1 | grep -q netns; then + with_lxc_netns="yes" + AC_DEFINE([HAVE_NETNS], [], [Kernel has NETNS support]) + else + with_lxc_netns="no" + fi + AC_MSG_RESULT($with_lxc_netns) fi
dnl Need to test if pkg-config exists diff -r 8d2afc533c91 -r 386c067de899 src/Makefile.am --- a/src/Makefile.am Tue Jun 17 15:55:03 2008 +0000 +++ b/src/Makefile.am Thu Jun 19 08:59:24 2008 -0700 @@ -64,6 +64,7 @@ lxc_driver.c lxc_driver.h \ lxc_conf.c lxc_conf.h \ lxc_container.c lxc_container.h \ + veth.c veth.h \ nodeinfo.h nodeinfo.c \ util.c util.h
diff -r 8d2afc533c91 -r 386c067de899 src/veth.c --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/src/veth.c Thu Jun 19 08:59:24 2008 -0700 @@ -0,0 +1,247 @@ +/* + * Copyright IBM Corp. 2008 + * + * veth.c: file description + * + * Authors: + * David L. Leskovec <dlesko at linux.vnet.ibm.com> + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include <config.h> + +#ifdef HAVE_NETNS + +#include <string.h> + +#include "veth.h" +#include "internal.h" +#include "memory.h" +#include "util.h" + +#define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__) +#define DEBUG0(msg) VIR_DEBUG(__FILE__, "%s", msg)
Do you know ##__VA_ARGS ?
+/* Functions */ +/** + * getFreeVethName: + * @veth: name for veth device (NULL to find first open) + * @maxLen: max length of veth name + * @startDev: device number to start at (x in vethx) + * + * Looks in /sys/class/net/ to find the first available veth device + * name. + * + * Returns 0 on success or -1 in case of error + */ +static int getFreeVethName(char *veth, int maxLen, int startDev) +{ + int rc = -1; + int devNum = startDev; + char path[PATH_MAX]; + + snprintf(path, PATH_MAX, "/sys/class/net/veth%d/", devNum);
You can perhaps, use do { ... } while () here.
+ while (virFileExists(path)) { + ++devNum; + sprintf(path, "/sys/class/net/veth%d/", devNum); + }
Is this function safe for concurrent access ? eg. getFreeVethName called in parallel by two processes or another process creates a pair device just after you exit the loop ?
+ snprintf(veth, maxLen, "veth%d", devNum); + + rc = devNum; + + return rc; +} + +/** + * vethCreate: + * @veth1: name for one end of veth pair + * @veth1MaxLen: max length of veth1 name + * @veth2: name for one end of veth pair + * @veth2MaxLen: max length of veth1 name + * + * Creates a veth device pair using the ip command: + * ip link add veth1 type veth peer name veth2 + * NOTE: If veth1 and veth2 names are not specified, ip will auto assign + * names. There seems to be two problems here - + * 1) There doesn't seem to be a way to determine the names of the + * devices that it creates. They show up in ip link show and + * under /sys/class/net/ however there is no guarantee that they + * are the devices that this process just created. + * 2) Once one of the veth devices is moved to another namespace, it + * is no longer visible in the parent namespace. This seems to + * confuse the name assignment causing it to fail with File exists. + * Because of these issues, this function currently forces the caller + * to fully specify the veth device names. + * + * Returns 0 on success or -1 in case of error + */ +int vethCreate(char* veth1, int veth1MaxLen, + char* veth2, int veth2MaxLen)
No need of the veth1MaxLen parameter, you already have it, it is IF_NAMESIZE.
+{ + int rc = -1; + const char *argv[] = { + "ip", "link", "add", veth1, "type", "veth", "peer", "name", veth2, NULL + }; + int cmdResult; + int vethDev = 0; + + if ((NULL == veth1) || (NULL == veth2)) { + goto error_out; + } + + DEBUG("veth1: %s veth2: %s", veth1, veth2); + + if (1 > strlen(veth1)) {
Why do you check with strlen > 1 ?
+ vethDev = getFreeVethName(veth1, veth1MaxLen, 0); + ++vethDev; + DEBUG("assigned veth1: %s", veth1); + } + + if (1 > strlen(veth2)) { + vethDev = getFreeVethName(veth2, veth2MaxLen, vethDev); + DEBUG("assigned veth2: %s", veth2); + } + + rc = virRun(NULL, (char**)argv, &cmdResult); + + if (0 == rc) { + rc = cmdResult; + } + +error_out: + return rc; +} + +/** + * vethDelete: + * @veth: name for one end of veth pair + * + * This will delete both veth devices in a pair. Only one end needs to + * be specified. The ip command will identify and delete the other veth + * device as well. + * ip link del veth + * + * Returns 0 on success or -1 in case of error + */ +int vethDelete(const char *veth) +{ + int rc = -1; + const char *argv[] = {"ip", "link", "del", veth, NULL}; + int cmdResult; + + if (NULL == veth) { + goto error_out; + } + + DEBUG("veth: %s", veth); + + rc = virRun(NULL, (char**)argv, &cmdResult); + + if (0 == rc) { + rc = cmdResult; + } + +error_out: + return rc; +} + +/** + * vethInterfaceUpOrDown: + * @veth: name of veth device + * @upOrDown: 0 => down, 1 => up + * + * Enables a veth device using the ifconfig command. A NULL inetAddress + * will cause it to be left off the command line. + * + * Returns 0 on success or -1 in case of error + */ +int vethInterfaceUpOrDown(const char* veth, int upOrDown) +{ + int rc = -1; + char upOrDownString[8]; + const char *argv[] = {"ifconfig", veth, upOrDownString, NULL}; + int cmdResult; + + if (NULL == veth) { + goto error_out; + } + + if (0 == upOrDown) { + strcpy(upOrDownString, "down"); + } else { + strcpy(upOrDownString, "up"); + }
You don't need to copy the string, a const char *upOrDownString and upOrDownString = "down" will work.
+ + rc = virRun(NULL, (char**)argv, &cmdResult); + + if (0 == rc) { + rc = cmdResult; + } + +error_out: + return rc; +} + +/** + * moveInterfaceToNetNs: + * @interface: name of device + * @pidInNs: PID of process in target net namespace + * + * Moves the given device into the target net namespace specified by the given + * pid using this command: + * ip link set interface netns pidInNs + * + * Returns 0 on success or -1 in case of error + */ +int moveInterfaceToNetNs(const char* interface, int pidInNs) +{ + int rc; + /* offset of the pid field in the following args */ + const int pidArgvOffset = 5; + const char *argv[] = { + "ip", "link", "set", interface, "netns", NULL, NULL + }; + int cmdResult; + int len; + + if (NULL == interface) { + goto error_out; + } + + if (0 != VIR_ALLOC_N(argv[pidArgvOffset], (sizeof(int) * 3) + 1)) { + goto error_out; + } + len = snprintf(argv[pidArgvOffset], (sizeof(int) * 3) + 1, "%d", pidInNs); + if (len >= (sizeof(int) * 3) + 1) { + goto cleanup; + }
Why don't you just do: char pidstr[PIDSTRLEN]; const char *argv[] = { "ip", "link", "set", interface, "netns", pidstr, NULL }; snprintf(pidstr, PIDSTRLEN, "%d", pidInNs); That should work, no ?
+ + rc = virRun(NULL, (char**)argv, &cmdResult); + + if (0 == rc) { + rc = cmdResult; + } + +cleanup: + VIR_FREE(argv[pidArgvOffset]); + +error_out: + return rc; +} + +#endif /* HAVE_NETNS */ + diff -r 8d2afc533c91 -r 386c067de899 src/veth.h --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/src/veth.h Thu Jun 19 08:59:24 2008 -0700 @@ -0,0 +1,39 @@ +/* + * Copyright IBM Corp. 2008 + * + * veth.h: file description + * + * Authors: + * David L. Leskovec <dlesko at linux.vnet.ibm.com> + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#ifndef VETH_H +#define VETH_H + +#include <config.h> + +#ifdef HAVE_NETNS + +/* Function declarations */ +int vethCreate(char* veth1, int veth1MaxLen, char* veth2, + int veth2MaxLen); +int vethDelete(const char* veth); +int vethInterfaceUpOrDown(const char* veth, int upOrDown); +int moveInterfaceToNetNs(const char *interface, int pidInNs); + +#endif /* HAVE_NETNS */ +#endif /* VETH_H */

+#define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__) +#define DEBUG0(msg) VIR_DEBUG(__FILE__, "%s", msg)
DL> Do you know ##__VA_ARGS ? Yes, and I agree this not my preferred way. However, this isn't my code and when I went to look for other instances, I found many so I decided not to change it. % cat *.c | grep DEBUG0 -c 47 DL> Is this function safe for concurrent access ? eg. getFreeVethName DL> called in parallel by two processes or another process creates a DL> pair device just after you exit the loop ? No, there is indeed a race. However, since this just makes a suggestion and relies on the subsequent call to try to instantiate the interface, I don't think there's a chance for anything other than a spurious failure. This is specifically why I wanted to get this on the list sooner than later. I figured (or hoped) that given the experience in this group that someone could offer some educated suggestions before I jumped in and started trying to fix it.
+ if (1 > strlen(veth1)) {
DL> Why do you check with strlen > 1 ? To be honest, I have no idea why Dave did that. I'll check with him to see if there's something we're missing. It's probably worth a comment if so. -- Dan Smith IBM Linux Technology Center Open Hypervisor Team email: danms@us.ibm.com

Dan Smith wrote:
+ if (1 > strlen(veth1)) {
DL> Why do you check with strlen > 1 ?
To be honest, I have no idea why Dave did that. I'll check with him to see if there's something we're missing. It's probably worth a comment if so.
So that's checking if veth1 is an empty string and hence needs to have a name assigned. I don't recall any special reason for using (1 > strlen()). -- Best Regards, Dave Leskovec IBM Linux Technology Center Open Virtualization

# HG changeset patch # User Dave Leskovec <dlesko@linux.vnet.ibm.com> # Date 1213891177 25200 # Node ID acf369a2543ad52b235ae8541c8ad05670e255bd # Parent 386c067de8995028dd11f70602081c31682dd293 [LXC] Add XML parsing of container network interfaces. diff -r 386c067de899 -r acf369a2543a src/lxc_conf.c --- a/src/lxc_conf.c Thu Jun 19 08:59:24 2008 -0700 +++ b/src/lxc_conf.c Thu Jun 19 08:59:37 2008 -0700 @@ -70,6 +70,191 @@ codeErrorMessage, errorMessage, NULL, 0, 0, codeErrorMessage, errorMessage); } + +#ifdef HAVE_NETNS +/** + * lxcParseInterfaceXML: + * @conn: pointer to connection + * @nodePtr: pointer to xml node structure + * @vm: pointer to net definition structure to fill in + * + * Parses the XML for a network interface and places the configuration + * in the given structure. + * + * Returns 0 on success or -1 in case of error + */ +static int lxcParseInterfaceXML(virConnectPtr conn, xmlNodePtr nodePtr, + lxc_net_def_t *netDef) +{ + int rc = -1; + xmlNodePtr cur; + xmlChar *type = NULL; + xmlChar *parentIfName = NULL; + xmlChar *network = NULL; + xmlChar *bridge = NULL; + xmlChar *macaddr = NULL; + + netDef->type = LXC_NET_NETWORK; + + type = xmlGetProp(nodePtr, BAD_CAST "type"); + if (type != NULL) { + if (xmlStrEqual(type, BAD_CAST "network")) { + netDef->type = LXC_NET_NETWORK; + } + else if (xmlStrEqual(type, BAD_CAST "bridge")) { + netDef->type = LXC_NET_BRIDGE; + } + else { + lxcError(conn, NULL, VIR_ERR_XML_ERROR, + _("invalid interface type: %s"), type); + goto error_out; + } + } + + cur = nodePtr->children; + for (cur = nodePtr->children; cur != NULL; cur = cur->next) { + if (cur->type == XML_ELEMENT_NODE) { + DEBUG("cur->name: %s", (char*)(cur->name)); + if ((macaddr == NULL) && + (xmlStrEqual(cur->name, BAD_CAST "mac"))) { + macaddr = xmlGetProp(cur, BAD_CAST "address"); + } else if ((network == NULL) && + (netDef->type == LXC_NET_NETWORK) && + (xmlStrEqual(cur->name, BAD_CAST "source"))) { + network = xmlGetProp(cur, BAD_CAST "network"); + parentIfName = xmlGetProp(cur, BAD_CAST "dev"); + } else if ((bridge == NULL) && + (netDef->type == LXC_NET_BRIDGE) && + (xmlStrEqual(cur->name, BAD_CAST "source"))) { + bridge = xmlGetProp(cur, BAD_CAST "bridge"); + } else if ((parentIfName == NULL) && + (xmlStrEqual(cur->name, BAD_CAST "target"))) { + parentIfName = xmlGetProp(cur, BAD_CAST "dev"); + } + } + } + + if (netDef->type == LXC_NET_NETWORK) { + if (network == NULL) { + lxcError(conn, NULL, VIR_ERR_XML_ERROR, + _("No <source> 'network' attribute specified with <interface type='network'/>")); + goto error_out; + } + + netDef->txName = strdup((char *)network); + if (NULL == netDef->txName) { + lxcError(conn, NULL, VIR_ERR_NO_MEMORY, + _("No storage for network name")); + goto error_out; + } + + } else if (netDef->type == LXC_NET_BRIDGE) { + if (bridge == NULL) { + lxcError(conn, NULL, VIR_ERR_XML_ERROR, + _("No <source> 'bridge' attribute specified with <interface type='bridge'/>")); + goto error_out; + } + + netDef->txName = strdup((char *)bridge); + if (NULL == netDef->txName) { + lxcError(conn, NULL, VIR_ERR_NO_MEMORY, + _("No storage for bridge name")); + goto error_out; + } + } + + if (parentIfName != NULL) { + DEBUG("set netDef->parentVeth: %s", netDef->parentVeth); + netDef->parentVeth = strdup((char *)parentIfName); + if (NULL == netDef->parentVeth) { + lxcError(conn, NULL, VIR_ERR_NO_MEMORY, + _("No storage for parent veth device name")); + goto error_out; + } + } else { + netDef->parentVeth = NULL; + DEBUG0("set netDef->parentVeth: NULL"); + } + + rc = 0; + +error_out: + xmlFree(macaddr); + macaddr = NULL; + xmlFree(network); + network = NULL; + xmlFree(bridge); + bridge = NULL; + xmlFree(parentIfName); + parentIfName = NULL; + + return rc; +} + +/** + * lxcParseDomainInterfaces: + * @conn: pointer to connection + * @nets: on success, points to an list of net def structs + * @contextPtr: pointer to xml context + * + * Parses the domain network interfaces and returns the information in a list + * + * Returns 0 on success or -1 in case of error + */ +static int lxcParseDomainInterfaces(virConnectPtr conn, + lxc_net_def_t **nets, + xmlXPathContextPtr contextPtr) +{ + int rc = -1; + int i; + lxc_net_def_t *netDef; + lxc_net_def_t *prevDef = NULL; + int numNets = 0; + xmlNodePtr *list; + int res; + + DEBUG0("parsing nets"); + + res = virXPathNodeSet("/domain/devices/interface", contextPtr, &list); + if (res > 0) { + for (i = 0; i < res; ++i) { + netDef = calloc(1, sizeof(lxc_net_def_t)); + if (NULL == netDef) { + lxcError(conn, NULL, VIR_ERR_NO_MEMORY, + _("No storage for net def structure")); + goto parse_complete; + } + + rc = lxcParseInterfaceXML(conn, list[i], netDef); + if (0 > rc) { + DEBUG("failed parsing a net: %d", rc); + + free(netDef); + goto parse_complete; + } + + DEBUG0("parsed a net"); + + /* set the linked list pointers */ + numNets++; + netDef->next = NULL; + if (0 == i) { + *nets = netDef; + } else { + prevDef->next = netDef; + } + prevDef = netDef; + } + free(list); + } + + rc = numNets; + +parse_complete: + DEBUG("parsed %d nets", rc); + return rc; +} +#endif /* HAVE_NETNS */ static int lxcParseMountXML(virConnectPtr conn, xmlNodePtr nodePtr, lxc_mount_t *lxcMount) @@ -374,6 +559,15 @@ if (0 > containerDef->nmounts) { goto error; } + +#ifdef HAVE_NETNS + containerDef->numNets = lxcParseDomainInterfaces(conn, + &(containerDef->nets), + contextPtr); + if (0 > containerDef->numNets) { + goto error; + } +#endif if (lxcParseDomainTty(conn, &(containerDef->tty), contextPtr) < 0) { goto error; @@ -741,6 +935,7 @@ unsigned char *uuid; char uuidstr[VIR_UUID_STRING_BUFLEN]; lxc_mount_t *mount; + lxc_net_def_t *net; if (lxcIsActiveVM(vm)) virBufferVSprintf(&buf, "<domain type='%s' id='%d'>\n", @@ -770,6 +965,27 @@ virBufferAddLit(&buf, " </filesystem>\n"); } + /* loop adding nets */ + for (net = def->nets; net; net = net->next) { + if (net->type == LXC_NET_NETWORK) { + virBufferAddLit(&buf, " <interface type='network'>\n"); + virBufferVSprintf(&buf, " <source network='%s'/>\n", + net->txName); + } else { + virBufferAddLit(&buf, " <interface type='bridge'>\n"); + virBufferVSprintf(&buf, " <source bridge='%s'/>\n", + net->txName); + } + + if (NULL != net->parentVeth) { + virBufferVSprintf(&buf, " <target dev='%s'/>\n", + net->parentVeth); + } + + virBufferAddLit(&buf, " </interface>\n"); + + } + virBufferVSprintf(&buf, " <console tty='%s'/>\n", def->tty); virBufferAddLit(&buf, " </devices>\n"); virBufferAddLit(&buf, "</domain>\n"); @@ -786,6 +1002,8 @@ { lxc_mount_t *curMount; lxc_mount_t *nextMount; + lxc_net_def_t *curNet; + lxc_net_def_t *nextNet; if (vmdef == NULL) return; @@ -795,6 +1013,17 @@ nextMount = curMount->next; VIR_FREE(curMount); curMount = nextMount; + } + + curNet = vmdef->nets; + while (curNet) { + nextNet = curNet->next; + printf("Freeing %s:%s\n", curNet->parentVeth, curNet->containerVeth); + VIR_FREE(curNet->parentVeth); + VIR_FREE(curNet->containerVeth); + VIR_FREE(curNet->txName); + VIR_FREE(curNet); + curNet = nextNet; } VIR_FREE(vmdef->name); diff -r 386c067de899 -r acf369a2543a src/lxc_conf.h --- a/src/lxc_conf.h Thu Jun 19 08:59:24 2008 -0700 +++ b/src/lxc_conf.h Thu Jun 19 08:59:37 2008 -0700 @@ -36,6 +36,22 @@ #define LXC_MAX_ERROR_LEN 1024 #define LXC_DOMAIN_TYPE "lxc" +/* types of networks for containers */ +enum lxc_net_type { + LXC_NET_NETWORK, + LXC_NET_BRIDGE +}; + +typedef struct __lxc_net_def lxc_net_def_t; +struct __lxc_net_def { + int type; + char *parentVeth; /* veth device in parent namespace */ + char *containerVeth; /* veth device in container namespace */ + char *txName; /* bridge or network name */ + + lxc_net_def_t *next; +}; + typedef struct __lxc_mount lxc_mount_t; struct __lxc_mount { char source[PATH_MAX]; /* user's directory */ @@ -61,6 +77,10 @@ /* tty device */ char *tty; + + /* network devices */ + int numNets; + lxc_net_def_t *nets; }; typedef struct __lxc_vm lxc_vm_t;

# HG changeset patch # User Dan Smith <danms@us.ibm.com> # Date 1213891185 25200 # Node ID cb780a7b3ad591f1a9392d6528218b3aa2c3483d # Parent acf369a2543ad52b235ae8541c8ad05670e255bd [LXC] Add setup/cleanup of container network interfaces diff -r acf369a2543a -r cb780a7b3ad5 src/lxc_conf.h --- a/src/lxc_conf.h Thu Jun 19 08:59:37 2008 -0700 +++ b/src/lxc_conf.h Thu Jun 19 08:59:45 2008 -0700 @@ -35,6 +35,12 @@ #define LXC_MAX_XML_LENGTH 16384 #define LXC_MAX_ERROR_LEN 1024 #define LXC_DOMAIN_TYPE "lxc" +#define LXC_PARENT_SOCKET 0 +#define LXC_CONTAINER_SOCKET 1 + +/* messages between parent and container */ +typedef char lxc_message_t; +#define LXC_CONTINUE_MSG 'c' /* types of networks for containers */ enum lxc_net_type { @@ -96,6 +102,8 @@ int parentTty; int containerTtyFd; char *containerTty; + + int sockpair[2]; lxc_vm_def_t *def; diff -r acf369a2543a -r cb780a7b3ad5 src/lxc_container.c --- a/src/lxc_container.c Thu Jun 19 08:59:37 2008 -0700 +++ b/src/lxc_container.c Thu Jun 19 08:59:45 2008 -0700 @@ -36,6 +36,7 @@ #include "lxc_conf.h" #include "util.h" #include "memory.h" +#include "veth.h" #define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__) #define DEBUG0(msg) VIR_DEBUG(__FILE__, "%s", msg) @@ -159,6 +160,74 @@ } /** + * lxcWaitForContinue: + * @vm: Pointer to vm structure + * + * This function will wait for the container continue message from the + * parent process. It will send this message on the socket pair stored in + * the vm structure once it has completed the post clone container setup. + * + * Returns 0 on success or -1 in case of error + */ +static int lxcWaitForContinue(lxc_vm_t *vm) +{ + int rc = -1; + lxc_message_t msg; + int readLen = 0; + + readLen = saferead(vm->sockpair[LXC_CONTAINER_SOCKET], &msg, sizeof(msg)); + if (readLen != sizeof(msg)) { + lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR, + _("Failed to read the container continue message: %s"), + strerror(errno)); + goto error_out; + } + + DEBUG0("Received container continue message"); + + close(vm->sockpair[LXC_PARENT_SOCKET]); + vm->sockpair[LXC_PARENT_SOCKET] = -1; + close(vm->sockpair[LXC_CONTAINER_SOCKET]); + vm->sockpair[LXC_CONTAINER_SOCKET] = -1; + + rc = 0; + +error_out: + return rc; +} + +#ifdef HAVE_NETNS +/** + * lxcEnableInterfaces: + * @vm: Pointer to vm structure + * + * This function will enable the interfaces for this container. + * + * Returns 0 on success or -1 in case of error + */ +static int lxcEnableInterfaces(lxc_vm_t *vm) +{ + int rc = -1; + lxc_net_def_t *net = vm->def->nets; + int i = 0; + + for (i = 0; net; net = net->next) { + DEBUG("Enabling %s", net->containerVeth); + rc = vethInterfaceUpOrDown(net->containerVeth, 1); + if (0 != rc) { + goto error_out; + } + } + + /* enable lo device */ + rc = vethInterfaceUpOrDown("lo", 1); + +error_out: + return rc; +} +#endif /* HAVE_NETNS */ + +/** * lxcChild: * @argv: Pointer to container arguments * @@ -210,6 +279,18 @@ goto cleanup; } + /* Wait for interface devices to show up */ + if (0 != (rc = lxcWaitForContinue(vm))) { + goto cleanup; + } + +#ifdef HAVE_NETNS + /* enable interfaces */ + if (0 != (rc = lxcEnableInterfaces(vm))) { + goto cleanup; + } +#endif + rc = lxcExecWithTty(vm); /* this function will only return if an error occured */ diff -r acf369a2543a -r cb780a7b3ad5 src/lxc_driver.c --- a/src/lxc_driver.c Thu Jun 19 08:59:37 2008 -0700 +++ b/src/lxc_driver.c Thu Jun 19 08:59:45 2008 -0700 @@ -44,6 +44,9 @@ #include "memory.h" #include "util.h" #include "memory.h" +#include "bridge.h" +#include "qemu_conf.h" +#include "veth.h" /* debug macros */ #define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__) @@ -66,6 +69,9 @@ #ifndef CLONE_NEWIPC #define CLONE_NEWIPC 0x08000000 #endif +#ifndef CLONE_NEWNET +#define CLONE_NEWNET 0x40000000 /* New network namespace */ +#endif static int lxcStartup(void); static int lxcShutdown(void); @@ -81,6 +87,9 @@ { int rc = 0; int flags = CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWUSER| +#ifdef HAVE_NETNS + CLONE_NEWNET| +#endif CLONE_NEWIPC|SIGCHLD; int cpid; char *childStack; @@ -237,6 +246,9 @@ static int lxcNumDomains(virConnectPtr conn) { lxc_driver_t *driver = (lxc_driver_t *)conn->privateData; + + DEBUG("driver: %p network: %p", conn->privateData, conn->networkPrivateData); + return driver->nactivevms; } @@ -384,6 +396,197 @@ return lxcGenerateXML(dom->conn, driver, vm, vm->def); } +#ifdef HAVE_NETNS +/** + * lxcSetupInterfaces: + * @conn: pointer to connection + * @vm: pointer to virtual machine structure + * + * Sets up the container interfaces by creating the veth device pairs and + * attaching the parent end to the appropriate bridge. The container end + * will moved into the container namespace later after clone has been called. + * + * Returns 0 on success or -1 in case of error + */ +static int lxcSetupInterfaces(virConnectPtr conn, + lxc_vm_t *vm) +{ + int rc = -1; + struct qemud_driver *networkDriver = + (struct qemud_driver *)(conn->networkPrivateData); + lxc_net_def_t *net = vm->def->nets; + int i = 0; + char* bridge; + char parentVeth[PATH_MAX] = ""; + char containerVeth[PATH_MAX] = ""; + + for (i = 0; net; net = net->next) { + if (LXC_NET_NETWORK == net->type) { + virNetworkPtr network = virNetworkLookupByName(conn, net->txName); + if (!network) { + goto error_exit; + } + + bridge = virNetworkGetBridgeName(network); + + virNetworkFree(network); + + } else { + bridge = net->txName; + } + + DEBUG("bridge: %s", bridge); + if (NULL == bridge) { + lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR, + _("failed to get bridge for interface")); + goto error_exit; + } + + DEBUG0("calling vethCreate()"); + if (NULL != net->parentVeth) { + strcpy(parentVeth, net->parentVeth); + } + if (NULL != net->containerVeth) { + strcpy(containerVeth, net->containerVeth); + } + DEBUG("parentVeth: %s, containerVeth: %s", parentVeth, containerVeth); + if (0 != (rc = vethCreate(parentVeth, PATH_MAX, containerVeth, PATH_MAX))) { + lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR, + _("failed to create veth device pair: %d"), rc); + goto error_exit; + } + if (NULL == net->parentVeth) { + net->parentVeth = strdup(parentVeth); + } + if (NULL == net->containerVeth) { + net->containerVeth = strdup(containerVeth); + } + + if (!(networkDriver->brctl) && (rc = brInit(&(networkDriver->brctl)))) { + lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR, + _("cannot initialize bridge support: %s"), + strerror(rc)); + goto error_exit; + } + + if (0 != (rc = brAddInterface(networkDriver->brctl, bridge, parentVeth))) { + lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR, + _("failed to add %s device to %s: %s"), + parentVeth, + bridge, + strerror(rc)); + goto error_exit; + } + + if (0 != (rc = vethInterfaceUpOrDown(parentVeth, 1))) { + lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR, + _("failed to enable parent ns veth device: %d"), rc); + goto error_exit; + } + + } + + rc = 0; + +error_exit: + return rc; +} + +/** + * lxcMoveInterfacesToNetNs: + * @conn: pointer to connection + * @vm: pointer to virtual machine structure + * + * Starts a container process by calling clone() with the namespace flags + * + * Returns 0 on success or -1 in case of error + */ +static int lxcMoveInterfacesToNetNs(virConnectPtr conn, + lxc_vm_t *vm) +{ + int rc = -1; + lxc_net_def_t *net = vm->def->nets; + int i = 0; + + for (i = 0; net; net = net->next) { + if (0 != moveInterfaceToNetNs(net->containerVeth, vm->def->id)) { + lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR, + _("failed to move interface %s to ns %d"), + net->containerVeth, vm->def->id); + goto error_exit; + } + } + + rc = 0; + +error_exit: + return rc; +} + +/** + * lxcCleanupInterfaces: + * @conn: pointer to connection + * @vm: pointer to virtual machine structure + * + * Cleans up the container interfaces by deleting the veth device pairs. + * + * Returns 0 on success or -1 in case of error + */ +static int lxcCleanupInterfaces(lxc_vm_t *vm) +{ + int rc = -1; + lxc_net_def_t *net = vm->def->nets; + int i = 0; + + for (i = 0; net; net = net->next) { + if (0 != (rc = vethDelete(net->parentVeth))) { + lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR, + _("failed to delete veth: %s"), net->parentVeth); + /* will continue to try to cleanup any other interfaces */ + } + + } + + rc = 0; + + return rc; +} +#endif /* HAVE_NETNS */ + +/** + * lxcSendContainerContinue: + * @vm: pointer to virtual machine structure + * + * Sends the continue message via the socket pair stored in the vm + * structure. + * + * Returns 0 on success or -1 in case of error + */ +static int lxcSendContainerContinue(lxc_vm_t *vm) +{ + int rc = -1; + lxc_message_t msg = LXC_CONTINUE_MSG; + int writeCount = 0; + + if (NULL == vm) { + goto error_out; + } + + writeCount = safewrite(vm->sockpair[LXC_PARENT_SOCKET], &msg, + sizeof(msg)); + if (writeCount != sizeof(msg)) { + lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR, + _("unable to send container continue message: %s"), + strerror(errno)); + goto error_out; + } + + rc = 0; + +error_out: + return rc; +} + /** * lxcStartContainer: * @conn: pointer to connection @@ -411,7 +614,11 @@ } stacktop = stack + stacksize; - flags = CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWUSER|CLONE_NEWIPC|SIGCHLD; + flags = CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWUSER| +#ifdef HAVE_NETNS + CLONE_NEWNET| +#endif + CLONE_NEWIPC|SIGCHLD; vm->def->id = clone(lxcChild, stacktop, flags, (void *)vm); @@ -809,7 +1016,34 @@ close(vm->parentTty); close(vm->containerTtyFd); +#ifdef HAVE_NETNS + if (0 != (rc = lxcSetupInterfaces(conn, vm))) { + goto cleanup; + } +#endif /* HAVE_NETNS */ + + /* create a socket pair to send continue message to the container once */ + /* we've completed the post clone configuration */ + if (0 != socketpair(PF_UNIX, SOCK_STREAM, 0, vm->sockpair)) { + lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR, + _("sockpair failed: %s"), strerror(errno)); + goto cleanup; + } + + /* check this rc */ + rc = lxcStartContainer(conn, driver, vm); + +#ifdef HAVE_NETNS + rc = lxcMoveInterfacesToNetNs(conn, vm); +#endif + + rc = lxcSendContainerContinue(vm); + + close(vm->sockpair[LXC_PARENT_SOCKET]); + vm->sockpair[LXC_PARENT_SOCKET] = -1; + close(vm->sockpair[LXC_CONTAINER_SOCKET]); + vm->sockpair[LXC_CONTAINER_SOCKET] = -1; if (rc == 0) { vm->state = VIR_DOMAIN_RUNNING; @@ -948,6 +1182,11 @@ int waitRc; int childStatus = -1; + /* if this fails, we'll continue. it will report any errors */ +#ifdef HAVE_NETNS + lxcCleanupInterfaces(vm); +#endif + while (((waitRc = waitpid(vm->def->id, &childStatus, 0)) == -1) && errno == EINTR);

Hi Dan, Here are a few suggestions. The bits about strdup are more substantive.
+static int lxcWaitForContinue(lxc_vm_t *vm) ... + int rc = -1; + lxc_message_t msg; + int readLen = 0;
Don't initialize readLen in the declaration, since it's set unconditionally just below.
+ readLen = saferead(vm->sockpair[LXC_CONTAINER_SOCKET], &msg, sizeof(msg)); + if (readLen != sizeof(msg)) { + lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR, + _("Failed to read the container continue message: %s"), + strerror(errno)); + goto error_out; + } + + DEBUG0("Received container continue message"); + + close(vm->sockpair[LXC_PARENT_SOCKET]); + vm->sockpair[LXC_PARENT_SOCKET] = -1; + close(vm->sockpair[LXC_CONTAINER_SOCKET]); + vm->sockpair[LXC_CONTAINER_SOCKET] = -1; + + rc = 0; + +error_out: + return rc; +} + +#ifdef HAVE_NETNS +/** + * lxcEnableInterfaces: + * @vm: Pointer to vm structure + * + * This function will enable the interfaces for this container. + * + * Returns 0 on success or -1 in case of error + */ +static int lxcEnableInterfaces(lxc_vm_t *vm)
It looks like the "vm" parameter may be "const". If so, please declare it as such. Same with all of the ones below.
+{ + int rc = -1; + lxc_net_def_t *net = vm->def->nets;
Once vm is const, it's good to make other pointers "const", too, if possible. Here, "net" may be one, i.e., if vethInterfaceUpOrDown doesn't modify memory through its first parameter.
+ int i = 0;
I was going to suggest making "i" unsigned if it's never negative. But it's not even used. So, please remove that declaration and the use below.
+ for (i = 0; net; net = net->next) { + DEBUG("Enabling %s", net->containerVeth); + rc = vethInterfaceUpOrDown(net->containerVeth, 1); + if (0 != rc) { + goto error_out; + } + } + + /* enable lo device */ + rc = vethInterfaceUpOrDown("lo", 1); + +error_out: + return rc; +} +#endif /* HAVE_NETNS */ + +/** * lxcChild: * @argv: Pointer to container arguments * @@ -210,6 +279,18 @@ goto cleanup; }
+ /* Wait for interface devices to show up */ + if (0 != (rc = lxcWaitForContinue(vm))) { + goto cleanup; + } + +#ifdef HAVE_NETNS
Please remove this in-function #ifdef. Instead, arrange for lxcEnableInterfaces to be defined as a no-op function when HAVE_NETNS is not defined.
+ /* enable interfaces */ + if (0 != (rc = lxcEnableInterfaces(vm))) { + goto cleanup; + } +#endif ... diff -r acf369a2543a -r cb780a7b3ad5 src/lxc_driver.c --- a/src/lxc_driver.c Thu Jun 19 08:59:37 2008 -0700 +++ b/src/lxc_driver.c Thu Jun 19 08:59:45 2008 -0700 @@ -44,6 +44,9 @@ #include "memory.h" #include "util.h" #include "memory.h" +#include "bridge.h" +#include "qemu_conf.h" +#include "veth.h"
/* debug macros */ #define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__) @@ -66,6 +69,9 @@ #ifndef CLONE_NEWIPC #define CLONE_NEWIPC 0x08000000 #endif +#ifndef CLONE_NEWNET +#define CLONE_NEWNET 0x40000000 /* New network namespace */ +#endif
static int lxcStartup(void); static int lxcShutdown(void); @@ -81,6 +87,9 @@ { int rc = 0; int flags = CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWUSER| +#ifdef HAVE_NETNS
Please remove the #ifdef. Simply arrange for CLONE_NEWNET to be 0 when HAVE_NETNS is not defined. Then you can use it without the ugly #ifdef.
+ CLONE_NEWNET| +#endif CLONE_NEWIPC|SIGCHLD; int cpid; char *childStack; @@ -237,6 +246,9 @@ static int lxcNumDomains(virConnectPtr conn) { lxc_driver_t *driver = (lxc_driver_t *)conn->privateData; + + DEBUG("driver: %p network: %p", conn->privateData, conn->networkPrivateData); + return driver->nactivevms; }
@@ -384,6 +396,197 @@ return lxcGenerateXML(dom->conn, driver, vm, vm->def); }
+#ifdef HAVE_NETNS +/** + * lxcSetupInterfaces: + * @conn: pointer to connection + * @vm: pointer to virtual machine structure + * + * Sets up the container interfaces by creating the veth device pairs and + * attaching the parent end to the appropriate bridge. The container end + * will moved into the container namespace later after clone has been called. + * + * Returns 0 on success or -1 in case of error + */ +static int lxcSetupInterfaces(virConnectPtr conn, + lxc_vm_t *vm) +{ + int rc = -1; + struct qemud_driver *networkDriver = + (struct qemud_driver *)(conn->networkPrivateData); + lxc_net_def_t *net = vm->def->nets; + int i = 0;
Useless initialization and declaration again. And a couple more below
+ char* bridge; + char parentVeth[PATH_MAX] = ""; + char containerVeth[PATH_MAX] = ""; + + for (i = 0; net; net = net->next) { + if (LXC_NET_NETWORK == net->type) { + virNetworkPtr network = virNetworkLookupByName(conn, net->txName); + if (!network) { + goto error_exit; + } + + bridge = virNetworkGetBridgeName(network); + + virNetworkFree(network); + + } else { + bridge = net->txName; + } + + DEBUG("bridge: %s", bridge); + if (NULL == bridge) { + lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR, + _("failed to get bridge for interface")); + goto error_exit; + } + + DEBUG0("calling vethCreate()"); + if (NULL != net->parentVeth) { + strcpy(parentVeth, net->parentVeth); + } + if (NULL != net->containerVeth) { + strcpy(containerVeth, net->containerVeth); + } + DEBUG("parentVeth: %s, containerVeth: %s", parentVeth, containerVeth); + if (0 != (rc = vethCreate(parentVeth, PATH_MAX, containerVeth, PATH_MAX))) { + lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR, + _("failed to create veth device pair: %d"), rc); + goto error_exit; + } + if (NULL == net->parentVeth) { + net->parentVeth = strdup(parentVeth); + } + if (NULL == net->containerVeth) { + net->containerVeth = strdup(containerVeth); + }
Don't you want to handle failed strdup here? It looks like other places require net->containerVeth and net->parentVeth to be non-NULL.
+ if (!(networkDriver->brctl) && (rc = brInit(&(networkDriver->brctl)))) { + lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR, + _("cannot initialize bridge support: %s"), + strerror(rc)); + goto error_exit; + } + + if (0 != (rc = brAddInterface(networkDriver->brctl, bridge, parentVeth))) { + lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR, + _("failed to add %s device to %s: %s"), + parentVeth, + bridge, + strerror(rc)); + goto error_exit; + } + + if (0 != (rc = vethInterfaceUpOrDown(parentVeth, 1))) { + lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR, + _("failed to enable parent ns veth device: %d"), rc); + goto error_exit; + } + + } + + rc = 0; + +error_exit: + return rc; +} + +/** + * lxcMoveInterfacesToNetNs: + * @conn: pointer to connection + * @vm: pointer to virtual machine structure + * + * Starts a container process by calling clone() with the namespace flags + * + * Returns 0 on success or -1 in case of error + */ +static int lxcMoveInterfacesToNetNs(virConnectPtr conn, + lxc_vm_t *vm) +{ + int rc = -1; + lxc_net_def_t *net = vm->def->nets; + int i = 0;
unused decl.
+ + for (i = 0; net; net = net->next) { + if (0 != moveInterfaceToNetNs(net->containerVeth, vm->def->id)) { + lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR, + _("failed to move interface %s to ns %d"), + net->containerVeth, vm->def->id); + goto error_exit; + } + } + + rc = 0; + +error_exit: + return rc; +} + +/** + * lxcCleanupInterfaces: + * @conn: pointer to connection + * @vm: pointer to virtual machine structure + * + * Cleans up the container interfaces by deleting the veth device pairs. + * + * Returns 0 on success or -1 in case of error + */ +static int lxcCleanupInterfaces(lxc_vm_t *vm) +{ + int rc = -1; + lxc_net_def_t *net = vm->def->nets; + int i = 0;
likewise.
stacktop = stack + stacksize;
- flags = CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWUSER|CLONE_NEWIPC|SIGCHLD; + flags = CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWUSER| +#ifdef HAVE_NETNS
remove ifdef
+ CLONE_NEWNET| +#endif + CLONE_NEWIPC|SIGCHLD;
vm->def->id = clone(lxcChild, stacktop, flags, (void *)vm);
@@ -809,7 +1016,34 @@ close(vm->parentTty); close(vm->containerTtyFd);
+#ifdef HAVE_NETNS
Define a stub function that returns 0, so you can avoid the #ifdef.
+ if (0 != (rc = lxcSetupInterfaces(conn, vm))) { + goto cleanup; + } +#endif /* HAVE_NETNS */ + + /* create a socket pair to send continue message to the container once */ + /* we've completed the post clone configuration */ + if (0 != socketpair(PF_UNIX, SOCK_STREAM, 0, vm->sockpair)) { + lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR, + _("sockpair failed: %s"), strerror(errno)); + goto cleanup; + } + + /* check this rc */ + rc = lxcStartContainer(conn, driver, vm); + +#ifdef HAVE_NETNS
Avoid #ifdefs. BTW, what's the point of saving return value in "rc" if the very next statement is going to overwrite that value? Either test it, or add a comment saying why it's ok to ignore failure, in which case don't clobber the previous value.
+ rc = lxcMoveInterfacesToNetNs(conn, vm); +#endif + + rc = lxcSendContainerContinue(vm); + + close(vm->sockpair[LXC_PARENT_SOCKET]); + vm->sockpair[LXC_PARENT_SOCKET] = -1; + close(vm->sockpair[LXC_CONTAINER_SOCKET]); + vm->sockpair[LXC_CONTAINER_SOCKET] = -1;
if (rc == 0) { vm->state = VIR_DOMAIN_RUNNING; @@ -948,6 +1182,11 @@ int waitRc; int childStatus = -1;
+ /* if this fails, we'll continue. it will report any errors */ +#ifdef HAVE_NETNS
Define a no-op version of lxcCleanupInterfaces, so you can remove this in-function #ifdef.
+ lxcCleanupInterfaces(vm); +#endif + while (((waitRc = waitpid(vm->def->id, &childStatus, 0)) == -1) && errno == EINTR);

JM> Don't initialize readLen in the declaration, since it's set JM> unconditionally just below. I cleaned up a surprising number of these already when I took ownership of the patch, but it looks like I missed a bunch more. JM> Please remove this in-function #ifdef. Instead, arrange for JM> lxcEnableInterfaces to be defined as a no-op function when JM> HAVE_NETNS is not defined. Okay. JM> Please remove the #ifdef. Simply arrange for CLONE_NEWNET to be 0 JM> when HAVE_NETNS is not defined. Then you can use it without the JM> ugly #ifdef. So what happens if CLONE_NEWNET is present on the system (and supported by the kernel) but the 'ip' binary doesn't support it? Unless we #undef CLONE_NEWNET, you would create a new network namespace and not be able to move anything into it. Would that be your preference? JM> Don't you want to handle failed strdup here? It looks like other JM> places require net->containerVeth and net-> parentVeth to be non-NULL. Yep.
+ /* check this rc */ + rc = lxcStartContainer(conn, driver, vm); + +#ifdef HAVE_NETNS
JM> BTW, what's the point of saving return value in "rc" if the very JM> next statement is going to overwrite that value? Either test it, JM> or add a comment saying why it's ok to ignore failure, in which JM> case don't clobber the previous value. I think the comment above that code is supposed justify it :) I'll just fix up the checking instead and remove the comment. Thanks! -- Dan Smith IBM Linux Technology Center Open Hypervisor Team email: danms@us.ibm.com

Dan Smith <danms@us.ibm.com> wrote:
JM> Please remove the #ifdef. Simply arrange for CLONE_NEWNET to be 0 JM> when HAVE_NETNS is not defined. Then you can use it without the JM> ugly #ifdef.
So what happens if CLONE_NEWNET is present on the system (and supported by the kernel) but the 'ip' binary doesn't support it? Unless we #undef CLONE_NEWNET, you would create a new network namespace and not be able to move anything into it. Would that be your preference?
My suggestion was to eliminate the in-function #ifdef without changing semantics, by adding something like this outside the function: #ifndef HAVE_NETNS # undef CLONE_NEWNET # define CLONE_NEWNET 0 #endif ... int flags = CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWUSER| CLONE_NEWNET|CLONE_NEWIPC|SIGCHLD; That will work just like the original code. ...
+ /* check this rc */ + rc = lxcStartContainer(conn, driver, vm); + +#ifdef HAVE_NETNS
JM> BTW, what's the point of saving return value in "rc" if the very JM> next statement is going to overwrite that value? Either test it, JM> or add a comment saying why it's ok to ignore failure, in which JM> case don't clobber the previous value.
I think the comment above that code is supposed justify it :)
The way I read it, "check this rc" sounds like it must be a TODO or FIXME item, because that particular "rc" value is the one that's being clobbered.
I'll just fix up the checking instead and remove the comment.
Thanks.
participants (5)
-
Dan Smith
-
Daniel Lezcano
-
Daniel Veillard
-
Dave Leskovec
-
Jim Meyering