[libvirt] [PATCH 0/5] Fixes for lxc veth handling

From: "Daniel P. Berrange" <berrange@redhat.com> The primary goal of this series was to address the race condition in creation of veth device names. A few other bugs were identified along the way though & fixed. Daniel P. Berrange (5): Don't set netdev offline in container cleanup Avoid reporting an error if veth device is already deleted Avoid deleting NULL veth device name Retry veth device creation on failure Use 'vnet' as prefix for veth devices src/lxc/lxc_process.c | 3 +- src/util/virnetdevveth.c | 168 ++++++++++++++++++++++++++++++----------------- 2 files changed, 107 insertions(+), 64 deletions(-) -- 1.8.3.1

From: "Daniel P. Berrange" <berrange@redhat.com> During container cleanup there is a race where the kernel may have destroyed the veth device before we try to set it offline. This causes log error messages. Given that we're about to delete the device entirely, setting it offline is pointless. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_process.c | 1 - 1 file changed, 1 deletion(-) diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index f92c613..a78784e 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -193,7 +193,6 @@ static void virLXCProcessCleanup(virLXCDriverPtr driver, virDomainNetDefPtr iface = vm->def->nets[i]; vport = virDomainNetGetActualVirtPortProfile(iface); if (iface->ifname) { - ignore_value(virNetDevSetOnline(iface->ifname, false)); if (vport && vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH) ignore_value(virNetDevOpenvswitchRemovePort( -- 1.8.3.1

On 10/02/2013 05:31 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
During container cleanup there is a race where the kernel may have destroyed the veth device before we try to set it offline. This causes log error messages. Given that we're about to delete the device entirely, setting it offline is pointless.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_process.c | 1 - 1 file changed, 1 deletion(-)
ACK.
diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index f92c613..a78784e 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -193,7 +193,6 @@ static void virLXCProcessCleanup(virLXCDriverPtr driver, virDomainNetDefPtr iface = vm->def->nets[i]; vport = virDomainNetGetActualVirtPortProfile(iface); if (iface->ifname) { - ignore_value(virNetDevSetOnline(iface->ifname, false)); if (vport && vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH) ignore_value(virNetDevOpenvswitchRemovePort(
-- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> The kernel automatically destroys veth devices when cleaning up the container network namepace. During normal shutdown, it is thus likely that the attempt to run 'ip link del vethN' will fail. If it fails, check if the device exists, and avoid reporting an error if it has gone. This switches to use the virCommand APIs instead of virRun too. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/virnetdevveth.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/src/util/virnetdevveth.c b/src/util/virnetdevveth.c index 039767f..c0d32c4 100644 --- a/src/util/virnetdevveth.c +++ b/src/util/virnetdevveth.c @@ -161,9 +161,20 @@ cleanup: */ int virNetDevVethDelete(const char *veth) { - const char *argv[] = {"ip", "link", "del", veth, NULL}; + virCommandPtr cmd = virCommandNewArgList("ip", "link", "del", veth, NULL); + int status; - VIR_DEBUG("veth: %s", veth); + if (virCommandRun(cmd, &status) < 0) + return -1; - return virRun(argv, NULL); + if (status != 0) { + if (!virNetDevExists(veth)) { + VIR_DEBUG("Device %s already deleted (by kernel namespace cleanup)", veth); + return 0; + } + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Failed to delete veth device %s"), veth); + return -1; + } + return 0; } -- 1.8.3.1

On 10/02/2013 05:31 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The kernel automatically destroys veth devices when cleaning up the container network namepace. During normal shutdown, it
s/namepace/namespace/
is thus likely that the attempt to run 'ip link del vethN' will fail. If it fails, check if the device exists, and avoid reporting an error if it has gone. This switches to use the virCommand APIs instead of virRun too.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/virnetdevveth.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-)
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 10/02/2013 07:31 PM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The kernel automatically destroys veth devices when cleaning up the container network namepace. During normal shutdown, it is thus likely that the attempt to run 'ip link del vethN' will fail. If it fails, check if the device exists, and avoid reporting an error if it has gone. This switches to use the virCommand APIs instead of virRun too.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/virnetdevveth.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/src/util/virnetdevveth.c b/src/util/virnetdevveth.c index 039767f..c0d32c4 100644 --- a/src/util/virnetdevveth.c +++ b/src/util/virnetdevveth.c @@ -161,9 +161,20 @@ cleanup: */ int virNetDevVethDelete(const char *veth) { - const char *argv[] = {"ip", "link", "del", veth, NULL}; + virCommandPtr cmd = virCommandNewArgList("ip", "link", "del", veth, NULL); + int status;
- VIR_DEBUG("veth: %s", veth); + if (virCommandRun(cmd, &status) < 0) + return -1;
- return virRun(argv, NULL); + if (status != 0) { + if (!virNetDevExists(veth)) { + VIR_DEBUG("Device %s already deleted (by kernel namespace cleanup)", veth); + return 0; + } + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Failed to delete veth device %s"), veth); + return -1; + }
miss the virCommandFree.
+ return 0; }

From: "Daniel P. Berrange" <berrange@redhat.com> If veth device allocation has a fatal error, the veths array may contain NULL device names. Avoid calling the virNetDevVethDelete function on such names. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_process.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index a78784e..d07ff13 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -1290,7 +1290,7 @@ cleanup: rc = -1; } for (i = 0; i < nveths; i++) { - if (rc != 0) + if (rc != 0 && veths[i]) ignore_value(virNetDevVethDelete(veths[i])); VIR_FREE(veths[i]); } -- 1.8.3.1

On 10/02/2013 05:31 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
If veth device allocation has a fatal error, the veths array may contain NULL device names. Avoid calling the virNetDevVethDelete function on such names.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_process.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
ACK.
diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index a78784e..d07ff13 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -1290,7 +1290,7 @@ cleanup: rc = -1; } for (i = 0; i < nveths; i++) { - if (rc != 0) + if (rc != 0 && veths[i]) ignore_value(virNetDevVethDelete(veths[i])); VIR_FREE(veths[i]); }
-- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> The veth device creation code run in two steps, first it looks for two free veth device names, then it runs ip link to create the veth pair. There is an obvious race between finding free names and creating them, when guests are started in parallel. Rewrite the code to loop and re-try creation if it fails, to deal with the race condition. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/virnetdevveth.c | 151 +++++++++++++++++++++++++++++------------------ 1 file changed, 92 insertions(+), 59 deletions(-) diff --git a/src/util/virnetdevveth.c b/src/util/virnetdevveth.c index c0d32c4..b5d618c 100644 --- a/src/util/virnetdevveth.c +++ b/src/util/virnetdevveth.c @@ -33,13 +33,26 @@ #include "virfile.h" #include "virstring.h" #include "virutil.h" +#include "virnetdev.h" #define VIR_FROM_THIS VIR_FROM_NONE /* Functions */ + +static int virNetDevVethExists(int devNum) +{ + int ret; + char *path = NULL; + if (virAsprintf(&path, "/sys/class/net/veth%d/", devNum) < 0) + return -1; + ret = virFileExists(path) ? 1 : 0; + VIR_DEBUG("Checked dev veth%d usage: %d", devNum, ret); + VIR_FREE(path); + return ret; +} + /** - * virNetDevVethGetFreeName: - * @veth: pointer to store returned name for veth device + * virNetDevVethGetFreeNum: * @startDev: device number to start at (x in vethx) * * Looks in /sys/class/net/ to find the first available veth device @@ -47,25 +60,23 @@ * * Returns non-negative device number on success or -1 in case of error */ -static int virNetDevVethGetFreeName(char **veth, int startDev) +static int virNetDevVethGetFreeNum(int startDev) { - int devNum = startDev-1; - char *path = NULL; + int devNum; - VIR_DEBUG("Find free from veth%d", startDev); - do { - VIR_FREE(path); - ++devNum; - if (virAsprintf(&path, "/sys/class/net/veth%d/", devNum) < 0) - return -1; - VIR_DEBUG("Probe %s", path); - } while (virFileExists(path)); - VIR_FREE(path); +#define MAX_DEV_NUM 65536 - if (virAsprintf(veth, "veth%d", devNum) < 0) - return -1; + for (devNum = startDev ; devNum < MAX_DEV_NUM ; devNum++) { + int ret = virNetDevVethExists(devNum); + if (ret < 0) + return -1; + if (ret == 0) + return devNum; + } - return devNum; + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("No free veth devices available")); + return -1; } /** @@ -95,57 +106,79 @@ static int virNetDevVethGetFreeName(char **veth, int startDev) */ int virNetDevVethCreate(char** veth1, char** veth2) { - int rc = -1; - const char *argv[] = { - "ip", "link", "add", NULL, "type", "veth", "peer", "name", NULL, NULL - }; - int vethDev = 0; - bool veth1_alloc = false; - bool veth2_alloc = false; - - VIR_DEBUG("Host: %s guest: %s", NULLSTR(*veth1), NULLSTR(*veth2)); - - if (*veth1 == NULL) { - if ((vethDev = virNetDevVethGetFreeName(veth1, vethDev)) < 0) - goto cleanup; - VIR_DEBUG("Assigned host: %s", *veth1); - veth1_alloc = true; - vethDev++; - } - argv[3] = *veth1; + int ret = -1; + char *veth1auto = NULL; + char *veth2auto = NULL; + int vethNum = 0; + size_t i; + + /* + * We might race with other containers, but this is reasonably + * unlikely, so don't do too many retries for device creation + */ +#define MAX_VETH_RETRIES 10 + + for (i = 0 ; i < MAX_VETH_RETRIES ; i++) { + int status; + if (!*veth1) { + int veth1num; + if ((veth1num = virNetDevVethGetFreeNum(vethNum)) < 0) + goto cleanup; + + if (virAsprintf(&veth1auto, "veth%d", veth1num) < 0) + goto cleanup; + vethNum = veth1num + 1; + } + if (!*veth2) { + int veth2num; + if ((veth2num = virNetDevVethGetFreeNum(vethNum)) < 0) + goto cleanup; + + if (virAsprintf(&veth2auto, "veth%d", veth2num) < 0) + goto cleanup; + vethNum = veth2num + 1; + } + + virCommandPtr cmd = virCommandNew("ip"); + virCommandAddArgList(cmd, "link", "add", + *veth1 ? *veth1 : veth1auto, + "type", "veth", "peer", "name", + *veth2 ? *veth2 : veth2auto, + NULL); - while (*veth2 == NULL) { - if ((vethDev = virNetDevVethGetFreeName(veth2, vethDev)) < 0) { - if (veth1_alloc) - VIR_FREE(*veth1); + if (virCommandRun(cmd, &status) < 0) goto cleanup; - } - /* Just make sure they didn't accidentally get same name */ - if (STREQ(*veth1, *veth2)) { - vethDev++; - VIR_FREE(*veth2); - continue; + if (status == 0) { + if (veth1auto) { + *veth1 = veth1auto; + veth1auto = NULL; + } + if (veth2auto) { + *veth2 = veth2auto; + veth2auto = NULL; + } + VIR_DEBUG("Create Host: %s guest: %s", *veth1, *veth2); + ret = 0; + goto cleanup; } - VIR_DEBUG("Assigned guest: %s", *veth2); - veth2_alloc = true; - } - argv[8] = *veth2; - - VIR_DEBUG("Create Host: %s guest: %s", *veth1, *veth2); - if (virRun(argv, NULL) < 0) { - if (veth1_alloc) - VIR_FREE(*veth1); - if (veth2_alloc) - VIR_FREE(*veth2); - goto cleanup; + VIR_DEBUG("Failed to create veth host: %s guest: %s: %d", + *veth1 ? *veth1 : veth1auto, + *veth1 ? *veth1 : veth1auto, + status); + VIR_FREE(veth1auto); + VIR_FREE(veth2auto); } - rc = 0; + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Failed to allocate free veth pair after %d attempts"), + MAX_VETH_RETRIES); cleanup: - return rc; + VIR_FREE(veth1auto); + VIR_FREE(veth2auto); + return ret; } /** -- 1.8.3.1

On 10/02/2013 05:31 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The veth device creation code run in two steps, first it looks for two free veth device names, then it runs ip link to create the veth pair. There is an obvious race between finding free names and creating them, when guests are started in parallel.
Rewrite the code to loop and re-try creation if it fails, to deal with the race condition.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/virnetdevveth.c | 151 +++++++++++++++++++++++++++++------------------ 1 file changed, 92 insertions(+), 59 deletions(-)
/* Functions */ + +static int virNetDevVethExists(int devNum) +{ + int ret; + char *path = NULL; + if (virAsprintf(&path, "/sys/class/net/veth%d/", devNum) < 0) + return -1; + ret = virFileExists(path) ? 1 : 0;
Looks a bit funny for converting a bool to int. I probably would have skipped the ternary; but what you did is not wrong.
- if (virAsprintf(veth, "veth%d", devNum) < 0) - return -1; + for (devNum = startDev ; devNum < MAX_DEV_NUM ; devNum++) {
How did this get past 'make syntax-check'? We cleaned up the code base to avoid space before ';'. ACK with the syntax cleaned up. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> The XML parser reserves 'vnet' as a prefix for automatically generated NIC device names. Switch the veth device creation to use this prefix, so it does not have to worry about clashes with user specified names in the XML. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/virnetdevveth.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/util/virnetdevveth.c b/src/util/virnetdevveth.c index b5d618c..5be1539 100644 --- a/src/util/virnetdevveth.c +++ b/src/util/virnetdevveth.c @@ -43,10 +43,10 @@ static int virNetDevVethExists(int devNum) { int ret; char *path = NULL; - if (virAsprintf(&path, "/sys/class/net/veth%d/", devNum) < 0) + if (virAsprintf(&path, "/sys/class/net/vnet%d/", devNum) < 0) return -1; ret = virFileExists(path) ? 1 : 0; - VIR_DEBUG("Checked dev veth%d usage: %d", devNum, ret); + VIR_DEBUG("Checked dev vnet%d usage: %d", devNum, ret); VIR_FREE(path); return ret; } @@ -125,7 +125,7 @@ int virNetDevVethCreate(char** veth1, char** veth2) if ((veth1num = virNetDevVethGetFreeNum(vethNum)) < 0) goto cleanup; - if (virAsprintf(&veth1auto, "veth%d", veth1num) < 0) + if (virAsprintf(&veth1auto, "vnet%d", veth1num) < 0) goto cleanup; vethNum = veth1num + 1; } @@ -134,7 +134,7 @@ int virNetDevVethCreate(char** veth1, char** veth2) if ((veth2num = virNetDevVethGetFreeNum(vethNum)) < 0) goto cleanup; - if (virAsprintf(&veth2auto, "veth%d", veth2num) < 0) + if (virAsprintf(&veth2auto, "vnet%d", veth2num) < 0) goto cleanup; vethNum = veth2num + 1; } -- 1.8.3.1

On 10/02/2013 05:31 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The XML parser reserves 'vnet' as a prefix for automatically generated NIC device names. Switch the veth device creation to use this prefix, so it does not have to worry about clashes with user specified names in the XML.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/virnetdevveth.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
participants (3)
-
Daniel P. Berrange
-
Eric Blake
-
Gao feng