[libvirt] [PATCH v2 0/2] Fix problems on using lxc with cgroup ns subsystem

The patch set fixes two problems of lxc that happen when ns subsystem is enabled with memory subsystem at the same time. The fist problem is that cgroup subdirectories that are automatically created by ns subsystem remain after domain shutdown. The second problem is that memory usage is not properly accounted. v2 fixes a bunch of defects pointed out by Eric Blake. Changes from v1: - correct readdir error handling - avoid stack-allocating PATH_MAX, use virAsprintf() instead - add missing closedir() - ensure _() in the string of VIR_ERROR - change the flag of virCgroupMakeGroup from int to bool - fix typo in commit log - change some 'child' to 'descendant' in commit log to make representation proper Ryota Ozaki (2): cgroup: Change virCgroupRemove to remove all descendant groups at first cgroup: Enable memory.use_hierarchy of cgroup for domain src/util/cgroup.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++++---- 1 files changed, 114 insertions(+), 8 deletions(-)

As same as normal directories, a cgroup cannot be removed if it contains sub groups. This patch changes virCgroupRemove to remove all descendant groups (subdirectories) of a target group before removing the target group. The handling is required when we run lxc with ns subsystem of cgroup. Ns subsystem automatically creates child cgroups on every process forks, but unfortunately the groups are not removed on process exits, so we have to remove them by ourselves. With this patch, such child (and descendant) groups are surely removed at lxc shutdown, i.e., lxcVmCleanup which calls virCgroupRemove. --- src/util/cgroup.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++--- 1 files changed, 55 insertions(+), 4 deletions(-) diff --git a/src/util/cgroup.c b/src/util/cgroup.c index 35e9691..faec23f 100644 --- a/src/util/cgroup.c +++ b/src/util/cgroup.c @@ -23,6 +23,7 @@ #include <sys/stat.h> #include <sys/types.h> #include <libgen.h> +#include <dirent.h> #include "internal.h" #include "util.h" @@ -561,11 +562,63 @@ cleanup: } #endif +static int virCgroupRemoveRecursively(char *grppath) +{ + DIR *grpdir; + struct dirent *ent; + int rc = 0; + + grpdir = opendir(grppath); + if (grpdir == NULL) { + VIR_ERROR(_("Unable to open %s (%d)"), grppath, errno); + rc = -errno; + return rc; + } + + for (;;) { + char *path; + + errno = 0; + ent = readdir(grpdir); + if (ent == NULL) { + if (errno) + VIR_ERROR(_("Failed to readdir for %s (%d)"), grppath, errno); + break; + } + + if (ent->d_name[0] == '.') continue; + if (ent->d_type != DT_DIR) continue; + + if (virAsprintf(&path, "%s/%s", grppath, ent->d_name) == -1) { + rc = -ENOMEM; + break; + } + rc = virCgroupRemoveRecursively(path); + VIR_FREE(path); + if (rc != 0) + break; + } + closedir(grpdir); + + DEBUG("Removing cgroup %s", grppath); + if (rmdir(grppath) != 0 && errno != ENOENT) { + rc = -errno; + VIR_ERROR(_("Unable to remove %s (%d)"), grppath, errno); + } + + return rc; +} + /** * virCgroupRemove: * * @group: The group to be removed * + * It first removes all child groups recursively + * in depth first order and then removes @group + * because the presence of the child groups + * prevents removing @group. + * * Returns: 0 on success */ int virCgroupRemove(virCgroupPtr group) @@ -585,10 +638,8 @@ int virCgroupRemove(virCgroupPtr group) &grppath) != 0) continue; - DEBUG("Removing cgroup %s", grppath); - if (rmdir(grppath) != 0 && errno != ENOENT) { - rc = -errno; - } + DEBUG("Removing cgroup %s and all child cgroups", grppath); + rc = virCgroupRemoveRecursively(grppath); VIR_FREE(grppath); } -- 1.6.5.2

On 06/23/2010 10:00 AM, Ryota Ozaki wrote:
As same as normal directories, a cgroup cannot be removed if it contains sub groups. This patch changes virCgroupRemove to remove all descendant groups (subdirectories) of a target group before removing the target group.
+ for (;;) { + char *path; + + errno = 0; + ent = readdir(grpdir); + if (ent == NULL) { + if (errno)
This should also set rc, so the overall function exits with nonzero status.
+ VIR_ERROR(_("Failed to readdir for %s (%d)"), grppath, errno); + break; + } + + if (ent->d_name[0] == '.') continue; + if (ent->d_type != DT_DIR) continue;
Hmm - d_type is not guaranteed by POSIX. Then again, neither is cgroup, so you're just fine exploiting it. ACK with that nit fixed, so I squashed in this before pushing: diff --git i/src/util/cgroup.c w/src/util/cgroup.c index faec23f..3c23251 100644 --- i/src/util/cgroup.c +++ w/src/util/cgroup.c @@ -1,6 +1,7 @@ /* * cgroup.c: Tools for managing cgroups * + * Copyright (C) 2010 Red Hat, Inc. * Copyright IBM Corp. 2008 * * See COPYING.LIB for the License of this software @@ -581,7 +582,7 @@ static int virCgroupRemoveRecursively(char *grppath) errno = 0; ent = readdir(grpdir); if (ent == NULL) { - if (errno) + if ((rc = -errno)) VIR_ERROR(_("Failed to readdir for %s (%d)"), grppath, errno); break; } -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Thu, Jun 24, 2010 at 5:41 AM, Eric Blake <eblake@redhat.com> wrote:
On 06/23/2010 10:00 AM, Ryota Ozaki wrote:
As same as normal directories, a cgroup cannot be removed if it contains sub groups. This patch changes virCgroupRemove to remove all descendant groups (subdirectories) of a target group before removing the target group.
+ for (;;) { + char *path; + + errno = 0; + ent = readdir(grpdir); + if (ent == NULL) { + if (errno)
This should also set rc, so the overall function exits with nonzero status.
Oh, right.
+ VIR_ERROR(_("Failed to readdir for %s (%d)"), grppath, errno); + break; + } + + if (ent->d_name[0] == '.') continue; + if (ent->d_type != DT_DIR) continue;
Hmm - d_type is not guaranteed by POSIX. Then again, neither is cgroup, so you're just fine exploiting it.
I believe so :-)
ACK with that nit fixed, so I squashed in this before pushing:
Thank you for fixing it. ozaki-r
diff --git i/src/util/cgroup.c w/src/util/cgroup.c index faec23f..3c23251 100644 --- i/src/util/cgroup.c +++ w/src/util/cgroup.c @@ -1,6 +1,7 @@ /* * cgroup.c: Tools for managing cgroups * + * Copyright (C) 2010 Red Hat, Inc. * Copyright IBM Corp. 2008 * * See COPYING.LIB for the License of this software @@ -581,7 +582,7 @@ static int virCgroupRemoveRecursively(char *grppath) errno = 0; ent = readdir(grpdir); if (ent == NULL) { - if (errno) + if ((rc = -errno)) VIR_ERROR(_("Failed to readdir for %s (%d)"), grppath, errno); break; }
-- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

Through conversation with Kumar L Srikanth-B22348, I found that the function of getting memory usage (e.g., virsh dominfo) doesn't work for lxc with ns subsystem of cgroup enabled. This is because of features of ns and memory subsystems. Ns creates child cgroup on every process fork and as a result processes in a container are not assigned in a cgroup for domain (e.g., libvirt/lxc/test1/). For example, libvirt_lxc and init (or somewhat specified in XML) are assigned into libvirt/lxc/test1/8839/ and libvirt/lxc/test1/8839/8849/, respectively. On the other hand, memory subsystem accounts memory usage within a group of processes by default, i.e., it does not take any child (and descendant) groups into account. With the two features, virsh dominfo which just checks memory usage of a cgroup for domain always returns zero because the cgroup has no process. Setting memory.use_hierarchy of a group allows to account (and limit) memory usage of every descendant groups of the group. By setting it of a cgroup for domain, we can get proper memory usage of lxc with ns subsystem enabled. (To be exact, the setting is required only when memory and ns subsystems are enabled at the same time, e.g., mount -t cgroup none /cgroup.) --- src/util/cgroup.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++--- 1 files changed, 59 insertions(+), 4 deletions(-) diff --git a/src/util/cgroup.c b/src/util/cgroup.c index faec23f..ee55b89 100644 --- a/src/util/cgroup.c +++ b/src/util/cgroup.c @@ -444,7 +444,38 @@ static int virCgroupCpuSetInherit(virCgroupPtr parent, virCgroupPtr group) return rc; } -static int virCgroupMakeGroup(virCgroupPtr parent, virCgroupPtr group, int create) +static int virCgroupSetMemoryUseHierarchy(virCgroupPtr group) +{ + int rc = 0; + unsigned long long value; + const char *filename = "memory.use_hierarchy"; + + rc = virCgroupGetValueU64(group, + VIR_CGROUP_CONTROLLER_MEMORY, + filename, &value); + if (rc != 0) { + VIR_ERROR(_("Failed to read %s/%s (%d)"), group->path, filename, rc); + return rc; + } + + /* Setting twice causes error, so if already enabled, skip setting */ + if (value == 1) + return 0; + + VIR_DEBUG("Setting up %s/%s", group->path, filename); + rc = virCgroupSetValueU64(group, + VIR_CGROUP_CONTROLLER_MEMORY, + filename, 1); + + if (rc != 0) { + VIR_ERROR(_("Failed to set %s/%s (%d)"), group->path, filename, rc); + } + + return rc; +} + +static int virCgroupMakeGroup(virCgroupPtr parent, virCgroupPtr group, + int create, bool memory_hierarchy) { int i; int rc = 0; @@ -478,6 +509,20 @@ static int virCgroupMakeGroup(virCgroupPtr parent, virCgroupPtr group, int creat break; } } + /* + * Note that virCgroupSetMemoryUseHierarchy should always be + * called prior to creating subcgroups and attaching tasks. + */ + if (memory_hierarchy && + group->controllers[VIR_CGROUP_CONTROLLER_MEMORY].mountPoint != NULL && + (i == VIR_CGROUP_CONTROLLER_MEMORY || + STREQ(group->controllers[i].mountPoint, group->controllers[VIR_CGROUP_CONTROLLER_MEMORY].mountPoint))) { + rc = virCgroupSetMemoryUseHierarchy(group); + if (rc != 0) { + VIR_FREE(path); + break; + } + } } VIR_FREE(path); @@ -554,7 +599,7 @@ static int virCgroupAppRoot(int privileged, if (rc != 0) goto cleanup; - rc = virCgroupMakeGroup(rootgrp, *group, create); + rc = virCgroupMakeGroup(rootgrp, *group, create, false); cleanup: virCgroupFree(&rootgrp); @@ -704,7 +749,7 @@ int virCgroupForDriver(const char *name, VIR_FREE(path); if (rc == 0) { - rc = virCgroupMakeGroup(rootgrp, *group, create); + rc = virCgroupMakeGroup(rootgrp, *group, create, false); if (rc != 0) virCgroupFree(group); } @@ -754,7 +799,17 @@ int virCgroupForDomain(virCgroupPtr driver, VIR_FREE(path); if (rc == 0) { - rc = virCgroupMakeGroup(driver, *group, create); + /* + * Create a cgroup with memory.use_hierarchy enabled to + * surely account memory usage of lxc with ns subsystem + * enabled. (To be exact, memory and ns subsystems are + * enabled at the same time.) + * + * The reason why doing it here, not a upper group, say + * a group for driver, is to avoid overhead to track + * cumulative usage that we don't need. + */ + rc = virCgroupMakeGroup(driver, *group, create, true); if (rc != 0) virCgroupFree(group); } -- 1.6.5.2

On 06/23/2010 10:00 AM, Ryota Ozaki wrote:
Through conversation with Kumar L Srikanth-B22348, I found that the function of getting memory usage (e.g., virsh dominfo) doesn't work for lxc with ns subsystem of cgroup enabled.
--- src/util/cgroup.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++--- 1 files changed, 59 insertions(+), 4 deletions(-)
ACK on v2. I've pushed this now. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org
participants (2)
-
Eric Blake
-
Ryota Ozaki