
Hi, On Mon, May 11, 2009 at 10:26 AM, Ryota Ozaki <ozaki.ryota@gmail.com> wrote:
Hi,
On Sun, May 10, 2009 at 8:49 AM, Ryota Ozaki <ozaki.ryota@gmail.com> wrote:
Hi Serge and Daniel,
On Sat, May 9, 2009 at 4:03 AM, Serge E. Hallyn <serue@us.ibm.com> wrote:
Quoting Daniel P. Berrange (berrange@redhat.com):
On Fri, May 08, 2009 at 08:34:12AM -0500, Serge E. Hallyn wrote:
Quoting Ryota Ozaki (ozaki.ryota@gmail.com):
Hi Serge,
On Fri, May 8, 2009 at 11:48 AM, Serge E. Hallyn <serue@us.ibm.com> wrote: > IIUC, the real problem is that src/cgroup.c assumes that the > cgroup name should be $CGROUP_MOUNTPOINT/groupname. But of > course if the ns cgroup is enabled, then the unshare(CLONE_NEWNS) > to create a new namespace in which to mount the new devpts > locks the driver under $CGROUP_MOUNTPOINT/<pid_of_driver>/ > or somesuch. > > If this fixes the problem I have no objections, but it seems > more fragile than perhaps trying to teach src/cgroup.c to > consider it's current cgroup as a starting point.
hmm, I don't know why the assumption is bad and how the approach you are suggesting helps the ns problem.
To be clear, the asssumption is that the driver starts in the root cgroup, i.e. it's pid is listed in $CGROUP_MOUNTPOINT/tasks. And that it can create $CGROUP_MOUNTPOINT/groupname and move itself into $CGROUP_MOUNTPOINT/groupname/tasks.
So, the assumption is bad because when the driver does a unshare(CLONE_NEWNS), it gets moved into $CGROUP_MOUNTPOINT/X, and after that can only move itself into $CGROUP_MOUNTPOINT/X/groupname.
Even with your patch, it's possible for the lxc driver to have been started under say $CGROUP_MOUNTPOINT/libvir or $CGROUP_MOUNTPOINT/<username> through libcgroup/PAM for instance, in which case your patch would be insufficient.
Indeed, we can't assume we're in the root group at any time. Our general plan is that libvirt itself uses 3 levels of cgroup starting at the cgroup that libvirtd was placed in by the admin of the host, then a per-driver group, then per-guest groups
$LIBVIRTD_ROOT_CGROUP | +- lxc | | | +- LXC-GUEST-1 | +- LXC-GUEST-2 | +- LXC-GUEST-3 | +- ... | +- qemu | +- QEMU-GUEST-1 +- QEMU-GUEST-2 +- QEMU-GUEST-3 +- ...
$LIBVIRTD_ROOT_CGROUP, may be the actaul root mount point for the cgroup controller in question, or it may be a sub-directory that the admin chose to put it in. Also, if running libvirtd as a normal user, the admin may have created per-user account cgroups and so libvirtd would end up in there. So we have to be prepared for anything wrt initial libvirtd cgroup root.
NB The code for putting QEMU in a cgroup isn't merged yet.
That sounds good. I just don't think the code currently does it. To do the right thing, IIUC, virCgroupPathOfGroup() should parse the /proc/pid/cgroup contents of the 'controller' process, and insert that between the virCgroupGetMount(controller) result and the group name.
Or something...
(right?)
thanks, -serge
OK I probably understood the problem and wrote a patch for that according to the Serge's suggestion.
I expect this patch fixes the problem, however unfortunately I've not tried it yet under the situation you are worrying and just done regression tests. Because I don't know easy way to produce the situation. If you know that, please let me know...
I found a way to go. lxc-unshare helps me and unveils my second path is broken...
ozaki-r
Thanks, ozaki-r
I've updated the patch. The change includes support for multiple mount points of cgroups that I didn't cope with in the previous patch. Through the work, I found a bit messy problem. Current lxc controller writes pid in a 'tasks' file multiple times if one mount point has multiple subsystems. It is bad because the first write changes the cgroups path of a controller, and then the second write points a missing file like $CGROUPS_MOUNTPOINT/<path_to_domain>/<path_to_domain>/tasks where the correct file is $CGROUPS_MOUNTPOINT/<path_to_domain>/tasks. I did workaround this problem with a tricky way by truncating the duplicated path. We probably need a more feasible solution. Note that this patch intends a proof of concept to be clear the correct way to go and thus it remains a couple of awkward codes. Of course the codes should be removed if we find the correct way. Thanks, ozaki-r
From 473183a77fbdb002d55acfed0d8f9bd485f548cd Mon Sep 17 00:00:00 2001 From: Ryota Ozaki <ozaki.ryota@gmail.com> Date: Mon, 11 May 2009 15:18:47 +0900 Subject: [PATCH] [v2] cgroups: address libvirtd's bad assumption for cgroups hierarchy
--- src/cgroup.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 85 insertions(+), 2 deletions(-) diff --git a/src/cgroup.c b/src/cgroup.c index 50517e2..58ba588 100644 --- a/src/cgroup.c +++ b/src/cgroup.c @@ -117,12 +117,76 @@ int virCgroupHaveSupport(void) return 0; } +static int virCgroupGetCurrentPath(int pid, const char *controller, char **curpath) +{ + char buf[(PATH_MAX+256)*9]; /* there are nine subsystems */ + char *path = NULL; + int rc = 0; + int len; + int fd; + char *ctrlpos = NULL; + char *pathpos = NULL; + char *brkpos = NULL; + + if (virAsprintf(&path, "/proc/%d/cgroup", pid) == -1) { + rc = -ENOMEM; + goto error; + } + + fd = open(path, O_RDONLY); + if (fd < 0) { + DEBUG("Unable to open %s: %m", path); + rc = -errno; + goto error_open; + } + + len = saferead(fd, buf, sizeof(buf)); + if (len <= 0) { + rc = -errno; + goto error_saferead; + } + + ctrlpos = strstr(buf, controller); + if (ctrlpos == NULL) { + rc = -1; + goto error_strstr; + } + + pathpos = strstr(ctrlpos, ":/"); + if (pathpos == NULL) { + rc = -1; + goto error_strstr; + } + pathpos += 2; /* skip ":/" */ + + brkpos = strstr(ctrlpos, "\n"); + if (brkpos == NULL) { + rc = -1; + goto error_strstr; + } + *brkpos = '\0'; /* overwrite '\n' */ + + if (virAsprintf(curpath, "%s", pathpos) == -1) { + rc = -ENOMEM; + } + +error_strstr: +error_saferead: + close(fd); +error_open: + VIR_FREE(path); +error: + return rc; +} + static int virCgroupPathOfGroup(const char *group, const char *controller, char **path) { virCgroupPtr root = NULL; + char *curpath = NULL; int rc = 0; + char *pos; root = virCgroupGetMount(controller); if (root == NULL) { @@ -130,8 +194,20 @@ static int virCgroupPathOfGroup(const char *group, goto out; } - if (virAsprintf(path, "%s/%s", root->path, group) == -1) + rc = virCgroupGetCurrentPath(getpid(), controller, &curpath); + if (rc < 0) + goto out; + + /* XXX: remove duplicated path */ + pos = strstr(curpath, group); + if (pos) { + *pos = '\0'; + } + + if (virAsprintf(path, "%s/%s/%s", root->path, curpath, group) == -1) rc = -ENOMEM; + + VIR_FREE(curpath); out: virCgroupFree(&root); @@ -145,6 +221,7 @@ static int virCgroupPathOf(const char *grppath, virCgroupPtr root; int rc = 0; char *controller = NULL; + char *curpath = NULL; if (strchr(key, '.') == NULL) return -EINVAL; @@ -158,8 +235,14 @@ static int virCgroupPathOf(const char *grppath, goto out; } - if (virAsprintf(path, "%s/%s/%s", root->path, grppath, key) == -1) + rc = virCgroupGetCurrentPath(getpid(), controller, &curpath); + if (rc < 0) + goto out; + + if (virAsprintf(path, "%s/%s/%s/%s", root->path, curpath, grppath, key) == -1) rc = -ENOMEM; + + VIR_FREE(curpath); out: virCgroupFree(&root); VIR_FREE(controller); -- 1.6.0.6