Hi Serge and Daniel,
On Sat, May 9, 2009 at 4:03 AM, Serge E. Hallyn <serue(a)us.ibm.com> wrote:
Quoting Daniel P. Berrange (berrange(a)redhat.com):
> On Fri, May 08, 2009 at 08:34:12AM -0500, Serge E. Hallyn wrote:
> > Quoting Ryota Ozaki (ozaki.ryota(a)gmail.com):
> > > Hi Serge,
> > >
> > > On Fri, May 8, 2009 at 11:48 AM, Serge E. Hallyn <serue(a)us.ibm.com>
wrote:
> > > > IIUC, the real problem is that src/cgroup.c assumes that the
> > > > cgroup name should be $CGROUP_MOUNTPOINT/groupname. But of
> > > > course if the ns cgroup is enabled, then the unshare(CLONE_NEWNS)
> > > > to create a new namespace in which to mount the new devpts
> > > > locks the driver under $CGROUP_MOUNTPOINT/<pid_of_driver>/
> > > > or somesuch.
> > > >
> > > > If this fixes the problem I have no objections, but it seems
> > > > more fragile than perhaps trying to teach src/cgroup.c to
> > > > consider it's current cgroup as a starting point.
> > >
> > > hmm, I don't know why the assumption is bad and how the approach
> > > you are suggesting helps the ns problem.
> >
> > To be clear, the asssumption is that the driver starts in the
> > root cgroup, i.e. it's pid is listed in $CGROUP_MOUNTPOINT/tasks.
> > And that it can create $CGROUP_MOUNTPOINT/groupname and move
> > itself into $CGROUP_MOUNTPOINT/groupname/tasks.
> >
> > So, the assumption is bad because when the driver does a
> > unshare(CLONE_NEWNS), it gets moved into $CGROUP_MOUNTPOINT/X,
> > and after that can only move itself into
> > $CGROUP_MOUNTPOINT/X/groupname.
> >
> > Even with your patch, it's possible for the lxc driver to have
> > been started under say $CGROUP_MOUNTPOINT/libvir or
> > $CGROUP_MOUNTPOINT/<username> through libcgroup/PAM for instance,
> > in which case your patch would be insufficient.
>
> Indeed, we can't assume we're in the root group at any time. Our
> general plan is that libvirt itself uses 3 levels of cgroup
> starting at the cgroup that libvirtd was placed in by the admin
> of the host, then a per-driver group, then per-guest groups
>
> $LIBVIRTD_ROOT_CGROUP
> |
> +- lxc
> | |
> | +- LXC-GUEST-1
> | +- LXC-GUEST-2
> | +- LXC-GUEST-3
> | +- ...
> |
> +- qemu
> |
> +- QEMU-GUEST-1
> +- QEMU-GUEST-2
> +- QEMU-GUEST-3
> +- ...
>
> $LIBVIRTD_ROOT_CGROUP, may be the actaul root mount point for
> the cgroup controller in question, or it may be a sub-directory
> that the admin chose to put it in. Also, if running libvirtd
> as a normal user, the admin may have created per-user account
> cgroups and so libvirtd would end up in there. So we have to
> be prepared for anything wrt initial libvirtd cgroup root.
>
> NB The code for putting QEMU in a cgroup isn't merged yet.
That sounds good. I just don't think the code currently does
it. To do the right thing, IIUC, virCgroupPathOfGroup() should
parse the /proc/pid/cgroup contents of the 'controller' process,
and insert that between the virCgroupGetMount(controller)
result and the group name.
Or something...
(right?)
thanks,
-serge
OK I probably understood the problem and wrote a patch for that
according to the Serge's suggestion.
I expect this patch fixes the problem, however unfortunately
I've not tried it yet under the situation you are worrying and
just done regression tests. Because I don't know easy way to
produce the situation. If you know that, please let me know...
Thanks,
ozaki-r
From 713be8125b5cbd3b60919afe708f9c788206a572 Mon Sep 17 00:00:00 2001
From: Ryota Ozaki <ozaki.ryota(a)gmail.com>
Date: Sun, 10 May 2009 08:32:41 +0900
Subject: [PATCH] cgroups: address libvirtd's bad assumption for
cgroups hierarchy
---
src/cgroup.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
1 files changed, 70 insertions(+), 8 deletions(-)
diff --git a/src/cgroup.c b/src/cgroup.c
index 50517e2..e02c37f 100644
--- a/src/cgroup.c
+++ b/src/cgroup.c
@@ -117,11 +117,60 @@ int virCgroupHaveSupport(void)
return 0;
}
+static int virCgroupGetCurrentPath(int pid, char **curpath)
+{
+ char buf[PATH_MAX+1024];
+ char *path = NULL;
+ int rc = 0;
+ int len;
+ int fd;
+ char *pos = NULL;
+
+ if (virAsprintf(&path, "/proc/%d/cgroup", pid) == -1) {
+ rc = -ENOMEM;
+ goto error;
+ }
+
+ fd = open(path, O_RDONLY);
+ if (fd < 0) {
+ DEBUG("Unable to open %s: %m", path);
+ rc = -errno;
+ goto error_open;
+ }
+
+ len = saferead(fd, buf, sizeof(buf));
+ if (len <= 0) {
+ rc = -errno;
+ goto error_saferead;
+ }
+ buf[len-1] = '¥0'; /* overwrite '¥n' */
+
+ pos = strstr(buf, ":/");
+ if (pos == NULL) {
+ rc = -1;
+ goto error_strstr;
+ }
+
+ pos += 2; /* skip ':/' */
+ if (virAsprintf(curpath, "%s", pos) == -1) {
+ rc = -ENOMEM;
+ }
+
+error_strstr:
+error_saferead:
+ close(fd);
+error_open:
+ VIR_FREE(path);
+error:
+ return rc;
+}
+
static int virCgroupPathOfGroup(const char *group,
const char *controller,
char **path)
{
virCgroupPtr root = NULL;
+ char *curpath = NULL;
int rc = 0;
root = virCgroupGetMount(controller);
@@ -130,8 +179,14 @@ static int virCgroupPathOfGroup(const char *group,
goto out;
}
- if (virAsprintf(path, "%s/%s", root->path, group) == -1)
+ rc = virCgroupGetCurrentPath(getpid(), &curpath);
+ if (rc < 0)
+ goto out;
+
+ if (virAsprintf(path, "%s/%s/%s", root->path, curpath, group) == -1)
rc = -ENOMEM;
+
+ VIR_FREE(curpath);
out:
virCgroupFree(&root);
@@ -145,6 +200,7 @@ static int virCgroupPathOf(const char *grppath,
virCgroupPtr root;
int rc = 0;
char *controller = NULL;
+ char *curpath = NULL;
if (strchr(key, '.') == NULL)
return -EINVAL;
@@ -158,8 +214,14 @@ static int virCgroupPathOf(const char *grppath,
goto out;
}
- if (virAsprintf(path, "%s/%s/%s", root->path, grppath, key) == -1)
+ rc = virCgroupGetCurrentPath(getpid(), &curpath);
+ if (rc < 0)
+ goto out;
+
+ if (virAsprintf(path, "%s/%s/%s/%s", root->path, curpath,
grppath, key) == -1)
rc = -ENOMEM;
+
+ VIR_FREE(curpath);
out:
virCgroupFree(&root);
VIR_FREE(controller);
@@ -625,15 +687,14 @@ int virCgroupAddTask(virCgroupPtr group, pid_t pid)
char *grppath = NULL;
char *taskpath = NULL;
char *pidstr = NULL;
+ virCgroupPtr root;
for (i = 0; supported_controllers[i] != NULL; i++) {
- rc = virCgroupPathOfGroup(group->path,
- supported_controllers[i],
- &grppath);
- if (rc != 0)
- goto done;
+ root = virCgroupGetMount(supported_controllers[i]);
+ if (root == NULL)
+ continue;
- if (virAsprintf(&taskpath, "%s/tasks", grppath) == -1) {
+ if (virAsprintf(&taskpath, "%s/%s/tasks", root->path,
group->path) == -1) {
rc = -ENOMEM;
goto done;
}
@@ -655,6 +716,7 @@ int virCgroupAddTask(virCgroupPtr group, pid_t pid)
}
done:
+ virCgroupFree(&root);
VIR_FREE(grppath);
VIR_FREE(taskpath);
VIR_FREE(pidstr);
--
1.6.0.6