[libvirt] [PATCH 0/3] Honour standards for cgroup naming/escaping

In discussions with the systemd team, we deciced we needed some clearer rules on cgroup naming / escaping, to ensure there are never clashes with filenames owned by the kernel. This series makes the changes required to libvirt

From: "Daniel P. Berrange" <berrange@redhat.com> Recently we changed to create VM cgroups with the naming pattern $VMNAME.$DRIVER.libvirt. Following discussions with the systemd community it was decided that only having a single '.' in the names is preferrable. So this changes the naming scheme to be $VMNAME.libvirt-$DRIVER. eg for LXC 'mycontainer.libvirt-lxc' or for KVM 'myvm.libvirt-qemu'. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/vircgroup.c | 2 +- tests/vircgrouptest.c | 14 +++++++------- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 4c836c7..0084aea 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -1297,7 +1297,7 @@ int virCgroupNewDomainPartition(virCgroupPtr partition, int rc; char *grpname = NULL; - if (virAsprintf(&grpname, "%s.%s.libvirt", + if (virAsprintf(&grpname, "%s.libvirt-%s", name, driver) < 0) return -ENOMEM; diff --git a/tests/vircgrouptest.c b/tests/vircgrouptest.c index 7445517..9c2590f 100644 --- a/tests/vircgrouptest.c +++ b/tests/vircgrouptest.c @@ -362,13 +362,13 @@ static int testCgroupNewForPartitionDomain(const void *args ATTRIBUTE_UNUSED) int ret = -1; int rv; const char *placement[VIR_CGROUP_CONTROLLER_LAST] = { - [VIR_CGROUP_CONTROLLER_CPU] = "/production/foo.lxc.libvirt", - [VIR_CGROUP_CONTROLLER_CPUACCT] = "/production/foo.lxc.libvirt", - [VIR_CGROUP_CONTROLLER_CPUSET] = "/production/foo.lxc.libvirt", - [VIR_CGROUP_CONTROLLER_MEMORY] = "/production/foo.lxc.libvirt", + [VIR_CGROUP_CONTROLLER_CPU] = "/production/foo.libvirt-lxc", + [VIR_CGROUP_CONTROLLER_CPUACCT] = "/production/foo.libvirt-lxc", + [VIR_CGROUP_CONTROLLER_CPUSET] = "/production/foo.libvirt-lxc", + [VIR_CGROUP_CONTROLLER_MEMORY] = "/production/foo.libvirt-lxc", [VIR_CGROUP_CONTROLLER_DEVICES] = NULL, - [VIR_CGROUP_CONTROLLER_FREEZER] = "/production/foo.lxc.libvirt", - [VIR_CGROUP_CONTROLLER_BLKIO] = "/production/foo.lxc.libvirt", + [VIR_CGROUP_CONTROLLER_FREEZER] = "/production/foo.libvirt-lxc", + [VIR_CGROUP_CONTROLLER_BLKIO] = "/production/foo.libvirt-lxc", }; if ((rv = virCgroupNewPartition("/production", true, -1, &partitioncgroup)) != 0) { @@ -381,7 +381,7 @@ static int testCgroupNewForPartitionDomain(const void *args ATTRIBUTE_UNUSED) goto cleanup; } - ret = validateCgroup(domaincgroup, "/production/foo.lxc.libvirt", mountsFull, links, placement); + ret = validateCgroup(domaincgroup, "/production/foo.libvirt-lxc", mountsFull, links, placement); cleanup: virCgroupFree(&partitioncgroup); -- 1.8.1.4

On 04/26/2013 04:45 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Recently we changed to create VM cgroups with the naming pattern $VMNAME.$DRIVER.libvirt. Following discussions with the systemd community it was decided that only having a single '.' in the names is preferrable. So this changes the naming scheme to be $VMNAME.libvirt-$DRIVER. eg for LXC 'mycontainer.libvirt-lxc' or for KVM 'myvm.libvirt-qemu'.
Well, $VMNAME could contain '.'; but your argument of having a single suffix rather than a double suffix to determine who created the group makes sense.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/vircgroup.c | 2 +- tests/vircgrouptest.c | 14 +++++++------- 2 files changed, 8 insertions(+), 8 deletions(-)
ACK; worth going in 1.0.5.
+++ b/tests/vircgrouptest.c @@ -362,13 +362,13 @@ static int testCgroupNewForPartitionDomain(const void *args ATTRIBUTE_UNUSED) int ret = -1; int rv; const char *placement[VIR_CGROUP_CONTROLLER_LAST] = { - [VIR_CGROUP_CONTROLLER_CPU] = "/production/foo.lxc.libvirt", - [VIR_CGROUP_CONTROLLER_CPUACCT] = "/production/foo.lxc.libvirt", - [VIR_CGROUP_CONTROLLER_CPUSET] = "/production/foo.lxc.libvirt", - [VIR_CGROUP_CONTROLLER_MEMORY] = "/production/foo.lxc.libvirt", + [VIR_CGROUP_CONTROLLER_CPU] = "/production/foo.libvirt-lxc",
So much easier to verify the series with this test in place :) -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> If the partition named passed in the XML does not already have a suffix, ensure it gets a '.partition' added to each component. The exceptions are /machine, /user and /system which do not need to have a suffix, since they are fixed partitions at the top level. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/vircgroup.c | 57 +++++++++++++++++++++-- tests/vircgrouptest.c | 123 ++++++++++++++++++++++++++++++++++++-------------- 2 files changed, 143 insertions(+), 37 deletions(-) diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 0084aea..297408d 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -49,6 +49,7 @@ #include "virfile.h" #include "virhash.h" #include "virhashcode.h" +#include "virstring.h" #define CGROUP_MAX_VAL 512 @@ -1091,6 +1092,47 @@ cleanup: #if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R +static char *virCgroupSetPartitionSuffix(const char *path) +{ + char **tokens = virStringSplit(path, "/", 0); + size_t i; + char *ret = NULL; + + if (!tokens) + return NULL; + + for (i = 0 ; tokens[i] != NULL ; i++) { + /* Whitelist the 3 top level fixed dirs + * NB i == 0 is "", since we have leading '/' + */ + if (i == 1 && + (STREQ(tokens[i], "machine") || + STREQ(tokens[i], "system") || + STREQ(tokens[i], "user"))) { + continue; + } + /* If there is no suffix set already, then + * add ".partition" + */ + if (STRNEQ(tokens[i], "") && + !strchr(tokens[i], '.')) { + if (VIR_REALLOC_N(tokens[i], + strlen(tokens[i]) + strlen(".partition") + 1) < 0) { + virReportOOMError(); + goto cleanup; + } + strcat(tokens[i], ".partition"); + } + } + + if (!(ret = virStringJoin((const char **)tokens, "/"))) + goto cleanup; + +cleanup: + virStringFreeList(tokens); + return ret; +} + /** * virCgroupNewPartition: * @path: path for the partition @@ -1110,19 +1152,28 @@ int virCgroupNewPartition(const char *path, int rc; char *parentPath = NULL; virCgroupPtr parent = NULL; + char *newpath; VIR_DEBUG("path=%s create=%d controllers=%x", path, create, controllers); if (path[0] != '/') return -EINVAL; - rc = virCgroupNew(path, NULL, controllers, group); + /* XXX convert all cgroups APIs to use error report + * APIs instead of returning errno */ + if (!(newpath = virCgroupSetPartitionSuffix(path))) { + virResetLastError(); + rc = -ENOMEM; + goto cleanup; + } + + rc = virCgroupNew(newpath, NULL, controllers, group); if (rc != 0) goto cleanup; - if (STRNEQ(path, "/")) { + if (STRNEQ(newpath, "/")) { char *tmp; - if (!(parentPath = strdup(path))) { + if (!(parentPath = strdup(newpath))) { rc = -ENOMEM; goto cleanup; } diff --git a/tests/vircgrouptest.c b/tests/vircgrouptest.c index 9c2590f..b51106a 100644 --- a/tests/vircgrouptest.c +++ b/tests/vircgrouptest.c @@ -246,22 +246,22 @@ static int testCgroupNewForPartition(const void *args ATTRIBUTE_UNUSED) int ret = -1; int rv; const char *placementSmall[VIR_CGROUP_CONTROLLER_LAST] = { - [VIR_CGROUP_CONTROLLER_CPU] = "/virtualmachines", - [VIR_CGROUP_CONTROLLER_CPUACCT] = "/virtualmachines", + [VIR_CGROUP_CONTROLLER_CPU] = "/virtualmachines.partition", + [VIR_CGROUP_CONTROLLER_CPUACCT] = "/virtualmachines.partition", [VIR_CGROUP_CONTROLLER_CPUSET] = NULL, - [VIR_CGROUP_CONTROLLER_MEMORY] = "/virtualmachines", + [VIR_CGROUP_CONTROLLER_MEMORY] = "/virtualmachines.partition", [VIR_CGROUP_CONTROLLER_DEVICES] = NULL, [VIR_CGROUP_CONTROLLER_FREEZER] = NULL, [VIR_CGROUP_CONTROLLER_BLKIO] = NULL, }; const char *placementFull[VIR_CGROUP_CONTROLLER_LAST] = { - [VIR_CGROUP_CONTROLLER_CPU] = "/virtualmachines", - [VIR_CGROUP_CONTROLLER_CPUACCT] = "/virtualmachines", - [VIR_CGROUP_CONTROLLER_CPUSET] = "/virtualmachines", - [VIR_CGROUP_CONTROLLER_MEMORY] = "/virtualmachines", + [VIR_CGROUP_CONTROLLER_CPU] = "/virtualmachines.partition", + [VIR_CGROUP_CONTROLLER_CPUACCT] = "/virtualmachines.partition", + [VIR_CGROUP_CONTROLLER_CPUSET] = "/virtualmachines.partition", + [VIR_CGROUP_CONTROLLER_MEMORY] = "/virtualmachines.partition", [VIR_CGROUP_CONTROLLER_DEVICES] = NULL, - [VIR_CGROUP_CONTROLLER_FREEZER] = "/virtualmachines", - [VIR_CGROUP_CONTROLLER_BLKIO] = "/virtualmachines", + [VIR_CGROUP_CONTROLLER_FREEZER] = "/virtualmachines.partition", + [VIR_CGROUP_CONTROLLER_BLKIO] = "/virtualmachines.partition", }; if ((rv = virCgroupNewPartition("/virtualmachines", false, -1, &cgroup)) != -ENOENT) { @@ -294,14 +294,14 @@ static int testCgroupNewForPartition(const void *args ATTRIBUTE_UNUSED) fprintf(stderr, "Cannot create /virtualmachines cgroup: %d\n", -rv); goto cleanup; } - ret = validateCgroup(cgroup, "/virtualmachines", mountsSmall, links, placementSmall); + ret = validateCgroup(cgroup, "/virtualmachines.partition", mountsSmall, links, placementSmall); virCgroupFree(&cgroup); if ((rv = virCgroupNewPartition("/virtualmachines", true, -1, &cgroup)) != 0) { fprintf(stderr, "Cannot create /virtualmachines cgroup: %d\n", -rv); goto cleanup; } - ret = validateCgroup(cgroup, "/virtualmachines", mountsFull, links, placementFull); + ret = validateCgroup(cgroup, "/virtualmachines.partition", mountsFull, links, placementFull); cleanup: virCgroupFree(&cgroup); @@ -315,38 +315,90 @@ static int testCgroupNewForPartitionNested(const void *args ATTRIBUTE_UNUSED) int ret = -1; int rv; const char *placementFull[VIR_CGROUP_CONTROLLER_LAST] = { - [VIR_CGROUP_CONTROLLER_CPU] = "/users/berrange", - [VIR_CGROUP_CONTROLLER_CPUACCT] = "/users/berrange", - [VIR_CGROUP_CONTROLLER_CPUSET] = "/users/berrange", - [VIR_CGROUP_CONTROLLER_MEMORY] = "/users/berrange", + [VIR_CGROUP_CONTROLLER_CPU] = "/deployment.partition/production.partition", + [VIR_CGROUP_CONTROLLER_CPUACCT] = "/deployment.partition/production.partition", + [VIR_CGROUP_CONTROLLER_CPUSET] = "/deployment.partition/production.partition", + [VIR_CGROUP_CONTROLLER_MEMORY] = "/deployment.partition/production.partition", [VIR_CGROUP_CONTROLLER_DEVICES] = NULL, - [VIR_CGROUP_CONTROLLER_FREEZER] = "/users/berrange", - [VIR_CGROUP_CONTROLLER_BLKIO] = "/users/berrange", + [VIR_CGROUP_CONTROLLER_FREEZER] = "/deployment.partition/production.partition", + [VIR_CGROUP_CONTROLLER_BLKIO] = "/deployment.partition/production.partition", }; - if ((rv = virCgroupNewPartition("/users/berrange", false, -1, &cgroup)) != -ENOENT) { - fprintf(stderr, "Unexpected found /users/berrange cgroup: %d\n", -rv); + if ((rv = virCgroupNewPartition("/deployment/production", false, -1, &cgroup)) != -ENOENT) { + fprintf(stderr, "Unexpected found /deployment/production cgroup: %d\n", -rv); goto cleanup; } - /* Should not work, since we require /users to be pre-created */ - if ((rv = virCgroupNewPartition("/users/berrange", true, -1, &cgroup)) != -ENOENT) { - fprintf(stderr, "Unexpected created /users/berrange cgroup: %d\n", -rv); + /* Should not work, since we require /deployment to be pre-created */ + if ((rv = virCgroupNewPartition("/deployment/production", true, -1, &cgroup)) != -ENOENT) { + fprintf(stderr, "Unexpected created /deployment/production cgroup: %d\n", -rv); goto cleanup; } - if ((rv = virCgroupNewPartition("/users", true, -1, &cgroup)) != 0) { - fprintf(stderr, "Failed to create /users cgroup: %d\n", -rv); + if ((rv = virCgroupNewPartition("/deployment", true, -1, &cgroup)) != 0) { + fprintf(stderr, "Failed to create /deployment cgroup: %d\n", -rv); goto cleanup; } /* Should now work */ - if ((rv = virCgroupNewPartition("/users/berrange", true, -1, &cgroup)) != 0) { - fprintf(stderr, "Failed to create /users/berrange cgroup: %d\n", -rv); + if ((rv = virCgroupNewPartition("/deployment/production", true, -1, &cgroup)) != 0) { + fprintf(stderr, "Failed to create /deployment/production cgroup: %d\n", -rv); goto cleanup; } - ret = validateCgroup(cgroup, "/users/berrange", mountsFull, links, placementFull); + ret = validateCgroup(cgroup, "/deployment.partition/production.partition", + mountsFull, links, placementFull); + +cleanup: + virCgroupFree(&cgroup); + return ret; +} + + +static int testCgroupNewForPartitionNestedDeep(const void *args ATTRIBUTE_UNUSED) +{ + virCgroupPtr cgroup = NULL; + int ret = -1; + int rv; + const char *placementFull[VIR_CGROUP_CONTROLLER_LAST] = { + [VIR_CGROUP_CONTROLLER_CPU] = "/user/berrange.user/production.partition", + [VIR_CGROUP_CONTROLLER_CPUACCT] = "/user/berrange.user/production.partition", + [VIR_CGROUP_CONTROLLER_CPUSET] = "/user/berrange.user/production.partition", + [VIR_CGROUP_CONTROLLER_MEMORY] = "/user/berrange.user/production.partition", + [VIR_CGROUP_CONTROLLER_DEVICES] = NULL, + [VIR_CGROUP_CONTROLLER_FREEZER] = "/user/berrange.user/production.partition", + [VIR_CGROUP_CONTROLLER_BLKIO] = "/user/berrange.user/production.partition", + }; + + if ((rv = virCgroupNewPartition("/user/berrange.user/production", false, -1, &cgroup)) != -ENOENT) { + fprintf(stderr, "Unexpected found /user/berrange.user/production cgroup: %d\n", -rv); + goto cleanup; + } + + /* Should not work, since we require /user/berrange.user to be pre-created */ + if ((rv = virCgroupNewPartition("/user/berrange.user/production", true, -1, &cgroup)) != -ENOENT) { + fprintf(stderr, "Unexpected created /user/berrange.user/production cgroup: %d\n", -rv); + goto cleanup; + } + + if ((rv = virCgroupNewPartition("/user", true, -1, &cgroup)) != 0) { + fprintf(stderr, "Failed to create /user/berrange.user cgroup: %d\n", -rv); + goto cleanup; + } + + if ((rv = virCgroupNewPartition("/user/berrange.user", true, -1, &cgroup)) != 0) { + fprintf(stderr, "Failed to create /user/berrange.user cgroup: %d\n", -rv); + goto cleanup; + } + + /* Should now work */ + if ((rv = virCgroupNewPartition("/user/berrange.user/production", true, -1, &cgroup)) != 0) { + fprintf(stderr, "Failed to create /user/berrange.user/production cgroup: %d\n", -rv); + goto cleanup; + } + + ret = validateCgroup(cgroup, "/user/berrange.user/production.partition", + mountsFull, links, placementFull); cleanup: virCgroupFree(&cgroup); @@ -362,13 +414,13 @@ static int testCgroupNewForPartitionDomain(const void *args ATTRIBUTE_UNUSED) int ret = -1; int rv; const char *placement[VIR_CGROUP_CONTROLLER_LAST] = { - [VIR_CGROUP_CONTROLLER_CPU] = "/production/foo.libvirt-lxc", - [VIR_CGROUP_CONTROLLER_CPUACCT] = "/production/foo.libvirt-lxc", - [VIR_CGROUP_CONTROLLER_CPUSET] = "/production/foo.libvirt-lxc", - [VIR_CGROUP_CONTROLLER_MEMORY] = "/production/foo.libvirt-lxc", + [VIR_CGROUP_CONTROLLER_CPU] = "/production.partition/foo.libvirt-lxc", + [VIR_CGROUP_CONTROLLER_CPUACCT] = "/production.partition/foo.libvirt-lxc", + [VIR_CGROUP_CONTROLLER_CPUSET] = "/production.partition/foo.libvirt-lxc", + [VIR_CGROUP_CONTROLLER_MEMORY] = "/production.partition/foo.libvirt-lxc", [VIR_CGROUP_CONTROLLER_DEVICES] = NULL, - [VIR_CGROUP_CONTROLLER_FREEZER] = "/production/foo.libvirt-lxc", - [VIR_CGROUP_CONTROLLER_BLKIO] = "/production/foo.libvirt-lxc", + [VIR_CGROUP_CONTROLLER_FREEZER] = "/production.partition/foo.libvirt-lxc", + [VIR_CGROUP_CONTROLLER_BLKIO] = "/production.partition/foo.libvirt-lxc", }; if ((rv = virCgroupNewPartition("/production", true, -1, &partitioncgroup)) != 0) { @@ -381,7 +433,7 @@ static int testCgroupNewForPartitionDomain(const void *args ATTRIBUTE_UNUSED) goto cleanup; } - ret = validateCgroup(domaincgroup, "/production/foo.libvirt-lxc", mountsFull, links, placement); + ret = validateCgroup(domaincgroup, "/production.partition/foo.libvirt-lxc", mountsFull, links, placement); cleanup: virCgroupFree(&partitioncgroup); @@ -424,6 +476,9 @@ mymain(void) if (virtTestRun("New cgroup for partition nested", 1, testCgroupNewForPartitionNested, NULL) < 0) ret = -1; + if (virtTestRun("New cgroup for partition nested deeply", 1, testCgroupNewForPartitionNestedDeep, NULL) < 0) + ret = -1; + if (virtTestRun("New cgroup for domain partition", 1, testCgroupNewForPartitionDomain, NULL) < 0) ret = -1; -- 1.8.1.4

On 04/26/2013 04:45 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
If the partition named passed in the XML does not already have a suffix, ensure it gets a '.partition' added to each component. The exceptions are /machine, /user and /system which do not need to have a suffix, since they are fixed partitions at the top level.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/vircgroup.c | 57 +++++++++++++++++++++-- tests/vircgrouptest.c | 123 ++++++++++++++++++++++++++++++++++++-------------- 2 files changed, 143 insertions(+), 37 deletions(-)
- rc = virCgroupNew(path, NULL, controllers, group); + /* XXX convert all cgroups APIs to use error report + * APIs instead of returning errno */ + if (!(newpath = virCgroupSetPartitionSuffix(path))) { + virResetLastError(); + rc = -ENOMEM; + goto cleanup; + }
Yeah, I can see why you added that comment. But I looked through all the code paths, and looks like the only error possible is OOM, so your conversion of failure back to errno is correct. ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Fri, Apr 26, 2013 at 05:23:40AM -0600, Eric Blake wrote:
On 04/26/2013 04:45 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
If the partition named passed in the XML does not already have a suffix, ensure it gets a '.partition' added to each component. The exceptions are /machine, /user and /system which do not need to have a suffix, since they are fixed partitions at the top level.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/vircgroup.c | 57 +++++++++++++++++++++-- tests/vircgrouptest.c | 123 ++++++++++++++++++++++++++++++++++++-------------- 2 files changed, 143 insertions(+), 37 deletions(-)
- rc = virCgroupNew(path, NULL, controllers, group); + /* XXX convert all cgroups APIs to use error report + * APIs instead of returning errno */ + if (!(newpath = virCgroupSetPartitionSuffix(path))) { + virResetLastError(); + rc = -ENOMEM; + goto cleanup; + }
Yeah, I can see why you added that comment. But I looked through all the code paths, and looks like the only error possible is OOM, so your conversion of failure back to errno is correct.
And with the conversion to make VIR_ALLOC report OOM errors by default, we really need to bring vircgroup.c into line with other code. In fact I'd like to see us eliminate pretty much all cases of functions returning errnos. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

From: "Daniel P. Berrange" <berrange@redhat.com> If a user cgroup name begins with "cgroup.", "_" or with any of the controllers from /proc/cgroups followed by a dot, then they need to be prefixed with a single underscore. eg if there is an object "cpu.service", then this would end up as "_cpu.service" in the cgroup filesystem tree, however, "waldo.service" would stay "waldo.service", at least as long as nobody comes up with a cgroup controller called "waldo". Since we require a '.XXXX' suffix on all partitions, there is no scope for clashing with the kernel 'tasks' and 'release_agent' files. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/vircgroup.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++ tests/vircgroupmock.c | 27 +++++++++++++--- tests/vircgrouptest.c | 54 ++++++++++++++++++++++++++++++++ 3 files changed, 163 insertions(+), 4 deletions(-) diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 297408d..f132e60 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -1092,6 +1092,84 @@ cleanup: #if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R +static int virCgroupNeedEscape(const char *path) +{ + FILE *fp = NULL; + int ret = 0; + char *line = NULL; + size_t len; + + /* If it starts with 'cgroup.' or a '_' of any + * of the controller names from /proc/cgroups, + * then we must prefix a '_' + */ + if (STRPREFIX(path, "cgroup.")) + return 1; + + if (path[0] == '_') + return 1; + + if (!(fp = fopen("/proc/cgroups", "r"))) + return -errno; + + /* + * Data looks like this: + * #subsys_name hierarchy num_cgroups enabled + * cpuset 2 4 1 + * cpu 3 48 1 + * cpuacct 3 48 1 + * memory 4 4 1 + * devices 5 4 1 + * freezer 6 4 1 + * net_cls 7 1 1 + */ + while (getline(&line, &len, fp) > 0) { + if (STRPREFIX(line, "#subsys_name")) { + VIR_FREE(line); + continue; + } + char *tmp = strchr(line, ' '); + if (tmp) + *tmp = '\0'; + len = tmp - line; + + if (STRPREFIX(path, line) && + path[len] == '.') { + ret = 1; + VIR_FREE(line); + goto cleanup; + } + VIR_FREE(line); + } + + if (ferror(fp)) { + ret = -EIO; + goto cleanup; + } + +cleanup: + VIR_FORCE_FCLOSE(fp); + return ret; +} + +static int virCgroupEscape(char **path) +{ + size_t len = strlen(*path); + int rc; + + if ((rc = virCgroupNeedEscape(*path)) <= 0) + return rc; + + if (VIR_REALLOC_N(*path, len + 2) < 0) + return -ENOMEM; + + memmove((*path) + 1, + *path, + len + 1); + (*path)[0] = '_'; + return 0; +} + static char *virCgroupSetPartitionSuffix(const char *path) { char **tokens = virStringSplit(path, "/", 0); @@ -1123,6 +1201,11 @@ static char *virCgroupSetPartitionSuffix(const char *path) } strcat(tokens[i], ".partition"); } + + if (virCgroupEscape(&(tokens[i])) < 0) { + virReportOOMError(); + goto cleanup; + } } if (!(ret = virStringJoin((const char **)tokens, "/"))) @@ -1352,6 +1435,9 @@ int virCgroupNewDomainPartition(virCgroupPtr partition, name, driver) < 0) return -ENOMEM; + if ((rc = virCgroupEscape(&grpname)) < 0) + return rc; + rc = virCgroupNew(grpname, partition, -1, group); if (rc == 0) { diff --git a/tests/vircgroupmock.c b/tests/vircgroupmock.c index d6b0938..17ea75f 100644 --- a/tests/vircgroupmock.c +++ b/tests/vircgroupmock.c @@ -66,7 +66,7 @@ static char *fakesysfsdir; * Co-mounting cpu & cpuacct controllers * An anonymous controller for systemd */ -const char *mounts = +const char *procmounts = "rootfs / rootfs rw 0 0\n" "tmpfs /run tmpfs rw,seclabel,nosuid,nodev,mode=755 0 0\n" "tmpfs /not/really/sys/fs/cgroup tmpfs rw,seclabel,nosuid,nodev,noexec,mode=755 0 0\n" @@ -79,7 +79,7 @@ const char *mounts = "/dev/sda1 /boot ext4 rw,seclabel,relatime,data=ordered 0 0\n" "tmpfs /tmp tmpfs rw,seclabel,relatime,size=1024000k 0 0\n"; -const char *cgroups = +const char *procselfcgroups = "115:memory:/\n" "8:blkio:/\n" "6:freezer:/\n" @@ -87,6 +87,17 @@ const char *cgroups = "2:cpuset:/\n" "1:name=systemd:/user/berrange/123\n"; +const char *proccgroups = + "#subsys_name hierarchy num_cgroups enabled\n" + "cpuset 2 4 1\n" + "cpu 3 48 1\n" + "cpuacct 3 48 1\n" + "memory 4 4 1\n" + "devices 5 4 1\n" + "freezer 6 4 1\n" + "blkio 8 4 1\n"; + + static int make_file(const char *path, const char *name, const char *value) @@ -366,7 +377,15 @@ FILE *fopen(const char *path, const char *mode) if (STREQ(path, "/proc/mounts")) { if (STREQ(mode, "r")) { - return fmemopen((void *)mounts, strlen(mounts), mode); + return fmemopen((void *)procmounts, strlen(procmounts), mode); + } else { + errno = EACCES; + return NULL; + } + } + if (STREQ(path, "/proc/cgroups")) { + if (STREQ(mode, "r")) { + return fmemopen((void *)proccgroups, strlen(proccgroups), mode); } else { errno = EACCES; return NULL; @@ -374,7 +393,7 @@ FILE *fopen(const char *path, const char *mode) } if (STREQ(path, "/proc/self/cgroup")) { if (STREQ(mode, "r")) { - return fmemopen((void *)cgroups, strlen(cgroups), mode); + return fmemopen((void *)procselfcgroups, strlen(procselfcgroups), mode); } else { errno = EACCES; return NULL; diff --git a/tests/vircgrouptest.c b/tests/vircgrouptest.c index b51106a..e2adef8 100644 --- a/tests/vircgrouptest.c +++ b/tests/vircgrouptest.c @@ -441,6 +441,58 @@ cleanup: return ret; } +static int testCgroupNewForPartitionDomainEscaped(const void *args ATTRIBUTE_UNUSED) +{ + virCgroupPtr partitioncgroup1 = NULL; + virCgroupPtr partitioncgroup2 = NULL; + virCgroupPtr partitioncgroup3 = NULL; + virCgroupPtr domaincgroup = NULL; + int ret = -1; + int rv; + const char *placement[VIR_CGROUP_CONTROLLER_LAST] = { + [VIR_CGROUP_CONTROLLER_CPU] = "/_cgroup.evil/net_cls.evil/__evil.evil/_cpu.foo.libvirt-lxc", + [VIR_CGROUP_CONTROLLER_CPUACCT] = "/_cgroup.evil/net_cls.evil/__evil.evil/_cpu.foo.libvirt-lxc", + [VIR_CGROUP_CONTROLLER_CPUSET] = "/_cgroup.evil/net_cls.evil/__evil.evil/_cpu.foo.libvirt-lxc", + [VIR_CGROUP_CONTROLLER_MEMORY] = "/_cgroup.evil/net_cls.evil/__evil.evil/_cpu.foo.libvirt-lxc", + [VIR_CGROUP_CONTROLLER_DEVICES] = NULL, + [VIR_CGROUP_CONTROLLER_FREEZER] = "/_cgroup.evil/net_cls.evil/__evil.evil/_cpu.foo.libvirt-lxc", + [VIR_CGROUP_CONTROLLER_BLKIO] = "/_cgroup.evil/net_cls.evil/__evil.evil/_cpu.foo.libvirt-lxc", + }; + + if ((rv = virCgroupNewPartition("/cgroup.evil", true, -1, &partitioncgroup1)) != 0) { + fprintf(stderr, "Failed to create /cgroup.evil cgroup: %d\n", -rv); + goto cleanup; + } + + if ((rv = virCgroupNewPartition("/cgroup.evil/net_cls.evil", true, -1, &partitioncgroup2)) != 0) { + fprintf(stderr, "Failed to create /cgroup.evil/cpu.evil cgroup: %d\n", -rv); + goto cleanup; + } + + if ((rv = virCgroupNewPartition("/cgroup.evil/net_cls.evil/_evil.evil", true, -1, &partitioncgroup3)) != 0) { + fprintf(stderr, "Failed to create /cgroup.evil cgroup: %d\n", -rv); + goto cleanup; + } + + if ((rv = virCgroupNewDomainPartition(partitioncgroup3, "lxc", "cpu.foo", true, &domaincgroup)) != 0) { + fprintf(stderr, "Cannot create LXC cgroup: %d\n", -rv); + goto cleanup; + } + + /* NB we're not expecting 'net_cls.evil' to be escaped, + * since our fake /proc/cgroups pretends this controller + * isn't compiled into the kernel + */ + ret = validateCgroup(domaincgroup, "/_cgroup.evil/net_cls.evil/__evil.evil/_cpu.foo.libvirt-lxc", mountsFull, links, placement); + +cleanup: + virCgroupFree(&partitioncgroup3); + virCgroupFree(&partitioncgroup2); + virCgroupFree(&partitioncgroup1); + virCgroupFree(&domaincgroup); + return ret; +} + # define FAKESYSFSDIRTEMPLATE abs_builddir "/fakesysfsdir-XXXXXX" static int @@ -482,6 +534,8 @@ mymain(void) if (virtTestRun("New cgroup for domain partition", 1, testCgroupNewForPartitionDomain, NULL) < 0) ret = -1; + if (virtTestRun("New cgroup for domain partition escaped", 1, testCgroupNewForPartitionDomainEscaped, NULL) < 0) + ret = -1; if (getenv("LIBVIRT_SKIP_CLEANUP") == NULL) virFileDeleteTree(fakesysfsdir); -- 1.8.1.4

On 04/26/2013 04:45 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
If a user cgroup name begins with "cgroup.", "_" or with any of the controllers from /proc/cgroups followed by a dot, then they need to be prefixed with a single underscore. eg if there is an object "cpu.service", then this would end up as "_cpu.service" in the cgroup filesystem tree, however, "waldo.service" would stay "waldo.service", at least as long as nobody comes up with a cgroup controller called "waldo".
Do we also need to consider cgroup names beginning with leading '.'?
Since we require a '.XXXX' suffix on all partitions, there is no scope for clashing with the kernel 'tasks' and 'release_agent' files.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/vircgroup.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++ tests/vircgroupmock.c | 27 +++++++++++++--- tests/vircgrouptest.c | 54 ++++++++++++++++++++++++++++++++ 3 files changed, 163 insertions(+), 4 deletions(-)
#if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R +static int virCgroupNeedEscape(const char *path)
NeedsEscape sounds a bit better than NeedEscape, if you want to make the tweak.
+static int virCgroupEscape(char **path) +{ + size_t len = strlen(*path); + int rc; + + if ((rc = virCgroupNeedEscape(*path)) <= 0) + return rc; +
From here...
+ if (VIR_REALLOC_N(*path, len + 2) < 0) + return -ENOMEM; + + memmove((*path) + 1, + *path, + len + 1); + (*path)[0] = '_';
...to here could be simplified to: char escape = '_'; VIR_INSERT_ELEMENT(*path, 0, len, escape);
+++ b/tests/vircgroupmock.c @@ -66,7 +66,7 @@ static char *fakesysfsdir; * Co-mounting cpu & cpuacct controllers * An anonymous controller for systemd */ -const char *mounts = +const char *procmounts =
I didn't closely inspect the test; but at least the fact that it is there gives us reassurance that the change is right. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 04/26/2013 05:37 AM, Eric Blake wrote:
On 04/26/2013 04:45 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
If a user cgroup name begins with "cgroup.", "_" or with any of the controllers from /proc/cgroups followed by a dot, then they need to be prefixed with a single underscore. eg if there is an object "cpu.service", then this would end up as "_cpu.service" in the cgroup filesystem tree, however, "waldo.service" would stay "waldo.service", at least as long as nobody comes up with a cgroup controller called "waldo".
Do we also need to consider cgroup names beginning with leading '.'?
If so, that's something you need to coordinate with systemd again, so I'm okay if you defer that decision to a followup patch.
I didn't closely inspect the test; but at least the fact that it is there gives us reassurance that the change is right.
ACK, and series is okay for 1.0.5. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Fri, Apr 26, 2013 at 05:37:24AM -0600, Eric Blake wrote:
On 04/26/2013 04:45 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
If a user cgroup name begins with "cgroup.", "_" or with any of the controllers from /proc/cgroups followed by a dot, then they need to be prefixed with a single underscore. eg if there is an object "cpu.service", then this would end up as "_cpu.service" in the cgroup filesystem tree, however, "waldo.service" would stay "waldo.service", at least as long as nobody comes up with a cgroup controller called "waldo".
Do we also need to consider cgroup names beginning with leading '.'?
Good question, I'm just going to raise that question with the systemd folks.
Since we require a '.XXXX' suffix on all partitions, there is no scope for clashing with the kernel 'tasks' and 'release_agent' files.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/vircgroup.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++ tests/vircgroupmock.c | 27 +++++++++++++--- tests/vircgrouptest.c | 54 ++++++++++++++++++++++++++++++++ 3 files changed, 163 insertions(+), 4 deletions(-)
#if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R +static int virCgroupNeedEscape(const char *path)
NeedsEscape sounds a bit better than NeedEscape, if you want to make the tweak.
Yep.
+static int virCgroupEscape(char **path) +{ + size_t len = strlen(*path); + int rc; + + if ((rc = virCgroupNeedEscape(*path)) <= 0) + return rc; +
From here...
+ if (VIR_REALLOC_N(*path, len + 2) < 0) + return -ENOMEM; + + memmove((*path) + 1, + *path, + len + 1); + (*path)[0] = '_';
...to here could be simplified to:
char escape = '_'; VIR_INSERT_ELEMENT(*path, 0, len, escape);
Oooh, very clever ! Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Fri, Apr 26, 2013 at 05:37:24AM -0600, Eric Blake wrote:
On 04/26/2013 04:45 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
If a user cgroup name begins with "cgroup.", "_" or with any of the controllers from /proc/cgroups followed by a dot, then they need to be prefixed with a single underscore. eg if there is an object "cpu.service", then this would end up as "_cpu.service" in the cgroup filesystem tree, however, "waldo.service" would stay "waldo.service", at least as long as nobody comes up with a cgroup controller called "waldo".
Do we also need to consider cgroup names beginning with leading '.'?
Since we require a '.XXXX' suffix on all partitions, there is no scope for clashing with the kernel 'tasks' and 'release_agent' files.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/vircgroup.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++ tests/vircgroupmock.c | 27 +++++++++++++--- tests/vircgrouptest.c | 54 ++++++++++++++++++++++++++++++++ 3 files changed, 163 insertions(+), 4 deletions(-)
#if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R +static int virCgroupNeedEscape(const char *path)
NeedsEscape sounds a bit better than NeedEscape, if you want to make the tweak.
+static int virCgroupEscape(char **path) +{ + size_t len = strlen(*path); + int rc; + + if ((rc = virCgroupNeedEscape(*path)) <= 0) + return rc; +
From here...
+ if (VIR_REALLOC_N(*path, len + 2) < 0) + return -ENOMEM; + + memmove((*path) + 1, + *path, + len + 1); + (*path)[0] = '_';
...to here could be simplified to:
char escape = '_'; VIR_INSERT_ELEMENT(*path, 0, len, escape);
Hurrah for tests. Caught the fact that I needed size_t len = strlen(*path) + 1; so that VIR_INSERT_ELEMENT keeps the null terminator :-) Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
participants (2)
-
Daniel P. Berrange
-
Eric Blake