[libvirt] [PATCH 00/13] Support use of systemd-machined for cgroups

From: "Daniel P. Berrange" <berrange@redhat.com> This is a patch series which adds support for using systemd-machined for creating cgroups. The first 12 patches are all really just cleanups and refactoring. The actual systemd code is the last patch, but at time of posting it doesn't quite work properly. Given the freeze, I'd like to get the first 12 patches merged, while I iron out the remaining problems with the last patch. Daniel P. Berrange (13): Fix handling of DBus errors emitted by the bus itself Add logic for handling systemd-machined non-existance Add a virCgroupNewDetect API for finding cgroup placement Add API for checking if a cgroup is valid for a domain Auto-detect existing cgroup placement Remove obsolete cgroups creation apis Create + setup cgroups atomically for QEMU process Create + setup cgroups atomically for LXC process New cgroups API for atomically creating machine cgroups Convert QEMU driver to use virCgroupNewMachine Convert LXC driver to use virCgroupNewMachine Protection against doing bad stuff to the root group Enable support for systemd-machined in cgroups creation src/libvirt_private.syms | 5 +- src/lxc/lxc_cgroup.c | 82 +++------- src/lxc/lxc_cgroup.h | 2 +- src/lxc/lxc_controller.c | 19 +-- src/lxc/lxc_process.c | 52 +++++-- src/qemu/qemu_cgroup.c | 114 ++++++++------ src/qemu/qemu_cgroup.h | 5 +- src/qemu/qemu_process.c | 41 +++-- src/util/vircgroup.c | 388 +++++++++++++++++++++++++++++------------------ src/util/vircgroup.h | 32 ++-- src/util/virdbus.c | 5 +- src/util/virsystemd.c | 19 ++- tests/vircgrouptest.c | 112 -------------- tests/virsystemdmock.c | 14 +- tests/virsystemdtest.c | 63 +++++--- 15 files changed, 496 insertions(+), 457 deletions(-) -- 1.8.1.4

From: "Daniel P. Berrange" <berrange@redhat.com> Current code for handling dbus errors only works for errors received from the remote application itself. We must also handle errors emitted by the bus itself, for example, when it fails to spawn the target service. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/virdbus.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/util/virdbus.c b/src/util/virdbus.c index 6221bdc..9b0977a 100644 --- a/src/util/virdbus.c +++ b/src/util/virdbus.c @@ -1129,9 +1129,8 @@ int virDBusCallMethod(DBusConnection *conn, call, VIR_DBUS_METHOD_CALL_TIMEOUT_MILLIS, &error))) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("Cannot send to %s.%s on path %s with interface %s: %s"), - destination, member, path, interface, NULLSTR(error.message)); + virReportDBusServiceError(error.message ? error.message : "unknown error", + error.name); goto cleanup; } -- 1.8.1.4

On 07/23/2013 09:21 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Current code for handling dbus errors only works for errors received from the remote application itself. We must also handle errors emitted by the bus itself, for example, when it fails to spawn the target service.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/virdbus.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
ACK. Looks like you also sent this one in isolation. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> If systemd machine does not exist, return -2 instead of -1, so that applications don't need to repeat the tedious error checking code Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/virsystemd.c | 13 ++++++++++- tests/virsystemdmock.c | 14 +++++++---- tests/virsystemdtest.c | 63 +++++++++++++++++++++++++++++++++++--------------- 3 files changed, 66 insertions(+), 24 deletions(-) diff --git a/src/util/virsystemd.c b/src/util/virsystemd.c index 8477cd3..11d1153 100644 --- a/src/util/virsystemd.c +++ b/src/util/virsystemd.c @@ -26,6 +26,7 @@ #include "virstring.h" #include "viralloc.h" #include "virutil.h" +#include "virlog.h" #define VIR_FROM_THIS VIR_FROM_SYSTEMD @@ -38,6 +39,8 @@ * @rootdir: root directory of machine filesystem * @pidleader: PID of the leader process * @slice: name of the slice to place the machine in + * + * Returns 0 on success, -1 on fatal error, or -2 if systemd-machine is not available */ int virSystemdCreateMachine(const char *name, const char *drivername, @@ -117,6 +120,7 @@ int virSystemdCreateMachine(const char *name, * allow further API calls to be made against the object. */ + VIR_DEBUG("Attempting to create machine via systemd"); if (virDBusCallMethod(conn, NULL, "org.freedesktop.machine1", @@ -135,8 +139,15 @@ int virSystemdCreateMachine(const char *name, (unsigned int)pidleader, rootdir ? rootdir : "", 1, "Slice", "s", - slicename) < 0) + slicename) < 0) { + virErrorPtr err = virGetLastError(); + if (err->code == VIR_ERR_DBUS_SERVICE && + STREQ(err->str2, "org.freedesktop.DBus.Error.ServiceUnknown")) { + virResetLastError(); + ret = -2; + } goto cleanup; + } ret = 0; diff --git a/tests/virsystemdmock.c b/tests/virsystemdmock.c index 5f9cce6..1f4413c 100644 --- a/tests/virsystemdmock.c +++ b/tests/virsystemdmock.c @@ -60,16 +60,20 @@ dbus_bool_t dbus_connection_set_watch_functions(DBusConnection *connection ATTRI DBusMessage *dbus_connection_send_with_reply_and_block(DBusConnection *connection ATTRIBUTE_UNUSED, DBusMessage *message, int timeout_milliseconds ATTRIBUTE_UNUSED, - DBusError *error ATTRIBUTE_UNUSED) + DBusError *error) { - DBusMessage *reply; + DBusMessage *reply = NULL; dbus_message_set_serial(message, 7); - if (getenv("FAIL_NO_SERVICE")) + if (getenv("FAIL_BAD_SERVICE")) reply = dbus_message_new_error(message, - "org.freedesktop.DBus.Error.ServiceUnknown", - "The name org.freedesktop.machine1 was not provided by any .service files"); + "org.freedesktop.systemd.badthing", + "Something went wrong creating the machine"); + else if (getenv("FAIL_NO_SERVICE")) + dbus_set_error(error, + "org.freedesktop.DBus.Error.ServiceUnknown", + "%s", "The name org.freedesktop.machine1 was not provided by any .service files"); else reply = dbus_message_new_method_return(message); diff --git a/tests/virsystemdtest.c b/tests/virsystemdtest.c index 3992722..bcf3ad3 100644 --- a/tests/virsystemdtest.c +++ b/tests/virsystemdtest.c @@ -82,35 +82,60 @@ static int testCreateNoSystemd(const void *opaque ATTRIBUTE_UNUSED) 3, 3, 3, 3, 4, 4, 4, 4 }; + int rv; setenv("FAIL_NO_SERVICE", "1", 1); - if (virSystemdCreateMachine("demo", - "qemu", - true, - uuid, - NULL, - 123, - false, - NULL) == 0) { + if ((rv = virSystemdCreateMachine("demo", + "qemu", + true, + uuid, + NULL, + 123, + false, + NULL)) == 0) { fprintf(stderr, "%s", "Unexpected create machine success\n"); return -1; } - virErrorPtr err = virGetLastError(); + if (rv != -2) { + fprintf(stderr, "%s", "Unexpected create machine error\n"); + return -1; + } + + return 0; +} - if (!err) { - fprintf(stderr, "No error raised"); +static int testCreateBadSystemd(const void *opaque ATTRIBUTE_UNUSED) +{ + unsigned char uuid[VIR_UUID_BUFLEN] = { + 1, 1, 1, 1, + 2, 2, 2, 2, + 3, 3, 3, 3, + 4, 4, 4, 4 + }; + int rv; + + setenv("FAIL_BAD_SERVICE", "1", 1); + + if ((rv = virSystemdCreateMachine("demo", + "qemu", + true, + uuid, + NULL, + 123, + false, + NULL)) == 0) { + fprintf(stderr, "%s", "Unexpected create machine success\n"); return -1; } - if (err->code == VIR_ERR_DBUS_SERVICE && - STREQ(err->str2, "org.freedesktop.DBus.Error.ServiceUnknown")) - return 0; + if (rv != -1) { + fprintf(stderr, "%s", "Unexpected create machine error\n"); + return -1; + } - fprintf(stderr, "Unexpected error code %d / message %s\n", - err->code, err->str2); - return -1; + return 0; } static int @@ -122,7 +147,9 @@ mymain(void) ret = -1; if (virtTestRun("Test create machine ", 1, testCreateMachine, NULL) < 0) ret = -1; - if (virtTestRun("Test create nosystemd ", 1, testCreateNoSystemd, NULL) < 0) + if (virtTestRun("Test create no systemd ", 1, testCreateNoSystemd, NULL) < 0) + ret = -1; + if (virtTestRun("Test create bad systemd ", 1, testCreateBadSystemd, NULL) < 0) ret = -1; return ret==0 ? EXIT_SUCCESS : EXIT_FAILURE; -- 1.8.1.4

On 07/23/2013 09:21 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
If systemd machine does not exist, return -2 instead of -1, so that applications don't need to repeat the tedious error checking code
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/virsystemd.c | 13 ++++++++++- tests/virsystemdmock.c | 14 +++++++---- tests/virsystemdtest.c | 63 +++++++++++++++++++++++++++++++++++--------------- 3 files changed, 66 insertions(+), 24 deletions(-)
Another one you sent in isolation as well. ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> Add a virCgroupNewDetect API which is used to initialize a cgroup object with the placement of an arbitrary process. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/libvirt_private.syms | 1 + src/util/vircgroup.c | 81 +++++++++++++++++++++++++++++++----------------- src/util/vircgroup.h | 3 ++ 3 files changed, 57 insertions(+), 28 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 76873ad..49b9f9d 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1186,6 +1186,7 @@ virCgroupKill; virCgroupKillPainfully; virCgroupKillRecursive; virCgroupMoveTask; +virCgroupNewDetect; virCgroupNewDomainDriver; virCgroupNewDomainPartition; virCgroupNewDriver; diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 5251611..94d19e0 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -306,18 +306,30 @@ static int virCgroupCopyPlacement(virCgroupPtr group, * It then appends @path to each detected path. */ static int virCgroupDetectPlacement(virCgroupPtr group, + pid_t pid, const char *path) { size_t i; FILE *mapping = NULL; char line[1024]; int ret = -1; + char *procfile; - mapping = fopen("/proc/self/cgroup", "r"); + if (pid == -1) { + if (VIR_STRDUP(procfile, "/proc/self/cgroup") < 0) + goto cleanup; + } else { + if (virAsprintf(&procfile, "/proc/%llu/cgroup", + (unsigned long long)pid) < 0) + goto cleanup; + } + + mapping = fopen(procfile, "r"); if (mapping == NULL) { - virReportSystemError(errno, "%s", - _("Unable to open /proc/self/cgroup")); - return -1; + virReportSystemError(errno, + _("Unable to open '%s'"), + procfile); + goto cleanup; } while (fgets(line, sizeof(line), mapping) != NULL) { @@ -371,12 +383,14 @@ static int virCgroupDetectPlacement(virCgroupPtr group, ret = 0; cleanup: + VIR_FREE(procfile); VIR_FORCE_FCLOSE(mapping); return ret; } static int virCgroupDetect(virCgroupPtr group, + pid_t pid, int controllers, const char *path, virCgroupPtr parent) @@ -453,7 +467,7 @@ static int virCgroupDetect(virCgroupPtr group, if (virCgroupCopyPlacement(group, path, parent) < 0) return -1; } else { - if (virCgroupDetectPlacement(group, path) < 0) + if (virCgroupDetectPlacement(group, pid, path) < 0) return -1; } @@ -470,10 +484,11 @@ static int virCgroupDetect(virCgroupPtr group, return -1; } - VIR_DEBUG("Detected mount/mapping %zu:%s at %s in %s", i, + VIR_DEBUG("Detected mount/mapping %zu:%s at %s in %s for pid %llu", i, virCgroupControllerTypeToString(i), group->controllers[i].mountPoint, - group->controllers[i].placement); + group->controllers[i].placement, + (unsigned long long)pid); } return 0; @@ -826,7 +841,8 @@ cleanup: * * Returns 0 on success, -1 on error */ -static int virCgroupNew(const char *path, +static int virCgroupNew(pid_t pid, + const char *path, virCgroupPtr parent, int controllers, virCgroupPtr *group) @@ -849,7 +865,7 @@ static int virCgroupNew(const char *path, goto error; } - if (virCgroupDetect(*group, controllers, path, parent) < 0) + if (virCgroupDetect(*group, pid, controllers, path, parent) < 0) goto error; return 0; @@ -871,7 +887,7 @@ static int virCgroupAppRoot(virCgroupPtr *group, if (virCgroupNewSelf(&selfgrp) < 0) return -1; - if (virCgroupNew("libvirt", selfgrp, controllers, group) < 0) + if (virCgroupNew(-1, "libvirt", selfgrp, controllers, group) < 0) goto cleanup; if (virCgroupMakeGroup(selfgrp, *group, create, VIR_CGROUP_NONE) < 0) @@ -1287,7 +1303,7 @@ int virCgroupNewPartition(const char *path, if (virCgroupSetPartitionSuffix(path, &newpath) < 0) goto cleanup; - if (virCgroupNew(newpath, NULL, controllers, group) < 0) + if (virCgroupNew(-1, newpath, NULL, controllers, group) < 0) goto cleanup; if (STRNEQ(newpath, "/")) { @@ -1299,7 +1315,7 @@ int virCgroupNewPartition(const char *path, tmp++; *tmp = '\0'; - if (virCgroupNew(parentPath, NULL, controllers, &parent) < 0) + if (virCgroupNew(-1, parentPath, NULL, controllers, &parent) < 0) goto cleanup; if (virCgroupMakeGroup(parent, *group, create, VIR_CGROUP_NONE) < 0) { @@ -1350,7 +1366,7 @@ int virCgroupNewDriver(const char *name, create, controllers) < 0) goto cleanup; - if (virCgroupNew(name, rootgrp, -1, group) < 0) + if (virCgroupNew(-1, name, rootgrp, -1, group) < 0) goto cleanup; if (virCgroupMakeGroup(rootgrp, *group, create, VIR_CGROUP_NONE) < 0) { @@ -1387,19 +1403,11 @@ int virCgroupNewDriver(const char *name ATTRIBUTE_UNUSED, * * Returns 0 on success, or -1 on error */ -#if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R int virCgroupNewSelf(virCgroupPtr *group) { - return virCgroupNew("", NULL, -1, group); -} -#else -int virCgroupNewSelf(virCgroupPtr *group ATTRIBUTE_UNUSED) -{ - virReportSystemError(ENXIO, "%s", - _("Control groups not supported on this platform")); - return -1; + return virCgroupNewDetect(-1, group); } -#endif + /** * virCgroupNewDomainDriver: @@ -1418,7 +1426,7 @@ int virCgroupNewDomainDriver(virCgroupPtr driver, { int ret = -1; - if (virCgroupNew(name, driver, -1, group) < 0) + if (virCgroupNew(-1, name, driver, -1, group) < 0) goto cleanup; /* @@ -1480,7 +1488,7 @@ int virCgroupNewDomainPartition(virCgroupPtr partition, if (virCgroupPartitionEscape(&grpname) < 0) goto cleanup; - if (virCgroupNew(grpname, partition, -1, group) < 0) + if (virCgroupNew(-1, grpname, partition, -1, group) < 0) goto cleanup; /* @@ -1545,7 +1553,7 @@ int virCgroupNewVcpu(virCgroupPtr domain, (1 << VIR_CGROUP_CONTROLLER_CPUACCT) | (1 << VIR_CGROUP_CONTROLLER_CPUSET)); - if (virCgroupNew(name, domain, controllers, group) < 0) + if (virCgroupNew(-1, name, domain, controllers, group) < 0) goto cleanup; if (virCgroupMakeGroup(domain, *group, create, VIR_CGROUP_NONE) < 0) { @@ -1592,7 +1600,7 @@ int virCgroupNewEmulator(virCgroupPtr domain, (1 << VIR_CGROUP_CONTROLLER_CPUACCT) | (1 << VIR_CGROUP_CONTROLLER_CPUSET)); - if (virCgroupNew("emulator", domain, controllers, group) < 0) + if (virCgroupNew(-1, "emulator", domain, controllers, group) < 0) goto cleanup; if (virCgroupMakeGroup(domain, *group, create, VIR_CGROUP_NONE) < 0) { @@ -1617,6 +1625,23 @@ int virCgroupNewEmulator(virCgroupPtr domain ATTRIBUTE_UNUSED, #endif + +#if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R +int virCgroupNewDetect(pid_t pid, + virCgroupPtr *group) +{ + return virCgroupNew(pid, "", NULL, -1, group); +} +#else +int virCgroupNewDetect(pid_t pid ATTRIBUTE_UNUSED, + virCgroupPtr *group ATTRIBUTE_UNUSED) +{ + virReportSystemError(ENXIO, "%s", + _("Control groups not supported on this platform")); + return -1; +} +#endif + bool virCgroupNewIgnoreError(void) { if (virLastErrorIsSystemErrno(ENXIO) || @@ -2573,7 +2598,7 @@ static int virCgroupKillRecursiveInternal(virCgroupPtr group, int signum, virHas VIR_DEBUG("Process subdir %s", ent->d_name); - if (virCgroupNew(ent->d_name, group, -1, &subgroup) < 0) + if (virCgroupNew(-1, ent->d_name, group, -1, &subgroup) < 0) goto cleanup; if ((rc = virCgroupKillRecursiveInternal(subgroup, signum, pids, true)) < 0) diff --git a/src/util/vircgroup.h b/src/util/vircgroup.h index 0ec8b67..602d4ff 100644 --- a/src/util/vircgroup.h +++ b/src/util/vircgroup.h @@ -86,6 +86,9 @@ int virCgroupNewEmulator(virCgroupPtr domain, virCgroupPtr *group) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3); +int virCgroupNewDetect(pid_t pid, + virCgroupPtr *group); + bool virCgroupNewIgnoreError(void); int virCgroupPathOfController(virCgroupPtr group, -- 1.8.1.4

On 07/23/2013 09:21 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Add a virCgroupNewDetect API which is used to initialize a cgroup object with the placement of an arbitrary process.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/libvirt_private.syms | 1 + src/util/vircgroup.c | 81 +++++++++++++++++++++++++++++++----------------- src/util/vircgroup.h | 3 ++ 3 files changed, 57 insertions(+), 28 deletions(-)
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> Add virCgroupIsValidMachine API to check whether a auto detected cgroup is valid for a machine. This lets us check if a VM has just been placed into some generic shared cgroup, or worse, the root cgroup Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/libvirt_private.syms | 1 + src/util/vircgroup.c | 42 ++++++++++++++++++++++++++++++++++++++++++ src/util/vircgroup.h | 5 +++++ 3 files changed, 48 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 49b9f9d..3be604b 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1182,6 +1182,7 @@ virCgroupGetMemSwapHardLimit; virCgroupGetMemSwapUsage; virCgroupHasController; virCgroupIsolateMount; +virCgroupIsValidMachineGroup; virCgroupKill; virCgroupKillPainfully; virCgroupKillRecursive; diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 94d19e0..1043318 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -67,6 +67,8 @@ typedef enum { */ } virCgroupFlags; +static int virCgroupPartitionEscape(char **path); + bool virCgroupAvailable(void) { FILE *mounts = NULL; @@ -91,6 +93,46 @@ bool virCgroupAvailable(void) return ret; } +bool virCgroupIsValidMachineGroup(virCgroupPtr group, + const char *name, + const char *drivername) +{ + size_t i; + bool valid = false; + char *partname; + + if (virAsprintf(&partname, "%s.libvirt-%s", + name, drivername) < 0) + goto cleanup; + + if (virCgroupPartitionEscape(&partname) < 0) + goto cleanup; + + for (i = 0; i < VIR_CGROUP_CONTROLLER_LAST; i++) { + char *tmp; + + if (!group->controllers[i].placement) + continue; + + tmp = strrchr(group->controllers[i].placement, '/'); + if (!tmp) + goto cleanup; + tmp++; + + if (STRNEQ(tmp, name) && + STRNEQ(tmp, partname)) + goto cleanup; + + } + + valid = true; + + cleanup: + VIR_FREE(partname); + return valid; +} + + /** * virCgroupFree: * diff --git a/src/util/vircgroup.h b/src/util/vircgroup.h index 602d4ff..9bf0d7e 100644 --- a/src/util/vircgroup.h +++ b/src/util/vircgroup.h @@ -48,6 +48,11 @@ VIR_ENUM_DECL(virCgroupController); bool virCgroupAvailable(void); +bool virCgroupIsValidMachineGroup(virCgroupPtr group, + const char *machinename, + const char *drivername); + + int virCgroupNewPartition(const char *path, bool create, int controllers, -- 1.8.1.4

On 07/23/2013 09:21 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Add virCgroupIsValidMachine API to check whether a auto
s/a/an/
detected cgroup is valid for a machine. This lets us check if a VM has just been placed into some generic shared cgroup, or worse, the root cgroup
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/libvirt_private.syms | 1 + src/util/vircgroup.c | 42 ++++++++++++++++++++++++++++++++++++++++++ src/util/vircgroup.h | 5 +++++ 3 files changed, 48 insertions(+)
ACK -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> Use the new virCgroupNewDetect function to determine cgroup placement of existing running VMs. This will allow the legacy cgroups creation APIs to be removed entirely Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_cgroup.c | 59 ++++++++++---------------- src/lxc/lxc_cgroup.h | 2 +- src/lxc/lxc_process.c | 14 ++++++- src/qemu/qemu_cgroup.c | 107 ++++++++++++++++++++++++++++-------------------- src/qemu/qemu_cgroup.h | 5 +-- src/qemu/qemu_process.c | 2 +- 6 files changed, 100 insertions(+), 89 deletions(-) diff --git a/src/lxc/lxc_cgroup.c b/src/lxc/lxc_cgroup.c index 025720d..c230c25 100644 --- a/src/lxc/lxc_cgroup.c +++ b/src/lxc/lxc_cgroup.c @@ -429,12 +429,12 @@ cleanup: } -virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def, bool startup) +virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def) { virCgroupPtr parent = NULL; virCgroupPtr cgroup = NULL; - if (!def->resource && startup) { + if (!def->resource) { virDomainResourceDefPtr res; if (VIR_ALLOC(res) < 0) @@ -448,41 +448,26 @@ virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def, bool startup) def->resource = res; } - if (def->resource && - def->resource->partition) { - if (def->resource->partition[0] != '/') { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("Resource partition '%s' must start with '/'"), - def->resource->partition); - goto cleanup; - } - /* We only auto-create the default partition. In other - * cases we expec the sysadmin/app to have done so */ - if (virCgroupNewPartition(def->resource->partition, - STREQ(def->resource->partition, "/machine"), - -1, - &parent) < 0) - goto cleanup; - - if (virCgroupNewDomainPartition(parent, - "lxc", - def->name, - true, - &cgroup) < 0) - goto cleanup; - } else { - if (virCgroupNewDriver("lxc", - true, - -1, - &parent) < 0) - goto cleanup; - - if (virCgroupNewDomainDriver(parent, - def->name, - true, - &cgroup) < 0) - goto cleanup; + if (def->resource->partition[0] != '/') { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Resource partition '%s' must start with '/'"), + def->resource->partition); + goto cleanup; } + /* We only auto-create the default partition. In other + * cases we expec the sysadmin/app to have done so */ + if (virCgroupNewPartition(def->resource->partition, + STREQ(def->resource->partition, "/machine"), + -1, + &parent) < 0) + goto cleanup; + + if (virCgroupNewDomainPartition(parent, + "lxc", + def->name, + true, + &cgroup) < 0) + goto cleanup; cleanup: virCgroupFree(&parent); @@ -495,7 +480,7 @@ virCgroupPtr virLXCCgroupJoin(virDomainDefPtr def) virCgroupPtr cgroup = NULL; int ret = -1; - if (!(cgroup = virLXCCgroupCreate(def, true))) + if (!(cgroup = virLXCCgroupCreate(def))) return NULL; if (virCgroupAddTask(cgroup, getpid()) < 0) diff --git a/src/lxc/lxc_cgroup.h b/src/lxc/lxc_cgroup.h index f040de2..25a427c 100644 --- a/src/lxc/lxc_cgroup.h +++ b/src/lxc/lxc_cgroup.h @@ -27,7 +27,7 @@ # include "lxc_fuse.h" # include "virusb.h" -virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def, bool startup); +virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def); virCgroupPtr virLXCCgroupJoin(virDomainDefPtr def); int virLXCCgroupSetup(virDomainDefPtr def, virCgroupPtr cgroup, diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index 5b83ccb..3642945 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -974,7 +974,7 @@ int virLXCProcessStart(virConnectPtr conn, virCgroupFree(&priv->cgroup); - if (!(priv->cgroup = virLXCCgroupCreate(vm->def, true))) + if (!(priv->cgroup = virLXCCgroupCreate(vm->def))) return -1; if (!virCgroupHasController(priv->cgroup, @@ -1385,9 +1385,19 @@ virLXCProcessReconnectDomain(virDomainObjPtr vm, if (!(priv->monitor = virLXCProcessConnectMonitor(driver, vm))) goto error; - if (!(priv->cgroup = virLXCCgroupCreate(vm->def, false))) + if (virCgroupNewDetect(vm->pid, &priv->cgroup) < 0) goto error; + if (!virCgroupIsValidMachineGroup(priv->cgroup, + vm->def->name, + "lxc")) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Cgroup name is not valid for machine %s"), + vm->def->name); + virCgroupFree(&priv->cgroup); + goto error; + } + if (virLXCUpdateActiveUsbHostdevs(driver, vm->def) < 0) goto error; diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 02d2770..8559d26 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -627,10 +627,9 @@ qemuSetupCpuCgroup(virDomainObjPtr vm) } -int +static int qemuInitCgroup(virQEMUDriverPtr driver, - virDomainObjPtr vm, - bool startup) + virDomainObjPtr vm) { int ret = -1; qemuDomainObjPrivatePtr priv = vm->privateData; @@ -645,7 +644,7 @@ qemuInitCgroup(virQEMUDriverPtr driver, virCgroupFree(&priv->cgroup); - if (!vm->def->resource && startup) { + if (!vm->def->resource) { virDomainResourceDefPtr res; if (VIR_ALLOC(res) < 0) @@ -659,59 +658,77 @@ qemuInitCgroup(virQEMUDriverPtr driver, vm->def->resource = res; } - if (vm->def->resource && - vm->def->resource->partition) { - if (vm->def->resource->partition[0] != '/') { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("Resource partition '%s' must start with '/'"), - vm->def->resource->partition); - goto cleanup; - } - /* We only auto-create the default partition. In other - * cases we expec the sysadmin/app to have done so */ - if (virCgroupNewPartition(vm->def->resource->partition, - STREQ(vm->def->resource->partition, "/machine"), - cfg->cgroupControllers, - &parent) < 0) { - if (virCgroupNewIgnoreError()) - goto done; + if (vm->def->resource->partition[0] != '/') { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Resource partition '%s' must start with '/'"), + vm->def->resource->partition); + goto cleanup; + } + /* We only auto-create the default partition. In other + * cases we expec the sysadmin/app to have done so */ + if (virCgroupNewPartition(vm->def->resource->partition, + STREQ(vm->def->resource->partition, "/machine"), + cfg->cgroupControllers, + &parent) < 0) { + if (virCgroupNewIgnoreError()) + goto done; - goto cleanup; - } + goto cleanup; + } - if (virCgroupNewDomainPartition(parent, - "qemu", - vm->def->name, - true, - &priv->cgroup) < 0) - goto cleanup; - } else { - if (virCgroupNewDriver("qemu", - true, - cfg->cgroupControllers, - &parent) < 0) { - if (virCgroupNewIgnoreError()) - goto done; + if (virCgroupNewDomainPartition(parent, + "qemu", + vm->def->name, + true, + &priv->cgroup) < 0) + goto cleanup; - goto cleanup; - } +done: + ret = 0; +cleanup: + virCgroupFree(&parent); + virObjectUnref(cfg); + return ret; +} - if (virCgroupNewDomainDriver(parent, - vm->def->name, - true, - &priv->cgroup) < 0) - goto cleanup; + +int +qemuConnectCgroup(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + qemuDomainObjPrivatePtr priv = vm->privateData; + int ret = -1; + + if (!cfg->privileged) + goto done; + + if (!virCgroupAvailable()) + goto done; + + virCgroupFree(&priv->cgroup); + + if (virCgroupNewDetect(vm->pid, &priv->cgroup) < 0) { + if (virCgroupNewIgnoreError()) + goto done; + goto cleanup; + } + + if (!virCgroupIsValidMachineGroup(priv->cgroup, + vm->def->name, + "qemu")) { + VIR_DEBUG("Cgroup name is not valid for machine"); + virCgroupFree(&priv->cgroup); + goto done; } done: ret = 0; cleanup: - virCgroupFree(&parent); virObjectUnref(cfg); return ret; } - int qemuSetupCgroup(virQEMUDriverPtr driver, virDomainObjPtr vm, @@ -721,7 +738,7 @@ qemuSetupCgroup(virQEMUDriverPtr driver, virCapsPtr caps = NULL; int ret = -1; - if (qemuInitCgroup(driver, vm, true) < 0) + if (qemuInitCgroup(driver, vm) < 0) return -1; if (!priv->cgroup) diff --git a/src/qemu/qemu_cgroup.h b/src/qemu/qemu_cgroup.h index 5faa5f9..14404d1 100644 --- a/src/qemu/qemu_cgroup.h +++ b/src/qemu/qemu_cgroup.h @@ -39,9 +39,8 @@ int qemuSetupHostdevCGroup(virDomainObjPtr vm, int qemuTeardownHostdevCgroup(virDomainObjPtr vm, virDomainHostdevDefPtr dev) ATTRIBUTE_RETURN_CHECK; -int qemuInitCgroup(virQEMUDriverPtr driver, - virDomainObjPtr vm, - bool startup); +int qemuConnectCgroup(virQEMUDriverPtr driver, + virDomainObjPtr vm); int qemuSetupCgroup(virQEMUDriverPtr driver, virDomainObjPtr vm, virBitmapPtr nodemask); diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index a46d944..c5f281a 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3070,7 +3070,7 @@ qemuProcessReconnect(void *opaque) if (qemuUpdateActiveScsiHostdevs(driver, obj->def) < 0) goto error; - if (qemuInitCgroup(driver, obj, false) < 0) + if (qemuConnectCgroup(driver, obj) < 0) goto error; /* XXX: Need to change as long as lock is introduced for -- 1.8.1.4

On 07/23/2013 09:21 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Use the new virCgroupNewDetect function to determine cgroup placement of existing running VMs. This will allow the legacy cgroups creation APIs to be removed entirely
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_cgroup.c | 59 ++++++++++---------------- src/lxc/lxc_cgroup.h | 2 +- src/lxc/lxc_process.c | 14 ++++++- src/qemu/qemu_cgroup.c | 107 ++++++++++++++++++++++++++++-------------------- src/qemu/qemu_cgroup.h | 5 +-- src/qemu/qemu_process.c | 2 +- 6 files changed, 100 insertions(+), 89 deletions(-)
+ /* We only auto-create the default partition. In other + * cases we expec the sysadmin/app to have done so */
Pre-existing, but as long as you are re-indenting: s/expec/expect/ (two files; both lxc and qemu) ACK with that fix -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> The virCgroupNewDomainDriver and virCgroupNewDriver methods are obsolete now that we can auto-detect existing cgroup placement. Delete them to reduce code bloat. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/libvirt_private.syms | 2 - src/util/vircgroup.c | 121 ----------------------------------------------- src/util/vircgroup.h | 11 ----- tests/vircgrouptest.c | 112 ------------------------------------------- 4 files changed, 246 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 3be604b..5a5d112 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1188,9 +1188,7 @@ virCgroupKillPainfully; virCgroupKillRecursive; virCgroupMoveTask; virCgroupNewDetect; -virCgroupNewDomainDriver; virCgroupNewDomainPartition; -virCgroupNewDriver; virCgroupNewEmulator; virCgroupNewIgnoreError; virCgroupNewPartition; diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 1043318..593caad 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -918,28 +918,6 @@ error: return -1; } - -static int virCgroupAppRoot(virCgroupPtr *group, - bool create, - int controllers) -{ - virCgroupPtr selfgrp = NULL; - int ret = -1; - - if (virCgroupNewSelf(&selfgrp) < 0) - return -1; - - if (virCgroupNew(-1, "libvirt", selfgrp, controllers, group) < 0) - goto cleanup; - - if (virCgroupMakeGroup(selfgrp, *group, create, VIR_CGROUP_NONE) < 0) - goto cleanup; - - ret = 0; -cleanup: - virCgroupFree(&selfgrp); - return ret; -} #endif #if defined _DIRENT_HAVE_D_TYPE @@ -1387,53 +1365,6 @@ int virCgroupNewPartition(const char *path ATTRIBUTE_UNUSED, } #endif -/** - * virCgroupNewDriver: - * - * @name: name of this driver (e.g., xen, qemu, lxc) - * @group: Pointer to returned virCgroupPtr - * - * Returns 0 on success, or -1 on error - */ -#if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R -int virCgroupNewDriver(const char *name, - bool create, - int controllers, - virCgroupPtr *group) -{ - int ret = -1; - virCgroupPtr rootgrp = NULL; - - if (virCgroupAppRoot(&rootgrp, - create, controllers) < 0) - goto cleanup; - - if (virCgroupNew(-1, name, rootgrp, -1, group) < 0) - goto cleanup; - - if (virCgroupMakeGroup(rootgrp, *group, create, VIR_CGROUP_NONE) < 0) { - virCgroupRemove(*group); - virCgroupFree(group); - goto cleanup; - } - - ret = 0; - -cleanup: - virCgroupFree(&rootgrp); - return ret; -} -#else -int virCgroupNewDriver(const char *name ATTRIBUTE_UNUSED, - bool create ATTRIBUTE_UNUSED, - int controllers ATTRIBUTE_UNUSED, - virCgroupPtr *group ATTRIBUTE_UNUSED) -{ - virReportSystemError(ENXIO, "%s", - _("Control groups not supported on this platform")); - return -1; -} -#endif /** * virCgroupNewSelf: @@ -1452,58 +1383,6 @@ int virCgroupNewSelf(virCgroupPtr *group) /** - * virCgroupNewDomainDriver: - * - * @driver: group for driver owning the domain - * @name: name of the domain - * @group: Pointer to returned virCgroupPtr - * - * Returns 0 on success, or -1 on error - */ -#if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R -int virCgroupNewDomainDriver(virCgroupPtr driver, - const char *name, - bool create, - virCgroupPtr *group) -{ - int ret = -1; - - if (virCgroupNew(-1, name, driver, -1, group) < 0) - goto cleanup; - - /* - * Create a cgroup with memory.use_hierarchy enabled to - * surely account memory usage of lxc with ns subsystem - * enabled. (To be exact, memory and ns subsystems are - * enabled at the same time.) - * - * The reason why doing it here, not a upper group, say - * a group for driver, is to avoid overhead to track - * cumulative usage that we don't need. - */ - if (virCgroupMakeGroup(driver, *group, create, VIR_CGROUP_MEM_HIERACHY) < 0) { - virCgroupRemove(*group); - virCgroupFree(group); - goto cleanup; - } - - ret = 0; -cleanup: - return ret; -} -#else -int virCgroupNewDomainDriver(virCgroupPtr driver ATTRIBUTE_UNUSED, - const char *name ATTRIBUTE_UNUSED, - bool create ATTRIBUTE_UNUSED, - virCgroupPtr *group ATTRIBUTE_UNUSED) -{ - virReportSystemError(ENXIO, "%s", - _("Control groups not supported on this platform")); - return -1; -} -#endif - -/** * virCgroupNewDomainPartition: * * @partition: partition holding the domain diff --git a/src/util/vircgroup.h b/src/util/vircgroup.h index 9bf0d7e..3c05604 100644 --- a/src/util/vircgroup.h +++ b/src/util/vircgroup.h @@ -59,20 +59,9 @@ int virCgroupNewPartition(const char *path, virCgroupPtr *group) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(4); -int virCgroupNewDriver(const char *name, - bool create, - int controllers, - virCgroupPtr *group) - ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(4); - int virCgroupNewSelf(virCgroupPtr *group) ATTRIBUTE_NONNULL(1); -int virCgroupNewDomainDriver(virCgroupPtr driver, - const char *name, - bool create, - virCgroupPtr *group) - ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(4); int virCgroupNewDomainPartition(virCgroupPtr partition, const char *driver, const char *name, diff --git a/tests/vircgrouptest.c b/tests/vircgrouptest.c index 18d10d9..20ac494 100644 --- a/tests/vircgrouptest.c +++ b/tests/vircgrouptest.c @@ -148,112 +148,6 @@ cleanup: /* Asking for impossible combination since CPU is co-mounted */ -static int testCgroupNewForDriver(const void *args ATTRIBUTE_UNUSED) -{ - virCgroupPtr cgroup = NULL; - int ret = -1; - int rv; - const char *placementSmall[VIR_CGROUP_CONTROLLER_LAST] = { - [VIR_CGROUP_CONTROLLER_CPU] = "/system/libvirt/lxc", - [VIR_CGROUP_CONTROLLER_CPUACCT] = "/system/libvirt/lxc", - [VIR_CGROUP_CONTROLLER_CPUSET] = NULL, - [VIR_CGROUP_CONTROLLER_MEMORY] = "/libvirt/lxc", - [VIR_CGROUP_CONTROLLER_DEVICES] = NULL, - [VIR_CGROUP_CONTROLLER_FREEZER] = NULL, - [VIR_CGROUP_CONTROLLER_BLKIO] = NULL, - }; - const char *placementFull[VIR_CGROUP_CONTROLLER_LAST] = { - [VIR_CGROUP_CONTROLLER_CPU] = "/system/libvirt/lxc", - [VIR_CGROUP_CONTROLLER_CPUACCT] = "/system/libvirt/lxc", - [VIR_CGROUP_CONTROLLER_CPUSET] = "/libvirt/lxc", - [VIR_CGROUP_CONTROLLER_MEMORY] = "/libvirt/lxc", - [VIR_CGROUP_CONTROLLER_DEVICES] = NULL, - [VIR_CGROUP_CONTROLLER_FREEZER] = "/libvirt/lxc", - [VIR_CGROUP_CONTROLLER_BLKIO] = "/libvirt/lxc", - }; - - if ((rv = virCgroupNewDriver("lxc", false, -1, &cgroup)) != -1) { - fprintf(stderr, "Unexpected found LXC cgroup: %d\n", -rv); - goto cleanup; - } - ENSURE_ERRNO(ENOENT); - - if ((rv = virCgroupNewDriver("lxc", true, - (1 << VIR_CGROUP_CONTROLLER_CPU), - &cgroup)) != -1) { - fprintf(stderr, "Should not have created LXC cgroup: %d\n", -rv); - goto cleanup; - } - ENSURE_ERRNO(EINVAL); - - /* Asking for impossible combination since devices is not mounted */ - if ((rv = virCgroupNewDriver("lxc", true, - (1 << VIR_CGROUP_CONTROLLER_DEVICES), - &cgroup)) != -1) { - fprintf(stderr, "Should not have created LXC cgroup: %d\n", -rv); - goto cleanup; - } - ENSURE_ERRNO(ENXIO); - - /* Asking for small combination since devices is not mounted */ - if ((rv = virCgroupNewDriver("lxc", true, - (1 << VIR_CGROUP_CONTROLLER_CPU) | - (1 << VIR_CGROUP_CONTROLLER_CPUACCT) | - (1 << VIR_CGROUP_CONTROLLER_MEMORY), - &cgroup)) != 0) { - fprintf(stderr, "Cannot create LXC cgroup: %d\n", -rv); - goto cleanup; - } - ret = validateCgroup(cgroup, "libvirt/lxc", mountsSmall, links, placementSmall); - virCgroupFree(&cgroup); - - if ((rv = virCgroupNewDriver("lxc", true, -1, &cgroup)) != 0) { - fprintf(stderr, "Cannot create LXC cgroup: %d\n", -rv); - goto cleanup; - } - ret = validateCgroup(cgroup, "libvirt/lxc", mountsFull, links, placementFull); - -cleanup: - virCgroupFree(&cgroup); - return ret; -} - - -static int testCgroupNewForDriverDomain(const void *args ATTRIBUTE_UNUSED) -{ - virCgroupPtr drivercgroup = NULL; - virCgroupPtr domaincgroup = NULL; - int ret = -1; - int rv; - const char *placement[VIR_CGROUP_CONTROLLER_LAST] = { - [VIR_CGROUP_CONTROLLER_CPU] = "/system/libvirt/lxc/wibble", - [VIR_CGROUP_CONTROLLER_CPUACCT] = "/system/libvirt/lxc/wibble", - [VIR_CGROUP_CONTROLLER_CPUSET] = "/libvirt/lxc/wibble", - [VIR_CGROUP_CONTROLLER_MEMORY] = "/libvirt/lxc/wibble", - [VIR_CGROUP_CONTROLLER_DEVICES] = NULL, - [VIR_CGROUP_CONTROLLER_FREEZER] = "/libvirt/lxc/wibble", - [VIR_CGROUP_CONTROLLER_BLKIO] = "/libvirt/lxc/wibble", - }; - - if ((rv = virCgroupNewDriver("lxc", false, -1, &drivercgroup)) != 0) { - fprintf(stderr, "Cannot find LXC cgroup: %d\n", -rv); - goto cleanup; - } - - if ((rv = virCgroupNewDomainDriver(drivercgroup, "wibble", true, &domaincgroup)) != 0) { - fprintf(stderr, "Cannot create LXC cgroup: %d\n", -rv); - goto cleanup; - } - - ret = validateCgroup(domaincgroup, "libvirt/lxc/wibble", mountsFull, links, placement); - -cleanup: - virCgroupFree(&drivercgroup); - virCgroupFree(&domaincgroup); - return ret; -} - - static int testCgroupNewForPartition(const void *args ATTRIBUTE_UNUSED) { virCgroupPtr cgroup = NULL; @@ -537,12 +431,6 @@ mymain(void) if (virtTestRun("New cgroup for self", 1, testCgroupNewForSelf, NULL) < 0) ret = -1; - if (virtTestRun("New cgroup for driver", 1, testCgroupNewForDriver, NULL) < 0) - ret = -1; - - if (virtTestRun("New cgroup for domain driver", 1, testCgroupNewForDriverDomain, NULL) < 0) - ret = -1; - if (virtTestRun("New cgroup for partition", 1, testCgroupNewForPartition, NULL) < 0) ret = -1; -- 1.8.1.4

On 07/23/2013 09:21 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The virCgroupNewDomainDriver and virCgroupNewDriver methods are obsolete now that we can auto-detect existing cgroup placement. Delete them to reduce code bloat.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/libvirt_private.syms | 2 - src/util/vircgroup.c | 121 ----------------------------------------------- src/util/vircgroup.h | 11 ----- tests/vircgrouptest.c | 112 ------------------------------------------- 4 files changed, 246 deletions(-)
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> Currently the QEMU driver creates the VM's cgroup prior to forking, and then uses a virCommand hook to move the child into the cgroup. This won't work with systemd whose APIs do the creation of cgroups + attachment of processes atomically. Fortunately we have a handshake taking place between the QEMU driver and the child process prior to QEMU being exec()d, which was introduced to allow setup of disk locking. By good fortune this synchronization point can be used to enable the QEMU drivedr to do atomic setup of cgroups removing the use of the hook script. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/qemu/qemu_cgroup.c | 12 +++++++++--- src/qemu/qemu_process.c | 39 +++++++++++++++++---------------------- 2 files changed, 26 insertions(+), 25 deletions(-) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 8559d26..6455f50 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -683,6 +683,9 @@ qemuInitCgroup(virQEMUDriverPtr driver, &priv->cgroup) < 0) goto cleanup; + if (virCgroupAddTask(priv->cgroup, vm->pid) < 0) + goto cleanup; + done: ret = 0; cleanup: @@ -738,6 +741,12 @@ qemuSetupCgroup(virQEMUDriverPtr driver, virCapsPtr caps = NULL; int ret = -1; + if (!vm->pid) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot setup cgroups until process is started")); + return -1; + } + if (qemuInitCgroup(driver, vm) < 0) return -1; @@ -1009,8 +1018,5 @@ qemuAddToCgroup(virDomainObjPtr vm) if (priv->cgroup == NULL) return 0; /* Not supported, so claim success */ - if (virCgroupAddTask(priv->cgroup, getpid()) < 0) - return -1; - return 0; } diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index c5f281a..e8e459e 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1929,6 +1929,12 @@ qemuProcessInitCpuAffinity(virQEMUDriverPtr driver, virBitmapPtr cpumap = NULL; virBitmapPtr cpumapToSet = NULL; + if (!vm->pid) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot setup CPU affinity until process is started")); + return -1; + } + if (!(cpumap = qemuPrepareCpumap(driver, nodemask))) return -1; @@ -1949,11 +1955,7 @@ qemuProcessInitCpuAffinity(virQEMUDriverPtr driver, } } - /* We are pressuming we are running between fork/exec of QEMU - * so use '0' to indicate our own process ID. No threads are - * running at this point - */ - if (virProcessSetAffinity(0 /* Self */, cpumapToSet) < 0) + if (virProcessSetAffinity(vm->pid, cpumapToSet) < 0) goto cleanup; ret = 0; @@ -2562,19 +2564,6 @@ static int qemuProcessHook(void *data) if (virSecurityManagerClearSocketLabel(h->driver->securityManager, h->vm->def) < 0) goto cleanup; - /* This must take place before exec(), so that all QEMU - * memory allocation is on the correct NUMA node - */ - VIR_DEBUG("Moving process to cgroup"); - if (qemuAddToCgroup(h->vm) < 0) - goto cleanup; - - /* This must be done after cgroup placement to avoid resetting CPU - * affinity */ - if (!h->vm->def->cputune.emulatorpin && - qemuProcessInitCpuAffinity(h->driver, h->vm, h->nodemask) < 0) - goto cleanup; - if (virNumaSetupMemoryPolicy(h->vm->def->numatune, h->nodemask) < 0) goto cleanup; @@ -3671,10 +3660,6 @@ int qemuProcessStart(virConnectPtr conn, goto cleanup; } - VIR_DEBUG("Setting up domain cgroup (if required)"); - if (qemuSetupCgroup(driver, vm, nodemask) < 0) - goto cleanup; - if (VIR_ALLOC(priv->monConfig) < 0) goto cleanup; @@ -3844,6 +3829,16 @@ int qemuProcessStart(virConnectPtr conn, goto cleanup; } + VIR_DEBUG("Setting up domain cgroup (if required)"); + if (qemuSetupCgroup(driver, vm, nodemask) < 0) + goto cleanup; + + /* This must be done after cgroup placement to avoid resetting CPU + * affinity */ + if (!vm->def->cputune.emulatorpin && + qemuProcessInitCpuAffinity(driver, vm, nodemask) < 0) + goto cleanup; + VIR_DEBUG("Setting domain security labels"); if (virSecurityManagerSetAllLabel(driver->securityManager, vm->def, stdin_path) < 0) -- 1.8.1.4

On 07/23/2013 09:21 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Currently the QEMU driver creates the VM's cgroup prior to forking, and then uses a virCommand hook to move the child into the cgroup. This won't work with systemd whose APIs do the creation of cgroups + attachment of processes atomically.
Fortunately we have a handshake taking place between the QEMU driver and the child process prior to QEMU being exec()d, which was introduced to allow setup of disk locking. By good fortune this synchronization point can be used to enable the QEMU drivedr to do atomic setup of cgroups removing the use
s/drivedr/driver/
of the hook script.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/qemu/qemu_cgroup.c | 12 +++++++++--- src/qemu/qemu_process.c | 39 +++++++++++++++++---------------------- 2 files changed, 26 insertions(+), 25 deletions(-)
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> Currently the LXC driver creates the VM's cgroup prior to forking, and then libvirt_lxc moves the child process into the cgroup. This won't work with systemd whose APIs do the creation of cgroups + attachment of processes atomically. Fortunately we simply move the entire of cgroups setup into the libvirt_lxc child process. We make it take place before fork'ing into the background, so by the time virCommandRun returns in the LXC driver, the cgroup is guaranteed to be present. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_controller.c | 19 ++++++++++--------- src/lxc/lxc_process.c | 40 +++++++++++++++++++++++++++------------- 2 files changed, 37 insertions(+), 22 deletions(-) diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 41d69b3..bbec344 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -128,6 +128,8 @@ struct _virLXCController { bool inShutdown; int timerShutdown; + virCgroupPtr cgroup; + virLXCFusePtr fuse; }; @@ -275,6 +277,8 @@ static void virLXCControllerFree(virLXCControllerPtr ctrl) virObjectUnref(ctrl->server); virLXCControllerFreeFuse(ctrl); + virCgroupFree(&ctrl->cgroup); + /* This must always be the last thing to be closed */ VIR_FORCE_CLOSE(ctrl->handshakeFd); VIR_FREE(ctrl); @@ -657,8 +661,7 @@ cleanup: * * Returns 0 on success or -1 in case of error */ -static int virLXCControllerSetupResourceLimits(virLXCControllerPtr ctrl, - virCgroupPtr cgroup) +static int virLXCControllerSetupResourceLimits(virLXCControllerPtr ctrl) { virBitmapPtr nodemask = NULL; int ret = -1; @@ -670,7 +673,7 @@ static int virLXCControllerSetupResourceLimits(virLXCControllerPtr ctrl, if (virLXCControllerSetupCpuAffinity(ctrl) < 0) goto cleanup; - if (virLXCCgroupSetup(ctrl->def, cgroup, nodemask) < 0) + if (virLXCCgroupSetup(ctrl->def, ctrl->cgroup, nodemask) < 0) goto cleanup; ret = 0; @@ -2102,7 +2105,6 @@ virLXCControllerRun(virLXCControllerPtr ctrl) int containerhandshake[2] = { -1, -1 }; char **containerTTYPaths = NULL; size_t i; - virCgroupPtr cgroup = NULL; if (VIR_ALLOC_N(containerTTYPaths, ctrl->nconsoles) < 0) goto cleanup; @@ -2122,13 +2124,10 @@ virLXCControllerRun(virLXCControllerPtr ctrl) if (virLXCControllerSetupPrivateNS() < 0) goto cleanup; - if (!(cgroup = virLXCCgroupJoin(ctrl->def))) - goto cleanup; - if (virLXCControllerSetupLoopDevices(ctrl) < 0) goto cleanup; - if (virLXCControllerSetupResourceLimits(ctrl, cgroup) < 0) + if (virLXCControllerSetupResourceLimits(ctrl) < 0) goto cleanup; if (virLXCControllerSetupDevPTS(ctrl) < 0) @@ -2214,7 +2213,6 @@ cleanup: VIR_FREE(containerTTYPaths[i]); VIR_FREE(containerTTYPaths); - virCgroupFree(&cgroup); virLXCControllerStopInit(ctrl); return rc; @@ -2390,6 +2388,9 @@ int main(int argc, char *argv[]) if (virLXCControllerValidateConsoles(ctrl) < 0) goto cleanup; + if (!(ctrl->cgroup = virLXCCgroupJoin(ctrl->def))) + goto cleanup; + if (virLXCControllerSetupServer(ctrl) < 0) goto cleanup; diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index 3642945..1b31cef 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -49,6 +49,7 @@ #include "virhook.h" #include "virstring.h" #include "viratomic.h" +#include "virprocess.h" #define VIR_FROM_THIS VIR_FROM_LXC @@ -701,9 +702,9 @@ int virLXCProcessStop(virLXCDriverPtr driver, return -1; } } else { - /* If cgroup doesn't exist, the VM pids must have already - * died and so we're just cleaning up stale state - */ + /* If cgroup doesn't exist, just try cleaning up th + * libvirt_lxc process */ + virProcessKillPainfully(vm->pid, true); } virLXCProcessCleanup(driver, vm, reason); @@ -971,33 +972,33 @@ int virLXCProcessStart(virConnectPtr conn, virCapsPtr caps = NULL; virErrorPtr err = NULL; virLXCDriverConfigPtr cfg = virLXCDriverGetConfig(driver); + virCgroupPtr selfcgroup; - virCgroupFree(&priv->cgroup); - - if (!(priv->cgroup = virLXCCgroupCreate(vm->def))) + if (virCgroupNewSelf(&selfcgroup) < 0) return -1; - if (!virCgroupHasController(priv->cgroup, + if (!virCgroupHasController(selfcgroup, VIR_CGROUP_CONTROLLER_CPUACCT)) { - virCgroupFree(&priv->cgroup); + virCgroupFree(&selfcgroup); virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Unable to find 'cpuacct' cgroups controller mount")); return -1; } - if (!virCgroupHasController(priv->cgroup, + if (!virCgroupHasController(selfcgroup, VIR_CGROUP_CONTROLLER_DEVICES)) { - virCgroupFree(&priv->cgroup); + virCgroupFree(&selfcgroup); virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Unable to find 'devices' cgroups controller mount")); return -1; } - if (!virCgroupHasController(priv->cgroup, + if (!virCgroupHasController(selfcgroup, VIR_CGROUP_CONTROLLER_MEMORY)) { - virCgroupFree(&priv->cgroup); + virCgroupFree(&selfcgroup); virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Unable to find 'memory' cgroups controller mount")); return -1; } + virCgroupFree(&selfcgroup); if (virFileMakePath(cfg->logDir) < 0) { virReportSystemError(errno, @@ -1170,7 +1171,7 @@ int virLXCProcessStart(virConnectPtr conn, /* Connect to the controller as a client *first* because * this will block until the child has written their - * pid file out to disk */ + * pid file out to disk & created their cgroup */ if (!(priv->monitor = virLXCProcessConnectMonitor(driver, vm))) goto cleanup; @@ -1188,6 +1189,19 @@ int virLXCProcessStart(virConnectPtr conn, goto cleanup; } + if (virCgroupNewDetect(vm->pid, &priv->cgroup) < 0) + goto error; + + if (!virCgroupIsValidMachineGroup(priv->cgroup, + vm->def->name, + "lxc")) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Cgroup name is not valid for machine %s"), + vm->def->name); + virCgroupFree(&priv->cgroup); + goto error; + } + priv->stopReason = VIR_DOMAIN_EVENT_STOPPED_FAILED; priv->wantReboot = false; vm->def->id = vm->pid; -- 1.8.1.4

On 07/23/2013 09:21 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Currently the LXC driver creates the VM's cgroup prior to forking, and then libvirt_lxc moves the child process into the cgroup. This won't work with systemd whose APIs do the creation of cgroups + attachment of processes atomically.
Fortunately we simply move the entire of cgroups setup into
s/of//
the libvirt_lxc child process. We make it take place before fork'ing into the background, so by the time virCommandRun returns in the LXC driver, the cgroup is guaranteed to be present.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_controller.c | 19 ++++++++++--------- src/lxc/lxc_process.c | 40 +++++++++++++++++++++++++++------------- 2 files changed, 37 insertions(+), 22 deletions(-)
+++ b/src/lxc/lxc_process.c @@ -49,6 +49,7 @@ #include "virhook.h" #include "virstring.h" #include "viratomic.h" +#include "virprocess.h"
#define VIR_FROM_THIS VIR_FROM_LXC
@@ -701,9 +702,9 @@ int virLXCProcessStop(virLXCDriverPtr driver, return -1; } } else { - /* If cgroup doesn't exist, the VM pids must have already - * died and so we're just cleaning up stale state - */ + /* If cgroup doesn't exist, just try cleaning up th
s/th/the/ ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> Instead of requiring one API call to create a cgroup and another to add a task to it, introduce a new API virCgroupNewMachine which does both jobs at once. This will facilitate the later code to talk to systemd to achieve this job which is also atomic. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/libvirt_private.syms | 1 + src/util/vircgroup.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/vircgroup.h | 13 ++++++++++++ 3 files changed, 65 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 5a5d112..eef6bdd 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1191,6 +1191,7 @@ virCgroupNewDetect; virCgroupNewDomainPartition; virCgroupNewEmulator; virCgroupNewIgnoreError; +virCgroupNewMachine; virCgroupNewPartition; virCgroupNewSelf; virCgroupNewVcpu; diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 593caad..6f9d25a 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -1563,6 +1563,57 @@ int virCgroupNewDetect(pid_t pid ATTRIBUTE_UNUSED, } #endif +int virCgroupNewMachine(const char *name, + const char *drivername, + bool privileged ATTRIBUTE_UNUSED, + const unsigned char *uuid ATTRIBUTE_UNUSED, + const char *rootdir ATTRIBUTE_UNUSED, + pid_t pidleader ATTRIBUTE_UNUSED, + bool isContainer ATTRIBUTE_UNUSED, + const char *partition, + int controllers, + virCgroupPtr *group) +{ + virCgroupPtr parent = NULL; + int ret = -1; + + *group = NULL; + + if (virCgroupNewPartition(partition, + STREQ(partition, "/machine"), + controllers, + &parent) < 0) { + if (virCgroupNewIgnoreError()) + goto done; + + goto cleanup; + } + + if (virCgroupNewDomainPartition(parent, + drivername, + name, + true, + group) < 0) + goto cleanup; + + if (virCgroupAddTask(*group, pidleader) < 0) { + virErrorPtr saved = virSaveLastError(); + virCgroupRemove(*group); + virCgroupFree(group); + if (saved) { + virSetError(saved); + virFreeError(saved); + } + } + +done: + ret = 0; + +cleanup: + virCgroupFree(&parent); + return ret; +} + bool virCgroupNewIgnoreError(void) { if (virLastErrorIsSystemErrno(ENXIO) || diff --git a/src/util/vircgroup.h b/src/util/vircgroup.h index 3c05604..e47367c 100644 --- a/src/util/vircgroup.h +++ b/src/util/vircgroup.h @@ -83,6 +83,19 @@ int virCgroupNewEmulator(virCgroupPtr domain, int virCgroupNewDetect(pid_t pid, virCgroupPtr *group); +int virCgroupNewMachine(const char *name, + const char *drivername, + bool privileged, + const unsigned char *uuid, + const char *rootdir, + pid_t pidleader, + bool isContainer, + const char *partition, + int controllers, + virCgroupPtr *group) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) + ATTRIBUTE_NONNULL(4); + bool virCgroupNewIgnoreError(void); int virCgroupPathOfController(virCgroupPtr group, -- 1.8.1.4

On 07/23/2013 09:21 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Instead of requiring one API call to create a cgroup and another to add a task to it, introduce a new API virCgroupNewMachine which does both jobs at once. This will facilitate the later code to talk to systemd to achieve this job which is also atomic.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/libvirt_private.syms | 1 + src/util/vircgroup.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/vircgroup.h | 13 ++++++++++++ 3 files changed, 65 insertions(+)
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> Convert the QEMU driver code to use the new atomic API for setup of cgroups Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/qemu/qemu_cgroup.c | 29 +++++++++++------------------ 1 file changed, 11 insertions(+), 18 deletions(-) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 6455f50..bca8630 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -633,7 +633,6 @@ qemuInitCgroup(virQEMUDriverPtr driver, { int ret = -1; qemuDomainObjPrivatePtr priv = vm->privateData; - virCgroupPtr parent = NULL; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); if (!cfg->privileged) @@ -664,32 +663,26 @@ qemuInitCgroup(virQEMUDriverPtr driver, vm->def->resource->partition); goto cleanup; } - /* We only auto-create the default partition. In other - * cases we expec the sysadmin/app to have done so */ - if (virCgroupNewPartition(vm->def->resource->partition, - STREQ(vm->def->resource->partition, "/machine"), - cfg->cgroupControllers, - &parent) < 0) { + + if (virCgroupNewMachine(vm->def->name, + "qemu", + cfg->privileged, + vm->def->uuid, + NULL, + vm->pid, + false, + vm->def->resource->partition, + cfg->cgroupControllers, + &priv->cgroup) < 0) { if (virCgroupNewIgnoreError()) goto done; goto cleanup; } - if (virCgroupNewDomainPartition(parent, - "qemu", - vm->def->name, - true, - &priv->cgroup) < 0) - goto cleanup; - - if (virCgroupAddTask(priv->cgroup, vm->pid) < 0) - goto cleanup; - done: ret = 0; cleanup: - virCgroupFree(&parent); virObjectUnref(cfg); return ret; } -- 1.8.1.4

On 07/23/2013 09:21 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Convert the QEMU driver code to use the new atomic API for setup of cgroups
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/qemu/qemu_cgroup.c | 29 +++++++++++------------------ 1 file changed, 11 insertions(+), 18 deletions(-)
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> Convert the LXC driver code to use the new atomic API for setup of cgroups Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_cgroup.c | 53 +++++++++++++++--------------------------------- src/lxc/lxc_controller.c | 2 +- 2 files changed, 17 insertions(+), 38 deletions(-) diff --git a/src/lxc/lxc_cgroup.c b/src/lxc/lxc_cgroup.c index c230c25..af91b04 100644 --- a/src/lxc/lxc_cgroup.c +++ b/src/lxc/lxc_cgroup.c @@ -431,7 +431,6 @@ cleanup: virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def) { - virCgroupPtr parent = NULL; virCgroupPtr cgroup = NULL; if (!def->resource) { @@ -454,46 +453,26 @@ virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def) def->resource->partition); goto cleanup; } - /* We only auto-create the default partition. In other - * cases we expec the sysadmin/app to have done so */ - if (virCgroupNewPartition(def->resource->partition, - STREQ(def->resource->partition, "/machine"), - -1, - &parent) < 0) - goto cleanup; - if (virCgroupNewDomainPartition(parent, - "lxc", - def->name, - true, - &cgroup) < 0) + /* + * XXX + * We should pass the PID of the LXC init process + * not ourselves, but this requires some more + * refactoring. We should also pass the root dir + */ + if (virCgroupNewMachine(def->name, + "lxc", + true, + def->uuid, + NULL, + getpid(), + true, + def->resource->partition, + -1, + &cgroup) < 0) goto cleanup; cleanup: - virCgroupFree(&parent); - return cgroup; -} - - -virCgroupPtr virLXCCgroupJoin(virDomainDefPtr def) -{ - virCgroupPtr cgroup = NULL; - int ret = -1; - - if (!(cgroup = virLXCCgroupCreate(def))) - return NULL; - - if (virCgroupAddTask(cgroup, getpid()) < 0) - goto cleanup; - - ret = 0; - -cleanup: - if (ret < 0) { - virCgroupFree(&cgroup); - return NULL; - } - return cgroup; } diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index bbec344..124ab19 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -2388,7 +2388,7 @@ int main(int argc, char *argv[]) if (virLXCControllerValidateConsoles(ctrl) < 0) goto cleanup; - if (!(ctrl->cgroup = virLXCCgroupJoin(ctrl->def))) + if (!(ctrl->cgroup = virLXCCgroupCreate(ctrl->def))) goto cleanup; if (virLXCControllerSetupServer(ctrl) < 0) -- 1.8.1.4

On 07/23/2013 09:21 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Convert the LXC driver code to use the new atomic API for setup of cgroups
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_cgroup.c | 53 +++++++++++++++--------------------------------- src/lxc/lxc_controller.c | 2 +- 2 files changed, 17 insertions(+), 38 deletions(-)
+ /* + * XXX + * We should pass the PID of the LXC init process + * not ourselves, but this requires some more + * refactoring. We should also pass the root dir + */ + if (virCgroupNewMachine(def->name, + "lxc", + true, + def->uuid, + NULL, + getpid(),
The comment sounded a bit ominous about passing wrong information; but as long as it's not permanently getting us into a setup where we can't recover, but just represents an inefficiency, I don't think it hurts to make incremental progress. ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> Add protection such that the virCgroupRemove and virCgroupKill* do not do anything to the root cgroup. Killing all PIDs in the root cgroup does not end well. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/vircgroup.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 6f9d25a..2141154 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -546,8 +546,13 @@ int virCgroupPathOfController(virCgroupPtr group, if (controller == -1) { size_t i; for (i = 0; i < VIR_CGROUP_CONTROLLER_LAST; i++) { + /* Reject any controller with a placement + * of '/' to avoid doing bad stuff to the root + * cgroup + */ if (group->controllers[i].mountPoint && - group->controllers[i].placement) { + group->controllers[i].placement && + STRNEQ(group->controllers[i].placement, "/")) { controller = i; break; } @@ -1002,6 +1007,11 @@ int virCgroupRemove(virCgroupPtr group) if (!group->controllers[i].mountPoint) continue; + /* Don't delete the root group, if we accidentally + ended up in it for some reason */ + if (STREQ(group->controllers[i].placement, "/")) + continue; + if (virCgroupPathOfController(group, i, NULL, -- 1.8.1.4

On 07/23/2013 09:21 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Add protection such that the virCgroupRemove and virCgroupKill* do not do anything to the root cgroup.
Killing all PIDs in the root cgroup does not end well.
I take it you tried this, on accident :)
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/vircgroup.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
ACK.
@@ -1002,6 +1007,11 @@ int virCgroupRemove(virCgroupPtr group) if (!group->controllers[i].mountPoint) continue;
+ /* Don't delete the root group, if we accidentally + ended up in it for some reason */ + if (STREQ(group->controllers[i].placement, "/")) + continue;
A VIR_DEBUG might be handy to make it obvious to the next developer that we are intentionally skipping this. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Tue, Jul 23, 2013 at 03:58:30PM -0600, Eric Blake wrote:
On 07/23/2013 09:21 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Add protection such that the virCgroupRemove and virCgroupKill* do not do anything to the root cgroup.
Killing all PIDs in the root cgroup does not end well.
I take it you tried this, on accident :)
Yes, due to a bug elsewhere in the code I ended up with an LXC container in the / cgroup. When killing the container libvirt recursively kills all processes in the container's cgroup. When the cgroup is / this is all processes on the host until it kills itself :-) I now have a 30 mile journey to the colo to reboot this machine since I have no remote power control on it :-) Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

From: "Daniel P. Berrange" <berrange@redhat.com> Make the virCgroupNewMachine method try to use systemd-machined first. If that fails, then fallback to using the traditional cgroup setup code path. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/vircgroup.c | 115 ++++++++++++++++++++++++++++++++++++++++++++------ src/util/virsystemd.c | 8 +++- 2 files changed, 108 insertions(+), 15 deletions(-) diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 2141154..47d9763 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -50,6 +50,7 @@ #include "virhash.h" #include "virhashcode.h" #include "virstring.h" +#include "virsystemd.h" #define CGROUP_MAX_VAL 512 @@ -100,6 +101,7 @@ bool virCgroupIsValidMachineGroup(virCgroupPtr group, size_t i; bool valid = false; char *partname; + char *scopename; if (virAsprintf(&partname, "%s.libvirt-%s", name, drivername) < 0) @@ -108,6 +110,13 @@ bool virCgroupIsValidMachineGroup(virCgroupPtr group, if (virCgroupPartitionEscape(&partname) < 0) goto cleanup; + if (virAsprintf(&scopename, "machine-%s\\x2d%s.scope", + drivername, name) < 0) + goto cleanup; + + if (virCgroupPartitionEscape(&scopename) < 0) + goto cleanup; + for (i = 0; i < VIR_CGROUP_CONTROLLER_LAST; i++) { char *tmp; @@ -120,7 +129,8 @@ bool virCgroupIsValidMachineGroup(virCgroupPtr group, tmp++; if (STRNEQ(tmp, name) && - STRNEQ(tmp, partname)) + STRNEQ(tmp, partname) && + STRNEQ(tmp, scopename)) goto cleanup; } @@ -129,6 +139,7 @@ bool virCgroupIsValidMachineGroup(virCgroupPtr group, cleanup: VIR_FREE(partname); + VIR_FREE(scopename); return valid; } @@ -1573,22 +1584,63 @@ int virCgroupNewDetect(pid_t pid ATTRIBUTE_UNUSED, } #endif -int virCgroupNewMachine(const char *name, - const char *drivername, - bool privileged ATTRIBUTE_UNUSED, - const unsigned char *uuid ATTRIBUTE_UNUSED, - const char *rootdir ATTRIBUTE_UNUSED, - pid_t pidleader ATTRIBUTE_UNUSED, - bool isContainer ATTRIBUTE_UNUSED, - const char *partition, - int controllers, - virCgroupPtr *group) +/* + * Retujrns 0 on success, -1 on fatal error, -2 on systemd not available + */ +static int +virCgroupNewMachineSystemd(const char *name, + const char *drivername, + bool privileged, + const unsigned char *uuid, + const char *rootdir, + pid_t pidleader, + bool isContainer, + const char *partition, + virCgroupPtr *group) +{ + int rv; + + VIR_DEBUG("Trying to setup machine '%s' via systemd", name); + if ((rv = virSystemdCreateMachine(name, + drivername, + privileged, + uuid, + rootdir, + pidleader, + isContainer, + partition)) < 0) + return rv; + + VIR_DEBUG("Detecting systemd placement"); + if (virCgroupNewDetect(pidleader, + group) < 0) + return -1; + + if (!virCgroupIsValidMachineGroup(*group, + name, + drivername)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Cgroup name is not valid for machine %s"), + name); + virCgroupFree(group); + return -1; + } + + return 0; +} + +static int +virCgroupNewMachineManual(const char *name, + const char *drivername, + pid_t pidleader, + const char *partition, + int controllers, + virCgroupPtr *group) { virCgroupPtr parent = NULL; int ret = -1; - *group = NULL; - + VIR_DEBUG("Fallback to non-systemd setup"); if (virCgroupNewPartition(partition, STREQ(partition, "/machine"), controllers, @@ -1624,6 +1676,43 @@ cleanup: return ret; } +int virCgroupNewMachine(const char *name, + const char *drivername, + bool privileged, + const unsigned char *uuid, + const char *rootdir, + pid_t pidleader, + bool isContainer, + const char *partition, + int controllers, + virCgroupPtr *group) +{ + int rv; + + *group = NULL; + + if ((rv = virCgroupNewMachineSystemd(name, + drivername, + privileged, + uuid, + rootdir, + pidleader, + isContainer, + partition, + group)) == 0) + return 0; + + if (rv == -1) + return -1; + + return virCgroupNewMachineManual(name, + drivername, + pidleader, + partition, + controllers, + group); +} + bool virCgroupNewIgnoreError(void) { if (virLastErrorIsSystemErrno(ENXIO) || diff --git a/src/util/virsystemd.c b/src/util/virsystemd.c index 11d1153..dd44806 100644 --- a/src/util/virsystemd.c +++ b/src/util/virsystemd.c @@ -138,8 +138,12 @@ int virSystemdCreateMachine(const char *name, iscontainer ? "container" : "vm", (unsigned int)pidleader, rootdir ? rootdir : "", - 1, "Slice", "s", - slicename) < 0) { + 4, + "Slice", "s", slicename, + "CPUAccounting", "b", 1, + "BlockIOAccounting", "b", 1, + "MemoryAccounting", "b", 1 + ) < 0) { virErrorPtr err = virGetLastError(); if (err->code == VIR_ERR_DBUS_SERVICE && STREQ(err->str2, "org.freedesktop.DBus.Error.ServiceUnknown")) { -- 1.8.1.4

On 07/23/2013 09:21 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Make the virCgroupNewMachine method try to use systemd-machined first. If that fails, then fallback to using the traditional cgroup setup code path.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/util/vircgroup.c | 115 ++++++++++++++++++++++++++++++++++++++++++++------ src/util/virsystemd.c | 8 +++- 2 files changed, 108 insertions(+), 15 deletions(-)
@@ -108,6 +110,13 @@ bool virCgroupIsValidMachineGroup(virCgroupPtr group, if (virCgroupPartitionEscape(&partname) < 0) goto cleanup;
+ if (virAsprintf(&scopename, "machine-%s\\x2d%s.scope",
A partially-escaped name (the \\x2d is an escape, but the %s are raw)...
+ drivername, name) < 0) + goto cleanup; + + if (virCgroupPartitionEscape(&scopename) < 0)
...re-escaped. Is this going to do the right thing, or are you causing too many \ in the resulting string by over-escaping the separator? As you said in the cover letter, this is a preview while you are still hammering out issues, but I'm okay with the rest of the series going in before freeze. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 07/23/2013 11:21 PM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
This is a patch series which adds support for using systemd-machined for creating cgroups. The first 12 patches are all really just cleanups and refactoring. The actual systemd code is the last patch, but at time of posting it doesn't quite work properly. Given the freeze, I'd like to get the first 12 patches merged, while I iron out the remaining problems with the last patch.
Right now cgroup is created and set by libvirt itself, in future libvirt will create and setup the cgroup for domain through systemd? Thanks

On Thu, Jul 25, 2013 at 05:30:20PM +0800, Gao feng wrote:
On 07/23/2013 11:21 PM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
This is a patch series which adds support for using systemd-machined for creating cgroups. The first 12 patches are all really just cleanups and refactoring. The actual systemd code is the last patch, but at time of posting it doesn't quite work properly. Given the freeze, I'd like to get the first 12 patches merged, while I iron out the remaining problems with the last patch.
Right now cgroup is created and set by libvirt itself, in future libvirt will create and setup the cgroup for domain through systemd?
Yes, systemd is starting to take over all management for cgroups. Initially we'll defer to systemd for creation of the cgroups. Eventually though, all writes to the cgroups will have to go via systemd. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 07/25/2013 05:36 PM, Daniel P. Berrange wrote:
On Thu, Jul 25, 2013 at 05:30:20PM +0800, Gao feng wrote:
On 07/23/2013 11:21 PM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
This is a patch series which adds support for using systemd-machined for creating cgroups. The first 12 patches are all really just cleanups and refactoring. The actual systemd code is the last patch, but at time of posting it doesn't quite work properly. Given the freeze, I'd like to get the first 12 patches merged, while I iron out the remaining problems with the last patch.
Right now cgroup is created and set by libvirt itself, in future libvirt will create and setup the cgroup for domain through systemd?
Yes, systemd is starting to take over all management for cgroups. Initially we'll defer to systemd for creation of the cgroups. Eventually though, all writes to the cgroups will have to go via systemd.
Get it, hope we can remove some redundance codes. Thanks!

On Thu, Jul 25, 2013 at 05:48:54PM +0800, Gao feng wrote:
On 07/25/2013 05:36 PM, Daniel P. Berrange wrote:
On Thu, Jul 25, 2013 at 05:30:20PM +0800, Gao feng wrote:
On 07/23/2013 11:21 PM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
This is a patch series which adds support for using systemd-machined for creating cgroups. The first 12 patches are all really just cleanups and refactoring. The actual systemd code is the last patch, but at time of posting it doesn't quite work properly. Given the freeze, I'd like to get the first 12 patches merged, while I iron out the remaining problems with the last patch.
Right now cgroup is created and set by libvirt itself, in future libvirt will create and setup the cgroup for domain through systemd?
Yes, systemd is starting to take over all management for cgroups. Initially we'll defer to systemd for creation of the cgroups. Eventually though, all writes to the cgroups will have to go via systemd.
Get it, hope we can remove some redundance codes.
It is pretty unlikely that we'll be able to delete any of our existing cgroups code. We absolutely need to continue to support non-systemd based deployments of libvirt for Debian/Ubuntu/*BSD and older RHEL-6 etc Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
participants (3)
-
Daniel P. Berrange
-
Eric Blake
-
Gao feng