[libvirt PATCH 00/15] RFC: basic CGroup support with qemu:///session

From: Marc-André Lureau <marcandre.lureau@redhat.com> Hi, This is a small series that allows basic QEMU VM CGroup support with the help of machined --user: https://github.com/systemd/systemd/pull/15312 The first few patches are fixes to register dbus and slirp-helper correctly with the VM cgroup. A few changes are done to the machined support, adding session support, and registering the VM to get a systemd scope cgroup under user machine.slice. Marc-André Lureau (15): slirp: leave the dbus daemon running on error slirp: add helper to VM cgroup qemu-dbus: prevent double start of the bus daemon qemu-dbus: remove unused variable qemu-dbus: register DBus bus to the VM cgroup qemu-cgroup: remove unnecessary include dbus: rename virDBusIs* -> virDBusSystemIs* dbus: add virDBusIsService{Enabled,Registered} with @conn argument systemd: check org.freedesktop.machine1 registration systemd: only check for activitable machine1 service systemd: register machines against session service cgroup: return directly if there is nothing to remove qemu-cgroup: register ext devices when cpu/cpuset controller are missing qemu: create cgroup regardless of controller support qemu-cgroup: drop the need for privileges to use cgroup src/qemu/qemu_cgroup.c | 13 ------- src/qemu/qemu_dbus.c | 29 +++++++++++++- src/qemu/qemu_dbus.h | 3 ++ src/qemu/qemu_extdevice.c | 12 ++++++ src/qemu/qemu_process.c | 6 +-- src/qemu/qemu_slirp.c | 16 +++++++- src/qemu/qemu_slirp.h | 3 ++ src/util/vircgroup.c | 5 +++ src/util/vircgroupv2devices.c | 7 ++-- src/util/virdbus.c | 71 ++++++++++++++++++++++++++--------- src/util/virdbus.h | 7 +++- src/util/virfirewalld.c | 2 +- src/util/virsystemd.c | 27 +++++-------- tests/virsystemdtest.c | 35 ----------------- 14 files changed, 140 insertions(+), 96 deletions(-) -- 2.26.0.rc2.42.g98cedd0233

From: Marc-André Lureau <marcandre.lureau@redhat.com> Don't stop the DBus daemon if a slirp helper failed to start, as it may be shared with other helpers. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_slirp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/qemu/qemu_slirp.c b/src/qemu/qemu_slirp.c index 09c1247892..49bffa01b8 100644 --- a/src/qemu/qemu_slirp.c +++ b/src/qemu/qemu_slirp.c @@ -355,6 +355,6 @@ qemuSlirpStart(qemuSlirpPtr slirp, virProcessKillPainfully(pid, true); if (pidfile) unlink(pidfile); - qemuDBusStop(driver, vm); + /* leave dbus daemon running, it may be used by others */ return -1; } -- 2.26.0.rc2.42.g98cedd0233

From: Marc-André Lureau <marcandre.lureau@redhat.com> The slirp helper process should be associated with the VM cgroup, like other helpers. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_extdevice.c | 8 ++++++++ src/qemu/qemu_slirp.c | 14 ++++++++++++++ src/qemu/qemu_slirp.h | 3 +++ 3 files changed, 25 insertions(+) diff --git a/src/qemu/qemu_extdevice.c b/src/qemu/qemu_extdevice.c index 5a31b4d66e..dae3524307 100644 --- a/src/qemu/qemu_extdevice.c +++ b/src/qemu/qemu_extdevice.c @@ -284,6 +284,14 @@ qemuExtDevicesSetupCgroup(virQEMUDriverPtr driver, return -1; } + for (i = 0; i < def->nnets; i++) { + virDomainNetDefPtr net = def->nets[i]; + qemuSlirpPtr slirp = QEMU_DOMAIN_NETWORK_PRIVATE(net)->slirp; + + if (slirp && qemuSlirpSetupCgroup(slirp, cgroup) < 0) + return -1; + } + if (def->tpm && qemuExtTPMSetupCgroup(driver, def, cgroup) < 0) return -1; diff --git a/src/qemu/qemu_slirp.c b/src/qemu/qemu_slirp.c index 49bffa01b8..71dbded607 100644 --- a/src/qemu/qemu_slirp.c +++ b/src/qemu/qemu_slirp.c @@ -242,6 +242,14 @@ qemuSlirpStop(qemuSlirpPtr slirp, } +int +qemuSlirpSetupCgroup(qemuSlirpPtr slirp, + virCgroupPtr cgroup) +{ + return virCgroupAddProcess(cgroup, slirp->pid); +} + + int qemuSlirpStart(qemuSlirpPtr slirp, virDomainObjPtr vm, @@ -249,6 +257,7 @@ qemuSlirpStart(qemuSlirpPtr slirp, virDomainNetDefPtr net, bool incoming) { + qemuDomainObjPrivatePtr priv = vm->privateData; g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); g_autoptr(virCommand) cmd = NULL; g_autofree char *pidfile = NULL; @@ -348,6 +357,10 @@ qemuSlirpStart(qemuSlirpPtr slirp, } slirp->pid = pid; + + if (priv->cgroup && qemuSlirpSetupCgroup(slirp, priv->cgroup) < 0) + goto error; + return 0; error: @@ -355,6 +368,7 @@ qemuSlirpStart(qemuSlirpPtr slirp, virProcessKillPainfully(pid, true); if (pidfile) unlink(pidfile); + slirp->pid = 0; /* leave dbus daemon running, it may be used by others */ return -1; } diff --git a/src/qemu/qemu_slirp.h b/src/qemu/qemu_slirp.h index 5bf9596053..e1db908814 100644 --- a/src/qemu/qemu_slirp.h +++ b/src/qemu/qemu_slirp.h @@ -75,4 +75,7 @@ void qemuSlirpStop(qemuSlirpPtr slirp, int qemuSlirpGetFD(qemuSlirpPtr slirp); +int qemuSlirpSetupCgroup(qemuSlirpPtr slirp, + virCgroupPtr cgroup); + G_DEFINE_AUTOPTR_CLEANUP_FUNC(qemuSlirp, qemuSlirpFree); -- 2.26.0.rc2.42.g98cedd0233

From: Marc-André Lureau <marcandre.lureau@redhat.com> Allow calling qemuDBusStart() multiple times (as may be done by qemu-slirp already). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_dbus.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/qemu/qemu_dbus.c b/src/qemu/qemu_dbus.c index f3e6f3ee37..ae55bbb299 100644 --- a/src/qemu/qemu_dbus.c +++ b/src/qemu/qemu_dbus.c @@ -177,6 +177,9 @@ qemuDBusStart(virQEMUDriverPtr driver, pid_t cpid = -1; int ret = -1; + if (priv->dbusDaemonRunning) + return 0; + if (!virFileIsExecutable(cfg->dbusDaemonName)) { virReportSystemError(errno, _("'%s' is not a suitable dbus-daemon"), -- 2.26.0.rc2.42.g98cedd0233

From: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_dbus.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/src/qemu/qemu_dbus.c b/src/qemu/qemu_dbus.c index ae55bbb299..02dc3ebf07 100644 --- a/src/qemu/qemu_dbus.c +++ b/src/qemu/qemu_dbus.c @@ -142,13 +142,11 @@ qemuDBusStop(virQEMUDriverPtr driver, qemuDomainObjPrivatePtr priv = vm->privateData; g_autofree char *shortName = NULL; g_autofree char *pidfile = NULL; - g_autofree char *configfile = NULL; if (!(shortName = virDomainDefGetShortName(vm->def))) return; pidfile = qemuDBusCreatePidFilename(cfg, shortName); - configfile = qemuDBusCreateConfPath(cfg, shortName); if (virPidFileForceCleanupPath(pidfile) < 0) { VIR_WARN("Unable to kill dbus-daemon process"); -- 2.26.0.rc2.42.g98cedd0233

From: Marc-André Lureau <marcandre.lureau@redhat.com> External devices are started before cgroup is created. Add the DBus daemon to the VM cgroup with the rest of the external devices. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_dbus.c | 24 ++++++++++++++++++++++++ src/qemu/qemu_dbus.h | 3 +++ src/qemu/qemu_extdevice.c | 4 ++++ 3 files changed, 31 insertions(+) diff --git a/src/qemu/qemu_dbus.c b/src/qemu/qemu_dbus.c index 02dc3ebf07..53f6c45986 100644 --- a/src/qemu/qemu_dbus.c +++ b/src/qemu/qemu_dbus.c @@ -156,6 +156,30 @@ qemuDBusStop(virQEMUDriverPtr driver, } +int +qemuDBusSetupCgroup(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); + qemuDomainObjPrivatePtr priv = vm->privateData; + g_autofree char *shortName = NULL; + g_autofree char *pidfile = NULL; + pid_t cpid = -1; + + if (!priv->dbusDaemonRunning) + return 0; + + if (!(shortName = virDomainDefGetShortName(vm->def))) + return -1; + pidfile = qemuDBusCreatePidFilename(cfg, shortName); + if (virPidFileReadPath(pidfile, &cpid) < 0) { + VIR_WARN("Unable to get DBus PID"); + return -1; + } + + return virCgroupAddProcess(priv->cgroup, cpid); +} + int qemuDBusStart(virQEMUDriverPtr driver, virDomainObjPtr vm) diff --git a/src/qemu/qemu_dbus.h b/src/qemu/qemu_dbus.h index a96f19ac0d..474eb1058b 100644 --- a/src/qemu/qemu_dbus.h +++ b/src/qemu/qemu_dbus.h @@ -35,3 +35,6 @@ void qemuDBusStop(virQEMUDriverPtr driver, int qemuDBusVMStateAdd(virDomainObjPtr vm, const char *id); void qemuDBusVMStateRemove(virDomainObjPtr vm, const char *id); + +int qemuDBusSetupCgroup(virQEMUDriverPtr driver, + virDomainObjPtr vm); diff --git a/src/qemu/qemu_extdevice.c b/src/qemu/qemu_extdevice.c index dae3524307..2096272761 100644 --- a/src/qemu/qemu_extdevice.c +++ b/src/qemu/qemu_extdevice.c @@ -23,6 +23,7 @@ #include "qemu_command.h" #include "qemu_extdevice.h" #include "qemu_vhost_user_gpu.h" +#include "qemu_dbus.h" #include "qemu_domain.h" #include "qemu_tpm.h" #include "qemu_slirp.h" @@ -276,6 +277,9 @@ qemuExtDevicesSetupCgroup(virQEMUDriverPtr driver, virDomainDefPtr def = vm->def; size_t i; + if (qemuDBusSetupCgroup(driver, vm) < 0) + return -1; + for (i = 0; i < def->nvideos; i++) { virDomainVideoDefPtr video = def->videos[i]; -- 2.26.0.rc2.42.g98cedd0233

From: Marc-André Lureau <marcandre.lureau@redhat.com> The file doesn't use virSystemd functions directly. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_cgroup.c | 1 - 1 file changed, 1 deletion(-) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 23e52fa218..cec0a03889 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -36,7 +36,6 @@ #include "virfile.h" #include "virtypedparam.h" #include "virnuma.h" -#include "virsystemd.h" #include "virdevmapper.h" #include "virutil.h" -- 2.26.0.rc2.42.g98cedd0233

From: Marc-André Lureau <marcandre.lureau@redhat.com> Those functions talk to system bus and have a special handling of system bus missing. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/util/virdbus.c | 12 ++++++------ src/util/virdbus.h | 4 ++-- src/util/virfirewalld.c | 2 +- src/util/virsystemd.c | 8 ++++---- 4 files changed, 13 insertions(+), 13 deletions(-) diff --git a/src/util/virdbus.c b/src/util/virdbus.c index 77691cd2b0..858291e2ba 100644 --- a/src/util/virdbus.c +++ b/src/util/virdbus.c @@ -1700,12 +1700,12 @@ static int virDBusIsServiceInList(const char *listMethod, const char *name) } /** - * virDBusIsServiceEnabled: + * virDBusSystemIsServiceEnabled: * @name: service name * * Returns 0 if service is available, -1 on fatal error, or -2 if service is not available */ -int virDBusIsServiceEnabled(const char *name) +int virDBusSystemIsServiceEnabled(const char *name) { int ret = virDBusIsServiceInList("ListActivatableNames", name); @@ -1715,12 +1715,12 @@ int virDBusIsServiceEnabled(const char *name) } /** - * virDBusIsServiceRegistered + * virDBusSystemIsServiceRegistered * @name: service name * * Returns 0 if service is registered, -1 on fatal error, or -2 if service is not registered */ -int virDBusIsServiceRegistered(const char *name) +int virDBusSystemIsServiceRegistered(const char *name) { int ret = virDBusIsServiceInList("ListNames", name); @@ -1843,13 +1843,13 @@ int virDBusMessageDecode(DBusMessage* msg G_GNUC_UNUSED, return -1; } -int virDBusIsServiceEnabled(const char *name G_GNUC_UNUSED) +int virDBusSystemIsServiceEnabled(const char *name G_GNUC_UNUSED) { VIR_DEBUG("DBus support not compiled into this binary"); return -2; } -int virDBusIsServiceRegistered(const char *name G_GNUC_UNUSED) +int virDBusSystemIsServiceRegistered(const char *name G_GNUC_UNUSED) { VIR_DEBUG("DBus support not compiled into this binary"); return -2; diff --git a/src/util/virdbus.h b/src/util/virdbus.h index 083c074d59..b3b0260a4e 100644 --- a/src/util/virdbus.h +++ b/src/util/virdbus.h @@ -70,7 +70,7 @@ int virDBusMessageDecode(DBusMessage *msg, const char *types, ...); void virDBusMessageUnref(DBusMessage *msg); -int virDBusIsServiceEnabled(const char *name); -int virDBusIsServiceRegistered(const char *name); +int virDBusSystemIsServiceEnabled(const char *name); +int virDBusSystemIsServiceRegistered(const char *name); bool virDBusErrorIsUnknownMethod(virErrorPtr err); diff --git a/src/util/virfirewalld.c b/src/util/virfirewalld.c index c14a6b6e65..499f925b7e 100644 --- a/src/util/virfirewalld.c +++ b/src/util/virfirewalld.c @@ -66,7 +66,7 @@ VIR_ENUM_IMPL(virFirewallDBackend, int virFirewallDIsRegistered(void) { - return virDBusIsServiceRegistered(VIR_FIREWALL_FIREWALLD_SERVICE); + return virDBusSystemIsServiceRegistered(VIR_FIREWALL_FIREWALLD_SERVICE); } /** diff --git a/src/util/virsystemd.c b/src/util/virsystemd.c index 1d41ed34f7..9a659dad84 100644 --- a/src/util/virsystemd.c +++ b/src/util/virsystemd.c @@ -157,13 +157,13 @@ virSystemdHasMachined(void) if (val != -1) return val; - if ((ret = virDBusIsServiceEnabled("org.freedesktop.machine1")) < 0) { + if ((ret = virDBusSystemIsServiceEnabled("org.freedesktop.machine1")) < 0) { if (ret == -2) g_atomic_int_set(&virSystemdHasMachinedCachedValue, -2); return ret; } - if ((ret = virDBusIsServiceRegistered("org.freedesktop.systemd1")) == -1) + if ((ret = virDBusSystemIsServiceRegistered("org.freedesktop.systemd1")) == -1) return ret; g_atomic_int_set(&virSystemdHasMachinedCachedValue, ret); return ret; @@ -179,14 +179,14 @@ virSystemdHasLogind(void) if (val != -1) return val; - ret = virDBusIsServiceEnabled("org.freedesktop.login1"); + ret = virDBusSystemIsServiceEnabled("org.freedesktop.login1"); if (ret < 0) { if (ret == -2) g_atomic_int_set(&virSystemdHasLogindCachedValue, -2); return ret; } - if ((ret = virDBusIsServiceRegistered("org.freedesktop.login1")) == -1) + if ((ret = virDBusSystemIsServiceRegistered("org.freedesktop.login1")) == -1) return ret; g_atomic_int_set(&virSystemdHasLogindCachedValue, ret); -- 2.26.0.rc2.42.g98cedd0233

From: Marc-André Lureau <marcandre.lureau@redhat.com> Learn to check presence of services on a given connection. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/util/virdbus.c | 65 +++++++++++++++++++++++++++++++++++----------- src/util/virdbus.h | 3 +++ 2 files changed, 53 insertions(+), 15 deletions(-) diff --git a/src/util/virdbus.c b/src/util/virdbus.c index 858291e2ba..03ed8fd87a 100644 --- a/src/util/virdbus.c +++ b/src/util/virdbus.c @@ -1650,19 +1650,14 @@ int virDBusCallMethod(DBusConnection *conn, } -static int virDBusIsServiceInList(const char *listMethod, const char *name) +static int virDBusIsServiceInList(DBusConnection *conn, + const char *listMethod, + const char *name) { - DBusConnection *conn; DBusMessage *reply = NULL; DBusMessageIter iter, sub; int ret = -1; - if (!virDBusHasSystemBus()) - return -2; - - if (!(conn = virDBusGetSystemBus())) - return -1; - if (virDBusCallMethod(conn, &reply, NULL, @@ -1699,6 +1694,38 @@ static int virDBusIsServiceInList(const char *listMethod, const char *name) return ret; } +/** + * virDBusIsServiceEnabled: + * @conn: a DBus connection + * @name: a service name + * + * Returns 0 if service is available, -1 on fatal error, or -2 if service is not available + */ +int virDBusIsServiceEnabled(DBusConnection *conn, const char *name) +{ + int ret = virDBusIsServiceInList(conn, "ListActivatableNames", name); + + VIR_DEBUG("Service %s is %s", name, ret ? "unavailable" : "available"); + + return ret; +} + +/** + * virDBusIsServiceRegistered: + * @conn: a DBus connection + * @name: a service name + * + * Returns 0 if service is registered, -1 on fatal error, or -2 if service is not available + */ +int virDBusIsServiceRegistered(DBusConnection *conn, const char *name) +{ + int ret = virDBusIsServiceInList(conn, "ListNames", name); + + VIR_DEBUG("Service %s is %s", name, ret ? "not registered" : "registered"); + + return ret; +} + /** * virDBusSystemIsServiceEnabled: * @name: service name @@ -1707,26 +1734,34 @@ static int virDBusIsServiceInList(const char *listMethod, const char *name) */ int virDBusSystemIsServiceEnabled(const char *name) { - int ret = virDBusIsServiceInList("ListActivatableNames", name); + DBusConnection *conn; - VIR_DEBUG("Service %s is %s", name, ret ? "unavailable" : "available"); + if (!virDBusHasSystemBus()) + return -2; - return ret; + if (!(conn = virDBusGetSystemBus())) + return -1; + + return virDBusIsServiceEnabled(conn, name); } /** - * virDBusSystemIsServiceRegistered + * virDBusSystemIsServiceRegistered: * @name: service name * * Returns 0 if service is registered, -1 on fatal error, or -2 if service is not registered */ int virDBusSystemIsServiceRegistered(const char *name) { - int ret = virDBusIsServiceInList("ListNames", name); + DBusConnection *conn; - VIR_DEBUG("Service %s is %s", name, ret ? "not registered" : "registered"); + if (!virDBusHasSystemBus()) + return -2; - return ret; + if (!(conn = virDBusGetSystemBus())) + return -1; + + return virDBusIsServiceRegistered(conn, name); } void virDBusMessageUnref(DBusMessage *msg) diff --git a/src/util/virdbus.h b/src/util/virdbus.h index b3b0260a4e..2b2d273b1b 100644 --- a/src/util/virdbus.h +++ b/src/util/virdbus.h @@ -70,6 +70,9 @@ int virDBusMessageDecode(DBusMessage *msg, const char *types, ...); void virDBusMessageUnref(DBusMessage *msg); +int virDBusIsServiceEnabled(DBusConnection *conn, const char *name); +int virDBusIsServiceRegistered(DBusConnection *conn, const char *name); + int virDBusSystemIsServiceEnabled(const char *name); int virDBusSystemIsServiceRegistered(const char *name); -- 2.26.0.rc2.42.g98cedd0233

From: Marc-André Lureau <marcandre.lureau@redhat.com> Since commit f10bd740e178c89f24d0b0298d0b5413537d0699 ("Cache the presence of machine1 service"), the code checks for systemd1 registration. Not totally unreasonable, but that seems odd since we actually check machined presence in this function. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/util/virsystemd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/util/virsystemd.c b/src/util/virsystemd.c index 9a659dad84..74d0d15ca7 100644 --- a/src/util/virsystemd.c +++ b/src/util/virsystemd.c @@ -163,7 +163,7 @@ virSystemdHasMachined(void) return ret; } - if ((ret = virDBusSystemIsServiceRegistered("org.freedesktop.systemd1")) == -1) + if ((ret = virDBusSystemIsServiceRegistered("org.freedesktop.machine1")) == -1) return ret; g_atomic_int_set(&virSystemdHasMachinedCachedValue, ret); return ret; -- 2.26.0.rc2.42.g98cedd0233

On a Monday in 2020, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Since commit f10bd740e178c89f24d0b0298d0b5413537d0699 ("Cache the presence of machine1 service"), the code checks for systemd1 registration. Not totally unreasonable, but that seems odd since we actually check machined presence in this function.
That is intentional. We only count the systemd-based services as really activatable if systemd1 is already registered. On some Frankenstein'd Gentoo systems with systemd installed but not started, the services were showing up as activatable but failed with obscure errors. Some history: commit 12ee0b98d3ddcb2c71cdb2352512153791ebcb7e Check if systemd is running before creating machines https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=12ee0b98 https://bugs.gentoo.org/show_bug.cgi?id=493246#c22 Jano

On Tue, Apr 07, 2020 at 11:50:31AM +0200, Ján Tomko wrote:
On a Monday in 2020, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Since commit f10bd740e178c89f24d0b0298d0b5413537d0699 ("Cache the presence of machine1 service"), the code checks for systemd1 registration. Not totally unreasonable, but that seems odd since we actually check machined presence in this function.
That is intentional.
We only count the systemd-based services as really activatable if systemd1 is already registered.
On some Frankenstein'd Gentoo systems with systemd installed but not started, the services were showing up as activatable but failed with obscure errors.
We could benefit from a comment to this effect being added to the code, as one of my own local changes makes much the same fix as Marc-Andre's Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Hi On Tue, Apr 7, 2020 at 11:50 AM Ján Tomko <jtomko@redhat.com> wrote:
On a Monday in 2020, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Since commit f10bd740e178c89f24d0b0298d0b5413537d0699 ("Cache the presence of machine1 service"), the code checks for systemd1 registration. Not totally unreasonable, but that seems odd since we actually check machined presence in this function.
That is intentional.
We only count the systemd-based services as really activatable if systemd1 is already registered.
On some Frankenstein'd Gentoo systems with systemd installed but not started, the services were showing up as activatable but failed with obscure errors.
Some history: commit 12ee0b98d3ddcb2c71cdb2352512153791ebcb7e Check if systemd is running before creating machines https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=12ee0b98 https://bugs.gentoo.org/show_bug.cgi?id=493246#c22
Ok, I imagined something like that. However, making systemd a hard-dependency for machine1 is not a good solution. In theory, machine1 could have a different implementation that doesn't rely on systemd.

On Tue, Apr 07, 2020 at 12:29:29PM +0200, Marc-André Lureau wrote:
Hi
On Tue, Apr 7, 2020 at 11:50 AM Ján Tomko <jtomko@redhat.com> wrote:
On a Monday in 2020, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Since commit f10bd740e178c89f24d0b0298d0b5413537d0699 ("Cache the presence of machine1 service"), the code checks for systemd1 registration. Not totally unreasonable, but that seems odd since we actually check machined presence in this function.
That is intentional.
We only count the systemd-based services as really activatable if systemd1 is already registered.
On some Frankenstein'd Gentoo systems with systemd installed but not started, the services were showing up as activatable but failed with obscure errors.
Some history: commit 12ee0b98d3ddcb2c71cdb2352512153791ebcb7e Check if systemd is running before creating machines https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=12ee0b98 https://bugs.gentoo.org/show_bug.cgi?id=493246#c22
Ok, I imagined something like that. However, making systemd a hard-dependency for machine1 is not a good solution. In theory, machine1 could have a different implementation that doesn't rely on systemd.
I don't think we need to worry about that possibilty, since AFAIK no such alternate impl exists. Such a replacement would be a major piece of work needing to implement a lot of system logic around the cgroups/service setup, so I'd be sceptical that libvirt would "just work" regardless. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

From: Marc-André Lureau <marcandre.lureau@redhat.com> The service is started on demand, checking if it's already registered on the bus defeat this purpose. Note: the same is probably true for login1, but I don't intent to look at the code path yet. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/util/virsystemd.c | 2 -- tests/virsystemdtest.c | 35 ----------------------------------- 2 files changed, 37 deletions(-) diff --git a/src/util/virsystemd.c b/src/util/virsystemd.c index 74d0d15ca7..14d680532f 100644 --- a/src/util/virsystemd.c +++ b/src/util/virsystemd.c @@ -163,8 +163,6 @@ virSystemdHasMachined(void) return ret; } - if ((ret = virDBusSystemIsServiceRegistered("org.freedesktop.machine1")) == -1) - return ret; g_atomic_int_set(&virSystemdHasMachinedCachedValue, ret); return ret; } diff --git a/tests/virsystemdtest.c b/tests/virsystemdtest.c index 050941dce8..8518805f82 100644 --- a/tests/virsystemdtest.c +++ b/tests/virsystemdtest.c @@ -260,40 +260,6 @@ static int testCreateNoSystemd(const void *opaque G_GNUC_UNUSED) return 0; } -static int testCreateSystemdNotRunning(const void *opaque G_GNUC_UNUSED) -{ - unsigned char uuid[VIR_UUID_BUFLEN] = { - 1, 1, 1, 1, - 2, 2, 2, 2, - 3, 3, 3, 3, - 4, 4, 4, 4 - }; - int rv; - - g_setenv("FAIL_NOT_REGISTERED", "1", TRUE); - - if ((rv = virSystemdCreateMachine("demo", - "qemu", - uuid, - NULL, - 123, - false, - 0, NULL, - NULL, 0)) == 0) { - g_unsetenv("FAIL_NOT_REGISTERED"); - fprintf(stderr, "%s", "Unexpected create machine success\n"); - return -1; - } - g_unsetenv("FAIL_NOT_REGISTERED"); - - if (rv != -2) { - fprintf(stderr, "%s", "Unexpected create machine error\n"); - return -1; - } - - return 0; -} - static int testCreateBadSystemd(const void *opaque G_GNUC_UNUSED) { unsigned char uuid[VIR_UUID_BUFLEN] = { @@ -698,7 +664,6 @@ mymain(void) DO_TEST("Test create machine ", testCreateMachine); DO_TEST("Test terminate machine ", testTerminateMachine); DO_TEST("Test create no systemd ", testCreateNoSystemd); - DO_TEST("Test create systemd not running ", testCreateSystemdNotRunning); DO_TEST("Test create bad systemd ", testCreateBadSystemd); DO_TEST("Test create with network ", testCreateNetwork); DO_TEST("Test getting machine name ", testGetMachineName); -- 2.26.0.rc2.42.g98cedd0233

From: Marc-André Lureau <marcandre.lureau@redhat.com> machined could quite easily support session management: https://github.com/systemd/systemd/pull/15312 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/util/virsystemd.c | 21 +++++++-------------- 1 file changed, 7 insertions(+), 14 deletions(-) diff --git a/src/util/virsystemd.c b/src/util/virsystemd.c index 14d680532f..31f5498438 100644 --- a/src/util/virsystemd.c +++ b/src/util/virsystemd.c @@ -148,16 +148,18 @@ void virSystemdHasLogindResetCachedValue(void) * 0 = machine1 is available */ static int -virSystemdHasMachined(void) +virSystemdHasMachined(DBusConnection **conn) { int ret; int val; + *conn = geteuid() ? virDBusGetSessionBus() : virDBusGetSystemBus(); + val = g_atomic_int_get(&virSystemdHasMachinedCachedValue); if (val != -1) return val; - if ((ret = virDBusSystemIsServiceEnabled("org.freedesktop.machine1")) < 0) { + if ((ret = virDBusIsServiceEnabled(*conn, "org.freedesktop.machine1")) < 0) { if (ret == -2) g_atomic_int_set(&virSystemdHasMachinedCachedValue, -2); return ret; @@ -199,10 +201,7 @@ virSystemdGetMachineNameByPID(pid_t pid) DBusMessage *reply = NULL; char *name = NULL, *object = NULL; - if (virSystemdHasMachined() < 0) - goto cleanup; - - if (!(conn = virDBusGetSystemBus())) + if (virSystemdHasMachined(&conn) < 0) goto cleanup; if (virDBusCallMethod(conn, &reply, NULL, @@ -279,12 +278,9 @@ int virSystemdCreateMachine(const char *name, char *scopename = NULL; static int hasCreateWithNetwork = 1; - if ((ret = virSystemdHasMachined()) < 0) + if ((ret = virSystemdHasMachined(&conn)) < 0) return ret; - if (!(conn = virDBusGetSystemBus())) - return -1; - ret = -1; creatorname = g_strdup_printf("libvirt-%s", drivername); @@ -459,14 +455,11 @@ int virSystemdTerminateMachine(const char *name) memset(&error, 0, sizeof(error)); - if ((ret = virSystemdHasMachined()) < 0) + if ((ret = virSystemdHasMachined(&conn)) < 0) goto cleanup; ret = -1; - if (!(conn = virDBusGetSystemBus())) - goto cleanup; - /* * The systemd DBus API we're invoking has the * following signature -- 2.26.0.rc2.42.g98cedd0233

From: Marc-André Lureau <marcandre.lureau@redhat.com> This fixes an error path when virCgroupV2DevicesDetectProg() fails but wasn't needed in the first place on VM shutdown. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/util/vircgroupv2devices.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/src/util/vircgroupv2devices.c b/src/util/vircgroupv2devices.c index d62ee12a05..997ef82964 100644 --- a/src/util/vircgroupv2devices.c +++ b/src/util/vircgroupv2devices.c @@ -541,12 +541,13 @@ virCgroupV2DevicesPrepareProg(virCgroupPtr group) int virCgroupV2DevicesRemoveProg(virCgroupPtr group) { - if (virCgroupV2DevicesDetectProg(group) < 0) - return -1; - if (group->unified.devices.progfd <= 0 && group->unified.devices.mapfd <= 0) return 0; + if (virCgroupV2DevicesDetectProg(group) < 0) { + return -1; + } + if (group->unified.devices.mapfd >= 0) VIR_FORCE_CLOSE(group->unified.devices.mapfd); -- 2.26.0.rc2.42.g98cedd0233

From: Marc-André Lureau <marcandre.lureau@redhat.com> External devices registration can add processes to the process group, which allows basic process management or other tuning. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_cgroup.c | 9 --------- 1 file changed, 9 deletions(-) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index cec0a03889..c288519e62 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -1127,15 +1127,6 @@ qemuSetupCgroupForExtDevices(virDomainObjPtr vm, priv->cgroup == NULL) return 0; /* Not supported, so claim success */ - /* - * If CPU cgroup controller is not initialized here, then we need - * neither period nor quota settings. And if CPUSET controller is - * not initialized either, then there's nothing to do anyway. - */ - if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPU) && - !virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPUSET)) - return 0; - if (virCgroupNewThread(priv->cgroup, VIR_CGROUP_THREAD_EMULATOR, 0, false, &cgroup_temp) < 0) goto cleanup; -- 2.26.0.rc2.42.g98cedd0233

From: Marc-André Lureau <marcandre.lureau@redhat.com> This allow for basic process management, at least. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_process.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 6b9f6fb860..0c33eab26c 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2674,6 +2674,9 @@ qemuProcessSetupPid(virDomainObjPtr vm, afinity_cpumask = hostcpumap; } + if (priv->cgroup && virCgroupNewThread(priv->cgroup, nameval, id, true, &cgroup) < 0) + goto cleanup; + /* * If CPU cgroup controller is not initialized here, then we need * neither period nor quota settings. And if CPUSET controller is @@ -2689,9 +2692,6 @@ qemuProcessSetupPid(virDomainObjPtr vm, &mem_mask, -1) < 0) goto cleanup; - if (virCgroupNewThread(priv->cgroup, nameval, id, true, &cgroup) < 0) - goto cleanup; - if (virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPUSET)) { if (use_cpumask && qemuSetupCgroupCpusetCpus(cgroup, use_cpumask) < 0) -- 2.26.0.rc2.42.g98cedd0233

From: Marc-André Lureau <marcandre.lureau@redhat.com> CGroup delegation can allow various processes or users to use cgroup. Further checks should be done by the various backends. With this series, a qemu:///session VM can have basic CGroupv2 support with machined --user help. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_cgroup.c | 3 --- src/util/vircgroup.c | 5 +++++ 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index c288519e62..0f80dd4214 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -914,9 +914,6 @@ qemuInitCgroup(virDomainObjPtr vm, qemuDomainObjPrivatePtr priv = vm->privateData; g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(priv->driver); - if (!virQEMUDriverIsPrivileged(priv->driver)) - return 0; - if (!virCgroupAvailable()) return 0; diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 70d85200cb..4e71677994 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -1254,6 +1254,11 @@ virCgroupNewMachine(const char *name, if (rv == -1) return -1; + if (geteuid() != 0) { + errno = EPERM; + return 0; + } + return virCgroupNewMachineManual(name, drivername, pidleader, -- 2.26.0.rc2.42.g98cedd0233

On Mon, Apr 06, 2020 at 11:26:57PM +0200, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Hi,
This is a small series that allows basic QEMU VM CGroup support with the help of machined --user: https://github.com/systemd/systemd/pull/15312
The first few patches are fixes to register dbus and slirp-helper correctly with the VM cgroup.
A few changes are done to the machined support, adding session support, and registering the VM to get a systemd scope cgroup under user machine.slice.
Hi, Before we start with anything I would like to know what is the motivation behind having CGroup support for session VMs? From the systemd pull request it looks like you would like to have session VMs under the /sys/fs/cgroup/machine.slice which is completely wrong as we should not mix system and session VMs under the same slice. In addition it would not work because because you would use session D-Bus which would start machined under user running session VM and that user will not have permissions to do anything with the system machine.slice. If a regular user wants to do anything with cgroups delegation has to be used and obviously we cannot delegate the system machine.slice, it would have to live in a different location and since the QEMU process is running under the specific user it would have to live within /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/ where by default only memory and pids controllers are available. Delegation would have to be set in order to get other controllers as well and all of this would work only if cgroups v2 are used. Pavel

Hi On Tue, Apr 7, 2020 at 10:55 AM Pavel Hrdina <phrdina@redhat.com> wrote:
On Mon, Apr 06, 2020 at 11:26:57PM +0200, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Hi,
This is a small series that allows basic QEMU VM CGroup support with the help of machined --user: https://github.com/systemd/systemd/pull/15312
The first few patches are fixes to register dbus and slirp-helper correctly with the VM cgroup.
A few changes are done to the machined support, adding session support, and registering the VM to get a systemd scope cgroup under user machine.slice.
Hi,
Before we start with anything I would like to know what is the motivation behind having CGroup support for session VMs?
My initial motivation was to have a way to group VM processes and kill them altogether, because I tend to have a lot of them around after a while. Given that systemd --user is very capable and based on https://www.freedesktop.org/wiki/Software/systemd/writing-vm-managers/, I thought that was probably the way to go.
From the systemd pull request it looks like you would like to have session VMs under the /sys/fs/cgroup/machine.slice which is completely wrong as we should not mix system and session VMs under the same slice.
No, it is under user.slice, ex with this series: CGroup: /user.slice/user-1000.slice/user@1000.service ├─machine.slice │ └─machine-qemu\x2delmarco\x2d1\x2dfedora.scope │ ├─24714 /usr/bin/swtpm socket --daemon --ctrl type=unixio,path=/run/user/1000/libvirt/qemu/run/swtpm/1-fedora-swtpm.sock,mode=0600 --tpmstate dir=/home/elmarco/.config/libvirt/qemu/swtpm/053f84e7> │ ├─24716 /usr/bin/dbus-daemon --config-file=/run/user/1000/libvirt/qemu/run/dbus/1-fedora-dbus.conf │ ├─24719 /home/elmarco/src/libslirp-rs/target/debug/libslirp-helper --fd=27 --dbus-id=slirp-52:54:00:9c:bb:6c --dbus-address=unix:path=/run/user/1000/libvirt/qemu/run/dbus/1-fedora-dbus.sock --exi> │ ├─24722 /usr/bin/qemu-system-x86_64 -name guest=fedora,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/home/elmarco/.config/libvirt/qemu/lib/domain-1-fedora/master-key.aes -obje> │ └─emulator
In addition it would not work because because you would use session D-Bus which would start machined under user running session VM and that user will not have permissions to do anything with the system machine.slice. If a regular user wants to do anything with cgroups delegation has to be used and obviously we cannot delegate the system machine.slice, it would have to live in a different location and since the QEMU process is running under the specific user it would have to live within /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/ where by default only memory and pids controllers are available. Delegation would have to be set in order to get other controllers as well and all of this would work only if cgroups v2 are used.
I thought delegation was required too, but I can't see any "Delegate=" in my user machine cgroup tree. (using systemctl --user show - note that /machine.slice doesn't have Delegate set either) But you can see that basic process management works fine with the systemd series proposed. Yes, this is certainly cgroups v2 only. thanks

On Tue, Apr 07, 2020 at 12:48:26PM +0200, Marc-André Lureau wrote:
Hi
On Tue, Apr 7, 2020 at 10:55 AM Pavel Hrdina <phrdina@redhat.com> wrote:
On Mon, Apr 06, 2020 at 11:26:57PM +0200, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Hi,
This is a small series that allows basic QEMU VM CGroup support with the help of machined --user: https://github.com/systemd/systemd/pull/15312
The first few patches are fixes to register dbus and slirp-helper correctly with the VM cgroup.
A few changes are done to the machined support, adding session support, and registering the VM to get a systemd scope cgroup under user machine.slice.
Hi,
Before we start with anything I would like to know what is the motivation behind having CGroup support for session VMs?
My initial motivation was to have a way to group VM processes and kill them altogether, because I tend to have a lot of them around after a while.
Given that systemd --user is very capable and based on https://www.freedesktop.org/wiki/Software/systemd/writing-vm-managers/, I thought that was probably the way to go.
Yes, it makes sense. It also should help us close a feature gap by enabling many of our resource tuning APIs to now work in session mode, since we can now use cgroups tuning.
From the systemd pull request it looks like you would like to have session VMs under the /sys/fs/cgroup/machine.slice which is completely wrong as we should not mix system and session VMs under the same slice.
No, it is under user.slice, ex with this series:
CGroup: /user.slice/user-1000.slice/user@1000.service ├─machine.slice │ └─machine-qemu\x2delmarco\x2d1\x2dfedora.scope │ ├─24714 /usr/bin/swtpm socket --daemon --ctrl type=unixio,path=/run/user/1000/libvirt/qemu/run/swtpm/1-fedora-swtpm.sock,mode=0600 --tpmstate dir=/home/elmarco/.config/libvirt/qemu/swtpm/053f84e7> │ ├─24716 /usr/bin/dbus-daemon --config-file=/run/user/1000/libvirt/qemu/run/dbus/1-fedora-dbus.conf │ ├─24719 /home/elmarco/src/libslirp-rs/target/debug/libslirp-helper --fd=27 --dbus-id=slirp-52:54:00:9c:bb:6c --dbus-address=unix:path=/run/user/1000/libvirt/qemu/run/dbus/1-fedora-dbus.sock --exi> │ ├─24722 /usr/bin/qemu-system-x86_64 -name guest=fedora,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/home/elmarco/.config/libvirt/qemu/lib/domain-1-fedora/master-key.aes -obje> │ └─emulator
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Tue, Apr 07, 2020 at 12:48:26PM +0200, Marc-André Lureau wrote:
Hi
On Tue, Apr 7, 2020 at 10:55 AM Pavel Hrdina <phrdina@redhat.com> wrote:
On Mon, Apr 06, 2020 at 11:26:57PM +0200, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Hi,
This is a small series that allows basic QEMU VM CGroup support with the help of machined --user: https://github.com/systemd/systemd/pull/15312
The first few patches are fixes to register dbus and slirp-helper correctly with the VM cgroup.
A few changes are done to the machined support, adding session support, and registering the VM to get a systemd scope cgroup under user machine.slice.
Hi,
Before we start with anything I would like to know what is the motivation behind having CGroup support for session VMs?
My initial motivation was to have a way to group VM processes and kill them altogether, because I tend to have a lot of them around after a while.
Given that systemd --user is very capable and based on https://www.freedesktop.org/wiki/Software/systemd/writing-vm-managers/, I thought that was probably the way to go.
From the systemd pull request it looks like you would like to have session VMs under the /sys/fs/cgroup/machine.slice which is completely wrong as we should not mix system and session VMs under the same slice.
No, it is under user.slice, ex with this series:
CGroup: /user.slice/user-1000.slice/user@1000.service ├─machine.slice │ └─machine-qemu\x2delmarco\x2d1\x2dfedora.scope │ ├─24714 /usr/bin/swtpm socket --daemon --ctrl type=unixio,path=/run/user/1000/libvirt/qemu/run/swtpm/1-fedora-swtpm.sock,mode=0600 --tpmstate dir=/home/elmarco/.config/libvirt/qemu/swtpm/053f84e7> │ ├─24716 /usr/bin/dbus-daemon --config-file=/run/user/1000/libvirt/qemu/run/dbus/1-fedora-dbus.conf │ ├─24719 /home/elmarco/src/libslirp-rs/target/debug/libslirp-helper --fd=27 --dbus-id=slirp-52:54:00:9c:bb:6c --dbus-address=unix:path=/run/user/1000/libvirt/qemu/run/dbus/1-fedora-dbus.sock --exi> │ ├─24722 /usr/bin/qemu-system-x86_64 -name guest=fedora,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/home/elmarco/.config/libvirt/qemu/lib/domain-1-fedora/master-key.aes -obje> │ └─emulator
OK, that sounds good, I did no realize it works like that if the machine.slice file is placed under the user directory.
In addition it would not work because because you would use session D-Bus which would start machined under user running session VM and that user will not have permissions to do anything with the system machine.slice. If a regular user wants to do anything with cgroups delegation has to be used and obviously we cannot delegate the system machine.slice, it would have to live in a different location and since the QEMU process is running under the specific user it would have to live within /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/ where by default only memory and pids controllers are available. Delegation would have to be set in order to get other controllers as well and all of this would work only if cgroups v2 are used.
I thought delegation was required too, but I can't see any "Delegate=" in my user machine cgroup tree. (using systemctl --user show - note that /machine.slice doesn't have Delegate set either)
But you can see that basic process management works fine with the systemd series proposed.
systemd does partial delegation for the user cgroup so the user is able to use memory and pids controllers by default. If user needs to use other controllers as well the administrator has to set Delegate=yes using for example 'systemctl edit user@1000.service'. We would have to document this as a prerequisite to be able to use other controllers as well.
Yes, this is certainly cgroups v2 only.
OK, I wanted to be sure that this targets cgroups v2 only. In general the idea sounds good and it would allow users to restrict VMs to not consume all the resources assigned to themselves and also to get some VM statistics that are gathered from cgroups. Most of the patches in this series looks like unrelated fixes of our current code so I would suggest posting them in separately from the session cgroup support. Pavel
participants (5)
-
Daniel P. Berrangé
-
Ján Tomko
-
Marc-André Lureau
-
marcandre.lureau@redhat.com
-
Pavel Hrdina