From: "Daniel P. Berrange" <berrange(a)redhat.com>
As of libvirt 1.1.1 and systemd 205, the cgroups layout used by
libvirt has some changes. Update the 'cgroups.html' file from
the website to describe how it works in a systemd world.
Signed-off-by: Daniel P. Berrange <berrange(a)redhat.com>
---
docs/cgroups.html.in | 212 +++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 172 insertions(+), 40 deletions(-)
diff --git a/docs/cgroups.html.in b/docs/cgroups.html.in
index 77656b2..46cfb7b 100644
--- a/docs/cgroups.html.in
+++ b/docs/cgroups.html.in
@@ -47,17 +47,121 @@
<p>
As of libvirt 1.0.5 or later, the cgroups layout created by libvirt has been
simplified, in order to facilitate the setup of resource control policies by
- administrators / management applications. The layout is based on the concepts of
- "partitions" and "consumers". Each virtual machine or container is a consumer,
- and has a corresponding cgroup named <code>$VMNAME.libvirt-{qemu,lxc}</code>.
- Each consumer is associated with exactly one partition, which also have a
- corresponding cgroup usually named <code>$PARTNAME.partition</code>. The
- exceptions to this naming rule are the three top level default partitions,
- named <code>/system</code> (for system services), <code>/user</code> (for
- user login sessions) and <code>/machine</code> (for virtual machines and
- containers). By default every consumer will of course be associated with
- the <code>/machine</code> partition. This leads to a hierarchy that looks
- like
+ administrators / management applications. The new layout is based on the concepts
+ of "partitions" and "consumers". A "consumer" is a cgroup which holds the
+ processes for a single virtual machine or container. A "partition" is a cgroup
+ which does not contain any processes, but can have resource controls applied.
+ A "partition" will have zero or more child directories, each of which may be
+ either a "consumer" or a "partition".
+ </p>
+
+ <p>
+ As of libvirt 1.1.1 or later, the cgroups layout will have some slight
+ differences when running on a host with systemd 205 or later. The overall
+ tree structure is the same, but there are some differences in the naming
+ conventions for the cgroup directories. Thus the following docs are split
+ in two parts, one describing systemd hosts and the other non-systemd hosts.
+ </p>
+
+ <h3><a name="currentLayoutSystemd">Systemd cgroups
integration</a></h3>
+
+ <p>
+ On hosts which use systemd, each consumer maps to a systemd scope unit,
+ while each partition maps to a systemd slice unit.
+ </p>
+
+ <h4><a name="systemdScope">Systemd scope
naming</a></h4>
+
+ <p>
+ The systemd convention is for the scope name of virtual machines / containers
+ to be of the general format <code>machine-$NAME.scope</code>. Libvirt forms the
+ <code>$NAME</code> part of this by concatenating the driver type with the name
+ of the guest, and then escaping any systemd reserved characters.
+ So for a guest <code>demo</code> running under the <code>lxc</code> driver,
+ we get a <code>$NAME</code> of <code>lxc-demo</code> which when escaped is
+ <code>lxc\x2ddemo</code>. So the complete scope name is <code>machine-lxc\x2ddemo.scope</code>.
+ The scope names map directly to the cgroup directory names.
+ </p>
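+
+ <p>
+ For example, assuming the cpu controller is mounted at the common location
+ <code>/sys/fs/cgroup/cpu,cpuacct</code> (mount points vary between distros),
+ the <code>demo</code> container above would get a cgroup directory along
+ these lines:
+ </p>
+
+ <pre>
+/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-lxc\x2ddemo.scope/
+ </pre>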
+
+ <h4><a name="systemdSlice">Systemd slice
naming</a></h4>
+
+ <p>
+ The systemd convention for slice naming is that a slice name should include
+ the names of all of its parents prepended to its own name. So for a libvirt
+ partition <code>/machine/engineering/testing</code>, the slice name will
+ be <code>machine-engineering-testing.slice</code>. Again the slice names
+ map directly to the cgroup directory names. Systemd creates three top level
+ slices by default, <code>system.slice</code>, <code>user.slice</code> and
+ <code>machine.slice</code>. All virtual machines or containers created
+ by libvirt will be associated with <code>machine.slice</code> by default.
+ </p>
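+
+ <p>
+ Note that while the slice name encodes its full ancestry, the corresponding
+ cgroup directories are still nested. Assuming the same example controller
+ mount as above, the <code>/machine/engineering/testing</code> partition
+ would be backed by a path along these lines:
+ </p>
+
+ <pre>
+/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-engineering.slice/machine-engineering-testing.slice/
+ </pre>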
+
+ <h4><a name="systemdLayout">Systemd cgroup
layout</a></h4>
+
+ <p>
+ Given this, a possible systemd cgroups layout involving 3 qemu guests,
+ 3 lxc containers and 4 custom child slices, would be:
+ </p>
+
+ <pre>
+$ROOT
+ |
+ +- system.slice
+ | |
+ | +- libvirtd.service
+ |
+ +- machine.slice
+ |
+ +- machine-qemu\x2dvm1.scope
+ | |
+ | +- emulator
+ | +- vcpu0
+ | +- vcpu1
+ |
+ +- machine-qemu\x2dvm2.scope
+ | |
+ | +- emulator
+ | +- vcpu0
+ | +- vcpu1
+ |
+ +- machine-qemu\x2dvm3.scope
+ | |
+ | +- emulator
+ | +- vcpu0
+ | +- vcpu1
+ |
+ +- machine-engineering.slice
+ | |
+ | +- machine-engineering-testing.slice
+ | | |
+ | | +- machine-lxc\x2dcontainer1.scope
+ | |
+ | +- machine-engineering-production.slice
+ | |
+ | +- machine-lxc\x2dcontainer2.scope
+ |
+ +- machine-marketing.slice
+ |
+ +- machine-lxc\x2dcontainer3.scope
+ </pre>
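+
+ <p>
+ One convenient way to inspect the resulting tree on a systemd host is the
+ <code>systemd-cgls</code> tool (the output will of course depend on which
+ guests are currently running):
+ </p>
+
+ <pre>
+# systemd-cgls
+ </pre>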
+
+ <h3><a name="currentLayoutGeneric">Non-systemd cgroups
layout</a></h3>
+
+ <p>
+ On hosts which do not use systemd, each consumer has a corresponding cgroup
+ named <code>$VMNAME.libvirt-{qemu,lxc}</code>. Each consumer is associated
+ with exactly one partition, which also has a corresponding cgroup usually
+ named <code>$PARTNAME.partition</code>. The exceptions to this naming rule
+ are the three top level default partitions, named <code>/system</code> (for
+ system services), <code>/user</code> (for user login sessions) and
+ <code>/machine</code> (for virtual machines and containers). By default
+ every consumer will of course be associated with the <code>/machine</code>
+ partition.
+ </p>
+
+ <p>
+ Given this, a possible non-systemd cgroups layout involving 3 qemu guests,
+ 3 lxc containers and 4 custom child partitions, would be:
</p>
<pre>
@@ -87,23 +191,21 @@ $ROOT
| +- vcpu0
| +- vcpu1
|
- +- container1.libvirt-lxc
- |
- +- container2.libvirt-lxc
+ +- engineering.partition
+ | |
+ | +- testing.partition
+ | | |
+ | | +- container1.libvirt-lxc
+ | |
+ | +- production.partition
+ | |
+ | +- container2.libvirt-lxc
|
- +- container3.libvirt-lxc
+ +- marketing.partition
+ |
+ +- container3.libvirt-lxc
</pre>
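+
+ <p>
+ As with the systemd case, assuming the cpu controller is mounted at
+ <code>/sys/fs/cgroup/cpu,cpuacct</code>, a consumer such as
+ <code>container1</code> above would be backed by a cgroup directory
+ along these lines:
+ </p>
+
+ <pre>
+/sys/fs/cgroup/cpu,cpuacct/machine/engineering.partition/testing.partition/container1.libvirt-lxc/
+ </pre>
+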
- <p>
- The default cgroups layout ensures that, when there is contention for
- CPU time, it is shared equally between system services, user sessions
- and virtual machines / containers. This prevents virtual machines from
- locking the administrator out of the host, or impacting execution of
- system services. Conversely, when there is no contention from
- system services / user sessions, it is possible for virtual machines
- to fully utilize the host CPUs.
- </p>
-
<h2><a name="customPartiton">Using custom
partitions</a></h2>
<p>
@@ -127,12 +229,54 @@ $ROOT
</pre>
<p>
+ Note that the partition names in the guest XML use a
+ generic naming format, not the low level naming convention
+ required by the underlying host OS, i.e. you should not include
+ any of the <code>.partition</code> or <code>.slice</code>
+ suffixes in the XML config. Given a partition name
+ <code>/machine/production</code>, libvirt will automatically
+ apply the platform specific translation required to get
+ <code>/machine/production.partition</code> (non-systemd)
+ or <code>/machine.slice/machine-production.slice</code>
+ (systemd) as the underlying cgroup name.
+ </p>
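+
+ <p>
+ In other words, the same generic partition name in the XML maps to a
+ different cgroup location depending on the host:
+ </p>
+
+ <pre>
+/machine/production -> /machine/production.partition            (non-systemd)
+/machine/production -> /machine.slice/machine-production.slice  (systemd)
+ </pre>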
+
+ <p>
Libvirt will not auto-create the cgroups directory to back
this partition. In the future, libvirt / virsh will provide
APIs / commands to create custom partitions, but currently
- this is left as an exercise for the administrator. For
- example, given the XML config above, the admin would need
- to create a cgroup named '/machine/production.partition'
+ this is left as an exercise for the administrator.
+ </p>
+
+ <p>
+ <strong>Note:</strong> the ability to place guests in custom
+ partitions is only available with libvirt >= 1.0.5, using
+ the new cgroup layout. The legacy cgroups layout described
+ later in this document did not support customization per guest.
+ </p>
+
+ <h3><a name="createSystemd">Creating custom partitions
(systemd)</a></h3>
+
+ <p>
+ Given the XML config above, the admin on a systemd based host would
+ need to create a unit file <code>/etc/systemd/system/machine-production.slice</code>
+ </p>
+
+ <pre>
+# cat > /etc/systemd/system/machine-production.slice <<EOF
+[Unit]
+Description=VM production slice
+Before=slices.target
+Wants=machine.slice
+EOF
+# systemctl start machine-production.slice
+ </pre>
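+
+ <p>
+ Whether the new slice is active can then be checked with
+ <code>systemctl</code>, for example:
+ </p>
+
+ <pre>
+# systemctl status machine-production.slice
+ </pre>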
+
+ <h3><a name="createNonSystemd">Creating custom partitions
(non-systemd)</a></h3>
+
+ <p>
+ Given the XML config above, the admin on a non-systemd based host
+ would need to create a cgroup named '/machine/production.partition'
</p>
<pre>
@@ -147,18 +291,6 @@ $ROOT
done
</pre>
- <p>
- <strong>Note:</strong> the cgroups directory created as a ".partition"
- suffix, but the XML config does not require this suffix.
- </p>
-
- <p>
- <strong>Note:</strong> the ability to place guests in custom
- partitions is only available with libvirt >= 1.0.5, using
- the new cgroup layout. The legacy cgroups layout described
- later did not support customization per guest.
- </p>
-
<h2><a name="resourceAPIs">Resource management
APIs/commands</a></h2>
<p>
--
1.8.3.1