On Thu, Jul 31, 2014 at 01:13:19PM +0200, Ján Tomko wrote:
Hello developers!
Currently, our default cgroup layout is:
+- top level cgroup
   \- machine (machine.slice with systemd)
      +- vm1.libvirt-qemu (machine-qemu\x2dvm1.scope with systemd)
      |  +- emulator
      |  +- vcpu0
      |  \- vcpu1
      \- vm2.libvirt-qemu
         +- emulator
         +- vcpu0
         \- vcpu1
To free some CPUs for exclusive use, either all processes from the top level
cgroup should be moved to another one (which does not seem like a great idea)
or isolcpus= should be specified on the kernel command line.
IIUC when you say 'exclusive use' here you are basically aiming to strictly
separate all QEMU processes from all general OS processes.
So, yes, in this case isolcpus is a fairly natural way to achieve this.
On a 4 NUMA node system with 4 CPUs in each node, you might set
isolcpus=4-15, so the host OS is confined to the first NUMA node. You'd
then have CPUs 4-15 (in NUMA nodes 1-3) available for use by VMs.
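To make that concrete, a minimal sketch (the CPU numbers are just the
ones from the example above, and the explicit pinning step is only
illustrative - in practice libvirt's <vcpupin> would do it):

  # kernel command line: take CPUs 4-15 away from the general scheduler
  #   ... isolcpus=4-15 ...
  # host processes then stay on CPUs 0-3 (NUMA node 0); a task only lands
  # on 4-15 if something explicitly pins it there, e.g.
  taskset -c 4-15 some-command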
The cpuset.cpu_exclusive option can be set on a cgroup if:
* all the groups up to the top level group have it set
* the cpuset of the current group is a subset of the parent group's cpuset,
  and no sibling group uses any cpus from the current cpuset
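For illustration, a minimal sketch of those constraints at the cgroupfs
level (cgroup v1 cpuset controller; the mount point and group names here
are just examples):

  cd /sys/fs/cgroup/cpuset
  mkdir -p machine.slice/vm1
  echo 4-15 > machine.slice/cpuset.cpus              # must not overlap sibling cpusets
  echo 1    > machine.slice/cpuset.cpu_exclusive     # and so on up the parent chain
  echo 4-6  > machine.slice/vm1/cpuset.cpus          # subset of machine.slice's cpus
  echo 1    > machine.slice/vm1/cpuset.cpu_exclusive # fails if a sibling uses any of 4-6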
This would mean that to keep the existing nested structure, all vcpus and the
emulator thread would need to have an exclusive CPU, e.g.:
  <vcpu placement='static' cpuset='4-6'>2</vcpu>
  <cputune exclusive='yes'>
    <vcpupin vcpu='0' cpuset='5'/>
    <vcpupin vcpu='1' cpuset='6'/>
    <emulatorpin cpuset='4'/>
  </cputune>
(The only two issues I found:
1) libvirt would have to mess with systemd's machine scope behind its back
   (setting cpu_exclusive)
Bear in mind that the end goal with cgroups is that libvirt will
not touch the cgroup filesystem at all. The intent is that we will
use DBus APIs from systemd for setting anything cgroups related.
So I think we'd need to determine what the systemd maintainers'
thoughts are wrt cpuset cpu_exclusive before going down this
route.
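For the record, this is the sort of call I mean - a sketch using
SetUnitProperties with the existing CPUShares property, purely to show
the shape of the DBus API; whether cpuset knobs like cpu_exclusive would
ever be exposed this way is exactly the open question:

  gdbus call --system \
      --dest org.freedesktop.systemd1 \
      --object-path /org/freedesktop/systemd1 \
      --method org.freedesktop.systemd1.Manager.SetUnitProperties \
      "machine.slice" true "[('CPUShares', <uint64 2048>)]"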
2) creating machines without explicit cpu pinning fails, as libvirt tries
   to write all the cpus to the cpuset, even those the other machine uses
   exclusively)
To me, not specifying any CPU pinning in the XML implies that
libvirt will use the "default placement" of the OS. This need
not mean "all CPUs". So if the cgroups CPU set against the
machine.slice has restricted what CPUs are available to VMs,
libvirt should be taking care to honour that.
IOW, we should not blindly write 1s to all CPUs - we should
probably read the available CPU set from the cgroup that we
are going to place the VM under to determine what's available.
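i.e. something along these lines (cgroup v1 paths, purely as an
illustration of where the default placement ought to come from):

  # what CPUs does the cgroup we are about to place the VM under allow?
  cat /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus
  # e.g. "4-15" - that, rather than "all host CPUs", is the default
  # placement an unpinned VM should get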
I've also thought about just keeping track of the 'exclusive' CPUs within
libvirt itself. This would not work across drivers, but it could possibly
be needed to solve issue 2).
Do you think any of these options would be useful?
Bug:
https://bugzilla.redhat.com/show_bug.cgi?id=996758
Broadly speaking I believe that the job of isolating the host OS
processes onto a subset of CPUs, separate from those available to
VMs, is for the admin to do and is out of scope for libvirt. So I think
that libvirt needs to be capable of working with both approaches
you mention above:
 1. kernel booted with isolcpus

    - Nothing in XML
        => VMs will only run on CPUs listed in isolcpus
    - Affinity in XML
        => VMs will be moved onto the listed CPUs (which
           can be different from those in isolcpus)

 2. machine.slice given a restricted cpuset.cpus (regardless of
    whether cpuset.cpu_exclusive is 0 or 1) - see the sketch
    after this list

    - Nothing in XML
        => VMs must honour the cpuset.cpus in machine.slice
    - Affinity in XML
        => VMs will be moved onto the listed CPUs (which must be
           a subset of cpuset.cpus)
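As a rough sketch of scenario 2 (again cgroup v1 cpuset; the exact paths
depend on how the controllers are mounted on the host):

  # admin restricts machine.slice to CPUs 4-15 / NUMA nodes 1-3
  echo 4-15 > /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus
  echo 1-3  > /sys/fs/cgroup/cpuset/machine.slice/cpuset.mems
  # a VM defined with no <vcpupin>/<emulatorpin> should then simply
  # inherit "4-15" as its placement, instead of being pinned to every
  # host CPU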
I'd guess this all broadly works already, with the exception of the bug
we talk about above, where libvirt tries to pin the VM to all CPUs if none
are listed, instead of honouring cpuset.cpus in the cgroup used.
Regards,
Daniel
--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|