Bcc: Raghavendra K T <raghavendra.kt(a)linux.vnet.ibm.com>
Subject: Re: [libvirt] [RFC] exclusive vcpu-cpu pinning
Reply-To: Raghavendra K T <raghavendra.kt(a)linux.vnet.ibm.com>
In-Reply-To: <53DA24CF.8030600(a)redhat.com>
* Ján Tomko <jtomko(a)redhat.com> [2014-07-31 13:13:19]:
Hello developers!
Currently, our default cgroup layout is:
-top level cgroup
 \-machine (machine.slice with systemd)
   `-vm1.libvirt-qemu (machine-qemu\x2dvm1.scope with systemd)
     `-emulator
     `-vcpu0
     \-vcpu1
   \-vm2.libvirt-qemu
     `-emulator
     `-vcpu0
     \-vcpu1
To free some CPUs for exclusive use, either all processes from the top level
cgroup should be moved to another one (which does not seem like a great idea)
or isolcpus= should be specified on the kernel command line.
The cpuset.cpu_exclusive option can be set on a cgroup if
* all the groups up to the top level group have it set
* the cpuset of the current group is a subset of the parent group
and no siblings use any cpus from the current cpuset
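The two conditions above can be sketched as a small check over an in-memory model of the hierarchy. This is only an illustration: the Cpuset class and can_set_exclusive() are hypothetical names, not libvirt or kernel APIs, and the sketch assumes the top level group itself never blocks the check.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)
class Cpuset:
    """Hypothetical in-memory model of one cpuset cgroup (not a libvirt type)."""
    name: str
    cpus: Set[int]
    exclusive: bool = False
    parent: Optional["Cpuset"] = field(default=None, repr=False)
    children: List["Cpuset"] = field(default_factory=list)

    def add_child(self, child: "Cpuset") -> "Cpuset":
        """Attach child below this group and return it."""
        child.parent = self
        self.children.append(child)
        return child


def can_set_exclusive(group: Cpuset) -> bool:
    """Check the two conditions from the list above for `group`."""
    parent = group.parent
    if parent is None:
        return True  # top level group: nothing above it to check

    # Condition 1: every group between this one and the top level
    # already has cpu_exclusive set (assumption: the top level group
    # itself is not checked).
    ancestor = parent
    while ancestor.parent is not None:
        if not ancestor.exclusive:
            return False
        ancestor = ancestor.parent

    # Condition 2, first half: our CPUs are a subset of the parent's.
    if not group.cpus <= parent.cpus:
        return False

    # Condition 2, second half: no sibling uses any of our CPUs.
    return all(
        not (group.cpus & sibling.cpus)
        for sibling in parent.children
        if sibling is not group
    )
```

With the default layout, this returns False for a vcpu group until both the machine group and the VM group are marked exclusive, which is why the whole nested structure is affected.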
This would mean that to keep the existing nested structure, all vcpus and the
emulator thread would need to have an exclusive CPU, e.g:
<vcpu placement='static' cpuset='4-6'>2</vcpu>
<cputune exclusive='yes'>
  <vcpupin vcpu='0' cpuset='5'/>
  <vcpupin vcpu='1' cpuset='6'/>
  <emulatorpin cpuset='4'/>
</cputune>
(The only two issues I found:
1) libvirt would have to mess with systemd's 'machine-scope' behind its back
   (setting cpu_exclusive)
2) creating machines without explicit cpu pinning fails, as libvirt tries to
   write all the cpus to the cpuset, even those the other machine uses
   exclusively)
I've also thought about just keeping track of the 'exclusived' CPUs in
libvirt. This would not work across drivers, but it might be needed anyway
to solve issue 2).
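A rough sketch of what such tracking could look like for issue 2): given the host CPUs and the cpusets already claimed exclusively, compute the cpuset an unpinned machine may still use. The helper names (parse_cpuset, format_cpuset, default_cpuset) are made up for this example and are not existing libvirt functions; the string format is the usual cpuset list notation, e.g. '0-3,7'.

```python
def parse_cpuset(spec):
    """Parse a cpuset string such as '0-3,8' into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpuset(cpus):
    """Format a set of CPU numbers back into range notation."""
    ordered = sorted(cpus)
    ranges = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend the current run while the next CPU is consecutive.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        if i == j:
            ranges.append(str(ordered[i]))
        else:
            ranges.append("%d-%d" % (ordered[i], ordered[j]))
        i = j + 1
    return ",".join(ranges)


def default_cpuset(host_cpus, exclusive_cpusets):
    """Hypothetical helper: host CPUs minus everything claimed exclusively."""
    taken = set()
    for spec in exclusive_cpusets:
        taken |= parse_cpuset(spec)
    return format_cpuset(parse_cpuset(host_cpus) - taken)
```

For example, on an 8 CPU host where one machine holds '4-6' exclusively, default_cpuset('0-7', ['4-6']) yields '0-3,7', which is what an unpinned machine would write to its cpuset instead of all CPUs.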
Do you think any of these options would be useful?
Bug:
https://bugzilla.redhat.com/show_bug.cgi?id=996758
Jan
Hi Jan,
I am not familiar with libvirt internals, but I am eager to solve this problem.
(I also tried to solve it earlier; I had a POC kernel solution, which was
rightly rejected because the problem can be solved in userspace.)
Could we have a dedicated cpuset for VMs that ask for one, maybe via an
XML tag such as <description>dedicated</description>?
[ This is very similar to what you have proposed. ]
Suppose we have 2 VMs with 8 vcpus each (vm1 dedicated, vm2 non-dedicated) on
a 16 pcpu machine.
The modified cpuset cgroup hierarchy would look like this (for cpuset only):
| root (cpuset.cpus = 0-15)
|
\_ machine (tasks = system tasks) (cpuset.cpus = 0-7, exclusive=1)
|    \_ vm2.libvirt-qemu (cpuset.cpus = 0-7, exclusive=1)
|
\_ vm1.libvirt-qemu (cpuset.cpus = 8-15, exclusive=1)
But as you have mentioned above, libvirt would have to:
1. modify the cpuset hierarchy behind systemd's back
2. move all the system tasks to machine (only for cpuset)
3. assign all the non-dedicated cpus to the /machine hierarchy
4. assign dedicated/exclusive cpus to VMs automatically.
Of course, we cannot dedicate 100% of the cpus; we will have to ensure that
some cpus are left for system tasks, non-dedicated VMs, etc.
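To make the automatic assignment and the reserve constraint concrete, here is a minimal sketch. The function plan_cpusets() and its reserve parameter are hypothetical, and the policy of carving dedicated CPUs from the top of the range is just an assumption chosen to match the 0-7 / 8-15 split above.

```python
def plan_cpusets(host_cpus, dedicated_requests, reserve=1):
    """Split host CPUs into a shared pool and per-VM dedicated sets.

    host_cpus: iterable of CPU numbers on the host
    dedicated_requests: dict mapping VM name -> number of dedicated CPUs
    reserve: minimum number of CPUs that must stay in the shared pool
             (system tasks and non-dedicated VMs); hypothetical knob

    Returns (shared_cpus, assignments), where assignments maps each
    VM name to its set of dedicated CPUs.
    """
    free = sorted(host_cpus)
    assignments = {}
    for vm, count in dedicated_requests.items():
        if count <= 0:
            raise ValueError("%s: dedicated CPU count must be positive" % vm)
        if len(free) - count < reserve:
            raise ValueError("%s: not enough CPUs left for the shared pool" % vm)
        # Assumed policy: take dedicated CPUs from the top of the range,
        # so the low-numbered CPUs stay with /machine.
        assignments[vm] = set(free[-count:])
        free = free[:-count]
    return set(free), assignments
```

For the example above, plan_cpusets(range(16), {"vm1": 8}) leaves CPUs 0-7 for /machine and gives 8-15 to vm1, while a request for all 16 CPUs is refused.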
I see that we could achieve the above with a userspace daemon, but I think
solving it in libvirt would be ideal. Do you think the above solution is
too intrusive? Please let us know your thoughts.