* Ján Tomko <jtomko(a)redhat.com> [2014-07-31 13:13:19]:
> Hello developers!
>
> Currently, our default cgroup layout is:
> -top level cgroup
>  \-machine (machine.slice with systemd)
>    `-vm1.libvirt-qemu (machine-qemu\x2dvm1.scope with systemd)
>      `-emulator
>      `-vcpu0
>      \-vcpu1
>    \-vm2.libvirt-qemu
>      `-emulator
>      `-vcpu0
>      `-vcpu1
>
> To free some CPUs for exclusive use, either all processes from the top level
> cgroup should be moved to another one (which does not seem like a great idea)
> or isolcpus= should be specified on the kernel command line.
>
> The cpuset.cpu_exclusive option can be set on a cgroup if
> * all the groups up to the top level group have it set
> * the cpuset of the current group is a subset of the parent group
> and no siblings use any cpus from the current cpuset
>
> This would mean that to keep the existing nested structure, all vcpus and the
> emulator thread would need to have an exclusive CPU, e.g:
>   <vcpu placement='static' cpuset='4-6'>2</vcpu>
>   <cputune exclusive='yes'>
>     <vcpupin vcpu='0' cpuset='5'/>
>     <vcpupin vcpu='1' cpuset='6'/>
>     <emulatorpin cpuset='4'/>
>   </cputune>
>
> (The only two issues I found:
> 1) libvirt would have to mess with systemd's 'machine-scope' behind its
> back (setting cpu_exclusive)
> 2) creating machines without explicit cpu pinning fails, as libvirt tries to
> write all the cpus to the cpuset, even those the other machine uses
> exclusively)
>
> I've also thought about just keeping track of the 'exclusived' CPUs in
> libvirt. This would not work across drivers. And it could possibly be needed
> to solve issue 2).
>
> Do you think any of these options would be useful?
>
> Bug: https://bugzilla.redhat.com/show_bug.cgi?id=996758
>
> Jan
>
Hi Jan,

I am not familiar with libvirt internals, but I am eager to see this
problem solved. (I tried to solve the same problem myself with a
proof-of-concept kernel patch, which was rightly rejected because the
problem can be solved in userspace.)

Could we have a dedicated cpuset for VMs that ask for one, perhaps
requested via an XML tag such as <description>dedicated</description>?
[ This is very similar to what you have proposed. ]

Suppose we have two VMs with 8 vcpus each (vm1 dedicated, vm2
non-dedicated) on a 16-pcpu machine. The modified cgroup hierarchy
would look like this (for the cpuset controller only):
| root (cpuset.cpus = 0-15)
|
\_ machine (tasks = system tasks) (cpuset.cpus = 0-7, exclusive=1)
|    \_ vm2.libvirt-qemu (cpuset.cpus = 0-7, exclusive=1)
|
\_ vm1.libvirt-qemu (cpuset.cpus = 8-15, exclusive=1)
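
A layout like the one above can be sanity-checked against two of the
cpuset rules quoted earlier: a group's cpus must be a subset of its
parent's, and no sibling may use any of them. Below is a minimal Python
sketch of that check. It works on plain sets and does not touch any
cgroup files. The helper names parse_cpuset and can_be_exclusive are
made up for illustration, and the rule that all ancestors must also be
exclusive is not checked.

```python
def parse_cpuset(spec):
    """Parse a cpu list string such as "0-7" or "0-3,8" into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def can_be_exclusive(cpus, parent_cpus, sibling_cpus):
    """True if `cpus` lies within the parent and overlaps no sibling."""
    if not cpus <= parent_cpus:
        return False
    return all(cpus.isdisjoint(s) for s in sibling_cpus)


root = parse_cpuset("0-15")
machine = parse_cpuset("0-7")
dedicated_vm = parse_cpuset("8-15")

# The dedicated VM's cpus are inside root and do not overlap machine.
print(can_be_exclusive(dedicated_vm, root, [machine]))          # True
# A cpuset that overlaps machine (cpus 6-7) would not qualify.
print(can_be_exclusive(parse_cpuset("6-9"), root, [machine]))   # False
```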

But as you mentioned above, libvirt would have to:
1. modify the cpuset hierarchy behind systemd's back
2. move all the system tasks to machine (only for cpuset)
3. assign all the non-dedicated cpus to the /machine hierarchy
4. assign dedicated/exclusive cpus to VMs automatically.

Of course we cannot dedicate 100% of the cpus; we would have to ensure
that some cpus are left for system tasks, non-dedicated VMs, etc.
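
The bookkeeping behind steps 3 and 4, including the constraint that some
cpus must stay shared, could be sketched roughly as follows. This is
only an illustration in Python, not libvirt code: the class
DedicatedCpuPool, the min_shared knob, and the policy of handing out the
highest-numbered cpus first are all assumptions made for the example.

```python
class DedicatedCpuPool:
    """Track which pcpus are shared and which are dedicated to a VM.

    Hypothetical helper. `min_shared` is an assumed policy knob: the
    number of cpus that must remain for system tasks and non-dedicated
    VMs.
    """

    def __init__(self, total_cpus, min_shared=1):
        self.shared = set(range(total_cpus))
        self.dedicated = {}  # VM name -> set of cpus
        self.min_shared = min_shared

    def dedicate(self, vm, count):
        """Carve `count` cpus out of the shared set for `vm`."""
        if count <= 0:
            raise ValueError("count must be positive")
        if vm in self.dedicated:
            raise ValueError(f"{vm} already has dedicated cpus")
        if len(self.shared) - count < self.min_shared:
            raise ValueError("not enough cpus left for system tasks")
        # Assumed policy: take the highest-numbered cpus so that the
        # low-numbered ones stay in the shared set.
        taken = set(sorted(self.shared)[-count:])
        self.shared -= taken
        self.dedicated[vm] = taken
        return taken

    def release(self, vm):
        """Return a VM's cpus to the shared set."""
        self.shared |= self.dedicated.pop(vm)


pool = DedicatedCpuPool(total_cpus=16, min_shared=4)
print(sorted(pool.dedicate("vm1", 8)))  # [8, 9, 10, 11, 12, 13, 14, 15]
print(sorted(pool.shared))              # [0, 1, 2, 3, 4, 5, 6, 7]
```

With these numbers a second request for 5 dedicated cpus would be
refused, since it would leave only 3 shared cpus.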

I see that we could achieve the above with a userspace daemon, but I
think solving it in libvirt would be ideal. Do you think the above
solution is too intrusive? Please let us know your thoughts.

To add to this: the above solution needs a hint that a VM is dedicated,
which can currently be implemented with the description tag in the XML,
like:
  <description>dedicated</description>
Is it a good idea to have a separate tag for this? Something like the
following, which mandates that one cannot set cpuset exclusively:
  <cputune>
    <vcpupin dedicated/>
  </cputune>
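
For comparison, the description-based hint needs no schema change and
could be detected with a few lines of code. The Python sketch below
assumes the hint is the literal text "dedicated" in the domain's
top-level <description> element. The function name and the sample
domain XML are made up for illustration.

```python
import xml.etree.ElementTree as ET


def wants_dedicated_cpus(domain_xml):
    """True if the top-level <description> is exactly 'dedicated'.

    Assumption for this sketch: the hint is matched case-insensitively
    after stripping surrounding whitespace.
    """
    root = ET.fromstring(domain_xml)
    desc = root.findtext("description", default="")
    return desc.strip().lower() == "dedicated"


# Made-up minimal domain definition, only for the example.
example = """
<domain type='kvm'>
  <name>vm1</name>
  <description>dedicated</description>
  <vcpu>8</vcpu>
</domain>
"""

print(wants_dedicated_cpus(example))  # True
```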

But I think we will eventually want support from systemd.