On 09/04/2012 07:25 PM, Daniel P. Berrange wrote:
On Tue, Sep 04, 2012 at 04:45:16PM +0800, Tang Chen wrote:
> It seems that libvirt is not cpu hotplug aware.
> Please refer to the following problem.
>
> 1. At first, we have 2 cpus.
> # cat /cgroup/cpuset/cpuset.cpus
> 0-1
> # cat /cgroup/cpuset/libvirt/qemu/cpuset.cpus
> 0-1
>
> 2. And we have a vm1 with following configuration.
> <cputune>
> <vcpupin vcpu='0' cpuset='1'/>
> <emulatorpin cpuset='1'/>
> </cputune>
>
> 3. Offline cpu1.
> # echo 0 > /sys/devices/system/cpu/cpu1/online
> # cat /sys/devices/system/cpu/cpu1/online
> 0
> # cat /cgroup/cpuset/cpuset.cpus
> 0
> # cat /cgroup/cpuset/libvirt/qemu/cpuset.cpus
> 0
> # cat /cgroup/cpuset/libvirt/lxc/cpuset.cpus
> 0
>
> 4. Online cpu1.
> # echo 1 > /sys/devices/system/cpu/cpu1/online
> # cat /sys/devices/system/cpu/cpu1/online
> 1
> # cat /cgroup/cpuset/cpuset.cpus
> 0-1
> # cat /cgroup/cpuset/libvirt/cpuset.cpus
> 0
> # cat /cgroup/cpuset/libvirt/qemu/cpuset.cpus
> 0
> # cat /cgroup/cpuset/libvirt/lxc/cpuset.cpus
> 0
>
> Here, the root cgroup's cpuset.cpus was updated, but those of the libvirt, qemu and lxc
> directories were not.
I'm rather inclined to say this is the kernel's fault. This is
the same class of problem that we saw with S3/S4 kernel support
where the cpuset got blanked out.
The kernel should *not* be altering the user specified cgroups
settings when offlining CPUs. The problem is that the kernel is
not distinguishing between the user requested cpuset mask and
the mask of available CPUs - it has overloaded both into one
config file.
The cgroup cpuset.cpus should only reflect the user config.
The kernel should privately AND this with the current mask
of CPUs which actually exist.
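
To make the distinction concrete, here is a minimal sketch of the behaviour described above, written in Python rather than as kernel code. The helper names (parse_cpulist, format_cpulist, effective_cpus) are made up for illustration and do not correspond to any kernel or libvirt function; the only assumption is the usual "0-1,4" list syntax seen in cpuset.cpus.

```python
def parse_cpulist(text):
    """Parse a CPU list such as '0-1,4' into a set of integers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpuset yields an empty string
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpulist(cpus):
    """Format a set of integers back into the compact list syntax."""
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu  # extend the current run
        else:
            ranges.append([cpu, cpu])  # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def effective_cpus(user_mask, online_mask):
    """The user's request is kept untouched; only the result is masked."""
    return user_mask & online_mask


# Hypothetical walk through the steps from the report.
user = parse_cpulist("0-1")  # what the admin asked for
after_offline = format_cpulist(effective_cpus(user, parse_cpulist("0")))
after_online = format_cpulist(effective_cpus(user, parse_cpulist("0-1")))
```

With the two masks kept apart, offlining cpu1 shrinks only the effective set, and onlining it again restores "0-1" without anyone having to rewrite the user's setting.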
I had posted a Linux kernel patchset[1] some time ago to expose another
file so that we can distinguish between the user specified settings vs the
actual scenario underneath. But the conclusion in the ensuing discussion
was that the existing kernel behaviour is good as is, and trying to "fix"
it would break kernel semantics. (However, note that the suspend/resume
case has been fixed in the kernel by commit d35be8bab).
[1] http://thread.gmane.org/gmane.linux.documentation/4805
Regards,
Srivatsa S. Bhat
> vm1 cannot be started again.
> # virsh start vm1
> error: Failed to start domain vm1
> error: Unable to set cpuset.cpus: Permission denied
>
> And libvirtd gave the following error.
> 2012-07-17 07:30:22.478+0000: 3118: error : qemuSetupCgroupVcpuPin:498 : Unable to set cpuset.cpus: Permission denied
>
>
> These patches resolve this problem by listening on netlink for cpu hotplug events.
> When the netlink service gets a cpu hotplug event, it extracts the cpuid from the message
> and adds it to cpuset.cpus in:
> /cgroup/cpuset/libvirt
> /cgroup/cpuset/libvirt/qemu
> /cgroup/cpuset/libvirt/lxc
I don't think we should be doing this. E.g., consider the case where the host has 8 cpus and
the admin explicitly configured libvirt to only use cpus 1-4. If the host
admin onlines CPU 6, then libvirt should not be adding CPU 6 into its cpuset.
In addition we cannot assume that the 'libvirt' cgroup is immediately below
the root cgroup. There might be several other layers in the hierarchy above
which also lose their correct cpuset data, not to mention the cgroups of
all other apps in the system. This is a system-wide flaw, but your patch
is only addressing the libvirt impact. So I don't think we should be doing
this. The kernel should fix cgroups properly so all apps work correctly.
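
As a rough illustration of the two points above, the following Python sketch models a cgroup hierarchy in which each group keeps its configured mask and the usable set is derived top-down. It is not kernel or libvirt code: effective_masks, the "/machines" intermediate layer and the group paths are all hypothetical, chosen only to show a libvirt group that is not directly below the root.

```python
def effective_masks(configured, parents, online):
    """Derive each group's usable CPUs: own config AND the parent's usable set."""
    effective = {}

    def resolve(name):
        if name not in effective:
            parent = parents[name]
            # The root is limited by the online CPUs, children by their parent.
            limit = online if parent is None else resolve(parent)
            effective[name] = configured[name] & limit
        return effective[name]

    for name in configured:
        resolve(name)
    return effective


ALL_CPUS = set(range(8))  # the 8-cpu host from the example above

# Configured masks are never modified by hotplug in this model.
configured = {
    "/": set(ALL_CPUS),
    "/machines": set(ALL_CPUS),  # hypothetical intermediate layer
    "/machines/libvirt": {1, 2, 3, 4},  # admin restricted libvirt to 1-4
    "/machines/libvirt/qemu": {1, 2, 3, 4},
}
parents = {
    "/": None,
    "/machines": "/",
    "/machines/libvirt": "/machines",
    "/machines/libvirt/qemu": "/machines/libvirt",
}

cpu6_offline = effective_masks(configured, parents, ALL_CPUS - {6})
cpu6_online = effective_masks(configured, parents, ALL_CPUS)
cpu3_offline = effective_masks(configured, parents, ALL_CPUS - {3})
```

In this model, onlining CPU 6 never leaks into the libvirt group, and a CPU inside 1-4 that goes offline and comes back is restored at every level of the hierarchy, since nothing ever overwrote the configured masks.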
Daniel