On 01/30/2013 01:25 PM, Doug Goldstein wrote:
>
> On Mon, Jan 28, 2013 at 10:23 AM, Osier Yang <jyang(a)redhat.com> wrote:
>>
>> On 2013-01-29 00:17, Doug Goldstein wrote:
>>>
>>> On Sun, Jan 27, 2013 at 10:46 PM, Osier Yang <jyang(a)redhat.com> wrote:
>>>>
>>>> On 2013-01-28 11:47, Osier Yang wrote:
>>>>>
>>>>>
>>>>> On 2013-01-28 11:44, Osier Yang wrote:
>>>>>>
>>>>>> On 2013-01-26 01:07, Doug Goldstein wrote:
>>>>>>>
>>>>>>> On Thu, Jan 24, 2013 at 12:58 AM, Osier Yang <jyang(a)redhat.com> wrote:
>>>>>>>>
>>>>>>>> On 2013-01-24 14:26, Doug Goldstein wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jan 23, 2013 at 11:02 PM, Osier Yang <jyang(a)redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 2013-01-24 12:11, Doug Goldstein wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jan 23, 2013 at 3:45 PM, Doug Goldstein <cardoe(a)gentoo.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I am using libvirt 0.10.2.2 and qemu-kvm 1.2.2 (qemu-kvm 1.2.0 +
>>>>>>>>>>>> qemu 1.2.2 applied on top, plus a number of stability patches).
>>>>>>>>>>>> I am having an issue where my VMs fail to start with the
>>>>>>>>>>>> following message:
>>>>>>>>>>>>
>>>>>>>>>>>> kvm_init_vcpu failed: Cannot allocate memory
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Smells like we have a problem setting the NUMA policy (perhaps
>>>>>>>>>> caused by an incorrect host NUMA topology), given that the
>>>>>>>>>> system still has enough memory. Or numad (if it's installed) is
>>>>>>>>>> doing something wrong.
>>>>>>>>>>
>>>>>>>>>> Can you see if there is anything about the Nodeset used to set
>>>>>>>>>> the policy in the debug log?
>>>>>>>>>>
>>>>>>>>>> E.g.
>>>>>>>>>>
>>>>>>>>>> % cat libvirtd.debug | grep Nodeset
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Well, I don't see anything, but it's likely because I didn't do
>>>>>>>>> something correctly. I had LIBVIRT_DEBUG=1 exported and ran
>>>>>>>>> libvirtd --verbose from the command line.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> If the process is in the background, it's expected that you can't
>>>>>>>> see anything.
>>>>>>>>
>>>>>>>>> My /etc/libvirt/libvirtd.conf had:
>>>>>>>>>
>>>>>>>>> log_outputs="3:syslog:libvirtd 1:file:/tmp/libvirtd.log"
>>>>>>>>>
>>>>>>>>> But I didn't get any debug messages.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> log_level=1 has to be set.
>>>>>>>>
>>>>>>>> Anyway, let's simply do this:
>>>>>>>>
>>>>>>>> % service libvirtd stop
>>>>>>>> % LIBVIRT_DEBUG=1 /usr/sbin/libvirtd 2>&1 | tee -a libvirtd.debug
>>>>>>>>
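Once a debug log has been captured this way, the placement-related records can be pulled out mechanically instead of by eye. The following is only a rough sketch: the two regular expressions are modelled on the log lines quoted in this thread (libvirt 0.10.2.2), the function name `summarize_placement` is made up for illustration, and other libvirt versions may word these messages differently.

```python
import re

# Patterns modelled on the debug records quoted in this thread; treat them as
# assumptions, since other libvirt versions may format these lines differently.
NODESET_RE = re.compile(r"Nodeset returned from numad: (\S+)")
SET_VALUE_RE = re.compile(
    r"Set value '([^']+/(cpuset\.(?:cpus|mems)))' to '([^']+)'"
)


def summarize_placement(lines):
    """Collect the numad nodeset and every cpuset.cpus/cpuset.mems write.

    `lines` is any iterable of complete (unwrapped) log records.
    """
    summary = {"nodeset": None, "writes": []}
    for line in lines:
        match = NODESET_RE.search(line)
        if match:
            summary["nodeset"] = match.group(1)
            continue
        match = SET_VALUE_RE.search(line)
        if match:
            # Keep the file name (cpuset.cpus / cpuset.mems) and the value.
            summary["writes"].append((match.group(2), match.group(3)))
    return summary
```

Passing an open file handle for libvirtd.debug as `lines` works, because iterating over a file yields one record per line.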
>>>>>>> That's what I was doing, minus the tee, just to the console, and
>>>>>>> nothing was coming out. Which is why I added the
>>>>>>> 1:file:/tmp/libvirtd.log, which also didn't get any debug messages.
>>>>>>> Turns out this instance must have been built with --disable-debug.
>>>>>>>
>>>>>>> All I've got in the log is:
>>>>>>>
>>>>>>> # grep -i 'numa' libvirtd.debug
>>>>>>> 2013-01-25 16:50:15.287+0000: 417: debug : virCommandRunAsync:2200 : About to run /usr/bin/numad -w 2:2048
>>>>>>> 2013-01-25 16:50:17.295+0000: 417: debug : qemuProcessStart:3614 : Nodeset returned from numad: 1
>>>>>>
>>>>>>
>>>>>>
>>>>>> This looks right.
>>>>>>
>>>>>>> Immediately below that is
>>>>>>>
>>>>>>> 2013-01-25 16:50:17.295+0000: 417: debug : qemuProcessStart:3622 : Setting up domain cgroup (if required)
>>>>>>> 2013-01-25 16:50:17.295+0000: 417: debug : virCgroupNew:619 : New group /libvirt/qemu/bb-2.6.35.9-i686
>>>>>>> 2013-01-25 16:50:17.295+0000: 417: debug : virCgroupDetect:273 : Detected mount/mapping 1:cpuacct at /sys/fs/cgroup/cpuacct in
>>>>>>> 2013-01-25 16:50:17.295+0000: 417: debug : virCgroupDetect:273 : Detected mount/mapping 2:cpuset at /sys/fs/cgroup/cpuset in
>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupMakeGroup:537 : Make group /libvirt/qemu/bb-2.6.35.9-i686
>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupMakeGroup:562 : Make controller /sys/fs/cgroup/cpuacct/libvirt/qemu/bb-2.6.35.9-i686/
>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupMakeGroup:562 : Make controller /sys/fs/cgroup/cpuset/libvirt/qemu/bb-2.6.35.9-i686/
>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupCpuSetInherit:469 : Setting up inheritance /libvirt/qemu -> /libvirt/qemu/bb-2.6.35.9-i686
>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupGetValueStr:361 : Get value /sys/fs/cgroup/cpuset/libvirt/qemu/cpuset.cpus
>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virFileClose:72 : Closed fd 39
>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupCpuSetInherit:482 : Inherit cpuset.cpus = 0-63
>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 : Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/bb-2.6.35.9-i686/cpuset.cpus' to '0-63'
>>>>>>
>>>>>>
>>>>>>
>>>>>> This doesn't look right; it should be 0-7 instead.
>>>>>>
>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virFileClose:72 : Closed fd 39
>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupGetValueStr:361 : Get value /sys/fs/cgroup/cpuset/libvirt/qemu/cpuset.mems
>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virFileClose:72 : Closed fd 39
>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupCpuSetInherit:482 : Inherit cpuset.mems = 0-7
>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 : Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/bb-2.6.35.9-i686/cpuset.mems' to '0-7'
>>>>>>
>>>>>>
>>>>>>
>>>>>> This is right.
>>>>>>
>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virFileClose:72 : Closed fd 39
>>>>>>> 2013-01-25 16:50:17.296+0000: 417: warning : qemuSetupCgroup:388 : Could not autoset a RSS limit for domain bb-2.6.35.9-i686
>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 : Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/bb-2.6.35.9-i686/cpuset.mems' to '1'
>>>>>>
>>>>>>
>>>>>>
>>>>>> And it's strange that cpuset.mems is changed to '1' here.
>>>>
>>>>
>>>>
>>>> Oh, actually this is right, cpuset.mems is about the memory nodes.
>>>>
>>>>
>>>>>>> 2013-01-25 16:50:17.296+0000: 417: debug : virFileClose:72 : Closed fd 39
>>>>>>>
>>>>>>> Could the RSS issue be related? Some kernel-related option not
>>>>>>> playing nice or not enabled?
>>>>>
>>>>>
>>>>>
>>>>> Instead, I'm wondering if the problem is caused by the mismatch
>>>>> (from libvirt's p.o.v.) between cpuset.cpus and cpuset.mems, which
>>>>> in turn causes a problem for kernel memory management?
>>>>
>>>>
>>>>
>>>> So, the simple way to test this guess is to use static placement
>>>> like:
>>>>
>>>> <vcpu placement='static' cpuset='0-63'>2</vcpu>
>>>> <numatune>
>>>> <memory placement='static' nodeset='1'/>
>>>> </numatune>
>>>>
>>>> Osier
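For anyone repeating this test across several guests, the static placement suggested above can be applied to a domain definition programmatically. This is a minimal sketch only: the element and attribute names are the ones quoted in the message, while the function name `apply_static_placement` and the idea of editing the XML as a string are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def apply_static_placement(domain_xml, vcpus, cpuset, nodeset):
    """Return domain XML with static vCPU and memory placement set.

    Mirrors the <vcpu> and <numatune> fragment quoted in the thread; any
    other element in the domain definition is left untouched.
    """
    root = ET.fromstring(domain_xml)

    vcpu = root.find("vcpu")
    if vcpu is None:
        vcpu = ET.SubElement(root, "vcpu")
    vcpu.text = str(vcpus)
    vcpu.set("placement", "static")
    vcpu.set("cpuset", cpuset)

    numatune = root.find("numatune")
    if numatune is None:
        numatune = ET.SubElement(root, "numatune")
    memory = numatune.find("memory")
    if memory is None:
        memory = ET.SubElement(numatune, "memory")
    memory.set("placement", "static")
    memory.set("nodeset", nodeset)

    return ET.tostring(root, encoding="unicode")
```

The input would typically be the output of virsh dumpxml for the guest, and the result would be fed back through virsh define; both steps are left out here.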
>>>
>>>
>>> Same error. I don't know whether you expected that or not.
>>>
>> It's expected, as "0-63" is the final result when using "auto"
>> placement.
>
> Since there's another user on the libvirt-list asking about the exact
> same CPU I've got, I figured I'd do some poking. Oddly enough, he and
> I had different outputs from virsh nodeinfo. Just as background, it's
> AMD 6272 CPUs. I've got 4 of them in the box, but they're organized as
> follows:
>
> Sockets: 4
> Cores: 16
> Threads: 1 per core (16)
> NUMA nodes: 8
> Mem per node: 16GB
> Total: 128GB
>
> # virsh nodeinfo
> CPU model: x86_64
> CPU(s): 64
> CPU frequency: 2100 MHz
> CPU socket(s): 1
> Core(s) per socket: 64
> Thread(s) per core: 1
> NUMA cell(s): 1
> Memory size: 132013200 KiB
>
> # virsh capabilities
> <snip>
> <topology sockets='1' cores='64' threads='1'/>
> <snip>
> <topology>
> <cells num='8'>
> <snip>
>
> I've hand-verified all the values in
> /sys/devices/system/nodeX/cpuX/topology/physical_package_id to show
> that the physical packages are organized in pairs (0&1, 2&3, 4&5, 6&7)
> for the NUMA nodes.
>
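The hand verification described above could be scripted roughly as follows. This is a sketch under assumptions: on the Linux systems I am familiar with, the per-node directories live under /sys/devices/system/node/ (the path in the message omits that component), and the function name `packages_by_node` is invented for this example. The base directory is a parameter so the function can be pointed at a copy of the tree.

```python
import glob
import os
import re
from collections import defaultdict


def packages_by_node(sysfs_node_dir="/sys/devices/system/node"):
    """Map each NUMA node number to the physical_package_id values of its CPUs.

    The default directory is an assumption about the usual Linux sysfs layout.
    """
    found = defaultdict(set)
    pattern = os.path.join(
        sysfs_node_dir, "node[0-9]*", "cpu[0-9]*", "topology", "physical_package_id"
    )
    for path in glob.glob(pattern):
        match = re.search(r"node(\d+)[/\\]cpu\d+[/\\]topology", path)
        if not match:
            continue
        with open(path) as handle:
            found[int(match.group(1))].add(int(handle.read().strip()))
    # Plain dict, ordered by node number, with sorted package ids per node.
    return {node: sorted(ids) for node, ids in sorted(found.items())}
```

On a box like the one described, two nodes reporting the same package id would confirm the pairing.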
> Need to give git a whirl, as I know that's got somewhat different code
> than 1.0.1, but I'll report back.
>
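The mismatch in the quoted output can also be checked mechanically. As far as I understand libvirt's virNodeInfo documentation, the CPU count is expected to equal nodes × sockets × cores × threads, and the node count falls back to 1 for NUMA topologies it considers unusual, with the capabilities XML being the place to look for the real cell count. The sketch below parses `virsh nodeinfo` text using the field labels shown in the message; the function names are hypothetical.

```python
def parse_nodeinfo(text):
    """Turn 'Label: value' lines from virsh nodeinfo into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def topology_product(info):
    """Multiply cells x sockets x cores x threads as reported by nodeinfo."""
    fields = (
        "NUMA cell(s)",
        "CPU socket(s)",
        "Core(s) per socket",
        "Thread(s) per core",
    )
    product = 1
    for field in fields:
        product *= int(info[field])
    return product
```

For the output above the product matches the reported CPU count, so nodeinfo is internally consistent; the disagreement is between its single NUMA cell and the eight cells listed by virsh capabilities.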
For AMD 62xx CPUs, the output is expected.
Check out this bug:

virsh nodeinfo can't get the right info on AMD Bulldozer cpu
https://bugzilla.redhat.com/show_bug.cgi?id=874050

Wayne Sun
2013-01-30

Wayne,

I'd argue we need to determine what format we really need the data in.
Do we actually care about physical sockets? Or should we care
about packages? Because with this specific CPU there are 2 packages in
1 physical socket to form 2 NUMA nodes per package.

The reason I say this is that we went from a domain with NUMA settings
defined working to that same domain failing to start with a cryptic
error message, which IMHO is worse.

The flip side of the coin is that we can just strip out all the NUMA
settings when starting the domain up if we know it won't work.

--
Doug Goldstein